Time |
Nickname |
Message |
00:03
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
00:06
🔗
|
|
dashcloud has joined #archiveteam |
00:31
🔗
|
|
_Crocatow has quit IRC (Read error: Connection reset by peer) |
00:31
🔗
|
|
_Crocatow has joined #archiveteam |
00:33
🔗
|
arkiver |
MrRadar: right. Thanks r3c0d3x! |
00:36
🔗
|
arkiver |
I'm off for the night |
00:36
🔗
|
arkiver |
Next up is fixing MyVIP, getting new items in fotolog and starting corbisimages and Experience Project |
00:41
🔗
|
|
JesseW has joined #archiveteam |
00:56
🔗
|
|
fmope has quit IRC (Remote host closed the connection) |
00:56
🔗
|
|
fmope has joined #archiveteam |
00:59
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
01:02
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
01:06
🔗
|
|
scottmodd has joined #archiveteam |
01:07
🔗
|
scottmodd |
Hi all, I wanted to see if there is someone here I can speak to regarding the Gamefront backup |
01:07
🔗
|
scottmodd |
http://tracker.archiveteam.org/gamefront/ |
01:08
🔗
|
MrRadar |
You just missed the guy to talk to, arkiver |
01:08
🔗
|
scottmodd |
Ahh, I saw you commenting on ModDB actually |
01:08
🔗
|
scottmodd |
what hours is he normally available? |
01:08
🔗
|
VADemon |
What do you need? There's still a chance we can help you |
01:09
🔗
|
scottmodd |
Well i've been speaking to defymedia about buying gamefront |
01:09
🔗
|
scottmodd |
disclaimer: I own moddb.com |
01:10
🔗
|
scottmodd |
anyhow, problem is they only want to sell the domains to keep the closure simple and protect users' privacy etc (understandable) |
01:10
🔗
|
scottmodd |
So essentially what I was hoping to do, was should the deal go through, instead of the site going dark and URLs 100% failing, to somehow serve up at least the static HTML with links to archive.org should you want to download the file |
01:11
🔗
|
scottmodd |
and wanted to see if you have a big static HTML dump of gamefront that I could grab. Or use an API? |
01:11
🔗
|
xmc |
hm, kind of like what i did with gitorious.org ? |
01:12
🔗
|
MrRadar |
You can download the raw scrape data here: https://archive.org/details/archiveteam_gamefront |
01:12
🔗
|
MrRadar |
It's quite big |
01:12
🔗
|
VADemon |
xmc, previous real URL -> data in WARCs on archive.org. I would not know how to make it easily |
01:13
🔗
|
scottmodd |
yeah read-only essentially |
01:13
🔗
|
xmc |
hm, yeah, you'd need a proxy thinger |
01:13
🔗
|
xmc |
unless you could just rewrite to wayback urls |
01:13
🔗
|
MrRadar |
It's in a format called a Web ARChive which is a container for saving HTTP responses |
01:13
🔗
|
xmc |
web.archive.org/*/http://gamefront.whatever/theactualurl/iguess.zip |
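A minimal sketch of the URL rewrite xmc describes, assuming the `https://web.archive.org/web/<timestamp>/<url>` form that appears in Wayback links later in this log. The function name `wayback_url` is made up for illustration, and the `*` fallback (which lists all captures) is an assumption about how one might handle a missing timestamp.

```python
def wayback_url(original_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine URL for a capture of original_url.

    timestamp is a 14-digit YYYYMMDDhhmmss string. An empty timestamp
    falls back to "*", which asks for the list of all captures.
    """
    stamp = timestamp if timestamp else "*"
    return f"https://web.archive.org/web/{stamp}/{original_url}"
```

A site taking over the old domain could apply this to each incoming request URL to produce the link (or redirect target) for the archived copy.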
01:13
🔗
|
MrRadar |
You could grab the associated CDX files (which are indexes of WARCs) and redirect any URL we saved to the IA |
01:14
🔗
|
xmc |
yea |
01:15
🔗
|
scottmodd |
There is no way to just get the static HTML? (ignoring the files which are obviously much larger) |
01:15
🔗
|
xmc |
the cdx files are pretty darn small |
01:15
🔗
|
xmc |
but no, it's probably an all or nothing thing |
01:15
🔗
|
MrRadar |
You could extract the HTML from the WARCs using the CDXs |
01:16
🔗
|
MrRadar |
But just redirecting to the IA is probably easier |
01:17
🔗
|
scottmodd |
I'm trying to avoid redirects if possible - to preserve the domain's value as ideally we'd like to relaunch gamefront in some form. but don't want to lose the history |
01:17
🔗
|
xmc |
then download the files and host it |
01:18
🔗
|
xmc |
??? |
01:19
🔗
|
scottmodd |
well that is essentially what we are trying |
01:20
🔗
|
scottmodd |
Just thought i'd check if there was an easy API way or something similar, your team is doing awesome work |
01:20
🔗
|
xmc |
nah, there's not a good way to download partial warcs en masse without a ton of work |
01:20
🔗
|
VADemon |
You'd need to start with .cdx since they contain the metadata and then work your way through the actual big archives |
01:21
🔗
|
xmc |
i'd bet the pages are probably interspersed randomly with files |
01:21
🔗
|
scottmodd |
yeah looks like a ton of work |
01:22
🔗
|
MrRadar |
The good news with the IA is their servers support range requests so if you know the byte range you need (which the CDX files tell you) you won't need to download the whole WARC files |
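The range-request idea can be sketched roughly as follows. This assumes the common 11-field CDX layout (`N b a m s k r M S V g`), where the last three fields are compressed record size, offset, and WARC filename; the actual field order should be checked against the header line of the CDX files in the collection. `WarcRecord`, `parse_cdx_line`, and `range_header` are hypothetical names, not part of any existing tool.

```python
from typing import NamedTuple


class WarcRecord(NamedTuple):
    url: str       # original URL that was captured
    filename: str  # WARC file that holds the record
    offset: int    # byte offset of the record inside the WARC
    length: int    # compressed size of the record in bytes


def parse_cdx_line(line: str) -> WarcRecord:
    """Parse one CDX line, assuming the 11-field 'N b a m s k r M S V g' layout."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 CDX fields, got {len(fields)}")
    return WarcRecord(
        url=fields[2],
        filename=fields[10],
        offset=int(fields[9]),
        length=int(fields[8]),
    )


def range_header(record: WarcRecord) -> dict:
    """Return the HTTP Range header that selects exactly this record."""
    last_byte = record.offset + record.length - 1  # Range ends are inclusive
    return {"Range": f"bytes={record.offset}-{last_byte}"}
```

Sending that header with a GET for the WARC file named in the record should return only the one gzip member containing the response, which can then be decompressed on its own.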
01:23
🔗
|
scottmodd |
yeah i'll do some experimental coding |
01:23
🔗
|
scottmodd |
appreciate your help |
01:24
🔗
|
MrRadar |
Also, to answer your question from earlier, arkiver is on European time |
01:27
🔗
|
scottmodd |
cheers, I assume he won't have an easier solution? |
01:27
🔗
|
MrRadar |
Probably not |
01:37
🔗
|
JesseW |
scottmodd: one important thing, if you are buying the domain name, is *please* avoid putting any robots.txt file on it. As long as that is done, the files should be available from the wayback machine, at least. |
01:37
🔗
|
scottmodd |
of course |
01:37
🔗
|
MrRadar |
As long as it allows the IA's bot I think it's OK to have a robots.txt |
01:38
🔗
|
JesseW |
MrRadar: there are bugs in IA's handling of robots.txt files -- at least in some circumstances, it seems to interpret the mere presence of any Disallow line as forbidding access. |
01:38
🔗
|
MrRadar |
Right, forgot about that |
01:39
🔗
|
JesseW |
scottmodd: There is also a tool available from the Internet Archive (I don't remember the exact name) that will redirect any URLs that would otherwise 404 to the wayback machine. That could be an easy solution, in that it would allow you to host new content, but make the old links still work. |
01:39
🔗
|
JesseW |
And you could gradually fill in the old links with local copies as time went on. |
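A rough sketch of the fallback JesseW describes: serve a path locally when a copy exists, otherwise send the visitor to the Wayback Machine. This is not the Internet Archive tool mentioned above; `resolve`, its parameters, and the default timestamp are assumptions for illustration, and it relies on the Wayback Machine resolving a timestamped URL to the nearest capture.

```python
def resolve(path, local_paths, old_origin, timestamp="20160429000000"):
    """Decide how to answer a request for `path` on the relaunched site.

    Returns (200, path) when a local copy exists, otherwise
    (302, wayback_url) pointing at the archived copy of the old URL.
    """
    if path in local_paths:
        return 200, path
    archived = f"https://web.archive.org/web/{timestamp}/{old_origin}{path}"
    return 302, archived
```

As old pages are restored locally, their paths are added to `local_paths` and stop redirecting, which matches the "gradually fill in" approach.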
01:40
🔗
|
ranma |
I'm annoyed one site used some redirect that prevented the php application from being backed up (minishowcase) |
01:41
🔗
|
JesseW |
ranma: say more? |
01:43
🔗
|
ranma |
I think the site www.minishowcase.net used a redirect in a link to download the zip file (minishowcase.net/?download) |
01:43
🔗
|
ranma |
I'll type more when I'm home |
02:02
🔗
|
|
scottmodd has quit IRC (Quit: Page closed) |
02:05
🔗
|
ranma |
so this site was hosting an opensource ajax gallery: https://web.archive.org/web/20100102033623/http://minishowcase.net/ |
02:05
🔗
|
ranma |
at one point they started charging for it https://web.archive.org/web/20130227055048/http://minishowcase.net/? |
02:06
🔗
|
ranma |
unfortunately, the way the zip file was originally linked, the IA wasn't able to back it up :< |
02:08
🔗
|
JesseW |
interesting. So it was under CC-BY-SA-2.5 |
02:10
🔗
|
JesseW |
ranma: it looks like there are various places that claim to have copies of it: https://duckduckgo.com/?q=minishowcase+v09b142&t=ffsb |
02:10
🔗
|
ranma |
i'm slightly suspicious of those x) |
02:11
🔗
|
ranma |
someone on github forked it, tho |
02:11
🔗
|
ranma |
so i trust that a bit more |
02:12
🔗
|
JesseW |
we should probably take this to -bs |
02:18
🔗
|
|
Stiletto has joined #archiveteam |
02:19
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
02:26
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
02:33
🔗
|
|
BartoCH has joined #archiveteam |
03:28
🔗
|
|
bwn_ has joined #archiveteam |
03:29
🔗
|
|
Medowar has quit IRC (Quit: Connection closed for inactivity) |
03:34
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
03:39
🔗
|
|
bwn_ has quit IRC (Ping timeout: 633 seconds) |
03:47
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
03:52
🔗
|
|
bwn has joined #archiveteam |
03:54
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
03:55
🔗
|
|
aMunster has quit IRC (Read error: Operation timed out) |
03:55
🔗
|
|
mhazinsk has quit IRC (Read error: Operation timed out) |
03:55
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
03:56
🔗
|
|
beardicus has quit IRC (Read error: Operation timed out) |
03:57
🔗
|
|
chazchaz has quit IRC (Read error: Operation timed out) |
03:57
🔗
|
|
chazchaz has joined #archiveteam |
03:57
🔗
|
|
swebb sets mode: +o chazchaz |
03:58
🔗
|
|
RichardG has quit IRC (Ping timeout: 272 seconds) |
04:00
🔗
|
|
RichardG has joined #archiveteam |
04:00
🔗
|
|
Frogging has quit IRC (Read error: Operation timed out) |
04:03
🔗
|
|
achip has quit IRC (Ping timeout: 258 seconds) |
04:04
🔗
|
|
K4k has quit IRC (Read error: Operation timed out) |
04:04
🔗
|
|
bwn has quit IRC (Ping timeout: 258 seconds) |
04:04
🔗
|
|
sivoais has quit IRC (Read error: Operation timed out) |
04:05
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
04:05
🔗
|
|
godane has quit IRC (Ping timeout: 258 seconds) |
04:05
🔗
|
|
Kaz has quit IRC (Read error: Operation timed out) |
04:05
🔗
|
|
Infreq has quit IRC (Ping timeout: 258 seconds) |
04:05
🔗
|
|
logchfoo1 has quit IRC (Ping timeout: 258 seconds) |
04:10
🔗
|
|
logchfoo4 starts logging #archiveteam at Fri Apr 29 04:10:22 2016 |
04:10
🔗
|
|
logchfoo4 has joined #archiveteam |
04:10
🔗
|
|
fie_ has quit IRC (Read error: Connection reset by peer) |
04:10
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
04:11
🔗
|
|
balrog has joined #archiveteam |
04:11
🔗
|
|
swebb sets mode: +o balrog |
04:11
🔗
|
|
ring has joined #archiveteam |
04:11
🔗
|
|
SirCmpwn has joined #archiveteam |
04:12
🔗
|
|
K4k has joined #archiveteam |
04:12
🔗
|
|
achip has joined #archiveteam |
04:13
🔗
|
|
Emcy has joined #archiveteam |
04:13
🔗
|
|
zenguy has joined #archiveteam |
04:14
🔗
|
|
mr-b has joined #archiveteam |
04:15
🔗
|
|
sivoais has joined #archiveteam |
04:15
🔗
|
|
acridAxid has joined #archiveteam |
04:18
🔗
|
|
joepie91 has quit IRC (Read error: Operation timed out) |
04:20
🔗
|
|
joepie91 has joined #archiveteam |
04:20
🔗
|
|
swebb sets mode: +o joepie91 |
04:21
🔗
|
|
wyatt8740 has joined #archiveteam |
04:22
🔗
|
|
godane has joined #archiveteam |
04:22
🔗
|
|
Kaz has joined #archiveteam |
04:24
🔗
|
|
dashcloud has joined #archiveteam |
04:45
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:52
🔗
|
|
Sk1d has joined #archiveteam |
04:58
🔗
|
|
bwn has joined #archiveteam |
05:08
🔗
|
|
bwn has quit IRC (Quit: Quit) |
05:20
🔗
|
|
aMunster has joined #archiveteam |
05:20
🔗
|
|
beardicus has joined #archiveteam |
05:20
🔗
|
|
swebb sets mode: +o beardicus |
05:23
🔗
|
|
Honno has joined #archiveteam |
05:26
🔗
|
|
MMovie has joined #archiveteam |
05:34
🔗
|
|
mhazinsk has joined #archiveteam |
05:46
🔗
|
|
Mayonaise has joined #archiveteam |
06:31
🔗
|
|
bwn has joined #archiveteam |
06:49
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
07:03
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
07:23
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
07:49
🔗
|
|
Skyrider_ has joined #archiveteam |
07:49
🔗
|
Skyrider_ |
Ello everyone |
07:55
🔗
|
PurpleSym |
Hi. |
07:59
🔗
|
|
bwn has joined #archiveteam |
08:06
🔗
|
|
schbirid has joined #archiveteam |
08:18
🔗
|
|
Medowar has joined #archiveteam |
08:35
🔗
|
|
metalcamp has joined #archiveteam |
08:45
🔗
|
|
Skyrider_ has quit IRC (Quit: Page closed) |
09:24
🔗
|
|
Atom__ has joined #archiveteam |
10:06
🔗
|
|
SketchCo1 has joined #archiveteam |
10:06
🔗
|
|
swebb sets mode: +o SketchCo1 |
10:07
🔗
|
|
SketchCow has quit IRC (Read error: Connection reset by peer) |
10:07
🔗
|
|
nekomune has quit IRC (Ping timeout: 244 seconds) |
10:08
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
10:08
🔗
|
|
BnA-Rob1n has quit IRC (Ping timeout: 244 seconds) |
10:08
🔗
|
|
joepie91 has quit IRC (Ping timeout: 244 seconds) |
10:08
🔗
|
|
SN4T14 has quit IRC (Ping timeout: 244 seconds) |
10:08
🔗
|
|
zerkalo has quit IRC (Ping timeout: 244 seconds) |
10:09
🔗
|
|
BnA-Rob1n has joined #archiveteam |
10:09
🔗
|
|
nekomune has joined #archiveteam |
10:10
🔗
|
|
zerkalo has joined #archiveteam |
10:11
🔗
|
|
joepie91 has joined #archiveteam |
10:11
🔗
|
|
swebb sets mode: +o joepie91 |
10:12
🔗
|
|
SN4T14 has joined #archiveteam |
10:53
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:22
🔗
|
arkiver |
We won't be able to get all files from GameFront |
12:22
🔗
|
arkiver |
The popular files are fully backed up |
12:22
🔗
|
arkiver |
The not-so-popular files, which have like 20 downloads, are very problematic when downloading |
12:23
🔗
|
arkiver |
For some reason the download URL only works sometimes |
12:23
🔗
|
arkiver |
randomly it seems |
12:23
🔗
|
arkiver |
I spent some hours trying to figure it out |
12:23
🔗
|
arkiver |
It looks like the load on GameFront isn't the problem |
12:23
🔗
|
arkiver |
I checked cookies, and those seem to be ok |
12:24
🔗
|
arkiver |
So it might be on GameFront's side. |
12:24
🔗
|
* |
arkiver is not giving up hope yet though |
12:25
🔗
|
PurpleSym |
I’ve had a site that required grabbing URL A before URL B worked. |
12:25
🔗
|
arkiver |
Same on this site |
12:25
🔗
|
PurpleSym |
Sometimes this caused timing problems. |
12:28
🔗
|
arkiver |
Let me recheck cookies |
12:33
🔗
|
arkiver |
I do see a new POST request now |
12:33
🔗
|
arkiver |
They might have changed something recently |
12:42
🔗
|
phuzion |
arkiver: are items not going out on gamefront right now? |
12:42
🔗
|
arkiver |
yeah |
12:42
🔗
|
arkiver |
I paused it, fixing the scripts |
12:42
🔗
|
phuzion |
oh ok |
12:42
🔗
|
arkiver |
Please keep it running though!! |
12:42
🔗
|
phuzion |
Will do. |
12:43
🔗
|
phuzion |
Apparently my droplets did about 1TB of gamefront since I turned them on yesterday. |
12:58
🔗
|
|
weslord has joined #archiveteam |
13:29
🔗
|
phuzion |
Scripts updated on all droplets! |
13:32
🔗
|
|
weslord has quit IRC (Quit: Lost terminal) |
13:37
🔗
|
Medowar |
bayimg cluster running. 40 script, 20 warrior |
13:38
🔗
|
arkiver |
I think I have gamefront fixed! |
13:41
🔗
|
arkiver |
yes, totally fixed |
13:41
🔗
|
Medowar |
so we can hammer it again? |
13:42
🔗
|
arkiver |
in a bit |
13:42
🔗
|
arkiver |
I'll get the new script up now |
13:43
🔗
|
Medowar |
cool. Tell me, when it is up, i have space for ~100 Instances. |
13:43
🔗
|
Medowar |
Old decommissioned server with a few days on the contract left. |
13:43
🔗
|
|
VADemon has joined #archiveteam |
13:44
🔗
|
arkiver |
nice!! |
13:44
🔗
|
arkiver |
New version is online! |
13:45
🔗
|
arkiver |
Let's finish these last items :D |
13:46
🔗
|
Medowar |
do we have any info on how long bayimg stays up? |
13:47
🔗
|
arkiver |
no |
13:47
🔗
|
arkiver |
'a week or so' |
13:47
🔗
|
Medowar |
ok. So I keep hammering it.. |
13:48
🔗
|
arkiver |
Yes, the items currently loaded in the tracker are everything. So we'll just get as much as possible |
13:49
🔗
|
arkiver |
GameFront is highest priority at the moment. We just need to finish these last items |
13:50
🔗
|
Medowar |
fos down? |
13:50
🔗
|
Medowar |
rsync: failed to connect to fos.textfiles.com (208.70.31.74): Connection timed out (110) |
13:51
🔗
|
arkiver |
on what grab is that? |
13:51
🔗
|
Medowar |
bayimg |
13:51
🔗
|
arkiver |
Probably |
13:51
🔗
|
Medowar |
but works for gf |
13:52
🔗
|
arkiver |
GameFront doesn't use FOS |
13:52
🔗
|
MrRadar |
gf is on zino's server |
13:52
🔗
|
arkiver |
yeah |
13:52
🔗
|
Medowar |
yeah, just saw that |
13:52
🔗
|
arkiver |
who is vantec again? |
13:53
🔗
|
Medowar |
fos is back up |
13:57
🔗
|
Medowar |
No HTTP response received from tracker. The tracker is probably overloaded. Retrying after 60 seconds... |
13:57
🔗
|
Medowar |
gf project |
13:58
🔗
|
arkiver |
Happens sometimes, should fix itself |
13:59
🔗
|
arkiver |
Everyone: please put as much as you can on GameFront, we might only have a few hours left... |
14:03
🔗
|
phuzion |
engaging cannons |
14:06
🔗
|
arkiver |
phuzion: awesome!! |
14:07
🔗
|
HCross2 |
arkiver: will the current scripts work for a bit? Not home atm so can't update |
14:07
🔗
|
arkiver |
The most recent version will work, older scripts won't |
14:08
🔗
|
arkiver |
phuzion: I'm going to release another update, skipping the facebook URLs |
14:08
🔗
|
phuzion |
Ok, I'll do a touch stop now |
14:08
🔗
|
phuzion |
Unless you foresee the update taking > 1hour |
14:08
🔗
|
|
atomotic has joined #archiveteam |
14:09
🔗
|
arkiver |
no, will be here in a bit |
14:10
🔗
|
|
scyther_ has joined #archiveteam |
14:10
🔗
|
|
scyther_ has quit IRC (Connection closed) |
14:12
🔗
|
HCross2 |
I'll be able to throw most of a VPS provider's node at you tonight |
14:12
🔗
|
arkiver |
HCross2: sounds good! |
14:13
🔗
|
arkiver |
phuzion: scripts are updated! |
14:13
🔗
|
phuzion |
arkiver: how likely is it that the tasks I have now will finish cleanly? Or should I just abandon them and reboot the droplets? |
14:14
🔗
|
arkiver |
You can just force quit them if you'd like, I'll requeue the items |
14:14
🔗
|
phuzion |
rebooting |
14:14
🔗
|
arkiver |
requeued. |
14:15
🔗
|
phuzion |
deploying cannons |
14:17
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
14:20
🔗
|
|
TC01 has joined #archiveteam |
14:21
🔗
|
|
bwn_ has joined #archiveteam |
14:22
🔗
|
phuzion |
And 40 x 6 online. |
14:23
🔗
|
Medowar |
and 4x20 online |
14:23
🔗
|
Medowar |
more coming up soon |
14:24
🔗
|
phuzion |
4 instances of 20 threads? or vice versa? |
14:24
🔗
|
phuzion |
Because I've got 40 instances of 6 threads. |
14:24
🔗
|
Medowar |
4 instances with 20 threads |
14:24
🔗
|
Medowar |
it is easier to spin up docker images with 20 script threads than warrior |
14:25
🔗
|
phuzion |
Oh, I don't bring up docker images, I have an ansible script for deploying a lot of the more basic warrior scripts |
14:26
🔗
|
Medowar |
oh, ok... I am using https://hub.docker.com/r/infrequent/at-as-dockerfile/, since I have very few but powerful servers. |
14:26
🔗
|
phuzion |
I just use Digital Ocean :) |
14:27
🔗
|
Medowar |
yeah, but I already have my servers |
14:27
🔗
|
MrRadar |
arkiver: it looks like we've overloaded GameFront's token POST endpoint. A few of my items are getting 504s from it |
14:27
🔗
|
Medowar |
same here |
14:28
🔗
|
Medowar |
actually quite a few 504s. |
14:28
🔗
|
HCross2 |
phuzion: can you link to your ansible please, I might need it. Spoken to a friend who works for a vps provider, and will be getting a fair few VPSes soon |
14:29
🔗
|
phuzion |
HCross2: github.com/phuzion/archiveteam-deploy |
14:29
🔗
|
Medowar |
one image is getting more timeouts than actual jobs finished |
14:29
🔗
|
arkiver |
still working for me |
14:29
🔗
|
MrRadar |
Yeah, sometimes it works other times it gets a 504 |
14:30
🔗
|
Medowar |
http://pastebin.com/r3aWQh8J |
14:30
🔗
|
MrRadar |
That's exactly what I'm seeing |
14:31
🔗
|
arkiver |
Will keep it at 100 for the moment |
14:31
🔗
|
arkiver |
100 items/min |
14:34
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
14:59
🔗
|
arkiver |
phuzion: Medowar: I'll have to make another update |
15:00
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
15:00
🔗
|
phuzion |
ok, let me know when it's deployed |
15:01
🔗
|
arkiver |
ok |
15:01
🔗
|
|
Honno has joined #archiveteam |
15:10
🔗
|
arkiver |
phuzion: scripts are updated |
15:11
🔗
|
arkiver |
Unfortunately I can't easily check if the update works with the problem, so I'll have to see from the files that are returned |
15:11
🔗
|
arkiver |
If it doesn't work, I'll have to make another update |
15:11
🔗
|
phuzion |
I'm trying from a separate machine before I deploy the droplets |
15:13
🔗
|
MrRadar |
Did you mean to say "Fotolog is overloaded!" in the script? |
15:13
🔗
|
arkiver |
hmm |
15:13
🔗
|
arkiver |
sorry |
15:13
🔗
|
MrRadar |
No prob, it's just funny :) |
15:14
🔗
|
arkiver |
it's gone |
15:14
🔗
|
arkiver |
That last part was taken from the fotolog script. It should work though |
15:15
🔗
|
phuzion |
arkiver: I've pushed an item or two up with the latest scripts, wanna check and let me know if we're good to deploy widely? |
15:15
🔗
|
arkiver |
ok |
15:16
🔗
|
arkiver |
Though this problem might only occur when the load on gamefront is high |
15:16
🔗
|
phuzion |
Ah, ok |
15:16
🔗
|
|
Start has joined #archiveteam |
15:16
🔗
|
arkiver |
the thankyou page sometimes returns nothing for some reason |
15:17
🔗
|
SketchCo1 |
Load on FOS is down to 2 |
15:17
🔗
|
SketchCo1 |
I see rsyncs are not backed up like crazy either. |
15:17
🔗
|
SketchCo1 |
So either we moved something super major to the other host or something else. |
15:18
🔗
|
arkiver |
GameFront is moved. Already 3 TB on the other target |
15:18
🔗
|
SketchCo1 |
Ha |
15:18
🔗
|
SketchCo1 |
Well OK THEN |
15:18
🔗
|
|
SketchCo1 is now known as SketchCow |
15:18
🔗
|
SketchCow |
I'm watching FOS deal |
15:18
🔗
|
arkiver |
phuzion: so I guess just fire it up and we'll see what happens |
15:19
🔗
|
phuzion |
DEPLOYING bbl gonna go get lunch |
15:19
🔗
|
SketchCow |
I also see last night had some interesting discussion. |
15:19
🔗
|
arkiver |
The GameFront discussion? |
15:22
🔗
|
SketchCow |
I need to finish reading it |
15:23
🔗
|
SketchCow |
After I fix this "I have no water in the house" problem |
15:31
🔗
|
|
Yoshimura has joined #archiveteam |
15:43
🔗
|
|
JesseW has joined #archiveteam |
15:46
🔗
|
SketchCow |
Fixed |
15:49
🔗
|
SketchCow |
Caught up. |
15:49
🔗
|
MrRadar |
arkiver: Some of my GameFront items are getting truncated at GameFront's end. For example item 14733169 says it's 83 megs on GameFront's file info page but I only downloaded 18 megs of it |
15:50
🔗
|
MrRadar |
Not sure if there's anything we can do about it though |
15:50
🔗
|
arkiver |
MrRadar: did the file return to the tracker? |
15:50
🔗
|
MrRadar |
Yes |
15:50
🔗
|
arkiver |
or did it retry the download? |
15:50
🔗
|
MrRadar |
As far as I can tell it uploaded |
15:51
🔗
|
arkiver |
How do you know it downloaded 18 MB? |
15:51
🔗
|
MrRadar |
rsync only uploaded 18 megs |
15:51
🔗
|
MrRadar |
It's possible the file compressed really well, but I doubt it since the file is a video file already in a zip file |
15:52
🔗
|
MrRadar |
I'm trying to download it manually to verify the size but GameFront is being temperamental |
15:53
🔗
|
MrRadar |
OK, Chrome says the file should be 74 megs in size (based on the HTTP header presumably) |
15:55
🔗
|
MrRadar |
The manual download is stuck at 1.4 megs; I'll let it run until either Chrome or GameFront abort the connection |
15:58
🔗
|
MrRadar |
OK, the download "finished" with 1.5 megs retrieved and the ZIP header says it's supposed to be 83.2 megs compressed |
15:58
🔗
|
arkiver |
Finished for me with 74 MB |
15:59
🔗
|
MrRadar |
Can you extract it? |
15:59
🔗
|
arkiver |
no, 74 is not the full file apparently |
15:59
🔗
|
MrRadar |
:( |
15:59
🔗
|
arkiver |
Grab is paused. |
16:00
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
16:01
🔗
|
arkiver |
SketchCow: see above |
16:02
🔗
|
arkiver |
We got all popular files. We did not get the less popular files, that is files with around 20 or less downloads |
16:02
🔗
|
arkiver |
I'm stopping the GameFront grab, since we can't trust their returned data anymore, and there's not a good way to verify the downloaded data in the scripts |
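For ZIP downloads specifically, one partial check is possible, sketched below under assumptions: a truncated ZIP loses the end-of-central-directory record at the tail of the file, so Python's standard `zipfile` module refuses to open it. This is not what the grab scripts do, it only covers ZIP files, and it needs the whole file in memory or on disk. `zip_is_intact` is a hypothetical helper name.

```python
import io
import zipfile


def zip_is_intact(data: bytes) -> bool:
    """Return True if data opens as a ZIP and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # Raised when the central directory is missing, e.g. after truncation.
        return False
```

A download that fails this check could be retried or flagged rather than uploaded, though it says nothing about non-ZIP files served by the same endpoint.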
16:02
🔗
|
arkiver |
Well, we got a part of the less popular files. |
16:04
🔗
|
HCross |
How are we going with the forums? |
16:04
🔗
|
arkiver |
I just checked sizes of some of the recent grabbed files, they sometimes don't match the size from the gamefront page |
16:04
🔗
|
arkiver |
The older grabbed files do seem to match the sizes, so I'd say it's only a part of the very recent files that is corrupted |
16:05
🔗
|
arkiver |
The forums grab is still running |
16:05
🔗
|
HCross |
I'd say keep an eye on it, and see if they decide to behave |
16:05
🔗
|
arkiver |
yeah |
16:06
🔗
|
arkiver |
I think they might have started the shutdown though |
16:06
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:06
🔗
|
SketchCow |
Assume that's the case, then. |
16:06
🔗
|
SketchCow |
Some stuff will be lost, etc. |
16:06
🔗
|
phuzion |
arkiver: want me to move resources to the forums grab? |
16:06
🔗
|
SketchCow |
But we got, what.... a bit, right |
16:06
🔗
|
SketchCow |
Terabit |
16:06
🔗
|
phuzion |
SketchCow: Tracker says about 36TB or so. |
16:06
🔗
|
arkiver |
We got a very very large part of everything from gamefront |
16:07
🔗
|
HCross |
http://www.gamefront.com/gamefront-is-closing-down-april-30-2016/ hmm, very interesting comment at the bottom |
16:07
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:08
🔗
|
SketchCow |
We'll see! |
16:08
🔗
|
arkiver |
So http://gamefront.online/ would host everything... |
16:08
🔗
|
arkiver |
8 TB is really far off of our number though |
16:08
🔗
|
SketchCow |
I LOVE it when there's multiple attempts to download something. |
16:08
🔗
|
arkiver |
and I'm pretty sure we did not get duplicates |
16:08
🔗
|
SketchCow |
Well, that's what happens when someone works alone in the dark. |
16:09
🔗
|
SketchCow |
Did we get more than 326,000 files? |
16:09
🔗
|
SketchCow |
Also, he's probably saying "files" versus "all HTML, etc." |
16:09
🔗
|
arkiver |
I can't be 100% sure, but I'd say yes |
16:10
🔗
|
arkiver |
Anyway, when everything is uploaded I'll work on an index of every file we have saved including extracted title, description, date and the direct download URL in the wayback machine |
16:10
🔗
|
arkiver |
Since some mod websites have shown interest in these files. |
16:11
🔗
|
arkiver |
It might make it easier for them to get and link to the saved files in the Wayback Machine |
16:12
🔗
|
arkiver |
SketchCow: I found out why the website saved only 326000 files. |
16:12
🔗
|
arkiver |
http://www.gamefront.com/files/ says there's around that number of files |
16:13
🔗
|
|
luckcolor has joined #archiveteam |
16:13
🔗
|
luckcolor |
Hi guys |
16:13
🔗
|
luckcolor |
So I was reading the logs |
16:13
🔗
|
arkiver |
That is only though for IDs that are indexed and sorted under games and categories |
16:13
🔗
|
arkiver |
There are a lot of files that were not sorted under a category and game, and we got those too |
16:14
🔗
|
arkiver |
So that'd explain the difference |
16:14
🔗
|
|
dashcloud has joined #archiveteam |
16:15
🔗
|
luckcolor |
So arkiver how much do you think we have missed in terms of files on Gamefront? |
16:15
🔗
|
arkiver |
I don't know |
16:16
🔗
|
SketchCow |
I expect that eventually we'll talk to the guy. |
16:18
🔗
|
luckcolor |
Guys, just in general about IRC channels and such, like how do you know that somebody is not using a fake nickname? |
16:18
🔗
|
arkiver |
Can check IP, but we can't be sure of that |
16:18
🔗
|
luckcolor |
right |
16:18
🔗
|
arkiver |
Maybe this arkiver is a fake arkiver |
16:18
🔗
|
luckcolor |
lel |
16:19
🔗
|
SketchCow |
If this is a fake arkiver, that little charlatan's working his ass off |
16:19
🔗
|
SketchCow |
With me, people ask reasonable questions |
16:19
🔗
|
SketchCow |
And if I answer politely, they know it's not me |
16:20
🔗
|
SketchCow |
The key is for it to be a totally generic question, like "how are you doing" or "we can't help but notice you're not keeping track of disk space" |
16:20
🔗
|
SketchCow |
If what follows isn't a 34 paragraph rant threatening the lives of at least 3 people |
16:20
🔗
|
SketchCow |
impostor |
16:20
🔗
|
luckcolor |
well right |
16:20
🔗
|
luckcolor |
I'm probably going off-topic now |
16:21
🔗
|
luckcolor |
sorry :P |
16:21
🔗
|
arkiver |
We got a very very large part of everything, but it still feels horrible to not get everything :( |
16:22
🔗
|
luckcolor |
yeah |
16:23
🔗
|
luckcolor |
Bah i should totally get some irc client on my server i'm tired of reading logs |
16:23
🔗
|
arkiver |
http://gamefront.online/ must have had some help from inside GameFront |
16:24
🔗
|
luckcolor |
well |
16:24
🔗
|
luckcolor |
it says 8000 gb and we did 36480 gb in total |
16:32
🔗
|
MrRadar |
Hello |
16:33
🔗
|
MrRadar |
n/m, wrong window |
16:35
🔗
|
luckcolor |
arkiver this seems to be the profile of the guy who made that website http://www.moddb.com/members/d-airy |
16:35
🔗
|
luckcolor |
I was reading the comments of moddb :P |
16:37
🔗
|
schbirid |
100% and still doing 83.691 MB/s? |
16:37
🔗
|
luckcolor |
dunno |
16:38
🔗
|
luckcolor |
it seems he just registered on moddb for that |
16:38
🔗
|
luckcolor |
http://gamefront.online/get_progress.php this is the url for the data on the webpage |
16:45
🔗
|
schbirid |
http://www.eurogamer.net/articles/2016-04-29-fable-developer-lionhead-closes-down-today |
16:45
🔗
|
schbirid |
<arkiver> We got all popular files. We did not get the less popular files, that is files with around 20 or less downloads <- aww man, i wish you focused on the stuff that would not be available on any other gaming file site... |
16:47
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:48
🔗
|
MrRadar |
The problem is on GameFront's end (mostly) |
16:48
🔗
|
MrRadar |
We can't do anything if their copies are truncated |
16:48
🔗
|
MrRadar |
But yeah, it does suck |
16:49
🔗
|
luckcolor |
Yeah |
16:54
🔗
|
|
dashcloud has joined #archiveteam |
17:07
🔗
|
atrocity |
so gamefront... all are saying this for me: Tracker rate limiting is active. We don't want to overload the site we're archiving, so we've limited the number of downloads per minute. Retrying after 300 seconds... |
17:07
🔗
|
xmc |
gamefront is kinda busted |
17:09
🔗
|
atrocity |
lol |
17:09
🔗
|
atrocity |
should i stay on it? |
17:09
🔗
|
xmc |
¯\_(ツ)_/¯ |
17:09
🔗
|
xmc |
is there anything else on the tracker that you want to run instead? |
17:10
🔗
|
MrRadar |
Move over to the gamefrontforums or bayimg project |
17:10
🔗
|
atrocity |
kk |
17:10
🔗
|
MrRadar |
The GameFront forums are ending today and bayimg in a few days |
17:10
🔗
|
MrRadar |
Altho the forums are currently tracker limited |
17:11
🔗
|
|
metalcamp has joined #archiveteam |
17:28
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
17:32
🔗
|
|
yakfish has joined #archiveteam |
17:32
🔗
|
|
matthusby has joined #archiveteam |
17:34
🔗
|
|
SadDM has joined #archiveteam |
17:34
🔗
|
|
swebb sets mode: +o SadDM |
17:37
🔗
|
|
jspiros has joined #archiveteam |
17:49
🔗
|
luckcolor |
arkiver can you check the staus of FOS i get this error |
17:49
🔗
|
luckcolor |
rsync: chgrp "/bayimg/.bayimg-16pictures_ganbdaaa-20160429-174836_data.txt.IFhkO7" (in dev) failed: Operation not permitted (1) |
17:49
🔗
|
luckcolor |
*status |
17:50
🔗
|
phuzion |
luckcolor: are you rsyncing manually or by script? |
17:51
🔗
|
|
xhdr has quit IRC (Remote host closed the connection) |
17:51
🔗
|
luckcolor |
by srcipt |
17:51
🔗
|
luckcolor |
*script |
17:55
🔗
|
|
xhdr has joined #archiveteam |
17:55
🔗
|
|
xhdr has quit IRC (Excess Flood) |
17:56
🔗
|
|
xhdr has joined #archiveteam |
17:58
🔗
|
arkiver |
schbirid: I agree |
17:59
🔗
|
|
Emcy has quit IRC (Ping timeout: 370 seconds) |
17:59
🔗
|
arkiver |
It later turned out that the less popular files were a little problematic to download compared to the 'normal' files |
18:00
🔗
|
arkiver |
Apparently the less popular files first needed to be activated using a POST request with a special string |
18:00
🔗
|
arkiver |
If they were not activated that way, they would not download |
18:00
🔗
|
phuzion |
Is that what the token thing was? |
18:00
🔗
|
arkiver |
yeah |
18:00
🔗
|
phuzion |
ok |
18:00
🔗
|
arkiver |
As with any project, we first grab everything and then have a look at the problematic items |
18:01
🔗
|
arkiver |
In this case the problematic items were those less popular files |
18:05
🔗
|
arkiver |
schbiridi: let me know if you have any other question regarding what we saved |
18:06
🔗
|
arkiver |
schbirid* |
18:19
🔗
|
|
luckcolor has quit IRC (Quit: Page closed) |
18:19
🔗
|
|
Start has joined #archiveteam |
18:43
🔗
|
|
remsen has quit IRC (ircd.choopa.net irc2.choopa.net) |
18:43
🔗
|
|
remsen1 has joined #archiveteam |
18:46
🔗
|
|
Morbus has quit IRC (Read error: Operation timed out) |
18:50
🔗
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
19:10
🔗
|
|
bwn_ has joined #archiveteam |
19:22
🔗
|
|
Emcy has joined #archiveteam |
19:27
🔗
|
|
Emcy has quit IRC (Read error: Connection reset by peer) |
19:44
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
20:17
🔗
|
|
db48x has joined #archiveteam |
20:28
🔗
|
|
remsen1 has quit IRC (ZNC 1.6.2 - http://znc.in) |
20:28
🔗
|
|
remsen has joined #archiveteam |
20:29
🔗
|
Gfy |
is there a full site grab of iSONEWS? It'll be closing down http://www.theisonews.com/forums/index.php/topic,161745.0.html |
20:39
🔗
|
MrRadar |
I put it into ArchiveBot |
20:48
🔗
|
|
tomwsmf-a has joined #archiveteam |
20:49
🔗
|
|
Emcy has joined #archiveteam |
21:32
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
22:21
🔗
|
schbirid |
arkiver: damn nice job in any case :)) |
22:25
🔗
|
|
Start has joined #archiveteam |
22:57
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
23:19
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
23:26
🔗
|
|
BartoCH has joined #archiveteam |
23:29
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
23:44
🔗
|
|
Stiletto has quit IRC () |
23:55
🔗
|
arkiver |
schbirid: thank you! All less popular files should also be available on http://gamefront.online/ soon |
23:55
🔗
|
arkiver |
It looks like the person behind that site was able to do a grab too |