#archiveteam 2016-04-29,Fri


Time Nickname Message
00:03 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:06 🔗 dashcloud has joined #archiveteam
00:31 🔗 _Crocatow has quit IRC (Read error: Connection reset by peer)
00:31 🔗 _Crocatow has joined #archiveteam
00:33 🔗 arkiver MrRadar: right. Thanks r3c0d3x!
00:36 🔗 arkiver I'm off for the night
00:36 🔗 arkiver Next up is fixing MyVIP, getting new items into fotolog, and starting corbisimages and Experience Project
00:41 🔗 JesseW has joined #archiveteam
00:56 🔗 fmope has quit IRC (Remote host closed the connection)
00:56 🔗 fmope has joined #archiveteam
00:59 🔗 Stiletto has quit IRC (Read error: Operation timed out)
01:02 🔗 philpem has quit IRC (Ping timeout: 260 seconds)
01:06 🔗 scottmodd has joined #archiveteam
01:07 🔗 scottmodd Hi all, I wanted to see if there is someone here I can speak to regarding the Gamefront backup
01:07 🔗 scottmodd http://tracker.archiveteam.org/gamefront/
01:08 🔗 MrRadar You just missed the guy to talk to, arkiver
01:08 🔗 scottmodd Ahh, I saw you commenting on ModDB actually
01:08 🔗 scottmodd what hours is he normally available?
01:08 🔗 VADemon What do you need? There's still a chance we can help you
01:09 🔗 scottmodd Well i've been speaking to defymedia about buying gamefront
01:09 🔗 scottmodd disclaimer: I own moddb.com
01:10 🔗 scottmodd anyhow, problem is they only want to sell the domains to keep the closure simple and protect users' privacy etc (understandable)
01:10 🔗 scottmodd So essentially what I was hoping to do, was should the deal go through, instead of the site going dark and URLs 100% failing, to somehow serve up at least the static HTML with links to archive.org should you want to download the file
01:11 🔗 scottmodd and wanted to see if you have a big static HTML dump of gamefront that I could grab. Or use an API?
01:11 🔗 xmc hm, kind of like what i did with gitorious.org ?
01:12 🔗 MrRadar You can download the raw scrape data here: https://archive.org/details/archiveteam_gamefront
01:12 🔗 MrRadar It's quite big
01:12 🔗 VADemon xmc, previous real URL -> data in WARCs on archive.org. I would not know how to make it easily
01:13 🔗 scottmodd yeah read-only essentially
01:13 🔗 xmc hm, yeah, you'd need a proxy thinger
01:13 🔗 xmc unless you could just rewrite to wayback urls
01:13 🔗 MrRadar It's in a format called a Web ARChive which is a container for saving HTTP responses
01:13 🔗 xmc web.archive.org/*/http://gamefront.whatever/theactualurl/iguess.zip
01:13 🔗 MrRadar You could grab the associated CDX files (which are indexes of WARCs) and redirect any URL we saved to the IA
01:14 🔗 xmc yea
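For reference, a CDX line can be turned into a Wayback Machine URL roughly like this. It is a minimal sketch that assumes the common 11-field CDX layout (`N b a m s k r M S V g`: urlkey, timestamp, original URL, MIME type, status, digest, redirect, meta tags, compressed length, offset, WARC filename). The `id_` suffix asks the Wayback Machine for the unmodified capture. The sample line and WARC filename used below are made up for illustration.

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """One capture from an 11-field CDX index (assumed layout: N b a m s k r M S V g)."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    status: str
    digest: str
    redirect: str
    metatags: str
    length: int     # compressed size of the WARC record in bytes
    offset: int     # byte offset of the record inside the WARC file
    filename: str   # WARC file that holds the record


def parse_cdx_line(line: str) -> CdxRecord:
    """Split a space-separated CDX line into its 11 fields."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 CDX fields, got {len(fields)}")
    *head, length, offset, filename = fields
    return CdxRecord(*head, int(length), int(offset), filename)


def wayback_url(rec: CdxRecord, raw: bool = True) -> str:
    """Build a Wayback Machine URL for a capture; 'id_' requests the original bytes."""
    flag = "id_" if raw else ""
    return f"https://web.archive.org/web/{rec.timestamp}{flag}/{rec.original}"
```

With a hypothetical line such as `com,gamefront)/files/12345 20160428120000 http://www.gamefront.com/files/12345 text/html 200 ABCDEF - - 5120 1048576 gamefront-example.warc.gz`, `wayback_url` points at the 2016-04-28 capture of that page, and the length/offset fields are what a range request would need.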
01:15 🔗 scottmodd There is no way to just get the static HTML? (ignoring the files which are obviously much larger)
01:15 🔗 xmc the cdx files are pretty darn small
01:15 🔗 xmc but no, it's probably an all or nothing thing
01:15 🔗 MrRadar You could extract the HTML from the WARCs using the CDXs
01:16 🔗 MrRadar But just redirecting to the IA is probably easier
01:17 🔗 scottmodd I'm trying to avoid redirects if possible - to preserve the domain's value, as ideally we'd like to relaunch gamefront in some form, but don't want to lose the history
01:17 🔗 xmc then download the files and host it
01:18 🔗 xmc ???
01:19 🔗 scottmodd well that is essentially what we are trying
01:20 🔗 scottmodd Just thought i'd check if there was an easy API way or something similar, your team is doing awesome work
01:20 🔗 xmc nah, there's not a good way to download partial warcs en masse without a ton of work
01:20 🔗 VADemon You'd need to start with .cdx since they contain the metadata and then work your way through the actual big archives
01:21 🔗 xmc i'd bet the pages are probably interspersed randomly with files
01:21 🔗 scottmodd yeah looks like a ton of work
01:22 🔗 MrRadar The good news with the IA is their servers support range requests so if you know the byte range you need (which the CDX files tell you) you won't need to download the whole WARC files
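A sketch of the range-request idea, assuming the WARC is a `.warc.gz` written as one gzip member per record (the usual layout) and that the CDX supplies the compressed length and offset. The `Range` header itself is standard HTTP; `fetch_record` has not been tested against archive.org, and a server that ignores `Range` would send the whole file instead. `extract_record` does the same slicing on in-memory bytes so the idea can be checked offline.

```python
import gzip
import urllib.request


def range_header(offset: int, length: int) -> str:
    """HTTP Range value covering exactly one compressed WARC record (end is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(url: str, offset: int, length: int) -> bytes:
    """Download only one record from a remote .warc.gz and decompress it."""
    req = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(req) as resp:
        return gzip.decompress(resp.read())


def extract_record(warc_gz: bytes, offset: int, length: int) -> bytes:
    """Same as fetch_record, but slicing a local copy instead of using HTTP."""
    return gzip.decompress(warc_gz[offset:offset + length])
```

Because each record is its own gzip member, any one of them can be decompressed without touching the bytes before it, which is what makes pulling just the HTML out of a multi-gigabyte WARC practical.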
01:23 🔗 scottmodd yeah i'll do some experimental coding
01:23 🔗 scottmodd appreciate your help
01:24 🔗 MrRadar Also, to answer your question from earlier, arkiver is on European time
01:27 🔗 scottmodd cheers, I assume he won't have an easier solution?
01:27 🔗 MrRadar Probably not
01:37 🔗 JesseW scottmodd: one important thing, if you are buying the domain name, is *please* avoid putting any robots.txt file on it. As long as that is done, the files should be available from the wayback machine, at least.
01:37 🔗 scottmodd of course
01:37 🔗 MrRadar As long as it allows the IA's bot I think it's OK to have a robots.txt
01:38 🔗 JesseW MrRadar: there are bugs in IA's handling of robots.txt files -- at least in some circumstances, it seems to interpret the mere presence of any Disallow line as forbidding access.
01:38 🔗 MrRadar Right, forgot about that
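To sanity-check a robots.txt before putting it on the domain, Python's standard `urllib.robotparser` shows how a spec-following crawler would read it. Because of the IA behaviour JesseW describes, the sketch also flags the mere presence of any `Disallow` line; that heuristic mirrors the reported bug, not any documented IA rule, and `ia_archiver` is assumed here as the crawler's user-agent.

```python
from urllib.robotparser import RobotFileParser


def allowed_for(robots_txt: str, agent: str, url: str) -> bool:
    """How a standards-following crawler would read this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_disallow_line(robots_txt: str) -> bool:
    """Conservative check: True if any Disallow line is present, whatever its value."""
    return any(line.strip().lower().startswith("disallow")
               for line in robots_txt.splitlines())
```

A file can pass `allowed_for` and still trip `has_disallow_line`, which is exactly the case where the safest option is to ship no robots.txt at all.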
01:39 🔗 JesseW scottmodd: There is also a tool available from the Internet Archive (I don't remember the exact name) that will redirect any URLs that would otherwise 404 to the wayback machine. That could be an easy solution, in that it would allow you to host new content, but make the old links still work.
01:39 🔗 JesseW And you could gradually fill in the old links with local copies as time went on.
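The fallback JesseW describes can be sketched as a small routing function: serve the path if a local copy exists, otherwise redirect to the Wayback Machine. This is a hypothetical illustration, not the IA tool he mentions; the `OLD_HOST` prefix, the set of local paths and the `2016` partial timestamp (the Wayback Machine resolves it to the nearest capture) are assumptions.

```python
from typing import AbstractSet, Tuple

OLD_HOST = "http://www.gamefront.com"  # assumed original site prefix
WAYBACK_PREFIX = "https://web.archive.org/web/2016/"  # partial timestamp: nearest 2016 capture


def route(path: str, local_paths: AbstractSet[str]) -> Tuple[int, str]:
    """Return (status, target): 200 with the local path, or 302 with a Wayback URL."""
    if path in local_paths:
        return 200, path
    return 302, WAYBACK_PREFIX + OLD_HOST + path
```

Using a temporary (302) redirect rather than a permanent one leaves room to "fill in the old links with local copies" later: once a path is added to `local_paths`, it is served directly.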
01:40 🔗 ranma I'm annoyed one site used some redirect that prevented the php application from being backed up (minishowcase)
01:41 🔗 JesseW ranma: say more?
01:43 🔗 ranma I think the site www.minishowcase.net used a redirect in a link to download the zip file (minishowcase.net/?download)
01:43 🔗 ranma I'll type more when I'm home
02:02 🔗 scottmodd has quit IRC (Quit: Page closed)
02:05 🔗 ranma so this site was hosting an opensource ajax gallery: https://web.archive.org/web/20100102033623/http://minishowcase.net/
02:05 🔗 ranma at one point they started charging for it https://web.archive.org/web/20130227055048/http://minishowcase.net/?
02:06 🔗 ranma unfortunately, the way the zip file was originally linked, the IA wasn't able to back it up :<
02:08 🔗 JesseW interesting. So it was under CC-BY-SA-2.5
02:10 🔗 JesseW ranma: it looks like there are various places that claim to have copies of it: https://duckduckgo.com/?q=minishowcase+v09b142&t=ffsb
02:10 🔗 ranma i'm slightly suspicious of those x)
02:11 🔗 ranma someone on github forked it, tho
02:11 🔗 ranma so i trust that a bit more
02:12 🔗 JesseW we should probably take this to -bs
02:18 🔗 Stiletto has joined #archiveteam
02:19 🔗 VADemon has quit IRC (Quit: left4dead)
02:26 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
02:33 🔗 BartoCH has joined #archiveteam
03:28 🔗 bwn_ has joined #archiveteam
03:29 🔗 Medowar has quit IRC (Quit: Connection closed for inactivity)
03:34 🔗 bwn has quit IRC (Read error: Operation timed out)
03:39 🔗 bwn_ has quit IRC (Ping timeout: 633 seconds)
03:47 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
03:52 🔗 bwn has joined #archiveteam
03:54 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
03:55 🔗 aMunster has quit IRC (Read error: Operation timed out)
03:55 🔗 mhazinsk has quit IRC (Read error: Operation timed out)
03:55 🔗 MMovie has quit IRC (Read error: Operation timed out)
03:56 🔗 beardicus has quit IRC (Read error: Operation timed out)
03:57 🔗 chazchaz has quit IRC (Read error: Operation timed out)
03:57 🔗 chazchaz has joined #archiveteam
03:57 🔗 swebb sets mode: +o chazchaz
03:58 🔗 RichardG has quit IRC (Ping timeout: 272 seconds)
04:00 🔗 RichardG has joined #archiveteam
04:00 🔗 Frogging has quit IRC (Read error: Operation timed out)
04:03 🔗 achip has quit IRC (Ping timeout: 258 seconds)
04:04 🔗 K4k has quit IRC (Read error: Operation timed out)
04:04 🔗 bwn has quit IRC (Ping timeout: 258 seconds)
04:04 🔗 sivoais has quit IRC (Read error: Operation timed out)
04:05 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
04:05 🔗 godane has quit IRC (Ping timeout: 258 seconds)
04:05 🔗 Kaz has quit IRC (Read error: Operation timed out)
04:05 🔗 Infreq has quit IRC (Ping timeout: 258 seconds)
04:05 🔗 logchfoo1 has quit IRC (Ping timeout: 258 seconds)
04:10 🔗 logchfoo4 starts logging #archiveteam at Fri Apr 29 04:10:22 2016
04:10 🔗 logchfoo4 has joined #archiveteam
04:10 🔗 fie_ has quit IRC (Read error: Connection reset by peer)
04:10 🔗 dashcloud has quit IRC (Read error: Operation timed out)
04:11 🔗 balrog has joined #archiveteam
04:11 🔗 swebb sets mode: +o balrog
04:11 🔗 ring has joined #archiveteam
04:11 🔗 SirCmpwn has joined #archiveteam
04:12 🔗 K4k has joined #archiveteam
04:12 🔗 achip has joined #archiveteam
04:13 🔗 Emcy has joined #archiveteam
04:13 🔗 zenguy has joined #archiveteam
04:14 🔗 mr-b has joined #archiveteam
04:15 🔗 sivoais has joined #archiveteam
04:15 🔗 acridAxid has joined #archiveteam
04:18 🔗 joepie91 has quit IRC (Read error: Operation timed out)
04:20 🔗 joepie91 has joined #archiveteam
04:20 🔗 swebb sets mode: +o joepie91
04:21 🔗 wyatt8740 has joined #archiveteam
04:22 🔗 godane has joined #archiveteam
04:22 🔗 Kaz has joined #archiveteam
04:24 🔗 dashcloud has joined #archiveteam
04:45 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:52 🔗 Sk1d has joined #archiveteam
04:58 🔗 bwn has joined #archiveteam
05:08 🔗 bwn has quit IRC (Quit: Quit)
05:20 🔗 aMunster has joined #archiveteam
05:20 🔗 beardicus has joined #archiveteam
05:20 🔗 swebb sets mode: +o beardicus
05:23 🔗 Honno has joined #archiveteam
05:26 🔗 MMovie has joined #archiveteam
05:34 🔗 mhazinsk has joined #archiveteam
05:46 🔗 Mayonaise has joined #archiveteam
06:31 🔗 bwn has joined #archiveteam
06:49 🔗 bwn has quit IRC (Read error: Operation timed out)
07:03 🔗 Honno has quit IRC (Read error: Operation timed out)
07:23 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
07:49 🔗 Skyrider_ has joined #archiveteam
07:49 🔗 Skyrider_ Ello everyone
07:55 🔗 PurpleSym Hi.
07:59 🔗 bwn has joined #archiveteam
08:06 🔗 schbirid has joined #archiveteam
08:18 🔗 Medowar has joined #archiveteam
08:35 🔗 metalcamp has joined #archiveteam
08:45 🔗 Skyrider_ has quit IRC (Quit: Page closed)
09:24 🔗 Atom__ has joined #archiveteam
10:06 🔗 SketchCo1 has joined #archiveteam
10:06 🔗 swebb sets mode: +o SketchCo1
10:07 🔗 SketchCow has quit IRC (Read error: Connection reset by peer)
10:07 🔗 nekomune has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 joepie91 has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 SN4T14 has quit IRC (Ping timeout: 244 seconds)
10:08 🔗 zerkalo has quit IRC (Ping timeout: 244 seconds)
10:09 🔗 BnA-Rob1n has joined #archiveteam
10:09 🔗 nekomune has joined #archiveteam
10:10 🔗 zerkalo has joined #archiveteam
10:11 🔗 joepie91 has joined #archiveteam
10:11 🔗 swebb sets mode: +o joepie91
10:12 🔗 SN4T14 has joined #archiveteam
10:53 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:22 🔗 arkiver We won't be able to get all files from GameFront
12:22 🔗 arkiver The popular files are fully backed up
12:22 🔗 arkiver The not-so-popular files, which have like 20 downloads, are very problematic when downloading
12:23 🔗 arkiver For some reason the download URL only works sometimes
12:23 🔗 arkiver randomly it seems
12:23 🔗 arkiver I spent some hours trying to figure it out
12:23 🔗 arkiver It looks like the load on GameFront isn't the problem
12:23 🔗 arkiver I checked cookies, and those seem to be ok
12:24 🔗 arkiver So it might be on GameFront's side.
12:24 🔗 * arkiver is not giving up hope yet though
12:25 🔗 PurpleSym I’ve had a site that required grabbing URL A before URL B worked.
12:25 🔗 arkiver Same on this site
12:25 🔗 PurpleSym Sometimes this caused timing problems.
12:28 🔗 arkiver Let me recheck cookies
12:33 🔗 arkiver I do see a new POST request now
12:33 🔗 arkiver They might have changed something recently
12:42 🔗 phuzion arkiver: are items not going out on gamefront right now?
12:42 🔗 arkiver yeah
12:42 🔗 arkiver I paused it, fixing the scripts
12:42 🔗 phuzion oh ok
12:42 🔗 arkiver Please keep it running though!!
12:42 🔗 phuzion Will do.
12:43 🔗 phuzion Apparently my droplets did about 1TB of gamefront since I turned them on yesterday.
12:58 🔗 weslord has joined #archiveteam
13:29 🔗 phuzion Scripts updated on all droplets!
13:32 🔗 weslord has quit IRC (Quit: Lost terminal)
13:37 🔗 Medowar bayimg cluster running. 40 script, 20 warrior
13:38 🔗 arkiver I think I have gamefront fixed!
13:41 🔗 arkiver yes, totally fixed
13:41 🔗 Medowar so we can hammer it again?
13:42 🔗 arkiver in a bit
13:42 🔗 arkiver I'll get the new script up now
13:43 🔗 Medowar cool. Tell me, when it is up, i have space for ~100 Instances.
13:43 🔗 Medowar Old decommissioned server with a few days on the contract left.
13:43 🔗 VADemon has joined #archiveteam
13:44 🔗 arkiver nice!!
13:44 🔗 arkiver New version is online!
13:45 🔗 arkiver Let's finish these last items :D
13:46 🔗 Medowar do we have any info on how long bayimg stays up?
13:47 🔗 arkiver no
13:47 🔗 arkiver 'a week or so'
13:47 🔗 Medowar ok. So I keep hammering it..
13:48 🔗 arkiver Yes, the items currently loaded in the tracker are everything. So we'll just get as much as possible
13:49 🔗 arkiver GameFront is highest priority at the moment. We just need to finish these last items
13:50 🔗 Medowar fos down?
13:50 🔗 Medowar rsync: failed to connect to fos.textfiles.com (208.70.31.74): Connection timed out (110)
13:51 🔗 arkiver on what grab is that?
13:51 🔗 Medowar bayimg
13:51 🔗 arkiver Probably
13:51 🔗 Medowar but works for gf
13:52 🔗 arkiver GameFront doesn't use FOS
13:52 🔗 MrRadar gf is on zino's server
13:52 🔗 arkiver yeah
13:52 🔗 Medowar yeah, just saw that
13:52 🔗 arkiver who is vantec again?
13:53 🔗 Medowar fos is back up
13:57 🔗 Medowar No HTTP response received from tracker. The tracker is probably overloaded. Retrying after 60 seconds...
13:57 🔗 Medowar gf project
13:58 🔗 arkiver Happens sometimes, should fix itself
13:59 🔗 arkiver Everyone: please put as much as you can on GameFront, we might only have a few hours left...
14:03 🔗 phuzion engaging cannons
14:06 🔗 arkiver phuzion: awesome!!
14:07 🔗 HCross2 arkiver: will the current scripts work for a bit? Not home atm so can't update
14:07 🔗 arkiver The most recent version will work, older scripts won't
14:08 🔗 arkiver phuzion: I'm going to release another update, skipping the facebook URLs
14:08 🔗 phuzion Ok, I'll do a touch stop now
14:08 🔗 phuzion Unless you foresee the update taking > 1hour
14:08 🔗 atomotic has joined #archiveteam
14:09 🔗 arkiver no, will be here in a bit
14:10 🔗 scyther_ has joined #archiveteam
14:10 🔗 scyther_ has quit IRC (Connection closed)
14:12 🔗 HCross2 I'll be able to throw most of a vps providers node at you tonight
14:12 🔗 arkiver HCross2: sounds good!
14:13 🔗 arkiver phuzion: scripts are updated!
14:13 🔗 phuzion arkiver: how likely is it that the tasks I have now will finish cleanly? Or should I just abandon them and reboot the droplets?
14:14 🔗 arkiver You can just force quit them if you'd like, I'll requeue the items
14:14 🔗 phuzion rebooting
14:14 🔗 arkiver requeued.
14:15 🔗 phuzion deploying cannons
14:17 🔗 Start has quit IRC (Quit: Disconnected.)
14:20 🔗 TC01 has joined #archiveteam
14:21 🔗 bwn_ has joined #archiveteam
14:22 🔗 phuzion And 40 x 6 online.
14:23 🔗 Medowar and 4x20 online
14:23 🔗 Medowar more coming up soon
14:24 🔗 phuzion 4 instances of 20 threads? or vice versa?
14:24 🔗 phuzion Because I've got 40 instances of 6 threads.
14:24 🔗 Medowar 4 instances with 20 threads
14:24 🔗 Medowar it is easier to spin up docker images with 20 script threads than warriors
14:25 🔗 phuzion Oh, I don't bring up docker images, I have an ansible script for deploying a lot of the more basic warrior scripts
14:26 🔗 Medowar oh, ok... I am using https://hub.docker.com/r/infrequent/at-as-dockerfile/, since I have very few but powerful servers.
14:26 🔗 phuzion I just use Digital Ocean :)
14:27 🔗 Medowar yeah, but I already have my servers
14:27 🔗 MrRadar arkiver: it looks like we've overloaded GameFront's token POST endpoint. A few of my items are getting 504s from it
14:27 🔗 Medowar same here
14:28 🔗 Medowar actually quite a few 504s.
14:28 🔗 HCross2 phuzion: can you link to your ansible please, I might need it. Spoken to a friend who works for a vps provider, and will be getting a fair few VPSes soon
14:29 🔗 phuzion HCross2: github.com/phuzion/archiveteam-deploy
14:29 🔗 Medowar one image is getting more timeouts than actual jobs finished
14:29 🔗 arkiver still working for me
14:29 🔗 MrRadar Yeah, sometimes it works other times it gets a 504
14:30 🔗 Medowar http://pastebin.com/r3aWQh8J
14:30 🔗 MrRadar That's exactly what I'm seeing
14:31 🔗 arkiver Will keep it at 100 for the moment
14:31 🔗 arkiver 100 items/min
14:34 🔗 bwn has quit IRC (Read error: Operation timed out)
14:59 🔗 arkiver phuzion: Medowar: I'll have to make another update
15:00 🔗 WinterFox has quit IRC (Remote host closed the connection)
15:00 🔗 phuzion ok, let me know when it's deployed
15:01 🔗 arkiver ok
15:01 🔗 Honno has joined #archiveteam
15:10 🔗 arkiver phuzion: scripts are updated
15:11 🔗 arkiver Unfortunately I can't easily check if the update fixes the problem, so I'll have to see from the files that are returned
15:11 🔗 arkiver If it doesn't work, I'll have to make another update
15:11 🔗 phuzion I'm trying from a separate machine before I deploy the droplets
15:13 🔗 MrRadar Did you mean to say "Fotolog is overloaded!" in the script?
15:13 🔗 arkiver hmm
15:13 🔗 arkiver sorry
15:13 🔗 MrRadar No prob, it's just funny :)
15:14 🔗 arkiver it's gone
15:14 🔗 arkiver That last part was taken from the fotolog script. It should work though
15:15 🔗 phuzion arkiver: I've pushed an item or two up with the latest scripts, wanna check and let me know if we're good to deploy widely?
15:15 🔗 arkiver ok
15:16 🔗 arkiver Though this problem might only occur when the load on gamefront is high
15:16 🔗 phuzion Ah, ok
15:16 🔗 Start has joined #archiveteam
15:16 🔗 arkiver the thankyou page sometimes returns nothing for some reason
15:17 🔗 SketchCo1 Load on FOS is down to 2
15:17 🔗 SketchCo1 I see rsyncs are not backed up like crazy either.
15:17 🔗 SketchCo1 So either we moved something super major to the other host or something else.
15:18 🔗 arkiver GameFront is moved. Already 3 TB on the other target
15:18 🔗 SketchCo1 Ha
15:18 🔗 SketchCo1 Well OK THEN
15:18 🔗 SketchCo1 is now known as SketchCow
15:18 🔗 SketchCow I'm watching FOS deal
15:18 🔗 arkiver phuzion: so I guess just fire it up and we'll see what happens
15:19 🔗 phuzion DEPLOYING bbl gonna go get lunch
15:19 🔗 SketchCow I also see last night had some interesting discussion.
15:19 🔗 arkiver The GameFront discussion?
15:22 🔗 SketchCow I need to finish reading it
15:23 🔗 SketchCow After I fix this "I have no water in the house" problem
15:31 🔗 Yoshimura has joined #archiveteam
15:43 🔗 JesseW has joined #archiveteam
15:46 🔗 SketchCow Fixed
15:49 🔗 SketchCow Caught up.
15:49 🔗 MrRadar arkiver: Some of my GameFront items are getting truncated at GameFront's end. For example item 14733169 says it's 83 megs on GameFront's file info page but I only downloaded 18 megs of it
15:50 🔗 MrRadar Not sure if there's anything we can do about it though
15:50 🔗 arkiver MrRadar: did the file return to the tracker?
15:50 🔗 MrRadar Yes
15:50 🔗 arkiver or did it retry the download?
15:50 🔗 MrRadar As far as I can tell it uploaded
15:51 🔗 arkiver How do you know it downloaded 18 MB?
15:51 🔗 MrRadar rsync only uploaded 18 megs
15:51 🔗 MrRadar It's possible the file compressed really well, but I doubt it since the file is a video file already in a zip file
15:52 🔗 MrRadar I'm trying to download it manually to verify the size but GameFront is being temperamental
15:53 🔗 MrRadar OK, Chrome says the file should be 74 megs in size (based on the HTTP header presumably)
15:55 🔗 MrRadar The manual download is stuck at 1.4 megs; I'll let it run until either Chrome or GameFront abort the connection
15:58 🔗 MrRadar OK, the download "finished" with 1.5 megs retrieved and the ZIP header says it's supposed to be 83.2 megs compressed
15:58 🔗 arkiver Finished for me with 74 MB
15:59 🔗 MrRadar Can you extract it?
15:59 🔗 arkiver no, 74 is not the full file apparently
15:59 🔗 MrRadar :(
15:59 🔗 arkiver Grab is paused.
16:00 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
16:01 🔗 arkiver SketchCow: see above
16:02 🔗 arkiver We got all popular files. We did not get the less popular files, that is files with around 20 or less downloads
16:02 🔗 arkiver I'm stopping the GameFront grab, since we can't trust their returned data anymore, and there's not a good way to verify the downloaded data in the scripts
16:02 🔗 arkiver Well, we got a part of the less popular files.
16:04 🔗 HCross How are we going with the forums?
16:04 🔗 arkiver I just checked sizes of some of the recent grabbed files, they sometimes don't match the size from the gamefront page
16:04 🔗 arkiver The older grabbed files do seem to match the sizes, so I'd say it's only a part of the very recent files that is corrupted
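One way a grab script might catch truncated downloads like these, assuming the file is a ZIP and the expected size is known (from `Content-Length` or the file info page): compare the byte count, then let Python's `zipfile` read the central directory and CRC-check every member. A truncated archive usually loses its end-of-central-directory record and fails `is_zipfile`. This is a sketch, not what the GameFront scripts actually do, and it only helps when the size GameFront reports is itself correct.

```python
import io
import zipfile
import zlib
from typing import Optional


def looks_complete(data: bytes, expected_size: Optional[int] = None) -> bool:
    """Heuristic completeness check for a downloaded ZIP archive."""
    if expected_size is not None and len(data) != expected_size:
        return False
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):  # missing end-of-central-directory record
        return False
    try:
        with zipfile.ZipFile(buf) as zf:
            return zf.testzip() is None  # None means every member's CRC matched
    except (zipfile.BadZipFile, EOFError, zlib.error):
        return False
```

In the case above (an 83.2 MB archive arriving as 18 MB), either the size comparison or the missing central directory would flag the item for a retry instead of an upload.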
16:05 🔗 arkiver The forums grab is still running
16:05 🔗 HCross I'd say keep an eye on it, and see if they decide to behave
16:05 🔗 arkiver yeah
16:06 🔗 arkiver I think they might have started the shutdown though
16:06 🔗 Start has quit IRC (Quit: Disconnected.)
16:06 🔗 SketchCow Assume that's the case, then.
16:06 🔗 SketchCow Some stuff will be lost, etc.
16:06 🔗 phuzion arkiver: want me to move resources to the forums grab?
16:06 🔗 SketchCow But we got, what.... a bit, right
16:06 🔗 SketchCow Terabit
16:06 🔗 phuzion SketchCow: Tracker says about 36TB or so.
16:06 🔗 arkiver We got a very very large part of everything from gamefront
16:07 🔗 HCross http://www.gamefront.com/gamefront-is-closing-down-april-30-2016/ hmm, very interesting comment at the bottom
16:07 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:08 🔗 SketchCow We'll see!
16:08 🔗 arkiver So http://gamefront.online/ would host everything...
16:08 🔗 arkiver 8 TB is really far off of our number though
16:08 🔗 SketchCow I LOVE it when there's multiple attempts to download something.
16:08 🔗 arkiver and I'm pretty sure we did not get duplicates
16:08 🔗 SketchCow Well, that's what happens when someone works alone in the dark.
16:09 🔗 SketchCow Did we get more than 326,000 files?
16:09 🔗 SketchCow Also, he's probably saying "files" versus "all HTML, etc."
16:09 🔗 arkiver I can't be 100% sure, but I'd say yes
16:10 🔗 arkiver Anyway, when everything is uploaded I'll work on an index of every file we have saved including extracted title, description, date and the direct download URL in the wayback machine
16:10 🔗 arkiver Since some mod websites have shown interest in these files.
16:11 🔗 arkiver It might make it easier for them to get and link to the saved files in the Wayback Machine
16:12 🔗 arkiver SketchCow: I found out why the website saved only 326000 files.
16:12 🔗 arkiver http://www.gamefront.com/files/ says there's around that number of files
16:13 🔗 luckcolor has joined #archiveteam
16:13 🔗 luckcolor Hi guys
16:13 🔗 luckcolor So i was reading the logs
16:13 🔗 arkiver That is only though for IDs that are indexed and sorted under games and categories
16:13 🔗 arkiver There are a lot of files that were not sorted under a category and game, and we got those too
16:14 🔗 arkiver So that'd explain the difference
16:14 🔗 dashcloud has joined #archiveteam
16:15 🔗 luckcolor So arkiver how much do you think we have missed in terms of files on Gamefront?
16:15 🔗 arkiver I don't know
16:16 🔗 SketchCow I expect that eventually we'll talk to the guy.
16:18 🔗 luckcolor Guys just in general about irc channels and such, like how do you know that somebody is not using a fake nickname?
16:18 🔗 arkiver Can check IP, but we can't be sure of that
16:18 🔗 luckcolor right
16:18 🔗 arkiver Maybe this arkiver is a fake arkiver
16:18 🔗 luckcolor lel
16:19 🔗 SketchCow If this is a fake arkiver, that little charlatan's working his ass off
16:19 🔗 SketchCow With me, people ask reasonable questions
16:19 🔗 SketchCow And if I answer politely, they know it's not me
16:20 🔗 SketchCow The key is for it to be a totally generic question, like "how are you doing" or "we can't help but notice you're not keeping track of disk space"
16:20 🔗 SketchCow If what follows isn't a 34 paragraph rant threatening the lives of at least 3 people
16:20 🔗 SketchCow impostor
16:20 🔗 luckcolor well right
16:20 🔗 luckcolor I'm probably going offtopic now
16:21 🔗 luckcolor sorry :P
16:21 🔗 arkiver We got a very very large part of everything, but it still feels horrible to not get everything :(
16:22 🔗 luckcolor yeah
16:23 🔗 luckcolor Bah i should totally get some irc client on my server i'm tired of reading logs
16:23 🔗 arkiver http://gamefront.online/ must have had some help from inside GameFront
16:24 🔗 luckcolor well
16:24 🔗 luckcolor it says 8000 gb and we did 36480 gb in total
16:32 🔗 MrRadar Hello
16:33 🔗 MrRadar n/m, wrong window
16:35 🔗 luckcolor arkiver this seems to be the profile of the guy who made that website http://www.moddb.com/members/d-airy
16:35 🔗 luckcolor I was reading the comments of moddb :P
16:37 🔗 schbirid 100% and still doing 83.691 MB/s?
16:37 🔗 luckcolor dunno
16:38 🔗 luckcolor it seems he just registered on moddb for that
16:38 🔗 luckcolor http://gamefront.online/get_progress.php this is the url for the data on the webpage
16:45 🔗 schbirid http://www.eurogamer.net/articles/2016-04-29-fable-developer-lionhead-closes-down-today
16:45 🔗 schbirid <arkiver> We got all popular files. We did not get the less popular files, that is files with around 20 or less downloads <- aww man, i wish you focused on the stuff that would not be available on any other gaming file site...
16:47 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:48 🔗 MrRadar The problem is on GameFront's end (mostly)
16:48 🔗 MrRadar We can't do anything if their copies are truncated
16:48 🔗 MrRadar But yeah, it does suck
16:49 🔗 luckcolor Yeah
16:54 🔗 dashcloud has joined #archiveteam
17:07 🔗 atrocity so gamefront... all are saying this for me: Tracker rate limiting is active. We don't want to overload the site we're archiving, so we've limited the number of downloads per minute. Retrying after 300 seconds...
17:07 🔗 xmc gamefront is kinda busted
17:09 🔗 atrocity lol
17:09 🔗 atrocity should i stay on it?
17:09 🔗 xmc ¯\_(ツ)_/¯
17:09 🔗 xmc is there anything else on the tracker that you want to run instead?
17:10 🔗 MrRadar Move over to the gamefrontforums or bayimg project
17:10 🔗 atrocity kk
17:10 🔗 MrRadar The GameFront forums are ending today and bayimg in a few days
17:10 🔗 MrRadar Altho the forums are currently tracker limited
17:11 🔗 metalcamp has joined #archiveteam
17:28 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
17:32 🔗 yakfish has joined #archiveteam
17:32 🔗 matthusby has joined #archiveteam
17:34 🔗 SadDM has joined #archiveteam
17:34 🔗 swebb sets mode: +o SadDM
17:37 🔗 jspiros has joined #archiveteam
17:49 🔗 luckcolor arkiver can you check the staus of FOS i get this error
17:49 🔗 luckcolor rsync: chgrp "/bayimg/.bayimg-16pictures_ganbdaaa-20160429-174836_data.txt.IFhkO7" (in dev) failed: Operation not permitted (1)
17:49 🔗 luckcolor *status
17:50 🔗 phuzion luckcolor: are you rsyncing manually or by script?
17:51 🔗 xhdr has quit IRC (Remote host closed the connection)
17:51 🔗 luckcolor by srcipt
17:51 🔗 luckcolor *script
17:55 🔗 xhdr has joined #archiveteam
17:55 🔗 xhdr has quit IRC (Excess Flood)
17:56 🔗 xhdr has joined #archiveteam
17:58 🔗 arkiver schbirid: I agree
17:59 🔗 Emcy has quit IRC (Ping timeout: 370 seconds)
17:59 🔗 arkiver It later turned out that the less popular files were a little problematic to download compared to the 'normal' files
18:00 🔗 arkiver Apparently the less popular files first needed to be activated using a POST request with a special string
18:00 🔗 arkiver If they were not activated that way, they would not download
18:00 🔗 phuzion Is that what the token thing was?
18:00 🔗 arkiver yeah
18:00 🔗 phuzion ok
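The activate-then-download ordering (together with the empty "thank you" responses mentioned earlier) can be sketched as a retry loop. `activate` and `fetch` are hypothetical callables standing in for the POST with the special string and the actual file GET; the real endpoint and token format are not given in this log.

```python
import time
from typing import Callable, Optional


def download_with_activation(
    activate: Callable[[], None],
    fetch: Callable[[], bytes],
    attempts: int = 3,
    delay: float = 0.0,
) -> Optional[bytes]:
    """Activate the file, then fetch it; retry the pair if the body comes back empty."""
    for attempt in range(attempts):
        activate()      # hypothetical: the POST that unlocks the download
        body = fetch()  # hypothetical: the GET for the file itself
        if body:
            return body
        if attempt + 1 < attempts:
            time.sleep(delay)  # back off before re-activating
    return None
```

Re-running the activation on every attempt, rather than only once, covers the case where the unlock expires or silently fails under load; returning `None` lets the caller fail the item back to the tracker instead of uploading an empty file.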
18:00 🔗 arkiver As with any project, we first grab everything and then have a look at the problematic items
18:01 🔗 arkiver In this case the problematic items were those less popular files
18:05 🔗 arkiver schbiridi: let me know if you have any other question regarding what we saved
18:06 🔗 arkiver schbirid*
18:19 🔗 luckcolor has quit IRC (Quit: Page closed)
18:19 🔗 Start has joined #archiveteam
18:43 🔗 remsen has quit IRC (ircd.choopa.net irc2.choopa.net)
18:43 🔗 remsen1 has joined #archiveteam
18:46 🔗 Morbus has quit IRC (Read error: Operation timed out)
18:50 🔗 bwn_ has quit IRC (Read error: Operation timed out)
19:10 🔗 bwn_ has joined #archiveteam
19:22 🔗 Emcy has joined #archiveteam
19:27 🔗 Emcy has quit IRC (Read error: Connection reset by peer)
19:44 🔗 Start has quit IRC (Quit: Disconnected.)
20:17 🔗 db48x has joined #archiveteam
20:28 🔗 remsen1 has quit IRC (ZNC 1.6.2 - http://znc.in)
20:28 🔗 remsen has joined #archiveteam
20:29 🔗 Gfy is there a full site grab of iSONEWS? It'll be closing down http://www.theisonews.com/forums/index.php/topic,161745.0.html
20:39 🔗 MrRadar I put it into ArchiveBot
20:48 🔗 tomwsmf-a has joined #archiveteam
20:49 🔗 Emcy has joined #archiveteam
21:32 🔗 tomwsmf-a has quit IRC (Read error: Operation timed out)
22:21 🔗 schbirid arkiver: damn nice job in any case :))
22:25 🔗 Start has joined #archiveteam
22:57 🔗 Honno has quit IRC (Read error: Operation timed out)
23:19 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
23:26 🔗 BartoCH has joined #archiveteam
23:29 🔗 schbirid has quit IRC (Remote host closed the connection)
23:44 🔗 Stiletto has quit IRC ()
23:55 🔗 arkiver schbirid: thank you! All less popular files should also be available on http://gamefront.online/ soon
23:55 🔗 arkiver It looks like the person behind that site was able to do a grab too
