[02:06] I'm having a problem with Warrior.
[02:06] I'm working on the Formspring project, but it keeps failing uploads.
[02:07] Anyone have an idea why?
[02:12] oh well.
[02:12] I hope that this at least gets to whoever made Warrior.
[06:08] http://archive.org/details/ernie1241_general
[06:08] Lots of goodies here
[14:55] Big ol' ops hug
[14:55] OK, so how are we doing, here?
[14:55] Posterous is slow but it's progressing along, we'll get what we get until the machines are gone.
[14:56] Formspring humming along - Formspring is not happy about this at all, but it's happening.
[14:56] URL Team fine, last I checked, new admin Chronomex
[14:57] I think we're currently stalled with Xanga. We really should not be.
[14:57] Are there other web downloads I'm missing?
[14:58] IGN/Gamespot finished afaik
[14:59] I've uploaded all my warcs. They are currently sitting on my account.
[14:59] Pouet is still going.
[14:59] Need help putting them somewhere?
[14:59] Well they are all on IA. I can't do anything I believe?
[14:59] I'd love to have them go somewhere sensible.
[14:59] Oh, sitting on your IA account!
[14:59] Yah, you've been busy so I didn't want to keep chasing :)
[15:00] Also there's 50+ newamerica warcs I believe.
[15:00] at 2Gb each.
[15:00] I am going through piles of uploads like that.
[15:00] k, hit my account at some point, then if you let me know, I'll tweet my pride :D
[15:00] Ha ha Yes.
[15:01] pm me the e-mail account. I'll go through it
[15:01] Oh yes, one of my favorites.
[15:02] Oh, you HAVE been busy
[15:02] -rw-r--r-- 1 tim.bowers games 626M Jun 11 16:01 ./bin/ign/storage/pouet/pouet.net_06052013.cdx
[15:02] -rw-r--r-- 1 tim.bowers games 57G Jun 11 16:01 ./bin/ign/storage/pouet/pouet.net_06052013.warc
[15:02] Still going.
[15:02] Dude, pouet is large. Do you realize how large?
[15:02] SketchCow: yah, 5 weeks signed off work will do that to a guy XD
[15:02] though I did it all in about 2 weeks.
[15:02] After that, kind of ran out of things to grab XD
[15:03] SketchCow: Like I care... it'll keep going until it explodes :D
[15:03] Unless it tops out 1.5Tb, then we have a problem D:
[15:07] OK, they're swapped over.
[15:07] greader-grab can really start going once I strip out the 404s
[15:07] http://archive.org/details/archiveteam_ignsites will populate sooner rather than later.
[15:07] yey
[15:09] There, it populated.
[15:12] \o/
[15:12] you got the comment pages: https://archive.org/details/www.g4tv.com-thefeed-comments-pages-20130610
[15:12] SketchCow: omf_ also has done some grabs but I don't know if he's uploaded them
[15:15] so what was the deal with formspring not really shutting down
[15:16] FAME \o/
[15:16] DFJustin: someone bought them
[15:16] Yeah
[15:16] ort threw money at them or something.
[15:17] But they did it really stupid under the radar.
[15:17] But we all know how money runs out....
[15:17] Showing, of course, they were inclined to be non transparent the whole way.
[15:17] Yahoo shits out money, then loves watching the reaction of people when they shut something down.
[15:18] They MUST be doing it for the reaction at this point.
[15:18] SketchCow: this one shouldn't be in that collection: http://archive.org/details/newamerica.net
[15:18] godane1: good catch, was about to go looking for that
[15:18] Sadly it's incomplete, but it OOM'ed a 12Gb box
[15:19] that's ok
[15:19] this was just a panic run anyways
[15:19] in case they do shut down
[15:23] Yeah, they're going to end up being 80tb
[15:24] wow
[15:24] i figure maybe 1TB at the most
[15:26] I mean Formspring.
[15:26] oh
[15:30] Uploading hundreds of issues of Manga, millions of pages.
[15:30] you mean the stuff anime is based on?
[15:31] Sometimes it's based on that.
[15:31] I'm doing it by hand with lots of script help - adding a new manga issue every 3 seconds.
[15:32] http://archive.org/details/manga_library
[15:32] oh that reminds me
[15:33] that guy who has a pile of manga and other "adult" japanese scans hasn't come back to me
[15:34] I'm not toooo worried these will disappear
[15:35] hentai encyclopedia would be a good thing to dump onto ia at some point
[15:39] also a lot of these would benefit from page-progression=rl
[15:40] Breaking: Greek public broadcaster ERT is being closed down tonight - website at www.ert.gr (in greek, obviously)
[15:41] YEAH ALRIGHT LETS DO THIS
[15:42] Yeah, get on that shit.
[15:42] WARC 'er up
[15:44] LEEEEEEEEEEROOOOOOOY
[15:44] JEEEEEENKIIIIIIIIIIINS
[15:46] Anarchive is happily slurping away at it on a root tmux session.
[15:46] what's the best way to archive a site like that? just wget --mirror ?
[15:47] There's a few generic commands at www.archiveteam.org/index.php?title=User:Djsmiley2k#Generic_Wget_command (need to get to moving it over to an article)
[15:49] nice thanks, I have 100 bits/s down and an unused laptop that i can bring on the site
[15:50] 100 bit's down o_O
[15:50] sweden! ftw
[15:50] mbits*?
[15:50] Walk Tephra_ through the WARC wget
[15:51] well we pay for 100 down but usually get 60-70
[15:54] should i just use the generic command on Djsmiley's page?
[15:54] That's all that I'm doing.
[15:55] (I suck at using wget)
[15:55] Yes, that's a good one for a panic download, Tephra_
[15:56] SketchCow GLaDOS I'm starting it now, will report when it's sucked
[15:56] everything
[15:57] Thanks
[16:17] http://www.archiveteam.org/index.php?title=Main_Page has a wrong link http://www.archiveteam.org/index.php?tile=Formspring
[16:17] ..and we/someone took ert.gr down.
[16:18] needs to be corrected to http://www.archiveteam.org/index.php?title=Formspring or better [[Formspring|etc.]]
[16:18] My bad.
[16:18] GLaDOS: Tephra was punished by being kicked off by IRC for 3 min, it seems
[16:19] Heh
[16:19] seems OK here
[16:19] Yeah, came back up.
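On the 15:46 question about whether plain wget --mirror is enough: the log points to a generic command on the wiki but never quotes it, so the following is only a minimal sketch of what a WARC-writing mirror invocation might look like, not the command from Djsmiley2k's page. The helper name, the WARC base name and the one-second wait are assumptions made for illustration; the wget options themselves (--mirror, --page-requisites, --wait, --warc-file, --warc-cdx) are standard in wget 1.14 and later. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_wget_warc_command(site_url, warc_name, wait_seconds=1):
    """Build an argv list for a WARC-writing wget mirror.

    Hypothetical helper for illustration; the option choice is an
    assumption, not the generic command referenced in the log.
    """
    return [
        "wget",
        "--mirror",                   # recursive download with timestamping
        "--page-requisites",          # also fetch images, CSS and scripts
        "--wait=" + str(wait_seconds),  # pause between requests
        "--warc-file=" + warc_name,   # write <warc_name>.warc.gz next to the mirror
        "--warc-cdx",                 # also write a CDX index for the WARC
        site_url,
    ]


# Example values; the WARC base name is made up for this sketch.
cmd = build_wget_warc_command("http://www.ert.gr/", "ert.gr-panic")

# Print a copy-pasteable shell line instead of executing anything.
print(shlex.join(cmd))
```

To actually start the grab, the list could be passed to subprocess.run(cmd, check=False) on a machine with wget installed; a long-running mirror like this is usually left in a tmux or screen session.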
[16:19] phew
[16:21] Apparently the staff are planning to keep on broadcasting and are guarding the headquarters.
[16:21] lots happening.
[16:42] SketchCow: getting lots of "can't write to ert.gr/xx/xxx/yy/yyy/blah-blah-blah (not a directory)"
[16:43] seems like wget didn't create a correct directory structure or something
[16:44] now maybe jason can do something about this :) http://www.foxnews.com/tech/2013/06/11/scientists-searching-for-world-first-web-page-turn-to-north-carolina/
[18:19] Internet Archive's having some connectivity/s3 issues.
[18:55] https://twitter.com/textfiles/status/344528364129882114
[19:04] lol
[19:15] We won the 2013 National Digital Stewardship Alliance award for an Organization. https://twitter.com/the_idea_agency/status/344532491446657025 http://blogs.loc.gov/digitalpreservation/2013/06/and-the-winner-is-announcing-the-2013-ndsa-innovation-award-winners/
[19:16] See there are awards for groups of loud assholes ;)
[19:22] I for one am going to retweet that, maybe even leak a nice comment on the blog. Let's publicize our VICTORY!!1!
[19:22] s/leak/leave/ # too much NSA
[19:23] Jason, can you ask the NSA to borrow a copy of the Internet ?
[19:25] I bet the NSA has the most pristine and complete copy of 4chan ever
[21:39] Hi @ all
[21:39] I'm having a little problem... today the warrior "died". It forgot the old state and restarted with new jobs. The VM did not reboot. The files were still on the disk...
[21:40] it happened at least twice... more than 50gb of downloaded work gone...
[21:40] is there any possibility to resume old threads?
[21:40] Which site are you downloading?
[21:40] formspring
[21:40] Well, I'm sorry to hear it's lost.
[21:40] but there were also a few posterous jobs
[21:41] Well, the answer is to go to #warrior and ask people if they want to help and to be patient.
[21:41] Neither Posterous nor Formspring is too time critical.
[21:41] there are no other jobs in the warrior atm ;-)
[21:41] or are there things which should come first?
[21:43] No, you should go back to Formspring, but people in #warrior should be able to trace back what MIGHT have been your issue.
[21:47] 400 claims?
[21:48] me?
[21:48] no about 65 are at work now...
[21:49] perhaps 70
[21:49] Come to other channel -> #warrior
[22:26] SketchCow: update for ert.gr, they seem to have shut down the tv broadcast but the site is still up and kicking for now. I'll keep wget up and running through the night, hopefully the site will be up long enough to grab a good portion of it, and hopefully I have solved the problem with wget not getting the directories right
[22:29] Thanks.
[22:37] looks like i can upload again
[22:37] :-D
[22:53] Tephra: is it possible to help with ert.gr?
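The log never says what caused the "(not a directory)" errors reported at 16:42 or how they were worked around. One common cause with wget mirrors, offered here only as an assumption, is a URL such as /news being saved as a plain file, after which /news/2013/story.html cannot be written because "news" would have to be a directory. The sketch below checks a list of URLs for that kind of clash; the function name and the example.invalid URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def find_file_directory_conflicts(urls):
    """Return URL paths that would be saved as a file but are also
    needed as a directory by another URL (hypothetical helper)."""
    file_paths = set()  # paths without a trailing slash: saved as plain files
    all_paths = []
    for url in urls:
        raw = urlsplit(url).path
        trimmed = raw.strip("/")
        if not trimmed:
            continue  # site root, nothing to compare
        all_paths.append(trimmed)
        if not raw.endswith("/"):
            file_paths.add(trimmed)

    conflicts = set()
    for path in all_paths:
        parts = path.split("/")
        # Every proper prefix of a path must exist as a directory on disk.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in file_paths:
                conflicts.add(prefix)
    return sorted(conflicts)


# Made-up URLs: "/news" is both a page and the parent of another page,
# while "/about/" ends in a slash and is stored as a directory.
example_urls = [
    "http://example.invalid/news",
    "http://example.invalid/news/2013/story.html",
    "http://example.invalid/about/",
    "http://example.invalid/about/team.html",
]
conflicts = find_file_directory_conflicts(example_urls)
print(conflicts)
```

A check like this could be run over a URL list or a wget log before restarting a grab, to see whether the errors come from this pattern or from something else entirely.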