[02:51] The bitsavers mirror I am doing is down to the w's in pdf.
[02:51] Only took it 5 days.
[06:51] alard: Can I go ahead and archive out Cinch and put City of Heroes on its own directory?
[07:17] Fun Trivia: Currently, Fortress of Solitude has 7.4tb of space free.
[07:18] :D
[07:18] git annex get */*
[07:24] http://archive.org/details/wikileaksarchive_20100522
[07:24] bwa ha ha, well, that can't go south
[07:25] :D
[07:25] I think I finally found ap roject no one else does
[07:26] Redfish Magazine
[07:28] damn slow site due to it being in US tho.
[07:29] http://redfishmagazine.com.au/
[07:34] err in Au even ;D
[07:37] I could do that magazine in under 20 minutes, I think.
[07:38] Grab them, we can put them on archive.org.
[07:39] i'm doing em now :P
[07:39] Its something I can manage \o/
[07:39] It's asking me which collection to put it in.... they are PDF text so... text, but.... Community texts?
[07:39] SketchCow: Yes, you can remove City of Heroes from the Cinch directory. It's not being used at the moment. However, we might want to do another run of the CoH boards before it closes, so it would be useful to have a different rsync upload space if you close Cinch. (Perhaps a reusable "warrior" rsync thing?)
[07:40] (yes I suck and have never done this before :( )
[07:41] everyone learns sometime
[07:41] well, some people never learn
[07:41] :D
[07:42] such as which licence do I choose etc :<
[07:47] Do your best, Smiley. I'll fix everything soon.
[07:47] When do CoH close?
[07:48] SketchCow: ok thanks :D
[07:49] I could write a bash script to get the different versions, but its almost as quick for me to simply type in the browser :D
[07:50] Got all the EU ones to date so far, just waiting for the collection to update to ensure I'm doing this right ;D
[08:06] "The City of Heroes® servers will shut off on November 30, 2012" (although it doesn't mention the boards specifically)
[08:17] Can you turn on the CoH warrior job?
[08:20] Is it time yet? (We've already got a full copy, just not the recent posts.)
[08:21] http://bitsavers.trailing-edge.com/www.computer.museum.uq.edu.au/
[08:22] alard: I dunno, it was a question of if it's "that simple".
[08:23] Yes, it's not hard. I have to clear the current 'done' list, then queue the ids again. (And I've just said goodbye to the upload space on fos, so that must be fixed first.)
[08:23] http://bitsavers.trailing-edge.com/www.computer.museum.uq.edu.au/newsletters/
[08:24] Why look, something I'm going to shove into archive.org
[08:24] WHY LOOK
[08:25] server readonly -- tasks waiting for harddrive fix ???
[08:25] Also, what is the "submit time" ?
[08:25] time the task was kicked off
[08:26] TheSienaNewsJune161948(2.1 years)xxx@xxxxx.xxx-1waiting
[08:26] lulz? 2.1 years ago?
[08:26] sounds old
[08:27] theres another for 2.4 years
[08:38] hm, I have a request for help finding data in focity
[08:38] anyone remember how we were doing that?
[08:39] focity, fortune city?
[08:39] I'd only just joined when that was going on. No idea sry :<
[08:40] right, fortunecity
[08:41] http://archive.org/download/test-memac-index-test/fortunecity.html
[08:42] thanks
[08:42] That's only for the new username-style sites, not for the ancient area/street/number structure.
[08:43] hmmm, she's actually looking for anissa.myblogsite.com
[08:43] myblogsite.com was a fortunecity service, did we suck that in?
[08:44] No, I don't think so. This is the first time I hear of that.
[08:44] damn.
[08:44] ok
[08:44] We only have things in *.fortunecity.*
[08:45] (They apparently also had MyPhotoAlbum.com.)
[08:46] nice name too
[08:46] ...
[09:09] ah, wonderful!
TASK FAILED AT UTC: 2012-10-02 18:11:37 http://www.us.archive.org/log_show.php?task_id=124346847
[09:09] nice /usr/local/petabox/deriver/derive.php /var/tmp/autoclean/derive/EB1911WMF 'task_id=124346847&identifier=EB1911WMF&server=iw600306.us.archive.org&cmd=derive.php&args=dir%3D%252F12%252Fitems%252FEB1911WMF%26prevtask%3D124346467%26server_primary%3Dia601202.us.archive.org&submittime=2012-09-22+14%3A27%3A06&submitter=federicoleva%40tiscali.it&priority=-6&wait_admin=0&finished=0' failed with exit code: 9
[11:11] errr,
[11:11] when I select Public Domain Mark it kills firefox :/
[11:16] added a film that Jason references in one of his talks which is appently public domain but not on the Archive - Jim Henson's "The Cube".
[11:16] except it doesn't show under my uploads :<
[11:28] http://archive.org/details/TheCube-JimHenson-1969 worked the second time :S
[14:16] SketchCow: add that to the http://archive.org/details/wikileaksarchive collection?
[14:23] Aurgh now the cube appears twice! Failure!
[14:41] https://archive.org/details/groks209 its silent o_O
[17:45] DO NOT take this the wrong way as I am absolutely sure your heart.s in the right place,
[17:45] Look, Jase ..
[17:45] but judt why, exactly are you doing this?
[17:45] I mean, I can see the computer magazines & everything being done . the .Computer Gaming World. PDF archive is a gift from God itself! . but I.ll wager 99.9% of the books you.re Scribe-ing will never ever be looked at.
[17:45] And all of the important works I.m sure.s been covered already by the Gutenberg Project.
[17:46] Is .Just Knowing It.s There & Available. really a good enough reason to be doing all of this incredibly difficult work ..?
[17:46] ....
[17:46] This is the best comment, ever.
[17:53] punctuation motherfucker
[17:53] ~
[17:54] what kind of keyboard does that guy have
[17:54] | tr "'," ".."
[18:24] alard: Are you around?
[18:25] Hey, so good news, everyone. Archive.org is now generating the files and beginning the process of a new wayback machine index.
[18:26] I'm tasked with helping us prepare the archiveteam uploads of the last year for inclusion into the Wayback.
[18:26] So we're going to need an inventory of the sites we've grabbed, which of our stuff is in WARC format.
[18:43] SketchCow: Yes, somewhat.
[18:43] That's good news.
[18:48] Does this include the warc-in-tar stuff?
[19:28] Yes, I'm about to set up an inventory of all our projects, so we can pass it for testing
[19:28] Most wil be fine, SOME might need to be rejiggered in some way.
[19:46] SketchCow: FUCK YEAH
[19:46] :D
[19:48] Without the MobileMe stuff, right?
[20:29] this link is now sorta old but anyway... http://h30565.www3.hp.com/t5/Feature-Articles/The-History-of-the-Floppy-Disk/ba-p/6434
[21:35] The mobileme stuff is a separate project for now.
[21:35] MAYBE it goes in.
[21:36] Archive.org is basically DOUBLING the amount of data coming in from the last crawl (which people have figured out, was early last year.)
[21:36] This is going to just be a massive ingestion of data, but then our stuff joins in.
[22:13] https://docs.google.com/spreadsheet/ccc?key=0ApQeH7pQrcBWdDZIUEVjR3d1UmRoU0lPSWZYX0Q1Ync
[22:23] So, that's me collecting all our items and collections that have .WARC files involved in them.
[22:24] So they'll go into the archive.org wayback this month!
[22:24] And the Wayback will jump up to six months ago.
[22:24] yaaaay
[22:24] is it some kind of horrendous semimanual batch update?
[22:24] yes.
[22:24] Oh, as horrible as you can imagine.
[22:24] The number thrown to me is that the DB has 168 billion rows.
[22:25] Wait, no.
[22:25] 168 million, sorry.
[22:25] Anyway, it jumps to 240 million after this single update.
[22:25] that's a decent database
[22:25] So it's... significant
[22:25] crap
[22:25] It may double the wayback
[22:25] but the data is mostly from just recently?
[22:27] is godane's stuff going in
[22:27] I guess it's all too new
[22:28] SketchCow: don't forget devilskitchen ;)
[22:31] Already in there. Look again.
[22:31] I haven't browsed godane's stuff yet.
[22:31] reminder there's still 3 items kicking around in http://archive.org/details/archiveteam-mobileme
[22:35] Fixed.
[22:36] This is definitely where I need help - to find orphan items so we can clean it up, and then shove the right things into the crawl.
[22:39]
[22:40] not fixed for http://archive.org/details/archiveteam-mobileme-hero-2511x
[22:48] friendster also absent
[22:49] Friendster isn't warc, as far as I can tell.
[22:50] Am I wrong?
[22:50] no idea
[23:14] Investigating.
[23:32] Just checked, Friendster is the last of the No-WARC saves.
[23:32] It was soon after that the request for WARC came.
[23:46] also Friendster had the really crazy phantomjs ripper, since we needed javascript or something..
[23:48] "why are you doing this?" uh, because otheriwise it doen't get done and winds up lost to the dust of time.
[23:54] WHY DON'T I STRETCH OUT? AHAHAHA!