[01:23] i have more of marxists.org now than what archivebot got
[01:24] it's around 87k urls now
[01:24] only 254 have 403 errors
[01:28] neat
[01:36] any OSX users around?
[01:56] 92k now
[02:18] http://www.reddit.com/r/DataHoarder/comments/245ij1/start_your_own_rgonewild_archive_automated_data/
[04:36] 99k urls now
[04:36] some of the pdfs will be uploaded so we can have collections of them
[04:36] like Peking Review
[04:40] i'm looking at the peking review pdfs and i don't think the html pages link to all of them
[04:41] i will see about grabbing the missing ones
[07:16] SketchCow: you may get another copy of the Computer Chronicles
[07:17] a guy on myspleen is taking the 1gb file and making a 175mb mp4
[07:22] the reason is that those mp4s are better than what archive.org makes
[07:34] i noticed one of the Computer Chronicles episodes cuts around the 1:47 mark to a home video with a baby in it
[07:35] it's this one by the way: https://archive.org/details/CC601_macworld
[07:36] also another thing about the Computer Chronicles from myspleen
[07:36] we will have a collection that can be in season and episode order
[16:27] lol dreamspark
[16:27] Password: The password must be at least six characters long and cannot contain any of these characters <,>,',;,=,(,),|,[,],?,/,#.
[16:27] that looks like perl
[16:41] how much does archivebot like to grab github?
[16:48] not much I would expect
[17:11] so i think my marxists.org mirror is redownloading pdfs that have already been downloaded
[17:11] very odd
[17:17] i'm stopping my mirror of marxists.org
[17:18] it was redownloading files that were already downloaded so it was best to stop it
[17:18] really....?
[17:19] Today, I found out that Bill Murray wasn't in Charlie's Angels 2 and 3 because he pointed to Lucy Liu and said "I have no idea why you're here."
[17:19] that's the way it looks anyways
[17:20] if nothing else you guys are getting 85gb of the website
[17:20] more than what archivebot got
[17:21] fantastic
[17:22] and since i have the files i may make some collections out of the pdfs i got
[17:23] cool
[17:23] I don't know where you find the time or energy, man
[17:23] you are a machine
[17:28] He burns with the hate of a thousand suns
[17:28] deletionists deserve all the hate they can get
[17:29] godane: do we know which files are the ones that are getting removed?
[17:29] it's all the ones published by that publishing company
[17:29] aha: https://www.marxists.org/archive/marx/works/cw/
[17:31] i think i have all that too in my dump
[17:31] that's what's getting pulled
[17:34] i'm uploading the first 11gb of warc.gz right now
[17:47] "<@exmic> you are a machine" That might actually be it... godane's a robot!
[17:47] here is the item it's being uploaded to: https://archive.org/details/www.marxists.org-20140426
[17:48] heh
[17:51] i'm also still mirroring nbc/cbs/abc news stuff
[18:09] i'm starting to upload some pdfs for collections: https://archive.org/details/v1n01-nov-15-1910-agitator
[18:37] get in on the act Stratego MAGEGO JUMBASTIC http://ow.ly/vusaO
[18:45] i'm uploading American Appeal volumes 7 and 8
[18:57] Quite the up-tick of spam in the last few days.
[18:59] You think this is an uptick?
[18:59] Well, I mean, along the lines of 'saw second cow'
[19:00] Well, more than I've seen since I've been around.
[19:02] midas: honestly I wouldn't grab github with archivebot
[19:02] I'd use github-mirrorer
[19:03] er
[19:03] whatever closure's thing is called
[19:03] github-backup
[19:06] i'm getting a 2009 episode of 60 minutes that talks about movie pirates
[19:14] so i may have found a way to grab the original file names
[19:14] it was like what i thought it should be
[19:15] it's something like imagename_646.flv
[19:16] i'm trying to get the original files from cbs news cause a lot of the newer links go to media.cbsnews.com
[19:16] but they have black bars on the sides
[19:17] yipdw: will do
[19:27] yipdw: can you back up winocm's repos?
[19:37] http://gizmodo.com/inside-the-us-nuclear-silos-where-floppy-disk-are-still-1568609439?
[19:42] balrog: starting now; keep in mind that this only gets public data
[19:44] cabal install is toasting my laptop
[19:57] i'm now uploading the American Socialist pdf collection
[20:08] gah, the github module doesn't work
[20:08] balrog: never mind, there's something wrong with my Haskell environment
[20:45] why does it take haskell to run git clone a bunch?
[20:47] exmic: github-backup does more than that
[20:47] sure, wikis and tickets and suchlike
[20:48] I could just clone them all I guess
[20:57] yipdw: still having issues?
[20:59] SketchCow: can u tell me how big the pdf collection is so far?
[20:59] the ones i'm sending i mean
[21:00] balrog: haven't been able to get to it -- in the middle of app release procedure
[21:23] ah ... ok
[21:23] we probably have a few days
[21:25] i'm also starting to upload more buck sexton show: https://archive.org/details/the-buck-sexton-show-01-04-2014
[21:26] Which ones are yours, SmileyG?
[21:30] radioamerica i think the dir was called
[21:31] 22G american_radio/
[21:31] du -sh american_radio/
[21:31] urgh 1/4
[22:07] I'M ON A SUGAR RUSH FROM DONUTS
[23:12] godane: Is there an issue with me uploading these Amazon manuals?
[23:13] I just removed the dupes.
[23:14] no
[23:14] i removed a lot of the dupes before uploading
[23:17] I know.
[23:17] And I got the rest.
[23:29] just saw this today: http://rr-project.org/ rr records nondeterministic executions and debugs them deterministically
[23:42] handy
[23:45] it was designed for use with Firefox, but works with most programs
[23:46] http://24.media.tumblr.com/ebe179bca4dc0d7c6bd0bd7d0cbbccd3/tumblr_n4md4pMruz1qa7q1no1_1280.jpg
[23:48] MIDI sequencing in JS: http://mudcu.be/midi-js/