[00:41] What was it?
[00:58] i just want you to know that cbs radio feeds don't have audio files for 2009-07-25 to 2009-07-31
[00:59] i also want you guys to know there are a lot of broken or unplayable files for the daily cnn podcast
[00:59] in 2010 aways
[01:01] also with the cbs radio feeds
[01:01] i think it stopped around or just before 8:30PM on 2009-07-24
[01:02] good news is i think only one mp3 is missing in 2009-08 files
[01:05] link me one?
[01:06] (of the broken files)
[02:25] balrog: http://podcasts.cnn.net/cnn/big/podcasts/cnnnewsroom/video/2010/02/04/the.daily.02.04.cnn.m4v
[03:11] http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public
[11:54] download.cbsnews.com/media/2007/01/28/video2405937.flv
[11:54] that's a 60 Minutes segment talking about tech support
[14:19] i'm grabbing the internet history podcast
[15:54] https://cdn.mediacru.sh/gGp2y7AcPdcO.jpe
[18:02] if anyone has the official linux format dvd for 166 please check its size against this one: https://archive.org/details/cdrom-linuxformatmagazine-166
[18:03] i only ask because when mounted it says there is 4.4gb on the disk
[18:03] but it's only 2.3gb in size
[18:03] also most linux format dvds are at least 3.9gb
[18:10] Jan 31 15:32:33 someone do ftp://gamefiles.blueyonder.co.uk/ - https://archive.org/details/gamefiles.blueyonder.co.uk (only 4 months later..) underscor SketchCow ? That needs sexying up and moving to ftpsites when you get the time (I fluffed the torrent, that can be ignored/removed)
[18:40] hmm... still new to this.. I take it this derive task will run until it connects to the torrent (which isn't available due to an rtorrent size error). I uploaded the file afterwards with the python package; that finished, but the task is waiting to be run... https://catalogd.archive.org/history/gamefiles.blueyonder.co.uk
[18:40] confusing, murp :3
[18:43] Not sure what is happening there, actually.
[18:43] Kind of neat.
[18:50] Did I break it.. XD
[18:50] Oh, not sure.
Not sure what the torrents do, to be honest.
[18:50] How they made them, etc.
[18:51] Can you cancel the current derive and have it move to the archive task?
[18:52] chfoo is continuously checking in great things to URLTeam's repository
[18:52] (in the repo "terroroftinytown" that is)
[19:02] ohhdemgir: the torrent task should time out eventually and then the other task should run
[19:02] eventually.. heh
[19:02] okies :)
[19:02] takes at least a day, I forget
[19:03] when getting .gov sites from the list do I just ignore headers like
[19:03] Anonymous user logged in.
[19:03] U.S. Government computer, unauthorized use prohibited by Title 18, U.S.C.
[19:03] Welcome, ftp, to ftp.cdc.gov
[19:05] that's just boilerplate; unauthorized use of any server is prohibited by law
[19:05] government or no
[19:06] if it's a publicly listed site that allows anonymous login then presumably that use is authorized
[19:06] sounds good
[19:07] the courts can be stupid about that though, as demonstrated by the weev and manning cases
[19:07] DFJustin: Unfortunately, as of yet not upheld by the courts.
[19:07] Yeah
[19:08] We had a local case where a kid "hacked" the sheriff website by going to a non-listed URL and downloading records.
[19:08] I'm in the UK, ripping US sites from a server in France, fuck it
[19:12] these countries could extradite you to the USA
[19:12] I'll risk it
[19:13] I am in the US. I probably violated six different laws just getting to work this morning.
[19:13] nico: and an asteroid *could* hit you in the head
[19:13] ersi: Bus is much more likely.
[19:14] I said nothing about probability
[19:15] ohhdemgir: nice work with gamefiles by!
[19:15] sorry I let it sit for so long
[19:16] if you consider that long then i don't want to talk about the fileplanet stuff ever again ;P
[19:16] .. I ... I still have some of that..
[19:16] XD
[19:16] i'm downloading more cbsnews stuff
[19:16] :-D
[19:16] some of the stuff in 2007 is very interesting
[19:19] is it bad to be uploading 4 things and downloading 4 things at the same time?
[19:19] ohhdemgir: you must have changed nicks ;)
[19:19] i really think my OCD is kicking in today
[19:20] anyways, if you have leftovers from the id iteration downloading we did, you can safely delete
[19:20] schbirid, I was tarx or tarxvf before
[19:20] :)
[19:20] i am more of a tarxfvz guy
[19:21] heh
[19:22] I used p7z because I kept forgetting the tar flags.
[19:23] rocode, that's why I used it as my username XD
[19:23] I always used p7z before
[19:23] Ah.
[19:24] schbirid, any site I should get next?
[19:26] if you could do anything to turn reddit back to ~2009 and remove all the fucking image macros (are they still called that?) from the web, that would be nice
[19:26] schbirid: I run my own reddit proxy that pretty much does the same thing. That site really went to hell.
[19:27] I think I have enough data to host reddit as it was in 2009 XD
[19:27] ohhdemgir: It's ~100gb. There was a redditdev backup floating around. Two tables *shudder*
[19:28] yeah, ish* I have most of it 2007 - early 2013
[19:29] Most reddit data is worthless unless you get their researcher feed, with the amount of fudging they do.
[19:30] pain in the ass though, last time I put it up admins took it down and asked me to see how they 'wished to handle the release of such data'. never heard back, will ia when I get the chance
[19:31] right now I'm using it to put up things like this http://www.reddit.com/r/AmateurArchives/comments/24vr5r/rgonewild_history_20092013_torrents/
[19:31] https://archive.org/search.php?query=Gonewild%20Data
[19:31] because boobies and data, yiss!
[19:31] LOL
[19:32] Reddit admins try to avoid overt backups because of the legal mess their user-contributed data is.
[19:32] A few more of those have gone down.
[19:32] hah
[19:32] Like, a couple albums.
[19:32] They shut down our /r/theoryofreddit bot because we were using old data to try to create a heuristic moderation system.
[19:32] SketchCow, from the original 220GB one?
[19:32] rocode: wow...........
[19:32] Ostensibly, yes
[19:32] the problem you're gonna run into here is that you can't remove a small subset without removing the whole thing
[19:33] SketchCow, tsk, silly, I'm trying to either not include usernames or release those separately now
[19:33] balrog, communities as a whole go through this cycle constantly. Slashdot saw the same, fark saw the same. When enough money and public interest occurs, things go to hell.
[19:34] Awww, it's rocode, our little bucket of reality
[19:34] :(
[19:36] balrog, true but I feel warm and fuzzy knowing that ia still has it :3
[19:36] would be nice if there was a way to only dark a portion of an archive
[19:36] agreed
[19:36] SketchCow: Someone has to save all this data to hand over to our AI overlords of the future.
[19:36] Is this the part where rocode is going to win me over to archiving maximalism?
[19:36] * SketchCow gets popcorn
[19:37] http://www.cbc.ca/strombo/content/images/mj-popcorn.gif
[19:37] Archiving maximalism? Saving everything?
[19:37] Regarding the Gonewild Archive situation, your problem is that it's WAY too large and WAY too big for one file.
[19:38] It should be, like, 4 items, each with 100 files or so.
[19:38] I say this with full 20/20 hindsight.
[19:38] I mean, there was no way to know, but now that people are coming out to take issue, it becomes the case.
[19:39] aye, seems without even linking to it underscor's upload went dark too
[19:40] I think you may be mistaking me for someone else, SketchCow, I was referring to historical voting data and comment history of reddit.
[19:40] the only way around it is archiving each user as a new item, which is a pain in the ass
[19:41] Well, no.
[19:41] The way I proposed will make it so the part that needs replacement is much smaller and manageable.
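SketchCow's proposal of a few items instead of one giant upload amounts to a small bin-packing job over whole user folders, so that a takedown for one user only means re-doing one item. A minimal sketch, assuming folder sizes are already known (any unit) and picking an arbitrary per-item cap; `pack_users`, `index_by_user`, and `Item` are hypothetical names, not an archive.org or Archive Team tool:

```python
"""Sketch: pack per-user folders into archive items without splitting a user.

Hypothetical assumptions: folder sizes are already known (any unit), and
`cap` is an arbitrary per-item size limit chosen by the uploader.
"""
from dataclasses import dataclass, field


@dataclass
class Item:
    users: list[str] = field(default_factory=list)
    size: int = 0


def pack_users(sizes: dict[str, int], cap: int) -> list[Item]:
    """First-fit decreasing bin packing over whole user folders.

    Users are placed largest first into the first item with room. A user
    larger than `cap` still gets an item of its own, since users are never split.
    """
    items: list[Item] = []
    for user, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for item in items:
            if item.size + size <= cap:
                item.users.append(user)
                item.size += size
                break
        else:
            # No existing item has room: start a new one for this user.
            items.append(Item([user], size))
    return items


def index_by_user(items: list[Item]) -> dict[str, int]:
    """Map each user to the index of the item holding their folder, so a
    takedown for one user touches only that item."""
    return {user: i for i, item in enumerate(items) for user in item.users}


if __name__ == "__main__":
    demo = pack_users({"alice": 30, "bob": 20, "carol": 25, "dave": 60}, cap=50)
    for i, item in enumerate(demo):
        print(i, item.users, item.size)
```

With a cap of 50, dave (60) exceeds the cap and sits alone in item 0, alice and bob share item 1, and carol gets item 2. First-fit decreasing is just one simple choice here; any grouping works as long as a user's folder never spans two items.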
[19:42] Right now you have to basically put an aircraft carrier up on blocks to yank a single bolt off the bottom.
[19:42] Having to put ONE truck out of a fleet up on blocks is still annoying but comparatively OK.
[19:44] Well, wouldn't it be easier to handle smaller chunks that you can always combine into a large chunk later if needed?
[19:44] hmm, true, we'll see when it comes to uploading more
[19:44] I think there is around 120GB waiting again
[19:45] ohhdemgir: how difficult is it to make a script that splits it?
[19:46] each user has their own folder, it can be split any old way and still make sense
[19:46] so, easy
[19:46] then where is this difficult?
[19:47] Beyond that, you don't need to split it within users.
[19:47] You can just split users.
[19:47] rocode: how do they deal with the people who run the "undelete comment" stuff?
[19:47] ^ this
[19:47] No user's going to be more than a gig.
[19:47] yeah, since a takedown request will nearly always be for at least an entire user
[19:47] SketchCow, some are 3-5GB
[19:47] No user's going to be more than 10 gigs.
[19:47] lol
[19:47] Same difference.
[19:47] No user will need more than 640kb
[19:48] balrog: They leave it up to the submod staff and note it in the reddit ToS as harassment.
[19:48] a.k.a. CYA
[19:48] rocode: which part?
[19:48] sec
[19:49] I'm talking about https://www.unedditreddit.com
[19:49] (lol expired cert)
[19:50] balrog: Oh, thought you meant the auto screenshot bot
[19:50] 3rd party sites are 3rd party, therefore they don't care. If it becomes an issue, they ban the IPs from the API.
[19:50] do they use the API or do they scrape?
[19:51] API, AFAIK.
[19:52] Heh, firefox mobile does not allow a temp allow for expired certs.
[19:53] Oh, they are scraping. Those guys got banned from the API.
[19:55] SketchCow: you may be getting a marxists.org section for texts
[19:56] i'm trying to upload the pdfs i got from that site
[19:59] godane: Was your download prior to the april 30th purge?
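The marxists.org exchange that follows turns on wget's `-E` (`--adjust-extension`) option. Per the GNU Wget manual, `-E` appends `.html` to the local name of a `text/html` or `application/xhtml+xml` response whose URL does not already match `\.[Hh][Tt][Mm][Ll]?`, so `.htm` URLs keep their names. The manual also notes that files renamed this way are re-downloaded on every re-mirror, because wget cannot tell that local `X.html` corresponds to remote URL `X`; that caveat may account for the re-downloading described below. The sketch is a simplified model, not wget's code: `local_name` and `clashes` are hypothetical helpers, and it does not reproduce how wget itself resolves a file/directory conflict.

```python
"""Simplified model of wget's -E (--adjust-extension) local naming.

This is NOT wget's implementation. It mirrors only the documented rule:
an HTML response whose URL does not already end in .htm or .html gets
".html" appended to its local filename. Function names are hypothetical.
"""
import re
from pathlib import PurePosixPath

# Suffix regexp quoted in the wget manual for --adjust-extension.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")


def local_name(url_path: str, is_html: bool, adjust_extension: bool) -> str:
    """Map a URL path to a local path under a simplified wget-like scheme."""
    path = url_path.lstrip("/")
    if path == "" or path.endswith("/"):
        return path + "index.html"  # directory URLs are saved as index.html
    if adjust_extension and is_html and not HTML_SUFFIX.search(path):
        return path + ".html"
    return path


def clashes(paths: list[str]) -> set[str]:
    """Return local paths that would need to be both a file and a directory."""
    files = set(paths)
    dirs = {
        str(parent)
        for p in paths
        for parent in PurePosixPath(p).parents
        if str(parent) != "."
    }
    return files & dirs


if __name__ == "__main__":
    # A folder link served as HTML without a trailing slash, a PDF inside
    # that folder, and an .htm page (hypothetical example URLs).
    urls = [("/archive/folder", True), ("/archive/folder/file.pdf", False),
            ("/archive/page.htm", True)]
    for flag in (False, True):
        names = [local_name(u, html, flag) for u, html in urls]
        print("-E" if flag else "no -E", names, "clashes:", clashes(names))
```

Without `-E`, the HTML response for `archive/folder` and the directory needed for `archive/folder/file.pdf` want the same path; with `-E` the page becomes `archive/folder.html` and both fit, while `archive/page.htm` is left unchanged in either case.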
[20:00] yes
[20:01] i got about 80gb
[20:01] but i think it started to redownload stuff so i killed it
[20:01] i think it was redownloading cause i had the -E option in my script
[20:02] which makes .html files if there is a folder link
[20:02] but the site has a lot of .htm files
[20:03] so i guess it was redownloading the .htm files
[20:03] it's better to have the -E option, otherwise you get folder install of folder.html
[20:04] *instead
[20:04] that way the folder/file.pdf will get downloaded
[20:05] otherwise it will say folder can't be written to, since folder would be both a file and a folder
[20:06] here is the upload item: https://archive.org/details/www.marxists.org-20140426
[20:48] http://alcatel-lucent.com/bstj/
[20:49] garyrh: Already imported into archive.org.
[20:49] great!
[21:51] failing miserably at reserving a hotel room, apparently
[21:51] who knew this was so hard
[21:51] Where?
[21:52] What are you using?
[21:52] I tend to use Kayak these days for the US
[21:52] cool
[21:52] hipmunk told me that something broke and it didn't get reserved
[21:52] some other site said my credit card said "NO"
[21:54] CVS Loyalty Cards are not Credit Cards, you know that, right?
[21:54] hmmm
[21:54] really?
[21:54] I know
[21:54] I KNOWWWWW
[21:54] I was surprised too
[21:55] you used to be able to pay for airplane telephone service with radioshack gift cards
[21:55] That was one aaaaangry hooker
[21:55] because they had a credit-card-like magstripe
[21:55] lol
[21:55] Oh yeah, because they couldn't run the thing until you landed
[21:55] So always use the seat next to you
[21:55] ding ding ding
[21:56] despite having phones on planes, they couldn't use modems on planes
[21:56] or something
[21:56] Props to the hand-wavy cockblaster who pooh-poohed that scenario at the meeting
[21:59] maybe spending money on canadian hotels is not a scenario that my bank envisioned me wanting to do
[21:59] I do find you have to call ahead to the bank to get the card opened to that.
[21:59] Like, when I was married to a canadian, this came up all the time.
[21:59] "I'm going to Canadia, free the card"
[22:00] Otherwise I was Mr. Big for dinner and couldn't buy a gum stick the next morning.
[22:01] lol
[22:01] canada, pfeh, who goes there
[22:01] My Boston bank would block my card if I bought 4 things in NYC
[22:03] to be fair, new york is really far from boston
[22:04] there aren't even any direct flights
[22:06] hm, what's the state department say about traveling to canada
[22:06] are there any dictatorships there
[22:28] urgh fucking censorship