[00:05] poor city of heroes
[00:24] http://www.glitch.com/closing
[00:25] shutting down in 7 days...
[04:55] Why do I see two black and blue lines in the items/hour box?
[09:40] SketchCow: the most downloaded TGM is the one with the empty ISO :p https://archive.org/details/cdrom-gamesmachinedvd-volume-14
[09:43] paging dr godane
[09:50] That is a mess, huh.
[10:01] hey all
[10:01] going to bed
[10:02] nite
[10:06] i'm uploading some nintendo 64 promo tapes
[10:06] nite
[12:21] Let me see about shoving more crap off of FOS into the world.
[12:23] Data wants to be free!
[12:32] 53 CD-ROMs about to dead drop.
[12:32] I have a program called bchunk that converts a .bin/.cue to an .iso
[12:32] About to run it on all these poor things.
[12:39] And after that, assuming it works, I'd like us to go through the cdrom collection and find all the bin/cue items; I can write something that grabs the bin/cue, makes an iso, and uploads the iso.
[12:55] SketchCow: Are you sure converting them to ISO instead of MDF is a good choice? ISO is for single-track images only; what if they have more tracks?
[12:57] Then it makes more.
[12:58] Regardless, the reason to convert them to .iso (AS WELL; I include the original .bin/.cue) is that the archive.org viewer works with iso.
[12:58] I have both the original AND the conversion.
[13:01] http://archive.org/details/gambler_cdrom_01 (First one!)
[13:02] http://ia601500.us.archive.org/isoview.php?iso=/20/items/gambler_cdrom_01/GAMBLER_01_1996_0901.iso therefore works.
[13:02] Now, please allow me to do this 53 more times.
[13:09] http://archive.org/details/cdrom_gambler
[13:09] Aaaand, they're appearing!
[13:09] Not bad, only takes a few minutes.
[13:26] SketchCow: is the code for that isoview.php free? is it public somewhere?
[13:29] I assume so, no idea where though.
[13:29] It's just using tar.
[13:31] ok
[13:48] Hey, want to hear a pet peeve?
[13:49] When someone criticizes my shit, and puts a :) at the end.
[13:49] Want. to. dragon. punch.
[14:07] void__: It's basically just formatting the output of 7z l
[14:07] The code isn't very exciting, and is nearly all just archive-specific header spewing, etc
[14:09] underscor: 7z called with a system() or equivalent?
[14:09] Yes
[14:09] ok, thank you
[14:09] We wrap system calls in a bunch of layers of abstraction, but that's what ends up happening at the very end
[15:13] SketchCow: is https://archive.org/details/gambler_cdrom_32b supposed to have 10 CDs on it?
[15:18] They're tracks.
[15:18] This is what BCHUNK does.
[15:21] oic
[15:26] underscor: any chance of using lsar/unar instead of 7z? there are currently a lot of archives that confuse it for one reason or another
[15:34] and they understand .bin without needing all this tomfoolery
[15:48] I asked about this. There was a Reason.
[15:48] It was mostly related to not shaking up the poor cluster of machines to do this new thing.
[15:48] Until, I guess, a more powerful set happens AND has an opportunity to update.
[16:02] Yeah, basically what SketchCow said. Adding things to the workers won't happen until we roll out a new set
[17:02] is there some linux tool/script to nicely archive websites via their rss feeds?
[18:11] Hey sketch, you around?
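[Editor's note: a minimal sketch of the batch bin/cue-to-iso conversion described at 12:39, not the actual FOS script. It assumes bchunk is on the PATH and that each .cue sits next to a same-named .bin; the directory layout and function name are assumptions. The -w flag is bchunk's real switch for emitting audio tracks as .wav instead of .cdr, which comes up later at 18:46.]

```python
#!/usr/bin/env python3
"""Sketch of the bin/cue -> iso batch conversion discussed above."""
import subprocess
from pathlib import Path

def convert_all(root: Path) -> None:
    for cue in root.rglob("*.cue"):
        bin_path = cue.with_suffix(".bin")
        if not bin_path.exists():
            print(f"skipping {cue}: no matching .bin")
            continue
        # bchunk writes one file per track: <basename>01.iso for data
        # tracks, <basename>02.wav (with -w) for audio tracks, and so on,
        # which is why gambler_cdrom_32b shows "10 CDs" -- they're tracks.
        basename = cue.with_suffix("")  # strip the .cue extension
        subprocess.run(
            ["bchunk", "-w", str(bin_path), str(cue), str(basename)],
            check=True,
        )

if __name__ == "__main__":
    convert_all(Path("."))
```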
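[Editor's note: per underscor at 14:07-14:09, the viewer's listing step boils down to shelling out to "7z l" and reformatting its output, with the system call buried under layers of abstraction. A minimal sketch of just that core, with the function name and the lack of formatting assumed:]

```python
# Sketch: what isoview.php's listing reduces to, per the chat.
import subprocess

def list_iso(iso_path: str) -> str:
    # "7z l <archive>" prints the archive's file listing to stdout;
    # the real code then reformats this into archive-specific HTML.
    result = subprocess.run(
        ["7z", "l", iso_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(list_iso("GAMBLER_01_1996_0901.iso"))
```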
[18:46] BCHUNK apparently supports .wav instead of .cdr output, might be more IA-friendly
[18:58] it does, i use that all the time
[19:27] just saw jason's tweet, http://mdf2iso.berlios.de is nice too
[19:27] i also have mdfextract installed but do not recall using it
[21:29] Know I'm probably late to the party on this, but is there any desire to archive The Daily?
[21:31] the daily what?
[21:32] Newscorp's "Revolutionary" iPad-only newspaper
[21:32] Articles were posted to a web CMS but could only be accessed via a link shared by a subscriber
[21:32] sure, why not
[21:33] Andy Baio started indexing it back in 2011 when it launched
[21:33] The Daily is also the name of my alma mater's newspaper, which happens to be the oldest news periodical in the state
[21:34] hah
[21:36] http://waxy.org/2011/02/the_daily_indexed/
[21:37] Basically Steve Jobs made Murdoch fall in love with the iPad, convinced him it'd save the business model of newspapers. It didn't.
[21:38] But Murdoch hired some of the smartest writers and locked their content up so almost nobody was able to read it
[21:38] yup
[21:39] It'd be a shitty process; it'd probably involve people with iPads subscribing or using the free trial week, grabbing as much of the backlog as possible, and sharing as many article links per issue as they could to an email address (or set of email accounts) that we could use to get the links to the stories and scrape them
[21:40] http://www.theatlantic.com/technology/archive/2012/12/3-theses-about-the-dailys-demise/265842/
[21:40] Grab anything you can
[21:40] If you can
[21:40] "It's also worth noting that Google's slowly indexing all the articles too, and search engines aren't blocked in their robots.txt file."
[21:41] balrog_: not anymore
[21:41] Their robots.txt now blocks robots
[21:41] riordan: they're still showing up in google
[21:41] rly?
[21:42] also, http://www.thedaily.com/robots.txt
[21:42] maybe they *once* had excluded ia_archiver?
[21:42] oh damn
[21:42] looks like they already pulled the app
[21:42] that or they did "If you cannot put a robots.txt file up, read our exclusion policy. If you think it applies to you, send a request to us at info@archive.org."
[21:42] so it's all moot
[21:42] they're still google indexed at least
[21:43] site:thedaily.com inurl:page/2012 and site:thedaily.com inurl:page/2011
[21:44] For some reason I'm not getting anything from the google index for site:thedaily.com
[21:45] I get 42700 results
[21:45] hmmm
[21:45] with just the query "site:thedaily.com"
[21:45] (no quotes)
[21:46] hmmm - well, something's up with what google's serving to me then
[21:46] try in another browser
[21:46] got it
[21:46] ok
[21:49] and they've got their recent material in a sitemap
[21:49] http://www.thedaily.com/sitemap-news.xml.gz
[21:49] only the past month or so
[21:51] Unlike most newspapers, which get sucked up into LexisNexis, this thing's probably going nowhere
[21:52] yeah I'm afraid so
[21:53] I've got a friend who knows their editor-in-chief - I'll see if I can pass a message along to ask if there's a plan of succession for content
[21:53] it would be nice if the old content could at least be archived
[21:53] precisely
[21:54] I like how the website is just images of text
[21:54] that's A+ CMS right there
[21:55] AAA+
[21:56] a+++ business model. Would fail again
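[Editor's note: a minimal sketch of harvesting article URLs from the sitemap mentioned at 21:49, for anyone wanting to feed a scraper. It assumes the file is gzipped XML (as the .gz extension suggests) using the standard sitemaps.org namespace; whether the server still serves it is not guaranteed.]

```python
# Sketch: pull article URLs out of The Daily's news sitemap.
import gzip
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "http://www.thedaily.com/sitemap-news.xml.gz"
# Standard namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as resp:
    xml_bytes = gzip.decompress(resp.read())

root = ET.fromstring(xml_bytes)
for loc in root.findall(".//sm:loc", NS):
    print(loc.text)  # one article URL per line, ready for a scraper
```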