[08:53] Just to let you guys know, I'm doing another backup of Everything2's nodes: http://archiveteam.org/index.php?title=Everything2
[08:53] will try to throw it up onto archive.org once it's done, getting .warc files for all the nodes as well
[12:40] So I'm banging away trying to get exmic's updated lua script to work on the CBR old forums and nothing is working. Finally I decide to look at what is actually getting downloaded, and I see this:
[12:40] The administrator has banned your IP address.
[12:41] I don't know when it happened, or what I did to trigger it.
[12:41] :-(
[13:03] Does anyone know if there's a plain text archive of any of FidoNet's Echomail anywhere? The closest I can find is the very much not raw material copies over at http://fidonet.ozzmosis.com
[14:21] I wish there was!
[14:21] That's something I always lose track of.
[14:57] Ack. Well at least we have a lot of USENET and mailing lists. The GNU mailing lists alone are quite significant, albeit of kinda niche interest. :)
[15:24] ZoeB: some of it was gated to Usenet, so you could look at fido.* groups in Usenet archives...
[15:24] Ooh, good point, thanks!
[15:25] Ah, yes! https://archive.org/details/usenet-fido
[15:25] https://archive.org/search.php?query=collection%3Ausenet%20fido
[15:27] Basically, https://archive.org/details/usenet in general then. :)
[16:52] http://www.itworld.com/storage/416783/sony-develops-tape-tech-could-lead-185-tb-cartridges
[16:52] Tape makes a resurgence.
[16:56] Huh. That's quite the breakthrough.
[16:59] MP3s should have a warmer, more natural tone coming from a tape deck...
[16:59] Hurr...
[17:18] hah
[21:41] SketchCow: everything here (https://archive.org/search.php?query=uploader%3A%22aeakett%40gmail.com%22%20AND%20subject%3A%22fighting%20fantasy%22) would be a good fit in the Game Books Collection (https://archive.org/details/gamebooks).
[21:47] They're already there.
[21:47] So I am in total agreement.
[21:51] * Smiley looks in
[21:51] huh... we must have had this conversation already. Sorry about that.
[22:11] It's a good conversation.
[22:19] Quick walk to store, then let's kick some archive ass tonight.
[22:41] archive the night away
[22:48] we need a theme song titled "archive the night away"
[22:48] although I'm worried it would eventually devolve into a rendition of "Fuck Yahoo" 100 times or something
[22:51] 'eventually'?
[22:57] how do i report a page on the site?
[22:58] a what on where and why? :)
[22:59] https://archive.org/details/GiannaMichaelsVideos_Videos
[23:00] what do you want to report about it?
[23:00] it's spam with no content
[23:00] send those to info@archive.org
[23:01] ah
[23:01] yeah
[23:01] zenguy_pc: archiveteam does not equal archive.org
[23:01] we just love each other very much
[23:01] <<<<3333
[23:02] we also touch archive.org in inappropriate ways
[23:08] since you guys archive ... anyone know of an easy way to get a list of all starred projects on github.com?
[23:09] i'd like to periodically archive the repos i starr
[23:27] github lifted all restrictions on internet archive crawling
[23:28] fab
[23:31] :O nice
[23:31] that doesn't really help if you want to preserve git history though
[23:31] that would require a specific crawler for git
[23:32] that's great news
[23:33] it's handy for stuff using github basically as a webhost like https://github.com/greatfire/wiki
[23:33] hmm just tried it and the css doesn't come through
[23:34] still a robots.txt block there: http://web.archive.org/web/20140502233237cs_/https://github.global.ssl.fastly.net/assets/github-825241e13de547a733c8a9cc535c8f6b411b52c1.css
[23:45] I just pinged the github guy
[23:45] github crawling by the Internet Archive is just one of our tools, as you know.
[23:48] is another the comfy chair?
[23:48] Autism
[23:49] a near-fanatical devotion to the pope
[23:49] SketchCow: for the datasheets I gave you, the PDF metadata is useless crap unfortunately (an artifact of the site they come from), but if you extract the text from the first or second page, they all have a great overview section. I tested using pdftotext, and it came out pretty well
[23:49] In another window, I am hand-editing a collection of Sega Master System cartridges
[23:50] that's pretty awesome
[23:59] is there a good way to extract tabular data from pdfs?
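
For reference, the pdftotext approach mentioned at [23:49] could look roughly like the sketch below. It assumes the poppler (or xpdf) `pdftotext` binary is on the PATH. The function names and the file name `datasheet.pdf` are hypothetical, not anything from the channel. `-f` and `-l` limit extraction to a page range, and `-layout` asks pdftotext to keep the physical column layout, which may also help a little with simple tables, though it is not a real table extractor.

```python
import subprocess


def build_pdftotext_command(pdf_path, first_page=1, last_page=2):
    """Build a pdftotext command that writes the chosen pages to stdout.

    Hypothetical helper for illustration only.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    return [
        "pdftotext",
        "-f", str(first_page),  # first page to convert
        "-l", str(last_page),   # last page to convert
        "-layout",              # preserve the original physical layout
        str(pdf_path),
        "-",                    # "-" sends the text to stdout instead of a file
    ]


def extract_first_pages(pdf_path, first_page=1, last_page=2):
    """Run pdftotext and return the extracted text (requires pdftotext installed)."""
    result = subprocess.run(
        build_pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


if __name__ == "__main__":
    # "datasheet.pdf" is a placeholder file name.
    print(extract_first_pages("datasheet.pdf"))
```

The extracted text would still need some searching to find the overview section, since its exact position seems to vary between the first and second page.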