#archiveteam 2014-05-02,Fri

Time Nickname Message
08:53 🔗 danneh_ Just to let you guys know, I'm doing another backup of Everything2's nodes: http://archiveteam.org/index.php?title=Everything2
08:53 🔗 danneh_ will try to throw it up onto archive.org once it's done, getting .warc files for all the nodes as well
12:40 🔗 SadDM So I'm banging away trying to get exmic's updated lua script to work on the CBR old forums and nothing is working. Finally I decide to look at what is actually getting downloaded, and I see this:
12:40 🔗 SadDM The administrator has banned your IP address.
12:41 🔗 SadDM I don't know when it happened, or what I did to trigger it.
12:41 🔗 SadDM :-(
13:03 🔗 ZoeB Does anyone know if there's a plain text archive of any of FidoNet's Echomail anywhere? The closest I can find is the very much not raw material copies over at http://fidonet.ozzmosis.com
14:21 🔗 SketchCow I wish there was!
14:21 🔗 SketchCow That's something I always lose track of.
14:57 🔗 ZoeB Ack. Well at least we have a lot of USENET and mailing lists. The GNU mailing lists alone are quite significant, albeit of kinda niche interest. :)
15:24 🔗 ats ZoeB: some of it was gated to Usenet, so you could look at fido.* groups in Usenet archives...
15:24 🔗 ZoeB Ooh, good point, thanks!
15:25 🔗 ZoeB Ah, yes! https://archive.org/details/usenet-fido
15:25 🔗 DFJustin https://archive.org/search.php?query=collection%3Ausenet%20fido
15:27 🔗 ZoeB Basically, https://archive.org/details/usenet in general then. :)
16:52 🔗 APerti http://www.itworld.com/storage/416783/sony-develops-tape-tech-could-lead-185-tb-cartridges
16:52 🔗 APerti Tape makes a resurgence.
16:56 🔗 ZoeB Huh. That's quite the breakthrough.
16:59 🔗 APerti MP3s should have a warmer, more natural tone coming from a tape deck...
16:59 🔗 APerti Hurr...
17:18 🔗 exmic hah
21:41 🔗 SadDM SketchCow: everything here (https://archive.org/search.php?query=uploader%3A%22aeakett%40gmail.com%22%20AND%20subject%3A%22fighting%20fantasy%22) would be a good fit in the Game Books Collection (https://archive.org/details/gamebooks).
21:47 🔗 SketchCow They're already there.
21:47 🔗 SketchCow So I am in total agreement.
21:51 🔗 * Smiley looks in
21:51 🔗 SadDM huh... we must have had this conversation already. Sorry about that.
22:11 🔗 SketchCow It's a good conversation.
22:19 🔗 SketchCow Quick walk to store, then let's kick some archive ass tonight.
22:41 🔗 DFJustin archive the night away
22:48 🔗 BlueMax we need a theme song titled "archive the night away"
22:48 🔗 BlueMax although I'm worried it would eventually devolve into a rendition of "Fuck Yahoo" 100 times or something
22:51 🔗 Baljem 'eventually'?
22:57 🔗 zenguy_pc how do i report a page on the site?
22:58 🔗 schbirid a what on where and why? :)
22:59 🔗 zenguy_pc https://archive.org/details/GiannaMichaelsVideos_Videos
23:00 🔗 exmic what do you want to report about it?
23:00 🔗 DFJustin it's spam with no content
23:00 🔗 DFJustin send those to info@archive.org
23:01 🔗 exmic ah
23:01 🔗 exmic yeah
23:01 🔗 schbirid zenguy_pc: archiveteam does not equal archive.org
23:01 🔗 schbirid we just love each other very much
23:01 🔗 exmic <<<<3333
23:02 🔗 DFJustin we also touch archive.org in inappropriate ways
23:08 🔗 zenguy_pc since you guys archive ... anyone know of an easy way to get a list of all starred projects on github.com?
23:09 🔗 zenguy_pc i'd like to periodically archive the repos i star
23:27 🔗 SketchCow github lifted all restrictions on internet archive crawling
23:28 🔗 exmic fab
23:31 🔗 balrog :O nice
23:31 🔗 balrog that doesn't really help if you want to preserve git history though
23:31 🔗 balrog that would require a specific crawler for git
23:32 🔗 dashcloud that's great news
23:33 🔗 DFJustin it's handy for stuff using github basically as a webhost like https://github.com/greatfire/wiki
23:33 🔗 DFJustin hmm just tried it and the css doesn't come through
23:34 🔗 DFJustin still a robots.txt block there: http://web.archive.org/web/20140502233237cs_/https://github.global.ssl.fastly.net/assets/github-825241e13de547a733c8a9cc535c8f6b411b52c1.css
23:45 🔗 SketchCow I just pinged the github guy
23:45 🔗 SketchCow github crawling by the Internet Archive is just one of our tools, as you know.
23:48 🔗 trs80 is another the comfy chair?
23:48 🔗 SketchCow Autism
23:49 🔗 DFJustin a near-fanatical devotion to the pope
23:49 🔗 dashcloud SketchCow: for the datasheets I gave you, the PDF metadata is useless crap unfortunately (an artifact of the site they come from), but if you extract the text from the first or second page, they all have a great overview section. I tested using pdftotext, and it came out pretty well
23:49 🔗 SketchCow In another window, I am hand-editing a collection of Sega Master System cartridges
23:50 🔗 dashcloud that's pretty awesome
23:59 🔗 kanzure is there a good way to extract tabular data from pdfs?
