[07:40] I'm running the keyworder against 40,000 manuals.
[07:40] To celebrate all the new ones that didn't get keywords (if they did, it skips them)
[07:41] how IO intensive is that process anyway?
[07:41] Not very.
[07:42] It basically uses the python interface to see if "subject" has any data. If it doesn't, it moves on.
[07:42] When it does find one, it downloads the djvu text OCR result, which is, in many cases, rarely over 50k and sometimes 100k.
[07:42] And the keyword search is barely a blip, instantaneous.
[07:43] Much more intensive is the making of makefiles for the JSMESS rebuild I'm doing.
[07:43] That's very nice
[07:43] Or the megawarc building going on for the fotopedia grab
[07:44] 7z decompression of MediaWiki XML dumps is usually quite fast too ;)
[07:44] https://archive.org/details/datasheet_AT26DF321
[07:44] Like, that took a second or two.
[07:44] device; status; pin; sector; opcode; sprl; protection; erase; clocked; status register; sector protection; wel bit; sprl bit; protection registers; write status; global protect; protection register; block erase; write enable
[07:45] https://archive.org/search.php?query=collection%3Amanuals&sort=-publicdate
[07:45] Eventually, all those will populate.
[07:46] I'm also transferring Apple floppies, packing magazines to go to an archive, and collecting laundry.
[07:47] the last one is probably more load intensive on the washing machine in the end, at least for me it usually is
[07:48] FOS was mostly sick because DFJustin did an experiment and it went a tad south.
[07:49] Process ended up sucking 11.6gb of memory, which was.... noticeable
[07:49] oops
[07:50] this happened on my megawarc box yesterday: http://hacktheplanet.nl/munin/megawarc.org/fr.megawarc.org/load-day.png
[07:51] mostly IOwait due to rsync going mad at 700Mbit and trying to stuff 1TB of warc files in bigger warc files
[07:53] Keywording is going along as expected - new keywords needed every 50-100 files.
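(Editor's sketch of the keyworder flow described at 07:40-07:42. The actual keyworder script is not shown in this log, so everything below is an assumption: the function names, the stopword list, and the frequency-based scoring are all hypothetical. It takes the reading that items which already have "subject" data are skipped, and it mimics the output shape seen above: single words first, then two-word phrases, joined with "; ". Fetching the metadata and the djvu OCR text is left out so the sketch runs offline.)

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would need a larger one.
STOPWORDS = {"the", "and", "for", "that", "this", "with", "you", "are",
             "not", "from", "when", "will", "can", "has", "have", "into"}


def needs_keywords(metadata):
    """Return True when the item's metadata has no usable 'subject' value."""
    subject = metadata.get("subject")
    if subject is None:
        return True
    if isinstance(subject, str):
        return not subject.strip()
    # 'subject' may also be a list of strings
    return not any(str(s).strip() for s in subject)


def extract_keywords(text, top_words=10, top_phrases=10):
    """Pick frequent words and repeated two-word phrases from OCR text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Replace short words and stopwords with None so they break up phrases.
    keep = [t if len(t) >= 3 and t not in STOPWORDS else None for t in tokens]
    word_counts = Counter(t for t in keep if t)
    phrase_counts = Counter(
        f"{a} {b}" for a, b in zip(keep, keep[1:]) if a and b
    )
    words = [w for w, _ in word_counts.most_common(top_words)]
    # Only keep phrases seen more than once, to cut down on OCR noise.
    phrases = [p for p, n in phrase_counts.most_common(top_phrases) if n > 1]
    return "; ".join(words + phrases)


def keyword_item(metadata, ocr_text):
    """Skip items that already have subjects; otherwise propose keywords."""
    if not needs_keywords(metadata):
        return None
    return extract_keywords(ocr_text)
```

(Calling keyword_item({"subject": "oven; cooking"}, text) returns None, which is the cheap "skip" path; only items without subjects pay for the text download and the word count.)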
[07:54] Brewster had huge skepticism this would work, but it's working.
[07:54] https://archive.org/details/Kenmore_Microwave_Oven_Microwave_Oven_User_Manual oven keywords, bitches
[07:54] oven; cooking; touch; temperature; display; probe; microwave; recipe; micro; auto; temperature probe; display window; touch numbers; touch micro; oven temperature; power level; convection cooking; microwave roasting; touch start; auto recipe
[08:07] heh cool
[08:15] Hello, one of my favorite sites is shutting down on September 15 and it's called Domo Animate. It's powered by Go!Animate and it's getting shut down because it's not fulfilling its purpose to the Domo brand. I was wondering if you could please archive the site, thank you. The link is: domo.goanimate.com
[08:16] Anyone want to take a shot
[08:17] ia upload 2014.08.fotopedia-cc-export-collection * -m "date:2014-08" -m "mediatype:data" -m "collection:archiveteam" -m "title:Fotopedia Creative Commons Colleciton" -m "description:A collection of Fotopedia files that were licensed under creative commons, evacuated with the shutdown of the service."
[08:17] By the way.
[08:17] 297gb of Creative Commons Photos and their info
[08:19] uploading export-fotopedia-cc.tar: [ ] 4409/283999 - 07:51:07
[08:20] Thinks it'll take 8 hours to upload - not bad.
[08:20] Domo Animate is pretty big
[08:21] according to the sitemap, which is divided into a lot of subsitemaps and all of those subsitemaps have a lot of links http://goanimate.com/gositemapindex.xml
[08:21] Maybe better a warrior project?
[13:55] SketchCow: when you have the time, can you move these around? https://archive.org/search.php?query=collection%3A%22opensource%22%20AND%20%28uploader%3A%22midas-ia%40hacktheplanet.nl%22%29
[13:56] not sure how they came there and the rawporter grab should be somewhere else
[15:53] The manuals are all getting keywords now in the oven world. Looking good!
[15:56] SketchCow: is there also a way to have a tag cloud? :)
[15:58] ?
[15:58] Tag cloud what
[15:59] thanks for that
[16:00] just know i had to manually go through kenmore manuals
[16:00] turns out there are pdfs that link to pdfs
[16:00] midas: fixed
[16:45] Cloud of keywords in a collection
[17:11] mmmm
[17:44] http://www.angryasianman.com/images/angry/ourstoryamovementofundocumentedapis01.jpg
[17:44] er oops
[17:46] not the kind of undocumented API that I was expecting
[17:46] yeah, same here
[17:46] -> #[]-bs
[20:43] The great manual keywording is done.
[20:43] I'll run one again when things puff up again
[21:07] nyaa.se will be shutting down soon can you save it plz.
[21:13] impressive
[21:13] 4 seconds
[21:14] and no source..
[21:14] no explanation either
[21:14] there's some messages on the chatbox
[21:15] doesn't really need an explanation, it's an illegal torrent site
[21:15] I think it is a torrent site based on wikipedia.
[21:15] Liumen: we put archivebot on it
[21:15] liumen isn't here any more
[21:15] ah
[21:15] heh
[21:15] it's an anime centric torrent site yeah.
[21:15] typical nub behavior
[21:27] i did do a grab of the first 10000 torrent pages
[21:28] high five o/\o
[21:28] http://archive.org/details/www.nyaa.se-id-1-to-10000-20140428
[21:28] hmm wait I don't see the actual warc on there, just cdx and log
[21:29] oh you're just adding it now
[21:33] yup
[22:02] it's uploaded now
[22:02] just know it's the raw html from what i can remember