#archiveteam 2014-08-12,Tue


Time Nickname Message
07:40 🔗 SketchCow I'm running the keyworder against 40,000 manuals.
07:40 🔗 SketchCow To celebrate all the new ones that didn't get keywords (if they did, it skips them)
07:41 🔗 midas how IO intensive is that process anyway?
07:41 🔗 SketchCow Not very.
07:42 🔗 SketchCow It basically uses the python interface to see if "subject" has any data. If it doesn't, it moves on.
07:42 🔗 SketchCow When it does find one, it downloads the djvu text OCR result, which is, in many cases, rarely over 50k and sometimes 100k.
07:42 🔗 SketchCow And the keyword search is barely a blip, instantaneous.
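A minimal sketch of the loop SketchCow describes above: skip any item whose "subject" field already has data, otherwise pull keywords out of the OCR text. The actual keyworder script is not shown in this log, so everything here is an assumption for illustration: the function names, the stop-word list, and the frequency-based ranking of single words and two-word phrases are all hypothetical. Fetching the metadata and the djvu text (the "python interface" mentioned above) is left out so the example runs offline.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real keyworder's list is not in the log.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "from", "are",
              "not", "you", "your", "can", "will", "when", "into"}


def needs_keywords(metadata):
    """True when the item's "subject" field is missing or empty."""
    return not metadata.get("subject")


def extract_keywords(text, n_words=10, n_phrases=10, min_len=3):
    """Return the most frequent words, then the most frequent two-word phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())

    def keep(token):
        return len(token) >= min_len and token not in STOP_WORDS

    words = Counter(t for t in tokens if keep(t))
    # A phrase counts only when both adjacent tokens pass the filter.
    phrases = Counter(
        f"{a} {b}" for a, b in zip(tokens, tokens[1:]) if keep(a) and keep(b)
    )
    top_words = [w for w, _ in words.most_common(n_words)]
    top_phrases = [p for p, _ in phrases.most_common(n_phrases)]
    return top_words + top_phrases


def keyword_item(metadata, ocr_text):
    """Return a "; "-joined keyword string, or None if the item is skipped."""
    if not needs_keywords(metadata):
        return None  # already has subjects, move on
    return "; ".join(extract_keywords(ocr_text))


if __name__ == "__main__":
    sample = ("Touch START to begin. The oven temperature probe shows "
              "the oven temperature. Touch START again.")
    print(keyword_item({}, sample))
    print(keyword_item({"subject": "ovens"}, sample))
```

The check on "subject" is the cheap part; the text only has to be downloaded and scanned for items that fail it, which fits the "not very" IO-intensive answer above.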
07:43 🔗 SketchCow Much more intensive is the making of makefiles for the JSMESS rebuild I'm doing.
07:43 🔗 Nemo_bis That's very nice
07:43 🔗 SketchCow Or the megawarc building going on for the fotopedia grab
07:44 🔗 Nemo_bis 7z decompression of MediaWiki XML dumps is usually quite fast too ;)
07:44 🔗 SketchCow https://archive.org/details/datasheet_AT26DF321
07:44 🔗 SketchCow Like, that took a second or two.
07:44 🔗 SketchCow device; status; pin; sector; opcode; sprl; protection; erase; clocked; status register; sector protection; wel bit; sprl bit; protection registers; write status; global protect; protection register; block erase; write enable
07:45 🔗 SketchCow https://archive.org/search.php?query=collection%3Amanuals&sort=-publicdate
07:45 🔗 SketchCow Eventually, all those will populate.
07:46 🔗 SketchCow I'm also transferring Apple floppies, packing magazines to go to an archive, and collecting laundry.
07:47 🔗 midas the last one is probably more load intensive on the washing machine in the end, at least for me it usually is
07:48 🔗 SketchCow FOS was mostly sick because DFJustin did an experiment and it went a tad south.
07:49 🔗 SketchCow Process ended up sucking 11.6gb of memory, which was.... noticeable
07:49 🔗 midas oops
07:50 🔗 midas this happened on my megawarc box yesterday: http://hacktheplanet.nl/munin/megawarc.org/fr.megawarc.org/load-day.png
07:51 🔗 midas mostly IOwait due to rsync going mad at 700Mbit and trying to stuff 1TB of warc files in bigger warc files
07:53 🔗 SketchCow Keywording is going along as expected - new keywords needed every 50-100 files.
07:54 🔗 SketchCow Brewster had huge skepticism this would work, but it's working.
07:54 🔗 SketchCow https://archive.org/details/Kenmore_Microwave_Oven_Microwave_Oven_User_Manual oven keywords, bitches
07:54 🔗 SketchCow oven; cooking; touch; temperature; display; probe; microwave; recipe; micro; auto; temperature probe; display window; touch numbers; touch micro; oven temperature; power level; convection cooking; microwave roasting; touch start; auto recipe
08:07 🔗 midas heh cool
08:15 🔗 SketchCow Hello, one of my favorite sites is shutting down on September 15 and it's called Domo Animate. It's powered by Go!Animate and it's getting shut down because it's not fulfilling its purpose to the Domo brand. I was wondering if you could please archive the site, thank you. The link is: domo.goanimate.com
08:16 🔗 SketchCow Anyone want to take a shot
08:17 🔗 SketchCow ia upload 2014.08.fotopedia-cc-export-collection * -m "date:2014-08" -m "mediatype:data" -m "collection:archiveteam" -m "title:Fotopedia Creative Commons Colleciton" -m "description:A collection of Fotopedia files that were licensed under creative commons, evacuated with the shutdown of the service."
08:17 🔗 SketchCow By the way.
08:17 🔗 SketchCow 297gb of Creative Commons Photos and their info
08:19 🔗 SketchCow uploading export-fotopedia-cc.tar: [ ] 4409/283999 - 07:51:07
08:20 🔗 SketchCow Thinks it'll take 8 hours to upload - not bad.
08:20 🔗 Arkiver2 Domo Animate is pretty big
08:21 🔗 Arkiver2 according to the sitemap, which is divided into a lot of subsitemaps, and all of those subsitemaps have a lot of links http://goanimate.com/gositemapindex.xml
08:21 🔗 Arkiver2 Maybe better a warrior project?
13:55 🔗 midas SketchCow: when you have the time, can you move these around? https://archive.org/search.php?query=collection%3A%22opensource%22%20AND%20%28uploader%3A%22midas-ia%40hacktheplanet.nl%22%29
13:56 🔗 midas not sure how they came there and the rawporter grab should be somewhere else
15:53 🔗 SketchCow The manuals are all getting keywords now in the oven world. Looking good!
15:56 🔗 Nemo_bis SketchCow: is there also a way to have a tag cloud? :)
15:58 🔗 SketchCow ?
15:58 🔗 SketchCow Tag cloud what
15:59 🔗 godane thanks for that
16:00 🔗 godane just know i had to manually go thru kenmore manuals
16:00 🔗 godane turns out there are pdfs that link to pdfs
16:00 🔗 SketchCow midas: fixed
16:45 🔗 Nemo_bis Cloud of keywords in a collection
17:11 🔗 SketchCow mmmm
17:44 🔗 yipdw http://www.angryasianman.com/images/angry/ourstoryamovementofundocumentedapis01.jpg
17:44 🔗 yipdw er oops
17:46 🔗 xmc not the kind of undocumented API that I was expecting
17:46 🔗 yipdw yeah, same here
17:46 🔗 xmc -> #[]-bs
20:43 🔗 SketchCow The great manual keywording is done.
20:43 🔗 SketchCow I'll run one again when things puff up again
21:07 🔗 Liumen nyaa.se will be shutting down soon can you save it plz.
21:13 🔗 xmc impressive
21:13 🔗 xmc 4 seconds
21:14 🔗 deathy and no source..
21:14 🔗 xmc no explanation either
21:14 🔗 DFJustin there's some messages on the chatbox
21:15 🔗 DFJustin doesn't really need an explanation, it's an illegal torrent site
21:15 🔗 aaaaaaaaa I think it is a torrent site based on wikipedia.
21:15 🔗 Arkiver2 Liumen: we put archivebot on it
21:15 🔗 xmc liumen isn't here any more
21:15 🔗 Arkiver2 ah
21:15 🔗 Arkiver2 heh
21:15 🔗 Jonimus its an anime centric torrent site yeah.
21:15 🔗 xmc typical nub behavior
21:27 🔗 godane i did do a grab of the first 10000 torrent pages
21:28 🔗 DFJustin high five o/\o
21:28 🔗 godane http://archive.org/details/www.nyaa.se-id-1-to-10000-20140428
21:28 🔗 DFJustin hmm wait I don't see the actual warc on there, just cdx and log
21:29 🔗 DFJustin oh you're just adding it now
21:33 🔗 godane yup
22:02 🔗 godane its uploaded now
22:02 🔗 godane just know its the raw html from what i can remember
