#archiveteam 2012-09-22,Sat

Time Nickname Message
02:39 🔗 SketchCow I am very sad they destroyed these BYTEs to make these scans, but I am happy with the results
02:40 🔗 SketchCow So they're awful actions resulting in nice outcomes
02:52 🔗 joepie91 currently scraping devilskitchen blog, for anyone who cares, here is the scraper source: http://git.cryto.net/cgit/joepie91/tree/tools/scrapers/devilskitchen.py
02:55 🔗 kennethre SketchCow: you happen to be in SF?
03:00 🔗 bsmith094 SketchCow: are these the atariage forum scans?
03:06 🔗 SketchCow Half are
03:06 🔗 SketchCow Someone took it up after the guy disappeared and stopped
03:12 🔗 joepie91 okay, I'm encountering something really strange
03:12 🔗 joepie91 can anyone try to go to http://www.devilskitchen.me.uk/2009_01_01_archive.html and see if it loads?
03:12 🔗 joepie91 or whether they get an 'account error'?
03:14 🔗 joepie91 wtf
03:14 🔗 joepie91 403s all over the place
10:08 🔗 SmileyG Account Load Error
10:08 🔗 SmileyG Your Google account has been disabled or suspended or deleted. lol
11:36 🔗 godane it looks like 2009_01 to 2009_04 are all blocked
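
[Editor's sketch] The check discussed above (which monthly archive pages answer with 403) could be probed roughly as follows. This is a sketch, not joepie91's scraper. It assumes every monthly archive follows the same `YYYY_MM_01_archive.html` pattern as the one URL quoted in the log. The helper names `monthly_archive_urls` and `fetch_status` are made up for illustration.

```python
import urllib.error
import urllib.request

# Host taken from the URL joepie91 posted in the log.
BASE = "http://www.devilskitchen.me.uk"


def monthly_archive_urls(year, first_month, last_month):
    """Build monthly archive URLs for one year (inclusive month range).

    Assumption: all months use the YYYY_MM_01_archive.html pattern seen
    in the single URL quoted in the log.
    """
    return [
        f"{BASE}/{year}_{month:02d}_01_archive.html"
        for month in range(first_month, last_month + 1)
    ]


def fetch_status(url, timeout=30):
    """Return the HTTP status code for url, including error codes like 403."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the code is what we want to report.
        return error.code


if __name__ == "__main__":
    # Range reported as blocked by godane: 2009_01 to 2009_04.
    for url in monthly_archive_urls(2009, 1, 4):
        print(fetch_status(url), url)
```

Running it as a script prints one status code per month. Network failures other than HTTP errors (DNS, timeouts) are left to raise so they are not mistaken for a block.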
11:40 🔗 godane i'm grabbing all the old T3 magazines i can get
13:03 🔗 SketchCow Greetings from Train
13:14 🔗 underscor nice
13:58 🔗 dragondon Greetings all. Can't import the latest ova image into VirtualBox on a Debian AMD-FX6100 system. "Could not read OVF file 'archiveteam-warrior-v2-20120813.ovf' (VERR_TAR_END_OF_FILE)."
13:58 🔗 dragondon I am attempting to download again just in case.
13:59 🔗 Frigolit considering the "END_OF_FILE" it does sound like a corrupt download
14:07 🔗 SketchCow I'm blasting dozens maybe hundreds of laptop service manuals in.
14:08 🔗 SketchCow Laptop all the things
14:09 🔗 SketchCow Past 80 added now.
14:09 🔗 SketchCow So that's something.
14:16 🔗 SketchCow Little things like manuals can have a huge effect.
14:41 🔗 dragondon Second download, same error. Does someone have an MD5 that I can check against?
14:46 🔗 SketchCow http://archive.org/details/dell-service-manual-64ptnen wheeee
14:47 🔗 alard dragondon: e9079fbbcf5e05b3493fee8c05cd6f77 (from http://ia601200.us.archive.org/3/items/archiveteam-warrior/archiveteam-warrior_files.xml)
14:48 🔗 SketchCow dragondon: Set your modem to 8,N,1
14:48 🔗 alard dragondon: What sometimes helps is to rename the ova file to archiveteam-warrior-v2.ova (remove the date)
14:50 🔗 alard I renamed the ova file before uploading, so the ovf file inside is still called archiveteam-warrior-v2.ovf. Some versions of VirtualBox don't like that.
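
[Editor's sketch] One way to compare a downloaded ova against the MD5 alard quoted above. The file name `archiveteam-warrior-v2-20120813.ova` is an assumption inferred from the ovf name in dragondon's error message. The function names are illustrative only.

```python
import hashlib

# MD5 quoted by alard in the log (taken from archiveteam-warrior_files.xml).
EXPECTED_MD5 = "e9079fbbcf5e05b3493fee8c05cd6f77"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    ova_path = "archiveteam-warrior-v2-20120813.ova"
    print("OK" if matches_expected(ova_path) else "MD5 mismatch, download again")
```

If the digest matches, the download is intact and the import error is more likely the naming issue alard describes than a corrupt file.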
20:23 🔗 Nemo_bis now deriving a 29700 pages book: http://www.us.archive.org/log_show.php?task_id=124346847
20:23 🔗 Nemo_bis (Britannica 1911)
20:25 🔗 Nemo_bis looks like Module AbbyyXML will take 109 h
20:25 🔗 Nemo_bis I'm curious to see the result...
20:29 🔗 DFJustin lol
20:31 🔗 bsmith094 you think thats big, all the stories i've got, if i printed them, would take 225 REAMS of paper, 117215 pages
20:34 🔗 Nemo_bis noo I don't think it's big
20:34 🔗 Nemo_bis it's just the deriver being a silly bully
20:34 🔗 Nemo_bis "I'll show you, I can eat ALL OF IT at once!"
20:35 🔗 bsmith094 hey, you know what would be a serious PITA to OCR, House of Leaves?
20:35 🔗 bsmith094 typographic nightmare
20:36 🔗 Nemo_bis oh, don't worry, I have my own
20:36 🔗 bsmith094 which is?
20:36 🔗 Nemo_bis http://archive.org/details/VocabolarioDellaLinguaItaliana2
20:36 🔗 Nemo_bis Zingarelli_images.zip 23-Dec-2011 04:03 21792613118
20:37 🔗 bsmith094 ok, why does a 6.4gb pdf of mostly text even exist?
20:37 🔗 Nemo_bis looks like no OCR system is able to correctly recognize text if the entry exponent is twice as big in a font as the rest of the entry
20:37 🔗 Nemo_bis it's just a broken pdf
20:37 🔗 Nemo_bis and it's an image pdf
20:37 🔗 Nemo_bis we took scans at 400 dpi IIRC
