[02:39] I am very sad they destroyed these BYTEs to make these scans, but I am happy with the results
[02:40] So they're awful actions resulting in nice outcomes
[02:52] currently scraping devilskitchen blog, for anyone who cares, here is the scraper source: http://git.cryto.net/cgit/joepie91/tree/tools/scrapers/devilskitchen.py
[02:55] SketchCow: you happen to be in SF?
[03:00] SketchCow: are these the atariage forum scans?
[03:06] Half are
[03:06] Someone took it up after the guy disappeared and stopped
[03:12] okay, I'm encountering something really strange
[03:12] can anyone try to go to http://www.devilskitchen.me.uk/2009_01_01_archive.html and see if it loads?
[03:12] or whether they get an 'account error'?
[03:14] wtf
[03:14] 403s all over the place
[10:08] Account Load Error
[10:08] Your Google account has been disabled or suspended or deleted. lol
[11:36] it looks like 2009_01 to 2009_04 are all blocked
[11:40] i'm grabbing all the old T3 magazines i can get
[13:03] Greetings from Train
[13:14] nice
[13:58] Greetings all. Can't import the latest ova image into VirtualBox on a Debian AMD-FX6100 system. "Could not read OVF file 'archiveteam-warrior-v2-20120813.ovf' (VERR_TAR_END_OF_FILE)."
[13:58] I am attempting to download again just in case.
[13:59] considering the "END_OF_FILE" it does sound like a corrupt download
[14:07] I'm blasting dozens maybe hundreds of laptop service manuals in.
[14:08] Laptop all the things
[14:09] Past 80 added now.
[14:09] So that's something.
[14:16] Little things like manuals can have a huge effect.
[14:41] Second download, same error. Does someone have an MD5 that I can check against?
[14:46] http://archive.org/details/dell-service-manual-64ptnen wheeee
[14:47] dragondon: e9079fbbcf5e05b3493fee8c05cd6f77 (from http://ia601200.us.archive.org/3/items/archiveteam-warrior/archiveteam-warrior_files.xml)
[14:48] dragondon: Set your modem to 8,N,1
[14:48] dragondon: What sometimes helps is to rename the ova file to archiveteam-warrior-v2.ova (remove the date)
[14:50] I renamed the ova file before uploading, so the ovf file inside is still called archiveteam-warrior-v2.ovf. Some versions of VirtualBox don't like that.
[20:23] now deriving a 29700 pages book: http://www.us.archive.org/log_show.php?task_id=124346847
[20:23] (Britannica 1911)
[20:25] looks like Module AbbyyXML will take 109 h
[20:25] I'm curious to see the result...
[20:29] lol
[20:31] you think thats big, all the stories i've got, if i printed them, would take 225 REAMS of paper, 117215 pages
[20:34] noo I don't think it's big
[20:34] it's just the deriver being a silly bully
[20:34] "I'll show you, I can eat ALL OF IT at once!"
[20:35] hey, you know what would be a serious PITA to OCR, House of Leaves?
[20:35] typographic nightmare
[20:36] oh, don't worry, I have my own
[20:36] which is?
[20:36] http://archive.org/details/VocabolarioDellaLinguaItaliana2
[20:36] Zingarelli_images.zip 23-Dec-2011 04:03 21792613118
[20:37] ok, why does a 6.4gb pdf of mostly text even exist?
[20:37] looks like no OCR system is able to correctly recognize text if the entry exponent is twice as big in a font as the rest of the entry
[20:37] it's just a broken pdf
[20:37] and it's an image pdf
[20:37] we took scans at 400 dpi IIRC
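
Editor's note on the MD5 question from 14:41: one way to rule a corrupt download in or out is to hash the local file and compare it with the published value. Below is a minimal Python sketch of that check, not anything the channel actually ran. The function names `md5_of_file` and `matches_md5` and the 1 MiB chunk size are illustrative assumptions; the only value taken from the log is the hash quoted at 14:47.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so a multi-hundred-megabyte image
    does not have to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case
    and surrounding whitespace in the expected value."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Hash quoted in the log at 14:47.
PUBLISHED_MD5 = "e9079fbbcf5e05b3493fee8c05cd6f77"

# Example call; the path is a placeholder for wherever the .ova was saved:
# print(matches_md5("/path/to/downloaded.ova", PUBLISHED_MD5))
```

If the digests match, the download itself is intact and the import error more likely comes from something else, such as the file-name mismatch described at 14:48 and 14:50.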