[01:27] the new jstor liberator is really nice- and so is the website
[01:34] what's the jstor liberator
[01:44] it's a project that alard cooked up for downloading articles from JSTOR and liberating them
[01:44] don't have the url off-hand
[01:50] [Sep 23 11 04:43] If you've lost the bookmarklet: http://severe-samurai-6114.heroku.com/
[01:56] so what does it do exactly? pick random documents from the old collection they've opened up?
[01:58] not random, exactly
[01:58] to the user it looks random
[02:01] the server is handing out documents that still need to be downloaded
[02:05] looks like you need text, alard. shall I compose some for you?
[02:20] OK, so.
[02:20] I just did an experiment, I had a file misnamed in the friendster collection.
[02:20] FRIENDSTER-000000000 has a 40gb file I was calling 000-999
[02:20] No.
[02:20] It's 000000-340000
[02:21] So there's all those early files.
[02:23] I would claim that file is the most critical of all of them.
[02:23] makes sense
[02:30] New laptop came in.
[02:31] Soon it'll be digitizing VHS tapes.
[02:33] alard: you should either flag or reject any PDFs that are 2kb in size (they can never be valid)- that's what you get if you just click liberate without having recently viewed a PDF
[02:54] SketchCow: do you know if the archive.org deriver handles .tar.gz of images properly, or only .tar?
[02:54] the documentation only says .tar, but my images are uncompressed so .gz is a good deal better
[02:55] or would it be better to look into compressing my originals
[03:17] Don't compress.
[03:17] Try _images.tar.gz
[03:17] or images_tar
[03:17] I mean _images.tar
[03:59] okay, i suppose i will try that
[03:59] i just want to avoid a huge put that doesn't derive
[04:10] why not compress, exactly? I would only use lossless tiff compression that's supported universally, i.e. DEFLATE
[05:04] * OGshoop slaps shoop around a bit with a large trout
[05:24] SketchCow: hmm. _neither_ _images.tar nor _images.tar.gz derived properly.
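SketchCow's advice was to skip compression and upload the scans as an `_images.tar`. A minimal sketch of building one with Python's standard `tarfile` module follows. The `<item_id>_images.tar` file name, the helper name `build_images_tar`, and ordering pages by sorted file name are assumptions for illustration. Each image is stored at the archive root with no enclosing directory, which is the layout the uploader reports using. The sketch only reproduces that layout; it does not explain why those derives failed.

```python
import os
import tarfile


def build_images_tar(image_paths, item_id, out_dir="."):
    """Pack scans into an uncompressed <item_id>_images.tar (hypothetical helper).

    Files are added under their bare file names, so every image sits at the
    root of the archive rather than inside a directory.
    """
    tar_path = os.path.join(out_dir, f"{item_id}_images.tar")
    # Mode "w" writes a plain, uncompressed tar; "w:gz" would gzip it instead.
    with tarfile.open(tar_path, "w") as tar:
        # Assumption: sorted file names give the correct page order.
        for path in sorted(image_paths):
            tar.add(path, arcname=os.path.basename(path))
    return tar_path
```

Opening the result with `tarfile.open(path, "r:")` is a quick check that it really is uncompressed, since that mode refuses compressed archives.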
[05:24] _images.tar: http://www.us.archive.org/log_show.php?task_id=84691742
[05:24] _images.tar.gz: http://www.us.archive.org/log_show.php?task_id=84690874
[05:25] I didn't put the images in a directory; they're in the root of the _images.tar just like they are in the root of a _images.zip
[05:47] Car
[05:47] Gar
[06:03] I'm going to look into creating a utility to DEFLATE-compress these tiff files in place before uploading, if you guys can't take _images.tar.gz
[06:04] well, might get better compression that way in any case
[06:06] why not just _images.zip
[06:07] because _images.zip has a 2G hard limit, and I have some scans that wind up being 5G
[06:08] :o
[06:08] compression only gets rid of 40% usually
[07:05] hello
[07:12] think I read somewhere youtube has around 500 million videos
[07:13] think it may have been their annual report to investors
[07:15] that amount of data is just wow
[07:17] anyways just checking in since the google video project :) be back later
[07:33] tux0: howdy :)
[07:33] and yes, 500 million videos is rather a large number
[07:33] we should start right away
[08:54] chronomex: Thanks for the offer, but SketchCow is already thinking about text for the JSTOR toy.
[08:56] dashclouds: The uploaded PDFs do indeed need checking. But did you mean that the bookmarklet uploads non-PDFs? That would be strange, since it's supposed to always download the PDF first.
[08:57] Sorry, dashcloud without the s (if he's still here).
[09:02] okay
[09:47] DERP
[09:48] 20:16:40 <@SketchCow> Don't compress.
[09:48] why not?
[09:49] I apologize for that terse, unhelpful statement.
[09:49] don't apologize, just explain
[09:51] Just had to set some uploads.
[09:52] In general, it's better not to compress, and archive.org's derives will do all the right things and make new versions of everything.
[09:52] But that's not 100% guaranteed.
[09:52] why's it better, exactly?
[09:52] Like, I upload .avi films whenever possible, and it then makes compressed .MPG, .OGG, .MP4 versions, etc.
[09:52] that's lossy compression
[09:52] It adds a layer of complexity to the deriver.
[09:53] An alternative, of course, is to derive them yourself and upload them; it'll deal.
[09:53] Right, lossy derivatives from your uncompressed original.
[09:53] As it always keeps the original, all the options are there.
[09:53] I'm taking the scanner's raw uncompressed tiffs and turning them into DEFLATE lossless-compressed tifs
[09:53] is there something you think is wrong with that?
[09:54] No.
[09:54] okay, good
[09:54] because there isn't
[09:54] Ha ha
[09:54] OH REALLY
[09:54] * SketchCow grabs bottle, smashes neck
[09:54] I may be old but I can cut you
[09:55] if you're worried about bitrot in the future, a DEFLATEd .tif stored in a .zip file is no better off than an uncompressed .tif DEFLATEd into a .zip file
[09:55] but the former is a lot easier for me
[09:55] I'm not even a little worried about bitrot.
[09:55] you may be old but your cat has a billion people read him on the internet
[09:55] Mostly, it's that nearly every other place on the internet takes your fatty file, makes lossy derivatives, then shows that and won't let you at the original.
[09:56] fuck that shit
[09:56] At IA, it's the opposite: you can ALWAYS get to the original, and then it's touch and go what lossys get out.
[09:56] So I like to encourage that.
[09:56] But original can be whatever original you want, to taste.
[09:56] But I just try to wean/ward people off wrecking the file if they don't have to, since IA has the space and the will.
[09:56] I'm compressing the originals because it makes it much faster to move around
[09:57] * chronomex nods
[09:57] ofc I checked that all the scanner metadata makes it through the compress process
[09:57] dpi, model, etc
[09:58] Good deal.
[09:58] I'm no dummy.
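The uploader mentions checking that scanner metadata (dpi, model, etc.) survives the DEFLATE recompression. One way to spot-check that with only the standard library is a small reader for the first image directory (IFD) of a classic TIFF, comparing a few tags before and after. This is a sketch under stated assumptions: the helper names `read_first_ifd` and `changed_tags`, the tag list, and the restriction to classic TIFF (no BigTIFF, first IFD only, a subset of field types) are illustrative choices, not what was actually used. It does not perform the compression itself; that needs a TIFF library such as libtiff.

```python
import struct

# TIFF 6.0 field types this sketch decodes: type id -> (struct code, bytes per value)
_FIELD_TYPES = {1: ("B", 1), 2: ("s", 1), 3: ("H", 2), 4: ("I", 4), 5: ("II", 8)}

# Tags to compare (assumed list). Compression (259) is left out on purpose,
# because it is expected to change when the file is recompressed.
CHECK_TAGS = {271: "Make", 272: "Model", 282: "XResolution",
              283: "YResolution", 296: "ResolutionUnit"}


def read_first_ifd(data: bytes) -> dict:
    """Return {tag: value} for the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF byte-order mark")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        entry = data[ifd + 2 + 12 * i: ifd + 14 + 12 * i]
        tag, ftype, n = struct.unpack(order + "HHI", entry[:8])
        if ftype not in _FIELD_TYPES:
            continue  # SRATIONAL, DOUBLE, etc. are skipped in this sketch
        code, size = _FIELD_TYPES[ftype]
        if size * n <= 4:
            raw = entry[8:8 + size * n]  # small values are stored inline
        else:
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + size * n]
        if ftype == 2:
            tags[tag] = raw.rstrip(b"\x00").decode("latin-1")  # ASCII string
        else:
            tags[tag] = struct.unpack(order + code * n, raw)
    return tags


def changed_tags(before: bytes, after: bytes) -> list:
    """Names of checked tags present in `before` whose value differs in `after`."""
    old, new = read_first_ifd(before), read_first_ifd(after)
    return [name for tag, name in CHECK_TAGS.items()
            if tag in old and old[tag] != new.get(tag)]
```

Run it on the bytes of the raw scan and of the recompressed file; an empty list means the checked tags match.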
[10:02] I've decided I'm sick of people who, seeing the thousands of magazines I've uploaded, go "but he totally forgot XXXXX"
[10:02] Where XXXX is their random magazine they remember, vaguely.
[10:02] Never mind there's now more issues up than they could possibly read.
[10:02] Or that I might have more.
[10:02] Gift Horses!
[10:03] yeahhhhhfuckoff
[10:18] Poland Magazine
[10:29] One guy who said this, I DID have the magazine he mentioned up... it just wasn't on the week-old list someone had posted in the forum.
[12:19] SketchCow: Are you there? If so, I have a question for you about the server-side things of JSTOR.
[16:23] SketchCow: it seems some people didn't learn this lesson as a kid: http://www.youtube.com/watch?v=wm-E7zkyCCA
[16:51] such a wasted chance to use "look a gift pony in the mouth"
[18:17] alard, yes
[18:29] I have to step out to do some presentations/opening at a hacker space. E-mail me, I'll respond.
[23:31] join #foreveralone
[23:31] Oops.
[23:37] "...I'll be down at Cafe du Chapeau chowing down." This is the best way to say you'd eat your hat ever.
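Earlier in the log, dashcloud suggested flagging or rejecting the ~2 KB PDFs produced by clicking liberate without a recently viewed PDF, and alard agreed the uploaded PDFs need checking. A minimal server-side screen might look like the sketch below. The function name `pdf_problems`, the 4 KB cutoff, and the header/trailer checks are assumptions for illustration, not how the JSTOR liberator actually validates uploads.

```python
# Assumed cutoff: comfortably above the ~2 KB failure case reported in the log.
# A legitimately tiny PDF would also trip it, so flag for review, don't delete.
MIN_PDF_BYTES = 4 * 1024


def pdf_problems(data: bytes, min_size: int = MIN_PDF_BYTES) -> list:
    """Return reasons to flag an uploaded file; an empty list means it passed."""
    problems = []
    # Well-formed PDFs start with a "%PDF-" header; this tolerates a little
    # leading junk by searching the first 1 KB instead of requiring byte 0.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header")
    if len(data) < min_size:
        problems.append(f"only {len(data)} bytes")
    # A complete file should end with an %%EOF marker near its end.
    if b"%%EOF" not in data[-1024:]:
        problems.append("no %%EOF trailer")
    return problems
```

Returning a list of reasons rather than a bare boolean lets the server log why a document was flagged and hand it back out for another download attempt.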