Time |
Nickname |
Message |
01:27
|
dashcloud |
the new jstor liberator is really nice- and so is the website |
01:34
|
Ymgve |
what's the jstor liberator |
01:44
|
db48x2 |
it's a project that alard cooked up for downloading articles from JSTOR and liberating them |
01:44
|
db48x2 |
don't have the url off-hand |
01:50
|
Coderjoe |
[Sep 23 11 04:43] <alard> If you've lost the bookmarklet: http://severe-samurai-6114.heroku.com/ |
01:56
|
Ymgve |
so what does it do exactly? pick random documents from the old collection they've opened up? |
01:58
|
db48x2 |
not random, exactly |
01:58
|
db48x2 |
to the user it looks random |
02:01
|
db48x2 |
the server is handing out documents that still need to be downloaded |
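The server-side idea db48x2 describes, a tracker that hands each volunteer the next document nobody has fetched yet, can be sketched as a simple work queue. This is a hypothetical illustration, not alard's actual server code: the class name `WorkQueue` and its methods are invented for the example, and a real tracker would also need persistent state and timeouts for stale claims.

```python
from collections import deque


class WorkQueue:
    """Hypothetical tracker that hands out items still needing download."""

    def __init__(self, item_ids):
        self._pending = deque(item_ids)  # not yet handed to anyone
        self._claimed = set()            # handed out, not yet uploaded
        self._done = set()               # uploaded successfully

    def next_item(self):
        """Return the next item nobody is working on, or None if none remain."""
        while self._pending:
            item = self._pending.popleft()
            if item not in self._done:   # skip anything finished meanwhile
                self._claimed.add(item)
                return item
        return None

    def mark_done(self, item):
        """Record a successful upload."""
        self._claimed.discard(item)
        self._done.add(item)

    def release(self, item):
        """Return an abandoned claim to the back of the queue."""
        if item in self._claimed:
            self._claimed.discard(item)
            self._pending.append(item)
```

Because many volunteers pull from the same queue at once, each one receives a scattered subset of IDs, which is why the order looks random from an individual browser.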
02:05
|
chronomex |
looks like you need text, alard. shall I compose some for you? |
02:20
|
SketchCow |
OK, so. |
02:20
|
SketchCow |
I just did an experiment, I had a file misnamed in the friendster collection. |
02:20
|
SketchCow |
FRIENDSTER-000000000 has a 40gb file I was calling 000-999 |
02:20
|
SketchCow |
No. |
02:20
|
SketchCow |
It's 000000-340000 |
02:21
|
SketchCow |
So there's all those early files. |
02:23
|
SketchCow |
I would claim that file is the most critical of all of them. |
02:23
|
chronomex |
makes sense |
02:30
|
SketchCow |
New laptop came in. |
02:31
|
SketchCow |
Soon it'll be digitizing VHS tapes. |
02:33
|
dashcloud |
alard: you should either flag or reject any PDFs that are 2kb in size (they can never be valid)- that's what you get if you just click liberate without having recently viewed a PDF |
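dashcloud's suggestion amounts to a server-side sanity check on each upload. A minimal sketch, assuming the server has the raw bytes of the uploaded file: the function name `looks_like_valid_pdf` and the 2048-byte cutoff are assumptions drawn from the "2kb" remark, and the `%PDF-` header and `%%EOF` trailer tests are common heuristics, not a full PDF validation.

```python
def looks_like_valid_pdf(data: bytes, max_bogus_size: int = 2048) -> bool:
    """Heuristic check on an uploaded file before accepting it as a PDF.

    max_bogus_size is an assumed cutoff based on the ~2 KB files produced when
    "liberate" is clicked without a recently viewed PDF.
    """
    # Reject the tiny placeholder responses outright.
    if len(data) <= max_bogus_size:
        return False
    # A well-formed PDF begins with a "%PDF-" header line.
    if not data.startswith(b"%PDF-"):
        return False
    # A complete file normally has an "%%EOF" marker near its end;
    # a missing marker suggests a truncated download.
    return b"%%EOF" in data[-1024:]
```

Rejected uploads could either be dropped or merely flagged for review, matching the "flag or reject" choice in the message.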
02:54
|
chronomex |
SketchCow: do you know if the archive.org deriver handles .tar.gz of images properly, or only .tar? |
02:54
|
chronomex |
the documentation only says .tar, but my images are uncompressed so .gz is a good deal better |
02:55
|
chronomex |
or would it be better to look into compressing my originals |
03:17
|
SketchCow |
DOn't compress. |
03:17
|
SketchCow |
Try _images.tar.gz |
03:17
|
SketchCow |
or images_tar |
03:17
|
SketchCow |
I mean _images.tar |
03:59
|
chronomex |
okay, i suppose i will try that |
03:59
|
chronomex |
i just want to avoid a huge put that doesn't derive |
04:10
|
chronomex |
why not compress, exactly? I would only use lossless tiff compression that's supported universally, i.e. DEFLATE |
05:04
|
* |
OGshoop slaps shoop around a bit with a large trout |
05:24
|
chronomex |
SketchCow: hmm. _neither_ _images.tar nor _images.tar.gz derived properly. |
05:24
|
chronomex |
_images.tar: http://www.us.archive.org/log_show.php?task_id=84691742 |
05:24
|
chronomex |
_images.tar.gz: http://www.us.archive.org/log_show.php?task_id=84690874 |
05:25
|
chronomex |
I didn't put the images in a directory; they're in the root of the _images.tar just like they are in the root of a _images.zip |
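For reference, a flat, uncompressed tar with every image at the archive root (the same layout chronomex describes for `_images.zip`) can be built with Python's standard `tarfile` module. This only illustrates the archive layout; it does not explain why the derives linked above failed, and the function name `make_flat_images_tar` is hypothetical.

```python
import os
import tarfile


def make_flat_images_tar(image_paths, tar_path):
    """Pack images into the root of an uncompressed tar (no directory prefix)."""
    with tarfile.open(tar_path, "w") as tar:  # "w" = plain, uncompressed tar
        for path in sorted(image_paths):
            # arcname drops the directory so members sit at the archive root
            tar.add(path, arcname=os.path.basename(path))
    return tar_path
```

Sorting the paths keeps page order predictable inside the archive, assuming the scan files are named so that lexical order matches page order.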
05:47
|
SketchCow |
Car |
05:47
|
SketchCow |
Gar |
06:03
|
chronomex |
I'm going to look into creating a utility to DEFLATE-compress these tiff files in place before uploading, if you guys can't take _images.tar.gz |
06:04
|
chronomex |
well, might get better compression that way in any case |
06:06
|
DFJustin |
why not just _images.zip |
06:07
|
chronomex |
because _images.zip has a 2G hard limit, and I have some scans that wind up being 5G |
06:08
|
DFJustin |
:o |
06:08
|
chronomex |
compression only gets rid of 40% usually |
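Taking the figures in this exchange at face value (a 2 GB hard limit for `_images.zip`, scans of up to 5 GB, and roughly 40% saved by compression), compression alone would not bring the largest scans under the limit:

```latex
5\,\mathrm{GB} \times (1 - 0.40) = 3\,\mathrm{GB} > 2\,\mathrm{GB},
\qquad
\text{largest scan that fits} \approx \frac{2\,\mathrm{GB}}{1 - 0.40} \approx 3.3\,\mathrm{GB}
```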
07:05
|
tux0 |
hello |
07:12
|
tux0 |
think I read somewhere youtube has around 500 million videos |
07:13
|
tux0 |
think it may have been their annual report to investors |
07:15
|
tux0 |
that amount of data is just wow |
07:17
|
tux0 |
anyways just checking in since the google video project :) be back later |
07:33
|
db48x |
tux0: howdy :) |
07:33
|
db48x |
and yes, 500 million videos is rather a large number |
07:33
|
db48x |
we should start right away |
08:54
|
alard |
chronomex: Thanks for the offer, but SketchCow is already thinking about text for the JSTOR toy. |
08:56
|
alard |
dashclouds: The uploaded PDFs do indeed need checking. But did you mean that the bookmarklet uploads non-PDFs? That would be strange, since it's supposed to always download the PDF first. |
08:57
|
alard |
Sorry, dashcloud without the s (if he's still here). |
09:02
|
chronomex |
okay |
09:47
|
SketchCow |
DERP |
09:48
|
chronomex |
20:16:40 <@SketchCow> DOn't compress. |
09:48
|
chronomex |
why not? |
09:49
|
SketchCow |
I apologize for that terse, unhelpful statement. |
09:49
|
chronomex |
don't apologize, just explain |
09:51
|
SketchCow |
Just had to set some uploads. |
09:52
|
SketchCow |
In general, it's better not to compress, and archive.org's derives will do all the right thing, and make new versions of everything. |
09:52
|
SketchCow |
But that's not 100% guaranteed. |
09:52
|
chronomex |
why's it better, exactly? |
09:52
|
SketchCow |
Like, I upload .avi films whenever possible, and it then makes compressed .MPG, .OGG, .MP4 versions, etc. |
09:52
|
chronomex |
that's lossy compression |
09:52
|
SketchCow |
It adds a layer of complexity to the deriver. |
09:53
|
SketchCow |
An alternate version, of course, is to derive them yourself and upload them, it'll deal. |
09:53
|
SketchCow |
Right, lossy derivatives from your uncompressed original. |
09:53
|
SketchCow |
As it always keeps the original, then all the options are there. |
09:53
|
chronomex |
I'm taking the scanner's raw uncompressed tiffs and turning them into DEFLATE lossless-compressed tifs |
09:53
|
chronomex |
is there something you think is wrong with that? |
09:54
|
SketchCow |
No. |
09:54
|
chronomex |
okay, good |
09:54
|
chronomex |
because there isn't |
09:54
|
SketchCow |
Ha ha |
09:54
|
SketchCow |
OH REALLY |
09:54
|
* |
SketchCow grabs bottle, smashes neck |
09:54
|
SketchCow |
I may be old but I can cut you |
09:55
|
chronomex |
if you're worried about bitrot in the future, a DEFLATEd .tif stored in a .zip file is no better off than an uncompressed .tif DEFLATEd into a .zip file |
09:55
|
chronomex |
but the former is a lot easier for me |
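chronomex's argument rests on DEFLATE being lossless: decompressing always returns the exact original bytes, so deflating the TIFFs first or deflating them inside the archive preserves the same data. A minimal check of that property with Python's standard `zlib` module (which implements DEFLATE), using arbitrary bytes as a stand-in for image data; it does not read or write TIFF files, and `deflate_roundtrip` is a hypothetical helper name.

```python
import hashlib
import zlib


def deflate_roundtrip(raw: bytes, level: int = 9):
    """Compress with DEFLATE (via zlib), decompress, and confirm nothing changed."""
    packed = zlib.compress(raw, level)
    restored = zlib.decompress(packed)
    # Compare digests, as you might when verifying files after copying them around.
    same = hashlib.sha256(restored).digest() == hashlib.sha256(raw).digest()
    return packed, same
```

The compressed size varies with the input; only the round trip is guaranteed to be exact.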
09:55
|
SketchCow |
I'm not even a little worried about bitrot. |
09:55
|
chronomex |
you may be old but your cat has a billion people read him on the internet |
09:55
|
SketchCow |
Mostly, it's that nearly every other place on the internet takes your fatty file, makes lossy derivatives, then shows that and won't let you at the original. |
09:56
|
chronomex |
fuck that shit |
09:56
|
SketchCow |
At IA, it's opposite, you can ALWAYS get to the original, and then it's touch and go what lossys get out. |
09:56
|
SketchCow |
So I like to encourage that. |
09:56
|
SketchCow |
But original can be whatever original you want, to taste. |
09:56
|
SketchCow |
But I just try to wean/ward people off wrecking the file if they don't have to, since IA has the space and the will. |
09:56
|
chronomex |
I'm compressing the originals because it makes it much faster to move around |
09:57
|
* |
chronomex nods |
09:57
|
chronomex |
ofc I checked that all the scanner metadata makes it through the compress process |
09:57
|
chronomex |
dpi, model, etc |
09:58
|
SketchCow |
Good deal. |
09:58
|
chronomex |
I'm no dummy. |
10:02
|
SketchCow |
I've decided I'm sick of people who, seeing the thousands of magazines I've uploaded, go "but he totally forgot XXXXX" |
10:02
|
SketchCow |
Where XXXX is their random magazine they remember, vaguely. |
10:02
|
SketchCow |
Never mind there's now more issues up than they could possibly read. |
10:02
|
SketchCow |
Or that I might have more. |
10:02
|
SketchCow |
Gift Horses! |
10:03
|
chronomex |
yeahhhhhfuckoff |
10:18
|
RedType |
Poland Magazine |
10:29
|
SketchCow |
One guy who said this, I DID have the magazine he mentioned up... it just wasn't on the week-old list someone had posted in the forum. |
12:19
|
alard |
SketchCow: Are you there? If so, I have a question for you about the server-side things of JSTOR. |
16:23
|
Coderjoe |
SketchCow: it seems some people didn't learn this lesson as a kid: http://www.youtube.com/watch?v=wm-E7zkyCCA |
16:51
|
Ymgve |
such a wasted chance to use "look a gift pony in the mouth" |
18:17
|
SketchCow |
ALard, yes |
18:29
|
SketchCow |
I have to step out to do some presentations/opening at a hacker space. E-mail me, I'll respond. |
23:31
|
Paradoks |
join #foreveralone |
23:31
|
Paradoks |
Oops. |
23:37
|
Wyatt |
"...I'll be down at Cafe du Chapeau chowing down." This is the best way to say you'd eat your hat ever. |