Time |
Nickname |
Message |
00:00
🔗
|
joepie91 |
aside: http://sprunge.us/caKB |
00:40
🔗
|
dashcloud |
Please enjoy "Sounds of Slashdot": https://archive.org/details/SoundsOfSlashdot |
00:58
🔗
|
joepie91 |
SketchCow: FYI; the CDX documentation on archive.org is missing the "S" parameter, which appears to indicate content-length |
00:58
🔗
|
joepie91 |
awesome. look like I now have a streaming CDX parser |
00:58
🔗
|
joepie91 |
looks * |
01:00
🔗
|
SketchCow |
How's the fundraising? |
01:01
🔗
|
joepie91 |
SketchCow: campaign has ended, $10k-$11k acquired |
01:01
🔗
|
joepie91 |
out of the $17k target |
01:01
🔗
|
joepie91 |
so I'll be good for a few months |
01:01
🔗
|
SketchCow |
Good start. |
01:02
🔗
|
SketchCow |
Webcam dancing. |
01:02
🔗
|
joepie91 |
indeed :) |
01:02
🔗
|
joepie91 |
lol |
01:02
🔗
|
SketchCow |
That's all I'm saying. |
01:02
🔗
|
SketchCow |
Ladyboy shows |
01:02
🔗
|
joepie91 |
anyway, SketchCow, any input on CDX questions above? |
01:02
🔗
|
SketchCow |
I'm like Elon Musk |
01:02
🔗
|
joepie91 |
the documentation is... sparse, to say the least |
01:02
🔗
|
SketchCow |
I wish I could tell you I knew the first thing. |
01:02
🔗
|
SketchCow |
And you saying "the documentation is sparse" is like me saying "you probably know dutch" |
01:03
🔗
|
SketchCow |
The whole place needs a doc writeup |
01:03
🔗
|
joepie91 |
haha |
01:04
🔗
|
joepie91 |
not going to disagree |
01:04
🔗
|
joepie91 |
anyway, streaming CDX parser works, correctly parses an IA-generated cdx |
01:04
🔗
|
joepie91 |
from a wget-generated WARC |
01:04
🔗
|
joepie91 |
so minimum bar of viability reached |
01:05
🔗
|
joepie91 |
now on to WARC itself... |
01:05
🔗
|
joepie91 |
no, actually, need to implement writing a CDX *generator* first |
01:05
🔗
|
joepie91 |
er |
01:05
🔗
|
joepie91 |
s/writing// |
01:05
🔗
|
joepie91 |
but, brb |
01:28
🔗
|
* |
joepie91 has returned |
01:28
🔗
|
joepie91 |
it's funny how $cat apparently understands the concept of changing cat litter |
01:29
🔗
|
joepie91 |
comes asking me to change it, then sits quietly like a meter away from me, watching what I do, waiting until there's clean litter |
03:18
🔗
|
joepie91 |
mmm... I *think* I can create an interface in my WARC library where you can just append 'request' and 'response' objects straight from the Node HTTP client, and it'll turn them into WARC records |
03:18
🔗
|
joepie91 |
and automatically save and cut off at a certain size |
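The idea described above (append request/response records and start a new file once a size limit is reached) could be sketched roughly as follows. This is Python rather than Node, and it is not joepie91's library: `build_record`, `RotatingWarcWriter`, the `capture-00000.warc` naming scheme and the `max_bytes` default are all hypothetical names chosen for illustration. Raw HTTP bytes stand in for the Node client objects, and a real writer would normally also emit a `warcinfo` record and gzip each record.

```python
import os
import uuid
from datetime import datetime, timezone


def build_record(record_type: str, target_uri: str, payload: bytes) -> bytes:
    """Serialize one WARC/1.0 record whose block is a raw HTTP message."""
    headers = [
        "WARC/1.0",
        "WARC-Type: " + record_type,  # "request" or "response"
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI: " + target_uri,
        "Content-Type: application/http; msgtype=" + record_type,
        "Content-Length: %d" % len(payload),
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLF pairs after the block.
    return head + payload + b"\r\n\r\n"


class RotatingWarcWriter:
    """Append records to numbered files, rolling over at max_bytes (hypothetical API)."""

    def __init__(self, directory: str, prefix: str = "capture",
                 max_bytes: int = 1_000_000_000) -> None:
        self.directory = directory
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.index = 0      # number of the file currently being written
        self.written = 0    # bytes written to that file so far

    def _path(self) -> str:
        name = "%s-%05d.warc" % (self.prefix, self.index)
        return os.path.join(self.directory, name)

    def append(self, record_type: str, target_uri: str, payload: bytes) -> str:
        """Write one record and return the path of the file it landed in."""
        record = build_record(record_type, target_uri, payload)
        # Cut over to a new file only if the current one already holds data,
        # so a single oversized record still gets written somewhere.
        if self.written and self.written + len(record) > self.max_bytes:
            self.index += 1
            self.written = 0
        with open(self._path(), "ab") as fh:
            fh.write(record)
        self.written += len(record)
        return self._path()
```

Cutting over between records rather than mid-record keeps every output file independently readable, at the cost of files that can slightly exceed the limit when one record alone is larger than `max_bytes`.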
03:25
🔗
|
chfoo |
joepie91: if you have time, maybe you can compare my implementation with yours: https://github.com/chfoo/warcat . i suggest only looking at warc output and not code to avoid copying bugs |
03:26
🔗
|
joepie91 |
chfoo: might be a good idea |
03:26
🔗
|
joepie91 |
chfoo: you don't happen to have anything handy that speaks CDX? |
03:28
🔗
|
chfoo |
joepie91: https://github.com/chfoo/wpull/blob/64dda8a156fd4a4c5877fbd8a99290fb9d3b9284/wpull/warc.py#L186 and https://github.com/chfoo/wpull/blob/2a595381d38e67eb01a61c49857706ad1f26c72a/wpull/recorder.py#L421 |
03:29
🔗
|
joepie91 |
chfoo: that's handy as a reference on WARC -> CDX, but I mostly just need something that can verify the CDX files generated, including actually trying to do something with the fields |
03:29
🔗
|
joepie91 |
or does it do that also? |
03:30
🔗
|
joepie91 |
(as an aside; do the field markers have to be separated by the delimiter as well? I thought the delimiter only applied to record lines, and that field markers were always space-delimited) |
03:30
🔗
|
chfoo |
joepie91: oh, no sorry. i just dump it out. |
03:31
🔗
|
chfoo |
i've read somewhere the first character is the delimiter |
03:31
🔗
|
joepie91 |
bonus question: how does CDX deal with field values that contain spaces (or rather, delimiters) |
03:31
🔗
|
joepie91 |
yes |
03:31
🔗
|
joepie91 |
but I thought that only applies to record lines |
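For reference, a minimal streaming CDX reader could look like the sketch below. It treats the first character of the header line as the delimiter, as discussed above. Because the chat leaves open whether the field markers are separated by that delimiter or always by spaces, the sketch accepts both; that fallback is an assumption and does not come from any specification. The names `parse_cdx_header` and `iter_cdx` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_cdx_header(line: str) -> Tuple[str, List[str]]:
    """Return (delimiter, field_letters) from a CDX header line."""
    line = line.rstrip("\r\n")
    if not line:
        raise ValueError("empty CDX header")
    delimiter = line[0]  # the first character names the delimiter
    rest = line[1:]
    parts = [p for p in rest.split(delimiter) if p]
    if len(parts) == 1:
        # Assumption: the field markers may be space-separated even when
        # the record delimiter is some other character.
        parts = rest.split()
    if not parts or parts[0] != "CDX":
        raise ValueError("not a CDX header: %r" % line)
    return delimiter, parts[1:]


def iter_cdx(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keyed by field letter, reading lazily."""
    it = iter(lines)
    delimiter, fields = parse_cdx_header(next(it))
    for raw in it:
        raw = raw.rstrip("\r\n")
        if not raw:
            continue  # skip blank lines
        values = raw.split(delimiter)
        if len(values) != len(fields):
            raise ValueError("expected %d fields, got %d: %r"
                             % (len(fields), len(values), raw))
        yield dict(zip(fields, values))
```

Since `iter_cdx` takes any iterable of lines, an open file object can be passed directly and the index is never loaded into memory as a whole. Strict field counting also surfaces the "bonus question" from the chat: a value containing an unescaped delimiter shows up as a field-count error rather than as silently shifted columns.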
03:31
🔗
|
chfoo |
i might have a bug then. |
03:32
🔗
|
joepie91 |
chfoo: maybe you do, maybe you don't |
03:32
🔗
|
joepie91 |
realistically the CDX docs are more or less non-existent |
03:32
🔗
|
joepie91 |
lol |
03:32
🔗
|
joepie91 |
hm |
03:32
🔗
|
joepie91 |
doesn't heritrix write CDX? |
03:32
🔗
|
joepie91 |
perhaps look at the source for that |
03:32
🔗
|
joepie91 |
alternatively, there's some CDX-Writer thing that might have pointers |
03:32
🔗
|
chfoo |
i deal with delimiters by following whatwg url spec by percent encoding spaces or less than u+0020 |
03:32
🔗
|
joepie91 |
is that valid for all fields? not just URL? |
03:33
🔗
|
chfoo |
oh. i think i have another bug then |
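The escaping approach chfoo describes (percent-encode spaces and anything below U+0020) might be sketched like this. Also encoding the delimiter itself when it is not a space is an added assumption, and, as the exchange above points out, whether this kind of escaping is valid for non-URL fields is unresolved. `escape_cdx_value` is a hypothetical helper name.

```python
def escape_cdx_value(value: str, delimiter: str = " ") -> str:
    """Percent-encode the delimiter, spaces, and control characters in a field value."""
    out = []
    for ch in value:
        if ch == delimiter or ord(ch) <= 0x20:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

Existing `%` characters are left untouched so that already-escaped URLs are not double-encoded; the trade-off is that the transformation is not strictly reversible for values that contain a literal `%20`.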
03:33
🔗
|
joepie91 |
lol |
03:33
🔗
|
joepie91 |
I guess I'm going to have to read Heritrix source... |
03:36
🔗
|
chfoo |
joepie91: i suggest reading cdx-writer source code later because it also had bugs not recognizing wget headers |
03:37
🔗
|
joepie91 |
heh |
03:37
🔗
|
joepie91 |
itc: every single archiving-related tool has breaking bugs |
03:37
🔗
|
joepie91 |
:P |
04:34
🔗
|
balrog |
chfoo: ping? |
04:34
🔗
|
balrog |
I did *not* see your notaol work |
04:34
🔗
|
balrog |
I was going to start out doing the same thing :P |
04:35
🔗
|
balrog |
chfoo: you have the fdo documentation I hope? if not: https://files.app.net/wjpq0CBVU.zip |
04:36
🔗
|
chfoo |
balrog: i got stuck on the protocol deserialization. |
04:36
🔗
|
balrog |
aah |
04:36
🔗
|
balrog |
might want to rejoin #aohell |
04:36
🔗
|
balrog |
if you need help though, I'm very interested in making this work |
11:33
🔗
|
godane |
SketchCow: you're getting an EPIC huge 80s commercial mix video |
11:33
🔗
|
godane |
it's over 4 hours |
14:20
🔗
|
balrog |
anyone here with ubuntu? |
14:20
🔗
|
balrog |
I'd like to verify a little thing |
14:29
🔗
|
midas |
shoot |
15:36
🔗
|
garyrh |
There's a vulnerability in apt, update recommended: https://www.debian.org/security/2014/dsa-3025 |
15:39
🔗
|
Rotab |
do i update it via apt? |
15:39
🔗
|
Rotab |
:D |
15:40
🔗
|
garyrh |
checking the signatures by hand might be a good idea. :) |
15:41
🔗
|
aaaaaaaaa |
I'll get out my slide rule. |
15:41
🔗
|
Rotab |
haha |
16:19
🔗
|
balrog |
nevermind on what I said |
16:51
🔗
|
SketchCow |
Twitpic gets the archiveteam clients. http://i.imgur.com/MC81uSF.gif |
16:54
🔗
|
xmc |
lol |
17:00
🔗
|
aaaaaaaaa |
SketchCow: Do you have a statue at the IA? |
17:07
🔗
|
SketchCow |
It is being made right now. |
17:07
🔗
|
SketchCow |
I spoke to the sculptor yesterday. We were discussing wings. |
17:07
🔗
|
SketchCow |
I'm going to purchase wings and put them on it, because clay wings will cause a balance/weight issue. |
17:07
🔗
|
xmc |
hahaha |
17:08
🔗
|
SketchCow |
Plus then the wings can be larger. |
17:08
🔗
|
xmc |
nice |
17:08
🔗
|
SketchCow |
I expect to have it waiting for me when I return in October. |
17:09
🔗
|
vantec |
Very nice. |