[00:00] aside: http://sprunge.us/caKB
[00:40] Please enjoy "Sounds of Slashdot": https://archive.org/details/SoundsOfSlashdot
[00:58] SketchCow: FYI; the CDX documentation on archive.org is missing the "S" parameter, which appears to indicate content-length
[00:58] awesome. look like I now have a streaming CDX parser
[00:58] looks *
[01:00] How's the fundraising.
[01:01] SketchCow: campaign has ended, $10k-$11k acquired
[01:01] out of the $17k target
[01:01] so I'll be good for a few months
[01:01] Good start.
[01:02] Webcam dancing.
[01:02] indeed :)
[01:02] lol
[01:02] That's all I'm saying.
[01:02] Ladyboy shows
[01:02] anyway, SketchCow, any input on CDX questions above?
[01:02] I'm like Elon Musk
[01:02] the documentation is... sparse, to say the least
[01:02] I wish I could tell you I knew the first thing.
[01:02] And you saying "the documentation is sparse" is like me saying "you probably know dutch"
[01:03] The whole place needs a doc writeup
[01:03] haha
[01:04] not going to disagree
[01:04] anyway, streaming CDX parser works, correctly parses an IA-generated cdx
[01:04] from a wget-generated WARC
[01:04] so minimum bar of viability reached
[01:05] now on to WARC itself...
[01:05] no, actually, need to implement writing a CDX *generator* first
[01:05] er
[01:05] s/writing//
[01:05] but, brb
[01:28] * joepie91 has returned
[01:28] it's funny how $cat apparently understands the concept of changing cat litter
[01:29] comes asking me to change it, then sits quietly like a meter away from me, watching what I do, waiting until there's clean litter
[03:18] mmm... I *think* I can create an interface in my WARC library where you can just append 'request' and 'response' objects straight from the Node HTTP client, and it'll turn them into WARC records
[03:18] and automatically save and cut off at a certain size
[03:25] joepie91: if you have time, maybe you can compare my implementation with yours: https://github.com/chfoo/warcat .
i suggest only looking at warc output and not code to avoid copying bugs
[03:26] chfoo: might be a good idea
[03:26] chfoo: you don't happen to have anything handy that speaks CDX?
[03:28] joepie91: https://github.com/chfoo/wpull/blob/64dda8a156fd4a4c5877fbd8a99290fb9d3b9284/wpull/warc.py#L186 and https://github.com/chfoo/wpull/blob/2a595381d38e67eb01a61c49857706ad1f26c72a/wpull/recorder.py#L421
[03:29] chfoo: that's handy as a reference on WARC -> CDX, but I mostly just need something that can verify the CDX files generated, including actually trying to do something with the fields
[03:29] or does it do that also?
[03:30] (as an aside; do the field markers have to be separated by the delimiter as well? I thought the delimiter only applied to record lines, and that field markers were always space-delimited)
[03:30] joepie91: oh, no sorry. i just dump it out.
[03:31] i;ve read someone the first character is the delimiter
[03:31] bonus question: how does CDX deal with field values that contain spaces (or rather, delimiters)
[03:31] yes
[03:31] but I thought that only applies to record lines
[03:31] i might have a bug then.
[03:32] chfoo: maybe you do, maybe you don't
[03:32] realistically the CDX docs are more or less non-existent
[03:32] lol
[03:32] hm
[03:32] doesn't heritrix write CDX?
[03:32] perhaps look at the source for that
[03:32] alternatively, there's some CDX-Writer thing that might have pointers
[03:32] i deal with delimiters by following whatwg url spec by percent encoding spaces or less than u+0020
[03:32] is that valid for all fields? not just URL?
[03:33] oh. i think i have another bug then
[03:33] lol
[03:33] I guess I'm going to have to read Heritrix source...
[03:36] joepie91: i suggest reading cdx-writer source code later because it also had bugs not recognizing wget headers
[03:37] heh
[03:37] itc: every single archiving-related tool has breaking bugs
[03:37] :P
[04:34] chfoo: ping?
[04:34] I did *not* see your notaol work
[04:34] I was going to start out doing the same thing :P
[04:35] chfoo: you have the fdo documentation I hope? if not: https://files.app.net/wjpq0CBVU.zip
[04:36] balrog: i got stuck on the protocol deserialization.
[04:36] aah
[04:36] might want to rejoin #aohell
[04:36] if you need help though, I'm very interested in making this work
[11:33] SketchCow: your getting a EPIC huge 80s commerical mix video
[11:33] its over 4 hours
[14:20] anyone here with ubuntu?
[14:20] I'd like to verify a little thing
[14:29] shoot
[15:36] There's a vulnerability in apt, update recommended: https://www.debian.org/security/2014/dsa-3025
[15:39] do i update it via apt?
[15:39] :D
[15:40] checking the signatures by hand might be a good idea. :)
[15:41] I'll get out my slide rule.
[15:41] haha
[16:19] nevermind on what I said
[16:51] Twitpic gets the archiveteam clients. http://i.imgur.com/MC81uSF.gif
[16:54] lol
[17:00] SketchCow: Do you have a statue at the IA?
[17:07] It is being made right now.
[17:07] I spoke to the sculptor yesterday. We were discussing wings.
[17:07] I'm going to purchase wings and put them on it, because clay wings will cause a balance/weight issue.
[17:07] hahaha
[17:08] Plus then the wings can be larger.
[17:08] nice
[17:08] I expect to have it waiting for me when I return in October.
[17:09] Very nice.