#archiveteam-bs 2014-09-17,Wed

Time Nickname Message
00:00 🔗 joepie91 aside: http://sprunge.us/caKB
00:40 🔗 dashcloud Please enjoy "Sounds of Slashdot": https://archive.org/details/SoundsOfSlashdot
00:58 🔗 joepie91 SketchCow: FYI; the CDX documentation on archive.org is missing the "S" parameter, which appears to indicate content-length
00:58 🔗 joepie91 awesome. look like I now have a streaming CDX parser
00:58 🔗 joepie91 looks *
01:00 🔗 SketchCow How's the fundraising.
01:01 🔗 joepie91 SketchCow: campaign has ended, $10k-$11k acquired
01:01 🔗 joepie91 out of the $17k target
01:01 🔗 joepie91 so I'll be good for a few months
01:01 🔗 SketchCow Good start.
01:02 🔗 SketchCow Webcam dancing.
01:02 🔗 joepie91 indeed :)
01:02 🔗 joepie91 lol
01:02 🔗 SketchCow That's all I'm saying.
01:02 🔗 SketchCow Ladyboy shows
01:02 🔗 joepie91 anyway, SketchCow, any input on CDX questions above?
01:02 🔗 SketchCow I'm like Elon Musk
01:02 🔗 joepie91 the documentation is... sparse, to say the least
01:02 🔗 SketchCow I wish I could tell you I knew the first thing.
01:02 🔗 SketchCow And you saying "the documentation is sparse" is like me saying "you probably know dutch"
01:03 🔗 SketchCow The whole place needs a doc writeup
01:03 🔗 joepie91 haha
01:04 🔗 joepie91 not going to disagree
01:04 🔗 joepie91 anyway, streaming CDX parser works, correctly parses an IA-generated cdx
01:04 🔗 joepie91 from a wget-generated WARC
01:04 🔗 joepie91 so minimum bar of viability reached
01:05 🔗 joepie91 now on to WARC itself...
01:05 🔗 joepie91 no, actually, need to implement writing a CDX *generator* first
01:05 🔗 joepie91 er
01:05 🔗 joepie91 s/writing//
01:05 🔗 joepie91 but, brb
01:28 🔗 * joepie91 has returned
01:28 🔗 joepie91 it's funny how $cat apparently understands the concept of changing cat litter
01:29 🔗 joepie91 comes asking me to change it, then sits quietly like a meter away from me, watching what I do, waiting until there's clean litter
03:18 🔗 joepie91 mmm... I *think* I can create an interface in my WARC library where you can just append 'request' and 'response' objects straight from the Node HTTP client, and it'll turn them into WARC records
03:18 🔗 joepie91 and automatically save and cut off at a certain size
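The interface joepie91 describes (append request and response messages, get WARC records, start a new file once a size limit is reached) could look roughly like the sketch below. It is in Python rather than Node for brevity, and every name in it (`warc_record`, `RollingWarcWriter`, `max_bytes`, `open_segment`) is hypothetical and not taken from his library. It assumes WARC/1.0 records holding raw HTTP message bytes, and it keeps segments in memory instead of writing files.

```python
import io
import uuid
from datetime import datetime, timezone


def warc_record(record_type, target_uri, payload):
    """Serialize one WARC/1.0 record; payload is the raw HTTP message bytes."""
    if record_type not in ("request", "response"):
        raise ValueError("this sketch only handles request/response records")
    headers = [
        "WARC/1.0",
        "WARC-Type: " + record_type,
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI: " + target_uri,
        "Content-Type: application/http; msgtype=" + record_type,
        "Content-Length: %d" % len(payload),
    ]
    # Header block, blank line, content block, then two CRLFs end the record.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + payload + b"\r\n\r\n"


class RollingWarcWriter:
    """Append records; start a new segment when the current one would exceed max_bytes."""

    def __init__(self, max_bytes, open_segment=io.BytesIO):
        self.max_bytes = max_bytes
        self.open_segment = open_segment  # hypothetical factory; could open real files
        self.segments = [open_segment()]

    def append(self, record_type, target_uri, payload):
        record = warc_record(record_type, target_uri, payload)
        current = self.segments[-1]
        # Never split a record: roll over before writing, unless the segment is empty.
        if current.tell() > 0 and current.tell() + len(record) > self.max_bytes:
            current = self.open_segment()
            self.segments.append(current)
        current.write(record)


writer = RollingWarcWriter(max_bytes=300)
writer.append("request", "http://example.com/", b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
writer.append("response", "http://example.com/", b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
```

Rolling over only between records keeps each segment independently readable, at the cost of a segment occasionally exceeding the limit when a single record is larger than `max_bytes`.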
03:25 🔗 chfoo joepie91: if you have time, maybe you can compare my implementation with yours: https://github.com/chfoo/warcat . i suggest only looking at warc output and not code to avoid copying bugs
03:26 🔗 joepie91 chfoo: might be a good idea
03:26 🔗 joepie91 chfoo: you don't happen to have anything handy that speaks CDX?
03:28 🔗 chfoo joepie91: https://github.com/chfoo/wpull/blob/64dda8a156fd4a4c5877fbd8a99290fb9d3b9284/wpull/warc.py#L186 and https://github.com/chfoo/wpull/blob/2a595381d38e67eb01a61c49857706ad1f26c72a/wpull/recorder.py#L421
03:29 🔗 joepie91 chfoo: that's handy as a reference on WARC -> CDX, but I mostly just need something that can verify the CDX files generated, including actually trying to do something with the fields
03:29 🔗 joepie91 or does it do that also?
03:30 🔗 joepie91 (as an aside; do the field markers have to be separated by the delimiter as well? I thought the delimiter only applied to record lines, and that field markers were always space-delimited)
03:30 🔗 chfoo joepie91: oh, no sorry. i just dump it out.
03:31 🔗 chfoo i've read somewhere the first character is the delimiter
03:31 🔗 joepie91 bonus question: how does CDX deal with field values that contain spaces (or rather, delimiters)
03:31 🔗 joepie91 yes
03:31 🔗 joepie91 but I thought that only applies to record lines
03:31 🔗 chfoo i might have a bug then.
03:32 🔗 joepie91 chfoo: maybe you do, maybe you don't
03:32 🔗 joepie91 realistically the CDX docs are more or less non-existent
03:32 🔗 joepie91 lol
03:32 🔗 joepie91 hm
03:32 🔗 joepie91 doesn't heritrix write CDX?
03:32 🔗 joepie91 perhaps look at the source for that
03:32 🔗 joepie91 alternatively, there's some CDX-Writer thing that might have pointers
03:32 🔗 chfoo i deal with delimiters by following whatwg url spec by percent encoding spaces or less than u+0020
03:32 🔗 joepie91 is that valid for all fields? not just URL?
03:33 🔗 chfoo oh. i think i have another bug then
03:33 🔗 joepie91 lol
03:33 🔗 joepie91 I guess I'm going to have to read Heritrix source...
03:36 🔗 chfoo joepie91: i suggest reading cdx-writer source code later because it also had bugs not recognizing wget headers
03:37 🔗 joepie91 heh
03:37 🔗 joepie91 itc: every single archiving-related tool has breaking bugs
03:37 🔗 joepie91 :P
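The delimiter question discussed above can be made concrete with a minimal streaming reader. This is a sketch under stated assumptions, not a reference implementation: it takes the first character of the header line as the delimiter (as chfoo recalls), and it assumes the field markers in the header are separated by that same delimiter, which is exactly the point the conversation leaves unresolved. It does no unescaping of field values. The function name `parse_cdx` and the sample line are made up for illustration.

```python
def parse_cdx(lines):
    """Lazily yield one dict per record line, keyed by the header's field letters."""
    lines = iter(lines)
    header = next(lines).rstrip("\r\n")
    delimiter = header[0]  # assumption: first character of the file is the delimiter
    parts = header[1:].split(delimiter)
    if parts[0] != "CDX":
        raise ValueError("not a CDX header: %r" % header)
    fields = parts[1:]  # assumption: markers use the same delimiter as record lines
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        values = line.split(delimiter)
        if len(values) != len(fields):
            # A value containing the delimiter would land here; the format
            # discussion above does not settle how such values are escaped.
            raise ValueError("expected %d fields, got %d" % (len(fields), len(values)))
        yield dict(zip(fields, values))


sample = [
    " CDX N b a m s k r M S V g\n",
    "com,example)/ 20140917000000 http://example.com/ text/html 200 ABC - - 1234 0 example.warc.gz\n",
]
records = list(parse_cdx(sample))
```

Because `parse_cdx` is a generator, it can be fed an open file object and will hold only one line in memory at a time. Here `records[0]["S"]` is the field joepie91 reports as apparently being content-length.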
04:34 🔗 balrog chfoo: ping?
04:34 🔗 balrog I did *not* see your notaol work
04:34 🔗 balrog I was going to start out doing the same thing :P
04:35 🔗 balrog chfoo: you have the fdo documentation I hope? if not: https://files.app.net/wjpq0CBVU.zip
04:36 🔗 chfoo balrog: i got stuck on the protocol deserialization.
04:36 🔗 balrog aah
04:36 🔗 balrog might want to rejoin #aohell
04:36 🔗 balrog if you need help though, I'm very interested in making this work
11:33 🔗 godane SketchCow: you're getting an EPIC huge 80s commercial mix video
11:33 🔗 godane it's over 4 hours
14:20 🔗 balrog anyone here with ubuntu?
14:20 🔗 balrog I'd like to verify a little thing
14:29 🔗 midas shoot
15:36 🔗 garyrh There's a vulnerability in apt, update recommended: https://www.debian.org/security/2014/dsa-3025
15:39 🔗 Rotab do i update it via apt?
15:39 🔗 Rotab :D
15:40 🔗 garyrh checking the signatures by hand might be a good idea. :)
15:41 🔗 aaaaaaaaa I'll get out my slide rule.
15:41 🔗 Rotab haha
16:19 🔗 balrog nevermind on what I said
16:51 🔗 SketchCow Twitpic gets the archiveteam clients. http://i.imgur.com/MC81uSF.gif
16:54 🔗 xmc lol
17:00 🔗 aaaaaaaaa SketchCow: Do you have a statue at the IA?
17:07 🔗 SketchCow It is being made right now.
17:07 🔗 SketchCow I spoke to the sculptor yesterday. We were discussing wings.
17:07 🔗 SketchCow I'm going to purchase wings and put them on it, because clay wings will cause a balance/weight issue.
17:07 🔗 xmc hahaha
17:08 🔗 SketchCow Plus then the wings can be larger.
17:08 🔗 xmc nice
17:08 🔗 SketchCow I expect to have it waiting for me when I return in October.
17:09 🔗 vantec Very nice.
