[01:20] well darn.
[01:20] the high-speed drive in the duplicator is not compatible with the kryoflux (or a standard PC floppy controller, for that matter)
[01:21] it would be awesome to get the KF to support it, though
[01:21] or some other tool
[01:22] it runs at 600 RPM (or 720 RPM with a jumper change), but the cool part is that it can read or write both sides of the floppy simultaneously
[01:22] and I found docs today on the pinout differences
[01:23] also, I read something interesting WRT drive speed. And it makes sense once you think about it. The faster the media rotates past the heads, the stronger the pulses from the head (meaning you can pick up weaker signals a normal drive may have trouble with)
[01:25] also, checked alignment on four drives, and adjusted three of them. (the fourth was rather inconsistent. it gets the drive to the tracks, within 600 microinches, but the offset from the track changed each time I tested. Other drives are much more solid)
[01:27] btw, unless you want to spend a LOT of time, don't even think about adjusting the offsets of the heads relative to each other. (the only ordinary adjustment usually needed is radial, made by rotating the motor a small amount)
[01:27] ... on 3.5" drives. 5.25" drives are quite different
[01:30] good to know
[01:32] I spent 8 hours trying to get one drive's heads back in place after I decided I didn't like how much of an offset there was between heads
[01:32] (I had software and a special alignment floppy to check the radial alignment, azimuth, and index timing)
[01:33] I need to find a 5.25" alignment disk
[01:35] and perhaps write some software to use the KF to do the alignment tests. Then, if I can find another drive to mutilate (I only have one 5.25") I can potentially adjust the head alignment to remove that 8-sector offset for flippies
[01:46] I pruned out some of the more inactive groups in the irc log to clean things up a little.
[02:54] server read-only -- tasks waiting for hard drive fix :(
[03:35] I'm accepting feedback: http://badcheese.com/~steve/archiveteamfire.jpg
[03:36] that shit is hot, yo
[03:52] swebb: pretty. is this inspired by the volunteer firefighter comment?
[04:54] Nah. Just messing around. :)
[04:55] le word!
[05:03] The bird is the word.
[05:06] Flame-y: http://www.youtube.com/watch?v=lBI9v3Lzyss
[05:27] swebb: can you run two instances in discover mode?
[05:28] Sure. How do I do that?
[05:28] I think that I saw the instructions somewhere.
[05:29] Just add 'discover' after the script name? Does that work with the IPv6 one?
[05:29] pass "discover" instead of "download" as the first parameter
[05:29] IPv6 that is.
[05:29] I'll give it a shot.
[05:29] hm, don't know..
[05:30] thanks
[05:33] I fired one up. Machine load is higher, but it's not outputting anything.
[05:36] Fired up a second one.
[05:41] Can you tell if I'm sending any discover work in?
[05:52] The IPv6 script wasn't doing anything in discover mode, so I downloaded and started the IPv4 one in discover mode. It's outputting stuff now.
[06:06] Smokier: http://www.youtube.com/watch?v=QgjN6kmgaOI
[06:08] ack
[06:22] No wind: http://www.youtube.com/watch?v=htBRSXKCB0s
[08:54] howdy all
[08:57] hey db48xOthe
[09:00] no2pencil: what's new?
[09:11] wiki is quiet
[09:11] wikiwikiwiki
[09:12] * chronomex still at defcon, still drunk
[09:12] some spam, but soultcer has been on top of it
[09:12] chronomex: cool. learn anything?
[09:15] ummmmmm
[09:15] probably?
[09:15] heh
[09:15] lulz
[09:16] it's fun, defcon's a party more than anything
[09:18] depositfiles.com/files/rtx2j0qz4
[09:36] db48x0the: Hey. I think it's time to ask the wget mailing list about adding the WARC extension. Except for the metadata records, which I'm not sure about, I think it is more or less finished.
[09:37] Any tips for writing to a GNU mailing list?
[09:42] nope
[09:42] the last email I wrote about a change to a gnu program (patch) was ignored
[09:44] Ah. I've browsed through the bug-wget archive, and it seems reasonably active. The chance of getting at least a reply is pretty high, I think.
[09:44] I'll give it a try.
[09:44] yea, should have better luck
[09:45] I'd like wget to automatically add the records described in section 2.4.4 of the WARC Guidelines document
[09:46] it can just add the command line that was used to invoke wget as the crawler configuration
[09:48] then as an archivist I would add another record that includes a copy of the script I used to run wget, which references the metadata records that it created
[09:48] Yes. Well, the command line arguments are already included in the warcinfo headers, but it might be useful to add these extra 2.4.4 records as well.
[09:48] oh, interesting
[09:49] (And you can add your own headers to that by providing --warc-header options, so you could add your name, organization etc.)
[09:49] So how would that work? Would you provide wget with the filename of the script?
[09:50] nah, I'd just build a record and append it to the file
[09:52] Okay. (It doesn't feel right to me to put that kind of functionality in wget. It doesn't really belong there, I think.)
[09:52] gzip crawl.sh > crawl.gz; echo '...' > headers; cat headers crawl.gz > metadata-record, etc
[09:52] alard: yea, I agree
[09:52] So which of the records in section 2.4.4 could wget add?
[09:52] The list of warcinfo-ids should be no problem.
[09:52] all three of them
[09:53] the log is a bit tricky, since it might not have been kept, or it might have only been appended to an existing file
[09:53] Yeah.
[09:53] A temporary file? (another)
[09:53] yea
[09:53] --warc-log
[09:54] The -nv log level? Or more detailed?
[09:55] (With -nv, it might be possible to keep it in memory, for crawls that aren't too large.)
[09:57] I think it should respect whatever logging options the user has set
[09:57] That's better, yes.
[09:58] so just whatever would have gone to stdout or to the file specified by -o or -a
[09:58] should make it easier to implement
[09:58] oh, and I haven't been able to compile it
[09:58] Oh?
[09:59] it can't find a header file
[09:59] Is that the git version, or one of the tar.gz?
[09:59] tmp-file.h or something
[09:59] the git version
[10:00] Maybe you should run ./bootstrap.sh again?
[10:00] hmm
[10:00] that's worth a try
[10:00] had forgotten about it
[10:01] It's possible that I have added an extra gnulib requirement.
[10:02] I added tmpdir on July 06, 2011.
[10:03] alas, nobody has commented on my wget patch
[10:03] https://savannah.gnu.org/bugs/index.php?33654
[10:08] That's a pity. Seems like a sensible change. (But then, it's only a month or so ago, so who knows.)
[10:08] so what are you going to put in your email?
[10:09] oh, and there was another idea I had
[10:09] it'd be cool if I could feed wget a WARC file that it had previously created, and have it create the files on disk that go with it
[10:10] Don't know yet. Introduce WARC, say that it is very useful to have for archivists, point to the warctools library and the github repository, ask whether they think this is something to add to wget?
[10:10] In any case, I'll have a look at the metadata records first.
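A rough illustration (not part of the chat) of the manual approach db48x sketches at 09:50–09:52: build a metadata record by hand and append it to an existing WARC. This Python sketch assumes the record is a WARC 1.0 "resource" record holding the crawl script; the file names and Content-Type are placeholders, and it relies on a .warc.gz simply being a series of concatenated gzip members. A real record would presumably also carry a WARC-Warcinfo-ID header tying it to the warcinfo record of the crawl.

```python
# Sketch only: append a 'resource' record containing the crawl script to an
# existing compressed WARC. File names and the Content-Type are assumptions.
import gzip
import uuid
from datetime import datetime, timezone

def append_resource_record(warc_path, payload_path, content_type="text/plain"):
    with open(payload_path, "rb") as f:
        payload = f.read()

    headers = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}\r\n"
        f"WARC-Target-URI: file:///{payload_path}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")

    # Each record goes into its own gzip member; gzip members concatenate,
    # so appending leaves the existing records readable.
    with open(warc_path, "ab") as out:
        out.write(gzip.compress(headers + payload + b"\r\n\r\n"))

append_resource_record("mywarc.warc.gz", "crawl.sh", "application/x-shellscript")
```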
[10:10] WARC extraction would be cool, but is probably not something that wget should do?
[10:11] http://groups.google.com/group/warc-tools/browse_thread/thread/e65be965b86e0939
[10:13] well, I would hate to write another program that mimics wget's processing of the files, to make the links work and all
[10:19] That's true. I forgot about the making-the-links-work bit.
[10:19] It would be an 'offline wget'.
[10:27] yea, of a sort
[12:52] alard: maybe we should just store the modified versions of the files in the WARC file, in addition to the originals?
[12:56] with a header that distinguishes the originals from the modified versions
[12:56] Wouldn't that lead to a lot of unnecessary duplication?
[12:56] It doesn't contain any new information.
[12:57] kinda true
[12:57] unless they change the way the files are processed to make the local mirror, in which case you wouldn't be able to recover it later
[12:57] There is the 'conversion' record type for this, by the way.
[12:57] right
[12:58] "A 'conversion' record shall contain an alternative version of another record's content that was created as the result of an archival process. Typically, this is used to hold content transformations that maintain viability of content after widely available rendering tools for the originally stored format disappear."
[12:59] indeed
[12:59] should wget disappear one day, it'll be hard to browse the files in the warc because the links don't work
[12:59] but that is mainly about file formats, I think.
[12:59] agreed
[12:59] Is it? Everything is there. It's not different from a warc generated by Heritrix, for instance.
[13:00] everything is there except that all of the links are broken
[13:00] So you can use one of the available wayback tools to serve the pages (and even rewrite the urls).
[13:00] It should be a postprocessing step, in my opinion.
[13:00] ok
[13:01] was just a crazy idea
[13:01] :)
[13:01] I think that what wget does is nothing more than a hack, which is necessary to make it work, but is far from ideal.
[13:03] A tool that generates a local mirror from a WARC file could be useful, though.
[13:03] hmm. you haven't posted to the list yet?
[13:03] No, I'm currently working on the log files and metadata stuff.
[13:03] ah
[13:03] Nearly done.
[13:03] sweet
[13:03] Should it be the default, or optional?
[13:05] I suggested --warc-log, but I don't really see why you wouldn't want it in the warc file if you're doing -o or -a
[13:06] True.
[13:06] I have it enabled by default now, with --no-warc-keep-log if you don't want it.
[13:06] works for me
[13:07] Also, would it be useful to store the metadata in a separate WARC file?
[13:07] .meta.warc.gz ?
[13:07] (For the multi-warc case; for the single-warc case it's probably better to keep it in the same file.)
[13:10] that's an idea
[13:10] although I'm not sure if I really would want to go that way
[13:10] the metadata could get separated from the data
[13:11] Hmm, yeah.
[13:11] But in the multi-warc case, you already have multiple files.
[13:11] mywarc.00000.warc.gz / mywarc.00001.warc.gz etc.
[13:12] and then you would have mywarc.meta.warc.gz
[13:12] (instead of a log file in mywarc.00001.warc.gz)
[13:16] the metadata should be in every file, of course
[13:16] disk space is cheap, and anyway it's all compressed
[13:19] That's not what the guidelines document suggests.
[13:20] There is a warcinfo record with some metadata in each file, of course, but the log file etc. are different.
[13:20] guidelines can be wrong :)
[13:20] section 2.4.3: "It is recommended that all resource records containing processing information files are stored in a specific WARC file (that may be called a 'metadata WARC file')."
[13:20] anyway, bbl
[13:20] Okay!
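The "offline wget" idea from 10:09–13:03 (feed a previously written WARC back in and recreate the files on disk) could start out roughly like the sketch below. It uses the third-party warcio library purely as an illustration (the thread above mentions warctools instead), and it skips the hard part the chat identifies: rewriting links so the local mirror is actually browsable.

```python
# Sketch only: write each response record of a WARC back to a local path
# derived from its URL. Library choice (warcio) and output layout are
# illustrative; link rewriting is deliberately not attempted here.
import os
from urllib.parse import urlparse
from warcio.archiveiterator import ArchiveIterator

def extract(warc_path, out_dir="mirror"):
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            parsed = urlparse(url)
            path = parsed.path or "/"
            if path.endswith("/"):
                path += "index.html"  # same trick wget uses for directory URLs
            local = os.path.join(out_dir, parsed.netloc, path.lstrip("/"))
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with open(local, "wb") as f:
                # payload only, HTTP headers stripped
                f.write(record.content_stream().read())

extract("mywarc.00000.warc.gz")
```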
[13:21] oh, and wget-warc is failing to build still
[13:21] but for a different reason
[13:21] no Makefile in trunk/libwarc/base32
[13:23] automake?
[13:23] Perhaps start again with a clean checkout, or run make clean or make distclean (never know which does what)
[13:25] automake doesn't fix it
[13:26] Strange.
[13:26] (Here I must admit that I'm not an expert in these tools, I just run a few of them and then it eventually works.)
[13:27] heh, same here
[13:27] There is a Makefile.am in trunk/libwarc that includes things from trunk/libwarc/base32, so I guess it generates it from there.
[13:29] sounds like it shouldn't need to recurse into base32 then
[13:31] I do have a Makefile in trunk/libwarc/base32
[13:38] I think the Makefile comes from the base32 source. I removed it, and now it won't build.
[13:40] It must have escaped via .gitignore. I've committed it now; git update and it should (hopefully) work.
[15:03] SketchCow: thanks for helping wikiteam to get a corner at IA
[15:04] can you "open" another request, to mirror Jamendo? I developed a script that can be run on an IA server to slurp the whole albums collection (~2TB)
[15:04] I would be glad to provide it
[16:54] I am at the CCC
[19:24] OKAY HELLO
[19:24] HERE I AM
[19:24] emijrp: Send me an e-mail about it, so I can bring it up with the right people.
[19:24] If need be, I can, of course, create non-wayback versions that we host.
[19:50] Bunch of people said they were going to join Archive Team after my speech.
[19:50] We'll see how that goes.
[19:50] alard, you're back!
[19:50] jch: How's CCC?
[19:55] sent
[20:07] aha
[20:08] wget compiles now
[20:14] ok, what I need right now is a program to help me dissect binary files where the format is only partially known
[20:14] I want to be able to assign field names to ranges of bytes
[20:15] and to search the files for segments that look like valid instances of known formats
[20:51] The Google Groups tracker is kinda-sorta dead again.
[21:24] I'm downloading Jamendo.
[21:25] Off it goes!
[21:25] 12 downloaded, 50,000 to go
[21:43] SketchCow: Hi!
[21:43] (I don't take my computer with me when I go on vacation. :)
[21:45] db48xOthe: good.
[21:46] I'll see about the wget mailing list tomorrow.
[21:52] OK, great.
[22:08] SketchCow: Nice Twaud.io collection. I'm not sure if you've already found them, but there may be more files in my upload directory. (gv_14 on blindtiger)
[22:11] I think that's what I took.
[22:11] In fact, I'm sure of it.
[22:13] Okay, if you're sure. It's still there, so I was just wondering. :) (The text file only lists things by underscor, http://www.archive.org/details/twaudio-2009-2011 )
[22:18] So actually, I can't find where you've put mine.
[22:21] SketchCow: any thoughts on mirroring/archiving/buying the Disk Sleeve Archive? (http://www.cyberden.com/dsa)
[22:21] obscure and geeky, yes, but good history IMO
[22:26] I can look into it.
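Back at 20:14–20:15, db48x asks for a binary-file dissection helper; nothing in the log suggests it was ever written. A toy sketch of the first half of the idea (naming byte ranges in a partially understood format) might look like this; the field table is invented for illustration, and the second half of the request (searching for segments that look like valid instances of known formats) is not attempted.

```python
# Sketch only: label known byte ranges of a partially understood binary file.
# The example field layout (magic/version/track_count/data_offset) is made up.
import struct
from collections import namedtuple

Field = namedtuple("Field", "name offset fmt")  # fmt is a struct format string

FIELDS = [
    Field("magic",       0, "4s"),
    Field("version",     4, "<H"),
    Field("track_count", 6, "<H"),
    Field("data_offset", 8, "<I"),
]

def dissect(path, fields=FIELDS):
    with open(path, "rb") as f:
        data = f.read()
    for fld in fields:
        size = struct.calcsize(fld.fmt)
        raw = data[fld.offset:fld.offset + size]
        (value,) = struct.unpack(fld.fmt, raw)
        print(f"{fld.name:12} @ {fld.offset:4}: {value!r}  (bytes {raw.hex()})")

dissect("unknown.img")
```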
[22:26] http://www.archive.org/details/philosophicaltransactions
[22:27] Why look, 18,500 Royal Society Philosophical Transactions from 1923 and older.
[22:27] I wonder how that got there.
[22:27] alard: I didn't just check, so I didn't see.
[22:27] I will happily pair them up, give me a moment.
[22:49] jch: you in Berlin yet?
[22:49] jch: I'm here!