#archiveteam 2011-08-08,Mon

Time Nickname Message
01:20 Coderjoe well darn.
01:20 Coderjoe the high speed drive in the duplicator is not compatible with the kryoflux (or a standard PC floppy controller, for that matter)
01:21 Coderjoe it would be awesome to get the KF to support it, though
01:21 Coderjoe or some other tool
01:22 Coderjoe it runs at 600RPM (or 720RPM with a jumper change), but the cool part is that it can read or write both sides of the floppy simultaneously
01:22 Coderjoe and I found docs today on the pinout differences
01:23 Coderjoe also, I read something interesting WRT drive speed. And it makes sense once you think about it. The faster the media rotates past the heads, the stronger the pulses from the head (meaning you can pick up weaker signals a normal drive may have trouble with)
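
A rough sketch of the arithmetic behind the drive-speed remark above. The log only gives 600 and 720 RPM; the 300 RPM baseline and the 500 kbit/s data rate of a standard 3.5" HD drive are assumptions added here, and the linear amplitude scaling is a first-order model of an inductive read head (output voltage proportional to the rate of flux change), not a measurement of this drive.

```python
# Assumed baseline for a standard 3.5" HD drive (not stated in the log).
BASE_RPM = 300
BASE_BIT_RATE = 500_000  # bits per second at BASE_RPM


def revolution_time_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def scaled_bit_rate(rpm: float, base_rate: float = BASE_BIT_RATE,
                    base_rpm: float = BASE_RPM) -> float:
    """Bit rate seen by the head when the same disk spins at `rpm`."""
    return base_rate * rpm / base_rpm


def relative_read_amplitude(rpm: float, base_rpm: float = BASE_RPM) -> float:
    """First-order estimate: read pulse amplitude scales with media speed."""
    return rpm / base_rpm


if __name__ == "__main__":
    for rpm in (300, 600, 720):
        print(rpm, "RPM:",
              revolution_time_ms(rpm), "ms/rev,",
              scaled_bit_rate(rpm), "bit/s,",
              relative_read_amplitude(rpm), "x amplitude")
```

Under these assumptions a disk written at the normal rate comes off the head twice as fast at 600 RPM, which is one plausible reason a controller built for the standard rate cannot simply be pointed at such a drive.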
01:25 Coderjoe also, checked alignment on four drives, and adjusted three of them. (the fourth was rather inconsistent. it gets the drive to the tracks, within 600 microinches, but the offset from the track changed each time I tested. Other drives are much more solid)
01:27 Coderjoe btw, unless you want to spend a LOT of time, don't even think about adjusting the offsets of the heads relative to each other. (the only ordinary adjustment usually needed is radial, made by rotating the motor a small amount)
01:27 Coderjoe ... on 3.5" drives. 5.25" are quite different
01:30 chronomex good to know
01:32 Coderjoe I spent 8 hours trying to get one drive's heads back in place after I decided I didn't like how much of an offset there was between heads
01:32 Coderjoe (I had software and a special alignment floppy to check the radial alignment, azimuth, and index timing)
01:33 Coderjoe i need to find a 5.25" alignment disk
01:35 Coderjoe and perhaps write some software to use the KF to do the alignment tests. Then, if I can find another drive to mutilate (I only have one 5.25") I can potentially adjust the head alignment to remove that 8 sector offset for flippies
01:46 swebb I pruned-out some of the more inactive groups in the irc log to clean things up a little.
02:54 Silent700 server readonly -- tasks waiting for harddrive fix :(
03:35 swebb I'm accepting feedback: http://badcheese.com/~steve/archiveteamfire.jpg
03:36 no2pencil that shit it hot, yo
03:52 chronomex swebb: pretty. is this inspired by the volunteer firefighter comment?
04:54 swebb Nah. Just messing around. :)
04:55 jch le word!
05:03 Coderjoe L'oiseau est le mot.
05:06 swebb Flame-y: http://www.youtube.com/watch?v=lBI9v3Lzyss
05:27 ndurner1 swebb: can you run two instances in discover mode?
05:28 swebb Sure. How do I do that?
05:28 swebb I think that I saw the instructions somewhere.
05:29 swebb Just add 'discover' after the script name? Does that work with the IPv^ one?
05:29 ndurner1 pass "discover" instead of "download" as the first parameter
05:29 swebb IPv6 that is.
05:29 swebb I'll give it a shot.
05:29 ndurner1 hm, don't know..
05:30 ndurner1 thanks
05:33 swebb I fired one up. Machine load is higher, but it's not outputting anything.
05:36 swebb Fired up a second one.
05:41 swebb Can you tell if I'm sending any discover work in?
05:52 swebb The IPv6 script wasn't doing anything in discover mode, so I downloaded and started the IPv4 one in discover mode. It's outputting stuff now.
06:06 swebb Smokier: http://www.youtube.com/watch?v=QgjN6kmgaOI
06:08 ndurner1 ack
06:22 swebb No wind: http://www.youtube.com/watch?v=htBRSXKCB0s
08:54 db48xOthe howdy all
08:57 no2pencil hey db48xOthe
09:00 db48xOthe no2pencil: what's new?
09:11 db48xOthe wiki is quiet
09:11 chronomex wikiwikiwiki
09:12 * chronomex still at defcon, still drunk
09:12 db48xOthe some spam, but soultcer has been on top of it
09:12 db48xOthe chronomex: cool. learn anything?
09:15 chronomex ummmmmm
09:15 chronomex probably?
09:15 db48xOthe heh
09:15 chronomex lulz
09:16 chronomex its fun, defcon's a party more than anything
09:18 anna1987 depositfiles.com/files/rtx2j0qz4
09:36 alard db48x0the: Hey. I think it's time to ask the wget mailinglist about adding the WARC extension. Except for the metadata records, which I'm not sure about, I think it is more or less finished.
09:37 alard Any tips for writing to a GNU mailing list?
09:42 db48xOthe nope
09:42 db48xOthe the last email I wrote about a change to a gnu program (patch) was ignored
09:44 alard Ah. I've browsed through the bug-wget archive, and it seems reasonably active. The chance of getting at least a reply is pretty high, I think.
09:44 alard I'll give it a try.
09:44 db48xOthe yea, should have better luck
09:45 db48xOthe I'd like wget to automatically add the records described in section 2.4.4 of the WARC Guidelines document
09:46 db48xOthe it can just add the command line that was used to invoke wget as the crawler configuration
09:48 db48xOthe then as an archivist I would add another record that includes a copy of the script I used to run wget that references the metadata records that it created
09:48 alard Yes. Well, the command line arguments are already included in the warcinfo headers, but it might be useful to add these extra 2.4.4 records as well.
09:48 db48xOthe oh, interesting
09:49 alard (And you can add your own headers to that by providing --warc-header options, so you could add your name, organization etc.)
09:49 alard So how would that work? Would you provide wget with the filename of the script?
09:50 db48xOthe nah, I'd just build a record and append it to the file
09:52 alard Okay. (It doesn't feel right to me to put that kind of functionality in wget. It doesn't really belong there, I think.)
09:52 db48xOthe gzip -c crawl.sh > crawl.gz; echo '...' > headers; cat headers crawl.gz > metadata-record, etc
09:52 db48xOthe alard: yea, I agree
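
A minimal sketch of the "build a record and append it to the file" idea, assuming the WARC/1.0 record layout (version line, named header fields, blank line, payload, two CRLFs) and the usual .warc.gz convention of one gzip member per record. The function names and the choice of header fields beyond the mandatory ones are illustrative and are not taken from wget or from anything stated in the log.

```python
import gzip
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_warc_record(payload: bytes, content_type: str,
                      warc_type: str = "metadata",
                      warcinfo_id: Optional[str] = None) -> bytes:
    """Serialize one uncompressed WARC/1.0 record around `payload`."""
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date",
         datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    if warcinfo_id is not None:
        # Optional pointer back to the warcinfo record of the crawl.
        headers.append(("WARC-Warcinfo-ID", warcinfo_id))
    head = "WARC/1.0\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in headers)
    head += "\r\n"
    # A record block is followed by two CRLFs.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def append_record(warc_gz_path: str, record: bytes) -> None:
    """Append `record` to a .warc.gz file as its own gzip member."""
    with open(warc_gz_path, "ab") as out:
        out.write(gzip.compress(record))
```

A call such as `append_record("mywarc.warc.gz", build_warc_record(open("crawl.sh", "rb").read(), "application/x-sh"))` would add the crawl script to an existing archive; both file names here are placeholders. Compressing the whole record, rather than only the payload, keeps the file readable by tools that expect every gzip member to hold one complete record.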
09:52 alard So which of the records in section 2.4.4 could wget add?
09:52 alard The list of warcinfo-ids should be no problem.
09:52 db48xOthe all three of them
09:53 db48xOthe the log is a bit tricky, since it might not have been kept, or it might have only been appended to an existing file
09:53 alard Yeah.
09:53 alard A temporary file? (another)
09:53 db48xOthe yea
09:53 db48xOthe --warc-log
09:54 alard The -nv log level? Or more detailed?
09:55 alard (With -nv, it might be possible to keep it in memory, for not too large crawls.)
09:57 db48xOthe I think it should respect whatever logging options the user has set
09:57 alard That's better, yes.
09:58 db48xOthe so just whatever would have gone to stdout or the file specified by -o or -a
09:58 db48xOthe should make it easier to implement
09:58 db48xOthe oh, and I haven't been able to compile it
09:58 alard Oh?
09:59 db48xOthe it can't find a header file
09:59 alard Is that the git version, or one of the tar.gz?
09:59 db48xOthe tmp-file.h or something
09:59 db48xOthe the git version
10:00 alard Maybe you should run ./bootstrap.sh again?
10:00 db48xOthe hmm
10:00 db48xOthe that's worth a try
10:00 db48xOthe had forgotten about it
10:01 alard It's possible that I have added an extra gnulib requirement.
10:02 alard I added tmpdir on July 06, 2011.
10:03 db48xOthe alas, nobody has commented on my wget patch
10:03 db48xOthe https://savannah.gnu.org/bugs/index.php?33654
10:08 alard That's a pity. Seems like a sensible change. (But then, it's only a month or so ago, so who knows.)
10:08 db48xOthe so what are you going to put in your email?
10:09 db48xOthe oh, and there was another idea I had
10:09 db48xOthe it'd be cool if I could feed wget with a WARC file that it had previously created, and have it create the files on disk that go with it
10:10 alard Don't know yet. Introduce WARC, say that it is very useful to have for archivists, point to the warctools library and point to the github repository, ask whether they think this is something to add to wget?
10:10 alard In any case, I'll have a look at the metadata records first.
10:10 alard WARC extraction would be cool, but is probably not something that wget should do?
10:11 alard http://groups.google.com/group/warc-tools/browse_thread/thread/e65be965b86e0939
10:13 db48xOthe well, I would hate to write another program that mimics wget's processing of the files, to make the links work and all
10:19 alard That's true. I forgot about the making-the-links-work bit.
10:19 alard It would be an 'offline wget'.
10:27 db48xOthe yea, of a sort
12:52 db48xOthe alard: maybe we should just store the modified versions of the files in the WARC file, in addition to the originals?
12:56 db48xOthe with a header that distinguishes the originals from the modified versions
12:56 alard Wouldn't that lead to a lot of unnecessary duplication?
12:56 alard It doesn't contain any new information.
12:57 db48xOthe kinda true
12:57 db48xOthe unless they change the way the files are processed to make the local mirror, in which case you wouldn't be able to recover it later
12:57 alard There is the 'conversion' record type for this, by the way.
12:57 db48xOthe right
12:58 alard A 'conversion' record shall contain an alternative version of another record's content that was created as
12:58 alard the result of an archival process. Typically, this is used to hold content transformations that maintain viability of content after widely available rendering tools for the originally stored format disappear
12:59 db48xOthe indeed
12:59 db48xOthe should wget disappear one day, it'll be hard to browse the files in the warc because the links don't work
12:59 alard but that is mainly about file formats, I think.
12:59 db48xOthe agreed
12:59 alard Is it? Everything is there. It's not different from a warc generated by Heritrix, for instance.
13:00 db48xOthe everything is there except that all of the links are broken
13:00 alard So you can use one of the available wayback tools to serve the pages (and even rewrite the urls).
13:00 alard It should be a postprocessing step, in my opinion.
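
To illustrate what rewriting the urls as a postprocessing step could look like, here is a small sketch that resolves each href/src against the page's original URL and points it at a replay server. The regex approach, the function name, and the replay prefix format are all hypothetical simplifications; this is not how wget's link conversion or any particular wayback tool is implemented.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." with single or double quotes.
# A real tool would use an HTML parser; a regex keeps the sketch short.
ATTR_RE = re.compile(
    r"""(?<![\w-])(?P<attr>href|src)=(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)


def rewrite_links(html: str, page_url: str, replay_prefix: str) -> str:
    """Rewrite links in `html` so they resolve through `replay_prefix`."""
    def repl(match: "re.Match[str]") -> str:
        absolute = urljoin(page_url, match.group("url"))
        if not absolute.startswith(("http://", "https://")):
            # Leave mailto:, javascript:, data: and similar untouched.
            return match.group(0)
        quote = match.group("q")
        return f"{match.group('attr')}={quote}{replay_prefix}{absolute}{quote}"

    return ATTR_RE.sub(repl, html)
```

Because the stored records stay untouched and the rewriting happens when pages are served or extracted, the rewriting rules can change later without losing the original content, which is the argument made above for keeping it out of the archive itself.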
13:00 db48xOthe ok
13:01 db48xOthe was just a crazy idea
13:01 alard :)
13:01 alard I think that what wget does is nothing more than a hack, which is necessary to make it work, but is far from ideal.
13:03 alard A tool that generates a local mirror from a WARC file could be useful, though.
13:03 db48xOthe hmm. you haven't posted to the list yet?
13:03 alard No, I'm currently working on the log files and metadata stuff.
13:03 db48xOthe ah
13:03 alard Nearly done.
13:03 db48xOthe sweet
13:03 alard Should it be the default, or optional?
13:05 db48xOthe I suggested --warc-log, but I don't really see why you wouldn't want it in the warc file if you're doing -o or -a
13:06 alard True.
13:06 alard I have it enabled by default now, with --no-warc-keep-log for if you don't want it.
13:06 db48xOthe works for me
13:07 alard Also, would it be useful to store the metadata in a separate WARC file?
13:07 alard .meta.warc.gz ?
13:07 alard (For the multi-warc case, for the single warc case it's probably better to keep it in the same file.)
13:10 db48xOthe that's an idea
13:10 db48xOthe although I'm not sure if I really would want to go that way
13:10 db48xOthe the metadata could get separated from the data
13:11 alard Hmm, yeah.
13:11 alard But in the multi-warc case, you already have multiple files.
13:11 alard mywarc.00000.warc.gz / mywarc.00001.warc.gz etc.
13:12 alard and then you would have mywarc.meta.warc.gz
13:12 alard (instead of a log file in mywarc.00001.warc.gz)
13:16 db48xOthe the metadata should be in every file, of course
13:16 db48xOthe disk space is cheap, and anyway it's all compressed
13:19 alard That's not what the guidelines document suggests.
13:20 alard There is a warcinfo record with some metadata in each file, of course, but the log file etc. are different.
13:20 db48xOthe guidelines can be wrong :)
13:20 alard section 2.4.3: It is recommended that all resource records containing processing information files are
13:20 alard stored in a specific WARC file (that may be called a "metadata WARC file").
13:20 db48xOthe anyway, bbl
13:20 alard Okay!
13:21 db48xOthe oh, and wget-warc is failing to build still
13:21 db48xOthe but for a different reason
13:21 db48xOthe no Makefile in trunk/libwarc/base32
13:23 alard automake?
13:23 alard Perhaps start again with a clean checkout, or run make clean or make distclean (never know which does what)
13:25 db48xOthe automake doesn't fix it
13:26 alard Strange.
13:26 alard (Here I must admit that I'm not an expert in these tools, I just run a few of them and then it eventually works.)
13:27 db48xOthe heh, same here
13:27 alard There is a Makefile.am in trunk/libwarc that includes things from trunk/libwarc/base32, so I guess it generates it from there.
13:29 db48xOthe sounds like it shouldn't need to recurse into base32 then
13:31 alard I do have a Makefile in trunk/libwarc/base32
13:38 alard I think the Makefile comes from the base32 source. I removed it, now it won't build.
13:40 alard It must have escaped via .gitignore. I've committed it now, git update and it should (hopefully) work.
15:03 emijrp SketchCow: thanks for helping wikiteam to get a corner at IA
15:04 emijrp can you "open" another request, to mirror Jamendo? i developed a script which can be located in an IA server to slurp the whole albums collection (~2TB)
15:04 emijrp i would be glad to provide it
16:54 jch I am at the CCC
19:24 SketchCow OKAY HELLO
19:24 SketchCow HERE I AM
19:24 SketchCow emijrp: Send me an e-mail about it, so I can bring it up with the right people.
19:24 SketchCow If need be, I can, of course, create non-wayback versions that we host.
19:50 SketchCow Bunch of people said they were going to join Archive Team after my speech.
19:50 SketchCow We'll see how that goes.
19:50 SketchCow alard, you're back!
19:50 SketchCow jch: How's CCC?
19:55 emijrp sent
20:07 db48xOthe aha
20:08 db48xOthe wget compiles now
20:14 db48xOthe ok, what I need right now is a program to help me dissect binary files where the format is only partially known
20:14 db48xOthe I want to be able to assign field names to ranges of bytes
20:15 db48xOthe and to search the files for segments that look like valid instances of known formats
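
A small sketch of the two things asked for above: naming byte ranges in a partially understood file, and scanning for segments that look like known formats. The `Field` class, the function names, and the short signature table are illustrative choices made here, not an existing tool; a signature hit only marks a candidate offset and does not prove a valid embedded file.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Field:
    name: str    # label assigned to this byte range
    offset: int  # start of the range within the file
    fmt: str     # struct format string, e.g. "<H" or "4s"


def dissect(data: bytes, fields: List[Field]) -> Dict[str, object]:
    """Decode each named field; None if it runs past the end of data."""
    result: Dict[str, object] = {}
    for field in fields:
        size = struct.calcsize(field.fmt)
        if field.offset + size > len(data):
            result[field.name] = None
            continue
        values = struct.unpack_from(field.fmt, data, field.offset)
        result[field.name] = values[0] if len(values) == 1 else values
    return result


# Well-known leading signatures of a few container formats.
MAGICS = {
    "gzip": b"\x1f\x8b\x08",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
}


def find_candidates(data: bytes,
                    magics: Dict[str, bytes] = MAGICS) -> List[Tuple[int, str]]:
    """Return (offset, format name) for every signature match, by offset."""
    hits: List[Tuple[int, str]] = []
    for name, magic in magics.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)
```

Keeping the field table as plain data means it can be edited and re-run as guesses about the format change, which suits a file whose layout is only partly known.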
20:51 swebb The Google Groups tracker is kinda-sorta dead again.
21:24 SketchCow I'm downloading Jamendo.
21:25 SketchCow Off it goes!
21:25 SketchCow 12 downloaded, 50,000 to go
21:43 alard SketchCow: Hi!
21:43 alard (I don't take my computer with me when I go on vacation. :)
21:45 alard db48xOthe: good.
21:46 alard I'll see about the wget mailinglist tomorrow.
21:52 SketchCow OK, great.
22:08 alard SketchCow: Nice Twaud.io collection. I'm not sure if you've already found them, but there may be more files in my upload directory. (gv_14 on blindtiger)
22:11 SketchCow I think that's what I took.
22:11 SketchCow In fact, I'm sure of it.
22:13 alard Okay, if you're sure. It's still there, so I was just wondering. :) (The text file only lists things by underscor, http://www.archive.org/details/twaudio-2009-2011 )
22:18 alard So actually, I can't find where you've put mine.
22:21 Silent700 SketchCow: any thoughts on mirroring/archiving/buying the Disk Sleeve Archive? (http://www.cyberden.com/dsa)
22:21 Silent700 obscure and geeky, yes, but good history IMO
22:26 SketchCow I can look into it.
22:26 SketchCow http://www.archive.org/details/philosophicaltransactions
22:27 SketchCow Why look, 18,500 Royal Society Philosophical Transactions from 1923 and older.
22:27 SketchCow I wonder how that got there.
22:27 SketchCow alard: I didn't just check, so I didn't see.
22:27 SketchCow I will happily pair them up, give me a moment.
22:49 ersi jch: you in berlin yet?
22:49 ersi jch: I'm here!
