#archiveteam 2011-08-08,Mon

Time	Nickname	Message
01:20 ^🔗	Coderjoe	well darn.
01:20 ^🔗	Coderjoe	the high speed drive in the duplicator is not compatible with the kryoflux (or a standard PC floppy controller, for that matter)
01:21 ^🔗	Coderjoe	it would be awesome to get the KF to support it, though
01:21 ^🔗	Coderjoe	or some other tool
01:22 ^🔗	Coderjoe	it runs at 600RPM (or 720RPM with a jumper change), but the cool part is that it can read or write both sides of the floppy simultaneously
01:22 ^🔗	Coderjoe	and I found docs today on the pinout differences
01:23 ^🔗	Coderjoe	also, I read something interesting WRT drive speed. And it makes sense once you think about it. The faster the media rotates past the heads, the stronger the pulses from the head (meaning you can pick up weaker signals a normal drive may have trouble with)
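The point about rotation speed and pulse strength follows from Faraday's law for an inductive read head. The relation below is an idealized sketch: it ignores head gap losses and amplifier bandwidth, and the 300 RPM baseline is an assumption (the nominal speed of a standard 3.5" drive), not a figure given in the log.

```latex
% Idealized inductive read head with N turns; \Phi is the flux through the head,
% x the position along the track, v the linear speed of the media at radius r.
V(t) = -N \frac{d\Phi}{dt} = -N \frac{d\Phi}{dx}\, v,
\qquad v = 2\pi r \cdot \frac{\mathrm{RPM}}{60}

% Same recorded pattern, same radius: amplitude scales with rotation speed.
\frac{V_{600}}{V_{300}} = \frac{600}{300} = 2
```

Under these assumptions a 600 RPM drive would roughly double the read amplitude, and the flux transitions would also arrive twice as fast, which is consistent with the drive not working on a standard PC floppy controller.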
01:25 ^🔗	Coderjoe	also, checked alignment on four drives, and adjusted three of them. (the fourth was rather inconsistent. it gets the drive to the tracks, within 600 microinches, but the offset from the track changed each time I tested. Other drives are much more solid)
01:27 ^🔗	Coderjoe	btw, unless you want to spend a LOT of time, don't even think about adjusting the offsets of the heads relative to each other. (the only ordinary adjustment usually needed is radial, made by rotating the motor a small amount)
01:27 ^🔗	Coderjoe	... on 3.5" drives. 5.25" are quite different
01:30 ^🔗	chronomex	good to know
01:32 ^🔗	Coderjoe	I spent 8 hours trying to get one drive's heads back in place after I decided I didn't like how much of an offset there was between heads
01:32 ^🔗	Coderjoe	(I had software and a special alignment floppy to check the radial alignment, azimuth, and index timing)
01:33 ^🔗	Coderjoe	i need to find a 5.25" alignment disk
01:35 ^🔗	Coderjoe	and perhaps write some software to use the KF to do the alignment tests. Then, if I can find another drive to mutilate (I only have one 5.25") I can potentially adjust the head alignment to remove that 8 sector offset for flippies
01:46 ^🔗	swebb	I pruned-out some of the more inactive groups in the irc log to clean things up a little.
02:54 ^🔗	Silent700	server readonly -- tasks waiting for harddrive fix :(
03:35 ^🔗	swebb	I'm accepting feedback: http://badcheese.com/~steve/archiveteamfire.jpg
03:36 ^🔗	no2pencil	that shit is hot, yo
03:52 ^🔗	chronomex	swebb: pretty. is this inspired by the volunteer firefighter comment?
04:54 ^🔗	swebb	Nah. Just messing around. :)
04:55 ^🔗	jch	le word!
05:03 ^🔗	Coderjoe	L'oiseau est le mot.
05:06 ^🔗	swebb	Flame-y: http://www.youtube.com/watch?v=lBI9v3Lzyss
05:27 ^🔗	ndurner1	swebb: can you run two instances in discover mode?
05:28 ^🔗	swebb	Sure. How do I do that?
05:28 ^🔗	swebb	I think that I saw the instructions somewhere.
05:29 ^🔗	swebb	Just add 'discover' after the script name? Does that work with the IPv^ one?
05:29 ^🔗	ndurner1	pass "discover" instead of "download" as the first parameter
05:29 ^🔗	swebb	IPv6 that is.
05:29 ^🔗	swebb	I'll give it a shot.
05:29 ^🔗	ndurner1	hm, don't know..
05:30 ^🔗	ndurner1	thanks
05:33 ^🔗	swebb	I fired one up. Machine load is higher, but it's not outputting anything.
05:36 ^🔗	swebb	Fired up a second one.
05:41 ^🔗	swebb	Can you tell if I'm sending any discover work in?
05:52 ^🔗	swebb	The IPv6 script wasn't doing anything in discover mode, so I downloaded and started the IPv4 one in discover mode. It's outputting stuff now.
06:06 ^🔗	swebb	Smokier: http://www.youtube.com/watch?v=QgjN6kmgaOI
06:08 ^🔗	ndurner1	ack
06:22 ^🔗	swebb	No wind: http://www.youtube.com/watch?v=htBRSXKCB0s
08:54 ^🔗	db48xOthe	howdy all
08:57 ^🔗	no2pencil	hey db48xOthe
09:00 ^🔗	db48xOthe	no2pencil: what's new?
09:11 ^🔗	db48xOthe	wiki is quiet
09:11 ^🔗	chronomex	wikiwikiwiki
09:12 ^🔗	*	chronomex still at defcon, still drunk
09:12 ^🔗	db48xOthe	some spam, but soultcer has been on top of it
09:12 ^🔗	db48xOthe	chronomex: cool. learn anything?
09:15 ^🔗	chronomex	ummmmmm
09:15 ^🔗	chronomex	probably?
09:15 ^🔗	db48xOthe	heh
09:15 ^🔗	chronomex	lulz
09:16 ^🔗	chronomex	it's fun, defcon's a party more than anything
09:18 ^🔗	anna1987	depositfiles.com/files/rtx2j0qz4
09:36 ^🔗	alard	db48xOthe: Hey. I think it's time to ask the wget mailing list about adding the WARC extension. Except for the metadata records, which I'm not sure about, I think it is more or less finished.
09:37 ^🔗	alard	Any tips for writing to a GNU mailing list?
09:42 ^🔗	db48xOthe	nope
09:42 ^🔗	db48xOthe	the last email I wrote about a change to a gnu program (patch) was ignored
09:44 ^🔗	alard	Ah. I've browsed through the bug-wget archive, and it seems reasonably active. The chance of getting at least a reply is pretty high, I think.
09:44 ^🔗	alard	I'll give it a try.
09:44 ^🔗	db48xOthe	yea, should have better luck
09:45 ^🔗	db48xOthe	I'd like wget to automatically add the records described in section 2.4.4 of the WARC Guidelines document
09:46 ^🔗	db48xOthe	it can just add the command line that was used to invoke wget as the crawler configuration
09:48 ^🔗	db48xOthe	then as an archivist I would add another record that includes a copy of the script I used to run wget that references the metadata records that it created
09:48 ^🔗	alard	Yes. Well, the command line arguments are already included in the warcinfo headers, but it might be useful to add these extra 2.4.4 records as well.
09:48 ^🔗	db48xOthe	oh, interesting
09:49 ^🔗	alard	(And you can add your own headers to that by providing --warc-header options, so you could add your name, organization etc.)
09:49 ^🔗	alard	So how would that work? Would you provide wget with the filename of the script?
09:50 ^🔗	db48xOthe	nah, I'd just build a record and append it to the file
09:52 ^🔗	alard	Okay. (It doesn't feel right to me to put that kind of functionality in wget. It doesn't really belong there, I think.)
09:52 ^🔗	db48xOthe	gzip -c crawl.sh > crawl.gz; echo '...' > headers; cat headers crawl.gz > metadata-record, etc
09:52 ^🔗	db48xOthe	alard: yea, I agree
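The "build a record and append it to the file" idea can be sketched in Python. This is a minimal illustration, not what wget or the warc-tools library does: the function names, the `resource` record type, the `application/x-sh` content type, and the example target URI are all assumptions made for the sketch. It relies only on the general WARC/1.0 layout (a version line, named header fields, a blank line, the payload, then two CRLFs) and on the common convention that each record in a `.warc.gz` is its own gzip member.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(payload: bytes, target_uri: str, warcinfo_id: str) -> bytes:
    """Return one uncompressed WARC/1.0 record wrapping `payload`.

    The header values chosen here are illustrative assumptions.
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("WARC-Warcinfo-ID", warcinfo_id),
        ("Content-Type", "application/x-sh"),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record ends with two CRLFs after the payload block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def append_record(warc_gz_path: str, record: bytes) -> None:
    """Append `record` to a .warc.gz file as a separate gzip member."""
    with open(warc_gz_path, "ab") as f:
        f.write(gzip.compress(record))
```

A call such as `append_record("crawl.warc.gz", build_warc_record(open("crawl.sh", "rb").read(), "file:///crawl.sh", warcinfo_id))` would then add the script to an existing archive; the file name and URI here are placeholders.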
09:52 ^🔗	alard	So which of the records in section 2.4.4 could wget add?
09:52 ^🔗	alard	The list of warcinfo-ids should be no problem.
09:52 ^🔗	db48xOthe	all three of them
09:53 ^🔗	db48xOthe	the log is a bit tricky, since it might not have been kept, or it might have only been appended to an existing file
09:53 ^🔗	alard	Yeah.
09:53 ^🔗	alard	A temporary file? (another)
09:53 ^🔗	db48xOthe	yea
09:53 ^🔗	db48xOthe	--warc-log
09:54 ^🔗	alard	The -nv log level? Or more detailed?
09:55 ^🔗	alard	(With -nv, it might be possible to keep it in memory, for not too large crawls.)
09:57 ^🔗	db48xOthe	I think it should respect whatever logging options the user has set
09:57 ^🔗	alard	That's better, yes.
09:58 ^🔗	db48xOthe	so just whatever would have gone to stdout or the file specified by -o or -a
09:58 ^🔗	db48xOthe	should make it easier to implement
09:58 ^🔗	db48xOthe	oh, and I haven't been able to compile it
09:58 ^🔗	alard	Oh?
09:59 ^🔗	db48xOthe	it can't find a header file
09:59 ^🔗	alard	Is that the git version, or one of the tar.gz?
09:59 ^🔗	db48xOthe	tmp-file.h or something
09:59 ^🔗	db48xOthe	the git version
10:00 ^🔗	alard	Maybe you should run ./bootstrap.sh again?
10:00 ^🔗	db48xOthe	hmm
10:00 ^🔗	db48xOthe	that's worth a try
10:00 ^🔗	db48xOthe	had forgotten about it
10:01 ^🔗	alard	It's possible that I have added an extra gnulib requirement.
10:02 ^🔗	alard	I added tmpdir on July 06, 2011.
10:03 ^🔗	db48xOthe	alas, nobody has commented on my wget patch
10:03 ^🔗	db48xOthe	https://savannah.gnu.org/bugs/index.php?33654
10:08 ^🔗	alard	That's a pity. Seems like a sensible change. (But then, it's only a month or so ago, so who knows.)
10:08 ^🔗	db48xOthe	so what are you going to put in your email?
10:09 ^🔗	db48xOthe	oh, and there was another idea I had
10:09 ^🔗	db48xOthe	it'd be cool if I could feed wget with a WARC file that it had previously created, and have it create the files on disk that go with it
10:10 ^🔗	alard	Don't know yet. Introduce WARC, say that it is very useful to have for archivists, point to the warctools library and point to the github repository, ask whether they think this is something to add to wget?
10:10 ^🔗	alard	In any case, I'll have a look at the metadata records first.
10:10 ^🔗	alard	WARC extraction would be cool, but is probably not something that wget should do?
10:11 ^🔗	alard	http://groups.google.com/group/warc-tools/browse_thread/thread/e65be965b86e0939
10:13 ^🔗	db48xOthe	well, I would hate to write another program that mimics wget's processing of the files, to make the links work and all
10:19 ^🔗	alard	That's true. I forgot about the making-the-links-work bit.
10:19 ^🔗	alard	It would be an 'offline wget'.
10:27 ^🔗	db48xOthe	yea, of a sort
12:52 ^🔗	db48xOthe	alard: maybe we should just store the modified versions of the files in the WARC file, in addition to the originals?
12:56 ^🔗	db48xOthe	with a header that distinguishes the originals from the modified versions
12:56 ^🔗	alard	Wouldn't that lead to a lot of unnecessary duplication?
12:56 ^🔗	alard	It doesn't contain any new information.
12:57 ^🔗	db48xOthe	kinda true
12:57 ^🔗	db48xOthe	unless they change the way the files are processed to make the local mirror, in which case you wouldn't be able to recover it later
12:57 ^🔗	alard	There is the 'conversion' record type for this, by the way.
12:57 ^🔗	db48xOthe	right
12:58 ^🔗	alard	A 'conversion' record shall contain an alternative version of another record's content that was created as
12:58 ^🔗	alard	the result of an archival process. Typically, this is used to hold content transformations that maintain viability of content after widely available rendering tools for the originally stored format disappear
12:59 ^🔗	db48xOthe	indeed
12:59 ^🔗	db48xOthe	should wget disappear one day, it'll be hard to browse the files in the warc because the links don't work
12:59 ^🔗	alard	but that is mainly about file formats, I think.
12:59 ^🔗	db48xOthe	agreed
12:59 ^🔗	alard	Is it? Everything is there. It's not different from a warc generated by Heritrix, for instance.
13:00 ^🔗	db48xOthe	everything is there except that all of the links are broken
13:00 ^🔗	alard	So you can use one of the available wayback tools to serve the pages (and even rewrite the urls).
13:00 ^🔗	alard	It should be a postprocessing step, in my opinion.
13:00 ^🔗	db48xOthe	ok
13:01 ^🔗	db48xOthe	was just a crazy idea
13:01 ^🔗	alard	:)
13:01 ^🔗	alard	I think that what wget does is nothing more than a hack, which is necessary to make it work, but is far from ideal.
13:03 ^🔗	alard	A tool that generates a local mirror from a WARC file could be useful, though.
13:03 ^🔗	db48xOthe	hmm. you haven't posted to the list yet?
13:03 ^🔗	alard	No, I'm currently working on the log files and metadata stuff.
13:03 ^🔗	db48xOthe	ah
13:03 ^🔗	alard	Nearly done.
13:03 ^🔗	db48xOthe	sweet
13:03 ^🔗	alard	Should it be the default, or optional?
13:05 ^🔗	db48xOthe	I suggested --warc-log, but I don't really see why you wouldn't want it in the warc file if you're doing -o or -a
13:06 ^🔗	alard	True.
13:06 ^🔗	alard	I have it enabled by default now, with --no-warc-keep-log for if you don't want it.
13:06 ^🔗	db48xOthe	works for me
13:07 ^🔗	alard	Also, would it be useful to store the metadata in a separate WARC file?
13:07 ^🔗	alard	.meta.warc.gz ?
13:07 ^🔗	alard	(For the multi-warc case, for the single warc case it's probably better to keep it in the same file.)
13:10 ^🔗	db48xOthe	that's an idea
13:10 ^🔗	db48xOthe	although I'm not sure if I really would want to go that way
13:10 ^🔗	db48xOthe	the metadata could get separated from the data
13:11 ^🔗	alard	Hmm, yeah.
13:11 ^🔗	alard	But in the multi-warc case, you already have multiple files.
13:11 ^🔗	alard	mywarc.00000.warc.gz / mywarc.00001.warc.gz etc.
13:12 ^🔗	alard	and then you would have mywarc.meta.warc.gz
13:12 ^🔗	alard	(instead of a log file in mywarc.00001.warc.gz)
13:16 ^🔗	db48xOthe	the metadata should be in every file, of course
13:16 ^🔗	db48xOthe	disk space is cheap, and anyway it's all compressed
13:19 ^🔗	alard	That's not what the guidelines document suggests.
13:20 ^🔗	alard	There is a warcinfo record with some metadata in each file, of course, but the log file etc. are different.
13:20 ^🔗	db48xOthe	guidelines can be wrong :)
13:20 ^🔗	alard	section 2.4.3: It is recommended that all resource records containing processing information files are
13:20 ^🔗	alard	stored in a specific WARC file (that may be called a "metadata WARC file").
13:20 ^🔗	db48xOthe	anyway, bbl
13:20 ^🔗	alard	Okay!
13:21 ^🔗	db48xOthe	oh, and wget-warc is failing to build still
13:21 ^🔗	db48xOthe	but for a different reason
13:21 ^🔗	db48xOthe	no Makefile in trunk/libwarc/base32
13:23 ^🔗	alard	automake?
13:23 ^🔗	alard	Perhaps start again with a clean checkout, or run make clean or make distclean (never know which does what)
13:25 ^🔗	db48xOthe	automake doesn't fix it
13:26 ^🔗	alard	Strange.
13:26 ^🔗	alard	(Here I must admit that I'm not an expert in these tools, I just run a few of them and then it eventually works.)
13:27 ^🔗	db48xOthe	heh, same here
13:27 ^🔗	alard	There is a Makefile.am in trunk/libwarc that includes things from trunk/libwarc/base32, so I guess it generates it from there.
13:29 ^🔗	db48xOthe	sounds like it shouldn't need to recurse into base32 then
13:31 ^🔗	alard	I do have a Makefile in trunk/libwarc/base32
13:38 ^🔗	alard	I think the Makefile comes from the base32 source. I removed it, now it won't build.
13:40 ^🔗	alard	It must have escaped via .gitignore. I've committed it now, git pull and it should (hopefully) work.
15:03 ^🔗	emijrp	SketchCow: thanks for helping wikiteam to get a corner at IA
15:04 ^🔗	emijrp	can you "open" another request, to mirror Jamendo? i developed a script which can be located in an IA server to slurp the whole albums collection (~2TB)
15:04 ^🔗	emijrp	i would be glad to provide it
16:54 ^🔗	jch	I am at the CCC
19:24 ^🔗	SketchCow	OKAY HELLO
19:24 ^🔗	SketchCow	HERE I AM
19:24 ^🔗	SketchCow	emijrp: Send me an e-mail about it, so I can bring it up with the right people.
19:24 ^🔗	SketchCow	If need be, I can, of course, create non-wayback versions that we host.
19:50 ^🔗	SketchCow	Bunch of people said they were going to join Archive Team after my speech.
19:50 ^🔗	SketchCow	We'll see how that goes.
19:50 ^🔗	SketchCow	alard, you're back!
19:50 ^🔗	SketchCow	jch: How's CCC?
19:55 ^🔗	emijrp	sent
20:07 ^🔗	db48xOthe	aha
20:08 ^🔗	db48xOthe	wget compiles now
20:14 ^🔗	db48xOthe	ok, what I need right now is a program to help me dissect binary files where the format is only partially known
20:14 ^🔗	db48xOthe	I want to be able to assign field names to ranges of bytes
20:15 ^🔗	db48xOthe	and to search the files for segments that look like valid instances of known formats
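The two wishes here (naming byte ranges, and scanning for embedded segments of known formats) can be roughed out with Python's `struct` module. This is only a sketch of the idea, not an existing tool: the `FIELDS` layout is entirely hypothetical, and the function names are made up for illustration. The magic numbers listed are the standard ones for gzip, ZIP and PNG.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical layout for a partially understood format:
# field name -> (byte offset, struct format string).
FIELDS: Dict[str, Tuple[int, str]] = {
    "magic": (0, "4s"),
    "version": (4, "<H"),
    "record_count": (6, "<I"),
}

# Well-known signatures to look for inside an unknown blob.
KNOWN_MAGICS: Dict[bytes, str] = {
    b"\x1f\x8b\x08": "gzip (deflate)",
    b"PK\x03\x04": "zip local file header",
    b"\x89PNG\r\n\x1a\n": "png",
}


def dissect(data: bytes, fields: Dict[str, Tuple[int, str]] = FIELDS) -> Dict[str, object]:
    """Decode each named field; fields that run past the end of data are skipped."""
    out: Dict[str, object] = {}
    for name, (offset, fmt) in fields.items():
        if offset + struct.calcsize(fmt) > len(data):
            continue
        (out[name],) = struct.unpack_from(fmt, data, offset)
    return out


def scan_for_magics(data: bytes, magics: Dict[bytes, str] = KNOWN_MAGICS) -> List[Tuple[int, str]]:
    """Return (offset, label) for every occurrence of a known signature, sorted by offset."""
    hits: List[Tuple[int, str]] = []
    for magic, label in magics.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, label))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

A signature match is only a hint that a segment might be a valid instance of that format; confirming it would mean actually parsing from that offset (for example, trying to decompress from a gzip hit).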
20:51 ^🔗	swebb	The Google Groups tracker is kinda-sorta dead again.
21:24 ^🔗	SketchCow	I'm downloading Jamendo.
21:25 ^🔗	SketchCow	Off it goes!
21:25 ^🔗	SketchCow	12 downloaded, 50,000 to go
21:43 ^🔗	alard	SketchCow: Hi!
21:43 ^🔗	alard	(I don't take my computer with me when I go on vacation. :)
21:45 ^🔗	alard	db48xOthe: good.
21:46 ^🔗	alard	I'll see about the wget mailing list tomorrow.
21:52 ^🔗	SketchCow	OK, great.
22:08 ^🔗	alard	SketchCow: Nice Twaud.io collection. I'm not sure if you've already found them, but there may be more files in my upload directory. (gv_14 on blindtiger)
22:11 ^🔗	SketchCow	I think that's what I took.
22:11 ^🔗	SketchCow	In fact, I'm sure of it.
22:13 ^🔗	alard	Okay, if you're sure. It's still there, so I was just wondering. :) (The text file only lists things by underscor, http://www.archive.org/details/twaudio-2009-2011 )
22:18 ^🔗	alard	So actually, I can't find where you've put mine.
22:21 ^🔗	Silent700	SketchCow: any thoughts on mirroring/archiving/buying the Disk Sleeve Archive? (http://www.cyberden.com/dsa)
22:21 ^🔗	Silent700	obscure and geeky, yes, but good history IMO
22:26 ^🔗	SketchCow	I can look into it.
22:26 ^🔗	SketchCow	http://www.archive.org/details/philosophicaltransactions
22:27 ^🔗	SketchCow	Why look, 18,500 Royal Society Philosophical Transactions from 1923 and older.
22:27 ^🔗	SketchCow	I wonder how that got there.
22:27 ^🔗	SketchCow	alard: I didn't just check, so I didn't see.
22:27 ^🔗	SketchCow	I will happily pair them up, give me a moment.
22:49 ^🔗	ersi	jch: you in berlin yet?
22:49 ^🔗	ersi	jch: I'm here!
