[10:50] <ZoeB> Hey, I just fixed a bug in my USENET and mailing list message archiver... when importing single messages, it's no longer fazed by ISO-8859-1 format messages. So if anyone's using that, now's a great time to pull the latest version from GitHub. :)
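One common way to stop an importer from choking on ISO-8859-1 messages is to try UTF-8 first and fall back to Latin-1. The sketch below assumes messages are read as raw bytes. It is not necessarily how arcmesg handles it, and the function name `decode_message` is hypothetical.

```python
def decode_message(raw: bytes) -> str:
    """Decode a raw message (hypothetical helper): UTF-8 first, then ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to a code point,
        # so this fallback can never raise.
        return raw.decode("iso-8859-1")


print(decode_message("café".encode("iso-8859-1")))  # Latin-1 bytes, decoded via the fallback
```

Because Latin-1 accepts any byte sequence, the fallback always succeeds, though a message in some other 8-bit charset would come out as the wrong characters rather than as an error.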
[10:53] <Nemo_bis> well done
[10:59] <ZoeB> I'm stoked, I've gone from importing maybe 70-80% of messages to well over 95% I think
[10:59] <ZoeB> now I just have to figure out how to host them all... :D
[11:00] <Nemo_bis> ZoeB: but your code must be linked from http://archiveteam.org/index.php?title=Usenet
[11:01] <Nemo_bis> "Each message is stored in a separate file, named after the SHA1 of its universally unique message ID." Can that be a Maildir?
[11:01] <ZoeB> The URL's https://github.com/ZoeB/arcmesg , shall I add it to the list? It's quite different from the USENET backups on archive.org, and indeed I'm actually pulling those into my collection now
[11:02] <Nemo_bis> Yes, please add
[11:02] <ZoeB> hmm, I'm not familiar with maildirs, only mbox... let me read up on that!
[11:02] <Nemo_bis> Eventually we should use your tool to download the newsgroups not covered by giganews
[11:03] <ZoeB> the filesystem I'm using is hugely influenced by Git and designed so if you know the message ID of a message you'd like to see, it should be trivial to find the actual message
[11:03] <ZoeB> filesystem's the wrong word... the way I'm organising and naming the files
[11:04] <ZoeB> I'd be honoured if anyone used or improved it at all.  I'm just using it with a few NNTP servers, mailing lists, and any USENET dumps and mbox archives of public mailing lists I can get my hands on
[11:06] <ZoeB> hmm, I only have an account on the fileformats wiki, not the main one... WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[11:07] <Nemo_bis> yahoosucks :)
[11:08] <ZoeB> haha! as someone who just rewatched Jason Scott's talk "Archive Team: A Preservation of Service Attack", I can appreciate that :D
[11:12] <ZoeB> Added, thank you!
[11:15] <ZoeB> OK, I've read about Maildir now, and this is quite different. When a single directory holds millions of files, even Linux starts to complain, hence they're stored 2 levels deep, in (by default) ~/messages/[first 2 chars of message ID sha1]/[next 2 chars]/[remaining chars]. The idea being that if you know a message ID and want to find the message, it can be automatically found and retrieved very quickly.
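The lookup described above can be sketched like this. It assumes the SHA1 is taken over the message ID exactly as it appears in the Message-ID header (angle brackets included), encoded as UTF-8 and written as lowercase hex. arcmesg may normalise IDs differently, and the names `message_path` and `find_message` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def message_path(message_id: str, root: Path = Path.home() / "messages") -> Path:
    """Map a message ID to root/[2 hex chars]/[2 hex chars]/[remaining 36]."""
    # Assumption: hash the raw Message-ID string, brackets and all, as UTF-8.
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest[4:]


def find_message(message_id: str, root: Path = Path.home() / "messages") -> Optional[bytes]:
    """Return the stored message bytes, or None if the archive lacks it."""
    path = message_path(message_id, root)
    return path.read_bytes() if path.is_file() else None
```

Finding a message never needs a directory scan or an index, only one hash computation and one file open. The two levels of 2-hex-char directories give 256 × 256 = 65,536 buckets, which keeps each directory small even with millions of messages.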
[11:20] <Nemo_bis> Ah, so everything goes in a single directory structure whatever its source?
[11:30] <ZoeB> exactly, yes
[11:31] <ZoeB> you could handle each thing you want to archive separately, and have a different overall directory for each one, or you can put them all together (either right away or later on) with pretty much a guarantee that the archives can be merged together without the files stepping on each other's toes
[11:32] <ZoeB> message IDs are designed to be universally unique, so the SHA1s of them should be too, at least in practical terms
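Since a file's path depends only on its message ID, merging two archives that use this layout is just a copy that skips any path the destination already has. Below is a minimal sketch, assuming both trees use the same hashing and layout. The name `merge_archives` is hypothetical and not part of arcmesg.

```python
import shutil
from pathlib import Path


def merge_archives(src: Path, dest: Path) -> int:
    """Copy message files from src into dest, keeping dest's copy on a clash.

    Returns the number of files actually copied.
    """
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dest / path.relative_to(src)
        if target.exists():
            # Same relative path means same message-ID hash, i.e. in practice
            # the same message, so there is nothing new to copy.
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied += 1
    return copied
```

Running the merge again is harmless, because every file is skipped the second time. This is also why archives can be combined "right away or later on".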
[11:33] <Nemo_bis> Yes, that's neat
[11:33] <Nemo_bis> It would however be nice to have a "database" that can be directly read by some mail software
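As a stopgap for mail software, one option is to export the archive into a Maildir, which clients such as mutt can open directly. The sketch below uses Python's standard `mailbox` module and assumes every file in the archive tree is one complete RFC 2822 message. The name `export_to_maildir` is hypothetical.

```python
import mailbox
from pathlib import Path


def export_to_maildir(archive: Path, maildir_path: Path) -> int:
    """Add every message file under archive to a Maildir; return the count."""
    box = mailbox.Maildir(str(maildir_path), create=True)
    count = 0
    for path in sorted(archive.rglob("*")):
        if path.is_file():
            # Mailbox.add accepts raw bytes; Maildir puts new messages in new/.
            box.add(path.read_bytes())
            count += 1
    return count
```

This produces a one-way copy. The hashed archive stays the canonical store, and the Maildir can be regenerated whenever the archive grows.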
[11:34] <Nemo_bis> Not necessary, just nice. :)
[11:37] <ZoeB> That *would* be nice, yes. I'd love to write all sorts of scripts that could import and export messages from this system into various other formats
[11:37] <ZoeB> I'm not *that* good a programmer though and could do with help on such matters
[11:39] <ZoeB> I managed to make a website that lets people browse some of the mailing lists I have a personal interest in: http://analogue.bytenoise.co.uk But it's a bit buggy and doesn't scale well yet.
[11:41] <ZoeB> An eventual goal would be to let multiple people each host their own archives on their servers, which could talk to each other peer-to-peer to get each other's files, and all let the public browse them. But that's a tad ambitious. :)
[14:44] <dashcloud> This is pretty cool: the people who bought 3D Realms, Interceptor, have been releasing stuff from the 3D Realms archives: http://forums.duke4.net/topic/7366-duke3d-sw-earlyalphabetagold-material/
[16:14] <SketchCow> Hey, so, I'm in and out this week but hope to be online during the day (attending film festival, seeing stuff at night).
[16:36] <SmileyG> So many game FAQs are missing, hmmm