[10:50] Hey, I just fixed a bug in my USENET and mailing list message archiver... when importing single messages, it's no longer fazed by ISO-8859-1 format messages. So if anyone's using that, now's a great time to pull the latest version from GitHub. :)
[10:53] well done
[10:59] I'm stoked, I've gone from importing maybe 70-80% of messages to well over 95% I think
[10:59] now I just have to figure out how to host them all... :D
[11:00] ZoeB: but your code must be linked from http://archiveteam.org/index.php?title=Usenet
[11:01] "Each message is stored in a separate file, named after the SHA1 of its universally unique message ID." Can that be a Maildir?
[11:01] The URL's https://github.com/ZoeB/arcmesg , shall I add it to the list? It's quite different from the USENET backups on archive.org, and indeed I'm actually pulling those into my collection now
[11:02] Yes, please add
[11:02] hmm, I'm not familiar with maildirs, only mbox... let me read up on that!
[11:02] Eventually we should use your tool to download the newsgroups not covered by giganews
[11:03] the filesystem I'm using is hugely influenced by Git and designed so if you know the message ID of a message you'd like to see, it should be trivial to find the actual message
[11:03] filesystem's the wrong word... the way I'm organising and naming the files
[11:04] I'd be honoured if anyone used or improved it at all. I'm just using it with a few NNTP servers, mailing lists, and any USENET dumps and mbox archives of public mailing lists I can get my hands on
[11:06] hmm, I only have an account on the fileformats wiki, not the main one... WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[11:07] yahoosucks :)
[11:08] haha! as someone who just rewatched the Scott talk about Archive Team: A Preservation of Service Attack, I can appreciate that :D
[11:12] Added, thank you!
[11:15] OK, I've read about Maildir now, and this is quite different.
When dealing with millions of files, even Linux starts to complain, hence they're stored 2 levels deep, in (by default) ~/messages/[first 2 chars of message ID sha1]/[next 2 chars]/[remaining chars]. The idea is that if you know a message ID and want to find the message, it can be automatically found and retrieved very quickly.
[11:20] Ah, so everything goes in a single directory structure whatever its source?
[11:30] exactly, yes
[11:31] you could handle each thing you want to archive separately, and have a different overall directory for each one, or you can put them all together (either right away or later on) with pretty much a guarantee that the archives can be merged together without the files stepping on each other's toes
[11:32] message IDs are designed to be universally unique, so the SHA1s of them should be too, at least in practical terms
[11:33] Yes, that's neat
[11:33] It would however be nice to have a "database" that can be directly read by some mail software
[11:34] Not necessary, just nice. :)
[11:37] That *would* be nice, yes. I'd love to write all sorts of scripts that could import and export messages from this system into various other formats
[11:37] I'm not *that* good a programmer though and could do with help on such matters
[11:39] I managed to make a website that lets people browse some of the mailing lists I have a personal interest in: http://analogue.bytenoise.co.uk But it's a bit buggy and doesn't scale well yet.
[11:41] An eventual goal would be to let multiple people each host their own archives on their servers, which could talk to each other peer-to-peer to get each other's files, and all let the public browse them. But that's a tad ambitious.
:)
[14:44] This is pretty cool. The people who bought 3D Realms, Interceptor, have been releasing stuff from the 3D Realms archives: http://forums.duke4.net/topic/7366-duke3d-sw-earlyalphabetagold-material/
[16:14] Hey, so, I'm in and out this week but hope to be online during the day (attending a film festival, seeing stuff at night).
[16:36] So many GameFAQs are missing, hmmm
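For reference, here is a minimal Python sketch of the storage layout described at 11:15 (~/messages/[first 2 chars of the message ID's SHA1]/[next 2 chars]/[remaining chars]) and of the 11:31 point that separate archives can be merged without collisions. This is not arcmesg's actual code. It assumes the message ID is hashed exactly as written (angle brackets included, UTF-8 encoded) and that the hex digest is lowercase; arcmesg may normalise IDs differently. The helper names `message_path`, `store_message`, `find_message` and `merge_archives` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Assumed default root, matching the "~/messages" mentioned in the log.
DEFAULT_ROOT = Path.home() / "messages"


def message_path(message_id: str, root: Path = DEFAULT_ROOT) -> Path:
    """Map a message ID to root/ab/cd/<remaining 36 hex chars> of its SHA1.

    Assumption: the ID is hashed verbatim as UTF-8, brackets included.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest[4:]


def store_message(raw: bytes, message_id: str, root: Path = DEFAULT_ROOT) -> bool:
    """Write a raw message unless one with the same ID is already stored.

    Returns True if a new file was written, False if it already existed.
    """
    path = message_path(message_id, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # "xb" = exclusive create: raises FileExistsError instead of overwriting.
        with open(path, "xb") as f:
            f.write(raw)
    except FileExistsError:
        return False
    return True


def find_message(message_id: str, root: Path = DEFAULT_ROOT) -> Optional[bytes]:
    """Look a message up directly from its ID; no index or search is needed."""
    path = message_path(message_id, root)
    return path.read_bytes() if path.is_file() else None


def merge_archives(src: Path, dst: Path) -> int:
    """Copy every message file from src into dst, skipping ones dst already has.

    Because file names derive from message IDs, a name clash means the same
    message, so skipping it loses nothing. Returns the number of files copied.
    """
    copied = 0
    for path in src.rglob("*"):
        if path.is_file():
            target = dst / path.relative_to(src)
            if not target.exists():
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, target)
                copied += 1
    return copied


if __name__ == "__main__":
    # Demo: two separate archives that share one message merge cleanly.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        a_root, b_root = Path(a), Path(b)
        store_message(b"first\n", "<1@example.org>", a_root)
        store_message(b"second\n", "<2@example.org>", a_root)
        store_message(b"second\n", "<2@example.org>", b_root)
        print(merge_archives(a_root, b_root))  # 1: only <1@example.org> was missing from b
```

The two directory levels keep any one directory down to at most 256 entries at each level (two hex characters each), which is the "even Linux starts to complain" problem from 11:15, while still letting a reader go straight from a message ID to its file.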