#archiveteam 2014-04-20,Sun

Time Nickname Message
10:50 🔗 ZoeB Hey, I just fixed a bug in my USENET and mailing list message archiver... when importing single messages, it's no longer fazed by ISO-8859-1 encoded messages. So if anyone's using that, now's a great time to pull the latest version from GitHub. :)
10:53 🔗 Nemo_bis well done
10:59 🔗 ZoeB I'm stoked, I've gone from importing maybe 70-80% of messages to well over 95% I think
10:59 🔗 ZoeB now I just have to figure out how to host them all... :D
11:00 🔗 Nemo_bis ZoeB: but your code must be linked from http://archiveteam.org/index.php?title=Usenet
11:01 🔗 Nemo_bis "Each message is stored in a separate file, named after the SHA1 of its universally unique message ID." Can that be a Maildir?
11:01 🔗 ZoeB The URL's https://github.com/ZoeB/arcmesg , shall I add it to the list? It's quite different from the USENET backups on archive.org, and indeed I'm actually pulling those into my collection now
11:02 🔗 Nemo_bis Yes, please add
11:02 🔗 ZoeB hmm, I'm not familiar with maildirs, only mbox... let me read up on that!
11:02 🔗 Nemo_bis Eventually we should use your tool to download the newsgroups not covered by giganews
11:03 🔗 ZoeB the filesystem I'm using is hugely influenced by Git and designed so if you know the message ID of a message you'd like to see, it should be trivial to find the actual message
11:03 🔗 ZoeB filesystem's the wrong word... the way I'm organising and naming the files
11:04 🔗 ZoeB I'd be honoured if anyone used or improved it at all. I'm just using it with a few NNTP servers, mailing lists, and any USENET dumps and mbox archives of public mailing lists I can get my hands on
11:06 🔗 ZoeB hmm, I only have an account on the fileformats wiki, not the main one... WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
11:07 🔗 Nemo_bis yahoosucks :)
11:08 🔗 ZoeB haha! as someone who just rewatched the Scott talk about Archive Team: A Preservation of Service Attack, I can appreciate that :D
11:12 🔗 ZoeB Added, thank you!
11:15 🔗 ZoeB OK, I've read about Maildir now, and this is quite different. When dealing with millions of files, even Linux starts to complain, hence they're stored 2 levels deep, in (by default) ~/messages/[first 2 chars of message ID sha1]/[next 2 chars]/[remaining chars]. The idea being that if you know a message ID and want to find the message, it can be automatically found and retrieved very quickly.
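[Editor's sketch] The two-level layout ZoeB describes can be illustrated in a few lines of Python. This is only a minimal sketch, not arcmesg's actual code: the function name `message_path` is hypothetical, and it assumes the message ID is hashed as a UTF-8 string exactly as given (whether the real tool includes the angle brackets or normalises the ID first is not stated in the log).

```python
import hashlib
from pathlib import Path


def message_path(message_id: str, root: Path = Path.home() / "messages") -> Path:
    """Map a message ID to root/<first 2 hex chars>/<next 2>/<remaining 36>.

    Assumption: the ID is hashed verbatim as UTF-8 text.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()  # 40 hex chars
    return root / digest[:2] / digest[2:4] / digest[4:]


if __name__ == "__main__":
    # Hypothetical message ID, for illustration only.
    print(message_path("<example-id@example.invalid>"))
```

Because the path is derived purely from the message ID, looking up a known message is a single hash plus one file open, with no index to consult.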
11:20 🔗 Nemo_bis Ah, so everything goes in a single directory structure whatever its source?
11:30 🔗 ZoeB exactly, yes
11:31 🔗 ZoeB you could handle each thing you want to archive separately, and have a different overall directory for each one, or you can put them all together (either right away or later on) with pretty much a guarantee that the archives can be merged together without the files stepping on each other's toes
11:32 🔗 ZoeB message IDs are designed to be universally unique, so the SHA1s of them should be too, at least in practical terms
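[Editor's sketch] Since file names depend only on the message ID, merging two such archives can be as simple as copying over whatever the target lacks. The following is a rough sketch under stated assumptions, not something taken from arcmesg: `merge_archives` is a hypothetical helper, and it assumes that two files with the same relative path hold the same message, so an existing file is left untouched.

```python
import shutil
from pathlib import Path


def merge_archives(source: Path, target: Path) -> int:
    """Copy every file from source into target, keeping relative paths.

    Files already present in target are skipped (assumed to be the same
    message). Returns the number of files copied.
    """
    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dest = target / src.relative_to(source)
        if dest.exists():
            continue  # same hashed name, assumed same message
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied += 1
    return copied
```

Running it twice on the same pair of directories copies nothing the second time, which is the "no stepping on each other's toes" property described above.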
11:33 🔗 Nemo_bis Yes, that's neat
11:33 🔗 Nemo_bis It would however be nice to have a "database" that can be directly read by some mail software
11:34 🔗 Nemo_bis Not necessary, just nice. :)
11:37 🔗 ZoeB That *would* be nice, yes. I'd love to write all sorts of scripts that could import and export messages from this system into various other formats
11:37 🔗 ZoeB I'm not *that* good a programmer though and could do with help on such matters
11:39 🔗 ZoeB I managed to make a website that lets people browse some of the mailing lists I have a personal interest in: http://analogue.bytenoise.co.uk But it's a bit buggy and doesn't scale well yet.
11:41 🔗 ZoeB An eventual goal would be to let multiple people each host their own archives on their servers, which could talk to each other peer-to-peer to get each other's files, and all let the public browse them. But that's a tad ambitious. :)
14:44 🔗 dashcloud This is pretty cool- The people who bought 3D Realms, Interceptor, have been releasing stuff from the 3D Realms archives: http://forums.duke4.net/topic/7366-duke3d-sw-earlyalphabetagold-material/
16:14 🔗 SketchCow Hey, so, I'm in and out this week but hope to be online during the day (attending film festival, seeing stuff at night).
16:36 🔗 SmileyG So many gamefaq's are missing, hmmm
