Time |
Nickname |
Message |
10:50
|
ZoeB |
Hey, I just fixed a bug in my USENET and mailing list message archiver... when importing single messages, it's no longer fazed by ISO-8859-1 format messages. So if anyone's using that, now's a great time to pull the latest version from GitHub. :) |
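As a rough illustration of the kind of problem described here (not the archiver's actual fix, which the log does not show), one common way to cope with ISO-8859-1 messages is to try UTF-8 first and fall back to ISO-8859-1. The function name `decode_message` is hypothetical.

```python
def decode_message(raw: bytes) -> str:
    """Decode raw message bytes, preferring UTF-8.

    Assumption: messages are either UTF-8 or ISO-8859-1. Real messages
    usually declare a charset in their headers, which this sketch ignores.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1")
```

Because the ISO-8859-1 fallback accepts any byte sequence, it works as a last resort but can silently mis-decode text that was really in some other encoding.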
10:53
|
Nemo_bis |
well done |
10:59
|
ZoeB |
I'm stoked, I've gone from importing maybe 70-80% of messages to well over 95% I think |
10:59
|
ZoeB |
now I just have to figure out how to host them all... :D |
11:00
|
Nemo_bis |
ZoeB: but your code must be linked from http://archiveteam.org/index.php?title=Usenet |
11:01
|
Nemo_bis |
"Each message is stored in a separate file, named after the SHA1 of its universally unique message ID." Can that be a Maildir? |
11:01
|
ZoeB |
The URL's https://github.com/ZoeB/arcmesg , shall I add it to the list? It's quite different from the USENET backups on archive.org, and indeed I'm actually pulling those into my collection now |
11:02
|
Nemo_bis |
Yes, please add |
11:02
|
ZoeB |
hmm, I'm not familiar with maildirs, only mbox... let me read up on that! |
11:02
|
Nemo_bis |
Eventually we should use your tool to download the newsgroups not covered by giganews |
11:03
|
ZoeB |
the filesystem I'm using is hugely influenced by Git and designed so if you know the message ID of a message you'd like to see, it should be trivial to find the actual message |
11:03
|
ZoeB |
filesystem's the wrong word... the way I'm organising and naming the files |
11:04
|
ZoeB |
I'd be honoured if anyone used or improved it at all. I'm just using it with a few NNTP servers, mailing lists, and any USENET dumps and mbox archives of public mailing lists I can get my hands on |
11:06
|
ZoeB |
hmm, I only have an account on the fileformats wiki, not the main one... WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
11:07
|
Nemo_bis |
yahoosucks :) |
11:08
|
ZoeB |
haha! as someone who just rewatched the Scott talk about Archive Team: A Preservation of Service Attack, I can appreciate that :D |
11:12
|
ZoeB |
Added, thank you! |
11:15
|
ZoeB |
OK, I've read about Maildir now, and this is quite different. When dealing with millions of files, even Linux starts to complain, hence they're stored 2 levels deep, in (by default) ~/messages/[first 2 chars of message ID sha1]/[next 2 chars]/[remaining chars]. The idea being that if you know a message ID and want to find the message, it can be automatically found and retrieved very quickly. |
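The layout described above can be sketched as follows. This is a minimal illustration, not the archiver's own code: the function name `message_path`, the UTF-8 encoding of the ID, the lowercase hex digest, and whether the angle brackets are part of the hashed ID are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def message_path(message_id: str, root: Optional[Path] = None) -> Path:
    """Map a message ID to root/[first 2 hex chars]/[next 2]/[remaining 36]."""
    if root is None:
        # Default location mentioned in the log: ~/messages
        root = Path.home() / "messages"
    # Assumption: the ID is hashed as UTF-8 text and written as lowercase hex.
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest[4:]
```

With two hex characters per level there are 256 first-level directories and 256 under each of those, so a few million messages end up spread over 65,536 leaf directories rather than piled into one.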
11:20
|
Nemo_bis |
Ah, so everything goes in a single directory structure whatever its source? |
11:30
|
ZoeB |
exactly, yes |
11:31
|
ZoeB |
you could handle each thing you want to archive separately, and have a different overall directory for each one, or you can put them all together (either right away or later on) with pretty much a guarantee that the archives can be merged together without the files stepping on each other's toes |
11:32
|
ZoeB |
message IDs are designed to be universally unique, so the SHA1s of them should be too, at least in practical terms |
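Since every file's path depends only on its message ID, merging two archives can in principle be a plain tree copy that skips files already present. A hedged sketch, assuming both archives use the same layout; `merge_archives` is a hypothetical helper, not part of the tool discussed here:

```python
import shutil
from pathlib import Path


def merge_archives(source: Path, target: Path) -> int:
    """Copy files from source into target, keeping existing files.

    Returns the number of files copied. Assumes a file at the same
    relative path in both trees is the same message.
    """
    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dest = target / src.relative_to(source)
        if dest.exists():
            # Same path means same message ID hash, so treat as a duplicate.
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied += 1
    return copied
```

This sketch never overwrites: if the same message was fetched from two sources with slightly different headers, the copy already in the target wins.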
11:33
|
Nemo_bis |
Yes, that's neat |
11:33
|
Nemo_bis |
It would however be nice to have a "database" that can be directly read by some mail software |
11:34
|
Nemo_bis |
Not necessary, just nice. :) |
11:37
|
ZoeB |
That *would* be nice, yes. I'd love to write all sorts of scripts that could import and export messages from this system into various other formats |
11:37
|
ZoeB |
I'm not *that* good a programmer though and could do with help on such matters |
11:39
|
ZoeB |
I managed to make a website that lets people browse some of the mailing lists I have a personal interest in: http://analogue.bytenoise.co.uk But it's a bit buggy and doesn't scale well yet. |
11:41
|
ZoeB |
An eventual goal would be to let multiple people each host their own archives on their servers, which could talk to each other peer-to-peer to get each other's files, and all let the public browse them. But that's a tad ambitious. :) |
14:44
|
dashcloud |
This is pretty cool: the people who bought 3D Realms, Interceptor, have been releasing stuff from the 3D Realms archives: http://forums.duke4.net/topic/7366-duke3d-sw-earlyalphabetagold-material/ |
16:14
|
SketchCow |
Hey, so, I'm in and out this week but hope to be online during the day (attending film festival, seeing stuff at night). |
16:36
|
SmileyG |
So many gamefaq's are missing, hmmm |