#projectnewsletter 2017-11-30,Thu


Time Nickname Message
02:21 🔗 CoolCanuk has joined #projectnewsletter
03:39 🔗 Somebody2 has joined #projectnewsletter
03:39 🔗 CoolCanuk hi
03:39 🔗 Somebody2 I think the best format for uploading email is mbox format.
03:40 🔗 Somebody2 Which is just plain text with some weirdness to separate messages
03:40 🔗 Somebody2 And most email clients have a way to export messages in that format.
03:40 🔗 CoolCanuk it won't display images? :(
03:41 🔗 Somebody2 Images *are* encoded as plain text in emails.
03:41 🔗 Somebody2 And once you import the mbox format file into a client, the images show up again.
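
The point about images living inside the mbox text can be checked without an email client. What follows is a minimal sketch, assuming Python's standard `mailbox` module and a hypothetical helper name `summarize_mbox`; it lists each message's subject and counts the image parts embedded in it. It only counts images carried inside the message, not images referenced by external URL.

```python
import mailbox


def summarize_mbox(path):
    """Return a (subject, embedded_image_count) tuple for each message."""
    summary = []
    # create=False: fail instead of silently creating an empty mailbox
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # walk() visits the message itself and every nested MIME part
            images = sum(
                1 for part in msg.walk()
                if part.get_content_maintype() == "image"
            )
            summary.append((msg["Subject"], images))
    finally:
        box.close()
    return summary
```

Calling `summarize_mbox("export.mbox")` on a Takeout export (the file name here is an assumption) gives a quick inventory before uploading.
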
03:41 🔗 CoolCanuk why plaintext over webpage?
03:41 🔗 Somebody2 WARC, you mean?
03:42 🔗 Somebody2 because emails aren't transactions, and mbox is a well-known format?
03:42 🔗 CoolCanuk I guess >_>
03:42 🔗 Somebody2 But really, *any* way is better than not doing it.
03:43 🔗 Somebody2 So, you said you have a cache of emails you want to upload?
03:43 🔗 CoolCanuk in my gmail inbox lol
03:43 🔗 Somebody2 Nice -- let me dig around to find out how gmail lets you export messages...
03:44 🔗 CoolCanuk alternatively, I could just connect an email client ;)
03:44 🔗 CoolCanuk I'll use google takeout
03:45 🔗 CoolCanuk and assign a label to the sears messages
03:45 🔗 Somebody2 Sounds good.
03:46 🔗 Somebody2 CoolCanuk: https://www.lifewire.com/how-to-export-your-emails-from-gmail-as-mbox-files-1171881
03:46 🔗 CoolCanuk hehe. that's what I used
03:47 🔗 Somebody2 Ah the glories of non-personalized search results. :-)
03:47 🔗 Somebody2 Then just upload those to IA, tag it with archiveteam, and add as much descriptive context as you can.
03:47 🔗 Somebody2 and mention the IA item identifier on the wiki page.
03:49 🔗 CoolCanuk fk
03:49 🔗 CoolCanuk it contains my email address
03:50 🔗 Somebody2 Ah. You could filter that out with search & replace...
03:50 🔗 Somebody2 Are you on Windows, Mac or Linux?
03:50 🔗 CoolCanuk windows
03:50 🔗 CoolCanuk I opened with Notepad++
03:50 🔗 Somebody2 Notepad++ should work fine.
03:50 🔗 Somebody2 Just replace the address with XXXX@XXXX
03:50 🔗 CoolCanuk newsletterproject@archiveteam.org ?
03:50 🔗 CoolCanuk oh ok
03:51 🔗 Somebody2 or newsletterproject@archiveteam.org -- that works fine too
03:51 🔗 Somebody2 just something to make it clear it's a replacement.
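
The search-and-replace step discussed here could also be scripted. This is a minimal sketch, not the method anyone in the channel actually used; the function name `scrub_address` is hypothetical, and the default placeholder is the address suggested above. It only catches the address where it appears literally in the text, so copies hidden inside base64-encoded parts or URL-encoded links (for example with `%40` instead of `@`) would need separate handling.

```python
import re


def scrub_address(text, address, placeholder="newsletterproject@archiveteam.org"):
    """Replace every literal occurrence of `address`, ignoring case."""
    pattern = re.compile(re.escape(address), re.IGNORECASE)
    # A function replacement avoids backslash interpretation in the placeholder
    return pattern.sub(lambda match: placeholder, text)
```

For an mbox file this would be applied to the file's text as a whole, which is the same thing the Notepad++ replace does.
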
03:51 🔗 CoolCanuk I will ask SketchCow and make sure he doesn't have a "catch all" email (all email gets sent to one address if it doesn't match), which would make the site receive spam
03:52 🔗 Somebody2 Don't worry about it.
03:52 🔗 Somebody2 archiveteam.org is a well-known enough domain, it gets plenty of spam already.
03:53 🔗 CoolCanuk :P
03:54 🔗 CoolCanuk but, I mean, what good is this mbox when all the external images won't load?
03:54 🔗 CoolCanuk (in the future)
03:54 🔗 Somebody2 CoolCanuk: it's still the text of the emails -- that's still useful.
03:55 🔗 Somebody2 As for getting the external images -- you could extract the URLs into a file, then put the file into ArchiveBot.
03:55 🔗 Somebody2 with !archiveonly <
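
Extracting the URLs into a file could look roughly like the sketch below. It assumes Python's standard `mailbox` module, a hypothetical function name `extract_urls`, and a deliberately simple URL pattern that will miss or truncate some unusual links. It collects every http(s) URL from the text parts, so the result will also include unsubscribe and tracking links, which may carry personal tokens and are worth filtering out before the list goes anywhere public.

```python
import mailbox
import re

# Simplified pattern: stop at whitespace, quotes, or angle brackets
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(mbox_path):
    """Return a sorted list of unique URLs found in text parts of an mbox."""
    urls = set()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            for part in msg.walk():
                if part.get_content_maintype() != "text":
                    continue
                # decode=True undoes base64 / quoted-printable encoding
                payload = part.get_payload(decode=True)
                if payload is None:
                    continue
                charset = part.get_content_charset() or "utf-8"
                try:
                    text = payload.decode(charset, errors="replace")
                except LookupError:
                    # Unknown charset name in the message; fall back to UTF-8
                    text = payload.decode("utf-8", errors="replace")
                urls.update(URL_RE.findall(text))
    finally:
        box.close()
    return sorted(urls)
```

Writing the result out one URL per line, for example with `"\n".join(extract_urls("export.mbox"))`, gives a plain list file (the mbox file name is an assumption).
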
03:58 🔗 CoolCanuk fair
04:00 🔗 CoolCanuk what is the "tag"? sorry
04:00 🔗 CoolCanuk "mention the IA item identifier"
04:02 🔗 Somebody2 sensible questions
04:03 🔗 Somebody2 By "tag", I mean the "Subject" metadata field -- fill them in on the upload form where it says: "Subject Tags".
04:04 🔗 Somebody2 The IA item identifier is the part after /details/ in the URL -- like "nasa" in this URL: https://archive.org/details/nasa
04:04 🔗 Somebody2 In the uploader, it's the "Page URL" field.
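
As a small illustration of "the part after /details/", here is a sketch using only the standard library; the helper name `item_identifier` is hypothetical and not part of any Internet Archive tooling.

```python
from urllib.parse import urlparse


def item_identifier(url):
    """Return the path segment after /details/, or None if there is none."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "details":
        return segments[1]
    return None
```
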
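
As a small illustration of "the part after /details/", here is a sketch using only the standard library; the helper name `item_identifier` is hypothetical and not part of any Internet Archive tooling.

```python
from urllib.parse import urlparse


def item_identifier(url):
    """Return the path segment after /details/, or None if there is none."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "details":
        return segments[1]
    return None
```
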
04:04 🔗 CoolCanuk ohok
04:05 🔗 Somebody2 I was just suggesting you add a link to the item you uploaded on the projectnewsletter wiki page.
04:05 🔗 CoolCanuk this is going to be a ton of work to scrub unsubscribe urls and reply addresses
04:05 🔗 CoolCanuk yeah
04:05 🔗 Somebody2 Please take notes on what you do to scrub it, so other people can do the same, later.
04:05 🔗 Somebody2 And put the notes on the wiki page!
04:06 🔗 CoolCanuk I will save all that work for tomorrow
04:06 🔗 CoolCanuk ideally, i'd make a script that scrubs it
04:07 🔗 Somebody2 Yes please!
04:10 🔗 CoolCanuk how important are email headers?
04:10 🔗 CoolCanuk other than to, from, subject
04:18 🔗 Somebody2 email headers are VERY USEFUL
04:18 🔗 Somebody2 if it's feasible to keep them
04:18 🔗 Somebody2 as they provide fascinating context about how email was routed at particular times in history
04:19 🔗 Somebody2 and what kind of spam and anti-spam efforts were in use
04:19 🔗 Somebody2 If it's too painful to scrub them to your satisfaction, though -- better to have the bodies than nothing.
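
One way to keep most headers while scrubbing the personal ones is to redact a short list of header names and leave the rest (such as `Received`) untouched. The sketch below assumes the standard `email` package; the function name `redact_headers` and the default list of header names are assumptions, not a vetted list of everything that can identify a recipient. Header values it does not touch, including `Received` lines, can still contain the recipient address.

```python
def redact_headers(msg, names=("To", "Delivered-To"),
                   placeholder="newsletterproject@archiveteam.org"):
    """Overwrite the named headers of an email.message.Message in place."""
    for name in names:
        count = len(msg.get_all(name, []))
        if count == 0:
            continue
        # Deleting removes every occurrence; the replacements are then
        # appended at the end of the header block, so their position changes.
        del msg[name]
        for _ in range(count):
            msg[name] = placeholder
    return msg
```

Messages yielded by `mailbox.mbox` are `Message` subclasses, so the same function could be applied to each one while copying an export into a new mbox file.
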
04:21 🔗 CoolCanuk found this http://www.spamdex.org
04:21 🔗 CoolCanuk they support the archive as well BUT I am not sure if they submit to it.
04:22 🔗 CoolCanuk would be helpful to see how they process the email and display it (how does it work in the backend)?
04:27 🔗 Somebody2 Hm, that is interesting, yes. Probably worth adding a link on projectnewsletter wiki page.
04:46 🔗 CoolCanuk sketchcow did reply to me and said he will receive anything@archiveteam.org
04:51 🔗 Somebody2 OK.
10:18 🔗 CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
17:00 🔗 CoolCanuk has joined #projectnewsletter
