[02:21] *** CoolCanuk has joined #projectnewsletter
[03:39] *** Somebody2 has joined #projectnewsletter
[03:39] hi
[03:39] I think the best format for uploading email is mbox format.
[03:40] Which is just plain text with some weirdness to separate messages
[03:40] And most email clients have a way to export messages in that format.
[03:40] it won't display images? :(
[03:41] Images *are* encoded as plain text in emails.
[03:41] And once you import the mbox format file into a client, the images should show up again.
[03:41] why plaintext over webpage?
[03:41] WARC, you mean?
[03:42] because emails aren't transactions, and mbox is a well-known format?
[03:42] I guess >_>
[03:42] But really, *any* way is better than not doing it.
[03:43] So, you said you have a cache of emails you want to upload?
[03:43] in my gmail inbox lol
[03:43] Nice -- let me dig around to find out how gmail lets you export messages...
[03:44] alternatively, I could just connect an email client ;)
[03:44] I'll use google takeout
[03:45] and assign a label to the sears messages
[03:45] Sounds good.
[03:46] CoolCanuk: https://www.lifewire.com/how-to-export-your-emails-from-gmail-as-mbox-files-1171881
[03:46] hehe. that's what I used
[03:47] Ah the glories of non-personalized search results. :-)
[03:47] Then just upload those to IA, tag it with archiveteam, and add as much descriptive context as you can.
[03:47] and mention the IA item identifier on the wiki page.
[03:49] fk
[03:49] it contains my email address
[03:50] Ah. You could filter that out with search & replace...
[03:50] Are you on Windows, Mac or Linux?
[03:50] windows
[03:50] I opened with Notepad++
[03:50] Notepad++ should work fine.
[03:50] Just replace the address with XXXX@XXXX
[03:50] newsletterproject@archiveteam.org ?
[03:50] oh ok
[03:51] or newsletterproject@archiveteam.org -- that works fine too
[03:51] just something to make it clear it's a replacement.
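The search-and-replace suggested here could also be scripted, which would make it repeatable and easy to document on the wiki page. Below is a minimal Python sketch, assuming the Takeout export has already been read into a string; the function names are hypothetical, and the `XXXX@XXXX` placeholder is the one from the conversation. It also illustrates the mbox convention that each message starts with a separator line beginning `From ` (with a space).

```python
import re


def count_messages(mbox_text: str) -> int:
    """Count messages in raw mbox text.

    Each message starts with a separator line beginning with "From "
    (note the space; the "From:" header does not match). Body lines that
    would start with "From " are usually escaped as ">From " on export.
    """
    return len(re.findall(r"^From ", mbox_text, flags=re.MULTILINE))


def scrub_address(mbox_text: str, address: str, placeholder: str = "XXXX@XXXX") -> str:
    """Replace every standalone occurrence of `address` with `placeholder`.

    Matching ignores case, since the same address may be capitalized
    differently in different headers. The lookarounds prevent a partial
    match inside a longer address, e.g. "me@example.com" inside
    "some@example.com" or "me@example.com.au".
    """
    pattern = re.compile(
        r"(?<![\w.+-])" + re.escape(address) + r"(?!\.?[\w-])",
        flags=re.IGNORECASE,
    )
    return pattern.sub(placeholder, mbox_text)
```

For example, you could read the export with `open(path, encoding="utf-8", errors="surrogateescape", newline="")`, pass the text through `scrub_address`, and write it back with the same three arguments so that non-UTF-8 bytes and the original line endings survive the round trip. A plain text replace only sees the raw file, though. An address inside a base64-encoded part, or one split across a quoted-printable soft line break, will not be caught, so spot-check the result.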
[03:51] I will ask SketchCow and make sure he doesn't have a "catch all" email (all email gets sent to one address if it doesn't match), which would make the site receive spam
[03:52] Don't worry about it.
[03:52] archiveteam.org is a well-known enough domain, it gets plenty of spam already.
[03:53] :P
[03:54] but, I mean, what good is this mbox when all the external images won't load?
[03:54] (in the future)
[03:54] CoolCanuk: it's still the text of the emails -- that's still useful.
[03:55] As for getting the external images -- you could extract the URLs into a file, then put the file into ArchiveBot.
[03:55] with !archiveonly <
[03:58] fair
[04:00] what is the "tag"? sorry
[04:00] "mention the IA item identifier"
[04:02] sensible questions
[04:03] By "tag", I mean the "Subject" metadata field -- fill them in on the upload form where it says: "Subject Tags".
[04:04] The IA item identifier is the part after /details/ in the URL -- like "nasa" in this URL: https://archive.org/details/nasa
[04:04] In the uploader, it's the "Page URL" field.
[04:04] oh ok
[04:05] I was just suggesting you add a link to the item you uploaded on the projectnewsletter wiki page.
[04:05] this is going to be a ton of work to scrub unsubscribe urls and reply addresses
[04:05] yeah
[04:05] Please take notes on what you do to scrub it, so other people can do the same, later.
[04:05] And put the notes on the wiki page!
[04:06] I will save all that work for tomorrow
[04:06] ideally, i'd make a script that scrubs it
[04:07] Yes please!
[04:10] how important are email headers?
[04:10] other than to, from, subject
[04:18] email headers are VERY USEFUL
[04:18] if it's feasible to keep them
[04:18] as they provide fascinating context about how email was routed at particular times in history
[04:19] and what kind of spam and anti-spam efforts were in use
[04:19] If it's too painful to scrub them to your satisfaction, though -- better to have the bodies than nothing.
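For the external images, the URLs could be pulled out of the export with Python's standard `mailbox`, `email`, and `html.parser` modules and then written one per line to a text file for ArchiveBot. This is a sketch under stated assumptions: it only looks at `<img src>` attributes in `text/html` parts (not CSS backgrounds, and it skips `cid:` inline images, which are already inside the mbox), and all function names and file paths are hypothetical.

```python
import mailbox
from email.message import Message
from html.parser import HTMLParser


class _ImgSrcCollector(HTMLParser):
    """Collect absolute http(s) URLs from <img src="..."> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values
        # (so "&amp;" in a URL becomes "&"). Self-closing <img /> also lands here.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def image_urls_from_message(msg: Message) -> list:
    """Return external image URLs referenced by the HTML parts of one message."""
    urls = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            markup = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            markup = payload.decode("utf-8", errors="replace")
        collector = _ImgSrcCollector()
        collector.feed(markup)
        collector.close()
        urls.extend(collector.urls)
    return urls


def image_urls_from_mbox(path: str) -> list:
    """Collect unique external image URLs from every message in an mbox file."""
    seen = {}  # a dict keeps first-seen order, unlike a set
    box = mailbox.mbox(path)
    try:
        for msg in box:
            for url in image_urls_from_message(msg):
                seen.setdefault(url, None)
    finally:
        box.close()
    return list(seen)


def write_url_list(urls, out_path: str) -> None:
    """Write one URL per line to a plain-text file."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("".join(url + "\n" for url in urls))
```

For instance, `write_url_list(image_urls_from_mbox("sears.mbox"), "image-urls.txt")` would produce the list (both file names are placeholders). Review the list before submitting it, since newsletter image URLs sometimes carry per-recipient tracking parameters, which raises the same privacy concern as the scrubbing discussed above.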
[04:21] found this http://www.spamdex.org
[04:21] they support the archive as well BUT I am not sure if they submit to it.
[04:22] would be helpful to see how they process the email and display it (how does it work in the backend)?
[04:27] Hm, that is interesting, yes. Probably worth adding a link on projectnewsletter wiki page.
[04:46] sketchcow did reply to me and said he will receive anything@archiveteam.org
[04:51] OK.
[10:18] *** CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
[17:00] *** CoolCanuk has joined #projectnewsletter