#projectnewsletter 2017-11-30,Thu

Time	Nickname	Message
02:21 ^🔗		CoolCanuk has joined #projectnewsletter
03:39 ^🔗		Somebody2 has joined #projectnewsletter
03:39 ^🔗	CoolCanuk	hi
03:39 ^🔗	Somebody2	I think the best format for uploading email is mbox format.
03:40 ^🔗	Somebody2	Which is just plain text with some weirdness to separate messages
03:40 ^🔗	Somebody2	And most email clients have a way to export messages in that format.
03:40 ^🔗	CoolCanuk	it wont display images? :(
03:41 ^🔗	Somebody2	Images are encoded as plain text in emails.
03:41 ^🔗	Somebody2	And once you import the mbox format file into a client, the images should show up again.
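
[Editor's note] The point above, that images travel inside an email as encoded plain text and reappear once the mbox file is read by something that understands it, can be sketched with Python's standard `mailbox` module. This is only an illustration: the function name `summarize_mbox` is made up here and is not something used in the channel.

```python
import mailbox


def summarize_mbox(path):
    """Return a list of (subject, image content types) for each message.

    `path` is assumed to point at an mbox file such as a Gmail export.
    """
    summary = []
    box = mailbox.mbox(path)
    try:
        for msg in box:
            # walk() visits every MIME part; embedded images appear as
            # parts whose main type is "image", stored as base64 text.
            images = [
                part.get_content_type()
                for part in msg.walk()
                if part.get_content_maintype() == "image"
            ]
            summary.append((msg["Subject"], images))
    finally:
        box.close()
    return summary
```

Images that are only linked from the HTML body (rather than attached) will not show up in this list, since they are not stored in the mbox at all.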
03:41 ^🔗	CoolCanuk	why plaintext over webpage?
03:41 ^🔗	Somebody2	WARC, you mean?
03:42 ^🔗	Somebody2	because emails aren't transactions, and mbox is a well-known format?
03:42 ^🔗	CoolCanuk	I guess >_>
03:42 ^🔗	Somebody2	But really, any way is better than not doing it.
03:43 ^🔗	Somebody2	So, you said you have a cache of emails you want to upload?
03:43 ^🔗	CoolCanuk	in my gmail inbox lol
03:43 ^🔗	Somebody2	Nice -- let me dig around to find out how gmail lets you export messages...
03:44 ^🔗	CoolCanuk	alternatively, I could just connect an email client ;)
03:44 ^🔗	CoolCanuk	I'll use google takeout
03:45 ^🔗	CoolCanuk	and assign a label to the sears messages
03:45 ^🔗	Somebody2	Sounds good.
03:46 ^🔗	Somebody2	CoolCanuk: https://www.lifewire.com/how-to-export-your-emails-from-gmail-as-mbox-files-1171881
03:46 ^🔗	CoolCanuk	hehe. thats what I used
03:47 ^🔗	Somebody2	Ah the glories of non-personalized search results. :-)
03:47 ^🔗	Somebody2	Then just upload those to IA, tag it with archiveteam, and add as much descriptive context as you can.
03:47 ^🔗	Somebody2	and mention the IA item identifier on the wiki page.
03:49 ^🔗	CoolCanuk	fk
03:49 ^🔗	CoolCanuk	it contains my email address
03:50 ^🔗	Somebody2	Ah. You could filter that out with search & replace...
03:50 ^🔗	Somebody2	Are you on Windows, Mac or Linux?
03:50 ^🔗	CoolCanuk	windows
03:50 ^🔗	CoolCanuk	I opened with Notepad++
03:50 ^🔗	Somebody2	Notepad++ should work fine.
03:50 ^🔗	Somebody2	Just replace the address with XXXX@XXXX
03:50 ^🔗	CoolCanuk	newsletterproject@archiveteam.org ?
03:50 ^🔗	CoolCanuk	oh ok
03:51 ^🔗	Somebody2	or newsletterproject@archiveteam.org -- that works fine too
03:51 ^🔗	Somebody2	just something to make it clear it's a replacement.
03:51 ^🔗	CoolCanuk	I will ask SketchCow and make sure he doesn't have a "catch all" email (all email gets sent to one address if it doesn't match), which would make the site receive spam
03:52 ^🔗	Somebody2	Don't worry about it.
03:52 ^🔗	Somebody2	archiveteam.org is a well-known enough domain, it gets plenty of spam already.
03:53 ^🔗	CoolCanuk	:P
03:54 ^🔗	CoolCanuk	but, I mean, what good is this mbox when all the external images wont load?
03:54 ^🔗	CoolCanuk	(in the future)
03:54 ^🔗	Somebody2	CoolCanuk: it's still the text of the emails -- that's still useful.
03:55 ^🔗	Somebody2	As for getting the external images -- you could extract the URLs into a file, then put the file into ArchiveBot.
03:55 ^🔗	Somebody2	with !archiveonly <
03:58 ^🔗	CoolCanuk	fair
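
[Editor's note] A rough sketch of the "extract the URLs into a file" step, using only Python's standard library. It assumes the external images are referenced from `<img src="...">` tags in the HTML part of each message; the function name `image_urls` and the regular expression are illustrative choices, not a method described in the channel.

```python
import re
from email import message_from_string

# Matches http(s) URLs in the src attribute of <img> tags.
IMG_SRC = re.compile(r'<img[^>]+src=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def image_urls(raw_message):
    """Return the unique external image URLs in one raw email, in order."""
    msg = message_from_string(raw_message)
    urls = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True undoes quoted-printable or base64 transfer encoding,
        # which would otherwise hide the URLs (e.g. src=3D"...").
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        html = payload.decode(charset, errors="replace")
        for url in IMG_SRC.findall(html):
            if url not in urls:
                urls.append(url)
    return urls
```

The resulting list could then be written to a text file, one URL per line; that layout is an assumption here, so check what ArchiveBot expects before submitting it.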
04:00 ^🔗	CoolCanuk	what is the "tag"? sorry
04:00 ^🔗	CoolCanuk	"mention the IA item identifier"
04:02 ^🔗	Somebody2	sensible questions
04:03 ^🔗	Somebody2	By "tag", I mean the "Subject" metadata field -- fill them in on the upload form where it says: "Subject Tags".
04:04 ^🔗	Somebody2	The IA item identifier is the part after /details/ in the URL -- like "nasa" in this URL: https://archive.org/details/nasa
04:04 ^🔗	Somebody2	In the uploader, it's the "Page URL" field.
04:04 ^🔗	CoolCanuk	ohok
04:05 ^🔗	Somebody2	I was just suggesting you add a link to the item you uploaded on the projectnewsletter wiki page.
04:05 ^🔗	CoolCanuk	this is going to be a ton of work to scrub unsubscribe urls and reply addresses
04:05 ^🔗	CoolCanuk	yeah
04:05 ^🔗	Somebody2	Please take notes on what you do to scrub it, so other people can do the same, later.
04:05 ^🔗	Somebody2	And put the notes on the wiki page!
04:06 ^🔗	CoolCanuk	I will save all that work for tomorrow
04:06 ^🔗	CoolCanuk	ideally, i'd make a script that scrubs it
04:07 ^🔗	Somebody2	Yes please!
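
[Editor's note] A minimal sketch of what such a scrubbing script might start from: the same search and replace suggested for Notepad++, done in Python. The function name `scrub_address` and the decision to also catch the percent-encoded form of the address (as it often appears inside unsubscribe links) are assumptions made for this example.

```python
import re
from urllib.parse import quote


def scrub_address(text, address, placeholder="XXXX@XXXX"):
    """Replace every occurrence of `address` in `text` with `placeholder`.

    Returns (scrubbed_text, number_of_replacements). Matching is
    case-insensitive and also covers the percent-encoded form
    (for example me%40example.com) that can appear in URLs.
    """
    variants = {address, quote(address, safe="")}
    pattern = re.compile(
        "|".join(re.escape(v) for v in sorted(variants)), re.IGNORECASE
    )
    # A function is used as the replacement so that backslashes in the
    # placeholder are never interpreted as regex escapes.
    return pattern.subn(lambda match: placeholder, text)
```

This only sees the address where it appears literally in the file. Copies inside base64-encoded bodies, or tokens in tracking URLs that identify the recipient without containing the address, would need separate handling.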
04:10 ^🔗	CoolCanuk	how important are email headers?
04:10 ^🔗	CoolCanuk	other than to, from, subject
04:18 ^🔗	Somebody2	email headers are VERY USEFUL
04:18 ^🔗	Somebody2	if it's feasible to keep them
04:18 ^🔗	Somebody2	as they provide fascinating context about how email was routed at particular times in history
04:19 ^🔗	Somebody2	and what kind of spam and anti-spam efforts were in use
04:19 ^🔗	Somebody2	If it's too painful to scrub them to your satisfaction, though -- better to have the bodies than nothing.
04:21 ^🔗	CoolCanuk	found this http://www.spamdex.org
04:21 ^🔗	CoolCanuk	they support the archive as well BUT I am not sure if they submit to it.
04:22 ^🔗	CoolCanuk	would be helpful to see how they process the email and display it (how does it work in the backend)?
04:27 ^🔗	Somebody2	Hm, that is interesting, yes. Probably worth adding a link on projectnewsletter wiki page.
04:46 ^🔗	CoolCanuk	sketchcow did reply to me and said he will receive anything@archiveteam.org
04:51 ^🔗	Somebody2	OK.
10:18 ^🔗		CoolCanuk has quit IRC (Quit: Connection closed for inactivity)
17:00 ^🔗		CoolCanuk has joined #projectnewsletter