Time |
Nickname |
Message |
02:21
|
|
CoolCanuk has joined #projectnewsletter |
03:39
|
|
Somebody2 has joined #projectnewsletter |
03:39
|
CoolCanuk |
hi |
03:39
|
Somebody2 |
I think the best format for uploading email is mbox format. |
03:40
|
Somebody2 |
Which is just plain text with some weirdness to separate messages |
03:40
|
Somebody2 |
And most email clients have a way to export messages in that format. |
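A minimal sketch of what that "weirdness" amounts to, using Python's standard-library mailbox module. Messages sit back to back in one text file, each starting with a separator line that begins with "From " (with a space, unlike the From: header), and body lines that happen to start with "From " are escaped as ">From ". The file name and addresses below are placeholders, not real data.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_sample_mbox(path):
    """Write two small placeholder messages to an mbox file at `path`."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        for n in (1, 2):
            msg = EmailMessage()
            msg["From"] = "news@example.com"
            msg["To"] = "XXXX@XXXX"
            msg["Subject"] = f"Issue {n}"
            msg.set_content(f"Body of issue {n}\n")
            box.add(msg)  # mailbox writes the "From " separator line itself
        box.flush()
    finally:
        box.close()


def list_subjects(path):
    """Return the Subject header of every message, in file order."""
    box = mailbox.mbox(path)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()


def count_separators(path):
    """Count the 'From ' separator lines that start each message."""
    with open(path, encoding="utf-8") as f:
        return sum(line.startswith("From ") for line in f)


path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
build_sample_mbox(path)
print(list_subjects(path))     # ['Issue 1', 'Issue 2']
print(count_separators(path))  # 2
```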
03:40
|
CoolCanuk |
it won't display images? :( |
03:41
|
Somebody2 |
Images *are* encoded as plain text in emails. |
03:41
|
Somebody2 |
And once you import the mbox format file into a client, the images should show up again. |
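To illustrate the point about images: an image attached to or embedded in a message is stored as a base64-encoded MIME part, so it survives in a plain-text mbox file. Images that the HTML only links to by URL are not stored, just their addresses. A minimal sketch with Python's standard email package; the image bytes and file name are placeholders.

```python
import base64
from email.message import EmailMessage

# Placeholder bytes standing in for a real image file.
fake_png = b"\x89PNG\r\n\x1a\n" + bytes(range(32))

msg = EmailMessage()
msg["Subject"] = "Newsletter with an attached image"
msg.set_content("See the picture below.")
msg.add_attachment(fake_png, maintype="image", subtype="png", filename="logo.png")

# The whole message, image included, serialises to plain text.
raw = msg.as_string()
encoded = base64.b64encode(fake_png).decode("ascii")
print(encoded in raw)  # True: the image travels as base64 text

# Decoding the image part recovers the original bytes.
image_part = next(
    part for part in msg.iter_attachments()
    if part.get_content_type() == "image/png"
)
print(image_part.get_content() == fake_png)  # True
```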
03:41
|
CoolCanuk |
why plaintext over webpage? |
03:41
|
Somebody2 |
WARC, you mean? |
03:42
|
Somebody2 |
because emails aren't transactions, and mbox is a well-known format? |
03:42
|
CoolCanuk |
I guess >_> |
03:42
|
Somebody2 |
But really, *any* way is better than not doing it. |
03:43
|
Somebody2 |
So, you said you have a cache of emails you want to upload? |
03:43
|
CoolCanuk |
in my gmail inbox lol |
03:43
|
Somebody2 |
Nice -- let me dig around to find out how gmail lets you export messages... |
03:44
|
CoolCanuk |
alternatively, I could just connect an email client ;) |
03:44
|
CoolCanuk |
I'll use google takeout |
03:45
|
CoolCanuk |
and assign a label to the sears messages |
03:45
|
Somebody2 |
Sounds good. |
03:46
|
Somebody2 |
CoolCanuk: https://www.lifewire.com/how-to-export-your-emails-from-gmail-as-mbox-files-1171881 |
03:46
|
CoolCanuk |
hehe. that's what I used |
03:47
|
Somebody2 |
Ah the glories of non-personalized search results. :-) |
03:47
|
Somebody2 |
Then just upload those to IA, tag it with archiveteam, and add as much descriptive context as you can. |
03:47
|
Somebody2 |
and mention the IA item identifier on the wiki page. |
03:49
|
CoolCanuk |
fk |
03:49
|
CoolCanuk |
it contains my email address |
03:50
|
Somebody2 |
Ah. You could filter that out with search & replace... |
03:50
|
Somebody2 |
Are you on Windows, Mac or Linux? |
03:50
|
CoolCanuk |
windows |
03:50
|
CoolCanuk |
I opened with Notepad++ |
03:50
|
Somebody2 |
Notepad++ should work fine. |
03:50
|
Somebody2 |
Just replace the address with XXXX@XXXX |
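The same search & replace can be scripted so it is repeatable. A minimal sketch in Python, assuming the personal address is known in advance (me@example.com is a placeholder) and that a case-insensitive literal match is enough; it writes a redacted copy rather than editing the export in place. The helper names are hypothetical.

```python
import re


def redact_address(text, address, marker="XXXX@XXXX"):
    """Replace every case-insensitive occurrence of `address` with `marker`."""
    pattern = re.compile(re.escape(address), re.IGNORECASE)
    return pattern.sub(lambda _match: marker, text)


def redact_file(src_path, dst_path, address):
    """Write a redacted copy of a text file such as an exported .mbox."""
    # newline="" keeps the original line endings, and surrogateescape lets
    # bytes that are not valid UTF-8 pass through unchanged.
    with open(src_path, encoding="utf-8", errors="surrogateescape", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", errors="surrogateescape", newline="") as dst:
        dst.write(redact_address(text, address))


sample = (
    "From: Sears <news@example.com>\n"
    "To: Me <Me@Example.com>\n"
    "Delivered-To: me@example.com\n"
    "\n"
    "Hi me@example.com, here are this week's deals.\n"
)
print(redact_address(sample, "me@example.com"))
```

A literal match, whether in a script or in Notepad++, only finds the address where it appears as plain text. Copies inside base64-encoded body parts, or URL-encoded as me%40example.com inside links, will slip through and need separate handling.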
03:50
|
CoolCanuk |
newsletterproject@archiveteam.org ? |
03:50
|
CoolCanuk |
oh ok |
03:51
|
Somebody2 |
or newsletterproject@archiveteam.org -- that works fine too |
03:51
|
Somebody2 |
just something to make it clear it's a replacement. |
03:51
|
CoolCanuk |
I will ask SketchCow and make sure he doesn't have a "catch all" email (all email gets sent to one address if it doesn't match), which would make the site receive spam |
03:52
|
Somebody2 |
Don't worry about it. |
03:52
|
Somebody2 |
archiveteam.org is a well-known enough domain, it gets plenty of spam already. |
03:53
|
CoolCanuk |
:P |
03:54
|
CoolCanuk |
but, I mean, what good is this mbox when all the external images won't load? |
03:54
|
CoolCanuk |
(in the future) |
03:54
|
Somebody2 |
CoolCanuk: it's still the text of the emails -- that's still useful. |
03:55
|
Somebody2 |
As for getting the external images -- you could extract the URLs into a file, then put the file into ArchiveBot. |
03:55
|
Somebody2 |
with !archiveonly < |
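Extracting those URLs can be done with the Python standard library alone. A minimal sketch, assuming the external images are referenced from <img src="..."> tags in each message's text/html part; the function names are hypothetical. ArchiveBot's !archiveonly < form takes the URL of a plain-text list, so the resulting file would need to be hosted somewhere ArchiveBot can fetch it.

```python
import mailbox
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect absolute http(s) URLs from <img src="..."> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def image_urls_from_html(html):
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


def image_urls_from_mbox(path):
    """Unique image URLs from every text/html part, in first-seen order."""
    seen = {}
    box = mailbox.mbox(path)
    try:
        for message in box:
            for part in message.walk():
                if part.get_content_type() != "text/html":
                    continue
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                try:
                    html = payload.decode(charset, errors="replace")
                except LookupError:  # unknown charset name in the header
                    html = payload.decode("utf-8", errors="replace")
                for url in image_urls_from_html(html):
                    seen.setdefault(url, None)
    finally:
        box.close()
    return list(seen)


def write_url_list(urls, out_path):
    """Write one URL per line, the shape of a plain-text URL list."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(url + "\n" for url in urls)


snippet = ('<p><img src="https://cdn.example.com/a.png"><img src="/local.gif">'
           '<IMG SRC="http://example.com/b.jpg"/></p>')
print(image_urls_from_html(snippet))
# ['https://cdn.example.com/a.png', 'http://example.com/b.jpg']
```

Calling write_url_list(image_urls_from_mbox("sears.mbox"), "image-urls.txt") would produce the list file (both file names are hypothetical). CSS background images and images referenced only from plain-text parts are not covered by this sketch.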
03:58
|
CoolCanuk |
fair |
04:00
|
CoolCanuk |
what is the "tag"? sorry |
04:00
|
CoolCanuk |
"mention the IA item identifier" |
04:02
|
Somebody2 |
sensible questions |
04:03
|
Somebody2 |
By "tag", I mean the "Subject" metadata field -- fill them in on the upload form where it says: "Subject Tags". |
04:04
|
Somebody2 |
The IA item identifier is the part after /details/ in the URL -- like "nasa" in this URL: https://archive.org/details/nasa |
04:04
|
Somebody2 |
In the uploader, it's the "Page URL" field. |
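The identifier can also be pulled out of an item URL programmatically. A small sketch, assuming URLs shaped like the https://archive.org/details/nasa example above, possibly with more path segments after the identifier; the function name is hypothetical.

```python
from urllib.parse import urlparse


def ia_identifier(url):
    """Return the path segment after /details/ in an archive.org URL, else None."""
    segments = urlparse(url).path.strip("/").split("/")
    if len(segments) >= 2 and segments[0] == "details" and segments[1]:
        return segments[1]
    return None


print(ia_identifier("https://archive.org/details/nasa"))  # nasa
```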
04:04
|
CoolCanuk |
oh ok |
04:05
|
Somebody2 |
I was just suggesting you add a link to the item you uploaded on the projectnewsletter wiki page. |
04:05
|
CoolCanuk |
this is going to be a ton of work to scrub unsubscribe urls and reply addresses |
04:05
|
CoolCanuk |
yeah |
04:05
|
Somebody2 |
Please take notes on what you do to scrub it, so other people can do the same, later. |
04:05
|
Somebody2 |
And put the notes on the wiki page! |
04:06
|
CoolCanuk |
I will save all that work for tomorrow |
04:06
|
CoolCanuk |
ideally, I'd make a script that scrubs it |
04:07
|
Somebody2 |
Yes please! |
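One possible shape for such a scrubbing script, as a minimal sketch under stated assumptions: the personal and reply addresses to remove are listed by hand, and any http(s) URL containing the word "unsubscribe" is treated as a personalised unsubscribe link (a hypothetical heuristic, not a rule newsletters follow). It works on raw message text, so headers are otherwise left untouched, but it cannot see inside base64-encoded bodies or URLs split by quoted-printable soft line breaks.

```python
import re

# Hypothetical heuristic: any http(s) URL containing "unsubscribe"
# (any letter case) is treated as a personalised unsubscribe link.
UNSUBSCRIBE_URL = re.compile(r"https?://[^\s\"'<>]*unsubscribe[^\s\"'<>]*", re.IGNORECASE)


def scrub(text, sensitive_addresses, marker="XXXX@XXXX",
          url_marker="UNSUBSCRIBE-URL-REMOVED"):
    """Redact listed addresses and unsubscribe links in raw message text."""
    encoded_marker = marker.replace("@", "%40")
    for address in sensitive_addresses:
        # Plain form, e.g. me@example.com
        plain = re.compile(re.escape(address), re.IGNORECASE)
        text = plain.sub(lambda _m: marker, text)
        # URL-encoded form often found in tracking links, e.g. me%40example.com
        encoded = re.compile(re.escape(address.replace("@", "%40")), re.IGNORECASE)
        text = encoded.sub(lambda _m: encoded_marker, text)
    return UNSUBSCRIBE_URL.sub(lambda _m: url_marker, text)


raw = (
    "Reply-To: reply-7f3a@mail.example.com\n"
    "To: me@example.com\n"
    "\n"
    '<a href="https://news.example.com/Unsubscribe?u=123">Unsubscribe</a>\n'
    '<img src="https://t.example.com/open?e=me%40example.com">\n'
    "Deals: https://news.example.com/deals\n"
)
print(scrub(raw, ["me@example.com", "reply-7f3a@mail.example.com"]))
```

Keeping the list of redacted addresses and the URL heuristic in one place also makes it easy to record exactly what was scrubbed when writing up notes for the wiki page.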
04:10
|
CoolCanuk |
how important are email headers? |
04:10
|
CoolCanuk |
other than to, from, subject |
04:18
|
Somebody2 |
email headers are VERY USEFUL |
04:18
|
Somebody2 |
if it's feasible to keep them |
04:18
|
Somebody2 |
as they provide fascinating context about how email was routed at particular times in history |
04:19
|
Somebody2 |
and what kind of spam and anti-spam efforts were in use |
04:19
|
Somebody2 |
If it's too painful to scrub them to your satisfaction, though -- better to have the bodies than nothing. |
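Much of that routing history sits in the Received: headers. Each relay prepends its own, so reading them from the bottom up follows the message from sender to inbox, and headers such as Authentication-Results record the anti-spam checks that were applied. A minimal sketch with Python's standard email package; the sample message is made up, using documentation domains and IP ranges, and the helper name is hypothetical.

```python
from email import message_from_string

raw = (
    "Received: from mx.example.net (mx.example.net [192.0.2.10])\n"
    "\tby inbox.example.org; Mon, 1 Jan 2018 10:00:05 -0500\n"
    "Received: from mailer.example.com (mailer.example.com [198.51.100.7])\n"
    "\tby mx.example.net; Mon, 1 Jan 2018 10:00:02 -0500\n"
    "Authentication-Results: mx.example.net; spf=pass smtp.mailfrom=example.com\n"
    "From: Newsletter <news@example.com>\n"
    "To: XXXX@XXXX\n"
    "Subject: Weekly deals\n"
    "\n"
    "Body text.\n"
)


def received_hops(msg):
    """Received headers in travel order, with folded whitespace collapsed."""
    # Each relay prepends its own Received header, so the bottom one is the
    # first hop; reversing the list gives sender -> inbox order.
    return [" ".join(value.split()) for value in reversed(msg.get_all("Received", []))]


msg = message_from_string(raw)
for number, hop in enumerate(received_hops(msg), start=1):
    print(number, hop)
print(msg.get_all("Authentication-Results", []))
```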
04:21
|
CoolCanuk |
found this http://www.spamdex.org |
04:21
|
CoolCanuk |
they support the archive as well BUT I am not sure if they submit to it. |
04:22
|
CoolCanuk |
would be helpful to see how they process the email and display it (how does it work in the backend)? |
04:27
|
Somebody2 |
Hm, that is interesting, yes. Probably worth adding a link on projectnewsletter wiki page. |
04:46
|
CoolCanuk |
sketchcow did reply to me and said he will receive anything@archiveteam.org |
04:51
|
Somebody2 |
OK. |
10:18
|
|
CoolCanuk has quit IRC (Quit: Connection closed for inactivity) |
17:00
|
|
CoolCanuk has joined #projectnewsletter |