Time |
Nickname |
Message |
15:19
|
|
Start has quit IRC (Quit: Disconnected.) |
16:24
|
|
Start has joined #projectnewsletter |
16:24
|
|
Start has quit IRC (Client Quit) |
16:24
|
|
Start has joined #projectnewsletter |
17:07
|
|
Start has quit IRC (Quit: Disconnected.) |
17:12
|
|
Start has joined #projectnewsletter |
18:38
|
|
Start has quit IRC (Quit: Disconnected.) |
19:36
|
|
Start has joined #projectnewsletter |
20:37
|
|
Start has quit IRC (Quit: Disconnected.) |
20:43
|
|
Start has joined #projectnewsletter |
20:45
|
|
Start has quit IRC (Client Quit) |
20:48
|
|
Start has joined #projectnewsletter |
21:19
|
|
nickname has joined #projectnewsletter |
21:19
|
nickname |
hello |
21:21
|
|
nickname has quit IRC (Client Quit) |
21:33
|
|
nickname has joined #projectnewsletter |
21:33
|
nickname |
anyone scraped google yet? |
21:50
|
achip |
nickname, feel free to, it's likely changed since the last time |
21:51
|
nickname |
what should I use? |
21:52
|
achip |
good question. My usual go-to is to just open a browser, enter the search, and ctrl+click to open each page of the results in a tab. Then use a "copy links" extension to copy all the links on the tabs, paste that into a document, then regex |
21:52
|
achip |
it's manually intensive but it works |
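The final "paste that into a document then regex" step could be scripted roughly as follows. This is only a sketch: the pattern, the `extract_links` name, and the trailing-punctuation trimming are assumptions for illustration, not what achip actually used.

```python
import re

# Assumed pattern: anything starting with http:// or https:// up to
# whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(text: str) -> list[str]:
    """Return the unique URLs found in pasted text, in first-seen order."""
    seen = {}
    for url in URL_PATTERN.findall(text):
        # Trim punctuation that commonly trails a URL in running text.
        seen.setdefault(url.rstrip(".,;)"), None)
    return list(seen)


if __name__ == "__main__":
    pasted = (
        "see https://example.org/mailman/listinfo/a, "
        "and http://example.org/b\n"
        "https://example.org/mailman/listinfo/a"
    )
    for link in extract_links(pasted):
        print(link)
```

The example URLs are placeholders. Deduplicating through a dict keeps the order in which links first appear, which makes the result easier to compare against the open tabs.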
21:56
|
nickname |
I found a big list of email pages, only problem is that they're in some strange compressed JS. |
21:57
|
nickname |
Here's an unrelated pastebin scrape link: https://pastebin.com/raw.php?i=0tZYHKRP |
22:03
|
achip |
here's what I regex'd from that list http://paste.nerds.io/raw/qipuzanafa |
22:08
|
nickname |
Here's another link, some of the links on it may be dead and it's in JS: https://pastebin.com/raw.php?i=8LKpiZD6 |
22:14
|
achip |
and what I got from that: http://paste.nerds.io/raw/anadaresug |
22:14
|
nickname |
Thank you |
22:15
|
nickname |
I'm on Windows, so I can't do the awesome command-line text manipulation that the *nix terminal offers |
22:16
|
achip |
no problem, for future reference that was: cat thingy2.txt | egrep -oE "\.mba\":\"[^\"]*" | sed "s/^.mba\":\"//" > thingy2-res.txt |
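For anyone without a *nix shell, the egrep/sed pipeline above can be approximated in Python. This is a sketch, not achip's method: the function names are hypothetical, and it assumes, as the pipeline does, that each wanted value follows the literal text `.mba":"` and runs up to the next double quote on the same line.

```python
import re
from pathlib import Path

# Same idea as: egrep -o '\.mba":"[^"]*' | sed 's/^.mba":"//'
# The capture group keeps only the text after the .mba":" prefix.
PATTERN = re.compile(r'\.mba":"([^"\n]*)')


def extract_values(text):
    """Return every value that follows .mba":" in the given text."""
    return PATTERN.findall(text)


def extract_file(src, dst):
    """Read src, write one extracted value per line to dst, return the count."""
    text = Path(src).read_text(encoding="utf-8", errors="replace")
    values = extract_values(text)
    Path(dst).write_text("".join(v + "\n" for v in values), encoding="utf-8")
    return len(values)


if __name__ == "__main__":
    # Made-up sample input; the real paste contents are not shown in the log.
    sample = 'cb({"a.mba":"http://one.example/x","b.mba":"http://two.example/y"})'
    print(extract_values(sample))
```

With the file names from the chat, the call would be `extract_file("thingy2.txt", "thingy2-res.txt")`.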
22:16
|
|
Start has quit IRC (Quit: Disconnected.) |
22:18
|
nickname |
There's a whole git repository of just gnu mailman links |
22:18
|
achip |
perfect! |
22:19
|
nickname |
It's at bitbucket: https://bitbucket.org/themailbait/themailbait.bitbucket.org/src/501cbbc613d2ebc56d77ccc0f3288c88c0a0d042/jsonp/?at=master |
22:19
|
nickname |
They're all in JS |
22:19
|
nickname |
and there may be some dead links, but it's a start! |
22:21
|
achip |
that mail bait project is interesting (http://www.mailbait.info), at least I think it's the same |
22:22
|
nickname |
It's the same, it's linked in the source of the page |
22:22
|
nickname |
proof: check the source of this page www.mailbait.info/run.html?pack=52 |
22:23
|
achip |
nice, good find |
22:38
|
nickname |
Another one: https://pastebin.com/raw.php?i=2Ui43VaE |
22:47
|
* |
nickname slaps achip around a bit with a large fishbot |
22:49
|
achip |
pretty similar: http://paste.nerds.io/raw/aseyitohum |
22:52
|
nickname |
As for the problem of archiving the email messages, I suggest using Gmail, but with 1,000 dotted variations of one account's address, such as wearegoingtorescue@gmail.com and we.are.going.t.o.res.cu.e@gmail.com |
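The dotted-address idea relies on Gmail ignoring dots in the part before the `@`, so every variant delivers to the same inbox. A rough sketch of generating such variants is below; the function names are hypothetical and nothing in the channel indicates this was ever implemented.

```python
from itertools import combinations, islice


def dot_variants(local):
    """Yield every way of placing single dots between the characters of local."""
    gaps = range(1, len(local))  # positions between adjacent characters
    for count in range(len(local)):  # 0 dots up to len(local) - 1 dots
        for positions in combinations(gaps, count):
            chosen = set(positions)
            yield "".join(
                ("." if i in chosen else "") + ch for i, ch in enumerate(local)
            )


def gmail_variants(address, limit):
    """Return up to limit dotted variants of a Gmail address."""
    local, _, domain = address.partition("@")
    local = local.replace(".", "")  # start from the undotted form
    return [f"{v}@{domain}" for v in islice(dot_variants(local), limit)]


if __name__ == "__main__":
    for addr in gmail_variants("wearegoingtorescue@gmail.com", 5):
        print(addr)
```

An 18-character local part such as `wearegoingtorescue` has 17 gaps, which gives 2^17 = 131,072 possible variants, far more than the 1,000 suggested.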
22:55
|
|
nickname_ has joined #projectnewsletter |
22:55
|
nickname_ |
woah |
22:55
|
nickname_ |
this is weird |
22:56
|
|
nickname has quit IRC (Ping timeout: 240 seconds) |
22:57
|
|
nickname has joined #projectnewsletter |
22:57
|
nickname |
I am here |
22:57
|
nickname |
achip: here is another one: /mailman/subscribe/ |
22:58
|
nickname |
ignore that |
22:58
|
nickname |
I have too much stuff in my clipboard |
23:00
|
|
nickname_ has quit IRC (Ping timeout: 240 seconds) |
23:17
|
nickname |
achip: another one: https://pastebin.com/raw.php?i=WnTrt6NL |
23:29
|
|
Start has joined #projectnewsletter |
23:30
|
|
svchfoo1 sets mode: +o Start |
23:36
|
Start |
achip: does project newsletter regularly upload archived newsletters to archive.org yet? |
23:41
|
arkiver |
Start: the project isn't completely finished |
23:41
|
arkiver |
yet* |
23:41
|
arkiver |
We should get back to work on it soon |
23:44
|
Start |
alright, i was just wondering because mail1-3.newsletter.nerds.io have all been archiving various newsletters for several months |
23:45
|
Start |
i'd personally love to have a warrior project for discovering newsletters, although that would likely be very hard to code |
23:54
|
nickname |
I found a git repo of JS files containing GNU mailman links |
23:54
|
nickname |
Here it is: https://bitbucket.org/themailbait/themailbait.bitbucket.org/src/501cbbc613d2ebc56d77ccc0f3288c88c0a0d042/jsonp/?at=master