[15:19] *** Start has quit IRC (Quit: Disconnected.)
[16:24] *** Start has joined #projectnewsletter
[16:24] *** Start has quit IRC (Client Quit)
[16:24] *** Start has joined #projectnewsletter
[17:07] *** Start has quit IRC (Quit: Disconnected.)
[17:12] *** Start has joined #projectnewsletter
[18:38] *** Start has quit IRC (Quit: Disconnected.)
[19:36] *** Start has joined #projectnewsletter
[20:37] *** Start has quit IRC (Quit: Disconnected.)
[20:43] *** Start has joined #projectnewsletter
[20:45] *** Start has quit IRC (Client Quit)
[20:48] *** Start has joined #projectnewsletter
[21:19] *** nickname has joined #projectnewsletter
[21:19] hello
[21:21] *** nickname has quit IRC (Client Quit)
[21:33] *** nickname has joined #projectnewsletter
[21:33] anyone scraped google yet?
[21:50] nickname, feel free to, it's likely changed since the last time
[21:51] what should I use?
[21:52] good question. My usual go-to is to just open a browser, enter the search and ctrl+click to open each page of the results in a tab. then use a "copy links" extension to copy all the links on the tabs. paste that into a document then regex
[21:52] it's manually intensive but it works
[21:56] I found a big list of email pages, only problem is that they're in some strange compressed JS.
[21:57] Here's an unrelated pastebin scrape link: https://pastebin.com/raw.php?i=0tZYHKRP
[22:03] here's what I regex'd from that list http://paste.nerds.io/raw/qipuzanafa
[22:08] Here's another link, some of the links on it may be dead and it's in JS: https://pastebin.com/raw.php?i=8LKpiZD6
[22:14] and what I got from that: http://paste.nerds.io/raw/anadaresug
[22:14] Thank you
[22:15] I'm on windows, so I can't do the awesome command line text manipulation that is the *nix terminal
[22:16] no problem, for future reference that was: cat thingy2.txt | egrep -oE "\.mba\":\"[^\"]*" | sed "s/^.mba\":\"//" > thingy2-res.txt
[22:16] *** Start has quit IRC (Quit: Disconnected.)
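Note for readers without a *nix shell: the egrep/sed pipeline quoted at [22:16] can be approximated in Python, which also runs on Windows. The following is a minimal sketch, not the command that was actually run. The function names `extract_mba_values` and `convert_file` are made up for illustration, the file names are taken from the quoted command, and the input encoding (UTF-8) is an assumption.

```python
import re

# Same pattern as the egrep call: a literal `.mba":"` followed by everything
# up to (not including) the next double quote. grep works line by line, so
# the character class also excludes newlines. The capture group does the job
# of the sed step, which strips the `.mba":"` prefix from each match.
MBA_VALUE = re.compile(r'\.mba":"([^"\n]*)')


def extract_mba_values(text):
    """Return every value that follows `.mba":"` in text, in order."""
    return MBA_VALUE.findall(text)


def convert_file(src="thingy2.txt", dst="thingy2-res.txt"):
    """Read src, write one extracted value per line to dst, return the count.

    Assumption: the input is UTF-8; undecodable bytes are replaced.
    """
    with open(src, encoding="utf-8", errors="replace") as fh:
        values = extract_mba_values(fh.read())
    with open(dst, "w", encoding="utf-8") as fh:
        for value in values:
            fh.write(value + "\n")
    return len(values)
```

Calling `convert_file()` with no arguments mirrors the file names in the shell command; pass other paths to process a different paste.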
[22:18] There's a whole git repository of just gnu mailman links
[22:18] perfect!
[22:19] It's at bitbucket: https://bitbucket.org/themailbait/themailbait.bitbucket.org/src/501cbbc613d2ebc56d77ccc0f3288c88c0a0d042/jsonp/?at=master
[22:19] They're all in JS
[22:19] and there may be some dead links, but it's a start!
[22:21] that mail bait project is interesting (http://www.mailbait.info) at least I think it's the same
[22:22] It's the same, it's linked in the source of the page
[22:22] proof: check the source of this page www.mailbait.info/run.html?pack=52
[22:23] nice, good find
[22:38] Another one: https://pastebin.com/raw.php?i=2Ui43VaE
[22:47] * nickname slaps achip around a bit with a large fishbot
[22:49] pretty similar: http://paste.nerds.io/raw/aseyitohum
[22:52] As for the problem of archiving the email messages, I suggest using gmail, but having 1,000 variations on 1 account, such as, wearegoingtorescue@gmail.com to we.are.going.t.o.res.cu.e@gmail.com
[22:55] *** nickname_ has joined #projectnewsletter
[22:55] woah
[22:55] this is weird
[22:56] *** nickname has quit IRC (Ping timeout: 240 seconds)
[22:57] *** nickname has joined #projectnewsletter
[22:57] I am here
[22:57] achip: here is another one: /mailman/subscribe/
[22:58] ignore that
[22:58] I have too much stuff in my clipboard
[23:00] *** nickname_ has quit IRC (Ping timeout: 240 seconds)
[23:17] achip: another one: https://pastebin.com/raw.php?i=WnTrt6NL
[23:29] *** Start has joined #projectnewsletter
[23:30] *** svchfoo1 sets mode: +o Start
[23:36] achip: does project newsletter regularly upload archived newsletters to archive.org yet?
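Note on the [22:52] suggestion of running many address variations on one Gmail account: the idea relies on Gmail ignoring dots in the local part of an address, which is worth confirming before depending on it. The sketch below only shows how such variants could be generated. The function name `dot_variants` is hypothetical, and whether any given mailing list accepts dotted addresses is not something the log establishes.

```python
from itertools import islice, product


def dot_variants(local, domain="gmail.com"):
    """Yield local@domain for every way of placing single dots between characters.

    Dots are never placed at the start or end and never doubled, so each gap
    between two characters is either empty or holds exactly one dot.
    """
    if not local:
        return
    gaps = len(local) - 1
    for mask in product(("", "."), repeat=gaps):
        pieces = [local[0]]
        for separator, char in zip(mask, local[1:]):
            pieces.append(separator + char)
        yield "".join(pieces) + "@" + domain


# Take the first 1,000 variants of the address mentioned in the log.
first_thousand = list(islice(dot_variants("wearegoingtorescue"), 1000))
```

Because the generator is lazy, `islice` stops after 1,000 addresses instead of building every combination first.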
[23:41] Start: the project isn't completely finished
[23:41] yet*
[23:41] We should get back to work on it soon
[23:44] alright, i was just wondering because mail1-3.newsletter.nerds.io have all been archiving various newsletters for several months
[23:45] i'd personally love to have a warrior project for discovering newsletters, although that would likely be very hard to code
[23:54] I found a git repo of JS files containing GNU mailman links
[23:54] Here it is: https://bitbucket.org/themailbait/themailbait.bitbucket.org/src/501cbbc613d2ebc56d77ccc0f3288c88c0a0d042/jsonp/?at=master
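Note on pulling mailman links out of JS files like the ones linked above: the layout of those files is not shown in the log, so this sketch treats the input as plain text and keeps any URL whose path contains `/mailman/` (the path fragment pasted at [22:57]). The function name `mailman_urls` is hypothetical, and the handling of `\/` escapes is an assumption about how the JS strings might be written.

```python
import re

# A loose URL matcher: scheme, then everything up to whitespace, a quote,
# an angle bracket, or a backslash. Good enough for scraping, not a validator.
URL = re.compile(r'https?://[^\s"\'<>\\]+')


def mailman_urls(text):
    """Return unique URLs containing '/mailman/', in first-seen order."""
    # Assumption: JS/JSON strings may escape slashes as `\/`; undo that first.
    text = text.replace("\\/", "/")
    seen = set()
    found = []
    for url in URL.findall(text):
        if "/mailman/" in url and url not in seen:
            seen.add(url)
            found.append(url)
    return found
```

Dead links are not filtered out here; checking which list pages still respond would be a separate step.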