[09:22] *** wessel151 has joined #webroasting
[09:24] are there plans to save ziggo home.nl
[09:25] and chello.nl
[10:36] @jaa
[10:36] @JAA
[13:44] wessel151: No specific plans yet, but it's on our radar, and we'll certainly try.
[14:16] wessel151: Do you know if there is a list of webspaces/usernames somewhere?
[14:18] Also, list of affected webspaces: http://members.casema.nl/USERNAME/ http://home.casema.nl/USERNAME/ http://members.home.nl/USERNAME/ http://members.quicknet.nl/USERNAME/ http://members.upc.nl/USERNAME/ http://members.chello.nl/USERNAME/ http://members.ziggo.nl/USERNAME/
[14:29] @JAA i am currently using reverse search on DuckDuckGo
[14:30] just site:http://members.home.nl/
[14:30] Right
[14:31] We could use a good Dutch word list of likely candidates for archival.
[14:31] Here's one I used before for TalkTalk (UK): 'family' 'genealogy' 'club' 'society' 'clan' 'company' 'ltd' 'home' 'index' 'wedding' 'school' 'college' 'archive' 'history' 'document' 'church' 'band' 'manual' 'product'
[14:35] you want a Dutch dictionary of the same sort
[14:36] Yeah, just words commonly used on websites that would be hosted on such ISP webspaces.
[14:43] i found the first 20,000 names already
[14:46] JAA can i upload them somewhere
[14:47] maybe a new GitHub project
[14:58] wessel151: Do you mean the word list or usernames?
[14:59] word list
[14:59] Hmm, yeah, it has to be quite small to be effective. I want to use it for search engine scraping.
[15:00] Bing's the only search engine that lets you persistently scrape, but only at a very slow pace.
[15:02] So something like a few dozen to maybe a couple hundred words.
[15:26] JAA: can we use this https://gathering.tweakers.net/forum/find?keyword=http%3A%2F%2Fmembers.home.nl%2F#filter:q1bKTq0szy9KUbJSyigpKbCK0Y_Rz03NTUotKtbLyM9N1cvLidFX0lEqSExPVbIyhDCCM6tAHAODWgA
[15:34] wessel151: Sure. Someone just needs to write a scraper for it.
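Editor's sketch: once page text has been collected from a forum search or similar source, the webspace usernames could be pulled out with a regular expression built from the host list quoted at 14:18. This is only an illustration, not something written in the channel. The function name `extract_webspaces` and the character set allowed in a username are assumptions.

```python
import re

# Hosts taken from the list of affected webspaces in the log.
HOSTS = [
    "members.casema.nl",
    "home.casema.nl",
    "members.home.nl",
    "members.quicknet.nl",
    "members.upc.nl",
    "members.chello.nl",
    "members.ziggo.nl",
]

# Assumption: the username is the first path segment and contains only
# letters, digits, dots, underscores, tildes, or hyphens.
USER_RE = re.compile(
    r"https?://(" + "|".join(re.escape(h) for h in HOSTS) + r")/([A-Za-z0-9._~-]+)",
    re.IGNORECASE,
)


def extract_webspaces(text):
    """Return a sorted list of unique (host, username) pairs found in text."""
    found = set()
    for match in USER_RE.finditer(text):
        host = match.group(1).lower()
        # Drop a trailing full stop picked up from sentence punctuation.
        user = match.group(2).rstrip(".")
        if user:
            found.add((host, user))
    return sorted(found)
```

Files that sit directly under the host root (for example an `index.html`) would be reported as usernames by this pattern, and percent-encoded URLs are not matched, so the output would still need a manual pass.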
[15:36] i can do it manually
[15:36] it's very small
[17:04] *** Ryz has joined #webroasting
[17:13] *** dock has joined #webroasting
[17:19] *** dock has quit IRC (Quit: Page closed)
[18:08] *** qwebirc12 has joined #webroasting
[18:08] test
[18:08] *** qwebirc12 has left
[18:09] *** wessel152 has joined #webroasting
[20:46] JAA can you do something with this https://cse.google.nl/cse?cx=013325157016033957817:6qqkwllcjas
[20:46] i can make API calls to it
[21:07] *** wessel152 has quit IRC (Ping timeout: 260 seconds)
[23:07] *** logchfoo1 starts logging #webroasting at Mon Mar 02 23:07:06 2020
[23:07] *** logchfoo1 has joined #webroasting
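Editor's sketch: the `site:` searches and the small word list discussed in the log could be combined into a fixed set of search queries, roughly as below. The hosts and words are copied from the log; the function name `build_queries` and the bare-hostname form of the `site:` operator are assumptions, and the English list would be swapped for a Dutch one once it exists. How the queries are submitted (and at what rate) is left out on purpose.

```python
from itertools import product

# Hosts taken from the list of affected webspaces in the log.
HOSTS = [
    "members.casema.nl",
    "home.casema.nl",
    "members.home.nl",
    "members.quicknet.nl",
    "members.upc.nl",
    "members.chello.nl",
    "members.ziggo.nl",
]

# Word list quoted in the log (previously used for TalkTalk, UK).
WORDS = [
    "family", "genealogy", "club", "society", "clan", "company", "ltd",
    "home", "index", "wedding", "school", "college", "archive", "history",
    "document", "church", "band", "manual", "product",
]


def build_queries(hosts, words):
    """Return one 'site:HOST WORD' query string per host/word pair."""
    return [f"site:{host} {word}" for host, word in product(hosts, words)]


if __name__ == "__main__":
    for query in build_queries(HOSTS, WORDS):
        print(query)
```

With 7 hosts and 19 words this gives 133 queries, which is in line with the "few dozen to maybe a couple hundred words" budget mentioned at 15:02 only if the word list stays short.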