[00:17] *** Start has joined #webroasting
[01:35] *** chfoo has quit IRC (Ping timeout: 499 seconds)
[01:35] *** chfoo has joined #webroasting
[01:36] *** svchfoo3 sets mode: +o chfoo
[02:08] *** Start has quit IRC (Read error: Connection reset by peer)
[02:08] *** Start has joined #webroasting
[02:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:23] *** dashcloud has joined #webroasting
[02:41] *** chfoo0 has joined #webroasting
[02:45] *** chfoo has quit IRC (Ping timeout: 260 seconds)
[02:47] *** chfoo0 is now known as chfoo
[15:11] *** Start has quit IRC (Quit: Disconnected.)
[15:47] *** Start has joined #webroasting
[17:08] *** Start has quit IRC (Quit: Disconnected.)
[19:20] *** Start has joined #webroasting
[20:26] so i'd like to start automating the web hosting project
[20:26] probably through a warrior project
[20:27] it would work something like this:
[20:27] 1. get a list of urls from google, bing, etc.
[20:27] *manually get
[20:28] 2. put the urls into the project
[20:28] 3. sites are grabbed and any other isp hosted sites linked to are grabbed as well
[20:32] one issue is how sites with hidden subdirectories would be handled (e.g. http://homepage.ntlworld.com/ashen1/ and http://homepage.ntlworld.com/ashen1/ashen/)
[20:34] discovery will be trivial for hostings with numerical usernames
[20:35] a dictionary based method of discovery (common words and names with numbers) could work for the rest
[20:36] maybe setup an irc bot where people can add lists of isp hosted sites to the grab
[20:45] *** Start has quit IRC (Quit: Disconnected.)
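
The discovery ideas raised at 20:34 and 20:35 (numeric usernames, dictionary words with numbers) and the subdirectory issue from 20:32 could be sketched roughly as below. This is only a minimal Python sketch under stated assumptions, not the project's actual code: the function names, the `http://example.com/` base URL, and the idea that the username is always the first path segment are hypothetical and chosen for illustration.

```python
from urllib.parse import urlsplit


def numeric_candidates(base_url, start, stop, width=0):
    """Yield candidate user URLs for a host assumed to use numeric usernames.

    `width` zero-pads the number (e.g. width=4 turns 7 into "0007").
    """
    for n in range(start, stop):
        yield f"{base_url}{str(n).zfill(width)}/"


def dictionary_candidates(base_url, words, max_suffix):
    """Yield each word alone, then the word followed by 0..max_suffix."""
    for word in words:
        yield f"{base_url}{word}/"
        for n in range(max_suffix + 1):
            yield f"{base_url}{word}{n}/"


def user_root(url):
    """Return scheme://host/first-segment/, the assumed root of a user site.

    Grouping discovered URLs by this root is one possible way to treat a
    subdirectory and its parent as the same site.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return f"{parts.scheme}://{parts.netloc}/"
    return f"{parts.scheme}://{parts.netloc}/{segments[0]}/"


if __name__ == "__main__":
    # Hypothetical base URL; a real run would use the hosting service's own.
    for candidate in dictionary_candidates("http://example.com/", ["ashen"], 1):
        print(candidate)
    print(user_root("http://homepage.ntlworld.com/ashen1/ashen/"))
```

The generators only produce candidate URLs; whether a candidate exists would still have to be checked by requesting it. `user_root` does not find hidden subdirectories by itself, it only maps any URL that is already known (for example from a search engine list) back to the user directory it sits under.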