#wikiteam 2012-03-26, Mon


Time Nickname Message
11:59 emijrp hey guys
12:01 emijrp i would like to scan the internet for MediaWiki wikis
12:01 emijrp sorted by whether they have an API, by size, and by last edit (to find probably abandoned wikis)
12:02 emijrp then download them from most inactive to most active
12:02 emijrp there is a wikicrawler, but the .csv is from 2009
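
A minimal sketch of that inactivity ranking, not from the log: assuming each wiki exposes a standard api.php, the most recent change can be fetched with list=recentchanges and used as the sort key. The helper name last_edit and the wiki list are illustrative.

    import json
    import urllib.request

    def last_edit(api_url):
        # Ask the MediaWiki API for the single most recent change.
        query = "?action=query&list=recentchanges&rclimit=1&format=json"
        try:
            with urllib.request.urlopen(api_url + query, timeout=30) as resp:
                data = json.loads(resp.read().decode("utf-8"))
            changes = data["query"]["recentchanges"]
            return changes[0]["timestamp"] if changes else None
        except Exception:
            return None  # no reachable API or unexpected response

    wikis = ["http://www.lolcatbible.com/api.php"]  # illustrative list
    # ISO 8601 timestamps sort lexicographically: oldest edit first,
    # wikis without a reachable API pushed to the end.
    wikis.sort(key=lambda w: last_edit(w) or "9999")
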
12:05 ersi sounds great
12:06 ersi I'm working on a little webcrawler that just finds <a href=""> links and extracts whatever is in href="" - haven't come far yet, but I bet one could put in some "signatures" of a MediaWiki page and make it find those as well
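
A minimal sketch of ersi's idea, not from the log: pull href="" targets with a regex and test the page against markup "signatures" that stock MediaWiki emits. The signature strings and the scan_page name are assumptions based on default MediaWiki output.

    import re
    import urllib.request

    HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)
    SIGNATURES = (
        'name="generator" content="MediaWiki',  # default <meta> tag
        'Powered by MediaWiki',                 # default footer text
    )

    def scan_page(url):
        # Fetch one page, collect its links, and flag MediaWiki lookalikes.
        with urllib.request.urlopen(url, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        links = HREF_RE.findall(html)
        looks_like_mediawiki = any(sig in html for sig in SIGNATURES)
        return links, looks_like_mediawiki
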
12:16 emijrp http://www.lolcatbible.com/index.php?title=Main_Page
12:19 emijrp https://www.google.es/#q=%22This+page+was+last+modified+on%22+%22This+page+has+been+accessed%22
12:19 emijrp is there a way to avoid dupe domains in Google results?
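
One answer is to dedupe client-side after harvesting the result URLs: keep only the first hit per domain. A minimal sketch, with unique_domains as an illustrative name:

    from urllib.parse import urlparse

    def unique_domains(result_urls):
        # Keep the first URL seen for each domain, drop the rest.
        seen = set()
        unique = []
        for url in result_urls:
            domain = urlparse(url).netloc.lower()
            if domain and domain not in seen:
                seen.add(domain)
                unique.append(url)
        return unique
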
12:53 Nemo_bis I guess that's a question for alard
