[11:59] hey guys
[12:01] I would like to scan the internet for MediaWiki wikis
[12:01] sort them by API or no API, size, and last edit (probably abandoned wikis)
[12:02] then download from the most inactive to the most active ones
[12:02] there is a wikicrawler, but the .csv is from 2009
[12:05] sounds great
[12:06] I'm working on a little webcrawler to just find links and extract whatever is in href="" - haven't come far yet, but I bet one could put in some "signatures" of a MediaWiki page and make it find those as well
[12:16] http://www.lolcatbible.com/index.php?title=Main_Page
[12:19] https://www.google.es/#q=%22This+page+was+last+modified+on%22+%22This+page+has+been+accessed%22
[12:19] is there a way to avoid duplicate domains in Google results?
[12:53] I guess that's a question for alard
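
Editor's note: the idea at 12:06 (pull out href="" targets and check for MediaWiki "signatures") could look roughly like the minimal Python sketch below. The signature strings are common MediaWiki markers (meta generator tag, "Powered by MediaWiki" footer, the "This page was last modified on" line), though customised skins may hide some of them; the User-Agent string and function names here are illustrative, not from the chat.

```python
# Minimal sketch: fetch a page, extract href="" targets, and test a few
# common MediaWiki signatures. Not the crawler discussed in the chat,
# just an illustration of the signature idea.
import re
import urllib.request

HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)

MEDIAWIKI_SIGNATURES = [
    'name="generator" content="MediaWiki',   # meta generator tag
    'Powered by MediaWiki',                  # default footer credit
    'This page was last modified on',        # default footer timestamp line
    '/index.php?title=',                     # default article-link style
]

def fetch(url, timeout=30):
    """Download a page as text; return '' on any error."""
    try:
        # User-Agent is a placeholder, not a real project identifier.
        req = urllib.request.Request(url, headers={"User-Agent": "wiki-scan-sketch/0.1"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""

def extract_hrefs(html):
    """Return every href="" target found in the page."""
    return HREF_RE.findall(html)

def looks_like_mediawiki(html):
    """True if the page contains any of the usual MediaWiki markers."""
    return any(sig in html for sig in MEDIAWIKI_SIGNATURES)

if __name__ == "__main__":
    seed = "http://www.lolcatbible.com/index.php?title=Main_Page"
    html = fetch(seed)
    print("MediaWiki?", looks_like_mediawiki(html))
    for link in extract_hrefs(html)[:20]:
        print(link)
```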
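Editor's note: for the "sort by API or no API, size, and last edit, then download from most inactive first" part (12:01-12:02), one possible approach is to probe api.php and pull the standard siteinfo statistics plus the newest recentchanges entry, as sketched below. The api.php path list, helper names, and seed wiki are assumptions; wikis without an enabled API would have to be scraped from the "This page was last modified on" footer instead.

```python
# Sketch of ranking candidate wikis by inactivity via the MediaWiki API,
# assuming api.php is enabled and reachable at one of the usual paths.
import json
import urllib.parse
import urllib.request

COMMON_API_PATHS = ["/api.php", "/w/api.php", "/wiki/api.php"]

def api_get(api_url, params, timeout=30):
    """Call api.php with format=json; return the decoded response or None."""
    params = dict(params, format="json")
    url = api_url + "?" + urllib.parse.urlencode(params)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8", errors="replace"))
    except Exception:
        return None

def find_api(base_url):
    """Probe the usual api.php locations; return the first that answers."""
    for path in COMMON_API_PATHS:
        candidate = base_url.rstrip("/") + path
        if api_get(candidate, {"action": "query", "meta": "siteinfo"}):
            return candidate
    return None

def wiki_stats(api_url):
    """Return (page count, timestamp of the most recent change)."""
    size = api_get(api_url, {"action": "query", "meta": "siteinfo",
                             "siprop": "statistics"})
    last = api_get(api_url, {"action": "query", "list": "recentchanges",
                             "rclimit": "1", "rcprop": "timestamp"})
    pages = size["query"]["statistics"]["pages"] if size else None
    changes = last["query"]["recentchanges"] if last else []
    last_edit = changes[0]["timestamp"] if changes else None
    return pages, last_edit

if __name__ == "__main__":
    wikis = ["http://www.lolcatbible.com"]   # candidate list from the crawler
    rows = []
    for base in wikis:
        api = find_api(base)
        if api:
            pages, last_edit = wiki_stats(api)
            rows.append((last_edit or "", pages, base))
    # ISO timestamps sort chronologically, so the oldest last edit comes
    # first: most inactive wikis at the top of the download queue.
    for last_edit, pages, base in sorted(rows):
        print(last_edit, pages, base)
```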