Time | Nickname | Message
11:59 | emijrp | hey guys
12:01 | emijrp | i would like to scan the internet for mediawiki wikis
12:01 | emijrp | sort by api or not api, size, and last edit (probably abandoned wikis)
12:02 | emijrp | then download them, from the most inactive to the most active ones
12:02 | emijrp | there is a wikicrawler, but the .csv is from 2009
12:05 | ersi | sounds great
12:06 | ersi | I'm working on a little webcrawler to just find <a href=""> links and extract whatever is in href="" - haven't come far yet, but I bet one could put in some "signatures" of a mediawiki page and make it find that as well
12:16 | emijrp | http://www.lolcatbible.com/index.php?title=Main_Page
12:19 | emijrp | https://www.google.es/#q=%22This+page+was+last+modified+on%22+%22This+page+has+been+accessed%22
12:19 | emijrp | is there a way to avoid dupe domains in google results?
12:53 | Nemo_bis | I guess that's a question for alard