Time |
Nickname |
Message |
04:29
🔗
|
Nemo_bis |
jack293: just updates for now |
04:29
🔗
|
Nemo_bis |
wikiapiary is down so I couldn't make a list of unarchived wikis |
07:30
🔗
|
jack293 |
Nemo_bis: how long has wikiapiary been down? Did it die? |
08:31
🔗
|
Nemo_bis |
jack293: I think it was still up last week |
08:32
🔗
|
Nemo_bis |
Specifically, I'm pretty sure it was up as of May 2 https://www.mediawiki.org/w/index.php?title=Wikimedia_Language_engineering/Reports/2018-April&diff=2770809 |
09:10
🔗
|
jack293 |
28,000 wikispaces found, still 270,000 profiles to explore |
09:12
🔗
|
jack293 |
if you check the HTML source, there is an ID, for example id: '4631883' for https://iuccommonsproject.wikispaces.com/ |
09:13
🔗
|
jack293 |
I have seen wikis with IDs over 20 million, though I don't know if there is a way to find a wiki by its ID |
09:13
🔗
|
jack293 |
so for now I rely on a spider |
09:14
🔗
|
jack293 |
I assume there are over 20 million wikis (probably including deleted ones, which are unreachable) |
09:16
🔗
|
jack293 |
and including millions of single-page (home page only) low-interest wikis |
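
The ID lookup jack293 describes can be sketched roughly as below. This is only an illustration: it assumes the ID appears in the page source in the quoted shape `id: '4631883'`; the function name `extract_wiki_id` and the sample markup are hypothetical, and the real Wikispaces markup may differ.

```python
import re
from typing import Optional

# Assumption: the numeric ID appears in the page source as  id: '<digits>'
# (the shape quoted in the log); the real markup around it is unknown.
_WIKI_ID_RE = re.compile(r"\bid:\s*'(\d+)'")


def extract_wiki_id(html: str) -> Optional[int]:
    """Return the first numeric ID found in the page source, or None."""
    match = _WIKI_ID_RE.search(html)
    return int(match.group(1)) if match else None


# Hypothetical snippet standing in for a downloaded home page.
sample = "var wiki = { id: '4631883', name: 'iuccommonsproject' };"
wiki_id = extract_wiki_id(sample)
```

A spider could record this ID next to each wiki URL it finds, which would help estimate how sparse the ID space is even though no lookup by ID is known.
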
12:16
🔗
|
|
davisonio has quit IRC () |
12:16
🔗
|
|
davisonio has joined #wikiteam |
12:38
🔗
|
jack293 |
wikispaces script is ready, it downloads and uploads to IA |
12:39
🔗
|
jack293 |
have fun and report any errors |
13:21
🔗
|
jack293 |
31,000 wikis found |
15:18
🔗
|
|
logchfoo0 starts logging #wikiteam at Sun May 06 15:18:12 2018 |
15:18
🔗
|
|
logchfoo0 has joined #wikiteam |
15:53
🔗
|
|
charles81 has quit IRC () |
15:53
🔗
|
|
charles81 has joined #wikiteam |
18:09
🔗
|
Nemo_bis |
300k pages of spam, soon duly archived for posterity http://www.scanbc.com/wiki/index.php?title=Main_Page |
18:10
🔗
|
Nemo_bis |
The spambots are cheerily edit warring www.scanbc.com/wiki/index.php?diff=932789 |
18:44
🔗
|
jack293 |
first 75 wikispaces archived https://archive.org/search.php?query=subject%3A%22wikispaces%22&and%5B%5D=mediatype%3A%22web%22&sort=-publicdate |
19:06
🔗
|
Nemo_bis |
6239 MediaWiki wikis updated so far |
19:07
🔗
|
Nemo_bis |
The remaining 4k are stuck on some error... I have little energy to fix those; better to just add an alternative where we download the XML from the API for all revisions, 500 at a time |
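
The "500 at a time" idea can be sketched as a continuation loop against the MediaWiki API. This is a minimal sketch under stated assumptions, not the script Nemo_bis had in mind: it uses `list=allrevisions` with `arvlimit`, requests JSON rather than export XML to keep the paging logic visible, and the names `iter_revision_batches` and `http_fetch` are hypothetical. The server may cap the effective batch size lower when revision content is requested.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

Params = Dict[str, str]


def iter_revision_batches(
    fetch: Callable[[Params], dict], batch_size: int = 500
) -> Iterator[List[dict]]:
    """Yield one batch of results per API request until no 'continue' is returned."""
    base: Params = {
        "action": "query",
        "list": "allrevisions",
        "arvlimit": str(batch_size),
        "arvprop": "ids|timestamp|user|comment|content",
        "format": "json",
    }
    cont: Params = {}
    while True:
        data = fetch({**base, **cont})
        yield data.get("query", {}).get("allrevisions", [])
        if "continue" not in data:
            break
        # Feed every continuation value back into the next request.
        cont = {key: str(value) for key, value in data["continue"].items()}


def http_fetch(api_url: str) -> Callable[[Params], dict]:
    """Hypothetical helper: build a fetch function for a given api.php URL."""
    def fetch(params: Params) -> dict:
        url = api_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            return json.load(response)
    return fetch
```

Passing the fetch function in as a parameter keeps the paging loop testable without network access; a real run would call `iter_revision_batches(http_fetch(...))` with the wiki's own api.php URL.
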
20:32
🔗
|
jack293 |
good job Nemo_bis |
21:03
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |