[09:11] 2,000 wikidots found so far, still finding more
[09:12] we could use warrior/tracker or an ad-hoc script to export wikicode+images, or both approaches
[09:13] i think wikidot is the largest non-mediawiki wikifarm without a dump
[09:14] the last one was wikispaces, and it was archived with warrior (400,000 wikispaces) and an ad-hoc script (200,000 wikispaces)
[11:44] *** kiska18 has quit IRC (Ping timeout (120 seconds))
[11:47] *** kiska18 has joined #wikiteam
[11:48] *** Iglooop1 sets mode: +o kiska18
[13:33] 3,000 wikidots found
[14:06] VoynichCr: are you still using a Google scraper?
[14:29] *** LowLevelM has joined #wikiteam
[15:14] *** vicarage_ has joined #wikiteam
[15:16] Hi. I've just spent 9 months porting our 30,000-page wiki from wikidot to mediawiki. It was not an easy process, so I suspect if
[15:16] wikidot goes down, most of its content will go with it
[15:55] We had to write special software to extract the page metadata (title, tags, aka categories) to combine with the wikidot-provided backup, which just provided page content
[16:00] Backups can only be done by administrators, not merely by members who've contributed to each wiki
[16:21] vicarage_: hello, welcome to #wikiteam
[16:21] any and all code you used for your migration would be very welcome if published under a free license, so it can possibly be used for migration or export tools
[16:23] LowLevelM: no, I don't think archivebot is able to cycle through all the content on wikidot wikis
[16:24] Maybe just the forum?
[16:24] It's the world's nastiest bash and sed script, combined with a colleague's nasty python. But you are welcome to it
[16:41] ArchiveTeam *thrives* on nasty code.
[16:55] heh
[16:56] vicarage_: we can probably recycle regular expressions, for instance
[16:57] LowLevelM: sure, the forum might be fine
[17:00] Meanwhile, the admin of http://editthis.info/ mysteriously came back from the dead. Over five years ago I would have bet the death of the wiki farm was imminent. :)
[18:24] Nemo_bis: if you append /random-site.php to any wikidot wiki, you jump to another one
[18:24] i used that, but i only got 3155 wikis
[18:25] i scraped the four wiki suggestions that are sometimes available at the bottom of any wiki
[18:25] 3155 seems low to me... but i can't get more using this scheme
[18:29] wikidot list posted on wikiteam github
[18:37] That list is definitely not complete; it does not include my big wiki fancyclopedia.wikidot.com, nor a trivial testfancy.wikidot.com.
[18:40] My colleague wrote an API-based download that uses a key which is only available to administrators of paid sites, so not much use for external archivers
[18:44] When I tried /random-page.php 5 times, I got all 5 in your file, so it suggests that it just gives access to a curated sub-selection
[19:57] Which makes sense
[20:05] Can the list of users be used for discovery? e.g. https://www.wikidot.com/user:info/andraholguin69
[20:05] The commoncrawl.org dataset seems to have quite a few, see e.g. https://index.commoncrawl.org/CC-MAIN-2019-47-index?url=*.wikidot.com&output=json (warning, big page)
[20:21] JAA: yeah i was about to say :P
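
A minimal sketch of the subdomain harvesting suggested above via the commoncrawl.org index: stream the CDX JSON records for *.wikidot.com and collect the unique hostnames. The snapshot name comes from the URL in the log; pagination and retries are left out, so treat this as an untested illustration under those assumptions, not a WikiTeam tool.

    # Harvest wikidot subdomains from the Common Crawl CDX index (CC-MAIN-2019-47).
    # The index returns one JSON record per line; we only keep the hostname.
    import json
    import urllib.request
    from urllib.parse import urlsplit

    INDEX = ("https://index.commoncrawl.org/CC-MAIN-2019-47-index"
             "?url=*.wikidot.com&output=json")

    subdomains = set()
    with urllib.request.urlopen(INDEX) as resp:
        for line in resp:
            try:
                record = json.loads(line)
            except ValueError:
                continue  # skip malformed lines
            host = urlsplit(record["url"]).hostname or ""
            if host.endswith(".wikidot.com") and host != "www.wikidot.com":
                subdomains.add(host)

    for host in sorted(subdomains):
        print(host)

For the full result set (the log warns it is a "big page") a real run would also need to page through the index rather than pull one response.
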
[20:26] Via a convoluted route. If you have a wiki, you can send invites, and after you've typed 2 characters, the user list appears, which you could scrape and see what wikis they were members of
[20:31] Note it only shows some 100 usernames, so for aa it gets to 'aaa*'. Sending a message, the dropdown list is much shorter, only 10 names
[20:45] A Google search on a page on every wiki, 'site:wikidot.com -www search:site', gives 129,000 hits. But I expect you lot know more about subdomain finders than me
[20:47] Usually we try to search for some content that appears on every subdomain and doesn't fall foul of Google's deduplication (yes, contradictory requirements), a bit like Special:Version on MediaWiki
[20:57] site:wikidot.com "system:list-all-pages" gives 6,400 clean results
[21:18] vicarage_: what is your estimate for the number of wikis?
[21:19] according to the wikidot mainpage, there are 80+ million pages
[21:19] even in my list there are a lot of test wikis and spam ones
[21:22] Nemo_bis: the "member of" list loads after a click, via javascript
[21:23] never scraped dynamic content, tips welcome
[21:36] VoynichCr: seems to be a simple POST request of the kind curl 'https://www.wikidot.com/ajax-module-connector.php' --data 'user_id=3657632&moduleName=userinfo%2FUserInfoMemberOfModule&callbackIndex=1&wikidot_token7=abcd'
[21:37] So it might be enough to scrape the user pages from the commoncrawl.org list and then query the corresponding user IDs with this "API", at a reasonable speed
[21:37] The Wikipedia page suggests 150,000 sites. I had a look, and no running tally seems available.
[21:45] The wikis in the search engine list would be the most useful ones to save. http://community.wikidot.com/whatishot shows the activity level
[21:51] https://web.archive.org/web/20131202221632/http://www.wikidot.com/stats has 128,000 sites in 2009
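
Building on the curl command at [21:36], a rough Python sketch of the per-user "member of" lookup. The example user ID, the fixed token value, the idea that the token must also be sent as a cookie, and the JSON shape of the reply are all assumptions drawn from the log, not a confirmed Wikidot API contract.

    # Query which wikis a user belongs to via ajax-module-connector.php,
    # mirroring the curl command posted in the channel. Untested sketch.
    import time
    import requests

    AJAX = "https://www.wikidot.com/ajax-module-connector.php"
    TOKEN = "abcd"  # arbitrary token; assumed to be required in both the form and the cookie

    def member_of(user_id):
        """Fetch the raw UserInfoMemberOfModule response for one user ID."""
        resp = requests.post(
            AJAX,
            data={
                "user_id": user_id,
                "moduleName": "userinfo/UserInfoMemberOfModule",
                "callbackIndex": "1",
                "wikidot_token7": TOKEN,
            },
            cookies={"wikidot_token7": TOKEN},
            timeout=30,
        )
        resp.raise_for_status()
        # The reply is expected to be JSON with the wiki list as HTML in "body".
        return resp.json()

    # Walk user IDs scraped from user:info pages, rate-limited as suggested above.
    for uid in [3657632]:
        data = member_of(uid)
        print(uid, data.get("status"), len(data.get("body", "")))
        time.sleep(2)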