[01:15] \o/
[11:22] snails
[11:22] snails everywhereeee
[12:07] * Nemo_bis hands a hammer
[12:07] useful also if they lose an s
[18:15] if i pull wikia backups should i put them in https://archive.org/details/wikiteam ?
[18:33] pft: you can't
[18:34] but please add WikiTeam keyword so that an admin can later move them
[18:34] ok
[18:34] pft: What sort of Wikia backups are you pulling?
[18:34] just their own dumps
[18:34] yes but which
[18:34] memory alpha and wookiepedia at the moment
[18:34] i might pull more later
[18:34] ah ok, little scale
[18:34] yes
[18:35] wookiepedia is 400m so it's pretty small
[18:35] are their dumps complete?
[18:35] i haven't tested yet
[18:35] I'm in contact with them for them to upload to archive.org all their dumps, but I've been told it needs to be discussed in some senior staff meeting
[18:35] i will ungzip at home and probably load into mediawikis
[18:35] balrog: define complete?
[18:36] have all the data that's visible
[18:36] Nemo_bis: why is a senior staff meeting required, if I may ask?
[18:36] how would I know :)
[18:36] and no, not all data visible, that's impossible with XML dumps
[18:37] but all data needed to make all the data which is visible, minus logs and private user data :) except I don't see image dumps any longer and they don't dump all wikis
[18:37] yeah i didn't see any image dumps anywhere which is frustrating
[18:37] they don't rpovide image dumps? :/
[18:37] provide*
[18:38] and not all wikis? can wiki administrators turn it off individually?
[18:38] well dumps.wikia.net appears to be gone and the downloads available on Special:Statistics seem to be generated by staff and have a limited duration
[18:38] http://en.memory-alpha.org/wiki/Special:Statistics
[18:38] that page has a "current pages and history" but i don't see anything about images
[18:38] it never did but they made them nevertheless
[18:39] perhaps we just need to find out the filenames
[18:39] s3://wikia_xml_dumps/w/wo/wowwiki_pages_current.xml.gz etc
[18:39] for images
[18:40] this is how it used to be https://ia801507.us.archive.org/zipview.php?zip=/28/items/wikia_dump_20121204/c.zip
[18:41] hmm
[18:41] wikia's source code is open
[18:41] including the part that uploads the dumps to S3
[18:41] interesting
[18:42] https://github.com/Wikia/app/blob/dev/extensions/wikia/WikiFactory/Close/maintenance.php
[18:42] look for DumpsOnDemand::putToAmazonS3
[18:42] well, not actually all of it
[18:42] though they are working on open sourcing it all
[18:43] Nemo_bis: interesting
[18:43] https://github.com/Wikia/app/blob/dev/extensions/wikia/WikiFactory/Dumps/DumpsOnDemand.php
[18:43] "url" => 'http://s3.amazonaws.com/wikia_xml_dumps/' . self::getPath( "{$wgDBname}_pages_current.xml.gz" ),
[18:43] "url" => 'http://s3.amazonaws.com/wikia_xml_dumps/' . self::getPath( "{$wgDBname}_pages_full.xml.gz" ),
[18:43] don't see anything for images
[18:45] yeah, they appear as tars in the link Nemo_bis pasted
[18:45] i'm guessing that was more of a manual thing they did
[18:45] "Wikia does not perform dumps of images (but see m:Wikix)."
[18:45] http://meta.wikimedia.org/wiki/Wikix
[18:46] ...interesting
[18:46] that will extract and grab all images in an xml dump
[18:46] nice1
[18:46] er nice!
[18:46] pft: it was not manual
[18:46] ahh o
[18:46] er ok
[18:47] wikix is horribly painful
[18:47] and it's not designed to handle 300k wikis
[18:47] ahhh
[18:47] sorry, i realize this is all stuff you have been down before
[18:48] Nemo_bis: really? :/
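
A minimal sketch of how the DumpsOnDemand.php URL pattern quoted above could be reproduced outside Wikia's code, assuming getPath() only shards by the first one and two characters of the filename (an inference from the wowwiki path earlier in the log, not confirmed against Wikia's implementation); the function name dump_url is made up for illustration:

    # Sketch: rebuild the public S3 URL for a Wikia XML dump from its database name.
    # Assumption: getPath() shards by the first one and two characters of the filename,
    # as the w/wo/wowwiki_pages_current.xml.gz example above suggests.
    BASE = "http://s3.amazonaws.com/wikia_xml_dumps/"

    def dump_url(dbname, which="current"):
        # "which" is "current" or "full", matching the two URLs in DumpsOnDemand.php
        filename = "%s_pages_%s.xml.gz" % (dbname, which)
        return "%s%s/%s/%s" % (BASE, filename[0], filename[:2], filename)

    # dump_url("wowwiki") ->
    # http://s3.amazonaws.com/wikia_xml_dumps/w/wo/wowwiki_pages_current.xml.gz
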
[18:48] just trying to figure out how to help
[18:48] Nemo_bis: is there a reference to what's been done?
[18:55] balrog: about?
[18:55] with regards to what tools have been tested and such
[18:56] for what
[18:57] dumping large wikis
[18:57] most Wikia wikis are very tiny
[18:57] there isn't much to test, we only need to see if Wikia is helpful or not
[18:58] if it's not helpful, we'll have to run dumpgenerator on all their 350k wikis to get all the text and images
[18:58] ouch
[18:58] but that's not particularly painful, just a bit boring
[18:58] how difficult would it be to submit a PR to their repo that would cause images to also be archived?
[18:58] unless they go really rogue and disable API or so, which I don't think they'd do though
[18:59] they allegedly have problems with space
[18:59] how many wikis have we run into which have disabled API access?
[18:59] this is probably what the seniors have to discuss, whether to spend $10 instead of $5 for the space on S3 :)
[18:59] thousands
[18:59] how do we dump those? :/
[18:59] with pre-API method
[18:59] Special:Export
[19:00] some disable even that, but it's been only a couple wikis so far
[19:00] i tried to grab memory-alpha but couldn't find the api page for it before i did more reading and found that I could download the dump
[19:00] usually the problem with wiki sysadmins is stupidity, not malice
[19:04] same with forums, too
[19:06] :)
[19:08] what's the best way to dump forums though? they're not as rough on wget at least
[19:11] we need to start contributing to open-source projects to put in easy backup things that are publicly enabled by default ;)
[19:12] pft: you're welcome :) https://bugzilla.wikimedia.org/buglist.cgi?resolution=---&query_format=advanced&component=Export%2FImport
[19:13] nice
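
A minimal sketch of the pre-API Special:Export fallback mentioned above, in Python; the parameter names (pages, curonly, templates) follow stock MediaWiki and the wiki URL is a placeholder, so treat this as an assumption rather than a drop-in tool:

    # Sketch: fetch the current revision of some pages via Special:Export
    # when api.php is unavailable. Targets stock MediaWiki; customized
    # wikis may behave differently.
    import urllib.parse
    import urllib.request

    def export_pages(index_url, titles):
        """POST page titles to Special:Export and return the XML dump bytes."""
        data = urllib.parse.urlencode({
            "title": "Special:Export",
            "action": "submit",
            "pages": "\n".join(titles),  # one title per line
            "curonly": "1",              # latest revision only
            "templates": "1",            # include transcluded templates
        }).encode("utf-8")
        with urllib.request.urlopen(urllib.request.Request(index_url, data)) as resp:
            return resp.read()

    # export_pages("http://wiki.example.org/index.php", ["Main Page"])
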