#wikiteam 2013-11-13 (Wed)


Time Nickname Message
01:15 🔗 xmc \o/
11:22 🔗 ersi snails
11:22 🔗 ersi snails everywhereeee
12:07 🔗 * Nemo_bis hands a hammer
12:07 🔗 Nemo_bis useful also if they lose an s
18:15 🔗 pft if i pull wikia backups should i put them in https://archive.org/details/wikiteam ?
18:33 🔗 Nemo_bis pft: you can't
18:34 🔗 Nemo_bis but please add WikiTeam keyword so that an admin can later move them
18:34 🔗 pft ok
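
A minimal sketch of the upload being discussed, assuming the internetarchive Python library (pip install internetarchive, credentials set up with `ia configure`); the item identifier and filename are hypothetical examples, and the point is the "wikiteam" subject keyword that lets an admin find and move the item later:

    import internetarchive

    # Hypothetical identifier and filename; only the subject keyword matters.
    item = internetarchive.get_item('wikia-memoryalpha-dump-20131113')
    item.upload(
        'memoryalpha_pages_full.xml.gz',
        metadata={
            'title': 'Memory Alpha XML dump',
            'mediatype': 'web',
            'subject': 'wikiteam',  # keyword an admin searches for
        },
    )
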
18:34 🔗 Nemo_bis pft: What sort of Wikia backups are you pulling?
18:34 🔗 pft just their own dumps
18:34 🔗 Nemo_bis yes but which
18:34 🔗 pft memory alpha and wookieepedia at the moment
18:34 🔗 pft i might pull more later
18:34 🔗 Nemo_bis ah ok, small scale
18:34 🔗 pft yes
18:35 🔗 pft wookieepedia is 400m so it's pretty small
18:35 🔗 balrog are their dumps complete?
18:35 🔗 pft i haven't tested yet
18:35 🔗 Nemo_bis I'm in contact with them about uploading all their dumps to archive.org, but I've been told it needs to be discussed in some senior staff meeting
18:35 🔗 pft i will ungzip at home and probably load into mediawikis
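
A rough sketch of what pft describes, assuming a local MediaWiki install with the PHP CLI available; the dump filename and MediaWiki path are placeholders, and the import uses MediaWiki's stock importDump.php maintenance script:

    import gzip
    import shutil
    import subprocess

    # Decompress the downloaded dump...
    with gzip.open('wookieepedia_pages_full.xml.gz', 'rb') as src, \
            open('wookieepedia_pages_full.xml', 'wb') as dst:
        shutil.copyfileobj(src, dst)

    # ...then load it into a local wiki with the standard maintenance script.
    subprocess.check_call(
        ['php', '/var/www/mediawiki/maintenance/importDump.php',
         'wookieepedia_pages_full.xml'])
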
18:35 🔗 Nemo_bis balrog: define complete?
18:36 🔗 balrog have all the data that's visible
18:36 🔗 balrog Nemo_bis: why is a senior staff meeting required, if I may ask?
18:36 🔗 Nemo_bis how would I know :)
18:36 🔗 Nemo_bis and no, not all the data that's visible, that's impossible with XML dumps
18:37 🔗 Nemo_bis but all the data needed to regenerate what's visible, minus logs and private user data :) except I don't see image dumps any longer and they don't dump all wikis
18:37 🔗 pft yeah i didn't see any image dumps anywhere which is frustrating
18:37 🔗 balrog they don't provide image dumps? :/
18:38 🔗 balrog and not all wikis? can wiki administrators turn it off individually?
18:38 🔗 pft well dumps.wikia.net appears to be gone and the downloads available on Special:Statistics seem to be generated on demand by staff and have a limited duration
18:38 🔗 pft http://en.memory-alpha.org/wiki/Special:Statistics
18:38 🔗 pft that page has a "current pages and history" but i don't see anything about images
18:38 🔗 Nemo_bis it never did but they made them nevertheless
18:39 🔗 Nemo_bis perhaps we just need to find out the filenames
18:39 🔗 balrog s3://wikia_xml_dumps/w/wo/wowwiki_pages_current.xml.gz etc
18:39 🔗 Nemo_bis for images
18:40 🔗 Nemo_bis this is how it used to be https://ia801507.us.archive.org/zipview.php?zip=/28/items/wikia_dump_20121204/c.zip
18:41 🔗 balrog hmm
18:41 🔗 balrog wikia's source code is open
18:41 🔗 balrog including the part that uploads the dumps to S3
18:41 🔗 pft interesting
18:42 🔗 balrog https://github.com/Wikia/app/blob/dev/extensions/wikia/WikiFactory/Close/maintenance.php
18:42 🔗 balrog look for DumpsOnDemand::putToAmazonS3
18:42 🔗 Nemo_bis well, not actually all of it
18:42 🔗 Nemo_bis though they are working on open sourcing it all
18:43 🔗 balrog Nemo_bis: interesting
18:43 🔗 balrog https://github.com/Wikia/app/blob/dev/extensions/wikia/WikiFactory/Dumps/DumpsOnDemand.php
18:43 🔗 balrog "url" => 'http://s3.amazonaws.com/wikia_xml_dumps/' . self::getPath( "{$wgDBname}_pages_current.xml.gz" ),
18:43 🔗 balrog "url" => 'http://s3.amazonaws.com/wikia_xml_dumps/' . self::getPath( "{$wgDBname}_pages_full.xml.gz" ),
18:43 🔗 balrog don't see anything for images
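
The path scheme can be guessed from balrog's example: getPath() appears to prefix the filename with its first one and two characters (wowwiki -> w/wo/). A sketch that builds and probes those URLs follows; the layout is inferred from the single example in the log, not confirmed against Wikia's code, and the database names are just examples:

    import requests

    def wikia_dump_url(dbname, kind='current'):
        # Inferred layout: first char / first two chars / filename
        filename = '{}_pages_{}.xml.gz'.format(dbname, kind)
        return 'http://s3.amazonaws.com/wikia_xml_dumps/{}/{}/{}'.format(
            filename[0], filename[:2], filename)

    for db in ('wowwiki', 'memoryalpha'):
        url = wikia_dump_url(db, 'full')
        # HEAD request just to see whether a dump exists at that path
        print(url, requests.head(url).status_code)
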
18:45 🔗 pft yeah, they appear as tars in the link Nemo_bis pasted
18:45 🔗 pft i'm guessing that was more of a manual thing they did
18:45 🔗 balrog "Wikia does not perform dumps of images (but see m:Wikix)."
18:45 🔗 balrog http://meta.wikimedia.org/wiki/Wikix
18:46 🔗 balrog ...interesting
18:46 🔗 balrog that will extract and grab all images in an xml dump
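
Not Wikix itself (that's a C program), but a rough Python sketch of the same idea: scrape image names out of an XML dump and fetch each one through MediaWiki's Special:FilePath redirect. The wiki base URL and dump filename are placeholders:

    import re
    import urllib.parse
    import requests

    BASE = 'http://en.memory-alpha.org/wiki/Special:FilePath/'

    with open('pages_full.xml', encoding='utf-8') as f:
        # Collect [[File:...]] / [[Image:...]] names referenced in the dump
        names = set(re.findall(r'\[\[(?:File|Image):([^|\]]+)', f.read()))

    for name in sorted(n.strip() for n in names):
        r = requests.get(BASE + urllib.parse.quote(name))
        if r.ok:
            with open(name.replace('/', '_'), 'wb') as out:
                out.write(r.content)
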
18:46 🔗 pft nice!
18:46 🔗 Nemo_bis pft: it was not manual
18:46 🔗 pft ahh ok
18:47 🔗 Nemo_bis wikix is horribly painful
18:47 🔗 Nemo_bis and it's not designed to handle 300k wikis
18:47 🔗 pft ahhh
18:47 🔗 pft sorry, i realize this is all ground you've been over before
18:48 🔗 balrog Nemo_bis: really? :/
18:48 🔗 pft just trying to figure out how to help
18:48 🔗 balrog Nemo_bis: is there a reference to what's been done?
18:55 🔗 Nemo_bis balrog: about?
18:55 🔗 balrog with regards to what tools have been tested and such
18:56 🔗 Nemo_bis for what
18:57 🔗 balrog dumping large wikis
18:57 🔗 Nemo_bis most Wikia wikis are very tiny
18:57 🔗 Nemo_bis there isn't much to test, we only need to see if Wikia is helpful or not
18:58 🔗 Nemo_bis if it's not helpful, we'll have to run dumpgenerator on all their 350k wikis to get all the text and images
18:58 🔗 balrog ouch
18:58 🔗 Nemo_bis but that's not particularly painful, just a bit boring
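
The "boring" loop, sketched: run WikiTeam's dumpgenerator.py over a list of API endpoints, one wiki at a time. The --api, --xml and --images flags match dumpgenerator's usage at the time; wikis.txt (one api.php URL per line) is a placeholder input:

    import subprocess

    with open('wikis.txt') as f:
        apis = [line.strip() for line in f if line.strip()]

    for api in apis:
        # Keep going even if one wiki fails; with 350k of them, some will.
        subprocess.call(['python', 'dumpgenerator.py',
                         '--api=' + api, '--xml', '--images'])
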
18:58 🔗 balrog how difficult would it be to submit a PR to their repo that would cause images to also be archived?
18:58 🔗 Nemo_bis unless they go really rogue and disable API or so, which I don't think they'd do though
18:59 🔗 Nemo_bis they allegedly have problems with space
18:59 🔗 balrog how many wikis have we run into which have disabled API access?
18:59 🔗 Nemo_bis this is probably what the seniors have to discuss, whether to spend $10 instead of $5 for the space on S3 :)
18:59 🔗 Nemo_bis thousands
18:59 🔗 balrog how do we dump those? :/
18:59 🔗 Nemo_bis with pre-API method
18:59 🔗 Nemo_bis Special:Export
19:00 🔗 Nemo_bis some disable even that, but it's been only a couple wikis so far
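
A sketch of that pre-API fallback: POST page titles to Special:Export and get back the same XML the API would produce. Field names follow MediaWiki's stock export form; the URL and titles are placeholders, and full history also depends on the wiki allowing it ($wgExportAllowHistory):

    import requests

    def export_pages(index_php, titles, full_history=True):
        data = {'title': 'Special:Export', 'pages': '\n'.join(titles)}
        if not full_history:
            data['curonly'] = '1'  # current revision only
        r = requests.post(index_php, data=data)
        r.raise_for_status()
        return r.text              # <mediawiki> export XML

    print(export_pages('http://wiki.example.com/index.php',
                       ['Main Page'])[:200])
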
19:00 🔗 pft i tried to grab memory-alpha but couldn't find the api page for it before i did more reading and found that i could download the dump
19:00 🔗 Nemo_bis usually the problem with wiki sysadmins is stupidity, not malice
19:04 🔗 xmc same with forums, too
19:06 🔗 Nemo_bis :)
19:08 🔗 balrog what's the best way to dump forums though? they're not as rough on wget at least
19:11 🔗 pft we need to start contributing to open-source projects to put in easy backup things that are publicly enabled by default ;)
19:12 🔗 Nemo_bis pft: you're welcome :) https://bugzilla.wikimedia.org/buglist.cgi?resolution=---&query_format=advanced&component=Export%2FImport
19:13 🔗 pft nice
