[00:11] *** Start has joined #wikiteam
[01:28] *** kyan has joined #wikiteam
[01:51] *** vitzli has joined #wikiteam
[03:45] *** kyan has quit IRC (This computer has gone to sleep)
[03:48] *** vitzli has quit IRC (Leaving)
[06:03] *** vitzli has joined #wikiteam
[06:11] *** vitzli has quit IRC (Leaving)
[06:22] *** vitzli has joined #wikiteam
[08:32] *** kyan has joined #wikiteam
[08:48] *** kyan has quit IRC (Leaving)
[13:57] * Nemo_bis screams https://github.com/WikiTeam/wikiteam/issues/269
[14:09] could you share wikipuella_maginet-20160211-images.txt please?
[14:09] I'd like to stare at it
[14:17] vitzli: the reporter didn't attach it
[14:18] I thought it was you, sorry
[14:59] *** Start has quit IRC (Quit: Disconnected.)
[15:39] *** Start has joined #wikiteam
[15:52] *** Fletcher has quit IRC (Ping timeout: 252 seconds)
[15:56] *** midas has quit IRC (Ping timeout: 260 seconds)
[16:05] *** midas has joined #wikiteam
[16:21] *** Start has quit IRC (Quit: Disconnected.)
[16:24] *** Start has joined #wikiteam
[16:25] *** Start has quit IRC (Remote host closed the connection)
[16:25] *** Start has joined #wikiteam
[16:40] *** Fletcher has joined #wikiteam
[17:07] *** Start has quit IRC (Quit: Disconnected.)
[18:37] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[18:39] *** svchfoo3 has joined #wikiteam
[18:39] *** svchfoo1 sets mode: +o svchfoo3
[18:42] *** Start has joined #wikiteam
[18:43] *** vitzli has quit IRC (Leaving)
[19:19] *** Start has quit IRC (Quit: Disconnected.)
[19:23] *** Start has joined #wikiteam
[20:28] *** ploopkazo has joined #wikiteam
[20:29] once a wiki is downloaded with dumpgenerator.py, is there a way to turn it into a zim for use with a zim reader like kiwix?
[20:45] *** Start has quit IRC (Quit: Disconnected.)
[20:55] ploopkazo: not really
[20:55] ploopkazo: are you trying to make a ZIM file for a dead wiki?
[21:15] Nemo_bis: no, it's still online
[21:17] ploopkazo: then you should install Parsoid and run mwoffliner with it
[21:17] Nemo_bis: how does one view a dumpgenerator.py dump if not conversion to zim?
[21:17] The XML dump can be parsed properly only by MediaWiki itself
[21:18] dumpgenerator.py attempts to collect all the information one will need to create a clone of the original wiki
[21:18] oh
[21:18] welllll there's another tool that reads mediawiki xml dumps https://github.com/chronomex/wikiscraper
[21:18] not full fidelity but it's fun to play with
[21:19] :)
[21:20] so if I want to convert an xml dump to zim, my only real option is to run php+mysql and then scrape my local instance with mwoffliner?
[21:21] AFAIK yes
[21:21] how easy is it to load the dump into a mw instance?
[21:21] Well, there's also the old dumpHTML way but that's a bit hacky
[21:21] Depends on the wiki size and extensions
[21:23] what kind of extensions? I haven't run mediawiki before
[21:23] Can you link the wiki in question?
[21:23] at the moment, https://wiki.puella-magi.net
[21:23] though I imagine I'll collect a lot of them once I have the process figured out
[21:24] Ok, that's an easy one
[21:24] oh, http
[21:25] https isn't loading for some reason
[21:25] though https is the one in my history
[21:25] loaded for me
[21:25] pretty slow on my end
[21:25] weird
[21:25] Nemo_bis: how do you tell it's an easy one? is there an info page with the installed extensions or something?
[21:26] ploopkazo: http://wiki.puella-magi.net/Special:Version you're especially interested in the parser-related things
[21:27] parsoid is always desirable, right?
[21:27] ploopkazo: in theory yes but in practice few use it outside Wikimedia and Wikimedia-like installs
[21:27] And it doesn't work even for some Wikimedia wikis yet
[21:27] hmm
[21:28] In theory the worst that can happen is that it doesn't parse some pages
[21:28] which things on that wiki make it easy?
[21:28] It's a small wiki and it only has the most common parser extensions
[21:28] oh
[21:28] , , , ,  and
[21:29]   can be nasty but it's still a common one
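A minimal sketch of that Special:Version check done through the MediaWiki API instead, assuming api.php sits at the default path on this wiki; the "parserhook" entries are the parser-related extensions Nemo_bis points at above:

```python
# Ask the MediaWiki API for the installed extensions (roughly what
# Special:Version shows) and flag the parser hooks, which are the ones
# that matter when re-rendering the dump elsewhere.
# Assumes api.php lives at the usual path on this wiki.
import json
import urllib.parse
import urllib.request

API = "http://wiki.puella-magi.net/api.php"

params = urllib.parse.urlencode({
    "action": "query",
    "meta": "siteinfo",
    "siprop": "extensions",
    "format": "json",
})

with urllib.request.urlopen(API + "?" + params) as resp:
    extensions = json.load(resp)["query"]["extensions"]

for ext in extensions:
    # "parserhook" extensions add custom tags/functions to the wikitext,
    # so a clone needs them installed to render those pages faithfully.
    marker = "*" if ext.get("type") == "parserhook" else " "
    print(marker, ext.get("name", "(unnamed)"))
```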
[21:29]  so once my xml+image dump is complete, how would I go about loading that into a local mw instance?
[21:29]  And of course it can happen that a wiki has a custom parser extension nobody has the code of
[21:30]  ploopkazo: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
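A hedged sketch of the import step using MediaWiki's stock maintenance scripts; the install path and the dump/image file names below are assumptions, and importDump.php can be quite slow on large histories:

```python
# Drive MediaWiki's stock maintenance scripts to load the dump into a
# local wiki. MW_DIR and the file names are assumptions; adjust them to
# your install and to what dumpgenerator.py actually produced.
import subprocess

MW_DIR = "/var/www/mediawiki"                         # assumed install path
XML_DUMP = "wikipuella_maginet-20160211-history.xml"  # assumed dump name
IMAGE_DIR = "images"                                  # dumpgenerator.py's image directory

# Import every page revision from the XML dump.
subprocess.run(["php", f"{MW_DIR}/maintenance/importDump.php", XML_DUMP],
               check=True)

# Import the downloaded media files.
subprocess.run(["php", f"{MW_DIR}/maintenance/importImages.php", IMAGE_DIR],
               check=True)

# importDump.php suggests rebuilding recent changes when it finishes.
subprocess.run(["php", f"{MW_DIR}/maintenance/rebuildrecentchanges.php"],
               check=True)
```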
[21:30]  ploopkazo: in theory Parsoid is close to the point where it doesn't even need MediaWiki to parse the wikitext; I'm not sure if anyone has tried it yet
[21:31]  Maybe for simple wikis it works, in that case you'd "just" run the node.js service, feed it with wikitext from the XML and get your HTML
[21:32]  No idea how it expands templates though
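A sketch of that Parsoid-without-MediaWiki idea, under the assumptions that a local Parsoid exposes the v3 transform API on port 8000 and that the dump uses the export-0.10 schema; as noted above, templates won't expand without a real wiki behind it:

```python
# Pull raw wikitext out of the XML dump and ask a local Parsoid service
# to turn it into HTML. The endpoint below assumes Parsoid's v3
# transform API on localhost:8000 with "wiki.puella-magi.net" as the
# configured domain; adjust for your Parsoid version and config.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"   # schema version may differ
PARSOID = "http://localhost:8000/wiki.puella-magi.net/v3/transform/wikitext/to/html"
DUMP = "wikipuella_maginet-20160211-history.xml"     # assumed file name

def to_html(wikitext):
    body = urllib.parse.urlencode({"wikitext": wikitext}).encode("utf-8")
    with urllib.request.urlopen(PARSOID, data=body) as resp:
        return resp.read().decode("utf-8")

for _, elem in ET.iterparse(DUMP):
    if elem.tag == NS + "page":
        title = elem.findtext(NS + "title")
        # A history dump holds every revision; the last one is the current text.
        revisions = elem.findall(NS + "revision/" + NS + "text")
        wikitext = (revisions[-1].text or "") if revisions else ""
        html = to_html(wikitext)
        print(title, "->", len(html), "bytes of HTML")
        elem.clear()   # keep memory use flat on big dumps
```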
[21:34]  ploopkazo: ah and if you want to make this scale you probably need to look into installation automation like https://www.mediawiki.org/wiki/MediaWiki-Vagrant or https://github.com/wikimedia/mediawiki-containers
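Once a local MediaWiki and Parsoid are up (however they were installed), the mwoffliner run suggested earlier might look roughly like this; the flag names follow mwoffliner's documentation of this era and should be treated as assumptions that can differ between versions, and every URL and address is a placeholder:

```python
# Final step of the pipeline: scrape the local clone into a ZIM for
# Kiwix with mwoffliner. Flag names are assumptions taken from
# mwoffliner's documentation of roughly this era and may differ
# between versions; the URLs and email address are placeholders.
import subprocess

subprocess.run(
    [
        "mwoffliner",
        "--mwUrl", "http://localhost/",            # the local MediaWiki clone
        "--parsoidUrl", "http://localhost:8000/",  # the local Parsoid service
        "--adminEmail", "you@example.org",         # contact address sent with requests
        "--outputDirectory", "zim/",
    ],
    check=True,
)
```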
[21:36]  Nemo_bis: is the format dumpgenerator.py creates practically identical to the format Wikimedia releases their dumps in?
[21:36]  Now sorry if I overloaded you with information instead of making your life simpler... but if you succeed, that's the holy grail. :) 
[21:37]  ploopkazo: the format is identical for all wikis (given a release, of course).
[21:37]  ploopkazo: but each page is just a long blob of text that might contain anything
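A quick way to see both points, i.e. that the dump declares the same export schema Wikimedia's own dumps use and that each page body is just an opaque wikitext blob; the file name is an assumption:

```python
# Peek at the dump: print the declared export schema and, per page, how
# many bytes of raw wikitext it carries. The file name is an assumption.
import xml.etree.ElementTree as ET

DUMP = "wikipuella_maginet-20160211-history.xml"

title, blob_len = None, 0
for event, elem in ET.iterparse(DUMP, events=("start", "end")):
    tag = elem.tag.rsplit("}", 1)[-1]       # drop the namespace prefix
    if event == "start" and tag == "mediawiki":
        # e.g. http://www.mediawiki.org/xml/export-0.10/
        print("export schema:", elem.tag[1:].split("}")[0])
    elif event == "end" and tag == "title":
        title = elem.text
    elif event == "end" and tag == "text":
        blob_len = len(elem.text or "")     # last revision wins in a history dump
    elif event == "end" and tag == "page":
        print(f"{title}: {blob_len} bytes of raw wikitext")
        elem.clear()
```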
[21:38]  If your question is whether mwimport is supposed to work, the answer is (surprisingly) yes
[21:39]  The basics of the database schema for MediaWiki haven't changed since 2005, when mwimport was created
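A sketch of the mwimport route, under the assumption that it reads the XML dump on stdin and emits SQL on stdout for piping into MySQL; the database name and credentials are placeholders:

```python
# Convert the XML dump to SQL with mwimport and pipe it straight into
# MySQL, which is generally much faster than importDump.php on big
# dumps. That mwimport reads XML on stdin and writes SQL on stdout is
# an assumption to verify against its documentation; the database name
# and credentials are placeholders.
import subprocess

DUMP = "wikipuella_maginet-20160211-history.xml"

with open(DUMP, "rb") as xml_in:
    mwimport = subprocess.Popen(["./mwimport"], stdin=xml_in,
                                stdout=subprocess.PIPE)
    mysql = subprocess.Popen(["mysql", "-u", "wiki", "-p", "wikidb"],
                             stdin=mwimport.stdout)
    mwimport.stdout.close()   # let mwimport see a broken pipe if mysql exits
    mysql.communicate()
```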
[21:41]  Anyway, for more specific help if you get stuck somewhere ask cscott (for Parsoid-without-MediaWiki) and Kelson on #kiwix (for mwoffliner, dumpHTML etc.), both on Freenode. Now I'm going to bed :)
[21:46]  thanks
[23:15] *** Start has joined #wikiteam