[00:11] *** Start has joined #wikiteam
[01:28] *** kyan has joined #wikiteam
[01:51] *** vitzli has joined #wikiteam
[03:45] *** kyan has quit IRC (This computer has gone to sleep)
[03:48] *** vitzli has quit IRC (Leaving)
[06:03] *** vitzli has joined #wikiteam
[06:11] *** vitzli has quit IRC (Leaving)
[06:22] *** vitzli has joined #wikiteam
[08:32] *** kyan has joined #wikiteam
[08:48] *** kyan has quit IRC (Leaving)
[13:57] * Nemo_bis screams https://github.com/WikiTeam/wikiteam/issues/269
[14:09] could you share wikipuella_maginet-20160211-images.txt please?
[14:09] I'd like to stare at it
[14:17] vitzli: the reporter didn't attach it
[14:18] I thought it was you, sorry
[14:59] *** Start has quit IRC (Quit: Disconnected.)
[15:39] *** Start has joined #wikiteam
[15:52] *** Fletcher has quit IRC (Ping timeout: 252 seconds)
[15:56] *** midas has quit IRC (Ping timeout: 260 seconds)
[16:05] *** midas has joined #wikiteam
[16:21] *** Start has quit IRC (Quit: Disconnected.)
[16:24] *** Start has joined #wikiteam
[16:25] *** Start has quit IRC (Remote host closed the connection)
[16:25] *** Start has joined #wikiteam
[16:40] *** Fletcher has joined #wikiteam
[17:07] *** Start has quit IRC (Quit: Disconnected.)
[18:37] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[18:39] *** svchfoo3 has joined #wikiteam
[18:39] *** svchfoo1 sets mode: +o svchfoo3
[18:42] *** Start has joined #wikiteam
[18:43] *** vitzli has quit IRC (Leaving)
[19:19] *** Start has quit IRC (Quit: Disconnected.)
[19:23] *** Start has joined #wikiteam
[20:28] *** ploopkazo has joined #wikiteam
[20:29] once a wiki is downloaded with dumpgenerator.py, is there a way to turn it into a zim for use with a zim reader like kiwix?
[20:45] *** Start has quit IRC (Quit: Disconnected.)
[20:55] ploopkazo: not really
[20:55] ploopkazo: are you trying to make a ZIM file for a dead wiki?
[21:15] Nemo_bis: no, it's still online
[21:17] ploopkazo: then you should install Parsoid and run mwoffliner with it
[21:17] Nemo_bis: how does one view a dumpgenerator.py dump if not conversion to zim?
[21:17] The XML dump can be parsed properly only by MediaWiki itself
[21:18] dumpgenerator.py attempts to collect all the information one will need to create a clone of the original wiki
[21:18] oh
[21:18] welllll there's another tool that reads mediawiki xml dumps https://github.com/chronomex/wikiscraper
[21:18] not full fidelity but it's fun to play with
[21:19] :)
[21:20] so if I want to convert an xml dump to zim, my only real option is to run php+mysql and then scrape my local instance with mwoffliner?
[21:21] AFAIK yes
[21:21] how easy is it to load the dump into a mw instance?
[21:21] Well, there's also the old dumpHTML way but that's a bit hacky
[21:21] Depends on the wiki size and extensions
[21:23] what kind of extensions? I haven't run mediawiki before
[21:23] Can you link the wiki in question?
[21:23] at the moment, https://wiki.puella-magi.net
[21:23] though I imagine I'll collect a lot of them once I have the process figured out
[21:24] Ok, that's an easy one
[21:24] oh, http
[21:25] https isn't loading for some reason
[21:25] though https is the one in my history
[21:25] loaded for me
[21:25] pretty slow on my end
[21:25] weird
[21:25] Nemo_bis: how do you tell it's an easy one? is there an info page with the installed extensions or something?
[21:26] ploopkazo: http://wiki.puella-magi.net/Special:Version you're especially interested in the parser-related things
[21:27] parsoid is always desirable, right?
[21:27] ploopkazo: in theory yes but in practice few use it outside Wikimedia and Wikimedia-like installs
[21:27] And it doesn't work even for some Wikimedia wikis yet
[21:27] hmm
[21:28] In theory the worst that can happen is that it doesn't parse some pages
[21:28] which things on that wiki make it easy?
[21:28] It's a small wiki and it only has the most common parser extensions
[21:28] oh
[21:28] , , , ,  and
[21:29]   can be nasty but it's still a common one
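A minimal sketch of that Special:Version check done through the MediaWiki API instead, assuming api.php sits at the default path on this wiki; the "parserhook" entries are the parser-related extensions Nemo_bis points at above:

```python
# Ask the MediaWiki API for the installed extensions (roughly what
# Special:Version shows) and flag the parser hooks, which are the ones
# that matter when re-rendering the dump elsewhere.
# Assumes api.php lives at the usual path on this wiki.
import json
import urllib.parse
import urllib.request

API = "http://wiki.puella-magi.net/api.php"

params = urllib.parse.urlencode({
    "action": "query",
    "meta": "siteinfo",
    "siprop": "extensions",
    "format": "json",
})

with urllib.request.urlopen(API + "?" + params) as resp:
    extensions = json.load(resp)["query"]["extensions"]

for ext in extensions:
    # "parserhook" extensions add custom tags/functions to the wikitext,
    # so a clone needs them installed to render those pages faithfully.
    marker = "*" if ext.get("type") == "parserhook" else " "
    print(marker, ext.get("name", "(unnamed)"))
```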
[21:29]  so once my xml+image dump is complete, how would I go about loading that into a local mw instance?
[21:29]  And of course it can happen that a wiki has a custom parser extension nobody has the code of
[21:30]  ploopkazo: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
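A hedged sketch of the import step using MediaWiki's stock maintenance scripts; the install path and the dump/image file names below are assumptions, and importDump.php can be quite slow on large histories:

```python
# Drive MediaWiki's stock maintenance scripts to load the dump into a
# local wiki. MW_DIR and the file names are assumptions; adjust them to
# your install and to what dumpgenerator.py actually produced.
import subprocess

MW_DIR = "/var/www/mediawiki"                         # assumed install path
XML_DUMP = "wikipuella_maginet-20160211-history.xml"  # assumed dump name
IMAGE_DIR = "images"                                  # dumpgenerator.py's image directory

# Import every page revision from the XML dump.
subprocess.run(["php", f"{MW_DIR}/maintenance/importDump.php", XML_DUMP],
               check=True)

# Import the downloaded media files.
subprocess.run(["php", f"{MW_DIR}/maintenance/importImages.php", IMAGE_DIR],
               check=True)

# importDump.php suggests rebuilding recent changes when it finishes.
subprocess.run(["php", f"{MW_DIR}/maintenance/rebuildrecentchanges.php"],
               check=True)
```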
[21:30]  ploopkazo: in theory Parsoid is close to the point where it doesn't even need MediaWiki to parse the wikitext; I'm not sure if anyone has tried it yet
[21:31]  Maybe for simple wikis it works, in that case you'd "just" run the node.js service, feed it with wikitext from the XML and get your HTML
[21:32]  No idea how it expands templates though
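A sketch of that Parsoid-without-MediaWiki idea, under the assumptions that a local Parsoid exposes the v3 transform API on port 8000 and that the dump uses the export-0.10 schema; as noted above, templates won't expand without a real wiki behind it:

```python
# Pull raw wikitext out of the XML dump and ask a local Parsoid service
# to turn it into HTML. The endpoint below assumes Parsoid's v3
# transform API on localhost:8000 with "wiki.puella-magi.net" as the
# configured domain; adjust for your Parsoid version and config.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"   # schema version may differ
PARSOID = "http://localhost:8000/wiki.puella-magi.net/v3/transform/wikitext/to/html"
DUMP = "wikipuella_maginet-20160211-history.xml"     # assumed file name

def to_html(wikitext):
    body = urllib.parse.urlencode({"wikitext": wikitext}).encode("utf-8")
    with urllib.request.urlopen(PARSOID, data=body) as resp:
        return resp.read().decode("utf-8")

for _, elem in ET.iterparse(DUMP):
    if elem.tag == NS + "page":
        title = elem.findtext(NS + "title")
        # A history dump holds every revision; the last one is the current text.
        revisions = elem.findall(NS + "revision/" + NS + "text")
        wikitext = (revisions[-1].text or "") if revisions else ""
        html = to_html(wikitext)
        print(title, "->", len(html), "bytes of HTML")
        elem.clear()   # keep memory use flat on big dumps
```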
[21:34]  ploopkazo: ah and if you want to make this scale you probably need to look into installation automation like https://www.mediawiki.org/wiki/MediaWiki-Vagrant or https://github.com/wikimedia/mediawiki-containers
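Once a local MediaWiki and Parsoid are up (however they were installed), the mwoffliner run suggested earlier might look roughly like this; the flag names follow mwoffliner's documentation of this era and should be treated as assumptions that can differ between versions, and every URL and address is a placeholder:

```python
# Final step of the pipeline: scrape the local clone into a ZIM for
# Kiwix with mwoffliner. Flag names are assumptions taken from
# mwoffliner's documentation of roughly this era and may differ
# between versions; the URLs and email address are placeholders.
import subprocess

subprocess.run(
    [
        "mwoffliner",
        "--mwUrl", "http://localhost/",            # the local MediaWiki clone
        "--parsoidUrl", "http://localhost:8000/",  # the local Parsoid service
        "--adminEmail", "you@example.org",         # contact address sent with requests
        "--outputDirectory", "zim/",
    ],
    check=True,
)
```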
[21:36]  Nemo_bis: is the format dumpgenerator.py creates practically identical to the format Wikimedia releases their dumps in?
[21:36]  Now sorry if I overloaded you with information instead of making your life simpler... but if you succeed, that's the holy grail. :) 
[21:37]  ploopkazo: the format is identical for all wikis (given a release, of course).
[21:37]  ploopkazo: but each page is just a long blob of text that might contain anything
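A quick way to see both points, i.e. that the dump declares the same export schema Wikimedia's own dumps use and that each page body is just an opaque wikitext blob; the file name is an assumption:

```python
# Peek at the dump: print the declared export schema and, per page, how
# many bytes of raw wikitext it carries. The file name is an assumption.
import xml.etree.ElementTree as ET

DUMP = "wikipuella_maginet-20160211-history.xml"

title, blob_len = None, 0
for event, elem in ET.iterparse(DUMP, events=("start", "end")):
    tag = elem.tag.rsplit("}", 1)[-1]       # drop the namespace prefix
    if event == "start" and tag == "mediawiki":
        # e.g. http://www.mediawiki.org/xml/export-0.10/
        print("export schema:", elem.tag[1:].split("}")[0])
    elif event == "end" and tag == "title":
        title = elem.text
    elif event == "end" and tag == "text":
        blob_len = len(elem.text or "")     # last revision wins in a history dump
    elif event == "end" and tag == "page":
        print(f"{title}: {blob_len} bytes of raw wikitext")
        elem.clear()
```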
[21:38]  If your question is whether mwimport is supposed to work, the answer is (surprisingly) yes
[21:39]  The basics of the database schema for MediaWiki haven't changed since 2005, when mwimport was created
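A sketch of the mwimport route, under the assumption that it reads the XML dump on stdin and emits SQL on stdout for piping into MySQL; the database name and credentials are placeholders:

```python
# Convert the XML dump to SQL with mwimport and pipe it straight into
# MySQL, which is generally much faster than importDump.php on big
# dumps. That mwimport reads XML on stdin and writes SQL on stdout is
# an assumption to verify against its documentation; the database name
# and credentials are placeholders.
import subprocess

DUMP = "wikipuella_maginet-20160211-history.xml"

with open(DUMP, "rb") as xml_in:
    mwimport = subprocess.Popen(["./mwimport"], stdin=xml_in,
                                stdout=subprocess.PIPE)
    mysql = subprocess.Popen(["mysql", "-u", "wiki", "-p", "wikidb"],
                             stdin=mwimport.stdout)
    mwimport.stdout.close()   # let mwimport see a broken pipe if mysql exits
    mysql.communicate()
```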
[21:41]  Anyway, for more specific help if you get stuck somewhere ask cscott (for Parsoid-without-MediaWiki) and Kelson on #kiwix (for mwoffliner, dumpHTML etc.), both on Freenode. Now I'm going to bed :)
[21:46]  thanks
[23:15] *** Start has joined #wikiteam