#wikiteam 2016-02-11,Thu


Time Nickname Message
00:11 🔗 Start has joined #wikiteam
01:28 🔗 kyan has joined #wikiteam
01:51 🔗 vitzli has joined #wikiteam
03:45 🔗 kyan has quit IRC (This computer has gone to sleep)
03:48 🔗 vitzli has quit IRC (Leaving)
06:03 🔗 vitzli has joined #wikiteam
06:11 🔗 vitzli has quit IRC (Leaving)
06:22 🔗 vitzli has joined #wikiteam
08:32 🔗 kyan has joined #wikiteam
08:48 🔗 kyan has quit IRC (Leaving)
13:57 🔗 * Nemo_bis screams https://github.com/WikiTeam/wikiteam/issues/269
14:09 🔗 vitzli could you share wikipuella_maginet-20160211-images.txt please?
14:09 🔗 vitzli I'd like to stare at it
14:17 🔗 Nemo_bis vitzli: the reporter didn't attach it
14:18 🔗 vitzli I thought it was you, sorry
14:59 🔗 Start has quit IRC (Quit: Disconnected.)
15:39 🔗 Start has joined #wikiteam
15:52 🔗 Fletcher has quit IRC (Ping timeout: 252 seconds)
15:56 🔗 midas has quit IRC (Ping timeout: 260 seconds)
16:05 🔗 midas has joined #wikiteam
16:21 🔗 Start has quit IRC (Quit: Disconnected.)
16:24 🔗 Start has joined #wikiteam
16:25 🔗 Start has quit IRC (Remote host closed the connection)
16:25 🔗 Start has joined #wikiteam
16:40 🔗 Fletcher has joined #wikiteam
17:07 🔗 Start has quit IRC (Quit: Disconnected.)
18:37 🔗 svchfoo3 has quit IRC (Read error: Operation timed out)
18:39 🔗 svchfoo3 has joined #wikiteam
18:39 🔗 svchfoo1 sets mode: +o svchfoo3
18:42 🔗 Start has joined #wikiteam
18:43 🔗 vitzli has quit IRC (Leaving)
19:19 🔗 Start has quit IRC (Quit: Disconnected.)
19:23 🔗 Start has joined #wikiteam
20:28 🔗 ploopkazo has joined #wikiteam
20:29 🔗 ploopkazo once a wiki is downloaded with dumpgenerator.py, is there a way to turn it into a zim for use with a zim reader like kiwix?
20:45 🔗 Start has quit IRC (Quit: Disconnected.)
20:55 🔗 Nemo_bis ploopkazo: not really
20:55 🔗 Nemo_bis ploopkazo: are you trying to make a ZIM file for a dead wiki?
21:15 🔗 ploopkazo Nemo_bis: no, it's still online
21:17 🔗 Nemo_bis ploopkazo: then you should install Parsoid and run mwoffliner with it
21:17 🔗 ploopkazo Nemo_bis: how does one view a dumpgenerator.py dump if not conversion to zim?
21:17 🔗 Nemo_bis The XML dump can be parsed properly only by MediaWiki itself
21:18 🔗 Nemo_bis dumpgenerator.py attempts to collect all the information one will need to create a clone of the original wiki
21:18 🔗 ploopkazo oh
21:18 🔗 xmc welllll there's another tool that reads mediawiki xml dumps https://github.com/chronomex/wikiscraper
21:18 🔗 xmc not full fidelity but it's fun to play with
21:19 🔗 xmc :)
21:20 🔗 ploopkazo so if I want to convert an xml dump to zim, my only real option is to run php+mysql and then scrape my local instance with mwoffliner?
21:21 🔗 Nemo_bis AFAIK yes
21:21 🔗 ploopkazo how easy is it to load the dump into a mw instance?
21:21 🔗 Nemo_bis Well, there's also the old dumpHTML way but that's a bit hacky
21:21 🔗 Nemo_bis Depends on the wiki size and extensions
21:23 🔗 ploopkazo what kind of extensions? I haven't run mediawiki before
21:23 🔗 Nemo_bis Can you link the wiki in question?
21:23 🔗 ploopkazo at the moment, https://wiki.puella-magi.net
21:23 🔗 ploopkazo though I imagine I'll collect a lot of them once I have the process figured out
21:24 🔗 Nemo_bis Ok, that's an easy one
21:24 🔗 ploopkazo oh, http
21:25 🔗 ploopkazo https isn't loading for some reason
21:25 🔗 ploopkazo though https is the one in my history
21:25 🔗 Nemo_bis loaded for me
21:25 🔗 xmc pretty slow on my end
21:25 🔗 ploopkazo weird
21:25 🔗 ploopkazo Nemo_bis: how do you tell it's an easy one? is there an info page with the installed extensions or something?
21:26 🔗 Nemo_bis ploopkazo: http://wiki.puella-magi.net/Special:Version you're especially interested in the parser-related things
21:27 🔗 ploopkazo parsoid is always desirable, right?
21:27 🔗 Nemo_bis ploopkazo: in theory yes but in practice few use it outside Wikimedia and Wikimedia-like installs
21:27 🔗 Nemo_bis And it doesn't work even for some Wikimedia wikis yet
21:27 🔗 ploopkazo hmm
21:28 🔗 Nemo_bis In theory the worst that can happen is that it doesn't parse some pages
21:28 🔗 ploopkazo which things on that wiki make it easy?
21:28 🔗 Nemo_bis It's a small wiki and it only has the most common parser extensions
21:28 🔗 ploopkazo oh
21:28 🔗 Nemo_bis <gallery>, <math>, <nowiki>, <pre>, <ref> and <references>
21:29 🔗 Nemo_bis <math> can be nasty but it's still a common one
21:29 🔗 ploopkazo so once my xml+image dump is complete, how would I go about loading that into a local mw instance?
21:29 🔗 Nemo_bis And of course it can happen that a wiki has a custom parser extension nobody has the code of
21:30 🔗 Nemo_bis ploopkazo: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
21:30 🔗 Nemo_bis ploopkazo: in theory Parsoid is near to the point where it doesn't even need MediaWiki to parse the wikitext; I'm not sure if someone tried yet
21:31 🔗 Nemo_bis Maybe for simple wikis it works, in that case you'd "just" run the node.js service, feed it with wikitext from the XML and get your HTML
21:32 🔗 Nemo_bis No idea how it expands templates though
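[Editor's note: a hedged sketch of the "just run the node.js service, feed it wikitext, get your HTML" idea above, against Parsoid's v3 transform endpoint. The port (8000) and the "localhost" domain segment are assumptions about a default standalone Parsoid setup; the request is only built here, not sent.]

```python
# Sketch: build a POST request for a locally running Parsoid service's
# wikitext -> HTML transform. Assumes a standalone Parsoid on port 8000
# configured with a "localhost" domain; adjust both for a real setup.
import json
import urllib.request

def build_parsoid_request(wikitext, base="http://localhost:8000", domain="localhost"):
    """Build (but do not send) a Parsoid v3 wikitext->html transform request."""
    url = f"{base}/{domain}/v3/transform/wikitext/to/html"
    body = json.dumps({"wikitext": wikitext}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_parsoid_request("Hello ''world''")
print(req.full_url)
# To actually send it against a running service:
#   html = urllib.request.urlopen(req).read().decode("utf-8")
```

As noted in the chat, this path skips MediaWiki entirely, so anything that needs the wiki's database, such as template expansion, may not work without further configuration.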
21:34 🔗 Nemo_bis ploopkazo: ah and if you want to make this scale you probably need to look into installation automation like https://www.mediawiki.org/wiki/MediaWiki-Vagrant or https://github.com/wikimedia/mediawiki-containers
21:36 🔗 ploopkazo Nemo_bis: is the format dumpgenerator.py creates practically identical to the format wikimedia releases their dumps in?
21:36 🔗 Nemo_bis Now sorry if I overloaded you with information instead of making your life simpler... but if you succeed, that's the holy grail. :)
21:37 🔗 Nemo_bis ploopkazo: the format is identical for all wikis (given a release, of course).
21:37 🔗 Nemo_bis ploopkazo: but each page is just a long blob of text that might contain anything
21:38 🔗 Nemo_bis If your question is whether mwimport is supposed to work, the answer is (surprisingly) yes
21:39 🔗 Nemo_bis The basics of the database schema for MediaWiki haven't changed since 2005, when mwimport was created
21:41 🔗 Nemo_bis Anyway, for more specific help if you get stuck somewhere ask cscott (for Parsoid-without-MediaWiki) and Kelson on #kiwix (for mwoffliner, dumpHTML etc.), both on Freenode. Now I'm going to bed :)
21:46 🔗 ploopkazo thanks
23:15 🔗 Start has joined #wikiteam
