00:11  Start has joined #wikiteam
01:28  kyan has joined #wikiteam
01:51  vitzli has joined #wikiteam
03:45  kyan has quit IRC (This computer has gone to sleep)
03:48  vitzli has quit IRC (Leaving)
06:03  vitzli has joined #wikiteam
06:11  vitzli has quit IRC (Leaving)
06:22  vitzli has joined #wikiteam
08:32  kyan has joined #wikiteam
08:48  kyan has quit IRC (Leaving)
|
13:57  * Nemo_bis screams https://github.com/WikiTeam/wikiteam/issues/269
14:09  <vitzli> could you share wikipuella_maginet-20160211-images.txt please?
14:09  <vitzli> I'd like to stare at it
14:17  <Nemo_bis> vitzli: the reporter didn't attach it
14:18  <vitzli> I thought it was you, sorry
|
14:59  Start has quit IRC (Quit: Disconnected.)
15:39  Start has joined #wikiteam
15:52  Fletcher has quit IRC (Ping timeout: 252 seconds)
15:56  midas has quit IRC (Ping timeout: 260 seconds)
16:05  midas has joined #wikiteam
16:21  Start has quit IRC (Quit: Disconnected.)
16:24  Start has joined #wikiteam
16:25  Start has quit IRC (Remote host closed the connection)
16:25  Start has joined #wikiteam
16:40  Fletcher has joined #wikiteam
17:07  Start has quit IRC (Quit: Disconnected.)
18:37  svchfoo3 has quit IRC (Read error: Operation timed out)
18:39  svchfoo3 has joined #wikiteam
18:39  svchfoo1 sets mode: +o svchfoo3
18:42  Start has joined #wikiteam
18:43  vitzli has quit IRC (Leaving)
19:19  Start has quit IRC (Quit: Disconnected.)
19:23  Start has joined #wikiteam
20:28  ploopkazo has joined #wikiteam
|
20:29  <ploopkazo> once a wiki is downloaded with dumpgenerator.py, is there a way to turn it into a zim for use with a zim reader like kiwix?
20:45  Start has quit IRC (Quit: Disconnected.)
20:55  <Nemo_bis> ploopkazo: not really
20:55  <Nemo_bis> ploopkazo: are you trying to make a ZIM file for a dead wiki?
21:15  <ploopkazo> Nemo_bis: no, it's still online
21:17  <Nemo_bis> ploopkazo: then you should install Parsoid and run mwoffliner with it
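For a live wiki the usual route is to point mwoffliner at the wiki's API and at a running Parsoid service. A rough sketch of an invocation, assuming both tools are installed and Parsoid is serving this wiki; exact flag names vary between mwoffliner releases, and the URLs and e-mail address are placeholders:

    # build a ZIM from the live wiki via its API and a local Parsoid
    mwoffliner --mwUrl="https://wiki.example.org/" \
               --parsoidUrl="http://localhost:8000/" \
               --adminEmail="you@example.org" \
               --outputDirectory="./zim"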
|
21:17  <ploopkazo> Nemo_bis: how does one view a dumpgenerator.py dump if not conversion to zim?
21:17  <Nemo_bis> The XML dump can be parsed properly only by MediaWiki itself
21:18  <Nemo_bis> dumpgenerator.py attempts to collect all the information one will need to create a clone of the original wiki
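For reference, the grab itself is usually a single invocation of the script against the wiki's api.php; the flags below are the standard WikiTeam ones, and the wiki URL is a placeholder:

    # full page history as XML plus all files, with an -images.txt list
    python dumpgenerator.py --api="https://wiki.example.org/api.php" --xml --images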
|
21:18  <ploopkazo> oh
21:18  <xmc> welllll there's another tool that reads mediawiki xml dumps https://github.com/chronomex/wikiscraper
21:18  <xmc> not full fidelity but it's fun to play with
21:19  <xmc> :)
21:20  <ploopkazo> so if I want to convert an xml dump to zim, my only real option is to run php+mysql and then scrape my local instance with mwoffliner?
21:21  <Nemo_bis> AFAIK yes
21:21  <ploopkazo> how easy is it to load the dump into a mw instance?
21:21  <Nemo_bis> Well, there's also the old dumpHTML way but that's a bit hacky
21:21  <Nemo_bis> Depends on the wiki size and extensions
21:23  <ploopkazo> what kind of extensions? I haven't run mediawiki before
21:23  <Nemo_bis> Can you link the wiki in question?
21:23  <ploopkazo> at the moment, https://wiki.puella-magi.net
21:23  <ploopkazo> though I imagine I'll collect a lot of them once I have the process figured out
21:24  <Nemo_bis> Ok, that's an easy one
21:24  <ploopkazo> oh, http
21:25  <ploopkazo> https isn't loading for some reason
21:25  <ploopkazo> though https is the one in my history
21:25  <Nemo_bis> loaded for me
21:25  <xmc> pretty slow on my end
21:25  <ploopkazo> weird
21:25  <ploopkazo> Nemo_bis: how do you tell it's an easy one? is there an info page with the installed extensions or something?
21:26  <Nemo_bis> ploopkazo: http://wiki.puella-magi.net/Special:Version you're especially interested in the parser-related things
21:27  <ploopkazo> parsoid is always desirable, right?
21:27  <Nemo_bis> ploopkazo: in theory yes but in practice few use it outside Wikimedia and Wikimedia-like installs
21:27  <Nemo_bis> And it doesn't work even for some Wikimedia wikis yet
21:27  <ploopkazo> hmm
|
21:28  <Nemo_bis> In theory the worst that can happen is that it doesn't parse some pages
|
21:28  <ploopkazo> which things on that wiki make it easy?
21:28  <Nemo_bis> It's a small wiki and it only has the most common parser extensions
21:28  <ploopkazo> oh
21:28  <Nemo_bis> <gallery>, <math>, <nowiki>, <pre>, <ref> and <references>
21:29  <Nemo_bis> <math> can be nasty but it's still a common one
21:29  <ploopkazo> so once my xml+image dump is complete, how would I go about loading that into a local mw instance?
21:29  <Nemo_bis> And of course it can happen that a wiki has a custom parser extension nobody has the code of
21:30  <Nemo_bis> ploopkazo: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
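The most basic of those tools is MediaWiki's own maintenance scripts; a rough sketch for a wiki that is already installed locally (paths and the dump filename are placeholders):

    # load page text and history into the local database
    php maintenance/importDump.php < dump.xml
    # then rebuild the derived tables
    php maintenance/rebuildrecentchanges.php
    # files go in separately, from the directory dumpgenerator.py saved them to
    php maintenance/importImages.php ./images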
|
21:30  <Nemo_bis> ploopkazo: in theory Parsoid is near the point where it doesn't even need MediaWiki to parse the wikitext; I'm not sure if anyone has tried yet
21:31  <Nemo_bis> Maybe it works for simple wikis; in that case you'd "just" run the node.js service, feed it wikitext from the XML and get your HTML back
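If that pans out, the interaction with the standalone service would look roughly like this; the endpoint shape and default port depend on the Parsoid version, and the domain has to match one configured for the service:

    # with the Parsoid HTTP service running (port 8000 by default),
    # ask it to render a snippet of wikitext as HTML
    curl --data-urlencode "wikitext=''Hello'' [[world]]" \
         http://localhost:8000/wiki.example.org/v3/transform/wikitext/to/html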
|
21:32  <Nemo_bis> No idea how it expands templates though
21:34  <Nemo_bis> ploopkazo: ah, and if you want to make this scale you probably need to look into installation automation like https://www.mediawiki.org/wiki/MediaWiki-Vagrant or https://github.com/wikimedia/mediawiki-containers
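MediaWiki-Vagrant, for instance, reduces a throwaway install to a few commands. A sketch assuming Vagrant and VirtualBox are already present; the Parsoid role name is an assumption, and the available roles can be checked with the roles subcommand:

    git clone https://gerrit.wikimedia.org/r/mediawiki/vagrant mediawiki-vagrant
    cd mediawiki-vagrant && ./setup.sh
    vagrant roles enable parsoid   # optional extras are switched on as roles
    vagrant up                     # boots a VM with a working local MediaWiki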
|
21:36  <ploopkazo> Nemo_bis: is the format dumpgenerator.py creates practically identical to the format wikimedia releases their dumps in?
21:36  <Nemo_bis> Now sorry if I overloaded you with information instead of making your life simpler... but if you succeed, that's the holy grail. :)
21:37  <Nemo_bis> ploopkazo: the format is identical for all wikis (given a release, of course).
21:37  <Nemo_bis> ploopkazo: but each page is just a long blob of text that might contain anything
|
21:38  <Nemo_bis> If your question is whether mwimport is supposed to work, the answer is (surprisingly) yes
21:39  <Nemo_bis> The basics of the MediaWiki database schema haven't changed since 2005, when mwimport was created
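In practice mwimport just turns the XML dump into SQL on standard output, so the whole import is one pipeline; the database name and credentials are placeholders, and the target tables must already exist from a fresh MediaWiki install:

    # convert the dump to INSERT statements and feed them straight to MySQL
    mwimport < dump.xml | mysql -u wikiuser -p wikidb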
|
21:41  <Nemo_bis> Anyway, for more specific help if you get stuck somewhere ask cscott (for Parsoid-without-MediaWiki) and Kelson on #kiwix (for mwoffliner, dumpHTML etc.), both on Freenode. Now I'm going to bed :)
21:46  <ploopkazo> thanks
23:15  Start has joined #wikiteam