00:11  Start has joined #wikiteam
01:28  kyan has joined #wikiteam
01:51  vitzli has joined #wikiteam
03:45  kyan has quit IRC (This computer has gone to sleep)
03:48  vitzli has quit IRC (Leaving)
06:03  vitzli has joined #wikiteam
06:11  vitzli has quit IRC (Leaving)
06:22  vitzli has joined #wikiteam
08:32  kyan has joined #wikiteam
08:48  kyan has quit IRC (Leaving)
|
13:57  * Nemo_bis screams https://github.com/WikiTeam/wikiteam/issues/269
14:09  <vitzli> could you share wikipuella_maginet-20160211-images.txt please?
14:09  <vitzli> I'd like to stare at it
14:17  <Nemo_bis> vitzli: the reporter didn't attach it
14:18  <vitzli> I thought it was you, sorry
|
14:59  Start has quit IRC (Quit: Disconnected.)
15:39  Start has joined #wikiteam
15:52  Fletcher has quit IRC (Ping timeout: 252 seconds)
15:56  midas has quit IRC (Ping timeout: 260 seconds)
16:05  midas has joined #wikiteam
16:21  Start has quit IRC (Quit: Disconnected.)
16:24  Start has joined #wikiteam
16:25  Start has quit IRC (Remote host closed the connection)
16:25  Start has joined #wikiteam
16:40  Fletcher has joined #wikiteam
17:07  Start has quit IRC (Quit: Disconnected.)
18:37  svchfoo3 has quit IRC (Read error: Operation timed out)
18:39  svchfoo3 has joined #wikiteam
18:39  svchfoo1 sets mode: +o svchfoo3
18:42  Start has joined #wikiteam
18:43  vitzli has quit IRC (Leaving)
19:19  Start has quit IRC (Quit: Disconnected.)
19:23  Start has joined #wikiteam
20:28  ploopkazo has joined #wikiteam
|
20:29  <ploopkazo> once a wiki is downloaded with dumpgenerator.py, is there a way to turn it into a zim for use with a zim reader like kiwix?
20:45  Start has quit IRC (Quit: Disconnected.)
20:55  <Nemo_bis> ploopkazo: not really
20:55  <Nemo_bis> ploopkazo: are you trying to make a ZIM file for a dead wiki?
21:15  <ploopkazo> Nemo_bis: no, it's still online
21:17  <Nemo_bis> ploopkazo: then you should install Parsoid and run mwoffliner with it
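For a live wiki the usual route is to point mwoffliner at the wiki's API and at a running Parsoid service. A rough sketch of an invocation, assuming both tools are installed and Parsoid is serving this wiki; exact flag names vary between mwoffliner releases, and the URLs and e-mail address are placeholders:

    # build a ZIM from the live wiki via its API and a local Parsoid
    mwoffliner --mwUrl="https://wiki.example.org/" \
               --parsoidUrl="http://localhost:8000/" \
               --adminEmail="you@example.org" \
               --outputDirectory="./zim"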
|
21:17  <ploopkazo> Nemo_bis: how does one view a dumpgenerator.py dump if not conversion to zim?
21:17  <Nemo_bis> The XML dump can be parsed properly only by MediaWiki itself
21:18  <Nemo_bis> dumpgenerator.py attempts to collect all the information one will need to create a clone of the original wiki
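For reference, the grab itself is usually a single invocation of the script against the wiki's api.php; the flags below are the standard WikiTeam ones, and the wiki URL is a placeholder:

    # full page history as XML plus all files, with an -images.txt list
    python dumpgenerator.py --api="https://wiki.example.org/api.php" --xml --images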
|
21:18  <ploopkazo> oh
21:18  <xmc> welllll there's another tool that reads mediawiki xml dumps https://github.com/chronomex/wikiscraper
21:18  <xmc> not full fidelity but it's fun to play with
21:19  <xmc> :)
21:20  <ploopkazo> so if I want to convert an xml dump to zim, my only real option is to run php+mysql and then scrape my local instance with mwoffliner?
21:21  <Nemo_bis> AFAIK yes
21:21  <ploopkazo> how easy is it to load the dump into a mw instance?
21:21  <Nemo_bis> Well, there's also the old dumpHTML way but that's a bit hacky
21:21  <Nemo_bis> Depends on the wiki size and extensions
21:23  <ploopkazo> what kind of extensions? I haven't run mediawiki before
21:23  <Nemo_bis> Can you link the wiki in question?
21:23  <ploopkazo> at the moment, https://wiki.puella-magi.net
21:23  <ploopkazo> though I imagine I'll collect a lot of them once I have the process figured out
21:24  <Nemo_bis> Ok, that's an easy one
21:24  <ploopkazo> oh, http
21:25  <ploopkazo> https isn't loading for some reason
21:25  <ploopkazo> though https is the one in my history
21:25  <Nemo_bis> loaded for me
21:25  <xmc> pretty slow on my end
21:25  <ploopkazo> weird
21:25  <ploopkazo> Nemo_bis: how do you tell it's an easy one? is there an info page with the installed extensions or something?
21:26  <Nemo_bis> ploopkazo: http://wiki.puella-magi.net/Special:Version you're especially interested in the parser-related things
21:27  <ploopkazo> parsoid is always desirable, right?
21:27  <Nemo_bis> ploopkazo: in theory yes but in practice few use it outside Wikimedia and Wikimedia-like installs
21:27  <Nemo_bis> And it doesn't work even for some Wikimedia wikis yet
21:27  <ploopkazo> hmm
|
21:28  <Nemo_bis> In theory the worst that can happen is that it doesn't parse some pages
|
21:28  <ploopkazo> which things on that wiki make it easy?
21:28  <Nemo_bis> It's a small wiki and it only has the most common parser extensions
21:28  <ploopkazo> oh
21:28  <Nemo_bis> <gallery>, <math>, <nowiki>, <pre>, <ref> and <references>
21:29  <Nemo_bis> <math> can be nasty but it's still a common one
21:29  <ploopkazo> so once my xml+image dump is complete, how would I go about loading that into a local mw instance?
21:29  <Nemo_bis> And of course it can happen that a wiki has a custom parser extension nobody has the code of
21:30  <Nemo_bis> ploopkazo: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
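The most basic of those tools is MediaWiki's own maintenance scripts; a rough sketch for a wiki that is already installed locally (paths and the dump filename are placeholders):

    # load page text and history into the local database
    php maintenance/importDump.php < dump.xml
    # then rebuild the derived tables
    php maintenance/rebuildrecentchanges.php
    # files go in separately, from the directory dumpgenerator.py saved them to
    php maintenance/importImages.php ./images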
|
21:30  <Nemo_bis> ploopkazo: in theory Parsoid is near the point where it doesn't even need MediaWiki to parse the wikitext; I'm not sure if anyone has tried yet
21:31  <Nemo_bis> Maybe it works for simple wikis; in that case you'd "just" run the node.js service, feed it wikitext from the XML and get your HTML back
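If that pans out, the interaction with the standalone service would look roughly like this; the endpoint shape and default port depend on the Parsoid version, and the domain has to match one configured for the service:

    # with the Parsoid HTTP service running (port 8000 by default),
    # ask it to render a snippet of wikitext as HTML
    curl --data-urlencode "wikitext=''Hello'' [[world]]" \
         http://localhost:8000/wiki.example.org/v3/transform/wikitext/to/html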
|
21:32  <Nemo_bis> No idea how it expands templates though
21:34  <Nemo_bis> ploopkazo: ah, and if you want to make this scale you probably need to look into installation automation like https://www.mediawiki.org/wiki/MediaWiki-Vagrant or https://github.com/wikimedia/mediawiki-containers
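MediaWiki-Vagrant, for instance, reduces a throwaway install to a few commands. A sketch assuming Vagrant and VirtualBox are already present; the Parsoid role name is an assumption, and the available roles can be checked with the roles subcommand:

    git clone https://gerrit.wikimedia.org/r/mediawiki/vagrant mediawiki-vagrant
    cd mediawiki-vagrant && ./setup.sh
    vagrant roles enable parsoid   # optional extras are switched on as roles
    vagrant up                     # boots a VM with a working local MediaWiki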
|
21:36  <ploopkazo> Nemo_bis: is the format dumpgenerator.py creates practically identical to the format wikimedia releases their dumps in?
21:36  <Nemo_bis> Now sorry if I overloaded you with information instead of making your life simpler... but if you succeed, that's the holy grail. :)
21:37  <Nemo_bis> ploopkazo: the format is identical for all wikis (given a release, of course).
21:37  <Nemo_bis> ploopkazo: but each page is just a long blob of text that might contain anything
|
21:38  <Nemo_bis> If your question is whether mwimport is supposed to work, the answer is (surprisingly) yes
21:39  <Nemo_bis> The basics of the MediaWiki database schema haven't changed since 2005, when mwimport was created
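In practice mwimport just turns the XML dump into SQL on standard output, so the whole import is one pipeline; the database name and credentials are placeholders, and the target tables must already exist from a fresh MediaWiki install:

    # convert the dump to INSERT statements and feed them straight to MySQL
    mwimport < dump.xml | mysql -u wikiuser -p wikidb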
|
21:41  <Nemo_bis> Anyway, for more specific help if you get stuck somewhere ask cscott (for Parsoid-without-MediaWiki) and Kelson on #kiwix (for mwoffliner, dumpHTML etc.), both on Freenode. Now I'm going to bed :)
21:46  <ploopkazo> thanks
23:15  Start has joined #wikiteam