Time | Nickname | Message
13:24 | Nemo_bis | emijrp, http://wiki.guildwars.com/wiki/Guild_Wars_Wiki:Reporting_wiki_bugs#Can.27t_export_Guild:Its_Mocha_.28historical.29
13:24 | Nemo_bis | There are 4 other pages too big to be downloaded with the script.
13:25 | Nemo_bis | So I just downloaded them with Special:Export; three of them are ~800 MiB and one is 1.1 GiB.
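A Special:Export download like that can also be scripted. Below is a minimal Python sketch, assuming a standard MediaWiki Special:Export form and a wiki that allows full-history export; the index.php URL, example page title (taken from the bug report above), and output filename are placeholders to adapt.

```python
# Sketch: fetch the full history of one oversized page through Special:Export
# and stream it to disk, so a ~1 GiB export never has to fit in memory.
# The index.php URL, page title and output filename are placeholders, and the
# parameter names assume the standard MediaWiki Special:Export form.
import urllib.parse
import urllib.request

index_url = "http://wiki.guildwars.com/index.php"    # placeholder wiki index.php
params = {
    "title": "Special:Export",
    "pages": "Guild:Its Mocha (historical)",          # example page from the bug report
    "history": "1",                                    # ask for all revisions
    "action": "submit",
}
data = urllib.parse.urlencode(params).encode("utf-8")

with urllib.request.urlopen(index_url, data=data) as response, \
        open("Guild_Its_Mocha_(historical).xml", "wb") as out:
    while True:
        chunk = response.read(1 << 20)                 # read 1 MiB at a time
        if not chunk:
            break
        out.write(chunk)
```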
13:26 | emijrp | i know
13:26 | Nemo_bis | And I'll just compress them along with the XML produced by the script.
13:26 | emijrp | ok
13:26 | Nemo_bis | With another wiki I failed...
13:26 | Nemo_bis | I put the titles at the end of the list to retry them later
13:26 | emijrp | did you remove those huge titles from the title list?
13:26 | Nemo_bis | yes
13:26 | Nemo_bis | But it didn't work, so I removed them and resumed the dump.
13:27 | Nemo_bis | But even so it's not correct
13:27 | Nemo_bis | uhm, I deleted the numbers, but there was a difference of 1 revision and 5 titles found with grep
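The check being described compares counts in the XML dump against the titles list; grep -c does it, and so does a short Python sketch like this one, with placeholder filenames.

```python
# Sketch of the consistency check: count pages and revisions in the XML dump
# and compare the page count with the titles list. Filenames are placeholders;
# the same numbers can be had with e.g. grep -c "</page>" on the dump.

def count_tag(path, tag):
    """Count occurrences of a closing tag, reading the file line by line."""
    total = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            total += line.count(tag)
    return total

with open("wiki-titles.txt", encoding="utf-8") as f:
    titles = sum(1 for line in f if line.strip())

pages = count_tag("wiki-history.xml", "</page>")
revisions = count_tag("wiki-history.xml", "</revision>")

print("titles listed:    ", titles)
print("pages in dump:    ", pages)
print("revisions in dump:", revisions)
# The first two should match; a gap means pages are missing from the dump
# (or were exported more than once).
```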
13:27 | Nemo_bis | No idea why
13:28 | Nemo_bis | So I'm re-downloading everything. Without those pages it's quite fast; it doesn't have to retry and be resumed, etc.
13:28 | emijrp | what is the grep output?
13:31 | Nemo_bis | lost it :-/
13:33 | Nemo_bis | Ah, I hadn't emptied the trash yet, rerunning it now
13:34 | Nemo_bis | It will take a while...
13:38 | Nemo_bis | in top, what's the difference between Virtual Image, Resident size, Data+Stack size, Shared Mem size?
13:42 | emijrp | dunno
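For reference, those are top's long field names: Virtual Image is the total virtual address space the process has mapped, Resident size is the part currently held in physical RAM, Data+Stack size is the data plus stack segment, and Shared Mem size is memory shared with other processes (such as libraries). The same figures can be read from Python with the third-party psutil module; the PID below is a placeholder.

```python
# Sketch: read the same memory figures top shows, via the third-party psutil
# module. On Linux, memory_info() exposes rss (~RES), vms (~VIRT), and usually
# shared (~SHR) and data (~DATA) as well. The PID is a placeholder.
import psutil

proc = psutil.Process(12345)                  # PID of the dump/export process
mem = proc.memory_info()

print("VIRT (vms):    %d KiB" % (mem.vms // 1024))
print("RES  (rss):    %d KiB" % (mem.rss // 1024))
print("SHR  (shared): %d KiB" % (getattr(mem, "shared", 0) // 1024))
print("DATA (data):   %d KiB" % (getattr(mem, "data", 0) // 1024))
```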
13:43 | Nemo_bis | 1343952
13:43 | Nemo_bis | 1343953
13:43 | Nemo_bis | 135991
13:43 | Nemo_bis | 135997
13:43 | Nemo_bis | Here it is: 135997
13:45 | Nemo_bis | 998 MB of RAM and growing... will it fail or not?
13:50 | emijrp | i can send you the script to remove bad pages
13:50 | emijrp | re-downloading due to a few errors is not a nice idea
13:51 | emijrp | probably it will fail again
13:56 | emijrp | http://pastebin.com/taTBzhBa
13:56 | emijrp | input is a.xml, a corrupt dump
13:56 | emijrp | output is b.xml, a dump excluding the corrupt pages in a.xml
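The pastebin link above is the actual script; purely as an illustration of the same idea, and not that code, here is a rough Python sketch that streams a.xml and writes to b.xml only the page blocks that are complete and well-formed.

```python
# Rough sketch of a "remove bad pages" filter (an illustration of the idea, not
# the pastebin script): copy a.xml to b.xml, passing the header and footer
# through unchanged but dropping any <page> block that is not well-formed XML.
# A page truncated at the end of the file is dropped as well.
import xml.etree.ElementTree as ET

def filter_dump(src="a.xml", dst="b.xml"):
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        page_lines = None                      # buffer for the current <page> block
        for line in fin:
            if page_lines is None:
                if "<page>" in line:
                    page_lines = [line]        # start of a page block
                else:
                    fout.write(line)           # header/footer lines pass through
            else:
                page_lines.append(line)
                if "</page>" in line:
                    block = "".join(page_lines)
                    try:
                        ET.fromstring(block)   # well-formed: keep the page
                        fout.write(block)
                    except ET.ParseError:
                        pass                   # corrupt page: skip it
                    page_lines = None

if __name__ == "__main__":
    filter_dump()
```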
13:56 | emijrp | it is not very well tested
13:57 | emijrp | i used it to curate the deletionpedia dump
13:58 | emijrp | and i will use it with citizendium too