[13:24] emijrp, http://wiki.guildwars.com/wiki/Guild_Wars_Wiki:Reporting_wiki_bugs#Can.27t_export_Guild:Its_Mocha_.28historical.29
[13:24] There are 4 other pages too big to be downloaded with the script.
[13:25] So I just downloaded them with Special:Export; three of them are ~800 MiB and one is 1.1 GiB.
[13:26] i know
[13:26] And I'll just compress them along with the xml produced by the script.
[13:26] ok
[13:26] With another wiki I failed...
[13:26] I put the titles at the end of the list to retry them later
[13:26] did you remove those huge titles from the title list?
[13:26] yes
[13:26] But it didn't work, so I removed them and resumed the dump.
[13:27] But even so it's not correct
[13:27] uhm, I deleted the numbers, but there was a difference of 1 revision and 5 titles found with grep
[13:27] No idea why
[13:28] So I'm re-downloading everything. Without those pages it's quite fast; it doesn't have to retry and be resumed, etc.
[13:28] what is the grep output?
[13:31] lost it :-/
[13:33] Ah, I hadn't emptied the trash yet, rerunning it now
[13:34] It will take a while...
[13:38] in top, what's the difference between Virtual Image, Resident size, Data+Stack size, and Shared Mem size?
[13:42] dunno
[13:43] 1343952
[13:43] 1343953
[13:43] 135991
[13:43] 135997
[13:43] Here it is: 135997
[13:45] 998 MB of RAM and growing... will it fail or not?
[13:50] i can send you the script to remove bad pages
[13:50] re-downloading due to a few errors is not a nice idea
[13:51] probably it will fail again
[13:56] http://pastebin.com/taTBzhBa
[13:56] input is a.xml, a corrupt dump
[13:56] output is b.xml, a dump excluding the corrupt pages in a.xml
[13:56] it is not very well tested
[13:57] i used it to curate the deletionpedia dump
[13:58] and i will use it with citizendium too
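
The Special:Export workaround at 13:25 can be scripted; a minimal Python sketch follows (the wiki URL comes from the bug report link above, while the index.php path, the output file name, and the behaviour of the history parameter are assumptions, and full-history export may be limited on some wikis):

    # Fetch one page with full history through Special:Export and stream it
    # to disk, since these exports can run to hundreds of MiB.
    import urllib.parse
    import urllib.request

    def export_page(index_php, title, out_path):
        params = urllib.parse.urlencode({
            'title': 'Special:Export',
            'pages': title,
            'history': '1',  # every revision, not only the current one
        })
        with urllib.request.urlopen(index_php + '?' + params) as resp, \
                open(out_path, 'wb') as out:
            while True:
                chunk = resp.read(1 << 20)  # 1 MiB at a time
                if not chunk:
                    break
                out.write(chunk)

    # Hypothetical call for the page named in the bug report.
    export_page('http://wiki.guildwars.com/index.php',
                'Guild:Its Mocha (historical)',
                'Guild_Its_Mocha_historical.xml')

The "difference of 1 revision and 5 titles found with grep" at 13:27 refers to comparing tag counts between dumps, and the bare numbers at 13:43 look like such counts. A sketch of that kind of check, with placeholder file names:

    # Count <title> and <revision> tags in a MediaWiki XML dump so two dumps
    # (or a dump and its title list) can be compared.
    def count_tags(path):
        titles = revisions = 0
        with open(path, encoding='utf-8', errors='replace') as f:
            for line in f:
                titles += line.count('<title>')
                revisions += line.count('<revision>')
        return titles, revisions

    for dump in ('old-dump.xml', 'new-dump.xml'):  # placeholder names
        t, r = count_tags(dump)
        print(dump, '->', t, 'titles,', r, 'revisions')

The pastebin script at 13:56 is not quoted in the log; the sketch below only mirrors its stated contract (a.xml in, b.xml out, corrupt pages dropped), and the well-formed-XML test is an assumption about how bad pages might be detected:

    # Copy pages from a.xml to b.xml, dropping any <page> block that is not
    # well-formed XML. Reads the whole dump into memory, so it only suits
    # dumps that fit in RAM.
    import re
    import xml.etree.ElementTree as ET

    def filter_dump(src='a.xml', dst='b.xml'):
        with open(src, encoding='utf-8', errors='replace') as f:
            data = f.read()
        header = data[:data.find('<page>')]                      # siteinfo etc.
        footer = data[data.rfind('</page>') + len('</page>'):]   # closing </mediawiki>
        kept, dropped = [], 0
        for block in re.findall(r'<page>.*?</page>', data, re.DOTALL):
            try:
                ET.fromstring(block)  # parses cleanly: keep the page
                kept.append(block)
            except ET.ParseError:
                dropped += 1          # malformed: drop it
        with open(dst, 'w', encoding='utf-8') as f:
            f.write(header + '\n'.join(kept) + footer)
        print('kept', len(kept), 'pages, dropped', dropped)

    filter_dump()

For dumps bigger than available memory, a streaming pass that splits on page boundaries line by line would be needed instead of reading the whole file in one go.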