[00:12] <Dud1> Would it be a bad idea to split a title.xml file into smaller chunks then put them back together at the end?
[00:13] <Dud1> *titles.txt file
[00:21] <Nemo_bis> Hm. Yes, probably rather bad. Every time you resume the download, dumpgenerator re-generates the XML based on the previous XML and the titles list. I've never understood how it works :) but it may destroy data on all pages which are not listed.
[00:22] <Nemo_bis> However, you can just produce multiple XML files and put them in a single archive; multi-step import is not a problem and 7z will be equally happy.
[00:29] <Dud1> Yeah it took like an hour and a half earlier to regenerate one. Cool, add one complete titles file or as many as I have xml files?
[00:30] <Nemo_bis> Dud1: I don't understand this last question
[00:32] <Dud1> When adding the multiple xml files, do I add one titles.txt file or one titles.txt file for each xml file?
[00:32] <Nemo_bis> Dud1: better use the original, full one
[00:32] <Nemo_bis> Just to reduce confusion
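[editor's note: a minimal sketch of the workflow described above, assuming the chunked runs have already produced separate per-chunk XML files; the file names, the glob pattern, and the archive name are illustrative, not dumpgenerator's own. It simply gathers the XML parts plus the single original titles.txt and calls the 7z command-line tool to build one archive.]

    # Sketch: pack several per-chunk XML dumps plus the single original
    # titles.txt into one 7z archive. The file names are assumptions;
    # adjust them to whatever the individual dumpgenerator runs produced.
    import glob
    import subprocess

    xml_parts = sorted(glob.glob("wikidump-part*.xml"))  # per-chunk XML dumps (assumed naming)
    titles = "titles.txt"                                # the full, original titles list
    archive = "wikidump-history.7z"                      # output archive name (arbitrary)

    if not xml_parts:
        raise SystemExit("no XML parts found, nothing to archive")

    # '7z a' adds the listed files to (or creates) the archive.
    subprocess.check_call(["7z", "a", archive, titles] + xml_parts)
    print("packed %d XML files + %s into %s" % (len(xml_parts), titles, archive))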
[00:39] <Dud1> Sweet, this will be so handy, thanks :)
[00:40] <Dud1> What is the biggest single wiki (other than wikipedia)?
[00:40] <Nemo_bis> Dud1: French Wiktionary?
[00:40] <Nemo_bis> It depends whether you mean article count, page count, total size, compressed size, or also images...
[22:21] <Schbirid> current status https://pastee.org/69utv
[22:22] <Schbirid> the size rows without a letter are my current counts; I ran the script 2-3 times over each of my domain-list split files until no more images were downloaded
[22:22] <Schbirid> it's slow but I don't need that server atm, so no biggie
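[editor's note: a rough sketch of the repeated-pass loop described above. download_images() is a hypothetical stand-in for whatever script is actually being run; it is assumed to process one domain-list split file and return how many new images it saved. Each split file is re-run until a pass fetches nothing new, capped at a few passes.]

    # Sketch of the "re-run each split file until nothing new is fetched" loop.
    import glob

    MAX_PASSES = 3  # 2-3 passes per split file, as mentioned above

    def download_images(list_file):
        """Hypothetical placeholder: run the image grabber over one
        domain-list split file and return the number of newly saved images."""
        raise NotImplementedError("plug in the real download script here")

    for list_file in sorted(glob.glob("domains-split-*.txt")):  # assumed naming
        for attempt in range(1, MAX_PASSES + 1):
            new = download_images(list_file)
            print("%s: pass %d fetched %d new images" % (list_file, attempt, new))
            if new == 0:  # nothing new this pass, move on to the next split file
                break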
[22:25] <Nemo_bis> better so then :)
[22:25] <Nemo_bis> hehe, 111 GB for a single letter :)
[22:26] <Schbirid> it's the hongkong bus one :D
[22:28] <Nemo_bis> lol right