[00:12] Would it be a bad idea to split a title.xml file into smaller chunks and then put them back together at the end?
[00:13] *titles.txt file
[00:21] Hm. Yes, probably rather bad. Every time you resume the download, dumpgenerator re-generates the XML based on the previous XML and the titles list. I've never understood how it works :) but it may destroy data on all pages that are not listed.
[00:22] However, you can just produce multiple XML files and put them in a single archive; multi-step import is not a problem, and 7z will be equally happy.
[00:29] Yeah, it took like an hour and a half earlier to regenerate one. Cool, do I add one complete titles file or as many as I have XML files?
[00:30] Dud1: I don't understand this last question
[00:32] When adding the multiple XML files, do I add one titles.txt file or one titles.txt file for each XML file?
[00:32] Dud1: better to use the original, full one
[00:32] Just to reduce confusion
[00:39] Sweet, this will be so handy, thanks :)
[00:40] What is the biggest single wiki (other than Wikipedia)?
[00:40] Dud1: French Wiktionary?
[00:40] It depends whether you mean article count, page count, total size, compressed size, or also images...
[22:21] current status: https://pastee.org/69utv
[22:22] The size rows without a letter are my current counts; I run the script 2-3 times over each of my domain-list split files until no more images are downloaded.
[22:22] It's slow, but I don't need that server atm, so no biggie.
[22:25] better that way, then :)
[22:25] hehe, 111 GB for a single letter :)
[22:26] it's the Hong Kong bus one :D
[22:28] lol right
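
A minimal sketch of the "multiple XML files, one full titles.txt, single archive" approach discussed above. The file names (wikidump-part*.xml, wikidump.7z) are placeholders, not from the log, and it assumes the 7z command-line tool is installed:

```python
# Sketch: bundle several partial XML dumps plus the one complete titles.txt
# into a single 7z archive for a later multi-step import.
# Assumes the `7z` CLI is on PATH; file names are hypothetical placeholders.
import glob
import subprocess

xml_parts = sorted(glob.glob("wikidump-part*.xml"))  # assumed chunk naming
files = xml_parts + ["titles.txt"]                   # keep the single full titles list

# `7z a <archive> <files...>` adds the listed files to (or creates) the archive.
subprocess.run(["7z", "a", "wikidump.7z"] + files, check=True)

print("Packed %d XML parts + titles.txt into wikidump.7z" % len(xml_parts))
```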
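The "run the script 2-3 times over each split file until no more images were downloaded" step could be automated with a loop along these lines. This is only a sketch: the script name, the images directory, and the split-file names are assumptions, not taken from the log:

```python
# Sketch: re-run an image download script over each domain-list split file
# until a pass downloads nothing new. Script name, directory, and split-file
# names below are hypothetical.
import os
import subprocess

def count_images(directory="images"):
    """Count files under the download directory to detect whether a pass added anything."""
    return sum(len(files) for _, _, files in os.walk(directory))

for split_file in ["domains-aa.txt", "domains-ab.txt"]:  # assumed split names
    while True:
        before = count_images()
        subprocess.run(["python", "download_images.py", split_file], check=True)
        if count_images() == before:  # no new images this pass: move to the next split
            break
```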