[00:12] <Dud1> Would it be a bad idea to split a title.xml file into smaller chunks then put them back together at the end?
[00:13] <Dud1> *titles.txt file
[00:21] <Nemo_bis> Hm. Yes, probably rather bad. Every time you resume the download, dumpgenerator re-generates the XML based on the previous XML and the titles list. I've never understood how it works :) but it may destroy data on all pages which are not listed.
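A minimal sketch of the resume behaviour described here, not the actual dumpgenerator code: the new XML is rebuilt from the previous XML plus the titles list, so any page missing from the list can silently drop out. All names below are illustrative.

    def fetch_page(title):
        # Stand-in for downloading the page's XML export from the wiki.
        return "<page><title>%s</title>...</page>" % title

    def resume_dump(old_xml_pages, titles):
        # old_xml_pages: title -> <page> XML already present in the previous dump.
        # titles: the titles list driving this (resumed) run.
        new_pages = {}
        for title in titles:
            if title in old_xml_pages:
                new_pages[title] = old_xml_pages[title]  # reuse what was already dumped
            else:
                new_pages[title] = fetch_page(title)     # fetch what is still missing
        # Pages present in the old XML but absent from the titles list are lost,
        # which is why resuming against a partial titles file is risky.
        return new_pages

    previous = {"Main Page": "<page>...</page>", "Help:Contents": "<page>...</page>"}
    print(sorted(resume_dump(previous, ["Main Page"])))  # 'Help:Contents' is gone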
[00:22] <Nemo_bis> However, you can just produce multiple XML files and put them in a single archive, multi-step import is not a problem and 7z will be equally happy.
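A small sketch of that workflow, assuming hypothetical file names: keep each chunk's XML as its own file and pack them all, together with the single original titles.txt, into one 7z archive (7z's "a" command adds files to an archive).

    import glob
    import subprocess

    xml_chunks = sorted(glob.glob("examplewiki-chunk*.xml"))  # per-chunk XML dumps
    files_to_pack = xml_chunks + ["examplewiki-titles.txt"]   # one full titles list

    # Pack everything into a single archive; each XML can later be imported in turn.
    subprocess.run(["7z", "a", "examplewiki-history.7z"] + files_to_pack, check=True)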
[00:29] <Dud1> Yeah it took like an hour and a half earlier to regenerate one. Cool, add one complete titles file or as many as I have xml files?
[00:30] <Nemo_bis> Dud1: I don't understand this last question
[00:32] <Dud1> When adding the multiple xml files, do I add one titles.txt file or one titles.txt file for each xml file?
[00:32] <Nemo_bis> Dud1: better use the original, full one
[00:32] <Nemo_bis> Just to reduce confusion
[00:39] <Dud1> Sweet, this will be so handy, thanks :)
[00:40] <Dud1> What is the biggest single wiki (other than wikipedia)?
[00:40] <Nemo_bis> Dud1: French Wiktionary?
[00:40] <Nemo_bis> It depends if you mean article count, page count, total size, size compressed, also images...
[22:21] <Schbirid> current status https://pastee.org/69utv
[22:22] <Schbirid> the size rows without a letter are my current counts, i run the script 2-3 times over each of my domain list split files until no more images were downloaded
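A rough sketch of that loop, with the downloader script, the split-file names, and the image directory all assumed rather than taken from the log: re-run the downloader over one domain-list split until a pass adds no new files, then move on to the next split.

    import glob
    import os
    import subprocess

    def count_files(directory):
        # How many files have been downloaded so far.
        return sum(len(files) for _, _, files in os.walk(directory))

    for split_file in sorted(glob.glob("domains-split-*.txt")):  # hypothetical splits
        while True:
            before = count_files("images")
            # Hypothetical downloader invocation; replace with the real script.
            subprocess.run(["python", "download_images.py", split_file], check=True)
            if count_files("images") <= before:
                break  # nothing new arrived on this pass, this split is done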
[22:22] <Schbirid> its slow but i dont need that server atm so no biggie
[22:25] <Nemo_bis> better so then :)
[22:25] <Nemo_bis> hehe, 111 GB for a single letter :)
[22:26] <Schbirid> it's the hongkong bus one :D
[22:28] <Nemo_bis> lol right