[00:12] <Dud1> Would it be a bad idea to split a title.xml file into smaller chunks then put them back together at the end?
[00:13] <Dud1> *titles.txt file
[00:21] <Nemo_bis> Hm. Yes, probably rather bad. Every time you resume the download, dumpgenerator re-generates the XML based on the previous XML and the titles list. I've never understood how it works :) but it may destroy data on all pages which are not listed.
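A minimal sketch of the resume behaviour described here, not the actual dumpgenerator code: the new XML is rebuilt from the previous XML plus the titles list, so any page missing from the list can silently drop out. All names below are illustrative.

    def fetch_page(title):
        # Stand-in for downloading the page's XML export from the wiki.
        return "<page><title>%s</title>...</page>" % title

    def resume_dump(old_xml_pages, titles):
        # old_xml_pages: title -> <page> XML already present in the previous dump.
        # titles: the titles list driving this (resumed) run.
        new_pages = {}
        for title in titles:
            if title in old_xml_pages:
                new_pages[title] = old_xml_pages[title]  # reuse what was already dumped
            else:
                new_pages[title] = fetch_page(title)     # fetch what is still missing
        # Pages present in the old XML but absent from the titles list are lost,
        # which is why resuming against a partial titles file is risky.
        return new_pages

    previous = {"Main Page": "<page>...</page>", "Help:Contents": "<page>...</page>"}
    print(sorted(resume_dump(previous, ["Main Page"])))  # 'Help:Contents' is gone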
[00:22] <Nemo_bis> However, you can just produce multiple XML files and put them in a single archive, multi-step import is not a problem and 7z will be equally happy.
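A small sketch of that workflow, assuming hypothetical file names: keep each chunk's XML as its own file and pack them all, together with the single original titles.txt, into one 7z archive (7z's "a" command adds files to an archive).

    import glob
    import subprocess

    xml_chunks = sorted(glob.glob("examplewiki-chunk*.xml"))  # per-chunk XML dumps
    files_to_pack = xml_chunks + ["examplewiki-titles.txt"]   # one full titles list

    # Pack everything into a single archive; each XML can later be imported in turn.
    subprocess.run(["7z", "a", "examplewiki-history.7z"] + files_to_pack, check=True)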
[00:29] <Dud1> Yeah it took like an hour and a half earlier to regenerate one. Cool, add one complete titles file or as many as I have xml files?
[00:30] <Nemo_bis> Dud1: I don't understand this last question
[00:32] <Dud1> When adding the multiple xml files, do I add one titles.txt file or one titles.txt file for each xml file?
[00:32] <Nemo_bis> Dud1: better use the original, full one
[00:32] <Nemo_bis> Just to reduce confusion
[00:39] <Dud1> Sweet, this will be so handy, thanks :)
[00:40] <Dud1> What is the biggest single wiki (other than wikipedia)?
[00:40] <Nemo_bis> Dud1: French Wiktionary?
[00:40] <Nemo_bis> It depends if you mean article count, page count, total size, size compressed, also images...
[22:21] <Schbirid> current status https://pastee.org/69utv
[22:22] <Schbirid> the size rows without a letter are my current counts, i run the script 2-3 times over each of my domain list split files until no more images were downloaded
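A rough sketch of that loop, with the downloader script, the split-file names, and the image directory all assumed rather than taken from the log: re-run the downloader over one domain-list split until a pass adds no new files, then move on to the next split.

    import glob
    import os
    import subprocess

    def count_files(directory):
        # How many files have been downloaded so far.
        return sum(len(files) for _, _, files in os.walk(directory))

    for split_file in sorted(glob.glob("domains-split-*.txt")):  # hypothetical splits
        while True:
            before = count_files("images")
            # Hypothetical downloader invocation; replace with the real script.
            subprocess.run(["python", "download_images.py", split_file], check=True)
            if count_files("images") <= before:
                break  # nothing new arrived on this pass, this split is done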
[22:22] <Schbirid> its slow but i dont need that server atm so no biggie
[22:25] <Nemo_bis> better so then :)
[22:25] <Nemo_bis> hehe, 111 GB for a single letter :)
[22:26] <Schbirid> it's the hongkong bus one :D
[22:28] <Nemo_bis> lol right