[00:04] gui77: what wiki is that?
[00:04] Nemo_bis: https://wikiapiary.com/wiki/Archlinux_wiki
[00:05] notice that that is the .se one, not the English one
[00:05] It's not a problem that the dump already exists, a new one is always appreciated. We just have no easy way for non-admins to edit existing items
[00:05] the .de and English ones I'm still downloading, so I don't know if they're up yet
[00:05] Nemo_bis: ah. any way for me to upload the new one then, if it's worth it? uploader.py didn't seem to give me any options
[00:06] Sorry, it seems I just uploaded it a week ago: https://archive.org/details/wiki-wikiarchlinuxse
[00:06] WikiApiary is not updated in real time...
[00:07] I'm afraid there is no concrete option. The wiki is so small that you could email me the dump, though
[00:08] eh, if you did it last week there shouldn't be a need for a new version. let me try and wrap my head around WikiApiary - does it automatically verify which wikis are uploaded?
[00:08] or is it a manual process?
[00:09] It's an automatic bot, but it's started manually, among other reasons because it's still quite hacky and it takes many hours to run (even a day, IIRC).
[00:09] We handle a few hundred ambiguous cases manually.
[00:11] Anyway, gui77, if you have a lot of bandwidth/space, as suggested by your question on new warrior projects, I can suggest some bigger wikis which would keep your machine busier.
[00:12] Alternatively, if you prefer small wikis, we have a list of wikis I'm not working on because they failed for me: https://code.google.com/p/wikiteam/source/browse/trunk/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
[00:12] I'd definitely appreciate some bigger wikis, especially since it's a cumbersome process for small stuff.
[00:12] preferably under 200 GB each though :p
[00:12] 200 GB compressed or total?
[00:13] total
[00:13] I've got the bandwidth but I'm kind of low on disk space hehe
[00:13] One wiki nobody is working on is Encyclopedia Dramatica; someone speculates it's about 1-2 TB now
[00:14] Hm. Wikis with many pages rarely take much bandwidth, wikis with many files do.
[00:14] Anyway, I have a half-completed dump you may want to continue
[00:15] I don't mind trying to complete yours, if it's feasible
[00:16] currently trying that list that failed for you, but so far it's failing for me too
[00:16] Yes, you'd need manual tweaks
[00:17] like testing whether the webserver returns correct error codes, where index.php and api.php are, whether Special:Export works when used manually, etc.
[00:18] I think I may have lost that dump, in what dir was it grrr
[00:18] I had to stop it because the XML was 100+ GB and I ran out of disk; I compressed it for later completion
[00:20] I'd have to download it from you first to resume it, wouldn't I?
[00:20] yeah, I've only got 200 GB myself at the moment - won't it go over that?
[00:20] Ah. Yes it would. Well, I'm tarring up some smaller ones
[00:29] gui77: http://koti.kapsi.fi/~federico/tmp/incomplete-dumps.tar
[00:30] Untar the 4 directories, then bunzip2 the files inside them (one or more at once) and resume the dumping
[00:30] the api.php or index.php URL to use is in the config.txt files of each
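(A rough sketch of resuming one of those untarred dumps, assuming WikiTeam's classic dumpgenerator.py and its --api/--index, --xml, --images, --resume and --path flags; the flag names can differ between versions, and the URL and directory below are placeholders, not values from the log.)

    # Sketch: resume an incomplete dump after untarring it and bunzip2'ing
    # the compressed XML/lists inside it. Flag names assumed from the
    # classic WikiTeam dumpgenerator.py; check your version's --help.
    import subprocess

    # Copied by hand from the dump's config.txt (placeholder values).
    api_url = "http://example-wiki.org/w/api.php"
    dump_dir = "examplewiki-20140101-wikidump"

    subprocess.run(
        ["python", "dumpgenerator.py",
         "--api=" + api_url,   # or --index=... if the wiki has no api.php
         "--xml", "--images",
         "--resume",           # continue from the existing XML/image lists
         "--path=" + dump_dir],
        check=True,
    )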
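(For the wikis on that failed list, a minimal sketch of the manual checks mentioned at [00:17]: whether the webserver returns sane status codes, where api.php and index.php live, and whether Special:Export answers. It uses the requests library; the base URL, candidate paths and test title are guesses for illustration, not anything taken from WikiTeam.)

    # Probe a wiki for api.php / index.php and check Special:Export by hand.
    import requests

    BASE = "http://example-wiki.org"          # hypothetical wiki
    CANDIDATES = ["/w", "/wiki", "", "/mediawiki"]

    def probe(url, params=None):
        try:
            r = requests.get(url, params=params, timeout=30)
            return r.status_code, r.headers.get("Content-Type", ""), r.text[:200]
        except requests.RequestException as e:
            return None, "", str(e)

    for prefix in CANDIDATES:
        status, ctype, _ = probe(BASE + prefix + "/api.php",
                                 {"action": "query", "meta": "siteinfo", "format": "json"})
        print("api.php at %s%s -> %s %s" % (BASE, prefix, status, ctype))

        status, ctype, body = probe(BASE + prefix + "/index.php",
                                    {"title": "Special:Export/Main_Page"})
        looks_like_xml = status == 200 and "<mediawiki" in body
        print("Special:Export at %s%s -> %s (XML: %s)" % (BASE, prefix, status, looks_like_xml))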
[00:30] Now I'm going to bed
[00:30] I'll try
[00:30] thanks
[00:30] good night!
[08:42] gui77: have you downloaded the file?
[17:33] oh hey. I did, but something popped up and I've been busy since morning, haven't had a chance to look at it yet >.<
[18:56] oh god, wikia s* is even bigger. 116G or something
[18:56] poor IA :(
[18:56] almost done with grabbing btw
[21:04] http://the-house-of-anubis.wikia.com/ seems to be discovered with all new images every time I re-run over the existing dump
[21:04] 149 images were found in the directory from a previous session
[21:04] Image list was completed in the previous session
[21:04] Retrieving images from ".nm.jpg"
[21:04] and then it downloads
[21:11] and I know I have seen this line from it before and laughed: Filename is too long, truncating. Now it is: 11436-Young-Woman-Pausing-In-A-Shoe-And-Purse-Store-To-Talk-On-Her-Cellphone-While-Shopping-Clipart-4f784878fb508650682af23548a71c78.jpg
[23:44] Yes, for some reason s* is always the biggest
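(On the image re-discovery at [21:04]: one way to see what a resume still actually needs is to compare the saved image list with what is already on disk. This assumes a WikiTeam-style *-images.txt list with the filename in the first tab-separated column; if the dump stores its list differently, adjust accordingly. The comparison is only approximate, since very long filenames get truncated on disk, as the [21:11] message shows.)

    # Compare the saved image list against the images/ directory of a dump.
    import glob
    import os

    dump_dir = "thehouseofanubiswikiacom-20140101-wikidump"   # hypothetical path
    image_list = glob.glob(os.path.join(dump_dir, "*-images.txt"))[0]
    image_dir = os.path.join(dump_dir, "images")

    wanted = set()
    with open(image_list, encoding="utf-8", errors="replace") as f:
        for line in f:
            name = line.split("\t", 1)[0].strip()
            if name:
                wanted.add(name)

    have = set(os.listdir(image_dir)) if os.path.isdir(image_dir) else set()
    missing = wanted - have
    print("%d listed, %d on disk, %d still missing" % (len(wanted), len(have), len(missing)))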
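(On the "Filename is too long, truncating" message at [21:11]: a sketch of the general idea, keeping the extension and appending a short hash so two different long names don't collide. This mirrors the concept, not necessarily dumpgenerator.py's exact rule; the 100-character cap is an assumption.)

    # Truncate an over-long filename while keeping it unique and keeping its extension.
    import hashlib
    import os

    MAX_LEN = 100  # assumed cap; the real limit depends on the tool/filesystem

    def truncate_filename(name, max_len=MAX_LEN):
        if len(name) <= max_len:
            return name
        stem, ext = os.path.splitext(name)
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:8]
        keep = max_len - len(ext) - len(digest) - 1
        return stem[:keep] + "-" + digest + ext

    long_name = ("11436-Young-Woman-Pausing-In-A-Shoe-And-Purse-Store-To-Talk-On-Her-"
                 "Cellphone-While-Shopping-Clipart-4f784878fb508650682af23548a71c78.jpg")
    print(truncate_filename(long_name))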