#wikiteam 2014-03-02, Sun

Time Nickname Message
00:04 🔗 Nemo_bis gui77: what wiki is that?
00:04 🔗 gui77 Nemo_bis: https://wikiapiary.com/wiki/Archlinux_wiki
00:05 🔗 gui77 notice that that is the .se one, not the English one
00:05 🔗 Nemo_bis It's not a problem that the dump already exists; a new one is always appreciated. We just have no easy way for non-admins to edit existing items
00:05 🔗 gui77 the .de and English ones I'm still downloading, so dunno if they're up yet
00:05 🔗 gui77 Nemo_bis: ah. any way for me to upload the new one then, if it's worth it? uploader.py didn't seem to give me any options
00:06 🔗 Nemo_bis Sorry, it seems I just uploaded it a week ago. https://archive.org/details/wiki-wikiarchlinuxse
00:06 🔗 Nemo_bis WikiApiary is not updated real-time...
00:07 🔗 Nemo_bis I'm afraid there is no concrete option. The wiki is so small that you could email me the dump though
00:08 🔗 gui77 eh, if you did it last week there shouldn't be a need for a new version. let me try and wrap my head around wikiapiary - does it automatically verify which wikis are uploaded?
00:08 🔗 gui77 or is it a manual process?
00:09 🔗 Nemo_bis It's an automatic bot, but it's started manually, among other reasons because it's still quite hacky and it takes many hours to run (a day even, IIRC).
00:09 🔗 Nemo_bis We handle a few hundred ambiguous cases manually.
00:11 🔗 Nemo_bis Anyway, gui77, if you have a lot of bandwidth/space, as suggested by your question on new warrior projects, I can suggest some bigger wikis which would keep your machine busier.
00:12 🔗 Nemo_bis Alternatively, if you prefer small wikis we have a list of wikis on which I'm not working because they failed for me: https://code.google.com/p/wikiteam/source/browse/trunk/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
00:12 🔗 gui77 I'd definitely appreciate some bigger wikis, especially since it's a cumbersome process for small stuff.
00:12 🔗 gui77 preferably under 200 GB each though :p
00:12 🔗 Nemo_bis 200 GB compressed or total?
00:13 🔗 gui77 total
00:13 🔗 gui77 i've got the bw but kind of low on disk space hehe
00:13 🔗 Nemo_bis One wiki nobody is working on is Encyclopedia Dramatica; someone speculates it's about 1-2 TB now
00:14 🔗 Nemo_bis Hm. Wikis with many pages rarely take much bandwidth, wikis with many files do.
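A quick way to gauge which kind of wiki you are dealing with, before committing disk space, is to ask the standard MediaWiki siteinfo API for its statistics. A minimal sketch (the requests library is assumed, and the URL is a placeholder; in the statistics output, "images" counts uploaded files of any type):

```python
import requests

def wiki_stats(api_url):
    # Query the standard MediaWiki siteinfo statistics endpoint.
    r = requests.get(api_url, params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    })
    stats = r.json()["query"]["statistics"]
    # Many pages -> big XML dump; many files -> big bandwidth/disk use.
    print("pages:", stats["pages"], "- uploaded files:", stats["images"])

wiki_stats("http://wiki.example.org/api.php")  # placeholder URL
```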
00:14 🔗 Nemo_bis Anyway, I have a half-completed dump you may want to continue
00:15 🔗 gui77 I don't mind trying to complete yours, if it's feasible
00:16 🔗 gui77 currently trying that list that failed for you, but so far it's failing for me too
00:16 🔗 Nemo_bis Yes, you'd need manual tweaks
00:17 🔗 Nemo_bis like testing whether the webserver returns correct error codes, where index.php and api.php are, whether Special:Export works when used manually, etc.
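A minimal sketch of those manual checks, assuming the requests library and the usual MediaWiki endpoint locations (wiki.example.org is a placeholder for the wiki being tested):

```python
import requests

BASE = "http://wiki.example.org"  # placeholder wiki URL

# 1. Does the webserver return a proper 404 for a missing page, or a
#    misleading 200? Broken error codes confuse the dump scripts.
r = requests.get(BASE + "/this-page-should-not-exist-xyz")
print("missing page status:", r.status_code)

# 2. Where do index.php and api.php live? Common layouts differ.
for path in ("/index.php", "/api.php", "/w/index.php", "/w/api.php"):
    r = requests.get(BASE + path)
    print(path, "->", r.status_code)

# 3. Does Special:Export work when used manually? A working export
#    returns an XML document with a <mediawiki> root element.
r = requests.get(BASE + "/index.php",
                 params={"title": "Special:Export/Main_Page"})
print("Special:Export works:", "<mediawiki" in r.text)
```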
00:18 🔗 Nemo_bis I think I may have lost that dump, in what dir was it grrr
00:18 🔗 Nemo_bis I had to stop it because the XML was 100+ GB and I ran out of disk; I compressed it for later completion
00:20 🔗 gui77 I'd have to download it from you first to resume it, wouldn't I?
00:20 🔗 gui77 yeah I've only got 200 GB myself at the moment - won't it go over that?
00:20 🔗 Nemo_bis Ah. Yes it would. Well I'm tarring up some smaller ones
00:29 🔗 Nemo_bis gui77: http://koti.kapsi.fi/~federico/tmp/incomplete-dumps.tar
00:30 🔗 Nemo_bis Untar the 4 directories, then bunzip2 the files inside them (one or more at once) and resume the dumping
00:30 🔗 Nemo_bis the api.php or index.php URL to use is in the config.txt files of each
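The unpack-and-resume steps above could be scripted roughly as follows. This is a sketch assuming WikiTeam's dumpgenerator.py of that era, with its --api/--xml/--images/--resume/--path options, sitting in the current directory; the directory pattern and the API URL are illustrative placeholders, not taken from the actual tarball:

```python
import glob
import subprocess
import tarfile

# Untar the four dump directories.
with tarfile.open("incomplete-dumps.tar") as tar:
    tar.extractall()

for dumpdir in glob.glob("*-wikidump"):  # hypothetical naming pattern
    # bunzip2 the compressed partial files inside each directory.
    for bz2file in glob.glob(dumpdir + "/*.bz2"):
        subprocess.check_call(["bunzip2", bz2file])

    # config.txt records the api.php or index.php URL originally used.
    with open(dumpdir + "/config.txt") as f:
        print(dumpdir, "->", f.read())

    # Resume in place; per WikiTeam usage of the time, the original
    # options are repeated together with --resume and --path. The URL
    # below is a placeholder for the one found in config.txt.
    subprocess.check_call(["python", "dumpgenerator.py",
                           "--api=http://wiki.example.org/api.php",
                           "--xml", "--images",
                           "--resume", "--path=" + dumpdir])
```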
00:30 🔗 Nemo_bis Now I'm going to bed
00:30 🔗 gui77 I'll try
00:30 🔗 gui77 thanks
00:30 🔗 gui77 good night!
08:42 🔗 Nemo_bis gui77: have you downloaded the file?
17:33 🔗 gui77 oh hey. I did, but something popped up and I've been busy since morning - haven't had a chance to look at it yet >.<
18:56 🔗 Schbirid oh god, wikia s* is even bigger. 116G or something
18:56 🔗 Schbirid poor IA :(
18:56 🔗 Schbirid almost done with grabbing btw
21:04 🔗 Schbirid http://the-house-of-anubis.wikia.com/ seems to rediscover all its images as new every time I re-run over the existing dump
21:04 🔗 Schbirid 149 images were found in the directory from a previous session
21:04 🔗 Schbirid Image list was completed in the previous session
21:04 🔗 Schbirid Retrieving images from ".nm.jpg"
21:04 🔗 Schbirid and then it downloads
21:11 🔗 Schbirid and I know I have seen this line from it before and laughed: Filename is too long, truncating. Now it is: 11436-Young-Woman-Pausing-In-A-Shoe-And-Purse-Store-To-Talk-On-Her-Cellphone-While-Shopping-Clipart-4f784878fb508650682af23548a71c78.jpg
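That message comes from the dump script keeping downloaded image filenames within filesystem length limits. A minimal illustration of that kind of truncation, preserving the file extension (not WikiTeam's exact code; the 100-character limit is an assumption):

```python
import os

def truncate_filename(filename, limit=100):  # limit is an assumed value
    # Leave short names untouched.
    if len(filename) <= limit:
        return filename
    # Trim the stem but keep the extension so the file type survives.
    root, ext = os.path.splitext(filename)
    truncated = root[:limit - len(ext)] + ext
    print("Filename is too long, truncating. Now it is:", truncated)
    return truncated
```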
23:44 🔗 Nemo_bis Yes, for some reason s* is always the biggest
