Time | Nickname | Message
00:04 | Nemo_bis | gui77: what wiki is that?
00:04 | gui77 | Nemo_bis: https://wikiapiary.com/wiki/Archlinux_wiki
00:05 | gui77 | notice that that is the .se one, not the English one
00:05 | Nemo_bis | It's not a problem that the dump already exists, a new one is always appreciated. We just have no easy way for non-admins to edit existing items
00:05 | gui77 | the .de and English ones I'm still downloading, so I don't know if they're up yet
00:05 | gui77 | Nemo_bis: ah. any way for me to upload the new one then, if it's worth it? uploader.py didn't seem to give me any options
00:06 | Nemo_bis | Sorry, it seems I just uploaded it a week ago. https://archive.org/details/wiki-wikiarchlinuxse
00:06 | Nemo_bis | WikiApiary is not updated in real time...
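Since WikiApiary lags behind, a quicker check is to ask archive.org itself whether a dump item already exists. A minimal sketch, assuming the wiki-<host with dots stripped> identifier convention inferred from the URL above actually holds:

    # Hedged sketch: query archive.org's metadata API for an existing dump item.
    # The identifier convention is an assumption inferred from
    # https://archive.org/details/wiki-wikiarchlinuxse and may not hold for
    # every WikiTeam upload.
    import requests

    def dump_item_exists(wiki_host: str) -> bool:
        identifier = "wiki-" + wiki_host.replace(".", "")
        # The metadata endpoint returns an empty JSON object for identifiers
        # that do not exist yet.
        resp = requests.get(f"https://archive.org/metadata/{identifier}", timeout=30)
        resp.raise_for_status()
        return bool(resp.json().get("metadata"))

    if __name__ == "__main__":
        # Assumed hostname for the .se Arch wiki, for illustration only.
        print(dump_item_exists("wiki.archlinux.se"))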
00:07 | Nemo_bis | I'm afraid there is no concrete option. The wiki is so small that you could email me the dump, though
00:08 | gui77 | eh, if you did it last week there shouldn't be a need for a new version. let me try and wrap my head around WikiApiary - does it automatically verify which wikis are uploaded?
00:08 | gui77 | or is it a manual process?
00:09 | Nemo_bis | It's an automatic bot, but it's started manually, among other reasons because it's still quite hacky and it takes many hours to run (a day, even, IIRC).
00:09 | Nemo_bis | We handle a few hundred ambiguous cases manually.
00:11 | Nemo_bis | Anyway, gui77, if you have a lot of bandwidth/space, as suggested by your question about new warrior projects, I can suggest some bigger wikis which would keep your machine busier.
00:12 | Nemo_bis | Alternatively, if you prefer small wikis, we have a list of wikis I'm not working on because they failed for me: https://code.google.com/p/wikiteam/source/browse/trunk/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
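For anyone wanting to work through that todo list, a hedged sketch of a batch retry: the --api/--xml/--images flags are the standard dumpgenerator.py options from the WikiTeam documentation, and the todo file is assumed to contain one API URL per line.

    # Sketch only: retry every wiki in the "failed for me" list and collect
    # the ones that still fail, so they can get the manual tweaks discussed
    # further down in this log.
    import subprocess

    def retry_wikis(todo_file: str = "mediawikis_pavlo.alive.filtered.todo.txt"):
        with open(todo_file) as fh:
            wikis = [line.strip() for line in fh if line.strip()]
        still_failing = []
        for url in wikis:
            result = subprocess.run(
                ["python", "dumpgenerator.py", f"--api={url}", "--xml", "--images"]
            )
            if result.returncode != 0:
                still_failing.append(url)
        return still_failing

    if __name__ == "__main__":
        print(f"{len(retry_wikis())} wikis still need manual attention")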
00:12 | gui77 | I'd definitely appreciate some bigger wikis, especially since it's a cumbersome process for small stuff.
00:12 | gui77 | preferably under 200 GB each though :p
00:12 | Nemo_bis | 200 GB compressed or total?
00:13 | gui77 | total
00:13 | gui77 | I've got the bandwidth but I'm kind of low on disk space hehe
00:13 | Nemo_bis | One wiki nobody is working on is Encyclopedia Dramatica; someone speculates it's about 1-2 TB now
00:14 | Nemo_bis | Hm. Wikis with many pages rarely take much bandwidth; wikis with many files do.
00:14 | Nemo_bis | Anyway, I have a half-completed dump you may want to continue
00:15 | gui77 | I don't mind trying to complete yours, if it's feasible
00:16 | gui77 | currently trying that list that failed for you, but so far it's failing for me too
00:16 | Nemo_bis | Yes, you'd need manual tweaks
00:17 | Nemo_bis | like testing whether the webserver returns correct error codes, where index.php and api.php are, whether Special:Export works when used manually, etc.
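A minimal sketch of those manual checks for a single wiki, assuming a typical /w/ script path (real wikis put api.php and index.php in all sorts of places, so the base URL is just a placeholder):

    # Hedged sketch of the sanity checks mentioned above; adjust the base URL
    # per wiki.
    import requests

    def probe_wiki(base: str = "http://wiki.example.org/w"):
        checks = {}

        # 1. Does api.php answer a basic siteinfo query with JSON?
        r = requests.get(f"{base}/api.php",
                         params={"action": "query", "meta": "siteinfo", "format": "json"},
                         timeout=30)
        try:
            checks["api.php"] = r.ok and "query" in r.json()
        except ValueError:
            checks["api.php"] = False

        # 2. Is index.php where we expect it?
        r = requests.get(f"{base}/index.php", params={"title": "Special:Version"}, timeout=30)
        checks["index.php"] = r.ok

        # 3. Does Special:Export actually return export XML?
        r = requests.get(f"{base}/index.php",
                         params={"title": "Special:Export", "pages": "Main Page"},
                         timeout=30)
        checks["Special:Export"] = r.ok and "<mediawiki" in r.text

        # 4. Does the webserver return a real 404 for garbage URLs,
        #    or a misleading soft 200?
        r = requests.get(f"{base}/this-page-should-not-exist-xyz", timeout=30)
        checks["404 handling"] = r.status_code == 404

        return checks

    if __name__ == "__main__":
        for name, ok in probe_wiki().items():
            print(f"{name}: {'OK' if ok else 'check manually'}")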
00:18 | Nemo_bis | I think I may have lost that dump, which dir was it in grrr
00:18 | Nemo_bis | I had to stop it because the XML was 100+ GB and I ran out of disk; I compressed it for later completion
00:20 | gui77 | I'd have to download it from you first to resume it, wouldn't I?
00:20 | gui77 | yeah, I've only got 200 GB myself at the moment - won't it go over that?
00:20 | Nemo_bis | Ah. Yes it would. Well, I'm tarring up some smaller ones
00:29 | Nemo_bis | gui77: http://koti.kapsi.fi/~federico/tmp/incomplete-dumps.tar
00:30 | Nemo_bis | Untar the 4 directories, then bunzip2 the files inside them (one or more at a time) and resume the dumping
00:30 | Nemo_bis | the api.php or index.php URL to use is in each one's config.txt file
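A hedged sketch of that untar / bunzip2 / resume workflow; the --resume and --path options follow the WikiTeam documentation, and config.txt is simply echoed so the operator can fish the api.php or index.php URL out of it by eye:

    # Sketch only: unpack the tarball, decompress each dump directory, and
    # print a suggested dumpgenerator.py resume command per directory.
    import bz2
    import pathlib
    import shutil
    import tarfile

    def prepare(tar_path: str = "incomplete-dumps.tar"):
        with tarfile.open(tar_path) as tar:
            tar.extractall(".")

        for dump_dir in sorted(p for p in pathlib.Path(".").iterdir() if p.is_dir()):
            # Decompress every .bz2 back to the plain file dumpgenerator expects.
            for archive in dump_dir.glob("*.bz2"):
                with bz2.open(archive, "rb") as src, open(archive.with_suffix(""), "wb") as dst:
                    shutil.copyfileobj(src, dst)

            config = dump_dir / "config.txt"
            if config.exists():
                print(f"--- {dump_dir}/config.txt (find the api.php/index.php URL here) ---")
                print(config.read_text(errors="replace"))

            print("Suggested resume command:")
            print(f"  python dumpgenerator.py --api=<URL from config.txt> "
                  f"--xml --images --resume --path={dump_dir}")

    if __name__ == "__main__":
        prepare()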
00:30 | Nemo_bis | Now I'm going to bed
00:30 | gui77 | I'll try
00:30 | gui77 | thanks
00:30 | gui77 | good night!
08:42 | Nemo_bis | gui77: have you downloaded the file?
17:33 | gui77 | oh hey. I did, but something popped up and I've been busy since morning, haven't had a chance to look at it yet >.<
18:56 | Schbirid | oh god, wikia s* is even bigger. 116 GB or something
18:56 | Schbirid | poor IA :(
18:56 | Schbirid | almost done with grabbing btw
21:04 | Schbirid | http://the-house-of-anubis.wikia.com/ seems to discover all the images as new every time I re-run over the existing dump
21:04 | Schbirid | 149 images were found in the directory from a previous session
21:04 | Schbirid | Image list was completed in the previous session
21:04 | Schbirid | Retrieving images from ".nm.jpg"
21:04 | Schbirid | and then it downloads
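A hedged illustration of the resume check one would expect here: compare the saved image list against what is already on disk and only fetch the missing files. This is not a claim about dumpgenerator.py's actual internals; the image-list filename and its url/filename layout are assumptions.

    # Illustration only: skip images that a previous session already downloaded.
    import pathlib
    import requests

    def missing_images(image_list: str, images_dir: str = "images"):
        have = {p.name for p in pathlib.Path(images_dir).iterdir()}
        todo = []
        with open(image_list, encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                # Assumed layout: one "url<TAB>filename" pair per line.
                url, filename = line.rstrip("\n").split("\t")[:2]
                if filename not in have:
                    todo.append((url, filename))
        return todo

    def fetch(todo, images_dir: str = "images"):
        for url, filename in todo:
            resp = requests.get(url, timeout=60)
            resp.raise_for_status()
            (pathlib.Path(images_dir) / filename).write_bytes(resp.content)

    if __name__ == "__main__":
        pending = missing_images("anubis-images.txt")  # hypothetical list name
        print(f"{len(pending)} images still to download")
        fetch(pending)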
21:11 | Schbirid | and I know I have seen this line from it before and laughed: Filename is too long, truncating. Now it is: 11436-Young-Woman-Pausing-In-A-Shoe-And-Purse-Store-To-Talk-On-Her-Cellphone-While-Shopping-Clipart-4f784878fb508650682af23548a71c78.jpg
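The truncated name above still ends in 32 hex characters, which looks like an md5 of the original name. A minimal sketch of that trick (not necessarily dumpgenerator.py's exact code): cut the name to a filesystem-safe length and append a digest so truncated files stay unique.

    # Sketch of hash-suffixed truncation; the 100-character limit is an
    # arbitrary assumption for illustration.
    import hashlib
    import os

    def truncate_filename(filename: str, max_len: int = 100) -> str:
        if len(filename) <= max_len:
            return filename
        stem, ext = os.path.splitext(filename)
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()  # 32 hex chars
        keep = max_len - len(digest) - len(ext) - 1
        return f"{stem[:keep]}-{digest}{ext}"

    if __name__ == "__main__":
        long_name = "Young-Woman-" * 20 + "Clipart.jpg"
        print(truncate_filename(long_name))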
23:44 | Nemo_bis | Yes, for some reason s* is always the biggest