Time | Nickname | Message
18:55 | emijrp | We start to download shoutwiki.com
18:55 | Nemo_bis | emijrp, I've not started yet
18:56 | Nemo_bis | it's at "API ok"
18:56 | Nemo_bis | I made a shell script; if it fails it won't resume
18:56 | Nemo_bis | So I'll resume all of them when it's finished
18:57 | Nemo_bis | To check if they're really complete
18:57 | Nemo_bis | But I need to specify paths, that's boring
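A minimal sketch of the workflow Nemo_bis describes, assuming the usual WikiTeam dumpgenerator.py options (--api, --xml, --images, --resume, --path); he used a shell script, this is a rough Python equivalent, and the list file name and dump-directory naming are made-up assumptions:

#!/usr/bin/env python3
# Sketch only: start a dump for every wiki in the list, then make a second
# pass with --resume so incomplete dumps are finished and complete ones are
# re-checked; the resume pass is where the path has to be specified.
import subprocess

def dump_dir(api_url):
    # Assumed naming, e.g. http://90210.shoutwiki.com/w/api.php -> 90210shoutwikicom-dump
    host = api_url.split('/')[2]
    return host.replace('.', '') + '-dump'

wikis = [line.strip() for line in open('shoutwiki-apis.txt') if line.strip()]

# First pass: start every dump; if this wrapper dies, nothing is resumed.
for api in wikis:
    subprocess.call(['python', 'dumpgenerator.py',
                     '--api=' + api, '--xml', '--images'])

# Second pass: resume everything to check the dumps are really complete.
for api in wikis:
    subprocess.call(['python', 'dumpgenerator.py',
                     '--api=' + api, '--xml', '--images',
                     '--resume', '--path=' + dump_dir(api)])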
18:58 | Nemo_bis | Hm, it doesn't seem to do anything.
18:58 | Nemo_bis | ah, here we are
18:58 | Nemo_bis | even the script is yawning
19:02 | emijrp | paths?
19:03 | emijrp | to resume, right?
19:05 | Nemo_bis | yes
19:49 | emijrp | started any wiki?
19:50 | emijrp | looks like edit mode is frozen
19:50 | emijrp | http://zootycoon.shoutwiki.com/w/index.php?title=Main_Page&action=edit
19:53 | emijrp | Nemo_bis: we can split the list
19:53 | emijrp | 50 wikis?
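A sketch of the split emijrp proposes here: cut the shoutwiki list into 50-wiki chunks so that each chunk can be claimed and downloaded separately (file names are assumptions):

# Write shoutwiki-chunk-001.txt, shoutwiki-chunk-002.txt, ... with at most
# 50 API URLs each, taken from the full shoutwiki list.
CHUNK_SIZE = 50
wikis = [line.strip() for line in open('shoutwiki-apis.txt') if line.strip()]
for i in range(0, len(wikis), CHUNK_SIZE):
    name = 'shoutwiki-chunk-%03d.txt' % (i // CHUNK_SIZE + 1)
    with open(name, 'w') as chunk:
        chunk.write('\n'.join(wikis[i:i + CHUNK_SIZE]) + '\n')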
20:06 | Nemo_bis | emijrp, lemme check
20:07 | Nemo_bis | I'm at the 4th
20:07 | Nemo_bis | First 3 were very small; 90210shoutwikicom has 1200 pages
20:07 | Nemo_bis | now at 170
20:09 | emijrp | ok
20:09 | Nemo_bis | What should I do, run multiple instances?
20:14 | emijrp | if the server is slow, it's slow
20:14 | emijrp | multiple instances may make it slower
20:15 | soultcer | I read over in #archiveteam that you want to archive all Wikimedia Commons images?
20:15 | Nemo_bis | I don't know if it's the server
20:15 | Nemo_bis | After all, the script doesn't download more than a certain number of pages per minute
20:16 | Nemo_bis | I have some wikis that are always at the same number of downloaded pages.
20:16 | emijrp | ?
20:16 | emijrp | in shoutwiki?
20:16 | Nemo_bis | no
20:17 | Nemo_bis | Two other wikis
20:17 | Nemo_bis | They started together and they're always at the same level
20:17 | Nemo_bis | "Server is slow... Waiting some seconds and retrying..."
20:17 | Nemo_bis | Ok, it's also slow
20:17 | emijrp | you mean download rate?
20:17 | Nemo_bis | number of pages downloaded
20:17 | emijrp | ok
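Illustration only, not the actual dumpgenerator.py code: the "Server is slow... Waiting some seconds and retrying..." message quoted above corresponds to a retry loop along these lines, which is why dumps started together can sit at the same page count while the server lags:

import time
import urllib.request

def fetch(url, retries=5, delay=10):
    # Keep retrying a slow or failing request, waiting a bit longer each time.
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()
        except OSError:
            print('Server is slow... Waiting some seconds and retrying...')
            time.sleep(delay * (attempt + 1))
    raise RuntimeError('Gave up on %s after %d retries' % (url, retries))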
20:18 | emijrp | soultcer: yes
20:18 | soultcer | Last time I checked it was around 6 TB or so
20:21 | soultcer | And last time I checked the Wikimedia Foundation didn't even have an offsite backup
20:22 | emijrp | they rsync to another server from time to time
20:22 | soultcer | To an offsite one?
20:25 | emijrp | yep
20:26 | emijrp | but of course that is not enough
20:26 | emijrp | http://wikitech.wikimedia.org/view/Offsite_Backups
20:27 | soultcer | They should update their wikitech pages more often; all references to offsite image backups make me believe the offsite backups are incomplete and at least 2 years old
20:27 | emijrp | are you worried? look at this: http://wikitech.wikimedia.org/view/Disaster_Recovery
20:27 | Nemo_bis | emijrp, is that page accurate, then?
20:27 | emijrp | On todo list.
20:30 | emijrp | Accurate. Accurate.
20:31 | soultcer | I think they do have some other page on the main Wikipedia about what to do in case of disaster, advising users to start printing articles and so on
20:31 | soultcer | So what is your plan on how to back up all the images?
20:32 | emijrp | HAHAHHAHAHA ARE YOU SERIOUS?
20:32 | emijrp | I hope that is in the Department of Fun.
20:32 | emijrp | my plan is to get a list of images by date, and distribute the effort
20:33 | emijrp | but first we have to develop the script
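A sketch of the "list of images by date" idea, using the public MediaWiki API on Wikimedia Commons (list=allimages sorted by upload timestamp, following the API's continuation to the end); the output file name is an assumption, and in practice the listing is enormous and would itself have to be split up to distribute the effort:

import json
import urllib.parse
import urllib.request

API = 'https://commons.wikimedia.org/w/api.php'

def api_get(params):
    url = API + '?' + urllib.parse.urlencode(params)
    # Wikimedia asks for a descriptive User-Agent on automated requests.
    request = urllib.request.Request(url, headers={'User-Agent': 'image-list-sketch'})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

params = {
    'action': 'query',
    'list': 'allimages',
    'aisort': 'timestamp',       # oldest uploads first
    'aiprop': 'url|timestamp',
    'ailimit': '500',
    'format': 'json',
    'continue': '',
}

with open('commons-images-by-date.txt', 'w') as out:
    while True:
        data = api_get(params)
        for image in data['query']['allimages']:
            out.write('%s\t%s\n' % (image['timestamp'], image['url']))
        if 'continue' not in data:
            break                        # reached the end of the listing
        params.update(data['continue'])  # carry the continuation tokens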
20:37 | soultcer | http://en.wikipedia.org/wiki/Wikipedia:Terminal_Event_Management_Policy
20:38 | emijrp | humour page
20:38 | Nemo_bis | While the light of humanity may flicker and die, we go gently into this dark night, comforted in the knowledge that someday Wikipedia shall take its rightful place as part of a consensus-built Galactic Encyclopedia, editable by all sentient beings.
20:39 | Nemo_bis | Not bad, but overall not so funny.
20:39 | emijrp | sure
20:39 | emijrp | save my pokemons articles!
20:42 | soultcer | I found it interesting (might be because the first time I read it I didn't notice the banner on top, only saw it now when I looked for the page again)
20:43 | emijrp | if Wikipedia is destroyed today, it will be a pain in the neck, but 99.9% of the text is available in other places (books, websites, etc.)
20:43 | emijrp | but images are a different issue: how many images are donated to Wikimedia Commons and later lost by their owners?
20:44 | emijrp | Commons contains a lot of unique amateur photos
20:50 | soultcer | Lots of them are from Flickr, but Flickr is Yahoo, so that doesn't count.
20:55 | emijrp | I will try next week to write that script for images
20:56 | emijrp | for now, we are working on shoutwiki
20:57 | emijrp | Nemo_bis: download only wikis 1 to 50; can you send a message to the mailing list asking to split the shoutwiki wiki list into chunks, and claim the first chunk?
20:57 | Nemo_bis | emijrp, why?
20:57 | Nemo_bis | If the problem is server slowness, this doesn't fix it.
20:57 | Nemo_bis | I can do 8 chunks in parallel.
20:58 | Nemo_bis | Unless you think they'll block my IP.
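A sketch of the "8 chunks in parallel" idea: keep at most 8 dumpgenerator.py processes running at once over the wikis in the claimed chunk files. File names and the very simple scheduling are assumptions, and it parallelises per wiki rather than per chunk, which puts roughly the same load on the server:

import glob
import subprocess

MAX_PARALLEL = 8

# Gather every wiki from the claimed chunk files (see the split sketch above).
wikis = []
for chunk in sorted(glob.glob('shoutwiki-chunk-*.txt')):
    wikis += [line.strip() for line in open(chunk) if line.strip()]

running = []
for api in wikis:
    running.append(subprocess.Popen(
        ['python', 'dumpgenerator.py', '--api=' + api, '--xml', '--images']))
    if len(running) >= MAX_PARALLEL:
        running.pop(0).wait()   # simple throttle: wait for the oldest process

for proc in running:
    proc.wait()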
20:58 | emijrp | ok, if you want to, do it; if help is needed, ask here
21:15 | emijrp | note on the mailing list that you are working on shoutwiki
21:17 | soultcer | http://groups.google.com/group/wikiteam-discuss?pli=1 <-- is this the mailing list?
21:17 | Nemo_bis | soultcer, yes