#wikiteam 2011-07-21,Thu


Time Nickname Message
18:55 🔗 emijrp We're starting to download shoutwiki.com
18:55 🔗 Nemo_bis emijrp, I've not started yet
18:56 🔗 Nemo_bis it's at "API ok"
18:56 🔗 Nemo_bis I made a shell script; if it fails it won't resume
18:56 🔗 Nemo_bis So I'll resume all of them when it's finished
18:57 🔗 Nemo_bis To check if they're really complete
18:57 🔗 Nemo_bis But I need to specify paths, that's boring
18:58 🔗 Nemo_bis Hm, it doesn't seem to do anything.
18:58 🔗 Nemo_bis ah, here we are
18:58 🔗 Nemo_bis even the script is yawning
19:02 🔗 emijrp paths?
19:03 🔗 emijrp to resume, right?
19:05 🔗 Nemo_bis yes
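
(A rough sketch of the two-pass workflow described above: dump every wiki once, then resume every dump to check that it is really complete. It assumes dumpgenerator.py's --api, --xml, --images, --resume and --path options as in the 2011 WikiTeam tool, and that dump directories are named after the domain with the dots stripped, e.g. 90210shoutwikicom-<date>-wikidump; the list file name is hypothetical.)

#!/usr/bin/env python3
# Sketch: loop over a list of API URLs, dump each wiki, then resume them all.
import glob
import subprocess

with open('shoutwiki-list.txt') as f:              # hypothetical: one API URL per line
    wikis = [line.strip() for line in f if line.strip()]

# First pass: one dump per wiki; a dump that fails is simply left incomplete.
for api in wikis:
    subprocess.call(['python', 'dumpgenerator.py',
                     '--api=%s' % api, '--xml', '--images'])

# Second pass: resume everything, which is also how completeness gets checked.
for api in wikis:
    domain = api.split('/')[2].replace('.', '')    # e.g. 90210shoutwikicom
    dumpdirs = glob.glob('%s-*-wikidump' % domain) # directory created by the first pass
    if dumpdirs:
        subprocess.call(['python', 'dumpgenerator.py',
                         '--api=%s' % api, '--xml', '--images',
                         '--resume', '--path=%s' % dumpdirs[0]])
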
19:49 🔗 emijrp started any wiki?
19:50 🔗 emijrp looks like edit mode is frozen
19:50 🔗 emijrp http://zootycoon.shoutwiki.com/w/index.php?title=Main_Page&action=edit
19:53 🔗 emijrp Nemo_bis: we can split the list
19:53 🔗 emijrp 50 wikis?
20:06 🔗 Nemo_bis emijrp, lemme check
20:07 🔗 Nemo_bis I'm at the 4th
20:07 🔗 Nemo_bis First 3 are very small; 90210shoutwikicom has 1200 pages
20:07 🔗 Nemo_bis now at 170
20:09 🔗 emijrp ok
20:09 🔗 Nemo_bis What should I do, run multiple instances?
20:14 🔗 emijrp if the server is slow, it's slow
20:14 🔗 emijrp multiple instances may make it slower
20:14 🔗 soultcer I read over in #archiveteam that you want to archive all wikimedia commons images?
20:15 🔗 Nemo_bis I don't know if it's the server
20:15 🔗 Nemo_bis After all, the script doesn't download much more than a certain number of pages per minute
20:15 🔗 Nemo_bis I have some wikis that always show the same number of downloaded pages.
20:16 🔗 emijrp ?
20:16 🔗 emijrp in shoutwiki?
20:16 🔗 Nemo_bis no
20:16 🔗 Nemo_bis Two other wikis
20:17 🔗 Nemo_bis They started together and they're always at the same level
20:17 🔗 Nemo_bis "Server is slow... Waiting some seconds and retrying..."
20:17 🔗 Nemo_bis Ok, it's also slow
20:17 🔗 emijrp you mean download rate?
20:17 🔗 Nemo_bis number of pages downloaded
20:17 🔗 emijrp ok
20:17 🔗 emijrp soultcer: yes
20:18 🔗 soultcer Last time I checked it was around 6 TB or so
20:18 🔗 soultcer And last time I checked wikimedia foundation didn't even have an offsite backup
20:21 🔗 emijrp they rsync to another server from time to time
20:22 🔗 soultcer To an offsite one?
20:22 🔗 emijrp yep
20:25 🔗 emijrp but of course that is not enough
20:26 🔗 emijrp http://wikitech.wikimedia.org/view/Offsite_Backups
20:26 🔗 soultcer They should update their wikitech pages more often; all references to offsite image backup make me believe the offsite backups are incomplete and at least 2 years old
20:27 🔗 emijrp are you worried? look at this http://wikitech.wikimedia.org/view/Disaster_Recovery
20:27 🔗 Nemo_bis emijrp, is that page accurate, then?
20:27 🔗 emijrp On todo list.
20:27 🔗 emijrp Accurate. Accurate.
20:30 🔗 soultcer I think they do have some other page on the main Wikipedia about what to do in case of disaster, advising users to start printing articles and so on
20:31 🔗 soultcer So what is your plan on how to back up all the images?
20:31 🔗 emijrp HAHAHHAHAHA ARE YOU SERIOUS?
20:32 🔗 emijrp I hope that is in the Department of Fun.
20:32 🔗 emijrp my plan is to get a list of images by date, and distribute the effort
20:32 🔗 emijrp but first we have to develop the script
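
(A sketch of what such an image-listing script might do: walk Wikimedia Commons files in upload-date order through the MediaWiki API, list=allimages sorted by timestamp, so the list can be cut into date ranges and the download effort distributed. Continuation handling differs between MediaWiki versions; this uses the current "continue" style.)

#!/usr/bin/env python3
# Sketch: list Wikimedia Commons files by upload date via list=allimages.
import json
import urllib.parse
import urllib.request

API = 'https://commons.wikimedia.org/w/api.php'

def allimages_by_date(start, end):
    params = {
        'action': 'query', 'list': 'allimages',
        'aisort': 'timestamp', 'aistart': start, 'aiend': end,
        'aiprop': 'url|timestamp', 'ailimit': '500', 'format': 'json',
    }
    while True:
        req = urllib.request.Request(API + '?' + urllib.parse.urlencode(params),
                                     headers={'User-Agent': 'wikiteam-image-list-sketch'})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for img in data['query']['allimages']:
            yield img['timestamp'], img['url']
        if 'continue' not in data:
            break
        params.update(data['continue'])  # follow the API's continuation token

# One volunteer could claim, say, everything uploaded during 2005:
for ts, url in allimages_by_date('2005-01-01T00:00:00Z', '2006-01-01T00:00:00Z'):
    print(ts, url)
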
20:33 🔗 soultcer http://en.wikipedia.org/wiki/Wikipedia:Terminal_Event_Management_Policy
20:37 🔗 emijrp humour page
20:38 🔗 Nemo_bis While the light of humanity may flicker and die, we go gently into this dark night, comforted in the knowledge that someday Wikipedia shall take its rightful place as part of a consensus-built Galactic Encyclopedia, editable by all sentient beings.
20:38 🔗 Nemo_bis Not bad, but overall not so funny.
20:39 🔗 emijrp sure
20:39 🔗 emijrp save my pokemons articles!
20:39 🔗 soultcer I found it interesting (might be because the first time I read it I didn't notice the banner on top, only saw it now when I looked for the page again)
20:42 🔗 emijrp if wikipedia is destroyed today, it will be a pain in the neck, but 99.9% of text is available in other places (books, websites, etc)
20:43 🔗 emijrp but images are a different issue, how many images are donated to wikimedia commons and later lost by their owners?
20:43 🔗 emijrp commons contains a lot of unique amateur photos
20:44 🔗 soultcer Lots of them are from Flickr, but Flickr is Yahoo, so that doesn't count.
20:50 🔗 emijrp i will try next week to write that script for images
20:55 🔗 emijrp for now, we are working on shoutwiki
20:56 🔗 emijrp Nemo_bis: download only wikis 1 to 50; can you send a message to the mailing list asking to split the shoutwiki wiki list into chunks, and claim the first chunk?
20:57 🔗 Nemo_bis emijrp, why?
20:57 🔗 Nemo_bis If the problem is server slowness, this doesn't fix it.
20:57 🔗 Nemo_bis I can do 8 chunks in parallel.
20:57 🔗 Nemo_bis Unless you think they'll block my IP.
20:58 🔗 emijrp ok, if you want to, do it; if help is needed, ask here
20:58 🔗 emijrp announce on the mailing list that you are working on shoutwiki
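
(A minimal sketch of the chunking idea discussed above: cut the list into chunks of 50 and run a handful of dumps in parallel. The list file name and the dumpgenerator.py options are the same assumptions as in the earlier sketch.)

#!/usr/bin/env python3
# Sketch: split the shoutwiki API list into chunks of 50 and dump one claimed
# chunk with up to 8 dumpgenerator.py processes at a time.
import subprocess
from concurrent.futures import ThreadPoolExecutor

with open('shoutwiki-list.txt') as f:              # hypothetical: one API URL per line
    wikis = [line.strip() for line in f if line.strip()]

chunks = [wikis[i:i + 50] for i in range(0, len(wikis), 50)]

def dump(api):
    # each worker drives one dumpgenerator.py process
    return subprocess.call(['python', 'dumpgenerator.py',
                            '--api=%s' % api, '--xml', '--images'])

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(dump, chunks[0]))      # claim and run the first chunk
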
21:15 🔗 soultcer http://groups.google.com/group/wikiteam-discuss?pli=1 <-- is this the mailing list?
21:17 🔗 Nemo_bis soultcer, yes
