Time |
Nickname |
Message |
19:06
|
underscor |
emijrp: finally caught you |
19:06
|
underscor |
so, I'm working with Ariel Glenn (from WMF) and Kevin Day (your.org) on getting the stuff up to IA |
19:06
|
underscor |
Just thought you'd like to know |
19:07
|
underscor |
Ariel is going to generate the media XML for me, and then I can parse it with lxml or hpricot to extract the bits I need |
19:08
|
underscor |
and the your.org guys are giving me a box with an NFS mount of the tree, so I can do easy filtering |
19:08
|
underscor |
^ Nemo_bis too |
19:09
|
emijrp |
great |
19:09
|
emijrp |
are you going to upload day-by-day packs to wikiteam collection? |
19:11
|
underscor |
still figuring out what interval to do it at |
19:11
|
underscor |
IA wants roughly 10-15GB sized archives |
19:11
|
underscor |
So for some projects (like the Esperanto Wiktionary), all of their media combined is like 2MB |
19:11
|
underscor |
lol |
19:11
|
underscor |
But Commons easily sees 10GB in a day |
19:12
|
underscor |
(according to Ariel) |
19:12
|
underscor |
so different projects'll see different rates |
19:12
|
underscor |
The goal is to make it all fully automated |
19:12
|
underscor |
Because right now, even the wiki backups from WMF are semi-manual |
19:12
|
underscor |
and that's tedius |
19:12
|
underscor |
tedious* |
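
The interval question discussed above (IA wanting roughly 10-15GB archives, with small projects producing almost nothing and Commons producing about 10GB a day) could be handled by cutting packs on size rather than on a fixed date interval. The following is only a minimal sketch under that assumption, not the script underscor ended up using. The names `make_packs`, `MIN_PACK`, and `MAX_PACK` are hypothetical, and the input is assumed to be an ordered sequence of `(filename, size_in_bytes)` pairs, for example in upload order.

```python
from typing import Iterable, Iterator, List, Tuple

GB = 1024 ** 3
MIN_PACK = 10 * GB  # lower bound of the "roughly 10-15GB" range from the chat
MAX_PACK = 15 * GB  # upper bound of the same range

FileEntry = Tuple[str, int]  # (filename, size in bytes); hypothetical shape


def make_packs(
    files: Iterable[FileEntry],
    min_size: int = MIN_PACK,
    max_size: int = MAX_PACK,
) -> Iterator[List[FileEntry]]:
    """Greedily group files, in order, into packs of about min_size..max_size."""
    pack: List[FileEntry] = []
    pack_size = 0
    for name, size in files:
        # Close the current pack first if this file would push it past max_size.
        if pack and pack_size + size > max_size:
            yield pack
            pack, pack_size = [], 0
        pack.append((name, size))
        pack_size += size
        # Close the pack as soon as it reaches the lower bound.
        if pack_size >= min_size:
            yield pack
            pack, pack_size = [], 0
    # Whatever is left over (e.g. a tiny project's entire media) forms a final pack.
    if pack:
        yield pack
```

With this approach a small project simply yields one undersized pack, while a busy project yields a new pack roughly every time 10GB accumulates. A single file larger than `max_size` ends up alone in its own oversized pack, since files are never split.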