Time |
Nickname |
Message |
13:55
🔗
|
Hydriz |
Eh, emijrp, are you free now? |
13:55
🔗
|
emijrp |
depend |
13:55
🔗
|
emijrp |
lul |
13:55
🔗
|
Hydriz |
okok, just a short one |
13:55
🔗
|
Hydriz |
is there going to be anything done to the Wikimedia Commons grab? |
13:56
🔗
|
emijrp |
i reported some bugs, but they are not fixed (in the same way Nemo_bis report bugs to me and i dont fix them; KARMA RETURNS) |
13:56
🔗
|
Hydriz |
heh |
13:56
🔗
|
emijrp |
have you downloaded many GB? |
13:56
🔗
|
Nemo_bis |
lol |
13:56
🔗
|
Hydriz |
anyways, I finished downloading |
13:57
🔗
|
Hydriz |
eh, about 120GB or so |
13:57
🔗
|
Nemo_bis |
same here, but I deleted everything now |
13:57
🔗
|
Nemo_bis |
to make room for the wikis |
13:57
🔗
|
Hydriz |
some month grabs look good, so I transferred them to the IA already |
13:57
🔗
|
Nemo_bis |
hmmm |
13:57
🔗
|
emijrp |
Hydriz: cool |
13:57
🔗
|
Nemo_bis |
weren't we supposed to wait |
13:57
🔗
|
Hydriz |
http://archive.org/details/wikimediacommons-200606 |
13:57
🔗
|
Hydriz |
heh Nemo |
13:57
🔗
|
Hydriz |
rules are meant to be broken :P |
13:58
🔗
|
Nemo_bis |
oic |
13:58
🔗
|
Hydriz |
I upload June + January 2006 |
13:58
🔗
|
Nemo_bis |
did you include the output of the checker? |
13:58
🔗
|
Nemo_bis |
(or the log, I don't remember what's there) |
13:58
🔗
|
Hydriz |
lol no |
13:59
🔗
|
Hydriz |
I am clearing stuff off the Labs project |
13:59
🔗
|
emijrp |
i think it is ok you upload whatever you have about Commons, that MediaWiki developers are not going to solve a damn |
13:59
🔗
|
emijrp |
so, upload |
14:00
🔗
|
emijrp |
perhaps it contains some broken images, but, better than nothing |
14:00
🔗
|
Hydriz |
we want moar |
14:00
🔗
|
Hydriz |
yeah, around 10 - 15 per day |
14:00
🔗
|
Hydriz |
* broken images |
14:01
🔗
|
Nemo_bis |
actually some errors were fixed |
14:01
🔗
|
Hydriz |
but issue 45 is the burning issue |
14:02
🔗
|
Hydriz |
it's preventing many days from being grabbed |
14:02
🔗
|
Hydriz |
so I am putting them on hold before I upload |
14:02
🔗
|
Hydriz |
or maybe I should upload... |
14:03
🔗
|
emijrp |
no, wait |
14:03
🔗
|
Hydriz |
its not affecting other days though |
14:03
🔗
|
emijrp |
the only months unaffected by issue 45 are january and june? |
14:03
🔗
|
Hydriz |
yep |
14:04
🔗
|
Nemo_bis |
miracle |
14:04
🔗
|
Hydriz |
but in an affected month, it's isolated to only a few days |
14:04
🔗
|
Hydriz |
yeah, some encoding issue that commonsdownloader.py refuses to resolve |
14:04
🔗
|
Hydriz |
like slashes or other symbols |
14:07
🔗
|
Hydriz |
I probably can start on July - December soon, and then we can put pressure to make more commonssql.csv s |
14:07
🔗
|
emijrp |
looks like the bug only affects old versions |
14:07
🔗
|
emijrp |
but i will try to fix it anyway |
14:07
🔗
|
Hydriz |
heh |
14:07
🔗
|
Hydriz |
but it shouldn't be of top priority anyway |
14:08
🔗
|
emijrp |
can you paste the wget call ? |
14:09
🔗
|
emijrp |
https://code.google.com/p/wikiteam/issues/detail?id=45 |
14:09
🔗
|
Hydriz |
wha..what? |
14:09
🔗
|
emijrp |
just before wget stat |
14:09
🔗
|
emijrp |
starts |
14:09
🔗
|
Hydriz |
lol |
14:09
🔗
|
* |
Hydriz shall start the script again |
14:10
🔗
|
emijrp |
it skips to the last downloading image |
14:10
🔗
|
emijrp |
right? |
14:10
🔗
|
emijrp |
i dont remember.. |
14:11
🔗
|
Hydriz |
right, give me a few minutes |
14:11
🔗
|
Hydriz |
(or give the script a few more minutes) |
14:12
🔗
|
emijrp |
just download 2006-02-05 |
14:12
🔗
|
Hydriz |
yep |
14:12
🔗
|
Hydriz |
Doing... |
14:13
🔗
|
emijrp |
the issue is that wget saves it like 2006/02/05/20070605200920!US__reverse.jpg but the real name is 2006/02/05/20070605200920!US_$100_reverse.jpg |
14:13
🔗
|
emijrp |
i dont know if wget eats the $, or ... |
14:14
🔗
|
Hydriz |
If I recall vaguely, its the downloader that is eating it, or something |
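A possible explanation for the vanishing `$100`, offered only as a guess: if the downloader builds one command string and hands it to `os.system()`, the shell may treat `$100` as a parameter expansion before wget ever sees the name. The sketch below shows two ways to keep the name literal. The helper names and the `example.org` URL are hypothetical and not taken from commonsdownloader.py.

```python
import shlex

def wget_argv(url, dest_path):
    """Build an argument list for wget (hypothetical helper).

    Passing a list to subprocess.run() bypasses the shell, so characters
    such as $ and ! reach wget unchanged.
    """
    return ["wget", "-c", url, "-O", dest_path]

def wget_shell_command(url, dest_path):
    """Build a single shell command string with every argument quoted.

    Only needed if the command must go through os.system() or a shell.
    """
    return "wget -c {} -O {}".format(shlex.quote(url), shlex.quote(dest_path))

# File name from the log; the URL prefix is a placeholder, not the real one.
name = "2006/02/05/20070605200920!US_$100_reverse.jpg"
url = "https://example.org/" + name

argv = wget_argv(url, name)          # e.g. subprocess.run(argv, check=True)
cmd = wget_shell_command(url, name)  # e.g. os.system(cmd)
```

With the list form, no quoting is needed at all; the quoted string form is a fallback when the existing code has to stay on `os.system()`.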
14:14
🔗
|
Hydriz |
but anyway, our taskforce seems to be going well? |
14:14
🔗
|
Hydriz |
the nemo dominance |
14:17
🔗
|
emijrp |
ok |
14:17
🔗
|
emijrp |
about the metadata of items |
14:17
🔗
|
emijrp |
we need to add the ZIP links to explore the images |
14:18
🔗
|
emijrp |
and a link back to WikiTeam Google Code |
14:18
🔗
|
Hydriz |
thats mad |
14:18
🔗
|
Hydriz |
link, yes |
14:18
🔗
|
Hydriz |
but ZIP links, 31 times... |
14:19
🔗
|
emijrp |
yes, that is easy, copy paste or a tiny script |
14:19
🔗
|
emijrp |
to generate a cool HTML table |
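The "tiny script" mentioned here could look roughly like the sketch below. It assumes one ZIP per day named `YYYY-MM-DD.zip` inside the item and the usual `archive.org/download/<item>/<file>` layout; both are assumptions, since the log does not show the actual file names.

```python
import calendar
from html import escape

def zip_links_table(item_id, year, month, base="https://archive.org/download"):
    """Return an HTML table with one row per day of the given month.

    The per-day file name pattern (YYYY-MM-DD.zip) is an assumption.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    rows = []
    for day in range(1, days_in_month + 1):
        date = "{:04d}-{:02d}-{:02d}".format(year, month, day)
        href = "{}/{}/{}.zip".format(base, item_id, date)
        rows.append(
            '<tr><td>{}</td><td><a href="{}">{}.zip</a></td></tr>'.format(
                date, escape(href), date
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"

# Item identifier taken from the link posted earlier in the log.
table = zip_links_table("wikimediacommons-200606", 2006, 6)
```

The resulting string can be pasted into the item description, so nobody has to write out 30 or 31 links by hand.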
14:19
🔗
|
* |
Hydriz is feeling lazy right now... |
14:21
🔗
|
Hydriz |
wait wait |
14:21
🔗
|
Hydriz |
the wget call? |
14:21
🔗
|
Hydriz |
isn't it already in the paste inside my comment? |
14:22
🔗
|
Hydriz |
unless you meant a line above that |
14:22
🔗
|
Hydriz |
which is just the file name |
14:23
🔗
|
emijrp |
yes, a line above |
14:23
🔗
|
Hydriz |
damn |
14:24
🔗
|
Hydriz |
a small oversight |
14:29
🔗
|
emijrp |
when you paste that line (i hope it is shown and not hidden inside the os.system() call), i will check |
14:29
🔗
|
emijrp |
i can add a try: except: too and skip that error |
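A minimal sketch of the "try: except: and skip" idea, assuming the downloader loops over file names and calls some per-file fetch function. The names `download_all` and `fake_fetch` are hypothetical, and the real script may be structured differently. Recording the skipped names keeps the failures visible instead of silently losing them.

```python
def download_all(names, fetch):
    """Call fetch(name) for every name and skip the ones that fail.

    Returns a list of (name, error message) pairs for later inspection.
    """
    failed = []
    for name in names:
        try:
            fetch(name)
        except Exception as exc:  # broad on purpose: skip and keep going
            failed.append((name, str(exc)))
    return failed

def fake_fetch(name):
    """Stand-in for the real download step; fails on names containing $."""
    if "$" in name:
        raise OSError("could not save " + name)

failed = download_all(
    ["2006/02/05/a.jpg", "2006/02/05/20070605200920!US_$100_reverse.jpg"],
    fake_fetch,
)
```

The `failed` list could be written to a log file next to the grab, so the few broken images per day can be retried once the underlying bug is understood.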
14:29
🔗
|
emijrp |
it looks like it only affects old versions |
14:30
🔗
|
Hydriz |
maybe... |
14:30
🔗
|
Hydriz |
but thats all the errors I got |
14:30
🔗
|
Hydriz |
8 times |
14:30
🔗
|
emijrp |
8 times where? |
14:31
🔗
|
emijrp |
ah ok |
14:31
🔗
|
Hydriz |
means that this bug affected the grab of 8 days |
14:31
🔗
|
emijrp |
okok |
14:31
🔗
|
emijrp |
not relevant for the big picture |
14:32
🔗
|
emijrp |
18TB of images and fails 8 images |
14:32
🔗
|
Hydriz |
lol |
14:32
🔗
|
emijrp |
MAN. |
14:32
🔗
|
emijrp |
well, really 1 or 2 pictures by day |
14:32
🔗
|
Hydriz |
hmm, lemme look at the IA blog post doc... |
14:33
🔗
|
emijrp |
oh, i forgot to add a line to that post about the commons download |
14:34
🔗
|
emijrp |
add it |
14:34
🔗
|
* |
Hydriz is stunned about what to do |
14:35
🔗
|
emijrp |
a comment about we have to skeap about the wikimedia commons downloader task |
14:35
🔗
|
emijrp |
to the google doc |
14:35
🔗
|
emijrp |
speak* |
14:38
🔗
|
Hydriz |
ah, the download is now in the old versions... |
14:39
🔗
|
Hydriz |
got it |
14:39
🔗
|
Hydriz |
I shall post it on the issue page |
14:40
🔗
|
Hydriz |
emijrp: ping |
14:40
🔗
|
emijrp |
ok |
14:41
🔗
|
Hydriz |
yeah, it seems like its wget |
14:42
🔗
|
emijrp |
weird, but ok |
14:42
🔗
|
emijrp |
i will think about it, and, if i dont see a clear solution, i will just add a try: except: and skip that shit |
14:44
🔗
|
Hydriz |
lol |
14:50
🔗
|
Hydriz |
hmm, thinking about it, there isn't really much I know that I can write in the blog post |
14:58
🔗
|
emijrp |
at the beginning it is hard to write |
14:58
🔗
|
emijrp |
later we will need more pages |
14:58
🔗
|
Hydriz |
lol |
14:58
🔗
|
Hydriz |
1 week left |
14:58
🔗
|
Hydriz |
though |
14:59
🔗
|
Hydriz |
anyway, can I start uploading the files for the Wikimedia Commons grab? |
14:59
🔗
|
Hydriz |
the rest of them |
15:00
🔗
|
emijrp |
if you can modify the items later and add the missing .zip .. |
15:00
🔗
|
Hydriz |
yep |
15:01
🔗
|
Hydriz |
but still I got to wait for the dvds to get uploaded |
15:01
🔗
|
Hydriz |
why do people want to make DVDs of Wikipedia... |
15:01
🔗
|
Hydriz |
zzz |
15:01
🔗
|
emijrp |
dvds? |
15:02
🔗
|
Hydriz |
http://dumps.wikimedia.org/dvd.html |
15:06
🔗
|
emijrp |
because there are people without internet |
15:06
🔗
|
emijrp |
CDPedia is the Spanish Wikipedia CD, it is useful for El Salvador and other South American countries. |
15:07
🔗
|
Hydriz |
I see |
15:07
🔗
|
Hydriz |
yep, its on the IA |
15:07
🔗
|
Hydriz |
I am just left with dewiki |
15:07
🔗
|
Hydriz |
2 more files |
15:29
🔗
|
Hydriz |
Right, good night people |
15:30
🔗
|
Hydriz |
got to sleep for long day tomorrow :) |