Time | Nickname | Message
10:51 | Nemo_bis | So emijrp just sent me 500 emails by importing googlecode to https://github.com/WikiTeam/wikiteam , ugh
11:34 | ersi | haha, typical
11:35 | ersi | uh, so, huh. That's plenty of standalone scripts >_>
15:57 | balrog | Nemo_bis: what are some of these bugs?
15:58 | Nemo_bis | balrog: hard to tell
15:58 | balrog | do you have a broken wiki and some output?
15:58 | Nemo_bis | let me link the list
15:59 | balrog | ok
15:59 | Nemo_bis | balrog: https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
15:59 | Nemo_bis | it would be useful to debug those, i.e. understand why they fail despite being supposedly alive
16:00 | balrog | is the git repo now canonical?
16:00 | Nemo_bis | Things like not following redirects and so on
16:00 | Nemo_bis | Apparently yes, emijrp just made it so :) nobody disagrees
16:01 | balrog | how quickly does it fail?
16:01 | balrog | do you have any failure output?
16:02 | balrog | I picked one from that list at random and it's happily downloading
16:02 | Nemo_bis | balrog: usually it dies on a 3xx error, or an "XML is invalid" loop
16:02 | balrog | can you find one that actually fails?
16:03 | Nemo_bis | I don't understand the question
16:03 | balrog | I need a wiki that fails downloading so I can reproduce and debug the problem
16:03 | Nemo_bis | I just linked you 2489
16:03 | balrog | and I picked one at random and it's downloaded most of it without failing
16:04 | Nemo_bis | Well, those all failed for me
16:04 | balrog | some of these seem to have other issues...
16:04 | balrog | http://amandapalmer.despoiler.org/api.php
16:04 | balrog | try removing /api.php
16:04 | balrog | in a browser
16:04 | balrog | can you try a few of these with the latest dumpgenerator and give me one or a few that consistently fail?
16:05 | Nemo_bis | Sure, but last time I checked only some 15-20% of them had database errors
16:05 | balrog | http://editplus.info/w/api.php completed successfully
16:06 | balrog | it's going to take me a looooong time to find one that fails at this rate
16:07 | Nemo_bis | you can just launch launcher.py over the list, that's what I do
16:07 | balrog | since as I try them they either work or have db issues
16:08 | balrog | I want to test with one that is known non-working for you
16:10 | balrog | it's also possible something's broken with your python install
16:10 | balrog | I want to rule that out too
16:10 | balrog | (or that perhaps there's platform-specific code in the script)
16:11 | Nemo_bis | if you want specific examples that's what I filed bugs for :)
16:11 | Nemo_bis | For instance https://github.com/WikiTeam/wikiteam/issues/102
16:11 | Nemo_bis | I can't start a massive debugging sprint right now, sorry
16:20 | Nemo_bis | balrog: the DB errors thing is covered in https://github.com/WikiTeam/wikiteam/issues/48
16:22 | Nemo_bis | balrog: more recently tested ones should be at https://github.com/WikiTeam/wikiteam/issues/78
16:26 | balrog | ok
16:42 | Nemo_bis | sigh https://encrypted.google.com/search?q=%22internal_api_error%22+inurl%3Aapi.php
16:44 | Orphen | Hi Nemo_bis
16:44 | Orphen | can I ask you something?
16:46 | Nemo_bis | Orphen: shoot
16:47 | Orphen | I have a problem: I moved my wiki from one server I had to another. Now on the new server the images don't show up; in their place I get: Error creating thumbnail: Unable to save thumbnail to destination
16:47 | Orphen | what could it be?
16:48 | Nemo_bis | you probably haven't set write permissions on the folder
16:48 | Orphen | which folder
16:48 | Orphen | ?
16:48 | Nemo_bis | the images and/or thumbnails one
16:48 | Orphen | I don't know which one that is
16:48 | Nemo_bis | images/ cache/ and the like, it depends on your setup
16:49 | Orphen | should I set them to 777?
16:49 | Orphen | where do I see my setting?
16:50 | Nemo_bis | do you have command-line access or only FTP?
16:50 | Orphen | both
16:53 | Orphen | I've set all the folders to 777 but they still don't show up
16:54 | Orphen | I also installed ImageMagick on the server because I found on a forum that if it's missing the images might not show up
16:54 | Orphen | but nothing
16:55 | Nemo_bis | Orphen: you have to check which folder you've set in LocalSettings.php and whether it exists
16:56 | Orphen | what's the parameter?
16:57 | Nemo_bis | check them all...
17:06 | Orphen | I've set the whole folder to 777, subfolders included, but nothing :(
17:07 | Orphen | I don't know what else to try
17:14 | balrog | Nemo_bis: I see why it's breaking
17:14 | balrog | it's trying to do a POST and the URL wants to redirect
17:14 | balrog | which you can't do per RFC
17:17 | balrog | Nemo_bis: do you have anything against borrowing the class from https://github.com/crustymonkey/py-sonic/blob/master/libsonic/connection.py#L50 ?
17:18 | Nemo_bis | balrog: looks sensible as long as we're on urllib2
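[A minimal sketch of the kind of handler being discussed, assuming Python 2 / urllib2 as used by dumpgenerator at the time; the class name and details are illustrative, not the py-sonic code verbatim:]

    import urllib2

    class POSTRedirectHandler(urllib2.HTTPRedirectHandler):
        # urllib2's default redirect_request() downgrades a redirected POST
        # to a GET and drops the body; re-issue the POST with its data instead.
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            if code in (301, 302, 307) and req.get_method() == 'POST':
                return urllib2.Request(newurl,
                                       data=req.get_data(),
                                       headers=req.headers,
                                       origin_req_host=req.get_origin_req_host(),
                                       unverifiable=True)
            return urllib2.HTTPRedirectHandler.redirect_request(
                self, req, fp, code, msg, headers, newurl)

    opener = urllib2.build_opener(POSTRedirectHandler())

[Requests opened through this opener keep their POST body across the 3xx hops that kill the default opener.]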
17:48 | Nemo_bis | The POST was originally added to circumvent some stupid anti-DDoS
17:48 | Nemo_bis | IIRC
17:51 | balrog | ew. ok
18:19 | ete_ | Nemo_bis: I made the URL exports less bad
18:19 | ete_ | https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Farm_URL_export
18:19 | ete_ | https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Standalone_URL_export
18:19 | ete_ | Now it won't have a load of duplicate URLs
18:20 | ete_ | and actually gives you the main page URL
18:21 | ete_ | had to split the standalone one into two pages due to template include limits, but it'll automatically remove the notice once we drop below 4000 wikis marked as unarchived
18:23 | ete_ | I'm hoping you guys will use WikiApiary as a major way to track things once we start bulk importing every site we can get our hands on
18:23 | ete_ | I can make lots of fun reports :)
18:24 | ete_ | and wikiapiary will get all the websites
18:30 | balrog | Nemo_bis: http://hastebin.com/kokeqexire
18:30 | balrog | Nemo_bis: also @balr0g on github
18:32 | balrog | that should squash #46
18:33 | balrog | https://code.google.com/p/wikiteam/issues/detail?id=78
18:34 | balrog | wikkii.com wikis have the API disabled.
18:34 | balrog | which is all of those
18:37 | Nemo_bis | ou
18:37 | * | Nemo_bis has dinner now
18:37 | Nemo_bis | baked small patch for #48 in the meanwhile
18:38 | balrog | no issues with http://www.wiki-aventurica.de/ but it seems empty, except for images.
18:38 | balrog | so far haven't run into that issue
18:40 | balrog | what we really need is a working yahoogroup dumper :/
18:40 | balrog | yahoo groups neo broke the old one
18:52 | balrog | gzip... is that even worth implementing?
18:58 | Nemo_bis | define "implementing"
19:00 | Nemo_bis | It's not worth it in the sense that we should use a library that takes care of it. :) But wikis sometimes get angry if you don't use it and there's a really huge saving when downloading XML.
19:28 | Nemo_bis | balrog: do you want to submit the patch / pull request yourself or do you want me to?
19:28 | balrog | you don't have commit access?
19:28 | Nemo_bis | Sure
19:28 | balrog | "wikis sometimes get angry if you don't use it" is a good enough reason
19:28 | Nemo_bis | :)
19:29 | Nemo_bis | It's even mentioned in Wikimedia's etiquette and, well, Wikimedia almost never blocks anyone.
19:35 | Nemo_bis | Hmpf, that class is GPLv3+
19:36 | Nemo_bis | Ah good, we're GPLv3 too ;)
20:11 | balrog | Nemo_bis: yeah I did double-check
20:37 | Nemo_bis | Yes, I've merged in the meanwhile
20:38 | balrog | Nemo_bis: is there a better way to do this?
20:38 | balrog | if f.headers.get('Content-Encoding') and 'gzip' in f.headers.get('Content-Encoding'):
20:42 | balrog | I have gzip working :)
20:42 | balrog | it's in no way cleanly done, but it's about as clean as the rest of the code
20:47 | balrog | wow gzip speeds things up a lot
20:48 | Nemo_bis | yes the scripts are a mess
20:48 | Nemo_bis | but it's not as simple as that IIRC, because you must uncompress the received content if and only if the server actually sent it gzipped
20:48 | balrog | well yes
20:49 | balrog | that's what the if is for
20:49 | balrog | to check if the server says it's sent gzipped
20:49 | Nemo_bis | yes sorry
20:49 | Nemo_bis | I mean it's two lines, not one :P one to accept gzip and the other to check if you received gzip like that
20:49 | balrog | porting to requests wouldn't be a bad idea but this code needs a total rewrite :/
20:49 | Nemo_bis | yes...
20:52 | balrog | my question was: is there a better way to do that
20:52 | balrog | check if the array is not empty and if it has that element
20:52 | balrog | err, rather
20:52 | balrog | check if it exists and check if it has that element
20:53 | balrog | or is what I'm doing alright?
20:53 | balrog | Nemo_bis: add gzip support: http://hastebin.com/irabokabaf
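[The two lines in question, sketched against the urllib2-based code of the time; url and postdata are placeholders. Using .get() with a '' default also answers the "better way" question, collapsing the existence check and the membership test into one condition:]

    import gzip
    import urllib2
    from StringIO import StringIO

    request = urllib2.Request(url, data=postdata)  # url/postdata: placeholders
    request.add_header('Accept-Encoding', 'gzip')  # line 1: advertise gzip
    f = urllib2.urlopen(request)
    data = f.read()
    # line 2: decompress only if the server actually sent it gzipped
    if 'gzip' in f.headers.get('Content-Encoding', ''):
        data = gzip.GzipFile(fileobj=StringIO(data)).read()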
20:53 | Nemo_bis | I have no idea how to test this though
20:54 | Nemo_bis | Did you find one wiki which does *not* serve gzip?
20:55 | balrog | no :/
20:55 | balrog | there's probably a mediawiki appliance for vbox or vagrant...
20:55 | balrog | http://www.mediawiki.org/wiki/Mediawiki-vagrant
20:55 | Nemo_bis | yes
20:57 | balrog | could just set that up and disable gzip
20:57 | Nemo_bis | sure
20:58 | Nemo_bis | Or if you tell me the correct config I'll do so immediately on an instance I have on the web
20:58 | balrog | apache2?
20:58 | balrog | just disable mod_deflate
20:58 | balrog | sudo a2dismod deflate
20:58 | balrog | sudo service apache2 restart
20:59 | balrog | I can test it with telnet if you wish
20:59 | Nemo_bis | balrog: try http://pagemigration.wmflabs.org/
21:00 | Nemo_bis | quick, puppet will probably revert me soon :P
21:01 | balrog | Nemo_bis: I tried it and it spat out gzip
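[Roughly the same check without telnet, again assuming Python 2; if the server still compresses, the Content-Encoding header comes back as gzip:]

    import urllib2

    req = urllib2.Request('http://pagemigration.wmflabs.org/',
                          headers={'Accept-Encoding': 'gzip'})
    f = urllib2.urlopen(req)
    # prints 'gzip' while compression is still enabled server-side
    print f.headers.get('Content-Encoding', '(none)')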
21:01
π
|
balrog |
wait, you're using nginx? |
21:03
π
|
Nemo_bis |
err, yes |
21:03
π
|
Nemo_bis |
that's what puppet is for, I don't even think about that :P |
21:03
π
|
balrog |
just change gzip on; to off |
21:03
π
|
balrog |
http://nginx.org/en/docs/http/ngx_http_gzip_module.html |
21:08
π
|
balrog |
Nemo_bis: will puppet let you do that? |
21:12
π
|
Nemo_bis |
sure, changing puppet files is easy |
21:13
π
|
ete_ |
Nemo_bis: is there an all-in-one list of alive wikis you know about? |
21:13
π
|
ete_ |
(missing some farms is okay) |
21:14
π
|
balrog |
Nemo_bis: let me know when you've done so |
21:14
π
|
ete_ |
or is it scattered through https://github.com/WikiTeam/wikiteam/tree/master/listsofwikis ? |
21:19
π
|
Nemo_bis |
ete_: mostly they'll be on https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_done_2014.txt |
21:19
π
|
Nemo_bis |
puppet still running |
21:20
π
|
ete_ |
Okay, thanks |
21:28
π
|
Nemo_bis |
balrog: I hope it worked |
21:29
π
|
Nemo_bis |
(fyi puppet load allegedly brought Wikipedia down for 15 min the other day, it's not just slow with me :p) |
21:33
π
|
balrog |
Nemo_bis: it's still sending gzip |
21:34
π
|
balrog |
which means the module must be on |
21:39 | Nemo_bis | sigh, can you send a pull request? I'll test it myself if emijrp doesn't beat me to it
21:39 | Nemo_bis | (or I can submit the patch myself tomorrow if you prefer, as you wish)
21:44 | * | Nemo_bis to bed now
21:44 | balrog | alright...
21:44 | balrog | Nemo_bis: you did restart nginx, right?