10:51 <Nemo_bis> So emijrp just sent me 500 emails by importing googlecode to https://github.com/WikiTeam/wikiteam , ugh
11:34 <ersi> haha, typical
11:35 <ersi> uh, so, huh. That's plenty of stand alone scripts >_>
15:57 <balrog> Nemo_bis: what are some of these bugs?
15:58 <Nemo_bis> balrog: hard to tell
15:58 <balrog> do you have a broken wiki and some output?
15:58 <Nemo_bis> let me link the list
15:59 <balrog> ok
15:59 <Nemo_bis> balrog: https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
15:59 <Nemo_bis> it would be useful to debug those, i.e. understand why they fail despite being supposedly alive
16:00 <balrog> is the git repo now canonical?
16:00 <Nemo_bis> Things like not following redirects and so on
16:00 <Nemo_bis> Apparently yes, emijrp just made it so :) nobody disagrees
16:01 <balrog> how quickly does it fail?
16:01 <balrog> do you have any failure output?
16:02 <balrog> I picked one from that list at random and it's happily downloading
16:02 <Nemo_bis> balrog: usually it dies on a 3xx error, or an "XML is invalid" loop
16:02 <balrog> can you find one that actually fails?
16:03 <Nemo_bis> I don't understand the question
16:03 <balrog> I need a wiki that fails downloading so I can reproduce and debug the problem
16:03 <Nemo_bis> I just linked you 2489
16:03 <balrog> and I picked one at random and it's downloaded most of it without failing
16:04 <Nemo_bis> Well, those all failed for me
16:04 <balrog> some of these seem to have other issues...
16:04 <balrog> http://amandapalmer.despoiler.org/api.php
16:04 <balrog> try removing /api.php
16:04 <balrog> in a browser
16:04 <balrog> can you try a few of these with the latest dumpgenerator and give me one or a few that consistently fail?
16:05 <Nemo_bis> Sure but last time I checked there was only some 15-20 % of them with database errors
16:05 <balrog> http://editplus.info/w/api.php completed successfully
16:06 <balrog> it's going to take me a looooong time to find one that fails at this rate
16:07 <Nemo_bis> you can just launch launcher.py over the list, that's what I do
16:07 <balrog> since as I try them they either work or have db issues
16:08 <balrog> I want to test with one that is known non-working for you
16:10 <balrog> it's also possible something's broken with your python install
16:10 <balrog> I want to rule that out too
16:10 <balrog> (or that perhaps there's platform-specific code in the script)
16:11 <Nemo_bis> if you want specific examples that's what I filed bugs for :)
16:11 <Nemo_bis> For instance https://github.com/WikiTeam/wikiteam/issues/102
16:11 <Nemo_bis> I can't start a massive debugging sprint right now, sorry
16:20 <Nemo_bis> balrog: the DB errors thing is covered in https://github.com/WikiTeam/wikiteam/issues/48
16:22 <Nemo_bis> balrog: more recently tested ones should be at https://github.com/WikiTeam/wikiteam/issues/78
16:26 <balrog> ok
16:42 <Nemo_bis> sigh https://encrypted.google.com/search?q=%22internal_api_error%22+inurl%3Aapi.php
16:44 <Orphen> Hi Nemo_bis
16:44 <Orphen> can I ask you something?
16:46 <Nemo_bis> Orphen: shoot
16:47 <Orphen> I have a problem: I moved my wiki from one server I had to another. Now on the new server the images don't show up; in their place I get: "Error creating thumbnail: Unable to save thumbnail to destination"
16:47 <Orphen> what could it be?
16:48 <Nemo_bis> you probably haven't set write permissions on the directory
16:48 <Orphen> which directory
16:48 <Orphen> ?
16:48 <Nemo_bis> the one for images and/or thumbnails
16:48 <Orphen> I don't know which one it is
16:48 <Nemo_bis> images/ cache/ and the like, it depends on your setup
16:49 <Orphen> do I have to set them to 777?
16:49 <Orphen> where do I see my setup?
16:50 <Nemo_bis> do you have command-line access or only FTP?
16:50 <Orphen> both
16:53 <Orphen> I've set all the directories to 777 but the images still don't show
16:54 <Orphen> I also installed ImageMagick on the server because I found on a forum that without it the images might not show
16:54 <Orphen> but nothing
16:55 <Nemo_bis> Orphen: you have to check which directory you've set in LocalSettings.php and whether it exists
16:56 <Orphen> which parameter is it?
16:57 <Nemo_bis> check them all...
17:06 <Orphen> I've set the whole directory to 777 including subdirectories but nothing :(
17:07 <Orphen> I don't know what else to try
17:14 <balrog> Nemo_bis: I see why it's breaking
17:14 <balrog> it's trying to do a POST and the URL wants to redirect
17:14 <balrog> which you can't do per RFC
17:17 <balrog> Nemo_bis: do you have anything against borrowing the class from https://github.com/crustymonkey/py-sonic/blob/master/libsonic/connection.py#L50 ?
17:18 <Nemo_bis> balrog: looks sensible as long as we're on urllib2
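A sketch of what such a redirect handler looks like (class name hypothetical; written against urllib.request, urllib2's Python 3 successor, so this is the spirit of the borrowed class rather than its exact code). By default a redirected POST is replayed as a bodyless GET, which is what breaks these downloads, so the handler re-attaches the body:

```python
import urllib.request  # urllib2's successor in Python 3

class POSTRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-send the POST body when the server answers with a 3xx.

    The default handler turns a redirected POST into a GET with no
    body, so an api.php that redirects breaks the dump.
    """
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the parent build the redirected request first; it also
        # enforces which method/status combinations may redirect.
        new = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new is not None and req.data is not None:
            # Re-attach the original POST body and headers.
            new = urllib.request.Request(new.full_url, data=req.data,
                                         headers=req.headers)
        return new

# An opener using the handler; install_opener() would make it global.
opener = urllib.request.build_opener(POSTRedirectHandler)
```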
17:48 <Nemo_bis> The POST was originally added to circumvent some stupid anti-DDoS
17:48 <Nemo_bis> IIRC
17:51 <balrog> ew. ok
18:19 <ete_> Nemo_bis: I made the URL exports less bad
18:19 <ete_> https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Farm_URL_export
18:19 <ete_> https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Standalone_URL_export
18:19 <ete_> Now it won't have a load of duplicate URLs
18:20 <ete_> and actually gives you the main page URL
18:21 <ete_> had to split the standalone one to two pages due to template include limits, but it'll automatically remove the notice once we drop below 4000 wikis marked as unarchived
18:23 <ete_> I'm hoping you guys will use WikiApiary as a major way to track things once we start bulk importing every site we can get our hands on
18:23 <ete_> I can make lots of fun reports :)
18:24 <ete_> and wikiapiary will get all the websites
18:30 <balrog> Nemo_bis: http://hastebin.com/kokeqexire
18:30 <balrog> Nemo_bis: also @balr0g on github
18:32 <balrog> that should squash #46
18:33 <balrog> https://code.google.com/p/wikiteam/issues/detail?id=78
18:34 <balrog> wikkii.com wikis have the API disabled.
18:34 <balrog> which is all of those
18:37 <Nemo_bis> ou
18:37 * Nemo_bis has dinner now
18:37 <Nemo_bis> baked small patch for #48 in the meanwhile
18:38 <balrog> no issues with http://www.wiki-aventurica.de/ but it seems empty, except for images.
18:38 <balrog> so far haven't run into that issue
18:40 <balrog> what we really need is a working yahoogroup dumper :/
18:40 <balrog> yahoo groups neo broke the old one
18:52 <balrog> gzip... is that even worth implementing?
18:58 <Nemo_bis> define "implementing"
19:00 <Nemo_bis> It's not worth it in the sense that we should use a library that takes care of it. :) But wikis sometimes get angry if you don't use it and there's a really huge saving when downloading XML.
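The saving is easy to demonstrate: MediaWiki export XML repeats the same tags for every page and revision, so it compresses extremely well (illustrative sketch, not code from the script):

```python
import gzip

# Export XML is highly repetitive, so gzip bites hard on it.
xml = ('<page><title>Example</title><revision>'
       '<text>some wikitext</text></revision></page>\n') * 1000
compressed = gzip.compress(xml.encode())
print('%d bytes -> %d bytes' % (len(xml), len(compressed)))
```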
19:28 <Nemo_bis> balrog: do you want to submit the patch / pull request yourself or do you want me to?
19:28 <balrog> you don't have commit access?
19:28 <Nemo_bis> Sure
19:28 <balrog> "wikis sometimes get angry if you don't use it" is a good enough reason
19:28 <Nemo_bis> :)
19:29 <Nemo_bis> It's even mentioned in Wikimedia's etiquette and, well, Wikimedia almost never blocks anyone.
19:35 <Nemo_bis> Hmpf, that class is GPLv3+
19:36 <Nemo_bis> Ah good, we're GPLv3 too ;)
20:11 <balrog> Nemo_bis: yeah I did double-check
20:37 <Nemo_bis> Yes, I've merged in the meanwhile
20:38 <balrog> Nemo_bis: is there a better way to do this?
20:38 <balrog> if f.headers.get('Content-Encoding') and 'gzip' in f.headers.get('Content-Encoding'):
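One tidier option for the line above: `get()` already takes a default, so the separate truthiness test can go (sketch with a hypothetical helper; a plain dict stands in for `f.headers`, whose `get` behaves the same way):

```python
def is_gzipped(headers):
    """True when the server declared gzip in Content-Encoding."""
    # get() with a '' default covers the header being absent,
    # so no separate "is it there at all" check is needed.
    return 'gzip' in headers.get('Content-Encoding', '')

print(is_gzipped({'Content-Encoding': 'gzip'}))  # True
print(is_gzipped({}))                            # False
```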
20:42 <balrog> I have gzip working :)
20:42 <balrog> it's in no way cleanly done, but it's about as clean as the rest of the code
20:47 <balrog> wow gzip speeds things up a lot
20:48 <Nemo_bis> yes the scripts are a mess
20:48 <Nemo_bis> but it's not as simple as that IIRC, because you must uncompress the received content if and only if the server actually sent it gzipped
20:48 <balrog> well yes
20:48 <balrog> that's what the if is for
20:49 <balrog> to check if the server says it's sent gzipped
20:49 <Nemo_bis> yes sorry
20:49 <Nemo_bis> I mean it's two lines, not one :P one to accept gzip and the other to check if you received gzip like that
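Those two lines, as a hedged sketch (function name hypothetical): one header added to the request, one conditional decompress of the response body:

```python
import gzip
import io

# Line one, on the outgoing request (urllib2 / urllib.request style):
#     request.add_header('Accept-Encoding', 'gzip')

# Line two, on the response body:
def decode_body(raw, headers):
    """Uncompress iff the server actually sent gzip."""
    if 'gzip' in (headers.get('Content-Encoding') or ''):
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw
```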
20:49 <balrog> porting to requests wouldn't be a bad idea but this code needs a total rewrite :/
20:49 <Nemo_bis> yes...
20:52 <balrog> my question was: is there a better way to do that
20:52 <balrog> check if the array is not empty and if it has that element
20:52 <balrog> err, rather
20:52 <balrog> check if it exists and check if it has that element
20:52 <balrog> or is what I'm doing alright?
20:53 <balrog> Nemo_bis: add gzip support: http://hastebin.com/irabokabaf
20:53 <Nemo_bis> I have no idea how to test this though
20:53 <Nemo_bis> Did you find one wiki which does *not* serve gzip?
20:54 <balrog> no :/
20:55 <balrog> there's probably a mediawiki appliance for vbox or vagrant...
20:55 <balrog> http://www.mediawiki.org/wiki/Mediawiki-vagrant
20:55 <Nemo_bis> yes
20:55 <balrog> could just set that up and disable gzip
20:57 <Nemo_bis> sure
20:57 <Nemo_bis> Or if you tell me the correct config I'll do so immediately on an instance I have on the web
20:58 <balrog> apache2?
20:58 <balrog> just disable mod_deflate
20:58 <balrog> sudo a2dismod deflate
20:58 <balrog> sudo service apache2 restart
20:58 <balrog> I can test it with telnet if you wish
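The same telnet check, sketched in Python's http.client (function name hypothetical): send a request advertising Accept-Encoding: gzip and look at the Content-Encoding the server answers with:

```python
import http.client

def server_sends_gzip(host, path='/'):
    """Ask for gzip; report whether the server honoured it."""
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request('GET', path, headers={'Accept-Encoding': 'gzip'})
    resp = conn.getresponse()
    resp.read()  # drain the body
    conn.close()
    return 'gzip' in (resp.getheader('Content-Encoding') or '')
```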
20:59 <Nemo_bis> balrog: try http://pagemigration.wmflabs.org/
20:59 <Nemo_bis> quick, puppet will probably revert me soon :P
21:00 <balrog> Nemo_bis: I tried it and it spat out gzip
21:01 <balrog> wait, you're using nginx?
21:03 <Nemo_bis> err, yes
21:03 <Nemo_bis> that's what puppet is for, I don't even think about that :P
21:03 <balrog> just change gzip on; to off
21:03 <balrog> http://nginx.org/en/docs/http/ngx_http_gzip_module.html
21:08 <balrog> Nemo_bis: will puppet let you do that?
21:12 <Nemo_bis> sure, changing puppet files is easy
21:13 <ete_> Nemo_bis: is there an all-in-one list of alive wikis you know about?
21:13 <ete_> (missing some farms is okay)
21:14 <balrog> Nemo_bis: let me know when you've done so
21:14 <ete_> or is it scattered through https://github.com/WikiTeam/wikiteam/tree/master/listsofwikis ?
21:19 <Nemo_bis> ete_: mostly they'll be on https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_done_2014.txt
21:19 <Nemo_bis> puppet still running
21:20 <ete_> Okay, thanks
21:28 <Nemo_bis> balrog: I hope it worked
21:29 <Nemo_bis> (fyi puppet load allegedly brought Wikipedia down for 15 min the other day, it's not just slow with me :p)
21:33 <balrog> Nemo_bis: it's still sending gzip
21:34 <balrog> which means the module must be on
21:39 <Nemo_bis> sigh, can you send a pull request? I'll test myself if emijrp doesn't beat me to it
21:39 <Nemo_bis> (or I can submit the patch myself tomorrow if you prefer, as you wish)
21:39 * Nemo_bis to bed now
21:44 <balrog> alright...
21:44 <balrog> Nemo_bis: you did restart nginx, right?