#wikiteam 2014-06-25,Wed


Time Nickname Message
10:51 πŸ”— Nemo_bis So emijrp just sent me 500 emails by importing googlecode to https://github.com/WikiTeam/wikiteam , ugh
11:34 πŸ”— ersi haha, typical
11:35 πŸ”— ersi uh, so, huh. That's plenty of stand alone scripts >_>
15:57 πŸ”— balrog Nemo_bis: what are some of these bugs?
15:58 πŸ”— Nemo_bis balrog: hard to tell
15:58 πŸ”— balrog do you have a broken wiki and some output?
15:58 πŸ”— Nemo_bis let me link the list
15:59 πŸ”— balrog ok
15:59 πŸ”— Nemo_bis balrog: https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
15:59 πŸ”— Nemo_bis it would be useful to debug those, i.e. understand why they fail despite being supposedly alive
16:00 πŸ”— balrog is the git repo now canonical?
16:00 πŸ”— Nemo_bis Things like not following redirects and so
16:00 πŸ”— Nemo_bis Apparently yes, emijrp just made it so :) nobody disagrees
16:01 πŸ”— balrog how quickly does it fail?
16:01 πŸ”— balrog do you have any failure output?
16:02 πŸ”— balrog I picked one from that list at random and it's happily downloading
16:02 πŸ”— Nemo_bis balrog: usually it dies on a 3xx error, or a "XML is invalid" loop
16:02 πŸ”— balrog can you find one that actually fails?
16:03 πŸ”— Nemo_bis I don't understand the question
16:03 πŸ”— balrog I need a wiki that fails downloading so I can reproduce and debug the problem
16:03 πŸ”— Nemo_bis I just linked you 2489
16:03 πŸ”— balrog and I picked one at random and it's downloaded most of it without failing
16:04 πŸ”— Nemo_bis Well, those all failed for me
16:04 πŸ”— balrog some of these seem to have other issues...
16:04 πŸ”— balrog http://amandapalmer.despoiler.org/api.php
16:04 πŸ”— balrog try removing /api.php
16:04 πŸ”— balrog in a browser
16:04 πŸ”— balrog can you try a few of these with the latest dumpgenerator and give me one or a few that consistently fail?
16:05 πŸ”— Nemo_bis Sure, but last time I checked only some 15-20% of them had database errors
16:05 πŸ”— balrog http://editplus.info/w/api.php completed successfully
16:06 πŸ”— balrog it's going to take me a looooong time to find one that fails at this rate
16:07 πŸ”— Nemo_bis you can just launch launcher.py over the list, that's what I do
16:07 πŸ”— balrog since as I try them they either work or have db issues
16:08 πŸ”— balrog I want to test with one that is known non-working for you
16:10 πŸ”— balrog it's also possible something's broken with your python install
16:10 πŸ”— balrog I want to rule that out too
16:10 πŸ”— balrog (or that perhaps there's platform-specific code in the script)
16:11 πŸ”— Nemo_bis if you want specific examples that's what I filed bugs for :)
16:11 πŸ”— Nemo_bis For instance https://github.com/WikiTeam/wikiteam/issues/102
16:11 πŸ”— Nemo_bis I can't start a massive debugging sprint right now, sorry
16:20 πŸ”— Nemo_bis balrog: the DB errors thing is covered in https://github.com/WikiTeam/wikiteam/issues/48
16:22 πŸ”— Nemo_bis balrog: more recently tested ones should be at https://github.com/WikiTeam/wikiteam/issues/78
16:26 πŸ”— balrog ok
16:42 πŸ”— Nemo_bis sigh https://encrypted.google.com/search?q=%22internal_api_error%22+inurl%3Aapi.php
16:44 πŸ”— Orphen Hi Nemo_bis
16:44 πŸ”— Orphen can I ask you something?
16:46 πŸ”— Nemo_bis Orphen: shoot
16:47 πŸ”— Orphen I have a problem: I moved my wiki from one server to another. Now on the new server the images don't show up; in their place I get the message: Error creating thumbnail: Unable to save thumbnail to destination
16:47 πŸ”— Orphen what could it be?
16:48 πŸ”— Nemo_bis you probably haven't set write permissions on the directory
16:48 πŸ”— Orphen which directory
16:48 πŸ”— Orphen ?
16:48 πŸ”— Nemo_bis the one for images and/or thumbnails
16:48 πŸ”— Orphen I don't know which one it is
16:48 πŸ”— Nemo_bis images/ cache/ and the like, it depends on your setup
16:49 πŸ”— Orphen should I set them to 777?
16:49 πŸ”— Orphen where do I see my setup?
16:50 πŸ”— Nemo_bis do you have command-line access or only FTP?
16:50 πŸ”— Orphen both
16:53 πŸ”— Orphen I've set all the directories to 777 but they still don't show up
16:54 πŸ”— Orphen I also installed ImageMagick on the server, because I found on a forum that if that app is missing the images might not show up
16:54 πŸ”— Orphen but nothing
16:55 πŸ”— Nemo_bis Orphen: you need to check which directory you've set in LocalSettings.php and whether it exists
16:56 πŸ”— Orphen what's the parameter?
16:57 πŸ”— Nemo_bis check them all...
17:06 πŸ”— Orphen I set the whole directory to 777, subdirectories included, but nothing :(
17:07 πŸ”— Orphen I don't know what else to try
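[Editor's note: the settings Nemo_bis is pointing at are MediaWiki's upload configuration in LocalSettings.php. A hypothetical minimal block (the variable names are real MediaWiki settings; the values are only examples for a default install):]

```php
<?php
// LocalSettings.php — illustrative upload settings only.
// MediaWiki writes thumbnails under $wgUploadDirectory, so that
// directory (and its subdirectories) must be writable by the
// web server user, or you get "Unable to save thumbnail".
$wgEnableUploads = true;
$wgUploadDirectory = "$IP/images";          // filesystem path
$wgUploadPath = "$wgScriptPath/images";     // URL path
$wgUseImageMagick = true;
$wgImageMagickConvertCommand = "/usr/bin/convert";
```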
17:14 πŸ”— balrog Nemo_bis: I see why it's breaking
17:14 πŸ”— balrog it's trying to do a POST and the URL wants to redirect
17:14 πŸ”— balrog which you can't do per RFC
17:17 πŸ”— balrog Nemo_bis: do you have anything against borrowing the class from https://github.com/crustymonkey/py-sonic/blob/master/libsonic/connection.py#L50 ?
17:18 πŸ”— Nemo_bis balrog: looks sensible as long as we're on urllib2
17:48 πŸ”— Nemo_bis The POST was originally added to circumvent some stupid anti-DDoS
17:48 πŸ”— Nemo_bis IIRC
17:51 πŸ”— balrog ew. ok
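[Editor's note: the fix being discussed is a custom redirect handler that re-sends the POST body instead of letting the library refuse or downgrade the request. The script used urllib2 (Python 2) at the time; a rough Python 3 sketch of the same idea, with a hypothetical class name, might look like:]

```python
import urllib.request

class PostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-issue the original POST (body included) when the server
    redirects, instead of the default behaviour of failing or
    turning the request into a body-less GET."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if req.get_method() == "POST":
            # Rebuild the request against the new URL, keeping the body.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers={k: v for k, v in req.headers.items()
                         if k.lower() != "host"},
                origin_req_host=req.origin_req_host,
                unverifiable=True,
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)

# An opener built with this handler follows redirects for POSTs too:
# opener = urllib.request.build_opener(PostRedirectHandler)
```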
18:19 πŸ”— ete_ Nemo_bis: I made the URL exports less bad
18:19 πŸ”— ete_ https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Farm_URL_export
18:19 πŸ”— ete_ https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Standalone_URL_export
18:19 πŸ”— ete_ Now it won't have a load of duplicate URLs
18:20 πŸ”— ete_ and actually gives you the main page URL
18:21 πŸ”— ete_ had to split the standalone one to two pages due to template include limits, but it'll automatically remove the notice once we drop below 4000 wikis marked as unarchived
18:23 πŸ”— ete_ I'm hoping you guys will use WikiApiary as a major way to track things once we start bulk importing every site we can get our hands on
18:23 πŸ”— ete_ I can make lots of fun reports :)
18:24 πŸ”— ete_ and wikiapiary will get all the websites
18:30 πŸ”— balrog Nemo_bis: http://hastebin.com/kokeqexire
18:30 πŸ”— balrog Nemo_bis: also @balr0g on github
18:32 πŸ”— balrog that should squash #46
18:33 πŸ”— balrog https://code.google.com/p/wikiteam/issues/detail?id=78
18:34 πŸ”— balrog wikkii.com wikis have the API disabled.
18:34 πŸ”— balrog which is all of those
18:37 πŸ”— Nemo_bis ou
18:37 πŸ”— * Nemo_bis has dinner now
18:37 πŸ”— Nemo_bis baked small patch for #48 in the meanwhile
18:38 πŸ”— balrog no issues with http://www.wiki-aventurica.de/ but it seems empty, except for images.
18:38 πŸ”— balrog so far haven't run into that issue
18:40 πŸ”— balrog what we really need is a working yahoogroup dumper :/
18:40 πŸ”— balrog yahoo groups neo broke the old one
18:52 πŸ”— balrog gzip... is that even worth implementing?
18:58 πŸ”— Nemo_bis define "implementing"
19:00 πŸ”— Nemo_bis It's not worth it in the sense that we should use a library that takes care of it. :) But wikis sometimes get angry if you don't use it and there's a really huge saving when downloading XML.
19:28 πŸ”— Nemo_bis balrog: do you want to submit the patch / pull request yourself or do you want me to?
19:28 πŸ”— balrog you don't have commit access?
19:28 πŸ”— Nemo_bis Sure
19:28 πŸ”— balrog "wikis sometimes get angry if you don't use it" is a good enough reason
19:28 πŸ”— Nemo_bis :)
19:29 πŸ”— Nemo_bis It's even mentioned in Wikimedia's etiquette and, well, Wikimedia almost never blocks anyone.
19:35 πŸ”— Nemo_bis Hmpf, that class is GPLv3+
19:36 πŸ”— Nemo_bis Ah good, we're GPLv3 too ;)
20:11 πŸ”— balrog Nemo_bis: yeah I did double-check
20:37 πŸ”— Nemo_bis Yes, I've merged in the meanwhile
20:38 πŸ”— balrog Nemo_bis: is there a better way to do this?
20:38 πŸ”— balrog if f.headers.get('Content-Encoding') and 'gzip' in f.headers.get('Content-Encoding'):
20:42 πŸ”— balrog I have gzip working :)
20:42 πŸ”— balrog it's in no way cleanly done, but it's about as clean as the rest of the code
20:47 πŸ”— balrog wow gzip speeds things up a lot
20:48 πŸ”— Nemo_bis yes the scripts are a mess
20:48 πŸ”— Nemo_bis but it's not as simple as that IIRC; because you must uncompress the received content if and only if the server actually sent it gzipped
20:48 πŸ”— balrog well yes
20:48 πŸ”— balrog that's what the if is for
20:49 πŸ”— balrog to check if the server says it's sent gzipped
20:49 πŸ”— Nemo_bis yes sorry
20:49 πŸ”— Nemo_bis I mean it's two lines, not one :P one to accept gzip and the other to check if you received gzip like that
20:49 πŸ”— balrog porting to requests wouldn't be a bad idea but this code needs a total rewrite :/
20:49 πŸ”— Nemo_bis yes...
20:49 πŸ”— balrog my question was is there a better way to do that
20:52 πŸ”— balrog check if the array is not empty and if it has that element
20:52 πŸ”— balrog err, rather
20:52 πŸ”— balrog check if it exists and check if it has that element
20:52 πŸ”— balrog or is what I'm doing alright?
20:53 πŸ”— balrog Nemo_bis: add gzip support: http://hastebin.com/irabokabaf
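[Editor's note: the two lines Nemo_bis describes — advertise gzip, then decompress only if the server actually used it — can be sketched as follows. This is Python 3 urllib.request with hypothetical helper names, not the actual patch in the hastebin above:]

```python
import gzip
import io
import urllib.request

def decode_body(data, content_encoding):
    # Decompress if and only if the server says it sent gzip.
    if content_encoding and "gzip" in content_encoding:
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data

def fetch(url):
    # Line one: tell the server we accept gzip.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as f:
        # Line two: undo the compression only when it was applied.
        return decode_body(f.read(), f.headers.get("Content-Encoding"))
```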
20:53 πŸ”— Nemo_bis I have no idea how to test this though
20:53 πŸ”— Nemo_bis Did you find one wiki which does *not* serve gzip?
20:54 πŸ”— balrog no :/
20:55 πŸ”— balrog there's probably a mediawiki appliance for vbox or vagrant...
20:55 πŸ”— balrog http://www.mediawiki.org/wiki/Mediawiki-vagrant
20:55 πŸ”— Nemo_bis yes
20:55 πŸ”— balrog could just set that up and disable gzip
20:57 πŸ”— Nemo_bis sure
20:57 πŸ”— Nemo_bis Or if you tell me the correct config I'll do so immediately on an instance I have on the web
20:58 πŸ”— balrog apache2?
20:58 πŸ”— balrog just disable mod_deflate
20:58 πŸ”— balrog sudo a2dismod deflate
20:58 πŸ”— balrog sudo service apache2 restart
20:58 πŸ”— balrog I can test it with telnet if you wish
20:59 πŸ”— Nemo_bis balrog: try http://pagemigration.wmflabs.org/
20:59 πŸ”— Nemo_bis quick, puppet will probably revert me soon :P
21:00 πŸ”— balrog Nemo_bis: I tried it and it spat out gzip
21:01 πŸ”— balrog wait, you're using nginx?
21:03 πŸ”— Nemo_bis err, yes
21:03 πŸ”— Nemo_bis that's what puppet is for, I don't even think about that :P
21:03 πŸ”— balrog just change gzip on; to off
21:03 πŸ”— balrog http://nginx.org/en/docs/http/ngx_http_gzip_module.html
21:08 πŸ”— balrog Nemo_bis: will puppet let you do that?
21:12 πŸ”— Nemo_bis sure, changing puppet files is easy
21:13 πŸ”— ete_ Nemo_bis: is there an all-in-one list of alive wikis you know about?
21:13 πŸ”— ete_ (missing some farms is okay)
21:14 πŸ”— balrog Nemo_bis: let me know when you've done so
21:14 πŸ”— ete_ or is it scattered through https://github.com/WikiTeam/wikiteam/tree/master/listsofwikis ?
21:19 πŸ”— Nemo_bis ete_: mostly they'll be on https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_done_2014.txt
21:19 πŸ”— Nemo_bis puppet still running
21:20 πŸ”— ete_ Okay, thanks
21:28 πŸ”— Nemo_bis balrog: I hope it worked
21:29 πŸ”— Nemo_bis (fyi puppet load allegedly brought Wikipedia down for 15 min the other day, it's not just slow with me :p)
21:33 πŸ”— balrog Nemo_bis: it's still sending gzip
21:34 πŸ”— balrog which means the module must be on
21:39 πŸ”— Nemo_bis sigh, can you send a pull request? I'll test myself if emijrp doesn't beat me at it
21:39 πŸ”— Nemo_bis (or I can submit the patch myself tomorrow if you prefer, as you wish)
21:39 πŸ”— * Nemo_bis to bed now
21:44 πŸ”— balrog alright...
21:44 πŸ”— balrog Nemo_bis: you did restart nginx, right?
