[10:51] So emijrp just sent me 500 emails by importing googlecode to https://github.com/WikiTeam/wikiteam , ugh
[11:34] haha, typical
[11:35] uh, so, huh. That's plenty of standalone scripts >_>
[15:57] Nemo_bis: what are some of these bugs?
[15:58] balrog: hard to tell
[15:58] do you have a broken wiki and some output?
[15:58] let me link the list
[15:59] ok
[15:59] balrog: https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_pavlo.alive.filtered.todo.txt
[15:59] it would be useful to debug those, i.e. understand why they fail despite being supposedly alive
[16:00] is the git repo now canonical?
[16:00] Things like not following redirects and so on
[16:00] Apparently yes, emijrp just made it so :) nobody disagrees
[16:01] how quickly does it fail?
[16:01] do you have any failure output?
[16:02] I picked one from that list at random and it's happily downloading
[16:02] balrog: usually it dies on a 3xx error, or an "XML is invalid" loop
[16:02] can you find one that actually fails?
[16:03] I don't understand the question
[16:03] I need a wiki that fails downloading so I can reproduce and debug the problem
[16:03] I just linked you 2489
[16:03] and I picked one at random and it's downloaded most of it without failing
[16:04] Well, those all failed for me
[16:04] some of these seem to have other issues...
[16:04] http://amandapalmer.despoiler.org/api.php
[16:04] try removing /api.php
[16:04] in a browser
[16:04] can you try a few of these with the latest dumpgenerator and give me one or a few that consistently fail?
[16:05] Sure, but last time I checked only some 15-20% of them had database errors
[16:05] http://editplus.info/w/api.php completed successfully
[16:06] it's going to take me a looooong time to find one that fails at this rate
[16:07] you can just launch launcher.py over the list, that's what I do
[16:07] since as I try them they either work or have db issues
[16:08] I want to test with one that is known non-working for you
[16:10] it's also possible something's broken with your python install
[16:10] I want to rule that out too
[16:10] (or that perhaps there's platform-specific code in the script)
[16:11] if you want specific examples, that's what I filed bugs for :)
[16:11] For instance https://github.com/WikiTeam/wikiteam/issues/102
[16:11] I can't start a massive debugging sprint right now, sorry
[16:20] balrog: the DB errors thing is covered in https://github.com/WikiTeam/wikiteam/issues/48
[16:22] balrog: more recently tested ones should be at https://github.com/WikiTeam/wikiteam/issues/78
[16:26] ok
[16:42] sigh https://encrypted.google.com/search?q=%22internal_api_error%22+inurl%3Aapi.php
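A minimal triage sketch of the kind of check the 16:00-16:11 exchange is about: walk a list such as mediawikis_pavlo.alive.filtered.todo.txt and report whether each URL answers directly, answers only behind a 3xx redirect, or errors out. It is not from the wikiteam repo; Python 2 / urllib2 to match the scripts of the time, and the script name and helper are hypothetical.

    # probe_todo.py -- hypothetical triage helper, not part of wikiteam
    # usage: python probe_todo.py mediawikis_pavlo.alive.filtered.todo.txt
    import sys
    import urllib2

    class NoRedirect(urllib2.HTTPRedirectHandler):
        """Refuse to follow redirects so the raw 3xx surfaces as an HTTPError."""
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None

    def probe(url):
        opener = urllib2.build_opener(NoRedirect())
        try:
            opener.open(url, timeout=30)
            return 'OK'
        except urllib2.HTTPError as e:
            if 300 <= e.code < 400:
                # Wikis that only answer behind a redirect are the "3xx error" case.
                return '3xx -> %s' % e.hdrs.get('Location', '?')
            return 'HTTP %d' % e.code
        except Exception as e:
            return 'failed: %s' % e

    if __name__ == '__main__':
        for line in open(sys.argv[1]):
            url = line.strip()
            if url:
                print '%-70s %s' % (url, probe(url))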
[16:44] Hi Nemo_bis
[16:44] can I ask you something?
[16:46] Orphen: shoot
[16:47] I have a problem: I moved my wiki from one server to another. Now on the new server the images don't show up; in their place I get: "Error creating thumbnail: Unable to save thumbnail to destination"
[16:47] what could it be?
[16:48] you probably haven't set write permissions on the directory
[16:48] which directory
[16:48] ?
[16:48] the one for images and/or thumbnails
[16:48] I don't know which one that is
[16:48] images/ cache/ and the like, it depends on your setup
[16:49] do I have to set them to 777?
[16:49] where do I check my setup?
[16:50] do you have command-line access or only FTP?
[16:50] both
[16:53] I've set all the directories to 777 but they still don't show up
[16:54] I also installed ImageMagick on the server, because I read on a forum that without that app the images might not show up
[16:54] but nothing
[16:55] Orphen: you need to check which directory is set in LocalSettings.php and whether it exists
[16:56] which parameter is it?
[16:57] check them all...
[17:06] I set the whole directory to 777 including subdirectories, but still nothing :(
[17:07] I don't know what else to try
[17:14] Nemo_bis: I see why it's breaking
[17:14] it's trying to do a POST and the URL wants to redirect
[17:14] which you can't do per RFC
[17:17] Nemo_bis: do you have anything against borrowing the class from https://github.com/crustymonkey/py-sonic/blob/master/libsonic/connection.py#L50 ?
[17:18] balrog: looks sensible as long as we're on urllib2
[17:48] The POST was originally added to circumvent some stupid anti-DDoS
[17:48] IIRC
[17:51] ew. ok
[18:19] Nemo_bis: I made the URL exports less bad
[18:19] https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Farm_URL_export
[18:19] https://wikiapiary.com/wiki/User:Nemo_bis/WikiApiary_Standalone_URL_export
[18:19] Now it won't have a load of duplicate URLs
[18:20] and actually gives you the main page URL
[18:21] had to split the standalone one into two pages due to template include limits, but it'll automatically remove the notice once we drop below 4000 wikis marked as unarchived
[18:23] I'm hoping you guys will use WikiApiary as a major way to track things once we start bulk importing every site we can get our hands on
[18:23] I can make lots of fun reports :)
[18:24] and wikiapiary will get all the websites
[18:30] Nemo_bis: http://hastebin.com/kokeqexire
[18:30] Nemo_bis: also @balr0g on github
[18:32] that should squash #46
[18:33] https://code.google.com/p/wikiteam/issues/detail?id=78
[18:34] wikkii.com wikis have the API disabled.
[18:34] which is all of those
[18:37] ou
[18:37] * Nemo_bis has dinner now
[18:37] baked a small patch for #48 in the meanwhile
[18:38] no issues with http://www.wiki-aventurica.de/ but it seems empty, except for images
[18:38] so far haven't run into that issue
[18:40] what we really need is a working yahoogroups dumper :/
[18:40] yahoo groups neo broke the old one
[18:52] gzip... is that even worth implementing?
[18:58] define "implementing"
[19:00] It's not worth it in the sense that we should use a library that takes care of it. :) But wikis sometimes get angry if you don't use it, and there's a really huge saving when downloading XML.
[19:28] balrog: do you want to submit the patch / pull request yourself or do you want me to?
[19:28] you don't have commit access?
[19:28] Sure
[19:28] "wikis sometimes get angry if you don't use it" is a good enough reason
[19:28] :)
[19:29] It's even mentioned in Wikimedia's etiquette and, well, Wikimedia almost never blocks anyone.
[19:35] Hmpf, that class is GPLv3+
[19:36] Ah good, we're GPLv3 too ;)
[20:11] Nemo_bis: yeah I did double-check
[20:37] Yes, I've merged in the meanwhile
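On the POST-vs-redirect problem found at 17:14 and the urllib2 redirect class mentioned at 17:17, a rough sketch of the idea (not the GPLv3 py-sonic code and not the patch that was actually merged): a HTTPRedirectHandler that re-issues the POST body to the new location instead of letting urllib2 downgrade the request. The target URL and parameters are made up.

    # Sketch only: keep POST data across redirects with urllib2.
    import urllib
    import urllib2

    class ResendPostRedirectHandler(urllib2.HTTPRedirectHandler):
        """urllib2 normally downgrades a redirected POST to a bodiless GET
        (or refuses outright for 307); re-issue the POST instead."""
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            if req.get_data() is not None and code in (301, 302, 303, 307):
                return urllib2.Request(newurl,
                                       data=req.get_data(),
                                       headers=req.headers,
                                       origin_req_host=req.get_origin_req_host(),
                                       unverifiable=True)
            return urllib2.HTTPRedirectHandler.redirect_request(
                self, req, fp, code, msg, headers, newurl)

    opener = urllib2.build_opener(ResendPostRedirectHandler())
    params = urllib.urlencode({'action': 'query', 'meta': 'siteinfo', 'format': 'xml'})
    # hypothetical target; any api.php sitting behind a redirect would do
    print opener.open('http://example.org/w/api.php', data=params).read()[:200]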
[20:38] Nemo_bis: is there a better way to do this?
[20:38] if f.headers.get('Content-Encoding') and 'gzip' in f.headers.get('Content-Encoding'):
[20:42] I have gzip working :)
[20:42] it's in no way cleanly done, but it's about as clean as the rest of the code
[20:47] wow gzip speeds things up a lot
[20:48] yes the scripts are a mess
[20:48] but it's not as simple as that IIRC; because you must uncompress the received content if and only if the server actually sent it gzipped
[20:48] well yes
[20:48] that's what the if is for
[20:49] to check if the server says it's sent gzipped
[20:49] yes sorry
[20:49] I mean it's two lines, not one :P one to accept gzip and the other to check if you received gzip like that
[20:49] porting to requests wouldn't be a bad idea but this code needs a total rewrite :/
[20:49] yes...
[20:49] my question was, is there a better way to do that
[20:52] check if the array is not empty and if it has that element
[20:52] err, rather
[20:52] check if it exists and check if it has that element
[20:52] or is what I'm doing alright?
[20:53] Nemo_bis: add gzip support: http://hastebin.com/irabokabaf
[20:53] I have no idea how to test this though
[20:53] Did you find one wiki which does *not* serve gzip?
[20:54] no :/
[20:55] there's probably a mediawiki appliance for vbox or vagrant...
[20:55] http://www.mediawiki.org/wiki/Mediawiki-vagrant
[20:55] yes
[20:55] could just set that up and disable gzip
[20:57] sure
[20:57] Or if you tell me the correct config I'll do so immediately on an instance I have on the web
[20:58] apache2?
[20:58] just disable mod_deflate
[20:58] sudo a2dismod deflate
[20:58] sudo service apache2 restart
[20:58] I can test it with telnet if you wish
[20:59] balrog: try http://pagemigration.wmflabs.org/
[20:59] quick, puppet will probably revert me soon :P
[21:00] Nemo_bis: I tried it and it spat out gzip
[21:01] wait, you're using nginx?
[21:03] err, yes
[21:03] that's what puppet is for, I don't even think about that :P
[21:03] just change "gzip on;" to "gzip off;"
[21:03] http://nginx.org/en/docs/http/ngx_http_gzip_module.html
[21:08] Nemo_bis: will puppet let you do that?
[21:12] sure, changing puppet files is easy
[21:13] Nemo_bis: is there an all-in-one list of alive wikis you know about?
[21:13] (missing some farms is okay)
[21:14] Nemo_bis: let me know when you've done so
[21:14] or is it scattered through https://github.com/WikiTeam/wikiteam/tree/master/listsofwikis ?
[21:19] ete_: mostly they'll be on https://github.com/WikiTeam/wikiteam/blob/master/batchdownload/taskforce/mediawikis_done_2014.txt
[21:19] puppet still running
[21:20] Okay, thanks
[21:28] balrog: I hope it worked
[21:29] (fyi puppet load allegedly brought Wikipedia down for 15 min the other day, it's not just slow with me :p)
[21:33] Nemo_bis: it's still sending gzip
[21:34] which means the module must be on
[21:39] sigh, can you send a pull request? I'll test myself if emijrp doesn't beat me to it
[21:39] (or I can submit the patch myself tomorrow if you prefer, as you wish)
[21:39] * Nemo_bis to bed now
[21:44] alright...
[21:44] Nemo_bis: you did restart nginx, right?
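For reference, a compact sketch of the gzip handling discussed from 18:52 onwards (the actual patch is the hastebin paste at 20:53; this is not it): ask for gzip with an Accept-Encoding header and decompress only when the Content-Encoding answer says gzip, using a slightly tidier form of the 20:38 check. The fetch() helper name is made up.

    # Sketch only: request gzip and decompress only if the server used it.
    import gzip
    import urllib2
    from StringIO import StringIO

    def fetch(url, data=None):
        req = urllib2.Request(url, data=data)
        req.add_header('Accept-Encoding', 'gzip')
        f = urllib2.urlopen(req)
        raw = f.read()
        # Tidier version of the check at 20:38: a missing Content-Encoding
        # header becomes the empty string instead of None.
        if 'gzip' in (f.headers.get('Content-Encoding') or ''):
            raw = gzip.GzipFile(fileobj=StringIO(raw)).read()
        return raw

    # e.g.:
    # print fetch('http://editplus.info/w/api.php?action=query&meta=siteinfo&format=xml')[:200]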