Time |
Nickname |
Message |
19:02
π
|
kyan |
I'm trying to use dumpgenerator.py to archive http://www.frathwiki.com/, but I'm getting "Error in api.php, please, provide a correct path to api.php". The command I'm running: python ../dumpgenerator.py --api=http://www.frathwiki.com/api.php --xml --images Any thoughts? |
19:26
π
|
Nemo_bis |
kyan: try adding also the --index= |
19:27
π
|
Nemo_bis |
sometimes it's stupid enough to be the reason, though it shouldn't because it's in the same path http://www.frathwiki.com/index.php |
19:32
π
|
Nemo_bis |
no idea why it would fail https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py#901 |
19:33
π
|
balrog |
UA blocking |
19:33
π
|
balrog |
<title>Access denied | www.frathwiki.com used CloudFlare to restrict access</title> |
19:33
π
|
balrog |
oops |
19:34
π
|
balrog |
<p>The owner of this website (www.frathwiki.com) has banned your access based on your browser's signature (bad82647cc6077f-ua48).</p> |
19:36
π
|
balrog |
it's more than just UA |
19:36
π
|
balrog |
Nemo_bis: ^ |
19:38
π
|
balrog |
https://support.cloudflare.com/hc/en-us/articles/200170086-What-does-the-Browser-Integrity-Check-do- |
19:38
π
|
Nemo_bis |
hmm |
19:39
π
|
balrog |
wget and curl work |
19:40
π
|
Nemo_bis |
yes, that's what confused me :) |
19:41
π
|
Nemo_bis |
are they blocking urllib completely? |
19:41
π
|
balrog |
they're finding some way to block it |
19:42
π
|
balrog |
I changed the first two lines of checkAPI to the following and it works: |
19:42
π
|
balrog |
req = urllib2.Request(url=api, headers={'User-Agent': getUserAgent()}) |
19:42
π
|
balrog |
f = urllib2.urlopen(req) |
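For readers on Python 3, the two-line change balrog describes can be sketched roughly as follows. This is only an illustration, not the script's actual code: `urllib.request` stands in for Python 2's `urllib2`, and `get_user_agent` and `build_api_request` are hypothetical names (the real `getUserAgent()` is not shown in this log, so the string it returns here is an assumption).

```python
import urllib.request


def get_user_agent():
    # Hypothetical stand-in for the script's getUserAgent(); the real
    # implementation is not shown in this log, so this value is assumed.
    return "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0"


def build_api_request(api):
    # Build the request with an explicit User-Agent header instead of
    # letting urllib send its default one.
    return urllib.request.Request(url=api, headers={"User-Agent": get_user_agent()})


req = build_api_request("http://www.frathwiki.com/api.php")
# The request would then be opened as in the log; left commented out
# here so the sketch performs no network access:
# f = urllib.request.urlopen(req)
```

The request has to be built before it is opened, so the `Request(...)` line comes first and the `urlopen(req)` call second.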
19:43
π
|
balrog |
getPageTitlesScraper may still be broken |
19:43
π
|
balrog |
they're probably blocking the urllib UA |
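To see which User-Agent a plain urllib call would send when none is set, the default opener can be inspected. The sketch below uses Python 3's `urllib.request` (the log itself concerns Python 2's `urllib`/`urllib2`), and `default_user_agent` is a hypothetical helper name. Whether CloudFlare matched on this exact string is not confirmed in the log.

```python
import urllib.request


def default_user_agent():
    # build_opener() returns an OpenerDirector whose addheaders list
    # holds the headers sent when the caller sets none.
    opener = urllib.request.build_opener()
    return dict(opener.addheaders).get("User-agent")


# Prints a string of the form "Python-urllib/<major>.<minor>".
print(default_user_agent())
```

A site filtering on browser signature can single out this library-specific string, which fits the observation in the log that wget and curl worked while the script did not.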
19:43
π
|
Nemo_bis |
yeah |
19:43
π
|
kyan |
strange… |
19:43
π
|
Nemo_bis |
I was doing the same change at the same time |
19:43
π
|
balrog |
and the script isn't consistent and is using urllib in a few places and urllib2 everywhere else |
19:44
π
|
Nemo_bis |
indeed |
19:49
π
|
Nemo_bis |
ok, fixed that one (thanks balrog), only two urllib.urlopen left :) |
19:49
π
|
balrog |
probably should fix those as well or those gets will fail |
19:51
π
|
Nemo_bis |
yeah but they're there only for the older wikis, this one shouldn't be affected |
20:02
π
|
kyan |
The new version of the script is working now :) |
20:02
π
|
kyan |
thanks! |
20:14
π
|
Nemo_bis |
kyan: thank you for reporting, it was a rather embarrassing oversight :) |