Time |
Nickname |
Message |
06:43
|
Nemo_bis |
Archival of wiki-site.com at this speed (360 s wait between requests) will take about 880 days.... |
06:44
|
Nemo_bis |
Suggestions for other wait times welcome |
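The estimate above can be sanity-checked with a little arithmetic. The sketch below assumes the total time is just the number of requests multiplied by the wait, ignoring how long each request itself takes. The request count is back-derived from the two figures quoted (360 s and about 880 days); it is not a number given in the log, and the alternative wait times are only illustrative.

```python
SECONDS_PER_DAY = 86_400


def days_needed(requests: int, wait_s: float) -> float:
    """Total days spent waiting if every request is followed by wait_s seconds."""
    return requests * wait_s / SECONDS_PER_DAY


# Request count implied by "360 s wait" and "about 880 days" (an inference).
implied_requests = round(880 * SECONDS_PER_DAY / 360)

# Illustrative alternatives; nobody in the log proposed these values.
for wait_s in (360, 120, 60, 10):
    print(f"{wait_s:>4} s wait -> {days_needed(implied_requests, wait_s):7.1f} days")
```

Under this assumption the duration scales linearly with the wait, so halving the wait halves the archival time.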
07:14
|
odie5533 |
361 s |
07:14
|
Nemo_bis |
https://constantsun.fogbugz.com/default.asp?183_5n6491ub |
07:14
|
odie5533 |
What is that fogbugz for? |
07:27
|
Nemo_bis |
EditThis bug tracker apparently |
07:57
|
w0rp |
I should upload my tanasinn.info archive later. I got the whole thing with dumpgenerator. |
07:58
|
w0rp |
(Because imageboard and BBS in-jokes must be preserved forever.) |
09:39
|
Nemo_bis |
good |
18:34
|
w0rp |
Okay, I filled out all of the forms on archive.org to add tanasinn.info. |
18:35
|
w0rp |
I think it was a little too hard to do, and the forms could definitely use some improvement. I felt like I was applying for insurance or something while doing it. |
18:36
|
w0rp |
I don't think I've ever seen a <form> before where one of the fields is marked as deprecated. |
18:51
|
w0rp |
https://archive.org/details/Wiki-Tanasinn.info Case probably needs fixing, but there it is. |
19:06
|
balrog |
Nemo_bis: what's the issue wrt wait times? |
19:06
|
balrog |
have you tried wait times with a random bias? |
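A wait time with a random bias could look roughly like the sketch below. The function name `jittered_wait` and the `spread` parameter are hypothetical and are not taken from dumpgenerator.py; this only illustrates the idea of varying a fixed delay so requests do not arrive at perfectly regular intervals.

```python
import random
from typing import Optional


def jittered_wait(base_s: float, spread: float = 0.25,
                  rng: Optional[random.Random] = None) -> float:
    """Return base_s scaled by a random factor in [1 - spread, 1 + spread]."""
    rng = rng or random.Random()
    return base_s * rng.uniform(1 - spread, 1 + spread)


# Example: a nominal 360 s wait becomes something between 270 s and 450 s.
print(round(jittered_wait(360.0), 1))
```

The caller would pass the result to `time.sleep()` between requests.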
19:07
|
Nemo_bis |
balrog: those sites are too stupid to do such a check :) they just have extremely low throttle, it's hard to navigate them even manually on your browser |
19:07
|
balrog |
ugh... |
19:08
|
Nemo_bis |
balrog: but they don't give clear errors nor say what speed limit you should follow, so it's hard to understand what's going on |
19:08
|
Nemo_bis |
sometimes they stop responding at all, sometimes they serve a 500 or 403, sometimes they serve a bogus text warning, sometimes they redirect to some stupid location |
19:09
|
Nemo_bis |
and sometimes they just are genuinely down or they just have some broken page ^^ |
21:21
|
odie5533 |
What do the Request packets look like? |
21:22
|
odie5533 |
if they are down or have a broken page, then that's nothing that can be fixed except to try back later |
21:23
|
Nemo_bis |
request packets? |
21:23
|
odie5533 |
HTTP Requests |
21:23
|
Nemo_bis |
and yes, that's why it's hard to identify what's going wrong and what to retry at what speed |
21:24
|
Nemo_bis |
odie5533: see for yourself https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py |
21:25
|
odie5533 |
I see... a lot of code. |
21:26
|
Nemo_bis |
odie5533: e.g. https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py#286 |
21:28
|
odie5533 |
you are getting those "We have retried x times" errors? |
21:28
|
Nemo_bis |
sometimes |
21:29
|
odie5533 |
which error are you getting mostly, and on what url? |
21:29
|
Nemo_bis |
currently I'm not getting any error, I fixed them all |
21:30
|
Nemo_bis |
the errors the script gives are not particularly important, they're all lies ;) |
21:31
|
Nemo_bis |
the actual problems are 500, 403, 404, 307 and bogus 200 |
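The symptoms listed here could be folded into one rough check, sketched below. Everything in it is an assumption for illustration: the name `looks_throttled`, the idea of looking for an expected marker string in the body (for a MediaWiki XML export that might be `<mediawiki`), and treating every listed status code as suspect. A 404 or 500 can of course also be genuine, so this is a heuristic, not a reliable detector.

```python
from typing import Optional

# The status codes named in the discussion; any of them may also be genuine.
SUSPECT_STATUSES = {307, 403, 404, 500}


def looks_throttled(status: Optional[int], body: str, expected_marker: str) -> bool:
    """Heuristic: True if a response looks like throttling rather than real content.

    status is None when the server did not respond at all.
    """
    if status is None:
        return True
    if status in SUSPECT_STATUSES:
        return True
    if status == 200 and expected_marker not in body:
        # "Bogus 200": a success code, but not the content that was asked for.
        return True
    return False


print(looks_throttled(200, "Please slow down.", "<mediawiki"))
```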
21:35
|
odie5533 |
let me know if you get it again, and on what url |
21:36
|
Nemo_bis |
it's not important, it's any URL; they're throttles.. |
21:39
|
odie5533 |
there's not much that can solve that except using more IPs, or slowing down the rate |
21:39
|
odie5533 |
might be good to have the program automatically determine the rate |
21:41
|
Nemo_bis |
if only it knew how to determine if it was throttled :) |
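If the script had some signal for "this response was throttled", however it is obtained, automatic rate selection could be as simple as the sketch below: back off sharply on a throttled response and creep back down on success. The class name `AdaptiveDelay` and all default values are hypothetical; nothing like this is claimed to exist in dumpgenerator.py.

```python
class AdaptiveDelay:
    """Multiplicative back-off on throttling, gradual recovery on success."""

    def __init__(self, initial_s: float = 10.0, min_s: float = 1.0,
                 max_s: float = 600.0, backoff: float = 2.0,
                 recovery: float = 0.9) -> None:
        self.delay_s = initial_s
        self.min_s = min_s
        self.max_s = max_s
        self.backoff = backoff      # factor applied after a throttled response
        self.recovery = recovery    # factor applied after a good response

    def record(self, throttled: bool) -> float:
        """Update the delay from the latest outcome and return the new value."""
        if throttled:
            self.delay_s = min(self.max_s, self.delay_s * self.backoff)
        else:
            self.delay_s = max(self.min_s, self.delay_s * self.recovery)
        return self.delay_s


delay = AdaptiveDelay(initial_s=10.0)
for outcome in (True, True, False, False):
    print(round(delay.record(outcome), 2))
```

The asymmetry (doubling on failure, shrinking by 10% on success) is a design choice meant to settle just above whatever limit the server enforces without repeatedly tripping it.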
21:42
|
Nemo_bis |
one which gives a different issue is dumpgenerator.py --index=http://ac.wikkii.com/w/index.php --xml --images : it fails with urllib complaining about 302 and redirect loops while downloading the list of titles |
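One way to see what is behind a 302 complaint like this is to follow the redirects one hop at a time and watch for a repeated URL. The sketch below is deliberately independent of urllib: `fetch_hop` is a hypothetical callable supplied by the caller that returns the status code and `Location` header for a single request without following it. It also assumes `Location` values are absolute URLs.

```python
from typing import Callable, List, Optional, Tuple

Hop = Tuple[int, Optional[str]]  # (status code, Location header or None)
REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def trace_redirects(url: str, fetch_hop: Callable[[str], Hop],
                    max_hops: int = 20) -> Tuple[List[str], bool]:
    """Follow redirects hop by hop.

    Returns (URLs visited in order, True if a URL repeated or max_hops was hit).
    """
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch_hop(chain[-1])
        if status not in REDIRECT_STATUSES or not location:
            return chain, False  # reached a non-redirect response
        chain.append(location)
        if location in seen:
            return chain, True   # same URL seen twice: a loop
        seen.add(location)
    return chain, True           # gave up after max_hops


# Usage with a fake two-page loop standing in for a real server.
fake = {"http://a.example/": (302, "http://b.example/"),
        "http://b.example/": (302, "http://a.example/")}
print(trace_redirects("http://a.example/", lambda u: fake[u]))
```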
21:53
|
odie5533 |
perhaps there is a redirect loop? |
21:53
|
odie5533 |
wiki won't even load for me. |
21:56
|
odie5533 |
there are no pages on that wiki |
22:05
|
Nemo_bis |
wrong |
22:30
|
Nemo_bis |
odie5533: a specific case in which one may succeed I filed here: https://code.google.com/p/wikiteam/issues/detail?id=70 |