[06:43] Archival of wiki-site.com at this speed (360 s wait between requests) will take about 880 days....
[06:44] Suggestions for other wait times welcome
[07:14] 361 s
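For scale, the 06:43 figure is consistent with a wiki needing on the order of 210,000 requests: at one request every 360 s that is roughly 880 days of wall-clock time. A minimal sketch of the arithmetic in Python (the request count is an assumption inferred from the quoted numbers, not something stated in the log):

    # Back-of-the-envelope wall-clock estimate for a dump where each
    # request is separated by a fixed wait. The request count below is
    # an assumed figure, chosen only to reproduce the quoted estimate.
    SECONDS_PER_DAY = 86400

    def estimated_days(num_requests, wait_seconds):
        """Days needed if each request effectively costs wait_seconds."""
        return num_requests * wait_seconds / SECONDS_PER_DAY

    print(round(estimated_days(211200, 360)))  # -> 880, matching the log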
[07:14] https://constantsun.fogbugz.com/default.asp?183_5n6491ub
[07:14] What is that fogbugz for?
[07:27] EditThis bug tracker apparently
[07:57] I should upload my tanasinn.info archive later. I got the whole thing with dumpgenerator.
[07:58] (Because imageboard and BBS in-jokes must be preserved forever.)
[09:39] good
[18:34] Okay, I filled out all of the forms on archive.org to add tanasinn.info.
[18:35] I think it was a little too hard to do, and the forms could definitely use some improvement. I felt like I was applying for insurance or something while doing it.
[18:36] I don't think I've ever seen a form before where one of the fields is marked as deprecated.
[18:51] https://archive.org/details/Wiki-Tanasinn.info Case probably needs fixing, but there it is.
[19:06] Nemo_bis: what's the issue wrt wait times?
[19:06] have you tried wait times with a random bias?
[19:07] balrog: those sites are too stupid to do such a check :) they just have an extremely low throttle; it's hard to navigate them even manually in your browser
[19:07] ugh...
[19:08] balrog: but they don't give clear errors or say what speed limit you should follow, so it's hard to understand what's going on
[19:08] sometimes they stop responding at all, sometimes they serve a 500 or a 403, sometimes they serve a bogus text warning, sometimes they redirect to some stupid location
[19:09] and sometimes they are just genuinely down or have some broken page ^^
[21:21] What do the request packets look like?
[21:22] if they are down or have a broken page, then that's nothing that can be fixed except to try again later
[21:23] request packets?
[21:23] HTTP requests
[21:23] and yes, that's why it's hard to identify what's going wrong and what to retry at what speed
[21:24] odie5533: see for yourself: https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py
[21:25] I see... a lot of code.
[21:26] odie5533: e.g. https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py#286
[21:28] are you getting those "We have retried x times" errors?
[21:28] sometimes
[21:29] which error are you getting mostly, and on what URL?
[21:29] currently I'm not getting any error, I fixed them all
[21:30] the errors the script gives are not particularly important, they're all lies ;)
[21:31] the actual problems are 500, 403, 404, 307 and bogus 200
[21:35] let me know if you get it again, and on what URL
[21:36] it's not important, it's any URL; they're throttles..
[21:39] there's not much that can solve that except using more IPs or slowing down the rate
[21:39] might be good to have the program automatically determine the rate
[21:41] if only it knew how to determine if it was throttled :)
[21:42] one that gives a different issue is dumpgenerator.py --index=http://ac.wikkii.com/w/index.php --xml --images : it fails with urllib complaining about 302s and redirect loops while downloading the list of titles
[21:53] perhaps there is a redirect loop?
[21:53] wiki won't even load for me.
[21:56] there are no pages on that wiki
[22:05] wrong
[22:30] odie5533: a specific case in which one may succeed is filed here: https://code.google.com/p/wikiteam/issues/detail?id=70
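On the 19:06 suggestion of randomly biased wait times and the 21:39-21:41 wish that the program could detect throttling on its own: neither is claimed here to be in dumpgenerator.py. The following is a minimal standard-library sketch of the idea, treating the symptoms listed at 21:31 (403/404/500, or a suspiciously small 200 body) as a cue to slow down; the body-size threshold, retry count and backoff factor are illustrative assumptions.

    import random
    import time
    import urllib.error
    import urllib.request

    def polite_fetch(url, base_wait=360, max_retries=5):
        """Fetch url with a jittered wait, doubling the wait whenever the
        response looks like a throttle (sketch only, not dumpgenerator.py)."""
        wait = base_wait
        for _ in range(max_retries):
            # Random bias so requests do not arrive on a fixed period.
            time.sleep(wait + random.uniform(0, wait * 0.1))
            try:
                with urllib.request.urlopen(url, timeout=60) as resp:
                    body = resp.read()
                    if len(body) > 100:     # crude guard against "bogus 200" pages
                        return body
            except urllib.error.HTTPError as err:
                if err.code not in (403, 404, 500):
                    raise                   # not one of the throttle symptoms
            except urllib.error.URLError:
                pass                        # server not responding at all
            wait *= 2                       # assume we were throttled; slow down
        raise RuntimeError("still throttled after %d attempts" % max_retries)

Multiplicative backoff is one half of "automatically determine the rate"; easing the wait back down once responses look healthy again would be the other half.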
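For the ac.wikkii.com failure at 21:42, one way to check whether it really is a redirect loop is to stop following redirects and print each Location header by hand. This is a diagnostic sketch using only the standard library, not part of dumpgenerator.py; the hop limit is arbitrary.

    import urllib.error
    import urllib.request
    from urllib.parse import urljoin

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        """Make 3xx responses raise HTTPError instead of being followed."""
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None

    def trace_redirects(url, limit=10):
        """Print each hop of a redirect chain so a loop becomes visible."""
        opener = urllib.request.build_opener(NoRedirect)
        seen = {url}
        for _ in range(limit):
            try:
                resp = opener.open(url)
                print(resp.getcode(), url)   # final, non-redirecting response
                return
            except urllib.error.HTTPError as err:
                if err.code not in (301, 302, 303, 307, 308):
                    print(err.code, url)
                    return
                target = urljoin(url, err.headers.get("Location", ""))
                print(err.code, url, "->", target)
                if target in seen:
                    print("redirect loop detected")
                    return
                seen.add(target)
                url = target
        print("gave up after", limit, "hops")

Running something like trace_redirects("http://ac.wikkii.com/w/index.php") would show whether the server keeps bouncing between the same URLs, which is what urllib's redirect-loop complaint means.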