#wikiteam 2013-11-08,Fri


Time Nickname Message
06:43 🔗 Nemo_bis Archival of wiki-site.com at this speed (360 s wait between requests) will take about 880 days....
06:44 🔗 Nemo_bis Suggestions for other wait times welcome
07:14 🔗 odie5533 361 s
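A quick back-of-the-envelope check of the 880-day figure, assuming it is driven purely by the fixed wait between requests (the ~211,000 request count is derived from the estimate, not stated in the log):

    # One request every 360 s; 880 days of waiting corresponds to
    # roughly 211,000 requests across the wiki-site.com farm.
    delay = 360                 # seconds between requests
    total = 880 * 86400         # 880 days expressed in seconds
    print(total // delay)       # -> 211200 requests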
07:14 🔗 Nemo_bis https://constantsun.fogbugz.com/default.asp?183_5n6491ub
07:14 🔗 odie5533 What is that fogbugz for?
07:27 🔗 Nemo_bis EditThis bug tracker apparently
07:57 🔗 w0rp I should upload my tanasinn.info archive later. I got the whole thing with dumpgenerator.
07:58 🔗 w0rp (Because imageboard and BBS in-jokes must be preserved forever.)
09:39 🔗 Nemo_bis good
18:34 🔗 w0rp Okay, I filled out all of the forms on archive.org to add tanasinn.info.
18:35 🔗 w0rp I think it was a little too hard to do, and the forms could definitely use some improvement. I felt like I was applying for insurance or something while doing it.
18:36 🔗 w0rp I don't think I've ever seen a <form> before where one of the fields is marked as deprecated.
18:51 🔗 w0rp https://archive.org/details/Wiki-Tanasinn.info Case probably needs fixing, but there it is.
19:06 🔗 balrog Nemo_bis: what's the issue wrt wait times?
19:06 🔗 balrog have you tried wait times with a random bias?
19:07 🔗 Nemo_bis balrog: those sites are too stupid to do such a check :) they just have an extremely low throttle limit; it's hard to navigate them even manually in a browser
19:07 🔗 balrog ugh...
19:08 🔗 Nemo_bis balrog: but they don't give clear errors nor say what speed limit you should follow, so it's hard to understand what's going on
19:08 🔗 Nemo_bis sometimes they stop responding at all, sometimes they serve a 500 or 403, sometimes they serve a bogus text warning, sometimes they redirect to some stupid location
19:09 🔗 Nemo_bis and sometimes they're just genuinely down or have some broken page ^^
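For reference, balrog's random-bias idea is a small change to the wait loop; a minimal sketch, assuming the Python 2 environment dumpgenerator.py ran in at the time (the function name and jitter fraction are illustrative, not part of the script):

    import random
    import time

    def polite_sleep(base=360, jitter=0.25):
        # Sleep for the base delay plus or minus a random bias of up to 25%,
        # so requests do not arrive on a perfectly regular schedule.
        time.sleep(base * (1 + random.uniform(-jitter, jitter)))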
21:21 🔗 odie5533 What do the Request packets look like?
21:22 🔗 odie5533 if they are down or have a broken page, then that's nothing that can be fixed except to try back later
21:23 🔗 Nemo_bis request packets?
21:23 🔗 odie5533 HTTP Requests
21:23 🔗 Nemo_bis and yes, that's why it's hard to identify what's going wrong and what to retry at what speed
21:24 🔗 Nemo_bis odie5533: see yourself https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py
21:25 🔗 odie5533 I see... a lot of code.
21:26 🔗 Nemo_bis odie5533: e.g. https://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py#286
21:28 🔗 odie5533 you are getting those "We have retried x times" errors?
21:28 🔗 Nemo_bis sometimes
21:29 🔗 odie5533 which error are you getting mostly, and on what url?
21:29 🔗 Nemo_bis currently I'm not getting any errors; I fixed them all
21:30 🔗 Nemo_bis the errors the script gives are not particularly important, they're all lies ;)
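A "We have retried X times" message typically comes from a retry counter wrapped around the download call; because any failure is swallowed the same way, the final error says little about the real cause. A rough sketch of that kind of pattern (the names, limits and blanket except are illustrative, not dumpgenerator.py's actual code):

    import time

    def fetch_with_retries(get, maxretries=5, delay=20):
        # Call get() until it succeeds or the retry budget runs out.
        # Since every exception is caught here, the final message only
        # reports the retry count, not what actually went wrong.
        tries = 0
        while tries < maxretries:
            try:
                return get()
            except Exception:
                tries += 1
                time.sleep(delay)
                delay = delay * 2    # wait a bit longer before each retry
        raise RuntimeError('We have retried %d times' % maxretries)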
21:31 🔗 Nemo_bis the actual problems are 500, 403, 404, 307 and bogus 200
21:35 🔗 odie5533 let me know if you get it again, and on what url
21:36 🔗 Nemo_bis it's not important, it happens on any URL; it's the throttling..
21:39 🔗 odie5533 there's not much that can solve that except using more IPs, or slowing down the rate
21:39 🔗 odie5533 might be good to have the program automatically determine the rate
21:41 🔗 Nemo_bis if only it knew how to determine if it was throttled :)
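One way a script could guess at throttling, along the lines odie5533 suggests: treat 403/500/503 and suspicious 200 bodies as "slow down" signals and grow the delay until replies look normal again. A minimal sketch, assuming the Python 2 / urllib2 stack dumpgenerator.py used at the time; the status codes and text markers are guesses based on the symptoms described above, not anything the wiki farms document:

    import time
    import urllib2

    THROTTLE_HINTS = ('throttl', 'too many requests', 'rate limit')

    def looks_throttled(code, body):
        # Heuristic: server errors, forbidden responses, or a "200 OK" page
        # whose text mentions throttling are all taken as "slow down".
        if code in (403, 500, 503):
            return True
        return code == 200 and any(h in body.lower() for h in THROTTLE_HINTS)

    def fetch(url, delay=60, max_delay=3600):
        # Fetch url, doubling the wait whenever the reply looks throttled.
        while True:
            time.sleep(delay)
            try:
                reply = urllib2.urlopen(url)
                code, body = reply.getcode(), reply.read()
            except urllib2.HTTPError as e:
                code, body = e.code, e.read()
            if not looks_throttled(code, body):
                return code, body
            delay = min(delay * 2, max_delay)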
21:42 🔗 Nemo_bis one which gives a different issue is dumpgenerator.py --index=http://ac.wikkii.com/w/index.php --xml --images : it fails with urllib complaining about 302 and redirect loops while downloading the list of titles
21:53 🔗 odie5533 perhaps there is a redirect loop?
21:53 🔗 odie5533 the wiki won't even load for me.
21:56 🔗 odie5533 there are no pages on that wiki
22:05 🔗 Nemo_bis wrong
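For the wikkii 302 loop, one way to see where the redirects actually point is to stop urllib2 from following them and inspect the Location header. A sketch assuming Python 2 / urllib2; the Special:Allpages URL is only an example of the title-listing request, not necessarily the exact URL the script fails on:

    import urllib2

    class NoRedirect(urllib2.HTTPRedirectHandler):
        # Returning None tells urllib2 not to follow the redirect, so the
        # 302 surfaces as an HTTPError we can inspect instead of a loop.
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None

    opener = urllib2.build_opener(NoRedirect())
    url = 'http://ac.wikkii.com/w/index.php?title=Special:Allpages'
    try:
        opener.open(url)
    except urllib2.HTTPError as e:
        print('%s -> %s' % (e.code, e.hdrs.get('Location')))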
22:30 🔗 Nemo_bis odie5533: I filed a specific case in which one may succeed here: https://code.google.com/p/wikiteam/issues/detail?id=70
