[02:35] Fixing tracker again; minu-me got stuck
[02:44] Turning off the queue for minu-me for a bit; let's see if it drains or not
[03:09] down to 40 rather than 70... we'll see how it goes
[03:55] ok, the queue finally drained; now re-doing the 7 I removed to clear the blockage
[03:55] we'll see if any of them consistently fail
[03:56] nope, they all went through very nicely
[03:56] strange
[03:57] ok, turning the queue back on at 60 this time
[04:17] *** Sk1d has joined #urlteam
[04:18] *** jornane has quit IRC (Ping timeout: 268 seconds)
[04:22] *** jornane has joined #urlteam
[04:26] lower queue to 40
[05:48] reducing queue even further
[05:49] increase time between requests to 0.8
[05:59] Frogging: very good point
[05:59] What do we have that handles resuming better? Warrior jobs?
[05:59] yes, something more custom
[06:00] warrior jobs don't have the resuming problem because they're split up into units anyway
[06:00] but throwing recursive crawlers at something like metafilter is asking for trouble, especially if it's archivebot
[06:01] If we're doing this on request of the site owner, I'm not sure why they can't do it themselves...
[06:01] Or mail IA a hard drive of the data, and we can convert it into web pages locally at our leisure.
[06:02] I don't think they'd get into wayback that way
[06:09] They could get into wayback eventually...
[06:10] If the site owner sent us the full data, we could spin up a virtual machine that thought it was the original, talk to it with a browser, and thereby get the same as the real site, but without the network delay (and cost)
[06:10] But I'm pretty sure IA isn't (yet) set up to do that.
[06:11] Anyway, this isn't actually #urlteam -- we both just mistakenly switched channels halfway through this conversation!
[06:23] turning off the queue on minu-me for the evening, to avoid it getting the tracker stuck
[06:38] trying the queue at 5 items, with 10 urls in each
[06:41] OK, that seems to be working; I'll leave that for the night
[09:42] *** Jonison has joined #urlteam
[10:34] *** JAA has joined #urlteam
[13:09] *** JAA has quit IRC (Quit: Page closed)
[15:50] i'll merge that pull request later today
[21:28] *** Jonison has quit IRC (Read error: Connection reset by peer)