#urlteam 2017-05-03,Wed


Somebody2: Fixing tracker again; minu-me got stuck [02:35]
Turning off the queue for minu-me for a bit; let's see if it drains or not [02:44]
...... (idle for 25mn)
down to 40 rather than 70... we'll see how it goes [03:09]
.......... (idle for 46mn)
ok, the queue finally drained; now re-doing the 7 I removed to clear the blockage
we'll see if any of them consistently fail
nope, they all went through very nicely
strange
ok, turning the queue back on at 60 this time
[03:55]
..... (idle for 20mn)
***Sk1d has joined #urlteam
jornane has quit IRC (Ping timeout: 268 seconds)
jornane has joined #urlteam
[04:17]
Somebody2: lowering queue to 40 [04:26]
................. (idle for 1h22mn)
reducing queue even further
increasing time between requests to 0.8
[05:48]
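The pacing change above (increasing the delay between requests, presumably to 0.8 seconds) amounts to simple client-side throttling. A minimal sketch of that idea, assuming nothing about the real minu-me scraper beyond "one request every N seconds" (the `Throttle` class and the shortcode loop are illustrative, not tracker code):

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds between requests, e.g. 0.8
        self.last_request = 0.0

    def wait(self):
        # Sleep just long enough that consecutive requests are at least
        # min_interval seconds apart; the first call never sleeps.
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.monotonic()

throttle = Throttle(0.8)
for shortcode in ["aaa", "aab", "aac"]:
    throttle.wait()
    # fetch(shortcode) would go here
```

Raising `min_interval` is exactly the knob being turned in the log: fewer requests per second against the shortener, at the cost of slower queue drain.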
Frogging: very good point
What do we have that handles resuming better? Warrior jobs?
[05:59]
Frogging: yes, something more custom
warrior jobs don't have the resuming problem because they're split up into units anyway
but throwing recursive crawlers at something like metafilter is asking for trouble, especially if it's archivebot
[05:59]
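Frogging's point about warrior jobs not having the resuming problem can be illustrated with a toy sketch: when a crawl is pre-split into small claimable units, a crashed worker only loses the unit it had in flight, and finished units never need redoing. The `UnitTracker` API below is invented for illustration and is not the real warrior/tracker protocol:

```python
from collections import deque

class UnitTracker:
    """Toy tracker: hands out small work units and records completions.

    A unit that was claimed but never finished (e.g. the worker
    crashed) is simply re-queued; units already marked done are
    never redone.
    """

    def __init__(self, units):
        self.todo = deque(units)
        self.claimed = set()
        self.done = set()

    def claim(self):
        unit = self.todo.popleft()
        self.claimed.add(unit)
        return unit

    def finish(self, unit):
        self.claimed.discard(unit)
        self.done.add(unit)

    def requeue_stalled(self):
        # Called when a worker disappears: put claimed-but-unfinished
        # units back so another worker can pick them up.
        for unit in list(self.claimed):
            self.claimed.discard(unit)
            self.todo.append(unit)

# Split a big job into units of 10 shortcodes each.
codes = [f"code{i}" for i in range(30)]
units = [tuple(codes[i:i + 10]) for i in range(0, len(codes), 10)]
tracker = UnitTracker(units)

u1 = tracker.claim()
tracker.finish(u1)          # this work survives a later crash
u2 = tracker.claim()        # worker crashes mid-unit...
tracker.requeue_stalled()   # ...only u2 is redone, not u1
```

A recursive crawler over a whole site, by contrast, carries its frontier state in one process, which is why resuming it (e.g. against something as large as MetaFilter, as mentioned above) is so much messier.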
Somebody2: If we're doing this on request of the site owner, I'm not sure why they can't do it themselves...
Or mail IA a hard drive of the data, and we can convert it into web pages locally at our leisure.
[06:01]
Frogging: I don't think they'd get into wayback that way [06:02]
Somebody2They could get into wayback eventually...
If the site owner sent us the full data, we could spin up a virtual machine that behaved as the original server, talk to it with a browser, and thereby capture the same pages as the real site, but without the network delay (and cost)
But I'm pretty sure IA isn't (yet) set up to do that.
Anyway, this isn't actually #urlteam -- we both just mistakenly switched channels halfway through this conversation!
[06:09]
turning off the queue on minu-me for the evening, to avoid it getting the tracker stuck [06:23]
.... (idle for 15mn)
trying the queue at 5 items, with 10 urls in each
OK, that seems to be working; I'll leave that for the night
[06:38]
..................................... (idle for 3h1mn)
***Jonison has joined #urlteam [09:42]
........... (idle for 52mn)
JAA has joined #urlteam [10:34]
................................ (idle for 2h35mn)
JAA has quit IRC (Quit: Page closed) [13:09]
................................. (idle for 2h41mn)
chfoo: I'll merge that pull request later today [15:50]
.................................................................... (idle for 5h38mn)
***Jonison has quit IRC (Read error: Connection reset by peer) [21:28]
