[01:23] http://budurl.com/uukc
[01:39] *** treyo has joined #urlteam
[01:45] *** HCross has quit IRC (Read error: Connection reset by peer)
[01:45] *** HCross has joined #urlteam
[01:50] *** treyo has quit IRC (Quit: Page closed)
[04:00] Sorry I've been out-of-contact for some time. Nice to see the go.usa.gov folks (treyo) speaking up, and being interested in archiving.
[04:00] I wondered how long it would be before they noticed how hard we were banging on their server.
[04:01] Re: x.vu -- I'll set it running for now; we might as well get what we can.
[04:16] x-vu started
[04:18] gg-gg hasn't gotten results lately; turning it off
[04:19] go-usa-gov is still working fine; I'll keep it going till treyo (or someone else) actually asks us to turn it off, or provides a better alternative.
[04:20] vgd_6 hasn't found anything recently, but there are only a million total results right now, so it's probably still fine.
[04:21] x-vu has gotten some results
[04:21] specifically, about 800 results so far
[04:22] x-vu returns HTTP 410 sometimes; added that as a no-redirect expected result
[04:23] boosting the queue to 30
[04:24] Hm, I'm not sure if it's case sensitive or not
[04:24] the two character ones do not seem to be
[04:25] finished all of them, in any case
[04:26] boosting queue to 60
[04:29] Interestingly, all of these seem to redirect to a warning page on xdotvu.com
[04:29] but the URL includes the real target, so it's good enough for our purposes
[04:32] getting a few timeouts, but ... it's going away soon. So, queue up to 90
[04:34] we seem to have about 90 total warriors right now; so I'll boost the queue to 100, that way everybody can join in
[04:42] Finished the initial-digit-three-character ones.
[04:42] Hm, bunch of errors, dropping queue down to 90
[05:09] checked over 100,000; nearly 4,000 found.
[05:46] cleared out the errors, now reloading the queue with 60
[05:54] errors came back; draining queue, then will try 40
[06:02] 40 seems to work, trying 50
[08:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:20] *** dashcloud has joined #urlteam
[09:48] Somebody2: As far as I can tell, the paid short (less than 3 characters) codes are case-insensitive, but the automatic six-character codes are case-sensitive. No idea about the ones you can set yourself in the advanced options.
[10:10] "terroroftinytown.client.errors.ScraperError: Number of attempts exceeded for 5708844300 (0-DHxu)." Hmm.
[10:18] *** T31M has joined #urlteam
[10:20] *** T31M has quit IRC (Leaving)
[14:05] *** zhongfu_ has joined #urlteam
[14:05] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[15:19] *** Jonison has joined #urlteam
[15:27] *** Jonison has quit IRC (Ping timeout: 260 seconds)
[15:40] I turned *off* gg-gg...
[15:42] x-vu has finished the 3-character ones
[15:43] and it looks like go-usa-gov has blocked us
[15:46] I'm trying to drain the queue on it, then we'll leave it off for a few days
[15:47] assholes
[15:47] :P
[15:47] Eh, go-usa-gov is fine; they politely came in a day or so ago, and asked about setting up a bulk export instead.
[15:48] I just thought I'd keep the scraper running till they actually told us to stop.
[15:48] yeah
[15:51] eh, it doesn't seem to be draining; I'll just reset the autoqueue back to the last result, and clear it
[15:51] that way it won't interfere with the other jobs
[15:52] ok, going afk for the day
[15:52] toodleoo
[20:26] *** svchfoo1 has quit IRC (Remote host closed the connection)
[20:26] *** svchfoo3 has quit IRC (Remote host closed the connection)
[20:27] *** svchfoo3 has joined #urlteam
[20:27] *** svchfoo1 has joined #urlteam
[20:29] *** JAA sets mode: +o svchfoo1
[20:30] *** svchfoo1 sets mode: +o svchfoo3
[20:39] *** astrid sets mode: +ooo joepie91_ HCross2 HCross
[20:52] *** svchfoo3 has quit IRC (Remote host closed the connection)
[20:53] *** Aoede has left WeeChat 1.9
[20:53] *** svchfoo3 has joined #urlteam
[20:54] *** svchfoo1 sets mode: +o svchfoo3