[00:10] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:13] *** dashcloud has joined #urlteam
[00:13] *** svchfoo1 sets mode: +o dashcloud
[01:56] *** bwn has joined #urlteam
[02:03] *** JesseW has joined #urlteam
[02:03] *** svchfoo1 sets mode: +o JesseW
[02:03] *** bwn_ has quit IRC (Read error: Operation timed out)
[02:16] Hey JesseW
[02:16] :-)
[02:17] thanks for your work on the new projects
[02:17] * JesseW is frustrated that migre.me is still down...
[02:17] No problem. Can you take a peek at hec-su real quick? It threw a TON of errors so I shut it off
[02:17] I'm not sure if we got banned or what
[02:17] sure
[02:17] Tons of this: ScraperError: Number of attempts exceeded for 400 (6s).
[02:18] yeah, I saw those
[02:18] wasn't sure what to make of it. What happens when you run curl yourself?
[02:18] https://gist.github.com/phuzion/4c882d534c2357e2ecd2
[02:20] so, it works when we run it. I think it might just be going too fast? Or making too many requests in the same batch?
[02:20] I'll try adjusting it much slower, and try again.
[02:21] Ok, so they're probably just banning us based on an excessive amount of requests in a timeframe, you'd guess?
[02:21] maybe....
[02:21] I mean, maybe they're banning us by IP, in which case it'll be a problem, but...
[02:22] what size queue did you start with?
[02:22] Defaults I think
[02:22] ah, 100 is ... probably too large ... for most small scrapers.
[02:22] I recommend starting with 10, or even 5, and gradually ramping it up to 50.
[02:22] And only going over 50 for big shorteners that we've run happily against for a while
[02:22] I'm gonna try that on poeurl.com
[02:23] start at 50 urls/item, 10 items in queue.
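The ramp-up advice above (start around 10, grow gradually, stop at 50) can be sketched as a small schedule generator. This is only an illustration: `ramp_schedule`, its parameters, and the fixed step size are hypothetical and not part of the tracker, where queue size is set by hand in the web interface.

```python
def ramp_schedule(start=10, cap=50, step=10):
    """Yield queue sizes from `start` up to `cap`, growing by `step`.

    Hypothetical helper: the chat only says "start with 10, or even 5,
    and gradually ramp up to 50"; the linear step is an assumption.
    """
    if start < 1 or step < 1 or cap < start:
        raise ValueError("need 1 <= start <= cap and step >= 1")
    size = start
    while size < cap:
        yield size
        # Never overshoot the cap on the last increment.
        size = min(size + step, cap)
    yield cap


if __name__ == "__main__":
    for size in ramp_schedule(start=5, cap=50, step=15):
        print("try queue size", size)
```

Each size would be held until results come back clean before moving to the next one, in line with the enable, disable, and wait approach described in the channel.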
[02:24] also, you can enable it for a sec, then disable it, and wait for the results (or errors) to come in, before enabling it again
[02:24] it's a useful way to test
[02:25] I've enabled hec-su again
[02:26] also paused tinyurl_7 for a bit, due to a lot of 520s
[02:26] feel free to turn it back on in an hour or so
[02:27] will do.
[02:27] I think my regex on poeurl-com3 might be a bit too aggressive or something
[02:27] * JesseW will look
[02:27] hm, it looks fine
[02:27] It did almost 1k urls and got no results back
[02:27] for a sequential shortener
[02:28] do you have an example of a working URL in that range?
[02:28] Oh, crap, I just assumed 1 and 2 character URLs would be valid.
[02:29] ha.
[02:29] uFg and uFh are known valid.
[02:29] :-)
[02:29] try it *starting* from 3 characters.
[02:30] Eh, what's 3K URLs to scan? 62*62 isn't much.
[02:30] Of course, it's only ~4,000 possibilities till it gets there on its own.
[02:30] (as you said)
[02:30] How do you tell it to start at x number of characters?
[02:31] put the appropriate sequence number in the "Lower sequence number:" field in the Auto Queue section
[02:31] that's what the "Common Numbers" table on the wiki page is intended for
[02:31] Ah, gotcha.
[02:32] So, if I want to skip all the 4 character URLs, I enter the 62*4 number?
[02:32] Or is that if I want to START at 4 character URLs?
[02:32] try it and see -- as long as Enabled is off, you can just change the value and it will show what it converts to when you click Apply
[02:33] well, one of the hec-su claims has come back safely... the others are still waiting, though
[02:35] poeurl still hasn't gotten anything, and it's working on the 1..'s
[02:35] maybe it doesn't like initial digits, either
[02:38] I haven't found anything that works aside from t.. and u..
[02:38] Yeah, I basically had the same problem.
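The "which number do I enter to start at N characters" question above can be worked out with a short sketch. It assumes the sequence number is converted to a shortcode by plain positional base-62 with the alphabet ordered digits, then lowercase, then uppercase. The log does not show the tracker's actual alphabet or conversion, so `ALPHABET`, `first_sequence_number`, and `encode` are all hypothetical names. Under that assumption, N-character codes begin at 62^(N-1), which matches the remark that a run started at 3 characters was working on the `1..` codes.

```python
# Assumed alphabet ordering; the real project setting may differ.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def first_sequence_number(length, base=len(ALPHABET)):
    """Smallest sequence number whose encoding has `length` characters."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 0 if length == 1 else base ** (length - 1)


def encode(number, alphabet=ALPHABET):
    """Plain positional conversion of a non-negative integer to a shortcode."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    chars = []
    while number:
        number, remainder = divmod(number, base)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


if __name__ == "__main__":
    for length in (3, 4):
        start = first_sequence_number(length)
        print(length, "characters start at", start, "->", encode(start))
```

So to start at 4-character codes under these assumptions, the value to enter would be 62^3 = 238328, not 62*4. Checking the converted value in the tracker with Enabled off, as suggested in the channel, is still the safer confirmation.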
[02:42] *** bwn has quit IRC (Read error: Connection reset by peer)
[02:43] JesseW: I'm gonna shoot the maintainer of poeurl an email and ask if they can just send us a dbdump
[02:44] cool.
[03:11] Email sent
[03:12] *** JesseW has quit IRC (Leaving.)
[04:18] *** asdf has joined #urlteam
[05:25] *** bwn has joined #urlteam
[05:54] *** JesseW has joined #urlteam
[05:54] *** svchfoo1 sets mode: +o JesseW
[05:56] paused poeurl-com3 as it hasn't found anything since uLF
[05:56] but it did find 7,620 items, so that's good
[06:01] it seems like hec-su is likely doing some type of slow ban
[06:20] Yeah, they're clearly doing some type of banning
[06:20] I've paused it for now, we can get back to it later
[06:47] *** JesseW has quit IRC (Read error: Operation timed out)
[06:52] *** Start has quit IRC (Read error: Connection reset by peer)
[06:52] *** Start_ has joined #urlteam
[07:20] *** WinterFox has quit IRC (Remote host closed the connection)
[07:33] *** JesseW has joined #urlteam
[07:34] *** svchfoo1 sets mode: +o JesseW
[07:45] *** JesseW has quit IRC (Leaving.)
[09:24] *** bwn has quit IRC (Read error: Operation timed out)
[09:25] *** bwn has joined #urlteam
[11:19] *** VADemon has joined #urlteam
[14:33] Got a friend who used to run a url shortener, trying to reach out to them and see if they can give us a db dump
[16:23] *** VADemon has quit IRC (left4dead)
[16:23] *** asdf has quit IRC (Quit: Leaving)
[17:21] *** JesseW has joined #urlteam
[17:21] *** svchfoo1 sets mode: +o JesseW
[17:32] sweet
[17:34] nicely done, phuzion -- you seem to have a lot of such friends. :-)
[17:35] JesseW: I just know a bunch of people, a handful of whom run URL shorteners. :)
[17:38] *** JesseW has quit IRC (Leaving.)
[17:55] *** Start_ is now known as Start
[18:47] *** bwn has quit IRC (Ping timeout: 499 seconds)
[19:20] *** bwn has joined #urlteam
[19:35] *** bwn has quit IRC (Read error: Connection reset by peer)
[19:36] *** bwn has joined #urlteam
[19:41] *** ThisIsGet has joined #urlteam
[19:42] *** ThisIsGet has left
[20:20] *** bwn has quit IRC (Read error: Connection reset by peer)
[20:20] *** bwn has joined #urlteam
[20:28] *** VADemon has joined #urlteam
[21:35] *** JesseW has joined #urlteam
[21:35] *** svchfoo1 sets mode: +o JesseW
[22:13] *** JesseW has quit IRC (Leaving.)
[22:22] *** JesseW has joined #urlteam
[22:23] *** svchfoo1 sets mode: +o JesseW
[22:34] re-enabled tinyurl_7 for the time being
[22:39] *** Start has quit IRC (Read error: Connection reset by peer)
[22:40] *** Coderjoe has quit IRC (Ping timeout: 260 seconds)
[22:41] *** Start has joined #urlteam
[22:42] *** JW_work has quit IRC (Ping timeout: 260 seconds)
[22:42] *** Ctrl-S___ has joined #urlteam
[22:42] *** JW_work has joined #urlteam
[22:51] *** Coderjoe has joined #urlteam
[23:00] cool, thanks