#urlteam 2015-12-23,Wed


Time Nickname Message
00:10 🔗 dashcloud has quit IRC (Read error: Operation timed out)
00:13 🔗 dashcloud has joined #urlteam
00:13 🔗 svchfoo1 sets mode: +o dashcloud
01:56 🔗 bwn has joined #urlteam
02:03 🔗 JesseW has joined #urlteam
02:03 🔗 svchfoo1 sets mode: +o JesseW
02:03 🔗 bwn_ has quit IRC (Read error: Operation timed out)
02:16 🔗 phuzion Hey JesseW
02:16 🔗 JesseW :-)
02:17 🔗 JesseW thanks for your work on the new projects
02:17 🔗 * JesseW is frustrated that migre.me is still down...
02:17 🔗 phuzion No problem. Can you take a peek at hec-su real quick? It threw a TON of errors so I shut it off
02:17 🔗 phuzion I'm not sure if we got banned or what
02:17 🔗 JesseW sure
02:17 🔗 phuzion Tons of this: ScraperError: Number of attempts exceeded for 400 (6s).
02:18 🔗 JesseW yeah, I saw those
02:18 🔗 JesseW wasn't sure what to make of it. What happens when you run curl yourself?
02:18 🔗 phuzion https://gist.github.com/phuzion/4c882d534c2357e2ecd2
02:20 🔗 JesseW so, it works when we run it. I think it might just be going too fast? Or making too many requests in the same batch?
02:20 🔗 JesseW I'll try adjusting it much slower, and try again.
02:21 🔗 phuzion Ok, so they're probably just banning us based on an excessive amount of requests in a timeframe, you'd guess?
02:21 🔗 JesseW maybe....
02:21 🔗 JesseW I mean, maybe they're banning us by IP, in which case it'll be a problem, but... <shrug>
02:22 🔗 JesseW what size queue did you start with?
02:22 🔗 phuzion Defaults I think
02:22 🔗 JesseW ah, 100 is ... probably too large ... for most small scrapers.
02:22 🔗 JesseW I recommend starting with 10, or even 5, and gradually ramping it up to 50.
02:22 🔗 JesseW And only going over 50 for big shorteners that we've run happily against for a while
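The ramp-up advice above (start at 5 or 10, grow gradually toward 50) can be sketched as a rule for picking the next queue size after each batch. This is a hypothetical illustration, not the tracker's actual logic. The names `next_queue_size`, `START`, `CAP`, and `STEP` are invented here, and the "add a little on success, halve on errors" policy is a swapped-in technique (loosely modeled on AIMD congestion control) rather than anything the tracker is known to do.

```python
# Hypothetical policy values, taken from the advice in the log:
# start small (5 to 10) and ramp up to about 50.
START = 5
CAP = 50
STEP = 5


def next_queue_size(current: int, had_errors: bool,
                    start: int = START, cap: int = CAP, step: int = STEP) -> int:
    """Pick the next queue size after a batch of items finishes (hypothetical)."""
    if had_errors:
        # Back off sharply on errors, but never below the starting size.
        return max(start, current // 2)
    # Grow slowly while batches come back clean, never past the cap.
    return min(cap, current + step)
```

Starting from 5 with no errors, this sketch reaches the cap of 50 after nine clean batches; one batch with errors (such as the repeated `ScraperError`s above) halves it. Raising `cap` above 50 would correspond to the "big shorteners we've run against for a while" case.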
02:22 🔗 phuzion I'm gonna try that on poeurl.com
02:23 🔗 phuzion start at 50 urls/item, 10 items in queue.
02:24 🔗 JesseW also, you can enable it for a sec, then disable it, and wait for the results (or errors) to come in, before enabling it again
02:24 🔗 JesseW it's a useful way to test
02:25 🔗 JesseW I've enabled hec-su again
02:26 🔗 JesseW also paused tinyurl_7 for a bit, due to a lot of 520s
02:26 🔗 JesseW feel free to turn it back on in an hour or so
02:27 🔗 phuzion will do.
02:27 🔗 phuzion I think my regex on poeurl-com3 might be a bit too aggressive or something
02:27 🔗 * JesseW will look
02:27 🔗 JesseW hm, it looks fine
02:27 🔗 phuzion It did almost 1k urls and got no results back
02:27 🔗 phuzion for a sequential shortener
02:28 🔗 JesseW do you have an example of a working URL in that range?
02:28 🔗 phuzion Oh, crap, I just assumed 1 and 2 character URLs would be valid.
02:29 🔗 JesseW ha.
02:29 🔗 phuzion uFg and uFh are known valid.
02:29 🔗 JesseW :-)
02:29 🔗 JesseW try it *starting* from 3 characters.
02:30 🔗 phuzion Eh, what's 3K URLs to scan? 62*62 isn't much.
02:30 🔗 JesseW Of course, it's only ~4,000 possibilities till it gets there on its own.
02:30 🔗 JesseW (as you said)
02:30 🔗 phuzion How do you tell it to start at x number of characters?
02:31 🔗 JesseW put the appropriate sequence number in the "Lower sequence number:" field in the Auto Queue section
02:31 🔗 JesseW that's what the "Common Numbers" table on the wiki page is intended for
02:31 🔗 phuzion Ah, gotcha.
02:32 🔗 phuzion So, if I want to skip all the 4 character URLs, I enter the 62*4 number?
02:32 🔗 phuzion Or is that if I want to START at 4 character URLs?
02:32 🔗 JesseW try it and see -- as long as Enabled is off, you can just change the value and it will show what it converts to when you click Apply
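The "Lower sequence number" question can be made concrete with a sketch of how sequence numbers might map to shortcodes. It assumes a bijective base-62 numbering in which sequence 0 is the first 1-character code, and an alphabet ordered digits, then lowercase, then uppercase. The alphabet order and the names `seq_to_code`, `code_to_seq`, and `first_seq_for_length` are hypothetical; the tracker's real mapping (and the wiki's "Common Numbers" table) may differ. Under these assumptions the first 3-character code sits at 62 + 62² = 3,906 (the "~4,000 possibilities" mentioned above), and the first 4-character code at 3,906 + 62³ = 242,234, rather than at 62*4.

```python
# Hypothetical alphabet order (digits, lowercase, uppercase); the real
# tracker may order characters differently.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def first_seq_for_length(n: int) -> int:
    """Sequence number of the first n-character code (0 = first 1-char code)."""
    return sum(BASE ** k for k in range(1, n))


def seq_to_code(seq: int) -> str:
    """Convert a sequence number to a shortcode using bijective base-62."""
    length = 1
    # Skip past each block of shorter codes to find this code's length.
    while seq >= BASE ** length:
        seq -= BASE ** length
        length += 1
    # What remains is a fixed-width base-62 offset within that block.
    chars = []
    for _ in range(length):
        seq, digit = divmod(seq, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def code_to_seq(code: str) -> int:
    """Inverse of seq_to_code under the same assumptions."""
    offset = 0
    for ch in code:
        offset = offset * BASE + ALPHABET.index(ch)
    return first_seq_for_length(len(code)) + offset
```

For example, `first_seq_for_length(4)` returns 242234, and `seq_to_code(first_seq_for_length(3))` returns `"000"`. As suggested in the log, checking what the tracker itself displays after clicking Apply is the reliable way to confirm the real mapping.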
02:33 🔗 JesseW well, one of the hec-su claims has come back safely... the others are still waiting, though
02:35 🔗 JesseW poeurl still hasn't gotten anything, and it's working on the 1..'s
02:35 🔗 JesseW maybe it doesn't like initial digits, either
02:38 🔗 JesseW I haven't found anything that works aside from t.. and u..
02:38 🔗 phuzion Yeah, I basically had the same problem.
02:42 🔗 bwn has quit IRC (Read error: Connection reset by peer)
02:43 🔗 phuzion JesseW: I'm gonna shoot the maintainer of poeurl an email and ask if they can just send us a dbdump
02:44 🔗 JesseW cool.
03:11 🔗 phuzion Email sent
03:12 🔗 JesseW has quit IRC (Leaving.)
04:18 🔗 asdf has joined #urlteam
05:25 🔗 bwn has joined #urlteam
05:54 🔗 JesseW has joined #urlteam
05:54 🔗 svchfoo1 sets mode: +o JesseW
05:56 🔗 JesseW paused poeurl-com3 as it hasn't found anything since uLF
05:56 🔗 JesseW but it did find 7,620 items, so that's good
06:01 🔗 JesseW it seems like hec-su is likely doing some type of slow ban
06:20 🔗 JesseW Yeah, they're clearly doing some type of banning
06:20 🔗 JesseW I've paused it for now, we can get back to it later
06:47 🔗 JesseW has quit IRC (Read error: Operation timed out)
06:52 🔗 Start has quit IRC (Read error: Connection reset by peer)
06:52 🔗 Start_ has joined #urlteam
07:20 🔗 WinterFox has quit IRC (Remote host closed the connection)
07:33 🔗 JesseW has joined #urlteam
07:34 🔗 svchfoo1 sets mode: +o JesseW
07:45 🔗 JesseW has quit IRC (Leaving.)
09:24 🔗 bwn has quit IRC (Read error: Operation timed out)
09:25 🔗 bwn has joined #urlteam
11:19 🔗 VADemon has joined #urlteam
14:33 🔗 phuzion Got a friend who used to run a url shortener, trying to reach out to them and see if they can give us a db dump
16:23 🔗 VADemon has quit IRC (left4dead)
16:23 🔗 asdf has quit IRC (Quit: Leaving)
17:21 🔗 JesseW has joined #urlteam
17:21 🔗 svchfoo1 sets mode: +o JesseW
17:32 🔗 xmc sweet
17:34 🔗 JesseW nicely done, phuzion -- you seem to have a lot of such friends. :-)
17:35 🔗 phuzion JesseW: I just know a bunch of people, a handful of whom run URL shorteners. :)
17:38 🔗 JesseW has quit IRC (Leaving.)
17:55 🔗 Start_ is now known as Start
18:47 🔗 bwn has quit IRC (Ping timeout: 499 seconds)
19:20 🔗 bwn has joined #urlteam
19:35 🔗 bwn has quit IRC (Read error: Connection reset by peer)
19:36 🔗 bwn has joined #urlteam
19:41 🔗 ThisIsGet has joined #urlteam
19:42 🔗 ThisIsGet has left
20:20 🔗 bwn has quit IRC (Read error: Connection reset by peer)
20:20 🔗 bwn has joined #urlteam
20:28 🔗 VADemon has joined #urlteam
21:35 🔗 JesseW has joined #urlteam
21:35 🔗 svchfoo1 sets mode: +o JesseW
22:13 🔗 JesseW has quit IRC (Leaving.)
22:22 🔗 JesseW has joined #urlteam
22:23 🔗 svchfoo1 sets mode: +o JesseW
22:34 🔗 phuzion re-enabled tinyurl_7 for the time being
22:39 🔗 Start has quit IRC (Read error: Connection reset by peer)
22:40 🔗 Coderjoe has quit IRC (Ping timeout: 260 seconds)
22:41 🔗 Start has joined #urlteam
22:42 🔗 JW_work has quit IRC (Ping timeout: 260 seconds)
22:42 🔗 Ctrl-S___ has joined #urlteam
22:42 🔗 JW_work has joined #urlteam
22:51 🔗 Coderjoe has joined #urlteam
23:00 🔗 JesseW cool, thanks
