Time |
Nickname |
Message |
00:10
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
00:13
|
|
dashcloud has joined #urlteam |
00:13
|
|
svchfoo1 sets mode: +o dashcloud |
01:56
|
|
bwn has joined #urlteam |
02:03
|
|
JesseW has joined #urlteam |
02:03
|
|
svchfoo1 sets mode: +o JesseW |
02:03
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
02:16
|
phuzion |
Hey JesseW |
02:16
|
JesseW |
:-) |
02:17
|
JesseW |
thanks for your work on the new projects |
02:17
|
* |
JesseW is frustrated that migre.me is still down... |
02:17
|
phuzion |
No problem. Can you take a peek at hec-su real quick? It threw a TON of errors so I shut it off |
02:17
|
phuzion |
I'm not sure if we got banned or what |
02:17
|
JesseW |
sure |
02:17
|
phuzion |
Tons of this: ScraperError: Number of attempts exceeded for 400 (6s). |
02:18
|
JesseW |
yeah, I saw those |
02:18
|
JesseW |
wasn't sure what to make of it. What happens when you run curl yourself? |
02:18
|
phuzion |
https://gist.github.com/phuzion/4c882d534c2357e2ecd2 |
02:20
|
JesseW |
so, it works when we run it. I think it might just be going too fast? Or making too many requests in the same batch? |
02:20
|
JesseW |
I'll try adjusting it much slower, and try again. |
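To make the "much slower" idea concrete, here is a minimal sketch of pacing requests with a fixed delay and giving up once HTTP 400 responses pile up. Everything in it is hypothetical: `probe`, `fetch`, `delay` and `max_errors` are illustrative names and are not part of the tracker or of any real scraper. Only the 400 status is taken from the error message quoted above.

```python
import time
from collections import Counter


def probe(codes, fetch, delay=1.0, max_errors=3, sleep=time.sleep):
    """Fetch each short code slowly; stop early after too many HTTP 400s.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    short code and returns an HTTP status code as an int.
    """
    statuses = Counter()
    for i, code in enumerate(codes):
        if i > 0:
            sleep(delay)  # pause between requests, never before the first
        status = fetch(code)
        statuses[status] += 1
        if statuses[400] >= max_errors:
            break  # possibly rate-limited or banned; stop instead of retrying
    return statuses
```

Passing `fetch` in as a parameter keeps the pacing logic separate from the HTTP client, so the same loop can be tried against a stub before it touches a real shortener.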
02:21
|
phuzion |
Ok, so they're probably just banning us based on an excessive amount of requests in a timeframe, you'd guess? |
02:21
|
JesseW |
maybe.... |
02:21
|
JesseW |
I mean, maybe they're banning us by IP, in which case it'll be a problem, but... <shrug> |
02:22
|
JesseW |
what size queue did you start with? |
02:22
|
phuzion |
Defaults I think |
02:22
|
JesseW |
ah, 100 is ... probably too large ... for most small scrapers. |
02:22
|
JesseW |
I recommend starting with 10, or even 5, and gradually ramping it up to 50. |
02:22
|
JesseW |
And only going over 50 for big shorteners that we've run happily against for a while |
02:22
|
phuzion |
I'm gonna try that on poeurl.com |
02:23
|
phuzion |
start at 50 urls/item, 10 items in queue. |
02:24
|
JesseW |
also, you can enable it for a sec, then disable it, and wait for the results (or errors) to come in, before enabling it again |
02:24
|
JesseW |
it's a useful way to test |
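The ramp-up advice above (start at 5 or 10, work up to 50, back off on trouble) can be sketched as a small helper. This is only an illustration: the function names are made up, and doubling after each clean test run is an assumption, since the discussion only says "gradually".

```python
def next_queue_size(current, had_errors, start=5, soft_cap=50):
    """Suggest the next queue size for a small shortener.

    Doubles the size after a clean test run (an assumed policy), up to
    `soft_cap`, and drops back to `start` as soon as a run reports errors.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    if had_errors:
        return start
    return min(max(current, start) * 2, soft_cap)


def ramp_schedule(start=5, soft_cap=50):
    """List the sizes visited if every test run comes back clean."""
    sizes = [start]
    while sizes[-1] < soft_cap:
        sizes.append(next_queue_size(sizes[-1], had_errors=False,
                                     start=start, soft_cap=soft_cap))
    return sizes


print(ramp_schedule())  # [5, 10, 20, 40, 50]
```

Each step would correspond to one enable, disable, wait-for-results cycle as described above.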
02:25
|
JesseW |
I've enabled hec-su again |
02:26
|
JesseW |
also paused tinyurl_7 for a bit, due to a lot of 520s |
02:26
|
JesseW |
feel free to turn it back on in an hour or so |
02:27
|
phuzion |
will do. |
02:27
|
phuzion |
I think my regex on poeurl-com3 might be a bit too aggressive or something |
02:27
|
* |
JesseW will look |
02:27
|
JesseW |
hm, it looks fine |
02:27
|
phuzion |
It did almost 1k urls and got no results back |
02:27
|
phuzion |
for a sequential shortener |
02:28
|
JesseW |
do you have an example of a working URL in that range? |
02:28
|
phuzion |
Oh, crap, I just assumed 1 and 2 character URLs would be valid. |
02:29
|
JesseW |
ha. |
02:29
|
phuzion |
uFg and uFh are known valid. |
02:29
|
JesseW |
:-) |
02:29
|
JesseW |
try it *starting* from 3 characters. |
02:30
|
phuzion |
Eh, what's 3K URLs to scan? 62*62 isn't much. |
02:30
|
JesseW |
Of course, it's only ~4,000 possibilities till it gets there on its own. |
02:30
|
JesseW |
(as you said) |
02:30
|
phuzion |
How do you tell it to start at x number of characters? |
02:31
|
JesseW |
put the appropriate sequence number in the "Lower sequence number:" field in the Auto Queue section |
02:31
|
JesseW |
that's what the "Common Numbers" table on the wiki page is intended for |
02:31
|
phuzion |
Ah, gotcha. |
02:32
|
phuzion |
So, if I want to skip all the 4 character URLs, I enter the 62*4 number? |
02:32
|
phuzion |
Or is that if I want to START at 4 character URLs? |
02:32
|
JesseW |
try it and see -- as long as Enabled is off, you can just change the value and it will show what it converts to when you click Apply |
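For the arithmetic behind the "Lower sequence number" question, the following sketch assumes the sequence number is turned into a short code by plain positional base-62 conversion with no leading padding. The alphabet order and the helper names are assumptions for illustration, so the value shown after clicking Apply remains the authoritative check. Under that assumption, n-character codes begin at 62^(n-1) rather than 62*n, which puts the first 3-character code at 3,844 and agrees with the "~4,000" estimate above.

```python
import string

# Assumed alphabet order (digits, lowercase, uppercase); the project's
# actual alphabet setting may differ.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def int_to_code(n, alphabet=ALPHABET):
    """Convert a non-negative sequence number to a short code."""
    if n < 0:
        raise ValueError("sequence number must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, base)
        chars.append(alphabet[rem])
    return "".join(reversed(chars))


def first_sequence_number(length, base=len(ALPHABET)):
    """Smallest sequence number whose code has `length` characters."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 0 if length == 1 else base ** (length - 1)


print(first_sequence_number(3), int_to_code(first_sequence_number(3)))  # 3844 100
print(first_sequence_number(4))  # 238328
```

To start at 4-character codes under this scheme, the value to enter would be 62^3 = 238,328.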
02:33
|
JesseW |
well, one of the hec-su claims has come back safely... the others are still waiting, though |
02:35
|
JesseW |
poeurl still hasn't gotten anything, and it's working on the 1..'s |
02:35
|
JesseW |
maybe it doesn't like initial digits, either |
02:38
|
JesseW |
I haven't found anything that works aside from t.. and u.. |
02:38
|
phuzion |
Yeah, I basically had the same problem. |
02:42
|
|
bwn has quit IRC (Read error: Connection reset by peer) |
02:43
|
phuzion |
JesseW: I'm gonna shoot the maintainer of poeurl an email and ask if they can just send us a dbdump |
02:44
|
JesseW |
cool. |
03:11
|
phuzion |
Email sent |
03:12
|
|
JesseW has quit IRC (Leaving.) |
04:18
|
|
asdf has joined #urlteam |
05:25
|
|
bwn has joined #urlteam |
05:54
|
|
JesseW has joined #urlteam |
05:54
|
|
svchfoo1 sets mode: +o JesseW |
05:56
|
JesseW |
paused poeurl-com3 as it hasn't found anything since uLF |
05:56
|
JesseW |
but it did find 7,620 items, so that's good |
06:01
|
JesseW |
it seems like hec-su is likely doing some type of slow ban |
06:20
|
JesseW |
Yeah, they're clearly doing some type of banning |
06:20
|
JesseW |
I've paused it for now, we can get back to it later |
06:47
|
|
JesseW has quit IRC (Read error: Operation timed out) |
06:52
|
|
Start has quit IRC (Read error: Connection reset by peer) |
06:52
|
|
Start_ has joined #urlteam |
07:20
|
|
WinterFox has quit IRC (Remote host closed the connection) |
07:33
|
|
JesseW has joined #urlteam |
07:34
|
|
svchfoo1 sets mode: +o JesseW |
07:45
|
|
JesseW has quit IRC (Leaving.) |
09:24
|
|
bwn has quit IRC (Read error: Operation timed out) |
09:25
|
|
bwn has joined #urlteam |
11:19
|
|
VADemon has joined #urlteam |
14:33
|
phuzion |
Got a friend who used to run a url shortener, trying to reach out to them and see if they can give us a db dump |
16:23
|
|
VADemon has quit IRC (left4dead) |
16:23
|
|
asdf has quit IRC (Quit: Leaving) |
17:21
|
|
JesseW has joined #urlteam |
17:21
|
|
svchfoo1 sets mode: +o JesseW |
17:32
|
xmc |
sweet |
17:34
|
JesseW |
nicely done, phuzion -- you seem to have a lot of such friends. :-) |
17:35
|
phuzion |
JesseW: I just know a bunch of people, a handful of whom run URL shorteners. :) |
17:38
|
|
JesseW has quit IRC (Leaving.) |
17:55
|
|
Start_ is now known as Start |
18:47
|
|
bwn has quit IRC (Ping timeout: 499 seconds) |
19:20
|
|
bwn has joined #urlteam |
19:35
|
|
bwn has quit IRC (Read error: Connection reset by peer) |
19:36
|
|
bwn has joined #urlteam |
19:41
|
|
ThisIsGet has joined #urlteam |
19:42
|
|
ThisIsGet has left |
20:20
|
|
bwn has quit IRC (Read error: Connection reset by peer) |
20:20
|
|
bwn has joined #urlteam |
20:28
|
|
VADemon has joined #urlteam |
21:35
|
|
JesseW has joined #urlteam |
21:35
|
|
svchfoo1 sets mode: +o JesseW |
22:13
|
|
JesseW has quit IRC (Leaving.) |
22:22
|
|
JesseW has joined #urlteam |
22:23
|
|
svchfoo1 sets mode: +o JesseW |
22:34
|
phuzion |
re-enabled tinyurl_7 for the time being |
22:39
|
|
Start has quit IRC (Read error: Connection reset by peer) |
22:40
|
|
Coderjoe has quit IRC (Ping timeout: 260 seconds) |
22:41
|
|
Start has joined #urlteam |
22:42
|
|
JW_work has quit IRC (Ping timeout: 260 seconds) |
22:42
|
|
Ctrl-S___ has joined #urlteam |
22:42
|
|
JW_work has joined #urlteam |
22:51
|
|
Coderjoe has joined #urlteam |
23:00
|
JesseW |
cool, thanks |