Time |
Nickname |
Message |
00:08
π
|
phuzion |
google search of site:soli.dm returns about 566 results. shorturl alphabet seems to be 5x A-Z a-z 0-9 |
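(Note: a rough sketch of what that alphabet implies for the scan size. It assumes every soli.dm shortcode is exactly 5 characters drawn from A-Z, a-z, 0-9, as guessed above; the names `ALPHABET`, `keyspace_size` and `shortcodes` are made up for illustration and are not from the scraper code.)

```python
import string
from itertools import islice, product

# Assumed alphabet, in the order quoted in the log: A-Z, a-z, 0-9 (62 symbols).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
CODE_LENGTH = 5  # assumption: all shortcodes are exactly 5 characters


def keyspace_size(alphabet=ALPHABET, length=CODE_LENGTH):
    """Number of distinct shortcodes of the given length."""
    return len(alphabet) ** length


def shortcodes(alphabet=ALPHABET, length=CODE_LENGTH):
    """Lazily yield every candidate shortcode in alphabet order."""
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)


print(keyspace_size())                # 916132832
print(list(islice(shortcodes(), 3)))  # ['AAAAA', 'AAAAB', 'AAAAC']
```

If the guess is right, about 566 indexed results sit inside a space of roughly 916 million candidates.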
00:16
π
|
phuzion |
asdf: Added it to the URLTeam wiki page. |
00:19
π
|
asdf |
ok, thank you |
00:19
π
|
asdf |
yeah 566 seems worth saving doesn't it |
00:19
π
|
asdf |
means someone has been using it at least a bit |
00:42
π
|
|
bwn_ has joined #urlteam |
00:43
π
|
asdf |
you have all these? - http://bit.do/list-of-url-shorteners.php |
00:44
π
|
JW_work |
asdf: yep, check the history of the page |
00:45
π
|
JW_work |
er, our page |
00:45
π
|
JW_work |
you'll see I imported them, then checked them off |
00:45
π
|
JW_work |
thanks for asking about it, though! |
00:45
π
|
|
bwn has quit IRC (Read error: Operation timed out) |
01:24
π
|
|
JesseW has joined #urlteam |
01:24
π
|
|
svchfoo1 sets mode: +o JesseW |
01:29
π
|
|
Coderjoe has quit IRC (Read error: Connection reset by peer) |
02:03
π
|
|
Coderjoe has joined #urlteam |
02:06
π
|
|
bwn has joined #urlteam |
02:11
π
|
|
bwn_ has quit IRC (Read error: Operation timed out) |
02:21
π
|
|
W1nterFox has joined #urlteam |
02:27
π
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
02:58
π
|
|
bwn has quit IRC (Ping timeout: 250 seconds) |
03:07
π
|
|
JesseW has quit IRC (Leaving.) |
03:36
π
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:54
π
|
|
JesseW has joined #urlteam |
04:54
π
|
|
svchfoo3 sets mode: +o JesseW |
06:21
π
|
|
cechk01 has joined #urlteam |
07:39
π
|
|
asdf has quit IRC (Ping timeout: 252 seconds) |
08:00
π
|
|
JesseW has quit IRC (Leaving.) |
08:08
π
|
|
GLaDOS has quit IRC (Ping timeout: 252 seconds) |
08:11
π
|
|
GLaDOS has joined #urlteam |
08:11
π
|
|
svchfoo3 sets mode: +o GLaDOS |
09:01
π
|
|
deathy___ has quit IRC (Ping timeout: 252 seconds) |
09:11
π
|
|
deathy___ has joined #urlteam |
09:15
π
|
|
bwn has joined #urlteam |
09:23
π
|
|
deathy___ has quit IRC (Ping timeout: 252 seconds) |
09:42
π
|
|
deathy___ has joined #urlteam |
11:52
π
|
|
deathy___ has quit IRC (Ping timeout: 252 seconds) |
12:31
π
|
|
deathy___ has joined #urlteam |
13:07
π
|
|
deathy___ has quit IRC () |
13:08
π
|
|
deathy___ has joined #urlteam |
14:14
π
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
14:29
π
|
|
Coderjoe has joined #urlteam |
14:29
π
|
|
W1nterFox has quit IRC (Remote host closed the connection) |
15:05
π
|
|
Start has quit IRC (Quit: Disconnected.) |
15:07
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
15:11
π
|
|
dashcloud has joined #urlteam |
15:11
π
|
|
svchfoo1 sets mode: +o dashcloud |
15:16
π
|
phuzion |
750k urls scanned on da.gd. I just asked my buddy if we can bump up the scrape rate, I'll let you all know what he says. |
15:39
π
|
|
JesseW has joined #urlteam |
15:40
π
|
|
svchfoo1 sets mode: +o JesseW |
15:43
π
|
JesseW |
we seem to be finding them at about the expected rate |
16:16
π
|
|
JesseW has quit IRC (Leaving.) |
16:44
π
|
phuzion |
Can we add soli.dm as a project to scrape? |
16:45
π
|
phuzion |
It's in the "Alive" list, but it's not a warrior project yet. |
16:46
π
|
JW_work |
It looks doable, yeah. I'll probably add it tonight. |
16:50
π
|
phuzion |
Ok cool. I figured out the alphabet of shorl.com too |
16:52
π
|
phuzion |
Wiki page says it doesn't look guessable, because the short urls are like 13 a-z characters long, but in reality it's more like 6 characters of a 128-character alphabet. |
16:53
π
|
phuzion |
For example, the shorturl on the wiki page is http://shorl.com/tisikestibahu. Initially looking at it, you'd see that as 13 characters of a-z. In reality, it breaks down like this: TI-SI-KE-STI-BA-HU |
16:55
π
|
phuzion |
Here's a gist of the alphabet: https://gist.github.com/anonymous/599ba4c17599cd213005 |
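(Note: a minimal sketch of the syllable breakdown described above. The real 128-entry alphabet lives in the gist and is not reproduced here; `EXAMPLE_SYLLABLES` holds only the six syllables from the example URL, and `split_code` is a hypothetical helper, not part of any existing scraper.)

```python
def split_code(code, syllables):
    """Return one way to split `code` into known syllables, or None.

    Longer syllables are tried first, with backtracking, so a 3-letter
    syllable such as "sti" is not mistaken for a shorter one plus leftovers.
    """
    ordered = sorted(set(syllables), key=len, reverse=True)

    def helper(rest):
        if not rest:
            return []
        for syl in ordered:
            if rest.startswith(syl):
                tail = helper(rest[len(syl):])
                if tail is not None:
                    return [syl] + tail
        return None  # no syllable fits at this position

    return helper(code)


# Only the syllables visible in the example; the full list is an assumption
# to be filled in from the gist.
EXAMPLE_SYLLABLES = ["ti", "si", "ke", "sti", "ba", "hu"]

print(split_code("tisikestibahu", EXAMPLE_SYLLABLES))
# ['ti', 'si', 'ke', 'sti', 'ba', 'hu']

# Six positions over a 128-symbol alphabet, as estimated in the log:
print(128 ** 6)  # 4398046511104
```

Treating each syllable as one "character" is what turns the 13-letter string into a 6-symbol code, which is why the generic alphabet handling would not cover it.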
16:57
π
|
JW_work |
awesome! That will require custom code, though, so unless you write it, it will probably take a while to get to. |
16:58
π
|
JW_work |
*If* you (or someone else) wants to write it, I'll be glad to help with reviewing it, test cases, etc. |
17:08
π
|
JW_work |
phuzion: btw, did you hear back from the da.gd owner? |
17:08
π
|
phuzion |
JW_work: Not yet, I think he's passed out right now. No complaints about load or anything though. He did say we were beating the crap out of his logs though, lol |
17:09
π
|
JW_work |
ha. well, yes, it would do that. |
17:09
π
|
phuzion |
"phuzion: so far 0.0001% of archiveteam hits have been non-404s" |
17:09
π
|
JW_work |
he's still welcome to send us a dump, and we'll gladly stop scraping it |
17:10
π
|
JW_work |
yep, 165K out of the total should give us that hitrate |
17:10
π
|
phuzion |
Do we log response times by chance? |
17:10
π
|
JW_work |
I don't think so, no |
17:10
π
|
phuzion |
Also, does the client reuse http connections if possible? Or does it start a new HTTP connection for each request? |
17:11
π
|
JW_work |
We have a 60 second timeout, so I see how many go over that… |
17:11
π
|
JW_work |
(and yes, it does reuse connections) |
17:11
π
|
JW_work |
but items are only 50 URLs, so it will (of course) create a new connection for each group of 50 |
17:12
π
|
JW_work |
I can boost the size of the items, though, too |
17:13
π
|
phuzion |
A new HTTP connection every 50 URLs is fine, I just wanted to make sure that there was some optimization inside there. No need to open a connection, make the request, close the connection, then open yet another connection. |
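(Note: a hedged sketch of the pattern being discussed, one connection reused for a whole 50-URL item. This is not the code in services/base.py; `chunk`, `scan_item` and the `connection_factory` parameter are hypothetical names. The only figures taken from the log are the item size of 50 and the 60 second timeout.)

```python
import http.client

ITEM_SIZE = 50        # from the log: items are 50 URLs each
TIMEOUT_SECONDS = 60  # from the log: 60 second timeout


def chunk(codes, size=ITEM_SIZE):
    """Split a list of shortcodes into items of at most `size` entries."""
    return [codes[i:i + size] for i in range(0, len(codes), size)]


def scan_item(host, codes, connection_factory=http.client.HTTPConnection):
    """Request every shortcode in one item over a single connection.

    Returns a dict mapping shortcode -> (status, Location header or None).
    """
    conn = connection_factory(host, timeout=TIMEOUT_SECONDS)
    results = {}
    try:
        for code in codes:
            conn.request("GET", "/" + code)
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            results[code] = (resp.status, resp.getheader("Location"))
    finally:
        conn.close()
    return results
```

With this shape, 120 shortcodes become three items and therefore three connections, rather than 120 separate open/request/close cycles. Raising `ITEM_SIZE` would cut the connection count further.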
17:14
π
|
|
Start has joined #urlteam |
17:14
π
|
JW_work |
yep |
17:14
π
|
JW_work |
check services/base.py for the code |
17:35
π
|
phuzion |
JW_work: Do you know of any other shorteners that have a custom alphabet? I __MIIIIIIIGHT__ be able to whip something up if that's the case. |
17:36
π
|
JW_work |
I don't remember any; check the wiki page |
17:36
π
|
JW_work |
I bet there are some, though |
17:37
π
|
JW_work |
there is still a *big* list of ones to research |
17:47
π
|
phuzion |
Daring Fireball (df4.us) should be a quick scan, just researched that one. |
17:58
π
|
JW_work |
cool, excellent research. I'll add it as soon as I'm at home. |
18:34
π
|
|
Start has quit IRC (Quit: Disconnected.) |
19:14
π
|
|
Start has joined #urlteam |
19:16
π
|
|
Start has quit IRC (Client Quit) |
19:32
π
|
|
asdf has joined #urlteam |
19:39
π
|
|
bwn has quit IRC (Read error: Operation timed out) |
19:44
π
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
19:53
π
|
|
aaaaaaaaa has joined #urlteam |
19:53
π
|
|
swebb sets mode: +o aaaaaaaaa |
19:54
π
|
|
dashcloud has joined #urlteam |
19:55
π
|
|
svchfoo1 sets mode: +o dashcloud |
20:15
π
|
|
Protab has joined #urlteam |
20:15
π
|
|
Rotab has quit IRC (Ping timeout: 200 seconds) |
20:15
π
|
|
Protab is now known as Rotab |
20:19
π
|
|
bwn has joined #urlteam |
20:47
π
|
|
Start has joined #urlteam |
21:26
π
|
|
Start has quit IRC (Quit: Disconnected.) |
22:10
π
|
|
Start has joined #urlteam |
22:16
π
|
|
Start has quit IRC (Quit: Disconnected.) |
22:28
π
|
|
deathy___ has quit IRC (hub.se efnet.port80.se) |
22:28
π
|
|
GLaDOS has quit IRC (hub.se efnet.port80.se) |
22:28
π
|
|
zhongfu_ has quit IRC (hub.se efnet.port80.se) |
22:28
π
|
|
Ctrl-S___ has quit IRC (hub.se efnet.port80.se) |
22:28
π
|
|
johtso has quit IRC (hub.se efnet.port80.se) |
22:28
π
|
|
Atluxity has quit IRC (hub.se efnet.port80.se) |
22:28
π
|
|
zhongfu has joined #urlteam |
22:34
π
|
|
Atluxity has joined #urlteam |
22:38
π
|
|
GLaDOS has joined #urlteam |
22:38
π
|
|
svchfoo3 sets mode: +o GLaDOS |
22:45
π
|
|
johtso has joined #urlteam |
22:45
π
|
|
deathy___ has joined #urlteam |
22:46
π
|
|
WinterFox has joined #urlteam |
23:10
π
|
|
Start has joined #urlteam |