#urlteam 2015-12-17,Thu

Time Nickname Message
00:08 phuzion google search of site:soli.dm returns about 566 results. shorturl alphabet seems to be 5x A-Z a-z 0-9
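[Note: for a sense of scale, five characters drawn from A-Z, a-z, 0-9 is a 62-symbol alphabet, so the keyspace is 62^5. The sketch below only illustrates that arithmetic. The helper names and the enumeration order are assumptions made for illustration, not how the URLTeam tracker actually hands out work.]

```python
import itertools
import string

# Alphabet as described in the log: A-Z, a-z, 0-9 (62 symbols).
# The ordering of the symbols is an assumption.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
CODE_LENGTH = 5


def keyspace_size(alphabet=ALPHABET, length=CODE_LENGTH):
    """Number of distinct shortcodes of exactly `length` characters."""
    return len(alphabet) ** length


def shortcodes(alphabet=ALPHABET, length=CODE_LENGTH):
    """Lazily yield every candidate shortcode in alphabet order."""
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(combo)


if __name__ == "__main__":
    print(keyspace_size())                          # 916132832
    print(list(itertools.islice(shortcodes(), 3)))  # ['AAAAA', 'AAAAB', 'AAAAC']
```

[With roughly 916 million candidates and only a few hundred indexed results, most guesses would be expected to miss, which is typical for this kind of scan.]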
00:16 phuzion asdf: Added it to the URLTeam wiki page.
00:19 asdf ok, thank you
00:19 asdf yeah 566 seems worth saving doesn't it
00:19 asdf means someone has been using it at least a bit
00:42 bwn_ has joined #urlteam
00:43 asdf you have all these? - http://bit.do/list-of-url-shorteners.php
00:44 JW_work asdf: yep, check the history of the page
00:45 JW_work er, our page
00:45 JW_work you'll see I imported them, then checked them off
00:45 JW_work thanks for asking about it, though!
00:45 bwn has quit IRC (Read error: Operation timed out)
01:24 JesseW has joined #urlteam
01:24 svchfoo1 sets mode: +o JesseW
01:29 Coderjoe has quit IRC (Read error: Connection reset by peer)
02:03 Coderjoe has joined #urlteam
02:06 bwn has joined #urlteam
02:11 bwn_ has quit IRC (Read error: Operation timed out)
02:21 W1nterFox has joined #urlteam
02:27 WinterFox has quit IRC (Read error: Operation timed out)
02:58 bwn has quit IRC (Ping timeout: 250 seconds)
03:07 JesseW has quit IRC (Leaving.)
03:36 aaaaaaaaa has quit IRC (Leaving)
04:54 JesseW has joined #urlteam
04:54 svchfoo3 sets mode: +o JesseW
06:21 cechk01 has joined #urlteam
07:39 asdf has quit IRC (Ping timeout: 252 seconds)
08:00 JesseW has quit IRC (Leaving.)
08:08 GLaDOS has quit IRC (Ping timeout: 252 seconds)
08:11 GLaDOS has joined #urlteam
08:11 svchfoo3 sets mode: +o GLaDOS
09:01 deathy___ has quit IRC (Ping timeout: 252 seconds)
09:11 deathy___ has joined #urlteam
09:15 bwn has joined #urlteam
09:23 deathy___ has quit IRC (Ping timeout: 252 seconds)
09:42 deathy___ has joined #urlteam
11:52 deathy___ has quit IRC (Ping timeout: 252 seconds)
12:31 deathy___ has joined #urlteam
13:07 deathy___ has quit IRC ()
13:08 deathy___ has joined #urlteam
14:14 Coderjoe has quit IRC (Read error: Operation timed out)
14:29 Coderjoe has joined #urlteam
14:29 W1nterFox has quit IRC (Remote host closed the connection)
15:05 Start has quit IRC (Quit: Disconnected.)
15:07 dashcloud has quit IRC (Read error: Operation timed out)
15:11 dashcloud has joined #urlteam
15:11 svchfoo1 sets mode: +o dashcloud
15:16 phuzion 750k urls scanned on da.gd. I just asked my buddy if we can bump up the scrape rate, I'll let you all know what he says.
15:39 JesseW has joined #urlteam
15:40 svchfoo1 sets mode: +o JesseW
15:43 JesseW we seem to be finding them at about the expected rate
16:16 JesseW has quit IRC (Leaving.)
16:44 phuzion Can we add soli.dm as a project to scrape?
16:45 phuzion It's in the "Alive" list, but it's not a warrior project yet.
16:46 JW_work It looks doable, yeah. I'll probably add it tonight.
16:50 phuzion Ok cool. I figured out the alphabet of shorl.com too
16:52 phuzion Wiki page says it doesn't look guessable, because the short urls are like 13 a-z characters long, but in reality it's more like 6 characters of a 128-character alphabet.
16:53 phuzion For example, the shorturl on the wiki page is http://shorl.com/tisikestibahu. Initially looking at it, you'd see that as 13 characters of a-z. In reality, it breaks down like this: TI-SI-KE-STI-BA-HU
16:55 phuzion Here's a gist of the alphabet: https://gist.github.com/anonymous/599ba4c17599cd213005
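[Note: the breakdown phuzion describes treats each shorl.com code as a sequence of syllables rather than single letters. The sketch below shows one way such a code could be split. It is an illustration only: `EXAMPLE_SYLLABLES` holds just the six syllables visible in the example above, not the real 128-entry list from the gist, and `segment` is a hypothetical helper, not existing URLTeam code.]

```python
from typing import List, Optional, Sequence

# Assumption: only the six syllables seen in the example from the log.
# The full alphabet lives in the gist and is not reproduced here.
EXAMPLE_SYLLABLES = ("ti", "si", "ke", "sti", "ba", "hu")


def segment(code: str,
            syllables: Sequence[str] = EXAMPLE_SYLLABLES) -> Optional[List[str]]:
    """Split `code` into syllables, or return None if no split exists."""
    if not code:
        return []
    # Try longer syllables first and backtrack when a choice dead-ends,
    # since syllables have different lengths ("sti" vs "si").
    for syl in sorted(set(syllables), key=len, reverse=True):
        if code.startswith(syl):
            rest = segment(code[len(syl):], syllables)
            if rest is not None:
                return [syl] + rest
    return None


if __name__ == "__main__":
    print(segment("tisikestibahu"))  # ['ti', 'si', 'ke', 'sti', 'ba', 'hu']
    # Keyspace under each reading of the code:
    print(128 ** 6)   # six symbols from a 128-entry alphabet
    print(26 ** 13)   # thirteen independent a-z letters
```

[Six symbols from 128 is 2^42 codes, far fewer than 26^13, which is why the syllable reading makes the shortener look more tractable than the wiki page suggested.]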
16:57 JW_work awesome! That will require custom code, though -- so unless you write it, it will probably take a while to get to.
16:58 JW_work *If* you (or someone else) wants to write it, I'll be glad to help with reviewing it, test cases, etc.
17:08 JW_work phuzion: btw, did you hear back from the da.gd owner?
17:08 phuzion JW_work: Not yet, I think he's passed out right now. No complaints about load or anything though. He did say we were beating the crap out of his logs though, lol
17:09 JW_work ha. well, yes, it would do that.
17:09 phuzion "phuzion: so far 0.0001% of archiveteam hits have been non-404s"
17:09 JW_work he's still welcome to send us a dump, and we'll gladly stop scraping it
17:10 JW_work yep, 165K out of the total should give us that hitrate
17:10 phuzion Do we log response times by chance?
17:10 JW_work I don't think so, no
17:10 phuzion Also, does the client reuse http connections if possible? Or does it start a new HTTP connection for each request?
17:11 JW_work We have a 60 second timeout, so I see how many go over that…
17:11 JW_work (and yes, it does reuse connections)
17:11 JW_work but items are only 50 URLs, so it will (of course) create a new connection for each group of 50
17:12 JW_work I can boost the size of the items, though, too
17:13 phuzion A new HTTP connection every 50 URLs is fine, I just wanted to make sure that there was some optimization inside there. No need to open a connection, make the request, close the connection, then open yet another connection.
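[Note: the pattern discussed above, one connection reused for every URL in an item of 50, can be sketched with the Python standard library as follows. This is not the project's actual client code. `chunked`, `scrape_item`, the `make_conn` parameter, and the use of HEAD requests are all assumptions made for illustration; only the item size of 50 and the 60 second timeout come from the log.]

```python
import http.client
from typing import Callable, Dict, Iterable, Iterator, List, Optional

ITEM_SIZE = 50        # URLs per item, as mentioned in the log
TIMEOUT_SECONDS = 60  # timeout mentioned in the log


def chunked(codes: Iterable[str], size: int = ITEM_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` shortcodes."""
    batch: List[str] = []
    for code in codes:
        batch.append(code)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def scrape_item(host: str,
                codes: Iterable[str],
                make_conn: Callable = http.client.HTTPConnection,
                timeout: int = TIMEOUT_SECONDS) -> Dict[str, Optional[str]]:
    """Check every shortcode in one item over a single connection.

    Returns a mapping of shortcode to redirect target, or None when the
    response was not a 301/302 redirect.
    """
    conn = make_conn(host, timeout=timeout)
    results: Dict[str, Optional[str]] = {}
    try:
        for code in codes:
            conn.request("HEAD", "/" + code)
            resp = conn.getresponse()
            resp.read()  # finish the response so the connection can be reused
            is_redirect = resp.status in (301, 302)
            results[code] = resp.getheader("Location") if is_redirect else None
    finally:
        conn.close()
    return results
```

[Each call to `scrape_item` opens one connection and closes it at the end, so a larger item size means fewer connections for the same number of URLs.]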
17:14 Start has joined #urlteam
17:14 JW_work yep
17:14 JW_work check services/base.py for the code
17:35 phuzion JW_work: Do you know of any other shorteners that have a custom alphabet? I __MIIIIIIIGHT__ be able to whip something up if that's the case.
17:36 JW_work I don't remember any; check the wiki page
17:36 JW_work I bet there are some, though
17:37 JW_work there is still a *big* list of ones to research
17:47 phuzion Daring Fireball (df4.us) should be a quick scan, just researched that one.
17:58 JW_work cool, excellent research -- I'll add it as soon as I'm at home.
18:34 Start has quit IRC (Quit: Disconnected.)
19:14 Start has joined #urlteam
19:16 Start has quit IRC (Client Quit)
19:32 asdf has joined #urlteam
19:39 bwn has quit IRC (Read error: Operation timed out)
19:44 dashcloud has quit IRC (Read error: Operation timed out)
19:53 aaaaaaaaa has joined #urlteam
19:53 swebb sets mode: +o aaaaaaaaa
19:54 dashcloud has joined #urlteam
19:55 svchfoo1 sets mode: +o dashcloud
20:15 Protab has joined #urlteam
20:15 Rotab has quit IRC (Ping timeout: 200 seconds)
20:15 Protab is now known as Rotab
20:19 bwn has joined #urlteam
20:47 Start has joined #urlteam
21:26 Start has quit IRC (Quit: Disconnected.)
22:10 Start has joined #urlteam
22:16 Start has quit IRC (Quit: Disconnected.)
22:28 deathy___ has quit IRC (hub.se efnet.port80.se)
22:28 GLaDOS has quit IRC (hub.se efnet.port80.se)
22:28 zhongfu_ has quit IRC (hub.se efnet.port80.se)
22:28 Ctrl-S___ has quit IRC (hub.se efnet.port80.se)
22:28 johtso has quit IRC (hub.se efnet.port80.se)
22:28 Atluxity has quit IRC (hub.se efnet.port80.se)
22:28 zhongfu has joined #urlteam
22:34 Atluxity has joined #urlteam
22:38 GLaDOS has joined #urlteam
22:38 svchfoo3 sets mode: +o GLaDOS
22:45 johtso has joined #urlteam
22:45 deathy___ has joined #urlteam
22:46 WinterFox has joined #urlteam
23:10 Start has joined #urlteam
