[00:08] google search of site:soli.dm returns about 566 results. shorturl alphabet seems to be 5x A-Z a-z 0-9
[00:16] asdf: Added it to the URLTeam wiki page.
[00:19] ok, thank you
[00:19] yeah 566 seems worth saving doesn't it
[00:19] means someone has been using it at least a bit
[00:42] *** bwn_ has joined #urlteam
[00:43] you have all these? - http://bit.do/list-of-url-shorteners.php
[00:44] asdf: yep, check the history of the page
[00:45] er, our page
[00:45] you'll see I imported them, then checked them off
[00:45] thanks for asking about it, though!
[00:45] *** bwn has quit IRC (Read error: Operation timed out)
[01:24] *** JesseW has joined #urlteam
[01:24] *** svchfoo1 sets mode: +o JesseW
[01:29] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[02:03] *** Coderjoe has joined #urlteam
[02:06] *** bwn has joined #urlteam
[02:11] *** bwn_ has quit IRC (Read error: Operation timed out)
[02:21] *** W1nterFox has joined #urlteam
[02:27] *** WinterFox has quit IRC (Read error: Operation timed out)
[02:58] *** bwn has quit IRC (Ping timeout: 250 seconds)
[03:07] *** JesseW has quit IRC (Leaving.)
[03:36] *** aaaaaaaaa has quit IRC (Leaving)
[04:54] *** JesseW has joined #urlteam
[04:54] *** svchfoo3 sets mode: +o JesseW
[06:21] *** cechk01 has joined #urlteam
[07:39] *** asdf has quit IRC (Ping timeout: 252 seconds)
[08:00] *** JesseW has quit IRC (Leaving.)
[08:08] *** GLaDOS has quit IRC (Ping timeout: 252 seconds)
[08:11] *** GLaDOS has joined #urlteam
[08:11] *** svchfoo3 sets mode: +o GLaDOS
[09:01] *** deathy___ has quit IRC (Ping timeout: 252 seconds)
[09:11] *** deathy___ has joined #urlteam
[09:15] *** bwn has joined #urlteam
[09:23] *** deathy___ has quit IRC (Ping timeout: 252 seconds)
[09:42] *** deathy___ has joined #urlteam
[11:52] *** deathy___ has quit IRC (Ping timeout: 252 seconds)
[12:31] *** deathy___ has joined #urlteam
[13:07] *** deathy___ has quit IRC ()
[13:08] *** deathy___ has joined #urlteam
[14:14] *** Coderjoe has quit IRC (Read error: Operation timed out)
[14:29] *** Coderjoe has joined #urlteam
[14:29] *** W1nterFox has quit IRC (Remote host closed the connection)
[15:05] *** Start has quit IRC (Quit: Disconnected.)
[15:07] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:11] *** dashcloud has joined #urlteam
[15:11] *** svchfoo1 sets mode: +o dashcloud
[15:16] 750k urls scanned on da.gd. I just asked my buddy if we can bump up the scrape rate, I'll let you all know what he says.
[15:39] *** JesseW has joined #urlteam
[15:40] *** svchfoo1 sets mode: +o JesseW
[15:43] we seem to be finding them at about the expected rate
[16:16] *** JesseW has quit IRC (Leaving.)
[16:44] Can we add soli.dm as a project to scrape?
[16:45] It's in the "Alive" list, but it's not as a warrior project yet.
[16:46] It looks doable, yeah. I'll probably add it tonight.
[16:50] Ok cool. I figured out the alphabet of shorl.com too
[16:52] Wiki pages says it doesn't look guessable, because the short urls are like 13 a-z characters long, but in reality it's more like 6 characters of a 128-character alphabet.
[16:53] For example, the shorturl on the wiki page is http://shorl.com/tisikestibahu. Initially looking at it, you'd see that as 13 characters of a-z.
In reality, it breaks down like this: TI-SI-KE-STI-BA-HU
[16:55] Here's a gist of the alphabet: https://gist.github.com/anonymous/599ba4c17599cd213005
[16:57] awesome! That will require custom code, though — so unless you write it, it will probably take a while to get to.
[16:58] *If* you (or someone else) wants to write it, I'll be glad to help with reviewing it, test cases, etc.
[17:08] phuzion: btw, did you hear back from the da.gd owner?
[17:08] JW_work: Not yet, I think he's passed out right now. No complaints about load or anything though. He did say we were beating the crap out his logs though, lol
[17:09] ha. well, yes, it would do that.
[17:09] "phuzion: so far 0.0001% of archiveteam hits have been non-404s"
[17:09] he's still welcome to send us a dump, and we'll gladly stop scraping it
[17:10] yep, 165K out of the total should give us that hitrate
[17:10] Do we log response times by chance?
[17:10] I don't think so, no
[17:10] Also, does the client reuse http connections if possible? Or does it start a new HTTP connection for each request?
[17:11] We have a 60 second timeout, so I see how many go over that…
[17:11] (and yes, it does reuse connections)
[17:11] but items are only 50 URL, so it will (of course) create a new connection for each group of 50
[17:12] I can boost the size of the items, though, too
[17:13] A new HTTP connection every 50 URLs is fine, I just wanted to make sure that there was some optimization inside there. No need to open a connection, make the request, close the connection, then open yet another connection.
[17:14] *** Start has joined #urlteam
[17:14] yep
[17:14] check services/base.py for the code
[17:35] JW_work: Do you know of any other shorteners that have a custom alphabet? I __MIIIIIIIGHT__ be able to whip something up if that's the case.
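(Editor's note) The shorl.com breakdown discussed above, where a code that looks like 13 a-z letters is really 6 symbols from a syllable alphabet, can be sketched roughly as follows. This is only an illustration under stated assumptions: `SAMPLE_SYLLABLES` is a hypothetical subset holding just the six syllables from the example code (the full list is in the linked gist and is not reproduced here), the function names are made up for this sketch, and the 128-symbol figure is the one quoted in the channel. It is not the tracker's actual code.

```python
from functools import lru_cache

# Hypothetical subset of the alphabet: only the six syllables seen in the
# example code from the chat. The full list lives in the linked gist.
SAMPLE_SYLLABLES = ("ba", "hu", "ke", "si", "sti", "ti")


def split_syllables(code, syllables=SAMPLE_SYLLABLES):
    """Return one way to split `code` into syllables, or None if none exists.

    Syllables have different lengths, so a plain fixed-width split does not
    work; this tries longer syllables first and backtracks on dead ends.
    """
    options = sorted(set(syllables), key=len, reverse=True)

    @lru_cache(maxsize=None)
    def solve(pos):
        if pos == len(code):
            return ()  # consumed the whole code
        for syl in options:
            if code.startswith(syl, pos):
                rest = solve(pos + len(syl))
                if rest is not None:
                    return (syl,) + rest
        return None  # no syllable fits at this position

    result = solve(0)
    return list(result) if result is not None else None


def keyspace(alphabet_size, length):
    """Number of distinct codes of `length` symbols over an alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    print(split_syllables("tisikestibahu"))
    # Search space if codes are 6 symbols from 128, as stated in the chat.
    print(keyspace(128, 6))
```

Treating the code as 6 symbols from 128 gives 128**6 (about 4.4 trillion) candidates, far fewer than 26**13, which is why the syllable view makes the shortener look more enumerable than the wiki page suggested.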
[17:36] I don't remember any; check the wiki page
[17:36] I bet there are some, though
[17:37] there are still a *big* list of ones to research
[17:47] Daring Fireball (df4.us) should be a quick scan, just researched that one.
[17:58] cool, excellent research — I'll add it as soon as I'm at home.
[18:34] *** Start has quit IRC (Quit: Disconnected.)
[19:14] *** Start has joined #urlteam
[19:16] *** Start has quit IRC (Client Quit)
[19:32] *** asdf has joined #urlteam
[19:39] *** bwn has quit IRC (Read error: Operation timed out)
[19:44] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:53] *** aaaaaaaaa has joined #urlteam
[19:53] *** swebb sets mode: +o aaaaaaaaa
[19:54] *** dashcloud has joined #urlteam
[19:55] *** svchfoo1 sets mode: +o dashcloud
[20:15] *** Protab has joined #urlteam
[20:15] *** Rotab has quit IRC (Ping timeout: 200 seconds)
[20:15] *** Protab is now known as Rotab
[20:19] *** bwn has joined #urlteam
[20:47] *** Start has joined #urlteam
[21:26] *** Start has quit IRC (Quit: Disconnected.)
[22:10] *** Start has joined #urlteam
[22:16] *** Start has quit IRC (Quit: Disconnected.)
[22:28] *** deathy___ has quit IRC (hub.se efnet.port80.se)
[22:28] *** GLaDOS has quit IRC (hub.se efnet.port80.se)
[22:28] *** zhongfu_ has quit IRC (hub.se efnet.port80.se)
[22:28] *** Ctrl-S___ has quit IRC (hub.se efnet.port80.se)
[22:28] *** johtso has quit IRC (hub.se efnet.port80.se)
[22:28] *** Atluxity has quit IRC (hub.se efnet.port80.se)
[22:28] *** zhongfu has joined #urlteam
[22:34] *** Atluxity has joined #urlteam
[22:38] *** GLaDOS has joined #urlteam
[22:38] *** svchfoo3 sets mode: +o GLaDOS
[22:45] *** johtso has joined #urlteam
[22:45] *** deathy___ has joined #urlteam
[22:46] *** WinterFox has joined #urlteam
[23:10] *** Start has joined #urlteam