[14:39] http://grooveshark.com/#!/s/Krebs+Sleepers+Wake+The+Voice+Is+Calling/1WPxqz?src=5
[14:39] arise! time to get to work
[16:14] well, does anyone recall how I fixed the utf8 encoding problem with the tracker?
[17:48] punched it?
[17:59] :)
[17:59] I just figured it out
[18:00] the utf8 error happens because it's trying to print out the request that caused a previous error, not expecting the body of the request to be gzipped
[18:00] fixing the real error is the way to go :)
[18:01] Smiley: can I get you to help out?
[18:06] https://github.com/db48x/tinyarchive/tree/small-improvements
[18:16] I'm at work, what do you need?
[18:17] ah
[18:17] we need to get the project going again
[18:17] not a coder either...
[18:18] :)
[18:18] can you recommend a shortener we should scrape?
[18:18] but I'm willing to try and learn with some babysitting/hand-holding...
[18:19] :D
[18:19] well, twitter annoyed me
[18:19] twitter*
[18:20] that's one that hasn't been done before
[18:20] or flic.kr
[18:20] can you find a selection of urls?
[18:20] sure, from my feed :D
[18:22] we first need to know what character set it uses, and how many characters
[18:22] t.co/Baan9O37h4
[18:23] that's a long url
[18:23] 10 characters
[18:23] are there longer ones?
[18:24] 62 characters in each slot, at least
[18:24] any punctuation?
[18:26] t.co/caUMNmvzDP
[18:26] nope
[18:31] well, that's only 839299365868340224 urls to check
[18:32] \O/
[18:33] a perfect 10th power.
[18:33] yes, 62^10
[18:33] 839 quadrillion
[18:35] the last project topped out at 6 queries per second
[18:35] so only 2.65e10 years, give or take
[18:35] yikes
[18:35] divided by however many people we can get to join in
[18:35] warrior :)
[18:35] so do you have a framework?
[18:36] we had ~40 warriors last time, so only 6.6e8 years :)
[18:38] hahaha
[18:41] right now though we aren't doing anything. let's get it going, then increase user count...
[18:44] Yes
[18:44] I support anything.
[18:44] frikking work getting in the way
[18:48] ok, for such a large keyspace, we have to do a probabilistic search
[18:49] if you take a look at https://github.com/db48x/tinyback/blob/master/tinyback/services.py
[18:49] you can see how the services are defined
[18:50] and in https://github.com/db48x/tinyback/blob/master/tinyback/generators.py you can see the chain_generator
[18:54] although I don't really like the way it's implemented
[18:56] ooh, problem
[18:56] t.co doesn't redirect
[18:58] oh?
[18:58] services seem understandable.
[19:01] loading t.co/Baan9O37h4 returns 200, with an html page containing a message
[19:02] http://t.co/AEU6b9iu70 returns 200, with a gif in the body
[19:02] etc
[19:02] we'll have to find something else
[19:02] it works in IE here (at work)
[19:03] the scripts we're using assume that the service will return a 302, with the correct url to load
[19:03] a proper redirect
[19:03] ah :/
[19:04] oh, wait
[19:04] lack of sleep is causing me to make mistakes
[19:04] it's doing a 301 redirect, which is clever
[19:05] but makes the browser dev tools lie slightly
[19:05] a 301 is a _permanent_ redirect, telling the browser to remember what url it redirected to, and simply load it directly
[19:08] http://hastebin.com/yazimomoha.py
[19:09] that's a good start, might need some tweaks after testing
[19:09] being at work sucks.
[19:12] :)
[19:35] t.co doesn't support keep-alives
[21:40] right, I'm here.
[21:44] not really sure what I can do.
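The gzip problem described at [18:00]: the logging path tries to print a request body that is still compressed. A minimal sketch of the fix, assuming the body bytes and a Content-Encoding header are at hand (the names here are illustrative, not tinyback's actual variables):

    import gzip
    import io

    def printable_body(body, headers):
        """Decompress a gzipped response body before logging it,
        so the log write doesn't choke on raw compressed bytes."""
        if headers.get("Content-Encoding") == "gzip":
            body = gzip.GzipFile(fileobj=io.BytesIO(body)).read()
        return body.decode("utf-8", errors="replace")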
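The arithmetic behind the [18:31]-[18:36] estimates, worked out for checking; the 2.65e10-year figure corresponds to roughly one query per second per client, which is consistent with the later 40-warrior estimate of 6.6e8 years:

    KEYSPACE = 62 ** 10             # a-z, A-Z, 0-9 in each of 10 slots
    print(KEYSPACE)                 # 839299365868340224, ~839 quadrillion

    SECONDS_PER_YEAR = 3600 * 24 * 365.25
    years = KEYSPACE / SECONDS_PER_YEAR     # at 1 query per second
    print("%.2e" % years)                   # ~2.66e10 years
    print("%.1e" % (years / 40))            # with ~40 warriors: ~6.6e8 years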
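The probabilistic search mentioned at [18:48] amounts to sampling random codes from the 62-character alphabet rather than enumerating the keyspace in order. A rough sketch of that idea (not the actual chain_generator from tinyback/generators.py):

    import random
    import string

    # the 62-character alphabet established at [18:24]
    ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

    def random_codes(length=10, count=2000):
        """Yield `count` random t.co-style short codes of the given length."""
        for _ in range(count):
            yield "".join(random.choice(ALPHABET) for _ in range(length))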
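A way to see the 301 discussed at [19:04]-[19:05] without the browser's redirect cache getting in the way: make the request directly and don't follow the redirect. A sketch using the standard library; the actual scripts may differ:

    import http.client

    conn = http.client.HTTPConnection("t.co")
    conn.request("GET", "/Baan9O37h4")
    resp = conn.getresponse()
    # http.client never follows redirects, so the 301 status and its
    # Location header are visible directly
    print(resp.status, resp.getheader("Location"))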
[22:35] it's way too slow
[22:35] 15 minutes to do 2000 urls
[22:35] derp
[22:40] yea
[22:47] well, the project gives me "no tasks available"?
[22:51] anyway, gotta head to bed
[22:51] o/
[22:52] good night
[22:52] I haven't added it to the tracker yet
[22:53] there's no point when it's this slow
[23:08] 2048    0.031    0.000  815.372    0.398 services.py:176(fetch)
[23:09] 2050  605.588    0.295  605.588    0.295 {_socket.getaddrinfo}
[23:41] ok, so I can cache that fairly easily
[23:45] that gets me almost 16 per second
[23:46] still pretty hopeless at that rate
[23:46] none of my tests have found even a single url
[23:49] half of that time is spent in socket.connect and half in socket.recv
[23:50] so the only way to speed it up is to make it concurrent
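The cProfile output at [23:08]-[23:09] shows nearly all of fetch's 815 seconds inside socket.getaddrinfo, i.e. a fresh DNS lookup on every request. One blunt way to cache it, as hinted at [23:41] (a sketch; tinyback's actual fix may look different):

    import socket

    _dns_cache = {}
    _real_getaddrinfo = socket.getaddrinfo

    def cached_getaddrinfo(host, port, *args, **kwargs):
        """Resolve each (host, port) once and reuse the answer."""
        key = (host, port) + args
        if key not in _dns_cache:
            _dns_cache[key] = _real_getaddrinfo(host, port, *args, **kwargs)
        return _dns_cache[key]

    socket.getaddrinfo = cached_getaddrinfo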
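A sketch of the concurrency suggested at [23:50]: a thread pool lets the time blocked in socket.connect and socket.recv overlap across requests instead of serializing it. Since t.co doesn't support keep-alives ([19:35]), each worker opens a fresh connection; check_code here is a hypothetical stand-in for the real fetch:

    import http.client
    from concurrent.futures import ThreadPoolExecutor

    def check_code(code):
        """Fetch one short code and return its redirect target, if any."""
        conn = http.client.HTTPConnection("t.co")
        try:
            conn.request("GET", "/" + code)
            resp = conn.getresponse()
            return code, resp.status, resp.getheader("Location")
        finally:
            conn.close()

    codes = ["Baan9O37h4", "caUMNmvzDP"]
    # 20 workers overlap the network waits instead of paying them in sequence
    with ThreadPoolExecutor(max_workers=20) as pool:
        for code, status, location in pool.map(check_code, codes):
            print(code, status, location)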