#urlteam 2014-07-05,Sat


Time Nickname Message
14:39 🔗 db48x http://grooveshark.com/#!/s/Krebs+Sleepers+Wake+The+Voice+Is+Calling/1WPxqz?src=5
14:39 🔗 db48x arise! time to get to work
16:14 🔗 db48x well, does anyone recall how I fixed the utf8 encoding problem with the tracker?
17:48 🔗 Smiley punched it?
17:59 🔗 db48x :)
17:59 🔗 db48x I just figured it out
18:00 🔗 db48x the utf8 error happens because it's trying to print out the request that caused a previous error, not expecting the body of the request to be gzipped
18:00 🔗 db48x fixing the real error is the way to go :)
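A minimal sketch of the failure mode described above, assuming the error handler receives the raw request body as bytes. The helper name `printable_body` and its parameters are hypothetical and are not taken from the tracker's code. It only shows one way to make a possibly gzipped body safe to print.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def printable_body(body: bytes, content_encoding: str = "") -> str:
    """Return a request body as text that is safe to log.

    Hypothetical helper: decompresses the body when it looks gzipped,
    then decodes as UTF-8, replacing any bytes that are not valid UTF-8.
    """
    if content_encoding.lower() == "gzip" or body.startswith(GZIP_MAGIC):
        try:
            body = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # Not valid gzip after all; fall through and log the raw bytes.
            pass
    return body.decode("utf-8", errors="replace")
```

Logging through a helper like this keeps a secondary encoding error from hiding the original one, which still needs its own fix.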
18:01 🔗 db48x Smiley: can I get you to help out?
18:06 🔗 db48x https://github.com/db48x/tinyarchive/tree/small-improvements
18:16 🔗 Smiley im at work, whatya need?
18:17 🔗 db48x ah
18:17 🔗 db48x we need to get the project going again
18:17 🔗 Smiley not a coder either...
18:18 🔗 db48x :)
18:18 🔗 db48x can you recommend a shortener we should scrape?
18:18 🔗 Smiley but I'm willing to try and learn with some babysitting/hand holding...
18:19 🔗 db48x :D
18:19 🔗 Smiley well twitter annoyed me
18:20 🔗 db48x that's one that hasn't been done before
18:20 🔗 Smiley or flic.kr
18:20 🔗 db48x can you find a selection of urls?
18:20 🔗 Smiley sure, from my feed :D
18:22 🔗 db48x we first need to know what character set it uses, and how many characters
18:22 🔗 Smiley t.co/Baan9O37h4
18:23 🔗 db48x that's a long url
18:23 🔗 db48x 10 characters
18:23 🔗 db48x are there longer ones?
18:24 🔗 db48x 62 characters in each slot, at least
18:24 🔗 db48x any punctuation?
18:26 🔗 Smiley t.co/caUMNmvzDP
18:26 🔗 Smiley nope
18:31 🔗 db48x well, that's only 839299365868340224 urls to check
18:32 🔗 Smiley \O/
18:33 🔗 Smiley a perfect 10th power.
18:33 🔗 db48x yes, 62^10
18:33 🔗 Smiley 839 quadrillion
18:35 🔗 db48x the last project topped out at 6 queries per second
18:35 🔗 db48x so only 2.65e10 years, give or take
18:35 🔗 Smiley yikes
18:35 🔗 db48x divided by however many people we can get to join in
18:35 🔗 Smiley warrior :)
18:35 🔗 Smiley so do you have framework??
18:36 🔗 db48x we had ~40 warriors last time, so only 6.6e8 years :)
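The back-of-envelope estimate above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a 62-character alphabet, 10-character codes, and a 365.25-day year. The function name is hypothetical.

```python
ALPHABET_SIZE = 62   # 0-9, a-z, A-Z, as counted in the chat
CODE_LENGTH = 10     # length of the t.co codes seen so far
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

KEYSPACE = ALPHABET_SIZE ** CODE_LENGTH


def years_to_enumerate(rate_per_second: float, workers: int = 1) -> float:
    """Years needed to try every code once at the given per-worker rate."""
    return KEYSPACE / (rate_per_second * workers) / SECONDS_PER_YEAR


if __name__ == "__main__":
    for rate, workers in [(1, 1), (6, 1), (6, 40)]:
        years = years_to_enumerate(rate, workers)
        print(f"{rate}/s x {workers} workers: {years:.3g} years")
```

The 2.65e10 and 6.6e8 figures quoted in the chat appear to correspond to one request per second. At 6 requests per second the same calculation gives roughly 4.4e9 years for one worker and roughly 1.1e8 years for 40, which does not change the conclusion that exhaustive enumeration is out of reach.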
18:38 🔗 Smiley hahaha
18:41 🔗 Smiley right now tho we aren't doing anything. let's get it going then increase user count...
18:44 🔗 SketchCow Yes
18:44 🔗 SketchCow I support anything.
18:44 🔗 Smiley frikking work getting in the way
18:48 🔗 db48x ok, for such a large keyspace, we have to do a probabilistic search
18:49 🔗 db48x if you take a look at https://github.com/db48x/tinyback/blob/master/tinyback/services.py
18:49 🔗 db48x you can see how the services are defined
18:50 🔗 db48x and in https://github.com/db48x/tinyback/blob/master/tinyback/generators.py you can see the chain_generator
18:54 🔗 db48x although I don't really like the way it's implemented
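The simplest form of a probabilistic search is uniform random sampling of the keyspace. The sketch below shows only that idea. It is not the `chain_generator` from tinyback, whose implementation is not shown in this log, and the name `random_codes` and its parameters are hypothetical.

```python
import itertools
import random
import string

# Assumption: the 62-character alphabet discussed in the chat.
ALPHABET = string.digits + string.ascii_letters


def random_codes(length=10, alphabet=ALPHABET, seed=None):
    """Yield an endless stream of uniformly random short codes."""
    rng = random.Random(seed)  # seedable so a run can be reproduced
    while True:
        yield "".join(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    for code in itertools.islice(random_codes(seed=1), 5):
        print("t.co/" + code)
```

Uniform sampling only pays off if a meaningful fraction of the keyspace is in use. A generator biased toward codes that are likely to exist would need some model of how the service assigns them.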
18:56 🔗 db48x ooh, problem
18:56 🔗 db48x t.co doesn't redirect
18:58 🔗 Smiley oh?
18:58 🔗 Smiley services seem understandable.
19:01 🔗 db48x loading t.co/Baan9O37h4 returns 200, with an html page containing a message
19:02 🔗 db48x http://t.co/AEU6b9iu70 returns 200, with a gif in the body
19:02 🔗 db48x etc
19:02 🔗 db48x we'll have to find something else
19:02 🔗 Smiley it works on ie here (at work)
19:03 🔗 db48x the scripts we're using assume that the service will return a 302, with the correct url to load
19:03 🔗 db48x a proper redirect
19:03 🔗 Smiley ah :/
19:04 🔗 db48x oh, wait
19:04 🔗 db48x lack of sleep is causing me to make mistakes
19:04 🔗 db48x it's doing a 301 redirect, which is clever
19:05 🔗 db48x but makes the browser dev tools lie slightly
19:05 🔗 db48x a 301 is a _permanent_ redirect, telling the browser to remember what url it redirected to, and simply load it directly
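One way to see the redirect itself, without a browser cache or automatic redirect following getting in the way, is to issue the request with `http.client`, which never follows redirects. This is a sketch, not the tinyback fetch code; `redirect_target` and `resolve` are hypothetical names.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status, headers):
    """Return the Location value for a redirect status, else None."""
    if status in REDIRECT_CODES:
        return headers.get("Location")
    return None


def resolve(short_url, timeout=10.0):
    """Fetch short_url once and return (status, target or None)."""
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # resp.headers does case-insensitive lookups, unlike a plain dict.
        return resp.status, redirect_target(resp.status, resp.headers)
    finally:
        conn.close()
```

A scraper that only accepts 302 would treat a 301 response as a miss, so accepting the whole set of redirect codes is the safer default.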
19:08 🔗 db48x http://hastebin.com/yazimomoha.py
19:09 🔗 db48x that's a good start, might need some tweaks after testing
19:09 🔗 Smiley being at work sucks.
19:12 🔗 db48x :)
19:35 🔗 db48x t.co doesn't support keep-alives
21:40 🔗 Smiley right, i'm here.
21:44 🔗 Smiley not really sure what I can do.
22:35 🔗 db48x it's way too slow
22:35 🔗 db48x 15 minutes to do 2000 urls
22:35 🔗 Smiley derp
22:40 🔗 db48x yea
22:47 🔗 Smiley well the project gives me no tasks available?
22:51 🔗 Smiley anyway gotta head to bed
22:51 🔗 Smiley o/
22:52 🔗 db48x good night
22:52 🔗 db48x I haven't added it to the tracker yet
22:53 🔗 db48x there's no point when it's this slow
23:08 🔗 db48x 2048 0.031 0.000 815.372 0.398 services.py:176(fetch)
23:09 🔗 db48x 2050 605.588 0.295 605.588 0.295 {_socket.getaddrinfo}
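The two lines above look like `cProfile` output, whose columns are ncalls, tottime, percall, cumtime, percall and the function. Read that way, 2048 calls to `fetch` took about 815 seconds in total, and about 606 of those seconds were spent inside `getaddrinfo`. A sketch of how such a report can be produced follows. `profile_call` is a hypothetical helper, not part of tinyback.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    def work():
        return sum(range(1000))

    _, report = profile_call(work)
    print(report)
```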
23:41 🔗 db48x ok, so I can cache that fairly easily
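A sketch of one way to cache the lookups, by memoising `socket.getaddrinfo` in the process. This is an assumption about the approach, since the log does not show what was actually done. The cache has no expiry, which is acceptable for a short scraping run against a single host but not for a long-lived service.

```python
import functools
import socket

_real_getaddrinfo = socket.getaddrinfo


@functools.lru_cache(maxsize=256)
def _cached_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Memoised wrapper: repeated lookups skip the resolver entirely."""
    return _real_getaddrinfo(host, port, family, type, proto, flags)


def install_dns_cache():
    """Route every lookup in this process through the cache."""
    socket.getaddrinfo = _cached_getaddrinfo


def uninstall_dns_cache():
    """Restore the original resolver function."""
    socket.getaddrinfo = _real_getaddrinfo
```

After `install_dns_cache()`, code that connects through `socket.create_connection`, including `http.client`, picks up the cached results without further changes.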
23:45 🔗 db48x that gets me almost 16 per second
23:46 🔗 db48x still pretty hopeless at that rate
23:46 🔗 db48x none of my tests have found even a single url
23:49 🔗 db48x half of that time is spent in socket.connect and half in socket.recv
23:50 🔗 db48x so the only way to speed it up is to make it concurrent
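Because the time goes to waiting on `connect` and `recv`, a thread pool is one straightforward way to overlap the requests. The sketch below assumes a `fetch(code)` callable supplied by the caller. `fetch_all` and `max_workers=32` are hypothetical choices, not taken from tinyback.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(codes, fetch, max_workers=32):
    """Call fetch(code) for every code concurrently; return {code: result}.

    Threads suit this workload because each request spends nearly all
    of its time blocked on the network rather than using the CPU.
    """
    codes = list(codes)  # iterated twice below, so materialise it first
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first exception
        # from fetch when its result is consumed.
        for code, result in zip(codes, pool.map(fetch, codes)):
            results[code] = result
    return results
```

With requests taking about 60 ms each, throughput should scale roughly with the number of workers until the remote service starts throttling, so the worker count has to be chosen with the service's rate limits in mind.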
