Time |
Nickname |
Message |
14:39 |
db48x |
http://grooveshark.com/#!/s/Krebs+Sleepers+Wake+The+Voice+Is+Calling/1WPxqz?src=5 |
14:39 |
db48x |
arise! time to get to work |
16:14 |
db48x |
well, does anyone recall how I fixed the utf8 encoding problem with the tracker? |
17:48 |
Smiley |
punched it? |
17:59 |
db48x |
:) |
17:59 |
db48x |
I just figured it out |
18:00 |
db48x |
the utf8 error happens because it's trying to print out the request that caused a previous error, not expecting the body of the request to be gzipped |
18:00 |
db48x |
fixing the real error is the way to go :) |
18:01 |
db48x |
Smiley: can I get you to help out? |
18:06 |
db48x |
https://github.com/db48x/tinyarchive/tree/small-improvements |
18:16 |
Smiley |
im at work, whatya need? |
18:17 |
db48x |
ah |
18:17 |
db48x |
we need to get the project going again |
18:17 |
Smiley |
not a coder either... |
18:18 |
db48x |
:) |
18:18 |
db48x |
can you recommend a shortener we should scrape? |
18:18 |
Smiley |
but im willing to try and learn with some babysitting/hand holding... |
18:19 |
db48x |
:D |
18:19 |
Smiley |
well teitter annoyed me |
18:19 |
Smiley |
twitter* |
18:20 |
db48x |
that's one that hasn't been done before |
18:20 |
Smiley |
or flic.kr |
18:20 |
db48x |
can you find a selection of urls? |
18:20 |
Smiley |
sure, from my feed :D |
18:22 |
db48x |
we first need to know what character set it uses, and how many characters |
18:22 |
Smiley |
t.co/Baan9O37h4 |
18:23 |
db48x |
that's a long url |
18:23 |
db48x |
10 characters |
18:23 |
db48x |
are there longer ones? |
18:24 |
db48x |
62 characters in each slot, at least |
18:24 |
db48x |
any punctuation? |
18:26 |
Smiley |
t.co/caUMNmvzDP |
18:26 |
Smiley |
nope |
18:31 |
db48x |
well, that's only 839299365868340224 urls to check |
18:32 |
Smiley |
\O/ |
18:33 |
Smiley |
a perfect 10th power. |
18:33 |
db48x |
yes, 62^10 |
18:33 |
Smiley |
839 quadrillion |
18:35 |
db48x |
the last project topped out at 6 queries per second |
18:35 |
db48x |
so only 2.65e10 years, give or take |
18:35 |
Smiley |
yikes |
18:35 |
db48x |
divided by however many people we can get to join in |
18:35 |
Smiley |
warrior :) |
18:35 |
Smiley |
so do you have framework?? |
18:36 |
db48x |
we had ~40 warriors last time, so only 6.6e8 years :) |
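A quick check of the arithmetic above, as a sketch. The 62-character alphabet and 10-character length come from the sample codes in the log; the 365.25-day year is an assumption. The 2.65e10-year figure matches one query per second. At the 6 queries per second mentioned, a single client would need roughly 4.4e9 years.

```python
ALPHABET_SIZE = 62   # a-z, A-Z, 0-9, as seen in the sample codes
CODE_LENGTH = 10
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def years_to_scan(rate_per_second, workers=1):
    """Years needed to try every code at the given per-worker rate."""
    keyspace = ALPHABET_SIZE ** CODE_LENGTH
    return keyspace / (rate_per_second * workers) / SECONDS_PER_YEAR


print(ALPHABET_SIZE ** CODE_LENGTH)            # size of the keyspace
print(f"{years_to_scan(1):.3g}")               # one client, 1 query/s
print(f"{years_to_scan(1, workers=40):.3g}")   # 40 clients, 1 query/s each
print(f"{years_to_scan(6):.3g}")               # one client, 6 queries/s
```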
18:38 |
Smiley |
hahaha |
18:41 |
Smiley |
right now tho we arent doing anything. lets get it going then increase user count... |
18:44 |
SketchCow |
Yes |
18:44 |
SketchCow |
I support anything. |
18:44 |
Smiley |
frikking work getting in the way |
18:48 |
db48x |
ok, for such a large keyspace, we have to do a probabilistic search |
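One way to read "probabilistic search" is to sample random codes instead of enumerating the keyspace in order. The sketch below assumes that reading. `random_codes` is a hypothetical helper, not part of tinyback, and the alphabet is the 62 alphanumeric characters inferred from the samples in the log.

```python
import random
import string

# Assumption: codes use only these 62 characters, per the sample URLs.
ALPHABET = string.ascii_letters + string.digits


def random_codes(count, length=10, seed=None):
    """Yield `count` distinct random codes of the given length."""
    rng = random.Random(seed)
    seen = set()
    while len(seen) < count:
        code = "".join(rng.choice(ALPHABET) for _ in range(length))
        if code not in seen:  # skip the (very unlikely) repeat
            seen.add(code)
            yield code


for code in random_codes(3, seed=1):
    print("t.co/" + code)
```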
18:49 |
db48x |
if you take a look at https://github.com/db48x/tinyback/blob/master/tinyback/services.py |
18:49 |
db48x |
you can see how the services are defined |
18:50 |
db48x |
and in https://github.com/db48x/tinyback/blob/master/tinyback/generators.py you can see the chain_generator |
18:54 |
db48x |
although I don't really like the way it's implemented |
18:56 |
db48x |
ooh, problem |
18:56 |
db48x |
t.co doesn't redirect |
18:58 |
Smiley |
oh? |
18:58 |
Smiley |
services seem understandable. |
19:01 |
db48x |
loading t.co/Baan9O37h4 returns 200, with an html page containing a message |
19:02 |
db48x |
http://t.co/AEU6b9iu70 returns 200, with a gif in the body |
19:02 |
db48x |
etc |
19:02 |
db48x |
we'll have to find something else |
19:02 |
Smiley |
it works on ie here (at work) |
19:03 |
db48x |
the scripts we're using assume that the service will return a 302, with the correct url to load |
19:03 |
db48x |
a proper redirect |
19:03 |
Smiley |
ah :/ |
19:04 |
db48x |
oh, wait |
19:04 |
db48x |
lack of sleep is causing me to make mistakes |
19:04 |
db48x |
it's doing a 301 redirect, which is clever |
19:05 |
db48x |
but makes the browser dev tools lie slightly |
19:05 |
db48x |
a 301 is a _permanent_ redirect, telling the browser to remember what url it redirected to, and simply load it directly |
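Since the scripts reportedly expect a 302 while t.co answers with a 301, a fetcher could accept either status and read the target from the `Location` header. The sketch below is an illustration, not the tinyback code. `redirect_target` and `resolve` are hypothetical names, and it relies on `http.client` not following redirects on its own.

```python
import http.client

# Status codes treated as "this short code resolves to a target URL".
REDIRECT_CODES = {301, 302}


def redirect_target(status, headers):
    """Return the Location value for a 301/302 response, else None."""
    if status in REDIRECT_CODES:
        return headers.get("Location")
    return None


def resolve(host, code, timeout=10):
    """Request http://host/code once and report where it redirects."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/" + code)
        response = conn.getresponse()
        return redirect_target(response.status, response.headers)
    finally:
        conn.close()
```

Calling `resolve("t.co", "Baan9O37h4")` would issue a single request and return the redirect target, or `None` for any other status.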
19:08 |
db48x |
http://hastebin.com/yazimomoha.py |
19:09 |
db48x |
that's a good start, might need some tweaks after testing |
19:09 |
Smiley |
being at work sucks. |
19:12 |
db48x |
:) |
19:35 |
db48x |
t.co doesn't support keep-alives |
21:40 |
Smiley |
right, i'm here. |
21:44 |
Smiley |
not really sure what I can do. |
22:35 |
db48x |
it's way too slow |
22:35 |
db48x |
15 minutes to do 2000 urls |
22:35 |
Smiley |
derp |
22:40 |
db48x |
yea |
22:47 |
Smiley |
well the project gives me no tasks available? |
22:51 |
Smiley |
anyway gotta head to bed |
22:51 |
Smiley |
o/ |
22:52 |
db48x |
good night |
22:52 |
db48x |
I haven't added it to the tracker yet |
22:53 |
db48x |
there's no point when it's this slow |
23:08 |
db48x |
2048 0.031 0.000 815.372 0.398 services.py:176(fetch) |
23:09 |
db48x |
2050 605.588 0.295 605.588 0.295 {_socket.getaddrinfo} |
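The two lines above look like `pstats` output, with columns ncalls, tottime, percall, cumtime, percall and function. If so, a comparable report could be produced with the standard `cProfile` module, as in this sketch. `profile_call` is a hypothetical helper, and how the original profile was actually collected is not stated in the log.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```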
23:41 |
db48x |
ok, so I can cache that fairly easily |
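One possible way to cache the lookups is to memoize `socket.getaddrinfo`, so repeated requests to the same host skip the resolver. This is a sketch of that idea, not necessarily what was done here. `install_dns_cache` is a hypothetical name, and the cache ignores DNS TTLs, which is only reasonable for a short-lived scraping run.

```python
import functools
import socket


def install_dns_cache(maxsize=256):
    """Replace socket.getaddrinfo with a memoized wrapper.

    Returns the original function so the caller can restore it later.
    """
    original = socket.getaddrinfo

    @functools.lru_cache(maxsize=maxsize)
    def cached_getaddrinfo(*args, **kwargs):
        # Arguments (host, port, family, ...) are hashable, so they
        # can serve directly as the cache key.
        return original(*args, **kwargs)

    socket.getaddrinfo = cached_getaddrinfo
    return original
```

After `install_dns_cache()`, code that connects through the `socket` module (including `http.client`) reuses the first lookup for each host and port.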
23:45 |
db48x |
that gets me almost 16 per second |
23:46 |
db48x |
still pretty hopeless at that rate |
23:46 |
db48x |
none of my tests have found even a single url |
23:49 |
db48x |
half of that time is spent in socket.connect and half in socket.recv |
23:50 |
db48x |
so the only way to speed it up is to make it concurrent |
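Because the remaining time is spent waiting in `socket.connect` and `socket.recv`, threads can overlap those waits. The sketch below uses a thread pool as one way to do that. `resolve_all` is a hypothetical helper, the worker count is an arbitrary assumption, and `fetch` stands for whatever function resolves a single code.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_all(codes, fetch, workers=32):
    """Run fetch(code) for every code on a thread pool.

    Returns a dict mapping each code to fetch's result.
    """
    codes = list(codes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with codes.
        results = pool.map(fetch, codes)
        return dict(zip(codes, results))
```

With network-bound work like this, throughput should grow with the number of workers until the remote service or the local connection becomes the limit.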