[20:06] For the 6-letter codes on bit.ly I expect about 18,933,411,861 URLs
[20:39] Do we have tinyback (or whatever our software is called) running on the warriors?
[20:39] seems like it would be a good project
[20:50] hm, yeah
[20:55] Yes, it's possible to run on the warrior
[20:57] soultcer: do you already have a module written?
[20:57] https://github.com/soult/tinyback/blob/master/pipeline.py
[20:57] Well, not very pretty since it only calls the run.py binary, but it seems to work fine
[20:58] are you opposed to checking it into the warrior?
[20:58] I'd have to create some more tasks in the tracker first ;-)
[20:59] At the moment it's mostly "Randomly hit 7500 bit.ly links, send the results" tasks
[21:00] oh, okay
[21:00] is that easy? ;P
[21:00] we just have workers sitting idle, would be kinda nice to get them working
[21:03] The downside is that you will sometimes get blocked by bit.ly for an hour or two
[21:03] Even though I am obeying their robots.txt
[21:07] is it just request volume?
[21:07] or they randomly decide to do it?
[21:08] Request volume
[21:08] # robots welcome;
[21:08] # welcome to bit.ly =)
[21:08] >:(
[21:08] is.gd at least tells you how many requests you can make per hour
[21:10] maybe just turn it down a bit?
[21:10] how aggressive is it?
[21:12] Only 1 connection per client IP per shortener
[21:14] oh
[21:14] that's not a lot
[21:15] ah, I see. it's not concurrent requests per second, it's total requests per hour
[21:15] [tinyback/master] HTTPService: Add default timeout of 30 seconds - David Triendl
[21:15] [tinyback/master] services.Bitly: Add "rate limit" of 2 requests per second - David Triendl
[21:15] [tinyback] soult pushed 2 new commits to master: https://github.com/soult/tinyback/compare/87772c5d4df9...59ece55f615f
[21:15] Let's see if that helps
[21:16] neat
[21:17] -> Tinyback
[21:17] INFO:root:Starting up tornadio server on port '8001'
[21:17] Pipeline:
[21:17] Starting Tinyback for Item
[21:17] root@vks23726:~/tinyback# run-pipeline --concurrent 1 pipeline.py underscor
[21:17] 2012-09-17 23:17:32,612 tinyback.Tracker INFO: Initializing tracker at http://tracker.tinyarchive.org/v1//
[21:17] 2012-09-17 23:17:33,495 tinyback.Tracker INFO: Received task de1fe3a4-00fe-11e2-ac91-0016179842f7 for service bitly
[21:17] 2012-09-17 23:17:33,495 tinyback.Reaper INFO: Rate limit: 2 requests per 1 seconds
[21:18] 2012-09-17 23:17:33,495 tinyback.Reaper INFO: Starting Reaper
[21:18] wheeeeeeeeee
[21:18] is the tracker code anywhere?
[21:19] Nah, the tracker code is not in the tinyback repository, but it's a very simple web.py app with an sqlite db
[21:19] This will be interesting if it works. :)
[21:20] underscor: btw, if you are not running it inside the warrior, you can simply call run.py directly (it even has a fancy option parser)
[21:22] ooh, handy
[21:22] also, how long does a single work unit usually take?
[21:23] Depends on the service. I try to aim for about an hour
[21:23] is.gd for example rate-limits to 1 req per sec, so I make it fetch about 3600 codes
[21:24] oh okay
[21:24] bit.ly now has 7200 req/h and I let it do 7500 requests per unit
[21:24] Is that too long or too short?
[21:24] I'd say it's too long, imo
[21:24] but, it also depends
[21:25] if I quit my thing right now, do I lose the work I've done?
[21:25] Yes
[21:25] ah, okay, then, yeah. I'd say do a much small interval, at least for running on the worker
[21:25] smaller*
[21:25] Also it will not assign you anything for the same service until you tell the tracker to "clear" your work queue, or the unit expires (which as of now is in 48 hours I think)
[21:26] as long as your tracker handles the load better. like maybe 5m.
[21:26] because warriors by their very nature are transient/unreliable
[21:26] Well I hope the tracker will handle it fine
[21:26] If it ever gets too much I'll switch to a real database system
[21:27] er
[21:27] s/load better/increased load okay/
[21:27] chronomex: ops
[21:27] ops for whom
[21:28] thx
[21:28] Yey
[21:28] soultcer: but I definitely think more small wu are better than few large wu
[21:28] in my opinion, at least
[21:28] I'd love others' input
[21:29] Because that also means if I'm running the warrior and I want to stop it, I either force it off (and lose work) or wait up to an hour
[21:29] which is kinda annoying
[21:29] Yeah, it will also be nicer to watch, when you finish a task every couple minutes instead of every hour
[21:29] I'll have to record how long a given number of requests takes for each service
[21:30] I've discovered that scraping tinyurl goes really fast from my buyvm VPS, because tinyurl is hosted in the same city. On the other hand it can be slow from Europe/Asia/Australia/Antarctica
[21:31] you have a .aq vps?
[21:31] where do I get one
[21:32] ^
[21:32] wow
[21:32] I bet you could make a lot of money doing that
[21:32] oh, but you can't use aq for commercial purposes
[21:32] hm
[21:33] Lol no, I'm just guessing
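
Note: the work-unit sizing discussed above (rate limit times target duration) and the "2 requests per 1 seconds" limit from the Reaper log can be sketched roughly as follows. This is only an illustration, not tinyback's actual code: `requests_per_unit` and `RateLimiter` are hypothetical names, and spacing requests evenly is an assumption about how such a limit might be enforced.

```python
import time


def requests_per_unit(rate_per_second, target_seconds):
    """Number of requests that fit in one work unit at a given rate limit."""
    return int(rate_per_second * target_seconds)


class RateLimiter:
    """Allow at most `count` calls per `period` seconds by spacing them evenly.

    Hypothetical helper for illustration; not taken from the tinyback repository.
    """

    def __init__(self, count, period, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / count  # minimum gap between two requests
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.interval


if __name__ == "__main__":
    # is.gd at 1 req/s for an hour, bit.ly at 2 req/s for an hour,
    # and bit.ly at 2 req/s for the suggested 5-minute unit.
    print(requests_per_unit(1, 3600))
    print(requests_per_unit(2, 3600))
    print(requests_per_unit(2, 5 * 60))
```

With these numbers, a 5-minute unit at 2 requests per second would come to 600 requests, compared with 7200 for a full hour, which is the trade-off between tracker load and lost work debated in the log.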