#urlteam 2012-09-17,Mon

Time Nickname Message
20:06 🔗 soultcer For the 6-letter codes on bit.ly I expect about 18,933,411,861 URLs
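Note on the estimate above: the log does not say how the figure was derived. As a rough check, and assuming 6-letter codes are drawn from a 62-character alphabet (a-z, A-Z, 0-9; an assumption, not stated in the log), the quoted number comes out to one third of the full keyspace, rounded down. The names below are illustrative only.

```python
import string

# Assumed alphabet for short codes: lower case, upper case, digits (62 symbols).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
CODE_LENGTH = 6

# Total number of distinct 6-character codes under that assumption.
keyspace = len(ALPHABET) ** CODE_LENGTH

# The figure quoted in the log.
quoted_estimate = 18_933_411_861

print(f"keyspace:        {keyspace:,}")
print(f"keyspace // 3:   {keyspace // 3:,}")
print(f"quoted estimate: {quoted_estimate:,}")
```

Why the expected fill would be one third is not explained in the channel, so treat the match as an observation rather than the method used.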
20:39 🔗 underscor Do we have tinyback (or whatever our software is called) running on the warriors?
20:39 🔗 underscor seems like it would be a good project
20:50 🔗 chronomex hm, yeah
20:55 🔗 soultcer Yes, it's possible to run on the warrior
20:57 🔗 underscor soultcer: do you already have a module written?
20:57 🔗 soultcer https://github.com/soult/tinyback/blob/master/pipeline.py
20:57 🔗 soultcer Well, not very pretty since it only calls the run.py binary, but it seems to work fine
20:58 🔗 underscor are you opposed to checking it into the warrior?
20:58 🔗 soultcer I'd have to create some more tasks in the tracker first ;-)
20:59 🔗 soultcer At the moment it's mostly "Randomly hit 7500 bit.ly links, send the results" tasks
21:00 🔗 underscor oh, okay
21:00 🔗 underscor is that easy? ;P
21:00 🔗 underscor we just have workers sitting idle, would be kinda nice to get them working
21:03 🔗 soultcer The downside is that you will sometimes get blocked by bit.ly for an hour or two
21:03 🔗 soultcer Even though I am obeying their robots.txt
21:07 🔗 underscor is it just request volume?
21:07 🔗 underscor or they randomly decide to do it?
21:08 🔗 soultcer Request volume
21:08 🔗 underscor # robots welcome;
21:08 🔗 underscor # welcome to bit.ly =)
21:08 🔗 underscor >:(
21:08 🔗 soultcer is.gd at least tells you how many requests you can make per hour
21:10 🔗 underscor maybe just turn it down a bit?
21:10 🔗 underscor how aggressive is it?
21:12 🔗 soultcer Only 1 connection per client IP per shortener
21:14 🔗 underscor oh
21:14 🔗 underscor that's not a lot
21:15 🔗 underscor ah, I see. it's not concurrent requests per second, it's total requests per hour
21:15 🔗 GitHub178 [tinyback/master] HTTPService: Add default timeout of 30 seconds - David Triendl
21:15 🔗 GitHub178 [tinyback/master] services.Bitly: Add "rate limit" of 2 requests per second - David Triendl
21:15 🔗 GitHub178 [tinyback] soult pushed 2 new commits to master: https://github.com/soult/tinyback/compare/87772c5d4df9...59ece55f615f
21:15 🔗 soultcer Let's see if that helps
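Note on the commit above: a client-side limit of 2 requests per second can be sketched as a small helper that spaces requests evenly. This is a minimal illustration, not tinyback's actual code; the `RateLimiter` class, its parameters, and the injectable `clock`/`sleep` arguments are all hypothetical.

```python
import time


class RateLimiter:
    """Allow at most `max_requests` calls per `period` seconds (hypothetical helper)."""

    def __init__(self, max_requests, period, clock=time.monotonic, sleep=time.sleep):
        # Minimum spacing between two consecutive requests, in seconds.
        self._interval = period / max_requests
        self._clock = clock
        self._sleep = sleep
        # Earliest time at which the next request may start; None before first use.
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


if __name__ == "__main__":
    limiter = RateLimiter(max_requests=2, period=1)
    for code in ["aaaaaa", "aaaaab", "aaaaac"]:
        limiter.wait()  # call this before each HTTP request
        print("would fetch", code)
```

Spacing requests evenly keeps the client under a per-second cap, but it does not by itself protect against an hourly quota like the one discussed above; that would need a second, longer-period limiter.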
21:16 🔗 underscor neat
21:17 🔗 underscor -> Tinyback
21:17 🔗 underscor INFO:root:Starting up tornadio server on port '8001'
21:17 🔗 underscor Pipeline:
21:17 🔗 underscor Starting Tinyback for Item
21:17 🔗 underscor root@vks23726:~/tinyback# run-pipeline --concurrent 1 pipeline.py underscor
21:17 🔗 underscor 2012-09-17 23:17:32,612 tinyback.Tracker INFO: Initializing tracker at http://tracker.tinyarchive.org/v1//
21:17 🔗 underscor 2012-09-17 23:17:33,495 tinyback.Tracker INFO: Received task de1fe3a4-00fe-11e2-ac91-0016179842f7 for service bitly
21:17 🔗 underscor 2012-09-17 23:17:33,495 tinyback.Reaper INFO: Rate limit: 2 requests per 1 seconds
21:18 🔗 underscor 2012-09-17 23:17:33,495 tinyback.Reaper INFO: Starting Reaper
21:18 🔗 underscor wheeeeeeeeee
21:18 🔗 underscor is the tracker code anywhere?
21:19 🔗 soultcer Nah, the tracker code is not in the tinyback repository, but it's a very simple web.py app with an sqlite db
21:19 🔗 swebb This will be interesting if it works. :)
21:20 🔗 soultcer underscor: btw, if you are not running it inside the warrior, you can simply call run.py directly (it even has a fancy option parser)
21:22 🔗 underscor ooh, handy
21:22 🔗 underscor also, how long does a single work unit usually take?
21:23 🔗 soultcer Depends on the service. I try to aim for about an hour
21:23 🔗 soultcer is.gd for example rate-limits to 1 req per sec, so I make it fetch about 3600 codes
21:24 🔗 underscor oh okay
21:24 🔗 soultcer bit.ly now has 7200 req/h and I let it do 7500 requests per unit
21:24 🔗 soultcer Is that too long or too short?
21:24 🔗 underscor I'd say it's too long, imo
21:24 🔗 underscor but, it also depends
21:25 🔗 underscor if I quit my thing right now, do I lose the work I've done?
21:25 🔗 soultcer Yes
21:25 🔗 underscor ah, okay, then, yeah. I'd say do a much small interval, at least for running on the worker
21:25 🔗 underscor smaller*
21:25 🔗 soultcer Also it will not assign you anything for the same service until you told the tracker to "clear" your work queue, or the unit expires (which as of now is in 48 hours I think)
21:26 🔗 underscor as long as your tracker handles the load better. like maybe 5m.
21:26 🔗 underscor because warriors by their very nature are transient/unreliable
21:26 🔗 soultcer Well I hope the tracker will handle fine
21:26 🔗 soultcer If it ever gets too much I'll switch to a real database system
21:27 🔗 underscor er
21:27 🔗 underscor s/load better/increased load okay/
21:27 🔗 underscor chronomex: ops
21:27 🔗 chronomex ops for whom
21:28 🔗 underscor thx
21:28 🔗 soultcer Yey
21:28 🔗 underscor soultcer: but I definitely think more small wu are better than few large wu
21:28 🔗 underscor in my opinion, at least
21:28 🔗 underscor I'd love other's input
21:29 🔗 underscor Because that also means if I'm running the warrior and I want to stop it, I either force it off (and lose work) or wait up to an hour
21:29 🔗 underscor which is kinda annoying
21:29 🔗 soultcer Yeah, it will also be nicer to watch, when you finish a task every couple minutes instead of every hour
21:29 🔗 soultcer I'll have to record how much a given number of requests take for each service
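Note on work-unit sizing: the trade-off discussed above (about an hour per unit today versus roughly 5 minutes for transient warriors) reduces to simple arithmetic on each service's rate limit. The sketch below uses the rates mentioned in the log (1 req/s for is.gd, 2 req/s for bit.ly); the function names are hypothetical, and it ignores network latency, which the log notes varies by location.

```python
def codes_per_unit(requests_per_second, target_seconds):
    """Number of codes to put in one work unit so it takes about target_seconds."""
    return int(requests_per_second * target_seconds)


def unit_duration_minutes(codes, requests_per_second):
    """Lower bound on how long a unit takes when limited only by the rate cap."""
    return codes / requests_per_second / 60


if __name__ == "__main__":
    # Current sizing: about one hour per unit.
    print("is.gd, 1 hour:   ", codes_per_unit(1, 3600), "codes")
    print("bit.ly, 7500 codes:", unit_duration_minutes(7500, 2), "minutes")
    # Proposed sizing: about five minutes per unit.
    print("is.gd, 5 min:    ", codes_per_unit(1, 300), "codes")
    print("bit.ly, 5 min:   ", codes_per_unit(2, 300), "codes")
```

Smaller units mean less lost work when a warrior stops, at the cost of more tracker requests per hour, which is the load concern raised in the discussion.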
21:30 🔗 soultcer I've discovered that scraping tinyurl goes really fast from my buyvm VPS, because tinyurl is hosted in the same city. On the other hand it can be slow from Europe/Asia/Australia/Antarctica
21:31 🔗 chronomex you have a .aq vps?
21:31 🔗 chronomex where do I get one
21:32 🔗 underscor ^
21:32 🔗 underscor wow
21:32 🔗 underscor I bet you could make a lot of money doing that
21:32 🔗 underscor oh, but you can't use aq for commercial purposes
21:32 🔗 underscor hm
21:33 🔗 soultcer Lol no, I'm just guessing
