#urlteam 2014-10-08,Wed


Time Nickname Message
18:40 🔗 _2tl hi
20:00 🔗 _2tl I don't want to sound negative, but it seems the delay between two requests is not handled the same way in tinyback and terroroftinytown-client-grab
20:01 🔗 _2tl in terroroftinytown-client-grab, the delay is more or less a fixed sleep() called between two requests
20:02 🔗 _2tl in tinyback it was a bit more precise and optimized (in my opinion):
20:02 🔗 _2tl there was a rate limit tuple, defined here: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/services.py#L48
20:03 🔗 _2tl implementation was there: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/__init__.py#L132
20:03 🔗 _2tl the thing is, if I take is.gd for example, you can scrape 60 URLs in 1 minute, so with terroroftinytown-client-grab the delay will be implemented as 1s
20:04 🔗 _2tl now think of a 1-day timeframe: with tinyback you could scrape 86,400 URLs / day
20:05 🔗 _2tl with terroroftinytown-client-grab, you will call sleep(1) after every request, so once you take into account the RTT of each request, maybe you only scrape 80-85k URLs
20:07 🔗 xmc hm, good point
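
The difference _2tl describes can be sketched as follows. This is a minimal illustration, not the actual code of tinyback or terroroftinytown-client-grab: the names `IntervalLimiter`, `FakeClock` and `simulate`, and the 0.25 s round-trip time, are assumptions made up for the example. It compares sleeping a fixed second after each request with scheduling each request relative to the start of the previous one, so that the round-trip time is absorbed into the wait.

```python
import time


class IntervalLimiter:
    """Space request starts at least period/requests seconds apart."""

    def __init__(self, requests, period, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / requests
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait(self):
        # Sleep only for the part of the interval the last request did not use.
        remaining = self.next_slot - self.clock()
        if remaining > 0:
            self.sleep(remaining)
        self.next_slot = max(self.clock(), self.next_slot) + self.interval


class FakeClock:
    """Simulated clock so the comparison runs instantly (hypothetical helper)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(duration, rtt, strategy):
    """Count requests finished within `duration` seconds for a 60/minute limit."""
    clock = FakeClock()
    limiter = IntervalLimiter(60, 60.0, clock=clock.time, sleep=clock.sleep)
    done = 0
    while True:
        if strategy == "interval":
            limiter.wait()
        if clock.now + rtt > duration:
            break
        clock.sleep(rtt)  # the request itself takes one round trip
        done += 1
        if strategy == "fixed_sleep":
            clock.sleep(1.0)  # full delay added on top of the round trip
    return done


if __name__ == "__main__":
    for strategy in ("fixed_sleep", "interval"):
        print(strategy, simulate(3600, 0.25, strategy))
```

With the assumed 0.25 s round trip, the fixed-sleep loop completes 2,880 requests in a simulated hour, while the interval-based loop reaches the full 3,600 that the limit allows. A smaller round trip narrows the gap, which matches the 80-85k versus 86,400 per day estimate above.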
20:11 🔗 _2tl also, is it planned to add more projects, like bringing back big shorteners (bit.ly, is.gd, tinyurl ...) and running them in parallel, so we can maximize the scraping power?
20:14 🔗 _2tl today the focus is only on y.ahoo.it, but if I understand correctly, by increasing the --concurrent number we could scrape more URLs without slowing down the current y.ahoo.it scraping
20:15 🔗 xmc that sounds right
20:18 🔗 aaaaaaaaa they've done others in the past including 3 (might have been 2) at the same time.
20:21 🔗 _2tl yes they did it, I remember too.
20:31 🔗 aaaaaaaaa To be clear, I was referring to the ToTT grabber; so, I think it is just a matter of time and reverse engineering more trackers.
20:37 🔗 _2tl what's the ToTT grabber?
20:38 🔗 aaaaaaaaa terror of tiny town, I was trying to save myself from having to type it.
20:38 🔗 aaaaaaaaa that attempt failed
20:38 🔗 xmc heh
20:38 🔗 _2tl sorry :)
20:39 🔗 _2tl by the way, the /topic refers to another tracker: http://argonath.db48x.net/ - should it be used too?
