Time |
Nickname |
Message |
18:40
|
_2tl |
hi |
20:00
|
_2tl |
I don't want to sound negative, but it seems the delay between 2 requests is not handled the same way in tinyback and terroroftinytown-client-grab |
20:01
|
_2tl |
in terroroftinytown-client-grab, the delay is more or less a fixed sleep() between 2 requests |
20:02
|
_2tl |
in tinyback it was a bit more precise and optimized (in my opinion): |
20:02
|
_2tl |
there was a rate limit tuple, defined here: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/services.py#L48 |
20:03
|
_2tl |
implementation was there: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/__init__.py#L132 |
20:03
|
_2tl |
the thing is, if I take is.gd for example, you can scrape 60 URLs in 1 minute, so with terroroftinytown-client-grab, the delay will be implemented as a 1s sleep |
20:04
|
_2tl |
now, think of a 1-day timeframe: with tinyback you could scrape 86,400 URLs / day |
20:05
|
_2tl |
with terroroftinytown-client-grab, you will call sleep(1) 86,400 times, but if you take into account the RTT of each URL request, maybe you only scrape 80-85k URLs |
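The difference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual code of tinyback or terroroftinytown-client-grab: the class names `FixedDelay` and `RateLimit`, the `(max_requests, period)` pair, and the simulated clock are all hypothetical. The idea is that a fixed sleep adds the round-trip time on top of every delay, while a rate limiter only sleeps for whatever is left of the interval. As a rough check on the numbers, a 1 s sleep plus an assumed 0.08 s RTT gives 86,400 / 1.08 = 80,000 requests per day. The simulation below uses an exaggerated 0.25 s RTT over one minute so the gap is easy to see.

```python
import time


class FixedDelay:
    """Sleep a constant amount before every request, ignoring request time."""

    def __init__(self, delay, sleep=time.sleep):
        self.delay = delay
        self.sleep = sleep

    def wait(self):
        self.sleep(self.delay)


class RateLimit:
    """Allow at most `max_requests` per `period` seconds, spaced evenly."""

    def __init__(self, max_requests, period,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = period / max_requests
        self.clock = clock
        self.sleep = sleep
        self._next_start = float("-inf")  # first request may start at once

    def wait(self):
        now = self.clock()
        if now < self._next_start:
            # Only sleep for what is left of the interval; time already
            # spent on the previous request counts toward it.
            self.sleep(self._next_start - now)
            now = self._next_start
        self._next_start = now + self.interval


class FakeClock:
    """Simulated clock so the comparison runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def count_requests(make_pacer, duration, rtt):
    """Count simulated requests that complete within `duration` seconds."""
    clock = FakeClock()
    pacer = make_pacer(clock)
    done = 0
    while True:
        pacer.wait()
        if clock.now + rtt > duration:
            return done
        clock.sleep(rtt)  # stand-in for the HTTP round trip
        done += 1


# One simulated minute, 60 requests/minute allowed, 0.25 s RTT (assumed).
fixed = count_requests(lambda c: FixedDelay(1.0, sleep=c.sleep), 60, 0.25)
paced = count_requests(
    lambda c: RateLimit(60, 60, clock=c.time, sleep=c.sleep), 60, 0.25)
print(fixed, paced)  # 48 60
```

With the real `time.monotonic` and `time.sleep` defaults, `RateLimit(60, 60).wait()` would be called before each request; the fake clock is only there to make the comparison quick to run.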
20:07
|
xmc |
hm, good point |
20:11
|
_2tl |
also, is it planned to add more projects, like adding back big shorteners such as bit.ly, is.gd, tinyurl ... and run them in parallel, so we can maximize the scraping power? |
20:14
|
_2tl |
today the focus is only on y.ahoo.it, but if I understand correctly, by increasing the --concurrent number, we could scrape more URLs without slowing down the current y.ahoo.it scraping |
20:15
|
xmc |
that sounds right |
20:18
|
aaaaaaaaa |
they've done others in the past including 3 (might have been 2) at the same time. |
20:21
|
_2tl |
yes they did it, I remember too. |
20:31
|
aaaaaaaaa |
To be clear, I was referring to the ToTT grabber; so, I think it is just a matter of time and reverse engineering more trackers. |
20:37
|
_2tl |
what's the ToTT grabber? |
20:38
|
aaaaaaaaa |
terror of tiny town, I was trying to save myself from having to type it. |
20:38
|
aaaaaaaaa |
that attempt failed |
20:38
|
xmc |
heh |
20:38
|
_2tl |
sorry :) |
20:39
|
_2tl |
by the way, the /topic refers to another tracker, http://argonath.db48x.net/ - should it be used too? |