#urlteam 2013-05-15,Wed

Time Nickname Message
18:00 🔗 soultcer chfoo: Did you have any trouble with bans from visibli? When I tried it on a server, it got banned and the ban never expires. I didn't add any more tasks to the server for visibli since I don't want to get users banned.
18:47 🔗 chfoo soultcer: i haven't gotten any bans since my last commit to fix up the script. right now i'm grabbing the hex codes at a very slow rate through tor and using a list of common user agents.
18:50 🔗 chfoo the delay between requests is based on the sine function and it averages a little above 1 url per second
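A sine-based request delay like the one chfoo describes could be sketched as follows. This is only an illustration: the log does not give the actual formula, so `sine_delay`, `crawl`, and the `base`, `amplitude`, and `period` values are all hypothetical. With a base of 0.9 seconds, the delay averages 0.9 seconds over a full period, which is roughly 1.1 URLs per second if fetch time is ignored.

```python
import math
import time


def sine_delay(i, base=0.9, amplitude=0.5, period=50):
    """Delay in seconds before the next request (hypothetical formula).

    The delay oscillates around `base`; averaged over one full period
    of `period` requests it equals `base`.
    """
    return base + amplitude * math.sin(2 * math.pi * i / period)


def crawl(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, sleeping a sine-modulated delay after each.

    `fetch` is supplied by the caller (assumed, not from the log), so the
    sketch makes no claim about how the real script talks to the service.
    """
    results = []
    for i, url in enumerate(urls):
        results.append(fetch(url))
        sleep(sine_delay(i))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.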
18:53 🔗 soultcer Nice
18:58 🔗 chfoo i have no idea how to grab the regular shortcodes at a practical rate without causing noticeable traffic though
22:22 🔗 PepsiMax larger botnet
22:46 🔗 chfoo the number of machines doesn't really matter. it's more like salami slicing.
22:47 🔗 chfoo the goal is to get all the urls using the least amount of traffic for both parties
22:54 🔗 chfoo an idea i have so far is to search through google, twitter etc for valid shortcodes, since visibli has an automated banning system
22:56 🔗 chfoo it might be possible that the shortcodes aren't random either
22:57 🔗 omf_ chfoo, some work has been done in this. Some of the url shorteners we just increment and test
22:57 🔗 omf_ I thought we were scraping twitter for urls but I was told that isn't happening. I know we have the twitter data to do it
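The "increment and test" approach omf_ mentions can be illustrated with a small counter-to-shortcode encoder. This is a minimal sketch under stated assumptions: the base-62 alphabet and its ordering, and the names `ALPHABET`, `encode`, and `candidate_shortcodes`, are hypothetical and not taken from any particular shortener.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n, alphabet=ALPHABET):
    """Convert a non-negative integer to a shortcode in the given alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    chars = []
    while n:
        n, remainder = divmod(n, base)
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def candidate_shortcodes(start, count):
    """Yield `count` consecutive shortcodes beginning at counter value `start`."""
    for n in range(start, start + count):
        yield encode(n)
```

For example, `list(candidate_shortcodes(9, 3))` gives `['9', 'a', 'b']`; each candidate would then be tested against the shortener to see whether it resolves.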
23:02 🔗 chfoo i haven't experimented much with their modern shortcode system yet so it's just a black box to me right now.
23:03 🔗 omf_ the main page and wiki have a lot of info we put up
23:04 🔗 omf_ Also some shorteners are just aliases to other services which means the hashes are the same
23:04 🔗 omf_ which is nice
23:22 🔗 chfoo oh cool, i didn't know about the twitter data archives before.
23:24 🔗 chfoo for now i'll keep grabbing visibli hex. then when i have more time, i'll investigate a bit more with normal visibli.