#urlteam 2013-07-28,Sun


Time Nickname Message
09:09 🔗 omf_ from the main channel
09:09 🔗 omf_ <cathalgar> Have a userscript project I was told ye might like: https://gitorious.org/cguserscripts/unbitly
09:09 🔗 omf_ <cathalgar> Hello all: looking to talk to someone in UrlTeam? :)
09:09 🔗 omf_ <cathalgar> Online API keeps a cache of recent URLs, but I could come up with something cleaner if it could be integrated with UrlTeam's efforts, for a more permanent backup of cached URLs: https://cathalgarvey.pythonanywhere.com/unbitly/dump
09:09 🔗 omf_ <cathalgar> It was designed as a privacy shiv, not an archivist solution, y'see.
09:10 🔗 omf_ I think this could be useful for discovery of more bitly urls
09:21 🔗 cathalgar Hello all!
09:24 🔗 cathalgar So, delighted to learn of UrlTeam: was pointed to it by someone after writing a bitly circumvention script
09:25 🔗 cathalgar Wondering what, if anything, I can do to make the webcache end of the system more useful to the urlteam effort?
09:25 🔗 cathalgar At present it's bitly only, but if I can convince a bunch of people to use it, it might be a nice constant means of harvesting real URLs without having to guess URLs (tipping off service to scrapers?)
09:26 🔗 cathalgar And it's got a selling point for participants: it removes click-tracking so users not only assist urlteam, they get more privacy. :)
09:35 🔗 xmc very tempting thought
09:37 🔗 cathalgar If not a re-hack of my script, it's at least a thought for urlteam efforts: userscripts let you make cross-domain requests..
09:37 🔗 cathalgar ..so a userscript could be used to harvest real URLs and post them to an UrlTeam API trivially enough.
09:38 🔗 cathalgar Of course it's best for the API to do resolution of that url itself, to prevent people spoofing, but it cuts down the keyspace for trusted urlteam scrapers at least.
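[The submit-then-verify flow described above could be sketched like this on the API side: never trust the client's claimed long URL, but re-resolve the stub against the shortener itself. A minimal sketch, assuming bit.ly-style single-hop redirects; the function names are illustrative, not part of any real UrlTeam API.]

```python
# Sketch: server-side verification of a submitted (short URL, long URL) pair.
# The API resolves the stub itself via one HEAD request and compares the
# Location header against what the client claimed, preventing spoofing.
import http.client
from urllib.parse import urlsplit

def shortcode(short_url):
    """Extract the shortener code from a submitted short URL."""
    return urlsplit(short_url).path.lstrip('/')

def resolve(code, host='bit.ly'):
    """Ask the shortener itself where the code points (one redirect hop)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    conn.request('HEAD', '/' + code)
    location = conn.getresponse().getheader('Location')
    conn.close()
    return location

def verify(short_url, claimed_long_url):
    """Accept a submission only if the shortener agrees with the client."""
    return resolve(shortcode(short_url)) == claimed_long_url
```

[Even with verification, harvesting from users still helps as cathalgar notes: it narrows the keyspace the trusted scrapers have to brute-force.]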
09:48 🔗 xmc right
10:00 🔗 cathalgar Thinking of writing a script that scrapes all of Twitter's "most popular accounts" list for likely-to-be-big short urls..
10:00 🔗 cathalgar https://twitter.com/who_to_follow/interests
10:01 🔗 cathalgar Few hundred accounts there, totally doable. Probably higher traffic than feasible using a dedicated account and API though
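[The harvesting step for a scraper like the one proposed above might look like the sketch below: pull short-URL stubs out of fetched timeline HTML with a regex. How the pages are fetched is left out, and the shortener domains listed are illustrative examples, not a definitive list.]

```python
# Sketch: extract candidate short URLs from a fetched page of HTML.
# Deduplicates and sorts so repeated mentions are only submitted once.
import re

# Illustrative set of shortener hosts; extend as needed.
SHORT_URL = re.compile(r'https?://(?:bit\.ly|j\.mp|t\.co)/[A-Za-z0-9]+')

def extract_short_urls(html):
    """Return the unique short URLs found in a block of HTML or text."""
    return sorted(set(SHORT_URL.findall(html)))
```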
10:02 🔗 xmc hm, sure
10:37 🔗 cathalgar ..so is there any documentation out there on how to submit to UrlTeam? Is there an API somewhere I can use?
10:37 🔗 cathalgar Likewise for fetching, though if the dumps are too large I can't host on pythonanywhere..
10:42 🔗 omf_ http://urlte.am/ has torrent info and file lists
10:42 🔗 omf_ as well as data formats, etc..
10:56 🔗 cathalgar But there's no API I can direct a webapp to, at present?
10:56 🔗 cathalgar i.e., users request stubs, webapp resolves them, then directs stubs and resolved URLs to an URLteam archive API?
11:14 🔗 omf_ the tracker is currently offline while we migrate it
11:14 🔗 omf_ let me grab the github url so you can see the interface
11:14 🔗 omf_ We tend to build stuff way faster than we document it
11:15 🔗 omf_ https://github.com/ArchiveTeam/tinyarchive
11:15 🔗 omf_ https://github.com/ArchiveTeam/tinyback
15:51 🔗 ersi cathalgar: There's no API, no. Not currently at least :)
16:13 🔗 cathalgar Hm
16:13 🔗 cathalgar Cool, thanks
16:13 🔗 cathalgar GTG, talk soon perhaps! :)
16:14 🔗 cathalgar If you guys *do* want to scrape what URLs pop up on my webapp, you can get a database dump at cathalgarvey.pythonanywhere.com/unbitly/dump but it rolls over at ~1000 at present, dropping least recent URLs.
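[Because the dump rolls over at ~1000 entries and drops the least recent URLs, a consumer would need to poll it and merge into a durable local store before entries fall off. A minimal sketch of that poller; the dump's actual format is not documented here, so the line format assumed below (one "short long" pair per line) is a guess to be adjusted against the real output.]

```python
# Sketch: poll the rolling dump and accumulate pairs locally so that
# entries dropped from the ~1000-item cache are not lost.
import time
import urllib.request

DUMP_URL = 'https://cathalgarvey.pythonanywhere.com/unbitly/dump'

def merge_dump(text, store):
    """Merge one dump (assumed 'short long' per line) into store.

    Returns the number of previously unseen pairs."""
    new = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        short, _, long_url = line.partition(' ')
        if short not in store:
            store[short] = long_url
            new += 1
    return new

def poll(store, interval=3600):
    """Fetch and merge the dump once per interval, forever."""
    while True:
        text = urllib.request.urlopen(DUMP_URL).read().decode('utf-8')
        merge_dump(text, store)
        time.sleep(interval)
```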
16:14 🔗 cathalgar TTYL
16:45 🔗 ersi neat
