09:09 <omf_> from the main channel
09:09 <omf_> <cathalgar> Have a userscript project I was told ye might like: https://gitorious.org/cguserscripts/unbitly
09:09 <omf_> <cathalgar> Hello all: looking to talk to someone in UrlTeam? :)
09:09 <omf_> <cathalgar> Online API keeps a cache of recent URLs, but I could come up with something cleaner if it could be integrated with UrlTeam's efforts, for a more permanent backup of cached URLs: https://cathalgarvey.pythonanywhere.com/unbitly/dump
09:09 <omf_> <cathalgar> It was designed as a privacy shiv, not an archivist solution, y'see.
09:10 <omf_> I think this could be useful for discovery of more bitly urls
09:21 <cathalgar> Hello all!
09:24 <cathalgar> So, delighted to learn of UrlTeam: was pointed to it by someone after writing a bitly circumvention script
09:25 <cathalgar> Wondering what, if anything, I can do to make the webcache end of the system more useful to the urlteam effort?
09:25 <cathalgar> At present it's bitly only, but if I can convince a bunch of people to use it, it might be a nice constant means of harvesting real URLs without having to guess URLs (tipping off service to scrapers?)
09:26 <cathalgar> And it's got a selling point for participants: it removes click-tracking, so users not only assist urlteam, they get more privacy. :)
09:35 <xmc> very tempting thought
09:37 <cathalgar> If not a re-hack of my script, it's at least a thought for urlteam efforts: userscripts let you make cross-domain requests..
09:37 <cathalgar> ..so a userscript could be used to harvest real URLs and post them to an UrlTeam API trivially enough.
09:38 <cathalgar> Of course it's best for the API to do resolution of that url itself, to prevent people spoofing, but it cuts down the keyspace for trusted urlteam scrapers at least.
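[The anti-spoofing check described here — the archive API resolving each submitted shortcode itself rather than trusting the client's claimed URL — could be sketched as follows. This is a minimal illustration, not UrlTeam's actual interface; all function names are hypothetical, and the only assumption is bit.ly's standard redirect behaviour.]

```python
# Sketch of the spoofing defence described above: the server resolves each
# submitted shortcode itself instead of trusting the client's claimed URL.
# All names here are illustrative, not UrlTeam's actual interface.
import http.client

def resolve_shortcode(code):
    """Ask bit.ly where a shortcode points, without following the redirect."""
    conn = http.client.HTTPSConnection("bit.ly", timeout=10)
    try:
        conn.request("HEAD", "/" + code)
        resp = conn.getresponse()
        if resp.status in (301, 302):
            return resp.getheader("Location")
        return None  # unknown or removed shortcode
    finally:
        conn.close()

def accept_submission(code, claimed_url):
    """Store the mapping only if our own resolution matches the client's claim."""
    resolved = resolve_shortcode(code)
    return resolved is not None and resolved == claimed_url
```

Client submissions then only narrow the keyspace; nothing enters the archive that the server has not verified itself.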
09:48 <xmc> right
10:00 <cathalgar> Thinking of writing a script that scrapes all of Twitter's "most popular accounts" list for likely-to-be-big short urls..
10:00 <cathalgar> https://twitter.com/who_to_follow/interests
10:01 <cathalgar> Few hundred accounts there, totally doable. Probably higher traffic than feasible using a dedicated account and API though
10:02 <xmc> hm, sure
10:37 <cathalgar> ..so is there any documentation out there on how to submit to UrlTeam? Is there an API somewhere I can use?
10:37 <cathalgar> Likewise for fetching, though if the dumps are too large I can't host on pythonanywhere..
10:42 <omf_> http://urlte.am/ has torrent info and file lists
10:42 <omf_> as well as data formats, etc..
10:56 <cathalgar> But there's no API I can direct a webapp to, at present?
10:56 <cathalgar> i.e., users request stubs, webapp resolves them, then directs stubs and resolved URLs to an UrlTeam archive API?
11:14 <omf_> the tracker is currently offline while we migrate it
11:14 <omf_> let me grab the github url so you can see the interface
11:14 <omf_> We tend to build stuff way faster than we document it
11:15 <omf_> https://github.com/ArchiveTeam/tinyarchive
11:15 <omf_> https://github.com/ArchiveTeam/tinyback
15:51 <ersi> cathalgar: There's no API, no. Not currently at least :)
16:13 <cathalgar> Hm
16:13 <cathalgar> Cool, thanks
16:13 <cathalgar> GTG, talk soon perhaps! :)
16:14 <cathalgar> If you guys *do* want to scrape what URLs pop up on my webapp, you can get a database dump at cathalgarvey.pythonanywhere.com/unbitly/dump but it rolls over at ~1000 at present, dropping least recent URLs.
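[The ~1000-entry rollover described here amounts to a bounded cache that evicts the oldest shortcode first. A minimal model of that behaviour, assuming oldest-first eviction as described — this is a guess at the semantics, not the actual unbitly code:]

```python
# Illustrative model of the dump's rollover behaviour: a cache capped at
# ~1000 entries that drops the least recently added URL when full.
# Hypothetical sketch only, not the actual unbitly implementation.
from collections import OrderedDict

class RollingCache:
    def __init__(self, maxlen=1000):
        self.maxlen = maxlen
        self.entries = OrderedDict()  # shortcode -> resolved URL

    def add(self, code, url):
        self.entries.pop(code, None)  # re-adding a code refreshes its recency
        self.entries[code] = url
        while len(self.entries) > self.maxlen:
            self.entries.popitem(last=False)  # evict the oldest entry

    def dump(self):
        return dict(self.entries)
```

A harvester that polls the dump faster than 1000 new URLs accumulate would therefore see every entry before it is evicted.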
16:14 <cathalgar> TTYL
16:45 <ersi> neat