[02:24] omf_ firehose or ?
[02:25] spritzer and searches. Very few groups have access to the full firehose
[02:36] omf_ sure, costs $ from gnip i guess. are there any estimates of how complete the coverage is using spritzer?
[02:36] presumably popular urls get covered, but lots don't, right?
[02:37] it also depends on the shortener
[02:37] some we just increment the value and find more urls
[02:38] From my own observations, more and more companies are using a shortener that is just an alias to bitly
[02:38] yeah
[02:38] We are always looking for new ways to discover urls
[02:39] hmm, bit.ly custom domains are cnames, right?
[02:39] no idea
[02:39] just look one up from our list on the wiki
[02:40] it looks like they are actually A records to a bit.ly ip block
[02:40] j.mp -> A 69.58.188.45
[02:40] just saying, that may be a good way to tell that it is indeed bit.ly
[02:41] the fastest is to just replace their domain name with bit.ly and try to load the page
[02:41] it uses the same hash value pool
[02:41] oh, huh
[02:43] yep, simple for us
[02:43] so i know python and webby stuff.
[02:43] is there a todo or issues list somewhere?
[02:44] the always ongoing need to run the warrior on urlteam is the only task I know of. You could check the github repos, let me link you
[02:45] 'if ps.hostname in bitly_pro_hosts or ps.hostname in ["bit.ly", "j.mp", "bitly.com"]:'
[02:45] heh
[02:46] https://github.com/ArchiveTeam/urlteam-stuff https://github.com/chronomex/urlteam https://github.com/ArchiveTeam/tinyback
[02:47] cool, thanks for the pointers
[02:48] There might be more code repos but those are the only ones I know of
[02:49] librarians would hate you guys :)
[02:50] beautiful mess.
[02:50] (meant in the best way possible)
[02:51] librarians are always up our ass
[02:51] Fuck them, they never contribute shit
[02:52] "we cannot use this because it does not have metadata." The common response: metadata comes later. If they do not "get" that, then I usually follow up with: your bitching does not help this process
[02:52] bunch of sheltered fuckers
[02:53] as i said earlier, i'm glad you exist.
[02:53] I am glad to have found this group so I could help
[02:53] i think some day librarians may realize that access to knowledge has little to do with books on shelves.
[02:54] until then, a pirates life for me
[02:54] yeah, in a decade or two
[02:54] all I ever hear out of libraries is how their budget got cut or how they are lucky to have kept access to some shitty web database
[02:55] It is disgusting how behind the tech curve the whole field is. I, as a non-academic, can release papers all I want. I can include the code and data so people can reproduce my results
[02:56] and there is nothing they can do about it. The closed circle of academic papers is being beaten down. I love this because I was told for years I could never break in since I am a slacker
[02:57] now they are the old fossils
[02:57] take urlteam for example
[02:57] you could download the full url set and start a search engine with that
[02:57] or study what is popular
[02:58] bit.ly studies it and i think does publish :P
[02:58] or anything you can think of, for free. That is the gift these kinds of save-the-world projects hand out
[02:58] but for sure, i think the inability to integrate tech into society as quickly as tech is changing.. it's a critical problem for society.
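A minimal sketch in Python of the domain-swap check described at [02:41], assuming the `requests` library. The host names and short code used here are hypothetical examples, not values taken from the log:

```python
import requests

def is_bitly_alias(custom_host, code):
    """Heuristic from the chat above: request the same short code on the
    custom domain and on bit.ly itself. If both redirect to the same long
    URL, the custom domain very likely shares bit.ly's hash value pool."""
    try:
        custom = requests.head(f"http://{custom_host}/{code}",
                               allow_redirects=False, timeout=10)
        bitly = requests.head(f"http://bit.ly/{code}",
                              allow_redirects=False, timeout=10)
    except requests.RequestException:
        return False
    return (custom.status_code in (301, 302)
            and custom.headers.get("Location") is not None
            and custom.headers.get("Location") == bitly.headers.get("Location"))

# Hypothetical usage; "abc123" is a made-up short code.
print(is_bitly_alias("j.mp", "abc123"))
```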
[02:59] How come people are so quick to use the newest smartphone but then go back to fucking crap spreadsheets for data
[02:59] how does tracker deployment work? who has the keys?
[02:59] driver vs. mechanic.
[02:59] they don't see that it's a tool for their use.
[02:59] yeah
[03:00] for this I am guessing swebb, soultcer and maybe alard. Alard sets up the trackers for the other projects
[03:01] I think it is more the human-nature problem of people resisting change and not wanting to learn new things
[03:05] Every time I use newish tech to save a company money, they always ask themselves why they didn't do that before.
[03:05] Like taking a paperwork process that took 3 days and automating it into a report that takes 90 seconds to generate
[09:35] For the record: soultcer runs tracker.tinyarchive.org and has done most/all of the tinyarchive (tracker) and tinyback (client) repositories
[09:37] And no, none of the work in the URLTeam tinyarchive/tinyback combo uses the twitter data - it's all generated short codes, divided into chunks and distributed to the workers to look up.
[09:37] swebb is however slurping twitter data and unrolling shortener links - that data gets shared and aggregated into the tinyarchive data dumps released every year (or at least into this latest dump, if I'm not mistaken)
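A rough illustration of the generate-and-chunk scheme mentioned at [09:37]. The alphabet, code length, and chunk size below are assumptions for the sake of the sketch, not tinyback's actual parameters:

```python
import itertools
import string

# Assumed base-62 alphabet; many shorteners use this, but the real
# per-shortener alphabets live in the tinyback service definitions.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def generate_codes(length):
    """Yield every possible short code of a given length, in order."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)

def chunks(codes, size):
    """Split the code stream into fixed-size work units for the workers."""
    it = iter(codes)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

# Each worker would receive one chunk, look up every code against the
# shortener, and report back which codes resolve to a long URL.
for i, unit in enumerate(chunks(generate_codes(2), 1000)):
    print(f"chunk {i}: {unit[0]} .. {unit[-1]} ({len(unit)} codes)")
```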