#urlteam 2013-04-23,Tue


Time Nickname Message
02:24 🔗 jdunck omf_ firehose or ?
02:25 🔗 omf_ spritzer and searches. Very few groups have access to the full firehose
02:36 🔗 jdunck omf_ sure, costs $ from gnip i guess. are there any estimates of how complete the coverage is using spritzer?
02:36 🔗 jdunck presumably popular urls get covered, but lots don't, right?
02:37 🔗 omf_ it also depends on the shortener
02:37 🔗 omf_ some we just increment the value and find more urls
02:38 🔗 omf_ From my own observations, more and more companies are using a shortener that is just an alias to bitly
02:38 🔗 jdunck yeah
02:38 🔗 omf_ We are always looking for new ways to discover urls
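"Some we just increment the value and find more urls" describes enumerating a shortener's sequential code space. Below is a minimal sketch of that idea. It assumes a base-62 alphabet in the order 0-9, a-z, A-Z. That alphabet and its ordering are assumptions, since each shortener picks its own. The names `decode`, `encode` and `next_codes` are hypothetical and do not come from the urlteam repositories.

```python
import string

# Assumed alphabet: many shorteners use [0-9a-zA-Z], but the order differs per service.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def decode(code: str) -> int:
    """Interpret a short code as a base-62 number."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def encode(n: int) -> str:
    """Turn a non-negative integer back into a short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def next_codes(start: str, count: int):
    """Yield `count` consecutive codes, beginning with `start`."""
    n = decode(start)
    for i in range(count):
        yield encode(n + i)


if __name__ == "__main__":
    # Each of these would then be requested against the shortener.
    print(list(next_codes("ZZ", 2)))
```

One simplification: codes with leading "zero" characters (e.g. "0a" vs "a") collapse to the same integer here. A real enumerator would have to walk each code length separately.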
02:39 🔗 jdunck hmm, bit.ly custom domains are CNAMEs, right?
02:39 🔗 omf_ no idea
02:39 🔗 omf_ just look one up from our list on the wiki
02:40 🔗 jdunck it looks like they are actually A records to a bit.ly IP block
02:40 🔗 jdunck j.mp -> A 69.58.188.45
02:40 🔗 jdunck just saying, that may be a good way to tell that it is indeed bit.ly
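jdunck's heuristic is to resolve a domain's A record and check whether it lands in bit.ly's address range. Here is a sketch of that check. The log only shows j.mp resolving to 69.58.188.45 (in 2013). Treating the whole surrounding /24 as "bit.ly" is an assumption, not a published range, and the addresses have very likely changed since. `BITLY_NETWORKS`, `ip_is_bitly` and `host_looks_like_bitly` are hypothetical names.

```python
import ipaddress
import socket

# Assumption: the /24 around the j.mp address from this log belongs to bit.ly.
BITLY_NETWORKS = [ipaddress.ip_network("69.58.188.0/24")]


def ip_is_bitly(ip: str) -> bool:
    """True if an IPv4 address falls inside one of the assumed bit.ly networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BITLY_NETWORKS)


def host_looks_like_bitly(hostname: str) -> bool:
    """Resolve the hostname's A record and test it against the assumed range."""
    try:
        ip = socket.gethostbyname(hostname)  # IPv4 only
    except socket.gaierror:
        return False
    return ip_is_bitly(ip)
```

The pure `ip_is_bitly` check is kept separate from the DNS lookup so the range logic can be tested without network access.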
02:41 🔗 omf_ the fastest is to just replace their domain name with bit.ly and try and load the page
02:41 🔗 omf_ it uses the same hash value pool
02:41 🔗 jdunck oh, huh
02:43 🔗 omf_ yep simple for us
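omf_'s shortcut is to replace a bit.ly custom domain with bit.ly and load the page, since both share the same hash pool. The sketch below rewrites the host and then asks for the redirect target. It assumes that bit.ly answers a HEAD request with a redirect whose `Location` header is the long URL. The helper names and the example domain `on.example.com` are hypothetical.

```python
import http.client
from urllib.parse import urlsplit, urlunsplit


def to_bitly(short_url: str) -> str:
    """Swap a (custom) short domain for bit.ly, keeping path, query and fragment."""
    parts = urlsplit(short_url)
    if not parts.netloc:
        raise ValueError("expected an absolute URL such as http://j.mp/abc123")
    return urlunsplit((parts.scheme, "bit.ly", parts.path, parts.query, parts.fragment))


def resolve(short_url: str):
    """Send a HEAD request and return the Location header (None if absent).

    Redirects are not followed, so the first hop is what gets recorded.
    """
    parts = urlsplit(short_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()
```

Usage would be `resolve(to_bitly("http://on.example.com/xYz"))`. The rewrite step is pure string handling, and only `resolve` touches the network.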
02:43 🔗 jdunck so i know python and webby stuff.
02:43 🔗 jdunck is there a todo or issues list somewhere?
02:44 🔗 omf_ the always ongoing need to run the warrior on urlteam is the only task I know of. You could check the github repos, let me link you
02:45 🔗 jdunck 'if ps.hostname in bitly_pro_hosts or ps.hostname in ["bit.ly", "j.mp", "bitly.com"]:'
02:45 🔗 jdunck heh
02:46 🔗 omf_ https://github.com/ArchiveTeam/urlteam-stuff https://github.com/chronomex/urlteam https://github.com/ArchiveTeam/tinyback
02:47 🔗 jdunck cool, thanks for pointers
02:48 🔗 omf_ There might be more code repos but those are the only ones I know of
02:49 🔗 jdunck librarians would hate you guys :)
02:50 🔗 jdunck beautiful mess.
02:50 🔗 jdunck (meant in the best way possible)
02:51 🔗 omf_ librarians are always up our ass
02:51 🔗 omf_ Fuck them, they never contribute shit
02:52 🔗 omf_ "We cannot use this because it does not have metadata." The common response: metadata comes later. If they do not "get" that then I usually follow up with: your bitching does not help this process
02:52 🔗 omf_ bunch of sheltered fuckers
02:53 🔗 jdunck as i said earlier, i'm glad you exist.
02:53 🔗 omf_ I am glad to have found this group so I could help
02:53 🔗 jdunck i think some day librarians may realize that access to knowledge has little to do with books on shelves.
02:54 🔗 jdunck until then, a pirate's life for me
02:54 🔗 omf_ yeah in a decade or two
02:54 🔗 omf_ all I ever hear out of libraries is how their budget got cut or how they are lucky to have kept access to some shitty web database
02:55 🔗 omf_ It is disgusting how behind the tech curve the whole field is. I, as a non-academic, can release papers all I want. I can include the code and data so people can reproduce my results
02:56 🔗 omf_ and there is nothing they can do about it. The closed circle of academia papers is being beaten down. I love this because I was told for years I could never break in since I am a slacker
02:57 🔗 omf_ now they are the old fossils
02:57 🔗 omf_ take urlteam for example
02:57 🔗 omf_ you could download the full url set and start a search engine with that
02:57 🔗 omf_ or study what is popular
02:58 🔗 jdunck bit.ly studies it and i think does publish :P
02:58 🔗 omf_ or anything you can think of, for free. That is the gift these kinds of save-the-world projects hand out
02:58 🔗 jdunck but for sure, i think the inability to integrate tech into society as quickly as tech is changing.. it's a critical problem for society.
02:59 🔗 omf_ How come people are so quick to use the newest tech smart phone but they go back to fucking crap spreadsheets for data?
02:59 🔗 jdunck how does tracker deployment work? who has the keys?
02:59 🔗 jdunck driver vs. mechanic.
02:59 🔗 jdunck they don't see that it's a tool for their use.
02:59 🔗 omf_ yeah
03:00 🔗 omf_ for this I am guessing swebb, soultcer and maybe alard. Alard sets up the trackers for the other projects
03:01 🔗 omf_ I think it is more the human nature problem of people resisting change and not wanting to learn new things
03:05 🔗 omf_ Every time I use newish tech to save a company money, they always ask themselves why they didn't do that before.
03:05 🔗 omf_ Like taking a paperwork process that took 3 days and automating it into a report that takes 90 seconds to generate
09:35 🔗 ersi For the record: soultcer runs tracker.tinyarchive.org and has done most/all of tinyarchive (tracker)/tinyback (client) repositories
09:37 🔗 ersi And no, none of the work in the URLTeam tinyarchive/back combo uses the twitter data - it's all generated short codes, divided into chunks and distributed to the workers to look up.
09:37 🔗 ersi swebb is however slurping twitter data and unrolling shortener links - that data gets shared and aggregated into the tinyarchive data dumps released every year (or at least in this latest dump if I'm not mistaken)
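ersi describes the tinyarchive/tinyback split as generated short codes, divided into chunks and handed to workers. The sketch below shows that general pattern over integer code indices (a worker would map each index to a short code). It does not reflect the actual tinyarchive tracker protocol or data model. `Chunk`, `make_chunks` and `Tracker` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int  # first code index, inclusive
    stop: int   # last code index, exclusive


def make_chunks(start: int, stop: int, size: int):
    """Split the index range [start, stop) into chunks of at most `size`."""
    for lo in range(start, stop, size):
        yield Chunk(lo, min(lo + size, stop))


class Tracker:
    """Hands out chunks to workers and tracks who holds what (hypothetical)."""

    def __init__(self, chunks):
        self.pending = deque(chunks)  # chunks nobody is working on yet
        self.assigned = {}            # chunk -> worker currently holding it

    def get_task(self, worker: str):
        if not self.pending:
            return None
        chunk = self.pending.popleft()
        self.assigned[chunk] = worker
        return chunk

    def finish(self, chunk: Chunk):
        # The worker uploaded results for this chunk; drop the assignment.
        del self.assigned[chunk]

    def release(self, chunk: Chunk):
        # The worker gave up or timed out; requeue the chunk at the front.
        del self.assigned[chunk]
        self.pending.appendleft(chunk)
```

Keeping the chunk as a plain index range means the tracker never needs to know a shortener's alphabet. Only the worker turns indices into codes and looks them up.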
