Time |
Nickname |
Message |
02:24
|
jdunck |
omf_ firehose or ? |
02:25
|
omf_ |
spritzer and searches. Very few groups have access to the full firehose |
02:36
|
jdunck |
omf_ sure, costs $ from gnip i guess. are there any estimates of how complete the coverage is using spritzer? |
02:36
|
jdunck |
presumably popular urls get covered, but lots don't, right? |
02:37
|
omf_ |
it also depends on the shortener |
02:37
|
omf_ |
some we just increment the value and find more urls |
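A minimal sketch of what "increment the value" could look like for a shortener with sequential codes. The alphabet and its ordering are assumptions (nothing in the log specifies them), and `next_code` is a hypothetical helper, not code from the urlteam repos.

```python
import string

# Assumed alphabet and ordering; real shorteners differ and may not be sequential at all.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def next_code(code: str, alphabet: str = ALPHABET) -> str:
    """Return the code that follows `code`, treating it as a base-N counter."""
    chars = list(code)
    i = len(chars) - 1
    while i >= 0:
        pos = alphabet.index(chars[i])
        if pos + 1 < len(alphabet):
            chars[i] = alphabet[pos + 1]
            return "".join(chars)
        chars[i] = alphabet[0]  # wrap this position and carry to the left
        i -= 1
    # Every position wrapped: grow the code by one character.
    return alphabet[1] + "".join(chars)
```

Starting from a known-good code, repeatedly calling `next_code` yields candidate codes to look up.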
02:38
|
omf_ |
From my own observations, more and more companies are using a shortener that is just an alias to bitly |
02:38
|
jdunck |
yeah |
02:38
|
omf_ |
We are always looking for new ways to discover urls |
02:39
|
jdunck |
hmm, bit.ly custom domains are cnames, right? |
02:39
|
omf_ |
no idea |
02:39
|
omf_ |
just look one up from our list on the wiki |
02:40
|
jdunck |
it looks like they are actually A records to a bit.ly ip block |
02:40
|
jdunck |
j.mp -> A 69.58.188.45 |
02:40
|
jdunck |
just saying, that may be a good way to tell that it is indeed bit.ly |
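The A-record idea could be sketched roughly like this. Only the address 69.58.188.45 (for j.mp) comes from the log; the /24 network around it is a guess, and `resolves_to_bitly` is a hypothetical helper. The resolver is injectable so the check can be tried without touching DNS.

```python
import ipaddress
import socket

# Hypothetical block: only 69.58.188.45 appears in the chat; the /24 is a guess.
ASSUMED_BITLY_NETWORKS = [ipaddress.ip_network("69.58.188.0/24")]


def resolves_to_bitly(hostname, resolve=socket.gethostbyname,
                      networks=ASSUMED_BITLY_NETWORKS):
    """Return True if the hostname's IPv4 address falls inside an assumed network."""
    try:
        address = ipaddress.ip_address(resolve(hostname))
    except (OSError, ValueError):
        return False  # lookup failed or returned something unparsable
    return any(address in network for network in networks)
```

With the default resolver, `resolves_to_bitly("j.mp")` performs a live lookup, so the answer depends on current DNS and on whether the assumed network is still accurate.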
02:41
|
omf_ |
the fastest is to just replace their domain name with bit.ly and try and load the page |
02:41
|
omf_ |
it uses the same hash value pool |
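The domain swap omf_ describes might look like the sketch below: keep the hash, replace the host with bit.ly, then try to load the result. `as_bitly_url` is a hypothetical helper and the hash `abc123` is made up; the function expects a full URL including the scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def as_bitly_url(short_url: str) -> str:
    """Rewrite a short URL so it points at bit.ly, keeping path and query unchanged."""
    parts = urlsplit(short_url)
    # Drop the original host and any fragment; the hash lives in the path.
    return urlunsplit((parts.scheme, "bit.ly", parts.path, parts.query, ""))


print(as_bitly_url("http://j.mp/abc123"))
```

If the rewritten URL redirects to the same target as the original, the custom domain is very likely a bitly alias.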
02:41
|
jdunck |
oh, huh |
02:43
|
omf_ |
yep simple for us |
02:43
|
jdunck |
so i know python and webby stuff. |
02:43
|
jdunck |
is there a todo or issues list somewhere? |
02:44
|
omf_ |
the always ongoing need to run the warrior on urlteam is the only task I know of. You could check the github repos, let me link you |
02:45
|
jdunck |
'if ps.hostname in bitly_pro_hosts or ps.hostname in ["bit.ly", "j.mp", "bitly.com"]:' |
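For context, the quoted condition reads as if `ps` were the result of `urlparse`. The sketch below assumes that; the contents of `bitly_pro_hosts` are a hypothetical stand-in (the real list is not shown in the log), and `is_bitly_url` is an illustrative wrapper.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the list of bitly pro (custom) domains.
bitly_pro_hosts = {"custom.example"}


def is_bitly_url(url: str) -> bool:
    """Return True if the URL's host is bit.ly itself or a known alias."""
    ps = urlparse(url)  # ps.hostname is lowercased by urlparse
    return ps.hostname in bitly_pro_hosts or ps.hostname in ["bit.ly", "j.mp", "bitly.com"]
```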
02:45
|
jdunck |
heh |
02:46
|
omf_ |
https://github.com/ArchiveTeam/urlteam-stuff https://github.com/chronomex/urlteam https://github.com/ArchiveTeam/tinyback |
02:47
|
jdunck |
cool, thanks for pointers |
02:48
|
omf_ |
There might be more code repos but those are the only ones I know of |
02:49
|
jdunck |
librarians would hate you guys :) |
02:50
|
jdunck |
beautiful mess. |
02:50
|
jdunck |
(meant in the best way possible) |
02:51
|
omf_ |
librarians are always up our ass |
02:51
|
omf_ |
Fuck them, they never contribute shit |
02:52
|
omf_ |
They say: "we cannot use this because it does not have metadata". The common response: metadata comes later. If they do not "get" that then I usually follow up with: your bitching does not help this process |
02:52
|
omf_ |
bunch of sheltered fuckers |
02:53
|
jdunck |
as i said earlier, i'm glad you exist. |
02:53
|
omf_ |
I am glad to have found this group so I could help |
02:53
|
jdunck |
i think some day librarians may realize that access to knowledge has little to do with books on shelves. |
02:54
|
jdunck |
until then, a pirate's life for me |
02:54
|
omf_ |
yeah in a decade or two |
02:54
|
omf_ |
all I ever hear out of libraries is how their budget got cut or how they are lucky to have kept access to some shitty web database |
02:55
|
omf_ |
It is disgusting how behind the tech curve the whole field is. I, as a non-academic, can release papers all I want. I can include the code and data so people can reproduce my results |
02:56
|
omf_ |
and there is nothing they can do about it. The closed circle of academic papers is being beaten down. I love this because I was told for years I could never break in since I am a slacker |
02:57
|
omf_ |
now they are the old fossils |
02:57
|
omf_ |
take urlteam for example |
02:57
|
omf_ |
you could download the full url set and start a search engine with that |
02:57
|
omf_ |
or study what is popular |
02:58
|
jdunck |
bit.ly studies it and i think does publish :P |
02:58
|
omf_ |
or anything you can think of, for free. That is the gift these kinds of save the world projects hand out |
02:58
|
jdunck |
but for sure, i think the inability to integrate tech into society as quickly as tech is changing.. it's a critical problem for society. |
02:59
|
omf_ |
How come people are so quick to use the newest tech smart phone but they go back to fucking crap spreadsheets for data? |
02:59
|
jdunck |
how does tracker deployment work? who has the keys? |
02:59
|
jdunck |
driver vs. mechanic. |
02:59
|
jdunck |
they don't see that it's a tool for their use. |
02:59
|
omf_ |
yeah |
03:00
|
omf_ |
for this I am guessing swebb, soultcer and maybe alard. Alard sets up the trackers for the other projects |
03:01
|
omf_ |
I think it is more the human nature problem of people resisting change and not wanting to learn new things |
03:05
|
omf_ |
Every time I use newish tech to save a company money they always ask themselves why didn't they do that before. |
03:05
|
omf_ |
Like taking a paperwork process that took 3 days and automating it into a report that takes 90 seconds to generate |
09:35
|
ersi |
For the record: soultcer runs tracker.tinyarchive.org and has done most/all of tinyarchive (tracker)/tinyback (client) repositories |
09:37
|
ersi |
And no, none of the work in the URLTeam tinyarchive/back combo uses the Twitter data - it's all generated short codes, divided in chunks and distributed to the workers to look up. |
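A rough sketch of "generated short codes, divided in chunks": number the code space, split it into ranges, and hand each range to a worker. This is not taken from tinyarchive or tinyback; the alphabet, `int_to_code`, and `make_chunks` are all assumptions for illustration.

```python
import string

# Assumed alphabet and ordering; each real shortener has its own.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def int_to_code(n: int, alphabet: str = ALPHABET) -> str:
    """Convert a non-negative integer to a short code in the given alphabet."""
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, len(alphabet))
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def make_chunks(start: int, stop: int, chunk_size: int):
    """Yield (first_code, last_code) pairs covering the integer range [start, stop)."""
    for lo in range(start, stop, chunk_size):
        hi = min(lo + chunk_size, stop) - 1
        yield int_to_code(lo), int_to_code(hi)


for first, last in make_chunks(0, 150, 62):
    print(first, last)
```

Each pair marks the inclusive bounds of one task; a worker would look up every code in between and report the targets back to the tracker.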
09:37
|
ersi |
swebb is however slurping Twitter data and unrolling shortener links - that data gets shared and gets aggregated into the tinyarchive data dumps released every year (or at least in this latest dump if I'm not mistaken) |