Time |
Nickname |
Message |
01:15
|
|
Coderjo has joined #internetarchive |
02:28
|
|
logchfoo0 starts logging #internetarchive at Fri Sep 28 02:28:59 2018 |
02:28
|
|
logchfoo0 has joined #internetarchive |
02:49
|
|
zino has quit IRC (Excess Flood) |
02:52
|
|
zino has joined #internetarchive |
03:03
|
|
Somebody2 has joined #internetarchive |
03:14
|
|
JAA has joined #internetarchive |
03:15
|
|
bakJAA sets mode: +o JAA |
03:43
|
|
odemg has quit IRC (Ping timeout: 260 seconds) |
03:55
|
|
odemg has joined #internetarchive |
04:13
|
|
tuluu has quit IRC (Read error: Connection refused) |
04:14
|
|
Frogging has quit IRC (Ping timeout: 268 seconds) |
04:15
|
|
tuluu has joined #internetarchive |
04:16
|
|
Frogging has joined #internetarchive |
04:24
|
|
tsr has quit IRC (Ping timeout: 268 seconds) |
04:27
|
|
VoynichCr has quit IRC (Ping timeout: 268 seconds) |
04:27
|
|
VoynichCr has joined #internetarchive |
04:30
|
|
tsr has joined #internetarchive |
06:40
|
|
kyounko has quit IRC (Read error: Connection reset by peer) |
12:52
|
ivan |
there's no one at IA vacuuming up all of twitter, is there? |
12:53
|
ivan |
the 'put every tweet into a warc' strategy seems a tad slow and inefficient because the search results essentially give you the tweet that you can put into some database |
13:34
|
|
atomotic has joined #internetarchive |
13:52
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
15:31
|
|
yuitimoth has quit IRC (Read error: Operation timed out) |
15:41
|
|
atomotic has joined #internetarchive |
16:31
|
DFJustin |
iirc twitter declined to work with them |
16:43
|
|
atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
17:36
|
zino |
Twitter has only offered to work with the Library of Congress, but LoC apparently didn't see the value in it, so they stopped archiving all tweets in 2017. |
17:37
|
zino |
they = LoC |
17:37
|
zino |
https://www.npr.org/sections/thetwo-way/2017/12/26/573609499/library-of-congress-will-no-longer-archive-every-tweet |
17:40
|
DFJustin |
there's a question of budget too, archiving the firehose takes $$$ |
18:50
|
zino |
Does it really though? It was firehose but without images and video. What's the bandwidth of that thing today? |
19:01
|
zino |
Assuming 228 bytes per tweet: (/ (* 500000000.0 228) (* 1024 1024 1024)) => 106 GiB/day. That sounds low, but I can't math it any other way. |
19:03
|
zino |
Given that Tweets are shorter than 228 bytes on average even given Unicode, and the extreme compressibility, it sounds cheap-ish to archive. |
21:26
|
VoynichCr |
i bet they archived more than just plain tweets: metadata too (who, when, replies/thread pointers, etc) |
21:26
|
VoynichCr |
if you check the Twitter API, you can see how much info a simple tweet has, like 1 KB or more |
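
The back-of-envelope estimate discussed above can be reproduced with a short Python sketch. It mirrors zino's Lisp expression and reruns it with VoynichCr's rougher figure of about 1 KB per tweet. The 500 million tweets/day and both per-tweet sizes come from the log and are assumptions, not measured values; the function names `gib_per_day` and `avg_mbit_per_second` are hypothetical helpers introduced only for illustration.

```python
# Rough daily volume of a tweet stream, following the arithmetic in the log.
# All inputs are assumptions quoted from the conversation, not measurements.

GIB = 1024 ** 3                  # bytes in one GiB
SECONDS_PER_DAY = 24 * 60 * 60


def gib_per_day(tweets_per_day, bytes_per_tweet):
    """Uncompressed volume in GiB per day."""
    return tweets_per_day * bytes_per_tweet / GIB


def avg_mbit_per_second(tweets_per_day, bytes_per_tweet):
    """Average sustained rate in megabits per second (10^6 bits)."""
    bits_per_day = tweets_per_day * bytes_per_tweet * 8
    return bits_per_day / SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    tweets = 500_000_000  # assumed tweets per day, from zino's expression
    for label, size in (("text only, 228 B", 228), ("with metadata, ~1 KB", 1024)):
        print(f"{label}: {gib_per_day(tweets, size):.1f} GiB/day, "
              f"{avg_mbit_per_second(tweets, size):.1f} Mbit/s average")
```

Under these assumptions the text-only case comes to about 106 GiB/day, matching the figure in the log, and the 1 KB case to about 477 GiB/day, both before compression and without images or video.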