#internetarchive 2018-09-28,Fri

Time Nickname Message
01:15 🔗 Coderjo has joined #internetarchive
02:28 🔗 logchfoo0 starts logging #internetarchive at Fri Sep 28 02:28:59 2018
02:28 🔗 logchfoo0 has joined #internetarchive
02:49 🔗 zino has quit IRC (Excess Flood)
02:52 🔗 zino has joined #internetarchive
03:03 🔗 Somebody2 has joined #internetarchive
03:14 🔗 JAA has joined #internetarchive
03:15 🔗 bakJAA sets mode: +o JAA
03:43 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
03:55 🔗 odemg has joined #internetarchive
04:13 🔗 tuluu has quit IRC (Read error: Connection refused)
04:14 🔗 Frogging has quit IRC (Ping timeout: 268 seconds)
04:15 🔗 tuluu has joined #internetarchive
04:16 🔗 Frogging has joined #internetarchive
04:24 🔗 tsr has quit IRC (Ping timeout: 268 seconds)
04:27 🔗 VoynichCr has quit IRC (Ping timeout: 268 seconds)
04:27 🔗 VoynichCr has joined #internetarchive
04:30 🔗 tsr has joined #internetarchive
06:40 🔗 kyounko has quit IRC (Read error: Connection reset by peer)
12:52 🔗 ivan there's no one at IA vacuuming up all of twitter, is there?
12:53 🔗 ivan the 'put every tweet into a warc' strategy seems a tad slow and inefficient because the search results essentially give you the tweet that you can put into some database
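ivan's alternative is to store the tweets returned by search results directly in a database instead of wrapping each one in a WARC. Below is a minimal sketch of that idea using Python's built-in sqlite3 module. The record fields (`id`, `author`, `created_at`, `body`) and the helper name `store_tweets` are hypothetical and do not reflect Twitter's actual API schema. Because paginated search results can overlap, the tweet id is used as the primary key and duplicates are skipped with `INSERT OR IGNORE`.

```python
import sqlite3


def store_tweets(conn, tweets):
    """Store tweet records (dicts), skipping ids that are already present.

    The field names are hypothetical, chosen only for illustration.
    Returns the total number of tweets now stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"
        " author TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    # INSERT OR IGNORE drops rows whose id already exists, so overlapping
    # search result pages do not create duplicate rows.
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, author, created_at, body)"
        " VALUES (:id, :author, :created_at, :body)",
        tweets,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    page1 = [{"id": 1, "author": "a", "created_at": "2018-09-28T12:00:00Z", "body": "hello"}]
    page2 = [page1[0], {"id": 2, "author": "b", "created_at": "2018-09-28T12:01:00Z", "body": "hi"}]
    print(store_tweets(conn, page1))
    print(store_tweets(conn, page2))
```

The second page repeats tweet 1, so only tweet 2 is added and the stored total goes from 1 to 2.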
13:34 🔗 atomotic has joined #internetarchive
13:52 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
15:31 🔗 yuitimoth has quit IRC (Read error: Operation timed out)
15:41 🔗 atomotic has joined #internetarchive
16:31 🔗 DFJustin iirc twitter declined to work with them
16:43 🔗 atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
17:36 🔗 zino Twitter has only offered to work with Library of Congress, but LoC apparently didn't see the value in it, so they stopped archiving all tweets in 2017.
17:37 🔗 zino they = LoC
17:37 🔗 zino https://www.npr.org/sections/thetwo-way/2017/12/26/573609499/library-of-congress-will-no-longer-archive-every-tweet
17:40 🔗 DFJustin there's a question of budget too, archiving the firehose takes $$$
18:50 🔗 zino Does it really though? It was firehose but without images and video. What's the bandwidth of that thing today?
19:01 🔗 zino Assuming 228 bytes per tweet: (/ (* 500000000.0 228) (* 1024 1024 1024)) => 106 GiB/day. That sounds low, but I can't math it any other way.
19:03 🔗 zino Given that tweets are shorter than 228 bytes on average, even with Unicode, and given how extremely compressible they are, it sounds cheap-ish to archive.
21:26 🔗 VoynichCr i bet they archived more than just plain tweets, metadata (who, when, replies/thread pointers, etc)
21:26 🔗 VoynichCr if you check the Twitter API, you can see how much info a simple tweet has, like 1 KB or more
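A small sketch reproducing zino's back-of-the-envelope estimate and rerunning it with VoynichCr's per-tweet figure. It assumes the 500 million tweets/day from the log, treats "1 KB" as 1024 bytes (an assumption), and counts only raw, uncompressed text and metadata, with no images or video. The helper name `raw_volume` is hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB
TIB = 1024 ** 4  # bytes per TiB


def raw_volume(tweets_per_day, bytes_per_tweet):
    """Return (GiB per day, TiB per year) of uncompressed tweet data."""
    per_day = tweets_per_day * bytes_per_tweet
    return per_day / GIB, per_day * 365 / TIB


TWEETS_PER_DAY = 500_000_000  # daily tweet count assumed in the log

if __name__ == "__main__":
    for label, size in [("228 B per tweet", 228), ("~1 KB per tweet (1024 B)", 1024)]:
        gib_day, tib_year = raw_volume(TWEETS_PER_DAY, size)
        print(f"{label}: {gib_day:.0f} GiB/day, {tib_year:.1f} TiB/year")
```

With these inputs, 228 bytes per tweet works out to about 106 GiB/day (roughly 38 TiB/year), and 1024 bytes per tweet to about 477 GiB/day (roughly 170 TiB/year), both before compression.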
