[01:15] *** Coderjo has joined #internetarchive
[02:28] *** logchfoo0 starts logging #internetarchive at Fri Sep 28 02:28:59 2018
[02:28] *** logchfoo0 has joined #internetarchive
[02:49] *** zino has quit IRC (Excess Flood)
[02:52] *** zino has joined #internetarchive
[03:03] *** Somebody2 has joined #internetarchive
[03:14] *** JAA has joined #internetarchive
[03:15] *** bakJAA sets mode: +o JAA
[03:43] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:55] *** odemg has joined #internetarchive
[04:13] *** tuluu has quit IRC (Read error: Connection refused)
[04:14] *** Frogging has quit IRC (Ping timeout: 268 seconds)
[04:15] *** tuluu has joined #internetarchive
[04:16] *** Frogging has joined #internetarchive
[04:24] *** tsr has quit IRC (Ping timeout: 268 seconds)
[04:27] *** VoynichCr has quit IRC (Ping timeout: 268 seconds)
[04:27] *** VoynichCr has joined #internetarchive
[04:30] *** tsr has joined #internetarchive
[06:40] *** kyounko has quit IRC (Read error: Connection reset by peer)
[12:52] there's no one at IA vacuuming up all of twitter, is there?
[12:53] the 'put every tweet into a warc' strategy seems a tad slow and inefficient because the search results essentially give you the tweet that you can put into some database
[13:34] *** atomotic has joined #internetarchive
[13:52] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:31] *** yuitimoth has quit IRC (Read error: Operation timed out)
[15:41] *** atomotic has joined #internetarchive
[16:31] iirc twitter declined to work with them
[16:43] *** atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[17:36] Twitter has only offered to work with Library of Congress, but LoC apparently didn't see the value in it, so they stopped archiving all tweets in 2017.
[17:37] they = LoC
[17:37] https://www.npr.org/sections/thetwo-way/2017/12/26/573609499/library-of-congress-will-no-longer-archive-every-tweet
[17:40] there's a question of budget too, archiving the firehose takes $$$
[18:50] Does it really though? It was firehose but without images and video. What's the bandwidth of that thing today?
[19:01] Assuming 228 bytes per tweet: (/ (* 500000000.0 228) (* 1024 1024 1024)) => 106 GiB/day. That sounds low, but I can't math it any other way.
[19:03] Given that Tweets are shorter than 228 bytes on average even given Unicode, and the extreme compressibility, it sounds cheap-ish to archive.
[21:26] i bet they archived more than just plain tweets, metadata (who, when, replies/thread pointers, etc)
[21:26] if you check TW API, you can see how much info a simple tweet has, like 1 KB or more
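
The back-of-envelope estimate from [19:01] can be rerun with the larger per-tweet size suggested at [21:26]. This is only a rough sketch: the 500 million tweets/day and 228 bytes figures come from the chat, "1 KB" is read here as 1024 bytes (an assumption), the function and variable names are made up for illustration, and compression is ignored.

```python
# Rough daily volume of a tweet archive, uncompressed.
GIB = 1024 ** 3  # bytes per GiB


def daily_gib(tweets_per_day: float, bytes_per_tweet: float) -> float:
    """Return the uncompressed daily volume in GiB."""
    return tweets_per_day * bytes_per_tweet / GIB


TWEETS_PER_DAY = 500_000_000  # figure used in the chat at [19:01]

# Text-only estimate from the chat: 228 bytes per tweet.
text_only = daily_gib(TWEETS_PER_DAY, 228)

# "1 KB or more" per tweet with metadata, read as 1024 bytes (assumption).
with_metadata = daily_gib(TWEETS_PER_DAY, 1024)

print(f"text only:     {text_only:.0f} GiB/day")      # 106 GiB/day
print(f"with metadata: {with_metadata:.0f} GiB/day")  # 477 GiB/day
```

Even at roughly 1 KB per tweet the total stays under 0.5 TiB/day before compression, so the metadata changes the estimate by a factor of about 4.5 rather than by orders of magnitude.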