[02:28] *** logchfoo0 starts logging #archiveteam-ot at Fri Sep 28 02:28:59 2018
[02:28] *** logchfoo0 has joined #archiveteam-ot
[02:36] *** BlueMax has joined #archiveteam-ot
[02:43] *** erin has quit IRC (Ping timeout: 264 seconds)
[02:49] *** zino has quit IRC (Excess Flood)
[02:52] *** zino has joined #archiveteam-ot
[03:10] *** erin has joined #archiveteam-ot
[03:14] *** JAA has joined #archiveteam-ot
[03:14] *** svchfoo3 sets mode: +o JAA
[03:15] *** bakJAA sets mode: +o JAA
[03:18] *** jspiros has joined #archiveteam-ot
[03:43] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:55] *** odemg has joined #archiveteam-ot
[04:14] *** Frogging has quit IRC (Ping timeout: 268 seconds)
[04:16] *** Frogging has joined #archiveteam-ot
[04:21] *** Mateon1 has quit IRC (Read error: Connection reset by peer)
[04:21] *** Mateon1 has joined #archiveteam-ot
[04:24] *** BnAboyZ has joined #archiveteam-ot
[04:27] *** VoynichCr has quit IRC (Ping timeout: 268 seconds)
[04:27] *** VoynichCr has joined #archiveteam-ot
[04:29] *** BnAboyZ has quit IRC (Ping timeout: 268 seconds)
[04:30] *** BnAboyZ has joined #archiveteam-ot
[07:48] update grab-site for quarantined subreddit support
[07:52] my snscrape is running and I'm going to archive a lot of twitter users, send me yours if you want them archived
[07:54] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[07:55] *** BlueMax has joined #archiveteam-ot
[08:01] ivan: Could you please do twitter user morpheus______ for me. Thanks
[08:02] Was I doing this wrong I sent dms to ivan
[08:02] JAA: On snscrape, is using \ before spaces when doing twitter searches correct?
[08:03] Also, is the "quality filter" on or off by default?
[08:06] hook54321: That's not really an issue with snscrape but more with your shell. snscrape just needs to see it as a single argument. In typical shells, both snscrape twitter-search a\ search and snscrape twitter-search 'a search' (and a number of other variations) are fine.
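The point about the shell in the 08:06 message can be checked without running snscrape at all. The sketch below is only an illustration: it uses Python's standard shlex module as a stand-in for POSIX-style shell word splitting, which is an assumption (a real shell also does expansions that shlex does not), and it has nothing to do with snscrape's own code. The two command lines are the ones quoted in the log; the variable names are made up for the example.

```python
import shlex

# The two spellings from the log: backslash-escaped space vs. single quotes.
escaped = r"snscrape twitter-search a\ search"
quoted = "snscrape twitter-search 'a search'"

# What happens with no escaping or quoting at all.
unquoted = "snscrape twitter-search a search"

# shlex.split approximates how a POSIX-style shell splits a line into arguments.
escaped_argv = shlex.split(escaped)
quoted_argv = shlex.split(quoted)
unquoted_argv = shlex.split(unquoted)

print(escaped_argv)
print(quoted_argv)
print(unquoted_argv)
```

Both escaped and quoted forms should produce the same three-element argument list, with the search term as one argument, while the unquoted form splits the search term into two separate arguments.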
[08:06] k
[08:06] thanks, I will process all of the users mentioned at me in a bit
[08:07] snscrape currently doesn't disable the quality filter. It's something I only came across a few days ago. I'll add it this weekend.
[08:09] *** Mateon1 has quit IRC (Ping timeout: 260 seconds)
[08:09] *** Mateon1 has joined #archiveteam-ot
[08:19] does the quality filter affect from: searches?
[08:51] I'm testing QFD-banned users on https://shadowban.eu/ and they're no longer marked as QFD-banned
[08:52] they seem to show up in search too when using from:
[08:55] snscrape isn't going to find retweets for me, is it?
[08:56] https://twitter.com/starryneutrons/status/1045073527303417856 is showing up in search, for example. I guess something changed in the last few days
[09:10] Ivan, snscrape doesn't grab retweets
[09:37] * ivan imports https://twitter.com/cspan/lists/members-of-congress/members?lang=en
[09:56] *** VerifiedJ has joined #archiveteam-ot
[10:54] scraping 2787 accounts with snscrape
[11:06] does anyone have a list of all bluechecks somewhere?
[11:22] Ivan, let me know when you are done and I will find some more
[11:31] ivan: Yeah, because of how snscrape discovers tweets (through the search page rather than the user profile, because the latter is limited to 3200 results while the former is not), it can't discover retweets. Or at least I haven't found any way to do that. The search term suggestions I found online are all several years old and no longer work.
[11:35] hook54321: FYI, I filed the quality filter issue as https://github.com/JustAnotherArchivist/snscrape/issues/3
[11:41] ok
[11:54] Also created issues for a bunch of other things. More to come later.
[11:58] does anyone have a kv-store-backed thing for incrementally loading URLs into a giant database and producing N-sized items?
[11:58] I looked at my greader-item-maker from 2013 and it's a hot mess
[12:02] might finally have an excuse to write some Rust again
[12:14] online.net sent me an email, prices are going up for existing servers
[12:23] redis?
[12:38] I was going to use rocksdb
[13:05] TIL twitter doesn't need a valid username in the URL, it will redirect if the tweet id is good
[13:13] *** Mateon1 has quit IRC (Remote host closed the connection)
[13:14] *** Mateon1 has joined #archiveteam-ot
[13:28] *** VerifiedJ has quit IRC (Read error: Operation timed out)
[14:49] Added a bunch more issues for a variety of things.
[14:49] (In case anyone's interested in what I have in mind to do with snscrape soonish.)
[15:00] *** VerifiedJ has joined #archiveteam-ot
[15:05] *** BlueMax has quit IRC (Quit: Leaving)
[15:20] *** wp494 has quit IRC (Read error: Operation timed out)
[15:21] *** wp494 has joined #archiveteam-ot
[16:41] *** godane has quit IRC (Read error: Operation timed out)
[17:24] *** godane has joined #archiveteam-ot
[17:24] *** svchfoo1 sets mode: +o godane
[18:14] *** bithippo has joined #archiveteam-ot
[18:46] Uhm, how do I contribute my WARC?
[20:55] ColdIce: if the site is still up and you need it in wayback, maybe rearchive with ArchiveBot
[20:56] otherwise upload to an IA item but it will probably not get ingested into wayback
[21:07] site is still up
[21:07] it's a site for a terminal emulator on TIKI-100
[21:07] *** robogoat has quit IRC (Read error: Operation timed out)
[21:08] ColdIce: /join #archivebot
[21:09] *** astrid has left ][
[21:11] *** robogoat has joined #archiveteam-ot
[23:42] *** BlueMax has joined #archiveteam-ot