[00:02] *** Martle has joined #archiveteam-ot
[01:12] *** Stilett0 has joined #archiveteam-ot
[01:14] *** Stiletto has quit IRC (Read error: Operation timed out)
[01:14] *** Stiletto has joined #archiveteam-ot
[01:16] *** Stilett0 has quit IRC (Ping timeout: 260 seconds)
[01:24] *** Stilett0 has joined #archiveteam-ot
[01:25] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[02:05] *** BlueMax has joined #archiveteam-ot
[03:38] *** VerifiedJ has quit IRC (Read error: Operation timed out)
[03:39] *** dxrt has quit IRC (Ping timeout: 360 seconds)
[03:39] *** Polylith has quit IRC (Ping timeout: 360 seconds)
[03:39] *** dxrt has joined #archiveteam-ot
[03:40] *** svchfoo3 sets mode: +o dxrt
[03:40] *** svchfoo1 sets mode: +o dxrt
[03:41] *** SketchCo1 has joined #archiveteam-ot
[03:42] *** Polylith has joined #archiveteam-ot
[03:47] *** arkiver has quit IRC (Ping timeout: 360 seconds)
[03:47] *** SketchCow has quit IRC (Read error: Connection reset by peer)
[03:52] *** SketchCo1 is now known as SketchCow
[03:52] *** arkiver has joined #archiveteam-ot
[03:53] *** svchfoo3 sets mode: +o arkiver
[03:53] *** svchfoo1 sets mode: +o arkiver
[03:56] *** Stiletto has joined #archiveteam-ot
[03:58] *** Stilett0 has quit IRC (Read error: Operation timed out)
[04:12] *** Stilett0 has joined #archiveteam-ot
[04:17] *** Stiletto has quit IRC (Read error: Operation timed out)
[06:18] *** logchfoo0 starts logging #archiveteam-ot at Fri Nov 02 06:18:53 2018
[06:18] *** logchfoo0 has joined #archiveteam-ot
[06:19] *** svchfoo1 has joined #archiveteam-ot
[06:19] *** svchfoo3 sets mode: +o svchfoo1
[06:20] *** svchfoo1 sets mode: +o SketchCow
[07:13] *** Martle has quit IRC (Leaving)
[08:22] *** svchfoo1 has quit IRC (hub.efnet.us irc.colosolutions.net)
[08:22] *** sknebel_ has quit IRC (hub.efnet.us irc.colosolutions.net)
[08:22] *** jspiros has quit IRC (hub.efnet.us irc.colosolutions.net)
[08:22] *** S1mpbrain has quit IRC (hub.efnet.us irc.colosolutions.net)
[08:22] *** ivan has quit IRC (hub.efnet.us irc.colosolutions.net)
[08:22] *** JAA has quit IRC (hub.efnet.us irc.colosolutions.net)
[08:32] *** svchfoo1 has joined #archiveteam-ot
[08:32] *** sknebel_ has joined #archiveteam-ot
[08:32] *** jspiros has joined #archiveteam-ot
[08:32] *** S1mpbrain has joined #archiveteam-ot
[08:32] *** ivan has joined #archiveteam-ot
[08:32] *** JAA has joined #archiveteam-ot
[08:32] *** irc.colosolutions.net sets mode: +ooo svchfoo1 ivan JAA
[08:33] *** bakJAA sets mode: +o JAA
[08:33] *** JAA sets mode: +o bakJAA
[08:47] JAA: does snscrape always return tweets in reverse-chronological order?
[08:48] asking as this would make it simple to continually back up an account: back it up fully once, then keep archiving small amounts until you encounter a tweet URL that has already been archived before
[09:35] *** alex__ has joined #archiveteam-ot
[09:36] *** alex__ has quit IRC (Quit: alex__)
[09:48] *** alex__ has joined #archiveteam-ot
[09:50] *** alex__ has quit IRC (Quit: alex__)
[10:23] *** VerifiedJ has joined #archiveteam-ot
[10:52] *** alex__ has joined #archiveteam-ot
[11:02] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:39] betamax: Yes, it should. And you can also do something like snscrape twitter-search 'from:username since:2018-10-01' to get only tweets after a certain date. I don't think there's a way to specify a minimum tweet ID though.
[11:40] I don't remember whether "since" is inclusive or not. "until" also exists.
[11:53] JAA: that's good to know
[11:55] there's probably a 'nice' way to do it, but in the worst case I can just grab the first 10 tweets, see if that includes the top tweet in the list grabbed previously, if not grab the top 20, then 30...
[11:55] until I get to a point where it's grabbed enough to overlap what it got last time
[11:56] betamax: If you want to do that, it's best to use snscrape not through the CLI but from within Python.
[11:57] true
[11:57] also, great job on snscrape: currently got over 220,000 tweets from around 250 accounts
[11:57] That use is currently undocumented, and I can't give you the exact syntax right now, but the code would look something like this:
[11:57] scraper = snscrape.modules.twitter.TwitterUserScraper('username')
[11:57] for tweetUrl in scraper.get_items():
[11:57]   tweetId = get_tweet_id_from_url(tweetUrl)
[11:57]   if tweetId <= newestIdFromPreviousRun:
[11:57]     break
[11:57]   print(tweetUrl)
[11:58] thanks
[11:58] Scraper.get_items is a generator, so this will iterate through pages as long as needed.
[11:58] ah, excellent
[11:59] I don't expect this API to change anytime soon, so if you can figure the exact syntax out from the code, that should be safe. But as mentioned, no documentation on that yet.
[12:00] (Relevant issue: https://github.com/JustAnotherArchivist/snscrape/issues/7 )
[12:01] Oh, and tweetUrl won't be a string but an Item. Access the 'url' attribute to get the URL.
[12:01] In the future, this will be a specialised Item subclass which provides things like tweetId, date, message, etc.
[12:02] ( https://github.com/JustAnotherArchivist/snscrape/issues/9 )
[12:24] 23:22:46 Hey, you're using ESXi on your machines, right? Can you tell me whether vCPUs map to physical or logical cores there? It's surprisingly difficult to find reliable information on that.
[12:24] From a PM
[12:25] And ESXi maps vCPUs first to physical cores, then as physical cores are mapped out, it does logical. And then major time slicing once CPU resources are fully allocated
[12:26] Cheers kiska!
[12:52] *** Mateon1 has quit IRC (Read error: Operation timed out)
[12:58] *** Mateon1 has joined #archiveteam-ot
[16:11] I just learnt about 'git worktree'. Damn, this is awesome.
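Editor's sketch for the incremental-backup idea discussed between 11:55 and 12:02 above: walk a newest-first stream of items and stop at the first tweet ID already seen in the previous run. This is not the snscrape API, which the log itself describes as undocumented. `Item` is a stand-in that only models the 'url' attribute mentioned at 12:01, `fake_scraper_items` replaces the real scraper generator with made-up data, `new_items_since` is a hypothetical name, and `get_tweet_id_from_url` (named but never defined in the log) is assumed to read the ID from a `/status/<id>` URL path.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Item:
    """Stand-in for the scraper's item type; only 'url' is assumed here."""
    url: str


def get_tweet_id_from_url(url: str) -> int:
    """Hypothetical helper: extract the numeric ID from a /status/<id> URL."""
    match = re.search(r"/status/(\d+)", url)
    if match is None:
        raise ValueError(f"no tweet ID found in {url!r}")
    return int(match.group(1))


def new_items_since(items: Iterable[Item], newest_id_from_previous_run: int) -> Iterator[Item]:
    """Yield items until one is not newer than the previous run's newest ID.

    Relies on the input being in reverse-chronological order, as stated
    in the log. Because the input is consumed lazily, a paginating
    generator would only fetch as many pages as needed.
    """
    for item in items:
        if get_tweet_id_from_url(item.url) <= newest_id_from_previous_run:
            break
        yield item


def fake_scraper_items() -> Iterator[Item]:
    """Made-up data standing in for the real scraper's generator."""
    for tweet_id in (1005, 1004, 1003, 1002, 1001):
        yield Item(url=f"https://twitter.com/username/status/{tweet_id}")


# Previous run ended with ID 1003 as the newest tweet, so only the two
# newer items should come back.
new_urls = [item.url for item in new_items_since(fake_scraper_items(), 1003)]
for url in new_urls:
    print(url)
```

Swapping `fake_scraper_items()` for the real scraper's generator would be the only change needed, provided its items expose 'url' as described at 12:01; the exact call syntax would have to be confirmed against the snscrape source.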
[16:14] I just learned wpull is written by archiveteam
[16:15] *for
[17:14] *** wp494 has quit IRC (Read error: Operation timed out)
[17:14] *** wp494 has joined #archiveteam-ot
[18:42] *** Stiletto has joined #archiveteam-ot
[18:46] *** Stilett0 has quit IRC (Read error: Operation timed out)
[19:05] *** alex____ has joined #archiveteam-ot
[19:06] *** alex__ has quit IRC (Ping timeout: 252 seconds)
[19:21] *** Martle has joined #archiveteam-ot
[21:47] *** tuluu has quit IRC (Remote host closed the connection)
[21:49] *** tuluu has joined #archiveteam-ot
[22:04] *** BlueMax has joined #archiveteam-ot