#archiveteam-ot 2018-11-02,Fri


Time Nickname Message
00:02 🔗 Martle has joined #archiveteam-ot
01:12 🔗 Stilett0 has joined #archiveteam-ot
01:14 🔗 Stiletto has quit IRC (Read error: Operation timed out)
01:14 🔗 Stiletto has joined #archiveteam-ot
01:16 🔗 Stilett0 has quit IRC (Ping timeout: 260 seconds)
01:24 🔗 Stilett0 has joined #archiveteam-ot
01:25 🔗 Stiletto has quit IRC (Ping timeout: 264 seconds)
02:05 🔗 BlueMax has joined #archiveteam-ot
03:38 🔗 VerifiedJ has quit IRC (Read error: Operation timed out)
03:39 🔗 dxrt has quit IRC (Ping timeout: 360 seconds)
03:39 🔗 Polylith has quit IRC (Ping timeout: 360 seconds)
03:39 🔗 dxrt has joined #archiveteam-ot
03:40 🔗 svchfoo3 sets mode: +o dxrt
03:40 🔗 svchfoo1 sets mode: +o dxrt
03:41 🔗 SketchCo1 has joined #archiveteam-ot
03:42 🔗 Polylith has joined #archiveteam-ot
03:47 🔗 arkiver has quit IRC (Ping timeout: 360 seconds)
03:47 🔗 SketchCow has quit IRC (Read error: Connection reset by peer)
03:52 🔗 SketchCo1 is now known as SketchCow
03:52 🔗 arkiver has joined #archiveteam-ot
03:53 🔗 svchfoo3 sets mode: +o arkiver
03:53 🔗 svchfoo1 sets mode: +o arkiver
03:56 🔗 Stiletto has joined #archiveteam-ot
03:58 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
04:12 🔗 Stilett0 has joined #archiveteam-ot
04:17 🔗 Stiletto has quit IRC (Read error: Operation timed out)
06:18 🔗 logchfoo0 starts logging #archiveteam-ot at Fri Nov 02 06:18:53 2018
06:18 🔗 logchfoo0 has joined #archiveteam-ot
06:19 🔗 svchfoo1 has joined #archiveteam-ot
06:19 🔗 svchfoo3 sets mode: +o svchfoo1
06:20 🔗 svchfoo1 sets mode: +o SketchCow
07:13 🔗 Martle has quit IRC (Leaving)
08:22 🔗 svchfoo1 has quit IRC (hub.efnet.us irc.colosolutions.net)
08:22 🔗 sknebel_ has quit IRC (hub.efnet.us irc.colosolutions.net)
08:22 🔗 jspiros has quit IRC (hub.efnet.us irc.colosolutions.net)
08:22 🔗 S1mpbrain has quit IRC (hub.efnet.us irc.colosolutions.net)
08:22 🔗 ivan has quit IRC (hub.efnet.us irc.colosolutions.net)
08:22 🔗 JAA has quit IRC (hub.efnet.us irc.colosolutions.net)
08:32 🔗 svchfoo1 has joined #archiveteam-ot
08:32 🔗 sknebel_ has joined #archiveteam-ot
08:32 🔗 jspiros has joined #archiveteam-ot
08:32 🔗 S1mpbrain has joined #archiveteam-ot
08:32 🔗 ivan has joined #archiveteam-ot
08:32 🔗 JAA has joined #archiveteam-ot
08:32 🔗 irc.colosolutions.net sets mode: +ooo svchfoo1 ivan JAA
08:33 🔗 bakJAA sets mode: +o JAA
08:33 🔗 JAA sets mode: +o bakJAA
08:47 🔗 betamax JAA: does snscrape always return tweets in reverse-chronological order?
08:48 🔗 betamax asking as this would make it simple to continually back up an account: back it up fully once, then keep archiving small amounts until you encounter a tweet URL that has already been archived before
09:35 🔗 alex__ has joined #archiveteam-ot
09:36 🔗 alex__ has quit IRC (Quit: alex__)
09:48 🔗 alex__ has joined #archiveteam-ot
09:50 🔗 alex__ has quit IRC (Quit: alex__)
10:23 🔗 VerifiedJ has joined #archiveteam-ot
10:52 🔗 alex__ has joined #archiveteam-ot
11:02 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:39 🔗 JAA betamax: Yes, it should. And you can also do something like snscrape twitter-search 'from:username since:2018-10-01' to get only tweets after a certain date. I don't think there's a way to specify a minimum tweet ID though.
11:40 🔗 JAA I don't remember whether "since" is inclusive or not. "until" also exists.
11:53 🔗 betamax JAA: that's good to know
11:55 🔗 betamax there's probably a 'nice' way to do it, but in the worst case I can just grab the first 10 tweets, see if that includes the top tweet in the list grabbed previously, if not grab the top 20, then 30...
11:55 🔗 betamax until I get to a point where it's grabbed enough to overlap what it got last time
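The widening-window idea betamax describes could be sketched roughly as follows. This is only an illustration: `fetch_newest` is a hypothetical callable (not part of snscrape) that returns the `n` newest tweet URLs, newest first, and the `step` and `limit` parameters are assumptions added so the loop always terminates.

```python
from typing import Callable, List


def fetch_until_overlap(
    fetch_newest: Callable[[int], List[str]],
    previous_top: str,
    step: int = 10,
    limit: int = 1000,
) -> List[str]:
    """Return URLs newer than previous_top, widening the window by `step` each try."""
    count = step
    while True:
        urls = fetch_newest(count)
        if previous_top in urls:
            # Everything before the overlap point is new since the last run.
            return urls[: urls.index(previous_top)]
        if len(urls) < count or count >= limit:
            # Ran out of tweets (or hit the safety limit) without finding an overlap.
            return urls
        count += step


# Stand-in data: 35 fake tweet URLs, newest first (hypothetical, for illustration only).
timeline = [f"https://twitter.com/username/status/{i}" for i in range(135, 100, -1)]
new_urls = fetch_until_overlap(lambda n: timeline[:n], previous_top=timeline[23])
print(len(new_urls))
```

Each retry refetches the tweets already seen in the previous attempt, so this does more work than necessary. It also never finds an overlap if the previously newest tweet has since been deleted, which is why the sketch carries a safety limit.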
11:56 🔗 JAA betamax: If you want to do that, it's best to use snscrape not through the CLI but from within Python.
11:57 🔗 betamax true
11:57 🔗 betamax also, great job on snscrape: currently got over 220,000 tweets from around 250 accounts
11:57 🔗 JAA That use is currently undocumented, and I can't give you the exact syntax right now, but the code would look something like this:
11:57 🔗 JAA scraper = snscrape.modules.twitter.TwitterUserScraper('username')
11:57 🔗 JAA for tweetUrl in scraper.get_items():
11:57 🔗 JAA     tweetId = get_tweet_id_from_url(tweetUrl)
11:57 🔗 JAA     if tweetId <= newestIdFromPreviousRun:
11:57 🔗 JAA         break
11:57 🔗 JAA     print(tweetUrl)
11:58 🔗 betamax thanks
11:58 🔗 JAA Scraper.get_items is a generator, so this will iterate through pages as long as needed.
11:58 🔗 betamax ah, excellent
11:59 🔗 JAA I don't expect this API to change anytime soon, so if you can figure the exact syntax out from the code, that should be safe. But as mentioned, no documentation on that yet.
12:00 🔗 JAA (Relevant issue: https://github.com/JustAnotherArchivist/snscrape/issues/7 )
12:01 🔗 JAA Oh, and tweetUrl won't be a string but an Item. Access the 'url' attribute to get the URL.
12:01 🔗 JAA In the future, this will be a specialised Item subclass which provides things like tweetId, date, message, etc.
12:02 🔗 JAA ( https://github.com/JustAnotherArchivist/snscrape/issues/9 )
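Putting JAA's outline and the later correction about `Item` together, the incremental loop might look something like the sketch below. It deliberately does not import snscrape, since the exact syntax is described as undocumented: `FakeItem` and `fake_get_items` are hypothetical stand-ins for the real `Item` and `scraper.get_items()`, and `get_tweet_id_from_url` is an assumed helper that takes the ID from the last path segment of the URL. The early `break` relies on items arriving newest first and on tweet IDs increasing over time.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeItem:
    """Stand-in for snscrape's Item; only the 'url' attribute is assumed."""
    url: str


def get_tweet_id_from_url(url: str) -> int:
    # Assumes URLs of the form https://twitter.com/<user>/status/<id>.
    return int(url.rstrip("/").rsplit("/", 1)[-1])


def new_items(items: Iterable[FakeItem], newest_id_from_previous_run: int) -> Iterator[FakeItem]:
    """Yield items newest-first until one from the previous run is reached."""
    for item in items:
        if get_tweet_id_from_url(item.url) <= newest_id_from_previous_run:
            break
        yield item


def fake_get_items() -> Iterator[FakeItem]:
    # Hypothetical replacement for scraper.get_items(): newest first, lazily generated.
    for tweet_id in (1058, 1057, 1042, 1031, 1007):
        yield FakeItem(f"https://twitter.com/username/status/{tweet_id}")


fresh = [item.url for item in new_items(fake_get_items(), newest_id_from_previous_run=1042)]
print(fresh)
```

Because both the source and `new_items` are generators, nothing past the first already-archived tweet is ever requested, which matches the point that iteration only continues through pages as long as needed.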
12:24 🔗 kiska 23:22:46 <JAA> Hey, you're using ESXi on your machines, right? Can you tell me whether vCPUs map to physical or logical cores there? It's surprisingly difficult to find reliable information on that.
12:24 🔗 kiska From a PM
12:25 🔗 kiska And ESXi maps vCPUs to physical cores first; once the physical cores are all mapped, it moves on to logical cores. After that it does major time slicing once CPU resources are fully allocated
12:26 🔗 JAA Cheers kiska!
12:52 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
12:58 🔗 Mateon1 has joined #archiveteam-ot
16:11 🔗 JAA I just learnt about 'git worktree'. Damn, this is awesome.
16:14 🔗 jut I just learned wpull is written by archiveteam
16:15 🔗 jut *for
17:14 🔗 wp494 has quit IRC (Read error: Operation timed out)
17:14 🔗 wp494 has joined #archiveteam-ot
18:42 🔗 Stiletto has joined #archiveteam-ot
18:46 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
19:05 🔗 alex____ has joined #archiveteam-ot
19:06 🔗 alex__ has quit IRC (Ping timeout: 252 seconds)
19:21 🔗 Martle has joined #archiveteam-ot
21:47 🔗 tuluu has quit IRC (Remote host closed the connection)
21:49 🔗 tuluu has joined #archiveteam-ot
22:04 🔗 BlueMax has joined #archiveteam-ot
