[01:35] *** Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
[01:36] *** Odd0002 has joined #archiveteam-ot
[01:54] *** Stilett0 has joined #archiveteam-ot
[01:55] *** Stiletto has quit IRC (Ping timeout: 255 seconds)
[03:29] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:41] *** odemg has joined #archiveteam-ot
[04:19] *** Rikai has joined #archiveteam-ot
[04:47] *** odemg has quit IRC (Ping timeout: 260 seconds)
[06:35] *** m007a83_ has joined #archiveteam-ot
[06:38] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[06:38] *** m007a83_ is now known as m007a83
[06:58] *** schbirid has joined #archiveteam-ot
[07:23] *** icedice has quit IRC (Quit: Leaving)
[07:24] *** schbirid has quit IRC (Remote host closed the connection)
[07:36] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[07:55] *** m007a83 has joined #archiveteam-ot
[10:59] *** kiska1 has quit IRC (Read error: Operation timed out)
[10:59] *** ivan has quit IRC (Read error: Operation timed out)
[11:00] *** ivan has joined #archiveteam-ot
[11:00] *** Albardin has quit IRC (Read error: Operation timed out)
[11:00] *** mal has quit IRC (Write error: Broken pipe)
[11:00] *** dxrt_ has quit IRC (Read error: Operation timed out)
[11:00] *** djsundog has quit IRC (Read error: Operation timed out)
[11:00] *** chfoo has quit IRC (Read error: Operation timed out)
[11:00] *** svchfoo1 sets mode: +o ivan
[11:01] *** chfoo has joined #archiveteam-ot
[11:02] *** ivan has quit IRC (Read error: Operation timed out)
[11:03] *** jspiros has quit IRC (Read error: Operation timed out)
[11:04] *** Stiletto has joined #archiveteam-ot
[11:06] *** JAA has quit IRC (Ping timeout: 246 seconds)
[11:07] *** Stilett0 has quit IRC (Ping timeout: 492 seconds)
[11:22] *** kiska1 has joined #archiveteam-ot
[11:26] *** kiska1 has quit IRC (Read error: Operation timed out)
[11:27] *** mal has joined #archiveteam-ot
[11:36] *** BlueMax has quit IRC (Quit: Leaving)
[11:37] *** kiska1 has joined #archiveteam-ot
[11:38] *** ivan has joined #archiveteam-ot
[11:38] *** svchfoo3 sets mode: +o ivan
[11:57] *** Albardin has joined #archiveteam-ot
[11:57] *** djsundog has joined #archiveteam-ot
[11:57] *** dxrt_ has joined #archiveteam-ot
[11:57] *** dxrt sets mode: +o dxrt_
[12:05] *** JAA has joined #archiveteam-ot
[12:05] *** svchfoo1 sets mode: +o JAA
[12:06] *** bakJAA sets mode: +o JAA
[12:08] *** jspiros has joined #archiveteam-ot
[13:06] JAA: https://github.com/ludios/snscrape/commit/be684eb41fe5488ec027fe6216f4540274cff423
[13:10] ivan: Nice. We could also increase the major version as a function of time. I still find it weird though that they'd ban a single UA on an IP.
[13:10] Also, Safari on Linux, heh.
[13:14] *** VerifiedJ has joined #archiveteam-ot
[13:17] ivan: Are you using ssh tunneling to get the dashboard?
[13:19] kiska: yep
[13:21] ivan: Just don't go past 40GB disk usage and you'll be fine xD
[13:21] If you want you can use rsync on localhost since that connects to my vultr vps to get data faster
[13:23] rsync on localhost?
[13:23] I'll pull the WARCs out from __WARCs remotely to keep disk usage low (hopefully)
[13:27] rsync --remove-source-files :-)
[13:33] I am using an ssh tunnel to vultr that has better routing, so it's on localhost. So if you want you can point rsync to "yes-uploader-i-know-what-im-doing.localhost"
[13:33] ivan: ^
[13:35] oh interesting
[13:36] ivan: This line causes conflicts if I want to do localhost: https://github.com/ArchiveTeam/ArchiveBot/blob/master/uploader/uploader.py#L39
[13:36] yeah I added that after I lost a lot of data for everyone
[13:37] my rsync setup is weird and pulling from your server instead, so hopefully the 2.5MB/s average will be fast enough
[13:37] And I got around that by using "yes-uploader-i-know-what-im-doing.localhost"
[13:37] if you see disk filling up feel free to kill -STOP grab-site
[13:37] I'll start a script to do that actually
[13:39] So do you like the name I gave my tunnel?
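The script mentioned at 13:37 is not shown in the log. What follows is only a minimal sketch of the idea, assuming a Unix host: check disk usage and send SIGSTOP (the signal behind `kill -STOP`) to the crawler once usage passes a limit. The names `should_pause` and `pause_if_full`, the way PIDs are passed in, and the byte-based limit are all hypothetical; only the 40GB figure and the use of `kill -STOP` on grab-site come from the conversation.

```python
import os
import shutil
import signal


def should_pause(used_bytes: int, limit_bytes: int) -> bool:
    """Return True once disk usage has reached the limit."""
    return used_bytes >= limit_bytes


def pause_if_full(path, limit_bytes, pids, usage=shutil.disk_usage, send=os.kill):
    """Send SIGSTOP to every PID in `pids` if the filesystem holding `path` is too full.

    `usage` and `send` are injectable so the logic can be exercised without
    touching a real disk or real processes (an assumption of this sketch).
    Returns True if the processes were stopped, False otherwise.
    """
    used = usage(path).used
    if not should_pause(used, limit_bytes):
        return False
    for pid in pids:
        # SIGSTOP suspends the process; SIGCONT (kill -CONT) resumes it later.
        send(pid, signal.SIGSTOP)
    return True
```

In practice the PIDs might come from something like `pgrep -f grab-site`, and the check would run in a loop with a sleep between iterations, for example `pause_if_full("/", 40 * 10**9, pids)`.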
[13:39] heh
[13:39] I hope the next guy to come along knows what he's doing
[13:40] have you looked at wireguard? it's the non-terrible alternative to ssh tunnels
[13:44] also after using bash on your server I feel like telling you about my zsh configuration https://gist.github.com/ivan/79de5e87210e8cf21e305bb4c30c4360
[13:46] history is immediately written out (but not shared until a restart), tab-completion is case insensitive and matches the middle of things, there's a thing to show the git branch without slowing things down, sizes in ls and du are consistent and commaified, and of course the autosuggestions from fish
[14:03] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[14:10] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[14:10] *** wp494 has joined #archiveteam-ot
[14:10] *** svchfoo1 sets mode: +o wp494
[14:12] *** Stiletto has quit IRC (Read error: Operation timed out)
[14:12] *** jrwr has quit IRC (Read error: Operation timed out)
[14:13] *** Stiletto has joined #archiveteam-ot
[14:14] also you can use wireguard to route _all_ of your traffic through that vultr
[14:15] *** VerifiedJ has quit IRC (Read error: Operation timed out)
[14:22] *** jrwr has joined #archiveteam-ot
[14:24] *** faolingf_ has joined #archiveteam-ot
[14:26] *** ivan has quit IRC (Read error: Operation timed out)
[14:26] *** ivan has joined #archiveteam-ot
[14:26] *** mal has quit IRC (Write error: Broken pipe)
[14:26] *** djsundog has quit IRC (Read error: Operation timed out)
[14:27] *** svchfoo1 sets mode: +o ivan
[14:27] *** Albardin has quit IRC (Read error: Operation timed out)
[14:29] *** faolingfa has quit IRC (Read error: Operation timed out)
[14:30] *** kiska1 has quit IRC (Read error: Operation timed out)
[14:31] *** dxrt_ has quit IRC (Read error: Operation timed out)
[14:36] yikes there are > 100K verified accounts https://medium.com/@Haje/who-are-twitter-s-verified-users-af976fc1b032
[14:37] https://twitter.com/verified/following 307K
[14:39] https://github.com/sebinsua/scrape-twitter
[14:41] I'd rather not have all traffic routed through vultr
[14:41] Also wireguard makes it difficult to spin up a vm, let it run for a couple days, destroy, rinse and repeat
[14:43] Since I select vultr instances with 500GiB of traffic, use that up and destroy the vm to reset the usage count
[14:57] *** Mateon1 has quit IRC (Ping timeout: 360 seconds)
[14:58] *** Mateon1 has joined #archiveteam-ot
[15:14] *** kiska1 has joined #archiveteam-ot
[15:15] *** mal has joined #archiveteam-ot
[15:25] *** Albardin has joined #archiveteam-ot
[15:26] *** dxrt_ has joined #archiveteam-ot
[15:26] *** dxrt sets mode: +o dxrt_
[15:26] *** djsundog has joined #archiveteam-ot
[15:27] *** m007a83 has joined #archiveteam-ot
[15:46] *** Albardin has quit IRC (Read error: Connection reset by peer)
[15:47] *** Albardin has joined #archiveteam-ot
[16:05] *** mal has quit IRC (Ping timeout: 600 seconds)
[16:06] *** Albardin has quit IRC (Ping timeout: 600 seconds)
[16:06] *** kiska1 has quit IRC (Ping timeout: 600 seconds)
[16:06] *** djsundog has quit IRC (Ping timeout: 600 seconds)
[16:06] *** dxrt_ has quit IRC (Ping timeout: 600 seconds)
[16:41] *** kiska1 has joined #archiveteam-ot
[16:50] *** mal has joined #archiveteam-ot
[16:57] *** Albardin has joined #archiveteam-ot
[16:57] *** dxrt_ has joined #archiveteam-ot
[16:57] *** dxrt sets mode: +o dxrt_
[16:58] *** djsundog has joined #archiveteam-ot
[17:09] *** Stilett0 has joined #archiveteam-ot
[17:15] *** Stiletto has quit IRC (Ping timeout: 633 seconds)
[17:33] TIL curl -OJ accomplishes the same as wget --content-disposition --no-use-server-timestamps
[17:40] -L too, wget follows redirects by default while curl doesn't
[17:45] ah right
[17:50] apparently aria2c provides sane defaults for downloading a file
[17:50] should be -LOJ equivalent
[17:56] starting to see the insanity of archiving individual tweet pages
[17:57] it might be useful if IA could index URLs inside a page as representative of the content even though the actual URLs weren't grabbed
[17:58] a search results page has many tweets with identifiers, after all
[18:46] *** icedice has joined #archiveteam-ot
[19:10] *** icedice has quit IRC (Quit: Leaving)
[19:11] *** icedice has joined #archiveteam-ot
[20:19] *** dxrt_ has quit IRC (Read error: Operation timed out)
[20:20] *** dxrt_ has joined #archiveteam-ot
[20:20] *** dxrt sets mode: +o dxrt_
[21:41] *** Stiletto has joined #archiveteam-ot
[21:43] *** odemg has joined #archiveteam-ot
[21:46] *** Stilett0 has quit IRC (Ping timeout: 492 seconds)
[21:52] *** dxrt_ has quit IRC (Read error: Operation timed out)
[21:53] *** dxrt_ has joined #archiveteam-ot
[21:53] *** dxrt sets mode: +o dxrt_
[22:02] *** Stilett0 has joined #archiveteam-ot
[22:05] *** icedice has quit IRC (Read error: Connection reset by peer)
[22:05] *** Stiletto has quit IRC (Read error: Operation timed out)
[22:28] *** Stiletto has joined #archiveteam-ot
[22:32] *** Stilett0 has quit IRC (Read error: Operation timed out)
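On the 17:33 point about `curl -OJ` and `wget --content-disposition`: both options make the client name the saved file after the server's Content-Disposition header instead of the URL path. The sketch below illustrates that naming decision only; it is not how either tool is implemented. The function name `pick_filename` and the `index.html` fallback (borrowed from wget's behaviour for paths with no file name) are assumptions of this example.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(url, content_disposition=None):
    """Choose a local file name, preferring the Content-Disposition header."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the last path component so a hostile header
            # cannot point outside the download directory.
            return posixpath.basename(name)
    # Fall back to the last component of the URL path.
    path = unquote(urlsplit(url).path)
    return posixpath.basename(path) or "index.html"
```

For example, `pick_filename("https://example.com/download?id=5", 'attachment; filename="report.pdf"')` yields the header's name, while a call without a header falls back to the URL path.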