Time | Nickname | Message
01:35 | | Odd0002 has quit IRC (Quit: ZNC - http://znc.in)
01:36 | | Odd0002 has joined #archiveteam-ot
01:54 | | Stilett0 has joined #archiveteam-ot
01:55 | | Stiletto has quit IRC (Ping timeout: 255 seconds)
03:29 | | odemg has quit IRC (Ping timeout: 260 seconds)
03:41 | | odemg has joined #archiveteam-ot
04:19 | | Rikai has joined #archiveteam-ot
04:47 | | odemg has quit IRC (Ping timeout: 260 seconds)
06:35 | | m007a83_ has joined #archiveteam-ot
06:38 | | m007a83 has quit IRC (Ping timeout: 252 seconds)
06:38 | | m007a83_ is now known as m007a83
06:58 | | schbirid has joined #archiveteam-ot
07:23 | | icedice has quit IRC (Quit: Leaving)
07:24 | | schbirid has quit IRC (Remote host closed the connection)
07:36 | | m007a83 has quit IRC (Read error: Connection reset by peer)
07:55 | | m007a83 has joined #archiveteam-ot
10:59 | | kiska1 has quit IRC (Read error: Operation timed out)
10:59 | | ivan has quit IRC (Read error: Operation timed out)
11:00 | | ivan has joined #archiveteam-ot
11:00 | | Albardin has quit IRC (Read error: Operation timed out)
11:00 | | mal has quit IRC (Write error: Broken pipe)
11:00 | | dxrt_ has quit IRC (Read error: Operation timed out)
11:00 | | djsundog has quit IRC (Read error: Operation timed out)
11:00 | | chfoo has quit IRC (Read error: Operation timed out)
11:00 | | svchfoo1 sets mode: +o ivan
11:01 | | chfoo has joined #archiveteam-ot
11:02 | | ivan has quit IRC (Read error: Operation timed out)
11:03 | | jspiros has quit IRC (Read error: Operation timed out)
11:04 | | Stiletto has joined #archiveteam-ot
11:06 | | JAA has quit IRC (Ping timeout: 246 seconds)
11:07 | | Stilett0 has quit IRC (Ping timeout: 492 seconds)
11:22 | | kiska1 has joined #archiveteam-ot
11:26 | | kiska1 has quit IRC (Read error: Operation timed out)
11:27 | | mal has joined #archiveteam-ot
11:36 | | BlueMax has quit IRC (Quit: Leaving)
11:37 | | kiska1 has joined #archiveteam-ot
11:38 | | ivan has joined #archiveteam-ot
11:38 | | svchfoo3 sets mode: +o ivan
11:57 | | Albardin has joined #archiveteam-ot
11:57 | | djsundog has joined #archiveteam-ot
11:57 | | dxrt_ has joined #archiveteam-ot
11:57 | | dxrt sets mode: +o dxrt_
12:05 | | JAA has joined #archiveteam-ot
12:05 | | svchfoo1 sets mode: +o JAA
12:06 | | bakJAA sets mode: +o JAA
12:08 | | jspiros has joined #archiveteam-ot
13:06 | ivan | JAA: https://github.com/ludios/snscrape/commit/be684eb41fe5488ec027fe6216f4540274cff423
13:10 | JAA | ivan: Nice. We could also increase the major version as a function of time. I still find it weird though that they'd ban a single UA on an IP.
13:10 | JAA | Also, Safari on Linux, heh.
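JAA's suggestion of raising the user agent's major version as a function of time could be sketched roughly as below. This is not what the linked snscrape commit does; the anchor version and date, the six-week release cadence, and the Chrome-style UA template are all illustrative assumptions.

```python
from datetime import date

# Assumed anchor: a browser major version and the date it shipped.
# These values and the cadence below are illustrative assumptions.
ANCHOR_VERSION = 70
ANCHOR_DATE = date(2018, 10, 16)
RELEASE_CADENCE_DAYS = 42  # assumed: one major release every six weeks

# Hypothetical UA template; only the major version changes over time.
UA_TEMPLATE = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/{major}.0.0.0 Safari/537.36"
)


def major_version_for(day: date) -> int:
    """Estimate the major version by counting whole release cycles since the anchor."""
    elapsed_days = (day - ANCHOR_DATE).days
    # Clamp so dates before the anchor never produce a lower version.
    return ANCHOR_VERSION + max(0, elapsed_days // RELEASE_CADENCE_DAYS)


def user_agent_for(day: date) -> str:
    """Fill the template with the estimated major version for that day."""
    return UA_TEMPLATE.format(major=major_version_for(day))


if __name__ == "__main__":
    print(user_agent_for(date.today()))
```

Deriving the number from the calendar keeps the string from going stale without anyone editing it, at the cost of drifting if the real release cadence changes.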
13:14 | | VerifiedJ has joined #archiveteam-ot
13:17 | kiska | ivan: Are you using ssh tunneling to get the dashboard?
13:19 | ivan | kiska: yep
13:21 | kiska | ivan: Just don't go past 40GB disk usage and you'll be fine xD
13:21 | kiska | If you want you can use rsync on localhost since that connects to my vultr vps to get data faster
13:23 | ivan | rsync on localhost?
13:23 | ivan | I'll pull the WARCs out from __WARCs remotely to keep disk usage low (hopefully)
13:27 | ivan | rsync --remove-source-files :-)
13:33 | kiska | I am using an ssh tunnel to vultr that has better routing, so it's on localhost. So if you want you can point rsync to "yes-uploader-i-know-what-im-doing.localhost"
13:33 | kiska | ivan: ^
13:35 | ivan | oh interesting
13:36 | kiska | ivan: This line causes conflicts if I want to do localhost: https://github.com/ArchiveTeam/ArchiveBot/blob/master/uploader/uploader.py#L39
13:36 | ivan | yeah I added that after I lost a lot of data for everyone
13:37 | ivan | my rsync setup is weird and pulling from your server instead, so hopefully the 2.5MB/s average will be fast enough
13:37 | kiska | And I got around that by using "yes-uploader-i-know-what-im-doing.localhost"
13:37 | ivan | if you see disk filling up feel free to kill -STOP grab-site
13:37 | ivan | I'll start a script to do that actually
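A minimal sketch of the kind of script ivan mentions: pause grab-site with SIGSTOP when disk usage crosses a limit and resume it with SIGCONT once usage drops. The 40 GB figure comes from kiska's earlier message; treating it as total used space on one filesystem, the 35 GB resume mark, the watched path, the polling interval, and finding processes with `pgrep -f grab-site` are all assumptions. POSIX only.

```python
import os
import shutil
import signal
import subprocess
import time
from typing import List

GB = 10**9
PAUSE_ABOVE = 40 * GB   # limit mentioned in the channel
RESUME_BELOW = 35 * GB  # assumed lower mark so we don't flap around the limit
WATCH_PATH = "/"        # assumed: the filesystem grab-site writes to
POLL_SECONDS = 30       # assumed polling interval


def next_state(used_bytes: int, paused: bool) -> bool:
    """Decide whether grab-site should be paused, with hysteresis."""
    if not paused and used_bytes > PAUSE_ABOVE:
        return True
    if paused and used_bytes < RESUME_BELOW:
        return False
    return paused


def grab_site_pids() -> List[int]:
    """PIDs whose command line mentions grab-site, excluding this script."""
    # pgrep exits with status 1 and prints nothing when there is no match.
    result = subprocess.run(["pgrep", "-f", "grab-site"], capture_output=True, text=True)
    own_pid = os.getpid()
    return [int(pid) for pid in result.stdout.split() if int(pid) != own_pid]


def signal_all(sig: signal.Signals) -> None:
    for pid in grab_site_pids():
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between pgrep and kill


def watch() -> None:
    """Poll disk usage forever, stopping or continuing grab-site as needed."""
    paused = False
    while True:
        used = shutil.disk_usage(WATCH_PATH).used
        want_paused = next_state(used, paused)
        if want_paused != paused:
            signal_all(signal.SIGSTOP if want_paused else signal.SIGCONT)
            paused = want_paused
        time.sleep(POLL_SECONDS)
```

Calling `watch()` starts the polling loop. The gap between the two thresholds is there so that a filesystem hovering right at the limit doesn't make grab-site flap between stopped and running.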
13:39 | kiska | So do you like the name I gave my tunnel?
13:39 | ivan | heh
13:39 | ivan | I hope the next guy to come along knows what he's doing
13:40 | ivan | have you looked at wireguard? it's the non-terrible alternative to ssh tunnels
13:44 | ivan | also after using bash on your server I feel like telling you about my zsh configuration https://gist.github.com/ivan/79de5e87210e8cf21e305bb4c30c4360
13:46 | ivan | history is immediately written out (but not shared until a restart), tab-completion is case insensitive and matches the middle of things, there's a thing to show the git branch without slowing things down, sizes in ls and du are consistent and commaified, and of course the autosuggestions from fish
14:03 | | m007a83 has quit IRC (Ping timeout: 252 seconds)
14:10 | | wp494 has quit IRC (Ping timeout: 268 seconds)
14:10 | | wp494 has joined #archiveteam-ot
14:10 | | svchfoo1 sets mode: +o wp494
14:12 | | Stiletto has quit IRC (Read error: Operation timed out)
14:12 | | jrwr has quit IRC (Read error: Operation timed out)
14:13 | | Stiletto has joined #archiveteam-ot
14:14 | ivan | also you can use wireguard to route _all_ of your traffic through that vultr
14:15 | | VerifiedJ has quit IRC (Read error: Operation timed out)
14:22 | | jrwr has joined #archiveteam-ot
14:24 | | faolingf_ has joined #archiveteam-ot
14:26 | | ivan has quit IRC (Read error: Operation timed out)
14:26 | | ivan has joined #archiveteam-ot
14:26 | | mal has quit IRC (Write error: Broken pipe)
14:26 | | djsundog has quit IRC (Read error: Operation timed out)
14:27 | | svchfoo1 sets mode: +o ivan
14:27 | | Albardin has quit IRC (Read error: Operation timed out)
14:29 | | faolingfa has quit IRC (Read error: Operation timed out)
14:30 | | kiska1 has quit IRC (Read error: Operation timed out)
14:31 | | dxrt_ has quit IRC (Read error: Operation timed out)
14:36 | ivan | yikes there are > 100K verified accounts https://medium.com/@Haje/who-are-twitter-s-verified-users-af976fc1b032
14:37 | ivan | https://twitter.com/verified/following 307K
14:39 | ivan | https://github.com/sebinsua/scrape-twitter
14:41 | kiska | I'd rather not have all traffic routed through vultr
14:41 | kiska | Also wireguard makes it difficult to spin up a vm, let it run for a couple days, destroy, rinse and repeat
14:43 | kiska | Since I select vultr instances with 500GiB of traffic, use that up and destroy the vm to reset the usage count
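For a rough sense of how long one of those instances lasts, here is the arithmetic for draining a 500 GiB allowance, assuming for illustration a sustained rate equal to the 2.5 MB/s average ivan quoted earlier (MB taken as 10^6 bytes):

```python
GIB = 2**30
MB = 10**6

allowance_bytes = 500 * GIB   # per-instance traffic allowance from the log
rate_bytes_per_s = 2.5 * MB   # assumed sustained transfer rate

seconds = allowance_bytes / rate_bytes_per_s
days = seconds / 86_400

print(f"{seconds:,.0f} s = {days:.2f} days")  # prints: 214,748 s = 2.49 days
```

At that assumed rate the allowance lasts about two and a half days, in the same range as the "couple days" per VM above; real throughput will vary.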
14:57 | | Mateon1 has quit IRC (Ping timeout: 360 seconds)
14:58 | | Mateon1 has joined #archiveteam-ot
15:14 | | kiska1 has joined #archiveteam-ot
15:15 | | mal has joined #archiveteam-ot
15:25 | | Albardin has joined #archiveteam-ot
15:26 | | dxrt_ has joined #archiveteam-ot
15:26 | | dxrt sets mode: +o dxrt_
15:26 | | djsundog has joined #archiveteam-ot
15:27 | | m007a83 has joined #archiveteam-ot
15:46 | | Albardin has quit IRC (Read error: Connection reset by peer)
15:47 | | Albardin has joined #archiveteam-ot
16:05 | | mal has quit IRC (Ping timeout: 600 seconds)
16:06 | | Albardin has quit IRC (Ping timeout: 600 seconds)
16:06 | | kiska1 has quit IRC (Ping timeout: 600 seconds)
16:06 | | djsundog has quit IRC (Ping timeout: 600 seconds)
16:06 | | dxrt_ has quit IRC (Ping timeout: 600 seconds)
16:41 | | kiska1 has joined #archiveteam-ot
16:50 | | mal has joined #archiveteam-ot
16:57 | | Albardin has joined #archiveteam-ot
16:57 | | dxrt_ has joined #archiveteam-ot
16:57 | | dxrt sets mode: +o dxrt_
16:58 | | djsundog has joined #archiveteam-ot
17:09 | | Stilett0 has joined #archiveteam-ot
17:15 | | Stiletto has quit IRC (Ping timeout: 633 seconds)
17:33 | ivan | TIL curl -OJ accomplishes the same as wget --content-disposition --no-use-server-timestamps
17:40 | moufu | -L too, wget follows redirects by default while curl doesn't
17:45 | ivan | ah right
17:50 | ivan | apparently aria2c provides sane defaults for downloading a file
17:50 | ivan | should be -LOJ equivalent
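What `-J` / `--content-disposition` changes is where the local filename comes from. A rough sketch of that choice using only the standard library: prefer the `filename` parameter of the `Content-Disposition` header, otherwise fall back to the last path segment of the final (post-redirect) URL, and strip any directory components the server sends. This approximates the behaviour rather than porting curl's or wget's actual logic, and the `index.html` fallback for an empty path is an assumption modelled on wget.

```python
import posixpath
from email.message import Message
from typing import Optional
from urllib.parse import unquote, urlsplit


def local_filename(final_url: str, content_disposition: Optional[str]) -> str:
    """Pick a download filename roughly the way -J / --content-disposition do."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        # get_filename() unquotes the value and decodes RFC 2231 parameters.
        name = msg.get_filename()
        if name:
            # Never let the server pick a directory: keep only the basename.
            name = posixpath.basename(name.replace("\\", "/"))
            if name:
                return name
    # Fallback: last segment of the URL path, as with plain -O.
    path = unquote(urlsplit(final_url).path)
    return posixpath.basename(path) or "index.html"
```

Note that the URL passed in should be the one after redirects, which is why `-L` matters for curl here.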
17:56 | ivan | starting to see the insanity of archiving individual tweet pages
17:57 | ivan | it might be useful if IA could index URLs inside a page as representative of the content even though the actual URLs weren't grabbed
17:58 | ivan | a search results page has many tweets with identifiers, after all
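ivan's point is that a search results page already carries the identifiers of the tweets it lists. A small sketch of pulling those out of saved HTML, assuming tweet permalinks of the form `/<user>/status/<numeric id>`; the regex, the reconstructed `https://twitter.com/...` URLs, and the sample markup are illustrative assumptions, not Twitter's actual page structure.

```python
import re
from typing import Dict, List

# Assumed permalink shape: href="/<screen_name>/status/<numeric id>"
STATUS_LINK = re.compile(r'href="/([A-Za-z0-9_]{1,15})/status/(\d+)')


def tweet_urls(html: str) -> List[str]:
    """Return unique tweet permalinks found in a page, in first-seen order."""
    seen: Dict[str, str] = {}
    for user, tweet_id in STATUS_LINK.findall(html):
        # Keyed by tweet ID so repeated links to the same tweet collapse.
        seen.setdefault(tweet_id, f"https://twitter.com/{user}/status/{tweet_id}")
    return list(seen.values())
```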
18:46 | | icedice has joined #archiveteam-ot
19:10 | | icedice has quit IRC (Quit: Leaving)
19:11 | | icedice has joined #archiveteam-ot
20:19 | | dxrt_ has quit IRC (Read error: Operation timed out)
20:20 | | dxrt_ has joined #archiveteam-ot
20:20 | | dxrt sets mode: +o dxrt_
21:41 | | Stiletto has joined #archiveteam-ot
21:43 | | odemg has joined #archiveteam-ot
21:46 | | Stilett0 has quit IRC (Ping timeout: 492 seconds)
21:52 | | dxrt_ has quit IRC (Read error: Operation timed out)
21:53 | | dxrt_ has joined #archiveteam-ot
21:53 | | dxrt sets mode: +o dxrt_
22:02 | | Stilett0 has joined #archiveteam-ot
22:05 | | icedice has quit IRC (Read error: Connection reset by peer)
22:05 | | Stiletto has quit IRC (Read error: Operation timed out)
22:28 | | Stiletto has joined #archiveteam-ot
22:32 | | Stilett0 has quit IRC (Read error: Operation timed out)