Time |
Nickname |
Message |
00:31
🔗
|
|
lunik194 has joined #archiveteam-ot |
00:31
🔗
|
|
lunik19 has quit IRC (Read error: Operation timed out) |
00:56
🔗
|
|
drcd has quit IRC (Leaving) |
02:08
🔗
|
|
ayanami_ has quit IRC (Quit: Leaving) |
02:52
🔗
|
|
Despatche has quit IRC (Quit: Read error: Connection reset by deer) |
03:15
🔗
|
|
MR9K has quit IRC (Ping timeout: 265 seconds) |
03:18
🔗
|
|
MR9K has joined #archiveteam-ot |
03:25
🔗
|
|
qw3rty112 has joined #archiveteam-ot |
03:32
🔗
|
|
qw3rty111 has quit IRC (Ping timeout: 600 seconds) |
03:33
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
03:38
🔗
|
|
odemg has quit IRC (Ping timeout: 615 seconds) |
03:45
🔗
|
|
odemg has joined #archiveteam-ot |
04:30
🔗
|
|
MR9K1 has joined #archiveteam-ot |
04:37
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
04:38
🔗
|
|
MR9K1 has quit IRC (Ping timeout: 255 seconds) |
04:58
🔗
|
|
MR9K4 has joined #archiveteam-ot |
05:06
🔗
|
|
MR9K4 has quit IRC (Quit: Ping timeout (120 seconds)) |
05:09
🔗
|
|
MR9K4 has joined #archiveteam-ot |
05:13
🔗
|
|
fuzzy8021 has quit IRC (Read error: Operation timed out) |
05:14
🔗
|
|
fuzzy8021 has joined #archiveteam-ot |
05:31
🔗
|
|
MR9K has quit IRC (Quit: tilde lounge - https://irc.tilde.team) |
06:08
🔗
|
|
MR9K4 has quit IRC (tilde lounge - https://irc.tilde.team) |
06:09
🔗
|
|
MR9K4 has joined #archiveteam-ot |
06:11
🔗
|
|
MR9K4 has quit IRC (Read error: Connection reset by peer) |
06:11
🔗
|
|
MR9K4 has joined #archiveteam-ot |
06:58
🔗
|
|
fuzy802 has joined #archiveteam-ot |
07:03
🔗
|
|
fuzzy8021 has quit IRC (Ping timeout: 615 seconds) |
07:08
🔗
|
|
fuzy802 is now known as fuzzy8021 |
07:14
🔗
|
|
dhyan_nat has quit IRC (Remote host closed the connection) |
07:19
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
07:25
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
08:34
🔗
|
|
drcd has joined #archiveteam-ot |
08:45
🔗
|
|
Verified_ has quit IRC (Remote host closed the connection) |
09:33
🔗
|
|
Jopik has quit IRC (Ping timeout: 360 seconds) |
09:35
🔗
|
|
Hintswen has quit IRC () |
09:44
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
10:24
🔗
|
|
Odd0002_ has joined #archiveteam-ot |
10:30
🔗
|
|
Odd0002 has quit IRC (Read error: Operation timed out) |
10:30
🔗
|
|
Odd0002_ is now known as Odd0002 |
11:03
🔗
|
|
Despatche has joined #archiveteam-ot |
12:08
🔗
|
|
godane has quit IRC (Ping timeout: 265 seconds) |
12:22
🔗
|
|
godane has joined #archiveteam-ot |
13:02
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
13:11
🔗
|
|
ColdIce has quit IRC (Remote host closed the connection) |
13:12
🔗
|
|
ColdIce has joined #archiveteam-ot |
13:57
🔗
|
|
lunik194 is now known as lunik1 |
14:28
🔗
|
JAA |
In about 12 hours, it's unix epoch 1555555551 (2019-04-18 02:45:51 UTC). Neat. |
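The date JAA quotes can be checked by converting the epoch value to UTC. This is a minimal sketch using only Python's standard library; the constant and variable names are illustrative and do not come from the chat.

```python
from datetime import datetime, timezone

# Epoch value quoted by JAA above.
TARGET_EPOCH = 1555555551

# Interpret the epoch seconds as a timezone-aware UTC datetime.
target = datetime.fromtimestamp(TARGET_EPOCH, tz=timezone.utc)

# Prints: 2019-04-18 02:45:51 UTC
print(target.strftime("%Y-%m-%d %H:%M:%S UTC"))
```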
15:19
🔗
|
|
drcd has quit IRC (Leaving) |
15:20
🔗
|
|
drcd has joined #archiveteam-ot |
15:23
🔗
|
Fusl |
tgt=1555555551; while true; do now=$(date +%s); test "${now}" -gt "${tgt}" && break; secs=$((${tgt}-${now})); printf "\r%02d:%02d:%02d" $((secs/3600)) $(((secs/60)%60)) $((secs%60)); sleep 1; done |
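For readers who prefer Python, the same countdown can be sketched as below. It is a rough equivalent of Fusl's one-liner, not a translation anyone in the channel posted; the function names `format_remaining` and `countdown` are made up for this example. It uses the same arithmetic (hours, minutes modulo 60, seconds modulo 60) and stops once the current time is past the target.

```python
import time

TARGET_EPOCH = 1555555551  # same target as in Fusl's one-liner


def format_remaining(secs: int) -> str:
    """Format a non-negative number of seconds as HH:MM:SS."""
    return "%02d:%02d:%02d" % (secs // 3600, (secs // 60) % 60, secs % 60)


def countdown(target: int = TARGET_EPOCH) -> None:
    """Redraw the remaining time once per second until the target has passed."""
    while True:
        now = int(time.time())
        if now > target:
            break
        # "\r" returns to the start of the line so the timer overwrites itself.
        print("\r" + format_remaining(target - now), end="", flush=True)
        time.sleep(1)


if __name__ == "__main__":
    countdown()
```

Because the target is now in the past, running this as-is returns immediately; pass a future epoch value to `countdown` to see the timer.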
15:24
🔗
|
kiska |
Anyway I am happy to announce that https://transfer.kiska.pw is ready for archiveteam use |
15:25
🔗
|
kiska |
I will purge the oldest items once my disk fills to 90%, or I will pay a little more for more disk space |
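The purge policy kiska describes (delete the oldest items once the disk is 90% full) could look roughly like the sketch below. This is not kiska's actual setup, which the log does not show. The helper names are hypothetical, "oldest" is assumed to mean oldest modification time, and the storage is assumed to be a plain directory tree.

```python
import shutil
from pathlib import Path
from typing import List

THRESHOLD = 0.90  # purge once the disk is 90% full, as stated in the chat


def oldest_first(root: Path) -> List[Path]:
    """Return regular files under root, oldest modification time first."""
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


def purge_until_below(root: Path, threshold: float = THRESHOLD) -> List[Path]:
    """Delete the oldest files until disk usage drops below threshold."""
    removed = []
    for path in oldest_first(root):
        # Re-check usage before each deletion so we stop as early as possible.
        usage = shutil.disk_usage(root)
        if usage.used / usage.total < threshold:
            break
        path.unlink()
        removed.append(path)
    return removed
```

Note that `purge_until_below` really deletes files, so it should only ever be pointed at a directory whose contents are disposable.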
15:29
🔗
|
Fusl |
or you know, you could have asked for a VM :P |
15:29
🔗
|
Fusl |
GLOBAL: |
15:29
🔗
|
Fusl |
SIZE AVAIL RAW USED %RAW USED OBJECTS |
15:29
🔗
|
Fusl |
7.89TiB 5.92TiB 1.97TiB 24.91 649.54k |
15:30
🔗
|
kiska |
See the thing is, I've already got the instance |
15:30
🔗
|
kiska |
And I am doing almost nothing with it |
15:40
🔗
|
Fusl |
¯\_(ツ)_/¯ |
16:57
🔗
|
|
MR9K4 is now known as MR9K |
16:57
🔗
|
VoynichCr |
what is this? more pipelines for archivebot? |
17:02
🔗
|
kiska |
What's that? |
17:10
🔗
|
JAA |
Damn, my Twitter scrape for #NotreDame is still running after 3.5 hours. |
17:12
🔗
|
JAA |
And it's now retrieving tweets from about 2 hours after the fire started, I think. Curious how long this list will become in the end. |
17:13
🔗
|
JAA |
324k tweets so far |
17:48
🔗
|
VoynichCr |
not many for such a global event |
17:49
🔗
|
JAA |
True, but there are probably many tweets that don't use the hashtag etc. |
18:25
🔗
|
|
kiska is now known as kiskan |
18:29
🔗
|
|
kiskan is now known as kiskablah |
18:30
🔗
|
|
kiskablah is now known as kiska3 |
19:29
🔗
|
|
kiska3 is now known as kiska |
19:33
🔗
|
kiska |
Please use https://transfer.notkiska.pw/ as I don't want extra pings, unless you want me to main you |
19:35
🔗
|
kiska |
s/main/maim |
19:56
🔗
|
|
drcd has quit IRC (Read error: Connection reset by peer) |
20:33
🔗
|
JAA |
For the record, my largest Twitter scrape with snscrape so far: 2019-04-17 21:29:51.126 INFO snscrape.cli Done, found 535168 results |
20:34
🔗
|
JAA |
Took almost 6 hours. |
21:24
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
21:39
🔗
|
|
BlueMax has joined #archiveteam-ot |
22:11
🔗
|
JAA |
VoynichCr: https://github.com/emijrp/internet-archive/pull/19 for kiska's file storage service. |
22:15
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 252 seconds) |
22:15
🔗
|
|
Mateon1 has joined #archiveteam-ot |
22:41
🔗
|
eythian |
ivan: if grab-site stops running, but doesn't actually terminate, is that a sign that it's done or that it's stuck? |
22:42
🔗
|
eythian |
also, if it's stuck, is there a way to unstick it? |
22:47
🔗
|
ivan |
eythian: probably doing something, you can check the log |
22:47
🔗
|
ivan |
lsof -p will probably show a connection to something |
22:49
🔗
|
eythian |
it looked like it was fetching something, but after 10 minutes of it not progressing I killed it. Rerunning now to see if it behaves differently (this is mostly me experimenting, so it's not a lot of data.) |
22:55
🔗
|
eythian |
it got stuck again on a different URL but on the same host: an image from www.nationalgeographic.com, though there seems to be a connection open to akamai. |
22:55
🔗
|
eythian |
I might just leave it and see what it's up to in the morning. |
22:55
🔗
|
ivan |
I just use more connections and wait for the 48 hour timeout |
22:55
🔗
|
ivan |
if the log shows it's grabbing some stream you can ignore that |
22:56
🔗
|
ivan |
https://github.com/ArchiveTeam/grab-site/blob/33b5a98c7d2957bd0666baa75a05f90f644001b1/libgrabsite/main.py#L230 |
22:56
🔗
|
ivan |
--wpull-args=--session-timeout=3600 if you're in a rush |
22:57
🔗
|
eythian |
that's quite the long timeout. |
22:58
🔗
|
JAA |
If wpull 2.x is stuck on an HTTPS connection, try https://github.com/JustAnotherArchivist/kill-wpull-connections |
22:58
🔗
|
JAA |
(Or ivan-wpull 3.x as well I guess.) |
22:59
🔗
|
JAA |
Note also that wpull often keeps connections open in the background. I've seen processes with over 100 open connections. |
23:03
🔗
|
eythian |
looks like two connections, which is the concurrency I'm running with, both to akamai. I guess for some reason it doesn't like that. I'll leave it alone as it's bedtime, see what happens when I get time tomorrow. |
23:04
🔗
|
eythian |
oh, it moved. the troublesome request closed with '[Errno 104] Connection reset by peer', so yeah, akamai being weird is my guess. |
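The "[Errno 104] Connection reset by peer" message corresponds to `ECONNRESET` on Linux, which Python raises as `ConnectionResetError`. The sketch below shows one generic way to retry a request after such a reset. It is not how grab-site or wpull handle the error internally; `fetch_with_retry` and its parameters are hypothetical names chosen for this example.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=1.0):
    """Call fetch() and retry if the peer resets the connection.

    fetch is any zero-argument callable that performs the request.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionResetError:
            # ConnectionResetError is the OSError subclass for ECONNRESET
            # (errno 104 on Linux). Give up after the final attempt.
            if attempt == attempts:
                raise
            time.sleep(delay)
```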