#archiveteam-ot 2019-04-17,Wed


Time Nickname Message
00:31 🔗 lunik194 has joined #archiveteam-ot
00:31 🔗 lunik19 has quit IRC (Read error: Operation timed out)
00:56 🔗 drcd has quit IRC (Leaving)
02:08 🔗 ayanami_ has quit IRC (Quit: Leaving)
02:52 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
03:15 🔗 MR9K has quit IRC (Ping timeout: 265 seconds)
03:18 🔗 MR9K has joined #archiveteam-ot
03:25 🔗 qw3rty112 has joined #archiveteam-ot
03:32 🔗 qw3rty111 has quit IRC (Ping timeout: 600 seconds)
03:33 🔗 icedice has quit IRC (Quit: Leaving)
03:38 🔗 odemg has quit IRC (Ping timeout: 615 seconds)
03:45 🔗 odemg has joined #archiveteam-ot
04:30 🔗 MR9K1 has joined #archiveteam-ot
04:37 🔗 dhyan_nat has joined #archiveteam-ot
04:38 🔗 MR9K1 has quit IRC (Ping timeout: 255 seconds)
04:58 🔗 MR9K4 has joined #archiveteam-ot
05:06 🔗 MR9K4 has quit IRC (Quit: Ping timeout (120 seconds))
05:09 🔗 MR9K4 has joined #archiveteam-ot
05:13 🔗 fuzzy8021 has quit IRC (Read error: Operation timed out)
05:14 🔗 fuzzy8021 has joined #archiveteam-ot
05:31 🔗 MR9K has quit IRC (Quit: tilde lounge - https://irc.tilde.team)
06:08 🔗 MR9K4 has quit IRC (tilde lounge - https://irc.tilde.team)
06:09 🔗 MR9K4 has joined #archiveteam-ot
06:11 🔗 MR9K4 has quit IRC (Read error: Connection reset by peer)
06:11 🔗 MR9K4 has joined #archiveteam-ot
06:58 🔗 fuzy802 has joined #archiveteam-ot
07:03 🔗 fuzzy8021 has quit IRC (Ping timeout: 615 seconds)
07:08 🔗 fuzy802 is now known as fuzzy8021
07:14 🔗 dhyan_nat has quit IRC (Remote host closed the connection)
07:19 🔗 dhyan_nat has joined #archiveteam-ot
07:25 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
08:34 🔗 drcd has joined #archiveteam-ot
08:45 🔗 Verified_ has quit IRC (Remote host closed the connection)
09:33 🔗 Jopik has quit IRC (Ping timeout: 360 seconds)
09:35 🔗 Hintswen has quit IRC ()
09:44 🔗 BlueMax has quit IRC (Quit: Leaving)
10:24 🔗 Odd0002_ has joined #archiveteam-ot
10:30 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
10:30 🔗 Odd0002_ is now known as Odd0002
11:03 🔗 Despatche has joined #archiveteam-ot
12:08 🔗 godane has quit IRC (Ping timeout: 265 seconds)
12:22 🔗 godane has joined #archiveteam-ot
13:02 🔗 dhyan_nat has joined #archiveteam-ot
13:11 🔗 ColdIce has quit IRC (Remote host closed the connection)
13:12 🔗 ColdIce has joined #archiveteam-ot
13:57 🔗 lunik194 is now known as lunik1
14:28 🔗 JAA In about 12 hours, it's unix epoch 1555555551 (2019-04-18 02:45:51 UTC). Neat.
15:19 🔗 drcd has quit IRC (Leaving)
15:20 🔗 drcd has joined #archiveteam-ot
15:23 🔗 Fusl tgt=1555555551; while true; do now=$(date +%s); test "${now}" -gt "${tgt}" && break; secs=$((${tgt}-${now})); printf "\r%02d:%02d:%02d" $((secs/3600)) $(((secs/60)%60)) $((secs%60)); sleep 1; done
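A minimal Python sketch of the same countdown arithmetic, which also checks the UTC timestamp JAA quoted for the epoch value. The function name `format_remaining` is hypothetical and not something used in the channel; unlike the shell loop, it computes a single reading instead of looping once per second.

```python
from datetime import datetime, timezone

TARGET = 1555555551  # epoch value quoted by JAA


def format_remaining(target: int, now: int) -> str:
    """Return the time left until `target` as HH:MM:SS, clamped at zero."""
    secs = max(target - now, 0)
    return f"{secs // 3600:02d}:{(secs // 60) % 60:02d}:{secs % 60:02d}"


# Convert the epoch value to a timezone-aware UTC datetime.
target_utc = datetime.fromtimestamp(TARGET, tz=timezone.utc)
print(target_utc.isoformat())

# Example reading taken 1 hour, 1 minute and 1 second before the target.
print(format_remaining(TARGET, TARGET - 3661))
```

Passing `int(time.time())` as `now` inside a loop with `time.sleep(1)` would reproduce the live countdown.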
15:24 🔗 kiska Anyway I am happy to announce that https://transfer.kiska.pw is ready for archiveteam use
15:25 🔗 kiska I will purge items from the oldest once my disk fills to 90% OR I will pay a little more for more disk space
15:29 🔗 Fusl or you know, you could have asked for a VM :P
15:29 🔗 Fusl GLOBAL:
15:29 🔗 Fusl SIZE     AVAIL    RAW USED  %RAW USED  OBJECTS
15:29 🔗 Fusl 7.89TiB  5.92TiB  1.97TiB   24.91      649.54k
15:30 🔗 kiska See the thing is, I've already got the instance
15:30 🔗 kiska And I am doing almost nothing with it
15:40 🔗 Fusl ¯\_(ツ)_/¯
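The "purge from the oldest once the disk fills to 90%" policy kiska describes could be sketched as below. This is an illustration under stated assumptions, not how the transfer service is implemented: the function name `select_for_purge`, the `(path, mtime, size)` tuple layout, and purging only until usage drops back under the threshold are all hypothetical.

```python
from typing import List, Tuple

# Each entry is (path, modification time, size in bytes); layout is assumed.
FileInfo = Tuple[str, float, int]


def select_for_purge(
    files: List[FileInfo], used: int, total: int, threshold: float = 0.90
) -> List[str]:
    """Pick the oldest files to delete until usage falls below `threshold`."""
    to_delete = []
    # Walk the files from oldest to newest modification time.
    for path, _mtime, size in sorted(files, key=lambda f: f[1]):
        if used / total < threshold:
            break  # Enough space has been reclaimed.
        to_delete.append(path)
        used -= size
    return to_delete


files = [("c.bin", 3.0, 3), ("a.bin", 1.0, 3), ("b.bin", 2.0, 3)]
print(select_for_purge(files, used=95, total=100))
```

In practice the `used` and `total` figures could come from `shutil.disk_usage` and the file list from `os.scandir`; the log does not say how the service tracks either.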
16:57 🔗 MR9K4 is now known as MR9K
16:57 🔗 VoynichCr what is this? more pipelines for archivebot?
17:02 🔗 kiska What's that?
17:10 🔗 JAA Damn, my Twitter scrape for #NotreDame is still running after 3.5 hours.
17:12 🔗 JAA And it's now retrieving tweets from about 2 hours after the fire started, I think. Curious how long this list will become in the end.
17:13 🔗 JAA 324k tweets so far
17:48 🔗 VoynichCr not many for such a global event
17:49 🔗 JAA True, but there are probably many tweets that don't use the hashtag etc.
18:25 🔗 kiska is now known as kiskan
18:29 🔗 kiskan is now known as kiskablah
18:30 🔗 kiskablah is now known as kiska3
19:29 🔗 kiska3 is now known as kiska
19:33 🔗 kiska Please use https://transfer.notkiska.pw/ as I don't want extra pings, unless you want me to main you
19:35 🔗 kiska s/main/maim
19:56 🔗 drcd has quit IRC (Read error: Connection reset by peer)
20:33 🔗 JAA For the record, my largest Twitter scrape with snscrape so far: 2019-04-17 21:29:51.126 INFO snscrape.cli Done, found 535168 results
20:34 🔗 JAA Took almost 6 hours.
21:24 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
21:39 🔗 BlueMax has joined #archiveteam-ot
22:11 🔗 JAA VoynichCr: https://github.com/emijrp/internet-archive/pull/19 for kiska's file storage service.
22:15 🔗 Mateon1 has quit IRC (Ping timeout: 252 seconds)
22:15 🔗 Mateon1 has joined #archiveteam-ot
22:41 🔗 eythian ivan: if grab-site stops running, but doesn't actually terminate, is that a sign that it's done or that it's stuck?
22:42 🔗 eythian also, if it's stuck, is there a way to unstick it?
22:47 🔗 ivan eythian: probably doing something, you can check the log
22:47 🔗 ivan lsof -p will probably show a connection to something
22:49 🔗 eythian it looked like it was fetching something, but after 10 minutes of it not progressing I killed it. Rerunning now to see if it behaves differently (this is mostly me experimenting, so it's not a lot of data.)
22:55 🔗 eythian it got stuck again on a different URL but on the same host. An image from www.nationalgeographic.com though there seems to be a connection open to akamai.
22:55 🔗 eythian I might just leave it and see what it's up to in the morning.
22:55 🔗 ivan I just use more connections and wait for the 48 hour timeout
22:55 🔗 ivan if the log shows it's grabbing some stream you can ignore that
22:56 🔗 ivan https://github.com/ArchiveTeam/grab-site/blob/33b5a98c7d2957bd0666baa75a05f90f644001b1/libgrabsite/main.py#L230
22:56 🔗 ivan --wpull-args=--session-timeout=3600 if you're in a rush
22:57 🔗 eythian that's quite the long timeout.
22:58 🔗 JAA If wpull 2.x is stuck on an HTTPS connection, try https://github.com/JustAnotherArchivist/kill-wpull-connections
22:58 🔗 JAA (Or ivan-wpull 3.x as well I guess.)
22:59 🔗 JAA Note also that wpull often keeps connections open in the background. I've seen processes with over 100 open connections.
23:03 🔗 eythian looks like two connections, which is the concurrency I'm running with, both to akamai. I guess for some reason it doesn't like that. I'll leave it alone as it's bedtime, see what happens when I get time tomorrow.
23:04 🔗 eythian oh, it moved. the troublesome request closed with '[Errno 104] Connection reset by peer', so yeah, akamai being weird is my guess.
