[00:31] *** lunik194 has joined #archiveteam-ot
[00:31] *** lunik19 has quit IRC (Read error: Operation timed out)
[00:56] *** drcd has quit IRC (Leaving)
[02:08] *** ayanami_ has quit IRC (Quit: Leaving)
[02:52] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[03:15] *** MR9K has quit IRC (Ping timeout: 265 seconds)
[03:18] *** MR9K has joined #archiveteam-ot
[03:25] *** qw3rty112 has joined #archiveteam-ot
[03:32] *** qw3rty111 has quit IRC (Ping timeout: 600 seconds)
[03:33] *** icedice has quit IRC (Quit: Leaving)
[03:38] *** odemg has quit IRC (Ping timeout: 615 seconds)
[03:45] *** odemg has joined #archiveteam-ot
[04:30] *** MR9K1 has joined #archiveteam-ot
[04:37] *** dhyan_nat has joined #archiveteam-ot
[04:38] *** MR9K1 has quit IRC (Ping timeout: 255 seconds)
[04:58] *** MR9K4 has joined #archiveteam-ot
[05:06] *** MR9K4 has quit IRC (Quit: Ping timeout (120 seconds))
[05:09] *** MR9K4 has joined #archiveteam-ot
[05:13] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[05:14] *** fuzzy8021 has joined #archiveteam-ot
[05:31] *** MR9K has quit IRC (Quit: tilde lounge - https://irc.tilde.team)
[06:08] *** MR9K4 has quit IRC (tilde lounge - https://irc.tilde.team)
[06:09] *** MR9K4 has joined #archiveteam-ot
[06:11] *** MR9K4 has quit IRC (Read error: Connection reset by peer)
[06:11] *** MR9K4 has joined #archiveteam-ot
[06:58] *** fuzy802 has joined #archiveteam-ot
[07:03] *** fuzzy8021 has quit IRC (Ping timeout: 615 seconds)
[07:08] *** fuzy802 is now known as fuzzy8021
[07:14] *** dhyan_nat has quit IRC (Remote host closed the connection)
[07:19] *** dhyan_nat has joined #archiveteam-ot
[07:25] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[08:34] *** drcd has joined #archiveteam-ot
[08:45] *** Verified_ has quit IRC (Remote host closed the connection)
[09:33] *** Jopik has quit IRC (Ping timeout: 360 seconds)
[09:35] *** Hintswen has quit IRC ()
[09:44] *** BlueMax has quit IRC (Quit: Leaving)
[10:24] *** Odd0002_ has joined #archiveteam-ot
[10:30] *** Odd0002 has quit IRC (Read error: Operation timed out)
[10:30] *** Odd0002_ is now known as Odd0002
[11:03] *** Despatche has joined #archiveteam-ot
[12:08] *** godane has quit IRC (Ping timeout: 265 seconds)
[12:22] *** godane has joined #archiveteam-ot
[13:02] *** dhyan_nat has joined #archiveteam-ot
[13:11] *** ColdIce has quit IRC (Remote host closed the connection)
[13:12] *** ColdIce has joined #archiveteam-ot
[13:57] *** lunik194 is now known as lunik1
[14:28] In about 12 hours, it's unix epoch 1555555551 (2019-04-18 02:45:51 UTC). Neat.
[15:19] *** drcd has quit IRC (Leaving)
[15:20] *** drcd has joined #archiveteam-ot
[15:23] tgt=1555555551; while true; do now=$(date +%s); test "${now}" -gt "${tgt}" && break; secs=$((${tgt}-${now})); printf "\r%02d:%02d:%02d" $((secs/3600)) $(((secs/60)%60)) $((secs%60)); sleep 1; done
[15:24] Anyway I am happy to announce that https://transfer.kiska.pw is ready for archiveteam use
[15:25] I will purge items from the oldest once my disk fills to 90% OR I will pay a little more for more disk space
[15:29] or you know, you could have asked for a VM :P
[15:29] GLOBAL:
[15:29]     SIZE        AVAIL       RAW USED     %RAW USED     OBJECTS
[15:29]     7.89TiB     5.92TiB     1.97TiB      24.91         649.54k
[15:30] See the thing is, I've already got the instance
[15:30] And I am doing almost nothing with it
[15:40] ¯\_(ツ)_/¯
[16:57] *** MR9K4 is now known as MR9K
[16:57] what is this? more pipelines for archivebot?
[17:02] Whats that?
[17:10] Damn, my Twitter scrape for #NotreDame is still running after 3.5 hours.
[17:12] And it's now retrieving tweets from about 2 hours after the fire started, I think. Curious how long this list will become in the end.
[17:13] 324k tweets so far
[17:48] not many for such a global event
[17:49] True, but there are probably many tweets that don't use the hashtag etc.
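Editor's note on the [15:23] countdown one-liner: the same loop may be easier to follow when split into two small bash functions. This is only a sketch of that one-liner, not anything posted in the channel. The function names `fmt_hms` and `countdown` are hypothetical, introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the [15:23] one-liner, split into functions.
# fmt_hms and countdown are hypothetical names, not from the log.

# Print a non-negative number of seconds as HH:MM:SS.
fmt_hms() {
  local secs=$1
  printf '%02d:%02d:%02d' $((secs / 3600)) $(((secs / 60) % 60)) $((secs % 60))
}

# Redraw the remaining time once per second until the current
# Unix time has passed the target epoch given as the first argument.
countdown() {
  local tgt=$1 now
  while true; do
    now=$(date +%s)
    if [ "$now" -gt "$tgt" ]; then
      break
    fi
    printf '\r%s' "$(fmt_hms $((tgt - now)))"
    sleep 1
  done
  printf '\n'
}
```

Usage would be `countdown 1555555551` for the epoch mentioned at [14:28]. A target that is already in the past makes the loop exit on its first iteration.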
[18:25] *** kiska is now known as kiskan
[18:29] *** kiskan is now known as kiskablah
[18:30] *** kiskablah is now known as kiska3
[19:29] *** kiska3 is now known as kiska
[19:33] Please use https://transfer.notkiska.pw/ as I don't want extra pings, unless you want me to main you
[19:35] s/main/maim
[19:56] *** drcd has quit IRC (Read error: Connection reset by peer)
[20:33] For the record, my largest Twitter scrape with snscrape so far: 2019-04-17 21:29:51.126 INFO snscrape.cli Done, found 535168 results
[20:34] Took almost 6 hours.
[21:24] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[21:39] *** BlueMax has joined #archiveteam-ot
[22:11] VoynichCr: https://github.com/emijrp/internet-archive/pull/19 for kiska's file storage service.
[22:15] *** Mateon1 has quit IRC (Ping timeout: 252 seconds)
[22:15] *** Mateon1 has joined #archiveteam-ot
[22:41] ivan: if grab-site stops running, but doesn't actually terminate, is that a sign that it's done or that it's stuck?
[22:42] also, if it's stuck, is there a way to unstick it?
[22:47] eythian: probably doing something, you can check the log
[22:47] lsof -p will probably show a connection to something
[22:49] it looked like it was fetching something, but after 10 minutes of it not progressing I killed it. Rerunning now to see if it behaves differently (this is mostly me experimenting, so it's not a lot of data.)
[22:55] it got stuck again on a different URL but on the same host. An image from www.nationalgeographic.com though there seems to be a connection open to akamai.
[22:55] I might just leave it and see what it's up to in the morning.
[22:55] I just use more connections and wait for the 48 hour timeout
[22:55] if the log shows it's grabbing some stream you can ignore that
[22:56] https://github.com/ArchiveTeam/grab-site/blob/33b5a98c7d2957bd0666baa75a05f90f644001b1/libgrabsite/main.py#L230
[22:56] --wpull-args=--session-timeout=3600 if you're in a rush
[22:57] that's quite the long timeout.
[22:58] If wpull 2.x is stuck on an HTTPS connection, try https://github.com/JustAnotherArchivist/kill-wpull-connections
[22:58] (Or ivan-wpull 3.x as well I guess.)
[22:59] Note also that wpull often keeps connections open in the background. I've seen processes with over 100 open connections.
[23:03] looks like two connections, which is the concurrency I'm running with, both to akamai. I guess for some reason it doesn't like that. I'll leave it alone as it's bedtime, see what happens when I get time tomorrow.
[23:04] oh, it moved. the troublesome request closed with '[Errno 104] Connection reset by peer', so yeah, akamai being weird is my guess.