[00:10] Yeah, still having trouble with it
[00:10] I tried to install it with pip running on python3.5
[00:10] still getting the same errors
[00:14] You're installing from git, not from PyPI, right?
[00:14] (2.0.3 isn't on PyPI.)
[00:15] no, from PyPI
[00:15] Or from your fork or whatever since you mentioned also merging some PRs in.
[00:15] Ah
[00:15] Yeah, try using the repo directly then.
[00:15] ok, will do
[00:16] Also, make sure that your dependencies are the right versions. I think the two odd ones are html5lib==0.9999999 and tornado==4.5.3.
[00:17] hmm, how should I install the proper versions (sorry, I'm not good with pip)
[00:17] pip install html5lib==0.9999999 tornado==4.5.3
[00:18] thanks!
[00:40] So I've realized that Docker is able to briefly download data for a second when using `docker pull`. However, it then pretty much gets stuck.
[00:46] My system's network connection seems to be totally fine.
[00:54] t3: maybe take a look at your TCP stream in wireshark
[02:07] *** lunik1 has quit IRC (Read error: Connection reset by peer)
[02:08] *** lunik1 has joined #archiveteam-ot
[02:43] ivan: Thanks. I will try that.
[03:17] *** marked is now known as marked1
[04:36] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[05:02] *** fuzzy8021 has quit IRC (Read error: Connection reset by peer)
[05:02] *** fuzzy8021 has joined #archiveteam-ot
[05:05] *** Hani111 has joined #archiveteam-ot
[05:14] *** Hani has quit IRC (Ping timeout: 615 seconds)
[05:14] *** Hani111 is now known as Hani
[05:44] *** dhyan_nat has joined #archiveteam-ot
[05:59] *** Zerote_ has joined #archiveteam-ot
[06:03] *** Despatche has joined #archiveteam-ot
[06:22] *** dhyan_nat has quit IRC (Quit: Konversation terminated!)
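
The dependency pins quoted at [00:16] can be checked after installing with a short script. This is only a sketch: the helper name `check_pins` and its `lookup` parameter are made up for illustration, and it relies on `importlib.metadata`, which needs Python 3.8 or newer (so it would not run on the python3.5 mentioned at [00:10]).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins quoted in the log at [00:16]; everything else here is illustrative.
EXPECTED = {"html5lib": "0.9999999", "tornado": "4.5.3"}


def check_pins(expected, lookup=version):
    """Return {package: (wanted, found)} for every pin that does not match.

    `found` is None when the package is not installed. `lookup` is a
    hypothetical hook so the check can be exercised without touching the
    real environment; by default it queries installed package metadata.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `check_pins(EXPECTED)` in the environment where grab-site was installed returns an empty dict when both pins match, and otherwise lists each package with the wanted and found versions.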
[06:22] *** dhyan_nat has joined #archiveteam-ot
[07:11] *** Zerote_ has quit IRC (Ping timeout: 252 seconds)
[07:13] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[07:22] *** Despatche has joined #archiveteam-ot
[07:29] *** Zerote_ has joined #archiveteam-ot
[09:18] *** deevious has quit IRC (Quit: deevious)
[10:12] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[10:25] *** dhyan_nat has joined #archiveteam-ot
[10:33] *** deevious has joined #archiveteam-ot
[11:28] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[12:02] *** Zerote__ has joined #archiveteam-ot
[12:05] *** Zerote_ has quit IRC (Ping timeout: 252 seconds)
[12:15] is there a channel for grab-site?
[12:16] here is fine
[12:16] grab-site --level 1 https://www.cl.cam.ac.uk/teaching/1819/ - why did this end up grabbing things from google.co.uk, maths.cam.ac.uk, vle.cam.ac.uk?
[12:17] "By default, grab-site also grabs linked pages to a depth of 1 on other domains. To turn off this behavior, use --no-offsite-links."
[12:19] hrm, is there a way to get it to stay at least within cam.ac.uk?
[12:22] Ignore patterns
[12:23] I'm looking, can't work out where the ignore lists are stored on my PC
[12:32] ended up adding them to /usr/local/lib/python3.7/dist-packages/libgrabsite/ignore_sets/ which is suboptimal
[12:37] "The url https://www.cl.cam.ac.uk/teaching/1819/DiscMath/ was not found in the archive." I am definitely not smart enough for this
[12:39] I recommend reading the README
[12:39] it explains how to modify the ignores for the job
[12:40] unfortunately it is a little annoying and you have to edit them after starting the crawl
[12:40] will try when I'm not sleep deprived. I wrote my own ignore and set it with --ignore-sets=
[12:40] oh yeah I forgot I added --import-ignores= too
[12:40] https://github.com/ArchiveTeam/grab-site#changing-ignores-during-the-crawl
[12:49] *** deevious has quit IRC (Quit: deevious)
[12:53] ivan, 21nm5wsbfvc2dh35ktql9h06g on archivebot has gone waaaaay further than I'd expect. Can you cancel it please? I don't have access
[13:21] !status https://renote.jp/
[13:29] meanwhile, over on a London EC2 instance... https://www.cl.cam.ac.uk/teaching/1819 on 05-14; 4,510.8 MB in 10,299 resp. at 18.8/s, 11,688 in q.; 2 con. w/ 0 ms delay; igoff
[13:38] JAA / kiska, even with --no-offsite-links, that cl.cam.ac.uk grab ends up at "buydisplay.com"
[13:44] *** deevious has joined #archiveteam-ot
[13:52] *** Zerote__ has quit IRC (Ping timeout: 252 seconds)
[14:01] *** SketchCow has quit IRC (Ping timeout: 252 seconds)
[14:15] *** SketchCow has joined #archiveteam-ot
[14:15] *** Fusl sets mode: +o SketchCow
[14:30] *** Zerote__ has joined #archiveteam-ot
[15:01] *** phiresky1 has joined #archiveteam-ot
[15:02] *** phiresky has quit IRC (Ping timeout: 265 seconds)
[15:31] voltagex: either it's a page requisite or you're seeing a crazy new bug
[15:32] maybe zless the WARC to see how it ended up there
[15:41] *** martini has joined #archiveteam-ot
[17:09] *** wp494 has quit IRC (Ping timeout: 604 seconds)
[17:10] *** wp494 has joined #archiveteam-ot
[17:13] god damn why is pfsense complaining about disk IO so badly
[17:13] what could it possibly even need to write
[18:09] heh, I've run it off a CF to IDE converter for a year or two
[18:09] and friend has run his that way for 5+ years
[18:10] the CF was so bad, windows would freeze every so often to read/write probably to swap
[18:46] *** Hani has quit IRC (Ping timeout: 615 seconds)
[18:55] *** Hani has joined #archiveteam-ot
[19:06] i've noticed weird latency spikes when there's heavy IO too
[19:06] like 'just stop passing traffic for half a second' type spikes
[19:37] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[19:38] *** JH88 has joined #archiveteam-ot
[19:45] *** Despatche has joined #archiveteam-ot
[20:45] *** dhyan_nat has joined #archiveteam-ot
[21:35] Kaz: ssd rebalancing or whatsitcalled?
[21:36] it's just a small array running far too much for what it should be
[21:37] Kaz: configure pfsense to use tmpfs instead of the SSDs directly?
[21:37] that also gives you a longer SSD lifespan
[21:37] hah, you think I have SSDs in the first place
[21:38] 5400rpm WD red raidz1, IOPS king
[21:38] then definitely go with tmpfs
[21:38] it's under system -> advanced -> misc -> ram disk settings
[21:40] done, cheers
[21:44] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[22:30] *** martini has quit IRC (Quit: No Reasson)
[22:56] *** Jens has quit IRC (Remote host closed the connection)
[22:57] *** Jens has joined #archiveteam-ot
[22:59] *** BlueMax has joined #archiveteam-ot
[23:11] *** tapos has joined #archiveteam-ot
[23:14] *** tapos has quit IRC (Client Quit)
[23:17] *** icedice has quit IRC (Quit: Leaving)
[23:54] *** asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
[23:58] *** asdf0101 has joined #archiveteam-ot
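
The [12:19] question about keeping the crawl within cam.ac.uk, answered at [12:22] with "Ignore patterns", could be approached with a single regular expression that matches every URL whose host is outside that domain. The sketch below only tests such a pattern in plain Python. The pattern and the `is_ignored` helper are hypothetical, and it is an assumption that grab-site evaluates ignore patterns the same way (unanchored search, lookahead supported); the README linked at [12:40] is the authority on that.

```python
import re

# Hypothetical pattern: matches any http(s) URL whose host is NOT
# cam.ac.uk or one of its subdomains. The negative lookahead rejects
# hosts that end in "cam.ac.uk" followed by ":", "/" or end of string.
OFFSITE = re.compile(r"^https?://(?!(?:[^/:@]+\.)?cam\.ac\.uk(?:[:/]|$))")


def is_ignored(url):
    """Return True when the URL would be skipped under the pattern above."""
    return OFFSITE.search(url) is not None


if __name__ == "__main__":
    for url in (
        "https://www.cl.cam.ac.uk/teaching/1819/",
        "https://www.maths.cam.ac.uk/",
        "https://www.google.co.uk/search?q=x",
        "https://www.buydisplay.com/",
    ):
        print(("ignore " if is_ignored(url) else "keep   ") + url)
```

Checking the pattern offline like this is cheaper than restarting a crawl to find out that it also excludes a wanted subdomain such as vle.cam.ac.uk or maths.cam.ac.uk.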