#archiveteam-ot 2019-05-14,Tue

Time Nickname Message
00:10 🔗 ealgase Yeah, still having trouble with it
00:10 🔗 ealgase I tried to install it with pip running on python3.5
00:10 🔗 ealgase still getting the same errors
00:14 🔗 JAA You're installing from git, not from PyPI, right?
00:14 🔗 JAA (2.0.3 isn't on PyPI.)
00:15 🔗 ealgase no, from PyPI
00:15 🔗 JAA Or from your fork or whatever since you mentioned also merging some PRs in.
00:15 🔗 JAA Ah
00:15 🔗 JAA Yeah, try using the repo directly then.
00:15 🔗 ealgase ok, will do
00:16 🔗 JAA Also, make sure that your dependencies are the right versions. I think the two odd ones are html5lib==0.9999999 and tornado==4.5.3.
00:17 🔗 ealgase hmm, how should I install the proper versions (sorry, I'm not good with pip)
00:17 🔗 JAA pip install html5lib==0.9999999 tornado==4.5.3
00:18 🔗 ealgase thanks!
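
After running the `pip install` line above, the pins can be confirmed from Python. This is only a minimal sketch: the `PINS` table repeats the two versions JAA quoted, the helper names are made up for illustration, and `importlib.metadata` needs Python 3.8 or newer (on an older interpreter such as 3.5, `pip freeze` shows the same information).

```python
from importlib import metadata

# The two "odd" pins mentioned in the channel.
PINS = {"html5lib": "0.9999999", "tornado": "4.5.3"}


def installed_version(name):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Report anything that differs from the pinned versions.
for name, (wanted, found) in check_pins(PINS).items():
    print(f"{name}: wanted {wanted}, found {found}")
```

An empty result means both packages are installed at exactly the pinned versions.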
00:40 🔗 t3 So I've realized that Docker is able to briefly download data for a second when using `docker pull`. However, it then pretty much gets stuck.
00:46 🔗 t3 My system's network connection seems to be totally fine.
00:54 🔗 ivan t3: maybe take a look at your TCP stream in wireshark
02:07 🔗 lunik1 has quit IRC (Read error: Connection reset by peer)
02:08 🔗 lunik1 has joined #archiveteam-ot
02:43 🔗 t3 ivan: Thanks. I will try that.
03:17 🔗 marked is now known as marked1
04:36 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
05:02 🔗 fuzzy8021 has quit IRC (Read error: Connection reset by peer)
05:02 🔗 fuzzy8021 has joined #archiveteam-ot
05:05 🔗 Hani111 has joined #archiveteam-ot
05:14 🔗 Hani has quit IRC (Ping timeout: 615 seconds)
05:14 🔗 Hani111 is now known as Hani
05:44 🔗 dhyan_nat has joined #archiveteam-ot
05:59 🔗 Zerote_ has joined #archiveteam-ot
06:03 🔗 Despatche has joined #archiveteam-ot
06:22 🔗 dhyan_nat has quit IRC (Quit: Konversation terminated!)
06:22 🔗 dhyan_nat has joined #archiveteam-ot
07:11 🔗 Zerote_ has quit IRC (Ping timeout: 252 seconds)
07:13 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
07:22 🔗 Despatche has joined #archiveteam-ot
07:29 🔗 Zerote_ has joined #archiveteam-ot
09:18 🔗 deevious has quit IRC (Quit: deevious)
10:12 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
10:25 🔗 dhyan_nat has joined #archiveteam-ot
10:33 🔗 deevious has joined #archiveteam-ot
11:28 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
12:02 🔗 Zerote__ has joined #archiveteam-ot
12:05 🔗 Zerote_ has quit IRC (Ping timeout: 252 seconds)
12:15 🔗 voltagex is there a channel for grab-site?
12:16 🔗 ivan here is fine
12:16 🔗 voltagex grab-site --level 1 https://www.cl.cam.ac.uk/teaching/1819/ - why did this end up grabbing things from google.co.uk, maths.cam.ac.uk, vle.cam.ac.uk?
12:17 🔗 ivan "By default, grab-site also grabs linked pages to a depth of 1 on other domains. To turn off this behavior, use --no-offsite-links."
12:19 🔗 voltagex hrm, is there a way to get it to stay at least within cam.ac.uk?
12:22 🔗 Flashfire Ignore patterns
12:23 🔗 voltagex I'm looking, can't work out where the ignore lists are stored on my PC
12:32 🔗 voltagex ended up adding them to /usr/local/lib/python3.7/dist-packages/libgrabsite/ignore_sets/ which is suboptimal
12:37 🔗 voltagex "The url https://www.cl.cam.ac.uk/teaching/1819/DiscMath/ was not found in the archive." I am definitely not smart enough for this
12:39 🔗 ivan I recommend reading the README
12:39 🔗 ivan it explains how to modify the ignores for the job
12:40 🔗 ivan unfortunately it is a little annoying and you have to edit them after starting the crawl
12:40 🔗 voltagex will try when I'm not sleep deprived. I wrote my own ignore and set it with --ignore-sets=
12:40 🔗 ivan oh yeah I forgot I added --import-ignores= too
12:40 🔗 ivan https://github.com/ArchiveTeam/grab-site#changing-ignores-during-the-crawl
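
One way to read Flashfire's "ignore patterns" suggestion is a single regex that matches every URL whose host is outside cam.ac.uk. The sketch below is not taken from grab-site: the pattern and the `is_ignored` helper are hypothetical, and it assumes ignores are regular expressions searched against the full URL (the README linked above is the authority on that). Such a pattern may also skip offsite page requisites.

```python
import re

# Hypothetical ignore pattern: matches (and so would ignore) any http(s) URL
# whose host is neither cam.ac.uk nor one of its subdomains.
OFFSITE = r"^https?://(?!(?:[^/]+\.)?cam\.ac\.uk(?:[:/]|$))"


def is_ignored(url, patterns=(OFFSITE,)):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(re.search(p, url) for p in patterns)


# Try the pattern on the hosts that came up in the channel.
for url in (
    "https://www.cl.cam.ac.uk/teaching/1819/",
    "https://www.maths.cam.ac.uk/",
    "https://www.google.co.uk/",
    "https://www.buydisplay.com/",
):
    print(url, "ignored" if is_ignored(url) else "kept")
```

Testing a pattern like this offline before handing it to a crawl is cheaper than finding out from the queue size.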
12:49 🔗 deevious has quit IRC (Quit: deevious)
12:53 🔗 voltagex ivan, 21nm5wsbfvc2dh35ktql9h06g on archivebot has gone waaaaay further than I'd expect. Can you cancel it please? I don't have access
13:21 🔗 godane1 !status https://renote.jp/
13:29 🔗 voltagex meanwhile, over on a London EC2 instance... https://www.cl.cam.ac.uk/teaching/1819 on 05-14; 4,510.8 MB in 10,299 resp. at 18.8/s, 11,688 in q.; 2 con. w/ 0 ms delay; igoff
13:38 🔗 voltagex JAA / kiska, even with --no-offsite-links, that cl.cam.ac.uk grab ends up at "buydisplay.com"
13:44 🔗 deevious has joined #archiveteam-ot
13:52 🔗 Zerote__ has quit IRC (Ping timeout: 252 seconds)
14:01 🔗 SketchCow has quit IRC (Ping timeout: 252 seconds)
14:15 🔗 SketchCow has joined #archiveteam-ot
14:15 🔗 Fusl sets mode: +o SketchCow
14:30 🔗 Zerote__ has joined #archiveteam-ot
15:01 🔗 phiresky1 has joined #archiveteam-ot
15:02 🔗 phiresky has quit IRC (Ping timeout: 265 seconds)
15:31 🔗 ivan voltagex: either it's a page requisite or you're seeing a crazy new bug
15:32 🔗 ivan maybe zless the WARC to see how it ended up there
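
As an alternative to paging through the WARC by hand, a short script can list which page referred the crawler to an unexpected host. This is a rough sketch under stated assumptions: it assumes the WARC is gzip-compressed and that request records carry a `Referer:` header, the function names are invented for illustration, and it scans line by line instead of parsing records properly, so binary payloads could in rare cases confuse it.

```python
import gzip


def find_referers(lines, needle):
    """Yield (target_uri, referer) for records whose target URI contains needle.

    `lines` is any iterable of bytes lines, e.g. an open gzip file.
    """
    target = None
    for raw in lines:
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if line.startswith("WARC/1."):
            target = None  # a new WARC record starts here
        elif line.startswith("WARC-Target-URI:"):
            # Some writers wrap the URI in angle brackets; strip them.
            target = line.split(":", 1)[1].strip().strip("<>")
        elif line.lower().startswith("referer:") and target and needle in target:
            yield target, line.split(":", 1)[1].strip()


def scan_warc(path, needle):
    """Print every (target, referer) pair found in a .warc.gz file."""
    with gzip.open(path, "rb") as warc:
        for target, referer in find_referers(warc, needle):
            print(f"{target}  <-  {referer}")
```

Calling `scan_warc` with the path to the job's WARC and `"buydisplay.com"` as the needle would show which page linked there, if the assumptions above hold.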
15:41 🔗 martini has joined #archiveteam-ot
17:09 🔗 wp494 has quit IRC (Ping timeout: 604 seconds)
17:10 🔗 wp494 has joined #archiveteam-ot
17:13 🔗 Kaz god damn why is pfsense complaining about disk IO so badly
17:13 🔗 Kaz what could it possibly even need to write
18:09 🔗 ranma heh, I've run it off a CF to IDE converter for a year or two
18:09 🔗 ranma and friend has run his that way for 5+ years
18:10 🔗 ranma the CF was so bad, windows would freeze every so often to read/write probably to swap
18:46 🔗 Hani has quit IRC (Ping timeout: 615 seconds)
18:55 🔗 Hani has joined #archiveteam-ot
19:06 🔗 Kaz i've noticed weird latency spikes when there's heavy IO too
19:06 🔗 Kaz like 'just stop passing traffic for half a second' type spikes
19:37 🔗 Despatche has quit IRC (Quit: Read error: Connection reset by deer)
19:38 🔗 JH88 has joined #archiveteam-ot
19:45 🔗 Despatche has joined #archiveteam-ot
20:45 🔗 dhyan_nat has joined #archiveteam-ot
21:35 🔗 schbirid2 Kaz: ssd rebalancing or whatsitcalled?
21:36 🔗 Kaz it's just a small array running far too much for what it should be
21:37 🔗 Fusl Kaz: configure pfsense to use tmpfs instead of the SSDs directly?
21:37 🔗 Fusl that also gives you a longer SSD lifespan
21:37 🔗 Kaz hah, you think I have SSDs in the first place
21:38 🔗 Kaz 5400rpm WD red raidz1, IOPS king
21:38 🔗 Fusl then definitely go with tmpfs
21:38 🔗 Fusl it's under system -> advanced -> misc -> ram disk settings
21:40 🔗 Kaz done, cheers
21:44 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
22:30 🔗 martini has quit IRC (Quit: No Reasson)
22:56 🔗 Jens has quit IRC (Remote host closed the connection)
22:57 🔗 Jens has joined #archiveteam-ot
22:59 🔗 BlueMax has joined #archiveteam-ot
23:11 🔗 tapos has joined #archiveteam-ot
23:14 🔗 tapos has quit IRC (Client Quit)
23:17 🔗 icedice has quit IRC (Quit: Leaving)
23:54 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
23:58 🔗 asdf0101 has joined #archiveteam-ot
