Time |
Nickname |
Message |
00:10
🔗
|
ealgase |
Yeah, still having trouble with it |
00:10
🔗
|
ealgase |
I tried to install it with pip running on python3.5 |
00:10
🔗
|
ealgase |
still getting the same errors |
00:14
🔗
|
JAA |
You're installing from git, not from PyPI, right? |
00:14
🔗
|
JAA |
(2.0.3 isn't on PyPI.) |
00:15
🔗
|
ealgase |
no, from PyPI |
00:15
🔗
|
JAA |
Or from your fork or whatever since you mentioned also merging some PRs in. |
00:15
🔗
|
JAA |
Ah |
00:15
🔗
|
JAA |
Yeah, try using the repo directly then. |
00:15
🔗
|
ealgase |
ok, will do |
00:16
🔗
|
JAA |
Also, make sure that your dependencies are the right versions. I think the two odd ones are html5lib==0.9999999 and tornado==4.5.3. |
00:17
🔗
|
ealgase |
hmm, how should I install the proper versions (sorry, I'm not good with pip) |
00:17
🔗
|
JAA |
pip install html5lib==0.9999999 tornado==4.5.3 |
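After running the pinned install above, the result can be double-checked from Python. This is only a rough sketch: the `PINS` table and the `mismatched` helper are made-up names for illustration, and it assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library (on the python3.5 mentioned earlier, `pip show html5lib tornado` would be the simpler check).

```python
from importlib.metadata import PackageNotFoundError, version

# The two pins quoted in the conversation above.
PINS = {"html5lib": "0.9999999", "tornado": "4.5.3"}


def mismatched(pins, get_version=version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[name] = found
    return problems


# Usage: an empty dict means both pins are satisfied in this environment.
# print(mismatched(PINS))
```

The `get_version` parameter exists only so the check can be tried without touching the real environment.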
00:18
🔗
|
ealgase |
thanks! |
00:40
🔗
|
t3 |
So I've realized that Docker is able to briefly download data for a second when using `docker pull`. However, it then pretty much gets stuck. |
00:46
🔗
|
t3 |
My system's network connection seems to be totally fine. |
00:54
🔗
|
ivan |
t3: maybe take a look at your TCP stream in wireshark |
02:07
🔗
|
|
lunik1 has quit IRC (Read error: Connection reset by peer) |
02:08
🔗
|
|
lunik1 has joined #archiveteam-ot |
02:43
🔗
|
t3 |
ivan: Thanks. I will try that. |
03:17
🔗
|
|
marked is now known as marked1 |
04:36
🔗
|
|
Despatche has quit IRC (Quit: Read error: Connection reset by deer) |
05:02
🔗
|
|
fuzzy8021 has quit IRC (Read error: Connection reset by peer) |
05:02
🔗
|
|
fuzzy8021 has joined #archiveteam-ot |
05:05
🔗
|
|
Hani111 has joined #archiveteam-ot |
05:14
🔗
|
|
Hani has quit IRC (Ping timeout: 615 seconds) |
05:14
🔗
|
|
Hani111 is now known as Hani |
05:44
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
05:59
🔗
|
|
Zerote_ has joined #archiveteam-ot |
06:03
🔗
|
|
Despatche has joined #archiveteam-ot |
06:22
🔗
|
|
dhyan_nat has quit IRC (Quit: Konversation terminated!) |
06:22
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
07:11
🔗
|
|
Zerote_ has quit IRC (Ping timeout: 252 seconds) |
07:13
🔗
|
|
Despatche has quit IRC (Quit: Read error: Connection reset by deer) |
07:22
🔗
|
|
Despatche has joined #archiveteam-ot |
07:29
🔗
|
|
Zerote_ has joined #archiveteam-ot |
09:18
🔗
|
|
deevious has quit IRC (Quit: deevious) |
10:12
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
10:25
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
10:33
🔗
|
|
deevious has joined #archiveteam-ot |
11:28
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
12:02
🔗
|
|
Zerote__ has joined #archiveteam-ot |
12:05
🔗
|
|
Zerote_ has quit IRC (Ping timeout: 252 seconds) |
12:15
🔗
|
voltagex |
is there a channel for grab-site? |
12:16
🔗
|
ivan |
here is fine |
12:16
🔗
|
voltagex |
grab-site --level 1 https://www.cl.cam.ac.uk/teaching/1819/ - why did this end up grabbing things from google.co.uk, maths.cam.ac.uk, vle.cam.ac.uk? |
12:17
🔗
|
ivan |
"By default, grab-site also grabs linked pages to a depth of 1 on other domains. To turn off this behavior, use --no-offsite-links." |
12:19
🔗
|
voltagex |
hrm, is there a way to get it to stay at least within cam.ac.uk? |
12:22
🔗
|
Flashfire |
Ignore patterns |
12:23
🔗
|
voltagex |
I'm looking, can't work out where the ignore lists are stored on my PC |
12:32
🔗
|
voltagex |
ended up adding them to /usr/local/lib/python3.7/dist-packages/libgrabsite/ignore_sets/ which is suboptimal |
12:37
🔗
|
voltagex |
"The url https://www.cl.cam.ac.uk/teaching/1819/DiscMath/ was not found in the archive." I am definitely not smart enough for this |
12:39
🔗
|
ivan |
I recommend reading the README |
12:39
🔗
|
ivan |
it explains how to modify the ignores for the job |
12:40
🔗
|
ivan |
unfortunately it is a little annoying and you have to edit them after starting the crawl |
12:40
🔗
|
voltagex |
will try when I'm not sleep deprived. I wrote my own ignore and set it with --ignore-sets= |
12:40
🔗
|
ivan |
oh yeah I forgot I added --import-ignores= too |
12:40
🔗
|
ivan |
https://github.com/ArchiveTeam/grab-site#changing-ignores-during-the-crawl |
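For the "stay within cam.ac.uk" question, an ignore pattern can be tried out in plain Python before it goes into an ignore file. The sketch below assumes that ignores are Python regular expressions searched against the full URL; the pattern itself and the `is_ignored` helper are illustrative and not taken from the grab-site README.

```python
import re

# Hypothetical pattern: matches (and so would ignore) any http(s) URL whose
# host is neither cam.ac.uk nor one of its subdomains.
OFFSITE = re.compile(r"^https?://(?!(?:[^/:]+\.)?cam\.ac\.uk(?:[:/]|$))")


def is_ignored(url):
    """True if the URL would be skipped under the pattern above."""
    return OFFSITE.search(url) is not None


for url in (
    "https://www.cl.cam.ac.uk/teaching/1819/",
    "https://maths.cam.ac.uk/",
    "https://www.google.co.uk/search",
    "http://buydisplay.com/",
):
    print(url, "-> ignored" if is_ignored(url) else "-> kept")
```

A pattern like this would presumably also block off-site page requisites (images, stylesheets), so it is stricter than --no-offsite-links.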
12:49
🔗
|
|
deevious has quit IRC (Quit: deevious) |
12:53
🔗
|
voltagex |
ivan, 21nm5wsbfvc2dh35ktql9h06g on archivebot has gone waaaaay further than I'd expect. Can you cancel it please? I don't have access |
13:21
🔗
|
godane1 |
!status https://renote.jp/ |
13:29
🔗
|
voltagex |
meanwhile, over on a London EC2 instance... https://www.cl.cam.ac.uk/teaching/1819 on 05-14; 4,510.8 MB in 10,299 resp. at 18.8/s, 11,688 in q.; 2 con. w/ 0 ms delay; igoff |
13:38
🔗
|
voltagex |
JAA / kiska, even with --no-offsite-links, that cl.cam.ac.uk grab ends up at "buydisplay.com" |
13:44
🔗
|
|
deevious has joined #archiveteam-ot |
13:52
🔗
|
|
Zerote__ has quit IRC (Ping timeout: 252 seconds) |
14:01
🔗
|
|
SketchCow has quit IRC (Ping timeout: 252 seconds) |
14:15
🔗
|
|
SketchCow has joined #archiveteam-ot |
14:15
🔗
|
|
Fusl sets mode: +o SketchCow |
14:30
🔗
|
|
Zerote__ has joined #archiveteam-ot |
15:01
🔗
|
|
phiresky1 has joined #archiveteam-ot |
15:02
🔗
|
|
phiresky has quit IRC (Ping timeout: 265 seconds) |
15:31
🔗
|
ivan |
voltagex: either it's a page requisite or you're seeing a crazy new bug |
15:32
🔗
|
ivan |
maybe zless the WARC to see how it ended up there |
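The same inspection can be scripted instead of paging through the file by hand. This is a rough sketch under a few assumptions: the WARC is gzip-compressed, the crawler recorded a Referer header in its request records, and a line-by-line scan is good enough (a binary payload could in principle produce a false hit). `find_referers` is a made-up helper name.

```python
import gzip


def find_referers(warc_path, needle):
    """Yield (target_uri, referer) for recorded requests whose URL contains needle."""
    target = None
    with gzip.open(warc_path, "rb") as fh:
        for raw in fh:
            line = raw.rstrip(b"\r\n")
            lowered = line.lower()
            if line.startswith(b"WARC/"):
                target = None  # a new WARC record starts here
            elif lowered.startswith(b"warc-target-uri:"):
                target = line.split(b":", 1)[1].strip().decode("utf-8", "replace")
            elif lowered.startswith(b"referer:") and target and needle in target:
                referer = line.split(b":", 1)[1].strip().decode("utf-8", "replace")
                yield target, referer


# Usage (the file name is a placeholder):
# for target, referer in find_referers("crawl.warc.gz", "buydisplay.com"):
#     print(target, "<-", referer)
```

Each pair shows which already-crawled page linked to the unexpected URL, which should help tell a page requisite from a real bug.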
15:41
🔗
|
|
martini has joined #archiveteam-ot |
17:09
🔗
|
|
wp494 has quit IRC (Ping timeout: 604 seconds) |
17:10
🔗
|
|
wp494 has joined #archiveteam-ot |
17:13
🔗
|
Kaz |
god damn why is pfsense complaining about disk IO so badly |
17:13
🔗
|
Kaz |
what could it possibly even need to write |
18:09
🔗
|
ranma |
heh, I've run it off a CF to IDE converter for a year or two |
18:09
🔗
|
ranma |
and a friend has run his that way for 5+ years |
18:10
🔗
|
ranma |
the CF was so bad, windows would freeze every so often to read/write probably to swap |
18:46
🔗
|
|
Hani has quit IRC (Ping timeout: 615 seconds) |
18:55
🔗
|
|
Hani has joined #archiveteam-ot |
19:06
🔗
|
Kaz |
i've noticed weird latency spikes when there's heavy IO too |
19:06
🔗
|
Kaz |
like 'just stop passing traffic for half a second' type spikes |
19:37
🔗
|
|
Despatche has quit IRC (Quit: Read error: Connection reset by deer) |
19:38
🔗
|
|
JH88 has joined #archiveteam-ot |
19:45
🔗
|
|
Despatche has joined #archiveteam-ot |
20:45
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
21:35
🔗
|
schbirid2 |
Kaz: ssd rebalancing or whatsitcalled? |
21:36
🔗
|
Kaz |
it's just a small array running far too much for what it should be |
21:37
🔗
|
Fusl |
Kaz: configure pfsense to use tmpfs instead of the SSDs directly? |
21:37
🔗
|
Fusl |
that also gives you a longer SSD lifespan |
21:37
🔗
|
Kaz |
hah, you think I have SSDs in the first place |
21:38
🔗
|
Kaz |
5400rpm WD red raidz1, IOPS king |
21:38
🔗
|
Fusl |
then definitely go with tmpfs |
21:38
🔗
|
Fusl |
it's under system -> advanced -> misc -> ram disk settings |
21:40
🔗
|
Kaz |
done, cheers |
21:44
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
22:30
🔗
|
|
martini has quit IRC (Quit: No Reasson) |
22:56
🔗
|
|
Jens has quit IRC (Remote host closed the connection) |
22:57
🔗
|
|
Jens has joined #archiveteam-ot |
22:59
🔗
|
|
BlueMax has joined #archiveteam-ot |
23:11
🔗
|
|
tapos has joined #archiveteam-ot |
23:14
🔗
|
|
tapos has quit IRC (Client Quit) |
23:17
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
23:54
🔗
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
23:58
🔗
|
|
asdf0101 has joined #archiveteam-ot |