Time |
Nickname |
Message |
00:17
🔗
|
|
Stiletto has joined #archiveteam-ot |
00:19
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
00:24
🔗
|
|
Stilett0 has joined #archiveteam-ot |
00:30
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
00:31
🔗
|
|
Stiletto has joined #archiveteam-ot |
00:34
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
00:34
🔗
|
|
Stilett0 has joined #archiveteam-ot |
00:36
🔗
|
|
Stiletto has quit IRC (Ping timeout: 268 seconds) |
01:02
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 633 seconds) |
01:02
🔗
|
|
Mateon1 has joined #archiveteam-ot |
01:04
🔗
|
|
Igloo_ has quit IRC (Read error: Operation timed out) |
01:04
🔗
|
|
svchfoo3 has quit IRC (Read error: Operation timed out) |
01:05
🔗
|
|
Igloo has joined #archiveteam-ot |
01:07
🔗
|
|
svchfoo3 has joined #archiveteam-ot |
01:08
🔗
|
|
svchfoo1 sets mode: +o svchfoo3 |
01:21
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 252 seconds) |
01:21
🔗
|
|
Stiletto has joined #archiveteam-ot |
01:33
🔗
|
|
Stilett0 has joined #archiveteam-ot |
01:36
🔗
|
|
Stiletto has quit IRC (Ping timeout: 268 seconds) |
04:57
🔗
|
|
Stiletto has joined #archiveteam-ot |
04:58
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 258 seconds) |
05:13
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 260 seconds) |
05:33
🔗
|
|
Muad-Dib has joined #archiveteam-ot |
07:57
🔗
|
|
adinbied has quit IRC (Left Channel.) |
08:02
🔗
|
|
adinbied has joined #archiveteam-ot |
08:30
🔗
|
|
m007a83 has quit IRC (Read error: Connection reset by peer) |
08:33
🔗
|
|
m007a83 has joined #archiveteam-ot |
09:36
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
11:45
🔗
|
ivan |
https://gist.github.com/ivan/dd1d5d9f07da68dca454ee2f899820cf yapsy doesn't work on nixos, or something |
11:48
🔗
|
JAA |
Maybe related to imp.load_module being deprecated since Python 3.3? |
11:49
🔗
|
ivan |
WTF, changing open(candidate_filepath+".py","r") to open(candidate_filepath+".py","rb") in yapsy fixes it |
11:50
🔗
|
ivan |
why does it work on Debian! |
12:18
🔗
|
ivan |
I found a codepath that might be different but I'm just going to patch this yapsy garbage |
12:47
🔗
|
ivan |
I think I need to just remove yapsy |
12:48
🔗
|
JAA |
ivan: Do you plan to merge all these changes to wpull back into the main repo at some point, or is that just for your own use in grab-site? |
12:52
🔗
|
ivan |
I don't know, I'm trying to minimize the amount of time I spend touching Python and ArchiveTeam/wpull seems to have higher standards (like a plan for passing tests) than my repo |
12:52
🔗
|
ivan |
I mean they're pretty good changes so if you like them please take them |
12:53
🔗
|
ivan |
perhaps some regressions, too |
12:55
🔗
|
JAA |
Right. I do like the idea of removing Yapsy. I had nothing but issues with it, and it seems like a huge overkill compared to simply importing the plugin file as a module and using WpullPlugin.__subclasses__. Although "simply importing the plugin file as a module" is a PITA to support in multiple Python versions because they changed how that works multiple times. |
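The approach JAA describes (import the plugin file as a module, then look at `WpullPlugin.__subclasses__()`) could look roughly like the sketch below on Python 3.5+. The `WpullPlugin` class here is only a stand-in for the real base class in wpull, and `load_plugins` is a hypothetical helper name, not an existing wpull function. So that the throwaway plugin file needs no import line, the demo injects the base class into the module namespace; a real plugin would import it from wpull instead.

```python
import importlib.util
import os
import tempfile


class WpullPlugin:
    """Stand-in for the real plugin base class in wpull (assumption)."""


def load_plugins(path, base=WpullPlugin):
    """Import the file at `path` and return the new direct subclasses of `base`."""
    before = set(base.__subclasses__())
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Demo-only shortcut: make the base class visible inside the plugin file.
    setattr(module, base.__name__, base)
    spec.loader.exec_module(module)
    # Anything that subclassed `base` while the file ran is a plugin.
    return [cls for cls in base.__subclasses__() if cls not in before]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plugin_path = os.path.join(tmp, "my_plugin.py")
        with open(plugin_path, "w") as f:
            f.write("class MyPlugin(WpullPlugin):\n    pass\n")
        print([cls.__name__ for cls in load_plugins(plugin_path)])
```

Note that `__subclasses__()` only lists direct subclasses, so a plugin that derives from another plugin would need a recursive walk.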
12:56
🔗
|
ivan |
I tried to monkeypatch it but it's got singletons and stashed instances |
12:57
🔗
|
ivan |
maybe I'll subclass PluginManager! |
12:59
🔗
|
JAA |
I also want to remove Tornado at some point, at least outside the test code. |
13:02
🔗
|
ivan |
subclassing doesn't work either, of course |
13:02
🔗
|
ivan |
this is such insane bullshit |
13:17
🔗
|
ivan |
JAA: nice, I didn't even know about __subclasses__ |
13:19
🔗
|
JAA |
Yeah, it's extremely useful. I use it in snscrape for modules. Each scraper is a subclass of snscrape.base.Scraper, and I simply collect the __subclasses__ to generate the command line options on the fly. That way, when I want to add a new scraper, I just need to write a new module in snscrape/modules and don't need to care about adding it anywhere else. |
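A minimal sketch of the pattern JAA describes, generating command line options from `__subclasses__()`. The `Scraper` base class, the `name` attribute and the `setup_parser` hook are made up for illustration and are not claimed to match the actual snscrape API.

```python
import argparse


class Scraper:
    """Hypothetical base class, standing in for snscrape.base.Scraper."""
    name = None

    @classmethod
    def setup_parser(cls, subparser):
        """Subclasses add their own arguments here."""


class UserScraper(Scraper):
    name = "user"

    @classmethod
    def setup_parser(cls, subparser):
        subparser.add_argument("username")


class HashtagScraper(Scraper):
    name = "hashtag"

    @classmethod
    def setup_parser(cls, subparser):
        subparser.add_argument("hashtag")


def build_parser():
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(dest="scraper")
    # One subcommand per direct subclass; nothing is registered by hand.
    for cls in Scraper.__subclasses__():
        sub = subparsers.add_parser(cls.name)
        cls.setup_parser(sub)
        sub.set_defaults(cls=cls)
    return parser


args = build_parser().parse_args(["user", "example"])
print(args.cls.__name__, args.username)  # prints "UserScraper example"
```

Adding a new scraper then only means defining another subclass before `build_parser()` runs, for example by importing every module in a package.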
13:20
🔗
|
JAA |
ivan: Regarding "why does it work on Debian but not on NixOS": could it be related to hash-based .pyc files? |
13:21
🔗
|
JAA |
I.e. that the build on NixOS uses hash-based .pycs but the Debian one doesn't. |
13:21
🔗
|
JAA |
The code throwing that exception is this: https://github.com/python/cpython/blob/3.7/Lib/importlib/_bootstrap_external.py#L832-L841 |
13:22
🔗
|
ivan |
let me finish looking into that, I wrote a script to look at the flags but didn't remember that the pyc files are in __pycache__ now :-) |
13:24
🔗
|
JAA |
Yep, there's the bug report: https://bugs.python.org/issue34056 |
13:27
🔗
|
ivan |
you are correct of course |
13:27
🔗
|
ivan |
# ~/gs-venv/bin/python3 ./print_pyc_flags ~/gs-venv/lib/python3.7/site-packages/libgrabsite/__pycache__/wpull_hooks.cpython-37.pyc |
13:27
🔗
|
ivan |
flags = 3 |
13:27
🔗
|
ivan |
hash_based = True |
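ivan's `print_pyc_flags` script is not shown in the log. A script with similar output could be sketched as follows, assuming the Python 3.7+ `.pyc` layout from PEP 552: a 4-byte magic number, then a 4-byte little-endian flags field in which bit 0 marks a hash-based file and bit 1 marks `check_source`. The function name `read_pyc_flags` is made up for this sketch.

```python
import struct
import sys


def read_pyc_flags(path):
    """Return the flags field from the 16-byte header of a Python 3.7+ .pyc."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError("file too short to be a Python 3.7+ .pyc")
    # Bytes 0-3 are the magic number, bytes 4-7 the little-endian flags.
    (flags,) = struct.unpack("<I", header[4:8])
    return flags


if __name__ == "__main__" and len(sys.argv) > 1:
    flags = read_pyc_flags(sys.argv[1])
    print("flags =", flags)
    print("hash_based =", bool(flags & 0b01))
    print("check_source =", bool(flags & 0b10))
```

Under this layout, `flags = 3` means a checked hash-based `.pyc`, and `flags = 0` means a classic timestamp-based one.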
13:27
🔗
|
ivan |
god fucking damnit python |
13:27
🔗
|
ivan |
thanks for the link |
13:49
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 260 seconds) |
13:57
🔗
|
JAA |
:-) |
14:00
🔗
|
|
Muad-Dib has joined #archiveteam-ot |
14:44
🔗
|
ivan |
https://github.com/NixOS/nixpkgs/blob/7261a66d713aabf30249535bfa53e5d205c4423e/pkgs/development/interpreters/python/cpython/3.7/default.nix#L138 |
14:48
🔗
|
ivan |
it took me another hour to figure out which thing was enabling the hash-based pyc compilation |
14:48
🔗
|
ivan |
# rm -rf __pycache__; SOURCE_DATE_EPOCH=0 python3 -c 'import importme'; ./print_pyc_flags ./__pycache__/importme.cpython-37.pyc |
14:48
🔗
|
ivan |
flags = 0 |
14:48
🔗
|
ivan |
hash_based = False |
14:48
🔗
|
ivan |
# rm -rf __pycache__; echo importme.py | SOURCE_DATE_EPOCH=0 python3 -m compileall -f > /dev/null; ~/print_pyc_flags ./__pycache__/importme.cpython-37.pyc |
14:48
🔗
|
ivan |
flags = 3 |
14:48
🔗
|
ivan |
hash_based = True |
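The difference ivan found can also be reproduced from Python itself. This is a sketch assuming Python 3.7 or later, where `py_compile.compile` (which `compileall` uses) defaults to checked hash-based `.pyc` files when `SOURCE_DATE_EPOCH` is set, while a plain `import` wrote a timestamp-based file in ivan's first command. The helper name `compile_and_read_flags` is made up for this example.

```python
import os
import py_compile
import struct
import tempfile


def compile_and_read_flags(source_path, pyc_path, source_date_epoch=None):
    """Compile with py_compile's default mode and return the .pyc flags field."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    try:
        if source_date_epoch is not None:
            os.environ["SOURCE_DATE_EPOCH"] = source_date_epoch
        py_compile.compile(source_path, cfile=pyc_path, doraise=True)
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved
    with open(pyc_path, "rb") as f:
        # Flags are the little-endian 32-bit field after the 4-byte magic number.
        return struct.unpack("<I", f.read(16)[4:8])[0]


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "importme.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    without_env = compile_and_read_flags(src, os.path.join(tmp, "a.pyc"))
    with_env = compile_and_read_flags(src, os.path.join(tmp, "b.pyc"), "0")
    print("without SOURCE_DATE_EPOCH: flags =", without_env)
    print("with SOURCE_DATE_EPOCH=0:  flags =", with_env)
```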
15:33
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 260 seconds) |
15:37
🔗
|
|
Muad-Dib has joined #archiveteam-ot |
16:10
🔗
|
|
Stilett0 has joined #archiveteam-ot |
16:13
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
16:13
🔗
|
JAA |
ivan: Hrm, I hate this. I just ran the wpull test suite for my PR locally, and I got a different result: "FAILED (SKIP=11, errors=8, failures=7)" compared to "FAILED (SKIP=11, errors=5, failures=9)" on Travis. This will be fun to debug. |
16:14
🔗
|
ivan |
which test? |
16:15
🔗
|
ivan |
tests, heh |
16:15
🔗
|
JAA |
Ah, some failures are because I don't have phantomjs on my machine. |
16:16
🔗
|
ivan |
I removed that in my fork, problem solved |
16:16
🔗
|
JAA |
test_concurrency_zero fails with 'AssertionError: 10 != 8' in 'self.assertEqual(10, task.peak_work)'. |
16:16
🔗
|
JAA |
I saw that before. It depends on the machine load whether it succeeds. |
16:17
🔗
|
ivan |
maybe the items can be slowed down |
16:18
🔗
|
JAA |
Yeah, that would be one option. Not a very clean solution though. |
16:21
🔗
|
JAA |
A few other tests for wpull.pipeline fail for the same reason. |
16:21
🔗
|
JAA |
This is almost as good as that random time.sleep(0.5) "for good luck" somewhere in the test suite. |
16:24
🔗
|
JAA |
These are the only two types of test failures I see apart from the tests broken by FalconK's FrozenDict to dict replacement and CONNECT support removal. |
16:25
🔗
|
JAA |
The failures on Travis are also interesting. For example, https://travis-ci.org/ArchiveTeam/wpull/jobs/439753220#L4142 is "localhost" vs. "127.0.0.1". |
16:27
🔗
|
JAA |
I wonder if this is due to a different version of Tornado or similar. |
16:28
🔗
|
JAA |
pip install is running with -q on Travis, so we don't even see what it installs. Great. |
16:28
🔗
|
JAA |
I'll add a pip freeze. |
17:04
🔗
|
JAA |
ivan: Yup, those tests fail with Tornado 4.5.3 but not with 4.5.1. Nice... |
17:05
🔗
|
ivan |
ow |
17:05
🔗
|
JAA |
Specifically wpull.testing.integration.priorisation_test:TestPrioritiserHTTPGoodApp.test_app_priority_plugin_get_priority |
17:08
🔗
|
JAA |
Yeah, that test retrieves different URLs depending on the Tornado version. WTF? |
17:08
🔗
|
|
wp494 has quit IRC (Ping timeout: 252 seconds) |
17:09
🔗
|
|
wp494 has joined #archiveteam-ot |
17:10
🔗
|
|
svchfoo1 sets mode: +o wp494 |
17:10
🔗
|
ivan |
https://github.com/tornadoweb/tornado/commit/84bb2e285e15415bb86cbdf2326b19f0debb80fd |
17:11
🔗
|
JAA |
Ah yeah, that explains the localhost vs. 127.0.0.1 difference. Thanks. |
17:13
🔗
|
JAA |
https://gist.github.com/JustAnotherArchivist/5da6bca269eaf7f0b7ce760e0fe24f16 |
17:14
🔗
|
JAA |
That's the output from that test in the two Tornado versions. |
17:14
🔗
|
JAA |
Logging at DEBUG level for 4.5.3 is at https://travis-ci.org/ArchiveTeam/wpull/jobs/443276034#L2372 |
17:15
🔗
|
JAA |
Oh |
17:15
🔗
|
JAA |
wpull.processor.web: DEBUG: Robots filter verdict False reason filters |
17:15
🔗
|
JAA |
wpull.pipeline.session: DEBUG: Skipping ‘http://localhost:44757/blog/?page=3&tab=1’. |
17:15
🔗
|
JAA |
Same thing here, localhost vs. 127.0.0.1. |
17:15
🔗
|
JAA |
Ugh |
17:31
🔗
|
|
odemg has joined #archiveteam-ot |
17:31
🔗
|
JAA |
But hey, on the plus side, our test suite caught this. :-) |
17:46
🔗
|
JAA |
ivan: How do you feel about overriding tornado.testing.AsyncHTTPTestCase.get_url to always return a URL on localhost? Or alternatively always 127.0.0.1, though that would require changes throughout the test suite as well. |
17:47
🔗
|
ivan |
localhost is fine. |
17:47
🔗
|
ivan |
it was good enough before tornado changed it :-) |
17:48
🔗
|
JAA |
Yeah, and the performance impact shouldn't be too bad either. |
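The `get_url` override JAA proposes could be sketched as a mixin like the one below. This is not the change that went into the PR. It assumes the mixin is placed ahead of `tornado.testing.AsyncHTTPTestCase` in a test class's bases and relies on that class's `get_protocol()` and `get_http_port()` methods. `LocalhostURLMixin` and the stub `FakeTestCase` are hypothetical names; the stub only exists so the sketch runs without Tornado installed.

```python
class LocalhostURLMixin:
    """Assumed usage: class MyTest(LocalhostURLMixin, AsyncHTTPTestCase)."""

    def get_url(self, path):
        # Always build the URL on "localhost", whatever the Tornado version does.
        return "%s://localhost:%s%s" % (
            self.get_protocol(), self.get_http_port(), path)


class FakeTestCase:
    """Stub standing in for AsyncHTTPTestCase in this demo only."""

    def get_protocol(self):
        return "http"

    def get_http_port(self):
        return 8080

    def get_url(self, path):
        # Mimics the newer behaviour that broke the tests (127.0.0.1).
        return "%s://127.0.0.1:%s%s" % (
            self.get_protocol(), self.get_http_port(), path)


class Demo(LocalhostURLMixin, FakeTestCase):
    pass


print(Demo().get_url("/blog/?page=3"))  # prints "http://localhost:8080/blog/?page=3"
```

Putting the mixin first in the bases makes its `get_url` win in the method resolution order, so the rest of the test suite can keep expecting `localhost`.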
18:17
🔗
|
JAA |
Yep, down to the five errors caused by FalconK. :-) |
18:18
🔗
|
JAA |
Let's see how Travis does. |
18:42
🔗
|
ivan |
I will review your PR when I'm not half-braindead |
18:43
🔗
|
JAA |
Sure, there's no hurry. |
18:59
🔗
|
|
BlueMax has joined #archiveteam-ot |
19:28
🔗
|
|
odemg has quit IRC (Ping timeout: 260 seconds) |
19:39
🔗
|
|
odemg has joined #archiveteam-ot |
19:58
🔗
|
|
jrwr has quit IRC (Read error: Operation timed out) |
19:58
🔗
|
|
jrwr has joined #archiveteam-ot |
20:06
🔗
|
|
SimpBrain has quit IRC (Remote host closed the connection) |
20:20
🔗
|
|
SimpBrain has joined #archiveteam-ot |
21:18
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 264 seconds) |
21:20
🔗
|
|
Stiletto has joined #archiveteam-ot |
21:25
🔗
|
|
Stilett0 has joined #archiveteam-ot |
21:30
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
21:35
🔗
|
|
odemg has quit IRC (Ping timeout: 260 seconds) |
21:41
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 264 seconds) |
21:41
🔗
|
|
Stiletto has joined #archiveteam-ot |
21:43
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
21:46
🔗
|
|
odemg has joined #archiveteam-ot |
21:53
🔗
|
|
Stilett0 has joined #archiveteam-ot |
21:53
🔗
|
|
Stiletto has quit IRC (Ping timeout: 492 seconds) |
21:56
🔗
|
|
Stiletto has joined #archiveteam-ot |
21:57
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 264 seconds) |
22:03
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 260 seconds) |
22:08
🔗
|
|
Stilett0 has joined #archiveteam-ot |
22:10
🔗
|
|
Stiletto has quit IRC (Ping timeout: 264 seconds) |
22:15
🔗
|
|
Stiletto has joined #archiveteam-ot |
22:17
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 264 seconds) |
22:19
🔗
|
|
Stilett0 has joined #archiveteam-ot |
22:20
🔗
|
|
Stiletto has quit IRC (Ping timeout: 264 seconds) |
22:24
🔗
|
|
Stiletto has joined #archiveteam-ot |
22:29
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
22:32
🔗
|
|
Muad-Dib has joined #archiveteam-ot |
22:43
🔗
|
|
Stilett0 has joined #archiveteam-ot |
22:49
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
22:54
🔗
|
|
Stiletto has joined #archiveteam-ot |
22:56
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 268 seconds) |
22:56
🔗
|
|
Mateon1 has joined #archiveteam-ot |
23:00
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
23:38
🔗
|
|
odemg has quit IRC (Ping timeout: 260 seconds) |
23:50
🔗
|
|
odemg has joined #archiveteam-ot |