[00:17] *** Stiletto has joined #archiveteam-ot
[00:19] *** Stilett0 has quit IRC (Read error: Operation timed out)
[00:24] *** Stilett0 has joined #archiveteam-ot
[00:30] *** Stiletto has quit IRC (Read error: Operation timed out)
[00:31] *** Stiletto has joined #archiveteam-ot
[00:34] *** Stilett0 has quit IRC (Read error: Operation timed out)
[00:34] *** Stilett0 has joined #archiveteam-ot
[00:36] *** Stiletto has quit IRC (Ping timeout: 268 seconds)
[01:02] *** Mateon1 has quit IRC (Ping timeout: 633 seconds)
[01:02] *** Mateon1 has joined #archiveteam-ot
[01:04] *** Igloo_ has quit IRC (Read error: Operation timed out)
[01:04] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[01:05] *** Igloo has joined #archiveteam-ot
[01:07] *** svchfoo3 has joined #archiveteam-ot
[01:08] *** svchfoo1 sets mode: +o svchfoo3
[01:21] *** Stilett0 has quit IRC (Ping timeout: 252 seconds)
[01:21] *** Stiletto has joined #archiveteam-ot
[01:33] *** Stilett0 has joined #archiveteam-ot
[01:36] *** Stiletto has quit IRC (Ping timeout: 268 seconds)
[04:57] *** Stiletto has joined #archiveteam-ot
[04:58] *** Stilett0 has quit IRC (Ping timeout: 258 seconds)
[05:13] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[05:33] *** Muad-Dib has joined #archiveteam-ot
[07:57] *** adinbied has quit IRC (Left Channel.)
[08:02] *** adinbied has joined #archiveteam-ot
[08:30] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[08:33] *** m007a83 has joined #archiveteam-ot
[09:36] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:45] https://gist.github.com/ivan/dd1d5d9f07da68dca454ee2f899820cf yapsy doesn't work on nixos, or something
[11:48] Maybe related to imp.load_module being deprecated since Python 3.3?
[11:49] WTF, changing open(candidate_filepath+".py","r") to open(candidate_filepath+".py","rb") in yapsy fixes it
[11:50] why does it work on Debian!
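yapsy loads plugins through `imp.load_module`, which the 11:48 message notes has been deprecated since Python 3.3. Below is a minimal sketch of loading a plugin file with `importlib` instead, assuming Python 3.5+. `load_module_from_path` and the module names are hypothetical and are not part of yapsy's or wpull's API. As we understand it, `importlib`'s source loader reads source files as bytes. That means a checked hash-based .pyc in `__pycache__` should validate against the source, rather than failing the way the text-mode `open(..., "r")` did here.

```python
import importlib.util
import sys
from types import ModuleType


def load_module_from_path(name: str, path: str) -> ModuleType:
    """Import the Python source file at `path` as a module called `name`.

    Hypothetical replacement for imp.load_module, built on importlib.
    The source loader reads the file as bytes, so an existing checked
    hash-based .pyc in __pycache__ can be validated against it.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path!r} as a module")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as in the importlib documentation recipe,
    # so imports inside the plugin that refer back to it can find it.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module registered.
        sys.modules.pop(name, None)
        raise
    return module
```

The register-then-execute order follows the "import a source file directly" recipe in the `importlib` documentation. The API for this has changed across Python 3 releases, which is the portability pain JAA mentions at 12:55.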
[12:18] I found a codepath that might be different but I'm just going to patch this yapsy garbage
[12:47] I think I need to just remove yapsy
[12:48] ivan: Do you plan to merge all these changes to wpull back into the main repo at some point, or is that just for your own use in grab-site?
[12:52] I don't know, I'm trying to minimize the amount of time I spend touching Python and ArchiveTeam/wpull seems to have higher standards (like a plan for passing tests) than my repo
[12:52] I mean they're pretty good changes so if you like them please take them
[12:53] perhaps some regressions, too
[12:55] Right. I do like the idea of removing Yapsy. I had nothing but issues with it, and it seems like a huge overkill compared to simply importing the plugin file as a module and using WpullPlugin.__subclasses__. Although "simply importing the plugin file as a module" is a PITA to support in multiple Python versions because they changed how that works multiple times.
[12:56] I tried to monkeypatch it but it's got singletons and stashed instances
[12:57] maybe I'll subclass PluginManager!
[12:59] I also want to remove Tornado at some point, at least outside the test code.
[13:02] subclassing doesn't work either, of course
[13:02] this is such insane bullshit
[13:17] JAA: nice, I didn't even know about __subclasses__
[13:19] Yeah, it's extremely useful. I use it in snscrape for modules. Each scraper is a subclass of snscrape.base.Scraper, and I simply collect the __subclasses__ to generate the command line options on the fly. That way, when I want to add a new scraper, I just need to write a new module in snscrape/modules and don't need to care about adding it anywhere else.
[13:20] ivan: Regarding "why does it work on Debian but not on NixOS": could it be related to hash-based .pyc files?
[13:21] I.e. that the build on NixOS uses hash-based .pycs but the Debian one doesn't.
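The `__subclasses__` pattern described at 13:19 for snscrape, and proposed at 12:55 for wpull's `WpullPlugin`, can be sketched with a self-contained stand-in. Everything here (`Scraper`, `UserScraper`, `HashtagScraper`, `all_subclasses`, `build_parser`) is hypothetical and is not snscrape's actual code. A subclass registers itself just by being defined, and the command line parser is built from whatever subclasses exist at that moment.

```python
import argparse


class Scraper:
    """Hypothetical base class; every concrete scraper subclasses it."""

    name = ""  # subcommand name; empty means "not a concrete scraper"

    def __init__(self, target: str) -> None:
        self.target = target

    def describe(self) -> str:
        raise NotImplementedError


class UserScraper(Scraper):
    name = "user"

    def describe(self) -> str:
        return f"scrape posts by user {self.target}"


class HashtagScraper(Scraper):
    name = "hashtag"

    def describe(self) -> str:
        return f"scrape posts tagged #{self.target}"


def all_subclasses(cls):
    """__subclasses__() lists only direct children, so recurse."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def build_parser() -> argparse.ArgumentParser:
    """Generate one subcommand per concrete Scraper subclass."""
    parser = argparse.ArgumentParser(prog="scrape")
    subparsers = parser.add_subparsers(dest="scraper", required=True)
    for cls in all_subclasses(Scraper):
        if not cls.name:  # skip abstract intermediate classes
            continue
        sub = subparsers.add_parser(cls.name)
        sub.add_argument("target")
        sub.set_defaults(cls=cls)
    return parser
```

For example, `build_parser().parse_args(["hashtag", "archiveteam"])` selects `HashtagScraper`. `__subclasses__()` only sees classes whose defining module has already been imported, so a yapsy-free plugin loader still has to import each plugin file before collecting subclasses.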
[13:21] The code throwing that exception is this: https://github.com/python/cpython/blob/3.7/Lib/importlib/_bootstrap_external.py#L832-L841
[13:22] let me finish looking into that, I wrote a script to look at the flags but didn't remember that the pyc files are in __pycache__ now :-)
[13:24] Yep, there's the bug report: https://bugs.python.org/issue34056
[13:27] you are correct of course
[13:27] # ~/gs-venv/bin/python3 ./print_pyc_flags ~/gs-venv/lib/python3.7/site-packages/libgrabsite/__pycache__/wpull_hooks.cpython-37.pyc
[13:27] flags = 3
[13:27] hash_based = True
[13:27] god fucking damnit python
[13:27] thanks for the link
[13:49] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[13:57] :-)
[14:00] *** Muad-Dib has joined #archiveteam-ot
[14:44] https://github.com/NixOS/nixpkgs/blob/7261a66d713aabf30249535bfa53e5d205c4423e/pkgs/development/interpreters/python/cpython/3.7/default.nix#L138
[14:48] it took me another hour to figure out which thing was enabling the hash-based pyc compilation
[14:48] # rm -rf __pycache__; SOURCE_DATE_EPOCH=0 python3 -c 'import importme'; ./print_pyc_flags ./__pycache__/importme.cpython-37.pyc
[14:48] flags = 0
[14:48] hash_based = False
[14:48] # rm -rf __pycache__; echo importme.py | SOURCE_DATE_EPOCH=0 python3 -m compileall -f > /dev/null; ~/print_pyc_flags ./__pycache__/importme.cpython-37.pyc
[14:48] flags = 3
[14:48] hash_based = True
[15:33] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[15:37] *** Muad-Dib has joined #archiveteam-ot
[16:10] *** Stilett0 has joined #archiveteam-ot
[16:13] *** Stiletto has quit IRC (Read error: Operation timed out)
[16:13] ivan: Hrm, I hate this. I just ran the wpull test suite for my PR locally, and I got a different result: "FAILED (SKIP=11, errors=8, failures=7)" compared to "FAILED (SKIP=11, errors=5, failures=9)" on Travis. This will be fun to debug.
[16:14] which test?
[16:15] tests, heh
[16:15] Ah, some failures are because I don't have phantomjs on my machine.
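ivan's `print_pyc_flags` script is not shown in the log. Below is a minimal, hypothetical reimplementation that assumes the Python 3.7+ .pyc header layout from PEP 552: a 4-byte magic number, a 32-bit little-endian flags word, then 8 bytes holding either the source mtime and size or a source hash. Bit 0 marks a hash-based pyc and bit 1 tells the loader to check that hash against the source, so `flags = 3` means a checked hash-based pyc.

```python
import struct
import sys

# Flag bits in the second 32-bit word of a .pyc header (PEP 552).
HASH_BASED = 0b01    # pyc is validated by a source hash, not mtime/size
CHECK_SOURCE = 0b10  # the loader re-hashes the source file to validate it


def read_pyc_flags(path: str) -> int:
    """Return the flags word of a Python 3.7+ .pyc file."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError(f"{path!r} is too short to be a .pyc file")
    (flags,) = struct.unpack("<I", header[4:8])
    return flags


def main(paths) -> None:
    for path in paths:
        flags = read_pyc_flags(path)
        print(f"flags = {flags}")
        print(f"hash_based = {bool(flags & HASH_BASED)}")
        print(f"check_source = {bool(flags & CHECK_SOURCE)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

This is consistent with the 14:48 experiment. A plain `import` writes a timestamp-based pyc (flags 0) even with `SOURCE_DATE_EPOCH` set. `py_compile`, and therefore `compileall`, switches to checked hash-based pycs when `SOURCE_DATE_EPOCH` is set.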
[16:16] I removed that in my fork, problem solved
[16:16] test_concurrency_zero fails with 'AssertionError: 10 != 8' in 'self.assertEqual(10, task.peak_work)'.
[16:16] I saw that before. It depends on the machine load whether it succeeds.
[16:17] maybe the items can be slowed down
[16:18] Yeah, that would be one option. Not a very clean solution though.
[16:21] A few other tests for wpull.pipeline fail for the same reason.
[16:21] This is almost as good as that random time.sleep(0.5) "for good luck" somewhere in the test suite.
[16:24] These are the only two types of test failures I see apart from the tests broken by FalconK's FrozenDict to dict replacement and CONNECT support removal.
[16:25] The failures on Travis are also interesting. For example, https://travis-ci.org/ArchiveTeam/wpull/jobs/439753220#L4142 is "localhost" vs. "127.0.0.1".
[16:27] I wonder if this is due to a different version of Tornado or similar.
[16:28] pip install is running with -q on Travis, so we don't even see what it installs. Great.
[16:28] I'll add a pip freeze.
[17:04] ivan: Yup, those tests fail with Tornado 4.5.3 but not with 4.5.1. Nice...
[17:05] ow
[17:05] Specifically wpull.testing.integration.priorisation_test:TestPrioritiserHTTPGoodApp.test_app_priority_plugin_get_priority
[17:08] Yeah, that test retrieves different URLs depending on the Tornado version. WTF?
[17:08] *** wp494 has quit IRC (Ping timeout: 252 seconds)
[17:09] *** wp494 has joined #archiveteam-ot
[17:10] *** svchfoo1 sets mode: +o wp494
[17:10] https://github.com/tornadoweb/tornado/commit/84bb2e285e15415bb86cbdf2326b19f0debb80fd
[17:11] Ah yeah, that explains the localhost vs. 127.0.0.1 difference. Thanks.
[17:13] https://gist.github.com/JustAnotherArchivist/5da6bca269eaf7f0b7ce760e0fe24f16
[17:14] That's the output from that test in the two Tornado versions.
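One alternative to slowing the work items down is to make every item block on a gate until the expected number are in flight. The peak-concurrency assertion then no longer depends on machine load. This is not how wpull's pipeline tests are written. It is a generic asyncio sketch, where `PeakTracker` and `measure_peak` are hypothetical names and the concurrency limit is modelled with an `asyncio.Semaphore`.

```python
import asyncio


class PeakTracker:
    """Hypothetical helper: records how many work items run at once."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    async def run(self, gate: asyncio.Event) -> None:
        self.current += 1
        self.peak = max(self.peak, self.current)
        try:
            # Hold the item open until the test releases the gate, instead
            # of hoping a sleep() is long enough on a loaded machine.
            await gate.wait()
        finally:
            self.current -= 1


async def measure_peak(n_items: int, concurrency: int) -> int:
    """Run n_items under a concurrency limit and return the observed peak."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    tracker = PeakTracker()
    gate = asyncio.Event()
    limit = asyncio.Semaphore(concurrency)

    async def worker() -> None:
        async with limit:
            await tracker.run(gate)

    tasks = [asyncio.ensure_future(worker()) for _ in range(n_items)]
    # Yield to the event loop until as many items as the limit allows are
    # in flight; this waits on state, not on wall-clock time.
    while tracker.current < min(n_items, concurrency):
        await asyncio.sleep(0)
    gate.set()
    await asyncio.gather(*tasks)
    return tracker.peak
```

With this approach, `asyncio.run(measure_peak(20, 10))` returns 10 however busy the machine is, because the gate is only opened once ten items are provably running.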
[17:14] Logging at DEBUG level for 4.5.3 is at https://travis-ci.org/ArchiveTeam/wpull/jobs/443276034#L2372
[17:15] Oh
[17:15] wpull.processor.web: DEBUG: Robots filter verdict False reason filters
[17:15] wpull.pipeline.session: DEBUG: Skipping ‘http://localhost:44757/blog/?page=3&tab=1’.
[17:15] Same thing here, localhost vs. 127.0.0.1.
[17:15] Ugh
[17:31] *** odemg has joined #archiveteam-ot
[17:31] But hey, on the plus side, our test suite caught this. :-)
[17:46] ivan: How do you feel about overriding tornado.testing.AsyncHTTPTestCase.get_url to always return a URL on localhost? Or alternatively always 127.0.0.1, though that would require changes throughout the test suite as well.
[17:47] localhost is fine.
[17:47] it was good enough before tornado changed it :-)
[17:48] Yeah, and the performance impact shouldn't be too bad either.
[18:17] Yep, down to the five errors caused by FalconK. :-)
[18:18] Let's see how Travis does.
[18:42] I will review your PR when I'm not half-braindead
[18:43] Sure, there's no hurry.
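The override proposed at 17:46 could look roughly like the sketch below. `get_url`, `get_protocol` and `get_http_port` are methods of `tornado.testing.AsyncHTTPTestCase`. `LocalhostHTTPTestCase`, `HelloHandler` and `ExampleTest` are hypothetical names, and this is not necessarily the change that went into the wpull PR.

```python
import unittest

from tornado.testing import AsyncHTTPTestCase
from tornado.web import Application, RequestHandler


class LocalhostHTTPTestCase(AsyncHTTPTestCase):
    """Pin test-server URLs to 'localhost' regardless of Tornado version.

    Per the log, Tornado 4.5.1 built get_url() on localhost while 4.5.3
    uses 127.0.0.1, which changes which URLs the crawler treats as
    same-host in the integration tests.
    """

    def get_url(self, path: str) -> str:
        return "%s://localhost:%s%s" % (self.get_protocol(), self.get_http_port(), path)


class HelloHandler(RequestHandler):
    def get(self) -> None:
        self.write("hello")


class ExampleTest(LocalhostHTTPTestCase):
    def get_app(self) -> Application:
        return Application([(r"/", HelloHandler)])

    def test_url_uses_localhost(self) -> None:
        url = self.get_url("/blog/?page=3&tab=1")
        expected = "http://localhost:%d/blog/?page=3&tab=1" % self.get_http_port()
        self.assertEqual(url, expected)
```

Existing test classes would then inherit from `LocalhostHTTPTestCase` instead of `AsyncHTTPTestCase`, and run as usual with `python -m unittest`.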
[18:59] *** BlueMax has joined #archiveteam-ot
[19:28] *** odemg has quit IRC (Ping timeout: 260 seconds)
[19:39] *** odemg has joined #archiveteam-ot
[19:58] *** jrwr has quit IRC (Read error: Operation timed out)
[19:58] *** jrwr has joined #archiveteam-ot
[20:06] *** SimpBrain has quit IRC (Remote host closed the connection)
[20:20] *** SimpBrain has joined #archiveteam-ot
[21:18] *** Stilett0 has quit IRC (Ping timeout: 264 seconds)
[21:20] *** Stiletto has joined #archiveteam-ot
[21:25] *** Stilett0 has joined #archiveteam-ot
[21:30] *** Stiletto has quit IRC (Read error: Operation timed out)
[21:35] *** odemg has quit IRC (Ping timeout: 260 seconds)
[21:41] *** Stilett0 has quit IRC (Ping timeout: 264 seconds)
[21:41] *** Stiletto has joined #archiveteam-ot
[21:43] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[21:46] *** odemg has joined #archiveteam-ot
[21:53] *** Stilett0 has joined #archiveteam-ot
[21:53] *** Stiletto has quit IRC (Ping timeout: 492 seconds)
[21:56] *** Stiletto has joined #archiveteam-ot
[21:57] *** Stilett0 has quit IRC (Ping timeout: 264 seconds)
[22:03] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[22:08] *** Stilett0 has joined #archiveteam-ot
[22:10] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[22:15] *** Stiletto has joined #archiveteam-ot
[22:17] *** Stilett0 has quit IRC (Ping timeout: 264 seconds)
[22:19] *** Stilett0 has joined #archiveteam-ot
[22:20] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[22:24] *** Stiletto has joined #archiveteam-ot
[22:29] *** Stilett0 has quit IRC (Read error: Operation timed out)
[22:32] *** Muad-Dib has joined #archiveteam-ot
[22:43] *** Stilett0 has joined #archiveteam-ot
[22:49] *** Stiletto has quit IRC (Read error: Operation timed out)
[22:54] *** Stiletto has joined #archiveteam-ot
[22:56] *** Mateon1 has quit IRC (Ping timeout: 268 seconds)
[22:56] *** Mateon1 has joined #archiveteam-ot
[23:00] *** Stilett0 has quit IRC (Read error: Operation timed out)
[23:38] *** odemg has quit IRC (Ping timeout: 260 seconds)
[23:50] *** odemg has joined #archiveteam-ot