#archiveteam-ot 2018-10-18,Thu


Time Nickname Message
00:17 🔗 Stiletto has joined #archiveteam-ot
00:19 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
00:24 🔗 Stilett0 has joined #archiveteam-ot
00:30 🔗 Stiletto has quit IRC (Read error: Operation timed out)
00:31 🔗 Stiletto has joined #archiveteam-ot
00:34 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
00:34 🔗 Stilett0 has joined #archiveteam-ot
00:36 🔗 Stiletto has quit IRC (Ping timeout: 268 seconds)
01:02 🔗 Mateon1 has quit IRC (Ping timeout: 633 seconds)
01:02 🔗 Mateon1 has joined #archiveteam-ot
01:04 🔗 Igloo_ has quit IRC (Read error: Operation timed out)
01:04 🔗 svchfoo3 has quit IRC (Read error: Operation timed out)
01:05 🔗 Igloo has joined #archiveteam-ot
01:07 🔗 svchfoo3 has joined #archiveteam-ot
01:08 🔗 svchfoo1 sets mode: +o svchfoo3
01:21 🔗 Stilett0 has quit IRC (Ping timeout: 252 seconds)
01:21 🔗 Stiletto has joined #archiveteam-ot
01:33 🔗 Stilett0 has joined #archiveteam-ot
01:36 🔗 Stiletto has quit IRC (Ping timeout: 268 seconds)
04:57 🔗 Stiletto has joined #archiveteam-ot
04:58 🔗 Stilett0 has quit IRC (Ping timeout: 258 seconds)
05:13 🔗 Muad-Dib has quit IRC (Ping timeout: 260 seconds)
05:33 🔗 Muad-Dib has joined #archiveteam-ot
07:57 🔗 adinbied has quit IRC (Left Channel.)
08:02 🔗 adinbied has joined #archiveteam-ot
08:30 🔗 m007a83 has quit IRC (Read error: Connection reset by peer)
08:33 🔗 m007a83 has joined #archiveteam-ot
09:36 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:45 🔗 ivan https://gist.github.com/ivan/dd1d5d9f07da68dca454ee2f899820cf yapsy doesn't work on nixos, or something
11:48 🔗 JAA Maybe related to imp.load_module being deprecated since Python 3.3?
11:49 🔗 ivan WTF, changing open(candidate_filepath+".py","r") to open(candidate_filepath+".py","rb") in yapsy fixes it
11:50 🔗 ivan why does it work on Debian!
12:18 🔗 ivan I found a codepath that might be different but I'm just going to patch this yapsy garbage
12:47 🔗 ivan I think I need to just remove yapsy
12:48 🔗 JAA ivan: Do you plan to merge all these changes to wpull back into the main repo at some point, or is that just for your own use in grab-site?
12:52 🔗 ivan I don't know, I'm trying to minimize the amount of time I spend touching Python and ArchiveTeam/wpull seems to have higher standards (like a plan for passing tests) than my repo
12:52 🔗 ivan I mean they're pretty good changes so if you like them please take them
12:53 🔗 ivan perhaps some regressions, too
12:55 🔗 JAA Right. I do like the idea of removing Yapsy. I had nothing but issues with it, and it seems like a huge overkill compared to simply importing the plugin file as a module and using WpullPlugin.__subclasses__. Although "simply importing the plugin file as a module" is a PITA to support in multiple Python versions because they changed how that works multiple times.
12:56 🔗 ivan I tried to monkeypatch it but it's got singletons and stashed instances
12:57 🔗 ivan maybe I'll subclass PluginManager!
12:59 🔗 JAA I also want to remove Tornado at some point, at least outside the test code.
13:02 🔗 ivan subclassing doesn't work either, of course
13:02 🔗 ivan this is such insane bullshit
13:17 🔗 ivan JAA: nice, I didn't even know about __subclasses__
13:19 🔗 JAA Yeah, it's extremely useful. I use it in snscrape for modules. Each scraper is a subclass of snscrape.base.Scraper, and I simply collect the __subclasses__ to generate the command line options on the fly. That way, when I want to add a new scraper, I just need to write a new module in snscrape/modules and don't need to care about adding it anywhere else.
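
The approach JAA describes (import the plugin file as a module, then collect the base class's __subclasses__) can be sketched roughly as follows. This is only an illustration, not wpull's or snscrape's actual code: the WpullPlugin class here is a stand-in, load_plugin_classes and the "user_plugin" module name are hypothetical, and injecting the base class into the plugin's namespace is a shortcut so that the example stays self-contained. It uses the importlib.util API available on Python 3.5+.

```python
import importlib.util


class WpullPlugin:
    """Hypothetical stand-in for the plugin base class."""


def load_plugin_classes(path, base=WpullPlugin, module_name="user_plugin"):
    """Import the file at `path` and return the direct subclasses of `base` it adds."""
    # Remember what already exists so only newly defined classes are returned.
    already_known = set(base.__subclasses__())

    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot import plugin file: %r" % (path,))
    module = importlib.util.module_from_spec(spec)

    # Assumption for this sketch: expose the base class as a global in the
    # plugin module, so the plugin file does not need an import line.
    setattr(module, base.__name__, base)

    # Executing the module body defines the plugin classes, which registers
    # them with the base class automatically.
    spec.loader.exec_module(module)

    return [cls for cls in base.__subclasses__() if cls not in already_known]
```

Note that __subclasses__ only lists direct subclasses that are still referenced somewhere; a plugin class that inherits from another plugin class would need a recursive walk.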
13:20 🔗 JAA ivan: Regarding "why does it work on Debian but not on NixOS": could it be related to hash-based .pyc files?
13:21 🔗 JAA I.e. that the build on NixOS uses hash-based .pycs but the Debian one doesn't.
13:21 🔗 JAA The code throwing that exception is this: https://github.com/python/cpython/blob/3.7/Lib/importlib/_bootstrap_external.py#L832-L841
13:22 🔗 ivan let me finish looking into that, I wrote a script to look at the flags but didn't remember that the pyc files are in __pycache__ now :-)
13:24 🔗 JAA Yep, there's the bug report: https://bugs.python.org/issue34056
13:27 🔗 ivan you are correct of course
13:27 🔗 ivan # ~/gs-venv/bin/python3 ./print_pyc_flags ~/gs-venv/lib/python3.7/site-packages/libgrabsite/__pycache__/wpull_hooks.cpython-37.pyc
13:27 🔗 ivan flags = 3
13:27 🔗 ivan hash_based = True
13:27 🔗 ivan god fucking damnit python
13:27 🔗 ivan thanks for the link
13:49 🔗 Muad-Dib has quit IRC (Ping timeout: 260 seconds)
13:57 🔗 JAA :-)
14:00 🔗 Muad-Dib has joined #archiveteam-ot
14:44 🔗 ivan https://github.com/NixOS/nixpkgs/blob/7261a66d713aabf30249535bfa53e5d205c4423e/pkgs/development/interpreters/python/cpython/3.7/default.nix#L138
14:48 🔗 ivan it took me another hour to figure out which thing was enabling the hash-based pyc compilation
14:48 🔗 ivan # rm -rf __pycache__; SOURCE_DATE_EPOCH=0 python3 -c 'import importme'; ./print_pyc_flags ./__pycache__/importme.cpython-37.pyc
14:48 🔗 ivan flags = 0
14:48 🔗 ivan hash_based = False
14:48 🔗 ivan # rm -rf __pycache__; echo importme.py | SOURCE_DATE_EPOCH=0 python3 -m compileall -f > /dev/null; ~/print_pyc_flags ./__pycache__/importme.cpython-37.pyc
14:48 🔗 ivan flags = 3
14:48 🔗 ivan hash_based = True
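
ivan's print_pyc_flags script is not shown in the log. A minimal sketch of what such a check could look like, assuming the Python 3.7+ .pyc header layout from PEP 552 (4 bytes of magic, then a 4-byte little-endian flags field), is below. The names read_pyc_flags and compile_and_inspect are made up for this example. Bit 0 of the flags marks a hash-based .pyc and bit 1 marks "check source", so the flags = 3 seen above corresponds to a checked hash-based file.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_flags(pyc_path):
    """Return (flags, hash_based, check_source) from a Python 3.7+ .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError("header too short for a Python 3.7+ .pyc file")
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("magic number does not match this interpreter")
    # Bytes 4..8 hold the flags as a little-endian unsigned 32-bit integer.
    (flags,) = struct.unpack("<I", header[4:8])
    return flags, bool(flags & 0b01), bool(flags & 0b10)


def compile_and_inspect(mode):
    """Compile a throwaway module with the given invalidation mode and read its flags."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "importme.py")
        with open(source, "w") as f:
            f.write("x = 1\n")
        pyc = py_compile.compile(
            source,
            cfile=os.path.join(tmp, "importme.pyc"),
            doraise=True,
            invalidation_mode=mode,
        )
        return read_pyc_flags(pyc)


if __name__ == "__main__":
    for mode in py_compile.PycInvalidationMode:
        flags, hash_based, check_source = compile_and_inspect(mode)
        print(mode.name, "flags =", flags, "hash_based =", hash_based)
```

Passing the invalidation mode explicitly avoids depending on SOURCE_DATE_EPOCH, which is what switched the default in the compileall run above.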
15:33 🔗 Muad-Dib has quit IRC (Ping timeout: 260 seconds)
15:37 🔗 Muad-Dib has joined #archiveteam-ot
16:10 🔗 Stilett0 has joined #archiveteam-ot
16:13 🔗 Stiletto has quit IRC (Read error: Operation timed out)
16:13 🔗 JAA ivan: Hrm, I hate this. I just ran the wpull test suite for my PR locally, and I got a different result: "FAILED (SKIP=11, errors=8, failures=7)" compared to "FAILED (SKIP=11, errors=5, failures=9)" on Travis. This will be fun to debug.
16:14 🔗 ivan which test?
16:15 🔗 ivan tests, heh
16:15 🔗 JAA Ah, some failures are because I don't have phantomjs on my machine.
16:16 🔗 ivan I removed that in my fork, problem solved
16:16 🔗 JAA test_concurrency_zero fails with 'AssertionError: 10 != 8' in 'self.assertEqual(10, task.peak_work)'.
16:16 🔗 JAA I saw that before. It depends on the machine load whether it succeeds.
16:17 🔗 ivan maybe the items can be slowed down
16:18 🔗 JAA Yeah, that would be one option. Not a very clean solution though.
16:21 🔗 JAA A few other tests for wpull.pipeline fail for the same reason.
16:21 🔗 JAA This is almost as good as that random time.sleep(0.5) "for good luck" somewhere in the test suite.
16:24 🔗 JAA These are the only two types of test failures I see apart from the tests broken by FalconK's FrozenDict to dict replacement and CONNECT support removal.
16:25 🔗 JAA The failures on Travis are also interesting. For example, https://travis-ci.org/ArchiveTeam/wpull/jobs/439753220#L4142 is "localhost" vs. "127.0.0.1".
16:27 🔗 JAA I wonder if this is due to a different version of Tornado or similar.
16:28 🔗 JAA pip install is running with -q on Travis, so we don't even see what it installs. Great.
16:28 🔗 JAA I'll add a pip freeze.
17:04 🔗 JAA ivan: Yup, those tests fail with Tornado 4.5.3 but not with 4.5.1. Nice...
17:05 🔗 ivan ow
17:05 🔗 JAA Specifically wpull.testing.integration.priorisation_test:TestPrioritiserHTTPGoodApp.test_app_priority_plugin_get_priority
17:08 🔗 JAA Yeah, that test retrieves different URLs depending on the Tornado version. WTF?
17:08 🔗 wp494 has quit IRC (Ping timeout: 252 seconds)
17:09 🔗 wp494 has joined #archiveteam-ot
17:10 🔗 svchfoo1 sets mode: +o wp494
17:10 🔗 ivan https://github.com/tornadoweb/tornado/commit/84bb2e285e15415bb86cbdf2326b19f0debb80fd
17:11 🔗 JAA Ah yeah, that explains the localhost vs. 127.0.0.1 difference. Thanks.
17:13 🔗 JAA https://gist.github.com/JustAnotherArchivist/5da6bca269eaf7f0b7ce760e0fe24f16
17:14 🔗 JAA That's the output from that test in the two Tornado versions.
17:14 🔗 JAA Logging at DEBUG level for 4.5.3 is at https://travis-ci.org/ArchiveTeam/wpull/jobs/443276034#L2372
17:15 🔗 JAA Oh
17:15 🔗 JAA wpull.processor.web: DEBUG: Robots filter verdict False reason filters
17:15 🔗 JAA wpull.pipeline.session: DEBUG: Skipping ‘http://localhost:44757/blog/?page=3&tab=1’.
17:15 🔗 JAA Same thing here, localhost vs. 127.0.0.1.
17:15 🔗 JAA Ugh
17:31 🔗 odemg has joined #archiveteam-ot
17:31 🔗 JAA But hey, on the plus side, our test suite caught this. :-)
17:46 🔗 JAA ivan: How do you feel about overriding tornado.testing.AsyncHTTPTestCase.get_url to always return a URL on localhost? Or alternatively always 127.0.0.1, though that would require changes throughout the test suite as well.
17:47 🔗 ivan localhost is fine.
17:47 🔗 ivan it was good enough before tornado changed it :-)
17:48 🔗 JAA Yeah, and the performance impact shouldn't be too bad either.
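
The get_url override JAA proposes might look something like the sketch below. It is written as a mixin so that it does not import Tornado itself; LocalhostURLMixin is a hypothetical name and this is not the code from the actual PR. It relies on get_protocol() and get_http_port(), which tornado.testing.AsyncHTTPTestCase provides.

```python
class LocalhostURLMixin:
    """Make get_url() return URLs on "localhost" regardless of Tornado version.

    Intended to be listed before tornado.testing.AsyncHTTPTestCase in a test
    case's bases, e.g. `class MyTest(LocalhostURLMixin, AsyncHTTPTestCase)`.
    """

    def get_url(self, path):
        # get_protocol() and get_http_port() come from the Tornado test case
        # this mixin is combined with.
        return "%s://localhost:%s%s" % (
            self.get_protocol(),
            self.get_http_port(),
            path,
        )
```

Keeping the hostname fixed in one place means the rest of the test suite can keep comparing against "localhost" URLs.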
18:17 🔗 JAA Yep, down to the five errors caused by FalconK. :-)
18:18 🔗 JAA Let's see how Travis does.
18:42 🔗 ivan I will review your PR when I'm not half-braindead
18:43 🔗 JAA Sure, there's no hurry.
18:59 🔗 BlueMax has joined #archiveteam-ot
19:28 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
19:39 🔗 odemg has joined #archiveteam-ot
19:58 🔗 jrwr has quit IRC (Read error: Operation timed out)
19:58 🔗 jrwr has joined #archiveteam-ot
20:06 🔗 SimpBrain has quit IRC (Remote host closed the connection)
20:20 🔗 SimpBrain has joined #archiveteam-ot
21:18 🔗 Stilett0 has quit IRC (Ping timeout: 264 seconds)
21:20 🔗 Stiletto has joined #archiveteam-ot
21:25 🔗 Stilett0 has joined #archiveteam-ot
21:30 🔗 Stiletto has quit IRC (Read error: Operation timed out)
21:35 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
21:41 🔗 Stilett0 has quit IRC (Ping timeout: 264 seconds)
21:41 🔗 Stiletto has joined #archiveteam-ot
21:43 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
21:46 🔗 odemg has joined #archiveteam-ot
21:53 🔗 Stilett0 has joined #archiveteam-ot
21:53 🔗 Stiletto has quit IRC (Ping timeout: 492 seconds)
21:56 🔗 Stiletto has joined #archiveteam-ot
21:57 🔗 Stilett0 has quit IRC (Ping timeout: 264 seconds)
22:03 🔗 Muad-Dib has quit IRC (Ping timeout: 260 seconds)
22:08 🔗 Stilett0 has joined #archiveteam-ot
22:10 🔗 Stiletto has quit IRC (Ping timeout: 264 seconds)
22:15 🔗 Stiletto has joined #archiveteam-ot
22:17 🔗 Stilett0 has quit IRC (Ping timeout: 264 seconds)
22:19 🔗 Stilett0 has joined #archiveteam-ot
22:20 🔗 Stiletto has quit IRC (Ping timeout: 264 seconds)
22:24 🔗 Stiletto has joined #archiveteam-ot
22:29 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
22:32 🔗 Muad-Dib has joined #archiveteam-ot
22:43 🔗 Stilett0 has joined #archiveteam-ot
22:49 🔗 Stiletto has quit IRC (Read error: Operation timed out)
22:54 🔗 Stiletto has joined #archiveteam-ot
22:56 🔗 Mateon1 has quit IRC (Ping timeout: 268 seconds)
22:56 🔗 Mateon1 has joined #archiveteam-ot
23:00 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
23:38 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
23:50 🔗 odemg has joined #archiveteam-ot
