[00:17] *** BlueMax has joined #archiveteam-ot
[00:53] *** Ryz has quit IRC (west.us.hub irc.Prison.NET)
[00:53] *** robogoat has quit IRC (west.us.hub irc.Prison.NET)
[00:59] *** Ryz has joined #archiveteam-ot
[00:59] *** robogoat has joined #archiveteam-ot
[01:01] *** VerifiedJ has quit IRC (Quit: Leaving)
[02:05] https://docs.google.com/spreadsheets/d/1zYZ2107xOZwQ37AjLTc5A4dUJl0ilg8oMrZyA0BGvc0/edit anyone here a pro magician?
[02:07] *** LFlarey has quit IRC (Ping timeout: 268 seconds)
[02:20] *** maxiPsych has joined #archiveteam-ot
[03:00] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[03:03] *** m007a83 has joined #archiveteam-ot
[04:10] *** Martle has quit IRC (Remote host closed the connection)
[04:26] Ivan I am.
[04:40] sweet, get onto that Art of Misdirection tracker and archive it ;)
[04:50] *** odemg has quit IRC (Ping timeout: 260 seconds)
[05:02] *** odemg has joined #archiveteam-ot
[05:59] *** kiska has quit IRC (Ping timeout: 252 seconds)
[05:59] *** logchfoo2 has quit IRC (Ping timeout: 252 seconds)
[06:00] *** logchfoo3 starts logging #archiveteam-ot at Thu Nov 15 06:00:56 2018
[06:00] *** logchfoo3 has joined #archiveteam-ot
[06:01] *** Flashfire has joined #archiveteam-ot
[06:01] *** kiska has joined #archiveteam-ot
[06:01] *** svchfoo1 sets mode: +o dxrt
[06:01] *** hook54321 has joined #archiveteam-ot
[06:01] *** w0rmhole has joined #archiveteam-ot
[06:11] *** maxiPsych has quit IRC (Ping timeout: 261 seconds)
[06:30] *** tuluu has quit IRC (Read error: Connection refused)
[06:31] *** tuluu has joined #archiveteam-ot
[06:56] *** LFlarey has joined #archiveteam-ot
[06:57] *** godane has quit IRC (Ping timeout: 265 seconds)
[07:11] *** godane has joined #archiveteam-ot
[07:14] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[07:37] *** LFlarey is now known as LFlare
[07:38] *** m007a83 has quit IRC (Read error: No route to host)
[07:39] *** m007a83 has joined #archiveteam-ot
[08:10] *** SketchCow has quit IRC (Read error: Connection reset by peer)
[08:10] *** SketchCow has joined #archiveteam-ot
[08:11] *** svchfoo1 sets mode: +o SketchCow
[08:12] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[08:12] *** Stiletto has joined #archiveteam-ot
[08:53] *** Mateon1 has quit IRC (Ping timeout: 265 seconds)
[08:53] *** Mateon1 has joined #archiveteam-ot
[09:07] *** Ryz has quit IRC (Quit: ChatZilla 0.9.92-rdmsoft [XULRunner 35.0.1/20150122214805])
[10:04] *** BlueMax has joined #archiveteam-ot
[10:41] *** BlueMax has quit IRC (Quit: Leaving)
[12:29] *** JAA sets mode: +oooo arkiver chfoo hook54321 Igloo
[12:29] *** JAA sets mode: +oooo jrwr Kaz Muad-Dib wp494
[13:29] *** godane has quit IRC (Read error: Connection reset by peer)
[13:30] *** godane has joined #archiveteam-ot
[16:13] *** Martle has joined #archiveteam-ot
[16:30] *** Ryz has joined #archiveteam-ot
[17:00] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[17:02] *** VerifiedJ has joined #archiveteam-ot
[17:08] *** m007a83 has joined #archiveteam-ot
[17:59] *** icedice has joined #archiveteam-ot
[18:31] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[18:38] *** icedice has quit IRC (Ping timeout: 600 seconds)
[19:04] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[20:14] *** wp494 has quit IRC (Read error: Operation timed out)
[20:15] *** wp494 has joined #archiveteam-ot
[20:16] *** Martle has quit IRC (Ping timeout: 252 seconds)
[20:16] *** svchfoo1 sets mode: +o wp494
[22:14] (Continuing from discussion on getting wpull into Debian in #archivebot...) anarcat: I looked a bit into the html5lib tokeniser situation. The HTMLParser object has some attributes and methods that provide the same interface (specifically, the tokenizer attribute is that class itself, and the normalizedTokens() generator returns the tokens after some additional processing). Unfortunately, these
[22:15] aren't documented, so I assume they're also not part of the public API.
[22:16] consider getting rid of html5lib :-0
[22:17] JAA: you could lobby html5lib to make those public
[22:17] or whatever ivan says :p
[22:17] Yeah, those are the two options I guess.
[22:18] Or use internal APIs like we're doing with asyncio already. :-P
[22:24] I like that html5lib is pure-Python. Gives an alternative for environments which either use a different Python implementation (Jython or whatever) or where it's difficult to install C extensions.
[22:33] i like that html5lib is already in debian :p
[22:35] *** Martle has joined #archiveteam-ot
[22:37] True, but html5-parser (which is what ivan used in his wpull fork) is there as well. :-)
[22:38] good
[22:38] it seems this touches on the phantomjs problem as well...
[22:38] after all "html parsing" is not just html anymore - you need a full frigging engine
[22:38] So making html5lib an optional dependency (which would always be disabled in Debian due to the version mismatch) would work until we figure out a better solution for that.
[22:38] so i wonder if it wouldn't be better to just shove this in a browser driver like crocoite does
[22:38] but maybe that's a lot more work
[22:39] anyways
[22:39] dinner time here
[22:39] Yeah, and I wonder if that should be part of wpull at all.
[22:42] anarcat: Doing the Reddit Crossword now. That's a nice puzzle. I'd love to have this on my phone.
[22:52] *** hook54321 has joined #archiveteam-ot
[23:27] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[23:29] *** Stiletto has joined #archiveteam-ot
[23:58] *** Martle_ has joined #archiveteam-ot