[00:01] *** Stilett0 has quit IRC (Read error: Operation timed out)
[00:02] *** Stilett0 has joined #archiveteam-ot
[00:03] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[00:21] *** Stiletto has joined #archiveteam-ot
[00:23] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[00:36] *** Stilett0 has joined #archiveteam-ot
[00:36] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[01:24] *** Stiletto has joined #archiveteam-ot
[01:26] *** Stilett0 has quit IRC (Ping timeout: 252 seconds)
[01:31] *** Stilett0 has joined #archiveteam-ot
[01:32] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[01:38] *** Stiletto has joined #archiveteam-ot
[01:41] *** Stilett0 has quit IRC (Ping timeout: 260 seconds)
[01:46] *** odemg has quit IRC (Ping timeout: 260 seconds)
[01:58] *** odemg has joined #archiveteam-ot
[02:56] *** Despatche has joined #archiveteam-ot
[03:01] *** Stilett0 has joined #archiveteam-ot
[03:03] *** Stiletto has quit IRC (Ping timeout: 264 seconds)
[03:36] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:41] *** Despatche has quit IRC (Read error: Operation timed out)
[03:48] *** odemg has joined #archiveteam-ot
[04:04] *** odemg has quit IRC (Ping timeout: 260 seconds)
[04:04] *** odemg has joined #archiveteam-ot
[04:50] *** Stiletto has joined #archiveteam-ot
[04:52] *** Stilett0 has quit IRC (Read error: Operation timed out)
[05:35] *** BlueMax has quit IRC (Remote host closed the connection)
[05:35] *** m007a83_ has joined #archiveteam-ot
[05:36] *** asie has quit IRC (Read error: Operation timed out)
[05:37] *** BlueMax has joined #archiveteam-ot
[05:38] *** Jens has quit IRC (Read error: Operation timed out)
[05:39] *** m007a83 has quit IRC (Read error: Operation timed out)
[05:41] *** wp494 has quit IRC (Ping timeout: 633 seconds)
[05:43] *** wp494 has joined #archiveteam-ot
[05:48] *** m007a83_ has quit IRC (Quit: Fuck you Comcast)
[06:15] *** Jens has joined #archiveteam-ot
[06:53] *** Mateon1 has quit IRC (Ping timeout: 268 seconds)
[06:53] *** Mateon1 has joined #archiveteam-ot
[06:57] *** asie has joined #archiveteam-ot
[07:19] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:23] *** dashcloud has joined #archiveteam-ot
[07:45] *** MrRadar has quit IRC (Read error: Operation timed out)
[07:51] *** MrRadar has joined #archiveteam-ot
[08:20] *** Odd0002_ has joined #archiveteam-ot
[08:22] *** Odd0002 has quit IRC (Read error: Operation timed out)
[08:22] *** Odd0002_ is now known as Odd0002
[08:29] *** godane has joined #archiveteam-ot
[08:29] *** svchfoo3 sets mode: +o godane
[09:24] https://github.com/ludios/item-maker
[09:41] for mass twitter archiving I am ignoring ^https?://pbs\.twimg\.com/(emoji|profile_images)/
[09:41] hopefully most of the profile images are in other captures
[11:24] https://github.com/ludios/grab-site/tree/html5-parser still baking
[11:29] if anyone has a dire need for it right now add @html5-parser to the pip3 install url
[11:33] * ivan goes to actually make it work
[11:40] I thought I ran it with the new --html-parser but no and now I realize it's a little trickier
[11:42] wpull works on its own objects created by HTMLParserTarget instead of the lxml tree
[11:57] mmm multiple inheritance
[12:43] ok, it's back to scraping links again
[12:55] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[13:07] does anyone use {primary_url} in their own grab-site ignores? tempted to remove it as I rewrite ignoracle
[13:14] *** kiska1 has quit IRC (Read error: Operation timed out)
[13:15] *** mal_ has quit IRC (Write error: Broken pipe)
[13:15] *** djsundog has quit IRC (Read error: Operation timed out)
[13:15] *** dxrt_ has quit IRC (Read error: Operation timed out)
[13:15] *** Albardin has quit IRC (Write error: Broken pipe)
[13:49] *** m007a83 has joined #archiveteam-ot
[13:56] *** kiska1 has joined #archiveteam-ot
[14:03] *** mal_ has joined #archiveteam-ot
[14:12] *** Albardin has joined #archiveteam-ot
[14:12] *** djsundog has joined #archiveteam-ot
[14:13] *** dxrt_ has joined #archiveteam-ot
[14:13] *** dxrt sets mode: +o dxrt_
[14:44] *** Albardin has quit IRC (Read error: Operation timed out)
[14:44] *** mal_ has quit IRC (Write error: Broken pipe)
[14:44] *** kiska1 has quit IRC (Read error: Operation timed out)
[14:44] *** dxrt_ has quit IRC (Read error: Operation timed out)
[14:44] *** djsundog has quit IRC (Read error: Operation timed out)
[15:03] I cherry-picked some of the early wpull 2.0 commits and now grab-site is running on Python 3.7 :-)
[15:12] *** adinbied has quit IRC (Read error: Operation timed out)
[15:16] ivan: Did you see my ignoracle changes that were merged into ArchiveBot recently? Not sure regarding {primary_url} and {primary_netloc}, but it's used in ArchiveBot sometimes (singletumblr igset, for example).
[15:16] I think the only changes necessary in wpull for 3.7 are replacing asyncio.async with asyncio.ensure_future.
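The asyncio.async to asyncio.ensure_future swap mentioned at 15:16 is needed because `async` became a reserved keyword in Python 3.7, so the old spelling no longer even parses. A minimal sketch of the replacement, assuming a hypothetical `fetch` coroutine standing in for wpull's own (it is not taken from wpull):

```python
import asyncio


async def fetch(n):
    # Hypothetical stand-in for a wpull coroutine; not from the wpull codebase.
    await asyncio.sleep(0)
    return n * 2


async def main():
    # Old spelling, a SyntaxError on Python 3.7+:
    #     task = asyncio.async(fetch(21))
    # Replacement, which also schedules the coroutine as a Task:
    task = asyncio.ensure_future(fetch(21))
    return await task


result = asyncio.run(main())
```

Whether this is the only change wpull needs is the speaker's guess, not something this sketch checks.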
[15:17] *** adinbied has joined #archiveteam-ot
[15:20] *** eLbot has quit IRC (Ping timeout: 268 seconds)
[15:20] *** kiskabak has quit IRC (Ping timeout: 268 seconds)
[15:21] *** w0rmybak has quit IRC (Ping timeout: 268 seconds)
[15:21] *** robogoat has quit IRC (Ping timeout: 268 seconds)
[15:21] *** fenn has quit IRC (Ping timeout: 268 seconds)
[15:21] *** Igloo has quit IRC (Ping timeout: 268 seconds)
[15:21] *** Igloo has joined #archiveteam-ot
[15:21] *** fenn has joined #archiveteam-ot
[15:21] I did not see them, I will take a look
[15:21] I was planning on combining all the ignores into one and putting into pyre2
[15:21] *** jut has quit IRC (Ping timeout: 268 seconds)
[15:21] *** mr_archiv has quit IRC (Ping timeout: 268 seconds)
[15:21] *** svchfoo1 has quit IRC (Ping timeout: 268 seconds)
[15:21] putting it into
[15:22] *** wp494 has quit IRC (Read error: Operation timed out)
[15:22] *** eLbot has joined #archiveteam-ot
[15:23] *** wp494 has joined #archiveteam-ot
[15:23] Oof, the ping timeouts.
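The 15:21 plan of combining all the ignores into one pattern could look roughly like the sketch below. It uses the standard-library `re` module as a stand-in for pyre2, and the second pattern plus the names `combine` and `is_ignored` are hypothetical, not taken from grab-site or ArchiveBot:

```python
import re

ignore_patterns = [
    # The twitter ignore quoted earlier in this log (09:41).
    r"^https?://pbs\.twimg\.com/(emoji|profile_images)/",
    # Hypothetical second ignore, for illustration only.
    r"^https?://example\.com/calendar/",
]


def combine(patterns):
    # Wrap each pattern in a non-capturing group so that a top-level "|"
    # inside one pattern cannot bleed into its neighbours.
    return re.compile("|".join("(?:%s)" % p for p in patterns))


combined = combine(ignore_patterns)


def is_ignored(url):
    # One regex search per URL instead of one per ignore pattern.
    return combined.search(url) is not None
```

For example, `is_ignored("https://pbs.twimg.com/emoji/v2/72x72/1f600.png")` is true, while a `pbs.twimg.com/media/` URL is not ignored.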
[15:28] *** kiska1 has joined #archiveteam-ot
[15:29] Yeah Choopa and portlane are very unstable
[15:29] how come my WebSocketClientProtocol.on_close isn't working after switching from trollius to asyncio
[15:31] onClose I mean
[15:32] *** mal_ has joined #archiveteam-ot
[15:35] *** robogoat has joined #archiveteam-ot
[15:41] *** MrRadar2 has quit IRC (Ping timeout: 268 seconds)
[15:42] I'm sure autobahn is doing something terrible like swallowing an exception
[15:42] *** SketchCow has quit IRC (Ping timeout: 268 seconds)
[15:43] *** Albardin has joined #archiveteam-ot
[15:43] *** mr_archiv has joined #archiveteam-ot
[15:43] *** djsundog has joined #archiveteam-ot
[15:43] *** dxrt_ has joined #archiveteam-ot
[15:43] *** dxrt sets mode: +o dxrt_
[15:44] *** SketchCow has joined #archiveteam-ot
[15:45] *** BnAboyZ has quit IRC (Ping timeout: 268 seconds)
[15:47] *** MrRadar2 has joined #archiveteam-ot
[15:48] *** BnAboyZ has joined #archiveteam-ot
[15:53] *** adinbied has quit IRC (Read error: Operation timed out)
[15:58] *** VerifiedJ has joined #archiveteam-ot
[15:59] *** kiskabak has joined #archiveteam-ot
[15:59] *** w0rmybak has joined #archiveteam-ot
[15:59] *** svchfoo1 has joined #archiveteam-ot
[16:00] *** svchfoo3 sets mode: +o svchfoo1
[16:00] *** jut has joined #archiveteam-ot
[16:19] *** godane has quit IRC (Read error: Operation timed out)
[16:30] *** adinbied has joined #archiveteam-ot
[16:43] I google for "NoneType' object has no attribute 'resume_reading" because it's a thing I'm seeing on Python 3.7 and on the second page... ArchiveBot dashboard 3.0
[16:44] https://github.com/ludios/wpull/issues/3
[16:45] *** Despatche has joined #archiveteam-ot
[16:46] if someone could write a crawler in a good language that would be great
[16:52] Haha
[16:52] I blame asyncio. Its network stack is awful.
[16:54] Well, and Tornado. Removing that from wpull is actually one of the things on my todo list.
[16:55] it looks like asyncio was refactored to use async functions and I guess someone fucked it up
[16:55] everything can change across an `await`
[16:56] That was always the case though.
[16:56] *** m007a83_ has joined #archiveteam-ot
[16:56] Even when it was all directly generator-based etc.
[16:56] sure
[16:57] *** VerifiedJ has quit IRC (Read error: Operation timed out)
[16:57] On the other hand, I haven't noticed any issues with my aiohttp-based script so far, even after a few hundred million requests.
[16:58] I'm still running that on 3.4 though, so maybe that's not too surprising.
[16:58] If something went wrong in the @asyncio.coroutine -> async def conversion.
[16:58] *** VerifiedJ has joined #archiveteam-ot
[16:58] *** m007a83 has quit IRC (Read error: Operation timed out)
[17:03] *** mgrytbak has quit IRC (Ping timeout: 492 seconds)
[17:05] ok I patched asyncio to check if _transport is not None
[17:06] *** mgrytbak_ has joined #archiveteam-ot
[17:15] *** apache2_ has quit IRC (Remote host closed the connection)
[17:15] *** apache2 has joined #archiveteam-ot
[17:16] *** Despatche has quit IRC (Quit: Error: Connection reset by peer)
[17:28] https://github.com/ludios/grab-site/commit/9b9682b72209dab1b4e7f149e513175f00a03592#diff-30267b0aeba882fe1003b704fbed5804
[17:34] I didn't even patch the right file, great
[17:34] Hmm, that error crashed wpull? I thought it just produces an error.
[17:36] oh, I'm not running wpull 2.0.3 but rather just enough patches to get wpull 1.2.3 running on Python 3.7
[17:36] maybe I should stop and use the real 2.x
[17:37] Aah, I see.
[17:37] Well, if you're a masochist, maybe you should. :-)
[17:40] xD
[18:04] *** godane has joined #archiveteam-ot
[18:04] *** svchfoo1 sets mode: +o godane
[18:14] oh I see the thing I've been fighting was fixed in 0c517bab510ff9555bff055b4e1e78a807e0bd90 in the fork
[18:15] makes total sense for asyncio to be raising AttributeError!
[18:18] Ah, so that's why it's not a crash in ArchiveBot.
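The "everything can change across an `await`" point and the `_transport is not None` check from 17:05 can be illustrated with a toy example. This is only a sketch of the general failure mode; `FakeTransport` and `Reader` are hypothetical classes and are not the asyncio or wpull code that was actually patched:

```python
import asyncio


class FakeTransport:
    # Hypothetical transport exposing only the method named in the error.
    def resume_reading(self):
        return "resumed"


class Reader:
    # Hypothetical protocol-like object; not asyncio's or wpull's class.
    def __init__(self):
        self._transport = FakeTransport()

    def close(self):
        # Simulates the connection going away.
        self._transport = None

    async def read(self):
        # Control returns to the event loop here, so another task may
        # call close() before this coroutine resumes.
        await asyncio.sleep(0)
        # Without this guard, the next line would raise
        # AttributeError: 'NoneType' object has no attribute 'resume_reading'.
        if self._transport is not None:
            return self._transport.resume_reading()
        return None


async def main():
    reader = Reader()
    task = asyncio.ensure_future(reader.read())
    await asyncio.sleep(0)  # let read() start and suspend at its await
    reader.close()          # state changes while read() is suspended
    return await task


result = asyncio.run(main())
```

Here `read()` sees the transport vanish between suspending and resuming, and the guard turns what would have been an AttributeError into a quiet `None`.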
[18:18] *** wp494 has quit IRC (west.us.hub irc.Prison.NET)
[18:34] *** wp494 has joined #archiveteam-ot
[18:43] *** vectr0n has quit IRC (Quit: ZNC - https://znc.in)
[19:03] *** dxrt has quit IRC (Ping timeout: 360 seconds)
[19:03] *** dxrt has joined #archiveteam-ot
[19:06] *** vectr0n has joined #archiveteam-ot
[19:06] *** wp494 has quit IRC (Read error: Operation timed out)
[19:06] *** Igloo_ has joined #archiveteam-ot
[19:06] *** Igloo has quit IRC (Ping timeout: 360 seconds)
[19:06] *** wp494 has joined #archiveteam-ot
[19:08] *** arkiver has quit IRC (Ping timeout: 360 seconds)
[19:10] *** w0rmybak has quit IRC (Ping timeout: 268 seconds)
[19:10] *** kiskabak has quit IRC (Ping timeout: 268 seconds)
[19:12] *** arkiver has joined #archiveteam-ot
[19:14] *** nightpool has quit IRC (Read error: Operation timed out)
[19:14] *** nightpool has joined #archiveteam-ot
[19:15] *** Albardin has quit IRC (Ping timeout: 600 seconds)
[19:15] *** MrRadar has quit IRC (Read error: Connection reset by peer)
[19:16] *** robogoat has quit IRC (Ping timeout: 360 seconds)
[19:19] *** chirlu` has quit IRC (Read error: Operation timed out)
[19:23] *** robogoat has joined #archiveteam-ot
[19:29] *** chirlu has joined #archiveteam-ot
[19:38] *** kiskabak has joined #archiveteam-ot
[19:38] *** w0rmybak has joined #archiveteam-ot
[19:38] *** Albardin has joined #archiveteam-ot
[19:48] *** m007a83_ is now known as m007a83
[19:51] *** MrRadar has joined #archiveteam-ot
[20:03] *** mgrytbak_ is now known as mgrytbak
[21:39] *** BlueMax has joined #archiveteam-ot
[21:40] *** SimpBrain has joined #archiveteam-ot
[21:42] *** Famicoman has quit IRC (Quit: Famicoman)
[21:50] *** VerifiedJ has quit IRC (Quit: Leaving)
[21:52] I will be out today with family so unable to help monitor archivebot
[21:53] *** Stiletto has quit IRC ()
[21:57] *** w0rmhole has quit IRC (Excess Flood)
[21:57] *** w0rmhole has joined #archiveteam-ot
[22:04] !ao < https://transfer.sh/13hV5W/-_WikiTeam-tweets --igset twitter --delay 0