[01:08] *** dxrt_ sets mode: +o dxrt
[01:10] *** dxrt_ has quit IRC (The Lounge - https://thelounge.chat)
[01:29] *** dxrt_ has joined #archiveteam-ot
[01:29] *** dxrt sets mode: +o dxrt_
[01:39] *** chirlu has quit IRC (hub.efnet.us irc.Prison.NET)
[01:43] *** chirlu` has joined #archiveteam-ot
[01:58] https://www.tripadvisor.com.au/ShowUserReviews-g255346-d259670-r506665283-Sovereign_Hill-Ballarat_Victoria.html
[01:58] https://en.m.wikipedia.org/wiki/Sovereign_Hill
[02:31] ivan: In regards to https://github.com/ludios/grab-site/issues/133 - I had the same issue on plain old grab-site last week.
[02:33] *** ivan has quit IRC (Read error: Operation timed out)
[02:34] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:35] *** w0rmybak has quit IRC (Ping timeout: 268 seconds)
[02:35] *** ivan has joined #archiveteam-ot
[02:35] *** svchfoo1 sets mode: +o ivan
[02:36] *** JAA has quit IRC (Ping timeout: 246 seconds)
[02:36] *** Albardin has quit IRC (Ping timeout (120 seconds))
[02:36] *** jspiros has quit IRC (Read error: Operation timed out)
[02:36] *** dashcloud has joined #archiveteam-ot
[02:37] *** kiskabak has quit IRC (Ping timeout: 268 seconds)
[03:17] dxrt: thanks, I think I noticed it on some of my old crawls too
[03:18] the websocket reconnect logic in the hooks is probably bogus and it might be hanging some other things like reloading the control files
[03:20] never mind on that second part I checked wrong
[03:22] doing science on software requires such careful observation
[03:35] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:48] *** odemg has joined #archiveteam-ot
[04:15] https://gist.github.com/ivan/cbffa32374aec15f3cc3d16ce1f2d21f make it stop
[04:23] I searched github for "warc crawler" to see if anyone has something new and guess what pops up first :(
[04:25] https://github.com/info-labs/owlbot
[04:31] omg there's haXe code in wpull
[04:36] *** jspiros has joined #archiveteam-ot
[04:39] *** JAA has joined #archiveteam-ot
[04:39] *** svchfoo1 sets mode: +o JAA
[04:40] *** bakJAA sets mode: +o JAA
[04:54] *** Mateon1 has quit IRC (Ping timeout: 252 seconds)
[04:54] *** Mateon1 has joined #archiveteam-ot
[05:15] what's the purpose of wpull's HTMLConverter?
[05:15] omg the --convert-links thing
[05:15] let me just remove that
[05:17] HTTrack has a good lock on that market anyway
[05:19] *** SimpBrain has quit IRC (Read error: Connection reset by peer)
[05:20] *** m007a83 has quit IRC (Remote host closed the connection)
[05:20] *** m007a83 has joined #archiveteam-ot
[05:24] *** BlueMax has quit IRC (Read error: Operation timed out)
[05:24] *** BlueMax has joined #archiveteam-ot
[05:25] *** SimpBrain has joined #archiveteam-ot
[05:26] *** asie has quit IRC (Ping timeout: 633 seconds)
[05:26] *** Jens has quit IRC (Ping timeout: 633 seconds)
[05:28] *** BlueMax has quit IRC (Remote host closed the connection)
[05:30] *** jrwr has quit IRC (Ping timeout: 633 seconds)
[05:32] *** jrwr has joined #archiveteam-ot
[05:34] *** phuzion has quit IRC (Ping timeout: 633 seconds)
[05:39] *** BlueMax has joined #archiveteam-ot
[05:47] *** bztoot has joined #archiveteam-ot
[05:51] *** chfoo has quit IRC (Ping timeout: 633 seconds)
[05:52] *** t2t2 has quit IRC (Ping timeout: 633 seconds)
[05:52] *** chfoo has joined #archiveteam-ot
[05:55] *** Jens has joined #archiveteam-ot
[06:03] *** JAA has quit IRC (Read error: Operation timed out)
[06:03] *** jspiros has quit IRC (Read error: Operation timed out)
[06:04] *** chfoo has quit IRC (Ping timeout: 246 seconds)
[06:09] *** chfoo has joined #archiveteam-ot
[06:14] *** asie has joined #archiveteam-ot
[06:31] https://gist.github.com/ivan/855eff1e3fe980fb7e57f7e21a0c1e56
[06:31] https://github.com/rthalley/dnspython/issues/59#event-653775397 problem solved eh
[07:36] *** m007a83_ has joined #archiveteam-ot
[07:38] *** m007a83 has quit IRC (Read error: Operation timed out)
[07:58] why does wpull use dnspython by default instead of the system resolver?
[07:58] *** jspiros has joined #archiveteam-ot
[08:00] *** JAA has joined #archiveteam-ot
[08:01] *** svchfoo1 sets mode: +o JAA
[08:01] *** bakJAA sets mode: +o JAA
[08:56] does anyone use grab-site --custom-hooks?
[09:43] wpull is silently failing to load my plugin, good stuff
[09:45] the plugin loader does something I can't decipher with paths and then uses some yapsy module
[09:47] starting to regret looking into this
[10:30] *** VerifiedJ has joined #archiveteam-ot
[10:36] http://pages.di.unipi.it/marino/bubing.pdf / https://github.com/LAW-Unimi/BUbiNG
[10:44] *** BlueMax has quit IRC (Quit: Leaving)
[11:30] *** m007a83_ has quit IRC (Read error: Connection reset by peer)
[11:30] *** m007a83_ has joined #archiveteam-ot
[11:32] *** dxrt- has joined #archiveteam-ot
[11:34] *** dxrt has quit IRC (Ping timeout: 252 seconds)
[11:46] ivan: I assume wpull uses dnspython instead of socket.getaddrinfo because the --dns-timeout option can't be implemented with the latter (AFAIK). Maybe some other things as well. It does seem to fall back to socket.getaddrinfo (well, the event loop's getaddrinfo) though when it doesn't get a reply from dnspython if I read the code correctly (wpull.network.dns.Resolver.resolve method).
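The pattern described in the 11:46 message (a primary lookup bounded by a timeout, then a fallback to the event loop's getaddrinfo) could look roughly like the sketch below. This is not wpull's actual code: `resolve_with_fallback` and its `primary` argument are hypothetical names, and `primary` is only a stand-in for a dnspython-based lookup, so the sketch stays within the standard library.

```python
import asyncio
import socket


async def resolve_with_fallback(host, primary, timeout=5.0):
    """Return a list of IP strings for `host`.

    `primary` is a hypothetical coroutine function (a stand-in for a
    dnspython-based lookup) that returns a list of IP strings.
    """
    loop = asyncio.get_running_loop()
    try:
        # Bound the primary lookup with a timeout, which plain
        # socket.getaddrinfo does not offer as a parameter.
        addresses = await asyncio.wait_for(primary(host), timeout)
        if addresses:
            return list(addresses)
    except (asyncio.TimeoutError, OSError):
        pass  # no usable reply: fall through to the system resolver

    # Fallback: the event loop's getaddrinfo, which runs the system
    # resolver in an executor thread. It is not bounded by `timeout` here.
    infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    seen = []
    for info in infos:
        ip = info[4][0]  # sockaddr is the fifth field; the IP is its first item
        if ip not in seen:
            seen.append(ip)
    return seen
```

A caller would pass its own lookup coroutine as `primary`; when that coroutine times out or raises an OSError, the system resolver's answer is returned instead.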
[12:01] ah thanks that is starting to jog my memory
[12:25] *** dxrt- is now known as dxrt
[12:26] *** dxrt_ sets mode: +o dxrt
[12:42] *** m007a83_ is now known as m007a83
[12:43] *** faolingf_ has joined #archiveteam-ot
[12:45] *** faoling__ has joined #archiveteam-ot
[12:45] *** faolingfa has quit IRC (Ping timeout: 252 seconds)
[12:48] *** faolingf_ has quit IRC (Ping timeout: 252 seconds)
[12:51] *** VerifiedJ has quit IRC (Read error: Connection reset by peer)
[13:04] *** faolingfa has joined #archiveteam-ot
[13:09] *** faoling__ has quit IRC (Read error: Operation timed out)
[13:29] *** VerifiedJ has joined #archiveteam-ot
[14:15] *** bztoot has quit IRC (Quit: bztoot)
[14:16] *** jspiros has quit IRC (Read error: Operation timed out)
[14:23] *** jspiros has joined #archiveteam-ot
[14:31] *** t2t2 has joined #archiveteam-ot
[14:34] w0rmhole: if you like archiving tweets, i can provide you some twitter accounts to save
[14:38] *** faolingfa has quit IRC (Quit: Leaving)
[15:11] *** VerifiedJ has quit IRC (Read error: Operation timed out)
[16:02] *** VerifiedJ has joined #archiveteam-ot
[16:36] *** VerifiedJ has quit IRC (Quit: Leaving)
[18:21] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[18:22] *** wp494 has joined #archiveteam-ot
[19:29] *** znak has joined #archiveteam-ot
[19:29] *** znak has left
[22:39] *** djsundog has quit IRC (The Lounge - https://thelounge.chat)
[23:40] *** mal_ is now known as mal