[00:15] *** Stilett0 has joined #archiveteam-ot
[00:44] *** BlueMax has joined #archiveteam-ot
[00:47] *** VerifiedJ has quit IRC (Quit: Leaving)
[01:13] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[01:13] *** dashcloud has joined #archiveteam-ot
[01:22] *** mgrytbak_ has joined #archiveteam-ot
[01:24] *** mgrytbak has quit IRC (Ping timeout: 492 seconds)
[02:35] *** Stiletto has joined #archiveteam-ot
[02:40] *** Stilett0 has quit IRC (Ping timeout: 492 seconds)
[03:19] *** Stilett0 has joined #archiveteam-ot
[03:21] *** Stiletto has quit IRC (Ping timeout: 268 seconds)
[03:31] *** odemg has quit IRC (Ping timeout: 260 seconds)
[03:44] *** odemg has joined #archiveteam-ot
[04:13] *** Despatche has joined #archiveteam-ot
[04:26] *** Despatche has quit IRC (Ping timeout: 633 seconds)
[04:27] *** Despatche has joined #archiveteam-ot
[04:36] *** Despatche has quit IRC (Ping timeout: 252 seconds)
[04:36] *** Despatche has joined #archiveteam-ot
[04:48] *** Despatche has quit IRC (Ping timeout: 506 seconds)
[04:55] *** Stiletto has joined #archiveteam-ot
[04:58] *** Stilett0 has quit IRC (Read error: Operation timed out)
[04:58] *** Stilett0 has joined #archiveteam-ot
[04:59] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[05:18] *** adinbied has quit IRC (Quit: Left Channel.)
[05:32] *** odemg has quit IRC (Ping timeout: 260 seconds)
[05:44] *** odemg has joined #archiveteam-ot
[09:06] *** BlueMax has quit IRC (Quit: Leaving)
[11:53] ivan: I don't remember right now. I'll look into it later and push everything to my fork.
[11:53] thanks
[14:25] *** wp494 has quit IRC (Ping timeout: 492 seconds)
[14:26] *** wp494 has joined #archiveteam-ot
[14:28] *** Stilett0 has quit IRC (Read error: Operation timed out)
[14:28] *** Stiletto has joined #archiveteam-ot
[15:18] joyous day, ludios/wpull and grab-site@v2 aren't crashing
[15:18] with the parser replaced now dupespotter is using 25% of the non-idle time
[16:28] *** djsundog has joined #archiveteam-ot
[17:37] *** VerifiedJ has joined #archiveteam-ot
[18:17] *** mal has quit IRC (mal)
[19:32] *** ZizzyDizz has joined #archiveteam-ot
[19:32] Hello I was just curious if you guys were aware of what's going on with google+
[19:33] It appears that G+ is going to shut down - and take every single comment made on youtube with it for the past 6 years.
[19:37] any source for the youtube comments?
[19:38] that seems very unlikely
[19:44] https://www.blog.google/technology/safety-security/project-strobe/ if g+ shuts down - all G+ services stop right? G+ serves the YT comments at this time.
[19:45] The G+ profiles themselves contain lots of info - it's how I track down unlisted videos for my (unusual) archive.
[19:45] Only Google+ for consumers shuts down though. Could YouTube be a (Google-internal) enterprise customer, sort of?
[19:45] I'm not sure - I've known about the breach for a couple months it's far worse than it's being reported as
[19:46] YT has recently made several steps backwards I fear it too might be the next service to shutter.
[19:46] pretty sure youtube comments have been associated with youtube accounts for a while
[19:47] they don't require a G+ profile either
[19:47] Hmm, you might be right on that.
[19:47] Well you are right,
[19:47] archiving youtube comments is still a good idea though
[19:47] just need to get headless chrome to visit a billion pages
[19:48] I mean, I understand the majority of comments are garbage
[19:48] there's a lot of good stuff
[19:48] in some areas of youtube
[19:48] But every once in a while I'll find a video, for example fixing something, and someone comments "this guy made a mistake, value should be 800 not 450" etc.
[19:48] Or questions and answers from the uploaders themselves
[20:35] *** Stilett0 has joined #archiveteam-ot
[20:36] *** Stiletto has quit IRC (Ping timeout: 252 seconds)
[20:45] *** mal has joined #archiveteam-ot
[20:46] ivan: I'm sure it would be possible to archive YT comments without a headless browser. Given what I had to do to support Google+ in snscrape though, I fully expect that to be a major PITA.
[20:54] *** Mateon1 has quit IRC (Ping timeout: 252 seconds)
[20:54] *** Mateon1 has joined #archiveteam-ot
[21:24] ivan: So regarding wpull, I fixed a bunch of bugs, e.g. --concurrent not working (#339, probably not relevant in the context of grab-site), flattening consecutive slashes in URLs (#380), treatment of backslashes (#377), date parsing crashes (#376), treatment of tabs and newlines in URLs (#355), and empty ports (#340). I also added URL priorisation and splitting meta WARCs when the data WARC is split,
[21:24] and I rewrote part of the pipeline code to fix some ordering bugs.
[21:26] I did this in Jan and Feb this year, and I don't remember why I didn't publish the code.
[21:27] Probably because I didn't have a chance to properly test all of this.
[21:28] The last thing I was working on was reversing the CONNECT verb removal, which broke the youtube-dl integration in 2.0.3.
[21:29] *** robogoat has quit IRC (Ping timeout: 260 seconds)
[21:35] *** robogoat has joined #archiveteam-ot
[21:52] chfoo: See above, I think it would be good to chat about the future of wpull sometime. I did some stuff in Jan/Feb, and ivan's working on it now. Maybe we could move it to a "wpull" organisation so the three of us (plus anyone else interested) could contribute?
[21:53] *** robogoat has quit IRC (Read error: Operation timed out)
[21:57] *** robogoat has joined #archiveteam-ot
[22:31] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[23:04] *** BlueMax has joined #archiveteam-ot
[23:22] I actually use grab-site a lot now, I wanted to integrate youtube-dl so I could archive forums full of unlisted videos but never got around to it.
[23:24] *** m007a83 has joined #archiveteam-ot
[23:34] *** VerifiedJ has quit IRC (Quit: Leaving)