#archiveteam-ot 2018-10-08,Mon

Time Nickname Message
00:15 🔗 Stilett0 has joined #archiveteam-ot
00:44 🔗 BlueMax has joined #archiveteam-ot
00:47 🔗 VerifiedJ has quit IRC (Quit: Leaving)
01:13 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
01:13 🔗 dashcloud has joined #archiveteam-ot
01:22 🔗 mgrytbak_ has joined #archiveteam-ot
01:24 🔗 mgrytbak has quit IRC (Ping timeout: 492 seconds)
02:35 🔗 Stiletto has joined #archiveteam-ot
02:40 🔗 Stilett0 has quit IRC (Ping timeout: 492 seconds)
03:19 🔗 Stilett0 has joined #archiveteam-ot
03:21 🔗 Stiletto has quit IRC (Ping timeout: 268 seconds)
03:31 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
03:44 🔗 odemg has joined #archiveteam-ot
04:13 🔗 Despatche has joined #archiveteam-ot
04:26 🔗 Despatche has quit IRC (Ping timeout: 633 seconds)
04:27 🔗 Despatche has joined #archiveteam-ot
04:36 🔗 Despatche has quit IRC (Ping timeout: 252 seconds)
04:36 🔗 Despatche has joined #archiveteam-ot
04:48 🔗 Despatche has quit IRC (Ping timeout: 506 seconds)
04:55 🔗 Stiletto has joined #archiveteam-ot
04:58 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
04:58 🔗 Stilett0 has joined #archiveteam-ot
04:59 🔗 Stiletto has quit IRC (Ping timeout: 260 seconds)
05:18 🔗 adinbied has quit IRC (Quit: Left Channel.)
05:32 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
05:44 🔗 odemg has joined #archiveteam-ot
09:06 🔗 BlueMax has quit IRC (Quit: Leaving)
11:53 🔗 JAA ivan: I don't remember right now. I'll look into it later and push everything to my fork.
11:53 🔗 ivan thanks
14:25 🔗 wp494 has quit IRC (Ping timeout: 492 seconds)
14:26 🔗 wp494 has joined #archiveteam-ot
14:28 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
14:28 🔗 Stiletto has joined #archiveteam-ot
15:18 🔗 ivan joyous day, ludios/wpull and grab-site@v2 aren't crashing
15:18 🔗 ivan with the parser replaced now dupespotter is using 25% of the non-idle time
16:28 🔗 djsundog has joined #archiveteam-ot
17:37 🔗 VerifiedJ has joined #archiveteam-ot
18:17 🔗 mal has quit IRC (mal)
19:32 🔗 ZizzyDizz has joined #archiveteam-ot
19:32 🔗 ZizzyDizz Hello, I was just curious whether you guys were aware of what's going on with Google+.
19:33 🔗 ZizzyDizz It appears that G+ is going to shut down - and take every single comment made on youtube with it for the past 6 years.
19:37 🔗 ivan any source for the youtube comments?
19:38 🔗 ivan that seems very unlikely
19:44 🔗 ZizzyDizz https://www.blog.google/technology/safety-security/project-strobe/ If G+ shuts down, all G+ services stop, right? G+ serves the YT comments at this time.
19:45 🔗 ZizzyDizz The G+ profiles themselves contain lots of info - it's how I track down unlisted videos for my (unusual) archive.
19:45 🔗 JAA Only Google+ for consumers shuts down though. Could YouTube be a (Google-internal) enterprise customer, sort of?
19:45 🔗 ZizzyDizz I'm not sure. I've known about the breach for a couple of months; it's far worse than it's being reported as.
19:46 🔗 ZizzyDizz YT has recently taken several steps backwards; I fear it too might be the next service to shutter.
19:46 🔗 ivan pretty sure youtube comments have been associated with youtube accounts for a while
19:47 🔗 ivan they don't require a G+ profile either
19:47 🔗 ZizzyDizz Hmm, you might be right on that.
19:47 🔗 ZizzyDizz Well, you are right.
19:47 🔗 ivan archiving youtube comments is still a good idea though
19:47 🔗 ivan just need to get headless chrome to visit a billion pages
19:48 🔗 ZizzyDizz I mean, I understand the majority of comments are garbage
19:48 🔗 ivan there's a lot of good stuff
19:48 🔗 ivan in some areas of youtube
19:48 🔗 ZizzyDizz But every once in a while I'll find a video, for example fixing something, and someone comments "this guy made a mistake, value should be 800 not 450" etc.
19:48 🔗 ZizzyDizz Or questions and answers from the uploaders themselves
20:35 🔗 Stilett0 has joined #archiveteam-ot
20:36 🔗 Stiletto has quit IRC (Ping timeout: 252 seconds)
20:45 🔗 mal has joined #archiveteam-ot
20:46 🔗 JAA ivan: I'm sure it would be possible to archive YT comments without a headless browser. Given what I had to do to support Google+ in snscrape though, I fully expect that to be a major PITA.
20:54 🔗 Mateon1 has quit IRC (Ping timeout: 252 seconds)
20:54 🔗 Mateon1 has joined #archiveteam-ot
21:24 🔗 JAA ivan: So regarding wpull, I fixed a bunch of bugs, e.g. --concurrent not working (#339, probably not relevant in the context of grab-site), flattening consecutive slashes in URLs (#380), treatment of backslashes (#377), date parsing crashes (#376), treatment of tabs and newlines in URLs (#355), and empty ports (#340). I also added URL prioritisation and splitting meta WARCs when the data WARC is split,
21:24 🔗 JAA and I rewrote part of the pipeline code to fix some ordering bugs.
21:26 🔗 JAA I did this in Jan and Feb this year, and I don't remember why I didn't publish the code.
21:27 🔗 JAA Probably because I didn't have a chance to properly test all of this.
21:28 🔗 JAA The last thing I was working on was reversing the CONNECT verb removal, which broke the youtube-dl integration in 2.0.3.
21:29 🔗 robogoat has quit IRC (Ping timeout: 260 seconds)
21:35 🔗 robogoat has joined #archiveteam-ot
21:52 🔗 JAA chfoo: See above, I think it would be good to chat about the future of wpull sometime. I did some stuff in Jan/Feb, and ivan's working on it now. Maybe we could move it to a "wpull" organisation so the three of us (plus anyone else interested) could contribute?
21:53 🔗 robogoat has quit IRC (Read error: Operation timed out)
21:57 🔗 robogoat has joined #archiveteam-ot
22:31 🔗 m007a83 has quit IRC (Read error: Connection reset by peer)
23:04 🔗 BlueMax has joined #archiveteam-ot
23:22 🔗 ZizzyDizz I actually use grab-site a lot now. I wanted to integrate youtube-dl so I could archive forums full of unlisted videos, but never got around to it.
23:24 🔗 m007a83 has joined #archiveteam-ot
23:34 🔗 VerifiedJ has quit IRC (Quit: Leaving)
