Time |
Nickname |
Message |
00:15
🔗
|
|
Stilett0 has joined #archiveteam-ot |
00:44
🔗
|
|
BlueMax has joined #archiveteam-ot |
00:47
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
01:13
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
01:13
🔗
|
|
dashcloud has joined #archiveteam-ot |
01:22
🔗
|
|
mgrytbak_ has joined #archiveteam-ot |
01:24
🔗
|
|
mgrytbak has quit IRC (Ping timeout: 492 seconds) |
02:35
🔗
|
|
Stiletto has joined #archiveteam-ot |
02:40
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 492 seconds) |
03:19
🔗
|
|
Stilett0 has joined #archiveteam-ot |
03:21
🔗
|
|
Stiletto has quit IRC (Ping timeout: 268 seconds) |
03:31
🔗
|
|
odemg has quit IRC (Ping timeout: 260 seconds) |
03:44
🔗
|
|
odemg has joined #archiveteam-ot |
04:13
🔗
|
|
Despatche has joined #archiveteam-ot |
04:26
🔗
|
|
Despatche has quit IRC (Ping timeout: 633 seconds) |
04:27
🔗
|
|
Despatche has joined #archiveteam-ot |
04:36
🔗
|
|
Despatche has quit IRC (Ping timeout: 252 seconds) |
04:36
🔗
|
|
Despatche has joined #archiveteam-ot |
04:48
🔗
|
|
Despatche has quit IRC (Ping timeout: 506 seconds) |
04:55
🔗
|
|
Stiletto has joined #archiveteam-ot |
04:58
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
04:58
🔗
|
|
Stilett0 has joined #archiveteam-ot |
04:59
🔗
|
|
Stiletto has quit IRC (Ping timeout: 260 seconds) |
05:18
🔗
|
|
adinbied has quit IRC (Quit: Left Channel.) |
05:32
🔗
|
|
odemg has quit IRC (Ping timeout: 260 seconds) |
05:44
🔗
|
|
odemg has joined #archiveteam-ot |
09:06
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
11:53
🔗
|
JAA |
ivan: I don't remember right now. I'll look into it later and push everything to my fork. |
11:53
🔗
|
ivan |
thanks |
14:25
🔗
|
|
wp494 has quit IRC (Ping timeout: 492 seconds) |
14:26
🔗
|
|
wp494 has joined #archiveteam-ot |
14:28
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
14:28
🔗
|
|
Stiletto has joined #archiveteam-ot |
15:18
🔗
|
ivan |
joyous day, ludios/wpull and grab-site@v2 aren't crashing |
15:18
🔗
|
ivan |
with the parser replaced now dupespotter is using 25% of the non-idle time |
16:28
🔗
|
|
djsundog has joined #archiveteam-ot |
17:37
🔗
|
|
VerifiedJ has joined #archiveteam-ot |
18:17
🔗
|
|
mal has quit IRC (mal) |
19:32
🔗
|
|
ZizzyDizz has joined #archiveteam-ot |
19:32
🔗
|
ZizzyDizz |
Hello I was just curious if you guys were aware of what's going on with google+ |
19:33
🔗
|
ZizzyDizz |
It appears that G+ is going to shut down - and take every single comment made on youtube with it for the past 6 years. |
19:37
🔗
|
ivan |
any source for the youtube comments? |
19:38
🔗
|
ivan |
that seems very unlikely |
19:44
🔗
|
ZizzyDizz |
https://www.blog.google/technology/safety-security/project-strobe/ if g+ shuts down - all G+ services stop right? G+ serves the YT comments at this time. |
19:45
🔗
|
ZizzyDizz |
The G+ profiles themselves contain lots of info - it's how I track down unlisted videos for my (unusual) archive. |
19:45
🔗
|
JAA |
Only Google+ for consumers shuts down though. Could YouTube be a (Google-internal) enterprise customer, sort of? |
19:45
🔗
|
ZizzyDizz |
I'm not sure - I've known about the breach for a couple of months; it's far worse than it's being reported as |
19:46
🔗
|
ZizzyDizz |
YT has recently made several steps backwards; I fear it too might be the next service to shutter. |
19:46
🔗
|
ivan |
pretty sure youtube comments have been associated with youtube accounts for a while |
19:47
🔗
|
ivan |
they don't require a G+ profile either |
19:47
🔗
|
ZizzyDizz |
Hmm, you might be right on that. |
19:47
🔗
|
ZizzyDizz |
Well you are right, |
19:47
🔗
|
ivan |
archiving youtube comments is still a good idea though |
19:47
🔗
|
ivan |
just need to get headless chrome to visit a billion pages |
19:48
🔗
|
ZizzyDizz |
I mean, I understand the majority of comments are garbage |
19:48
🔗
|
ivan |
there's a lot of good stuff |
19:48
🔗
|
ivan |
in some areas of youtube |
19:48
🔗
|
ZizzyDizz |
But every once in a while I'll find a video, for example fixing something, and someone comments "this guy made a mistake, value should be 800 not 450" etc. |
19:48
🔗
|
ZizzyDizz |
Or questions and answers from the uploaders themselves |
20:35
🔗
|
|
Stilett0 has joined #archiveteam-ot |
20:36
🔗
|
|
Stiletto has quit IRC (Ping timeout: 252 seconds) |
20:45
🔗
|
|
mal has joined #archiveteam-ot |
20:46
🔗
|
JAA |
ivan: I'm sure it would be possible to archive YT comments without a headless browser. Given what I had to do to support Google+ in snscrape though, I fully expect that to be a major PITA. |
20:54
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 252 seconds) |
20:54
🔗
|
|
Mateon1 has joined #archiveteam-ot |
21:24
🔗
|
JAA |
ivan: So regarding wpull, I fixed a bunch of bugs, e.g. --concurrent not working (#339, probably not relevant in the context of grab-site), flattening consecutive slashes in URLs (#380), treatment of backslashes (#377), date parsing crashes (#376), treatment of tabs and newlines in URLs (#355), and empty ports (#340). I also added URL prioritisation and splitting meta WARCs when the data WARC is split, |
21:24
🔗
|
JAA |
and I rewrote part of the pipeline code to fix some ordering bugs. |
21:26
🔗
|
JAA |
I did this in Jan and Feb this year, and I don't remember why I didn't publish the code. |
21:27
🔗
|
JAA |
Probably because I didn't have a chance to properly test all of this. |
21:28
🔗
|
JAA |
The last thing I was working on was reversing the CONNECT verb removal, which broke the youtube-dl integration in 2.0.3. |
21:29
🔗
|
|
robogoat has quit IRC (Ping timeout: 260 seconds) |
21:35
🔗
|
|
robogoat has joined #archiveteam-ot |
21:52
🔗
|
JAA |
chfoo: See above, I think it would be good to chat about the future of wpull sometime. I did some stuff in Jan/Feb, and ivan's working on it now. Maybe we could move it to a "wpull" organisation so the three of us (plus anyone else interested) could contribute? |
21:53
🔗
|
|
robogoat has quit IRC (Read error: Operation timed out) |
21:57
🔗
|
|
robogoat has joined #archiveteam-ot |
22:31
🔗
|
|
m007a83 has quit IRC (Read error: Connection reset by peer) |
23:04
🔗
|
|
BlueMax has joined #archiveteam-ot |
23:22
🔗
|
ZizzyDizz |
I actually use grab-site a lot now, I wanted to integrate youtube-dl so I could archive forums full of unlisted videos but never got around to it. |
23:24
🔗
|
|
m007a83 has joined #archiveteam-ot |
23:34
🔗
|
|
VerifiedJ has quit IRC (Quit: Leaving) |