#archiveteam-ot 2019-10-09,Wed

Time Nickname Message
00:01 🔗 godane has joined #archiveteam-ot
02:49 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:15 🔗 qw3rty has joined #archiveteam-ot
03:24 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
03:39 🔗 ntntn has joined #archiveteam-ot
06:14 🔗 dhyan_nat has joined #archiveteam-ot
06:23 🔗 Mateon1 has quit IRC (Remote host closed the connection)
06:23 🔗 Mateon1 has joined #archiveteam-ot
07:18 🔗 dxrt has joined #archiveteam-ot
07:18 🔗 Fusl____ sets mode: +o dxrt
07:18 🔗 Fusl sets mode: +o dxrt
07:18 🔗 Fusl_ sets mode: +o dxrt
07:24 🔗 systwi_ is now known as systwi
08:30 🔗 systwi JAA: "File size collisions are definitely very unlikely. If you can live with a very small potential error, there's no point in doing anything else."
08:31 🔗 systwi Heh, well, I'd prefer to use a more bulletproof (err, resistant :P ) solution, but I guess the file size will have to do
08:32 🔗 systwi Taking into account how infrequently YT videos are edited by their creators, it'll do
08:33 🔗 systwi You guys don't have to spend time researching it, but if you do find a more reliable approach please let me know
09:10 🔗 ivan how many videos do you have where you have more than one variant?
10:02 🔗 voker57_ is now known as voker57
10:34 🔗 killsushi has quit IRC (Quit: Leaving)
10:36 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
10:40 🔗 qw3rty2 has joined #archiveteam-ot
10:44 🔗 ShellyRol has joined #archiveteam-ot
10:46 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
11:03 🔗 Raccoon i have tons of file size collisions for videos on my hdd. believe it's the nature of mpeg encoding
11:05 🔗 markedL are they from youtube-dl ?
11:07 🔗 Raccoon i only have a few dozen videos from youtube-dl
11:07 🔗 Raccoon no dupe sizes
11:11 🔗 Raccoon if someone has a copy of a youtube video pre-censor and post-censor, try running both videos through ffprobe to see if there are diffs
11:11 🔗 Raccoon looks like there's a CRC32 of the video and audio streams?
11:13 🔗 Raccoon if so, then just ffprobe for those values and tack them onto the filename / your database
11:28 🔗 JAA Where do you get the checksums from? If that's just ffprobe, then you'd still need to redownload to check whether something has changed.
11:29 🔗 JAA And if you redownload, you could just as well run a hash over the file or whatever.
11:29 🔗 Raccoon oh nvm, those aren't crc32 values. but maybe some other values can be sussed out that vary
11:30 🔗 Raccoon yeah, you can hash your files, but i recognize that's more time consuming than metadata readers like ffprobe
11:30 🔗 JAA If the checksum's in the headers, yes. If ffprobe calculates it from the streams, almost certainly no.
11:31 🔗 Ivy censor?
11:31 🔗 JAA And if it's in the headers (and reliable), you can also just download the headers (i.e. first few kB or so?), then do the comparison, and close the connection if it's still the same.
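A rough Python sketch of the "download only the first few kB, compare, then close" idea. It is a simplification: instead of parsing a checksum out of the container headers, it compares the raw leading bytes, and it assumes the local file is a byte-for-byte copy of what the URL serves (a remuxed download would not match). The names fetch_prefix, prefix_matches and PREFIX_BYTES are made up for illustration, and 4096 is an arbitrary reading of "first few kB".

```python
import urllib.request
from pathlib import Path

# Assumed prefix size; the discussion only says "first few kB or so".
PREFIX_BYTES = 4096


def fetch_prefix(url: str, length: int = PREFIX_BYTES) -> bytes:
    """Fetch at most the first `length` bytes of a remote file."""
    request = urllib.request.Request(
        url, headers={"Range": f"bytes=0-{length - 1}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # If the server ignores the Range header it sends the full body;
        # reading with a limit stops early, and leaving the "with" block
        # closes the connection.
        return response.read(length)


def prefix_matches(local_path: Path, remote_prefix: bytes) -> bool:
    """Compare the start of a local file with bytes fetched remotely."""
    with open(local_path, "rb") as handle:
        return handle.read(len(remote_prefix)) == remote_prefix
```

A matching prefix only says the start of the file is unchanged; an edit later in the stream that leaves the headers alone would go unnoticed, which is the "and reliable" caveat above.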
11:36 🔗 Raccoon ivan: i think the discussion is a revisit of youtube's ability to mute copyright audio segments or enable channel owners to edit videos
11:42 🔗 Raccoon offtopic offcolor observation. ffprobe -colors lists "NavajoWhite == #ffdead"
11:44 🔗 coderobe has quit IRC (Remote host closed the connection)
11:45 🔗 markedL I'm a little surprised someone who works on youtube-dl wouldn't know already, but I also feel it barely matters. In the face of uncertainty build the simplest / quickest thing that has a chance of working, and running that will either work or point you to what's needed to work. minimum viable product
11:47 🔗 Raccoon video editing is fairly cutting edge and rarely used
11:57 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:57 🔗 BlueMax has joined #archiveteam-ot
12:08 🔗 ntntn has quit IRC (Ping timeout: 260 seconds)
12:10 🔗 schbirid has quit IRC (Remote host closed the connection)
12:17 🔗 kiska I am going to Melbourne tomorrow. That means I get to experiment with znc
12:18 🔗 ntntn has joined #archiveteam-ot
12:19 🔗 coderobe has joined #archiveteam-ot
12:25 🔗 ntntn re: 'yes anysoftkeyboard is nice', it's ok. it's no Hacker's Keyboard. having the workman thing is a very good bonus, but i'm really using it cos i needed a split layout
12:29 🔗 ntntn has quit IRC ()
12:38 🔗 bluefoo_ has quit IRC (Ping timeout: 745 seconds)
13:16 🔗 systwi ivan: I haven't saved any videos yet, as I'm still building my script, so I can't tell.
13:18 🔗 systwi Yes, Raccoon, this is because of YT muting audio and ability to edit the video after it's been uploaded. Thanks YT, NOT what archivists need
13:20 🔗 systwi I also agree, markedL, I would think/hope one of the ytdl devs would know. Also, the way I do my programming is I try to get it perfect the first time. I don't mean to sound rude, it's just that I try to cover any and all bases from the start.
13:23 🔗 ivan what are you programming in
13:25 🔗 Raccoon you still need development testing, revisions, and re-re-revisions. might as well do some interactive coding / practice gets
13:26 🔗 JAA If you want to be absolutely sure, redownload and compare hashes. That's the only way to be really certain.
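A minimal Python sketch of the "redownload and compare hashes" approach, assuming the old copy and the fresh download both sit on disk. SHA-256 is one reasonable choice of hash, not something the discussion specifies, and file_sha256 and has_changed are illustrative names.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(old_copy: Path, new_copy: Path) -> bool:
    """Report whether two downloads differ in content."""
    # Cheap check first: different sizes always mean different content.
    if old_copy.stat().st_size != new_copy.stat().st_size:
        return True
    # Equal sizes prove nothing, so fall back to hashing both files.
    return file_sha256(old_copy) != file_sha256(new_copy)
```

Note that this flags any byte-level difference, so a re-encode of the same footage counts as a change just like a real edit does.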
13:27 🔗 ivan the only way to archive things perfectly is to replicate the remote database exactly and run your own youtube-equivalent viewer
13:28 🔗 ivan if you run youtube-dl long enough you will run into things like youtube-dl extracting incorrect metadata
13:28 🔗 markedL depends on whether you want the new encoding formats to flagged as new content or same content
13:28 🔗 JAA Yeah, if you want to detect video edits as opposed to reencoding, well, good luck.
13:29 🔗 ivan if you want to waste less of your time do the thing that yields the highest ROI by capturing the most important stuff in a format probably usable in the future
13:29 🔗 bluefoo has joined #archiveteam-ot
13:31 🔗 Raccoon there might be some metadata page scraping to find mentions of copyright mutings and author editing
13:31 🔗 ivan you'll also see things like youtube videos changing channel or usernames (e.g. whitehouse) becoming a different channel
13:32 🔗 Raccoon i think youtube is transparent about this so google doesn't get accused and abused for source citation tampering etc
13:33 🔗 Raccoon wouldn't want a video with 20 million views to start editing in obscene material, or selling videos with 20 million views to the highest bidder to replace with their own content
13:51 🔗 markedL how about inserting tangential video clips at regular intervals, from the highest bidder
13:52 🔗 markedL AKA ads
14:18 🔗 DogsRNice has joined #archiveteam-ot
14:22 🔗 DogsRNice has quit IRC (Remote host closed the connection)
14:22 🔗 DogsRNice has joined #archiveteam-ot
14:24 🔗 DogsRNice has quit IRC (Remote host closed the connection)
14:24 🔗 DogsRNice has joined #archiveteam-ot
15:39 🔗 deevious has quit IRC (Quit: deevious)
16:21 🔗 Raccoon has quit IRC (Ping timeout: 258 seconds)
16:40 🔗 icedice has joined #archiveteam-ot
16:47 🔗 ats has quit IRC (leaving)
17:03 🔗 ats has joined #archiveteam-ot
17:14 🔗 VADemon has quit IRC (Quit: left4dead)
17:18 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
18:02 🔗 icedice Windows 10 has now defragmented my SSD so that it's back at 0% fragmentation, so that's nice
18:03 🔗 icedice It would be nice if Storage Optimizer could be run manually, but I suppose once a month is enough for most users
18:41 🔗 Kaz Don't see the issue? https://usercontent.irccloud-cdn.com/file/OVaEyaG1/image.png
19:01 🔗 bluefoo has quit IRC (Ping timeout: 496 seconds)
19:51 🔗 bluefoo has joined #archiveteam-ot
20:14 🔗 Meroje has quit IRC (Quit: bye!)
20:22 🔗 Ivy You should not be literally defragging an SSD anyways
20:23 🔗 Meroje has joined #archiveteam-ot
20:26 🔗 bluefoo has quit IRC (Ping timeout: 252 seconds)
20:30 🔗 Kaz lets not stir that pot again, we had that the other day
20:30 🔗 Ivy oh sorry, didn't know
20:32 🔗 bluefoo has joined #archiveteam-ot
21:28 🔗 icedice Ivy: It's an automatic intelligent defrag thing that Windows 10 runs once a month on all Windows 10 computers that have Storage Sense enabled
21:29 🔗 icedice https://www.hanselman.com/blog/TheRealAndCompleteStoryDoesWindowsDefragmentYourSSD.aspx
21:30 🔗 icedice "Storage Optimizer will defrag an SSD once a month if volume snapshots are enabled. This is by design and necessary due to slow volsnap copy on write performance on fragmented SSD volumes."
21:30 🔗 icedice -A developer on the Windows storage team
21:30 🔗 icedice Edit: Ok, it was volume snapshots and not Storage Sense, I misremembered that detail
21:52 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
21:57 🔗 Meroje has quit IRC (Quit: bye!)
21:58 🔗 Meroje has joined #archiveteam-ot
22:02 🔗 Meroje has quit IRC (Client Quit)
22:02 🔗 Meroje has joined #archiveteam-ot
22:06 🔗 Meroje has quit IRC (Client Quit)
22:06 🔗 Meroje has joined #archiveteam-ot
22:10 🔗 Meroje has quit IRC (Client Quit)
22:10 🔗 Meroje has joined #archiveteam-ot
22:11 🔗 NickN00b has joined #archiveteam-ot
22:14 🔗 Meroje has quit IRC (Client Quit)
22:14 🔗 Meroje has joined #archiveteam-ot
22:28 🔗 ats_ has joined #archiveteam-ot
22:29 🔗 ats has quit IRC (Read error: Operation timed out)
22:32 🔗 ats_ has quit IRC (Read error: Operation timed out)
22:59 🔗 phillipsj has joined #archiveteam-ot
23:06 🔗 ScruffyB has quit IRC (Read error: Operation timed out)
23:12 🔗 BlueMax has joined #archiveteam-ot
23:27 🔗 ats has joined #archiveteam-ot
23:27 🔗 systwi ivan: It's written in bash
23:30 🔗 systwi And the formats I use are .json (metadata), .txt (description), .xml (annotations), .jpg/.png (thumbnail), .mkv (video), .vtt (subtitles)
23:31 🔗 systwi And yes, I've also taken into account changing usernames, hence why the folders consist of the channel id and display name in brackets, which will also change as the user modifies it.
23:32 🔗 systwi UCeR0n8d3ShTn_yrMhpwyE1Q [TheReportOfTheWeek]
23:33 🔗 systwi My script only cares about the beginning 24 characters. If the name changes, so does the folder
23:33 🔗 systwi UCeR0n8d3ShTn_yrMhpwyE1Q [ROTW]
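
The folder scheme described above could look roughly like this in Python (systwi's actual script is bash and is not shown in the log, so this is only an illustration). It matches folders on the first 24 characters and renames the folder when the display name changes. The function names and the root directory layout are assumptions.

```python
from pathlib import Path

# "My script only cares about the beginning 24 characters."
CHANNEL_ID_LENGTH = 24


def folder_name(channel_id: str, display_name: str) -> str:
    """Build a name such as 'UCeR0n8d3ShTn_yrMhpwyE1Q [ROTW]'."""
    return f"{channel_id} [{display_name}]"


def sync_channel_folder(root: Path, channel_id: str, display_name: str) -> Path:
    """Return the channel's folder under root, renaming or creating it."""
    if len(channel_id) != CHANNEL_ID_LENGTH:
        raise ValueError("expected a 24-character channel id")
    wanted = root / folder_name(channel_id, display_name)
    for entry in root.iterdir():
        # Only the leading channel id identifies the folder; the bracketed
        # display name is free to change.
        if entry.is_dir() and entry.name[:CHANNEL_ID_LENGTH] == channel_id:
            if entry != wanted:
                entry.rename(wanted)
            return wanted
    wanted.mkdir()
    return wanted
```

Calling sync_channel_folder(root, "UCeR0n8d3ShTn_yrMhpwyE1Q", "ROTW") on a root that already holds the "[TheReportOfTheWeek]" folder renames it in place and keeps its contents.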
