#archiveteam-ot 2019-09-30,Mon


Time Nickname Message
00:40 🔗 ShellyRol has joined #archiveteam-ot
00:46 🔗 icedice has quit IRC (Quit: Leaving)
01:48 🔗 ScruffyB For the original Star Wars release, I assumed they got it from the Library of Congress copy.
01:50 🔗 markedL release when? All I remember was hearing how Lucas wants the original destroyed
02:02 🔗 ScruffyB There was a .torrent release in 1080p a few years back.
02:02 🔗 * ScruffyB needs to watch it: technically breaking the law if he doesn't.
02:07 🔗 GuysFree has joined #archiveteam-ot
02:11 🔗 ScruffyB Maybe this one, except I was under the impression they were targeting 1080p, not 4k: http://www.thestarwarstrilogy.com/page/Project-4K77
02:18 🔗 markedL doesn't sound like Library of Congress
02:27 🔗 markedL ha, spanish audio track. Hopefully with movies being released in digital, preservation will be easier
02:38 🔗 markedL Sunday's the big day for bankruptcies
02:38 🔗 Raccoon has quit IRC (Quit: xyzzy)
03:23 🔗 qw3rty has joined #archiveteam-ot
03:32 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
04:10 🔗 mc2 has joined #archiveteam-ot
04:36 🔗 GuysFree has quit IRC (Quit: Connection closed for inactivity)
05:13 🔗 icedice has joined #archiveteam-ot
05:14 🔗 killsushi has joined #archiveteam-ot
05:16 🔗 icedice Would it make sense to set up an automated, encrypted backup system on a cheap dedicated server from some European hosting provider (15€/month or less) or is it just better to go with a backup service since they have two copies on different HDDs of everything (assuming that they're not shit)?
05:17 🔗 icedice Like Sync.com, for example
05:41 🔗 deevious has joined #archiveteam-ot
06:03 🔗 Raccoon has joined #archiveteam-ot
06:04 🔗 Raccoon so apparently tracker.leechers-paradise.org:6969 is working again?
06:20 🔗 systwi_ is now known as systwi
06:25 🔗 systwi icedice: I haven't done this, but I've heard one can sign up for Google Team Drive for $12/mo and get unlimited storage, and then just use some kind of encryption for your files.
06:25 🔗 systwi Raccoon: It was down before?
06:27 🔗 Raccoon https://www.techworm.net/2018/12/torrent-tracker-leechers-paradise.html
06:27 🔗 Raccoon https://www.kodivpn.co/leechers-paradise-shut-down/
06:27 🔗 Raccoon shutdown 10 months ago
06:28 🔗 systwi Huh, interesting. Glad to know they're back up
06:29 🔗 systwi Anyway, I have a question myself. I'm trying to preserve all information from YT that I can on a per-channel/per-video basis. In order for me to save data, drive reads/writes, and just time in general, I need a way to check if a video has changed at all on YT's end (e.g. like a hash YT stores of the video file). That way, my YT archival script I'm working on can see if their hash matches mine. If it mismatches, grab the new video. If it's the same, leave the video alone.
06:29 🔗 Raccoon are they? or is it big brother government
06:29 🔗 Raccoon the FBI owns the .GOV tld
06:29 🔗 Raccoon er .ORG tld
06:29 🔗 systwi Really? Didn't know that was a thing
06:30 🔗 Raccoon the US can spoof anything they want on COM NET ORG
06:30 🔗 Raccoon they do it to seize control of botnets, regularly
06:30 🔗 systwi ivan_, would you maybe know of something I could use? (regarding my YT archival/checksum question)
06:31 🔗 systwi ...
06:31 🔗 systwi traceroute leechers-paradise.org ?
06:33 🔗 Raccoon strange crap. I get put into 10. land through most of the traversal
06:34 🔗 icedice systwi: I have a "free" Google Drive .edu team drive
06:34 🔗 systwi icedice: That might be good for testing, but I've heard access is revoked when you're gone from that school
06:34 🔗 icedice I don't really want to trust my data to a US based company though
06:35 🔗 systwi Don't quote me on that, though, I've never had one
06:35 🔗 icedice I'm not in that school to begin with lol
06:35 🔗 Raccoon https://paste.ee/p/JsJbA
06:35 🔗 Raccoon that's a weird tracert if i've seen one
06:35 🔗 ivan_ systwi: YouTube is going to regularly re-compress the videos or change the headers thus changing all the hashes without changing the content
06:35 🔗 systwi I get ya, that's why I feel that if you use an encrypted container for your schtuff that would suffice
06:36 🔗 ivan_ systwi: if you want to know if a video has been edited, a much better heuristic would be comparing the lengths which could catch the vast majority of edits
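A minimal sketch of ivan_'s length heuristic, assuming both copies of the metadata were saved with youtube-dl's `--write-info-json`, which records `duration` in seconds. The function names and the one-second tolerance are hypothetical choices, not anything YouTube documents.

```python
import json


def load_info(path):
    """Read a youtube-dl .info.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def duration_changed(old_info, new_info, tolerance_s=1.0):
    """Return True if the two copies differ in length by more than tolerance_s.

    A missing 'duration' on either side is treated as "changed", so the
    caller re-checks rather than silently skipping the video.
    """
    old_d, new_d = old_info.get("duration"), new_info.get("duration")
    if old_d is None or new_d is None:
        return True
    return abs(float(old_d) - float(new_d)) > tolerance_s
```

As systwi points out next, this would not catch an in-place audio swap that keeps the length unchanged.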
06:36 🔗 icedice I have an online friend who got me an account there
06:36 🔗 icedice Pretty sure he works on the university's IT staff
06:36 🔗 systwi ivan_: How frequently does such a thing happen, and what about instances where a section of audio is removed/changed (e.g. copyright issue)?
06:37 🔗 systwi I've never seen a video change length, only seen audio get removed/changed on a video
06:37 🔗 Raccoon yt channels that host live video can also edit/crop their content. spacex does it a lot
06:38 🔗 markedL maybe an option is getting reencoded videos and somehow decide which copy is better quality
06:38 🔗 ivan_ systwi: do you really want to regularly redownload video or audio formats to see if they changed
06:38 🔗 systwi ivan_: No, I'd like to download new copies of videos if content in them changes, i.e. audio removed, video cropped, etc.
06:39 🔗 systwi If the video itself is the same, don't waste the bandwidth
06:39 🔗 systwi /data/time
06:39 🔗 ivan_ systwi: how are you going to know if the video changed without downloading the video
06:39 🔗 Raccoon i prefer videos where audio hasn't been cropped due to copyright strike
06:39 🔗 systwi That's why I wasn't sure if YT had a hash or date of last change to the video itself
06:40 🔗 systwi So I can check something like the .info.json and see if the value is the same or different
06:40 🔗 icedice Raccoon systwi: TLDs administrated by VeriSign, a Washington DC-based company which is controlled by the US government: .com, .net, .name, .gov, .cc, and .tv
06:40 🔗 Raccoon you might be able to pull the metadata from ffmpeg ffprobe
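If a copy is already on disk, ffprobe can dump its container and stream metadata as JSON (`-print_format json -show_format -show_streams`). This sketch reduces that output to a few fields for comparing an old download with a new one; which fields are worth comparing is an assumption, tags such as `creation_time` are often absent, and it does not avoid the download systwi is trying to skip.

```python
import json
import subprocess


def probe(path):
    """Run ffprobe (assumed to be on PATH) and return its parsed JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def summarize(probe_data):
    """Keep only the fields worth comparing between two downloads."""
    fmt = probe_data.get("format", {})
    streams = [
        (s.get("codec_type"), s.get("codec_name"), s.get("duration"))
        for s in probe_data.get("streams", [])
    ]
    return {
        "duration": fmt.get("duration"),
        "size": fmt.get("size"),
        "creation_time": fmt.get("tags", {}).get("creation_time"),
        "streams": streams,
    }
```

Two files would then count as matching when `summarize(probe(a)) == summarize(probe(b))`.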
06:40 🔗 ivan_ systwi: I asked a YouTube person, maybe they'll get back to me
06:40 🔗 systwi Thank you ivan_, much appreciated
06:40 🔗 icedice TLDs administrated by Afilias, a company that blocked one of WikiLeaks' domains: .info, .mobi, .org, .asia, .aero, .ag, .bz, .gi, .hgn, .in, .lc, .me, .mn, .sc and .vc
06:41 🔗 systwi Raccoon: I would, however I tell youtube-dl to keep metadata off the video file
06:41 🔗 icedice (Afilias is based in Ireland)
06:41 🔗 ivan_ systwi: maybe try uploading and editing your own video and see if anything at all on the polymer or non-polymer page is different
06:41 🔗 Raccoon icedice: by your best judgement, is leechers-paradise still shut down, or did they come back? is tracker.leechers-paradise.org:6969 a cash grab to see who is pirating what?
06:41 🔗 systwi In the mean time, I'll try that
06:41 🔗 Raccoon systwi: all video files contain metadata
06:41 🔗 Raccoon date of encoding, etc
06:41 🔗 ivan_ youtube dumps a lot of stuff onto the page that youtube-dl doesn't bother saving in .info.json
06:42 🔗 Raccoon i don't think that's what youtube redacts
06:42 🔗 systwi Raccoon: Oh, I meant title, thumbnail, author, etc. Oops
06:42 🔗 Raccoon sorry, not all video files, but mpeg usually does
06:44 🔗 icedice Raccoon: I have no idea
06:45 🔗 Raccoon sadly my torrent client(s) don't have a way to strip a given tracker from all loaded torrent files
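A stdlib-only sketch of stripping one tracker from a .torrent file: decode the bencoded metainfo, filter the `announce` URL and the `announce-list` tiers, and re-encode. The helper names are hypothetical, and it assumes the file was canonically encoded (sorted keys, as the spec requires), since otherwise re-encoding could change the `info` bytes and therefore the info-hash. Where a client keeps its .torrent files varies by client.

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c in (b"l", b"d"):  # list or dictionary, terminated by 'e'
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        if c == b"l":
            return items, i + 1
        # Dictionary entries alternate key, value.
        return dict(zip(items[0::2], items[1::2])), i + 1
    colon = data.index(b":", i)  # byte string: <length>:<bytes>
    start = colon + 1
    end = start + int(data[i:colon])
    return data[start:end], end


def bencode(value):
    """Encode ints, bytes, lists and dicts (bytes keys) back to bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # The spec requires dictionary keys in sorted (raw byte) order.
        return b"d" + b"".join(bencode(k) + bencode(value[k]) for k in sorted(value)) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def strip_tracker(torrent, tracker):
    """Return a copy of the .torrent bytes with every announce URL containing
    `tracker` (a bytes substring) removed; empty tiers are dropped."""
    meta, _ = bdecode(torrent)
    tiers = [[url for url in tier if tracker not in url]
             for tier in meta.get(b"announce-list", [])]
    tiers = [tier for tier in tiers if tier]
    if tiers:
        meta[b"announce-list"] = tiers
    else:
        meta.pop(b"announce-list", None)
    if tracker in meta.get(b"announce", b""):
        if tiers:
            meta[b"announce"] = tiers[0][0]  # promote the first remaining tracker
        else:
            meta.pop(b"announce", None)  # no trackers left: DHT/PEX only
    return bencode(meta)
```

Each file would be read in binary mode, passed through `strip_tracker(data, b"leechers-paradise.org")`, and written back, ideally while the client is closed.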
06:47 🔗 markedL https://developers.google.com/youtube/v3/getting-started#etags
06:52 🔗 ivan_ those look like etags for api responses
06:52 🔗 systwi Ok just uploaded the video
06:53 🔗 systwi Let me see what happens with an edit or two
06:57 🔗 m007a83_ is now known as m007a83
06:59 🔗 wp494 has quit IRC (Ping timeout: 255 seconds)
07:00 🔗 wp494 has joined #archiveteam-ot
07:07 🔗 systwi So what am I to look for when comparing? (The video is still changing audio btw, taking several minutes on a seven second video)
07:08 🔗 markedL could be just api responses, it's ambiguous enough that I'd want to try it because it's listed on video metadata also: https://developers.google.com/youtube/v3/docs/videos#resource
07:09 🔗 systwi Or, correction, what should I look for? I read the page on etags, but I really want to stay away from using YT's API
07:09 🔗 markedL etags are usually used in HTTP headers
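For servers that do send an ETag, the usual HTTP pattern is to replay the saved value in an `If-None-Match` header; a `304 Not Modified` reply means the copy is unchanged. This sketch uses a HEAD request so no body is transferred. Whether YouTube's video URLs return an ETag at all is a separate question (markedL goes on to look), and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_head(url, etag=None):
    """Build a HEAD request that asks the server for 304 if the ETag still matches."""
    req = urllib.request.Request(url, method="HEAD")
    if etag:
        req.add_header("If-None-Match", etag)
    return req


def remote_changed(url, saved_etag, timeout=30):
    """Return (changed, current_etag) without downloading the body."""
    try:
        with urllib.request.urlopen(conditional_head(url, saved_etag), timeout=timeout) as resp:
            current = resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the saved copy is current
            return False, saved_etag
        raise
    # Some servers ignore If-None-Match, so also compare the header directly.
    # No ETag at all means we cannot tell, which is reported as "changed".
    return (current is None or current != saved_etag), current
```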
07:10 🔗 systwi A quick look in the info.json doesn't show that, but it could just be youtube-dl's fault for not adding it in the file
07:10 🔗 systwi So, wget the first page or something?
07:11 🔗 markedL I don't know anything about youtube, but it would be on the headers of a video file request if it's a single file.
07:19 🔗 markedL I don't see any etags in the video headers
07:21 🔗 systwi I really hope there's SOME way to check for differences without redownloading
07:23 🔗 kiska18 has quit IRC (Remote host closed the connection)
07:24 🔗 kiska18 has joined #archiveteam-ot
07:24 🔗 Fusl____ sets mode: +o kiska18
07:24 🔗 svchfoo3 sets mode: +o kiska18
07:24 🔗 Fusl sets mode: +o kiska18
07:24 🔗 Fusl_ sets mode: +o kiska18
07:25 🔗 markedL the most promising I see is date published, byte count, and time length like ivan suggested
07:26 🔗 systwi Ughhh, well I guess byte count sounds closest, but again reencoding might alter the file size
07:27 🔗 systwi Doesn't date published stay the same? I thought that represented the date the video was originally uploaded
07:30 🔗 systwi The json does have a `filesize` variable
07:31 🔗 systwi I assume it's in bytes
07:35 🔗 markedL https://gizmodo.com/this-is-what-happens-when-you-re-upload-a-youtube-video-5555359 funny but not relevant to the problem at hand
07:39 🔗 systwi Those were the most terrifying water sounds I've ever heard in my life
07:41 🔗 markedL https://support.google.com/youtube/answer/1388383?hl=en in place video edits
07:49 🔗 systwi Hmm, thanks but that didn't seem to reveal much info on how YT documents those changes though :(
07:49 🔗 systwi So the video finally got the audio replaced on it, here are the (main) differences between the jsons:
07:50 🔗 systwi json.formats[0].filesize = 119068
07:50 🔗 systwi json.formats[0].tbr = 130.235
07:50 🔗 systwi json.abr = 128
07:51 🔗 systwi Which the bit rates could stay just the same between edits
07:52 🔗 systwi Sadly I think the filesize is the safest way to go. I was really hoping they'd store an MD5 or something on the video/audio file(s) to use for comparison, or to use a variable like `edit_number = 0`, `edit_number = 4`
07:52 🔗 systwi But then again, a reencode on YT's end will invoke a redownload anyway
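systwi's filesize check can be extended from `formats[0]` to every format in the .info.json. In youtube-dl's metadata, `filesize` is a byte count when known and may be null; this sketch keys formats by `format_id` and only compares formats present in both copies. The function names are hypothetical, and as ivan_ warns, a YouTube re-encode also changes these numbers, so a mismatch means "worth re-checking", not "edited".

```python
def format_fingerprint(info):
    """Map format_id -> filesize (bytes, or None if unknown) for every listed format."""
    return {f.get("format_id"): f.get("filesize") for f in info.get("formats", [])}


def changed_formats(old_info, new_info):
    """Return format_ids present in both copies whose known byte size differs.

    Formats missing from either side, or with an unknown (None) size, are
    skipped: YouTube adds and drops formats over time, and a missing size
    says nothing about the content.
    """
    old, new = format_fingerprint(old_info), format_fingerprint(new_info)
    changed = []
    for fid in sorted(old.keys() & new.keys(), key=str):
        a, b = old[fid], new[fid]
        if a is not None and b is not None and a != b:
            changed.append(fid)
    return changed
```

A non-empty result would be the trigger for the redownload step.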
07:54 🔗 systwi I'm really hoping one of YT's staff members will get back to ivan_ and give us some helpful information
07:55 🔗 ivan_ I don't know if that's particularly likely to yield helpful information
07:55 🔗 * systwi groans
07:55 🔗 systwi Do you have a rough estimate on how frequently YT reencodes their videos?
07:55 🔗 ivan_ I would guess maybe about once a year at least
07:56 🔗 systwi Maybe a few reencoded videos here or there won't be a big issue
07:56 🔗 systwi But that would really, REALLY be a waste of disk space
07:56 🔗 ivan_ maybe you should focus on something other than archiving multiple variants of videos
07:57 🔗 ivan_ it doesn't seem to have particularly high ROI
07:57 🔗 systwi Well, I mean, I'm grabbing multiple variants of everything else already
07:57 🔗 ivan_ edits are relatively rare
07:59 🔗 systwi Too bad there isn't some kind of CLI tool that would, at a bare minimum, compare two video files and look for noticeable audio/video differences and not something like "this one pixel in this one frame is one shade darker, therefore it's entirely different"
07:59 🔗 systwi That way the process would be, check file size > redownload > tool check
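For the final "tool check", one way to ignore pixel-level noise is to compare coarse features instead of bytes. This sketch compares the RMS loudness envelope of two audio tracks, which would flag a muted section (the copyright case) or replacement audio with a noticeably different loudness profile, while tolerating small re-encode differences. It assumes both tracks were already decoded to mono signed 16-bit little-endian PCM at the same sample rate; the window length, the 3 dB threshold, and the function names are hypothetical.

```python
import array
import math
import sys


def load_s16le(path):
    """Read raw mono signed 16-bit little-endian PCM samples from a file."""
    samples = array.array("h")
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    if sys.byteorder == "big":
        samples.byteswap()  # file is little-endian regardless of host
    return samples


def rms_envelope(pcm, sample_rate, window_s=0.5):
    """Split samples into fixed windows and return each window's RMS level in dB."""
    win = max(1, int(sample_rate * window_s))
    levels = []
    for start in range(0, len(pcm), win):
        chunk = pcm[start:start + win]
        mean_sq = sum(s * s for s in chunk) / len(chunk)
        # Floor the RMS at 1 to avoid log10(0) on digital silence.
        levels.append(20 * math.log10(max(math.sqrt(mean_sq), 1.0)))
    return levels


def audio_differs(pcm_a, pcm_b, sample_rate, window_s=0.5, threshold_db=3.0):
    """True if the tracks differ in length by more than one window, or any
    window's loudness differs by more than threshold_db."""
    env_a = rms_envelope(pcm_a, sample_rate, window_s)
    env_b = rms_envelope(pcm_b, sample_rate, window_s)
    if abs(len(env_a) - len(env_b)) > 1:
        return True
    return any(abs(a - b) > threshold_db for a, b in zip(env_a, env_b))
```

Decoding could be done with something like `ffmpeg -i input.webm -ac 1 -ar 16000 -f s16le out.raw`, using the same `-ar` for both files.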
08:00 🔗 systwi I guess the file size variable is better than nothing, but ewwww
08:00 🔗 systwi What a sticky situation
08:06 🔗 systwi I wish I could invoke a YT reencode
08:09 🔗 systwi But even then, while very unlikely, there is still the chance a modification could result in the same file size
08:25 🔗 bluefoo_ has joined #archiveteam-ot
08:25 🔗 bluefoo has quit IRC (Ping timeout: 252 seconds)
08:37 🔗 systwi In reality I would like to implement such a feature in my script for other sources, like Dailymotion, Vimeo, etc. but for the time being I am only focusing on YT
08:43 🔗 voker57_ has joined #archiveteam-ot
08:57 🔗 bluefoo_ has quit IRC (Read error: Connection reset by peer)
09:50 🔗 bluefoo has joined #archiveteam-ot
10:01 🔗 bluefoo has quit IRC (Read error: Connection reset by peer)
10:14 🔗 bluefoo has joined #archiveteam-ot
10:22 🔗 voltagex has joined #archiveteam-ot
10:23 🔗 voltagex I'm quite proud of these two uploads: https://archive.org/details/1194TalkingClockAustralia, https://archive.org/details/1196WeatherServiceAustralia
10:28 🔗 bluefoo has quit IRC (Read error: Operation timed out)
10:29 🔗 Igloo Anyone know how long it takes for something to appear in WBM after upload?
10:29 🔗 Kaz long time
10:29 🔗 Kaz scale is usually "multiple days"
10:31 🔗 Igloo Boo.
11:12 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
11:13 🔗 bluefoo has joined #archiveteam-ot
11:14 🔗 dashcloud has joined #archiveteam-ot
11:46 🔗 bluefoo has quit IRC (Ping timeout: 360 seconds)
12:17 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
13:42 🔗 icedice systwi: There might be a tool like that somewhere in this list: https://alternativeto.net/software/dupeguru/?platform=linux
14:00 🔗 deevious has quit IRC (Quit: deevious)
14:09 🔗 deevious has joined #archiveteam-ot
14:15 🔗 bluefoo has joined #archiveteam-ot
14:34 🔗 m007a83 has quit IRC (Read error: Connection reset by peer)
14:40 🔗 Raccoon` has joined #archiveteam-ot
14:40 🔗 Raccoon` has quit IRC (Handshake flooding)
14:40 🔗 Raccoon` has joined #archiveteam-ot
14:47 🔗 Raccoon has quit IRC (Read error: Operation timed out)
14:47 🔗 Raccoon` is now known as Raccoon
14:50 🔗 icedice has quit IRC (Ping timeout: 252 seconds)
16:18 🔗 qw3rty2 has joined #archiveteam-ot
16:18 🔗 ShellyRol has quit IRC (Read error: Operation timed out)
16:27 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
16:30 🔗 ShellyRol has joined #archiveteam-ot
19:48 🔗 Muad-Dib has joined #archiveteam-ot
19:52 🔗 systwi Thanks icedice, but the tools in there were for dealing with local duplicates, not files matching their off-site counterpart
20:02 🔗 kiskabak has quit IRC (Remote host closed the connection)
20:02 🔗 kiskabak has joined #archiveteam-ot
20:02 🔗 Fusl sets mode: +o kiskabak
20:02 🔗 Fusl____ sets mode: +o kiskabak
20:02 🔗 Fusl_ sets mode: +o kiskabak
20:23 🔗 markedL the likelihood of byte counts being the same after a mod is super low given the size of these files
21:07 🔗 tuluu has joined #archiveteam-ot
21:23 🔗 katocala has quit IRC ()
21:44 🔗 katocala has joined #archiveteam-ot
21:54 🔗 apache2 has quit IRC (Remote host closed the connection)
21:54 🔗 apache2 has joined #archiveteam-ot
22:05 🔗 BlueMax has joined #archiveteam-ot
