[00:40] *** ShellyRol has joined #archiveteam-ot
[00:46] *** icedice has quit IRC (Quit: Leaving)
[01:48] For the original Star Wars release, I assumed they got it from the Library of Congress copy.
[01:50] release when? All I remember was hearing how Lucas wants the original destroyed
[02:02] There was a .torrent release in 1080p a few years back.
[02:02] * ScruffyB needs to watch it: technically breaking the law if he doesn't.
[02:07] *** GuysFree has joined #archiveteam-ot
[02:11] Maybe this one, except I was under the impression they were targeting 1080p, not 4k: http://www.thestarwarstrilogy.com/page/Project-4K77
[02:18] doesn't sound like Library of Congress
[02:27] ha, spanish audio track. Hopefully with movies being released digitally, preservation will be easier
[02:38] Sunday's the big day for bankruptcies
[02:38] *** Raccoon has quit IRC (Quit: xyzzy)
[03:23] *** qw3rty has joined #archiveteam-ot
[03:32] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[04:10] *** mc2 has joined #archiveteam-ot
[04:36] *** GuysFree has quit IRC (Quit: Connection closed for inactivity)
[05:13] *** icedice has joined #archiveteam-ot
[05:14] *** killsushi has joined #archiveteam-ot
[05:16] Would it make sense to set up an automated, encrypted backup system on a cheap dedicated server from some European hosting provider (15€/month or less), or is it better to go with a backup service, since they keep two copies of everything on different HDDs (assuming that they're not shit)?
[05:17] Like Sync.com, for example
[05:41] *** deevious has joined #archiveteam-ot
[06:03] *** Raccoon has joined #archiveteam-ot
[06:04] so apparently tracker.leechers-paradise.org:6969 is working again?
[06:20] *** systwi_ is now known as systwi
[06:25] icedice: I haven't done this, but I've heard one can sign up for Google Team Drive for $12/mo and get unlimited storage, and then just use some kind of encryption for your files.
[06:25] Raccoon: It was down before?
[06:27] https://www.techworm.net/2018/12/torrent-tracker-leechers-paradise.html
[06:27] https://www.kodivpn.co/leechers-paradise-shut-down/
[06:27] shut down 10 months ago
[06:28] Huh, interesting. Glad to know they're back up
[06:29] Anyway, I have a question myself. I'm trying to preserve all information from YT that I can on a per-channel/per-video basis. In order to save data, drive reads/writes, and time in general, I need a way to check whether a video has changed at all on YT's end (e.g. a hash YT stores of the video file). That way, the YT archival script I'm working on can see if their hash matches mine. If it mismatches, grab the new video. If it's the same, leave the video alone.
[06:29] are they? or is it big brother government
[06:29] the FBI owns the .GOV tld
[06:29] er .ORG tld
[06:29] Really? Didn't know that was a thing
[06:30] the US can spoof anything they want on COM NET ORG
[06:30] they do it to seize control of botnets, regularly
[06:30] ivan_, would you maybe know of something I could use? (regarding my YT archival/checksum question)
[06:31] ...
[06:31] traceroute leechers-paradise.org ?
[06:33] strange crap. I get put into 10.x land through most of the traversal
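On the encrypted-backup question above: whichever remote wins out (a cheap dedicated server, a Team Drive, or a service like Sync.com), the part that matters is encrypting before upload so only ciphertext ever leaves the machine. Below is a minimal sketch of that idea, assuming the third-party `cryptography` package; the directory names and the final sync step are placeholders, not any particular provider's API.

```python
# Sketch: encrypt files locally before uploading them to any remote
# (dedicated server, Team Drive, Sync.com, ...), so the provider only
# ever sees ciphertext. Uses the third-party `cryptography` package.
from pathlib import Path
from cryptography.fernet import Fernet

KEY_FILE = Path("backup.key")        # keep this OUT of the backup set
SOURCE = Path("to_backup")           # hypothetical local directory
STAGING = Path("encrypted_staging")  # what actually gets uploaded

def load_or_create_key() -> bytes:
    if KEY_FILE.exists():
        return KEY_FILE.read_bytes()
    key = Fernet.generate_key()
    KEY_FILE.write_bytes(key)
    return key

def encrypt_tree() -> None:
    fernet = Fernet(load_or_create_key())
    STAGING.mkdir(exist_ok=True)
    for path in SOURCE.rglob("*"):
        if path.is_file():
            # flatten the relative path into a single encrypted file name
            name = path.relative_to(SOURCE).as_posix().replace("/", "__") + ".enc"
            (STAGING / name).write_bytes(fernet.encrypt(path.read_bytes()))

if __name__ == "__main__":
    encrypt_tree()
    # then sync STAGING to the remote with whatever tool you trust
    # (rsync to the dedicated server, rclone to the drive, etc.)
```

In practice a tool like rclone with a crypt remote does the same thing transparently; the sketch only illustrates that the key never has to leave your machine.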
[06:34] systwi: I have a "free" Google Drive .edu team drive
[06:34] icedice: That might be good for testing, but I've heard access is revoked when you're gone from that school
[06:34] I don't really want to trust my data to a US-based company though
[06:35] Don't quote me on that, though, I've never had one
[06:35] I'm not in that school to begin with lol
[06:35] https://paste.ee/p/JsJbA
[06:35] that's a weird tracert if i've ever seen one
[06:35] systwi: YouTube is going to regularly re-compress the videos or change the headers, thus changing all the hashes without changing the content
[06:35] I get ya, that's why I feel that if you use an encrypted container for your schtuff, that would suffice
[06:36] systwi: if you want to know if a video has been edited, a much better heuristic would be comparing the lengths, which could catch the vast majority of edits
[06:36] I have an online friend who got me an account there
[06:36] Pretty sure he works on the university's IT staff
[06:36] ivan_: How frequently does such a thing happen, and what about instances where a section of audio is removed/changed (e.g. copyright issue)?
[06:37] I've never seen a video change length, only seen audio get removed/changed on a video
[06:37] yt channels that host live video can also edit/crop their content. spacex does it a lot
[06:38] maybe an option is getting reencoded videos and somehow deciding which copy is better quality
[06:38] systwi: do you really want to regularly redownload video or audio formats to see if they changed
[06:38] ivan_: No, I'd like to download new copies of videos only if content in them changes, i.e. audio removed, video cropped, etc.
[06:39] If the video itself is the same, don't waste the bandwidth/data/time
[06:39] systwi: how are you going to know if the video changed without downloading the video
[06:39] i prefer videos where audio hasn't been cropped due to copyright strike
[06:39] That's why I wasn't sure if YT had a hash or date of last change to the video itself
[06:40] So I can check something like the .info.json and see if the value is the same or different
[06:40] Raccoon systwi: TLDs administered by VeriSign, a Washington DC-based company which is controlled by the US government: .com, .net, .name, .gov, .cc, and .tv
[06:40] you might be able to pull the metadata from ffmpeg/ffprobe
[06:40] systwi: I asked a YouTube person, maybe they'll get back to me
[06:40] Thank you ivan_, much appreciated
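A rough sketch of the length heuristic suggested above: pull the remote duration from youtube-dl's metadata dump (no video download) and compare it against the local copy's duration from ffprobe. The one-second tolerance and the file layout are assumptions, and, as noted in the chat, this won't catch audio swaps that keep the same length.

```python
# Sketch of the "compare the lengths" heuristic: fetch remote metadata
# only (no video download) and compare durations against the local file.
import json
import subprocess

def remote_duration(video_url: str) -> float:
    """Duration in seconds as reported by youtube-dl's metadata dump."""
    out = subprocess.check_output(["youtube-dl", "--dump-json", video_url], text=True)
    return float(json.loads(out)["duration"])

def local_duration(path: str) -> float:
    """Duration in seconds as reported by ffprobe."""
    out = subprocess.check_output([
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ], text=True)
    return float(out)

def probably_edited(video_url: str, local_path: str, tolerance: float = 1.0) -> bool:
    # tolerance in seconds is an arbitrary choice; a reencode can shift
    # the container duration slightly without the content changing
    return abs(remote_duration(video_url) - local_duration(local_path)) > tolerance

if __name__ == "__main__":
    # VIDEO_ID is a placeholder
    if probably_edited("https://www.youtube.com/watch?v=VIDEO_ID", "VIDEO_ID.mp4"):
        print("length changed - redownload this one")
```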
[06:40] TLDs administered by Afilias, a company that blocked one of WikiLeaks' domains: .info, .mobi, .org, .asia, .aero, .ag, .bz, .gi, .hgn, .in, .lc, .me, .mn, .sc and .vc
[06:41] Raccoon: I would, however I tell youtube-dl to keep metadata off the video file
[06:41] (Afilias is based in Ireland)
[06:41] systwi: maybe try uploading and editing your own video and see if anything at all on the polymer or non-polymer page is different
[06:41] icedice: by your best judgement, is leechers-paradise still shut down, or did they come back? is tracker.leechers-paradise.org:6969 a cash grab to see who is pirating what?
[06:41] In the meantime, I'll try that
[06:41] systwi: all video files contain metadata
[06:41] date of encoding, etc
[06:41] youtube dumps a lot of stuff onto the page that youtube-dl doesn't bother saving in .info.json
[06:42] i don't think that's what youtube redacts
[06:42] Raccoon: Oh, I meant title, thumbnail, author, etc. Oops
[06:42] sorry, not all video files, but mpeg usually does
[06:44] Raccoon: I have no idea
[06:45] sadly my torrent client(s) don't have a way to strip a given tracker from all loaded torrent files
[06:47] https://developers.google.com/youtube/v3/getting-started#etags
[06:52] those look like etags for api responses
[06:52] Ok, just uploaded the video
[06:53] Let me see what happens with an edit or two
[06:57] *** m007a83_ is now known as m007a83
[06:59] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[07:00] *** wp494 has joined #archiveteam-ot
[07:07] So what am I to look for when comparing? (The video is still changing audio btw, taking several minutes on a seven-second video)
[07:08] could be just api responses, it's ambiguous enough that I'd want to try it, because it's also listed on video metadata: https://developers.google.com/youtube/v3/docs/videos#resource
[07:09] Or, correction, what should I look for? I read the page on etags, but I really want to stay away from using YT's API
[07:09] etags are usually used in HTTP headers
[07:10] A quick look in the info.json doesn't show that, but it could just be youtube-dl's fault for not adding it to the file
[07:10] So, wget the first page or something?
[07:11] I don't know anything about youtube, but it would be in the headers of a video file request if it's a single file.
[07:19] I don't see any etags in the video headers
[07:21] I really hope there's SOME way to check for differences without redownloading
[07:23] *** kiska18 has quit IRC (Remote host closed the connection)
[07:24] *** kiska18 has joined #archiveteam-ot
[07:24] *** Fusl____ sets mode: +o kiska18
[07:24] *** svchfoo3 sets mode: +o kiska18
[07:24] *** Fusl sets mode: +o kiska18
[07:24] *** Fusl_ sets mode: +o kiska18
[07:25] the most promising I see are date published, byte count, and time length like ivan suggested
[07:26] Ughhh, well I guess byte count sounds closest, but again reencoding might alter the file size
[07:27] Doesn't date published stay the same? I thought that represented the date the video was originally uploaded
[07:30] The json does have a `filesize` variable
[07:31] I assume it's in bytes
[07:35] https://gizmodo.com/this-is-what-happens-when-you-re-upload-a-youtube-video-5555359 funny but not relevant to the problem at hand
[07:39] Those were the most terrifying water sounds I've ever heard in my life
[07:41] https://support.google.com/youtube/answer/1388383?hl=en in-place video edits
[07:49] Hmm, thanks, but that didn't seem to reveal much info on how YT documents those changes though :(
[07:49] So the video finally got the audio replaced on it, here are the (main) differences between the jsons:
[07:50] json.formats[0].filesize = 119068
[07:50] json.formats[0].tbr = 130.235
[07:50] json.abr = 128
[07:51] Though the bit rates could stay just the same between edits
[07:52] Sadly I think the filesize is the safest way to go. I was really hoping they'd store an MD5 or something of the video/audio file(s) to use for comparison, or a variable like `edit_number = 0`, `edit_number = 4`
[07:52] But then again, a reencode on YT's end will invoke a redownload anyway
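A small sketch of the .info.json comparison being worked out above: diff a stored .info.json against a freshly dumped one on the fields that moved in the test (per-format filesize and tbr, plus abr and duration). As the discussion says, none of these is a real content hash; it is only a cheap first-pass filter before deciding to redownload. The file paths and the field watchlist are assumptions.

```python
# Sketch: compare a stored .info.json with a freshly dumped one and report
# which "change-indicating" fields differ. None of these is a content hash;
# this is only the cheap first-pass filter discussed above.
import json

WATCHED_TOP_LEVEL = ("duration", "abr")      # assumed field watchlist
WATCHED_PER_FORMAT = ("filesize", "tbr")

def load(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def changed_fields(old_path, new_path):
    old, new = load(old_path), load(new_path)
    diffs = []
    for key in WATCHED_TOP_LEVEL:
        if old.get(key) != new.get(key):
            diffs.append(f"{key}: {old.get(key)} -> {new.get(key)}")
    # match formats by format_id so a reordered list doesn't look like an edit
    old_fmts = {f["format_id"]: f for f in old.get("formats", [])}
    new_fmts = {f["format_id"]: f for f in new.get("formats", [])}
    for fmt_id in sorted(old_fmts.keys() & new_fmts.keys()):
        for key in WATCHED_PER_FORMAT:
            if old_fmts[fmt_id].get(key) != new_fmts[fmt_id].get(key):
                diffs.append(f"formats[{fmt_id}].{key}: "
                             f"{old_fmts[fmt_id].get(key)} -> {new_fmts[fmt_id].get(key)}")
    return diffs

if __name__ == "__main__":
    # hypothetical paths: the archived metadata vs. a fresh `youtube-dl --dump-json` result
    for line in changed_fields("archived/VIDEO_ID.info.json", "fresh/VIDEO_ID.info.json"):
        print(line)
```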
[07:54] I'm really hoping one of YT's staff members will get back to ivan_ and give us some helpful information
[07:55] I don't know if that's particularly likely to yield helpful information
[07:55] * systwi groans
[07:55] Do you have a rough estimate of how frequently YT reencodes their videos?
[07:55] I would guess maybe about once a year at least
[07:56] Maybe a few reencoded videos here or there won't be a big issue
[07:56] But that would really, REALLY be a waste of disk space
[07:56] maybe you should focus on something other than archiving multiple variants of videos
[07:57] it doesn't seem to have particularly high ROI
[07:57] Well, I mean, I'm grabbing multiple variants of everything else already
[07:57] edits are relatively rare
[07:59] Too bad there isn't some kind of CLI tool that would, at a bare minimum, compare two video files and look for noticeable audio/video differences, and not something like "this one pixel in this one frame is one shade darker, therefore it's entirely different"
[07:59] That way the process would be: check file size > redownload > tool check
[08:00] I guess the file size variable is better than nothing, but ewwww
[08:00] What a sticky situation
[08:06] I wish I could invoke a YT reencode
[08:09] But even then, while very unlikely, there is still the chance a modification could result in the same file size
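On the wish for a CLI tool that flags noticeable audio/video differences: ffmpeg's ssim filter gives a tolerant similarity score between two encodes, which is much closer to "does this look the same" than a byte or hash comparison. A sketch follows, assuming both files decode at the same resolution and frame rate (scale/normalize first otherwise); the 0.95 cut-off is an arbitrary guess, not a calibrated value.

```python
# Sketch: use ffmpeg's ssim filter as the "is this visually the same
# video" check discussed above. SSIM is tolerant of reencodes, unlike a
# byte-for-byte or hash comparison. Assumes both files share resolution
# and frame rate (scale/fps-normalize first if they don't).
import re
import subprocess

def video_ssim(file_a: str, file_b: str) -> float:
    """Return the overall SSIM score (1.0 = identical-looking)."""
    proc = subprocess.run(
        ["ffmpeg", "-i", file_a, "-i", file_b, "-lavfi", "ssim", "-f", "null", "-"],
        stderr=subprocess.PIPE, stdout=subprocess.DEVNULL, text=True, check=True,
    )
    match = re.search(r"All:(\d+\.\d+)", proc.stderr)
    if not match:
        raise RuntimeError("no SSIM line in ffmpeg output")
    return float(match.group(1))

if __name__ == "__main__":
    score = video_ssim("old_copy.mp4", "fresh_copy.mp4")
    # 0.95 is an arbitrary cut-off for "just a reencode, keep the old copy"
    print("probably the same content" if score > 0.95 else "looks edited")
```

This only scores the video stream; the audio-replacement case from earlier in the log would still need its own comparison of the audio streams.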
[08:25] *** bluefoo_ has joined #archiveteam-ot
[08:25] *** bluefoo has quit IRC (Ping timeout: 252 seconds)
[08:37] In reality I would like to implement such a feature in my script for other sources too, like Dailymotion, Vimeo, etc., but for the time being I am only focusing on YT
[08:43] *** voker57_ has joined #archiveteam-ot
[08:57] *** bluefoo_ has quit IRC (Read error: Connection reset by peer)
[09:50] *** bluefoo has joined #archiveteam-ot
[10:01] *** bluefoo has quit IRC (Read error: Connection reset by peer)
[10:14] *** bluefoo has joined #archiveteam-ot
[10:22] *** voltagex has joined #archiveteam-ot
[10:23] I'm quite proud of these two uploads: https://archive.org/details/1194TalkingClockAustralia, https://archive.org/details/1196WeatherServiceAustralia
[10:28] *** bluefoo has quit IRC (Read error: Operation timed out)
[10:29] Anyone know how long it takes for something to appear in the WBM after upload?
[10:29] long time
[10:29] scale is usually "multiple days"
[10:31] Boo.
[11:12] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[11:13] *** bluefoo has joined #archiveteam-ot
[11:14] *** dashcloud has joined #archiveteam-ot
[11:46] *** bluefoo has quit IRC (Ping timeout: 360 seconds)
[12:17] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[13:42] systwi: There might be a tool like that somewhere in this list: https://alternativeto.net/software/dupeguru/?platform=linux
[14:00] *** deevious has quit IRC (Quit: deevious)
[14:09] *** deevious has joined #archiveteam-ot
[14:15] *** bluefoo has joined #archiveteam-ot
[14:34] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[14:40] *** Raccoon` has joined #archiveteam-ot
[14:40] *** Raccoon` has quit IRC (Handshake flooding)
[14:40] *** Raccoon` has joined #archiveteam-ot
[14:47] *** Raccoon has quit IRC (Read error: Operation timed out)
[14:47] *** Raccoon` is now known as Raccoon
[14:50] *** icedice has quit IRC (Ping timeout: 252 seconds)
[16:18] *** qw3rty2 has joined #archiveteam-ot
[16:18] *** ShellyRol has quit IRC (Read error: Operation timed out)
[16:27] *** qw3rty has quit IRC (Ping timeout: 745 seconds)
[16:30] *** ShellyRol has joined #archiveteam-ot
[19:48] *** Muad-Dib has joined #archiveteam-ot
[19:52] Thanks icedice, but the tools in there were for dealing with local duplicates, not for matching files against their off-site counterpart
[20:02] *** kiskabak has quit IRC (Remote host closed the connection)
[20:02] *** kiskabak has joined #archiveteam-ot
[20:02] *** Fusl sets mode: +o kiskabak
[20:02] *** Fusl____ sets mode: +o kiskabak
[20:02] *** Fusl_ sets mode: +o kiskabak
[20:23] the likelihood of byte counts being the same after a mod is super low given the size of these files
[21:07] *** tuluu has joined #archiveteam-ot
[21:23] *** katocala has quit IRC ()
[21:44] *** katocala has joined #archiveteam-ot
[21:54] *** apache2 has quit IRC (Remote host closed the connection)
[21:54] *** apache2 has joined #archiveteam-ot
[22:05] *** BlueMax has joined #archiveteam-ot