Time |
Nickname |
Message |
00:40
🔗
|
|
ShellyRol has joined #archiveteam-ot |
00:46
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
01:48
🔗
|
ScruffyB |
For the original Star Wars release, I assumed they got it from the Library of Congress copy. |
01:50
🔗
|
markedL |
release when? All I remember was hearing how Lucas wants the original destroyed |
02:02
🔗
|
ScruffyB |
There was a .torrent release in 1080p a few years back. |
02:02
🔗
|
* |
ScruffyB needs to watch it: technically breaking the law if he doesn't. |
02:07
🔗
|
|
GuysFree has joined #archiveteam-ot |
02:11
🔗
|
ScruffyB |
Maybe this one, except I was under the impression they were targeting 1080p, not 4k: http://www.thestarwarstrilogy.com/page/Project-4K77 |
02:18
🔗
|
markedL |
doesn't sound like Library of Congress |
02:27
🔗
|
markedL |
ha, Spanish audio track. Hopefully, with movies being released digitally, preservation will be easier |
02:38
🔗
|
markedL |
Sunday's the big day for bankruptcies |
02:38
🔗
|
|
Raccoon has quit IRC (Quit: xyzzy) |
03:23
🔗
|
|
qw3rty has joined #archiveteam-ot |
03:32
🔗
|
|
qw3rty2 has quit IRC (Ping timeout: 745 seconds) |
04:10
🔗
|
|
mc2 has joined #archiveteam-ot |
04:36
🔗
|
|
GuysFree has quit IRC (Quit: Connection closed for inactivity) |
05:13
🔗
|
|
icedice has joined #archiveteam-ot |
05:14
🔗
|
|
killsushi has joined #archiveteam-ot |
05:16
🔗
|
icedice |
Would it make sense to set up an automated, encrypted backup system on a cheap dedicated server from some European hosting provider (15€/month or less) or is it just better to go with a backup service since they have two copies on different HDDs of everything (assuming that they're not shit)? |
05:17
🔗
|
icedice |
Like Sync.com, for example |
05:41
🔗
|
|
deevious has joined #archiveteam-ot |
06:03
🔗
|
|
Raccoon has joined #archiveteam-ot |
06:04
🔗
|
Raccoon |
so apparently tracker.leechers-paradise.org:6969 is working again? |
06:20
🔗
|
|
systwi_ is now known as systwi |
06:25
🔗
|
systwi |
icedice: I haven't done this, but I've heard one can sign up for Google Team Drive for $12/mo and get unlimited storage, and then just use some kind of encryption for your files. |
06:25
🔗
|
systwi |
Raccoon: It was down before? |
06:27
🔗
|
Raccoon |
https://www.techworm.net/2018/12/torrent-tracker-leechers-paradise.html |
06:27
🔗
|
Raccoon |
https://www.kodivpn.co/leechers-paradise-shut-down/ |
06:27
🔗
|
Raccoon |
shutdown 10 months ago |
06:28
🔗
|
systwi |
Huh, interesting. Glad to know they're back up |
06:29
🔗
|
systwi |
Anyway, I have a question myself. I'm trying to preserve all information from YT that I can on a per-channel/per-video basis. In order for me to save data, drive reads/writes, and just time in general, I need a way to check if a video has changed at all on YT's end (e.g. like a hash YT stores of the video file). That way, my YT archival script I'm working on can see if their hash matches mine. If it mismatches, grab the |
06:29
🔗
|
systwi |
new video. If it's the same, leave the video alone. |
06:29
🔗
|
Raccoon |
are they? or is it big brother government |
06:29
🔗
|
Raccoon |
the FBI owns the .GOV tld |
06:29
🔗
|
Raccoon |
er .ORG tld |
06:29
🔗
|
systwi |
Really? Didn't know that was a thing |
06:30
🔗
|
Raccoon |
the US can spoof anything they want on COM NET ORG |
06:30
🔗
|
Raccoon |
they do it to seize control of botnets, regularly |
06:30
🔗
|
systwi |
ivan_, would you maybe know of something I could use? (regarding my YT archival/checksum question) |
06:31
🔗
|
systwi |
... |
06:31
🔗
|
systwi |
traceroute leechers-paradise.org ? |
06:33
🔗
|
Raccoon |
strange crap. I get put into 10. land through most of the traversal |
06:34
🔗
|
icedice |
systwi: I have a "free" Google Drive .edu team drive |
06:34
🔗
|
systwi |
icedice: That might be good for testing, but I've heard access is revoked when you're gone from that school |
06:34
🔗
|
icedice |
I don't really want to trust my data to a US based company though |
06:35
🔗
|
systwi |
Don't quote me on that, though, I've never had one |
06:35
🔗
|
icedice |
I'm not in that school to begin with lol |
06:35
🔗
|
Raccoon |
https://paste.ee/p/JsJbA |
06:35
🔗
|
Raccoon |
that's a weird tracert if i've seen one |
06:35
🔗
|
ivan_ |
systwi: YouTube is going to regularly re-compress the videos or change the headers thus changing all the hashes without changing the content |
06:35
🔗
|
systwi |
I get ya, that's why I feel that if you use an encrypted container for your schtuff that would suffice |
06:36
🔗
|
ivan_ |
systwi: if you want to know if a video has been edited, a much better heuristic would be comparing the lengths which could catch the vast majority of edits |
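A minimal sketch of the length heuristic ivan_ describes, assuming the archive keeps youtube-dl `.info.json` files and that their `duration` field is in seconds. The function names and the one-second tolerance are hypothetical, picked only for illustration. It would not catch edits that keep the length the same, such as replaced audio.

```python
import json


def load_duration(info_json_path):
    """Return the `duration` field (seconds) from a youtube-dl .info.json file."""
    with open(info_json_path, encoding="utf-8") as fh:
        return json.load(fh).get("duration")


def length_changed(old_duration, new_duration, tolerance=1.0):
    """True if the two durations differ by more than `tolerance` seconds.

    A missing duration is treated as "changed" so the video gets a closer look.
    The tolerance is an assumption, not a value taken from YouTube.
    """
    if old_duration is None or new_duration is None:
        return True
    return abs(new_duration - old_duration) > tolerance
```

The freshly fetched metadata can come from a metadata-only run (for example youtube-dl with `--skip-download --write-info-json`), so no video data is transferred for the comparison.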
06:36
🔗
|
icedice |
I have an online friend who got me an account there |
06:36
🔗
|
icedice |
Pretty sure he works on the university's IT staff |
06:36
🔗
|
systwi |
ivan_: How frequently does such a thing happen, and what about instances where a section of audio is removed/changed (e.g. copyright issue)? |
06:37
🔗
|
systwi |
I've never seen a video change length, only seen audio get removed/changed on a video |
06:37
🔗
|
Raccoon |
yt channels that host live video can also edit or crop their content. spacex does it a lot |
06:38
🔗
|
markedL |
maybe an option is getting reencoded videos and somehow decide which copy is better quality |
06:38
🔗
|
ivan_ |
systwi: do you really want to regularly redownload video or audio formats to see if they changed |
06:38
🔗
|
systwi |
ivan_: No, I'd like to download new copies of videos if content in them changes, i.e. audio removed, video cropped, etc. |
06:39
🔗
|
systwi |
If the video itself is the same, don't waste the bandwidth |
06:39
🔗
|
systwi |
/data/time |
06:39
🔗
|
ivan_ |
systwi: how are you going to know if the video changed without downloading the video |
06:39
🔗
|
Raccoon |
i prefer videos where audio hasn't been cropped due to copyright strike |
06:39
🔗
|
systwi |
That's why I wasn't sure if YT had a hash or date of last change to the video itself |
06:40
🔗
|
systwi |
So I can check something like the .info.json and see if the value is the same or different |
06:40
🔗
|
icedice |
Raccoon systwi: TLDs administrated by VeriSign, a Washington DC-based company which is controlled by the US government: .com, .net, .name, .gov, .cc, and .tv |
06:40
🔗
|
Raccoon |
you might be able to pull the metadata from ffmpeg ffprobe |
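A rough sketch of pulling container metadata with ffprobe, as Raccoon suggests. It assumes ffmpeg is installed and that the file is already on disk, so it does not avoid a download by itself. The helper names are hypothetical, and which tags (such as `creation_time`) are present depends on the file.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that prints container and stream metadata as JSON."""
    return ["ffprobe", "-v", "error", "-show_format", "-show_streams",
            "-of", "json", path]


def summarize(probe_json):
    """Pick out the fields most useful for spotting a changed file."""
    fmt = json.loads(probe_json).get("format", {})
    return {
        # ffprobe reports these values as strings in its JSON output
        "duration": float(fmt["duration"]) if "duration" in fmt else None,
        "size": int(fmt["size"]) if "size" in fmt else None,
        "creation_time": fmt.get("tags", {}).get("creation_time"),
    }


def probe(path):
    """Run ffprobe on a local file and return the summary dict."""
    out = subprocess.run(ffprobe_command(path), check=True,
                         capture_output=True, text=True).stdout
    return summarize(out)
```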
06:40
🔗
|
ivan_ |
systwi: I asked a YouTube person, maybe they'll get back to me |
06:40
🔗
|
systwi |
Thank you ivan_, much appreciated |
06:40
🔗
|
icedice |
TLDs administrated by Afilias, a company that blocked one of WikiLeaks' domains: .info, .mobi, .org, .asia, .aero, .ag, .bz, .gi, .hgn, .in, .lc, .me, .mn, .sc and .vc |
06:41
🔗
|
systwi |
Raccoon: I would, however I tell youtube-dl to keep metadata off the video file |
06:41
🔗
|
icedice |
(Afilias is based in Ireland) |
06:41
🔗
|
ivan_ |
systwi: maybe try uploading and editing your own video and see if anything at all on the polymer or non-polymer page is different |
06:41
🔗
|
Raccoon |
icedice: by your best judgement, is leechers-paradise still shut down, or did they come back? is tracker.leechers-paradise.org:6969 a cash grab to see who is pirating what? |
06:41
🔗
|
systwi |
In the meantime, I'll try that |
06:41
🔗
|
Raccoon |
systwi: all video files contain metadata |
06:41
🔗
|
Raccoon |
date of encoding, etc |
06:41
🔗
|
ivan_ |
youtube dumps a lot of stuff onto the page that youtube-dl doesn't bother saving in .info.json |
06:42
🔗
|
Raccoon |
i don't think that's what youtube redacts |
06:42
🔗
|
systwi |
Raccoon: Oh, I meant title, thumbnail, author, etc. Oops |
06:42
🔗
|
Raccoon |
sorry, not all video files, but mpeg usually does |
06:44
🔗
|
icedice |
Raccoon: I have no idea |
06:45
🔗
|
Raccoon |
sadly my torrent client(s) don't have a way to strip a given tracker from all loaded torrent files |
06:47
🔗
|
markedL |
https://developers.google.com/youtube/v3/getting-started#etags |
06:52
🔗
|
ivan_ |
those look like etags for api responses |
06:52
🔗
|
systwi |
Ok just uploaded the video |
06:53
🔗
|
systwi |
Let me see what happens with an edit or two |
06:57
🔗
|
|
m007a83_ is now known as m007a83 |
06:59
🔗
|
|
wp494 has quit IRC (Ping timeout: 255 seconds) |
07:00
🔗
|
|
wp494 has joined #archiveteam-ot |
07:07
🔗
|
systwi |
So what am I to look for when comparing? (The video is still changing audio btw, taking several minutes on a seven second video) |
07:08
🔗
|
markedL |
could be just api responses, it's ambiguous enough that I'd want to try it because it's listed on video metadata also: https://developers.google.com/youtube/v3/docs/videos#resource |
07:09
🔗
|
systwi |
Or, correction, what should I look for? I read the page on etags, but I really want to stay away from using YT's API |
07:09
🔗
|
markedL |
etags are usually used in HTTP headers |
07:10
🔗
|
systwi |
A quick look in the info.json doesn't show that, but it could just be youtube-dl's fault for not adding it in the file |
07:10
🔗
|
systwi |
So, wget the first page or something? |
07:11
🔗
|
markedL |
I don't know anything about youtube, but it would be on the headers of a video file request if it's a single file. |
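One way to check what markedL describes is a HEAD request against a direct media URL, then a look at which validators come back. This is only a sketch: the helper names are hypothetical, the URL has to be supplied by the caller, and the server may not send an `ETag` at all.

```python
import urllib.request

VALIDATOR_HEADERS = ("ETag", "Last-Modified", "Content-Length")


def pick_validators(headers):
    """Return whichever validators are present, using case-insensitive lookup.

    `headers` may be a dict or a list of (name, value) pairs.
    """
    lowered = {k.lower(): v for k, v in dict(headers).items()}
    return {name: lowered[name.lower()] for name in VALIDATOR_HEADERS
            if name.lower() in lowered}


def head_validators(url, timeout=30):
    """Send a HEAD request and report the validators the server offers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return pick_validators(resp.headers.items())
```

If only `Content-Length` comes back, the check reduces to a byte-count comparison.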
07:19
🔗
|
markedL |
I don't see any etags in the video headers |
07:21
🔗
|
systwi |
I really hope there's SOME way to check for differences without redownloading |
07:23
🔗
|
|
kiska18 has quit IRC (Remote host closed the connection) |
07:24
🔗
|
|
kiska18 has joined #archiveteam-ot |
07:24
🔗
|
|
Fusl____ sets mode: +o kiska18 |
07:24
🔗
|
|
svchfoo3 sets mode: +o kiska18 |
07:24
🔗
|
|
Fusl sets mode: +o kiska18 |
07:24
🔗
|
|
Fusl_ sets mode: +o kiska18 |
07:25
🔗
|
markedL |
the most promising I see is date published, byte count, and time length like ivan suggested |
07:26
🔗
|
systwi |
Ughhh, well I guess byte count sounds closest, but again reencoding might alter the file size |
07:27
🔗
|
systwi |
Doesn't date published stay the same? I thought that represented the date the video was originally uploaded |
07:30
🔗
|
systwi |
The json does have a `filesize` variable |
07:31
🔗
|
systwi |
I assume it's in bytes |
07:35
🔗
|
markedL |
https://gizmodo.com/this-is-what-happens-when-you-re-upload-a-youtube-video-5555359 funny but not relevant to the problem at hand |
07:39
🔗
|
systwi |
Those were the most terrifying water sounds I've ever heard in my life |
07:41
🔗
|
markedL |
https://support.google.com/youtube/answer/1388383?hl=en in place video edits |
07:49
🔗
|
systwi |
Hmm, thanks but that didn't seem to reveal much info on how YT documents those changes though :( |
07:49
🔗
|
systwi |
So the video finally got the audio replaced on it, here are the (main) differences between the jsons: |
07:50
🔗
|
systwi |
json.formats[0].filesize = 119068 |
07:50
🔗
|
systwi |
json.formats[0].tbr = 130.235 |
07:50
🔗
|
systwi |
json.abr = 128 |
07:51
🔗
|
systwi |
Although the bit rates could stay just the same between edits |
07:52
🔗
|
systwi |
Sadly I think the filesize is the safest way to go. I was really hoping they'd store an MD5 or something on the video/audio file(s) to use for comparison, or to use a variable like `edit_number = 0`, `edit_number = 4` |
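A small sketch of the filesize comparison, assuming two youtube-dl `.info.json` snapshots of the same video, one archived and one freshly fetched. It compares `formats[].filesize` per `format_id`. The function names are hypothetical, and a reencode on YT's end would show up here exactly like an edit.

```python
import json


def load_info(path):
    """Load a youtube-dl .info.json file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def sizes_by_format(info):
    """Map format_id -> filesize for every format that reports a size."""
    return {f["format_id"]: f["filesize"]
            for f in info.get("formats", [])
            if f.get("format_id") is not None and f.get("filesize") is not None}


def changed_formats(old_info, new_info):
    """Return the format_ids present in both snapshots whose filesize differs."""
    old, new = sizes_by_format(old_info), sizes_by_format(new_info)
    return sorted(fid for fid in old.keys() & new.keys() if old[fid] != new[fid])
```

An empty result means no shared format changed size. A non-empty result is only a hint that the video is worth redownloading.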
07:52
🔗
|
systwi |
But then again, a reencode on YT's end will trigger a redownload anyway |
07:54
🔗
|
systwi |
I'm really hoping one of YT's staff members will get back to ivan_ and give us some helpful information |
07:55
🔗
|
ivan_ |
I don't know if that's particularly likely to yield helpful information |
07:55
🔗
|
* |
systwi groans |
07:55
🔗
|
systwi |
Do you have a rough estimate on how frequently YT reencodes their videos? |
07:55
🔗
|
ivan_ |
I would guess maybe about once a year at least |
07:56
🔗
|
systwi |
Maybe a few reencoded videos here or there won't be a big issue |
07:56
🔗
|
systwi |
But that would really, REALLY be a waste of disk space |
07:56
🔗
|
ivan_ |
maybe you should focus on something other than archiving multiple variants of videos |
07:57
🔗
|
ivan_ |
it doesn't seem to have particularly high ROI |
07:57
🔗
|
systwi |
Well, I mean, I'm grabbing multiple variants of everything else already |
07:57
🔗
|
ivan_ |
edits are relatively rare |
07:59
🔗
|
systwi |
Too bad there isn't some kind of CLI tool that would, at a bare minimum, compare two video files and look for noticeable audio/video differences and not something like "this one pixel in this one frame is one shade darker, therefore it's entirely different" |
07:59
🔗
|
systwi |
That way the process would be, check file size > redownload > tool check |
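The three-stage process could be sketched roughly as below. Everything here is hypothetical: `redownload` and `content_differs` stand in for a downloader and for the comparison tool systwi is wishing for, and the return labels are made up for illustration.

```python
def classify_change(old_size, new_size, redownload, content_differs):
    """Cheap size check first, then a download, then a content comparison.

    redownload()            -> path of the freshly downloaded file (hypothetical)
    content_differs(path)   -> True if the new file differs noticeably
                               from the archived copy (hypothetical)
    Returns "unchanged", "reencoded", or "edited".
    """
    if old_size == new_size:
        # Sizes match: assume nothing changed and skip the download.
        return "unchanged"
    new_path = redownload()
    # Size changed: only keep the new copy if the content really differs.
    return "edited" if content_differs(new_path) else "reencoded"
```

The download only happens when the sizes differ, which is the point of putting the cheap check first.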
08:00
🔗
|
systwi |
I guess the file size variable is better than nothing, but ewwww |
08:00
🔗
|
systwi |
What a sticky situation |
08:06
🔗
|
systwi |
I wish I could invoke a YT reencode |
08:09
🔗
|
systwi |
But even then, while very unlikely, there is still the chance a modification could result in the same file size |
08:25
🔗
|
|
bluefoo_ has joined #archiveteam-ot |
08:25
🔗
|
|
bluefoo has quit IRC (Ping timeout: 252 seconds) |
08:37
🔗
|
systwi |
In reality I would like to implement such a feature in my script for other sources, like Dailymotion, Vimeo, etc. but for the time being I am only focusing on YT |
08:43
🔗
|
|
voker57_ has joined #archiveteam-ot |
08:57
🔗
|
|
bluefoo_ has quit IRC (Read error: Connection reset by peer) |
09:50
🔗
|
|
bluefoo has joined #archiveteam-ot |
10:01
🔗
|
|
bluefoo has quit IRC (Read error: Connection reset by peer) |
10:14
🔗
|
|
bluefoo has joined #archiveteam-ot |
10:22
🔗
|
|
voltagex has joined #archiveteam-ot |
10:23
🔗
|
voltagex |
I'm quite proud of these two uploads: https://archive.org/details/1194TalkingClockAustralia, https://archive.org/details/1196WeatherServiceAustralia |
10:28
🔗
|
|
bluefoo has quit IRC (Read error: Operation timed out) |
10:29
🔗
|
Igloo |
Anyone know how long it takes for something to appear in WBM after upload? |
10:29
🔗
|
Kaz |
long time |
10:29
🔗
|
Kaz |
scale is usually "multiple days" |
10:31
🔗
|
Igloo |
Boo. |
11:12
🔗
|
|
dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.) |
11:13
🔗
|
|
bluefoo has joined #archiveteam-ot |
11:14
🔗
|
|
dashcloud has joined #archiveteam-ot |
11:46
🔗
|
|
bluefoo has quit IRC (Ping timeout: 360 seconds) |
12:17
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
13:42
🔗
|
icedice |
systwi: There might be a tool like that somewhere in this list: https://alternativeto.net/software/dupeguru/?platform=linux |
14:00
🔗
|
|
deevious has quit IRC (Quit: deevious) |
14:09
🔗
|
|
deevious has joined #archiveteam-ot |
14:15
🔗
|
|
bluefoo has joined #archiveteam-ot |
14:34
🔗
|
|
m007a83 has quit IRC (Read error: Connection reset by peer) |
14:40
🔗
|
|
Raccoon` has joined #archiveteam-ot |
14:40
🔗
|
|
Raccoon` has quit IRC (Handshake flooding) |
14:40
🔗
|
|
Raccoon` has joined #archiveteam-ot |
14:47
🔗
|
|
Raccoon has quit IRC (Read error: Operation timed out) |
14:47
🔗
|
|
Raccoon` is now known as Raccoon |
14:50
🔗
|
|
icedice has quit IRC (Ping timeout: 252 seconds) |
16:18
🔗
|
|
qw3rty2 has joined #archiveteam-ot |
16:18
🔗
|
|
ShellyRol has quit IRC (Read error: Operation timed out) |
16:27
🔗
|
|
qw3rty has quit IRC (Ping timeout: 745 seconds) |
16:30
🔗
|
|
ShellyRol has joined #archiveteam-ot |
19:48
🔗
|
|
Muad-Dib has joined #archiveteam-ot |
19:52
🔗
|
systwi |
Thanks icedice, but the tools in there were for dealing with local duplicates, not files matching their off-site counterpart |
20:02
🔗
|
|
kiskabak has quit IRC (Remote host closed the connection) |
20:02
🔗
|
|
kiskabak has joined #archiveteam-ot |
20:02
🔗
|
|
Fusl sets mode: +o kiskabak |
20:02
🔗
|
|
Fusl____ sets mode: +o kiskabak |
20:02
🔗
|
|
Fusl_ sets mode: +o kiskabak |
20:23
🔗
|
markedL |
the likelihood of byte counts being the same after a mod is super low given the size of these files |
21:07
🔗
|
|
tuluu has joined #archiveteam-ot |
21:23
🔗
|
|
katocala has quit IRC () |
21:44
🔗
|
|
katocala has joined #archiveteam-ot |
21:54
🔗
|
|
apache2 has quit IRC (Remote host closed the connection) |
21:54
🔗
|
|
apache2 has joined #archiveteam-ot |
22:05
🔗
|
|
BlueMax has joined #archiveteam-ot |