[00:39] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[00:51] *** wp494 has joined #archiveteam-ot
[02:27] *** Arctic has joined #archiveteam-ot
[02:31] *** Arctic has quit IRC (Remote host closed the connection)
[03:09] JAA: I need a hierarchical database to organize changes to a YouTube channel and its videos. It is broken down like this example:
[03:11] CHANNEL ID [obj] > VIDEO ID [obj] > video.mkv [obj] > date [str], hash [str]
[03:12] And since VIDEO ID is an object, inside of that is every file downloaded (description, info.json, subtitles, etc.)
[03:13] And "date" is the date that it was grabbed. If a newer version of the file is downloaded, the current information is moved to an "outdated" object.
[03:13] There is much, much more detail than just that; that's a bare-bones idea of how it works.
[03:20] Under CHANNEL ID there will be every video ID
[03:56] *** Arctic has joined #archiveteam-ot
[03:58] *** Arctic has quit IRC (Remote host closed the connection)
[04:00] *** lunik1 has quit IRC (:x)
[04:01] *** lunik1 has joined #archiveteam-ot
[04:14] Here is what I have so far:
[04:14] https://gist.github.com/systwi/413add02946e3a9cb087f6b4a8922687/raw/cc1feadc56a29237e9df2ed6ec786f8d5d81a164/youtube_database_sample.json
[04:17] View it in a .json viewer that displays the data in a hierarchical view
[04:23] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[04:26] I am looking for a replacement database format to use instead of .json
[04:40] *** nataraj has joined #archiveteam-ot
[05:14] *** systwi has quit IRC (Remote host closed the connection)
[05:33] *** systwi has joined #archiveteam-ot
[05:57] *** nataraj has quit IRC (Read error: Operation timed out)
[05:58] *** nataraj has joined #archiveteam-ot
[06:00] *** dhyan_nat has joined #archiveteam-ot
[06:03] *** nataraj has quit IRC (Read error: Operation timed out)
[06:20] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[06:40] *** Quirk8 has quit IRC (END OF LINE)
[06:41] *** Quirk8 has joined #archiveteam-ot
[06:47] *** DigiDigi has quit IRC (Remote host closed the connection)
[08:14] *** icedice has joined #archiveteam-ot
[08:16] *** icedice has quit IRC (Client Quit)
[08:23] A videos table with some indexes
[08:23] The .info.json stuff can go into a bunch of columns
[08:25] i want a json/xml based filesystem. flat directory structure.
[08:52] I'd just store that in a simple relational DB with three tables (or more, depending on what else you want to store): channels, videos, and files.
[09:01] to flesh that out: channels (channel_id, username, display_name) PK channel_id; videos (video_id, channel_id, video_title, ... .info.json stuff) PK video_id; files (video_id, file) PK (video_id, file)
[09:02] add a channel_id index on videos
[09:03] you can query to get videos in a channel, files for a video, etc.
[09:05] PK is primary key
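A minimal sketch of that three-table layout, using Python's built-in sqlite3 module (the same SQL would run nearly unchanged on PostgreSQL). The retrieved_date and hash columns mirror the "date [str], hash [str]" fields from the 03:11 outline; the exact column names, the extra .info.json columns, and the IDs in the sample queries are illustrative assumptions, not anything agreed on in the channel.

import sqlite3

conn = sqlite3.connect("youtube_archive.db")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE IF NOT EXISTS channels (
    channel_id   TEXT PRIMARY KEY,
    username     TEXT,
    display_name TEXT
);

CREATE TABLE IF NOT EXISTS videos (
    video_id    TEXT PRIMARY KEY,
    channel_id  TEXT NOT NULL REFERENCES channels(channel_id),
    video_title TEXT
    -- plus a column per .info.json field worth keeping (upload_date, duration, ...)
);

CREATE TABLE IF NOT EXISTS files (
    video_id       TEXT NOT NULL REFERENCES videos(video_id),
    file           TEXT NOT NULL,  -- e.g. 'video.mkv', 'info.json', 'subtitles.en.vtt'
    retrieved_date TEXT,           -- "date": when this copy was grabbed
    hash           TEXT,           -- "hash": checksum of this copy
    PRIMARY KEY (video_id, file)
);

CREATE INDEX IF NOT EXISTS idx_videos_channel ON videos (channel_id);
""")
conn.commit()

# All videos in a channel:
cur.execute("SELECT video_id, video_title FROM videos WHERE channel_id = ?", ("UC_example",))
# All files for a video:
cur.execute("SELECT file, retrieved_date, hash FROM files WHERE video_id = ?", ("abc123def45",))

If older grabs should be kept rather than shuffled into an "outdated" object, the files primary key could instead be widened to (video_id, file, retrieved_date) so each retrieval becomes its own row.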
[09:49] *** wp494 has quit IRC (Read error: Operation timed out)
[09:49] *** wp494 has joined #archiveteam-ot
[12:19] *** BlueMax has quit IRC (Quit: Leaving)
[14:02] *** chirlu has joined #archiveteam-ot
[17:16] *** Hani has quit IRC (Quit: Hani)
[17:30] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[17:38] *** Sauce has joined #archiveteam-ot
[18:04] *** DogsRNice has joined #archiveteam-ot
[18:52] *** Sauce has quit IRC (Read error: Connection reset by peer)
[20:59] *** Hani has joined #archiveteam-ot
[21:05] *** qw3rty112 has joined #archiveteam-ot
[21:20] *** DigiDigi has joined #archiveteam-ot
[22:16] *** DogsRNice has quit IRC (Ping timeout: 252 seconds)
[22:17] *** DogsRNice has joined #archiveteam-ot
[22:24] I think I understand this (please be easy on me, this is the first time I'm experimenting with databases). So would MySQL be a good choice? I saw an article somewhere where someone [tried to] store hierarchical information in MySQL tables. I don't know how dissimilar that is from something like sqlite3 (already included with macOS).
[22:25] And another thing: is it mandatory for me to host a SQL server in order to manage this database? I was hoping it could manage a local file on a local filesystem.
[22:27] postgresql
[22:27] yes, you'd have to run a sql server unless you want to use sqlite
[22:28] I assume you recommend postgresql over sqlite in a situation like this?
[22:28] but guess what, database servers also store things in files on a filesystem
[22:28] yes, postgresql is the better option
[22:29] the postgresql docs are a good read
[22:32] also, you can dump all of your data to a text format
[22:32] pg_dump or pg_dump --data-only
[22:34] So, just so that I am understanding this correctly: I would need to run a postgresql server (possibly running the daemon only while the script runs) to access the data, connect to it via `localhost`, and then quit the daemon when done? Can the database files be stored in a custom location?
[22:35] people generally keep postgresql running as a system service
[22:35] SQLite is good enough for many things as long as you don't care too much about performance.
[22:35] yes, you connect via TCP (to e.g. localhost) or the UNIX socket
[22:35] If you want to run hundreds of queries against it per second, then yeah, you're in for a bad time.
[22:35] yes, you can change the data directory
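Roughly how those two options look from a script's point of view: SQLite is a single file opened in place with no daemon, while PostgreSQL is a running server the script connects to over TCP (e.g. localhost) or the UNIX socket. The psycopg2 driver, the database name, and the user below are assumptions for illustration, not something specified in the channel.

import os
import sqlite3

# Option 1: SQLite - serverless; the whole database is one file that can sit
# in the same directory as the script (or on a flash drive).
db_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "youtube_archive.db")
lite = sqlite3.connect(db_path)

# Option 2: PostgreSQL - a server must already be running; the script only
# opens a connection to it. Leaving out host= typically uses the local UNIX socket.
import psycopg2  # third-party driver, assumed installed
pg = psycopg2.connect(host="localhost", dbname="youtube_archive", user="archiver")

As mentioned at 22:32, pg_dump (or pg_dump --data-only) can later turn the PostgreSQL database into a plain-text dump if a file-shaped copy is ever needed.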
[22:39] It would be only me accessing the database. Let's say I keep everything on a flash drive. I was hoping it could be portable in the sense that I can run my script on one machine (with the db in the same dir as the script) and then on another machine I can do just the same.
[22:39] Additionally this would be low traffic to the database
[22:43] you don't need to sneakernet your data around to use it on two machines
[22:43] you can have one machine connect to the other machine's database
[22:43] you can set up replication if you really want the data in both places
[22:43] you can rsync the postgresql data directory if it's not running
[22:44] Is "storing the database on solely one machine and having my script always contact that machine" basically the same thing you just said?
[22:45] yes, that was <@ivan_> you can have one machine connect to the other machine's database
[22:45] but the other two were different
[22:45] Oh okay, I get it
[22:45] That might be a good solution
[22:46] Would installing a postgresql server on every computer I use my script on be another alternative?
[22:46] And then just have them all use the same database (not at once, of course)
[23:27] systwi: why do you want multiple postgresql servers?
[23:29] are you transporting an external drive between different computers
[23:31] *** Raccoon has quit IRC (Ping timeout: 252 seconds)
[23:32] *** Raccoon has joined #archiveteam-ot
[23:45] *** wp494 has quit IRC (Read error: Operation timed out)
[23:48] *** BlueMax has joined #archiveteam-ot
[23:49] *** wp494 has joined #archiveteam-ot
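For the single-server setup discussed above, a sketch of what the script side might look like: one machine runs PostgreSQL and holds the data, and the script, wherever it runs, connects to that host instead of to localhost. The hostname, database name, and credentials are placeholders, and psycopg2 is just one common Python driver, assumed rather than prescribed here.

import psycopg2

# Connect to the one machine that actually runs PostgreSQL and stores the data.
conn = psycopg2.connect(
    host="db-box.local",       # placeholder hostname of the database machine
    dbname="youtube_archive",  # placeholder database name
    user="archiver",           # placeholder role
    password="change-me",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM videos")
    print(cur.fetchone()[0])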