[00:39] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[00:51] *** wp494 has joined #archiveteam-ot
[02:27] *** Arctic has joined #archiveteam-ot
[02:31] *** Arctic has quit IRC (Remote host closed the connection)
[03:09] JAA: I need a hierarchical database to organize changes to a YouTube channel and its videos. It is broken down like this example:
[03:11] CHANNEL ID [obj] > VIDEO ID [obj] > video.mkv [obj] > date [str], hash [str]
[03:12] And since VIDEO ID is an object, inside of that is every file downloaded (description, info.json, subtitles, etc.)
[03:13] And "date" is the date that it was grabbed. If a newer version of the file is downloaded, the current information is moved to an "outdated" object.
[03:13] There is much, much more detail than just that; that's a bare-bones idea of how it works.
[03:20] Under CHANNEL ID there will be every video ID
[03:56] *** Arctic has joined #archiveteam-ot
[03:58] *** Arctic has quit IRC (Remote host closed the connection)
[04:00] *** lunik1 has quit IRC (:x)
[04:01] *** lunik1 has joined #archiveteam-ot
[04:14] Here is what I have so far:
[04:14] https://gist.github.com/systwi/413add02946e3a9cb087f6b4a8922687/raw/cc1feadc56a29237e9df2ed6ec786f8d5d81a164/youtube_database_sample.json
[04:17] View it in a .json viewer that displays the data in a hierarchical view
[04:23] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[04:26] I am looking for a replacement database format to use instead of .json
[04:40] *** nataraj has joined #archiveteam-ot
[05:14] *** systwi has quit IRC (Remote host closed the connection)
[05:33] *** systwi has joined #archiveteam-ot
[05:57] *** nataraj has quit IRC (Read error: Operation timed out)
[05:58] *** nataraj has joined #archiveteam-ot
[06:00] *** dhyan_nat has joined #archiveteam-ot
[06:03] *** nataraj has quit IRC (Read error: Operation timed out)
[06:20] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[06:40] *** Quirk8 has quit IRC (END OF LINE)
[06:41] *** Quirk8 has joined #archiveteam-ot
[06:47] *** DigiDigi has quit IRC (Remote host closed the connection)
[08:14] *** icedice has joined #archiveteam-ot
[08:16] *** icedice has quit IRC (Client Quit)
[08:23] A videos table with some indexes
[08:23] The .info.json stuff can go into a bunch of columns
[08:25] i want a json/xml based filesystem. flat directory structure.
[08:52] I'd just store that in a simple relational DB with three tables (or more, depending on what else you want to store): channels, videos, and files.
[09:01] to flesh that out: channels (channel_id, username, display_name) PK channel_id; videos (video_id, channel_id, video_title, ... .info.json stuff) PK video_id; files (video_id, file) PK (video_id, file)
[09:02] add a channel_id index on videos
[09:03] you can query to get videos in a channel, files for a video, etc.
[09:05] PK is primary key
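A minimal sketch of that three-table layout, using Python's built-in sqlite3 module (the same SQL would run nearly unchanged on PostgreSQL). The retrieved_date and hash columns mirror the "date [str], hash [str]" fields from the 03:11 outline; the exact column names, the extra .info.json columns, and the IDs in the sample queries are illustrative assumptions, not anything agreed on in the channel.

import sqlite3

conn = sqlite3.connect("youtube_archive.db")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE IF NOT EXISTS channels (
    channel_id   TEXT PRIMARY KEY,
    username     TEXT,
    display_name TEXT
);

CREATE TABLE IF NOT EXISTS videos (
    video_id    TEXT PRIMARY KEY,
    channel_id  TEXT NOT NULL REFERENCES channels(channel_id),
    video_title TEXT
    -- plus a column per .info.json field worth keeping (upload_date, duration, ...)
);

CREATE TABLE IF NOT EXISTS files (
    video_id       TEXT NOT NULL REFERENCES videos(video_id),
    file           TEXT NOT NULL,  -- e.g. 'video.mkv', 'info.json', 'subtitles.en.vtt'
    retrieved_date TEXT,           -- "date": when this copy was grabbed
    hash           TEXT,           -- "hash": checksum of this copy
    PRIMARY KEY (video_id, file)
);

CREATE INDEX IF NOT EXISTS idx_videos_channel ON videos (channel_id);
""")
conn.commit()

# All videos in a channel:
cur.execute("SELECT video_id, video_title FROM videos WHERE channel_id = ?", ("UC_example",))
# All files for a video:
cur.execute("SELECT file, retrieved_date, hash FROM files WHERE video_id = ?", ("abc123def45",))

If older grabs should be kept rather than shuffled into an "outdated" object, the files primary key could instead be widened to (video_id, file, retrieved_date) so each retrieval becomes its own row.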
[09:49] *** wp494 has quit IRC (Read error: Operation timed out)
[09:49] *** wp494 has joined #archiveteam-ot
[12:19] *** BlueMax has quit IRC (Quit: Leaving)
[14:02] *** chirlu has joined #archiveteam-ot
[17:16] *** Hani has quit IRC (Quit: Hani)
[17:30] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[17:38] *** Sauce has joined #archiveteam-ot
[18:04] *** DogsRNice has joined #archiveteam-ot
[18:52] *** Sauce has quit IRC (Read error: Connection reset by peer)
[20:59] *** Hani has joined #archiveteam-ot
[21:05] *** qw3rty112 has joined #archiveteam-ot
[21:20] *** DigiDigi has joined #archiveteam-ot
[22:16] *** DogsRNice has quit IRC (Ping timeout: 252 seconds)
[22:17] *** DogsRNice has joined #archiveteam-ot
[22:24] I think I understand this (please be easy on me, this is the first time I'm experimenting with databases). So would MySQL be a good choice? I saw an article somewhere where someone [tried to] store hierarchical information in MySQL tables. I don't know how dissimilar that is from something like sqlite3 (already included with macOS).
[22:25] And another thing: is it mandatory for me to host a SQL server in order to manage this database? I was hoping it could manage a local file on a local filesystem.
[22:27] postgresql
[22:27] yes, you'd have to run a sql server unless you want to use sqlite
[22:28] I assume you recommend postgresql over sqlite in a situation like this?
[22:28] but guess what, database servers also store things in files on a filesystem
[22:28] yes, postgresql is the better option
[22:29] the postgresql docs are a good read
[22:32] also, you can dump all of your data to a text format
[22:32] pg_dump or pg_dump --data-only
[22:34] So, just so that I am understanding this correctly: I would need to run a postgresql server (possibly running the daemon only while the script runs) to access the data, connect to it via `localhost`, and then quit the daemon when done? Can the database files be stored in a custom location?
[22:35] people generally keep postgresql running as a system service
[22:35] SQLite is good enough for many things as long as you don't care too much about performance.
[22:35] yes, you connect via TCP (to e.g. localhost) or the UNIX socket
[22:35] If you want to run hundreds of queries against it per second, then yeah, you're in for a bad time.
[22:35] yes, you can change the data directory
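Roughly how those two options look from a script's point of view: SQLite is a single file opened in place with no daemon, while PostgreSQL is a running server the script connects to over TCP (e.g. localhost) or the UNIX socket. The psycopg2 driver, the database name, and the user below are assumptions for illustration, not something specified in the channel.

import os
import sqlite3

# Option 1: SQLite - serverless; the whole database is one file that can sit
# in the same directory as the script (or on a flash drive).
db_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "youtube_archive.db")
lite = sqlite3.connect(db_path)

# Option 2: PostgreSQL - a server must already be running; the script only
# opens a connection to it. Leaving out host= typically uses the local UNIX socket.
import psycopg2  # third-party driver, assumed installed
pg = psycopg2.connect(host="localhost", dbname="youtube_archive", user="archiver")

As mentioned at 22:32, pg_dump (or pg_dump --data-only) can later turn the PostgreSQL database into a plain-text dump if a file-shaped copy is ever needed.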
[22:39] It would be only me accessing the database. Let's say I keep everything on a flash drive. I was hoping it could be portable in the sense that I can run my script on one machine (with the db in the same dir as the script) and then on another machine I can do just the same.
[22:39] Additionally this would be low traffic to the database
[22:43] you don't need to sneakernet your data around to use it on two machines
[22:43] you can have one machine connect to the other machine's database
[22:43] you can set up replication if you really want the data in both places
[22:43] you can rsync the postgresql data directory if it's not running
[22:44] Is "storing the database on solely one machine and having my script always contact that machine" basically the same thing you just said?
[22:45] yes, that was <@ivan_> you can have one machine connect to the other machine's database
[22:45] but the other two were different
[22:45] Oh okay, I get it
[22:45] That might be a good solution
[22:46] Would installing a postgresql server on every computer I use my script on be another alternative?
[22:46] And then just have them all use the same database (not at once, of course)
[23:27] systwi: why do you want multiple postgresql servers?
[23:29] are you transporting an external drive between different computers
[23:31] *** Raccoon has quit IRC (Ping timeout: 252 seconds)
[23:32] *** Raccoon has joined #archiveteam-ot
[23:45] *** wp494 has quit IRC (Read error: Operation timed out)
[23:48] *** BlueMax has joined #archiveteam-ot
[23:49] *** wp494 has joined #archiveteam-ot
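For the single-server setup discussed above, a sketch of what the script side might look like: one machine runs PostgreSQL and holds the data, and the script, wherever it runs, connects to that host instead of to localhost. The hostname, database name, and credentials are placeholders, and psycopg2 is just one common Python driver, assumed rather than prescribed here.

import psycopg2

# Connect to the one machine that actually runs PostgreSQL and stores the data.
conn = psycopg2.connect(
    host="db-box.local",       # placeholder hostname of the database machine
    dbname="youtube_archive",  # placeholder database name
    user="archiver",           # placeholder role
    password="change-me",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM videos")
    print(cur.fetchone()[0])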