#archiveteam-ot 2019-08-24,Sat


Time Nickname Message
00:39 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
00:51 🔗 wp494 has joined #archiveteam-ot
02:27 🔗 Arctic has joined #archiveteam-ot
02:31 🔗 Arctic has quit IRC (Remote host closed the connection)
03:09 🔗 systwi JAA: I need a hierarchical database to organize changes to a YouTube channel and its videos. It is broken down like this example:
03:11 🔗 systwi CHANNEL ID [obj] > VIDEO ID [obj] > video.mkv [obj] > date [str], hash [str]
03:12 🔗 systwi And since VIDEO ID is an object, inside of that is every file downloaded (description, info.json, subtitles, etc.)
03:13 🔗 systwi And "date" is the date that it was grabbed. If a newer version of the file is downloaded this current information is moved to an "outdated" object.
03:13 🔗 systwi There is much more detail than just that; this is a bare-bones idea of how it works
03:20 🔗 systwi Under CHANNEL ID there will be every video id
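A minimal sketch of the nested layout systwi describes, using plain Python dicts. The function name `record_file_version`, the placement of "outdated" as a list inside each file entry, and the sample IDs are assumptions for illustration; the actual gist may organize this differently.

```python
def record_file_version(db, channel_id, video_id, filename, date, file_hash):
    """Store the latest grab of a file; move a superseded version to 'outdated'.

    Layout (assumed): db[channel_id][video_id][filename] =
        {"date": str, "hash": str, "outdated": [{"date": str, "hash": str}, ...]}
    """
    video = db.setdefault(channel_id, {}).setdefault(video_id, {})
    entry = video.get(filename)
    if entry is None:
        # First time this file is seen for this video.
        video[filename] = {"date": date, "hash": file_hash, "outdated": []}
        return
    if entry["hash"] == file_hash:
        # Same content as before; nothing to record.
        return
    # A newer version was downloaded: archive the current info, then replace it.
    entry["outdated"].append({"date": entry["date"], "hash": entry["hash"]})
    entry["date"] = date
    entry["hash"] = file_hash
```

Calling it twice for the same file with different hashes leaves the newest date and hash at the top level and the older pair under "outdated".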
03:56 🔗 Arctic has joined #archiveteam-ot
03:58 🔗 Arctic has quit IRC (Remote host closed the connection)
04:00 🔗 lunik1 has quit IRC (:x)
04:01 🔗 lunik1 has joined #archiveteam-ot
04:14 🔗 systwi Here is what I have so far:
04:14 🔗 systwi https://gist.github.com/systwi/413add02946e3a9cb087f6b4a8922687/raw/cc1feadc56a29237e9df2ed6ec786f8d5d81a164/youtube_database_sample.json
04:17 🔗 systwi View it in a .json viewer that displays the data in a hierarchical view
04:23 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
04:26 🔗 systwi I am looking for a replacement database format to use instead of .json
04:40 🔗 nataraj has joined #archiveteam-ot
05:14 🔗 systwi has quit IRC (Remote host closed the connection)
05:33 🔗 systwi has joined #archiveteam-ot
05:57 🔗 nataraj has quit IRC (Read error: Operation timed out)
05:58 🔗 nataraj has joined #archiveteam-ot
06:00 🔗 dhyan_nat has joined #archiveteam-ot
06:03 🔗 nataraj has quit IRC (Read error: Operation timed out)
06:20 🔗 dhyan_nat has quit IRC (Read error: Operation timed out)
06:40 🔗 Quirk8 has quit IRC (END OF LINE)
06:41 🔗 Quirk8 has joined #archiveteam-ot
06:47 🔗 DigiDigi has quit IRC (Remote host closed the connection)
08:14 🔗 icedice has joined #archiveteam-ot
08:16 🔗 icedice has quit IRC (Client Quit)
08:23 🔗 ivan_ A videos table with some indexes
08:23 🔗 ivan_ The .info.json stuff can go into a bunch of columns
08:25 🔗 Raccoon i want a json/xml based filesystem. flat directory structure.
08:52 🔗 JAA I'd just store that in a simple relational DB with three tables (or more, depending on what else you want to store): channels, videos, and files.
09:01 🔗 ivan_ to flesh that out: channels (channel_id, username, display_name) PK channel_id; videos (video_id, channel_id, video_title, ... .info.json stuff) PK video_id; files (video_id, file) PK (video_id, file)
09:02 🔗 ivan_ add a channel_id index on videos
09:03 🔗 ivan_ you can query to get videos in a channel, files for a video, etc
09:05 🔗 ivan_ PK is primary key
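The three-table layout ivan_ outlines could be sketched roughly as follows. This uses Python's built-in `sqlite3` module only so the example runs without a server, which is an assumption and not what the channel settled on; the column types, the index name `videos_channel_id_idx`, and the helper function names are likewise illustrative. The ".info.json stuff" columns are left out because the log does not list them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE channels (
    channel_id   TEXT PRIMARY KEY,
    username     TEXT,
    display_name TEXT
);
CREATE TABLE videos (
    video_id    TEXT PRIMARY KEY,
    channel_id  TEXT NOT NULL REFERENCES channels (channel_id),
    video_title TEXT
    -- further columns for .info.json fields would go here
);
-- Index so "all videos in a channel" does not scan the whole table.
CREATE INDEX videos_channel_id_idx ON videos (channel_id);
CREATE TABLE files (
    video_id TEXT NOT NULL REFERENCES videos (video_id),
    file     TEXT NOT NULL,
    PRIMARY KEY (video_id, file)
);
"""

def create_db(path=":memory:"):
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def videos_in_channel(conn, channel_id):
    """Return the video IDs belonging to one channel, sorted."""
    rows = conn.execute(
        "SELECT video_id FROM videos WHERE channel_id = ? ORDER BY video_id",
        (channel_id,),
    )
    return [row[0] for row in rows]

def files_for_video(conn, video_id):
    """Return the file names recorded for one video, sorted."""
    rows = conn.execute(
        "SELECT file FROM files WHERE video_id = ? ORDER BY file",
        (video_id,),
    )
    return [row[0] for row in rows]
```

The hierarchy from the JSON version is recovered through queries: channel to videos via `videos.channel_id`, video to files via `files.video_id`.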
09:49 🔗 wp494 has quit IRC (Read error: Operation timed out)
09:49 🔗 wp494 has joined #archiveteam-ot
12:19 🔗 BlueMax has quit IRC (Quit: Leaving)
14:02 🔗 chirlu has joined #archiveteam-ot
17:16 🔗 Hani has quit IRC (Quit: Hani)
17:30 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
17:38 🔗 Sauce has joined #archiveteam-ot
18:04 🔗 DogsRNice has joined #archiveteam-ot
18:52 🔗 Sauce has quit IRC (Read error: Connection reset by peer)
20:59 🔗 Hani has joined #archiveteam-ot
21:05 🔗 qw3rty112 has joined #archiveteam-ot
21:20 🔗 DigiDigi has joined #archiveteam-ot
22:16 🔗 DogsRNice has quit IRC (Ping timeout: 252 seconds)
22:17 🔗 DogsRNice has joined #archiveteam-ot
22:24 🔗 systwi I think I understand this (please be easy on me, this is the first time I'm experimenting with databases). So would MySQL be a good choice? I saw an article somewhere where someone [tried to] store hierarchical information in MySQL tables. I don't know how different that is from something like sqlite3 (already incl. with macOS)
22:25 🔗 systwi And another thing, is it mandatory for me to host a SQL server in order for me to manage this database? I was hoping it could manage a local file on a local filesystem.
22:27 🔗 ivan_ postgresql
22:27 🔗 ivan_ yes, you'd have to run a sql server unless you want to use sqlite
22:28 🔗 systwi I assume you recommend postgresql over sqlite in a situation like this?
22:28 🔗 ivan_ but guess what, database servers also store things in files on a filesystem
22:28 🔗 ivan_ yes, postgresql is the better option
22:29 🔗 ivan_ the postgresql docs are a good read
22:32 🔗 ivan_ also, you can dump all of your data to a text format
22:32 🔗 ivan_ pg_dump or pg_dump --data-only
22:34 🔗 systwi So, just so that I am understanding this correctly, I would need to run a postgresql server (possibly run the daemon when the script runs) to access the data, connect to it via `localhost` and then quit the daemon when done? Can the database files be stored in a custom location?
22:35 🔗 ivan_ people generally keep postgresql running as a system service
22:35 🔗 JAA SQLite is good enough for many things as long as you don't care too much about performance.
22:35 🔗 ivan_ yes, you connect via TCP (to e.g. localhost) or the UNIX socket
22:35 🔗 JAA If you want to run hundreds of queries against it per second, then yeah, you're in for a bad time.
22:35 🔗 ivan_ yes, you can change the data directory
22:39 🔗 systwi It would be only me accessing the database. Let's say I keep everything on a flash drive. I was hoping it could be portable in the sense that I can run my script on one machine (with the db in the same dir as the script) and then on another machine I can do just the same.
22:39 🔗 systwi Additionally this would be low traffic to the database
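For the single-user, low-traffic, flash-drive scenario systwi describes, a SQLite database is one ordinary file that can sit next to the script. A minimal sketch, assuming Python's built-in `sqlite3` module; the function name `open_portable_db` and the file name `youtube.sqlite3` are hypothetical:

```python
import sqlite3
from pathlib import Path

def open_portable_db(directory, name="youtube.sqlite3"):
    """Open (or create) a SQLite database file inside the given directory.

    No server process is involved: the whole database lives in this one file,
    so copying or carrying the directory carries the data with it.
    """
    path = Path(directory) / name
    return sqlite3.connect(str(path))
```

From a script, `open_portable_db(Path(__file__).resolve().parent)` would keep the database in the same directory as the script itself.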
22:43 🔗 ivan_ you don't need to sneakernet your data around to use it on two machines
22:43 🔗 ivan_ you can have one machine connect to the other machine's database
22:43 🔗 ivan_ you can set up replication if you really want the data in both places
22:43 🔗 ivan_ you can rsync the postgresql data directory if it's not running
22:44 🔗 systwi Is "storing the database on solely one machine and having my script always contact that machine" basically the same thing you just said?
22:45 🔗 ivan_ yes that was <@ivan_> you can have one machine connect to the other machine's database
22:45 🔗 ivan_ but the other two were different
22:45 🔗 systwi Oh okay I get it
22:45 🔗 systwi That might be a good solution
22:46 🔗 systwi Would installing postgresql server on every computer I use my script on be another alternative?
22:46 🔗 systwi And then just have them all use the same database (not at once of course)
23:27 🔗 ivan_ systwi: why do you want multiple postgresql servers?
23:29 🔗 ivan_ are you transporting an external drive between different computers
23:31 🔗 Raccoon has quit IRC (Ping timeout: 252 seconds)
23:32 🔗 Raccoon has joined #archiveteam-ot
23:45 🔗 wp494 has quit IRC (Read error: Operation timed out)
23:48 🔗 BlueMax has joined #archiveteam-ot
23:49 🔗 wp494 has joined #archiveteam-ot
