00:39 -- wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
00:51 -- wp494 has joined #archiveteam-ot
02:27 -- Arctic has joined #archiveteam-ot
02:31 -- Arctic has quit IRC (Remote host closed the connection)
03:09 <systwi> JAA: I need a hierarchical database to organize changes to a YouTube channel and its videos. It is broken down like this example:
03:11 <systwi> CHANNEL ID [obj] > VIDEO ID [obj] > video.mkv [obj] > date [str], hash [str]
03:12 <systwi> And since VIDEO ID is an object, inside of it is every file downloaded (description, info.json, subtitles, etc.)
03:13 <systwi> And "date" is the date that it was grabbed. If a newer version of the file is downloaded, the current information is moved to an "outdated" object.
03:13 <systwi> There is much, much more detail than just that; that's a bare-bones idea of how it works
03:20 <systwi> Under CHANNEL ID there will be every video ID
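The hierarchy and the "outdated" move systwi describes can be sketched as a nested Python dict. Everything below (the `record_file` helper, the sample channel/video IDs, dates, and hashes) is illustrative, not taken from systwi's actual gist:

```python
# Nested structure: channel ID -> video ID -> file name -> metadata.
# All identifiers and values here are made-up examples.
db = {}

def record_file(db, channel_id, video_id, filename, grab_date, file_hash):
    """Record a grab; if the file was already grabbed before, move the
    current metadata into its "outdated" list first."""
    files = db.setdefault(channel_id, {}).setdefault(video_id, {})
    entry = files.get(filename)
    if entry is None:
        files[filename] = {"date": grab_date, "hash": file_hash, "outdated": []}
    else:
        entry["outdated"].append({"date": entry["date"], "hash": entry["hash"]})
        entry["date"] = grab_date
        entry["hash"] = file_hash

record_file(db, "UCxxxx", "dQw4w9WgXcQ", "video.mkv", "2019-08-01", "abc123")
record_file(db, "UCxxxx", "dQw4w9WgXcQ", "video.mkv", "2019-08-15", "def456")

entry = db["UCxxxx"]["dQw4w9WgXcQ"]["video.mkv"]
print(entry["hash"])                  # def456 (current)
print(entry["outdated"][0]["date"])   # 2019-08-01 (superseded grab)
```

This is essentially the shape of the linked JSON sample; the relational schema discussed later in the log replaces the nesting with foreign keys.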
03:56 -- Arctic has joined #archiveteam-ot
03:58 -- Arctic has quit IRC (Remote host closed the connection)
04:00 -- lunik1 has quit IRC (:x)
04:01 -- lunik1 has joined #archiveteam-ot
04:14 <systwi> Here is what I have so far:
04:14 <systwi> https://gist.github.com/systwi/413add02946e3a9cb087f6b4a8922687/raw/cc1feadc56a29237e9df2ed6ec786f8d5d81a164/youtube_database_sample.json
04:17 <systwi> View it in a .json viewer that displays the data in a hierarchical view
04:23 -- DogsRNice has quit IRC (Read error: Connection reset by peer)
04:26 <systwi> I am looking for a replacement database format to use instead of .json
04:40 -- nataraj has joined #archiveteam-ot
05:14 -- systwi has quit IRC (Remote host closed the connection)
05:33 -- systwi has joined #archiveteam-ot
05:57 -- nataraj has quit IRC (Read error: Operation timed out)
05:58 -- nataraj has joined #archiveteam-ot
06:00 -- dhyan_nat has joined #archiveteam-ot
06:03 -- nataraj has quit IRC (Read error: Operation timed out)
06:20 -- dhyan_nat has quit IRC (Read error: Operation timed out)
06:40 -- Quirk8 has quit IRC (END OF LINE)
06:41 -- Quirk8 has joined #archiveteam-ot
06:47 -- DigiDigi has quit IRC (Remote host closed the connection)
08:14 -- icedice has joined #archiveteam-ot
08:16 -- icedice has quit IRC (Client Quit)
08:23 <ivan_> A videos table with some indexes
08:23 <ivan_> The .info.json stuff can go into a bunch of columns
08:25 <Raccoon> i want a json/xml based filesystem. flat directory structure.
08:52 <JAA> I'd just store that in a simple relational DB with three tables (or more, depending on what else you want to store): channels, videos, and files.
09:01 <ivan_> to flesh that out: channels (channel_id, username, display_name) PK channel_id; videos (video_id, channel_id, video_title, ... .info.json stuff) PK video_id; files (video_id, file) PK (video_id, file)
09:02 <ivan_> add a channel_id index on videos
09:03 <ivan_> you can query to get videos in a channel, files for a video, etc
09:05 <ivan_> PK is primary key
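ivan_'s three-table layout translates almost verbatim to SQLite (which, as noted later in the log, needs no server). The sample rows and the `video_title` column standing in for the ".info.json stuff" are placeholders:

```python
import sqlite3

# ivan_'s schema, rendered as SQLite DDL. Use a file path instead of
# ":memory:" (e.g. "youtube.db") to persist the database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE channels (
    channel_id   TEXT PRIMARY KEY,
    username     TEXT,
    display_name TEXT
);
CREATE TABLE videos (
    video_id    TEXT PRIMARY KEY,
    channel_id  TEXT NOT NULL REFERENCES channels(channel_id),
    video_title TEXT
    -- ... further columns for the .info.json fields ...
);
CREATE INDEX videos_channel_id ON videos(channel_id);
CREATE TABLE files (
    video_id TEXT NOT NULL REFERENCES videos(video_id),
    file     TEXT NOT NULL,
    PRIMARY KEY (video_id, file)
);
""")

# Placeholder data, just to exercise the queries ivan_ mentions.
conn.execute("INSERT INTO channels VALUES (?, ?, ?)",
             ("UCxxxx", "someuser", "Some User"))
conn.execute("INSERT INTO videos VALUES (?, ?, ?)",
             ("vid1", "UCxxxx", "A title"))
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [("vid1", "video.mkv"), ("vid1", "video.info.json")])

# Videos in a channel:
videos = conn.execute("SELECT video_id FROM videos WHERE channel_id = ?",
                      ("UCxxxx",)).fetchall()
# Files for a video:
files = conn.execute("SELECT file FROM files WHERE video_id = ?",
                     ("vid1",)).fetchall()
```

The same DDL works in PostgreSQL with minor type changes; the `videos_channel_id` index is what keeps the "videos in a channel" query fast.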
09:49 -- wp494 has quit IRC (Read error: Operation timed out)
09:49 -- wp494 has joined #archiveteam-ot
12:19 -- BlueMax has quit IRC (Quit: Leaving)
14:02 -- chirlu has joined #archiveteam-ot
17:16 -- Hani has quit IRC (Quit: Hani)
17:30 -- Stiletto has quit IRC (Ping timeout: 246 seconds)
17:38 -- Sauce has joined #archiveteam-ot
18:04 -- DogsRNice has joined #archiveteam-ot
18:52 -- Sauce has quit IRC (Read error: Connection reset by peer)
20:59 -- Hani has joined #archiveteam-ot
21:05 -- qw3rty112 has joined #archiveteam-ot
21:20 -- DigiDigi has joined #archiveteam-ot
22:16 -- DogsRNice has quit IRC (Ping timeout: 252 seconds)
22:17 -- DogsRNice has joined #archiveteam-ot
22:24
🔗
|
systwi |
I think I understand this (please be easy on me, this is the first time I'm experimenting with databases). So would MySQL be a good choice? I saw an article somewhere where someone [tried to] store hierarchical information in MySQL tables. I don't know how dissimilar that is with something like sqlite3 (already incl. with macOS) |
22:25
🔗
|
systwi |
And another thing, is it mandatory for me to host a SQL server in order for me to manage this database? I was hoping it could manage a local file on a local filesystem. |
22:27 <ivan_> postgresql
22:27 <ivan_> yes, you'd have to run a sql server unless you want to use sqlite
22:28 <systwi> I assume you recommend postgresql over sqlite in a situation like this?
22:28 <ivan_> but guess what, database servers also store things in files on a filesystem
22:28 <ivan_> yes, postgresql is the better option
22:29 <ivan_> the postgresql docs are a good read
22:32 <ivan_> also, you can dump all of your data to a text format
22:32 <ivan_> pg_dump or pg_dump --data-only
22:34 <systwi> So, just so that I am understanding this correctly, I would need to run a postgresql server (possibly run the daemon when the script runs) to access the data, connect to it via `localhost` and then quit the daemon when done? Can the database files be stored in a custom location?
22:35 <ivan_> people generally keep postgresql running as a system service
22:35 <JAA> SQLite is good enough for many things as long as you don't care too much about performance.
22:35 <ivan_> yes, you connect via TCP (to e.g. localhost) or the UNIX socket
22:35 <JAA> If you want to run hundreds of queries against it per second, then yeah, you're in for a bad time.
22:39 <ivan_> yes, you can change the data directory
22:39 <systwi> It would be only me accessing the database. Let's say I keep everything on a flash drive. I was hoping it could be portable in the sense that I can run my script on one machine (with the db in the same dir as the script) and then on another machine I can do just the same.
22:43 <systwi> Additionally this would be low traffic to the database
22:43 <ivan_> you don't need to sneakernet your data around to use it on two machines
22:43 <ivan_> you can have one machine connect to the other machine's database
22:43 <ivan_> you can set up replication if you really want the data in both places
22:44 <ivan_> you can rsync the postgresql data directory if it's not running
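If SQLite is enough (per JAA's caveat about low query rates), systwi's portability wish works directly: the whole database is one file that can sit next to the script on a flash drive, and the JSON "outdated" object can instead become plain grab-history rows, with "current" meaning "newest row". This is a sketch under those assumptions; the table name, columns, and sample data are all made up:

```python
import os
import sqlite3
import tempfile

# Single-file database; in the portable setup this path would be
# something like os.path.join(script_dir, "youtube.db").
db_path = os.path.join(tempfile.mkdtemp(), "youtube.db")
conn = sqlite3.connect(db_path)

conn.execute("""
CREATE TABLE file_grabs (
    video_id TEXT NOT NULL,
    file     TEXT NOT NULL,
    date     TEXT NOT NULL,  -- grab date, ISO 8601 so it sorts correctly
    hash     TEXT NOT NULL,
    PRIMARY KEY (video_id, file, date)
)""")

conn.executemany("INSERT INTO file_grabs VALUES (?, ?, ?, ?)", [
    ("vid1", "video.mkv", "2019-08-01", "abc123"),
    ("vid1", "video.mkv", "2019-08-15", "def456"),  # newer grab of the same file
])

# The current version is simply the newest row; every older row is
# the "outdated" history, with no data shuffling on update.
row = conn.execute("""
    SELECT date, hash FROM file_grabs
    WHERE video_id = ? AND file = ?
    ORDER BY date DESC LIMIT 1""", ("vid1", "video.mkv")).fetchone()
print(row)  # ('2019-08-15', 'def456')
```

Note this only stays safe as long as one machine opens the file at a time, which matches systwi's "not at once of course" constraint below.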
22:44
🔗
|
systwi |
Is "storing the database on solely one machine and having my script always contact that machine" basically the same thing you just said? |
22:45
🔗
|
ivan_ |
yes that was <@ivan_> you can have one machine connect to the other machine's database |
22:45
🔗
|
ivan_ |
but the other two were different |
22:45
🔗
|
systwi |
Oh okay I get it |
22:45
🔗
|
systwi |
That might be a good solution |
22:46
🔗
|
systwi |
Would installing postgresql server on every computer I use my script on be another alternative? |
22:46
🔗
|
systwi |
And then just have them all use the same database (not at once of course) |
23:27
🔗
|
ivan_ |
systwi: why do you want multiple postgresql servers? |
23:29
🔗
|
ivan_ |
are you transporting an external drive between different computers |
23:31 -- Raccoon has quit IRC (Ping timeout: 252 seconds)
23:32 -- Raccoon has joined #archiveteam-ot
23:45 -- wp494 has quit IRC (Read error: Operation timed out)
23:48 -- BlueMax has joined #archiveteam-ot
23:49 -- wp494 has joined #archiveteam-ot