00:39 -- wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
00:51 -- wp494 has joined #archiveteam-ot
02:27 -- Arctic has joined #archiveteam-ot
02:31 -- Arctic has quit IRC (Remote host closed the connection)
03:09 <systwi> JAA: I need a hierarchical database to organize changes to a YouTube channel and its videos. It is broken down like this example:
03:11 <systwi> CHANNEL ID [obj] > VIDEO ID [obj] > video.mkv [obj] > date [str], hash [str]
03:12 <systwi> And since VIDEO ID is an object, inside of it is every file downloaded (description, info.json, subtitles, etc.)
03:13 <systwi> And "date" is the date that it was grabbed. If a newer version of the file is downloaded, the current information is moved to an "outdated" object.
03:13 <systwi> There is much, much more detail than just that; that's a bare-bones idea of how it works
03:20 <systwi> Under CHANNEL ID there will be every video ID
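The hierarchy and the "outdated" move systwi describes can be sketched as a nested Python dict. Everything below (the `record_file` helper, the sample channel/video IDs, dates, and hashes) is illustrative, not taken from systwi's actual gist:

```python
# Nested structure: channel ID -> video ID -> file name -> metadata.
# All identifiers and values here are made-up examples.
db = {}

def record_file(db, channel_id, video_id, filename, grab_date, file_hash):
    """Record a grab; if the file was already grabbed before, move the
    current metadata into its "outdated" list first."""
    files = db.setdefault(channel_id, {}).setdefault(video_id, {})
    entry = files.get(filename)
    if entry is None:
        files[filename] = {"date": grab_date, "hash": file_hash, "outdated": []}
    else:
        entry["outdated"].append({"date": entry["date"], "hash": entry["hash"]})
        entry["date"] = grab_date
        entry["hash"] = file_hash

record_file(db, "UCxxxx", "dQw4w9WgXcQ", "video.mkv", "2019-08-01", "abc123")
record_file(db, "UCxxxx", "dQw4w9WgXcQ", "video.mkv", "2019-08-15", "def456")

entry = db["UCxxxx"]["dQw4w9WgXcQ"]["video.mkv"]
print(entry["hash"])                  # def456 (current)
print(entry["outdated"][0]["date"])   # 2019-08-01 (superseded grab)
```

This is essentially the shape of the linked JSON sample; the relational schema discussed later in the log replaces the nesting with foreign keys.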
03:56 -- Arctic has joined #archiveteam-ot
03:58 -- Arctic has quit IRC (Remote host closed the connection)
04:00 -- lunik1 has quit IRC (:x)
04:01 -- lunik1 has joined #archiveteam-ot
04:14 <systwi> Here is what I have so far:
04:14 <systwi> https://gist.github.com/systwi/413add02946e3a9cb087f6b4a8922687/raw/cc1feadc56a29237e9df2ed6ec786f8d5d81a164/youtube_database_sample.json
04:17 <systwi> View it in a .json viewer that displays the data in a hierarchical view
04:23 -- DogsRNice has quit IRC (Read error: Connection reset by peer)
04:26 <systwi> I am looking for a replacement database format to use instead of .json
04:40 -- nataraj has joined #archiveteam-ot
05:14 -- systwi has quit IRC (Remote host closed the connection)
05:33 -- systwi has joined #archiveteam-ot
05:57 -- nataraj has quit IRC (Read error: Operation timed out)
05:58 -- nataraj has joined #archiveteam-ot
06:00 -- dhyan_nat has joined #archiveteam-ot
06:03 -- nataraj has quit IRC (Read error: Operation timed out)
06:20 -- dhyan_nat has quit IRC (Read error: Operation timed out)
06:40 -- Quirk8 has quit IRC (END OF LINE)
06:41 -- Quirk8 has joined #archiveteam-ot
06:47 -- DigiDigi has quit IRC (Remote host closed the connection)
08:14 -- icedice has joined #archiveteam-ot
08:16 -- icedice has quit IRC (Client Quit)
08:23 <ivan_> A videos table with some indexes
08:23 <ivan_> The .info.json stuff can go into a bunch of columns
08:25 <Raccoon> i want a json/xml based filesystem. flat directory structure.
08:52 <JAA> I'd just store that in a simple relational DB with three tables (or more, depending on what else you want to store): channels, videos, and files.
09:01 <ivan_> to flesh that out: channels (channel_id, username, display_name) PK channel_id; videos (video_id, channel_id, video_title, ... .info.json stuff) PK video_id; files (video_id, file) PK (video_id, file)
09:02 <ivan_> add a channel_id index on videos
09:03 <ivan_> you can query to get videos in a channel, files for a video, etc
09:05 <ivan_> PK is primary key
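ivan_'s three-table layout translates almost verbatim to SQLite (which, as noted later in the log, needs no server). The sample rows and the `video_title` column standing in for the ".info.json stuff" are placeholders:

```python
import sqlite3

# ivan_'s schema, rendered as SQLite DDL. Use a file path instead of
# ":memory:" (e.g. "youtube.db") to persist the database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE channels (
    channel_id   TEXT PRIMARY KEY,
    username     TEXT,
    display_name TEXT
);
CREATE TABLE videos (
    video_id    TEXT PRIMARY KEY,
    channel_id  TEXT NOT NULL REFERENCES channels(channel_id),
    video_title TEXT
    -- ... further columns for the .info.json fields ...
);
CREATE INDEX videos_channel_id ON videos(channel_id);
CREATE TABLE files (
    video_id TEXT NOT NULL REFERENCES videos(video_id),
    file     TEXT NOT NULL,
    PRIMARY KEY (video_id, file)
);
""")

# Placeholder data, just to exercise the queries ivan_ mentions.
conn.execute("INSERT INTO channels VALUES (?, ?, ?)",
             ("UCxxxx", "someuser", "Some User"))
conn.execute("INSERT INTO videos VALUES (?, ?, ?)",
             ("vid1", "UCxxxx", "A title"))
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [("vid1", "video.mkv"), ("vid1", "video.info.json")])

# Videos in a channel:
videos = conn.execute("SELECT video_id FROM videos WHERE channel_id = ?",
                      ("UCxxxx",)).fetchall()
# Files for a video:
files = conn.execute("SELECT file FROM files WHERE video_id = ?",
                     ("vid1",)).fetchall()
```

The same DDL works in PostgreSQL with minor type changes; the `videos_channel_id` index is what keeps the "videos in a channel" query fast.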
09:49 -- wp494 has quit IRC (Read error: Operation timed out)
09:49 -- wp494 has joined #archiveteam-ot
12:19 -- BlueMax has quit IRC (Quit: Leaving)
14:02 -- chirlu has joined #archiveteam-ot
17:16 -- Hani has quit IRC (Quit: Hani)
17:30 -- Stiletto has quit IRC (Ping timeout: 246 seconds)
17:38 -- Sauce has joined #archiveteam-ot
18:04 -- DogsRNice has joined #archiveteam-ot
18:52 -- Sauce has quit IRC (Read error: Connection reset by peer)
20:59 -- Hani has joined #archiveteam-ot
21:05 -- qw3rty112 has joined #archiveteam-ot
21:20 -- DigiDigi has joined #archiveteam-ot
22:16 -- DogsRNice has quit IRC (Ping timeout: 252 seconds)
22:17 -- DogsRNice has joined #archiveteam-ot
22:24
🔗
|
systwi |
I think I understand this (please be easy on me, this is the first time I'm experimenting with databases). So would MySQL be a good choice? I saw an article somewhere where someone [tried to] store hierarchical information in MySQL tables. I don't know how dissimilar that is with something like sqlite3 (already incl. with macOS) |
22:25
🔗
|
systwi |
And another thing, is it mandatory for me to host a SQL server in order for me to manage this database? I was hoping it could manage a local file on a local filesystem. |
22:27 <ivan_> postgresql
22:27 <ivan_> yes, you'd have to run a sql server unless you want to use sqlite
22:28 <systwi> I assume you recommend postgresql over sqlite in a situation like this?
22:28 <ivan_> but guess what, database servers also store things in files on a filesystem
22:28 <ivan_> yes, postgresql is the better option
22:29 <ivan_> the postgresql docs are a good read
22:32 <ivan_> also, you can dump all of your data to a text format
22:32 <ivan_> pg_dump or pg_dump --data-only
22:34 <systwi> So, just so that I am understanding this correctly, I would need to run a postgresql server (possibly run the daemon when the script runs) to access the data, connect to it via `localhost` and then quit the daemon when done? Can the database files be stored in a custom location?
22:35 <ivan_> people generally keep postgresql running as a system service
22:35 <JAA> SQLite is good enough for many things as long as you don't care too much about performance.
22:35 <ivan_> yes, you connect via TCP (to e.g. localhost) or the UNIX socket
22:35 <JAA> If you want to run hundreds of queries against it per second, then yeah, you're in for a bad time.
22:39 <ivan_> yes, you can change the data directory
22:39 <systwi> It would be only me accessing the database. Let's say I keep everything on a flash drive. I was hoping it could be portable in the sense that I can run my script on one machine (with the db in the same dir as the script) and then on another machine I can do just the same.
22:43 <systwi> Additionally this would be low traffic to the database
22:43 <ivan_> you don't need to sneakernet your data around to use it on two machines
22:43 <ivan_> you can have one machine connect to the other machine's database
22:43 <ivan_> you can set up replication if you really want the data in both places
22:44 <ivan_> you can rsync the postgresql data directory if it's not running
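If SQLite is enough (per JAA's caveat about low query rates), systwi's portability wish works directly: the whole database is one file that can sit next to the script on a flash drive, and the JSON "outdated" object can instead become plain grab-history rows, with "current" meaning "newest row". This is a sketch under those assumptions; the table name, columns, and sample data are all made up:

```python
import os
import sqlite3
import tempfile

# Single-file database; in the portable setup this path would be
# something like os.path.join(script_dir, "youtube.db").
db_path = os.path.join(tempfile.mkdtemp(), "youtube.db")
conn = sqlite3.connect(db_path)

conn.execute("""
CREATE TABLE file_grabs (
    video_id TEXT NOT NULL,
    file     TEXT NOT NULL,
    date     TEXT NOT NULL,  -- grab date, ISO 8601 so it sorts correctly
    hash     TEXT NOT NULL,
    PRIMARY KEY (video_id, file, date)
)""")

conn.executemany("INSERT INTO file_grabs VALUES (?, ?, ?, ?)", [
    ("vid1", "video.mkv", "2019-08-01", "abc123"),
    ("vid1", "video.mkv", "2019-08-15", "def456"),  # newer grab of the same file
])

# The current version is simply the newest row; every older row is
# the "outdated" history, with no data shuffling on update.
row = conn.execute("""
    SELECT date, hash FROM file_grabs
    WHERE video_id = ? AND file = ?
    ORDER BY date DESC LIMIT 1""", ("vid1", "video.mkv")).fetchone()
print(row)  # ('2019-08-15', 'def456')
```

Note this only stays safe as long as one machine opens the file at a time, which matches systwi's "not at once of course" constraint below.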
22:44
🔗
|
systwi |
Is "storing the database on solely one machine and having my script always contact that machine" basically the same thing you just said? |
22:45
🔗
|
ivan_ |
yes that was <@ivan_> you can have one machine connect to the other machine's database |
22:45
🔗
|
ivan_ |
but the other two were different |
22:45
🔗
|
systwi |
Oh okay I get it |
22:45
🔗
|
systwi |
That might be a good solution |
22:46
🔗
|
systwi |
Would installing postgresql server on every computer I use my script on be another alternative? |
22:46
🔗
|
systwi |
And then just have them all use the same database (not at once of course) |
23:27
🔗
|
ivan_ |
systwi: why do you want multiple postgresql servers? |
23:29
🔗
|
ivan_ |
are you transporting an external drive between different computers |
23:31 -- Raccoon has quit IRC (Ping timeout: 252 seconds)
23:32 -- Raccoon has joined #archiveteam-ot
23:45 -- wp494 has quit IRC (Read error: Operation timed out)
23:48 -- BlueMax has joined #archiveteam-ot
23:49 -- wp494 has joined #archiveteam-ot