Time |
Nickname |
Message |
00:17
🔗
|
|
ats has quit IRC (Read error: Operation timed out) |
00:20
🔗
|
|
ats has joined #archiveteam-ot |
00:37
🔗
|
|
qnisz has joined #archiveteam-ot |
00:42
🔗
|
|
qnicw has quit IRC (Read error: Operation timed out) |
01:08
🔗
|
|
akierig has joined #archiveteam-ot |
01:15
🔗
|
|
robogoat has quit IRC (Read error: Operation timed out) |
01:16
🔗
|
|
BlueMax has joined #archiveteam-ot |
01:16
🔗
|
|
yawkat has quit IRC (Ping timeout: 252 seconds) |
01:16
🔗
|
|
robogoat has joined #archiveteam-ot |
01:26
🔗
|
|
akierig_ has joined #archiveteam-ot |
01:28
🔗
|
|
yawkat has joined #archiveteam-ot |
01:32
🔗
|
|
akierig has quit IRC (Read error: Operation timed out) |
01:46
🔗
|
|
akierig_ has quit IRC (Remote host closed the connection) |
01:47
🔗
|
|
akierig has joined #archiveteam-ot |
01:49
🔗
|
|
akierig_ has joined #archiveteam-ot |
01:49
🔗
|
|
akierig has quit IRC (Read error: Connection reset by peer) |
02:07
🔗
|
|
akierig has joined #archiveteam-ot |
02:07
🔗
|
|
bluefoo has quit IRC (Read error: Connection reset by peer) |
02:13
🔗
|
|
akierig_ has quit IRC (Read error: Operation timed out) |
02:39
🔗
|
|
akierig has quit IRC (Quit: later_gator) |
04:18
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
04:27
🔗
|
|
qnisz has quit IRC (Ping timeout: 496 seconds) |
04:32
🔗
|
|
qw3rty2 has joined #archiveteam-ot |
04:37
🔗
|
|
qw3rty has quit IRC (Ping timeout: 745 seconds) |
04:38
🔗
|
|
odemg has quit IRC (Ping timeout: 745 seconds) |
04:42
🔗
|
|
odemg has joined #archiveteam-ot |
05:36
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
06:14
🔗
|
|
markedL7 has joined #archiveteam-ot |
06:16
🔗
|
|
markedL has quit IRC (Read error: Operation timed out) |
06:16
🔗
|
|
markedL7 is now known as markedL |
06:29
🔗
|
|
dhyan_nat has quit IRC (Quit: Konversation terminated!) |
06:29
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
06:35
🔗
|
|
m007a83 has joined #archiveteam-ot |
07:14
🔗
|
|
RSY00O has joined #archiveteam-ot |
07:14
🔗
|
RSY00O |
Hi. I am wondering if it's possible to mass archive every YT video, but just everything from before the beginning of 2010, i.e. all of 2000s YouTube. My friend told me 2000s YT would be ~385TB, but it was a rough estimate |
07:15
🔗
|
ivan |
RSY00O: do you know how many videos that was? |
07:16
🔗
|
ivan |
I have an extensive YouTube archiving thing going on in #youtubearchive |
07:16
🔗
|
RSY00O |
https://www.archiveteam.org/index.php?title=YouTube this page says "Little is known about its database, but according to data from 2006, it was 45TB and doubling every 4 months. At this rate it would be 660 Petabytes (Oct 2014) by now." |
07:17
🔗
|
ivan |
I'm not immediately on board with getting all 2006-2009 YouTube, but if you sample the content and think it's more good than bad, maybe |
07:18
🔗
|
ivan |
2005-2009 |
07:20
🔗
|
ivan |
there's this which might help find the old stuff https://old.reddit.com/r/DataHoarder/comments/906884/youtube_metadata_archive_because_working_with/ |
07:21
🔗
|
RSY00O |
well the scope of 00s YT videos gets smaller every day. we would need to get it ASAP |
07:21
🔗
|
RSY00O |
especially before 2021. hopefully this whole Article 17 thing won't make the scope shrink considerably, but we'll have to see |
07:25
🔗
|
RSY00O |
if I were somehow able to successfully get all the videos (pretty hard stuff), I would need to put them on like 25 individual 16 TB hard drives to save them locally, which would cost around $12,000 |
07:25
🔗
|
RSY00O |
and that's not including backups |
07:26
🔗
|
RSY00O |
but the YTPs and Unregistered Hypercam 2 vids must be saved! |
07:26
🔗
|
ivan |
are you any good at writing software or data modeling |
07:27
🔗
|
RSY00O |
Nope. |
07:27
🔗
|
ivan |
too bad, I am really looking for someone to help with the YouTube I've got |
07:27
🔗
|
RSY00O |
I'm assuming I need a web crawling script to work with youtube-dl? |
07:28
🔗
|
RSY00O |
I really respect what you guys are doing btw, really great stuff. it'll take a few years before I can help that much with these painstaking efforts. |
07:29
🔗
|
ivan |
you'd have to scrape the upload playlists of a lot of channels and load them into a database and youtube-dl the pre-2010 stuff |
07:29
🔗
|
ivan |
just found https://github.com/simon987/yt-metadata via that reddit link |
07:30
🔗
|
ivan |
also you need hundreds of IPs to archive YouTube these days |
07:30
🔗
|
ivan |
you get about 500-1000 videos per day per IP |
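To put the 500-1000 videos per day per IP figure in perspective, here is a rough back-of-the-envelope sketch in Python. The target of 10 million videos is purely hypothetical (nobody in the log gives a real count), and the function name is made up for illustration.

```python
import math


def days_needed(total_videos, ips, videos_per_ip_per_day):
    """Whole days needed to fetch total_videos at a fixed per-IP daily rate."""
    per_day = ips * videos_per_ip_per_day
    return math.ceil(total_videos / per_day)


# Hypothetical target; the log does not say how many pre-2010 videos exist.
TARGET = 10_000_000

if __name__ == "__main__":
    for ips in (1, 100, 500):
        slow = days_needed(TARGET, ips, 500)    # low end of the quoted rate
        fast = days_needed(TARGET, ips, 1000)   # high end of the quoted rate
        print(f"{ips} IPs: {fast} to {slow} days")
```

Under these assumptions a single IP would take decades, which is why hundreds of IPs come up in the conversation.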
07:30
🔗
|
RSY00O |
well thanks. for now, with youtube-dl I can at the very least archive individual channels automatically, right? |
07:31
🔗
|
RSY00O |
idk how that works my computer has issues that need to be worked out, so I couldn't install it yet |
07:31
🔗
|
RSY00O |
so once I watch one video from that time, just grab the entire channel's videos (up to 2010) and add them to my archive |
07:32
🔗
|
RSY00O |
so I don't have to download each video and copy the metadata manually, which sucks |
07:52
🔗
|
ivan |
youtube-dl can grab channels yes |
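The approach ivan describes (collect upload metadata into a database, keep the pre-2010 entries, then hand them to youtube-dl) might be sketched roughly as below. This is only an illustration under stated assumptions: the table schema, the sample rows, and the helper names are all hypothetical, and the scraping step itself is left out. The youtube-dl options used (`--datebefore`, `--download-archive`, `--write-info-json`, `--ignore-errors`) are existing command-line flags.

```python
import sqlite3

CUTOFF = "20100101"  # YYYYMMDD; uploads strictly before this date are in scope


def build_db(rows):
    """Load (video_id, channel_id, upload_date) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE videos ("
        "video_id TEXT PRIMARY KEY, channel_id TEXT, upload_date TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO videos VALUES (?, ?, ?)", rows)
    return conn


def pre_cutoff_ids(conn, cutoff=CUTOFF):
    """Return IDs of videos uploaded before the cutoff, oldest first."""
    cur = conn.execute(
        "SELECT video_id FROM videos WHERE upload_date < ? "
        "ORDER BY upload_date, video_id",
        (cutoff,),
    )
    return [row[0] for row in cur]


def channel_command(channel_url, archive_file="downloaded.txt"):
    """Build a youtube-dl argument list for one channel, pre-2010 uploads only."""
    return [
        "youtube-dl",
        "--datebefore", "20091231",          # inclusive upper bound on upload date
        "--download-archive", archive_file,  # skip video IDs already fetched
        "--write-info-json",                 # save metadata next to each video
        "--ignore-errors",                   # continue past unavailable videos
        channel_url,
    ]


# Made-up sample rows, purely for illustration.
SAMPLE = [
    ("vidA", "chan1", "20070412"),
    ("vidB", "chan1", "20110305"),
    ("vidC", "chan2", "20091231"),
]

if __name__ == "__main__":
    conn = build_db(SAMPLE)
    print(pre_cutoff_ids(conn))
    print(" ".join(channel_command("https://www.youtube.com/channel/EXAMPLE")))
```

The argument list from `channel_command` could be passed to `subprocess.run` once youtube-dl is installed. Writing the info JSON alongside each video also avoids copying metadata by hand, which is what RSY00O wanted to avoid.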
07:59
🔗
|
|
deevious has joined #archiveteam-ot |
08:03
🔗
|
|
HP_Archiv has joined #archiveteam-ot |
08:03
🔗
|
HP_Archiv |
Hey, so I've got HexChat installed |
08:03
🔗
|
HP_Archiv |
I'm trying to connect to the EFnet but I can't for some reason |
08:03
🔗
|
HP_Archiv |
Any thoughts? |
08:04
🔗
|
HP_Archiv |
Never mind, disregard that. Got it. |
08:24
🔗
|
Raccoon |
At the rate YouTube deletes videos, that 385 TB should only be a few gigs today :p |
08:24
🔗
|
Raccoon |
(snark re RSY00O) |
08:28
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
09:09
🔗
|
|
HP_Archiv has quit IRC (Quit: Page closed) |
09:51
🔗
|
|
Jens has quit IRC (Remote host closed the connection) |
09:51
🔗
|
|
Jens has joined #archiveteam-ot |
11:34
🔗
|
ivan |
multilingual keyword spreadsheet that I made for youtube searching https://docs.google.com/spreadsheets/d/1fFqfhJjpZsCNuL9_uvRpwpe40onVoRE1RsflJsiOKKc/edit?usp=sharing |
11:35
🔗
|
ivan |
I guess the non-insane way to archive this stuff would be to have scripts hit search and look for high-view videos |
13:33
🔗
|
|
vitzli has joined #archiveteam-ot |
14:59
🔗
|
|
bluefoo has joined #archiveteam-ot |
15:03
🔗
|
|
akierig has joined #archiveteam-ot |
15:53
🔗
|
|
deevious has quit IRC (Remote host closed the connection) |
15:55
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
16:01
🔗
|
|
RSY00O has quit IRC (Ping timeout: 260 seconds) |
16:19
🔗
|
|
akierig has quit IRC (Quit: later_gator) |
16:28
🔗
|
|
SketchCow has quit IRC (Read error: Connection reset by peer) |
16:31
🔗
|
|
SketchCow has joined #archiveteam-ot |
16:31
🔗
|
|
Fusl__ sets mode: +o SketchCow |
16:31
🔗
|
|
Fusl sets mode: +o SketchCow |
16:31
🔗
|
|
Fusl_ sets mode: +o SketchCow |
16:37
🔗
|
|
Hani111 has joined #archiveteam-ot |
16:47
🔗
|
|
Hani has quit IRC (Ping timeout: 745 seconds) |
16:47
🔗
|
|
Hani111 is now known as Hani |
17:49
🔗
|
|
icedice has joined #archiveteam-ot |
17:54
🔗
|
|
iceloops1 has joined #archiveteam-ot |
17:55
🔗
|
|
prq has joined #archiveteam-ot |
18:08
🔗
|
|
akierig has joined #archiveteam-ot |
18:41
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
18:51
🔗
|
|
icedice has quit IRC (Ping timeout: 252 seconds) |
19:19
🔗
|
|
Hani111 has joined #archiveteam-ot |
19:23
🔗
|
|
Hani has quit IRC (Ping timeout: 745 seconds) |
19:23
🔗
|
|
Hani111 is now known as Hani |
19:46
🔗
|
|
icedice has joined #archiveteam-ot |
20:24
🔗
|
|
akierig has quit IRC (Read error: Operation timed out) |
20:59
🔗
|
|
odemg has quit IRC (Ping timeout: 745 seconds) |
21:00
🔗
|
|
odemg has joined #archiveteam-ot |
21:08
🔗
|
|
dhyan_nat has joined #archiveteam-ot |
21:35
🔗
|
|
dhyan_nat has quit IRC (Read error: Operation timed out) |
22:37
🔗
|
|
X-Scale` has joined #archiveteam-ot |
22:40
🔗
|
|
X-Scale has quit IRC (Read error: Operation timed out) |
22:40
🔗
|
|
X-Scale` is now known as X-Scale |
23:10
🔗
|
|
BlueMax has joined #archiveteam-ot |