[00:10] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[00:11] *** Stiletto has joined #archiveteam
[00:15] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[00:17] *** Muad-Dib has joined #archiveteam
[00:23] So at first this videobot will only support youtube
[00:23] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[00:23] *** Stiletto has joined #archiveteam
[00:23] It will save the youtube video together with all files that would be needed for playback
[00:23] it will also upload the youtube video as a video item to IA
[00:24] Now youtube-dl does an OK job of saving youtube videos and making them play back later
[00:24] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[00:24] *** Chorca has quit IRC (Read error: Operation timed out)
[00:25] But for some video sites, downloads made with youtube-dl won't have all the files saved that are needed for playback
[00:25] Other sites will be supported later
[00:25] Full account, playlist, etc. discovery for videos will be in too
[00:25] *** Stiletto has joined #archiveteam
[00:28] SketchCow: what do you think of such a project? see above
[00:29] *** Chorca has joined #archiveteam
[00:43] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[00:44] *** Stiletto has joined #archiveteam
[01:23] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[01:33] Videobot of what
[01:33] Everything?
[01:37] *** SN4T14 has joined #archiveteam
[01:39] I think he means an "on-demand channel archiver" - so you feed it a channel and it gets everything related to it
[01:52] *** Stiletto has joined #archiveteam
[01:53] *** toad2 has joined #archiveteam
[01:56] *** toad1 has quit IRC (Read error: Operation timed out)
[02:03] FOS load has gone WAY down.
[02:03] Hard drive usage is dropping notably
[02:10] *** vitzli has joined #archiveteam
[02:15] *** philpem has quit IRC (Ping timeout: 260 seconds)
[02:22] *** SirCmpwn has joined #archiveteam
[02:39] *** kisspunch has joined #archiveteam
[02:45] YouTube Red deletes videos if the channel owner isn't around to accept the new terms of service, I think
[02:45] So it might be worth having a way to archive things that are likely to disappear
[02:45] Certain YouTubers have died, for example
[02:45] Or just stopped using the site
[02:46] *** Frogging1 is now known as Frogging
[02:52] Weren't the new terms of service rolled out months ago, though? I remember wailing and arguments about it late last year. Hasn't the deadline come and gone?
[02:55] A quick Google search suggests the deadline for accepting the TOS was 22 October 2015.
[02:59] That being said... http://youtube.wikia.com/wiki/Deceased_YouTubers
[03:10] they might only become inaccessible in the US though? since youtube red isn't available elsewhere
[03:19] doesn't that mean....heh
[03:26] *** altlabel has quit IRC (hub.dk irc.homelien.no)
[03:26] *** i0npulse has quit IRC (hub.dk irc.homelien.no)
[03:26] *** PotcFdk has quit IRC (hub.dk irc.homelien.no)
[03:26] *** limebyte has quit IRC (hub.dk irc.homelien.no)
[03:26] *** coretx has quit IRC (hub.dk irc.homelien.no)
[03:26] *** tobbez has quit IRC (hub.dk irc.homelien.no)
[03:26] *** pikhq has quit IRC (hub.dk irc.homelien.no)
[03:26] *** Ymgve has quit IRC (hub.dk irc.homelien.no)
[03:26] *** PurpleSym has quit IRC (hub.dk irc.homelien.no)
[03:26] *** mafrasi2 has quit IRC (hub.dk irc.homelien.no)
[03:26] *** Meeh has quit IRC (hub.dk irc.homelien.no)
[03:26] *** sHATNER has quit IRC (hub.dk irc.homelien.no)
[03:59] *** vOYtEC_ has joined #archiveteam
[04:02] *** achip has quit IRC (hub.efnet.us irc.Prison.NET)
[04:26] *** Stiletto has quit IRC (Remote host closed the connection)
[04:26] *** Stiletto has joined #archiveteam
[04:30] *** Stiletto has quit IRC (Remote host closed the connection)
[04:31] *** Stiletto has joined #archiveteam
[04:31] *** achip has joined #archiveteam
[04:38] *** Chorca has quit IRC (Ping timeout: 252 seconds)
[04:40] *** SketchCow sets mode: +b *!*kyan@184.75.223.*
[04:40] *** kyan was kicked by SketchCow (kyan)
[04:40] *** Chorca has joined #archiveteam
[04:43] *** Froggypwn has joined #archiveteam
[05:14] *** Swizzle has quit IRC (Read error: Operation timed out)
[05:38] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:45] *** Sk1d has joined #archiveteam
[06:09] *** oldcad has quit IRC (Quit: Leaving.)
[06:20] *** db48x has joined #archiveteam
[06:26] *** WinterFox has joined #archiveteam
[06:26] *** sHATNER has joined #archiveteam
[06:26] *** i0npulse has joined #archiveteam
[06:26] *** mafrasi2_ has joined #archiveteam
[06:26] *** altlabel has joined #archiveteam
[06:26] *** PotcFdk has joined #archiveteam
[06:26] *** limebyte has joined #archiveteam
[06:26] *** coretx has joined #archiveteam
[06:26] *** tobbez has joined #archiveteam
[06:26] *** pikhq has joined #archiveteam
[06:26] *** Ymgve has joined #archiveteam
[06:26] *** PurpleSym has joined #archiveteam
[06:26] *** Meeh has joined #archiveteam
[06:26] *** irc.homelien.no sets mode: +o PurpleSym
[07:13] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[07:15] *** jut has joined #archiveteam
[08:47] *** signius has quit IRC (Ping timeout: 300 seconds)
[08:49] *** antomatic has quit IRC (Read error: Connection reset by peer)
[08:50] *** antomatic has joined #archiveteam
[09:00] *** signius has joined #archiveteam
[09:28] *** schbirid has joined #archiveteam
[09:47] SketchCow: The videobot should support a lot of video, audio, etc. services
[09:48] Basically, if some youtube, vine, or other service account is going away for whatever reason, then this videobot will grab all videos from that account
[09:49] It then uploads the videos to IA as video items (like the videos from the youtubearchive collection that was darked) and as WARC items.
[09:50] Single videos will also be supported. For example, in case of protests or new terrorist attacks.
[10:08] *** xekc has joined #archiveteam
[10:25] *** lytv has quit IRC (Read error: Operation timed out)
[10:26] *** lytv has joined #archiveteam
[11:01] *** xekc has quit IRC (Ping timeout: 250 seconds)
[11:11] *** VADemon has joined #archiveteam
[11:16] *** Swizzle has joined #archiveteam
[11:33] *** Swizzle has quit IRC (Read error: Operation timed out)
[11:43] *** i0npulse has quit IRC (leaving)
[11:47] *** i0npulse has joined #archiveteam
[11:54] *** WinterFox has quit IRC (Remote host closed the connection)
[12:00] *** megaminxw has joined #archiveteam
[12:26] *** arkiver3 has joined #archiveteam
[12:44] *** Rickster has quit IRC (Ping timeout: 260 seconds)
[12:44] *** marvinw has quit IRC (Ping timeout: 260 seconds)
[12:46] *** Kenshin has quit IRC (Read error: Connection reset by peer)
[12:46] *** Kenshin has joined #archiveteam
[12:46] *** Famicoman has quit IRC (Ping timeout: 260 seconds)
[12:47] *** goekesmi has quit IRC (Ping timeout: 260 seconds)
[12:47] *** goekesmi has joined #archiveteam
[12:55] *** Rickster has joined #archiveteam
[13:00] *** marvinw has joined #archiveteam
[13:12] *** megaminxw has quit IRC (Quit: Leaving.)
[13:34] *** VADemon has quit IRC (Read error: Operation timed out)
[13:36] *** Famicoman has joined #archiveteam
[13:47] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[13:51] *** arkiver3 has joined #archiveteam
[13:52] Nice
[14:22] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[14:24] *** Zei-Pii has joined #archiveteam
[14:31] *** plog99 has joined #archiveteam
[14:34] *** fpoee has quit IRC (Ping timeout: 360 seconds)
[14:41] *** vegbrasil has quit IRC (*)
[14:41] *** vegbrasil has joined #archiveteam
[14:43] *** scyther has joined #archiveteam
[14:49] *** Boltsie__ has joined #archiveteam
[14:50] *** Boltsie__ is now known as Boltsie
[14:55] *** VADemon has joined #archiveteam
[14:57] *** arkiver3 has joined #archiveteam
[15:17] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[15:29] *** RichardG has quit IRC (Read error: Operation timed out)
[15:48] *** GLaDOS has quit IRC (Read error: Operation timed out)
[15:49] *** ndiddy has joined #archiveteam
[15:56] *** RichardG has joined #archiveteam
[16:12] *** scyther has quit IRC (Read error: Connection reset by peer)
[16:14] *** GLaDOS has joined #archiveteam
[16:23] *** VADemon has quit IRC (Quit: left4dead)
[16:24] Hey, I just wanted to announce that I began rewriting my old broken YouTube channel/playlist mirror script that helps maintain a local mirror of channels. It handles video title changes and collisions while providing a handy way of keeping an up-to-date mirror, including a directory of video-title symlinks that point at video-id files. Full explanation and example workflow in README.md - maybe somebody here is interested in such a thing, too.
[16:24] https://github.com/PotcFdk/youtube-sync (Note: This is WIP. It works, but I wouldn't consider this stable yet.)
[16:26] arkiver ^
[16:55] Can whoever is running newsbuddy again please stop....
[16:57] The IRC bot is broken, but it's actually working
[17:04] FOS continues to heal
[17:31] *** espes__ has quit IRC (Read error: Operation timed out)
[17:43] *** scyther has joined #archiveteam
[17:51] *** philpem has joined #archiveteam
[18:02] *** vitzli has quit IRC (Leaving)
[18:04] *** mafrasi2_ has quit IRC (Read error: Connection reset by peer)
[18:06] *** Swizzle has joined #archiveteam
[18:07] *** i0npulse has quit IRC (hub.dk irc.homelien.no)
[18:07] *** sHATNER has quit IRC (hub.dk irc.homelien.no)
[18:07] *** altlabel has quit IRC (hub.dk irc.homelien.no)
[18:07] *** PotcFdk has quit IRC (hub.dk irc.homelien.no)
[18:07] *** limebyte has quit IRC (hub.dk irc.homelien.no)
[18:07] *** coretx has quit IRC (hub.dk irc.homelien.no)
[18:07] *** tobbez has quit IRC (hub.dk irc.homelien.no)
[18:07] *** pikhq has quit IRC (hub.dk irc.homelien.no)
[18:07] *** Ymgve has quit IRC (hub.dk irc.homelien.no)
[18:07] *** PurpleSym has quit IRC (hub.dk irc.homelien.no)
[18:07] *** Meeh has quit IRC (hub.dk irc.homelien.no)
[18:29] *** Tomcat_ has joined #archiveteam
[18:34] My heritrix crawl of Al Jazeera America is about 170,000 pages (18GB) so-far. Should I continue even if archive.org is already archiving it in the Wayback Machine?
[18:35] why not
[18:35] duplicates of some pages wouldn't be too bad
[18:36] If you can afford the space and BW, sure. All al-Jazeera have to do is add one line to their robots.txt to make everything in the Wayback Machine unavailable, after all...
[18:36] if it's unavailable it's still saved
[18:38] *** tobbez has joined #archiveteam
[18:38] *** i0npulse has joined #archiveteam
[18:38] *** PurpleSym has joined #archiveteam
[18:38] *** mafrasi2 has joined #archiveteam
[18:38] *** sHATNER has joined #archiveteam
[18:38] *** altlabel has joined #archiveteam
[18:38] *** PotcFdk has joined #archiveteam
[18:38] *** limebyte has joined #archiveteam
[18:38] *** coretx has joined #archiveteam
[18:38] *** pikhq has joined #archiveteam
[18:38] *** Ymgve has joined #archiveteam
[18:38] *** Meeh has joined #archiveteam
[18:43] True, but it doesn't hurt to have a second copy, just in case. It's less than a Blu-Ray of data, after all...
[18:44] snape: so-far.
[18:48] PotcFdk: Nice! I'll try that out for my personal datahoarding.
[18:49] zino: I'm happy that it appears to be useful to people other than just me
[18:51] Now I only need something similar for Twitch, since they automatically throw away all old content that has not been featured.
[18:51] zino, arkiver - is twitch something the videobot could tackle as a repeat thing?
[18:52] sure
[18:52] Hmm, I'm going to make a version too which can be run at home for personal archives
[18:53] That would be very nice.
[18:53] with the option to create a WARC, only grab the video/audio files, or do both
[18:54] That would be amazing.
[18:54] swebb, is that 170,000 pages, or pages/images/scripts/everything else?
[18:55] Oh, everything.
[18:55] urls
[18:55] 80k html pages
[18:56] zino: Feel free to spam issues in case everything breaks horribly
[18:56] *** wyatt8740 has joined #archiveteam
[18:58] PotcFdk: Will do. Probably not until the weekend though. I'm rebuilding my home racks and several of my storage servers are currently residing on my living room table. :)
[19:02] swebb, I have to imagine you're pretty close to done. Even with all the topic pages and everything, that'd be something above 70 pages/day over their three-year run. I wouldn't think it'd be much above a hundred, but I could easily be wrong...
[19:04] *** metalcamp has joined #archiveteam
[19:06] Google claims to know of only "about 36,000" pages, FWIW. O.o
[19:15] Update your scripts for gametrailers!
[19:15] Last round of items
[19:15] All 10videos items have been converted to single video items
[19:15] well, all 10videos items that were out
[19:16] Boston-specific startup dunwello.com is closing down in the next few weeks, probably; not even the wacky head of the company really seems to know for sure. http://bostinno.streetwise.co/2016/02/15/dunwello-is-shutting-down-matt-lauzon-says/
[19:21] arkiver: You know, youtube-dl supports a ton of video sites and downloading whole profiles on some of them (including YouTube)
[19:21] https://rg3.github.io/youtube-dl/supportedsites.html
[19:22] yes, though youtube-dl doesn't work well for all video websites when it comes to creating a WARC that can be played back somewhere in the future
[19:22] It sometimes doesn't grab all the files needed for playback
[19:22] However, youtube-dl works fine for youtube when it comes to that
[19:22] Soundcloud is supported too, it seems
[19:23] WARC?
[19:24] I'd google but I'm on mobile
[19:24] WebARChive file
[19:24] contains all the headers too, besides the files
[19:24] What is that kind of file used for?
[19:24] well, web archives
[19:25] pretty much for every project we do
[19:25] and the wayback machine only works with that
[19:25] but let's move this over to #archiveteam-bs
[19:29] kk
[19:36] *** RichardG has quit IRC (Read error: Operation timed out)
[19:44] *** GLaDOS has quit IRC (Ping timeout: 260 seconds)
[20:30] *** metalcamp has quit IRC (Ping timeout: 492 seconds)
[20:30] *** espes__ has joined #archiveteam
[20:58] *** Zei-Pii has quit IRC (Ping timeout: 250 seconds)
[21:09] *** Tomcat_ has quit IRC (Remote host closed the connection)
[21:25] *** jut has quit IRC (Read error: Connection reset by peer)
[21:29] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[21:34] *** RichardG has joined #archiveteam
[21:36] *** schbirid has quit IRC (Quit: Leaving)
[21:46] *** RichardG has quit IRC (Ping timeout: 633 seconds)
[21:50] *** megaminxw has joined #archiveteam
[22:00] *** megaminxw has quit IRC (Quit: Leaving.)
[22:06] *** scyther has quit IRC (Quit: Leaving)
[22:16] *** RichardG has joined #archiveteam
[22:23] *** RichardG has quit IRC (Ping timeout: 360 seconds)
[22:31] *** Atom__ has quit IRC (Ping timeout: 252 seconds)
[22:32] *** Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
[22:35] *** superkuh has quit IRC (Ping timeout: 252 seconds)
[22:37] *** Lord_Nigh has joined #archiveteam
[22:39] *** superkuh has joined #archiveteam
[23:28] *** mismatch has quit IRC (Remote host closed the connection)
[23:28] *** mismatch has joined #archiveteam