Time |
Nickname |
Message |
00:10
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
00:11
🔗
|
|
Stiletto has joined #archiveteam |
00:15
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 260 seconds) |
00:17
🔗
|
|
Muad-Dib has joined #archiveteam |
00:23
🔗
|
arkiver |
So at first this videobot will only support youtube |
00:23
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
00:23
🔗
|
|
Stiletto has joined #archiveteam |
00:23
🔗
|
arkiver |
It will save the youtube video together with all files that would be needed for playback |
00:23
🔗
|
arkiver |
it will also upload the youtube video as a video item to IA |
00:24
🔗
|
arkiver |
Now youtube-dl does an ok job of saving youtube videos and making them play back later |
00:24
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
00:24
🔗
|
|
Chorca has quit IRC (Read error: Operation timed out) |
00:25
🔗
|
arkiver |
But some video sites downloaded with youtube-dl won't have all files saved that are needed for playback |
00:25
🔗
|
arkiver |
Other sites will be supported later |
00:25
🔗
|
arkiver |
Full account, playlist, etc. discovery for videos will be in too |
00:25
🔗
|
|
Stiletto has joined #archiveteam |
00:28
🔗
|
arkiver |
SketchCow: what do you think of such a project? see above |
00:29
🔗
|
|
Chorca has joined #archiveteam |
00:43
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
00:44
🔗
|
|
Stiletto has joined #archiveteam |
01:23
🔗
|
|
Stiletto has quit IRC (Ping timeout: 246 seconds) |
01:33
🔗
|
SketchCow |
Videobot of what |
01:33
🔗
|
SketchCow |
Everything? |
01:37
🔗
|
|
SN4T14 has joined #archiveteam |
01:39
🔗
|
HCross |
I think he means an "on demand channel archiver" - so you feed it a channel and it gets everything related to it |
01:52
🔗
|
|
Stiletto has joined #archiveteam |
01:53
🔗
|
|
toad2 has joined #archiveteam |
01:56
🔗
|
|
toad1 has quit IRC (Read error: Operation timed out) |
02:03
🔗
|
SketchCow |
FOS load has gone WAY down. |
02:03
🔗
|
SketchCow |
Hard drive usage is dropping notably |
02:10
🔗
|
|
vitzli has joined #archiveteam |
02:15
🔗
|
|
philpem has quit IRC (Ping timeout: 260 seconds) |
02:22
🔗
|
|
SirCmpwn has joined #archiveteam |
02:39
🔗
|
|
kisspunch has joined #archiveteam |
02:45
🔗
|
Frogging1 |
YouTube Red deletes videos if the channel owner isn't around to accept the new terms of service I think |
02:45
🔗
|
Frogging1 |
So it might be worth having a way to archive things that are likely to disappear |
02:45
🔗
|
Frogging1 |
Certain YouTubers have died for example |
02:45
🔗
|
Frogging1 |
Or just stopped using the site |
02:46
🔗
|
|
Frogging1 is now known as Frogging |
02:52
🔗
|
snape_ |
Weren't the new terms of service rolled out months ago, though? I remember wailing and arguments about it late last year. Hasn't the deadline come and gone? |
02:55
🔗
|
snape_ |
A quick Google search suggests the deadline for accepting the TOS was 22 October 2015. |
02:59
🔗
|
snape_ |
That being said... http://youtube.wikia.com/wiki/Deceased_YouTubers |
03:10
🔗
|
trs80 |
they might only become inaccessible in the US though? since youtube red isn't available elsewhere |
03:19
🔗
|
bai |
doesn't that mean....heh |
03:26
🔗
|
|
altlabel has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
i0npulse has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
PotcFdk has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
limebyte has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
coretx has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
tobbez has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
pikhq has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
Ymgve has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
PurpleSym has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
mafrasi2 has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
Meeh has quit IRC (hub.dk irc.homelien.no) |
03:26
🔗
|
|
sHATNER has quit IRC (hub.dk irc.homelien.no) |
03:59
🔗
|
|
vOYtEC_ has joined #archiveteam |
04:02
🔗
|
|
achip has quit IRC (hub.efnet.us irc.Prison.NET) |
04:26
🔗
|
|
Stiletto has quit IRC (Remote host closed the connection) |
04:26
🔗
|
|
Stiletto has joined #archiveteam |
04:30
🔗
|
|
Stiletto has quit IRC (Remote host closed the connection) |
04:31
🔗
|
|
Stiletto has joined #archiveteam |
04:31
🔗
|
|
achip has joined #archiveteam |
04:38
🔗
|
|
Chorca has quit IRC (Ping timeout: 252 seconds) |
04:40
🔗
|
|
SketchCow sets mode: +b *!*kyan@184.75.223.* |
04:40
🔗
|
|
kyan was kicked by SketchCow (kyan) |
04:40
🔗
|
|
Chorca has joined #archiveteam |
04:43
🔗
|
|
Froggypwn has joined #archiveteam |
05:14
🔗
|
|
Swizzle has quit IRC (Read error: Operation timed out) |
05:38
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:45
🔗
|
|
Sk1d has joined #archiveteam |
06:09
🔗
|
|
oldcad has quit IRC (Quit: Leaving.) |
06:20
🔗
|
|
db48x has joined #archiveteam |
06:26
🔗
|
|
WinterFox has joined #archiveteam |
06:26
🔗
|
|
sHATNER has joined #archiveteam |
06:26
🔗
|
|
i0npulse has joined #archiveteam |
06:26
🔗
|
|
mafrasi2_ has joined #archiveteam |
06:26
🔗
|
|
altlabel has joined #archiveteam |
06:26
🔗
|
|
PotcFdk has joined #archiveteam |
06:26
🔗
|
|
limebyte has joined #archiveteam |
06:26
🔗
|
|
coretx has joined #archiveteam |
06:26
🔗
|
|
tobbez has joined #archiveteam |
06:26
🔗
|
|
pikhq has joined #archiveteam |
06:26
🔗
|
|
Ymgve has joined #archiveteam |
06:26
🔗
|
|
PurpleSym has joined #archiveteam |
06:26
🔗
|
|
Meeh has joined #archiveteam |
06:26
🔗
|
|
irc.homelien.no sets mode: +o PurpleSym |
07:13
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
07:15
🔗
|
|
jut has joined #archiveteam |
08:47
🔗
|
|
signius has quit IRC (Ping timeout: 300 seconds) |
08:49
🔗
|
|
antomatic has quit IRC (Read error: Connection reset by peer) |
08:50
🔗
|
|
antomatic has joined #archiveteam |
09:00
🔗
|
|
signius has joined #archiveteam |
09:28
🔗
|
|
schbirid has joined #archiveteam |
09:47
🔗
|
arkiver |
SketchCow: The videobot should support a lot of video, audio, etc. services |
09:48
🔗
|
arkiver |
Basically, if some youtube, vine, or other service account is going away for whatever reason, then this videobot will grab all videos from that account |
09:49
🔗
|
arkiver |
It then uploads the videos to IA as video items (like the videos from the youtubearchive collection that was darked) and as WARC items. |
09:50
🔗
|
arkiver |
Single videos will also be supported. For example in case of protests or new terrorist attacks. |
10:08
🔗
|
|
xekc has joined #archiveteam |
10:25
🔗
|
|
lytv has quit IRC (Read error: Operation timed out) |
10:26
🔗
|
|
lytv has joined #archiveteam |
11:01
🔗
|
|
xekc has quit IRC (Ping timeout: 250 seconds) |
11:11
🔗
|
|
VADemon has joined #archiveteam |
11:16
🔗
|
|
Swizzle has joined #archiveteam |
11:33
🔗
|
|
Swizzle has quit IRC (Read error: Operation timed out) |
11:43
🔗
|
|
i0npulse has quit IRC (leaving) |
11:47
🔗
|
|
i0npulse has joined #archiveteam |
11:54
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
12:00
🔗
|
|
megaminxw has joined #archiveteam |
12:26
🔗
|
|
arkiver3 has joined #archiveteam |
12:44
🔗
|
|
Rickster has quit IRC (Ping timeout: 260 seconds) |
12:44
🔗
|
|
marvinw has quit IRC (Ping timeout: 260 seconds) |
12:46
🔗
|
|
Kenshin has quit IRC (Read error: Connection reset by peer) |
12:46
🔗
|
|
Kenshin has joined #archiveteam |
12:46
🔗
|
|
Famicoman has quit IRC (Ping timeout: 260 seconds) |
12:47
🔗
|
|
goekesmi has quit IRC (Ping timeout: 260 seconds) |
12:47
🔗
|
|
goekesmi has joined #archiveteam |
12:55
🔗
|
|
Rickster has joined #archiveteam |
13:00
🔗
|
|
marvinw has joined #archiveteam |
13:12
🔗
|
|
megaminxw has quit IRC (Quit: Leaving.) |
13:34
🔗
|
|
VADemon has quit IRC (Read error: Operation timed out) |
13:36
🔗
|
|
Famicoman has joined #archiveteam |
13:47
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
13:51
🔗
|
|
arkiver3 has joined #archiveteam |
13:52
🔗
|
SmileyG |
Nice |
14:22
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
14:24
🔗
|
|
Zei-Pii has joined #archiveteam |
14:31
🔗
|
|
plog99 has joined #archiveteam |
14:34
🔗
|
|
fpoee has quit IRC (Ping timeout: 360 seconds) |
14:41
🔗
|
|
vegbrasil has quit IRC (*) |
14:41
🔗
|
|
vegbrasil has joined #archiveteam |
14:43
🔗
|
|
scyther has joined #archiveteam |
14:49
🔗
|
|
Boltsie__ has joined #archiveteam |
14:50
🔗
|
|
Boltsie__ is now known as Boltsie |
14:55
🔗
|
|
VADemon has joined #archiveteam |
14:57
🔗
|
|
arkiver3 has joined #archiveteam |
15:17
🔗
|
|
arkiver3 has quit IRC (Ping timeout: 252 seconds) |
15:29
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
15:48
🔗
|
|
GLaDOS has quit IRC (Read error: Operation timed out) |
15:49
🔗
|
|
ndiddy has joined #archiveteam |
15:56
🔗
|
|
RichardG has joined #archiveteam |
16:12
🔗
|
|
scyther has quit IRC (Read error: Connection reset by peer) |
16:14
🔗
|
|
GLaDOS has joined #archiveteam |
16:23
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
16:24
🔗
|
PotcFdk |
Hey, I just wanted to announce that I began rewriting my old broken YouTube channel/playlist mirror script that helps maintaining a local mirror of channels. It handles video title changes and collisions while providing a handy way of keeping an up-to-date mirror, including a directory of video-title symlinks that point at video-id files. Full explanation and example workflow in README.md - maybe somebody here is interested in such a thing, too. |
16:24
🔗
|
PotcFdk |
https://github.com/PotcFdk/youtube-sync (Note: This is WIP. It works, but I wouldn't consider this stable yet.) |
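The title-symlink layout PotcFdk describes can be sketched roughly as follows. This is not code from youtube-sync: the function name `sync_title_links`, the `.mp4` extension, and the `titles` mapping are assumptions made for illustration. The idea is that the video files are named by video id, and a separate directory of title-named symlinks is rebuilt on each run so that title changes and title collisions do not disturb the stored files.

```python
import os

def sync_title_links(video_dir, link_dir, titles):
    """Rebuild link_dir so each title-named symlink points at its video-id file.

    titles maps video id -> current title (hypothetical input; a real tool
    would read this from the downloaded metadata).
    """
    os.makedirs(link_dir, exist_ok=True)
    # Drop old links first so a renamed title does not leave a stale entry.
    for name in os.listdir(link_dir):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            os.unlink(path)
    used = set()
    for video_id in sorted(titles):
        target = os.path.join(video_dir, video_id + ".mp4")
        if not os.path.exists(target):
            continue  # video not downloaded yet
        safe = titles[video_id].replace(os.sep, "_")
        name = safe + ".mp4"
        if name in used:
            # Two videos share a title: disambiguate with the video id.
            name = f"{safe} [{video_id}].mp4"
        used.add(name)
        # Relative link, so the mirror can be moved as a whole.
        os.symlink(os.path.relpath(target, link_dir), os.path.join(link_dir, name))
    return sorted(used)
```

Because only the symlinks are rebuilt, a title change costs nothing more than one unlink and one new link; the id-named file itself is never touched.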
16:26
🔗
|
Fletcher |
arkiver ^ |
16:55
🔗
|
HCross |
Can whoever is running newsbuddy again stop, please.... |
16:57
🔗
|
HCross |
The IRC bot is broken, but it's actually working |
17:04
🔗
|
SketchCow |
FOS continues to heal |
17:31
🔗
|
|
espes__ has quit IRC (Read error: Operation timed out) |
17:43
🔗
|
|
scyther has joined #archiveteam |
17:51
🔗
|
|
philpem has joined #archiveteam |
18:02
🔗
|
|
vitzli has quit IRC (Leaving) |
18:04
🔗
|
|
mafrasi2_ has quit IRC (Read error: Connection reset by peer) |
18:06
🔗
|
|
Swizzle has joined #archiveteam |
18:07
🔗
|
|
i0npulse has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
sHATNER has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
altlabel has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
PotcFdk has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
limebyte has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
coretx has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
tobbez has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
pikhq has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
Ymgve has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
PurpleSym has quit IRC (hub.dk irc.homelien.no) |
18:07
🔗
|
|
Meeh has quit IRC (hub.dk irc.homelien.no) |
18:29
🔗
|
|
Tomcat_ has joined #archiveteam |
18:34
🔗
|
swebb |
My heritrix crawl of Al Jazeera America is about 170,000 pages (18GB) so far. Should I continue even if archive.org is already archiving it in the Wayback Machine? |
18:35
🔗
|
arkiver |
why not |
18:35
🔗
|
arkiver |
duplicates of some pages wouldn't be too bad |
18:36
🔗
|
snape_ |
If you can afford the space and BW, sure. All al-Jazeera have to do is add one line to their robots.txt to make everything in the Wayback Machine unavailable, after all... |
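As a rough illustration of the robots.txt point, the sketch below uses Python's standard `urllib.robotparser` to show how a blanket disallow rule reads to a crawler. The user-agent token `ia_archiver` and the example.com URL are assumptions for the example, not details taken from the log, and the rule is written here as a user-agent line plus a disallow line.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "ia_archiver" is the user-agent token the rule is aimed at.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given user agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(allowed("ia_archiver", "http://example.com/news/story.html"))
print(allowed("SomeOtherBot", "http://example.com/news/story.html"))
```

The first call reports that the named agent is blocked from every path, while an agent with no matching rule is still allowed.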
18:36
🔗
|
arkiver |
if it's unavailable it's still saved |
18:38
🔗
|
|
tobbez has joined #archiveteam |
18:38
🔗
|
|
i0npulse has joined #archiveteam |
18:38
🔗
|
|
PurpleSym has joined #archiveteam |
18:38
🔗
|
|
mafrasi2 has joined #archiveteam |
18:38
🔗
|
|
sHATNER has joined #archiveteam |
18:38
🔗
|
|
altlabel has joined #archiveteam |
18:38
🔗
|
|
PotcFdk has joined #archiveteam |
18:38
🔗
|
|
limebyte has joined #archiveteam |
18:38
🔗
|
|
coretx has joined #archiveteam |
18:38
🔗
|
|
pikhq has joined #archiveteam |
18:38
🔗
|
|
Ymgve has joined #archiveteam |
18:38
🔗
|
|
Meeh has joined #archiveteam |
18:43
🔗
|
snape_ |
True, but it doesn't hurt to have a second copy, just in case. It's less than a Blu-Ray of data, after all... |
18:44
🔗
|
swebb |
snape: so far. |
18:48
🔗
|
zino |
PotcFdk: Nice! I'll try that out for my personal datahoarding. |
18:49
🔗
|
PotcFdk |
zino: I'm happy that it appears to be useful to people other than just me |
18:51
🔗
|
zino |
Now I only need something similar for Twitch since they automatically throw away all old content that has not been featured. |
18:51
🔗
|
HCross |
zino, arkiver - is twitch something the videobot could tackle as a repeat thing? |
18:52
🔗
|
arkiver |
sure |
18:52
🔗
|
arkiver |
Hmm, I'm going to make a version too which can be run at home for personal archives |
18:53
🔗
|
zino |
That would be very nice. |
18:53
🔗
|
arkiver |
with the option to create WARC, only grab video/audio file or do both |
18:54
🔗
|
jut |
That would be amazing. |
18:54
🔗
|
snape_ |
swebb, is that 170,000 pages, or pages/images/scripts/everything else? |
18:55
🔗
|
swebb |
Oh, everything. |
18:55
🔗
|
swebb |
urls |
18:55
🔗
|
swebb |
80k html pages |
18:56
🔗
|
PotcFdk |
zino: Feel free to spam issues in case everything breaks horribly |
18:56
🔗
|
|
wyatt8740 has joined #archiveteam |
18:58
🔗
|
zino |
PotcFdk: Will do. Probably not until the weekend though. I'm rebuilding my home racks and several of my storage servers are currently residing on my living room table. :) |
19:02
🔗
|
snape_ |
swebb, I have to imagine you're pretty close to done. Even with all the topic pages and everything, that'd be something above 70 pages/day over their three-year run. I wouldn't think it'd be much above a hundred, but I could easily be wrong... |
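The per-day figure can be checked with a couple of lines, assuming it is based on swebb's 80k HTML pages and a run of three 365-day years (both taken from the messages above; the exact run length is an approximation).

```python
# Figures from the log: about 80,000 HTML pages over a roughly three-year run.
HTML_PAGES = 80_000
RUN_DAYS = 3 * 365  # assumption: three full 365-day years

pages_per_day = HTML_PAGES / RUN_DAYS
print(f"{pages_per_day:.1f} pages/day")
```

That comes out a little above 73 pages per day, which matches the "something above 70" estimate.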
19:04
🔗
|
|
metalcamp has joined #archiveteam |
19:06
🔗
|
snape_ |
Google claims to know of only "about 36,000" pages, FWIW. O.o |
19:15
🔗
|
arkiver |
Update your scripts for gametrailers! |
19:15
🔗
|
arkiver |
Last round of items |
19:15
🔗
|
arkiver |
All 10videos items have been converted to single video items |
19:15
🔗
|
arkiver |
well, all 10videos items that were out |
19:16
🔗
|
snape_ |
Boston-specific startup dunwello.com is closing down in the next few weeks, probably. Not even the wacky head of the company really seems to know for sure. http://bostinno.streetwise.co/2016/02/15/dunwello-is-shutting-down-matt-lauzon-says/ |
19:21
🔗
|
Frogging |
arkiver: You know, youtube-dl supports a ton of video sites and downloading whole profiles on some of them (including YouTube) |
19:21
🔗
|
Frogging |
https://rg3.github.io/youtube-dl/supportedsites.html |
19:22
🔗
|
arkiver |
yes, though youtube-dl does not work well for all video websites when it comes to creating a WARC that can be played back somewhere in the future |
19:22
🔗
|
arkiver |
It sometimes doesn't grab all files needed for a playback |
19:22
🔗
|
arkiver |
However, youtube-dl is working fine for youtube when it comes to that |
19:22
🔗
|
Frogging |
Soundcloud is supported too it seems |
19:23
🔗
|
Frogging |
WARC? |
19:24
🔗
|
Frogging |
I'd google but I'm on mobile |
19:24
🔗
|
arkiver |
WebARChive file |
19:24
🔗
|
arkiver |
contains all headers too besides the files |
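To make "contains all headers too" concrete, here is a minimal sketch of a single WARC 1.0 response record built with the standard library only. The helper name `warc_response_record` is hypothetical, and real projects would normally use an established WARC writer rather than hand-rolled code like this. The point is that the raw HTTP response, status line and headers included, is stored as the record body underneath a block of WARC headers.

```python
import uuid
from datetime import datetime, timezone

def warc_response_record(url, http_bytes):
    """Wrap a raw HTTP response (status line, headers, body) in one WARC record."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {url}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_bytes)}",
    ]
    # WARC headers, a blank line, the untouched HTTP bytes, then two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + http_bytes + b"\r\n\r\n"

payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhi"
record = warc_response_record("http://example.com/", payload)
```

Keeping the original HTTP headers in the record is what lets a replay tool serve the response later the same way the server originally did.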
19:24
🔗
|
Frogging |
What is that kind of file used for? |
19:24
🔗
|
arkiver |
well, web archives |
19:25
🔗
|
arkiver |
pretty much for every project we do |
19:25
🔗
|
arkiver |
and the wayback machine only works with that |
19:25
🔗
|
arkiver |
but let's move this over to #archiveteam-bs |
19:29
🔗
|
Frogging |
kk |
19:36
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
19:44
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 260 seconds) |
20:30
🔗
|
|
metalcamp has quit IRC (Ping timeout: 492 seconds) |
20:30
🔗
|
|
espes__ has joined #archiveteam |
20:58
🔗
|
|
Zei-Pii has quit IRC (Ping timeout: 250 seconds) |
21:09
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
21:25
🔗
|
|
jut has quit IRC (Read error: Connection reset by peer) |
21:29
🔗
|
|
wyatt8740 has quit IRC (Read error: Operation timed out) |
21:34
🔗
|
|
RichardG has joined #archiveteam |
21:36
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:46
🔗
|
|
RichardG has quit IRC (Ping timeout: 633 seconds) |
21:50
🔗
|
|
megaminxw has joined #archiveteam |
22:00
🔗
|
|
megaminxw has quit IRC (Quit: Leaving.) |
22:06
🔗
|
|
scyther has quit IRC (Quit: Leaving) |
22:16
🔗
|
|
RichardG has joined #archiveteam |
22:23
🔗
|
|
RichardG has quit IRC (Ping timeout: 360 seconds) |
22:31
🔗
|
|
Atom__ has quit IRC (Ping timeout: 252 seconds) |
22:32
🔗
|
|
Lord_Nigh has quit IRC (Ping timeout: 252 seconds) |
22:35
🔗
|
|
superkuh has quit IRC (Ping timeout: 252 seconds) |
22:37
🔗
|
|
Lord_Nigh has joined #archiveteam |
22:39
🔗
|
|
superkuh has joined #archiveteam |
23:28
🔗
|
|
mismatch has quit IRC (Remote host closed the connection) |
23:28
🔗
|
|
mismatch has joined #archiveteam |