[00:14] (nods)
[00:15] I've started backing up/mirroring various YouTube channels (and my own favourites list) because you just can't rely on anything staying up for any significant time any more
[00:27] *** JesseW has joined #archiveteam-bs
[00:46] DFJustin, Microgur1: I'll add this to the wiki, but here's a one liner for getting youtube (needs warcprox running on localhost:8000) youtube-dl --title --continue --retries 100 --write-info-json --write-description --write-thumbnail --proxy="localhost:8000" --write-annotations --all-subs --no-check-certificate --ignore-errors -k -f bestvideo+bestaudio/best
[01:07] *** JetBalsa has quit IRC (Quit: - nbs-irc 2.39 - www.nbs-irc.net -)
[01:08] PurpleSym: much better metadata for the yahoo grab, thank you
[01:09] number of messages per group would be nice, and earliest and latest date of messages (although that would be more of a hassle to get).
[01:40] i
[01:40] i
[01:40] i'm going to be uploading more mbc newsdesk
[01:45] I got 9000 of Springer.
[01:47] *** jleclanch has joined #archiveteam-bs
[01:47] fine
[01:47] jleclanch: hi!
[01:47] hi :)
[01:47] Sketchcow: Over 9000? :-)
[01:48] * JesseW is trying to figure out how to install a particular version of a ruby gem, to get the tests for archivebot's bot to run
[01:49] JesseW: those download tests take so long >.>
[01:50] yeah, they do
[01:50] JesseW: I believe -v sem.var is the
[01:51] proper option for that
[01:52] --version seems to do the trick. thanks, though
[01:54] I am sure someone will make insane torrents of springer. Let me know if they do.
[01:56] you guys may get the official playstation magazine
[01:56] But I have to turn back to the other opportunities.
[01:56] from russia
[01:56] Maybe one day I can finally clear all the uploads on FOS
[01:56] yeah, the day that machine is retired.
[01:57] *** JesseW has quit IRC (Leaving.)
[01:57] or dies like the printer in office space
[01:58] https://www.youtube.com/watch?v=pD2xBXm4y70
[02:19] *** username1 has joined #archiveteam-bs
[02:24] *** schbirid2 has quit IRC (Ping timeout: 311 seconds)
[02:24] looks like the korea trailer for Pearl Harbor is on one of the mbc newsdesk
[02:25] at the end of 2001-05-21 episode
[02:57] * kyan wants to know how to see the live chat from youtube live streams after the stream has ended. E.g., to see my comments I made on the IA telethon last year (?).
[02:59] and it's starting to be uploaded: https://archive.org/details/MBC_Newsdesk_20010411
[03:16] *** aaaaaaaaa has quit IRC (Leaving)
[04:02] joepie91: have you seen this story yet? https://decorrespondent.nl/3789/Operation-Easy-Chair-or-how-a-little-company-in-Holland-helped-the-CIA-bug-the-Russians/116534484-2a3d7f11
[05:18] *** BlueMaxim has joined #archiveteam-bs
[05:24] *** vitzli has joined #archiveteam-bs
[05:42] *** VADemon has quit IRC (left4dead)
[05:49] *** FAMAS has joined #archiveteam-bs
[06:18] *** FAMAS has quit IRC (Read error: Operation timed out)
[06:27] i'm starting to upload good morning pops: https://archive.org/details/kbsradio-2fm-coolfm-gmp-2009-02-12
[06:31] *** FAMAS has joined #archiveteam-bs
[07:00] *** FAMAS2 has joined #archiveteam-bs
[07:01] *** FAMAS has quit IRC (Read error: Connection reset by peer)
[07:02] it's 2015, and I still can't upload a large file on my internet connection without everything else slowing down
[07:03] and I still can't read recorded video via a CF card reader without Linux crapping out other I/O in odd ways
[07:03] It might help a bit to have fq_codel set up on your router.
[07:03] what even is that
[07:03] oh bufferbloat-related
[07:03] IP queue management algorithm, fixes bufferbloat rather effectively.
[07:04] I'm starting to think I really gotta set up my own router
[07:04] Unfortunately, having it set up means openwrt or building your own Linux router.
[07:04] yeah, I've been told I need to do this a lot, maybe 2016 is the year I finally do so
[07:04] i.e. it's nice if you can get it set up, but it's a bit of a pain to do so.
[07:05] But, I run fq_codel and now it doesn't matter what I do with my Internet connection, everything works reasonably.
[07:05] that does sound really nice
[07:05] If there's some huge download then my connection technically "slows down", but it ends up in a fair sharing situation.
[07:05] why the fuck doesn't everything use that by default then
[07:05] ^
[07:05] espes___: Because it's new and home router makers suck.
[07:06] can isps deploy it
[07:06] Yes.
[07:06] It's upstream in Linux.
[07:07] I'll read more on it; it'd be super-nice to be able to see what I'm typing in the chat without a 1-second delay after I hit Enter
[07:08] * yipdw has modest goals
[07:09] Welp, fq_codel can help that quite well.
[07:12] ooh, there's some names I recognize on codel
[07:26] *** FAMAS2 is now known as FAMA
[07:26] *** FAMA is now known as FAMAS
[07:28] is there any person within the archiveteam fora who practices the techniques of video screenshotting?
[07:30] *** Stiletto has quit IRC (Read error: Operation timed out)
[07:44] *** FAMAS has quit IRC (Read error: Connection reset by peer)
[08:17] JW_work: #messages in this item/WARC /= #messages in this group. WARCs are split whenever they hit 1GB and not after finishing a group.
[08:18] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[08:21] *** wp494 has joined #archiveteam-bs
[09:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:30] *** FAMAS has joined #archiveteam-bs
[09:37] *** dashcloud has joined #archiveteam-bs
[09:49] *** username1 is now known as schbirid
[09:56] *** FAMAS2 has joined #archiveteam-bs
[09:57] *** FAMAS has quit IRC (Quit: KVIrc 4.3.2 Aria http://www.kvirc.net/)
[09:57] *** FAMAS2 is now known as FAMAS
[11:06] anyone wanna mirror ebooks i grabbed at 32c3 speak in the next hour (no chance if i dont know your nick ;) )
[11:08] *german
[11:13] "mirror" as in "public mirror" or just get a copy?
[11:13] *** FAMAS has quit IRC (Read error: Connection reset by peer)
[11:13] *** FAMAS has joined #archiveteam-bs
[11:17] rsync from my box before i delete it ;)
[11:20] schbirid, may I pm you?
[11:20] sure
[11:29] *** arkhive has quit IRC (Read error: Connection reset by peer)
[12:03] *** FAMAS has quit IRC (Read error: Connection reset by peer)
[12:06] so i found something old
[12:07] a magazine called Table Tennis News
[12:22] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:29] *** tjg has quit IRC (Ping timeout: 260 seconds)
[12:30] *** tjg has joined #archiveteam-bs
[12:35] *** tjg has quit IRC (Ping timeout: 260 seconds)
[12:39] *** tjg has joined #archiveteam-bs
[13:12] *** vitzli has quit IRC (Quit: Leaving)
[13:34] *** tjg has quit IRC (Read error: Connection reset by peer)
[13:35] *** tjg has joined #archiveteam-bs
[13:50] *** tjg has quit IRC (Ping timeout: 260 seconds)
[13:51] *** tjg has joined #archiveteam-bs
[14:02] *** tjg has quit IRC (Ping timeout: 260 seconds)
[14:24] *** tjg has joined #archiveteam-bs
[14:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:36] godane, that's amazing
[14:38] much fireworkz
[14:38] such booom
[14:39] *** dashcloud has joined #archiveteam-bs
[14:52] *** ohhdemgir has quit IRC (Read error: Operation timed out)
[14:55] *** FAMAS has joined #archiveteam-bs
[14:59] *** tjg has quit IRC (Ping timeout: 260 seconds)
[14:59] *** tjg has joined #archiveteam-bs
[15:04] *** tjg has quit IRC (Ping timeout: 260 seconds)
[15:07] *** tjg has joined #archiveteam-bs
[15:20] *** tjg has quit IRC (Ping timeout: 260 seconds)
[15:27] *** tjg has joined #archiveteam-bs
[15:35] *** tjg has quit IRC (Ping timeout: 260 seconds)
[15:36] *** tjg has joined #archiveteam-bs
[15:41] *** tjg has quit IRC (Ping timeout: 260 seconds)
[15:46] *** tjg has joined #archiveteam-bs
[16:06] *** tjg has quit IRC (Ping timeout: 260 seconds)
[16:06] *** tjg has joined #archiveteam-bs
[16:26] *** FAMAS has quit IRC (Quit: http://chat.efnet.org (EOF))
[16:28] *** tjg has quit IRC (Ping timeout: 260 seconds)
[16:29] *** Stiletto has joined #archiveteam-bs
[16:30] *** tjg has joined #archiveteam-bs
[16:52] *** tjg has quit IRC (Ping timeout: 260 seconds)
[16:53] *** tjg has joined #archiveteam-bs
[16:58] *** tjg has quit IRC (Ping timeout: 260 seconds)
[16:58] *** tjg has joined #archiveteam-bs
[17:13] *** tjg has quit IRC (Ping timeout: 260 seconds)
[17:13] *** tjg has joined #archiveteam-bs
[17:20] *** tjg has quit IRC (Ping timeout: 260 seconds)
[17:24] *** tjg has joined #archiveteam-bs
[17:25] *** JesseW has joined #archiveteam-bs
[17:29] *** tjg has quit IRC (Ping timeout: 260 seconds)
[17:32] *** SimpBrain has quit IRC (Leaving)
[17:32] *** SimpBrain has joined #archiveteam-bs
[17:33] *** tjg has joined #archiveteam-bs
[17:50] https://openlibrary.org/works/OL3282859W/Music_the_brain_and_ecstasy <- is mistakenly linked to a treatise on moral philosophy from the 1800s. I've sent a note to the openlibrary folks, but thought it worth mentioning here, if only for the amusement value.
[17:50] *** tjg has quit IRC (Ping timeout: 260 seconds)
[17:53] *** tjg has joined #archiveteam-bs
[18:00] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[18:06] Is archive.is down for anyone else?
[18:08] *** tjg has quit IRC (Ping timeout: 260 seconds)
[18:08] *** tjg has joined #archiveteam-bs
[18:08] *** wyatt8740 has joined #archiveteam-bs
[18:15] when I try to archive an entire user, I get a 410 error "$ youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio https://www.youtube.com/user/TheAn1meMan/
[18:15] [download] Downloading playlist: TheAn1meMan
[18:15] [youtube:user] TheAn1meMan: Downloading video ids from 1 to 51
[18:15] ERROR: Unable to download webpage: HTTP Error 410: Gone; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type youtube-dl -U to update."
[18:17] is this normal? I'm using version 2014.02.17
[18:17] Microgur1: haha you can't use an ancient youtube-dl
[18:17] type youtube-dl -U to update
[18:20] there we go, now it's working
[18:21] I downloaded the latest version from the github page, set the executable bit, then ran.
[18:22] *** tjg has quit IRC (Ping timeout: 260 seconds)
[18:22] archive.is now back for me.
[18:23] *** tjg has joined #archiveteam-bs
[18:28] "As a side note, the administrator is unsupportive of Internet Archive's robots.txt policy - which could hinder future backup cooperation. " does this mean that archive.is isn't backed up fully?
[18:28] source: http://archiveteam.org/index.php?title=Archive.is
[18:29] I emailed the guy and he was unhappy with IA for not making all the wayback WARCs available
[18:32] hm, the date field in archive.org cdx'es is for the response, not the request, right?
[18:45] I'm unhappy with some things IA does too, but that doesn't mean I don't absolutely adore that organization.
[18:52] i've noticed that when downloading an entire youtube channel, the resulting videos have some weird properties. such as "90001 frames per second" and an audio type of audio/x-unknown which won't play for me. this error message was produced in the terminal window, and may be related: "WARNING: Your copy of avconv is outdated, update avconv to version 10-0 or newer if you encounter any errors.
[18:52] "
[18:54] *** tjg has quit IRC (Ping timeout: 260 seconds)
[18:58] *** tjg has joined #archiveteam-bs
[18:58] hey either khanacademy.org is annoying or they are running out of funds. just received multiple emails within the last 2 weeks, telling me to donate
[19:23] Microgur1: yeah, youtube-dl needs avconv or ffmpeg to mux the audio and video streams together
[19:23] I use a recent ffmpeg
[19:23] there's a ppa for ubuntu 14.04; it's in ubuntu 15.10 without a ppa
[19:26] I guess this machine is no good for archiving; it's stuck in 2014. at least it can run the Warrior just fine (8085 urlteam units and counting since last reboot)
[19:27] I wouldn't worry about the avconv error, Microgurl. It just downloads the file in a different way, so it doesn't affect the archiving.
[19:27] I use ubuntu 14.04 for archiving just fine
[19:27] I see.
[19:27] antomatic: he just mentioned that his files are screwed up though
[19:27] it's probably still using avconv
[19:28] and if it's not using avconv/ffmpeg for muxing, you can't get the highest-resolution formats
[19:28] yeah, totem can't play the audio. maybe that's just because the audio it's downloading is in a codec I can't play. I've had issues with that unrelated to youtube-dl before with newer, generally patented codecs
[19:28] well, you can, as separate files, but who wants that
[19:28] try mpv
[19:29] YouTube has opus audio and VP9 video
[19:29] along with H264 main profile and VP8 and vorbis and AAC
[19:35] i'm trying it out without specifying formats this time
[19:36] i'm getting H.264 for the video and MPEG-4 AAC for the audio. are those usually the best youtube gives?
[19:36] sometimes there are higher-resolution formats available only in VP9
[19:36] use youtube-dl -F to see a list of formats for a video
[19:37] you can pass -f FORMAT+FORMAT to get that video+audio
[19:37] some formats (e.g. 22) include both video and audio
[19:37] passing -f bestvideo+bestaudio while using grandpa's old avconv was causing the problem
[19:40] I value archive.is particularly *because* it is not-entirely-friendly with IA -- it serves as a (semi-) independent collection.
[19:42] speaking of that, are there any IA clones?
[19:42] I also sympathize with both the objection to IA keeping the wayback WARCs private, and IA's decision to do so. It's a balance between availability (which would lean towards making the WARCs open) and avoiding content authors objecting to inclusion (which they would be more likely to do if the WARCs were open).
[19:43] There are other web scrape collections using the Wayback software (mainly by national libraries).
[19:43] I'm not sure where a list is.
[19:44] Direct IA clones -- not exactly, AFAIK.
[20:13] *** JesseW has quit IRC (Leaving.)
[20:31] *** wyatt8740 has quit IRC (Read error: Operation timed out)
[20:33] I didn't know it was a decision; I thought it was still more of a "that isn't on our high-priority list"
[20:44] *** aaaaaaaaa has joined #archiveteam-bs
[20:44] *** swebb sets mode: +o aaaaaaaaa
[21:02] *** Ravenloft has joined #archiveteam-bs
[21:02] "we held back this title till 1 week after cinema premiere to give the movie a fighting chance to play in the budget, we learned from our mistake"
[21:02] did you guys catch that?
[21:09] it's not every day a pirate suddenly gets a conscience
[21:12] That's really interesting.
[21:25] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:29] *** dashcloud has joined #archiveteam-bs
[21:48] Very occasionally
[21:49] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[22:13] mpaa must have every fbi agent on the case
[22:13] sod terrorists, someone is losing hollywood money, after them!
[22:50] *** BlueMaxim has joined #archiveteam-bs
[22:59] An end of year gift for Archive Team from me.
[22:59] (Don't social media/reddit/hackernews it)
[22:59] https://archive.org/details/magazine_rack?sort=-publicdate
[22:59] Across the next, oh, hour or two, hundreds of magazines will appear there. Stuff from the last few months.
[23:00] Read up, expand some horizons, go into the new year happy.
[23:13] whoa
[23:17] It won't last
[23:17] But let us enjoy a fleeting public cool thing.
[23:26] Just broke 1000 queued up.
[23:27] our little newsbot has a big night ahead of it to say the least http://newsgrabber.harrycross.me:29000
[23:34] *** kyan has quit IRC (Ping timeout: 258 seconds)
[23:52] those are rather current magazines- good job there Sketchcow