[00:53] *** ndizzle has quit IRC (Read error: Connection reset by peer)
[01:15] *** ndiddy has joined #archiveteam
[01:35] *** VADemon has joined #archiveteam
[02:03] *** atrocity has quit IRC (Read error: Operation timed out)
[02:30] while I have no reason to think this is going away anytime soon, we should probably think about the process for saving https://steamcommunity.com/workshop/
[02:31] sometimes modmakers take something down
[02:41] not sure if this was on steam, but this author tried to delete his mod from the internet: http://pastebin.com/bRYrvSAs
[02:41] so yeah, i think it's a good idea to backup the steamworkshop
[03:23] *** VADemon has quit IRC (Quit: left4dead)
[03:32] *** JesseW has joined #archiveteam
[04:36] *** bsmith093 has quit IRC (Ping timeout: 244 seconds)
[04:40] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:46] *** BlueMaxim has joined #archiveteam
[04:46] *** Sk1d has joined #archiveteam
[04:52] *** BlueMaxim has quit IRC (Quit: Leaving)
[04:55] *** BlueMaxim has joined #archiveteam
[05:17] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[05:25] *** JesseW has joined #archiveteam
[05:31] *** DaSaucefu has quit IRC (Ping timeout: 244 seconds)
[05:31] *** danielsau has joined #archiveteam
[05:40] *** bsmith093 has joined #archiveteam
[05:42] *** Honno has joined #archiveteam
[06:12] *** vitzli has joined #archiveteam
[06:16] http://www.bbc.co.uk/news/uk-36308976 looks like it's official
[06:17] Also looks like it might be worth a full BBC archive at sometime
[06:42] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:50] HCross2: i starting up my news.bbc.co.uk/2/hi/XXXXXX.stm grabs again
[06:51] OK
[06:51] ALL SO ADORABLE
[06:52] the last one i did : https://archive.org/details/news.bbc.co.uk-2-hi-240xxxx-stm-pages-20160110
[06:52] your technically not get the real url
[06:52] but you are getting the articles
[06:54] i do a brute force of the numbers before i make a list
[06:54] seeing as about only 1200 to 1500 urls give me a page per a 10000
[06:55] Anyone want to take a shot at remixing the recipies?
[07:20] *** metalcamp has joined #archiveteam
[07:34] my cat may have killed by a coyote
[07:35] *** ariscop has quit IRC (Quit: Leaving)
[07:45] Sorry to hear it, man
[07:51] *** atomotic has joined #archiveteam
[08:01] *** schbirid has joined #archiveteam
[08:08] *** ndiddy has quit IRC (Read error: Operation timed out)
[08:19] *** WinterFox has joined #archiveteam
[08:47] *** bsmith093 has quit IRC (Ping timeout: 499 seconds)
[08:48] *** ariscop has joined #archiveteam
[08:59] *** bsmith093 has joined #archiveteam
[09:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:11] *** dashcloud has joined #archiveteam
[09:30] anyone seen the bbc maybe closing their recipes archive?
[09:30] oh looks like you're on it ^_^
[09:47] *** bwn has quit IRC (Read error: Operation timed out)
[09:56] *** marvinw has quit IRC (Quit: Leaving)
[10:18] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[10:22] *** marvinw has joined #archiveteam
[10:25] *** BartoCH has joined #archiveteam
[10:43] *** marvinw has quit IRC (Quit: Leaving)
[10:46] *** marvinw has joined #archiveteam
[11:14] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:44] *** bwn has joined #archiveteam
[12:25] *** atomotic has joined #archiveteam
[12:33] escapistmagazine.com is down. It belonged to gamefront. Did we save it too?
[12:42] http://www.bbc.co.uk/news/uk-36308976 arseholes
[12:42] *** toad2 has quit IRC (Read error: Operation timed out)
[12:43] Are we sure the BBC isn't owned by Yahoo at this point
[12:43] XD
[12:43] *** toad1 has joined #archiveteam
[12:43] *** jgeoiur has joined #archiveteam
[12:44] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:44] BBC doesn't keep a lot of there podcasts pass 30 days
[12:45] thats been a thing for a while with them
[12:45] hi. are you guys aware that a significant portion of the youtube videos that included copyrighted work are marked as such, and can be deleted by the press of a button? have you considered archiving videos from youtube?
[12:49] jgeoiur: the scale of that is ludicrous, the amount of content that must get uploaded to youtube on a minutely basis must be measured in the gigabytes a second.
[12:49] i didn't say all of youtube. but only copyrighted works that aren't easily available elsewhere, perhaps
[12:50] which is like 80% of the content
[12:51] i think most of the content is available elsewhere as well
[12:51] films and albums that are only available on youtube are pretty rare
[12:53] i discovered that if i run `mpv [url of some album]`, i get a 404 not found, but if i run `mpv [url of video without copyrighted work]`, it works just fine
[12:54] which is pretty frightening. if something like SOPA passes, they can conveniently remove all that content
[12:55] i'm not even sure if it's only the case for copyrighted work. they can apply censorship that way
[12:55] if an archive contains anything that a copyright holder cares enough to remove from youtube, they will happily fire DMCA takedowns at anybody hosting other copies of it too.
[12:55] sorry i meant 403 forbidden.
[12:59] yes ok so forget about the copyrighted material. the point i was trying to make was that youtube has identified and marked an enormous amount of videos with attribute x. in this case x is copyrighted material, but it could just as well be anything else, like controversial material and you won't know about it until it's too late
[12:59] we don't know to what degree the material on youtube is being data mined
[13:01] If you have any video that is controversal, feel free to throw it to #archivebot, with the --youtube-dl flag.
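The brute-force approach described at 06:54 (probe every numeric ID in a block of 10,000, keep the roughly 1,200 to 1,500 that return a page, then build the final URL list) can be sketched in Python as below. Only the `news.bbc.co.uk/2/hi/<ID>.stm` pattern comes from the log; the helper names, treating HTTP 200 as "real article", and reading `240xxxx` in the archive.org item name as IDs 2400000 to 2409999 are all assumptions. The probe is passed in as a function so the sketch runs offline.

```python
"""Sketch: brute-force a block of numeric BBC News IDs and keep the ones that resolve.

Assumptions (not from the log): helper names, block layout, and treating an
HTTP 200 response as "this ID is a real article".
"""
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

URL_PATTERN = "http://news.bbc.co.uk/2/hi/{id}.stm"  # pattern quoted in the log


def candidate_urls(start: int, size: int = 10_000) -> List[str]:
    """Every candidate URL for IDs start .. start + size - 1."""
    return [URL_PATTERN.format(id=n) for n in range(start, start + size)]


def build_url_list(urls: Iterable[str], status_of: Callable[[str], int]) -> List[str]:
    """Keep only the URLs whose probe returns HTTP 200."""
    return [u for u in urls if status_of(u) == 200]


def probe_with_urllib(url: str, timeout: float = 10.0) -> int:
    """One possible real probe (stdlib only): HEAD request, redirects followed."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 404 etc. mean "no article at this ID"


def fake_probe(url: str) -> int:
    """Offline stand-in: pretend every ID divisible by 7 exists (made-up rule)."""
    article_id = int(url.rsplit("/", 1)[1][: -len(".stm")])
    return 200 if article_id % 7 == 0 else 404


if __name__ == "__main__":
    hits = build_url_list(candidate_urls(2_400_000, 100), fake_probe)
    print(len(hits), hits[0])
```

For a real run, `probe_with_urllib` would replace `fake_probe`, ideally with rate limiting, since a full block means 10,000 requests to the site.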
[13:02] Downloading all of youtube does not work, as mentioned and we have no way of generally identifing, what is controversal and what not
[13:04] *** WinterFox has quit IRC (Remote host closed the connection)
[13:05] can the archivebot monitor a webpage for changes, and archive each change automatically?
[13:05] or rather monitor a youtube channel, and archive each new video that's being uploaded automatically
[13:06] right now, archivebot works on a purely manual basis
[13:06] jgeoiur, what sort of site is it
[13:07] https://www.youtube.com/user/robag88/videos
[13:08] very high quality Videos, I see.
[13:10] ok I'll just keep on doing it manually
[13:15] jgeoiur: there is a project that is in the works that can do that in the future
[13:15] i see. thanks
[13:16] if you wish to help with it, join #videobot
[13:22] *** atrocity has joined #archiveteam
[13:27] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:30] *** dashcloud has joined #archiveteam
[13:43] *** jgeoiur has quit IRC (Leaving)
[14:27] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:41] *** atomotic has joined #archiveteam
[14:44] *** atomotic has quit IRC (Client Quit)
[14:47] *** VADemon has joined #archiveteam
[15:09] *** JesseW has joined #archiveteam
[15:35] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:37] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[15:39] *** dashcloud has joined #archiveteam
[16:10] *** nwf has joined #archiveteam
[16:13] *** G33KY has joined #archiveteam
[16:24] *** schbirid has quit IRC (Ping timeout: 258 seconds)
[16:54] Soundcloud is over.
[16:55] in the past year they've had a lot of... staff turnover, i hear
[16:55] http://www.digitalmusicnews.com/2016/05/16/soundcloud-preparing-massive-restrictions-dj-uploads/
[17:00] I know some people were working on discovery for Soundcloud, did that ever go anywhere?
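The 13:05 question (watch a channel and archive each new upload automatically) comes down to remembering which video IDs have already been seen and queueing only the new ones on each poll. The Python sketch below shows just that bookkeeping. `fetch_ids` and `submit` are hypothetical callables standing in for "list the channel's current uploads" and "hand a URL to a grab tool"; this is not how ArchiveBot or the #videobot project actually works.

```python
"""Sketch: poll a channel's upload list and queue only videos not seen before.

Hypothetical pieces: `fetch_ids` (returns the channel's current video IDs) and
`submit` (hands a URL to whatever does the actual grab).
"""
import time
from typing import Callable, Iterable, List, Optional, Set


def new_items(seen: Set[str], current: Iterable[str]) -> List[str]:
    """Return IDs from `current` not already in `seen` (order kept) and record them."""
    fresh: List[str] = []
    for item in current:
        if item not in seen:
            seen.add(item)
            fresh.append(item)
    return fresh


def watch(
    fetch_ids: Callable[[], List[str]],
    submit: Callable[[str], None],
    interval_s: float = 3600.0,
    rounds: Optional[int] = None,
) -> None:
    """Poll every `interval_s` seconds; `rounds=None` means run forever."""
    seen: Set[str] = set()
    done = 0
    while rounds is None or done < rounds:
        for vid in new_items(seen, fetch_ids()):
            submit(f"https://www.youtube.com/watch?v={vid}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)  # skip the sleep after the final round
```

Note that the first poll submits the channel's whole existing backlog, since nothing has been seen yet; persisting `seen` to disk would be needed for the watcher to survive restarts.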
[17:03] #soundclown FYI
[17:03] It's pretty dead
[17:10] * phillipsj gets 403: forbidden while using lynx with that last link. (that is like 2 in 1 week)
[17:10] rofl
[17:10] *** zgrant has joined #archiveteam
[17:11] silly kids, cats arent for the net
[17:11] wait..
[17:12] *** zgrant has quit IRC (Client Quit)
[17:12] *** zgrant has joined #archiveteam
[17:14] The user-agent string looks more like a crawler than a Graphical web-browser.
[17:16] let me guess, "Lynx/2.8.9 (Not A Crawler, like Lizard) SrslyNotCrawler/"
[17:19] User agent strings are so broken. Look at the current one for Edge: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246
[17:19] I'm surprised the browser vendors haven't made any effort to deprecate them
[17:23] Jesus Christ thats for Edge? Thats like the whole browser market from the last 20 years in one!
[17:25] Really, they should just agree on a standard backwards-compatible user agent for browsers and have a separate header for bots (like Robot: wpull/1.x)
[17:26] Though this is -bs
[17:36] oops, I forgot the first two characters of the Edge sting. Oh well.
[17:36] *** nwf has quit IRC (Read error: Connection reset by peer)
[17:36] sorry wrong window
[17:37] *** nwf has joined #archiveteam
[17:39] really, people should stop doing naive useragent detection on their websites, then browser vendors wouldn't have to lie like that
[17:44] Yeah, but we know it'll never, ever get fixed
[17:48] *** JW_work has quit IRC (Quit: Leaving.)
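The 17:39 complaint about naive user-agent detection is easy to demonstrate with the Edge string quoted at 17:19: it contains `Mozilla`, `AppleWebKit`, `Chrome` and `Safari` tokens, so a plain substring check for "Chrome" reports Edge as Chrome, and a short text-browser string falls through to "unknown", much like the Lynx 403 at 17:10. The Python sketch below contrasts that with checking the most specific token first. The token list, its order, and the Lynx string are illustrative assumptions, not a complete or recommended detector.

```python
"""Sketch: why naive User-Agent sniffing misfires on the Edge string from the log."""

EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"
)
LYNX_UA = "Lynx/2.8.9"  # hypothetical minimal Lynx string, for illustration only


def naive_browser(ua: str) -> str:
    """The kind of substring check that pushes vendors to pile tokens into their UA."""
    if "Chrome" in ua:
        return "chrome"
    if "Safari" in ua:
        return "safari"
    if "Mozilla" in ua:
        return "mozilla-compatible"
    return "unknown (maybe a crawler?)"


def less_naive_browser(ua: str) -> str:
    """Check the most specific tokens first; still a heuristic, not a real parser."""
    ordered_tokens = (
        ("Edge/", "edge"),
        ("Chrome/", "chrome"),
        ("Safari/", "safari"),
        ("Lynx/", "lynx"),
    )
    for token, name in ordered_tokens:
        if token in ua:
            return name
    return "unknown"


if __name__ == "__main__":
    print(naive_browser(EDGE_UA))       # chrome  (Edge misreported)
    print(less_naive_browser(EDGE_UA))  # edge
    print(naive_browser(LYNX_UA))       # unknown (maybe a crawler?)
```

Token order matters because every later browser copies the earlier tokens; the more robust fix, as the 17:39 message suggests, is for sites to avoid gating content on the UA at all.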
[17:58] *** vitzli has quit IRC (Quit: Leaving)
[18:05] *** JW_work has joined #archiveteam
[18:22] https://www.reddit.com/r/electronicmusic/comments/4jpqeo/soundcloud_says_reports_of_dj_mixes_being_pulled/
[19:16] *** nox_ has joined #archiveteam
[19:17] *** nox has quit IRC (Read error: Connection reset by peer)
[19:22] *** closure has joined #archiveteam
[19:57] *** ndiddy has joined #archiveteam
[19:57] *** MMovie has quit IRC (Read error: Operation timed out)
[19:57] *** MMovie has joined #archiveteam
[20:07] *** ariscop has quit IRC (Leaving)
[20:17] *** MMovie1 has joined #archiveteam
[20:19] *** MMovie has quit IRC (Read error: Operation timed out)
[20:41] *** ItsYoda has joined #archiveteam
[20:45] Ive started a manual crawl of BBC travel
[20:47] *** Stiletto has quit IRC ()
[21:00] *** Honno has quit IRC (Read error: Operation timed out)
[21:10] *** ariscop has joined #archiveteam
[21:15] http://www.polygon.com/2016/5/17/11692866/gametrailers-ign-acquisition-youtube-archive
[21:15] *** khaoohs has quit IRC (Ping timeout: 499 seconds)
[21:21] *** zgrant has quit IRC (Quit: http://chat.efnet.org (EOF))
[21:21] picture of my cat : https://scontent-lga3-1.xx.fbcdn.net/t31.0-8/q81/s960x960/13217022_10204588349268713_1457902763586552308_o.jpg
[21:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:27] *** dashcloud has joined #archiveteam
[21:29] that's a heck of a cat
[21:40] *** Stiletto has joined #archiveteam
[21:59] *** G33KY has quit IRC (Remote host closed the connection)
[22:33] *** bwn has quit IRC (Read error: Operation timed out)
[22:34] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[22:43] *** bwn has joined #archiveteam
[22:48] *** tomwsmf-a has joined #archiveteam
[22:56] hey, is there any plan for a bbc cooking site scrape?
[22:58] head's up: FurAffinity was just attacked http://forums.furaffinity.net/threads/5-17-site-attack.1530523/
[22:58] *** maseck has quit IRC (Remote host closed the connection)
[22:59] *** w0rp has quit IRC (Read error: Operation timed out)
[23:00] *** w0rp has joined #archiveteam
[23:03] *** maseck has joined #archiveteam
[23:05] cadbury_ yes, we are working on it
[23:23] *** BlueMaxim has joined #archiveteam
[23:55] *** JordanJ2 has joined #archiveteam