#archiveteam 2016-05-17,Tue

Time Nickname Message
00:53 🔗 ndizzle has quit IRC (Read error: Connection reset by peer)
01:15 🔗 ndiddy has joined #archiveteam
01:35 🔗 VADemon has joined #archiveteam
02:03 🔗 atrocity has quit IRC (Read error: Operation timed out)
02:30 🔗 ranma <JW_work> while I have no reason to think this is going away anytime soon, we should probably think about the process for saving https://steamcommunity.com/workshop/
02:31 🔗 ranma sometimes modmakers take something down
02:41 🔗 ranma not sure if this was on steam, but this author tried to delete his mod from the internet: http://pastebin.com/bRYrvSAs
02:41 🔗 ranma so yeah, i think it's a good idea to backup the steamworkshop
03:23 🔗 VADemon has quit IRC (Quit: left4dead)
03:32 🔗 JesseW has joined #archiveteam
04:36 🔗 bsmith093 has quit IRC (Ping timeout: 244 seconds)
04:40 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:46 🔗 BlueMaxim has joined #archiveteam
04:46 🔗 Sk1d has joined #archiveteam
04:52 🔗 BlueMaxim has quit IRC (Quit: Leaving)
04:55 🔗 BlueMaxim has joined #archiveteam
05:17 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
05:25 🔗 JesseW has joined #archiveteam
05:31 🔗 DaSaucefu has quit IRC (Ping timeout: 244 seconds)
05:31 🔗 danielsau has joined #archiveteam
05:40 🔗 bsmith093 has joined #archiveteam
05:42 🔗 Honno has joined #archiveteam
06:12 🔗 vitzli has joined #archiveteam
06:16 🔗 HCross2 http://www.bbc.co.uk/news/uk-36308976 looks like it's official
06:17 🔗 HCross2 Also looks like it might be worth a full BBC archive at sometime
06:42 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
06:50 🔗 godane HCross2: i'm starting up my news.bbc.co.uk/2/hi/XXXXXX.stm grabs again
06:51 🔗 HCross2 OK
06:51 🔗 SketchCow ALL SO ADORABLE
06:52 🔗 godane the last one i did : https://archive.org/details/news.bbc.co.uk-2-hi-240xxxx-stm-pages-20160110
06:52 🔗 godane you're technically not getting the real url
06:52 🔗 godane but you are getting the articles
06:54 🔗 godane i do a brute force of the numbers before i make a list
06:54 🔗 godane seeing as only about 1200 to 1500 urls per 10000 give me a page
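[Editor's note: the brute-force step godane describes could be sketched roughly as below. This is not godane's actual script. The `http://` scheme, the function names `candidate_urls` and `keep_live`, and the `exists` callback are all assumptions made for illustration; only the `news.bbc.co.uk/2/hi/XXXXXX.stm` pattern comes from the log. The network check is left to the caller so the sketch runs offline.]

```python
from typing import Callable, Iterable, Iterator, List

# Assumed template, built from the pattern quoted in the log; the scheme is a guess.
TEMPLATE = "http://news.bbc.co.uk/2/hi/{n}.stm"


def candidate_urls(start: int, stop: int, template: str = TEMPLATE) -> Iterator[str]:
    """Yield one candidate URL for every article number in [start, stop)."""
    for n in range(start, stop):
        yield template.format(n=n)


def keep_live(urls: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return only the URLs for which the caller-supplied check reports a page."""
    return [u for u in urls if exists(u)]


if __name__ == "__main__":
    # Stand-in for a real HTTP check: pretend only even article numbers exist.
    def fake_exists(url: str) -> bool:
        number = int(url.rsplit("/", 1)[1].split(".")[0])
        return number % 2 == 0

    for url in keep_live(candidate_urls(2400000, 2400004), fake_exists):
        print(url)
```

[In a real run, `exists` would wrap an HTTP request, and the surviving list would be what gets fed to the grab.]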
06:55 🔗 SketchCow Anyone want to take a shot at remixing the recipes?
07:20 🔗 metalcamp has joined #archiveteam
07:34 🔗 godane my cat may have been killed by a coyote
07:35 🔗 ariscop has quit IRC (Quit: Leaving)
07:45 🔗 SketchCow Sorry to hear it, man
07:51 🔗 atomotic has joined #archiveteam
08:01 🔗 schbirid has joined #archiveteam
08:08 🔗 ndiddy has quit IRC (Read error: Operation timed out)
08:19 🔗 WinterFox has joined #archiveteam
08:47 🔗 bsmith093 has quit IRC (Ping timeout: 499 seconds)
08:48 🔗 ariscop has joined #archiveteam
08:59 🔗 bsmith093 has joined #archiveteam
09:08 🔗 dashcloud has quit IRC (Read error: Operation timed out)
09:11 🔗 dashcloud has joined #archiveteam
09:30 🔗 SmileyG anyone seen the bbc maybe closing their recipes archive?
09:30 🔗 SmileyG oh looks like you're on it ^_^
09:47 🔗 bwn has quit IRC (Read error: Operation timed out)
09:56 🔗 marvinw has quit IRC (Quit: Leaving)
10:18 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
10:22 🔗 marvinw has joined #archiveteam
10:25 🔗 BartoCH has joined #archiveteam
10:43 🔗 marvinw has quit IRC (Quit: Leaving)
10:46 🔗 marvinw has joined #archiveteam
11:14 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:44 🔗 bwn has joined #archiveteam
12:25 🔗 atomotic has joined #archiveteam
12:33 🔗 Medowar escapistmagazine.com is down. It belonged to gamefront. Did we save it too?
12:42 🔗 HCross2 http://www.bbc.co.uk/news/uk-36308976 arseholes
12:42 🔗 toad2 has quit IRC (Read error: Operation timed out)
12:43 🔗 HCross2 Are we sure the BBC isn't owned by Yahoo at this point
12:43 🔗 luckcolor XD
12:43 🔗 toad1 has joined #archiveteam
12:43 🔗 jgeoiur has joined #archiveteam
12:44 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:44 🔗 godane BBC doesn't keep a lot of their podcasts past 30 days
12:45 🔗 godane that's been a thing for a while with them
12:45 🔗 jgeoiur hi. are you guys aware that a significant portion of the youtube videos that included copyrighted work are marked as such, and can be deleted by the press of a button? have you considered archiving videos from youtube?
12:49 🔗 murk jgeoiur: the scale of that is ludicrous, the amount of content uploaded to youtube every minute must be measured in gigabytes per second.
12:49 🔗 jgeoiur i didn't say all of youtube. but only copyrighted works that aren't easily available elsewhere, perhaps
12:50 🔗 vitzli which is like 80% of the content
12:51 🔗 jgeoiur i think most of the content is available elsewhere as well
12:51 🔗 jgeoiur films and albums that are only available on youtube are pretty rare
12:53 🔗 jgeoiur i discovered that if i run `mpv [url of some album]`, i get a 404 not found, but if i run `mpv [url of video without copyrighted work]`, it works just fine
12:54 🔗 jgeoiur which is pretty frightening. if something like SOPA passes, they can conveniently remove all that content
12:55 🔗 jgeoiur i'm not even sure if it's only the case for copyrighted work. they can apply censorship that way
12:55 🔗 murk if an archive contains anything that a copyright holder cares enough to remove from youtube, they will happily fire DMCA takedowns at anybody hosting other copies of it too.
12:55 🔗 jgeoiur sorry i meant 403 forbidden.
12:59 🔗 jgeoiur yes ok so forget about the copyrighted material. the point i was trying to make was that youtube has identified and marked an enormous amount of videos with attribute x. in this case x is copyrighted material, but it could just as well be anything else, like controversial material and you won't know about it until it's too late
12:59 🔗 jgeoiur we don't know to what degree the material on youtube is being data mined
13:01 🔗 Medowar If you have any video that is controversial, feel free to throw it to #archivebot, with the --youtube-dl flag.
13:02 🔗 Medowar Downloading all of youtube does not work, as mentioned, and we have no general way of identifying what is controversial and what is not
13:04 🔗 WinterFox has quit IRC (Remote host closed the connection)
13:05 🔗 jgeoiur can the archivebot monitor a webpage for changes, and archive each change automatically?
13:05 🔗 jgeoiur or rather monitor a youtube channel, and archive each new video that's being uploaded automatically
13:06 🔗 Medowar right now, archivebot works on a purely manual basis
13:06 🔗 HCross jgeoiur, what sort of site is it
13:07 🔗 jgeoiur https://www.youtube.com/user/robag88/videos
13:08 🔗 Medowar very high quality Videos, I see.
13:10 🔗 jgeoiur ok I'll just keep on doing it manually
13:15 🔗 midas jgeoiur: there is a project that is in the works that can do that in the future
13:15 🔗 jgeoiur i see. thanks
13:16 🔗 midas if you wish to help with it, join #videobot
13:22 🔗 atrocity has joined #archiveteam
13:27 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:30 🔗 dashcloud has joined #archiveteam
13:43 🔗 jgeoiur has quit IRC (Leaving)
14:27 🔗 atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
14:41 🔗 atomotic has joined #archiveteam
14:44 🔗 atomotic has quit IRC (Client Quit)
14:47 🔗 VADemon has joined #archiveteam
15:09 🔗 JesseW has joined #archiveteam
15:35 🔗 dashcloud has quit IRC (Read error: Operation timed out)
15:37 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
15:39 🔗 dashcloud has joined #archiveteam
16:10 🔗 nwf has joined #archiveteam
16:13 🔗 G33KY has joined #archiveteam
16:24 🔗 schbirid has quit IRC (Ping timeout: 258 seconds)
16:54 🔗 SketchCow Soundcloud is over.
16:55 🔗 xmc in the past year they've had a lot of... staff turnover, i hear
16:55 🔗 SketchCow http://www.digitalmusicnews.com/2016/05/16/soundcloud-preparing-massive-restrictions-dj-uploads/
17:00 🔗 MrRadar I know some people were working on discovery for Soundcloud, did that ever go anywhere?
17:03 🔗 phuzion #soundclown FYI
17:03 🔗 phuzion It's pretty dead
17:10 🔗 * phillipsj gets 403: forbidden while using lynx with that last link. (that is like 2 in 1 week)
17:10 🔗 vitzli rofl
17:10 🔗 zgrant has joined #archiveteam
17:11 🔗 GLaDOS silly kids, cats aren't for the net
17:11 🔗 GLaDOS wait..
17:12 🔗 zgrant has quit IRC (Client Quit)
17:12 🔗 zgrant has joined #archiveteam
17:14 🔗 phillipsj The user-agent string looks more like a crawler than a Graphical web-browser.
17:16 🔗 GLaDOS let me guess, "Lynx/2.8.9 (Not A Crawler, like Lizard) SrslyNotCrawler/69.4.20.00"
17:19 🔗 MrRadar User agent strings are so broken. Look at the current one for Edge: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246
17:19 🔗 MrRadar I'm surprised the browser vendors haven't made any effort to deprecate them
17:23 🔗 will Jesus Christ that's for Edge? That's like the whole browser market from the last 20 years in one!
17:25 🔗 MrRadar Really, they should just agree on a standard backwards-compatible user agent for browsers and have a separate header for bots (like Robot: wpull/1.x)
17:26 🔗 MrRadar Though this is -bs
17:36 🔗 phillipsj oops, I forgot the first two characters of the Edge string. Oh well.
17:36 🔗 nwf has quit IRC (Read error: Connection reset by peer)
17:36 🔗 phillipsj sorry wrong window
17:37 🔗 nwf has joined #archiveteam
17:39 🔗 bai really, people should stop doing naive useragent detection on their websites, then browser vendors wouldn't have to lie like that
17:44 🔗 MrRadar Yeah, but we know it'll never, ever get fixed
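[Editor's note: a small sketch of why the naive user-agent detection bai mentions misfires on the Edge string MrRadar quoted. The helper names `product_tokens` and `naive_browser` are made up for illustration, and the substring check is an assumed stand-in for what a naive site might do, not any particular site's code.]

```python
import re
from typing import List, Tuple

# The Edge user-agent string exactly as quoted in the log.
EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"
)


def product_tokens(ua: str) -> List[Tuple[str, str]]:
    """Extract every name/version pair that appears in the string, in order."""
    return re.findall(r"([A-Za-z]+)/([\d.]+)", ua)


def naive_browser(ua: str) -> str:
    """Hypothetical naive detection: report the first known name found as a substring."""
    for name in ("Chrome", "Safari", "Edge"):
        if name in ua:
            return name
    return "unknown"


if __name__ == "__main__":
    print([name for name, _ in product_tokens(EDGE_UA)])
    print(naive_browser(EDGE_UA))
```

[The string names five products, and a check that looks for "Chrome" first labels Edge as Chrome. Sites that match on these names are the reason each new browser keeps the older names in its own string.]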
17:48 🔗 JW_work has quit IRC (Quit: Leaving.)
17:58 🔗 vitzli has quit IRC (Quit: Leaving)
18:05 🔗 JW_work has joined #archiveteam
18:22 🔗 Frogging https://www.reddit.com/r/electronicmusic/comments/4jpqeo/soundcloud_says_reports_of_dj_mixes_being_pulled/
19:16 🔗 nox_ has joined #archiveteam
19:17 🔗 nox has quit IRC (Read error: Connection reset by peer)
19:22 🔗 closure has joined #archiveteam
19:57 🔗 ndiddy has joined #archiveteam
19:57 🔗 MMovie has quit IRC (Read error: Operation timed out)
19:57 🔗 MMovie has joined #archiveteam
20:07 🔗 ariscop has quit IRC (Leaving)
20:17 🔗 MMovie1 has joined #archiveteam
20:19 🔗 MMovie has quit IRC (Read error: Operation timed out)
20:41 🔗 ItsYoda has joined #archiveteam
20:45 🔗 HCross I've started a manual crawl of BBC travel
20:47 🔗 Stiletto has quit IRC ()
21:00 🔗 Honno has quit IRC (Read error: Operation timed out)
21:10 🔗 ariscop has joined #archiveteam
21:15 🔗 Start http://www.polygon.com/2016/5/17/11692866/gametrailers-ign-acquisition-youtube-archive
21:15 🔗 khaoohs has quit IRC (Ping timeout: 499 seconds)
21:21 🔗 zgrant has quit IRC (Quit: http://chat.efnet.org (EOF))
21:21 🔗 godane picture of my cat : https://scontent-lga3-1.xx.fbcdn.net/t31.0-8/q81/s960x960/13217022_10204588349268713_1457902763586552308_o.jpg
21:24 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:27 🔗 dashcloud has joined #archiveteam
21:29 🔗 xmc that's a heck of a cat
21:40 🔗 Stiletto has joined #archiveteam
21:59 🔗 G33KY has quit IRC (Remote host closed the connection)
22:33 🔗 bwn has quit IRC (Read error: Operation timed out)
22:34 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
22:43 🔗 bwn has joined #archiveteam
22:48 🔗 tomwsmf-a has joined #archiveteam
22:56 🔗 cadbury_ hey, is there any plan for a bbc cooking site scrape?
22:58 🔗 Frogging heads up: FurAffinity was just attacked http://forums.furaffinity.net/threads/5-17-site-attack.1530523/
22:58 🔗 maseck has quit IRC (Remote host closed the connection)
22:59 🔗 w0rp has quit IRC (Read error: Operation timed out)
23:00 🔗 w0rp has joined #archiveteam
23:03 🔗 maseck has joined #archiveteam
23:05 🔗 Medowar cadbury_ yes, we are working on it
23:23 🔗 BlueMaxim has joined #archiveteam
23:55 🔗 JordanJ2 has joined #archiveteam
