Time |
Nickname |
Message |
00:53
|
|
ndizzle has quit IRC (Read error: Connection reset by peer) |
01:15
|
|
ndiddy has joined #archiveteam |
01:35
|
|
VADemon has joined #archiveteam |
02:03
|
|
atrocity has quit IRC (Read error: Operation timed out) |
02:30
|
ranma |
<JW_work> while I have no reason to think this is going away anytime soon, we should probably think about the process for saving https://steamcommunity.com/workshop/ |
02:31
|
ranma |
sometimes modmakers take something down |
02:41
|
ranma |
not sure if this was on steam, but this author tried to delete his mod from the internet: http://pastebin.com/bRYrvSAs |
02:41
|
ranma |
so yeah, i think it's a good idea to back up the steam workshop |
03:23
|
|
VADemon has quit IRC (Quit: left4dead) |
03:32
|
|
JesseW has joined #archiveteam |
04:36
|
|
bsmith093 has quit IRC (Ping timeout: 244 seconds) |
04:40
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:46
|
|
BlueMaxim has joined #archiveteam |
04:46
|
|
Sk1d has joined #archiveteam |
04:52
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
04:55
|
|
BlueMaxim has joined #archiveteam |
05:17
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
05:25
|
|
JesseW has joined #archiveteam |
05:31
|
|
DaSaucefu has quit IRC (Ping timeout: 244 seconds) |
05:31
|
|
danielsau has joined #archiveteam |
05:40
|
|
bsmith093 has joined #archiveteam |
05:42
|
|
Honno has joined #archiveteam |
06:12
|
|
vitzli has joined #archiveteam |
06:16
|
HCross2 |
http://www.bbc.co.uk/news/uk-36308976 looks like it's official |
06:17
|
HCross2 |
Also looks like it might be worth a full BBC archive at some point |
06:42
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:50
|
godane |
HCross2: i'm starting up my news.bbc.co.uk/2/hi/XXXXXX.stm grabs again |
06:51
|
HCross2 |
OK |
06:51
|
SketchCow |
ALL SO ADORABLE |
06:52
|
godane |
the last one i did : https://archive.org/details/news.bbc.co.uk-2-hi-240xxxx-stm-pages-20160110 |
06:52
|
godane |
you're technically not getting the real url |
06:52
|
godane |
but you are getting the articles |
06:54
|
godane |
i do a brute force of the numbers before i make a list |
06:54
|
godane |
seeing as only about 1200 to 1500 urls out of every 10000 give me a page |
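A minimal sketch of the brute-force approach godane describes, under stated assumptions: the http:// scheme, seven-digit story IDs split into blocks of 10000 (as the `240xxxx` item name suggests), and counting a number as live only when a HEAD request ends in HTTP 200 are all assumptions, and `candidate_urls`, `http_status` and `live_urls` are hypothetical helper names, not godane's actual scripts. It enumerates every number in a block, probes each one, and keeps the live URLs as the list to feed a grab.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Iterator, List

# URL pattern from the chat; the http:// scheme is an assumption.
BASE = "http://news.bbc.co.uk/2/hi/{}.stm"


def candidate_urls(prefix: str, free_digits: int = 4) -> Iterator[str]:
    """Yield every URL in one numeric block.

    prefix "240" with 4 free digits covers IDs 2400000..2409999,
    i.e. one block of 10000 like the 240xxxx item.
    """
    for n in range(10 ** free_digits):
        yield BASE.format(f"{prefix}{n:0{free_digits}d}")


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 on a network error.

    Uses HEAD to avoid downloading bodies (assumes the server answers
    HEAD the same way as GET). urlopen follows redirects, so a number
    that redirects to a real article reports the article's status.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 404 and friends arrive as exceptions; HTTPError must be caught
        # before OSError because it is a subclass of it.
        return err.code
    except OSError:
        # e.g. DNS failure, refused connection or timeout
        return 0


def live_urls(urls: Iterable[str], status_of: Callable[[str], int]) -> List[str]:
    """Keep only the URLs that answer 200; this is the list to grab."""
    return [u for u in urls if status_of(u) == 200]
```

For example, `live_urls(candidate_urls("240"), http_status)` would probe 2400000 through 2409999; at the hit rate quoted above, only roughly 1200 to 1500 of those 10000 should survive. Passing the status function in as a parameter keeps the enumeration testable without network access and lets you swap in a slower, rate-limited prober.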
06:55
|
SketchCow |
Anyone want to take a shot at remixing the recipes? |
07:20
|
|
metalcamp has joined #archiveteam |
07:34
|
godane |
my cat may have been killed by a coyote |
07:35
|
|
ariscop has quit IRC (Quit: Leaving) |
07:45
|
SketchCow |
Sorry to hear it, man |
07:51
|
|
atomotic has joined #archiveteam |
08:01
|
|
schbirid has joined #archiveteam |
08:08
|
|
ndiddy has quit IRC (Read error: Operation timed out) |
08:19
|
|
WinterFox has joined #archiveteam |
08:47
|
|
bsmith093 has quit IRC (Ping timeout: 499 seconds) |
08:48
|
|
ariscop has joined #archiveteam |
08:59
|
|
bsmith093 has joined #archiveteam |
09:08
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
09:11
|
|
dashcloud has joined #archiveteam |
09:30
|
SmileyG |
anyone seen the bbc maybe closing their recipes archive? |
09:30
|
SmileyG |
oh looks like you're on it ^_^ |
09:47
|
|
bwn has quit IRC (Read error: Operation timed out) |
09:56
|
|
marvinw has quit IRC (Quit: Leaving) |
10:18
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
10:22
|
|
marvinw has joined #archiveteam |
10:25
|
|
BartoCH has joined #archiveteam |
10:43
|
|
marvinw has quit IRC (Quit: Leaving) |
10:46
|
|
marvinw has joined #archiveteam |
11:14
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
11:44
|
|
bwn has joined #archiveteam |
12:25
|
|
atomotic has joined #archiveteam |
12:33
|
Medowar |
escapistmagazine.com is down. It belonged to gamefront. Did we save it too? |
12:42
|
HCross2 |
http://www.bbc.co.uk/news/uk-36308976 arseholes |
12:42
|
|
toad2 has quit IRC (Read error: Operation timed out) |
12:43
|
HCross2 |
Are we sure the BBC isn't owned by Yahoo at this point |
12:43
|
luckcolor |
XD |
12:43
|
|
toad1 has joined #archiveteam |
12:43
|
|
jgeoiur has joined #archiveteam |
12:44
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:44
|
godane |
BBC doesn't keep a lot of their podcasts past 30 days |
12:45
|
godane |
that's been a thing for a while with them |
12:45
|
jgeoiur |
hi. are you guys aware that a significant portion of the youtube videos that included copyrighted work are marked as such, and can be deleted by the press of a button? have you considered archiving videos from youtube? |
12:49
|
murk |
jgeoiur: the scale of that is ludicrous, the amount of content that must get uploaded to youtube on a minutely basis must be measured in the gigabytes a second. |
12:49
|
jgeoiur |
i didn't say all of youtube. but only copyrighted works that aren't easily available elsewhere, perhaps |
12:50
|
vitzli |
which is like 80% of the content |
12:51
|
jgeoiur |
i think most of the content is available elsewhere as well |
12:51
|
jgeoiur |
films and albums that are only available on youtube are pretty rare |
12:53
|
jgeoiur |
i discovered that if i run `mpv [url of some album]`, i get a 404 not found, but if i run `mpv [url of video without copyrighted work]`, it works just fine |
12:54
|
jgeoiur |
which is pretty frightening. if something like SOPA passes, they can conveniently remove all that content |
12:55
|
jgeoiur |
i'm not even sure if it's only the case for copyrighted work. they can apply censorship that way |
12:55
|
murk |
if an archive contains anything that a copyright holder cares enough to remove from youtube, they will happily fire DMCA takedowns at anybody hosting other copies of it too. |
12:55
|
jgeoiur |
sorry i meant 403 forbidden. |
12:59
|
jgeoiur |
yes ok so forget about the copyrighted material. the point i was trying to make was that youtube has identified and marked an enormous number of videos with attribute x. in this case x is copyrighted material, but it could just as well be anything else, like controversial material, and you won't know about it until it's too late |
12:59
|
jgeoiur |
we don't know to what degree the material on youtube is being data mined |
13:01
|
Medowar |
If you have any video that is controversial, feel free to throw it to #archivebot, with the --youtube-dl flag. |
13:02
|
Medowar |
Downloading all of youtube does not work, as mentioned, and we have no general way of identifying what is controversial and what is not |
13:04
|
|
WinterFox has quit IRC (Remote host closed the connection) |
13:05
|
jgeoiur |
can the archivebot monitor a webpage for changes, and archive each change automatically? |
13:05
|
jgeoiur |
or rather monitor a youtube channel, and archive each new video that's being uploaded automatically |
13:06
|
Medowar |
right now, archivebot works on a purely manual basis |
13:06
|
HCross |
jgeoiur, what sort of site is it |
13:07
|
jgeoiur |
https://www.youtube.com/user/robag88/videos |
13:08
|
Medowar |
very high quality Videos, I see. |
13:10
|
jgeoiur |
ok I'll just keep on doing it manually |
13:15
|
midas |
jgeoiur: there is a project that is in the works that can do that in the future |
13:15
|
jgeoiur |
i see. thanks |
13:16
|
midas |
if you wish to help with it, join #videobot |
13:22
|
|
atrocity has joined #archiveteam |
13:27
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:30
|
|
dashcloud has joined #archiveteam |
13:43
|
|
jgeoiur has quit IRC (Leaving) |
14:27
|
|
atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…) |
14:41
|
|
atomotic has joined #archiveteam |
14:44
|
|
atomotic has quit IRC (Client Quit) |
14:47
|
|
VADemon has joined #archiveteam |
15:09
|
|
JesseW has joined #archiveteam |
15:35
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
15:37
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
15:39
|
|
dashcloud has joined #archiveteam |
16:10
|
|
nwf has joined #archiveteam |
16:13
|
|
G33KY has joined #archiveteam |
16:24
|
|
schbirid has quit IRC (Ping timeout: 258 seconds) |
16:54
|
SketchCow |
Soundcloud is over. |
16:55
|
xmc |
in the past year they've had a lot of... staff turnover, i hear |
16:55
|
SketchCow |
http://www.digitalmusicnews.com/2016/05/16/soundcloud-preparing-massive-restrictions-dj-uploads/ |
17:00
|
MrRadar |
I know some people were working on discovery for Soundcloud, did that ever go anywhere? |
17:03
|
phuzion |
#soundclown FYI |
17:03
|
phuzion |
It's pretty dead |
17:10
|
* |
phillipsj gets 403: forbidden while using lynx with that last link. (that is like 2 in 1 week) |
17:10
|
vitzli |
rofl |
17:10
|
|
zgrant has joined #archiveteam |
17:11
|
GLaDOS |
silly kids, cats aren't for the net |
17:11
|
GLaDOS |
wait.. |
17:12
|
|
zgrant has quit IRC (Client Quit) |
17:12
|
|
zgrant has joined #archiveteam |
17:14
|
phillipsj |
The user-agent string looks more like a crawler than a Graphical web-browser. |
17:16
|
GLaDOS |
let me guess, "Lynx/2.8.9 (Not A Crawler, like Lizard) SrslyNotCrawler/69.4.20.00" |
17:19
|
MrRadar |
User agent strings are so broken. Look at the current one for Edge: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 |
17:19
|
MrRadar |
I'm surprised the browser vendors haven't made any effort to deprecate them |
17:23
|
will |
Jesus Christ, that's for Edge? That's like the whole browser market from the last 20 years in one! |
17:25
|
MrRadar |
Really, they should just agree on a standard backwards-compatible user agent for browsers and have a separate header for bots (like Robot: wpull/1.x) |
17:26
|
MrRadar |
Though this is -bs |
17:36
|
phillipsj |
oops, I forgot the first two characters of the Edge string. Oh well. |
17:36
|
|
nwf has quit IRC (Read error: Connection reset by peer) |
17:36
|
phillipsj |
sorry wrong window |
17:37
|
|
nwf has joined #archiveteam |
17:39
|
bai |
really, people should stop doing naive useragent detection on their websites, then browser vendors wouldn't have to lie like that |
17:44
|
MrRadar |
Yeah, but we know it'll never, ever get fixed |
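To make bai's point concrete, here is a minimal illustrative sketch (not code from any real site) of why vendors pile borrowed tokens into the user agent: a naive substring check of the kind described misreads the Edge string MrRadar quoted as Chrome, because Edge deliberately carries a `Chrome/` token so that Chrome-only sniffing lets it in. `naive_browser` and `ordered_browser` are hypothetical names; the ordered version only shows how order-dependent this sniffing is and is not a recommended way to identify browsers.

```python
# The Edge user agent string quoted in the chat.
EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"
)


def naive_browser(ua: str) -> str:
    """Naive substring sniffing: the first test that matches wins."""
    if "Chrome" in ua:
        return "Chrome"
    if "Safari" in ua:
        return "Safari"
    if "Gecko" in ua:
        return "Firefox"
    return "unknown"


def ordered_browser(ua: str) -> str:
    """Check the most specific product tokens first.

    Still sniffing, just ordered so that a browser claiming to be
    another one is caught by its own token before the borrowed ones.
    """
    for token, name in (
        ("Edge/", "Edge"),
        ("Chrome/", "Chrome"),
        ("Firefox/", "Firefox"),
        ("Safari/", "Safari"),
    ):
        if token in ua:
            return name
    return "unknown"
```

Both functions depend entirely on which tokens each vendor chooses to copy, which is exactly the fragility being complained about; testing for the specific feature a page needs sidesteps the user agent altogether.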
17:48
|
|
JW_work has quit IRC (Quit: Leaving.) |
17:58
|
|
vitzli has quit IRC (Quit: Leaving) |
18:05
|
|
JW_work has joined #archiveteam |
18:22
|
Frogging |
https://www.reddit.com/r/electronicmusic/comments/4jpqeo/soundcloud_says_reports_of_dj_mixes_being_pulled/ |
19:16
|
|
nox_ has joined #archiveteam |
19:17
|
|
nox has quit IRC (Read error: Connection reset by peer) |
19:22
|
|
closure has joined #archiveteam |
19:57
|
|
ndiddy has joined #archiveteam |
19:57
|
|
MMovie has quit IRC (Read error: Operation timed out) |
19:57
|
|
MMovie has joined #archiveteam |
20:07
|
|
ariscop has quit IRC (Leaving) |
20:17
|
|
MMovie1 has joined #archiveteam |
20:19
|
|
MMovie has quit IRC (Read error: Operation timed out) |
20:41
|
|
ItsYoda has joined #archiveteam |
20:45
|
HCross |
I've started a manual crawl of BBC travel |
20:47
|
|
Stiletto has quit IRC () |
21:00
|
|
Honno has quit IRC (Read error: Operation timed out) |
21:10
|
|
ariscop has joined #archiveteam |
21:15
|
Start |
http://www.polygon.com/2016/5/17/11692866/gametrailers-ign-acquisition-youtube-archive |
21:15
|
|
khaoohs has quit IRC (Ping timeout: 499 seconds) |
21:21
|
|
zgrant has quit IRC (Quit: http://chat.efnet.org (EOF)) |
21:21
|
godane |
picture of my cat : https://scontent-lga3-1.xx.fbcdn.net/t31.0-8/q81/s960x960/13217022_10204588349268713_1457902763586552308_o.jpg |
21:24
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
21:27
|
|
dashcloud has joined #archiveteam |
21:29
|
xmc |
that's a heck of a cat |
21:40
|
|
Stiletto has joined #archiveteam |
21:59
|
|
G33KY has quit IRC (Remote host closed the connection) |
22:33
|
|
bwn has quit IRC (Read error: Operation timed out) |
22:34
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
22:43
|
|
bwn has joined #archiveteam |
22:48
|
|
tomwsmf-a has joined #archiveteam |
22:56
|
cadbury_ |
hey, is there any plan for a bbc cooking site scrape? |
22:58
|
Frogging |
heads up: FurAffinity was just attacked http://forums.furaffinity.net/threads/5-17-site-attack.1530523/ |
22:58
|
|
maseck has quit IRC (Remote host closed the connection) |
22:59
|
|
w0rp has quit IRC (Read error: Operation timed out) |
23:00
|
|
w0rp has joined #archiveteam |
23:03
|
|
maseck has joined #archiveteam |
23:05
|
Medowar |
cadbury_ yes, we are working on it |
23:23
|
|
BlueMaxim has joined #archiveteam |
23:55
|
|
JordanJ2 has joined #archiveteam |