Time |
Nickname |
Message |
00:33
🔗
|
|
schbirid2 has joined #archiveteam-bs |
00:35
🔗
|
|
schbirid has quit IRC (Read error: Operation timed out) |
00:44
🔗
|
|
wp494 has joined #archiveteam-bs |
00:44
🔗
|
|
BiggieJo1 has joined #archiveteam-bs |
00:47
🔗
|
|
BiggieJon has quit IRC (Read error: Operation timed out) |
00:53
🔗
|
|
Mayonaise has quit IRC (Ping timeout: 365 seconds) |
00:53
🔗
|
|
zenguy_pc has quit IRC (Read error: Operation timed out) |
01:06
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:09
🔗
|
|
zenguy_pc has joined #archiveteam-bs |
01:38
🔗
|
|
Mayonaise has joined #archiveteam-bs |
01:47
🔗
|
tfgbd |
Ugh, no wonder I had trouble uploading some massive site dumps.. |
01:47
🔗
|
tfgbd |
Just noticed these emails... |
01:47
🔗
|
tfgbd |
Thank you for your interest in adding files to the Internet Archive. Unfortunately, one or more of the files you uploaded into item VetuswareSoftware_olddosru appear to be malware, and the item has been removed from archive.org. You can get more details about the malware file(s) here: |
01:47
🔗
|
tfgbd |
Communication_update.zip https://www.virustotal.com/file/3babe259474e50616dfb47fcb8dc983dae673e5d6f856d18c8cbd903c67256f7/analysis/1415505484/ |
01:55
🔗
|
tfgbd |
but that doesn't explain why the huge files fail |
01:55
🔗
|
tfgbd |
I think I'll just give up on that ID and just reupload everything... |
02:06
🔗
|
|
Mayonaise has quit IRC (Ping timeout: 365 seconds) |
02:11
🔗
|
|
Mayonaise has joined #archiveteam-bs |
02:35
🔗
|
|
logchfoo starts logging #archiveteam-bs at Mon Nov 10 02:35:30 2014 |
02:35
🔗
|
|
logchfoo has joined #archiveteam-bs |
02:40
🔗
|
|
ex-parrot has joined #archiveteam-bs |
02:52
🔗
|
|
Ravenloft has quit IRC (Ping timeout: 606 seconds) |
02:57
🔗
|
|
bauruine has quit IRC (Ping timeout: 265 seconds) |
03:02
🔗
|
|
bauruine has joined #archiveteam-bs |
03:03
🔗
|
|
schbirid2 has quit IRC (Read error: Operation timed out) |
03:06
🔗
|
dashcloud |
so, if you used BrowserStack, bad news: they were hacked, and if you believe the pastebin, it was very bad |
03:11
🔗
|
|
schbirid2 has joined #archiveteam-bs |
03:47
🔗
|
|
mistym has joined #archiveteam-bs |
03:51
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
03:53
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
04:23
🔗
|
|
bsmith093 has quit IRC (Read error: Operation timed out) |
04:38
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
04:39
🔗
|
|
bsmith093 has joined #archiveteam-bs |
04:39
🔗
|
|
midas sets mode: +o bsmith093 |
04:53
🔗
|
|
ex-parrot has quit IRC (Leaving.) |
04:56
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:23
🔗
|
|
mistym has joined #archiveteam-bs |
05:33
🔗
|
|
JonimusP is now known as Jonimus |
07:16
🔗
|
joepie91 |
dashcloud: link to pastebin? |
07:17
🔗
|
Kazzy |
https://www.reddit.com/r/sysadmin/comments/2ltemy/crazy_browserstack_email_i_just_got/ |
07:17
🔗
|
Kazzy |
this is all I've seen from it |
07:17
🔗
|
Kazzy |
I'm assuming that'd be the contents of whatever someone had put on pastebin |
07:18
🔗
|
garyrh |
the pastebin: http://pastebin.com/RQXd2Au3 |
07:18
🔗
|
garyrh |
(from https://news.ycombinator.com/item?id=8581477) |
07:18
🔗
|
joepie91 |
right |
07:23
🔗
|
|
primus104 has joined #archiveteam-bs |
07:39
🔗
|
|
rduser has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
SadDM has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
twrist has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
altlabel has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
pikhq has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
ionpulse has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
eprillios has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Insomnia_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Aranje has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
dcmorton has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
SmileyG has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Cameron_D has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
slash` has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
pft has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
antomatic has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Sue_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
mistym has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Lord_Nigh has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
GLaDOS has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
arkiver has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
RainbowCo has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
SN4T14__ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
brayden has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Zebranky_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Atluxity has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
tfgbd has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
wm_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
bauruine has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Sellyme_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
dashcloud has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
DFJustin has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Sk1d has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
danneh_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Kirk has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
primus104 has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
schbirid2 has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Coderjoe has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
norbert79 has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
ersi has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
garyrh has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Void_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Boppen has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Kenshin has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
wp494 has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
wiktor_b has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
bsmith093 has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
SketchCow has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
kanzure has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
lytv has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
RedType has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
chfoo has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
dx has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
xmc has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
zenguy_pc has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
BlueMaxim has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
primus_ has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Jonimus has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
swebb has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
w0rp has quit IRC (ircd.choopa.net hub.efnet.us) |
07:39
🔗
|
|
Laverne has quit IRC (ircd.choopa.net hub.efnet.us) |
17:03
🔗
|
|
logchfoo starts logging #archiveteam-bs at Mon Nov 10 17:03:04 2014 |
17:03
🔗
|
|
logchfoo has joined #archiveteam-bs |
17:14
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
17:31
🔗
|
|
logchfoo starts logging #archiveteam-bs at Mon Nov 10 17:31:59 2014 |
17:31
🔗
|
|
logchfoo has joined #archiveteam-bs |
17:32
🔗
|
|
mistym has joined #archiveteam-bs |
17:51
🔗
|
|
primus104 has quit IRC (Leaving.) |
17:59
🔗
|
|
bobby_ has joined #archiveteam-bs |
18:23
🔗
|
|
bobby__ has joined #archiveteam-bs |
18:26
🔗
|
|
bobby_ has quit IRC (Ping timeout: 240 seconds) |
18:39
🔗
|
espes__ |
so after churning through 100gb of the hyves indexes I realised I got the username wrong |
18:39
🔗
|
espes__ |
I probably should have been tipped off by the "mother" being a teenage boy |
18:43
🔗
|
|
kyan has joined #archiveteam-bs |
18:43
🔗
|
kyan |
So, I'm trying to find this file http://downloads.bbc.co.uk/podcasts/radio4/ipm/ipm_20080412-1843.mp3 |
18:44
🔗
|
joepie91 |
espes__: lol? |
18:44
🔗
|
kyan |
BBC's archiving policy is, as far as I can tell, to BURN IT ALL. That makes me sad. I was wgetting the podcast for a while last year, but it was kind of bad because it only worked when my laptop was online. Is there any sort of scheduled archiving thing? |
18:44
🔗
|
joepie91 |
espes__: I'm still writing parsing code, but, I need sleep :( |
18:44
🔗
|
* |
joepie91 hit a zlib speed bump |
18:45
🔗
|
kyan |
Like, periodic archiving in the cloud |
18:45
🔗
|
joepie91 |
~cloud~ |
18:45
🔗
|
joepie91 |
kyan: I'm running a periodic wget for the NHK broadcasts, I could set one up for this if necessary |
18:45
🔗
|
joepie91 |
no idea if something like that already exists though |
18:46
🔗
|
kyan |
joepie91: Ah. I was thinking that archivebot or something might have that. guess not though. Yea, as far as I can tell BBC4 deletes basically everything after a couple weeks |
18:46
🔗
|
joepie91 |
: |
18:46
🔗
|
joepie91 |
:/ * |
18:47
🔗
|
joepie91 |
kyan: is there, like, a directory index for them? |
18:47
🔗
|
joepie91 |
or does it require page parsing? |
18:47
🔗
|
* |
joepie91 notes that he has now almost a year of NHK podcasts |
18:47
🔗
|
kyan |
the closest thing I found was the actual podcast XML, IIRC |
18:47
🔗
|
antomatic |
BBC archiving policy (today) is 'KEEP IT ALL', but that does not mean that they'll let anyone else have it. |
18:47
🔗
|
joepie91 |
kyan: if you could drop me all the URLs you have for it in PM |
18:47
🔗
|
kyan |
I think what I ended up doing was just wgetting the main podcast page for a few hops |
18:47
🔗
|
joepie91 |
I can have a look at it tomorrow |
18:48
🔗
|
joepie91 |
still need to set up an automated upload job for NHK as well |
18:48
🔗
|
kyan |
Hmm, if they keep it all, maybe it's not a priority issue for us then? |
18:48
🔗
|
antomatic |
They're reasonably good at making stuff /available/ for limited periods - e.g. 7-30 days after broadcast, but after that it doesn't count towards ratings so they hide it away again. |
18:48
🔗
|
joepie91 |
so may as well look into adding this to the schedule |
18:48
🔗
|
joepie91 |
kyan: dark archive is no archive |
18:48
🔗
|
joepie91 |
:) |
18:48
🔗
|
kyan |
true. |
18:48
🔗
|
* |
antomatic nods |
18:49
🔗
|
joepie91 |
but yeah, drop me all the relevant URLs in PM and I will look at it probably tomorrow |
18:49
🔗
|
antomatic |
BBC policy used to be 'tapes are expensive!', but they do seem more enlightened today |
18:49
🔗
|
kyan |
I think there are a lot more podcasts than that, though |
18:49
🔗
|
kyan |
like, one for each show |
18:50
🔗
|
kyan |
that's the one I've been doing, because of that interview about the Scientology documents (which is apparently the only thing that gave the name of the involved lawyer) |
18:51
🔗
|
espes__ |
there should be a thing to automatically ingest stuff from an rss feed |
18:52
🔗
|
espes__ |
and another scraperwiki-like thing to easily generate rss feeds |
18:52
🔗
|
espes__ |
or just like, cron-as-a-service :P |
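A minimal sketch of the RSS-ingest idea espes__ describes, assuming the podcast XML is a plain RSS 2.0 feed with `<enclosure>` elements. The sample feed, the `example.org` URLs, and the function names `enclosure_urls` and `new_urls` are hypothetical; nothing here is an existing ArchiveTeam tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real job would fetch the podcast XML over HTTP
# on a schedule (e.g. from cron) instead of using an inline string.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.org/audio/ep1.mp3" length="123" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.org/audio/ep2.mp3" length="456" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only item</title>
    </item>
  </channel>
</rss>
"""


def enclosure_urls(feed_xml):
    """Return the media URLs from every <enclosure> in the feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [enc.get("url") for enc in root.iter("enclosure") if enc.get("url")]


def new_urls(feed_xml, seen):
    """Return enclosure URLs that are not in the set of already-archived URLs."""
    return [url for url in enclosure_urls(feed_xml) if url not in seen]


if __name__ == "__main__":
    already_archived = {"http://example.org/audio/ep1.mp3"}
    for url in new_urls(SAMPLE_FEED, already_archived):
        # A real job would hand this URL to wget or another downloader.
        print(url)
```

Run periodically, something like this would only pass not-yet-seen episode URLs to the downloader, which is roughly what a scheduled wget over the feed achieves by hand.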
18:53
🔗
|
schbirid2 |
oh debian, php5 depends on apache2 |
18:53
🔗
|
kyan |
For what it's worth, BBC also uses IP blocks to limit some content to UK only http://www.bbc.co.uk/podcasts/help/uk_only |
18:54
🔗
|
joepie91 |
schbirid2: wrong php package |
18:54
🔗
|
joepie91 |
schbirid2: php5-cgi/php5-fpm for lighttpd/nginx |
18:56
🔗
|
DFJustin |
http://file.wikileaks.org/robots.txt sad face |
19:04
🔗
|
antomatic |
Most of the BBC's content is behind iPlayer, which is completely IP-locked, unlike [most] podcasts |
19:05
🔗
|
|
primus104 has joined #archiveteam-bs |
19:05
🔗
|
schbirid2 |
joepie91: too late :P |
19:05
🔗
|
schbirid2 |
also i want to run the builtin php server because i like danger |
19:07
🔗
|
joepie91 |
antomatic: haha, IP-locked |
19:07
🔗
|
* |
joepie91 SSHs into UK VPS |
19:07
🔗
|
joepie91 |
:D |
19:10
🔗
|
antomatic |
Ssh, don't tell them. :) |
19:11
🔗
|
antomatic |
True confession: I used to look after the geoblocking for a video site at work. Pretty easy, in the main it was just whitelisting the big UK ISPs, and denying everything else with a "Has there been an error? Let us know" reply form. |
19:12
🔗
|
antomatic |
Occasionally an ISP would open up a new IP range, we'd hear about it from the form. |
19:12
🔗
|
antomatic |
More often I'd get emails from people saying "I am just trying to use your site, in my home, and it does not work, please can you fix it" |
19:13
🔗
|
antomatic |
which due investigation would reveal that their home was apparently hosted in the middle of a large datacenter. :) |
19:16
🔗
|
antomatic |
Or that their IP range was allocated to "Soopa VPN Ltd" |
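The allow-list approach antomatic describes can be sketched roughly as follows. This is an illustration only, not the actual system: the CIDR ranges are reserved documentation networks standing in for real ISP allocations, and `ALLOWED_NETWORKS` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), standing in for the
# address blocks of the big ISPs that would be allow-listed in practice.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_allowed(client_ip):
    """Return True if client_ip falls inside any allow-listed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Unparseable addresses are denied, like everything else not listed.
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        verdict = "allow" if is_allowed(ip) else "deny (show the error form)"
        print(ip, verdict)
```

Anything outside the listed ranges is denied by default, which is why a newly opened ISP range, or a VPN or datacenter address, ends up at the error form until someone adds it.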
19:16
🔗
|
joepie91 |
lol |
19:16
🔗
|
kyan |
joepie91: btw I pmed you with an idea for a wget command |
19:42
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
20:06
🔗
|
|
ex-parrot has joined #archiveteam-bs |
20:08
🔗
|
kyan |
it looks like something might have gone wrong with bhscfbemh541lxe06mrgurvz9 in archivebot, Facebook urls are timing out |
20:08
🔗
|
|
ex-parrot has quit IRC (Client Quit) |
20:20
🔗
|
kyan |
Also: is there a way to search for specific finished Archivebot WARCs? |
20:23
🔗
|
|
Panasonic has quit IRC (Ping timeout: 480 seconds) |
20:24
🔗
|
|
bobby__ has quit IRC (Ping timeout: 240 seconds) |
20:30
🔗
|
|
bobby_ has joined #archiveteam-bs |
21:06
🔗
|
DFJustin |
google site:archive.org archivebot whatever |
21:09
🔗
|
|
bobby_ has quit IRC (Quit: Page closed) |
21:09
🔗
|
|
Bobby_ has joined #archiveteam-bs |
21:28
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
21:31
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
21:33
🔗
|
|
kyan_ has joined #archiveteam-bs |
21:36
🔗
|
|
kyan_ has quit IRC (Client Quit) |
21:38
🔗
|
|
kyan has quit IRC (Ping timeout: 480 seconds) |
21:46
🔗
|
|
Bobby_ has quit IRC () |
21:52
🔗
|
|
mistym has joined #archiveteam-bs |
22:22
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
22:37
🔗
|
|
mistym has joined #archiveteam-bs |
22:54
🔗
|
|
RedType has quit IRC (Quit: leaving) |
22:54
🔗
|
|
RedType has joined #archiveteam-bs |
22:54
🔗
|
|
RedType has quit IRC (Client Quit) |
22:55
🔗
|
midas |
https://archive.org/details/archivebot |
22:56
🔗
|
midas |
(it has its own collection you know:)) |
23:07
🔗
|
|
RedType has joined #archiveteam-bs |