[00:33] *** schbirid2 has joined #archiveteam-bs [00:35] *** schbirid has quit IRC (Read error: Operation timed out) [00:44] *** wp494 has joined #archiveteam-bs [00:44] *** BiggieJo1 has joined #archiveteam-bs [00:47] *** BiggieJon has quit IRC (Read error: Operation timed out) [00:53] *** Mayonaise has quit IRC (Ping timeout: 365 seconds) [00:53] *** zenguy_pc has quit IRC (Read error: Operation timed out) [01:06] *** primus104 has quit IRC (Leaving.) [01:09] *** zenguy_pc has joined #archiveteam-bs [01:38] *** Mayonaise has joined #archiveteam-bs [01:47] Ugh, no wonder I had trouble uploading some massive site dumps.. [01:47] Just noticed these emails... [01:47] Thank you for your interest in adding files to the Internet Archive. Unfortunately, one or more of the files you uploaded into item VetuswareSoftware_olddosru appear to be malware, and the item has been removed from archive.org. You can get more details about the malware file(s) here: [01:47] Communication_update.zip https://www.virustotal.com/file/3babe259474e50616dfb47fcb8dc983dae673e5d6f856d18c8cbd903c67256f7/analysis/1415505484/ [01:55] but that doesn't explain why the huge files fail [01:55] I think I'll just give up on that ID and just reupload everything... 
[02:06] *** Mayonaise has quit IRC (Ping timeout: 365 seconds) [02:11] *** Mayonaise has joined #archiveteam-bs [02:35] *** logchfoo starts logging #archiveteam-bs at Mon Nov 10 02:35:30 2014 [02:35] *** logchfoo has joined #archiveteam-bs [02:40] *** ex-parrot has joined #archiveteam-bs [02:52] *** Ravenloft has quit IRC (Ping timeout: 606 seconds) [02:57] *** bauruine has quit IRC (Ping timeout: 265 seconds) [03:02] *** bauruine has joined #archiveteam-bs [03:03] *** schbirid2 has quit IRC (Read error: Operation timed out) [03:06] so, if you used browserstacks, bad news- they were hacked, and if you believe the pastebin, it was very bad [03:11] *** schbirid2 has joined #archiveteam-bs [03:47] *** mistym has joined #archiveteam-bs [03:51] *** Lord_Nigh has quit IRC (Read error: Operation timed out) [03:53] *** Lord_Nigh has joined #archiveteam-bs [04:23] *** bsmith093 has quit IRC (Read error: Operation timed out) [04:38] *** mistym has quit IRC (Remote host closed the connection) [04:39] *** bsmith093 has joined #archiveteam-bs [04:39] *** midas sets mode: +o bsmith093 [04:53] *** ex-parrot has quit IRC (Leaving.) [04:56] *** aaaaaaaaa has quit IRC (Leaving) [05:23] *** mistym has joined #archiveteam-bs [05:33] *** JonimusP is now known as Jonimus [07:16] dashcloud: link to pastebin? 
[07:17] https://www.reddit.com/r/sysadmin/comments/2ltemy/crazy_browserstack_email_i_just_got/ [07:17] this is all I've seen from it [07:17] I'm assuming that'd be the contents of whatever someone had put on pastebin [07:18] the pastebin: http://pastebin.com/RQXd2Au3 [07:18] (from https://news.ycombinator.com/item?id=8581477) [07:18] right [07:23] *** primus104 has joined #archiveteam-bs [07:39] *** rduser has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** SadDM has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** twrist has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** altlabel has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** pikhq has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** ionpulse has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** eprillios has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Insomnia_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Aranje has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** dcmorton has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** SmileyG has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Cameron_D has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** slash` has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** pft has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** antomatic has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Sue_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** mistym has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Lord_Nigh has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** GLaDOS has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** arkiver has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** RainbowCo has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** SN4T14__ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** brayden has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Zebranky_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Atluxity has quit IRC (ircd.choopa.net hub.efnet.us) 
[07:39] *** tfgbd has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** wm_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** bauruine has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Sellyme_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** dashcloud has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** DFJustin has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Sk1d has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** danneh_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Kirk has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** primus104 has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** schbirid2 has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Coderjoe has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** norbert79 has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** ersi has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** garyrh has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Void_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Boppen has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Kenshin has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** wp494 has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** wiktor_b has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** bsmith093 has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** SketchCow has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** kanzure has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** lytv has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** RedType has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** chfoo has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** dx has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** xmc has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** zenguy_pc has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** BlueMaxim has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** primus_ has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Jonimus has quit 
IRC (ircd.choopa.net hub.efnet.us) [07:39] *** swebb has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** w0rp has quit IRC (ircd.choopa.net hub.efnet.us) [07:39] *** Laverne has quit IRC (ircd.choopa.net hub.efnet.us) [17:03] *** logchfoo starts logging #archiveteam-bs at Mon Nov 10 17:03:04 2014 [17:03] *** logchfoo has joined #archiveteam-bs [17:14] *** mistym has quit IRC (Remote host closed the connection) [17:31] *** logchfoo starts logging #archiveteam-bs at Mon Nov 10 17:31:59 2014 [17:31] *** logchfoo has joined #archiveteam-bs [17:32] *** mistym has joined #archiveteam-bs [17:51] *** primus104 has quit IRC (Leaving.) [17:59] *** bobby_ has joined #archiveteam-bs [18:23] *** bobby__ has joined #archiveteam-bs [18:26] *** bobby_ has quit IRC (Ping timeout: 240 seconds) [18:39] so after churning through 100gb of the hyves indexes I realised I got the username wrong [18:39] I probably should have been tipped off by the "mother" being a teenage boy [18:43] *** kyan has joined #archiveteam-bs [18:43] So, I'm trying to find this file http://downloads.bbc.co.uk/podcasts/radio4/ipm/ipm_20080412-1843.mp3 [18:44] espes__: lol? [18:44] BBC's archiving policy is, as far as I can tell, to BURN IT ALL. That makes me sad. I was wgetting the podcast for a while last year, but it was kind of bad because it only worked when my laptop was online. Is there any sort of scheduled archiving thing? [18:44] espes__: I'm still writing parsing code, but, I need sleep :( [18:44] * joepie91 hit a zlib speed bump [18:45] Like, periodic archiving in the cloud [18:45] ~cloud~ [18:45] kyan: I'm running a periodic wget for the NHK broadcasts, I could set one up for this if necessary [18:45] no idea if something like that already exists though [18:46] joepie91: Ah. I was thinking that archivebot or something might have that. guess not though. 
Yea, as far as I can tell BBC4 deletes basically everything after a couple weeks [18:46] : [18:46] :/ * [18:47] kyan: is there, like, a directory index for them? [18:47] or does it require page parsing? [18:47] * joepie91 notes that he has now almost a year of NHK podcasts [18:47] the closest thing I found was the actual podcast XML, IIRC [18:47] BBC archiving policy (today) is 'KEEP IT ALL', but that does not mean that they'll let anyone else have it. [18:47] kyan: if you could drop me all the URLs you have for it in PM [18:47] I think what i ended up doing was just wgetting the main podcast page for a few hops [18:47] I can have a look at it tomorrow [18:48] still need to set up an automated upload job for NHK as well [18:48] Hmm, if they keep it all, maybe it's not a priority issue for us then? [18:48] They're reasonably good at making stuff /available/ for limited periods - e.g. 7-30 days after broadcast, but after that it doesn't count towards ratings so they hide it away again. [18:48] so may as well look into adding this to the schedule [18:48] kyan: dark archive is no archive [18:48] :) [18:48] true. 
[18:48] * antomatic nods [18:49] but yeah, drop me all the relevant URLs in PM and I will look at it probably tomorrow [18:49] BBC policy used to be 'tapes are expensive!', but they do seem more enlightened today [18:49] I think there are a lot more podcasts than that, though [18:49] like, one for each show [18:50] that's the one I've been doing, because of that interview about the scientology documents (which is apparently the only thing that gave the name of the involved lawyer) [18:51] there should be a thing to automatically ingest stuff from an rss feed [18:52] and another scraperwiki-like thing to easily generate rss feeds [18:52] or just like, cron-as-a-service :P [18:53] oh debian, php5 depends on apache2 [18:53] For what it's worth, BBC also uses IP blocks to limit some content to UK only http://www.bbc.co.uk/podcasts/help/uk_only [18:54] schbirid2: wrong php package [18:54] schbirid2: php5-cgi/php5-fpm for lighttpd/nginx [18:56] http://file.wikileaks.org/robots.txt sad face [19:04] Most of the BBC's content is behind iPlayer, which is completely IP-locked, unlike [most] podcasts [19:05] *** primus104 has joined #archiveteam-bs [19:05] joepie91: too late :P [19:05] also i want to run the builtin php server because i like danger [19:07] antomatic: haha, IP-locked [19:07] * joepie91 SSHs into UK VPS [19:07] :D [19:10] Ssh, don't tell them. :) [19:11] True confession: I used to look after the geoblocking for a video site at work. Pretty easy, in the main it was just whitelisting the big UK ISPs, and denying everything else with a "Has there been an error? Let us know" reply form. [19:12] Occasionally an ISP would open up a new IP range, we'd hear about it from the form. [19:12] More often I'd get emails from people saying "I am just trying to use your site, in my home, and it does not work, please can you fix it" [19:13] which due investigation would reveal that their home was apparently hosted in the middle of a large datacenter. 
:) [19:16] Or that their IP range was allocated to "Soopa VPN Ltd" [19:16] lol [19:16] joepie91: btw I pmed you with an idea for a wget command [19:42] *** Aranje has quit IRC (Quit: Three sheets to the wind) [20:06] *** ex-parrot has joined #archiveteam-bs [20:08] it looks like something might have gone wrong with bhscfbemh541lxe06mrgurvz9 in archivebot, Facebook urls are timing out [20:08] *** ex-parrot has quit IRC (Client Quit) [20:20] Also: is there a way to search for specific finished Archivebot WARCs? [20:23] *** Panasonic has quit IRC (Ping timeout: 480 seconds) [20:24] *** bobby__ has quit IRC (Ping timeout: 240 seconds) [20:30] *** bobby_ has joined #archiveteam-bs [21:06] google site:archive.org archivebot whatever [21:09] *** bobby_ has quit IRC (Quit: Page closed) [21:09] *** Bobby_ has joined #archiveteam-bs [21:28] *** BlueMaxim has joined #archiveteam-bs [21:31] *** mistym has quit IRC (Remote host closed the connection) [21:33] *** kyan_ has joined #archiveteam-bs [21:36] *** kyan_ has quit IRC (Client Quit) [21:38] *** kyan has quit IRC (Ping timeout: 480 seconds) [21:46] *** Bobby_ has quit IRC () [21:52] *** mistym has joined #archiveteam-bs [22:22] *** mistym has quit IRC (Remote host closed the connection) [22:37] *** mistym has joined #archiveteam-bs [22:54] *** RedType has quit IRC (Quit: leaving) [22:54] *** RedType has joined #archiveteam-bs [22:54] *** RedType has quit IRC (Client Quit) [22:55] https://archive.org/details/archivebot [22:56] (it has its own collection you know:)) [23:07] *** RedType has joined #archiveteam-bs