[00:04] The Warrior is a really inspired feat of packaging! I love it.
[00:15] someone uploaded a 2.1gb file of the last x-play episode
[00:15] getting that one now
[00:28] Ever heard about formspring.me?
[00:33] "This Account Has Expired. As of January 2013, accounts that have not been active in over 18 months may be automatically deleted. If this is your account, you may login within the next 24 hours to stop this account from being permanently deleted."
[00:33] How nice of them. A whole 24 hours to react.
[00:51] how bad is it to disrespect the robots.txt rate guidelines for bots?
[00:51] ethically/legally
[00:54] rate guidelines? are they ridiculously strict here?
[00:55] also, you may violate the "spirit of the law" but who says you can't use multiple bots from multiple hosts
[00:55] Crawl-delay: 5
[00:55] for the Speedy spider
[00:56] and I suppose our advice wouldn't really work for an EC2 cluster then, would it...
[00:58] in fact... here's what they put in the robots.txt: 5 specific user-agents with crawl-delay restriction. If my crawler is diy, then I can assume it's not restricted?
[00:58] * xk_id_ chuckles
[00:58] you can set your own UA
[00:58] are they blocking all other UAs?
[00:58] nope
[00:58] lol
[00:59] ikr
[00:59] I'd still add a delay just to be polite
[00:59] but I'd probably make it lower
[00:59] under "User-agent: *" they only list a bunch of "Disallow:"
[00:59] Yes. I shall. say, what would be a polite, yet satisfying delay?
[01:00] in fact, let's keep it to polite. (satisfying depends on my needs)
[01:00] 1 or 2 seconds probably is what I'd do
[01:00] ic
[01:00] would that be cumulative across the cluster? or for just one machine?
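A minimal Python sketch of the two practical questions above: which delay applies to a home-made crawler, and how to split it across machines. It assumes a hypothetical robots.txt shaped like the one described (a `Crawl-delay: 5` group for one named spider, and a `User-agent: *` group with only `Disallow:` lines) and reads it with the standard library's `urllib.robotparser`. When no group gives your user agent a delay, it falls back to 1 second, the polite figure suggested here. For the cluster question it takes the conservative reading that the delay budgets the whole crawler, so each of N workers waits N times the delay. robots.txt says nothing about clusters (and Crawl-delay is a non-standard extension), so treat that reading as an assumption. `polite_delay`, `HostThrottle`, and the `MyCrawler/0.1` user agent are hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser


def polite_delay(robots_lines, user_agent, default=1.0):
    """Seconds to wait between requests for `user_agent`.

    Uses the Crawl-delay from the group matching `user_agent` (or the
    `User-agent: *` group); if none declares one, falls back to `default`,
    the 1-second "polite" figure from the chat (a convention, not a rule).
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


class HostThrottle:
    """Keep at least `delay * workers` seconds between requests to one host.

    Assumption: if `workers` machines share one host budget and each uses
    this throttle, their combined rate averages one request per `delay`
    seconds ("cumulative across the cluster").
    """

    def __init__(self, delay, workers=1, clock=time.monotonic, sleep=time.sleep):
        self.interval = delay * workers
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> clock() value of the previous request

    def wait(self, host):
        """Block until a request to `host` is allowed, then record it."""
        now = self.clock()
        previous = self.last_request.get(host)
        if previous is not None:
            remaining = self.interval - (now - previous)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last_request[host] = now


# Hypothetical robots.txt modelled on the one described in the chat.
ROBOTS_TXT = """\
User-agent: Speedy Spider
Crawl-delay: 5

User-agent: *
Disallow: /private/
""".splitlines()

if __name__ == "__main__":
    delay = polite_delay(ROBOTS_TXT, "MyCrawler/0.1")  # hypothetical DIY UA
    throttle = HostThrottle(delay, workers=4)
    print(delay, throttle.interval)
```

Running it prints `1.0 4.0`: the DIY user agent matches no delayed group, so it gets the 1-second default, and four workers each space their requests 4 seconds apart. Enforcing a strict shared budget would need coordination between machines; per-worker spacing is only the simplest approximation and can still let requests from different workers coincide.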
[02:08] xk_id_: robots is somewhere between a request and a suggestion
[02:08] imo
[02:08] ic
[02:08] polite delay is 1 second, sneaky is 10 seconds
[04:33] uploaded finally: http://archive.org/details/TechTV_Music_Wars
[09:57] http://www.ussc.gov/ got hacked by Anonymous
[09:57] I made a copy for the future to see
[10:07] omf__: archiving that?
[10:12] I am downloading it now
[10:12] many of the content mirrors are down so
[10:13] I am really interested in what is in these files, no doubt it is going to piss off the gov
[10:13] they are 150mb each
[10:15] :)
[10:15] the gov't is going all out tho
[10:16] trying to recruit netizens to fight in the cyber war
[10:16] http://www.whitehouse.gov/blog/2013/01/22/roll-your-sleeves-get-involved-and-get-civic-hacking
[10:17] http://activepolitic.com:82/external/1785.html?utm_source=dlvr.it&utm_medium=twitter
[10:17] John Kerry: Foreign Hackers Are '21st Century Nuclear Weapons'
[10:18] #archiveteam-bs man
[10:18] If it ain't about archiving, put it in -bs
[10:18] the download is going super slow, I assume because everyone is slamming the shit out of those servers
[11:23] ussc.gov dns has been taken out
[11:23] am I the only one who backed it up?
[14:41] aww, ussc.gov is down
[14:41] just the dns is down
[14:41] I always sleep at the worst times
[14:41] ah, have you an ip address?
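When only a site's DNS records are gone, as with ussc.gov here, its web server may still answer on its old IP address. Because many servers host several sites on one address and pick the site from the HTTP `Host` header, a request to the bare IP should still name the original hostname. A minimal Python sketch using the standard library's `http.client`; `fetch_by_ip` is a hypothetical helper.

```python
import http.client


def fetch_by_ip(ip, hostname, path="/", port=80, timeout=10):
    """GET `path` from the server at `ip`, naming the site in the Host header.

    Name-based virtual hosting picks the site from the Host header, so
    sending the original hostname lets a server that still answers on its
    old address serve the right site even though DNS no longer resolves it.
    Plain HTTP only: over HTTPS the certificate would also have to be
    checked against `hostname`, which this sketch does not attempt.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # An explicit Host header replaces the default one http.client
        # would otherwise derive from `ip`.
        conn.request("GET", path, headers={"Host": hostname})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

For example, `fetch_by_ip("192.0.2.10", "www.ussc.gov")` would request the front page; 192.0.2.10 is a reserved documentation address standing in for the real IP, which was posted in the Hacker News thread.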
[14:41] the direct ip works
[14:42] it is in the hackernews story
[14:42] I am still getting 2 of the files from mirrors
[14:42] ncdu has new features since september: Added option to dump scanned directory information to a file (-o) & Added option to load scanned directory information from a file (-f)
[14:43] this makes it the perfect tool if you want to get a nice overview of some directories, their sizes, etc
[14:45] I guess the story left the first few pages
[15:08] https://news.ycombinator.com/item?id=5119600
[15:08] it is still the 2nd story on there
[15:08] it looks like for now I am only going to get 2 of the files
[15:08] I am keeping an eye out for more mirrors
[15:24] omf__: that one doesn't seem to have the ip address in it
[15:35] Yes it does but it does not matter anymore since they partially restored the site to normal
[16:34] mmm, flashy lights: http://offog.org/stuff/arc-breakout.jpg
[16:34] (as it turned out, I didn't need the breakout box to do anything in the end, but at least it gives me something to watch while the hard disk image is copying...)
[18:27] omf__: which of those files have you downloaded?
[18:27] http://www.youtube.com/watch?v=myYzfsEOaDw
[18:28] http://www.youtube.com/watch?v=x3Fz1V3LZtw
[18:28] alternate footage of the NYC memorial; official footage of the IA memorial.
[18:33] almost done with kennedy and scalia
[18:34] I also got the site as screenshots
[18:35] I hope others are getting the other files
[18:36] link me
[18:36] oh, the hn one? ok
[18:37] http://pastebin.com/d2nvt263
[18:37] the new anonymous thing
[18:37] yeah ok
[18:38] 7 parts left
[18:38] speeds cusk
[18:38] suck*
[18:38] from what I have gathered you need all the files to decrypt everything
[18:39] I am glad they at least have 4 mirrors since one was already taken down
[18:39] there is a torrent
[18:40] some are giving 503
[18:40] and most are 404
[18:42] eta 14h 44m
[18:42] for one
[18:43] I want to add this to interesting things I collect over the years
[18:43] like the doom 3 alpha and the half life 2 source code
[18:43] amongst other things
[18:44] did you get the half life source code?
[18:44] err, half life alpha
[18:44] I might have a copy of the game not the code, just HL2
[18:44] I am not sure they ever caught who broke in
[18:45] wow
[18:45] the torrent picked up
[18:45] should be done in a minute
[18:45] omf__: half life (1) alpha leaked some week ago
[18:45] alpha as in press release disc
[18:45] pretty nice early stuff
[18:46] hmm maybe I should look around for it
[18:46] my big thing is gaming history, way too much of that is long gone.
[18:49] the torrent has a "press release" flv in it
[18:49] and .txt
[18:50] omf__: I have all the files
[18:50] you got a fast internet connection
[18:50] yeah, 5MB/s
[18:56] omf__: someone posted the whole thing on mega: https://mega.co.nz/#!V9sH3TIC!P9U_C2udtPdJyt8772o_aEiceHsV7BDxdOmwO9224Qg
[18:56] hahah go MEGA
[18:56] ha, they force accepting tos
[18:56] most download sites don't do that for downloaders
[18:58] I need fucking flash to use mega
[18:58] :[
[18:58] meh
[18:58] what a huge stack of shit
[18:58] hold on
[18:59] why cannot flash just die
[19:00] someone grab google cache of ussc.gov
[19:02] I already got the main page saved from before. First thing I did
[19:02] ok
[19:02] that and the video in case youtube pulled it
[19:03] see pm
[19:17] http://no.reddit.com/r/technology/comments/17awqe/ussc_has_been_taken_down_with_an_important_message/
[19:17] lists what was in clarity1-3
[19:19] hah
[19:19] weird
[19:20] Does anyone else back up twitter feeds? I am only doing a few hundred so far
[19:20] I was thinking of setting up an archive warrior so people could help
[19:20] We have people backing up both twitter feeds of most followed accounts, and a sample (called the drizzle) of the main feed.
[19:21] No, twitter would not be worth your effort or the effort of the warrior.
[19:21] not for everything
[19:21] just a few hundred
[19:21] Yes, but this is literally being done by many others.
[19:22] Sexy high-profile site, gets all the downloading and the backing up.
[19:22] I would like to coordinate with them so as not to duplicate effort
[19:22] Much more at risk are small communities running vbulletin or sites of people recently dead.
[19:22] How can I say this?
[19:22] I am currently doing the small half life site
[19:22] Oh never mind, I did 3 times. Have fun.
[19:23] Self-hosted content is dying a silent death, it's in my opinion a lot more important. We know US LoC gets data from Twitter
[19:24] SketchCow, you missed my point. I want to make sure they are backing up the things I would, so I do not have to do it. Also a few of these twitter archivers that I know of do not share data because of the TOS
[19:24] I totally got the point.
[19:24] I have the point.
[19:24] I see no reason to fight you. You want to do it, goooooooo nuts.
[19:24] Some people like vanilla.
[19:25] Let me repeat: I want to make sure they are backing up the things I would, so I do not have to do it.
[19:25] I do not want to do it
[19:26] Then why are you talking about it? If you want to ensure something, start an effort - maybe people tag along
[19:28] lol
[19:28] To head back to my original point. I was asking if anyone else is doing it so I can stop doing it
[19:29] don't trust others to do what you believe should be done.
[19:29] I would like to contact the others first and see if they would upload it to IA
[19:30] I have got people to put things up before with a simple email, the normal response is it never dawned on them to back it up
[19:30] if it hadn't, why would they be here, of all places?
[19:32] People here know people who are not here who do big data
[19:32] finding data is like job hunting, you get more through word of mouth than anything else
[19:34] I got a local non-profit to convert all their tapes to dvd and this year they are going to upload them to IA
[19:34] they just wanted a place to back them up online and I proposed that solution
[19:36] plus they have the dvd backups for their library
[19:36] all of it is news shows from 70s-80s
[19:37] sorry I went OT
[19:44] Someone is contributing roughly 500 CD-ROM images and scans to me. That's happening in another window. More than enough good for the world today.
[19:44] aww
[19:47] more shareware? I love that stuff
[19:47] Primarily cover discs.
[19:48] aah from mags
[19:48] And the mags.
[19:50] Do the mags go dark until copyright expires?
[19:53] Ask the question again
[19:57] Do the magazines themselves have to go dark until the copyright on them expires?
[19:58] given the number of other magazines that aren't dark, I'd guess that they don't
[19:58] Unless there's a complaint, etc.
[20:01] I am just glad we get this stuff
[20:48] so i have about 30gb of videos from g4tv.com
[21:27] * chronomex omw to portland to pick up some of the zillions of cds that turnkit wanted
[21:33] cd-roms fuck yeah
[21:33] some of, dunno how much I can fit in this vehicle
[21:33] :P
[21:33] O_O
[21:34] well, penny each
[21:34] Truck rental
[21:35] heh
[21:36] I wonder how favourably a carload of CD-ROMs compares with fibre internet
[21:36] Seriously. Truck rental.
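The carload-versus-fibre question is back-of-the-envelope arithmetic: total bits carried divided by travel time. A Python sketch with purely hypothetical figures (the log gives no disc count or drive time): 2,000 CD-ROMs filled to about 700 MB and a 3-hour drive come to 2000 × 700 × 8 = 11,200,000 megabits over 10,800 seconds, about 1,037 Mbit/s, roughly a gigabit link. For comparison, the 5MB/s connection mentioned earlier is 40 Mbit/s. `sneakernet_mbit_per_s` is a hypothetical helper.

```python
def sneakernet_mbit_per_s(discs, mb_per_disc, hours):
    """Effective bandwidth, in megabits per second, of driving `discs`
    pieces of media for `hours`, using decimal units (1 MB = 8 megabits).

    Ignores the time to image each disc at the other end, which for
    thousands of CD-ROMs is likely the real bottleneck.
    """
    megabits = discs * mb_per_disc * 8
    return megabits / (hours * 3600)


# Hypothetical carload: 2,000 full discs, 3-hour drive.
print(round(sneakernet_mbit_per_s(2000, 700, 3)))
```

Real cover discs are rarely full, so the true figure would be lower, and imaging every disc afterwards likely takes longer than the drive itself.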
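The ncdu export option mentioned at 14:42 writes a scan to a file (for example `ncdu -o scan.json /some/dir`), which `ncdu -f scan.json` can load later for browsing; the dump is JSON, so other tools can read it too. A minimal Python sketch that lists the biggest entries under the scanned root, assuming the export layout described in ncdu's format documentation: a top-level array `[major, minor, metadata, root]`, where each directory is an array whose first element describes the directory itself. Check the field names (`name`, `dsize`) against the export of the ncdu version you run; `top_level_sizes`, `dir_size`, and the paths are hypothetical.

```python
import json


def dir_size(node):
    """Total disk usage (`dsize`, bytes) of a directory node in an ncdu dump.

    A directory is a JSON array: its first element describes the directory
    itself, and each later element is either a file (an object) or a
    subdirectory (another array). This sketch ignores the `ino`/`hlnkc`
    fields, so a hard-linked file is counted once per appearance.
    """
    info, *children = node
    total = info.get("dsize", 0)
    for child in children:
        if isinstance(child, list):
            total += dir_size(child)
        else:
            total += child.get("dsize", 0)
    return total


def top_level_sizes(dump_text):
    """Return (name, bytes) for each entry directly under the scanned root,
    largest first."""
    _major, _minor, _metadata, root = json.loads(dump_text)
    sizes = []
    for child in root[1:]:
        if isinstance(child, list):
            sizes.append((child[0]["name"], dir_size(child)))
        else:
            sizes.append((child["name"], child.get("dsize", 0)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Usage would be along the lines of `top_level_sizes(open("scan.json").read())` on a dump written by `ncdu -o`. Reading the dump this way lets a scan taken on one machine be summarised elsewhere without rescanning.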
[21:53] lady at goodwill doesn't understand indiscriminate purchasing
[21:55] "give me a shelfload of cdroms, I don't care which"
[21:55] :)
[21:55] "no, you have to go pick them out on amazon"
[22:00] I've decided to start a no-kill shelter for elderly PCs. My first rescue is a rusty, cosmetically deficient, missing accessories Tandy 1000. He sits nicely on command, is housetrained, needs a little TLC. This guy deserves to live out the rest of his days in a warm and loving home. Does great with cats. Won't you be his forever home?
[22:09] :)
[22:12] Already running a no-kill shelter
[22:26] DrainLbry you basically described my basement
[22:38] DrainLbry, I do that too
[22:38] I have a whole 10x10 storage unit full of old computers. It is the only thing I collect
[22:38] I should get some pics up online
[22:38] Does it work?
[23:46] hello?
[23:46] hi
[23:57] whazzzzzuupppp
[23:58] happy weekend, SketchCow