[03:41] So, I just asked Jason about this; is there a warrior AMI for EC2?
[03:42] I'm not really interested in killing my home bandwidth, but I'm willing to drop a bit of Amazon's on it :)
[03:45] offby1: ask in #BurnTheMessenger, they were talking about that earlier
[06:48] I've thrown together a spot-instance friendly AMI for Yahoo Messages: ami-2400984d
[06:48] just pop your username in for the userdata and Amazon takes care of the rest.
[12:32] hey all--is there a public AMI for Yahoo Messages yet?
[12:33] Yeah, there is.
[12:33] http://scr.glados.me/1364040894.png
[12:33] Wait no
[12:33] ami-2400984d
[12:34] Put your username (alphanumeric, no spaces) in the userdata field
[12:34] GLaDOS: great, just heard the sirens, will fire 'er up.
[12:34] perfect!
[12:35] See that number on the tracker of "total items"?
[12:35] That's still increasing
[12:36] We're discovering items while scraping as well.
[12:36] yikes, I figured as much
[12:36] So, we're going to need all we can get.
[12:36] cool, I'm definitely down to throw some $$$ on the fire
[12:36] it's an insane cut-off date and an important public record, so ridiculous that they'd pull the plug so fast, but not surprising
[12:37] (on top of rate limiting, grr)
[12:37] * GLaDOS pushes viseratop into #BurnTheMessenger
[13:18] oh yay.. oh fantastic. Google getting rid of Google Reader.
[13:26] Howdy, if Yahoo is rate-limiting all 6 of my instances, should I shut down and come back with just 1 or is it going to be more efficient to have some other number trying?
[13:27] ...I'll ask in #BurnTheMessenger
[14:11] Hi, i was hoping to use Archive Team for Posterous
[14:11] and the web panel says to ask in here prior to running, so that's what i'm doing..
[14:13] BoomBox: If you're fine with the possibility of being temporarily banned from it, go ahead.
[14:13] Otherwise, stahp.
[14:14] GLaDOS, assuming you're not a bot attempting to test for the rest of eternity, how long would the temp ban last for?
[14:15] BoomBox: The ban time varies. It used to be over a week. Shortened to about 4 hours when we started archiving.
[14:15] So, let's say about a day possibly.
[14:15] aah
[14:16] completely fine with the possibility then
[14:16] Well then, go ahead!
[14:16] Grab the fucker.
[14:16] :D
[14:38] Shall I remove that warning? (Or at least make it less threatening?) There's no such warning for Yahoo Messages.
[14:39] Never mind, I've removed it.
[14:42] http://warriorhq.archiveteam.org/
[14:46] We still need one in Africa.
[14:46] alard, could potentially use "You have the risk of possibly being temporarily banned from it, feel free to venture. If you have any questions, ...."
[14:48] BoomBox: Yes. Although you shouldn't get banned by Posterous at the moment. The warriors are contacting a special Posterous server that doesn't ban.
[14:58] Whoa, almost 300 warriors in the wild
[15:02] expand the leaderboards, add "teams" (the whole folding@home vibe), and there could be 100,000.
[15:28] Yeah, I'd say that the dealmaker for me is the EC2 AMI
[15:29] I tried to get the generic AMI that GLaDOS suggested last night working, but I could never connect to it to see if it was working, and it seems to be gone now.
[15:29] But if there's a multi-project capable AMI out there, I'd be interested.
[15:29] I think I could probably make one.
[15:29] but… time and all that shit :)
[15:29] There's an AMI in N.Virginia
[15:29] Yep
[15:29] but it's yahoo-only
[15:30] I'm spot requesting it right now
[15:30] Ah
[15:30] There should be an AMI called i386-debian-squeeze-warrior or something
[15:30] I tried that briefly last night
[15:31] this morning, I could not find it again, even by ID
[15:31] ..huh
[15:31] At any rate, this seems like a great workload for EC2 :)
[15:32] I've got 100 spot requests and 20 instances running the AMI
[15:32] Weird.
[15:32] STRAIGHT TO THE MOON
[15:32] Also, straight to the #BurnTheMessenger before Cameron_D sees us
[15:45] Newb question: can I run the Archive Warrior in a virtualbox on an already virtual machine? (I have a virtual web server on vps.net with space)
[15:46] You can, I believe.
[15:46] Just need to forward the ports
[15:48] You mean ssh-forward the ports? I don't want to run the warrior on my local machine; I want to run it on the virtual server and store the data there.
[15:49] (I'll ALSO run it locally, but I have less bandwidth and space than it does)
[15:49] As in, have the virtual appliance forward the internal ports.
[15:51] Hmm...ok, I'll try to set it up and figure it out when I get to that point.
[16:29] I'm trying to archive a WordPress website with the gravatar stuff installed. My wget commands are making the .HTML files balloon to a dozen or more megabytes each. Anyone have any suggestions on how to skip the dynamic gravatar stuff?
[16:32] i downloaded the warrior. virtualbox. in the browser i looked at the projects and first clicked on posterous and then yahoo. now the "available projects" page tells me posterous is running, yahoo is not. "current project" page however shows yahoo running
[16:33] being able to limit the upload rate would be nice
[16:33] Schbirid: Because it will first finish the current tasks before switching to the new project
[16:34] @GLaDOS, when you were talking about forwarding ports, were you talking about setting up port 8001 back to my local machine so that I can configure it?
[16:35] gevmage: Yeah
[16:35] Wait, what
[16:35] I mean, allowing HTTP access into the VM over port 8001
[16:35] What virtualization software are you using for the warrior, virtualbox?
[16:35] Yes.
[16:35] Yes, virtualbox.
[16:36] Is that in "settings" before I launch the appliance?
[16:36] Wait, are you using it on windows?
[16:36] If so, it automatically does it
[16:36] I don't do windows.
[16:37] Phew
[16:37] local machine is Ubuntu
[16:37] outer virtual web server is Debian of some variety.
[16:37] But yeah, I'm tired beyond the point of helping
[16:37] * GLaDOS hurls soultcer at gevmage
[16:38] If you imported the Warrior OVA it automatically sets up the port forwarding
[16:40] Ok. So far my status is ova downloaded, unpacked. package-manipulation done so that virtualbox doesn't complain about not having a drive. When I ran it it seemed to stop at 20%.
[16:40] But reading through the instructions, I have to get to port 8001 on the remote machine.
[16:40] So I'll have to ssh forward that port, I guess.
[16:42] Ah, and now I see it's complaining about not having enough RAM. I'll sort that out.
[17:13] hi all
[17:13] is there any way to run the software without virtualbox, e.g. on a vserver?
[17:16] heya, bowman__ - yes, if you just want to run the yahoo job you can use the instructions here (for linuxes)
[17:16] http://pastebin.com/dJYURk6m
[17:16] duggan: cool, I'll take a look
[17:24] Ok, another newb question. Do I have to run virtualbox as root?
[17:25] not unless you want to use a low port number
[17:26] Oh. Hm. Twice when I've tried to run it, I got: Message from syslogd@openclasses at Mar 23 17:22:38 ... kernel:[901697.913991] general protection fault: 0000 [#1] SMP
[17:27] How long does it typically take to start up?
[17:30] on my box, it only takes a couple of seconds
[17:58] Oh, well then it's borked.
[17:58] I'm trying to run it virtual machine within virtual machine.
[17:58] It seems to keep running (in top) but it stops making progress that I can see.
[17:58] That's why I was trying to run the other instructions.
[18:00] Is it possible to use run-pipeline to run two different projects on the same box?
[18:00] I'm getting a socket.error: [Errno 98] Address already in use
[18:00] error
[18:01] http://paste.ee/p/UAYQk
[18:02] chazchaz: run-pipeline starts a webserver, but you can either disable it with --disable-web-server or change the port with --port
[18:03] ahh. didn't know that. What does the web server show?
[18:04] Logs from the process
[20:11] http://www.fas.org/blog/secrecy/2013/03/ntrs_dark.html
[20:11] it's not just private entities that are unreliable stewards of digital history.
[20:12] um. or the people complaining about it. google cache of the above: http://v.gd/UQW3Bh
[20:13] "In other words, all NASA technical documents, no matter how voluminous and valuable they are, should cease to be publicly available in order to prevent the continued disclosure of any restricted documents, no matter how limited or insignificant they may be."
[20:16] https://archive.org/details/nasa_techdocs
[20:18] it's crazy how the Yahoo leaderboard is going nuts by now :)
[20:19] DFJustin: that's good.
[20:37] I thought the thing was going to be a lost cause before ycombinator had that link to a blog posting
[21:30] The photos/info @ http://www.skyscrapercity.com might be worth archiving. The site isn't closing anytime soon but all the photos are hotlinked and more of them disappear each day. I've personally been keeping a mirror of just the Philippines sections with all photos, I can't store it all though (the rest of site)
[21:30] I suspect they are exactly the sort of photos that'll interest people years from now for history... and nicely categorized in forum threads
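
Note on the 18:00 to 18:02 exchange: [Errno 98] is the Linux errno for EADDRINUSE, raised when a process tries to bind a TCP port that something else is already listening on. The log says run-pipeline starts a web server, so two copies on one box would contend for the same port unless one is given a different --port or started with --disable-web-server. Below is a minimal Python sketch of that failure mode and of probing for a free port. It is not Archive Team code: try_bind and find_free_port are hypothetical helper names, and the starting port 8001 used in the usage note is only an assumption borrowed from the warrior web interface port mentioned earlier in the log.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port.

    Returns the listening socket, or None if the port is already in use
    (the EADDRINUSE case, which Linux reports as Errno 98).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise  # any other error (e.g. permission denied) is not ours to hide


def find_free_port(start, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, min(start + attempts, 65536)):
        sock = try_bind(port, host)
        if sock is not None:
            sock.close()  # release it so the real program can bind it
            return port
    raise RuntimeError("no free port found in the probed range")
```

For example, find_free_port(8001) returns 8001 if that port can be bound on the loopback address, otherwise the next bindable port; that number could then be handed to the --port option mentioned at 18:02. There is a small race between probing and the real bind, so treat the result as a hint rather than a guarantee.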