[00:12] strange surge in posterous grabbing rate this hour: http://zeppelin.xrtc.net/corp.xrtc.net/shilling.corp.xrtc.net/project_items.html
[00:19] alard: if you'd like to delegate a second dba, I'd love to have two people in charge :)
[00:19] alternately, document it on the wiki I guess
[00:34] http://archiveteam.org/index.php?title=Tracker
[01:35] CHAT DONE
[01:35] Gave the talk
[01:35] Spoiler alert, I yelled
[01:49] Did they video it
[02:02] Yepo
[02:02] Up next week
[02:21] I see some spammers are coming in to the wiki again
[02:21] e.g. http://archiveteam.org/index.php?title=User:JPSMavis
[02:30] SketchCow, did webstock do full video recordings? Looking through the past talks coverage looks spotty
[02:30] Yes
[03:24] can someone explain this? http://web.archive.org/web/*/http://www.2ndwave.com/details.asp?ProductID=55
[03:24] I click on it
[03:24] says "not in IA" :(
[03:49] is seems all the captures from http://web.archive.org/web/20070624184153/http://www.2ndwave.com/ are gone :/
[03:49] was there some sort of data loss?
[14:44] more scanned magazines: http://www.digitpress.com/library/magazines/
[15:03] https://twitter.com/hukl/status/307469987826761729 ....
[15:03] holy shit that better not be true
[15:30] Smiley: Isn't that just sudo password caching?
[15:30] underscor: kind of
[15:31] sudo checks the time of the cache against epoch
[15:31] ....
[15:31] so if your at epoch
[15:31] blam, stright into root
[15:31] Have you tried that tweet?
[15:32] It doesn't make sense to me that setting yourself near the epoch would give you instant root
[15:32] hmmm kind of
[15:32] but stipdly I did it on my desktop
[15:32] (based on what I know of the sudo codebase)
[15:32] now I can do it clearly, let me try again without exploding all the ssl certs etc
[15:32] Oh wait
[15:32] yeah its not a bug in sudo
[15:32] its a bug in the fact X is running as root
[15:33] and KDM/Gnome lets you set the date... (as root).
[15:37] (as far as I understand).
[15:41] hmm, the default polkit policy in GNOME requires authenticating as root, unless the user is already in the 'wheel' group
[15:41] (in which case the user is already allowed to become root via pkexec)
[15:54] Has anyone tried using Heritrix instead of wget on sites that wget fails out on?
[17:18] how is the project switching IPs so often now?
[17:20] Amazon EC2
[17:20] Check out #preposterus for more discussion on it btw
[17:24] makes sense, since they bill by the hour and only work for an hour
[17:25] Hey, I just downloaded the warrior VM, and I can't get past the settings screen
[17:26] the JS console says "INVALID_SETTINGS"
[17:26] titanous: The tracker server is currently overloaded, it can't send you a list of projects
[17:26] ah, k, I'll wait for things to settle down then
[17:28] looks like it's working now, is there anything special I should do about the Posterous ban?
[17:29] titanous: Posterous runs a cronjob that will ban you exactly 50 minutes after the full hour
[17:29] So it's best to start at XX:50 so you can do a full hour
[17:29] k, got it
[17:29] how long does the ban last?
[17:29] About a day or two
[18:05] = Please join #preposterus - if you're here regarding Posterious closing down. =
[18:58] SketchCo2: Uploading of what?
[18:59] chronomex: I'm all for sharing the management access.
[19:44] Posterous
[19:46] I'm in conversation with the lead engineer of Twitter.
[19:46] He may ask for IP ranges for ban amnesty.
[19:46] P.S. Posterous is hosted on Rackspace
[19:46] Twitter knew they were going to shut it down the day they bought it, so they didn't bother porting it to Amazon.
[19:50] SketchCow: can they just provide a data-dump?
[20:09] that means, if you host your server at rackspace, and get Posterous's local-network IP addresses, they wouldn't have to pay bandwidth costs for the transfer
[20:11] Is there someone who wants to finish Punchfork?
The users that are left on the tracker have been tried a few times before, so they're probably large or have some other problem.
[20:11] can i swith my warrior remotely
[20:11] i'll happily leave it running all weekend
[20:11] If you can access it, you can.
[20:12] yah, ssh
[20:12] i have one warrior running on punchfork.
[20:12] I just don't know the command.
[20:12] it has two jobs over 1,000 items and they look stuck
[20:13] Smiley: Tunnel the http connection (to port 8001), or use some curl thing (let me look that up).
[20:14] i need the curl really, i'm not on my linux system atm to tunnel easily
[20:14] curl -d "project_name=punchfork" http://localhost:8001/api/select-project
[20:15] crap, whats teh IP of my warrior :S
[20:16] oh yeah, do't need it
[20:16] urgh idiot moments for me. Thanks alard
[20:16] punchfork is showing "No item received" for me
[20:17] I can't see it here :D
[20:17] I will be able to later...
[20:22] a have a stuck punchfork job :(
[20:22] urbanglobetrotterblog.com stopped after 1420 items
[20:22] *"URLs"
[20:22] some huge users
[20:22] 2gb.
[20:23] sep332: Yes. That's why I don't really want to put these items back into the queue. If they're still not done they're likely to be difficult.
[20:24] :/
[20:24] I'll start up one of my larger work horses
[20:27] makes sense. difficult just meaning big, or requiring extra fiddling?
[20:28] Both, probably - and unfortunally
[20:28] oh hey it went to 1440, guess it's not toally stuck
[20:30] alard: I'm up for some punchfork. could you release just a few? like 5-10?
[20:37] sep332: Re Setting up a server at rackspace: Incoing traffic is free on Amazon EC2.
The expensive part is outgoing traffic caused by uploading the warc.gz file to the internet archive
[20:37] I actually ran posterous-grab at rackspace - The 0,5 msec ping to posterous was nice, but outgoing traffic is actually more expensive than when using Amazon
[20:37] And instances are more expensive as well
[20:38] true about outgoing traffic. but it twitter's not paying for bandwidth, maybe they'd cooperate more... ok maybe not :)
[20:39] also if it's much faster, you might need the servers for less time so it could be cheaper. i have no idea about the math for that though.
[20:41] if time @ amazon + amazon b/w < time @rackspace + rackspace bw....
[20:41] So if money saved from less time < money saved due to price diff @ amazon
[20:41] Amazon Spot instances are 10 times cheaper than rackspace regular instances
[20:42] Amazon traffic out is $0.12/gb, rackspace I think $0.16/gb
[20:42] so it'd need to be like 30x faster or something
[20:42] While the scripts and dynamic pages are on rackspace, static assets like images are actually on Amazon S3
[20:56] It'd be great to know people inside Amazon and Rackspace
[21:14] ersi, Smiley: I've added the remaining punchfork tasks to your queues.
[21:16] thanks
[21:16] it should keep crunching away quite happily, I gave my warrior extyra ram at some point in the past, not sure if it still has it tho
[21:16] alard: got the command to do the port forward handy?
[21:17] Smiley: Isn't that something with -L ? ssh -L LOCAL_PORT:127.0.0.1:8001
[21:18] yeah, sounds about right
[21:18] * Smiley checks his logs
[21:18] 478 ssh -i ./.ssh/amazonkey.pem -L 8002:localhost:8001 admin@ec2-184-72-85-21.compute-1.amazonaws.com
[21:18] there we go, thats basically it
[21:20] ssh -L 8001:localhost:8001 tim.bowers@10.2.1.134 -f -N
[21:20] thgat's backgrounded
[21:20] - Downloaded: 13440 URLs.
[21:20] Starting WgetDownload for Item user-Taylor_Lynn - Downloaded: 11040 URLs.
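For anyone scripting this instead of typing it by hand: once a tunnel like the `ssh -L 8001:localhost:8001` one above is up, the `curl -d "project_name=punchfork"` call quoted at 20:14 can be reproduced from Python's standard library. This is only a rough sketch. The URL and form field are taken from the curl line in the log; the function names and the timeout are made up for illustration, and nothing here is known about what the warrior sends back.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the warrior web interface is reachable on this local port,
# for example through the ssh -L tunnel quoted in the log.
WARRIOR_URL = "http://localhost:8001/api/select-project"


def build_select_request(project_name, url=WARRIOR_URL):
    """Build the same form-encoded POST body that `curl -d` would send."""
    body = urlencode({"project_name": project_name}).encode("ascii")
    # Passing data= makes urllib issue a POST instead of a GET.
    return Request(url, data=body)


def select_project(project_name, url=WARRIOR_URL, timeout=10):
    """Send the request and return the HTTP status code (hypothetical helper)."""
    with urlopen(build_select_request(project_name, url), timeout=timeout) as resp:
        return resp.status
```

Calling `select_project("punchfork")` would then stand in for the curl command. If the tunnel listens on a different local port (8002 in the logged EC2 example), pass the matching URL as the second argument.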
[21:20] yup, all big users
[21:20] grabbing at ~400Kbs+
[22:47] alard: Doesn't seem to pick 'em up
[22:49] alard: I'm running punchfork-grab stand-alone, if that matters
[23:04] Guys....
[23:04] WHo is alive who understands puynchfork?
[23:04] http://pastebin.com/rXvRgixt - it blew up when zipping.
[23:05] I have one currently at - Downloaded: 34050 URLs. too
[23:07] ersi i think he returned them to just Smiley
[23:07] he gave them to us both
[23:07] oh
[23:07] not sure if we goth both each or what
[23:09] media-cdn1.pinterest.com doesn't exist
[23:10] gthub is being super dumb..im looking at your pastebin smiley..
[23:10] guess we should try: wrap that bitch for socket.gaierrors
[23:10] S[h]O[r]T: o_O
[23:10] and ^^. i see the connection failure at the end but also quesiton if it even downloaded any data for that user?
[23:10] just a guess from looking at those lines in pipeline.py, probably wrong
[23:11] duno :/
[23:12] i on't have ssh access directly to the warrior atm
[23:12] only the web interface
[23:12] its been doing upto 800Kb/s
[23:13] so it's doing "something" :/
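The 23:10 suggestion to wrap the failing call for `socket.gaierror` might look roughly like the sketch below. It is not the actual code from pipeline.py, which is not shown in the log. The helper name and the injectable `resolver` parameter are made up for illustration; only `socket.gaierror` and `socket.gethostbyname` are real standard-library names.

```python
import socket


def resolve_or_none(hostname, resolver=socket.gethostbyname):
    """Return an IP address string for hostname, or None if the lookup fails.

    `resolver` is a hypothetical hook so the lookup can be swapped out;
    by default it is the standard socket.gethostbyname.
    """
    try:
        return resolver(hostname)
    except socket.gaierror as err:
        # Name resolution failed, e.g. the host no longer exists in DNS.
        print(f"skipping {hostname}: {err}")
        return None
```

A caller could then skip a host such as media-cdn1.pinterest.com when the lookup returns None, rather than letting the exception end the whole job.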