#archiveteam 2013-03-01,Fri


Time Nickname Message
00:12 🔗 chronomex strange surge in posterous grabbing rate this hour: http://zeppelin.xrtc.net/corp.xrtc.net/shilling.corp.xrtc.net/project_items.html
00:19 🔗 chronomex alard: if you'd like to delegate a second dba, I'd love to have two people in charge :)
00:19 🔗 chronomex alternately, document it on the wiki I guess
00:34 🔗 chronomex http://archiveteam.org/index.php?title=Tracker
01:35 🔗 SketchCow CHAT DONE
01:35 🔗 SketchCow Gave the talk
01:35 🔗 SketchCow Spoiler alert, I yelled
01:49 🔗 omf_ Did they video it
02:02 🔗 SketchCow Yepo
02:02 🔗 SketchCow Up next week
02:21 🔗 chronomex I see some spammers are coming in to the wiki again
02:21 🔗 chronomex e.g. http://archiveteam.org/index.php?title=User:JPSMavis
02:30 🔗 omf_ SketchCow, did webstock do full video recordings? Looking through the past talks coverage looks spotty
02:30 🔗 SketchCow Yes
03:24 🔗 balrog can someone explain this? http://web.archive.org/web/*/http://www.2ndwave.com/details.asp?ProductID=55
03:24 🔗 balrog I click on it
03:24 🔗 balrog says "not in IA" :(
03:49 🔗 balrog it seems all the captures from http://web.archive.org/web/20070624184153/http://www.2ndwave.com/ are gone :/
03:49 🔗 balrog was there some sort of data loss?
14:44 🔗 ats more scanned magazines: http://www.digitpress.com/library/magazines/
15:03 🔗 Smiley https://twitter.com/hukl/status/307469987826761729 ....
15:03 🔗 BlueMax holy shit that better not be true
15:30 🔗 underscor Smiley: Isn't that just sudo password caching?
15:30 🔗 Smiley underscor: kind of
15:31 🔗 Smiley sudo checks the time of the cache against epoch
15:31 🔗 Smiley ....
15:31 🔗 Smiley so if you're at epoch
15:31 🔗 Smiley blam, straight into root
15:31 🔗 underscor Have you tried that tweet?
15:32 🔗 underscor It doesn't make sense to me that setting yourself near the epoch would give you instant root
15:32 🔗 Smiley hmmm kind of
15:32 🔗 Smiley but stupidly I did it on my desktop
15:32 🔗 underscor (based on what I know of the sudo codebase)
15:32 🔗 Smiley now I can do it clearly, let me try again without exploding all the ssl certs etc
15:32 🔗 Smiley Oh wait
15:32 🔗 Smiley yeah it's not a bug in sudo
15:32 🔗 Smiley it's a bug in the fact X is running as root
15:33 🔗 Smiley and KDM/Gnome lets you set the date... (as root).
15:37 🔗 Smiley (as far as I understand).
15:41 🔗 grawity hmm, the default polkit policy in GNOME requires authenticating as root, unless the user is already in the 'wheel' group
15:41 🔗 grawity (in which case the user is already allowed to become root via pkexec)
15:54 🔗 omf_ Has anyone tried using Heritrix instead of wget on sites that wget fails out on?
17:18 🔗 sep332 how is the project switching IPs so often now?
17:20 🔗 soultcer Amazon EC2
17:20 🔗 soultcer Check out #preposterus for more discussion on it btw
17:24 🔗 sep332 makes sense, since they bill by the hour and only work for an hour
17:25 🔗 titanous Hey, I just downloaded the warrior VM, and I can't get past the settings screen
17:26 🔗 titanous the JS console says "INVALID_SETTINGS"
17:26 🔗 soultcer titanous: The tracker server is currently overloaded, it can't send you a list of projects
17:26 🔗 titanous ah, k, I'll wait for things to settle down then
17:28 🔗 titanous looks like it's working now, is there anything special I should do about the Posterous ban?
17:29 🔗 soultcer titanous: Posterous runs a cronjob that will ban you exactly 50 minutes after the full hour
17:29 🔗 soultcer So it's best to start at XX:50 so you can do a full hour
17:29 🔗 titanous k, got it
17:29 🔗 titanous how long does the ban last?
17:29 🔗 soultcer About a day or two
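The timing advice above (the ban cronjob fires at 50 minutes past the hour, so starting at XX:50 gives a full hour) can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and whether it is safer to start a little after XX:50 rather than exactly on it is not stated in the log.

```python
from datetime import datetime, timedelta

# From the log: the ban cronjob runs 50 minutes after the full hour.
BAN_MINUTE = 50


def next_start_time(now: datetime, ban_minute: int = BAN_MINUTE) -> datetime:
    """Return the first moment at or after `now` whose minute is `ban_minute`."""
    candidate = now.replace(minute=ban_minute, second=0, microsecond=0)
    if candidate < now:
        # This hour's XX:50 has already passed, so wait for the next hour's.
        candidate += timedelta(hours=1)
    return candidate


def seconds_to_wait(now: datetime, ban_minute: int = BAN_MINUTE) -> float:
    """Seconds to sleep before starting a grab that should get a full hour."""
    return (next_start_time(now, ban_minute) - now).total_seconds()
```

A wrapper script could call `time.sleep(seconds_to_wait(datetime.now()))` before launching the grab.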
18:05 🔗 ersi = Please join #preposterus - if you're here regarding Posterous closing down. =
18:58 🔗 alard SketchCo2: Uploading of what?
18:59 🔗 alard chronomex: I'm all for sharing the management access.
19:44 🔗 SketchCo2 Posterous
19:46 🔗 SketchCow I'm in conversation with the lead engineer of Twitter.
19:46 🔗 SketchCow He may ask for IP ranges for ban amnesty.
19:46 🔗 SketchCow P.S. Posterous is hosted on Rackspace
19:46 🔗 SketchCow Twitter knew they were going to shut it down the day they bought it, so they didn't bother porting it to Amazon.
19:50 🔗 eadler SketchCow: can they just provide a data-dump?
20:09 🔗 sep332 that means, if you host your server at rackspace, and get Posterous's local-network IP addresses, they wouldn't have to pay bandwidth costs for the transfer
20:11 🔗 alard Is there someone who wants to finish Punchfork? The users that are left on the tracker have been tried a few times before, so they're probably large or have some other problem.
20:11 🔗 Smiley can I switch my warrior remotely
20:11 🔗 Smiley i'll happily leave it running all weekend
20:11 🔗 alard If you can access it, you can.
20:12 🔗 Smiley yah, ssh
20:12 🔗 sep332 i have one warrior running on punchfork.
20:12 🔗 Smiley I just don't know the command.
20:12 🔗 sep332 it has two jobs over 1,000 items and they look stuck
20:13 🔗 alard Smiley: Tunnel the http connection (to port 8001), or use some curl thing (let me look that up).
20:14 🔗 Smiley i need the curl really, i'm not on my linux system atm to tunnel easily
20:14 🔗 alard curl -d "project_name=punchfork" http://localhost:8001/api/select-project
20:15 🔗 Smiley crap, what's the IP of my warrior :S
20:16 🔗 Smiley oh yeah, don't need it
20:16 🔗 Smiley urgh idiot moments for me. Thanks alard
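For anyone scripting this instead of using curl, the same form POST from alard's command can be sketched in Python. The endpoint path and the `project_name` field are taken from that curl line; the function names are hypothetical, and the shape of the warrior's response is not shown in the log, so the sketch only returns the HTTP status.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_select_project_request(project_name, host="localhost", port=8001):
    """Build the POST that `curl -d "project_name=..."` would send."""
    url = f"http://{host}:{port}/api/select-project"
    body = urlencode({"project_name": project_name}).encode("ascii")
    return Request(url, data=body, method="POST")


def select_project(project_name, host="localhost", port=8001, timeout=10):
    """Send the request to a reachable warrior and return the HTTP status code."""
    request = build_select_project_request(project_name, host, port)
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `select_project("punchfork")` on the machine running the warrior (or through a tunnel to port 8001) should be equivalent to the curl command above.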
20:16 🔗 titanous punchfork is showing "No item received" for me
20:17 🔗 Smiley I can't see it here :D
20:17 🔗 Smiley I will be able to later...
20:22 🔗 sep332 I have a stuck punchfork job :(
20:22 🔗 sep332 urbanglobetrotterblog.com stopped after 1420 items
20:22 🔗 sep332 *"URLs"
20:22 🔗 Smiley some huge users
20:22 🔗 Smiley 2gb.
20:23 🔗 alard sep332: Yes. That's why I don't really want to put these items back into the queue. If they're still not done they're likely to be difficult.
20:24 🔗 Smiley :/
20:24 🔗 ersi I'll start up one of my larger work horses
20:27 🔗 sep332 makes sense. difficult just meaning big, or requiring extra fiddling?
20:28 🔗 ersi Both, probably - and unfortunately
20:28 🔗 sep332 oh hey it went to 1440, guess it's not totally stuck
20:30 🔗 ersi alard: I'm up for some punchfork. could you release just a few? like 5-10?
20:37 🔗 soultcer sep332: Re setting up a server at rackspace: Incoming traffic is free on Amazon EC2. The expensive part is outgoing traffic caused by uploading the warc.gz file to the internet archive
20:37 🔗 soultcer I actually ran posterous-grab at rackspace - The 0.5 msec ping to posterous was nice, but outgoing traffic is actually more expensive than when using Amazon
20:37 🔗 soultcer And instances are more expensive as well
20:38 🔗 sep332 true about outgoing traffic. but if twitter's not paying for bandwidth, maybe they'd cooperate more... ok maybe not :)
20:39 🔗 sep332 also if it's much faster, you might need the servers for less time so it could be cheaper. i have no idea about the math for that though.
20:41 🔗 Smiley if time @ amazon + amazon b/w < time @rackspace + rackspace bw....
20:41 🔗 Smiley So if money saved from less time < money saved due to price diff @ amazon
20:41 🔗 soultcer Amazon Spot instances are 10 times cheaper than rackspace regular instances
20:42 🔗 soultcer Amazon traffic out is $0.12/gb, rackspace I think $0.16/gb
20:42 🔗 Smiley so it'd need to be like 30x faster or something
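The comparison being worked out here (instance time plus outgoing bandwidth at each provider) can be written down as a rough sketch. The egress prices and the "10 times cheaper" ratio come from soultcer's messages; the function names, the hourly rates and the traffic volume in the usage note are hypothetical placeholders, not figures from the log.

```python
def total_cost(hours, hourly_rate, gb_out, egress_per_gb):
    """Instance time plus outgoing traffic (incoming traffic assumed free)."""
    return hours * hourly_rate + gb_out * egress_per_gb


def break_even_speedup(hours, cheap_rate, dear_rate, gb_out,
                       cheap_egress, dear_egress):
    """How many times faster the pricier provider must finish to cost the same.

    `hours` is the runtime on the cheaper provider. Returns None when the
    pricier provider's egress bill alone already exceeds the cheaper total,
    in which case no speedup can make up the difference.
    """
    # Money left for instance time after paying the pricier egress bill.
    budget = total_cost(hours, cheap_rate, gb_out, cheap_egress) - gb_out * dear_egress
    if budget <= 0:
        return None
    return hours * dear_rate / budget
```

For example, `break_even_speedup(100, 0.01, 0.10, 10, 0.12, 0.16)` uses a made-up spot rate of $0.01/hour, a 10x dearer regular rate, and 10 GB uploaded at the quoted $0.12 and $0.16 per GB. The more data is uploaded, the larger the required speedup, until it becomes impossible.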
20:42 🔗 soultcer While the scripts and dynamic pages are on rackspace, static assets like images are actually on Amazon S3
20:56 🔗 ersi It'd be great to know people inside Amazon and Rackspace
21:14 🔗 alard ersi, Smiley: I've added the remaining punchfork tasks to your queues.
21:16 🔗 Smiley thanks
21:16 🔗 Smiley it should keep crunching away quite happily, I gave my warrior extra RAM at some point in the past, not sure if it still has it though
21:16 🔗 Smiley alard: got the command to do the port forward handy?
21:17 🔗 alard Smiley: Isn't that something with -L ? ssh -L LOCAL_PORT:127.0.0.1:8001
21:18 🔗 Smiley yeah, sounds about right
21:18 🔗 * Smiley checks his logs
21:18 🔗 Smiley ssh -i ./.ssh/amazonkey.pem -L 8002:localhost:8001 admin@ec2-184-72-85-21.compute-1.amazonaws.com
21:18 🔗 Smiley there we go, thats basically it
21:20 🔗 Smiley ssh -L 8001:localhost:8001 tim.bowers@10.2.1.134 -f -N
21:20 🔗 Smiley that's backgrounded
21:20 🔗 Smiley - Downloaded: 13440 URLs.
21:20 🔗 Smiley Starting WgetDownload for Item user-Taylor_Lynn - Downloaded: 11040 URLs.
21:20 🔗 Smiley yup, all big users
21:20 🔗 Smiley grabbing at ~400Kbs+
22:47 🔗 ersi alard: Doesn't seem to pick 'em up
22:49 🔗 ersi alard: I'm running punchfork-grab stand-alone, if that matters
23:04 🔗 Smiley Guys....
23:04 🔗 Smiley Who is alive who understands punchfork?
23:04 🔗 Smiley http://pastebin.com/rXvRgixt - it blew up when zipping.
23:05 🔗 Smiley I have one currently at - Downloaded: 34050 URLs. too
23:07 🔗 S[h]O[r]T ersi i think he returned them to just Smiley
23:07 🔗 Smiley he gave them to us both
23:07 🔗 S[h]O[r]T oh
23:07 🔗 Smiley not sure if we both got each or what
23:09 🔗 ersi media-cdn1.pinterest.com doesn't exist
23:10 🔗 S[h]O[r]T github is being super dumb.. I'm looking at your pastebin Smiley..
23:10 🔗 ersi guess we should try: wrap that bitch for socket.gaierrors
23:10 🔗 Smiley S[h]O[r]T: o_O
23:10 🔗 S[h]O[r]T and ^^. I see the connection failure at the end but also question if it even downloaded any data for that user?
23:10 🔗 S[h]O[r]T just a guess from looking at those lines in pipeline.py, probably wrong
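ersi's suggestion of wrapping the failing lookup for `socket.gaierror` might look roughly like the sketch below. This is not the code in pipeline.py; the function name and the injectable `resolver` parameter are hypothetical and only illustrate catching a DNS failure (such as the missing media-cdn1.pinterest.com) instead of letting it crash the job.

```python
import socket


def resolve_or_skip(hostname, resolver=socket.getaddrinfo):
    """Resolve `hostname`, returning None instead of raising on DNS failure."""
    try:
        return resolver(hostname, 80)
    except socket.gaierror as err:
        # The name does not resolve: log it and let the caller carry on.
        print(f"skipping {hostname}: {err}")
        return None
```

A caller would check for `None` and move on to the next host or step rather than aborting the whole item.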
23:11 🔗 Smiley dunno :/
23:12 🔗 Smiley I don't have ssh access directly to the warrior atm
23:12 🔗 Smiley only the web interface
23:12 🔗 Smiley it's been doing up to 800Kb/s
23:13 🔗 Smiley so it's doing "something" :/
