#archiveteam 2012-11-30,Fri

Time Nickname Message
09:10 ersi Any rsyncers around?
09:12 ersi nevermind
10:20 SmileyG ARUGH MY WARRIOR HAS GONE BONKERS
10:21 norbert79 What's wrong?
10:21 SmileyG Tried to run urlteam and errrr wow, 2012-11-30 10:20:48,117 tinyback.Tracker INFO: Initializing tracker at http://tracker.tinyarchive.org/v1/
10:21 SmileyG Traceback (most recent call last):
10:21 SmileyG File "./single_task.py", line 36, in <module>
10:21 SmileyG task = tracker.fetch()
10:21 SmileyG File "/data/data/projects/URLTeam-0937f7a/tinyback/tracker.py", line 28, in fetch
10:21 SmileyG raise Exception("Unexpected status %i" % status)
10:21 SmileyG Exception: Unexpected status 502
10:21 SmileyG Process TinyBack returned exit code 1 for Item
10:21 SmileyG Failed TinyBack for Item
10:22 Deewiant Tracker's down apparently
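The traceback above shows the client giving up as soon as the tracker answers 502. A minimal sketch of a more patient client, assuming the outage is temporary, is to retry with exponential backoff. The names `fetch_with_retry`, `TrackerUnavailable`, and the `request` callable are hypothetical and not part of tinyback; `request` stands in for whatever HTTP call the real client makes.

```python
import time


class TrackerUnavailable(Exception):
    """Raised when the tracker keeps answering with a 5xx status."""


def fetch_with_retry(request, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it returns a non-5xx status.

    request is any callable returning (status, body). It is a stand-in
    for the real HTTP call, which the log does not show.
    """
    last_status = None
    for attempt in range(attempts):
        status, body = request()
        if status < 500:
            return status, body
        last_status = status
        if attempt < attempts - 1:
            # Wait base_delay, then twice that, then four times, and so on.
            sleep(base_delay * 2 ** attempt)
    raise TrackerUnavailable(
        "tracker still returned %i after %i attempts" % (last_status, attempts)
    )


if __name__ == "__main__":
    # Simulated tracker: two 502 responses, then a task.
    responses = iter([(502, ""), (502, ""), (200, "task-data")])
    waits = []
    print(fetch_with_retry(lambda: next(responses), sleep=waits.append))
    print(waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real client would leave it at the default.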
10:48 chronomex pity it's not on tracker.archiveteam.org, else my drunk ass could reset it right now
10:59 Muad-Dib yay, this 16k-picture Webshots profile is almost done :D
11:04 tuankiet SmileyG: Yeah, I have the same thing
11:05 SmileyG k
11:05 tuankiet Nothing to run now =))
11:08 chronomex Muad-Dib: what's your name on the tracker?
11:17 Muad-Dib same as irc
11:18 Muad-Dib almost to 100 GB now
11:18 chronomex nice
12:01 DoubleJ chronomex: Do you have access to the webshots user list? I'm assuming there are some that have been checked out forever that we could retry.
12:04 tuankiet SmileyG: the URLTeam tracker is down. HTTP 502
12:05 SmileyG :(
12:07 soultcer tuankiet: Tracker should be back up now
12:08 tuankiet soultcer: OK :D
12:08 soultcer I ran a webshots downloader on the same machine as the tracker, and somehow the wget-lua process of that crawler stole all the memory and the incredibly smart oom-killer decided that it is best to kill small processes instead of the one memory-hogging one.
12:09 tuankiet Ah, bad thing
12:10 soultcer I should have run the dailybooth downloader as a separate user with resource limits turned on
12:13 tuankiet how much RAM do you have?
12:16 soultcer 1 GB + 1 GB swap
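The resource-limit idea soultcer mentions can be sketched in Python on a Unix system: cap the address space of a child process so that a runaway downloader fails on its own instead of pushing the whole machine into the OOM killer. This is only an illustration, not how the tracker host was configured. `run_limited` is a hypothetical helper, and the separate-user part is not shown.

```python
import resource
import subprocess
import sys


def run_limited(cmd, max_bytes, **kwargs):
    """Run cmd with its address space capped at max_bytes (Unix only).

    Extra keyword arguments are passed through to subprocess.run.
    """
    def apply_limit():
        # Runs in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limit, **kwargs)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # The child reports the soft limit it was started with.
    child_code = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
    result = run_limited([sys.executable, "-c", child_code], one_gib,
                         capture_output=True, text=True)
    print(result.stdout.strip())
```

With a cap like this, an allocation beyond the limit fails inside the capped process, so the tracker on the same host would not be the one to die.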
12:17 soultcer alard: Good to see you again ;-)
12:18 alard Hi.
12:22 SmileyG it's alive!
12:24 chronomex alard is alive!
12:25 chronomex We all were worried
12:30 tuankiet Ah, where were you the last few days, alard?
12:31 alard Oh, in a very good place, actually. I was in Istanbul.
12:31 ersi Neat
12:32 tuankiet Travelling, alard?
12:32 alard chronomex: In the meantime, you've apparently downloaded all of webshots etc. Good. (And repaired the tracker as well, it seems.)
12:32 soultcer alard: We did what any computer expert does when software is running slow. We threw more hardware at it ;-)
12:33 ersi Or rather, restarted it
12:33 alard tuankiet: Some sort of holiday. There's a lot to see there. (But that's -bs stuff.)
12:33 alard Ah, that always works.
12:39 alard There's still a bit of webshots, cityofheroes left. I've requeued it now.
12:40 ersi I still got workers running webshots, so should be picking it up
12:40 Deewiant ersi: I think the linode did actually get upgraded
12:42 alard Dailybooth: there are more users, the tracker only has the first batch.
12:42 alard What's up with www.archiveteam.org? Is that a known problem?
12:43 Deewiant alard: cityofheroes should be the "archive team's choice" now since it's going down something like today, right?
12:43 soultcer Oh, luckily we still have a month for dailybooth
12:43 alard Oh, it's back now.
12:44 Muad-Dib Are you guys aware that some of your archive.org torrents are being improperly webseeded?
12:44 alard Deewiant: Is it? Last time I looked it was all very vague. We have 217 items left, so that's not too much.
12:45 Deewiant alard: I'm not sure, but at least a few people have been worried about it here in the past few days.
12:45 Muad-Dib me too
12:46 Deewiant November 30th is the game's "planned end of services" date, at least
12:47 ersi People are generally very worried about things here
12:47 SmileyG :D
12:47 SmileyG I'm worried about the end of the universe.
12:48 Nemo_bis the Russian government issued an official statement according to which you shouldn't
12:49 alard Deewiant: CoH is now the "archive team's choice" project.
12:49 Deewiant Cool, that seems like the right call to me :-)
12:50 Muad-Dib switching to CoH too
12:50 alard Muad-Dib: If you switch, please switch to "archiveteam's choice". CoH will be done pretty soon, I think.
12:51 Muad-Dib after that I'll switch back to webshots, see if I can break the 125GB mark
12:53 Muad-Dib alard: I'll switch to choice, will it switch to webshots after CoH forums?
12:53 Muad-Dib has the user discovery for webshots already finished?
12:53 ersi It'll switch to the next AT Choice, when it's changed. Depends on what it'll be changed to.
12:54 alard Yes, I think webshots would be next.
12:55 alard Webshots user discovery isn't finished yet. We've explored about half of it.
12:55 Muad-Dib maybe more people need to be put on it :/
12:57 Muad-Dib put on the user discovery*
12:58 alard The user discovery is included in the user downloading, so everyone who downloads looks for a few new users first.
13:01 Muad-Dib yeah, but maybe we need some people running only the user discovery, so it isn't slowed by the gallery downloading
13:02 ersi >_>
13:02 Muad-Dib on those peers*
13:06 Muad-Dib going home, brb
13:30 tuankiet alard: Will CoH end after this?
13:31 tuankiet 2 million items on Dailybooth o_0 0_o
13:39 ersi Why do you think alard would know? He's not CoH :)
13:53 soultcer alard: Dailybooth is not downloading any users, just getting 404s
13:55 alard soultcer: Oh. Any idea why?
13:56 soultcer Maybe there are no users with those IDs
13:56 soultcer Finding username for ID 495912: not found (response code 404).
13:56 soultcer Received item '495912' from tracker
13:58 alard They changed the API.
13:58 tuankiet Sorry. I am running CoH now
13:58 soultcer Those bastards
13:59 soultcer haha just great, http://blog.dailybooth.com/category/api/ is a known malware site according to google
14:00 alard Yes, you get funny popups if you go to http://developers.dailybooth.com/
14:00 ersi soultcer: yeah, been like that pretty long :) pretty funny
14:01 alard Apparently https://api.dailybooth.com/v1/users/1.json still works (the HTTPS version)
14:01 soultcer Makes me really want to trust them with my precious family pictures
14:03 alard {"error":{"error":"rate_limit","error_description":"Rate limit exceeded.","error_code":412}}
14:04 soultcer guess they finally figured out it was wget-lua and not some macos webkit :D
14:04 alard Probably.
14:05 soultcer Is it just me or is dailybooth mostly girls doing mirror-shots?
14:05 ersi So? :)
14:05 alard I think that was the original idea: take a picture of yourself every day.
14:07 alard "Users seem to be predominantly teenagers -- many of whom seem very bored" http://paidcontent.org/2011/03/09/419-photo-social-network-dailybooth-raises-6-million/
14:09 soultcer Anyway, I guess we will need a new way to go from user id to username, or we have to start boring old username discovery using search engines
14:11 alard The current version of the script also uses the API to discover the photos, comments etc.
14:13 alard We could try it via https://api.dailybooth.com first, see how that works. (I'm not sure when you get the rate limit message.)
14:13 alard https://api.dailybooth.com/v1/status/rate_limit.json isn't very informative.
14:19 soultcer It seems that dailybooth is returning 404 for all API requests, even from new IPs?
14:22 alard It redirects from api.dailybooth.com to dailybooth.com, I think. This still works: https://api.dailybooth.com/v1/users/495911.json
14:30 tuankiet alard: Let's help them
14:32 tuankiet Webshots should listen to One More Night =)). There is 1 night before the old Webshots dies =))
14:34 balrog_ will we make it?
14:35 tuankiet DailyBooth API is running again: https://api.dailybooth.com/v1/users/495911.json
14:35 tuankiet Output: {"user_id":495911,"username":"Krystofao_O","picture_count":9,"followers_count":67,"following_count":22,"private":false,"details":{"name":"Krystofa Hill","gender":"male","age":16,"country":"United Kingdom","relationship_status":"single","about":"Hello. Im 17.Bisexual. I go to college :). First Year. From England. I absolutley love different cultures. Im a very talkative person, but when meeting peop
14:35 tuankiet le for the first time im very shy ^.^","interests":"The only historic thing i am intrested in is the Egyptians, ooo and also ancient South Korea :).\r\nHmm, i do like lollypops in class, since it occupys me and helps me concentrate :L\r\nMOOSIQUE, Love that stuff.\r\nMOOVIES, i also love them\r\nBOOKIES, i am quite a book fantastic, lol.","tv":"Supernatural, Smallville, Vampire Diaries, True Blood,
14:35 tuankiet Pretty Little Liars, One Tree Hill, The O.C, Ghost Whisperer, Gilmore Girls, Dr Quinn Medicine Woman, The Tribe.\n\n Tamna Island. Boys Over Flowers. You&#039;re Beautiful. Tree Of Heaven. Stairway To Heaven. Sad Love Story. Truth. Forever Yours. My Girlfriend is a gumiho. Prosecutor Princess. Secret Garden. City Hunter. Scent Of A Woman. Brilliant Legacy. Dae Jang Geum. Damo. Palace. ","music":"M
14:35 tuankiet arina &amp; The Diamonds. Imogen Heap. Paloma Faith. David Guetta. Adele. Lana Del Rey. Panic! At The Disco. 30 Seconds to Mars. My Chemical Romance. Rihanna. Nicki Minah. Greyson Chance. Kelly Clarkson. Oh Land. Natalia Kills. Cher Lloyd. VV Brown. Katy Perry. Eminem.Professor Green. Pink. Poets of the Fall. Cute is what we Aim For.\n\nBIGBANG , U-kiss, 2NE1, SNSD,G- DRAGON, C.N. Blue,FT Island, ,
14:35 tuankiet Se7en, B2ST, , T.O.P. Block B. Infinite. SHINee, Super Junior. \n\n","websites":"www.facebook.com\/krystofahill\nwww.twitter.com\/krystofao_O","movies":null,"books":null},"avatars":{"tiny":"http:\/\/d1oi94rh653f1l.cloudfront.net\/16\/avatars\/tiny\/495911_26931560.jpg","small":"http:\/\/d1oi94rh653f1l.cloudfront.net\/16\/avatars\/small\/495911_26931560.jpg","medium":"http:\/\/d1oi94rh653f1l.cloudf
14:35 tuankiet ront.net\/16\/avatars\/medium\/495911_26931560.jpg","large":"http:\/\/d1oi94rh653f1l.cloudfront.net\/16\/avatars\/large\/495911_26931560.jpg"}}
14:35 balrog_ that seems to work
14:36 ersi Uh.. Dude.
14:36 SmileyG D:
14:37 balrog_ hm?
14:37 balrog_ honestly I wasn't looking at any of the content
14:43 tuankiet Can we rescue Webshots?
14:45 ersi What kind of answer are you expecting? Also, why are you asking - after we've downloaded 50+TB of data?
14:46 tuankiet Just curious ;) I ask that, can we rescue all the data?
14:46 ersi First project for you? Just curious
14:48 tuankiet After the MobileMe, I run but I found my upload speed was crazy, so after that I stop run but I run again after that. So you can understand that this is my first project ;)
14:50 ersi Basically, I'd say "Yes, we've saved Webshots" to your initial question; Considering we've downloaded 55TB of Webshots data - which is more than if we wouldn't have done anything. I understand the idea of trying to save something perfectly, and I'm sure everyone here wants to do that for the most part - but it's most often impossible
14:51 tuankiet Thanks
14:51 ersi that's how I think about it at least
15:12 DFJustin don't we need to do a second pass on CoH to get new posts
21:55 soultcer alard, chronomex: It appears the webshots user discovery tracker is down
21:56 soultcer So each webshots download will be slowed down by waiting for a timeout on the username discovery part
21:57 chronomex are you sure? I'm not sure what's going on but there's some scrolling logs in a screen session named webshots-adder
21:57 chronomex and webshots-tracker
21:58 soultcer Yeah, trying to reach it results in a timeout. The webshots user discovery tracker was listening on HTTP port 8123
22:02 alard Strange.
22:02 chronomex alard: can you take this from here?
22:03 alard chronomex: I'm taking a look.
22:03 chronomex cool, thanks
22:13 alard soultcer: Better now?
22:15 soultcer alard: I am now getting HTTP 500 when submitting results
22:20 alard soultcer: Ah, yes. Fixed now? (I'm now running the discovery tracker in Nginx, instead of as a standalone application. There were too many requests for a single-threaded app.)
22:21 soultcer alard: Yes, working fine now. Great work ;-)
22:22 alard Good.
22:46 alard Dailybooth is now completely unavailable, it seems.
22:47 alard A temporary lapse, it's back now.
22:48 godane i'm grabbing http://crypto.stanford.edu
22:49 godane looks like there are very few crawls
23:12 SketchCow OK, back.
23:13 godane hey SketchCow
23:13 godane looks like i'm finding vm dumps on crypto.stanford.edu
23:28 SketchCow I'm going to Stuttgart Germany tomorrow. I assume nobody here is from that region.
23:33 alard The latest set of DailyBooth scripts fixes the api url and can handle the rate limiting.
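The log does not show how the updated DailyBooth scripts handle the rate limit. As a rough sketch under that caveat, a client could detect the error payload alard pasted at 14:03 and pause before asking again. `is_rate_limited`, `fetch_until_allowed`, the 60-second wait, and the `fetch` callable are all hypothetical and are not taken from the actual scripts.

```python
import json
import time


def is_rate_limited(body):
    """Return True if body is a rate-limit error like the one quoted at 14:03."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    error = data.get("error") if isinstance(data, dict) else None
    return isinstance(error, dict) and error.get("error") == "rate_limit"


def fetch_until_allowed(fetch, wait_seconds=60, max_tries=10, sleep=time.sleep):
    """Call fetch() until its response body is not a rate-limit error.

    fetch is any callable returning the response body as a string.
    """
    for _ in range(max_tries):
        body = fetch()
        if not is_rate_limited(body):
            return body
        # Assumed policy: a fixed pause. The API's real window is unknown.
        sleep(wait_seconds)
    raise RuntimeError("still rate limited after %i tries" % max_tries)


if __name__ == "__main__":
    limited = ('{"error":{"error":"rate_limit",'
               '"error_description":"Rate limit exceeded.","error_code":412}}')
    bodies = iter([limited, '{"user_id":1}'])
    print(fetch_until_allowed(lambda: next(bodies), sleep=lambda s: None))
```

Checking the payload rather than only the HTTP status is a design choice: it matches the only evidence in the log, which is the JSON body itself.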
23:41 SketchCow alard: The wayback machine loaded all the data.
23:41 SketchCow I mean, I don't know how much of ours it grabbed in, but it's all up on the beta wayback.
