[09:10] Any rsyncers around?
[09:12] nevermind
[10:20] ARUGH MY WARRIOR HAS GONE BONKERS
[10:21] What's wrong?
[10:21] Tried to run urlteam and errrr wow, 2012-11-30 10:20:48,117 tinyback.Tracker INFO: Initializing tracker at http://tracker.tinyarchive.org/v1/
[10:21] Traceback (most recent call last):
[10:21] File "./single_task.py", line 36, in
[10:21] task = tracker.fetch()
[10:21] File "/data/data/projects/URLTeam-0937f7a/tinyback/tracker.py", line 28, in fetch
[10:21] raise Exception("Unexpected status %i" % status)
[10:21] Exception: Unexpected status 502
[10:21] Process TinyBack returned exit code 1 for Item
[10:21] Failed TinyBack for Item
[10:22] Tracker's down apparently
[10:48] pity it's not on tracker.archiveteam.org, else my drunk ass could reset it right now
[10:59] yay, this 16k-picture Webshots profile is almost done :D
[11:04] SmileyG: Yeah, I have the same thing
[11:05] k
[11:05] Nothing to run now =))
[11:08] Muad-Dib: what's your name on the tracker?
[11:17] same as irc
[11:18] almost to 100 GB now
[11:18] nice
[12:01] chronomex: Do you have access to the webshots user list? I'm assuming there are some that have been checked out forever that we could retry.
[12:04] SmileyG: the URLTeam tracker is down. HTTP 502
[12:05] :(
[12:07] tuankiet: Tracker should be back up now
[12:08] soultcer: OK :D
[12:08] I ran a webshots downloader on the same machine as the tracker, and somehow the wget-lua process of that crawler stole all the memory and the incredibly smart oom-killer decided that it is best to kill small processes instead of the one memory-hogging one.
[12:09] Ah, bad thing
[12:10] I should have ran the dailybooth downloader as a separate user with resource limits turned on
[12:13] how many ram do you have
[12:16] 1 GB + 1 GB swap
[12:17] alard: Good to see you again ;-)
[12:18] Hi.
[12:22] it's alive!
[12:24] alard is alive!
[12:25] We all were worried
[12:30] Ah, where are you in the last few days alard?
[12:31] Oh, in a very good place, actually. I was in Istanbul.
[12:31] Neat
[12:32] Travel alard?
[12:32] chronomex: In the meantime, you've apparently downloaded all of webshots etc. Good. (And repaired the tracker as well, it seems.)
[12:32] alard: We did what any computer expert does when software is running slow. We threw more hardware at it ;-)
[12:33] Or rather, restarted it
[12:33] tuankiet: Some sort of holiday. There's a lot to see there. (But that's -bs stuff.)
[12:33] Ah, that always works.
[12:39] There's still a bit of webshots, cityofheroes left. I've requeued it now.
[12:40] I still got workers running webshots, so should be picking it up
[12:40] ersi: I think the linode did actually get upgraded
[12:42] Dailybooth: there are more users, the tracker only has the first batch.
[12:42] What's up with www.archiveteam.org? Is that a known problem?
[12:43] alard: cityofheroes should be the "archive team's choice" now since it's going down something like today, right?
[12:43] Oh, luckily we still have a month for dailybooth
[12:43] Oh, it's back now.
[12:44] Are you guys aware that some of your archive.org torrents are being improperly webseeded
[12:44] Deewiant: Is it? Last time I looked it was all very vague. We have 217 items left, so that's not too much.
[12:45] alard: I'm not sure, but at least a few people have been worried about it here in the past few days.
[12:45] me too
[12:46] November 30th is the game's "planned end of services" date, at least
[12:47] People are generally very worried about things here
[12:47] :D
[12:47] I'm worried about the end of the universe.
[12:48] the Russian government issued an official statement according to which you shouldn't
[12:49] Deewiant: CoH is now the "archive team's choice" project.
[12:49] Cool, that seems like the right call to me :-)
[12:50] switching to CoH too
[12:50] Muad-Dib: If you switch, please switch to "archiveteam's choice". CoH will be done pretty soon, I think.
[12:51] after that I'll switch back to webshots, see if I can break the 125GB mark
[12:53] alard: I'll switch to choice, will it switch to webshots after CoH forums?
[12:53] has the user discovery for webshots already finished?
[12:53] It'll switch to the next AT Choice, when it's changed. Depends on what it'll be changed to.
[12:54] Yes, I think webshots would be next.
[12:55] Webshots user discovery isn't finished yet. We've explored about half of it.
[12:55] maybe more people need to be put on it :/
[12:57] put on the user discovery*
[12:58] The user discovery is included in the user downloading, so everyone who downloads looks for a few new users first.
[13:01] yeah, but maybe we need some people running only the user discovery, so it isn't slowed by the gallery downloading
[13:02] >_>
[13:02] on those peers*
[13:06] going home, brb
[13:30] alard: Will CoH end after this?
[13:31] 2 million items on Dailybooth o_0 0_o
[13:39] Why do you think alard would know? He's not CoH :)
[13:53] alard: Dailybooth is not downloading any users, just getting 404s
[13:55] soultcer: Oh. Any idea why?
[13:56] Maybe there are no users with those IDs
[13:56] Finding username for ID 495912: not found (response code 404).
[13:56] Received item '495912' from tracker
[13:58] They changed the API.
[13:58] Sorry. I am running CoH now
[13:58] Those bastards
[13:59] haha just great, http://blog.dailybooth.com/category/api/ is a known malware site according to google
[14:00] Yes, you get funny popups if you go to http://developers.dailybooth.com/
[14:00] soultcer: yeah, been like that pretty long :) pretty funny
[14:01] Apparently https://api.dailybooth.com/v1/users/1.json still works (the HTTPS version)
[14:01] Makes me really want to trust them with my precious family pictures
[14:03] {"error":{"error":"rate_limit","error_description":"Rate limit exceeded.","error_code":412}}
[14:04] guess they finally figured out it was wget-lua and not some macos webkit :D
[14:04] Probably.
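The user lookups above come back in at least three shapes seen in this log: a user object carrying a `username`, a 404 (which the script reported as "not found", though here it turned out to be the API change), and a JSON error body with `"error":"rate_limit"` and `error_code` 412. A minimal sketch, assuming those response shapes for `https://api.dailybooth.com/v1/users/<id>.json`, of telling them apart so rate-limited IDs can be retried instead of being written off as missing. The `Lookup` enum and `classify_user_response` are hypothetical names, not part of the actual DailyBooth scripts, and since the log doesn't show which HTTP status accompanied the rate-limit body, the body is checked regardless of status.

```python
import json
from enum import Enum
from typing import Optional, Tuple


class Lookup(Enum):
    # Hypothetical outcome labels for illustration.
    FOUND = "found"
    NOT_FOUND = "not_found"
    RATE_LIMITED = "rate_limited"


def classify_user_response(status: int, body: str) -> Tuple[Lookup, Optional[str]]:
    """Map an API answer to (outcome, username or None)."""
    if status == 404:
        # May also mean the endpoint moved, as happened at 13:58.
        return Lookup.NOT_FOUND, None
    try:
        data = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise ValueError(f"unparseable body (status {status})") from exc
    if not isinstance(data, dict):
        raise ValueError(f"unexpected JSON type (status {status})")
    error = data.get("error")
    # Rate-limit body as quoted in the log:
    # {"error":{"error":"rate_limit","error_description":"Rate limit exceeded.","error_code":412}}
    if isinstance(error, dict) and error.get("error") == "rate_limit":
        return Lookup.RATE_LIMITED, None
    if status == 200 and isinstance(data.get("username"), str):
        return Lookup.FOUND, data["username"]
    raise ValueError(f"unexpected response (status {status}): {body[:80]}")


if __name__ == "__main__":
    # Hypothetical sample bodies shaped like the ones quoted in the log.
    samples = [
        (200, '{"user_id":1,"username":"example_user","picture_count":9}'),
        (404, ""),
        (200, '{"error":{"error":"rate_limit","error_description":"Rate limit exceeded.","error_code":412}}'),
    ]
    for status, body in samples:
        print(status, classify_user_response(status, body))
```

Anything that matches none of the three shapes raises instead of being silently counted as a missing user.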
[14:05] Is it just me or is dailybooth mostly girls doing mirror-shots?
[14:05] So? :)
[14:05] I think that was the original idea: take a picture of yourself every day.
[14:07] "Users seem to be predominantly teenagers — many of whom seem very bored" http://paidcontent.org/2011/03/09/419-photo-social-network-dailybooth-raises-6-million/
[14:09] Anyway, I guess we will need a new way to go from user id to username, or we have to start boring old username discovery using search engines
[14:11] The current version of the script also uses the API to discover the photos, comments etc.
[14:13] We could try it via https://api.dailybooth.com first, see how that works. (I'm not sure when you get the rate limit message.)
[14:13] https://api.dailybooth.com/v1/status/rate_limit.json isn't very informative.
[14:19] It seems that dailybooth is returning 404 for all API requests, even from new IPs?
[14:22] It redirects from api.dailybooth.com to dailybooth.com, I think. This still works: https://api.dailybooth.com/v1/users/495911.json
[14:30] alard: Let's help them
[14:32] Webshots should listen to One More Night =)). There is 1 night before the old Webshots die =))
[14:34] will we make it?
[14:35] DailyBooth API run again https://api.dailybooth.com/v1/users/495911.json
[14:35] Output: {"user_id":495911,"username":"Krystofao_O","picture_count":9,"followers_count":67,"following_count":22,"private":false,"details":{"name":"Krystofa Hill","gender":"male","age":16,"country":"United Kingdom","relationship_status":"single","about":"Hello. Im 17.Bisexual. I go to college :). First Year. From England. I absolutley love different cultures. Im a very talkative person, but when meeting peop
[14:35] Pretty Little Liars, One Tree Hill, The O.C, Ghost Whisperer, Gilmore Girls, Dr Quinn Medicine Woman, The Tribe.\n\n Tamna Island. Boys Over Flowers. You're Beautiful. Tree Of Heaven. Stairway To Heaven. Sad Love Story. Truth. Forever Yours. My Girlfriend is a gumiho. Prosecutor Princess. Secret Garden. City Hunter. Scent Of A Woman. Brilliant Legacy. Dae Jang Geum. Damo. Palace. ","music":"M
[14:35] Se7en, B2ST, , T.O.P. Block B. Infinite. SHINee, Super Junior. \n\n","websites":"www.facebook.com\/krystofahill\nwww.twitter.com\/krystofao_O","movies":null,"books":null},"avatars":{"tiny":"http:\/\/d1oi94rh653f1l.cloudfront.net\/16\/avatars\/tiny\/495911_26931560.jpg","small":"http:\/\/d1oi94rh653f1l.cloudfront.net\/16\/avatars\/small\/495911_26931560.jpg","medium":"http:\/\/d1oi94rh653f1l.cloudf
[14:35] arina & The Diamonds. Imogen Heap. Paloma Faith. David Guetta. Adele. Lana Del Rey. Panic! At The Disco. 30 Seconds to Mars. My Chemical Romance. Rihanna. Nicki Minah. Greyson Chance. Kelly Clarkson. Oh Land. Natalia Kills. Cher Lloyd. VV Brown. Katy Perry. Eminem.Professor Green. Pink. Poets of the Fall. Cute is what we Aim For.\n\nBIGBANG , U-kiss, 2NE1, SNSD,G- DRAGON, C.N. Blue,FT Island, ,
[14:35] le for the first time im very shy ^.^","interests":"The only historic thing i am intrested in is the Egyptians, ooo and also ancient South Korea :).\r\nHmm, i do like lollypops in class, since it occupys me and helps me concentrate :L\r\nMOOSIQUE, Love that stuff.\r\nMOOVIES, i also love them\r\nBOOKIES, i am quite a book fantastic, lol.","tv":"Supernatural, Smallville, Vampire Diaries, True Blood,
[14:35] ront.net\/16\/avatars\/medium\/495911_26931560.jpg","large":"http:\/\/d1oi94rh653f1l.cloudfront.net\/16\/avatars\/large\/495911_26931560.jpg"}}
[14:35] that seems to work
[14:36] Uh.. Dude.
[14:36] D:
[14:37] hm?
[14:37] honestly I wasn't looking at any of the content
[14:43] Can we rescue Webshots?
[14:45] What kind of answer are you expecting? Also, why are you asking - after we've downloaded 50+TB of data?
[14:46] Just curious ;) I ask that, can we rescue all the data?
[14:46] First project for you? Just curious
[14:48] After the MobileMe, I run but I found my upload speed was crazy, so after that I stop run but I run again after that. So you can understand that this is my first project ;)
[14:50] Basically, I'd say "Yes, we've saved Webshots" to your initial question; Considering we've downloaded 55TB of Webshots data - which is more than if we wouldn't have done anything. I understand the idea of trying to save something perfectly, and I'm sure everyone here wants to do that for the most part - but it's most often impossible
[14:51] Thanks
[14:51] that's how I think about it at least
[15:12] don't we need to do a second pass on CoH to get new posts
[21:55] alard, chronomex: It appears the webshots user discovery tracker is down
[21:56] So each webshots download will be slowed down by waiting for a timeout on the username discovery part
[21:57] are you sure? I'm not sure what's going on but there's some scrolling logs in a screen session named webshots-adder
[21:57] and webshots-tracker
[21:58] Yeah, trying to reach it results in a timeout. The webshots user discovery tracker was listening on HTTP port 8123
[22:02] Strange.
[22:02] alard: can you take this from here?
[22:03] chronomex: I'm taking a look.
[22:03] cool, thanks
[22:13] soultcer: Better now?
[22:15] alard: I am now getting HTTP 500 when submitting results
[22:20] soultcer: Ah, yes. Fixed now? (I'm now running the discovery tracker in Nginx, instead of as a standalone application. There were too many requests for a single-threaded app.)
[22:21] alard: Yes, working fine now. Great work ;-)
[22:22] Good.
[22:46] Dailybooth is now completely unavailable, it seems.
[22:47] A temporary lapse, it's back now.
[22:48] i'm grabbing http://crypto.stanford.edu
[22:49] looks like there are very few crawls
[23:12] OK, back.
[23:13] hey SketchCow
[23:13] looks like i'm finding vm dumps on crypto.stanford.edu
[23:28] I'm going to Stuttgart Germany tomorrow. I assume nobody here is from that region.
[23:33] The latest set of DailyBooth scripts fixes the api url and can handle the rate limiting.
[23:41] alard: The wayback machine loaded all the data.
[23:41] I mean, I don't know how much of ours it grabbed in, but it's all up on the beta wayback.
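The 23:33 note says the updated DailyBooth scripts "can handle the rate limiting". One common way to do that is exponential backoff: when the API answers with the rate-limit body quoted at 14:03, wait, double the wait, and try again. A minimal sketch in Python, not the actual DailyBooth script: `fetch` is a hypothetical zero-argument callable returning `(status, body)`, the attempt count and 30-second base delay are illustrative assumptions, and `sleep` is injectable so the logic can be exercised without actually waiting.

```python
import json
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, response body)


class RateLimitedTooLong(Exception):
    """Raised when every attempt was answered with the rate-limit error."""


def is_rate_limited(body: str) -> bool:
    # Matches the body quoted in the log:
    # {"error":{"error":"rate_limit",...,"error_code":412}}
    try:
        data = json.loads(body)
    except ValueError:
        return False
    error = data.get("error") if isinstance(data, dict) else None
    return isinstance(error, dict) and error.get("error") == "rate_limit"


def fetch_with_backoff(
    fetch: Callable[[], Response],
    max_attempts: int = 6,       # assumption, not taken from the real scripts
    base_delay: float = 30.0,    # seconds; assumption
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call fetch(), doubling the wait after each rate-limited answer."""
    delay = base_delay
    for attempt in range(max_attempts):
        status, body = fetch()
        if not is_rate_limited(body):
            return status, body
        if attempt < max_attempts - 1:
            # No point sleeping after the final attempt.
            sleep(delay)
            delay *= 2
    raise RateLimitedTooLong(f"still rate limited after {max_attempts} attempts")
```

Doubling the delay keeps a fleet of warriors from hammering the API in lockstep. A real downloader would probably also cap the delay and add some random jitter.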