[00:06] so uh, looks like someone i knew died
[00:07] he was a wikipedia and wikimedia commons user, had a couple of websites
[00:07] archive the guy?
[00:09] http://commons.wikimedia.org/w/index.php?title=User_talk:Doug_youvan&curid=3452387&diff=71647460&oldid=68672850
[00:09] doesn't seem like a troll, given that he mentioned his ill health (read "will die soon") in an email to me
[00:23] winr4r: you could check if his sites are in archive.org's wayback machine: http://web.archive.org/
[00:23] just going through some shmoocon videos for 2012
[00:23] their video sometimes is shit
[00:23] arrith: they are
[00:24] mics not working and a sea of blocks
[00:25] i would think if this was recorded in 2004 i'd at least get why it was happening
[00:25] winr4r: he's archived then :)
[00:26] winr4r: one thing is if one wanted, one could put together the various links to his stuff as a sort of memorial, since i think archive.org has something for collections
[00:26] arrith: i was thinking a doug youvan collection would be good, in case his kids google his name in future
[00:26] "PS I am sort of at end life, taking OC every day, with a son 50 years younger than me, now age 7. That's my real motivation. I would like him to know who his dad was."
[00:27] ^ email from him to me in january, trying to stop his wikipedia article from being pruned and his wikimedia commons contributions being deleted
[00:36] mit youvan?
[00:43] okay, guys, now i have access to a VPS, what's the best way to use wget to say "download the whole fucking site in such a way that it'll make an offline-browsable version?"
[00:43] * winr4r is new to this.
[00:53] wget -r
[00:53] wget -mkK was what i wanted apparently
[00:57] winr4r: Not going warc?
[00:58] mistym: archive.org have it in the wayback machine, which means warc files already exist
[00:58] this is the "download a copy of his site" version
[00:58] if you know what i mean
[01:02] winr4r: Ah, right
[01:29] 415mb of doug youvan so far!
[01:30] (that's tarred/gzipped, too)
[03:24] just because the wayback has some of it doesn't mean they have it all. Heritrix skips stuff that robots.txt says to skip, and might also have filesize limits
[03:24] plus, if stuff is loaded dynamically via javascript...
[03:25] (though wget won't get that on its own, either)
[03:25] Coderjoe: fortunately it was all static content
[03:28] yeah, js scraping is pretty tricky. i guess google does it
[03:29] gotta be some way to get that into a wget-type tool
[03:29] arrith: if every site had predictable interfaces then it would be easy
[03:29] they do not
[03:31] hm, well overall you'd need something to run the js, so some js engine. then almost a per-site greasemonkey-type thing for custom scraping.
[03:32] like userscripts, people could collect scripts that work for various sites
[03:33] yes
[03:49] phantomjs works well for that
[03:55] winr4r: Hm, is there any way to scrape output of a JS call to a site's database?
[03:56] Assuming that it's standardized - asking since Fileplanet has some metadata stored like this
[03:59] shaqfu: i'd suppose so
[04:00] winr4r: Trivially?
[04:00] shaqfu: use wireshark to figure out what it's doing
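A minimal sketch of that "find the request, then replay it" approach, in Python. Once wireshark (or the firebug net tab mentioned below) shows which URL the page's javascript actually fetches, you can hit that endpoint directly. The endpoint URL and id scheme here are invented for illustration; the real ones come from watching the site's traffic.

    # Replay the JSON request a page's javascript would have made.
    # ENDPOINT is hypothetical - substitute whatever URL shows up
    # in the captured traffic.
    import json
    import urllib.request

    ENDPOINT = "http://example.com/api/metadata?id={file_id}"  # hypothetical

    def fetch_metadata(file_id):
        """Fetch one record straight from the backing endpoint."""
        url = ENDPOINT.format(file_id=file_id)
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # e.g. walk a range of ids and save each record to disk
    for file_id in range(1000, 1010):
        record = fetch_metadata(file_id)
        with open("metadata_%d.json" % file_id, "w") as f:
            json.dump(record, f)

If the endpoint's output really is standardized, as suggested for Fileplanet, a loop like this gets the metadata without ever running the javascript.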
[04:04] google developed a spec for how sites are supposed to provide an alternate means for those stupid hashbang urls. which is terrible, because it just provides validation to idiot web developers so they think it is a good idea
[04:05] you can also use the firebug net tab to see what requests the JS is doing
[04:25] first time I've seen a captcha like this
[04:25] http://www.google.com/recaptcha/api/image?c=03AHJ_VuuY01ZDHApJCWY4QgNDvLtjdulL39PHYHQ5p9VJwgNpTZDdSQAdiEX2QQJowrHsXi2RYKyJ0BJXeFmVbJrixa2BFfpuyO786M58cLG0AiUD5TW98MInm4FITB-5ZgWTdj4avniALdSN4wDhaQ6FwJn22vyEFQ
[04:25] ooh
[04:26] yes, they are having you read street signs from streetview
[04:27] that's a cool idea
[04:28] I made a few hundred dollars back when mturk was new by clicking streetview photos
[04:28] it'd be cooler if they released their data
[04:28] i got those for the first time a week or two ago
[04:28] yeah
[04:29] when google acquired recaptcha i was ;/
[04:29] I've gotten a few ones with math in 'em
[04:29] archive.org would do well to have their own recaptcha
[04:29] hmmm, upside down greek letters with superscripts? TeX time!
[04:30] hah
[04:30] so there is some hope that maths has been properly ocred into TeX
[04:30] i go as far as what i know how to do with the compose key, or google utf
[04:31] https://twitter.com/DopefishJustin/status/75283261710024704
[04:31] yeah, but sometimes that's not sufficient :P
[04:31] i just say "screw that" and load a new captcha
[04:32] I buy a new computer when that happens
[05:34] hi
[05:35] how are those textfiles.com.7z.xxx files set up? i can't extract them all :| are they split?
[05:36] oh, they are
[05:36] all good :p
[05:42] fuck yeah mturk
[05:42] I just made 15 cents
[05:42] hahahahahaha
[05:45] Hey, if you're a poor South Asian, it's a good deal
[05:45] win
[05:45] apply it to buy some tunez on amazon
[05:59] * Pronoiac slaps SketchCo1 around a bit with a large fishbot
[05:59] Whoa, whoops.
[05:59] I was going to try to get his attention, but not so much.
[06:00] Looking over the conversation, I was going to suggest he try H264 for video encoding, rather than FLV.
[06:01] Flash can play both now, right? Use the one with choices for encoders.
[06:06] I DID use h264
[06:06] Problem's been solved.
[06:06] I'm giving them .avis and they can fuck with it
[06:06] With their people.
[06:06] Oh crud.
[06:07] Over it.
[06:07] Hi Jason, sorry I took time to get back to you. Had a great time. I hope to see you next time. Lets keep in touch. Chuck
[06:07] I just earned $0.44 in 10 minutes with amazon mturk
[06:07] More importantly, I got this in the mail today:
[06:07] Chuck Testa
[06:07] I can't imagine doing that for any length of time
[06:07] I think I would commit suicide
[06:07] I'd figured, well, the Flash encoder is known to be crap, but x264 should work.
[06:07] Archive Team? Nope, Chuck Testa.
[06:07] hah
[06:07] underscor: Now you know why it's a lot of poor South Asians that do it
[06:08] anyone familiar with beanstalkd for distributed backend tasks?
[06:08] :(
[06:08] oli: I love beanstalkd
[06:08] Never looked into if they build farms for MTurk like they did with captchas
[06:09] i have a vps client who is permanently using 2 cpu cores 100% for beanstalkd
[06:09] surely it's not meant to do that? :/
[06:10] no
[06:10] never had that happen
[06:10] sounds like something's broken
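For context on how beanstalkd is normally used, a minimal producer/consumer sketch via the beanstalkc client library for Python; the host, port, and tube name are illustrative (beanstalkd listens on 11300 by default).

    # Minimal beanstalkd producer/consumer sketch (beanstalkc client).
    import beanstalkc

    conn = beanstalkc.Connection(host="localhost", port=11300)

    # producer: queue a job onto a named tube
    conn.use("server-pings")
    conn.put("check http://example.com/")

    # consumer: block until a job arrives, process it, acknowledge it
    conn.watch("server-pings")
    job = conn.reserve()        # blocks; reserve(timeout=0) polls instead
    print("got job: " + job.body)
    job.delete()                # acknowledge so it isn't re-queued

A well-behaved worker blocks in reserve() rather than spinning, which is why a beanstalkd pegging two cores really does suggest something is broken.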
[06:10] http://audiojungle.net/item/its-alright/99659 using this for a presentation
[06:11] but fuck, it makes me want to get up, buy apple products, put on a hipster hoodie, and walk through san francisco or something
[06:12] http://archive.org/details/iuma-dick_delicious_and_the_tasty_testicles__atl
[06:13] SketchCow's theme band
[06:13] My night job
[06:13] I like track 2's title
[06:14] this customer is running a shitload of php processes doing this: 434270 ? S 0:00 /bin/sh /home/serverping/serverping/run_check_worker_php
[06:14] and the beanstalkd
[06:14] and i'm guessing something is borked since beanstalkd cpu usage is so high
[06:14] might just kill it and see if they notice :p
[06:16] underscor: how does it feel to make $2.64/hr?
[06:17] Like a boss
[06:17] I tried one of those.
[06:17] It was "is this porn"
[06:17] winning
[06:17] Coderjoe: ...disappointing
[06:17] I'd rather do a customer facing job
[06:18] than suffer through that
[06:18] It's like flipping burgers, from your own home
[06:19] underscor: Thanks for the Audiojungle pointer, incidentally. It looks like a good source for podcast music.
[06:19] except for 1/5 the pay
[06:19] It was win win win
[06:19] Although, if someone made a super-duper-awesome OCR program, they could make good money off MTurk not telling anyone about it
[06:20] Pronoiac: Sure!
[06:20] I love audiojungle
[06:20] their snippet site is good too
[06:20] codecanyon or whatever
[06:21] all of envato's stuff is great (easy to use, high standards, etc)
[06:27] http://archive.org/details/musopen-mozart-marriage-of-figaro-compressed
[06:29] hi SketchCow!
[06:29] Hi.
[06:29] i am archiving/have archived a dead person, may i rsync to you when i am done and you get it into archive.org some time?
[06:33] Yes
[06:35] 400+mb of Dead Guy, gzipped, and probably a lot more than that once i get his wikimedia commons uploads undeleted
[06:36] OK
[06:39] how is jason!
[06:39] Feelin' fat
[06:39] But that's because I had a shitton of rice
[06:39] I have the all clear for exercise again
[06:39] So that'll start
[06:40] And I have a vitamin D deficiency!
[06:40] (Not due to non-sunlight - seems genetic)
[06:40] So drugs for that.
[06:41] excellent
[06:41] on the "all clear for exercise" bit obviously
[06:44] SketchCow: make sure you're doing some of that cardio for july :D
[06:44] so you can run around the rio continuously
[06:46] man, s3funnel seems to do a much better job of saturating my connection. it would rock if it had a "sync" mode, rather than just the get, put, and copy commands
[06:47] underscor: btw: https://github.com/sstoiana/s3funnel is a continuation/fork of the one I linked to before
[06:51] SketchCow: awesome on all fronts (musopen, clear to exercise, that the vitamin D problem has been found and treated)
[06:56] zipping these files will take a day.
[06:56] huge. 607gb of files.
[06:59] neat
[07:12] http://alexbuie.net/
[07:12] is the
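On the "sync mode" wished for in the s3funnel exchange above: a rough sketch of the idea using the boto S3 library for Python, listing what the bucket already holds and uploading only the local files that are missing. The bucket name, credentials, and local directory are placeholders.

    # Rough "sync" sketch with boto: skip keys the bucket already has.
    import os
    from boto.s3.connection import S3Connection
    from boto.s3.key import Key

    conn = S3Connection("AWS_KEY", "AWS_SECRET")     # placeholder creds
    bucket = conn.get_bucket("my-archive-bucket")    # placeholder bucket

    existing = set(k.name for k in bucket.list())    # keys already uploaded

    local_dir = "doug_youvan_mirror"                 # placeholder path
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key_name = os.path.relpath(path, local_dir)
            if key_name in existing:
                continue                             # already in the bucket
            k = Key(bucket)
            k.key = key_name
            k.set_contents_from_filename(path)
            print("uploaded " + key_name)

Unlike s3funnel this single-threaded sketch won't saturate a connection; it only shows the skip-what-exists logic a sync mode would need.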