[00:12] There are files out there that could be directly downloaded
[03:48] techcrunch.com/2012/07/11/hashable-the-app-that-aimed-to-replace-business-cards-to-shut-down-on-july-25/
[04:26] BlueMax: Hashable does have some content, but it's available only if you sign in
[04:26] a lot of profiles are restricted to an "inner circle" (ha)
[04:26] actually
[04:26] http://hashable.com/#!/stephencaggiano <-- can anyone tell me if you can see any user data on this without signing up and logging in?
[04:27] seems to send me to the front page
[04:27] or something frontpagey
[04:27] hm
[04:27] 0 user data
[04:27] I guess if we want to archive it then we'll have to all make fake accounts again
[04:28] I can't believe this site has leaderboards
[04:28] haha, seriously?
[04:28] yeah
[04:28] this is kinda sad
[04:28] seriously
[04:28] this is fucking Persona
[04:28] for business
[04:28] "guys guys I have the best idea" "what is it?" "gamify business meetings!" "ooooooo"
[04:28] Atlus should either be flattered or embarrassed
[04:29] who's atlus?
[04:29] Hashable: SOCIAL LINKS GO
[04:29] Atlus USA
[04:29] they make the Persona series, amongst a ton of other games
[04:29] does not mean anything to me
[04:29] anyway
[04:30] well, for more info, http://en.wikipedia.org/wiki/Shin_Megami_Tensei:_Persona
[04:30] I basically have net-negative gamer and anime credentials
[04:31] I tend to learn about this stuff by osmosis
[04:31] sure
[04:31] I'm pretty good at avoiding culture
[04:31] so
[04:31] I'm going to be away from home for a week
[04:31] shut down computers or leave on?
[04:31] shut down
[04:32] but then I'd have to reopen everything
[04:32] and make sure Wake-on-LAN works
[04:32] heh
[04:32] I don't have any of that set up
[04:32] and they all have crypto root anyway
[04:32] woop woop woop off-topic siren
[04:33] chronomex: dropbear
[04:33] chronomex, hibernate?
[04:33] all the convenience of no encrypted root, with all the insecurity of evil maids
[04:33] BlueMax: well, I normally would but they've been up so long the kernels have been upgraded on-disk and so it wouldn't come up properly
[04:34] reboot then hibernate
[04:34] I'm leaving in 20 minutes ...
[04:34] I think I'll just shut them down now
[04:34] * chronomex master of planning
[04:34] that's where all your skill points went
[05:35] i was not successful in mirroring planetphillip.com
[05:36] he might be trapping bots with some dirs
[05:36] i got 404s, 500s etc
[05:36] gtg
[05:45] http://www.trustedsec.com/july-2012/yahoo-voice-website-breached-400000-compromised/
[05:46] only yahoo voice accounts affected, not yahoo email/yim/groups/etc
[05:50] I thought he took all of planetphillip offline
[05:59] http://burnbit.com/torrent/206849/yahoo_disclosure_txt
[05:59] torrent of the linked yahoo voice file being generated, ~20min to go
[09:11] Hashable to shut down on July 25 http://techcrunch.com/2012/07/11/hashable-the-app-that-aimed-to-replace-business-cards-to-shut-down-on-july-25/
[09:15] Again, no warning. Daaamn
[09:16] lol I linked that hours ago
[09:16] Hey SketchCow what's your opinion on the Ouya
[09:17] I invested.
[09:17] But big deal, it's an amusing $100 bet.
[09:18] I've spent $100 on a dinner that wasn't good
[09:18] SketchCow that's the exact reason I did that too
[09:18] $150 for a two-controller 1080p emulator box.
[09:18] Who can fairly complain?
[09:18] And if it gets XBMC and fairly good Android games, all the better.
[09:27] http://hashable.com/connections/index?eid=1BZTLZIAX7B60
[09:28] http://hashable.com/connections
[09:28] haha
[09:35] ?
[09:51] BlueMax: #archiveteam-bs for off-topic things >_>
[09:51] Oh shush ersi :P
[09:54] i love the way it still has a sign up link...
[11:01] Schbirid, did you just wget/curl planetphillip? Use any special options? I will just build something that he cannot detect.
[11:03] Maybe I am too old school but I have been writing scrapers since the late 1990s. The goal was always to get all the data as easily as possible. Usually by going undetected. I understand last minute grabs and these places expect it
[11:04] So let's say he is running a honey pot or blackhole for bots
[11:04] or even using some fancy piece of software on the logs
[11:05] I would try spoofing my UA to spider the site and build a link list.
[11:05] I would then take a statistically significant section of that data and try it out in Firefox automated via selenium and Perl
[11:06] With the data in random order and random intervals and random amounts of pages getting opened
[11:06] sometimes just 1, sometimes 3
[11:07] I could then identify the trap and skip around it
[11:07] usually I find them hidden in navigation blocks
[11:07] AT is doing massive data mining and these are just methods for getting the data
[11:08] wish I knew this stuff :(
[11:09] it is easier to be good at it nowadays
[11:09] the tools are far more advanced and the tracking systems are not even close to catching up.
[11:09] google has the best
[11:10] one of the meme sites has a funny little trick for stopping robots as well
[11:30] maybe we should go to -bs?
[11:31] but I'm interested in what they do.
[15:57] omf_: "time wget -w 2 -e robots=off -m -a planetphillip.com_20120711.log -nv --adjust-extension --convert-links --page-requisites --content-disposition --warc-file=planetphillip.com_20120711 http://www.planetphillip.com/"
[15:57] with a user agent clearly identifying me since i am a nice guy
[15:57] it felt like the grab was giving the server problems, since i got 500s randomly
[15:59] well there is a 30 second crawl delay listed in the robots.txt. He could have the server set up to throttle too many requests in a 30 second period
[15:59] it is a possibility
[16:24] http://www.sexfilmler.com see how turkish girls get fucked!
[16:25] girls?
i thought most turkish porn was gay porn to get people out of mandatory army service
[16:25] regardless...
[16:25] I didn't really think the mechanics were that different.
[16:57] Nooooo, my sex film contact
[17:46] Under "who gives a shit", I'll be back in town properly this weekend after HOPE (hanging with Chronomex, actually) and all next week I'm archiveteam fulltime, properly.
[17:48] Our logo artist for Archiveteam Warrior is hard at work, that'll be ready soon.
[18:33] SketchCow, you mentioned a while back you had some updated s3 scripts. Are they ready for public consumption?
[18:59] SketchCow: I'm up to october 12 2011 of gbtv
[18:59] i'm also uploading 'The List' episode that's going to be here: http://archive.org/details/GBTV_10_13_2011
[21:07] SketchCow: http://www.engadget.com/2012/07/12/sinking-social-news-site-digg-bought-for-500k-by-nyc-firm-betaw/
[21:07] we may need to do a just in time grab of digg.com
[21:12] heh woww
[21:12] I remember when Digg was hot shit
[21:28] yeah digg was good before they realized they had no idea how to make money
[21:28] I am glad to see them die
[22:41]
[23:33] Hi
[23:33] :D
[23:47] http://hardware.slashdot.org/story/12/07/12/2219257/a-million-year-hard-disk
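
Editor's sketch, not something anyone in the channel ran: the 11:06 idea (random order, random intervals, one to three pages at a time) can be combined with the crawl delay mentioned at 15:59 into a simple fetch plan. The function names, the sample robots.txt text and the example.com links below are all hypothetical; the real robots.txt is not quoted in the log. Only the Python standard library is used, and nothing is actually fetched.

```python
import random
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(robots_txt, user_agent="*", default=1.0):
    """Return the Crawl-delay in seconds that robots_txt sets for user_agent.

    Falls back to `default` when no Crawl-delay applies.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def plan_batches(urls, min_delay, max_batch=3, jitter=0.5, seed=None):
    """Shuffle urls and group them into batches of 1..max_batch pages.

    Each batch is paired with a wait of at least min_delay seconds per page
    in the batch, stretched by a random factor of up to `jitter`, so the
    average request rate stays at or below one page per min_delay.
    """
    rng = random.Random(seed)
    shuffled = list(urls)
    rng.shuffle(shuffled)
    plan = []
    start = 0
    while start < len(shuffled):
        size = rng.randint(1, max_batch)
        batch = shuffled[start:start + size]
        wait = min_delay * len(batch) * (1.0 + rng.uniform(0.0, jitter))
        plan.append((wait, batch))
        start += size
    return plan


if __name__ == "__main__":
    # Hypothetical robots.txt content, assumed for illustration only.
    sample_robots = "User-agent: *\nCrawl-delay: 30\n"
    delay = crawl_delay_from_robots(sample_robots)
    # Hypothetical link list standing in for the spidered URLs.
    links = ["http://example.com/page-%d" % n for n in range(1, 8)]
    for wait, batch in plan_batches(links, delay):
        print("wait %.1fs, then fetch %s" % (wait, batch))
```

Scaling the wait by batch size is a design choice made here so that opening three pages at once does not exceed the stated crawl delay on average; a stricter reading would space every single request by the full delay.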