[00:24] kennethre: quick q... does requests not allow for retrieving the response body if the response code is a 404?
[00:24] actually wait
[00:25] I'm an idiot, disregard above
[00:45] http://dogtv.com/
[01:05] I grabbed the jedi academy & outcast source zips when it was up
[01:05] DFJustin, could you sha256sum both files so I can compare mine and make sure everything matches up
[01:06] 266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641 *jediAcademy.zip
[01:06] 89a0d8bfb45b194fe8ab0943a25b65089c28457423380aecb760c7e0b2a98209 *jediOutcast.zip
[01:07] Mine match up
[01:07] \o/
[01:24] Please upload to archive.org.
[01:27] on it
[01:35] the unedited version?
[01:36] http://archive.org/details/jediacademy_source
[01:36] http://archive.org/details/jedioutcast_source
[01:39] thanks
[01:44] Excellent
[02:14] I want to stab the guy who presented after you SketchCow. He is totally a publisher apologist
[02:16] "Publishers want to get it right." Consider the fucking stacks of shit that come out of places like Apress, Wrox, nostarch
[02:17] "They don't want to lock things down." Which is why everything is fucking locked down hard
[02:21] More than half way done and no mention of Elsevier, the boycott or the dozens of articles talking about authors who hate the publishing industry
[02:41] omf_: is your second sentence from the presentation itself?
[02:49] i just popped in, SketchCow has a presentation ?? link?
[02:49] bsmith093, - https://www.youtube.com/watch?v=M41806HPaCY
[02:50] dashcloud, I am looking for the time so you can watch it yourself
[02:51] thanks!
[03:06] start at 10 minutes 40 seconds in
[03:07] He then goes on to talk about all the work publishers do
[03:07] Here is how unimportant this guy is
[03:08] He has like a staff of 40 and then run a university library and interface with people and buy journals
[03:09] SketchCow, has 20-30 core volunteers, 100 more that are big helpers and 100+ more people he pulls in just to run warriors which saves millions of people's content and then gives it away for free
[03:11] This speaker also keeps mentioning how he has friends in the publishing industry, how the publishing industry is all idealists and to go buy your publisher a drink
[03:13] and this talk started off so interesting
[03:13] Open Access Journals or Die
[03:14] thanks!
[03:15] dashcloud, I am always on the hunt for good talks to watch. What are you interested in? I might have some suggestions
[03:19] maybe some reverse engineering talks (saw a good one by Travis Goodspeed), but in general I just don't have the time to watch everything I want to see/listen to
[03:35] I got nothing off the top of my head.
[03:35] I am actually extracting data from the DMOZ rdf dump
[03:36] A very useful little bit of data for mapping out a small fraction of the internet
[04:04] dashcloud: https://www.youtube.com/watch?v=PR9tFXz4Quc
[04:54] Ooh just read a good quote
[04:54] Violence is like duct tape. If it doesn't solve the problem, you didn't use enough.
[04:56] dynamite is another example of that
[04:56] good point
[04:58] Anyone here had to set up and run a search engine that had to crawl web pages?
[04:58] I think I am running into edge cases in url de-duplication and I wanted to take a look at how the existing open source players handle it
[05:00] Also webcrawler.com still exists
[05:01] how the--
[05:02] I guess some people have really old start pages bookmarked
[05:09] does metacrawler still exist?
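Editor's sketch for the URL de-duplication question at [04:58]: one common way to handle it is to normalise each URL before comparing, then test membership in a Bloom filter so the crawler does not need to hold every URL in memory. The code below is a minimal illustration under stated assumptions, not how any particular crawler does it. The normalisation rules, the names `normalize_url`, `BloomFilter` and `seen_before`, and the filter size are all hypothetical choices made for the example. The rules deliberately ignore userinfo, IPv6 literals and percent-encoding differences.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url):
    """Canonicalise a URL so trivially different spellings compare equal.

    The rules here are illustrative assumptions, not a standard.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 match.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def seen_before(bloom, url):
    """Return True if the normalised URL was (probably) already recorded."""
    key = normalize_url(url)
    if key in bloom:
        return True  # may be a false positive
    bloom.add(key)
    return False


if __name__ == "__main__":
    bloom = BloomFilter()
    for u in ["http://example.com/a?b=2&a=1",
              "HTTP://EXAMPLE.com:80/a?a=1&b=2#top",
              "http://example.org/"]:
        print(u, seen_before(bloom, u))
```

Because a Bloom filter can report a URL as seen when it was not, a crawler that must not skip pages would confirm positives against an exact store; the filter only makes the common "definitely new" case cheap.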
[05:10] i remember patching the webcrawler.com page locally around 1997 to allow an arbitrary number of results returned
[05:10] it used to limit you to 5,10,20,or 30 results
[05:10] i wanted to see 200
[05:11] paginated results were still a ways off
[05:11] back in the ancient days
[05:17] I think I can use a Bloom filter to speed up url lookups and de-duplication
[05:24] The Lorena Bobbit emoji
[05:24] ( ^◡^)っ✂╰⋃╯
[05:24] wow. I am so impressed, I mean fucking look at that. Creativity all around
[05:41] For the pythonistas in here - http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/
[05:41] It walks through the problem but the guy did not release the code because he was afraid people would abuse it
[05:43] He used 20 Amazon EC2 extra large instances.
[06:26] chfoo did you read the ISO or the other warc tools to build your warc file reader?
[06:26] and writer
[06:50] Anyone here have experience with Amazon Elastic MapReduce>
[06:50] ?
[09:57] so looks like i'm starting to upload 2011 clips from g4
[09:58] AOTS Reveals World's Largest T-Bag: https://archive.org/details/g4tv.com-video50827
[13:19] Hehe, Virgin Lounge.
[13:19] SEXY virgin lounge
[13:19] I just had a smoothie brought to my couch
[13:19] They have US and UK plugs, makes life easier.
[13:19] Wine tasting upstairs, not attending
[13:20] I think it's hilarious that, that airline is actually named Virgin :D
[13:20] Virgin the company has an interesting history. I'd check it out, as well as Richard Branson
[13:20] Indeed, Richard is indeed quite interesting
[13:21] Lounges are pretty sweet.
[13:21] Don't you have enough points, seeing how you criss cross the world, to not pay for access to those?
[13:33] lucky SketchCow
[13:39] How much does luck play a role in this? Paid for entrance :p
[13:42] Yeah, SketchCow with 60 pounds.
[13:43] If I was hungry enough, I could totally nail that in just free food/drink
[13:50] Virgin Coke anyone?
[13:50] I mean how does a record company end up selling soft drinks o_O
[13:51] http://i.telegraph.co.uk/multimedia/archive/01556/VIRGINCOLA_1556623i.jpg
[13:51] ..what
[13:51] why would you
[13:51] Richard, what are you doing?
[13:51] Was years ago
[13:52] Still..
[13:52] then again how does a record company end up flying planes o_O
[13:52] Boredom, obviously.
[13:52] I've made lots of money, wtf do I do now?
[13:53] Or, got annoyed at Emirates not serving up Virgin Cola on a flight, and bought a whole fleet of aircraft.
[13:53] xD
[13:53] "They said I couldn't bring drinks on a plane.... I proved them wrong!"
[13:54] How does a record company end up buying an ISP?!
[13:54] It all makes sense..
[13:54] Hm, this one is tough..
[13:54] he was really annoyed with bt?
[13:55] He wanted the extra kilobyte/second.
[14:01] Anyway the interview seemed ok :O
[14:02] That's good.
[14:02] ..what's this about again?
[14:02] yeah hope so ;D
[14:02] my new job hopefully
[14:02] Ah
[14:03] from 4-7k payrise, easily reaching 9k pay rise
[14:03] considering I've only had 2k in my present company in 2 years.... I'd be very happy
[14:03] Well that's good!
[14:04] Going to do a wiki spambot cleanup tomorrow.
[14:04] This time, I'm using my new PC that can have Nightly open while I do other things!
[14:05] Nightly is...?
[14:06] Firefox Nightly.
[14:07] * GLaDOS crashes into bed
[14:11] ahh
[14:39] omf_: i was reading the iso when i was writing it. i didn't look at code from other projects because i wanted to avoid copying conformance bugs if any by accident.
[14:39] have you been trying it out?
[14:40] the next thing i plan to do is a verify command to check the digests and conformance issues
[14:45] in terms of conformance, what i noticed so far is that wget produces warcs with duplicate record ids when saving some metadata
[15:02] http://i.imgur.com/LrASrN8.jpg
[15:02] lost it at broconut
[15:05] xD
[15:06] I put the lime in the broconut
[15:30] so i looked at g4tv.com/images again
[15:31] i think i can get them all in warc.gz format
[15:31] or at least whats on the website
[15:32] only cause i checked the source and it looks like every url for images are in there
[15:32] even if its not being displayed
[15:35] When's G4 down for good?
[15:35] 11 days, I see
[15:39] i think so
[15:42] SketchCow: i did find it funny that you had an interview on g4
[15:43] i was always thinking about you going on to techtv if you could
[15:43] Yes, that was a LONG time ago and I was one of Kevin's first big interviews
[15:43] He worked so hard on this and we talked a lot before and after.
[15:43] He probably would remember me
[15:43] I have a memory he was so happy to get a copy of the doc when it did come out a year later.
[15:44] My flight leaves soon
[15:44] its on his youtube account
[15:44] As per the upgrade, they come and get me and I get pre-boarding
[15:44] And now I can polish off a few more movies.
[15:44] (Saw Jack Reacher, Hotel Transylvania on the way in)
[15:45] is there a guide anywhere for archiving IP.board (invision powerboard) forums?
[15:45] Hotel Transylvania is Adam Sandler with an actual director giving him characterization directions
[15:47] so...good?
[15:57] it has lots of singing and other childish things, one of the few movies i stopped watching midway through
[16:02] so g4 has stopped releasing new videos like 3 weeks ago
[16:03] so thats why i'm panicking to get as much stuff as i can
[17:31] i have passed 28k videos in g4video-web
[18:58] After over a day I can say that DigitalOcean droplets respond faster than joyent instances
[18:58] It all comes down to DO using SSD and joyent does not
[18:58] DigitalOcean has a better online interface than Joyent and costs half the price for the same thing
[19:11] Interesting. I read that DO has a pretty bad network. Did you have any trouble with it?
[19:12] None so far
[19:12] I will give them a month just like I did joyent
[19:12] Also re joyent: It's been over a month and I still haven't received a billing statement.
[19:13] Preliminary calculations suggest though that I have so little traffic that the price advantage of Amazon Spot instances negates the savings for having cheap traffic at DO or joyent
[19:14] I tried DO because I got $10 account credit for free
[19:14] and Joyent gave me a bunch free too
[19:14] I will probably end up back at Amazon simply because they offer better features
[19:23] DO still has the SSDTWEET promo code activated btw
[19:24] which entitled you to $10 credit
[19:59] http://archive.org/details/cst_000027
[20:32] These DO instances are so much faster than joyent that I have to up my droplet size to get more hard drive space
[21:21] DICE 2011 "Exploring the Ocean Deep in 3D" Presentation: https://archive.org/details/g4tv.com-video51230
[22:04] Internet Archive eat another 2.2gb doughnut. You know you love it.
[22:04] omf_: it's clearly a marketing tactic! ;)
[22:04] joepie92, yeah now it costs what I was paying for joyent
[22:05] $0.03 an hour
[22:06] See I am old school. I use the butt, then I lose the butt
[22:06] If the butts aren't making me money there is no reason to keep them around :)
[22:10] are you talking about buttcoins?
[22:12] nope- the "cloud"
[22:13] The clown when in public
[22:13] I think I prefer "moon" more- it makes it very clear how far away your stuff is and your chances of getting it if something goes wrong
[22:16] I might spin up an instance of the common crawl tonight to try MapReduce and see how fast things are
[22:16] The thing I am trying to figure out is do I have to load the whole thing first or is it already in a format that I can just query against
[22:17] I don't even need all 81tb, just the URL frontier
[22:28] Here is what is interesting about DO. $0.007 per hour = 20gb storage, 1tb transfer
[22:29] $0.015 per hour = 30gb, 2tb transfer
[22:29] So I should just get 2 of the smallest instances to get more hard drive space
[22:32] haha
[22:41] I added more clown info to the wiki - http://www.archiveteam.org/index.php?title=Clown_hosting
[22:43] omf_: DO offers unmetered again?
[22:43] I didn't put that note on there. Not sure
[22:43] All the plans have transfer limits on them
[22:44] All servers come with 1Gb/sec. network interface. Plans start with 1TB per month and increase incrementally. Once the monthly transfer limit has been exceeded, it's $0.02 per GB thereafter. You'll save a ton of money with our network and it's easy to get started.
[22:59] http://www.vr.org/buy-vps/#usa
[22:59] free inbound
[23:00] interesting move by Google: http://dataliberation.blogspot.com/2013/04/plan-your-digital-afterlife-with.html
[23:32] dashcloud: whoa.
[23:33] yeah man
[23:33] I'm excited about it
[23:33] so, not perfect, but far better than everyone else out there
[23:33] I'm setting mine up today
[23:33] yeah
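Editor's sketch of the DigitalOcean arithmetic from [22:28]-[22:29] and the overage rate quoted at [22:44]. The hourly prices, disk sizes, transfer allowances and the $0.02 per GB overage are the figures from the log. Everything else is an assumption made for illustration: the plan labels `small` and `medium`, a 730-hour month, 1 TB counted as 1000 GB, and the idea that allowances on two droplets simply add up.

```python
HOURS_PER_MONTH = 730        # assumption: average month length in hours
GB_PER_TB = 1000             # assumption: decimal terabytes
OVERAGE_USD_PER_GB = 0.02    # quoted in the log at [22:44]

# Figures as quoted in the log; the plan names are hypothetical labels.
PLANS = {
    "small": {"usd_per_hour": 0.007, "disk_gb": 20, "transfer_tb": 1},
    "medium": {"usd_per_hour": 0.015, "disk_gb": 30, "transfer_tb": 2},
}


def totals(plan_name, count=1):
    """Combined price and resources for `count` droplets of one plan."""
    plan = PLANS[plan_name]
    hourly = plan["usd_per_hour"] * count
    return {
        "usd_per_hour": round(hourly, 4),
        "usd_per_month": round(hourly * HOURS_PER_MONTH, 2),
        "disk_gb": plan["disk_gb"] * count,
        "transfer_tb": plan["transfer_tb"] * count,
    }


def overage_cost(used_gb, allowance_tb):
    """Extra charge once the monthly transfer allowance is exceeded."""
    extra_gb = max(0, used_gb - allowance_tb * GB_PER_TB)
    return round(extra_gb * OVERAGE_USD_PER_GB, 2)


if __name__ == "__main__":
    print("2 x small :", totals("small", count=2))
    print("1 x medium:", totals("medium", count=1))
    print("overage for 1200 GB on a 1 TB plan:", overage_cost(1200, 1))
```

Under these assumptions two of the smallest droplets come out slightly cheaper per hour than one of the next size up, with 40 GB of disk against 30 GB, which matches the conclusion at [22:29]. The disk is split across two machines rather than being one volume.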