#archiveteam-bs 2013-04-11,Thu


Time Nickname Message
00:24 πŸ”— joepie92 kennethre: quick q... does requests not allow for retrieving the response body if the response code is a 404?
00:24 πŸ”— joepie92 actually wait
00:25 πŸ”— joepie92 I'm an idiot, disregard above
00:45 πŸ”— godane http://dogtv.com/
01:05 πŸ”— DFJustin I grabbed the jedi academy & outcast source zips when it was up
01:05 πŸ”— omf_ DFJustin, could you sha256sum both files so I can compare mine and make sure everything matches up
01:06 πŸ”— DFJustin 266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641 *jediAcademy.zip
01:06 πŸ”— DFJustin 89a0d8bfb45b194fe8ab0943a25b65089c28457423380aecb760c7e0b2a98209 *jediOutcast.zip
01:07 πŸ”— omf_ Mine match up
01:07 πŸ”— omf_ \o/
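The checksum comparison above can be reproduced without the sha256sum tool. This is a minimal sketch, assuming the two zips sit in one local directory under the file names shown in the log; the helper names sha256_of_file and check_files are made up for illustration, and the digests are the ones DFJustin pasted.

```python
import hashlib
from pathlib import Path

# Digests as quoted in the channel; file names as shown in the sha256sum output.
EXPECTED = {
    "jediAcademy.zip": "266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641",
    "jediOutcast.zip": "89a0d8bfb45b194fe8ab0943a25b65089c28457423380aecb760c7e0b2a98209",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(directory, expected=EXPECTED):
    """Map each expected file name to True only if it exists and its digest matches."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of_file(path) == want
    return results
```

Calling check_files(".") in the download directory would report, per file, whether a local copy matches the digests posted in the channel.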
01:24 πŸ”— SketchCow Please upload to archive.org.
01:27 πŸ”— DFJustin on it
01:35 πŸ”— dashcloud the unedited version?
01:36 πŸ”— DFJustin http://archive.org/details/jediacademy_source
01:36 πŸ”— DFJustin http://archive.org/details/jedioutcast_source
01:39 πŸ”— dashcloud thanks
01:44 πŸ”— SketchCow Excellent
02:14 πŸ”— omf_ I want to stab the guy who presented after you SketchCow. He is totally a publisher apologist
02:16 πŸ”— omf_ "Publishers want to get it right." Consider the fucking stacks of shit that come out of places like Apress, Wrox, nostarch
02:17 πŸ”— omf_ "They don't want to lock things down." Which is why everything is fucking locked down hard
02:21 πŸ”— omf_ More than halfway done and no mention of Elsevier, the boycott or the dozens of articles talking about authors who hate the publishing industry
02:41 πŸ”— dashcloud omf_: is your second sentence from the presentation itself?
02:49 πŸ”— bsmith093 i just popped in, SketchCow has a presentation ?? link?
02:49 πŸ”— omf_ bsmith093, - https://www.youtube.com/watch?v=M41806HPaCY
02:50 πŸ”— omf_ dashcloud, I am looking for the time so you can watch it yourself
02:51 πŸ”— dashcloud thanks!
03:06 πŸ”— omf_ start at 10 minutes 40 seconds in
03:07 πŸ”— omf_ He then goes on to talk about all the work publishers do
03:07 πŸ”— omf_ Here is how unimportant this guy is
03:08 πŸ”— omf_ He has like a staff of 40, and they run a university library and interface with people and buy journals
03:09 πŸ”— omf_ SketchCow has 20-30 core volunteers, 100 more that are big helpers and 100+ more people he pulls in just to run warriors, which saves millions of people's content and then gives it away for free
03:11 πŸ”— omf_ This speaker also keeps mentioning how he has friends in the publishing industry, how the publishing industry is all idealists and to go buy your publisher a drink
03:13 πŸ”— omf_ and this talk started off so interesting
03:13 πŸ”— omf_ Open Access Journals or Die
03:14 πŸ”— dashcloud thanks!
03:15 πŸ”— omf_ dashcloud, I am always on the hunt for good talks to watch. What are you interested in? I might have some suggestions
03:19 πŸ”— dashcloud maybe some reverse engineering talks (saw a good one by Travis Goodspeed), but in general I just don't have the time to watch everything I want to see/listen to
03:35 πŸ”— omf_ I got nothing off the top of my head.
03:35 πŸ”— omf_ I am actually extracting data from the DMOZ rdf dump
03:36 πŸ”— omf_ A very useful little bit of data for mapping out a small fraction of the internet
04:04 πŸ”— DFJustin dashcloud: https://www.youtube.com/watch?v=PR9tFXz4Quc
04:54 πŸ”— omf_ Ooh just read a good quote
04:54 πŸ”— omf_ Violence is like duct tape. If it doesn't solve the problem, you didn't use enough.
04:56 πŸ”— DFJustin dynamite is another example of that
04:56 πŸ”— omf_ good point
04:58 πŸ”— omf_ Anyone here had to set up and run a search engine that had to crawl web pages?
04:58 πŸ”— omf_ I think I am running into edge cases in url de-duplication and I wanted to take a look at how the existing open source players handle it
05:00 πŸ”— omf_ Also webcrawler.com still exists
05:01 πŸ”— DFJustin how the--
05:02 πŸ”— DFJustin I guess some people have really old start pages bookmarked
05:09 πŸ”— Lord_Nigh does metacrawler still exist?
05:10 πŸ”— Lord_Nigh i remember patching the webcrawler.com page locally around 1997 to allow an arbitrary number of results returned
05:10 πŸ”— Lord_Nigh it used to limit you to 5,10,20,or 30 results
05:10 πŸ”— Lord_Nigh i wanted to see 200
05:11 πŸ”— Lord_Nigh paginated results were still a ways off
05:11 πŸ”— Lord_Nigh back in the ancient days
05:17 πŸ”— omf_ I think I can use a Bloom filter to speed up url lookups and de-duplication
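The Bloom filter idea above could be sketched roughly as follows. This is not omf_'s crawler code: the normalization rules (lowercase scheme and host, drop default ports and fragments, sort query parameters) are an assumption about which de-duplication edge cases matter, and normalize_url, BloomFilter and seen_before are illustrative names. A Bloom filter never gives false negatives but does give occasional false positives, so a small fraction of new URLs would be wrongly treated as already seen.

```python
import hashlib
import math
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Collapse trivially different spellings of a URL into one key (assumed rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    # Sorting makes ?b=2&a=1 and ?a=1&b=2 compare equal; the fragment is dropped.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))


class BloomFilter:
    """Fixed-size bit array with k hash positions per item (double hashing)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.hash_count = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def seen_before(bloom, url):
    """Return True if the normalized URL is (probably) known, else record it."""
    key = normalize_url(url)
    if key in bloom:
        return True
    bloom.add(key)
    return False
```

In a crawler the filter would typically sit in front of a slower exact store, so that only the URLs the filter reports as possibly seen need a real lookup.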
05:24 πŸ”— omf_ The Lorena Bobbitt emoji
05:24 πŸ”— omf_ ( ＾◡＾)っ✂╰⋃╯
05:24 πŸ”— omf_ wow. I am so impressed, I mean fucking look at that. Creativity all around
05:41 πŸ”— omf_ For the pythonistas in here - http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/
05:41 πŸ”— omf_ It walks through the problem but the guy did not release the code because he was afraid people would abuse it
05:43 πŸ”— omf_ He used 20 Amazon EC2 extra large instances.
06:26 πŸ”— omf_ chfoo did you read the ISO or the other warc tools to build your warc file reader?
06:26 πŸ”— omf_ and writer
06:50 πŸ”— omf_ Anyone here have experience with Amazon Elastic MapReduce?
09:57 πŸ”— godane so looks like i'm starting to upload 2011 clips from g4
09:58 πŸ”— godane AOTS Reveals World's Largest T-Bag: https://archive.org/details/g4tv.com-video50827
13:19 πŸ”— ersi Hehe, Virgin Lounge.
13:19 πŸ”— SketchCow SEXY virgin lounge
13:19 πŸ”— SketchCow I just had a smoothie brought to my couch
13:19 πŸ”— SketchCow They have US and UK plugs, makes life easier.
13:19 πŸ”— SketchCow Wine tasting upstairs, not attending
13:20 πŸ”— ersi I think it's hilarious that, that airline is actually named Virgin :D
13:20 πŸ”— SketchCow Virgin the company has an interesting history. I'd check it out, as well as Richard Branson
13:20 πŸ”— ersi Indeed, Richard is indeed quite interesting
13:21 πŸ”— ersi Lounges are pretty sweet.
13:21 πŸ”— ersi Don't you have enough points, seeing how you criss cross the world, to not pay for access to those?
13:33 πŸ”— BlueMax lucky SketchCow
13:39 πŸ”— ersi How much does luck play a role in this? Paid for entrance :p
13:42 πŸ”— SketchCow Yeah, SketchCow with 60 pounds.
13:43 πŸ”— SketchCow If I was hungry enough, I could totally nail that in just free food/drink
13:50 πŸ”— SmileyG Virgin Coke anyone?
13:50 πŸ”— SmileyG I mean how does a record company end up selling soft drinks o_O
13:51 πŸ”— SmileyG http://i.telegraph.co.uk/multimedia/archive/01556/VIRGINCOLA_1556623i.jpg
13:51 πŸ”— GLaDOS ..what
13:51 πŸ”— GLaDOS why would you
13:51 πŸ”— GLaDOS Richard, what are you doing?
13:51 πŸ”— SmileyG Was years ago
13:52 πŸ”— GLaDOS Still..
13:52 πŸ”— SmileyG then again how does a record company end up flying planes o_O
13:52 πŸ”— GLaDOS Boredom, obviously.
13:52 πŸ”— SmileyG I've made lots of money, wtf do I do now?
13:53 πŸ”— GLaDOS Or, got annoyed at Emirates not serving up Virgin Cola on a flight, and bought a whole fleet of aircraft.
13:53 πŸ”— SmileyG xD
13:53 πŸ”— SmileyG "They said I couldn't bring drinks on a plane.... I proved them wrong!"
13:54 πŸ”— SmileyG How does a record company end up buying an ISP?!
13:54 πŸ”— GLaDOS It all makes sense..
13:54 πŸ”— GLaDOS Hm, this one is tough..
13:54 πŸ”— SmileyG he was really annoyed with bt?
13:55 πŸ”— GLaDOS He wanted the extra kilobyte/second.
14:01 πŸ”— SmileyG Anyway the interview seemed ok :O
14:02 πŸ”— GLaDOS That's good.
14:02 πŸ”— GLaDOS ..what's this about again?
14:02 πŸ”— SmileyG yeah hope so ;D
14:02 πŸ”— SmileyG my new job hopefully
14:02 πŸ”— GLaDOS Ah
14:03 πŸ”— SmileyG from 4-7k pay rise, easily reaching 9k pay rise
14:03 πŸ”— SmileyG considering I've only had 2k in my present company in 2 years.... I'd be very happy
14:03 πŸ”— GLaDOS Well that's good!
14:04 πŸ”— GLaDOS Going to do a wiki spambot cleanup tomorrow.
14:04 πŸ”— GLaDOS This time, I'm using my new PC that can have Nightly open while I do other things!
14:05 πŸ”— SmileyG Nightly is...?
14:06 πŸ”— GLaDOS Firefox Nightly.
14:07 πŸ”— * GLaDOS crashes into bed
14:11 πŸ”— SmileyG ahh
14:39 πŸ”— chfoo omf_: i was reading the iso when i was writing it. i didn't look at code from other projects because i wanted to avoid copying conformance bugs if any by accident.
14:39 πŸ”— chfoo have you been trying it out?
14:40 πŸ”— chfoo the next thing i plan to do is a verify command to check the digests and conformance issues
14:45 πŸ”— chfoo in terms of conformance, what i noticed so far is that wget produces warcs with duplicate record ids when saving some metadata
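A check for the duplicate record ids mentioned above might look something like this. It is a rough sketch and not chfoo's verify command: it assumes a well-formed WARC where each record starts with a "WARC/" version line and carries a Content-Length header, it ignores folded continuation header lines, and the names iter_warc_headers and duplicate_record_ids are invented here. For a .warc.gz file the stream could come from gzip.open(path, "rb").

```python
from collections import Counter


def iter_warc_headers(stream):
    """Yield one dict of lowercased header names per record in a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the header block
            name, _, value = line.partition(b":")
            key = name.strip().lower().decode("ascii", "replace")
            headers[key] = value.strip().decode("utf-8", "replace")
        # Skip the record body by its declared length (read, so any stream works).
        stream.read(int(headers.get("content-length", "0")))
        yield headers


def duplicate_record_ids(stream):
    """Return the sorted WARC-Record-ID values that occur more than once."""
    counts = Counter(
        headers["warc-record-id"]
        for headers in iter_warc_headers(stream)
        if "warc-record-id" in headers
    )
    return sorted(record_id for record_id, n in counts.items() if n > 1)
```

Reading each body fully into memory is the simplest way to skip it; a real tool would more likely seek or read in chunks and also verify the digests.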
15:02 πŸ”— SketchCow http://i.imgur.com/LrASrN8.jpg
15:02 πŸ”— SketchCow lost it at broconut
15:05 πŸ”— SmileyG xD
15:06 πŸ”— BlueMax I put the lime in the broconut
15:30 πŸ”— godane so i looked at g4tv.com/images again
15:31 πŸ”— godane i think i can get them all in warc.gz format
15:31 πŸ”— godane or at least whats on the website
15:32 πŸ”— godane only cause i checked the source and it looks like every url for images are in there
15:32 πŸ”— godane even if its not being displayed
15:35 πŸ”— SketchCow When's G4 down for good?
15:35 πŸ”— SketchCow 11 days, I see
15:39 πŸ”— godane i think so
15:42 πŸ”— godane SketchCow: i did find it funny that you had an interview on g4
15:43 πŸ”— godane i was always thinking about you going on to techtv if you could
15:43 πŸ”— SketchCow Yes, that was a LONG time ago and I was one of Kevin's first big interviews
15:43 πŸ”— SketchCow He worked so hard on this and we talked a lot before and after.
15:43 πŸ”— SketchCow He probably would remember me
15:43 πŸ”— SketchCow I have a memory he was so happy to get a copy of the doc when it did come out a year later.
15:44 πŸ”— SketchCow My flight leaves soon
15:44 πŸ”— godane its on his youtube account
15:44 πŸ”— SketchCow As per the upgrade, they come and get me and I get pre-boarding
15:44 πŸ”— SketchCow And now I can polish off a few more movies.
15:44 πŸ”— SketchCow (Saw Jack Reacher, Hotel Transylvania on the way in)
15:45 πŸ”— balrog_ is there a guide anywhere for archiving IP.board (invision powerboard) forums?
15:45 πŸ”— SketchCow Hotel Transylvania is Adam Sandler with an actual director giving him characterization directions
15:47 πŸ”— BlueMax so...good?
15:57 πŸ”— Schbirid it has lots of singing and other childish things, one of the few movies i stopped watching midway through
16:02 πŸ”— godane so g4 has stopped releasing new videos like 3 weeks ago
16:03 πŸ”— godane so thats why i'm panicking to get as much stuff as i can
17:31 πŸ”— godane i have passed 28k videos in g4video-web
18:58 πŸ”— omf_ After over a day I can say that DigitalOcean droplets respond faster than joyent instances
18:58 πŸ”— omf_ It all comes down to DO using SSDs while joyent does not
18:58 πŸ”— omf_ DigitalOcean has a better online interface than Joyent and costs half the price for the same thing
19:11 πŸ”— soultcer Interesting. I read that DO has a pretty bad network. Did you have any trouble with it?
19:12 πŸ”— omf_ None so far
19:12 πŸ”— omf_ I will give them a month just like I did joyent
19:12 πŸ”— soultcer Also re joyent: It's been over a month and I still haven't received a billing statement.
19:13 πŸ”— soultcer Preliminary calculations suggest though that I have so little traffic that the price advantage of Amazon Spot instances negates the savings for having cheap traffic at DO or joyent
19:14 πŸ”— omf_ I tried DO because I got $10 account credit for free
19:14 πŸ”— omf_ and Joyent gave me a bunch free too
19:14 πŸ”— omf_ I will probably end up back at Amazon simply because they offer better features
19:23 πŸ”— ersi DO still has the SSDTWEET promo code activated btw
19:24 πŸ”— ersi which entitles you to $10 credit
19:59 πŸ”— DFJustin http://archive.org/details/cst_000027
20:32 πŸ”— omf_ These DO instances are so much faster than joyent that I have to up my droplet size to get more hard drive space
21:21 πŸ”— godane DICE 2011 "Exploring the Ocean Deep in 3D" Presentation: https://archive.org/details/g4tv.com-video51230
22:04 πŸ”— omf_ Internet Archive eat another 2.2gb doughnut. You know you love it.
22:04 πŸ”— joepie92 omf_: it's clearly a marketing tactic! ;)
22:04 πŸ”— omf_ joepie92, yeah now it costs what I was paying for joyent
22:05 πŸ”— omf_ $0.03 an hour
22:06 πŸ”— omf_ See I am old school. I use the butt, then I lose the butt
22:06 πŸ”— omf_ If the butts aren't making me money there is no reason to keep them around :)
22:10 πŸ”— S[h]O[r]T are you talking about buttcoins?
22:12 πŸ”— dashcloud nope- the "cloud"
22:13 πŸ”— omf_ The clown when in public
22:13 πŸ”— dashcloud I think I prefer "moon" more- it makes it very clear how far away your stuff is and your chances of getting it if something goes wrong
22:16 πŸ”— omf_ I might spin up an instance of the common crawl tonight to try MapReduce and see how fast things are
22:16 πŸ”— omf_ The thing I am trying to figure out is do I have to load the whole thing first or is it already in a format that I can just query against
22:17 πŸ”— omf_ I don't even need all 81tb, just the URL frontier
22:28 πŸ”— omf_ Here is what is interesting about DO. $0.007 per hour = 20gb storage, 1tb transfer
22:29 πŸ”— omf_ $0.015 per hour = 30gb, 2tb transfer
22:29 πŸ”— omf_ So I should just get 2 of the smallest instances to get more hard drive space
22:32 πŸ”— joepie92 haha
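The droplet arithmetic above works out as follows. The prices and sizes are the ones omf_ quoted; the Plan class and its times method are illustrative only. Note the assumption baked into the comparison: two small droplets give two separate 20 GB disks, not a single 40 GB volume.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Plan:
    hourly_usd: Decimal
    storage_gb: int
    transfer_tb: int

    def times(self, count):
        """Totals for running `count` instances of this plan side by side."""
        return Plan(self.hourly_usd * count,
                    self.storage_gb * count,
                    self.transfer_tb * count)


# Figures as quoted in the channel.
SMALL = Plan(Decimal("0.007"), 20, 1)
MEDIUM = Plan(Decimal("0.015"), 30, 2)

two_small = SMALL.times(2)

# Two small droplets: cheaper per hour, more total disk, same total transfer.
cheaper = two_small.hourly_usd < MEDIUM.hourly_usd
more_disk = two_small.storage_gb > MEDIUM.storage_gb
same_transfer = two_small.transfer_tb == MEDIUM.transfer_tb
```

Decimal is used instead of float so the hourly prices compare exactly.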
22:41 πŸ”— omf_ I added more clown info to the wiki - http://www.archiveteam.org/index.php?title=Clown_hosting
22:43 πŸ”— joepie92 omf_: DO offers unmetered again?
22:43 πŸ”— omf_ I didn't put that note on there. Not sure
22:43 πŸ”— omf_ All the plans have transfer limits on them
22:44 πŸ”— joepie92 All servers come with 1Gb/sec. network interface. Plans start with 1TB per month and increase incrementally. Once the monthly transfer limit has been exceeded, it's $0.02 per GB thereafter. You'll save a ton of money with our network and it's easy to get started.
22:59 πŸ”— joepie92 http://www.vr.org/buy-vps/#usa
22:59 πŸ”— joepie92 free inbound
23:00 πŸ”— dashcloud interesting move by Google: http://dataliberation.blogspot.com/2013/04/plan-your-digital-afterlife-with.html
23:32 πŸ”— joepie92 dashcloud: whoa.
23:33 πŸ”— Aranje- yeah man
23:33 πŸ”— Aranje- I'm excited about it
23:33 πŸ”— dashcloud so, not perfect, but far better than everyone else out there
23:33 πŸ”— Aranje- I'm setting mine up today
23:33 πŸ”— Aranje- yeah
