[03:17] alard: Have you uploaded your twaud.io files?
[03:19] Guys, how complete does this list look?
[03:19] Prodigy
[03:19] URLTeam
[03:19] WikiTeam
[03:19] Yahoo Video
[03:19] Google Video
[03:19] Starwars Forums
[03:19] Twaud.io
[03:19] Windows Live Spaces
[03:20] flickrFckr
[03:20] Google Groups
[03:20] (for recent projects)
[03:20] poetry.com
[03:20] Ah, thanks
[03:37] del.icio.us/delicious.com?
[03:39] Oh yeah
[04:07] SketchCow: email'd
[05:15] Just sent this to Jason, dunno if any of y'all are interested in it
[05:15] http://pastebin.com/G3GUWX1p
[05:18] underscor: nice
[05:23] hmm, new Amazon pricing allows unlimited downloading into EC2 instances for $0. Still costs to get the data out, but could be useful for scrape tasks that don't generate a large payload.
[05:24] hmmmmm.
[05:24] yeah
[05:25] course you still pay for CPU also. Heh, I got a bigger AWS bill for scraping poetry.com than I was expecting
[05:26] underscor: should throw that on the wiki so it doesn't get deleted (ironically)
[05:27] haha
[05:27] Will do
[05:27] It's in my email too
[05:29] in related news, I'm wasting far too much time now that I've discovered you can make the wayback machine archive things it's missing by loading the page in web.archive.org and see if it pulls images or files from liveweb.archive.org (and using liveweb directly if things are stale)
[05:31] so try calling up odd corners of your website, your favorite sites, etc
[06:35] have you seen http://aws.amazon.com/pricing_effective_july_2011/ already? :)
[06:39] yes, closure beat you to it
[06:39] DFJustin: Um, what? Does Wayback get the material from when you're surfin' liveweb.archive.org?
[06:40] DFJustin: please elaborate when you got time :)
[13:54] Hiya all. :-)
[14:00] http://faq.web.archive.org/can-i-get-just-one-page-archived/
[14:14] DFJustin: Fuck yeah, that's nice. Thanks :)
[14:19] DFJustin: Hm.. But I don't get how to trigger an 'archivation' when there's already a snapshot like.. a year ago
[14:20] replace "web." with "liveweb."
[14:20] oh
[14:21] well that didn't work
[14:21] I get "Wayback Machine doesn't have that page archived. Want to search for all archived pages under http://web/20100812215505/http:// "
[14:21] :/
[14:22] oh strip the web/2010blah blah part
[14:22] ah, stupid me
[14:22] yeah, now it works just fine. :)
[14:35] Hm~ this is weird
[14:35] DFJustin: Do you get "Wayback doesn't have that page archived" if you click on this? http://liveweb.archive.org/http://frigolit.net/projects/hxemu/
[14:35] It's got a permissive robots.txt :o
[14:36] might be ip-blocking archive.org or something
[14:36] hmm shouldn't be
[14:37] but maybe, I'll poke the owner and see if he maybe does something weird based on user-agent perhaps
[15:16] That whole "free inbound bandwidth with Amazon AWS could be pretty nice.
[16:52] http://en.wikinews.org/wiki/Wikinews_interviews_US_National_Archives_Wikipedian_in_Residence
[17:01] http://blog.ted.com/2011/05/02/beware-online-filter-bubbles-eli-pariser-on-ted-com/
[23:04] Hey :D
[23:06] yay, I got a google+ invite :D
[23:08] spangle: what is that?
[23:09] like facebook, but by google :P
[23:10] Oh, the "+1" thing?
[23:11] http://www.google.com/intl/en/+/demo/
[23:24] https://plus.google.com/104560124403688998123/posts this is the face of a man ready to see his ship sink
[23:34] hehe
[23:37] underscor: tried the drive yet? rescued the yahoo videos?
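
Note on the [14:20]-[14:22] exchange: the URL rewrite described there (swap "web." for "liveweb." and strip the web/<timestamp>/ part) can be sketched in Python roughly as below. This is only an illustration of the string manipulation as described in the log. The function name `to_liveweb` and the regular expression are assumptions made for this example, not anything provided by archive.org, and the sketch makes no network requests, so it says nothing about whether liveweb will actually fetch a given page.

```python
import re

# Assumed shape of a Wayback snapshot URL, based on the one quoted in the log:
#   http://web.archive.org/web/<digits>/<original URL>
_SNAPSHOT = re.compile(r"^https?://web\.archive\.org/web/\d+/(.+)$")

LIVEWEB_PREFIX = "http://liveweb.archive.org/"


def to_liveweb(url: str) -> str:
    """Hypothetical helper: build the liveweb form of a URL.

    If `url` is a Wayback snapshot URL, the web/<timestamp>/ part is
    stripped first; otherwise `url` is treated as the original page URL.
    """
    match = _SNAPSHOT.match(url)
    original = match.group(1) if match else url
    return LIVEWEB_PREFIX + original


if __name__ == "__main__":
    snapshot = ("http://web.archive.org/web/20100812215505/"
                "http://frigolit.net/projects/hxemu/")
    print(to_liveweb(snapshot))
```

Passing either the snapshot URL or the bare page URL gives the same liveweb address as the one pasted at [14:35].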