[10:29] Besides the GeoCities projects and the Usenet site, what other projects are based on Archive Team collected data?
[10:32] well, correct me if I'm wrong, but has there really been a project with data available publicly on a wide enough scope to base big projects on?
[10:38] I just referred to big projects already known and asked if anyone knew of more. http://deletedcity.net/, http://olduse.net/, http://oneterabyteofkilobyteage.tumblr.com/ are the ones I was referring to
[10:38] Plus the art book written about GeoCities
[12:26] http://www.gwern.net/Google%20shutdowns
[13:14] wut, "See also: Archive Team"
[13:19] hah, missed that
[13:42] so what's the plan as to Reader?
[13:58] balrog: Google Reader?
[13:58] yeah
[13:59] .... there's nothing really to save?
[13:59] I'm not sure there's anything Archive Team can do about that. Archive Team tends to focus on public content, and everything in Reader is behind individual Google logins.
[13:59] And, well, it's not the content that's going away, really; it's the app used to access it.
[14:00] There's metadata too (people's stars and shares and so on), but, as I say, it's private.
[14:00] Google is harder to scrape without getting caught than any other company I have tried
[14:00] they are fucking on it
[14:02] Even using headless browsers and sophisticated macros can only get you so far
[14:04] well, I would hope so; they're professional scrapers themselves, you'd think they'd know how to defend themselves
[14:08] :)
[14:10] The one game no one can win is the long game
[14:10] the slower you go, the easier it is to evade
[14:11] There is some content that needed saving. Google kept an archive of each blog post, even for blogs that aren't around anymore.
[14:11] There is a paper out there about how all detection software fails because the timing window it uses to find DoS-style hits is small
[14:11] I think in the case of Google it might be a better idea to get them to give the archives to archive.org themselves.
[14:12] They have no incentive to do so
[14:12] Well, they are archiving themselves. Like Google Books.
[14:12] I'm not sure there's any way for Google to know which blogs were "public" and which were not generally accessible
[14:12] Google is already one of the largest open data contributors out there
[14:13] As far as I know, you could even put a password in the RSS URL
[14:14] Of course Google has no incentive to give stuff up for when they're no longer around. ;) So I guess that's the bad part.
[14:21] Always count on everyone being stupid twats who wouldn't want to share things. Then again, if you're able to ask someone who is important, go right ahead
[14:22] Also never underestimate straight-up incompetence
[14:38] or lack of time ;)
[21:33] godane: going to grab these? :) http://creativecommons.org/weblog/entry/37647