#archiveteam 2013-05-04,Sat

Time Nickname Message
10:29 🔗 omf_ Besides the geocities projects and the usenet site what are other projects based off of Archive Team collected data?
10:32 🔗 BlueMax well correct me if I'm wrong but has there really been a project with data available publicly on a wide enough scope to make big projects off?
10:38 🔗 omf_ I just referred to big projects already known and asked if anyone knew of more. http://deletedcity.net/, http://olduse.net/, http://oneterabyteofkilobyteage.tumblr.com/ are the ones I was referring to
10:38 🔗 omf_ Plus the art book written about Geocities
12:26 🔗 Cameron_D http://www.gwern.net/Google%20shutdowns
13:14 🔗 ersi wut, "See also: Archive Team"
13:19 🔗 Cameron_D hah, missed that
13:42 🔗 balrog so what's the plan as to Reader?
13:58 🔗 creature balrog: Google Reader?
13:58 🔗 balrog yeah
13:59 🔗 Smiley .... there's nothing really to save?
13:59 🔗 creature I'm not sure there's anything Archive Team can do about that. Archive Team tends to focus on public content, and everything in Reader is behind individual Google logins.
13:59 🔗 creature And, well, it's not the content that's going away, really; it's the app used to access it.
14:00 🔗 creature There's metadata too - people's stars and shares and so on - but, as I say, private.
14:00 🔗 omf_ google is harder to scrape and not get caught than any other company I have tried
14:00 🔗 omf_ they are fucking on it
14:02 🔗 omf_ Even using headless browsers and sophisticated macros can only get you so far
14:04 🔗 BlueMax well I would hope so, they're professional scrapers themselves, you'd think they'd know how to defend themselves
14:08 🔗 Tomcat_ :)
14:10 🔗 omf_ The one game no one can win is the long game
14:10 🔗 omf_ the slower you go the easier it is to evade
14:11 🔗 sep332 There is some content that needed saving. Google kept an archive of each blog post, even for blogs that aren't around anymore.
14:11 🔗 omf_ There is a paper out there about how all detection software fails because the timing window is small to find DoS-style hits
14:11 🔗 Tomcat_ I think in the case of Google it might be a better idea to get them to give the archives to archive.org themselves.
14:12 🔗 omf_ They have no incentive to do so
14:12 🔗 Tomcat_ Well, they are archiving themselves. Like Google Books.
14:12 🔗 sep332 I'm not sure there's any way for Google to know which blogs were "public" and which were not generally accessible
14:12 🔗 omf_ Google is already one of the largest open data contributors out there
14:13 🔗 sep332 As far as I know you could even put a password in the RSS URL
14:14 🔗 Tomcat_ Of course Google has no incentive to give stuff up for when they're no longer around. ;) So I guess that's the bad part.
14:21 🔗 ersi Always count on that everyone are stupid twats that wouldn't want to share things. Then again, if you're able to ask someone who is important, go right ahead
14:22 🔗 omf_ Also never underestimate straight up incompetence
14:38 🔗 Tomcat_ or lack of time ;)
21:33 🔗 Nemo_bis godane: going to grab these? :) http://creativecommons.org/weblog/entry/37647
