#archiveteam 2013-05-04,Sat


Time	Nickname	Message
10:29 ^🔗	omf_	Besides the geocities projects and the usenet site what are other projects based off of Archive Team collected data?
10:32 ^🔗	BlueMax	well correct me if I'm wrong but has there really been a project with data available publicly on a wide enough scope to make big projects off?
10:38 ^🔗	omf_	I just referred to big projects already known and asked if anyone knew of more. http://deletedcity.net/, http://olduse.net/, http://oneterabyteofkilobyteage.tumblr.com/ are the ones I was referring to
10:38 ^🔗	omf_	Plus the art book written about Geocities
12:26 ^🔗	Cameron_D	http://www.gwern.net/Google%20shutdowns
13:14 ^🔗	ersi	wut, "See also: Archive Team"
13:19 ^🔗	Cameron_D	hah, missed that
13:42 ^🔗	balrog	so what's the plan as to Reader?
13:58 ^🔗	creature	balrog: Google Reader?
13:58 ^🔗	balrog	yeah
13:59 ^🔗	Smiley	.... there's nothing really to save?
13:59 ^🔗	creature	I'm not sure there's anything Archive Team can do about that. Archive Team tends to focus on public content, and everything in Reader is behind individual Google logins.
13:59 ^🔗	creature	And, well, it's not the content that's going away, really; it's the app used to access it.
14:00 ^🔗	creature	There's metadata too - people's stars and shares and so on - but, as I say, private.
14:00 ^🔗	omf_	google is harder to scrape and not get caught than any other company I have tried
14:00 ^🔗	omf_	they are fucking on it
14:02 ^🔗	omf_	Even using headless browsers and sophisticated macros can only get you so far
14:04 ^🔗	BlueMax	well I would hope so, they're professional scrapers themselves, you'd think they'd know how to defend themselves
14:08 ^🔗	Tomcat_	:)
14:10 ^🔗	omf_	The one game no one can win is the long game
14:10 ^🔗	omf_	the slower you go the easier it is to evade
14:11 ^🔗	sep332	There is some content that needed saving. Google kept an archive of each blog post, even for blogs that aren't around anymore.
14:11 ^🔗	omf_	There is a paper out there about how all detection software fails because the timing window is small to find DOS style hits
14:11 ^🔗	Tomcat_	I think in the case of Google it might be a better idea to get them to give the archives to archive.org themselves.
14:12 ^🔗	omf_	They have no incentive to do so
14:12 ^🔗	Tomcat_	Well, they are archiving themselves. Like Google Books.
14:12 ^🔗	sep332	I'm not sure there's any way for Google to know which blogs were "public" and which were not generally accessible
14:12 ^🔗	omf_	Google is already one of the largest open data contributors out there
14:13 ^🔗	sep332	As far as I know you could even put a password in the RSS URL
14:14 ^🔗	Tomcat_	Of course Google has no incentive to give stuff up for when they're no longer around. ;) So I guess that's the bad part.
14:21 ^🔗	ersi	Always count on everyone being stupid twats that wouldn't want to share things. Then again, if you're able to ask someone who is important, go right ahead
14:22 ^🔗	omf_	Also never underestimate straight up incompetence
14:38 ^🔗	Tomcat_	or lack of time ;)
21:33 ^🔗	Nemo_bis	godane: going to grab these? :) http://creativecommons.org/weblog/entry/37647
