| Time | Nickname | Message |
|---|---|---|
| 10:29 | omf_ | Besides the GeoCities projects and the Usenet site, what other projects are based on data collected by Archive Team? |
| 10:32 | BlueMax | Well, correct me if I'm wrong, but has there really been a project with data available publicly on a wide enough scope to build big projects on? |
| 10:38 | omf_ | I was just referring to the big projects already known and asking if anyone knew of more. http://deletedcity.net/, http://olduse.net/, and http://oneterabyteofkilobyteage.tumblr.com/ are the ones I meant. |
| 10:38 | omf_ | Plus the art book written about GeoCities. |
| 12:26 | Cameron_D | http://www.gwern.net/Google%20shutdowns |
| 13:14 | ersi | wut, "See also: Archive Team" |
| 13:19 | Cameron_D | hah, missed that |
| 13:42 | balrog | So what's the plan for Reader? |
| 13:58 | creature | balrog: Google Reader? |
| 13:58 | balrog | yeah |
| 13:59 | Smiley | ... there's nothing really to save? |
| 13:59 | creature | I'm not sure there's anything Archive Team can do about that. Archive Team tends to focus on public content, and everything in Reader is behind individual Google logins. |
| 13:59 | creature | And, well, it's not the content that's going away, really; it's the app used to access it. |
| 14:00 | creature | There's metadata too (people's stars, shares and so on), but, as I say, it's private. |
| 14:00 | omf_ | Google is harder to scrape without getting caught than any other company I have tried. |
| 14:00 | omf_ | They are fucking on it. |
| 14:02 | omf_ | Even using headless browsers and sophisticated macros can only get you so far. |
| 14:04 | BlueMax | Well, I would hope so. They're professional scrapers themselves; you'd think they'd know how to defend against it. |
| 14:08 | Tomcat_ | :) |
| 14:10 | omf_ | The one game no one can win is the long game. |
| 14:10 | omf_ | The slower you go, the easier it is to evade detection. |
| 14:11 | sep332 | There is some content that needed saving. Google kept an archive of each blog post, even for blogs that aren't around anymore. |
| 14:11 | omf_ | There is a paper out there about how all detection software fails because the timing window used to find DoS-style hits is small. |
| 14:11 | Tomcat_ | I think in the case of Google it might be a better idea to get them to give the archives to archive.org themselves. |
| 14:12 | omf_ | They have no incentive to do so. |
| 14:12 | Tomcat_ | Well, they are archiving themselves. Like Google Books. |
| 14:12 | sep332 | I'm not sure there's any way for Google to know which blogs were "public" and which were not generally accessible. |
| 14:12 | omf_ | Google is already one of the largest open data contributors out there. |
| 14:13 | sep332 | As far as I know, you could even put a password in the RSS URL. |
| 14:14 | Tomcat_ | Of course, Google has no incentive to give stuff up for when they're no longer around. ;) So I guess that's the bad part. |
| 14:21 | ersi | Always count on everyone being stupid twats who won't want to share things. Then again, if you're able to ask someone important, go right ahead. |
| 14:22 | omf_ | Also, never underestimate straight-up incompetence. |
| 14:38 | Tomcat_ | or lack of time ;) |
| 21:33 | Nemo_bis | godane: going to grab these? :) http://creativecommons.org/weblog/entry/37647 |
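
omf_'s remarks at 14:10 and 14:11 (going slower makes evasion easier; detectors look at short timing windows) can be illustrated with a toy detector. This is a minimal sketch under stated assumptions, not the paper omf_ mentions (the log does not identify it) and not how Google actually detects scrapers: a hypothetical `flags_burst` function counts one client's requests inside a sliding time window and flags the client when the count exceeds a threshold. The 60-second window and 30-request threshold are arbitrary illustrative values.

```python
from collections import deque


def flags_burst(timestamps, window_seconds, max_hits):
    """Return True if any window of length `window_seconds` holds more than
    `max_hits` requests from one client.

    Hypothetical short-window (DoS-style) detector, for illustration only.
    """
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Evict requests that fall outside the window ending at time t.
        while window and t - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_hits:
            return True
    return False


# 100 requests in about ten seconds (one every 0.1 s): a burst.
burst = [i * 0.1 for i in range(100)]
# The same 100 requests spaced 1,000 seconds apart (about 27 hours in total).
slow = [i * 1000.0 for i in range(100)]

print(flags_burst(burst, window_seconds=60, max_hits=30))  # True
print(flags_burst(slow, window_seconds=60, max_hits=30))   # False
```

The same 100 requests trip the detector when sent in a burst but never do when spread out, because no 60-second window ever holds more than one of them. A detector with a longer window or a per-day quota would catch the slow pattern, at the cost of keeping more state per client.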