Time |
Nickname |
Message |
10:29
|
omf_ |
Besides the geocities projects and the usenet site what are other projects based off of Archive Team collected data? |
10:32
|
BlueMax |
well correct me if I'm wrong but has there really been a project with data available publicly on a wide enough scope to make big projects off? |
10:38
|
omf_ |
I just referred to big projects already known and asked if anyone knew of more. http://deletedcity.net/, http://olduse.net/, http://oneterabyteofkilobyteage.tumblr.com/ are the ones I was referring to |
10:38
|
omf_ |
Plus the art book written about Geocities |
12:26
|
Cameron_D |
http://www.gwern.net/Google%20shutdowns |
13:14
|
ersi |
wut, "See also: Archive Team" |
13:19
|
Cameron_D |
hah, missed that |
13:42
|
balrog |
so what's the plan as to Reader? |
13:58
|
creature |
balrog: Google Reader? |
13:58
|
balrog |
yeah |
13:59
|
Smiley |
.... there's nothing really to save? |
13:59
|
creature |
I'm not sure there's anything Archive Team can do about that. Archive Team tends to focus on public content, and everything in Reader is behind individual Google logins. |
13:59
|
creature |
And, well, it's not the content that's going away, really; it's the app used to access it. |
14:00
|
creature |
There's metadata too - people's stars and shares and so on - but, as I say, private. |
14:00
|
omf_ |
google is harder to scrape and not get caught than any other company I have tried |
14:00
|
omf_ |
they are fucking on it |
14:02
|
omf_ |
Even using headless browsers and sophisticated macros can only get you so far |
14:04
|
BlueMax |
well I would hope so, they're professional scrapers themselves, you'd think they'd know how to defend themselves |
14:08
|
Tomcat_ |
:) |
14:10
|
omf_ |
The one game no one can win is the long game |
14:10
|
omf_ |
the slower you go the easier it is to evade |
14:11
|
sep332 |
There is some content that needed saving. Google kept an archive of each blog post, even for blogs that aren't around anymore. |
14:11
|
omf_ |
There is a paper out there about how all detection software fails because the timing window is small to find DOS style hits |
14:11
|
Tomcat_ |
I think in the case of Google it might be a better idea to get them to give the archives to archive.org themselves. |
14:12
|
omf_ |
They have no incentive to do so |
14:12
|
Tomcat_ |
Well, they are archiving themselves. Like Google Books. |
14:12
|
sep332 |
I'm not sure there's any way for Google to know which blogs were "public" and which were not generally accessible |
14:12
|
omf_ |
Google is already one of the largest open data contributors out there |
14:13
|
sep332 |
As far as I know you could even put a password in the RSS URL |
14:14
|
Tomcat_ |
Of course Google has no incentive to give stuff up for when they're no longer around. ;) So I guess that's the bad part. |
14:21
|
ersi |
Always count on everyone being stupid twats that wouldn't want to share things. Then again, if you're able to ask someone who is important, go right ahead |
14:22
|
omf_ |
Also never underestimate straight up incompetence |
14:38
|
Tomcat_ |
or lack of time ;) |
21:33
|
Nemo_bis |
godane: going to grab these? :) http://creativecommons.org/weblog/entry/37647 |