#internetarchive 2020-01-02,Thu

Time Nickname Message
02:41 🔗 DogsRNice has quit IRC (Ping timeout: 276 seconds)
02:42 🔗 DogsRNice has joined #internetarchive
03:28 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:47 🔗 martini has quit IRC (Quit: No Reasson)
04:04 🔗 qw3rty2 has joined #internetarchive
04:13 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
05:07 🔗 odemg has quit IRC (Ping timeout: 745 seconds)
05:11 🔗 odemg has joined #internetarchive
08:20 🔗 OrIdow6 has quit IRC (Ping timeout: 276 seconds)
10:42 🔗 yano_ has joined #internetarchive
10:43 🔗 yano has quit IRC (Read error: Operation timed out)
10:44 🔗 tsr has quit IRC (Ping timeout: 745 seconds)
11:01 🔗 tsr has joined #internetarchive
11:36 🔗 tuluu has quit IRC (Ping timeout: 276 seconds)
11:36 🔗 tuluu has joined #internetarchive
14:42 🔗 tuluu has quit IRC (Remote host closed the connection)
14:43 🔗 tuluu has joined #internetarchive
15:09 🔗 martini has joined #internetarchive
15:55 🔗 Raccoon > Sorry. This URL has been excluded from the Wayback Machine.
15:55 🔗 Raccoon Did/does IA really remove sites from the wayback machine upon request?
15:59 🔗 JAA Yep
16:00 🔗 JAA Partial list: https://www.archiveteam.org/index.php?title=List_of_websites_excluded_from_the_Wayback_Machine
16:00 🔗 atphoenix Dropbox appears to be excluded too. not on that list.
16:01 🔗 Raccoon I just noticed it for a random ass tiny website.
16:02 🔗 Raccoon http://web.archive.org/web/*/http://www.caatstudios.com/caats/index.html
16:02 🔗 JAA atphoenix: Because only some parts of Dropbox are excluded. The list (currently) only contains full websites that are excluded.
16:03 🔗 atphoenix I'm surprised that http://truecrypt.sourceforge.net/ is on the list
16:03 🔗 Raccoon the site no longer exists, and now there's no history of it.
16:03 🔗 atphoenix Are these manual exclusions or somehow related to robots.txt?
16:04 🔗 Raccoon if IA respected robots, there'd be no archive
16:04 🔗 JAA There has been some discussion about whether to also add sites that are partially excluded and, if so, to what degree (e.g. entire sections, individual pages/URLs).
16:04 🔗 JAA Manual exclusions, as explained on the wiki page.
16:05 🔗 JAA robots.txt rules produce a different error and aren't always followed, though it's not entirely clear under which circumstances they are/aren't.
16:06 🔗 Raccoon JAA: just doesn't feel right that a company website (animation studio; disney splinter) should go out of business and have its history erased
16:08 🔗 atphoenix 'no history of it'...ya, that's a problem. But seems that's the way some like to go. Ashes scattered into the ocean of /dev/null
16:08 🔗 atphoenix metadata references may be all that is left
16:08 🔗 Raccoon i wonder if any sites can be later un-excluded
16:08 🔗 Raccoon or if ArchiveTeam could back up IA in case it starts deleting stuff
16:09 🔗 atphoenix if IA survives 200 years, and copyright laws don't get extended into forever, then maybe?
16:10 🔗 atphoenix so https://www.dropbox.com/s/ is excluded but https://www.dropbox.com/ is not excluded. This can't archive shared dropbox content.
16:10 🔗 atphoenix Thus*
16:11 🔗 JAA The data still exists. IA does not delete these.
16:12 🔗 JAA I've archived files on Dropbox through the WBM before, but YMMV. IA's robots.txt parser is well known to be somewhat broken, so sometimes it blocks you, sometimes it doesn't.
16:14 🔗 JAA Also, regarding backing up IA: IA.BAK. But much of the WBM data is in locked collections that cannot be accessed publicly, so mirroring them is also impossible without collaboration with IA (which, if it were to happen, would almost certainly include terms that you shall not redistribute that data).
16:14 🔗 atphoenix Raccoon, there is https://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK but it appears inactive. Last I read IA is at 60 PB data.
16:15 🔗 Raccoon :o
16:16 🔗 atphoenix i.e. 6000x 10 TB 3.5" drives
16:18 🔗 Raccoon https://www.archiveteam.org/images/9/93/Ohreally.gif
16:19 🔗 Raccoon might be easier to just identify the bad actors and set them out to sea :)
16:30 🔗 JAA 60617 "TB" (TiB?) of unique data across a bit over 10k disks as of right now.
16:34 🔗 Raccoon JAA: Do you just have a basement sweatshop of kids shucking harddisks day and night?
16:36 🔗 atphoenix Raccoon, I think JAA is clarifying the IA configuration, not his own
16:36 🔗 yano_ is now known as yano
16:36 🔗 Raccoon oh, shucks.
16:36 🔗 JAA Correct, and I'm not employed by IA or anything either.
16:37 🔗 Raccoon Well, I was basing on the assumption that IA.BAK is an exclusively non-employee endeavor.
16:38 🔗 JAA That is correct. I was never involved with that project either; it broke well before I joined AT.
16:40 🔗 qw3rty has joined #internetarchive
16:40 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
16:52 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
16:56 🔗 qw3rty has joined #internetarchive
16:57 🔗 atphoenix "CAAT Studios (Classical Animation and Advanced Technology) was an animation studio in Hollywood, California, founded in July 2004 by animators Dave Kuhn, Toby Bluth, and Shawn Keller. The studio also enlisted the talent of fellow artists Frank Molieri, Bill Waldman, and Craig Maras. Its domain name expired in May 2007. "
16:57 🔗 atphoenix https://en.wikifur.com/wiki/CAAT_Studios
16:57 🔗 atphoenix looks like it was a small company
16:59 🔗 atphoenix and from the link to IA on that wiki page (last updated 2011), I'm guessing the IA record was still publicly accessible in 2011.
17:00 🔗 Raccoon which makes it all the more weird that it was removed from IA years after the site vanished
17:00 🔗 atphoenix or it means that someone just put that IA link in the wiki without checking IA?
17:00 🔗 Raccoon but i don't know what the policy is on politely requesting sites to be removed
17:01 🔗 atphoenix it is possible the people behind CAAT Studios filed a takedown request against IA
17:02 🔗 atphoenix also mentioned here https://www.lcad.edu/person/dave-kuhn
17:04 🔗 Raccoon Somebody at IA might be a furry fan and did a favor :P secret underground furry cabal within the organization.
17:05 🔗 atphoenix vanishing with barely a trace is what happened to the video game site CHV.net back ~2000. Just a few traces left at http://web.archive.org/web/20000229104821/http://www.chv.net:80/
22:34 🔗 martini has quit IRC (Quit: No Reasson)
23:13 🔗 OrIdow6 has joined #internetarchive
23:19 🔗 OrIdow6 Will the secret furry cabal please contact me... thanks
23:22 🔗 Raccoon spies within spies
23:48 🔗 atphoenix has quit IRC (Quit: Leaving)
23:48 🔗 atphoenix has joined #internetarchive