[02:41] *** DogsRNice has quit IRC (Ping timeout: 276 seconds)
[02:42] *** DogsRNice has joined #internetarchive
[03:28] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:47] *** martini has quit IRC (Quit: No Reasson)
[04:04] *** qw3rty2 has joined #internetarchive
[04:13] *** qw3rty has quit IRC (Ping timeout: 745 seconds)
[05:07] *** odemg has quit IRC (Ping timeout: 745 seconds)
[05:11] *** odemg has joined #internetarchive
[08:20] *** OrIdow6 has quit IRC (Ping timeout: 276 seconds)
[10:42] *** yano_ has joined #internetarchive
[10:43] *** yano has quit IRC (Read error: Operation timed out)
[10:44] *** tsr has quit IRC (Ping timeout: 745 seconds)
[11:01] *** tsr has joined #internetarchive
[11:36] *** tuluu has quit IRC (Ping timeout: 276 seconds)
[11:36] *** tuluu has joined #internetarchive
[14:42] *** tuluu has quit IRC (Remote host closed the connection)
[14:43] *** tuluu has joined #internetarchive
[15:09] *** martini has joined #internetarchive
[15:55] > Sorry. This URL has been excluded from the Wayback Machine.
[15:55] Did/does IA really remove sites from the wayback machine upon request?
[15:59] Yep
[16:00] Partial list: https://www.archiveteam.org/index.php?title=List_of_websites_excluded_from_the_Wayback_Machine
[16:00] Dropbox appears to be excluded too. not on that list.
[16:01] I just noticed it for a random ass tiny website.
[16:02] http://web.archive.org/web/*/http://www.caatstudios.com/caats/index.html
[16:02] atphoenix: Because only some parts of Dropbox are excluded. The list (currently) only contains full websites that are excluded.
[16:03] I'm surprised that http://truecrypt.sourceforge.net/ is on the list
[16:03] the site no longer exists, and now there's no history of it.
[16:03] Are these manual exclusions or somehow related to robots.txt?
[16:04] if IA respected robots, there'd be no archive
[16:04] There has been some discussion about whether to also add sites that are partially excluded and, if so, to what degree (e.g. entire sections, individual pages/URLs).
[16:04] Manual exclusions, as explained on the wiki page.
[16:05] robots.txt rules produce a different error and aren't always followed, though it's not entirely clear under which circumstances they are/aren't.
[16:06] JAA: just doesn't feel right that a company website (animation studio; disney splinter) should go out of business and have its history erased
[16:08] 'no history of it'...ya, that's a problem. But seems that's the way some like to go. Ashes scattered into the ocean of /dev/null
[16:08] metadata references may be all that is left
[16:08] i wonder if any sites can be later un-excluded
[16:08] or if ArchiveTeam could back up IA in case it starts deleting stuff
[16:09] if IA survives 200 years, and copyright laws don't get extended into forever, then maybe?
[16:10] so https://www.dropbox.com/s/ is excluded but https://www.dropbox.com/ is not excluded. This can't archive shared dropbox content.
[16:10] Thus*
[16:11] The data still exists. IA does not delete these.
[16:12] I've archived files on Dropbox through the WBM before, but YMMV. IA's robots.txt parser is well known to be somewhat broken, so sometimes it blocks you, sometimes it doesn't.
[16:14] Also, regarding backing up IA: IA.BAK. But much of the WBM data is in locked collections that cannot be accessed publicly, so mirroring them is also impossible without collaboration with IA (which, if it were to happen, would almost certainly include terms that you shall not redistribute that data).
[16:14] Raccoon, there is https://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK but it appears inactive. Last I read IA is at 60 PB data.
[16:15] :o
[16:16] i.e. 6000x 10 TB 3.5" drives
[16:18] https://www.archiveteam.org/images/9/93/Ohreally.gif
[16:19] might be easier to just identify the bad actors and set them out to sea :)
[16:30] 60617 "TB" (TiB?) of unique data across a bit over 10k disks as of right now.
[16:34] JAA: Do you just have a basement sweatshop of kids shucking harddisks day and night?
[16:36] Raccoon, I think JAA is clarifying the IA configuration, not his own
[16:36] *** yano_ is now known as yano
[16:36] oh, shucks.
[16:36] Correct, and I'm not employed by IA or anything either.
[16:37] Well, I was basing on the assumption that IA.BAK is an exclusively non-employee endeavor.
[16:38] That is correct. I was never involved with that project either; it broke well before I joined AT.
[16:40] *** qw3rty has joined #internetarchive
[16:40] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[16:52] *** qw3rty has quit IRC (Ping timeout: 745 seconds)
[16:56] *** qw3rty has joined #internetarchive
[16:57] "CAAT Studios (Classical Animation and Advanced Technology) was an animation studio in Hollywood, California, founded in July 2004 by animators Dave Kuhn, Toby Bluth, and Shawn Keller. The studio also enlisted the talent of fellow artists Frank Molieri, Bill Waldman, and Craig Maras. Its domain name expired in May 2007."
[16:57] https://en.wikifur.com/wiki/CAAT_Studios
[16:57] looks like it was a small company
[16:59] and from the link to IA on that wiki page (last updated 2011), I'm guessing the IA record was still publicly accessible in 2011.
[17:00] which makes it all the more weird that it was removed from IA years after the site vanished
[17:00] or it means that someone just put that IA link in the wiki without checking IA?
[17:00] but i don't know what the policy is on politely requesting sites to be removed
[17:01] it is possible the people behind CAAT Studios filed a takedown request against IA
[17:02] also mentioned here https://www.lcad.edu/person/dave-kuhn
[17:04] Somebody at IA might be a furry fan and did a favor :P secret underground furry cabal within the organization.
[17:05] vanishing with barely a trace is what happened to the video game site CHV.net back ~2000. Just a few traces left at http://web.archive.org/web/20000229104821/http://www.chv.net:80/
[22:34] *** martini has quit IRC (Quit: No Reasson)
[23:13] *** OrIdow6 has joined #internetarchive
[23:19] Will the secret furry cabal please contact me... thanks
[23:22] spies within spies
[23:48] *** atphoenix has quit IRC (Quit: Leaving)
[23:48] *** atphoenix has joined #internetarchive