#internetarchive 2017-05-19,Fri


Time	Nickname	Message
01:03 ^🔗		VADemon has quit IRC (Quit: left4dead)
01:27 ^🔗		Stilett0 has quit IRC (Ping timeout: 370 seconds)
01:45 ^🔗		REiN^ has quit IRC (Max SendQ exceeded)
01:45 ^🔗		REiN^ has joined #internetarchive
01:59 ^🔗		SmileyG has joined #internetarchive
01:59 ^🔗		sep332 has joined #internetarchive
02:02 ^🔗		sep332_ has quit IRC (Read error: Connection reset by peer)
02:02 ^🔗		alembic has quit IRC (Read error: Connection reset by peer)
02:02 ^🔗		alembic has joined #internetarchive
02:02 ^🔗		Smiley has quit IRC (Remote host closed the connection)
02:02 ^🔗		davidar has quit IRC (Ping timeout: 260 seconds)
02:02 ^🔗		Ctrl-S___ has quit IRC (Ping timeout: 260 seconds)
02:02 ^🔗		SirCmpwn has quit IRC (Ping timeout: 260 seconds)
02:02 ^🔗		REiN^ has quit IRC (Read error: Operation timed out)
02:02 ^🔗		Ctrl-S___ has joined #internetarchive
02:03 ^🔗		davidar has joined #internetarchive
02:05 ^🔗		REiN^ has joined #internetarchive
02:09 ^🔗		Stilett0 has joined #internetarchive
02:09 ^🔗		SirCmpwn has joined #internetarchive
02:09 ^🔗		fuckHilla has joined #internetarchive
02:45 ^🔗		flurblefi has joined #internetarchive
02:46 ^🔗	flurblefi	I was just wondering is there any way to download stuff from the comcast websites archive http://www.archiveteam.org/index.php?title=Comcast_Personal_Web_Pages if the wayback machine is now blocking it because the new company xfinity put a robots.txt
02:47 ^🔗	flurblefi	http://web.archive.org/web/*/http://home.comcast.net/ —+— http://my.xfinity.com/robots.txt
03:05 ^🔗	Lord_Nigh	i... think yes, if you replace the * in that url with a date older than the robots.txt thing
03:05 ^🔗	Lord_Nigh	like 200801011200000 or something
03:06 ^🔗	Lord_Nigh	just keep adding zeroes until it works
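A minimal sketch of the URL rewrite Lord_Nigh describes: swap the `*` for a capture timestamp. It assumes the 14-digit `YYYYMMDDhhmmss` form seen in the Wayback URLs later in this log, so no extra zeroes should be needed. The helper name `wayback_url` is made up for illustration.

```python
from datetime import datetime


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine URL asking for the capture nearest to `when`."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"http://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Noon on 2008-01-01, the date suggested in the conversation.
    print(wayback_url("http://home.comcast.net/", datetime(2008, 1, 1, 12, 0, 0)))
```

Whether the capture is actually served still depends on the robots.txt handling discussed below.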
03:22 ^🔗	flurblefi	nah, the internet archive nukes everything in the history ever as soon as any company takes control of the domain name it's really pretty messed up
03:22 ^🔗	flurblefi	given how much domain names change hands
03:23 ^🔗	Lord_Nigh	flurblefi: i don't believe that is true anymore, and hasn't been for 2 or 3 years
03:23 ^🔗	Lord_Nigh	but don't quote me on that
03:24 ^🔗	flurblefi	it is I tested :(
03:24 ^🔗	flurblefi	it's why I say to people always use www.archive.is instead
03:24 ^🔗	Lord_Nigh	tested when
03:25 ^🔗	flurblefi	frequently and just now when you said that before too :(
03:25 ^🔗	flurblefi	the new robots.txt kills history
03:26 ^🔗	Lord_Nigh	i believe this is false. can you give me an example of a site which has a robots.txt and used to not?
03:27 ^🔗	Lord_Nigh	my example site isn't cooperating
03:28 ^🔗		kyan has quit IRC (Read error: Operation timed out)
03:36 ^🔗	flurblefi	http://web.archive.org/web/20150711141433/http://home.comcast.net/~max555/rites/lilith_1.htm
03:37 ^🔗	flurblefi	or http://web.archive.org/web/20021231204110im_/http://diablerie.org:80/images/arch.gif
03:37 ^🔗		kyan has joined #internetarchive
03:38 ^🔗	Lord_Nigh	looking
03:38 ^🔗	flurblefi	in the latter case the new owners bought the domain name deliberately to kill a competing roleplay website
03:38 ^🔗	flurblefi	and redirected it to their own one
03:38 ^🔗	flurblefi	(diablerie is www.sanguinus.org now)
03:42 ^🔗	Lord_Nigh	interesting. this seems to be a recent regression.
03:42 ^🔗	Lord_Nigh	arkiver: can you look into this?
03:43 ^🔗	Lord_Nigh	the home.comcast.net robots.txt can be looked at from that date and it seemed malformed at that time, but it is using the latest robots.txt for blocking, not the one at the date of the archiving
03:43 ^🔗	Lord_Nigh	which is definitely wrong
03:44 ^🔗	Lord_Nigh	this was fixed like 3 years ago, i guess it broke recently
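A rough sketch of the behaviour Lord_Nigh expects: use the robots.txt that was in effect when the page was archived, not the latest one. This is not the Wayback Machine's actual code. The function name, the data layout and the example dates are all hypothetical.

```python
from bisect import bisect_right


def robots_in_effect(robots_captures, page_timestamp):
    """Return the robots.txt body captured at or before `page_timestamp`.

    `robots_captures` is a list of (timestamp, body) pairs sorted by
    timestamp. Timestamps are 14-digit YYYYMMDDhhmmss strings, which sort
    chronologically when compared as plain strings.
    Returns None when no robots.txt capture predates the page capture.
    """
    stamps = [ts for ts, _ in robots_captures]
    index = bisect_right(stamps, page_timestamp)
    return robots_captures[index - 1][1] if index else None


if __name__ == "__main__":
    # Hypothetical history: an empty robots.txt, later replaced by a blanket block.
    captures = [
        ("20050101000000", ""),
        ("20160301000000", "User-agent: *\nDisallow: /"),
    ]
    # A page captured in 2015 is judged against the 2005 file, not the 2016 one.
    print(repr(robots_in_effect(captures, "20150711141433")))
```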
03:49 ^🔗		Somebody2 has joined #internetarchive
06:03 ^🔗		kyan has quit IRC (Read error: Operation timed out)
06:56 ^🔗	DFJustin	flurblefi: they don't actually nuke it you just can't access it
06:56 ^🔗	DFJustin	but they've been floating the idea on the blog of just ignoring robots.txt altogether so that seems likely to happen at some point
06:57 ^🔗	DFJustin	at which point the stuff would be retroactively available
06:58 ^🔗	DFJustin	https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
07:29 ^🔗	flurblefi	Oh, that's nice; it sounds like they might be having a change of heart. I hope they actually do something about it, though. As it is, I tell people to use www.archive.is/ (has search now) or www.webcitation.org (it has a search too, but only for a specific URL) instead. With webcitation you can also snapshot the internet archive and use a long URL to have both links in the same link
07:29 ^🔗	flurblefi	so that if one dies the other will still be able to be found
07:31 ^🔗	flurblefi	I wish wikipedia did something like that, I understand they have some sort of connection with the internet archive but there should never be only one backup of anything
08:02 ^🔗	flurblefi	I used to use Offline Explorer but this is actually really good, it's not WARC but it's a lot more convenient for everyday use http://addons.mozilla.org/addon/scrapbook
08:03 ^🔗	flurblefi	This site seems to want to compete with Internet Archive but I don't know how true that is about IA not picking up complex made scripted pages etc properly http://webrecorder.io/_faq
08:43 ^🔗		atomotic has joined #internetarchive
10:45 ^🔗	flurblefi	found what I was looking for thanks to 010 Editor examination of the index file for http://archive.org/details/archiveteam_comcast_20151007011615 (thanks a lot dcmorton): net,comcast,home)/~max555/rites/lilith_1.htm 20151002140238 http://home.comcast.net/~max555/rites/lilith_1.htm text/html 200 A5AQHS3TZ7TAZQOVBZ2CN6IPTCIWYUUS - - 8178 32231705976 archiveteam_comcast_20151007011615/comcast_20151007011615.megawarc.warc.gz
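A sketch of reading the index line flurblefi pasted. It assumes the common 11-field CDX layout: URL key, timestamp, original URL, MIME type, status, digest, redirect, meta tags, compressed record length, offset, file name. The field names and both helpers are made up for illustration. The file name is joined from the two chat messages it was split across.

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    urlkey: str
    timestamp: str
    original_url: str
    mime_type: str
    status: str
    digest: str
    redirect: str
    meta_tags: str
    length: int   # compressed size of the record in bytes (assumed meaning)
    offset: int   # byte offset of the record inside the WARC file (assumed meaning)
    filename: str


def parse_cdx_line(line: str) -> CdxRecord:
    """Split one space-separated CDX line into named fields."""
    parts = line.split()
    if len(parts) != 11:
        raise ValueError(f"expected 11 fields, got {len(parts)}")
    return CdxRecord(*parts[:8], int(parts[8]), int(parts[9]), parts[10])


def range_header(record: CdxRecord) -> str:
    """HTTP Range value covering exactly this record's bytes."""
    return f"bytes={record.offset}-{record.offset + record.length - 1}"


LINE = (
    "net,comcast,home)/~max555/rites/lilith_1.htm 20151002140238 "
    "http://home.comcast.net/~max555/rites/lilith_1.htm text/html 200 "
    "A5AQHS3TZ7TAZQOVBZ2CN6IPTCIWYUUS - - 8178 32231705976 "
    "archiveteam_comcast_20151007011615/comcast_20151007011615.megawarc.warc.gz"
)

if __name__ == "__main__":
    record = parse_cdx_line(LINE)
    print(record.timestamp, record.original_url)
    print(range_header(record))
```

If the assumed meanings of offset and length hold, that byte range should hold a single gzip member containing just this page, so it could be fetched without downloading the whole megawarc.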
10:56 ^🔗	flurblefi	just a shame that a lot of other websites like the old version of sanguinus I mentioned earlier, aren't part of a bigger website so don't have an archiveteam arc to rescue from :(
11:04 ^🔗		atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:23 ^🔗		fuckHilla has quit IRC (Ping timeout: 268 seconds)
12:12 ^🔗		atomotic has joined #internetarchive
12:50 ^🔗		tsr has quit IRC (Ping timeout: 245 seconds)
12:50 ^🔗		yuitimoth has quit IRC (Read error: Operation timed out)
12:50 ^🔗		BaKFat has quit IRC (Read error: Operation timed out)
12:50 ^🔗		luckcolor has quit IRC (Ping timeout: 245 seconds)
12:50 ^🔗		tapedrive has quit IRC (Ping timeout: 245 seconds)
12:50 ^🔗		gig3x has quit IRC (Read error: Operation timed out)
12:50 ^🔗		gig3x has joined #internetarchive
12:50 ^🔗		luckcolor has joined #internetarchive
12:51 ^🔗		tsr has joined #internetarchive
12:51 ^🔗		tapedrive has joined #internetarchive
12:52 ^🔗		yuitimoth has joined #internetarchive
13:00 ^🔗		yuitimoth has quit IRC (Ping timeout: 245 seconds)
13:00 ^🔗		yuitimoth has joined #internetarchive
13:37 ^🔗		yuitimoth has quit IRC (se.hub irc.efnet.nl)
13:52 ^🔗		yuitimoth has joined #internetarchive
13:58 ^🔗		mhazinsk has quit IRC (Read error: Operation timed out)
14:04 ^🔗		mhazinsk has joined #internetarchive
14:12 ^🔗		atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
14:41 ^🔗		mhazinsk has quit IRC (Read error: Operation timed out)
14:47 ^🔗		mhazinsk has joined #internetarchive
16:26 ^🔗		Stiletto has joined #internetarchive
16:28 ^🔗		Stilett0 has quit IRC (Ping timeout: 246 seconds)
17:08 ^🔗		Stiletto has quit IRC ()
17:18 ^🔗		DoomTay has joined #internetarchive
17:21 ^🔗	DoomTay	How does Wayback "guess" a page's encoding?
17:21 ^🔗	DoomTay	Because I've noticed that it misses the mark for pages where the encoding is explicitly defined
17:22 ^🔗	DoomTay	I've shot a message through the feedback form but I've yet to get a response
17:29 ^🔗		DoomTay has quit IRC (Ping timeout: 268 seconds)
17:29 ^🔗		DoomTay has joined #internetarchive
17:45 ^🔗		Martini has joined #internetarchive
17:45 ^🔗	Martini	Hi there.
17:46 ^🔗	Martini	I'm disappointed today with the Internet Archive.
17:46 ^🔗	Martini	They locked my account and erased all my uploaded stuff.
17:46 ^🔗	Martini	I didn't have anything illegal there and I didn't get any kind of notice.
17:47 ^🔗	Martini	I would like to know what happened or what I did wrong.
17:49 ^🔗	flurblefi	use http://archive.is or http://webcitation.org/archive or http://webrecorder.io/ or http://freezepage.com or http://addons.mozilla.org/addon/scrapbook
18:00 ^🔗		Stilett0 has joined #internetarchive
18:10 ^🔗		Stilett0 has quit IRC ()
18:27 ^🔗		Stilett0 has joined #internetarchive
18:31 ^🔗		Stilett0 is now known as Stiletto
18:31 ^🔗		DoomTay has quit IRC (Ping timeout: 271 seconds)
19:26 ^🔗		kyan has joined #internetarchive
19:39 ^🔗		atomotic has joined #internetarchive
19:54 ^🔗		atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
20:40 ^🔗		REiN^ has quit IRC (Max SendQ exceeded)
20:40 ^🔗		REiN^ has joined #internetarchive
21:19 ^🔗	DFJustin	Martini contact info@archive.org
21:19 ^🔗	Martini	Thanks. I already did that. Jeff helped me.
21:20 ^🔗	Martini	I was wondering if there can be something to help Jeff. He is the only one replying on the forums.
21:20 ^🔗	Martini	https://archive.org/about/faqs.php#forum
22:38 ^🔗		Martini has quit IRC (Ping timeout: 255 seconds)
23:15 ^🔗		Asparagir has quit IRC (Asparagir)
