Time | Nickname | Message |
01:03 | | VADemon has quit IRC (Quit: left4dead) |
01:27 | | Stilett0 has quit IRC (Ping timeout: 370 seconds) |
01:45 | | REiN^ has quit IRC (Max SendQ exceeded) |
01:45 | | REiN^ has joined #internetarchive |
01:59 | | SmileyG has joined #internetarchive |
01:59 | | sep332 has joined #internetarchive |
02:02 | | sep332_ has quit IRC (Read error: Connection reset by peer) |
02:02 | | alembic has quit IRC (Read error: Connection reset by peer) |
02:02 | | alembic has joined #internetarchive |
02:02 | | Smiley has quit IRC (Remote host closed the connection) |
02:02 | | davidar has quit IRC (Ping timeout: 260 seconds) |
02:02 | | Ctrl-S___ has quit IRC (Ping timeout: 260 seconds) |
02:02 | | SirCmpwn has quit IRC (Ping timeout: 260 seconds) |
02:02 | | REiN^ has quit IRC (Read error: Operation timed out) |
02:02 | | Ctrl-S___ has joined #internetarchive |
02:03 | | davidar has joined #internetarchive |
02:05 | | REiN^ has joined #internetarchive |
02:09 | | Stilett0 has joined #internetarchive |
02:09 | | SirCmpwn has joined #internetarchive |
02:09 | | fuckHilla has joined #internetarchive |
02:45 | | flurblefi has joined #internetarchive |
02:46 | flurblefi | I was just wondering is there any way to download stuff from the comcast websites archive http://www.archiveteam.org/index.php?title=Comcast_Personal_Web_Pages if the wayback machine is now blocking it because the new company xfinity put a robots.txt |
02:47 | flurblefi | http://web.archive.org/web/*/http://home.comcast.net/ —+— http://my.xfinity.com/robots.txt |
03:05 | Lord_Nigh | i... think yes, if you replace the * in that url with a date older than the robots.txt thing |
03:05 | Lord_Nigh | like 200801011200000 or something |
03:06 | Lord_Nigh | just keep adding zeroes until it works |
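Lord_Nigh's suggestion relies on the Wayback Machine URL layout visible in the links in this log: `http://web.archive.org/web/<timestamp>/<original URL>`. As far as we know, the timestamp is a 14-digit `YYYYMMDDhhmmss` value (the `200801011200000` above has 15 digits), and a date with no exact capture is expected to redirect to the nearest capture. A minimal sketch, where `wayback_url` is a hypothetical helper name:

```python
from datetime import datetime

# Prefix as it appears in the Wayback links quoted in this log.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine capture URL for original_url at time `when`.

    Assumes the 14-digit YYYYMMDDhhmmss timestamp form; the Wayback
    Machine is expected (not guaranteed) to redirect to the nearest capture.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # always exactly 14 digits
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


if __name__ == "__main__":
    # An illustrative date picked to predate the new robots.txt.
    print(wayback_url("http://home.comcast.net/", datetime(2008, 1, 1, 12, 0, 0)))
    # prints http://web.archive.org/web/20080101120000/http://home.comcast.net/
```

Formatting the date with `strftime` removes the guesswork about how many zeroes to add; whether the capture is then viewable still depends on how robots.txt is applied.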
03:22 | flurblefi | nah, the internet archive nukes everything in the history ever as soon as any company takes control of the domain name it's really pretty messed up |
03:22 | flurblefi | given how much domain names change hands |
03:23 | Lord_Nigh | flurblefi: i don't believe that is true anymore, and hasn't been for 2 or 3 years |
03:23 | Lord_Nigh | but don't quote me on that |
03:24 | flurblefi | it is I tested :( |
03:24 | flurblefi | it's why I say to people always use www.archive.is instead |
03:24 | Lord_Nigh | tested when |
03:25 | flurblefi | frequently and just now when you said that before too :( |
03:25 | flurblefi | the new robots.txt kills history |
03:26 | Lord_Nigh | i believe this is false. can you give me an example of a site which has a robots.txt and used to not? |
03:27 | Lord_Nigh | my example site isn't cooperating |
03:28 | | kyan has quit IRC (Read error: Operation timed out) |
03:36 | flurblefi | http://web.archive.org/web/20150711141433/http://home.comcast.net/~max555/rites/lilith_1.htm |
03:37 | flurblefi | or http://web.archive.org/web/20021231204110im_/http://diablerie.org:80/images/arch.gif |
03:37 | | kyan has joined #internetarchive |
03:38 | Lord_Nigh | looking |
03:38 | flurblefi | in the latter case the new owners bought the domain name deliberately to kill a competing roleplay website |
03:38 | flurblefi | and redirected it to their own one |
03:38 | flurblefi | (diablerie is www.sanguinus.org now) |
03:42 | Lord_Nigh | interesting. this seems to be a recent regression. |
03:42 | Lord_Nigh | arkiver: can you look into this? |
03:43 | Lord_Nigh | the home.comcast.net robots.txt can be looked at from that date and it seemed malformed at that time, but it is using the latest robots.txt for blocking, not the one at the date of the archiving |
03:43 | Lord_Nigh | which is definitely wrong |
03:44 | Lord_Nigh | this was fixed like 3 years ago, i guess it broke recently |
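Lord_Nigh's point is that access to a capture should be decided by the robots.txt in force when the page was captured, not by the newest one. Python's standard `urllib.robotparser` can show why that choice matters. Both robots.txt bodies below are hypothetical (the real home.comcast.net and my.xfinity.com files are not quoted in this log), and treating `ia_archiver` as the relevant user agent is an assumption:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in force at capture time: blocks only /private/.
ROBOTS_AT_CAPTURE = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical robots.txt added later by the new owner: blocks everything.
ROBOTS_LATEST = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://home.comcast.net/~max555/rites/lilith_1.htm"

# Judged against the file in force at capture time, the page is accessible...
print(allowed(ROBOTS_AT_CAPTURE, url))  # True
# ...but judged against the newest file, the whole site is blocked.
print(allowed(ROBOTS_LATEST, url))  # False
```

The same archived bytes are visible or hidden purely depending on which robots.txt snapshot the check is run against, which is the regression being described.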
03:49 | | Somebody2 has joined #internetarchive |
06:03 | | kyan has quit IRC (Read error: Operation timed out) |
06:56 | DFJustin | flurblefi: they don't actually nuke it you just can't access it |
06:56 | DFJustin | but they've been floating the idea on the blog of just ignoring robots.txt altogether so that seems likely to happen at some point |
06:57 | DFJustin | at which point the stuff would be retroactively available |
06:58 | DFJustin | https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |
07:29 | flurblefi | Oh that's nice it sounds like they might be having a change of heart, I hope they actually do something about it though, as is I tell people to use www.archive.is/ (has search now) or www.webcitation.org (it has a search too but only for a specific url) instead at the moment, with webcitation can also snapshot the internet archive and use a long url to have both links in the same link |
07:29 | flurblefi | so that if one dies the other will still be able to be found |
07:31 | flurblefi | I wish wikipedia did something like that, I understand they have some sort of connection with the internet archive but there should never be only one backup of anything |
08:02 | flurblefi | I used to use Offline Explorer but this is actually really good, it's not WARC but it's a lot more convenient for everyday use http://addons.mozilla.org/addon/scrapbook |
08:03 | flurblefi | This site seems to want to compete with Internet Archive but I don't know how true that is about IA not picking up complex made scripted pages etc properly http://webrecorder.io/_faq |
08:43 | | atomotic has joined #internetarchive |
10:45 | flurblefi | found what I was looking for thanks to 010 Editor examination of the index file for http://archive.org/details/archiveteam_comcast_20151007011615 (thanks a lot dcmorton) net,comcast,home)/~max555/rites/lilith_1.htm 20151002140238 http://home.comcast.net/~max555/rites/lilith_1.htm text/html 200 A5AQHS3TZ7TAZQOVBZ2CN6IPTCIWYUUS - - 8178 32231705976 archiveteam_comcast_20151007011615/co |
10:45 | flurblefi | mcast_20151007011615.megawarc.warc.gz |
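The index line flurblefi pasted (split across the two 10:45 messages) appears to be a standard 11-field CDX record: SURT-form URL key, 14-digit timestamp, original URL, MIME type, HTTP status, content digest, redirect, meta tags, compressed record length, byte offset, and WARC filename. Assuming that field order, the length and offset are enough to fetch just this one record from the multi-gigabyte megawarc with an HTTP `Range` request rather than downloading the whole file. `CdxRecord`, `parse_cdx`, `range_header` and `download_url` are hypothetical names, and the `https://archive.org/download/<item>/<file>` layout is assumed:

```python
from dataclasses import dataclass

# The CDX line from the log, rejoined across the two IRC messages.
CDX_LINE = (
    "net,comcast,home)/~max555/rites/lilith_1.htm 20151002140238 "
    "http://home.comcast.net/~max555/rites/lilith_1.htm text/html 200 "
    "A5AQHS3TZ7TAZQOVBZ2CN6IPTCIWYUUS - - 8178 32231705976 "
    "archiveteam_comcast_20151007011615/comcast_20151007011615.megawarc.warc.gz"
)


@dataclass
class CdxRecord:
    urlkey: str     # SURT-form URL key
    timestamp: str  # 14-digit capture time
    original: str   # original URL
    mimetype: str
    status: str     # HTTP status code of the capture
    digest: str     # content digest (typically base32 SHA-1)
    redirect: str
    meta: str
    length: int     # compressed size of the WARC record, in bytes
    offset: int     # byte offset of the record within the WARC file
    filename: str   # WARC path, here "<item>/<file>"


def parse_cdx(line: str) -> CdxRecord:
    """Parse an 11-field CDX line (assumed field order: N b a m s k r M S V g)."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return CdxRecord(*f[:8], int(f[8]), int(f[9]), f[10])


def range_header(rec: CdxRecord) -> str:
    """HTTP Range value covering exactly this record (the end byte is inclusive)."""
    return f"bytes={rec.offset}-{rec.offset + rec.length - 1}"


def download_url(rec: CdxRecord) -> str:
    # Assumes the usual https://archive.org/download/<item>/<file> layout.
    return "https://archive.org/download/" + rec.filename


rec = parse_cdx(CDX_LINE)
print(download_url(rec))
print(range_header(rec))  # bytes=32231705976-32231714153
```

If the offsets are right, the body returned for that range should be a single gzip member, which `gzip.decompress` can expand into the WARC record holding the archived page.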
10:56 | flurblefi | just a shame that a lot of other websites like the old version of sanguinus I mentioned earlier, aren't part of a bigger website so don't have an archiveteam arc to rescue from :( |
11:04 | | atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
11:23 | | fuckHilla has quit IRC (Ping timeout: 268 seconds) |
12:12 | | atomotic has joined #internetarchive |
12:50 | | tsr has quit IRC (Ping timeout: 245 seconds) |
12:50 | | yuitimoth has quit IRC (Read error: Operation timed out) |
12:50 | | BaKFat has quit IRC (Read error: Operation timed out) |
12:50 | | luckcolor has quit IRC (Ping timeout: 245 seconds) |
12:50 | | tapedrive has quit IRC (Ping timeout: 245 seconds) |
12:50 | | gig3x has quit IRC (Read error: Operation timed out) |
12:50 | | gig3x has joined #internetarchive |
12:50 | | luckcolor has joined #internetarchive |
12:51 | | tsr has joined #internetarchive |
12:51 | | tapedrive has joined #internetarchive |
12:52 | | yuitimoth has joined #internetarchive |
13:00 | | yuitimoth has quit IRC (Ping timeout: 245 seconds) |
13:00 | | yuitimoth has joined #internetarchive |
13:37 | | yuitimoth has quit IRC (se.hub irc.efnet.nl) |
13:52 | | yuitimoth has joined #internetarchive |
13:58 | | mhazinsk has quit IRC (Read error: Operation timed out) |
14:04 | | mhazinsk has joined #internetarchive |
14:12 | | atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
14:41 | | mhazinsk has quit IRC (Read error: Operation timed out) |
14:47 | | mhazinsk has joined #internetarchive |
16:26 | | Stiletto has joined #internetarchive |
16:28 | | Stilett0 has quit IRC (Ping timeout: 246 seconds) |
17:08 | | Stiletto has quit IRC () |
17:18 | | DoomTay has joined #internetarchive |
17:21 | DoomTay | How does Wayback "guess" a page's encoding? |
17:21 | DoomTay | Because I've noticed that it misses the mark for pages where the coding is explicitly defined |
17:22 | DoomTay | I've shot a message through the feedback form but I've yet to get a response |
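The log does not say how the Wayback Machine picks an encoding. For comparison only, the HTML standard looks for a page's explicitly declared encoding in a fixed order: a byte-order mark, then the `charset` parameter of the HTTP `Content-Type` header, then a `<meta charset>` or `<meta http-equiv="Content-Type">` declaration in the first 1024 bytes. The sketch below checks those sources in that order; `declared_encoding` is a hypothetical name, the regexes are a simplified stand-in for the standard's prescan, and the Encoding Standard's label normalization is skipped. It is not Wayback's algorithm:

```python
import codecs
import re
from typing import Optional

# Byte-order marks are checked first; a BOM overrides any other declaration.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# charset parameter inside an HTTP Content-Type header value.
HEADER_CHARSET = re.compile(r"""charset\s*=\s*["']?([^\s;"']+)""", re.I)

# Simplified prescan: matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="...; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.I)


def declared_encoding(body: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return the encoding a page explicitly declares, or None if it declares none."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    if content_type:
        m = HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # Only the first 1024 bytes are examined, as in the HTML prescan.
    m = META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    return None


print(declared_encoding(b"<p>hi</p>", "text/html; charset=Shift_JIS"))  # shift_jis
print(declared_encoding(b'<head><meta charset="windows-1251">'))  # windows-1251
```

A page that declares its encoding through any of these three routes should never need guessing, so a mismatch like the one DoomTay describes would point at a step being skipped or overridden somewhere in the replay pipeline.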
17:29 | | DoomTay has quit IRC (Ping timeout: 268 seconds) |
17:29 | | DoomTay has joined #internetarchive |
17:45 | | Martini has joined #internetarchive |
17:45 | Martini | Hi there. |
17:46 | Martini | I'm disappointed today with the Internet Archive. |
17:46 | Martini | They locked my account and erased all my uploaded stuff. |
17:46 | Martini | I didn't have anything illegal there and I didn't get any kind of notice. |
17:47 | Martini | I would like to know what happen or what I did wrong. |
17:49 | flurblefi | use http://archive.is or http://webcitation.org/archive or http://webrecorder.io/ or http://freezepage.com or http://addons.mozilla.org/addon/scrapbook |
18:00 | | Stilett0 has joined #internetarchive |
18:10 | | Stilett0 has quit IRC () |
18:27 | | Stilett0 has joined #internetarchive |
18:31 | | Stilett0 is now known as Stiletto |
18:31 | | DoomTay has quit IRC (Ping timeout: 271 seconds) |
19:26 | | kyan has joined #internetarchive |
19:39 | | atomotic has joined #internetarchive |
19:54 | | atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
20:40 | | REiN^ has quit IRC (Max SendQ exceeded) |
20:40 | | REiN^ has joined #internetarchive |
21:19 | DFJustin | Martini contact info@archive.org |
21:19 | Martini | Thanks. I already did that. Jeff helped me. |
21:20 | Martini | I was wondering if there can be something to help Jeff. He is the only one replying on the forums. |
21:20 | Martini | https://archive.org/about/faqs.php#forum |
22:38 | | Martini has quit IRC (Ping timeout: 255 seconds) |
23:15 | | Asparagir has quit IRC (Asparagir) |