#internetarchive 2017-05-19,Fri


***VADemon has quit IRC (Quit: left4dead) [01:03]
..... (idle for 24mn)
Stilett0 has quit IRC (Ping timeout: 370 seconds) [01:27]
.... (idle for 18mn)
REiN^ has quit IRC (Max SendQ exceeded)
REiN^ has joined #internetarchive
[01:45]
SmileyG has joined #internetarchive
sep332 has joined #internetarchive
sep332_ has quit IRC (Read error: Connection reset by peer)
alembic has quit IRC (Read error: Connection reset by peer)
alembic has joined #internetarchive
Smiley has quit IRC (Remote host closed the connection)
davidar has quit IRC (Ping timeout: 260 seconds)
Ctrl-S___ has quit IRC (Ping timeout: 260 seconds)
SirCmpwn has quit IRC (Ping timeout: 260 seconds)
REiN^ has quit IRC (Read error: Operation timed out)
Ctrl-S___ has joined #internetarchive
davidar has joined #internetarchive
REiN^ has joined #internetarchive
Stilett0 has joined #internetarchive
SirCmpwn has joined #internetarchive
fuckHilla has joined #internetarchive
[01:59]
........ (idle for 36mn)
flurblefi has joined #internetarchive [02:45]
<flurblefi> I was just wondering is there any way to download stuff from the comcast websites archive http://www.archiveteam.org/index.php?title=Comcast_Personal_Web_Pages if the wayback machine is now blocking it because the new company xfinity put a robots.txt
http://web.archive.org/web/*/http://home.comcast.net/ —+— http://my.xfinity.com/robots.txt
[02:46]
.... (idle for 18mn)
<Lord_Nigh> i... think yes, if you replace the * in that url with a date older than the robots.txt thing
like 200801011200000 or something
just keep adding zeroes until it works
[03:05]
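For reference, a Wayback replay URL has the form https://web.archive.org/web/<timestamp>/<original URL>, where the timestamp is 14 digits (YYYYMMDDhhmmss), so the 15-digit example above has one digit too many. Wayback redirects a request to the nearest capture it holds, so the timestamp does not have to match a capture exactly. The minimal Python sketch below only builds such a URL; the helper name wayback_url and its optional modifier argument (for flags such as im_, which appears in a link later in this log) are illustrative, not an official API.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url: str, when: datetime, modifier: str = "") -> str:
    """Build a Wayback replay URL asking for the capture nearest to `when`.

    Hypothetical helper: the timestamp is rendered as 14 digits
    (YYYYMMDDhhmmss) and `modifier` (e.g. "im_") is appended to it verbatim.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}{modifier}/{original_url}"


# Ask for the home.comcast.net capture closest to 1 January 2008, 12:00.
print(wayback_url("http://home.comcast.net/", datetime(2008, 1, 1, 12, 0, 0)))
```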
.... (idle for 16mn)
<flurblefi> nah, the internet archive nukes everything in the history ever as soon as any company takes control of the domain name it's really pretty messed up
given how much domain names change hands
[03:22]
<Lord_Nigh> flurblefi: i don't believe that is true anymore, and hasn't been for 2 or 3 years
but don't quote me on that
[03:23]
<flurblefi> it is, I tested :(
it's why I say to people always use www.archive.is instead
[03:24]
<Lord_Nigh> tested when? [03:24]
<flurblefi> frequently and just now when you said that before too :(
the new robots.txt kills history
[03:25]
<Lord_Nigh> i believe this is false. can you give me an example of a site which has a robots.txt and used to not?
my example site isn't cooperating
[03:26]
***kyan has quit IRC (Read error: Operation timed out) [03:28]
<flurblefi> http://web.archive.org/web/20150711141433/http://home.comcast.net/~max555/rites/lilith_1.htm
or http://web.archive.org/web/20021231204110im_/http://diablerie.org:80/images/arch.gif
[03:36]
***kyan has joined #internetarchive [03:37]
<Lord_Nigh> looking [03:38]
<flurblefi> in the latter case the new owners bought the domain name deliberately to kill a competing roleplay website
and redirected it to their own one
(diablerie is www.sanguinus.org now)
[03:38]
<Lord_Nigh> interesting. this seems to be a recent regression.
arkiver: can you look into this?
the home.comcast.net robots.txt can be looked at from that date and it seemed malformed at that time, but it is using the latest robots.txt for blocking, not the one at the date of the archiving
which is definitely wrong
this was fixed like 3 years ago, i guess it broke recently
[03:42]
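The behaviour Lord_Nigh describes as correct is to judge each capture against the robots.txt that was in force when it was archived, not the newest one. The Python sketch below illustrates that rule with the standard library's urllib.robotparser. It is not the Internet Archive's code: the robots.txt history, its timestamps and contents, the "ia_archiver" user agent, and the helper names robots_at and replay_allowed are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt history for one host as (14-digit timestamp, body).
# Both bodies are invented for illustration.
ROBOTS_HISTORY = [
    ("20080101000000", "User-agent: *\nDisallow: /cgi-bin/\n"),
    ("20160101000000", "User-agent: *\nDisallow: /\n"),
]


def robots_at(history, capture_ts):
    """Return the newest robots.txt body fetched at or before capture_ts, or None."""
    chosen = None
    for ts, body in sorted(history):
        if ts <= capture_ts:  # equal-length digit strings compare chronologically
            chosen = body
    return chosen


def replay_allowed(history, capture_ts, path, agent="ia_archiver"):
    """Decide replay using the robots.txt in force when the page was captured."""
    body = robots_at(history, capture_ts)
    if body is None:
        return True  # assumption: no robots.txt known at capture time means allowed
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    parser.modified()  # mark as "fetched" so can_fetch() does not refuse everything
    return parser.can_fetch(agent, path)


page = "/~max555/rites/lilith_1.htm"
print(replay_allowed(ROBOTS_HISTORY, "20151002140238", page))  # capture-date rule
print(replay_allowed(ROBOTS_HISTORY, "99991231235959", page))  # "always use the latest file" rule
```

Under the capture-date rule the 2015 capture stays visible; evaluating it against the newest file blocks it, which matches the regression described above.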
***Somebody2 has joined #internetarchive [03:49]
........................... (idle for 2h14mn)
kyan has quit IRC (Read error: Operation timed out) [06:03]
........... (idle for 53mn)
<DFJustin> flurblefi: they don't actually nuke it you just can't access it
but they've been floating the idea on the blog of just ignoring robots.txt altogether so that seems likely to happen at some point
at which point the stuff would be retroactively available
https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
[06:56]
....... (idle for 31mn)
<flurblefi> Oh that's nice it sounds like they might be having a change of heart, I hope they actually do something about it though, as is I tell people to use www.archive.is/ (has search now) or www.webcitation.org (it has a search too but only for a specific url) instead at the moment, with webcitation can also snapshot the internet archive and use a long url to have both links in the same link
so that if one dies the other will still be able to be found
I wish wikipedia did something like that, I understand they have some sort of connection with the internet archive but there should never be only one backup of anything
[07:29]
....... (idle for 31mn)
I used to use Offline Explorer but this is actually really good, it's not WARC but it's a lot more convenient for everyday use http://addons.mozilla.org/addon/scrapbook
This site seems to want to compete with Internet Archive but I don't know how true that is about IA not picking up complex made scripted pages etc properly http://webrecorder.io/_faq
[08:02]
......... (idle for 40mn)
***atomotic has joined #internetarchive [08:43]
......................... (idle for 2h2mn)
<flurblefi> found what I was looking for thanks to 010 Editor examination of the index file for http://archive.org/details/archiveteam_comcast_20151007011615 (thanks a lot dcmorton):
net,comcast,home)/~max555/rites/lilith_1.htm 20151002140238 http://home.comcast.net/~max555/rites/lilith_1.htm text/html 200 A5AQHS3TZ7TAZQOVBZ2CN6IPTCIWYUUS - - 8178 32231705976 archiveteam_comcast_20151007011615/comcast_20151007011615.megawarc.warc.gz
[10:45]
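The line flurblefi quotes is a CDX index entry. Assuming the common 11-field layout (urlkey, timestamp, original URL, MIME type, status, digest, redirect, meta tags, compressed record length, offset, filename), the last three fields are enough to fetch just that one record from the large megawarc with an HTTP Range request instead of downloading the whole file. The Python sketch below only parses the line and prepares the request; the helper names are hypothetical, and it assumes the filename field is "<item identifier>/<file>" and that https://archive.org/download/ honors Range headers.

```python
from urllib.request import Request

# Field order assumed for an 11-field CDX line (" CDX N b a m s k r M S V g").
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode",
              "digest", "redirect", "metatags", "length", "offset", "filename"]


def parse_cdx_line(line: str) -> dict:
    """Split one space-separated CDX line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(values)}")
    return dict(zip(CDX_FIELDS, values))


def range_header(record: dict) -> str:
    """HTTP byte range covering exactly this record (ranges are inclusive)."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1
    return f"bytes={start}-{end}"


def record_request(record: dict) -> Request:
    """Build (but do not send) a ranged GET for the record's bytes."""
    url = "https://archive.org/download/" + record["filename"]
    return Request(url, headers={"Range": range_header(record)})


line = ("net,comcast,home)/~max555/rites/lilith_1.htm 20151002140238 "
        "http://home.comcast.net/~max555/rites/lilith_1.htm text/html 200 "
        "A5AQHS3TZ7TAZQOVBZ2CN6IPTCIWYUUS - - 8178 32231705976 "
        "archiveteam_comcast_20151007011615/comcast_20151007011615.megawarc.warc.gz")
rec = parse_cdx_line(line)
req = record_request(rec)
print(req.full_url)
print(req.get_header("Range"))
```

If the server honors the range, urllib.request.urlopen(req).read() should return a single gzip member, which gzip.decompress can inflate into the one WARC record for lilith_1.htm.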
just a shame that a lot of other websites like the old version of sanguinus I mentioned earlier, aren't part of a bigger website so don't have an archiveteam arc to rescue from :( [10:56]
***atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) [11:04]
.... (idle for 19mn)
fuckHilla has quit IRC (Ping timeout: 268 seconds) [11:23]
.......... (idle for 49mn)
atomotic has joined #internetarchive [12:12]
........ (idle for 38mn)
tsr has quit IRC (Ping timeout: 245 seconds)
yuitimoth has quit IRC (Read error: Operation timed out)
BaKFat has quit IRC (Read error: Operation timed out)
luckcolor has quit IRC (Ping timeout: 245 seconds)
tapedrive has quit IRC (Ping timeout: 245 seconds)
gig3x has quit IRC (Read error: Operation timed out)
gig3x has joined #internetarchive
luckcolor has joined #internetarchive
tsr has joined #internetarchive
tapedrive has joined #internetarchive
yuitimoth has joined #internetarchive
[12:50]
yuitimoth has quit IRC (Ping timeout: 245 seconds)
yuitimoth has joined #internetarchive
[13:00]
........ (idle for 37mn)
yuitimoth has quit IRC (se.hub irc.efnet.nl) [13:37]
.... (idle for 15mn)
yuitimoth has joined #internetarchive [13:52]
mhazinsk has quit IRC (Read error: Operation timed out) [13:58]
mhazinsk has joined #internetarchive [14:04]
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) [14:12]
...... (idle for 29mn)
mhazinsk has quit IRC (Read error: Operation timed out) [14:41]
mhazinsk has joined #internetarchive [14:47]
.................... (idle for 1h39mn)
Stiletto has joined #internetarchive
Stilett0 has quit IRC (Ping timeout: 246 seconds)
[16:26]
......... (idle for 40mn)
Stiletto has quit IRC () [17:08]
DoomTay has joined #internetarchive [17:18]
<DoomTay> How does Wayback "guess" a page's encoding?
Because I've noticed that it misses the mark for pages where the coding is explicitly defined
I've shot a message through the feedback form but I've yet to get a response
[17:21]
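The log does not say how Wayback picks an encoding. For comparison, browsers follow the WHATWG HTML encoding-sniffing order: a byte-order mark first, then the charset in the HTTP Content-Type header, then a <meta> declaration found in the first 1024 bytes, and only then a locale-dependent default. The Python sketch below is a simplified version of that precedence, so explicitly declared encodings always win over guessing. It is not Wayback's algorithm; the helper names, the regular expressions, and the windows-1252 fallback are assumptions.

```python
import codecs
import re

# Byte-order marks, checked before anything else.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
# charset=... inside an HTTP Content-Type header value
_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.I)
# <meta charset="..."> or <meta http-equiv=... content="...; charset=...">
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.I)


def _known(label: str) -> bool:
    """True if Python has a codec registered under this label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def guess_encoding(body: bytes, content_type: str = "", default: str = "windows-1252") -> str:
    # 1. A byte-order mark wins outright.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. Explicit charset from the HTTP Content-Type header.
    m = _HEADER_CHARSET.search(content_type)
    if m and _known(m.group(1)):
        return m.group(1).lower()
    # 3. Explicit <meta> declaration within the first 1024 bytes.
    m = _META_CHARSET.search(body[:1024])
    if m:
        label = m.group(1).decode("ascii")
        if _known(label):
            return label.lower()
    # 4. Nothing usable declared: fall back to the (assumed) default.
    return default


print(guess_encoding(b'<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'))
```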
***DoomTay has quit IRC (Ping timeout: 268 seconds)
DoomTay has joined #internetarchive
[17:29]
.... (idle for 16mn)
Martini has joined #internetarchive [17:45]
<Martini> Hi there.
I'm disappointed today with the Internet Archive.
They locked my account and erased all my uploaded stuff.
I didn't have anything illegal there and I didn't get any kind of notice.
I would like to know what happened or what I did wrong.
[17:45]
<flurblefi> use http://archive.is or http://webcitation.org/archive or http://webrecorder.io/ or http://freezepage.com or http://addons.mozilla.org/addon/scrapbook [17:49]
***Stilett0 has joined #internetarchive [18:00]
Stilett0 has quit IRC () [18:10]
.... (idle for 17mn)
Stilett0 has joined #internetarchive
Stilett0 is now known as Stiletto
DoomTay has quit IRC (Ping timeout: 271 seconds)
[18:27]
............ (idle for 55mn)
kyan has joined #internetarchive [19:26]
atomotic has joined #internetarchive [19:39]
.... (idle for 15mn)
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) [19:54]
.......... (idle for 46mn)
REiN^ has quit IRC (Max SendQ exceeded)
REiN^ has joined #internetarchive
[20:40]
........ (idle for 39mn)
<DFJustin> Martini: contact info@archive.org [21:19]
<Martini> Thanks. I already did that. Jeff helped me.
I was wondering if there can be something to help Jeff. He is the only one replying on the forums.
https://archive.org/about/faqs.php#forum
[21:19]
................ (idle for 1h18mn)
***Martini has quit IRC (Ping timeout: 255 seconds) [22:38]
........ (idle for 37mn)
Asparagir has quit IRC (Asparagir) [23:15]
