Time |
Nickname |
Message |
03:09
🔗
|
|
qw3rty__ has joined #internetarchive |
03:16
🔗
|
|
qw3rty_ has quit IRC (Read error: Operation timed out) |
03:31
🔗
|
|
Coderjo has quit IRC (Read error: Operation timed out) |
03:35
🔗
|
|
Coderjo has joined #internetarchive |
03:50
🔗
|
|
Craigle has quit IRC (Quit: Ping timeout (120 seconds)) |
03:52
🔗
|
|
Craigle has joined #internetarchive |
03:59
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
08:42
🔗
|
|
Raccoon has quit IRC (Remote host closed the connection) |
08:43
🔗
|
|
OrIdow6 has quit IRC (Quit: Leaving.) |
08:52
🔗
|
|
kiska182 has quit IRC (Remote host closed the connection) |
08:52
🔗
|
|
Ryz has quit IRC (Remote host closed the connection) |
08:53
🔗
|
|
Ryz has joined #internetarchive |
08:53
🔗
|
|
kiska182 has joined #internetarchive |
08:58
🔗
|
|
eythian has quit IRC (Remote host closed the connection) |
08:59
🔗
|
|
eythian has joined #internetarchive |
09:00
🔗
|
|
eythian has quit IRC (Remote host closed the connection) |
15:12
🔗
|
|
Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat) |
16:16
🔗
|
Ryz |
Sigh, so you know about the bug where IA treats the non-www and www versions of a website as the same? It also applies to any hostname with a number after the 'www' |
16:16
🔗
|
Ryz |
https://web.archive.org/web/*/https://www5.laserfiche.com/ is the same as https://web.archive.org/web/*/https://www.laserfiche.com/ |
16:16
🔗
|
Ryz |
Which means I can also put something ridiculous like https://web.archive.org/web/*/https://www50000000000000000000000000.laserfiche.com/ and it'll still think it's the same domain |
19:04
🔗
|
|
Craigle has joined #internetarchive |
19:31
🔗
|
|
jodizzle has quit IRC (Quit: ZNC - https://znc.in) |
19:33
🔗
|
|
jodizzle has joined #internetarchive |
21:15
🔗
|
|
Stilett0 is now known as Stiletto |
23:44
🔗
|
atphoenix |
Depending on how it is implemented, I'm not sure I'd interpret that as a bug. It could potentially be a graceful fallback mechanism to weave together partially archived or otherwise weirdly designed websites, especially if the logic is to try to retrieve the *exact* URL requested and, failing that, to try other variations as fallback alternatives. |
23:46
🔗
|
atphoenix |
I haven't tried enough combinations to intimately know how this behaves. But it sounds like it is effectively providing redirects from one to the other. |
23:50
🔗
|
JAA |
The protocol, auth data, and a leading www\d*\. are simply stripped from the URL entirely, both internally when processing WARCs and when you request something through the WBM. |
23:51
🔗
|
JAA |
There are also other "canonicalisations", as IA calls them, like stripping session IDs, converting everything to lower case, reordering the parameters alphabetically, and some others. |
23:51
🔗
|
JAA |
All snapshots that have the same canonical URL are treated exactly the same. |
23:53
🔗
|
JAA |
There are obviously countless scenarios where this causes problems: case-sensitive paths (e.g. picosong), different content on http/https or www/non-www, parameter order significance, and so on. |
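The canonicalisation steps JAA describes can be illustrated with a rough Python sketch. This is not IA's actual code: the function name `canonicalise`, the `SESSION_PARAMS` list, and the output format are assumptions made for illustration, and the "some others" rules mentioned in the log are not covered.

```python
import re
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical session-ID parameter names; the log does not say which ones IA strips.
SESSION_PARAMS = {"phpsessid", "jsessionid", "sid"}

# A leading "www", optionally followed by digits, then a dot (the www\d*\. from the log).
_WWW_PREFIX = re.compile(r"^www\d*\.")


def canonicalise(url: str) -> str:
    """Return a simplified canonical key for a URL (illustrative only)."""
    # Convert everything to lower case, including path and query.
    parts = urlsplit(url.lower())

    # Drop auth data (user:password@), then the leading www\d*\. prefix.
    host = parts.netloc.rpartition("@")[2]
    host = _WWW_PREFIX.sub("", host)

    # Drop assumed session-ID parameters and sort the rest alphabetically.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    query = urlencode(sorted(params))

    # The protocol is omitted simply by not including parts.scheme.
    key = host + (parts.path or "/")
    return key + "?" + query if query else key


if __name__ == "__main__":
    for example in (
        "https://www5.laserfiche.com/",
        "https://www.laserfiche.com/",
        "http://user:pw@WWW.Example.com/Path?b=2&a=1&PHPSESSID=abc",
    ):
        print(example, "->", canonicalise(example))
```

Under these assumptions, Ryz's www5 and www URLs collapse to the same key, which is why their snapshots are listed together. The unconditional lower-casing is also what breaks case-sensitive paths such as picosong's.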