[03:09] *** qw3rty__ has joined #internetarchive
[03:16] *** qw3rty_ has quit IRC (Read error: Operation timed out)
[03:31] *** Coderjo has quit IRC (Read error: Operation timed out)
[03:35] *** Coderjo has joined #internetarchive
[03:50] *** Craigle has quit IRC (Quit: Ping timeout (120 seconds))
[03:52] *** Craigle has joined #internetarchive
[03:59] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[08:42] *** Raccoon has quit IRC (Remote host closed the connection)
[08:43] *** OrIdow6 has quit IRC (Quit: Leaving.)
[08:52] *** kiska182 has quit IRC (Remote host closed the connection)
[08:52] *** Ryz has quit IRC (Remote host closed the connection)
[08:53] *** Ryz has joined #internetarchive
[08:53] *** kiska182 has joined #internetarchive
[08:58] *** eythian has quit IRC (Remote host closed the connection)
[08:59] *** eythian has joined #internetarchive
[09:00] *** eythian has quit IRC (Remote host closed the connection)
[15:12] *** Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
[16:16] Sigh, so you know about the bug of non-www and www websites in IA considered the same? This also applies with anything with a number after the 'www' within it
[16:16] https://web.archive.org/web/*/https://www5.laserfiche.com/ is the same as https://web.archive.org/web/*/https://www.laserfiche.com/
[16:16] Which means I can also put something ridiculous like https://web.archive.org/web/*/https://www50000000000000000000000000.laserfiche.com/ and it'll still think it's the same domain
[19:04] *** Craigle has joined #internetarchive
[19:31] *** jodizzle has quit IRC (Quit: ZNC - https://znc.in)
[19:33] *** jodizzle has joined #internetarchive
[21:15] *** Stilett0 is now known as Stiletto
[23:44] Depending on how it is implemented, I'm not sure I'd interpret that as a bug. It could be potentially a graceful fallback mechanism to weave together partially archived, or otherwise weirdly designed websites. Especially if the logic is to try to retrieve the *exact* URL requested, and failing that, then try to retrieve other variations as fallback alternatives.
[23:46] I haven't tried enough combinations to intimately know how this behaves. But it sounds like it is effectively providing redirects from one to the other.
[23:50] The protocol, auth data, and a leading www\d*\. is simply stripped from the URL entirely, both internally when processing WARCs and when you request something through the WBM.
[23:51] There are also other "canonicalisations", as IA calls them, like stripping session IDs, converting everything to lower case, reordering the parameters alphabetically, and some others.
[23:51] All snapshots that have the same canonical URL are treated exactly the same.
[23:53] There are obviously countless scenarios where this causes problems: case-sensitive paths (e.g. picosong), different content on http/https or www/non-www, parameter order significance, and so on.
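
For anyone reading along, the canonicalisation steps described at [23:50] and [23:51] can be roughly sketched in Python. This is only an illustration of those steps as stated in the chat, not IA's actual code: the function name `canonicalize`, the `SESSION_PARAMS` list, and the output format (host plus path plus query) are all assumptions made for the example, and the "some others" rules mentioned above are not covered.

```python
import re
from urllib.parse import urlsplit, parse_qsl, urlencode

# Hypothetical session-ID parameter names; the chat does not say which ones IA strips.
SESSION_PARAMS = {"phpsessid", "jsessionid", "sid"}


def canonicalize(url: str) -> str:
    """Reduce a URL to a rough canonical key, following the steps named in the chat."""
    # Convert everything to lower case (this is what breaks case-sensitive paths).
    parts = urlsplit(url.lower())

    # .hostname already excludes auth data (user:password@) and the port.
    host = parts.hostname or ""
    # Strip a leading "www", optionally followed by digits, and the dot after it.
    host = re.sub(r"^www\d*\.", "", host)
    if parts.port:
        host += f":{parts.port}"

    # Drop session-ID parameters, then reorder the rest alphabetically.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    query = urlencode(sorted(params))

    path = parts.path or "/"
    # The protocol (scheme) is dropped simply by not including it in the result.
    return host + path + ("?" + query if query else "")


if __name__ == "__main__":
    for example in (
        "https://www.laserfiche.com/",
        "https://www5.laserfiche.com/",
        "https://www50000000000000000000000000.laserfiche.com/",
    ):
        print(example, "->", canonicalize(example))
```

Under this sketch all three laserfiche.com URLs from [16:16] collapse to the same key, which matches the behaviour reported there. It also shows why the cases listed at [23:53] collide: `http` vs `https`, differing path case, and differing parameter order all vanish before the lookup.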