#internetarchive 2020-04-01,Wed


Time Nickname Message
03:09 🔗 qw3rty__ has joined #internetarchive
03:16 🔗 qw3rty_ has quit IRC (Read error: Operation timed out)
03:31 🔗 Coderjo has quit IRC (Read error: Operation timed out)
03:35 🔗 Coderjo has joined #internetarchive
03:50 🔗 Craigle has quit IRC (Quit: Ping timeout (120 seconds))
03:52 🔗 Craigle has joined #internetarchive
03:59 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
08:42 🔗 Raccoon has quit IRC (Remote host closed the connection)
08:43 🔗 OrIdow6 has quit IRC (Quit: Leaving.)
08:52 🔗 kiska182 has quit IRC (Remote host closed the connection)
08:52 🔗 Ryz has quit IRC (Remote host closed the connection)
08:53 🔗 Ryz has joined #internetarchive
08:53 🔗 kiska182 has joined #internetarchive
08:58 🔗 eythian has quit IRC (Remote host closed the connection)
08:59 🔗 eythian has joined #internetarchive
09:00 🔗 eythian has quit IRC (Remote host closed the connection)
15:12 🔗 Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
16:16 🔗 Ryz Sigh, so you know about the bug where IA treats the non-www and www versions of a website as the same? It also applies to any hostname with a number after the 'www'
16:16 🔗 Ryz https://web.archive.org/web/*/https://www5.laserfiche.com/ is the same as https://web.archive.org/web/*/https://www.laserfiche.com/
16:16 🔗 Ryz Which means I can also put something ridiculous like https://web.archive.org/web/*/https://www50000000000000000000000000.laserfiche.com/ and it'll still think it's the same domain
19:04 🔗 Craigle has joined #internetarchive
19:31 🔗 jodizzle has quit IRC (Quit: ZNC - https://znc.in)
19:33 🔗 jodizzle has joined #internetarchive
21:15 🔗 Stilett0 is now known as Stiletto
23:44 🔗 atphoenix Depending on how it is implemented, I'm not sure I'd interpret that as a bug. It could potentially be a graceful fallback mechanism to weave together partially archived or otherwise weirdly designed websites, especially if the logic is to try to retrieve the *exact* URL requested and, failing that, to try other variations as fallback alternatives.
23:46 🔗 atphoenix I haven't tried enough combinations to intimately know how this behaves. But it sounds like it is effectively providing redirects from one to the other.
23:50 🔗 JAA The protocol, auth data, and a leading www\d*\. is simply stripped from the URL entirely, both internally when processing WARCs and when you request something through the WBM.
23:51 🔗 JAA There are also other "canonicalisations", as IA calls them, like stripping session IDs, converting everything to lower case, reordering the parameters alphabetically, and some others.
23:51 🔗 JAA All snapshots that have the same canonical URL are treated exactly the same.
23:53 🔗 JAA There are obviously countless scenarios where this causes problems: case-sensitive paths (e.g. picosong), different content on http/https or www/non-www, parameter order significance, and so on.
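
The canonicalisation JAA describes can be illustrated with a minimal Python sketch. It is not the Internet Archive's actual code: the function name `canonicalise` and the exact shape of the returned key are assumptions made for illustration, and session-ID stripping and the "some others" rules are left out because the log does not specify them. The sketch only drops the protocol and auth data, strips a leading `www\d*\.`, lower-cases everything, and sorts the query parameters alphabetically.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit

# Matches a leading "www", optionally followed by digits, then a dot.
_WWW_PREFIX = re.compile(r"^www\d*\.")


def canonicalise(url: str) -> str:
    """Return a simplified lookup key for a URL (hypothetical helper)."""
    # Lower-case everything first, as described in the log.
    parts = urlsplit(url.lower())

    # .hostname excludes any "user:password@" auth data and the port.
    host = _WWW_PREFIX.sub("", parts.hostname or "")
    if parts.port is not None:
        host += f":{parts.port}"

    # Reorder the query parameters alphabetically.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    # The scheme (http/https) is deliberately not part of the key.
    key = host + (parts.path or "/")
    if query:
        key += "?" + query
    return key


if __name__ == "__main__":
    for example in (
        "https://www.laserfiche.com/",
        "https://www5.laserfiche.com/",
        "https://www50000000000000000000000000.laserfiche.com/",
    ):
        print(canonicalise(example))
```

Under these assumptions all three laserfiche.com URLs from Ryz's messages reduce to the same key, which is why their snapshots are listed together. The same sketch also shows the problem cases: `/Path` and `/path`, or `http` and `https`, collapse into one key even when the site serves different content for each.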
