[00:08] *** Darkstar has joined #archiveteam-bs
[00:09] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[00:09] *** Somebody2 has quit IRC (west.us.hub irc.Prison.NET)
[00:20] *** achip has joined #archiveteam-bs
[00:20] *** Somebody2 has joined #archiveteam-bs
[00:20] *** irc.Prison.NET sets mode: +o Somebody2
[00:35] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[00:42] *** wyatt8740 has joined #archiveteam-bs
[00:46] *** killsushi has quit IRC (Quit: Leaving)
[03:01] *** BlueMax has joined #archiveteam-bs
[03:03] *** fredgido has quit IRC (Remote host closed the connection)
[03:04] *** fredgido has joined #archiveteam-bs
[03:17] *** qw3rty114 has joined #archiveteam-bs
[03:24] *** odemgi has joined #archiveteam-bs
[03:26] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[03:52] *** bluefoo has quit IRC (Quit: bluefoo)
[04:00] *** Anthony1 has quit IRC (Quit: Page closed)
[04:26] *** larryv has quit IRC (Quit: larryv)
[04:35] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[04:35] *** Somebody2 has quit IRC (west.us.hub irc.Prison.NET)
[04:58] *** Somebody2 has joined #archiveteam-bs
[04:58] *** irc.Prison.NET sets mode: +o Somebody2
[05:02] *** achip has joined #archiveteam-bs
[05:14] *** Selavi has quit IRC (Quit: verb. to stop or discontinue)
[05:14] *** Selavi has joined #archiveteam-bs
[06:57] *** logchfoo4 starts logging #archiveteam-bs at Fri Aug 30 06:57:08 2019
[06:57] *** logchfoo4 has joined #archiveteam-bs
[07:03] *** adasdasd has joined #archiveteam-bs
[07:03] *** adasdasd has left
[07:07] *** Gfy has joined #archiveteam-bs
[07:58] *** Gfy has quit IRC (se.hub efnet.portlane.se)
[08:35] *** Atom-- has quit IRC (Ping timeout: 252 seconds)
[08:43] *** Atom-- has joined #archiveteam-bs
[08:54] *** Gfy has joined #archiveteam-bs
[09:07] *** ephemer0l has quit IRC (Ping timeout: 745 seconds)
[09:37] *** ephemer0l has joined #archiveteam-bs
[09:56] *** m007a83_ has joined #archiveteam-bs
[09:56] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[10:02] *** yano_ has quit IRC (west.us.hub irc.Prison.NET)
[10:02] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[10:02] *** Somebody2 has quit IRC (west.us.hub irc.Prison.NET)
[10:04] *** yano has joined #archiveteam-bs
[10:44] *** achip has joined #archiveteam-bs
[10:44] *** Somebody2 has joined #archiveteam-bs
[10:44] *** irc.Prison.NET sets mode: +o Somebody2
[11:00] *** Raccoon has joined #archiveteam-bs
[11:03] *** Raccoon` has quit IRC (Read error: Operation timed out)
[11:12] *** odemgi_ has joined #archiveteam-bs
[11:12] *** odemgi_ has quit IRC (Connection closed)
[11:13] *** odemgi_ has joined #archiveteam-bs
[11:14] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[11:19] *** tuluu_ has quit IRC (Read error: Connection refused)
[11:19] *** tuluu has joined #archiveteam-bs
[11:41] *** BlueMax has quit IRC (Quit: Leaving)
[12:00] *** BlueMax has joined #archiveteam-bs
[12:00] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[12:14] *** killsushi has joined #archiveteam-bs
[13:05] *** DogsRNice has joined #archiveteam-bs
[14:03] *** killsushi has quit IRC (Quit: Leaving)
[14:39] *** deevious has quit IRC (Quit: deevious)
[15:05] *** larryv has joined #archiveteam-bs
[16:22] *** jeekl_ has joined #archiveteam-bs
[16:32] *** jeekl has quit IRC (Ping timeout: 745 seconds)
[16:32] *** jeekl_ is now known as jeekl
[16:33] *** DogsRNice has quit IRC (Ping timeout: 252 seconds)
[16:39] *** jeekl_ has joined #archiveteam-bs
[16:46] *** jeekl has quit IRC (Ping timeout: 745 seconds)
[16:46] *** jeekl_ is now known as jeekl
[17:37] *** killsushi has joined #archiveteam-bs
[18:44] *** DogsRNice has joined #archiveteam-bs
[18:49] *** odemgi_ has quit IRC (Ping timeout: 604 seconds)
[18:53] *** Mateon1 has quit IRC (Remote host closed the connection)
[19:13] *** n00buser has quit IRC (Ping timeout: 360 seconds)
[19:30] Sanqui: can you test if either ! or # are causing the issue? "!" is a special case in bash and I'm wondering if it's a quoting issue
[19:30] kpcyrd: just from the log of downloaded urls I have a hunch it simply ignores urls with fragments
[19:30] or rather, strips the fragments
[19:31] since it attempted to download http://www.speedrunslive.com/races/game/, which is an empty url
[19:31] it should request e.g. http://www.speedrunslive.com/races/game/#!/sm64/1
[19:32] (#! is an old way to develop js-only websites, dead practice today)
[19:32] (which... is exactly why I want to save that site, incidentally)
[19:56] *** odemgi has joined #archiveteam-bs
[20:04] *** tuluu has quit IRC (Read error: Connection refused)
[20:05] *** tuluu has joined #archiveteam-bs
[20:16] *** godane has quit IRC (Read error: Connection reset by peer)
[20:25] Sanqui: Actshually, the hash-bang fragment isn't sent to the server. It should indeed request http://www.speedrunslive.com/races/game/ . The rest is handled by JS.
[20:25] But in this context, it should pass the full http://www.speedrunslive.com/races/game/#!/sm64/1 URL into Chromium of course.
[20:25] JAA: I understand how it works, but it doesn't seem to be capturing the proper URL including the fragment anyway
[20:25] I also wonder how this should be recorded in the WARC.
[20:26] If you use the full URI in the WARC-Target-URI, you might run into issues on playback. If you only use the fragment-less URL, you'll run into even more issues.
[20:27] PurpleSym: ^ crocoite isn't handling hash-bang fragments correctly apparently.
[20:27] Well, the main difference is in the data the JS requests from the server. There's no point in capturing the full URI in WARCs because each will be a duplicate.
[20:28] If it does any requests.
[20:28] ultimately the data is exposed through an API, e.g. http://api.speedrunslive.com/pastraces?game=sm64&page=1&season=0&pageSize=16
[20:29] Ok yeah, in this case it does.
[20:29] But in others, it might just parse the fragment and then display the appropriate data which is already somewhere in the JS file or something.
[20:29] I suppose yeah.
[20:29] Shouldn't change anything in the archival case though.
[20:31] Yeah, you're right, I guess it shouldn't matter much for the archival itself. There might still be issues on playback though, depending on how the JS parses the URI.
[20:31] But that's no different from other JS-laden pages.
[20:31] BTW, even Twitter used to use hashbang #! urls in days of yore.
[20:32] Yeah, #! was the cool kid on the block for a couple years.
[20:32] Facebook did it as well.
[20:32] We need to start project hashbang.
[20:33] By the way, just in case someone searches for this one day and wants to look at the actual specification for this abomination, the official name of it is "AJAX Crawling", and the specs are at https://developers.google.com/search/docs/ajax-crawling/docs/specification .
[20:35] Oh, this is another thing actually
[20:35] It's a method to make hashbang urls googlebot-friendly. I never knew about this.
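(Editor's note, not part of the channel log.) The point made at 20:25, that the hash-bang fragment never reaches the server and is only interpreted by the page's JavaScript, can be illustrated with a small sketch using Python's standard urllib.parse. The helper name split_hashbang is hypothetical and chosen only for this illustration; it is not how crocoite or any other tool mentioned here is implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def split_hashbang(url):
    """Return (url_sent_to_server, client_side_route_or_None).

    Hypothetical helper for illustration only: an HTTP client drops the
    fragment before making the request, so the server sees the first value;
    a "#!" fragment is left for the page's JavaScript to interpret.
    """
    parts = urlsplit(url)
    # Rebuild the URL without its fragment: this is what goes over the wire.
    request_url = urlunsplit(parts._replace(fragment=""))
    # A hash-bang fragment starts with "!"; the rest is the client-side route.
    route = parts.fragment[1:] if parts.fragment.startswith("!") else None
    return request_url, route


request_url, route = split_hashbang(
    "http://www.speedrunslive.com/races/game/#!/sm64/1"
)
print(request_url)
print(route)
```

A crawler that only records the first value loses the route, which is why the full URL has to be handed to the browser for the JS to load the right data.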
[20:36] BTW, as far as Wayback Machine goes, these URLs fail to render because the API requests are not passed over to web.archive.org:
[20:36] https://screen.sanqui.net/2019-08-30T22:36:45-Gamelist-SpeedRunsLive-MozillaFirefox.png
[20:37] Yes, technically it's a different thing, but hash-bang fragments were hardly used before that spec as far as I know.
[20:37] Ah yeah, the usual broken JS rewrite mechanism of the WBM.
[20:37] That said no request with sm64 was made, so some more stuff may be broken.
[20:38] I wonder if they'll ever look into service workers...
[20:38] That would fix just about every single one of those broken XHR issues.
[20:38] *** godane has joined #archiveteam-bs
[20:38] Wayback Machine actually does contain some of the API requests. Looks like Live Web Proxy handles them:
[20:38] https://web.archive.org/web/20190823103412/http://api.speedrunslive.com//pastraces?game=sm64&page=1&season=0&pageSize=16
[20:46] *** atbk has joined #archiveteam-bs
[21:06] SketchCow: i'm uploading the recaptured acid tape
[21:35] SketchCow: so i'm looking to mirror the Federal Register pdfs
[21:35] it looks like you guys sort of have it
[21:35] but i don't think its in the 3500 to 4500 number range
[21:36] 2nd problem is that urls and titles don't have dates in them at all
[21:36] the titles just say Federal register
[21:37] its close to the 3500 number cause there is only 3469 texts items when look for federal register
[21:46] *** n00buser has joined #archiveteam-bs
[21:56] *** n00buser has quit IRC (Ping timeout: 360 seconds)
[22:32] *** godane has quit IRC (Read error: Operation timed out)
[22:39] *** BlueMax has joined #archiveteam-bs
[22:42] *** godane has joined #archiveteam-bs
[22:55] *** larryv has quit IRC (Quit: larryv)
[23:06] first Federal Register issue : https://archive.org/details/federal-register-1936-03-14
[23:14] *** larryv has joined #archiveteam-bs
[23:49] SketchCow: so i found American Atheist Magazine on Scribd so i'm grabbing it
[23:50] i couldn't find the collection on archive.org