#archiveteam-bs 2019-08-30,Fri


Time Nickname Message
00:08 🔗 Darkstar has joined #archiveteam-bs
00:09 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
00:09 🔗 Somebody2 has quit IRC (west.us.hub irc.Prison.NET)
00:20 🔗 achip has joined #archiveteam-bs
00:20 🔗 Somebody2 has joined #archiveteam-bs
00:20 🔗 irc.Prison.NET sets mode: +o Somebody2
00:35 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
00:42 🔗 wyatt8740 has joined #archiveteam-bs
00:46 🔗 killsushi has quit IRC (Quit: Leaving)
03:01 🔗 BlueMax has joined #archiveteam-bs
03:03 🔗 fredgido has quit IRC (Remote host closed the connection)
03:04 🔗 fredgido has joined #archiveteam-bs
03:17 🔗 qw3rty114 has joined #archiveteam-bs
03:24 🔗 odemgi has joined #archiveteam-bs
03:26 🔗 odemgi_ has quit IRC (Ping timeout: 252 seconds)
03:52 🔗 bluefoo has quit IRC (Quit: bluefoo)
04:00 🔗 Anthony1 has quit IRC (Quit: Page closed)
04:26 🔗 larryv has quit IRC (Quit: larryv)
04:35 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
04:35 🔗 Somebody2 has quit IRC (west.us.hub irc.Prison.NET)
04:58 🔗 Somebody2 has joined #archiveteam-bs
04:58 🔗 irc.Prison.NET sets mode: +o Somebody2
05:02 🔗 achip has joined #archiveteam-bs
05:14 🔗 Selavi has quit IRC (Quit: verb. to stop or discontinue)
05:14 🔗 Selavi has joined #archiveteam-bs
06:57 🔗 logchfoo4 starts logging #archiveteam-bs at Fri Aug 30 06:57:08 2019
06:57 🔗 logchfoo4 has joined #archiveteam-bs
07:03 🔗 adasdasd has joined #archiveteam-bs
07:03 🔗 adasdasd has left
07:07 🔗 Gfy has joined #archiveteam-bs
07:58 🔗 Gfy has quit IRC (se.hub efnet.portlane.se)
08:35 🔗 Atom-- has quit IRC (Ping timeout: 252 seconds)
08:43 🔗 Atom-- has joined #archiveteam-bs
08:54 🔗 Gfy has joined #archiveteam-bs
09:07 🔗 ephemer0l has quit IRC (Ping timeout: 745 seconds)
09:37 🔗 ephemer0l has joined #archiveteam-bs
09:56 🔗 m007a83_ has joined #archiveteam-bs
09:56 🔗 m007a83 has quit IRC (Ping timeout: 252 seconds)
10:02 🔗 yano_ has quit IRC (west.us.hub irc.Prison.NET)
10:02 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
10:02 🔗 Somebody2 has quit IRC (west.us.hub irc.Prison.NET)
10:04 🔗 yano has joined #archiveteam-bs
10:44 🔗 achip has joined #archiveteam-bs
10:44 🔗 Somebody2 has joined #archiveteam-bs
10:44 🔗 irc.Prison.NET sets mode: +o Somebody2
11:00 🔗 Raccoon has joined #archiveteam-bs
11:03 🔗 Raccoon` has quit IRC (Read error: Operation timed out)
11:12 🔗 odemgi_ has joined #archiveteam-bs
11:12 🔗 odemgi_ has quit IRC (Connection closed)
11:13 🔗 odemgi_ has joined #archiveteam-bs
11:14 🔗 odemgi has quit IRC (Ping timeout: 252 seconds)
11:19 🔗 tuluu_ has quit IRC (Read error: Connection refused)
11:19 🔗 tuluu has joined #archiveteam-bs
11:41 🔗 BlueMax has quit IRC (Quit: Leaving)
12:00 🔗 BlueMax has joined #archiveteam-bs
12:00 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
12:14 🔗 killsushi has joined #archiveteam-bs
13:05 🔗 DogsRNice has joined #archiveteam-bs
14:03 🔗 killsushi has quit IRC (Quit: Leaving)
14:39 🔗 deevious has quit IRC (Quit: deevious)
15:05 🔗 larryv has joined #archiveteam-bs
16:22 🔗 jeekl_ has joined #archiveteam-bs
16:32 🔗 jeekl has quit IRC (Ping timeout: 745 seconds)
16:32 🔗 jeekl_ is now known as jeekl
16:33 🔗 DogsRNice has quit IRC (Ping timeout: 252 seconds)
16:39 🔗 jeekl_ has joined #archiveteam-bs
16:46 🔗 jeekl has quit IRC (Ping timeout: 745 seconds)
16:46 🔗 jeekl_ is now known as jeekl
17:37 🔗 killsushi has joined #archiveteam-bs
18:44 🔗 DogsRNice has joined #archiveteam-bs
18:49 🔗 odemgi_ has quit IRC (Ping timeout: 604 seconds)
18:53 🔗 Mateon1 has quit IRC (Remote host closed the connection)
19:13 🔗 n00buser has quit IRC (Ping timeout: 360 seconds)
19:30 🔗 kpcyrd Sanqui: can you test if either ! or # are causing the issue? "!" is a special case in bash and I'm wondering if it's a quoting issue
19:30 🔗 Sanqui kpcyrd: just from the log of downloaded urls I have a hunch it simply ignores urls with fragments
19:30 🔗 Sanqui or rather, strips the fragments
19:31 🔗 Sanqui since it attempted to download http://www.speedrunslive.com/races/game/, which is an empty url
19:31 🔗 Sanqui it should request e.g. http://www.speedrunslive.com/races/game/#!/sm64/1
19:32 🔗 Sanqui (#! is an old way to develop js-only websites, dead practice today)
19:32 🔗 Sanqui (which... is exactly why I want to save that site, incidentally)
19:56 🔗 odemgi has joined #archiveteam-bs
20:04 🔗 tuluu has quit IRC (Read error: Connection refused)
20:05 🔗 tuluu has joined #archiveteam-bs
20:16 🔗 godane has quit IRC (Read error: Connection reset by peer)
20:25 🔗 JAA Sanqui: Actshually, the hash-bang fragment isn't sent to the server. It should indeed request http://www.speedrunslive.com/races/game/ . The rest is handled by JS.
20:25 🔗 JAA But in this context, it should pass the full http://www.speedrunslive.com/races/game/#!/sm64/1 URL into Chromium of course.
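[Editor's sketch] The point above, that the fragment never reaches the server while the browser still needs the full URL, can be illustrated with Python's standard library. This is only a minimal sketch; the helper name `split_for_request` is hypothetical and is not part of crocoite or any other tool mentioned in this log.

```python
from urllib.parse import urldefrag


def split_for_request(url: str) -> tuple[str, str]:
    """Return (URL an HTTP client actually requests, fragment kept client-side)."""
    # urldefrag() separates everything after the first '#' from the rest.
    base, fragment = urldefrag(url)
    return base, fragment


full = "http://www.speedrunslive.com/races/game/#!/sm64/1"
requested, fragment = split_for_request(full)
print(requested)  # what goes over the wire
print(fragment)   # what only the page's JS ever sees
```

A crawler that keeps only `requested` loses the information the page's JS needs, which would match the behaviour Sanqui describes.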
20:25 🔗 Sanqui JAA: I understand how it works, but it doesn't seem to be capturing the proper URL including the fragment anyway
20:25 🔗 JAA I also wonder how this should be recorded in the WARC.
20:26 🔗 JAA If you use the full URI in the WARC-Target-URI, you might run into issues on playback. If you only use the fragment-less URL, you'll run into even more issues.
20:27 🔗 JAA PurpleSym: ^ crocoite isn't handling hash-bang fragments correctly apparently.
20:27 🔗 Sanqui Well, the main difference is in the data the JS requests from the server. There's no point in capturing the full URI in WARCs because each will be a duplicate.
20:28 🔗 JAA If it makes any requests.
20:28 🔗 Sanqui ultimately the data is exposed through an API, e.g. http://api.speedrunslive.com/pastraces?game=sm64&page=1&season=0&pageSize=16
20:29 🔗 JAA Ok yeah, in this case it does.
20:29 🔗 JAA But in others, it might just parse the fragment and then display the appropriate data which is already somewhere in the JS file or something.
20:29 🔗 Sanqui I suppose yeah.
20:29 🔗 Sanqui Shouldn't change anything in the archival case though.
20:31 🔗 JAA Yeah, you're right, I guess it shouldn't matter much for the archival itself. There might still be issues on playback though, depending on how the JS parses the URI.
20:31 🔗 JAA But that's no different from other JS-laden pages.
20:31 🔗 Sanqui BTW, even Twitter used to use hashbang #! urls in days of yore.
20:32 🔗 JAA Yeah, #! was the cool kid on the block for a couple years.
20:32 🔗 JAA Facebook did it as well.
20:32 🔗 Sanqui We need to start project hashbang.
20:33 🔗 JAA By the way, just in case someone searches for this one day and wants to look at the actual specification for this abomination, the official name of it is "AJAX Crawling", and the specs are at https://developers.google.com/search/docs/ajax-crawling/docs/specification .
20:35 🔗 Sanqui Oh, this is another thing actually
20:35 🔗 Sanqui It's a method to make hashbang urls googlebot-friendly. I never knew about this.
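[Editor's sketch] As far as the AJAX Crawling scheme goes, a crawler rewrites a `#!` URL into one carrying an `_escaped_fragment_` query parameter, which the server can then answer with a static snapshot. The following is a rough approximation under stated assumptions: the function name `to_escaped_fragment` is hypothetical, and the set of characters left unescaped is my reading of the spec (which asks for escaping of control characters, space, `#`, `%`, `&`, `+` and non-ASCII bytes), so check it against the specification before relying on it.

```python
from urllib.parse import quote, urldefrag

# Characters left as-is in addition to letters, digits and "_.-~".
# Assumption: everything else (space, '#', '%', '&', '+', controls,
# non-ASCII) is percent-encoded.
_SAFE = "!\"$'()*,/:;<=>?@[\\]^`{|}"


def to_escaped_fragment(url: str) -> str:
    """Rewrite a hash-bang URL into its _escaped_fragment_ form."""
    base, fragment = urldefrag(url)
    if not fragment.startswith("!"):
        return url  # not a hash-bang URL, leave untouched
    separator = "&" if "?" in base else "?"
    escaped = quote(fragment[1:], safe=_SAFE)
    return f"{base}{separator}_escaped_fragment_={escaped}"


crawl_url = to_escaped_fragment("http://www.speedrunslive.com/races/game/#!/sm64/1")
print(crawl_url)
```

Whether speedrunslive.com ever answered such requests is not established anywhere in this log; the sketch only shows the URL mapping.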
20:36 🔗 Sanqui BTW, as far as Wayback Machine goes, these URLs fail to render because the API requests are not passed over to web.archive.org:
20:36 🔗 Sanqui https://screen.sanqui.net/2019-08-30T22:36:45-Gamelist-SpeedRunsLive-MozillaFirefox.png
20:37 🔗 JAA Yes, technically it's a different thing, but hash-bang fragments were hardly used before that spec as far as I know.
20:37 🔗 JAA Ah yeah, the usual broken JS rewrite mechanism of the WBM.
20:37 🔗 Sanqui That said, no request with sm64 was made, so some more stuff may be broken.
20:38 🔗 JAA I wonder if they'll ever look into service workers...
20:38 🔗 JAA That would fix just about every single one of those broken XHR issues.
20:38 🔗 godane has joined #archiveteam-bs
20:38 🔗 Sanqui Wayback Machine actually does contain some of the API requests. Looks like Live Web Proxy handles them:
20:38 🔗 Sanqui https://web.archive.org/web/20190823103412/http://api.speedrunslive.com//pastraces?game=sm64&page=1&season=0&pageSize=16
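[Editor's sketch] The link above follows the usual Wayback Machine replay pattern of a 14-digit timestamp followed by the original URL. A minimal sketch of building such a link; the helper name `wayback_replay_url` is hypothetical, and it does not check whether a capture actually exists.

```python
def wayback_replay_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine replay link from a YYYYMMDDhhmmss timestamp."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("expected a 14-digit YYYYMMDDhhmmss timestamp")
    # The original URL is appended verbatim, query string included.
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


api_url = "http://api.speedrunslive.com//pastraces?game=sm64&page=1&season=0&pageSize=16"
replay = wayback_replay_url("20190823103412", api_url)
print(replay)
```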
20:46 🔗 atbk has joined #archiveteam-bs
21:06 🔗 godane SketchCow: i'm uploading the recaptured acid tape
21:35 🔗 godane SketchCow: so i'm looking to mirror the Federal Register pdfs
21:35 🔗 godane it looks like you guys sort of have it
21:35 🔗 godane but i don't think it's in the 3500 to 4500 number range
21:36 🔗 godane 2nd problem is that urls and titles don't have dates in them at all
21:36 🔗 godane the titles just say Federal register
21:37 🔗 godane it's close to the 3500 number because there are only 3469 text items when you search for federal register
21:46 🔗 n00buser has joined #archiveteam-bs
21:56 🔗 n00buser has quit IRC (Ping timeout: 360 seconds)
22:32 🔗 godane has quit IRC (Read error: Operation timed out)
22:39 🔗 BlueMax has joined #archiveteam-bs
22:42 🔗 godane has joined #archiveteam-bs
22:55 🔗 larryv has quit IRC (Quit: larryv)
23:06 🔗 godane first Federal Register issue : https://archive.org/details/federal-register-1936-03-14
23:14 🔗 larryv has joined #archiveteam-bs
23:49 🔗 godane SketchCow: so i found American Atheist Magazine on Scribd so i'm grabbing it
23:50 🔗 godane i couldn't find the collection on archive.org
