#archiveteam-bs 2019-08-24,Sat

Time Nickname Message
00:39 🔗 wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
00:51 🔗 wp494 has joined #archiveteam-bs
03:31 🔗 qw3rty112 has joined #archiveteam-bs
03:31 🔗 odemgi has joined #archiveteam-bs
03:33 🔗 odemgi_ has quit IRC (Ping timeout: 252 seconds)
03:37 🔗 qw3rty111 has quit IRC (Read error: Operation timed out)
04:02 🔗 OrIdow has joined #archiveteam-bs
04:23 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
05:14 🔗 systwi has quit IRC (Remote host closed the connection)
05:33 🔗 systwi has joined #archiveteam-bs
06:40 🔗 Quirk8 has quit IRC (END OF LINE)
06:41 🔗 Quirk8 has joined #archiveteam-bs
06:47 🔗 DigiDigi has quit IRC (Remote host closed the connection)
06:49 🔗 odemgi has quit IRC (Remote host closed the connection)
06:49 🔗 odemgi has joined #archiveteam-bs
08:14 🔗 icedice has joined #archiveteam-bs
08:16 🔗 icedice has quit IRC (Client Quit)
09:07 🔗 odemgi has quit IRC (Read error: Connection reset by peer)
09:14 🔗 odemgi has joined #archiveteam-bs
09:49 🔗 wp494 has quit IRC (Read error: Operation timed out)
09:49 🔗 wp494 has joined #archiveteam-bs
10:27 🔗 eientei95 JAA: So how does the "?archiveteam" thing work for session IDs?
10:31 🔗 JAA eientei95: It works if the server is configured such that it includes session IDs in the links unless the session cookie is present. It's really just a trick to avoid pollution with those session-ID-laden links on the actually relevant pages.
10:32 🔗 JAA Basically, it just loads some random page that nobody will ever look at. This sets the cookie, and then when it gets to the actual homepage, the cookie's already set, so the links are clean.
10:32 🔗 eientei95 So it wouldn't work out of the box for Blogger's "Content Warning"?
10:33 🔗 JAA I don't know how that works.
10:33 🔗 JAA The method should in principle work for anything where the server sets a cookie and responds differently when that cookie is present.
10:33 🔗 JAA It also requires a link back to the page that you really want to start the archive from.
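[Editor's sketch] The cookie trick JAA describes above can be illustrated with a toy model. Everything here is hypothetical: `ToyServer`, the `sid` cookie name, and the link list are invented for illustration and do not reflect how ArchiveBot or any real site is implemented. For simplicity, the sketch fetches the throwaway URL and the homepage explicitly instead of following a link back.

```python
import itertools


class ToyServer:
    """Hypothetical server: adds a session ID to every link unless
    the client already sends a session cookie."""

    def __init__(self):
        self._ids = itertools.count(1)

    def get(self, url, cookies):
        # Returns (links_on_page, cookies_to_set).
        links = ["/", "/about", "/posts"]
        if "sid" in cookies:
            # Cookie present: links are clean and nothing new is set.
            return links, {}
        sid = f"s{next(self._ids)}"
        return [f"{link}?sid={sid}" for link in links], {"sid": sid}


def crawl(server, start_urls):
    """Fetch the URLs in order with one shared cookie jar and
    return the links seen on each page."""
    jar = {}
    seen = {}
    for url in start_urls:
        links, set_cookies = server.get(url, jar)
        jar.update(set_cookies)
        seen[url] = links
    return seen


# Starting directly at the homepage: its links carry the session ID.
naive = crawl(ToyServer(), ["/"])
# Starting at a throwaway URL first: the cookie is set there,
# so the homepage that follows has clean links.
tricked = crawl(ToyServer(), ["/?archiveteam", "/"])

print(naive["/"])
print(tricked["/"])
```

In this model only the throwaway page ends up with session-ID-laden links; the homepage fetched afterwards is clean, which is the effect described in the conversation.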
10:38 🔗 eientei95 Blogger's flow is: load the page, which shows the warning; the user clicks "I UNDERSTAND AND I WISH TO CONTINUE"; the browser goes to a "?interstitial=<value>" link, which sets the "INTERSTITIAL" cookie and loads the page with "?zx=<value>". After that, you can load the first URL (without any arguments) and it'll load fine.
10:39 🔗 JAA Have an example?
10:40 🔗 JAA If it's a series of links or redirects and not buttons with JS, it might work.
10:42 🔗 eientei95 https://bobmenezes.blogspot.com is one example from a Google search (site:blogspot.com "Content Warning")
10:44 🔗 JAA Yeah, might work.
10:44 🔗 JAA It's a series of redirects and then a link back to the same domain.
10:46 🔗 eientei95 <iframe src="https://www.blogger.com/blogin.g?blogspotURL=<url>" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" height="100%" width="100%" id="injected-iframe" style="background-color:white; position:absolute; top:0; left:0; z-index:999; display:block; visibility:visible"></iframe>
10:47 🔗 JAA Let's test it...
11:15 🔗 JAA eientei95: Looks like it doesn't work, probably due to the way ArchiveBot sets the NCR cookie for Blogger.
11:18 🔗 JAA Can you try it with grab-site?
11:19 🔗 JAA Actually, nvm, I'll try that.
11:20 🔗 JAA Right, grab-site has an ignore for ^https?://accounts\.google\.com/(SignUp|ServiceLogin|AccountChooser|a/UniversalLogin), so that won't work either.
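[Editor's sketch] To see which URLs the ignore pattern quoted above would catch, it can be tried with Python's `re` module. The pattern is copied from the message. The sample URLs and the `is_ignored` helper are hypothetical, and this assumes the pattern is applied as a regular-expression search against the full URL, which may not match grab-site's internals exactly.

```python
import re

# Ignore pattern as quoted in the log.
IGNORE = re.compile(
    r"^https?://accounts\.google\.com/"
    r"(SignUp|ServiceLogin|AccountChooser|a/UniversalLogin)"
)


def is_ignored(url):
    """Return True if the URL matches the ignore pattern."""
    return IGNORE.search(url) is not None


# Hypothetical sample URLs, for illustration only.
samples = [
    "https://accounts.google.com/ServiceLogin?continue=example",
    "http://accounts.google.com/a/UniversalLogin",
    "https://www.blogger.com/blogin.g?blogspotURL=example",
]
for url in samples:
    print(is_ignored(url), url)
```

If any step of a redirect chain passes through one of the matched `accounts.google.com` paths, a crawler applying this ignore would skip that step and never reach whatever follows it.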
11:22 🔗 eientei95 I've found that if you bypass the iframe on blogspot blogs, the page has CSS that makes "body *" invisible
11:28 🔗 JAA Looks like it's not possible to remove the global igset on grab-site. :-/
11:29 🔗 JAA Issue filed: https://github.com/ArchiveTeam/ArchiveBot/issues/416
11:30 🔗 JAA That's been on my list for a long time, but I haven't decided yet how it should be implemented (putting a cookie jar on the pipelines or pulling it from the control node when the job starts).
11:32 🔗 eientei95 Cool, thanks
12:01 🔗 OrIdow has quit IRC (Quit: Leaving.)
12:19 🔗 BlueMax has quit IRC (Quit: Leaving)
15:37 🔗 Cameron_D has quit IRC (Read error: Operation timed out)
15:42 🔗 Cameron_D has joined #archiveteam-bs
16:51 🔗 Sauce has joined #archiveteam-bs
16:52 🔗 ivan_ godane: I'm grabbing the audio for radiotalk.jp now
16:52 🔗 Sauce and I was attempting to grab screech which was at screech [dot] xyz
16:53 🔗 Sauce but it was shut down as of april/may
16:53 🔗 Sauce but I guess the creator still has a backup copy of the database
16:55 🔗 Sauce so that's good
17:16 🔗 Hani has quit IRC (Quit: Hani)
17:30 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
17:33 🔗 godane ivan_: good to hear
17:33 🔗 godane at least i don't have to do it
17:35 🔗 Sauce ikr
18:04 🔗 DogsRNice has joined #archiveteam-bs
18:12 🔗 trc has joined #archiveteam-bs
18:52 🔗 Sauce has quit IRC (Read error: Connection reset by peer)
20:27 🔗 RichardG_ has joined #archiveteam-bs
20:35 🔗 RichardG has quit IRC (Ping timeout: 615 seconds)
20:59 🔗 Hani has joined #archiveteam-bs
21:20 🔗 DigiDigi has joined #archiveteam-bs
21:40 🔗 xLovely has joined #archiveteam-bs
22:00 🔗 xLovely has quit IRC (Quit: Leaving)
22:10 🔗 trc has quit IRC (Read error: Connection reset by peer)
22:10 🔗 trc has joined #archiveteam-bs
22:16 🔗 DogsRNice has quit IRC (Ping timeout: 252 seconds)
22:17 🔗 DogsRNice has joined #archiveteam-bs
23:31 🔗 Raccoon has quit IRC (Ping timeout: 252 seconds)
23:32 🔗 Raccoon has joined #archiveteam-bs
23:45 🔗 wp494 has quit IRC (Read error: Operation timed out)
23:48 🔗 BlueMax has joined #archiveteam-bs
23:49 🔗 wp494 has joined #archiveteam-bs
