Time |
Nickname |
Message |
00:39
🔗
|
|
wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES) |
00:51
🔗
|
|
wp494 has joined #archiveteam-bs |
03:31
🔗
|
|
qw3rty112 has joined #archiveteam-bs |
03:31
🔗
|
|
odemgi has joined #archiveteam-bs |
03:33
🔗
|
|
odemgi_ has quit IRC (Ping timeout: 252 seconds) |
03:37
🔗
|
|
qw3rty111 has quit IRC (Read error: Operation timed out) |
04:02
🔗
|
|
OrIdow has joined #archiveteam-bs |
04:23
🔗
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
05:14
🔗
|
|
systwi has quit IRC (Remote host closed the connection) |
05:33
🔗
|
|
systwi has joined #archiveteam-bs |
06:40
🔗
|
|
Quirk8 has quit IRC (END OF LINE) |
06:41
🔗
|
|
Quirk8 has joined #archiveteam-bs |
06:47
🔗
|
|
DigiDigi has quit IRC (Remote host closed the connection) |
06:49
🔗
|
|
odemgi has quit IRC (Remote host closed the connection) |
06:49
🔗
|
|
odemgi has joined #archiveteam-bs |
08:14
🔗
|
|
icedice has joined #archiveteam-bs |
08:16
🔗
|
|
icedice has quit IRC (Client Quit) |
09:07
🔗
|
|
odemgi has quit IRC (Read error: Connection reset by peer) |
09:14
🔗
|
|
odemgi has joined #archiveteam-bs |
09:49
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
09:49
🔗
|
|
wp494 has joined #archiveteam-bs |
10:27
🔗
|
eientei95 |
JAA: So how does the "?archiveteam" thing work for session IDs? |
10:31
🔗
|
JAA |
eientei95: It works if the server is configured such that it includes session IDs in the links unless the session cookie is present. It's really just a trick to avoid pollution with those session-ID-laden links on the actually relevant pages. |
10:32
🔗
|
JAA |
Basically, it just loads some random page that nobody will ever look at. This sets the cookie, and then when it gets to the actual homepage, the cookie's already set, so the links are clean. |
10:32
🔗
|
eientei95 |
So it wouldn't work out of the box for Blogger's "Content Warning"? |
10:33
🔗
|
JAA |
I don't know how that works. |
10:33
🔗
|
JAA |
The method should in principle work for anything where the server sets a cookie and responds differently when that cookie is present. |
10:33
🔗
|
JAA |
It also requires a link back to the page that you really want to start the archive from. |
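A minimal sketch of the cookie warm-up trick JAA describes above, using a toy in-memory "server" rather than a real site. The cookie name `sid`, the session ID value, the `/article` page, and the `example.org` host are all made up for illustration. Only the idea (hit a throwaway URL first so the cookie is set before the real homepage is fetched) comes from the discussion.

```python
from urllib.parse import urlsplit

SESSION_ID = "abc123"  # hypothetical session ID issued by the toy server


def toy_server(url, cookies):
    """Return (links, set_cookie) for one request.

    Mimics a site that appends the session ID to every link
    unless the session cookie is already present.
    """
    if cookies.get("sid") == SESSION_ID:
        suffix, set_cookie = "", None
    else:
        suffix, set_cookie = "?sid=" + SESSION_ID, ("sid", SESSION_ID)
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    # Every page links back to the homepage and to one article.
    links = [f"{base}/{suffix}", f"{base}/article{suffix}"]
    return links, set_cookie


def crawl(start_url, max_pages=10):
    """Breadth-first crawl with a shared cookie jar; returns {url: links}."""
    cookies, queue, seen = {}, [start_url], {}
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        links, set_cookie = toy_server(url, cookies)
        if set_cookie:
            cookies[set_cookie[0]] = set_cookie[1]
        seen[url] = links
        queue.extend(links)
    return seen


# Starting at the homepage: the homepage itself is saved with polluted links.
direct = crawl("https://example.org/")
# Starting at a throwaway URL: the cookie is set before the homepage is fetched.
warmed = crawl("https://example.org/?archiveteam")

print(direct["https://example.org/"])
print(warmed["https://example.org/"])
```

In the warmed crawl the session-ID links still show up, but only on the throwaway page. The homepage is reached through the link back and is recorded with clean links, which is why that link back is required.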
10:38
🔗
|
eientei95 |
Blogger's is: Load page, show warning and have user click "I UNDERSTAND AND I WISH TO CONTINUE", browser goes to "?interstitial=<value>" link which sets the "INTERSTITIAL" cookie and loads the page with "?zx=<value>", after that you can load the first URL (without any arguments) and it'll load fine |
10:39
🔗
|
JAA |
Have an example? |
10:40
🔗
|
JAA |
If it's a series of links or redirects and not buttons with JS, it might work. |
10:42
🔗
|
eientei95 |
https://bobmenezes.blogspot.com is one example from a Google search (site:blogspot.com "Content Warning") |
10:44
🔗
|
JAA |
Yeah, might work. |
10:44
🔗
|
JAA |
It's a series of redirects and then a link back to the same domain. |
10:46
🔗
|
eientei95 |
<iframe src="https://www.blogger.com/blogin.g?blogspotURL=<url>" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" height="100%" width="100%" id="injected-iframe" style="background-color:white; position:absolute; top:0; left:0; z-index:999; display:block; visibility:visible"></iframe> |
10:47
🔗
|
JAA |
Let's test it... |
11:15
🔗
|
JAA |
eientei95: Looks like it doesn't work, probably due to the way ArchiveBot sets the NCR cookie for Blogger. |
11:18
🔗
|
JAA |
Can you try it with grab-site? |
11:19
🔗
|
JAA |
Actually, nvm, I'll try that. |
11:20
🔗
|
JAA |
Right, grab-site has an ignore for ^https?://accounts\.google\.com/(SignUp|ServiceLogin|AccountChooser|a/UniversalLogin), so that won't work either. |
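A small sketch of how the ignore pattern quoted above behaves, using Python's `re` module. The two sample URLs are hypothetical. That Blogger's redirect chain actually passes through a matching `accounts.google.com` URL is inferred from JAA's remark, not verified here.

```python
import re

# Pattern copied from the message above.
IGNORE = re.compile(
    r"^https?://accounts\.google\.com/"
    r"(SignUp|ServiceLogin|AccountChooser|a/UniversalLogin)"
)


def is_ignored(url):
    """True if a crawler applying this ignore would skip the URL."""
    return IGNORE.search(url) is not None


# Hypothetical URLs for illustration only.
urls = [
    "https://accounts.google.com/ServiceLogin?continue=https://example.blogspot.com/",
    "https://example.blogspot.com/",
]
results = {url: is_ignored(url) for url in urls}
for url, ignored in results.items():
    print("ignored" if ignored else "fetched", url)
```

If any hop in the redirect chain matches, the crawler never follows it, so the cookie that the end of the chain would set is never received.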
11:22
🔗
|
eientei95 |
I've found that on blogspot blogs, if you bypass the iframe, the page has CSS that makes "body *" invisible |
11:28
🔗
|
JAA |
Looks like it's not possible to remove the global igset on grab-site. :-/ |
11:29
🔗
|
JAA |
Issue filed: https://github.com/ArchiveTeam/ArchiveBot/issues/416 |
11:30
🔗
|
JAA |
That's been on my list for a long time, but I haven't decided yet how it should be implemented (putting a cookie jar on the pipelines or pulling it from the control node when the job starts). |
11:32
🔗
|
eientei95 |
Cool, thanks |
12:01
🔗
|
|
OrIdow has quit IRC (Quit: Leaving.) |
12:19
🔗
|
|
BlueMax has quit IRC (Quit: Leaving) |
15:37
🔗
|
|
Cameron_D has quit IRC (Read error: Operation timed out) |
15:42
🔗
|
|
Cameron_D has joined #archiveteam-bs |
16:51
🔗
|
|
Sauce has joined #archiveteam-bs |
16:52
🔗
|
ivan_ |
godane: I'm grabbing the audio for radiotalk.jp now |
16:52
🔗
|
Sauce |
and I was attempting to grab screech which was at screech [dot] xyz |
16:53
🔗
|
Sauce |
but it was shut down as of april/may |
16:53
🔗
|
Sauce |
but I guess the creator still has a backup copy of the database |
16:55
🔗
|
Sauce |
so that's good |
17:16
🔗
|
|
Hani has quit IRC (Quit: Hani) |
17:30
🔗
|
|
Stiletto has quit IRC (Ping timeout: 246 seconds) |
17:33
🔗
|
godane |
ivan_: good to hear |
17:33
🔗
|
godane |
at least i don't have to do it |
17:35
🔗
|
Sauce |
ikr |
18:04
🔗
|
|
DogsRNice has joined #archiveteam-bs |
18:12
🔗
|
|
trc has joined #archiveteam-bs |
18:52
🔗
|
|
Sauce has quit IRC (Read error: Connection reset by peer) |
20:27
🔗
|
|
RichardG_ has joined #archiveteam-bs |
20:35
🔗
|
|
RichardG has quit IRC (Ping timeout: 615 seconds) |
20:59
🔗
|
|
Hani has joined #archiveteam-bs |
21:20
🔗
|
|
DigiDigi has joined #archiveteam-bs |
21:40
🔗
|
|
xLovely has joined #archiveteam-bs |
22:00
🔗
|
|
xLovely has quit IRC (Quit: Leaving) |
22:10
🔗
|
|
trc has quit IRC (Read error: Connection reset by peer) |
22:10
🔗
|
|
trc has joined #archiveteam-bs |
22:16
🔗
|
|
DogsRNice has quit IRC (Ping timeout: 252 seconds) |
22:17
🔗
|
|
DogsRNice has joined #archiveteam-bs |
23:31
🔗
|
|
Raccoon has quit IRC (Ping timeout: 252 seconds) |
23:32
🔗
|
|
Raccoon has joined #archiveteam-bs |
23:45
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
23:48
🔗
|
|
BlueMax has joined #archiveteam-bs |
23:49
🔗
|
|
wp494 has joined #archiveteam-bs |