Time | Nickname | Message |
01:28 | | edsu has quit IRC (Read error: Operation timed out) |
01:35 | | edsu has joined #internetarchive |
01:39 | | X-Scale has quit IRC (Ping timeout: 240 seconds) |
01:40 | | [X-Scale] has joined #internetarchive |
03:44 | | nicolas17 has joined #internetarchive |
03:44 | nicolas17 | hi, what's the user-agent used by the wayback machine crawler? |
03:47 | nicolas17 | I'm trying to archive http://www.casarosada.gob.ar/informacion/eventos-destacados-presi/39048-el-gobierno-firmo-convenios-con-la-casa-ana-frank-para-promover-el-dialogo-y-la-tolerancia and I'm getting a blank page |
03:48 | nicolas17 | so I'm wondering if they block IA |
03:48 | xmc | https://archive.org/details/archive.org_bot |
03:48 | xmc | "User Agent archive.org_bot is used for our wide crawl of the web" |
03:48 | nicolas17 | I'm using the "Save Page Now" thing, will that do things differently? |
03:48 | xmc | i'm not sure |
03:49 | nicolas17 | they do block curl (response becomes "<html><head></head><body><br><br></body></html>" if the user-agent is curl) |
03:50 | xmc | that's positively adversarial |
03:50 | nicolas17 | but curl -A "archive.org_bot" gives the normal page |
03:50 | nicolas17 | so I dunno |
04:47 | | X-Scale has joined #internetarchive |
04:48 | | [X-Scale] has quit IRC (Ping timeout: 240 seconds) |
06:14 | | Stilett0 has quit IRC (Ping timeout: 250 seconds) |
06:22 | | nicolas17 has quit IRC (Quit: Konversation terminated!) |
07:34 | | atomotic has joined #internetarchive |
08:00 | | X-Scale has quit IRC (Quit: Try HydraIRC -> http://www.hydrairc.com <-) |
08:23 | | atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
08:28 | | atomotic has joined #internetarchive |
09:33 | | atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
09:34 | | kyounko has joined #internetarchive |
10:08 | | atomotic has joined #internetarchive |
14:00 | | atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
15:21 | | nicolas17 has joined #internetarchive |
15:30 | | atomotic has joined #internetarchive |
16:51 | | atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
17:14 | | Stilett0 has joined #internetarchive |
20:31 | | SketchCow has quit IRC (Quit: Lost terminal) |
20:38 | | SketchCow has joined #internetarchive |
23:19 | | butchster has quit IRC (Ping timeout: 492 seconds) |
23:25 | | kyan has joined #internetarchive |
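
The check nicolas17 describes at 03:49 and 03:50 (request the same page with different User-Agent strings and see whether the body changes) could be sketched roughly as below. This is only an illustration, not anything the Wayback Machine or "Save Page Now" is known to do. The function names, the `curl/8.0.1` string, and the 10% size threshold are all assumptions made up for the example; only the `archive.org_bot` string comes from the log.

```python
import urllib.request


def fetch_body(url, user_agent, timeout=30):
    """Fetch url with the given User-Agent header and return the raw body bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def compare_user_agents(url, user_agents, fetch=fetch_body):
    """Return {user_agent: body length in bytes} for each User-Agent tried.

    `fetch` is injectable so the comparison can be exercised without a network.
    """
    return {ua: len(fetch(url, ua)) for ua in user_agents}


def looks_blocked(sizes, ratio=0.1):
    """List User-Agents whose body is far smaller than the largest body seen.

    The ratio is an arbitrary heuristic (assumption): a near-empty stub page
    will usually be a small fraction of the real page's size.
    """
    if not sizes:
        return []
    largest = max(sizes.values())
    return [ua for ua, n in sizes.items() if n < largest * ratio]
```

Something like `looks_blocked(compare_user_agents(url, ["curl/8.0.1", "archive.org_bot"]))` would then list the User-Agents that got a suspiciously small response. A size comparison cannot tell why a response is small, so it only hints at User-Agent filtering rather than proving it.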