#newsgrabber 2017-07-10,Mon

***kyan has quit IRC (Remote host closed the connection)
kyan has joined #newsgrabber
kyan has quit IRC (Remote host closed the connection)
[01:46]
........... (idle for 53mn)
ErkDog has quit IRC (Read error: Connection reset by peer)
ErkDog has joined #newsgrabber
ErkDog has quit IRC (Remote host closed the connection)
ErkDog has joined #newsgrabber
[02:42]
......................... (idle for 2h4mn)
<underscor> jrwr: idk, this is what I get
https://p.defau.lt/?mAJ0jpn3JjX9HLvP_wiS9Q
it doesn't happen when I'm going straight to wayback
ah
jrwr: this is probably why: https://p.defau.lt/?oFNPyt_BnZD1CX_lDpDQ5Q
so I think everyone's items are just failing right now, if they're using the current proxy-dedupe setup
(cc arkiver HCross2)
[04:48]
......... (idle for 40mn)
***Fletcher_ has joined #newsgrabber [05:34]
............................ (idle for 2h18mn)
<underscor> out count just keeps climbing and climbing with failed jobs [07:52]
<Kaz> tracker crashes if I try to load the claims page, so can't tell you who it is [07:56]
....... (idle for 33mn)
looks like HCross2 [08:29]
<HCross2> Hm.. so I'm grabbing but not uploading [08:30]
<underscor> Probably because dedupe is broken?
I think everyone who is claiming is failing if they're running HEAD code
jrwr.io:4444 is just returning a 302 to the same url you give it, so the request gives up after 30 redirects (to the same resource)
[08:38]
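The failure described above (a 302 whose Location is the URL that was just requested, repeated until the client's 30-redirect limit is hit) can be caught on the first hop instead of the thirtieth. A minimal sketch, assuming a hypothetical `fetch(url)` callable that returns `(status, location)` and that Location headers are absolute URLs; it is not the project's actual dedupe code:

```python
from typing import Callable, Optional, Tuple

# Hypothetical fetcher: takes a URL, returns (status_code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


class RedirectLoop(Exception):
    """Raised when a redirect chain revisits a URL or exceeds the hop limit."""


def follow(url: str, fetch: Fetch, max_redirects: int = 30) -> Tuple[str, int]:
    """Follow redirects by hand and fail fast on a loop.

    Returns (final_url, final_status). Assumes absolute Location headers.
    """
    seen = {url}
    for _ in range(max_redirects):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or location is None:
            return url, status
        if location in seen:
            # Covers the case from the log: a 302 pointing at the same URL.
            raise RedirectLoop(f"{url} redirects back to {location}")
        seen.add(location)
        url = location
    raise RedirectLoop(f"gave up after {max_redirects} redirects")
```

With a fetcher that always answers 302 to the URL it was given, `follow` raises `RedirectLoop` after a single request rather than making 30 of them.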
<HCross2> Kaz: I've just killed my grabs [08:45]
<Kaz> I've paused mine as of last night - doesn't look like they were doing much anyway now [08:48]
<HCross2> Deduplication is our big stumbling block [08:57]
................................................................................................ (idle for 7h56mn)
***gk_1wm_su has joined #newsgrabber
gk_1wm_su has left
[16:53]
<jrwr> underscor: https://hastebin.com/asanehaxed.nginx
that's my config
so it's the internet archive doing it
[16:55]
Yep, My box is pretty much IP Banned from IA [17:02]
<HCross2> jrwr: I've just messaged the wayback machine director to see what is going on
However he's 8 hours behind London
[17:03]
<jrwr> "10/Jul/2017:17:03:20 +0000" client=179.181.55.94 method=GET request="GET /cdx/search/cdx?url=http%3A%2F%2Fwww.huffingtonpost.com%2Ftag%2Fhuffpost-live%2Ffeed&output=json&matchType=exact&limit=1&filter=digest:335NYMYRSZ5MQA2TPPXRDTQ6QF2SREAU HTTP/1.1" request_length=309 status=302 bytes_sent=622 body_bytes_sent=264 referer=- user_agent="python-requests/2.10.0" upstream_addr=207.241.225.186:80
upstream_status=302 request_time=0.293 upstream_response_time=0.293 upstream_connect_time=0.146 upstream_header_time=0.293
[17:03]
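For anyone reading the proxy log line above, the percent-encoded CDX query can be decoded with the standard library to see exactly what the dedupe lookup asked for. A small sketch; the helper name `cdx_params` is made up for illustration, and the request string is copied from the log:

```python
from urllib.parse import parse_qs, urlsplit

# The request="..." field from the proxy log, copied verbatim.
REQUEST_LINE = (
    "GET /cdx/search/cdx?url=http%3A%2F%2Fwww.huffingtonpost.com%2Ftag"
    "%2Fhuffpost-live%2Ffeed&output=json&matchType=exact&limit=1"
    "&filter=digest:335NYMYRSZ5MQA2TPPXRDTQ6QF2SREAU HTTP/1.1"
)


def cdx_params(request_line: str) -> dict:
    """Split 'METHOD target HTTP/x.y' and decode the target's query string."""
    _method, target, _version = request_line.split(" ", 2)
    query = urlsplit(target).query
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


for key, value in cdx_params(REQUEST_LINE).items():
    print(f"{key} = {value}")
```

Decoded, the lookup is an exact-match query for one capture of the Huffington Post feed URL, filtered by payload digest, which is the request that came back as a 302 through the proxy.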
<HCross2> I'd check the wayback cdx status ATM.. but *.archive.org is blocked by my phone provider
arkiver: I finally took on a phone contract and did age validation with my phone provider but the Archives are still blocked
[17:06]
<JAA> That query above works fine for me. [17:21]
<HCross2> Hm. jrwr do you have access to a web browser via the proxy box [17:22]
.... (idle for 16mn)
***BubuAnabe has joined #newsgrabber [17:38]
<BubuAnabe> Hey is newsgrabber working?? [17:38]
<JAA> I think it's broken currently. [17:49]
..................................... (idle for 3h2mn)
<Kaz> https://lowendbox.com/blog/storage-vps-exclusive-offer-2gtb-for-10month/?utm_content=57277571&utm_medium=social&utm_source=twitter
wonder how quickly they'd kick us off..
[20:51]
<JAA> 2 TB storage, nice. [20:59]
..... (idle for 22mn)
***logchfoo3 starts logging #newsgrabber at Mon Jul 10 21:21:32 2017
logchfoo3 has joined #newsgrabber
[21:21]
.......... (idle for 45mn)
<jrwr> HCross2: I can [22:06]
hrn
wev browser is working OK over a SSH Socks Proxy
web*
[22:11]
...... (idle for 26mn)
***Hecatz has joined #newsgrabber [22:37]