#newsgrabber 2018-04-02,Mon



Who | What | When
phirephly: So I added caching to the dedupe queries using squid, overriding the uncacheable header with a 60-minute cache, and I'm seeing a 30%-70% hit rate depending on the workload https://gist.github.com/PhirePhly/e80e507a5f669d1b8b07c0ebdaa3c68f
It significantly speeds up the dedupe process
Obviously I'm not routing the actual crawls through squid, just the dedupe queries
[01:03]
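(Editor's sketch, not from the channel.) The idea phirephly describes, answering repeated dedupe lookups from a cache for up to 60 minutes instead of re-querying the backend, can be illustrated roughly as follows. This is a minimal in-process Python sketch and not the squid setup from the gist: `TTLCache`, `fake_dedupe_lookup`, and the example URL are hypothetical names made up for illustration, and the real dedupe query is assumed to be a function of the URL alone.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds      # 3600 s mirrors the 60-minute cache
        self.clock = clock          # injectable so expiry can be tested
        self.store = {}             # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch(key) on a miss."""
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: ask the backend and remember the answer.
        self.misses += 1
        value = fetch(key)
        self.store[key] = (now + self.ttl, value)
        return value

    def hit_rate(self):
        """Fraction of lookups served from the cache so far."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_dedupe_lookup(url):
    # Hypothetical stand-in for the real dedupe query (assumption):
    # pretends that URLs ending in "/seen" were already grabbed.
    return url.endswith("/seen")


if __name__ == "__main__":
    cache = TTLCache()
    for url in ["http://example.com/seen"] * 3:
        cache.get(url, fake_dedupe_lookup)
    print(f"hit rate: {cache.hit_rate():.0%}")
```

As in the chat, only the dedupe lookups go through the cache; the crawl traffic itself would bypass it. The trade-off is that an answer can be up to an hour stale.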
................ (idle for 1h18mn)
***HCross has quit IRC (Quit: Connection closed for inactivity) [02:22]
.............. (idle for 1h9mn)
qw3rty116 has joined #newsgrabber [03:31]
qw3rty115 has quit IRC (Read error: Operation timed out) [03:37]
................................................................ (idle for 5h15mn)
HCross2 has joined #newsgrabber
HCross has joined #newsgrabber
[08:52]
................................................................................. (idle for 6h42mn)
ampt has joined #newsgrabber
ampt has left
[15:35]
................................................................ (idle for 5h17mn)
odemg has joined #newsgrabber [20:53]
.......... (idle for 48mn)
Igloo_ is now known as Igloo [21:41]
