#newsgrabber 2017-11-05,Sun




Who What When
*** figpucker has joined #newsgrabber
figpucker has quit IRC (Read error: Connection reset by peer)
[11:01]
................. (idle for 1h24mn)
HCross2: !url http://www.bbc.co.uk/news/uk-politics-41874026 [12:26]
........................................................ (idle for 4h38mn)
JensRex: Why is the deduplication stage so slow?
Doesn't appear to be stressing cpu, ram or disk.
[17:04]
Kaz: dedupe isn't done locally
whatever you have, it's checked to see if IA already has it
https://github.com/ArchiveTeam/NewsGrabber-Warrior/blob/master/pipeline.py#L153
[17:12]
JensRex: Oh. That makes sense.
Doesn't make sense to go above 1 concurrent then, when IA is the bottleneck.
wpull --session-timeout is 24 hours. That seems excessive.
[17:12]
........... (idle for 52mn)
HCross2: FYI: https://www.theguardian.com/news/series/paradise-papers
Here we go
[18:11]
...... (idle for 25mn)
Kaz: breaking: the queen has money [18:36]
................................ (idle for 2h37mn)
JensRex: Uploading 8GB warc at 200 kb/s :( [21:13]
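
For scale, here is a rough estimate of how long that upload would take. The log doesn't say whether "kb/s" means kilobytes or kilobits per second, so both readings are worked out below. Treating 8GB as a decimal 8 * 10^9 bytes is an assumption, and `upload_hours` is just an illustrative helper name, not something from the NewsGrabber code.

```python
def upload_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the transfer time in hours at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


# Assumption: "8GB" is read as 8 * 10^9 bytes (decimal gigabytes).
WARC_BYTES = 8 * 10**9

# Reading 1: 200 kilobytes per second.
hours_if_kilobytes = upload_hours(WARC_BYTES, 200 * 1000)

# Reading 2: 200 kilobits per second, i.e. 25 kilobytes per second.
hours_if_kilobits = upload_hours(WARC_BYTES, 200 * 1000 / 8)

print(f"at 200 kB/s:   {hours_if_kilobytes:.1f} h")
print(f"at 200 kbit/s: {hours_if_kilobits:.1f} h")
```

Under those assumptions the upload takes roughly 11 hours in the kilobyte reading and close to 89 hours in the kilobit reading, ignoring protocol overhead and rate fluctuations.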
