#newsgrabber 2018-01-28,Sun

***Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[00:26]
..... (idle for 20mn)
Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[00:46]
.................... (idle for 1h36mn)
williede has quit IRC (Quit: Page closed) [02:23]
........................... (idle for 2h14mn)
qw3rty114 has joined #newsgrabber [04:37]
qw3rty115 has joined #newsgrabber [04:46]
qw3rty114 has quit IRC (Ping timeout: 600 seconds) [04:54]
................. (idle for 1h23mn)
<anonymoos> medowar: change the wpull session timeout from 2 days to 2 hours (in the script) to make the live stream grabs fail faster [06:17]
................................. (idle for 2h40mn)
<JAA> Don't run modified scripts. [08:57]
.............. (idle for 1h6mn)
<HCross2> Do we even need 2 hours? 2 minutes may be acceptable [10:03]
..... (idle for 21mn)
<JAA> Not sure, some video downloads might take longer.
ArchiveBot uses 6 hours, just for reference.
[10:24]
<HCross2> hmm.. maybe 1 set of timeouts for non-videos, and another for videos [10:37]
<Kaz> 30 mins maybe.. jobs shouldn't be as long-running as ArchiveBot grabs anyway
And we're running far too slowly as it is
[10:41]
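The idea of one timeout for ordinary pages and a longer one for videos could look roughly like this minimal Python sketch. It assumes a video can be recognised from the URL's file extension; the extension list, the helper names and both timeout values (30 minutes from Kaz's suggestion, 6 hours from the ArchiveBot figure) are illustrative assumptions, not NewsGrabber's actual configuration or a wpull API.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions treated as video; not taken from NewsGrabber.
VIDEO_EXTENSIONS = (".mp4", ".webm", ".m3u8", ".flv", ".mov")

# Illustrative values from the discussion: 30 minutes for ordinary pages
# (Kaz's suggestion) and 6 hours for video (the ArchiveBot reference).
DEFAULT_TIMEOUT_SECONDS = 30 * 60
VIDEO_TIMEOUT_SECONDS = 6 * 60 * 60


def is_video_url(url: str) -> bool:
    """Guess whether a URL points at a video, looking only at its path."""
    path = urlparse(url).path.lower()
    return path.endswith(VIDEO_EXTENSIONS)


def session_timeout_for(urls: list[str]) -> int:
    """Return the session timeout in seconds for a batch of URLs.

    A batch containing any video URL gets the longer video timeout,
    so ordinary news pages still fail fast.
    """
    if any(is_video_url(u) for u in urls):
        return VIDEO_TIMEOUT_SECONDS
    return DEFAULT_TIMEOUT_SECONDS
```

Splitting by batch rather than per URL keeps a single timeout per grab process, which matches how one session timeout is passed to one crawl.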
***blitzed has quit IRC (Read error: Operation timed out) [10:47]
<HCross2> yep
I want to see 200Mbps.. not 20Mbps
[11:01]
Kaz: https://github.com/ArchiveTeam/NewsGrabber-Warrior/pull/8 [11:06]
<Kaz> merged [11:10]
<HCross2> thanks
that should.. make things better
Kaz: just thinking.. I'm trying to work out a way to do an igset
probably a file in GitHub, and then the warriors can fetch it every so often
[11:11]
although, should dedupe happen on the discovery side
or maybe on master?
[11:20]
........ (idle for 35mn)
***newsbuddy has joined #newsgrabber [11:57]
<newsbuddy> Hello! I've just been (re)started. Follow my newsgrabs in #newsgrabberbot [11:57]
................................. (idle for 2h44mn)
<Igloo> nah HCross2, dedupe needs to happen at source [14:41]
***Igloo sets mode: +o HCross2 [14:41]
<Igloo> I think that igset is required
And we can make it pull on each version using a hash
And maybe track it in the tracker?
[14:41]
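Pulling the igset "on each version using a hash" could be sketched as follows, assuming the igset is a plain text file of regular expressions (one per line, `#` for comments) that each warrior downloads periodically, for example from the GitHub repository. The `IgnoreSet` class and the file format are hypothetical; only the idea of comparing a content hash to decide whether to reload comes from the discussion.

```python
import hashlib
import re


class IgnoreSet:
    """Hypothetical warrior-side ignore set, refreshed from a text file.

    Each non-empty, non-comment line is treated as a regular expression;
    a URL matching any of them is skipped.
    """

    def __init__(self) -> None:
        self.version = None  # SHA-256 hex digest of the loaded file, if any
        self.patterns = []   # compiled regular expressions

    def update(self, text: str) -> bool:
        """Reload patterns if the file contents changed; return True on reload."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest == self.version:
            return False  # same version as last time, nothing to do
        self.patterns = [
            re.compile(line.strip())
            for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]
        self.version = digest
        return True

    def is_ignored(self, url: str) -> bool:
        """Return True if any ignore pattern matches the URL."""
        return any(p.search(url) for p in self.patterns)
```

The hex digest in `version` would also be a natural value to report to the tracker, as suggested, so it is visible which igset version each warrior is running.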
.... (idle for 15mn)
<HCross2> yeah, igset on the discovery side to knock out any bad URLs at source [14:56]
<Igloo> The dedupe needs to happen at the warrior, as it's stuff like images etc. which are duplicated
The disco won't find them
[15:05]
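Warrior-side dedupe of repeated assets such as images could be keyed on a hash of the response body. This sketch assumes a simple in-memory table per warrior and uses the payload digest format commonly written into WARC `WARC-Payload-Digest` headers (`sha1:` followed by the base32-encoded SHA-1). The `Deduplicator` class is hypothetical and does not reproduce how NewsGrabber actually deduplicates.

```python
import base64
import hashlib


def payload_digest(payload: bytes) -> str:
    """WARC-style payload digest: 'sha1:' plus the base32-encoded SHA-1."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


class Deduplicator:
    """Hypothetical in-memory deduplicator for a single warrior.

    Remembers the first URL seen for each payload digest, so later
    identical payloads (shared logos, scripts, ...) can be recorded as
    references to the earlier copy instead of being stored again.
    """

    def __init__(self) -> None:
        self.first_seen = {}  # digest -> URL of the first copy

    def check(self, url: str, payload: bytes):
        """Return None for new content, or the URL of the earlier identical copy."""
        digest = payload_digest(payload)
        if digest in self.first_seen:
            return self.first_seen[digest]
        self.first_seen[digest] = url
        return None
```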
.... (idle for 15mn)
<HCross2> *igset [15:20]
................. (idle for 1h21mn)
***blitzed has joined #newsgrabber [16:41]
..................... (idle for 1h40mn)
<Smiley> did you see my message about dedupe not working in warriors? [18:21]
............ (idle for 59mn)
<HCross2> I didn't [19:20]
..... (idle for 22mn)
<Smiley> in fact, it should be printing 'Deduplicating digest' +....
I dunno if you can make the dedupe step a bit more verbose, for warriors at least.
right so basically when I ran a warrior, it finished the wget step.... and seemed to stop there
the dedupe step wasn't highlighted in the warrior, it didn't show it was doing anything, CPU usage was 0.
[19:42]
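A more verbose dedupe step, as Smiley asks for, could log periodic progress so a warrior operator can tell a slow step from a stalled one. This is a minimal sketch using Python's standard `logging` module; the function name, the `is_duplicate` callback and the reporting interval are assumptions for illustration, not part of the actual warrior pipeline.

```python
import logging
import time

logger = logging.getLogger("dedupe")


def deduplicate_with_progress(records, is_duplicate, report_every=100):
    """Walk through (url, payload) records, logging progress along the way.

    `is_duplicate` is a hypothetical callback returning True for repeated
    content. A summary line is logged every `report_every` records and once
    at the end; returns a (kept, duplicates) tuple.
    """
    start = time.monotonic()
    kept = skipped = 0
    for index, (url, payload) in enumerate(records, start=1):
        if is_duplicate(url, payload):
            skipped += 1
            logger.debug("Duplicate payload, skipping %s", url)
        else:
            kept += 1
        if index % report_every == 0:
            logger.info(
                "Deduplicating: %d records checked (%d kept, %d duplicates) in %.1fs",
                index, kept, skipped, time.monotonic() - start,
            )
    logger.info("Deduplication finished: %d kept, %d duplicates", kept, skipped)
    return kept, skipped
```

Because a summary line appears at INFO level every `report_every` records, a step that is still working keeps producing output, while one that has stopped goes quiet.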
.......... (idle for 45mn)
<HCross2> Network activity? [20:28]
........ (idle for 37mn)
***Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[21:05]
Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[21:10]
