#newsgrabber 2017-08-27, Sun

Who | What | When
***YortMeist has left [00:05]
YortMeist has joined #newsgrabber
YortMeist has quit IRC (Connection closed)
YortMeist has joined #newsgrabber
[00:15]
.... (idle for 15mn)
YortMeist has left [00:31]
................................ (idle for 2h39mn)
Grabb has joined #newsgrabber [03:10]
Grabb has quit IRC (Ping timeout: 268 seconds) [03:22]
............. (idle for 1h3mn)
Aranje has joined #newsgrabber [04:25]
.............. (idle for 1h9mn)
Aranje has quit IRC (Three sheets to the wind) [05:34]
............... (idle for 1h11mn)
mls has joined #newsgrabber [06:45]
........................................................... (idle for 4h52mn)
<mls> jrwr, HCross2, arkiver: There's some long URLs seemingly causing issues on the dedupe end [11:37]
<HCross2> hmm - can you paste logs? [11:38]
<mls> From what's left on my tmux session, yes
HCross2: It's newsbuddy:warrior_813_1503668275.92, https://pastebin.com/LDvhfUPi
[11:38]
<HCross2> arkiver: could we truncate URLs after X length?
mls: that actually returns content
[11:42]
<mls> It does
HCross2: It returns the same regardless what's added when it reaches this URL https://open.http.mp.streamamg.com/html5/html5lib/v2.42/mwEmbedLoader.php
[11:43]
<Fletcher_> HCross2, which DC is newsbuddy in? (looking to spin up a server to help out, not sure which location is preferred) [11:49]
<HCross2> OVH RBX
I'm finding Frankfurt is pulling down content very quickly
[11:49]
<Fletcher_> any recommendations? looking at the SP-32 (+NVMe if it would help) https://www.ovh.co.uk/dedicated_servers/enterprise/173sp2.xml [11:53]
<HCross2> OVH would work, it would be nice if you could get some geolocated IPs from different nations and crawl from them [11:56]
<Fletcher_> before I hit the button, nvme or no? :P [12:04]
<HCross2> dooo iiittt [12:08]
<Fletcher_> haha [12:08]
<HCross2> I just got this from my friend at M247: "Harry... we need to talk, https://media.discordapp.net/attachments/260918481665392650/351336578091581451/Screen_Shot_2017-08-27_at_15.06.00.png?width=1440&height=233" [12:08]
<Fletcher_> :D [12:09]
<HCross2> new discoveries on their way in Los Angeles and Singapore :) [12:14]
***frgh has joined #newsgrabber
frgh has quit IRC (Client Quit)
[12:15]
..................................... (idle for 3h0mn)
<trvz> is it normal that new workers are only or mostly checking duplicates? [15:15]
........ (idle for 35mn)
<HCross2> trvz: if you have a concurrent of 20 say.. they'll dedupe with 1 concurrent [15:50]
.......... (idle for 48mn)
66.2 million views :) [16:38]
arkiver: https://archive.org/download/archiveteam_newssites_20170827124255 hmm - we have a 0KB .tar in there
quite a few have that
[16:51]
<arkiver> yes that's good
afaik bad WARCs go into the tar
[16:51]
<HCross2> ah so we want it to be 0kb [16:56]
<arkiver> I think so yeah
let me check to be sure
[17:01]
..................... (idle for 1h44mn)
<jrwr> I have processed 66103100 requests [18:45]
........................................ (idle for 3h19mn)
***blitzed has quit IRC (hub.efnet.us hub.dk)
Igloo has quit IRC (hub.efnet.us hub.dk)
Fletcher_ has quit IRC (hub.efnet.us hub.dk)
underscor has quit IRC (hub.efnet.us hub.dk)
Igloo_ has joined #newsgrabber
[22:04]
Fletcher- has joined #newsgrabber [22:13]
underscor has joined #newsgrabber [22:19]
