#newsgrabber 2017-08-24,Thu



Who What When
***Aranje has quit IRC (Remote host closed the connection) [01:34]
arkiveras suggested by jrwr we're going to use ws protocol
for communication with deduplication server
first doing http however, with later implementation of ws protocol and fallback to http as suggested by jrwr
[01:41]
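The plan described above (ws later, with a fallback to http) can be sketched roughly as follows. This is only an illustration: `check_duplicate`, the `Transport` type, and the idea that a lookup returns a boolean are assumptions made for this sketch, not something taken from the NewsGrabber code.

```python
from typing import Callable, Iterable, Optional

# ASSUMPTION: a transport takes a hashed item and returns True if the
# deduplication server has already seen it. The real reply format is
# not given in the log.
Transport = Callable[[str], bool]


def check_duplicate(hashed: str, transports: Iterable[Transport]) -> Optional[bool]:
    """Try each transport in order (e.g. ws first, then http).

    Returns the first answer obtained, or None if every transport
    failed with a connection-level error.
    """
    for transport in transports:
        try:
            return transport(hashed)
        except OSError:
            # ConnectionError and timeouts are OSError subclasses;
            # move on to the next (fallback) transport.
            continue
    return None
```

Passing the transports as an ordered list keeps the caller unaware of which protocol answered, so http can be used alone today and ws can be put in front of it later.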
jrwr:) [01:42]
arkiverso just changed ip for now [01:44]
jrwrand port [01:58]
arkivernot using ws yet, so I guess port 80? [02:01]
jrwrport 3000 for http
arkiver:
[02:01]
arkiverah [02:02]
jrwrusing node this time around
not PHP
[02:02]
arkiverhttp://142.44.174.241:3000/{hashed} [02:02]
jrwryes
I am still running the NewsGrabber-Deduplication-Feeder
FYI
[02:02]
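A minimal sketch of building the lookup URL quoted above. The log only shows the `{hashed}` placeholder, so the hash function is an assumption: SHA-1 of the UTF-8 item is used here purely for illustration, and `dedup_url` is a hypothetical helper name.

```python
import hashlib

# Address and port as quoted in the log.
DEDUP_BASE = "http://142.44.174.241:3000"


def dedup_url(item: str, base: str = DEDUP_BASE) -> str:
    """Return base + "/" + hash for one item.

    ASSUMPTION: {hashed} is a SHA-1 hex digest of the item. The log
    does not say which hash the server actually expects.
    """
    digest = hashlib.sha1(item.encode("utf-8")).hexdigest()
    return f"{base}/{digest}"
```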
arkiver20170824.03 newest version
great, is it still running fine?
[02:04]
jrwrya [02:04]
arkiveryou can just leave it running [02:04]
jrwrI made an edit to move the redis port [02:04]
arkiverit'll pick up new WARCs as they are uploaded [02:05]
jrwrcool [02:05]
arkiver20170824.03 is now minimum in tracker [02:05]
jrwrpeople's pipelines are going to need updating
does a manual pipeline update at all?
[02:05]
arkiverno [02:06]
jrwrdare you to set it as default :) [02:06]
arkiverrestarting newsbuddy [02:06]
***newsbuddy has joined #newsgrabber [02:07]
newsbuddyHello! I've just been (re)started. Follow my newsgrabs in #newsgrabberbot [02:07]
arkiverdefault :)
it should run fine on warriors
but could be tested a little more
[02:08]
jrwrWe should start pushing my updated warrior image
uses alpine + docker
[02:08]
arkiverI guess so [02:08]
jrwrso the docker version is much newer and easier to field update [02:09]
arkiverdid you test it with all projects we have running?
also urlteam
[02:09]
jrwrYes
Since it's using the docker version
[02:09]
arkiverwe should discuss making it the new default with chfoo [02:09]
jrwrYa
here come the requests!
[02:09]
arkiverI see newsbuddy loading a crazy number of URLs... [02:11]
jrwr0.45ms return times
sweet baby jesus
[02:12]
arkivernice.... [02:12]
jrwr1% cpu
BRING IT
[02:12]
arkiverwell I'll just let newsbuddy load the enormous amount of URLs and push them to the tracker [02:13]
jrwrI guess it HAS been a busy newsweek [02:13]
arkiverif it is really an extreme amount it might be good to reset URLs
haha yeah
[02:13]
jrwrtotal_calls
611546990
dbsize
5361392020
8.8G
on disk
I'll have to remember SSDB
its comms are compatible with Redis
and all its key types
[02:16]
arkiverit seems to be pretty good [02:19]
jrwrits using 2GB of ram as cache [02:20]
arkiverpretty good [02:20]
jrwrim up to 400req/minute [02:20]
arkiver:D
do we have any online statistics again?
arkiver is afk for 20 min
[02:20]
jrwrhttp://142.44.174.241:8088/
arkiver:
[02:28]
...... (idle for 25mn)
logs are showing 0ms response times
lol
[02:54]
https://usercontent.irccloud-cdn.com/file/qe2K5H7w/image.png
thats per worker
I have 4 workers online
[03:04]
arkiverthat's great!
so this is running great again :)
[03:09]
jrwryep [03:10]
arkiverwell except for getting ws protocol in [03:10]
jrwrya [03:10]
arkivergotta go now [03:10]
jrwrthat will make it faster for the bulk [03:10]
arkiverthanks for the work
yes
and let's fix more tomorrow
[03:10]
jrwrYep [03:11]
arkivershould look into moving the default warrior image to yours tomorrow
what timezone you in?
well
2749457 URLs added in the last 15 minutes.
crazy
arkiver is afk for 8 hours
[03:11]
...... (idle for 28mn)
***Fletcher has quit IRC (Remote host closed the connection) [03:41]
.... (idle for 15mn)
Fletcher has joined #newsgrabber [03:56]
....................................... (idle for 3h10mn)
HCross2arkiver: our disks are full. We don't seem to be uploading
https://usercontent.irccloud-cdn.com/file/qfu9KMu5/Screenshot_20170824-080619.png
[07:06]
............................................................... (idle for 5h11mn)
arkiverHCross2: ah
so the WARCs from older projects are, for the most part, partially synced WARCs that did not get into the megaWARC
normally those are deleted
so I guess I'll do the same here
[12:17]
HCross2Ok. We're still at 95% with old newsbuddy warcs and old other projects [12:28]
arkiverlooks like 2 WARCs were uploaded today https://archive.org/details/archiveteam_newssites?sort=-publicdate [12:32]
***newsbuddy has quit IRC (Remote host closed the connection)
newsbuddy has joined #newsgrabber
[12:42]
newsbuddyHello! I've just been (re)started. Follow my newsgrabs in #newsgrabberbot [12:43]
arkivergoing to start fresh [12:44]
***newsbuddy has quit IRC (Remote host closed the connection) [12:44]
Kazstarting warriors up now, I take it I don't need to do anything other than update the copy I've got locally? [12:46]
arkiverI think so
a git pull would be enough
[12:46]
Kazhmm, discovery process has been running for a while
..it's now using 30GB of ram
going to reboot that machine, seems it could do with it
[12:47]
arkiveryeah, it's keeping discovered URLs in memory
feel free to restart
for deduplication
[12:48]
***kurt has quit IRC (Read error: Operation timed out) [12:49]
arkiverKaz: try command !clear next time [12:50]
Kazah, didn't realise that existed
will do
[12:50]
arkivernot sure if it's ever been tested well
https://github.com/ArchiveTeam/NewsGrabber-Discovery/blob/master/irc.py#L139-L141
but is should work I think
it*
[12:50]
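To illustrate why memory grows and what a `!clear` command could do: a rough sketch of an in-memory seen-set for discovered URLs. This is not the code linked above in irc.py; `SeenUrls` and its methods are hypothetical names used only for this example.

```python
class SeenUrls:
    """Keeps every discovered URL in memory for deduplication.

    Memory use grows with the number of distinct URLs, which is why a
    long-running discovery process can end up using a lot of RAM.
    """

    def __init__(self) -> None:
        self._seen = set()

    def add_if_new(self, url: str) -> bool:
        """Record url; return True only the first time it is seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        return True

    def clear(self) -> None:
        """Drop all remembered URLs (what a !clear command might trigger)."""
        self._seen.clear()

    def __len__(self) -> int:
        return len(self._seen)
```

After `clear()`, previously seen URLs count as new again, so deduplication would then rely on the main server.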
Kazhmm, that is if the machine ever comes back up [12:51]
***kurt has joined #newsgrabber
Kaz sets mode: +o kurt
[12:55]
arkiverstill not coming back? [12:56]
Kazwe're back
looks like it screwed up rebooting, so someone had to go and push the button
[12:57]
........... (idle for 50mn)
HCross2arkiver: explains why me using 512mb VMs for discovery was a bad idea [13:47]
.... (idle for 17mn)
arkiverHCross2: yeah... oops [14:04]
.................... (idle for 1h35mn)
Maybe we should have an option to turn off deduplication on the discoverer
HCross2 ^
then it's all done on the main server
[15:39]
.................. (idle for 1h26mn)
jrwrALERT: Dedupe rate has decreased to almost nothing, its down to 4req/s
requests are not coming in
[17:05]
arkiveryes, I paused as WARCs are being uploaded to IA
we'll restart soon
[17:08]
jrwrOk [17:08]
HCross2Yup. We're hammering my 1Gbps pory
Port
[17:09]
jrwrlol [17:09]
HCross2Reminds me.. I should go pay OVH soon [17:09]
jrwrarkiver: gimmie your email
ill give you the list of rsync targets I have
[17:09]
HCross2Here goes ££ [17:10]
arkiver(send mail to jrwr over PM)
(since channel is publicly logged)
[17:11]
JAASpeaking of public logs, what's the password for the #archivebot logs? [17:13]
HCross2not sure
I do have
Just a note for the future, the value of the pound against the Euro is now at an 8 year low. I have a bad feeling that if this carries on, then the cost for me for the server may increase again
its already increased 20% this year
[17:13]
JAABlame the stupid half of your country, I guess. :-/ [17:15]
HCross2yup
./end of politics
[17:15]
trvzjust buy something in GBP, I'm sure you can get 500Mbps for 60£ in London [17:16]
HCross2hmm, I doubt unlimited 500Mbps [17:16]
JAADoes OVH charge in EUR, even if you use ovh.co.uk and order servers in London? [17:17]
HCross2.co.uk charges in GBP, but they convert to EUR [17:18]
jrwrHCross2: it is
if its OVH
[17:18]
JAAWait, so they advertise prices in GBP, then convert that to EUR, but then charge you again in GBP? What the hell? [17:18]
trvzthat wouldn't help, at some point OVH just'd raise the GBP price [17:19]
HCross2Nah they bill me in GBP then I presume send it to France and convert to EUR in the process [17:19]
trvzthey need to pay the hardware in USD [17:19]
JAAYeah, true.
Oh, Online.net finally has some ST8s again.
[17:19]
HCross2Hahahahah. Online.net
We got thrown out of there
[17:23]
JAAFor using too much bandwidth? [17:25]
HCross2Yup [17:25]
JAACan't say I'm surprised. [17:25]
arkiverdropped like 10% in 2 months... http://www.xe.com/currencycharts/?from=GBP&to=EUR&view=1Y [17:25]
HCross2arkiver: banks are predicting parity by the end of the year [17:26]
arkiverlooking at the graph that's probably going to happen yeah
or even lower
[17:26]
jrwrso
I've been thrown off seedboxes for doing too much traffic
and it was HTTP traffic to boot!
Sir you are using all of 10Gbit/s nic on the box
Im sorry if im trying to serve a 40MB video file to half the internet
[17:27]
arkivernice
what were you seeding on there...
[17:35]
jrwrRemember the collab artwork that reddit did
./r/place
[17:38]
...... (idle for 25mn)
I had my timelapses hosted as videos
they were only 40MB
but hot damn
[18:03]
....... (idle for 33mn)
arkiverhaha well that explains a lot [18:36]
..... (idle for 21mn)
jrwr: HCross2: restarted [18:57]
***newsbuddy has joined #newsgrabber [18:57]
newsbuddyHello! I've just been (re)started. Follow my newsgrabs in #newsgrabberbot [18:57]
***HarryCros has quit IRC (Read error: Connection reset by peer)
HarryCros has joined #newsgrabber
[19:02]
HarryCros has quit IRC (Remote host closed the connection)
HarryCros has joined #newsgrabber
HarryCros has quit IRC (Remote host closed the connection)
[19:09]
.......... (idle for 46mn)
Wurstsala has joined #newsgrabber [19:57]
WurstsalaHey, I am running the script on a box without sudo permissions. Is there any way I can fix the youtube-dl errors in wpull? [19:58]
Aoedepip3 install --user youtube-dl
maybe some symlink magic
[20:03]
Wurstsaladoes it detect the install automatically then
or do I have to force it to use videos
[20:07]
Aoedeit should find youtube-dl automatically. if not create a symlink
NewsGrabber-Warrior/youtube-dl: symbolic link to /home/atwarrior/.local/bin/youtube-dl
I did that and it stopped complaining
[20:09]
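The symlink workaround above, sketched in Python for a box without sudo. The default paths follow the log (a `--user` pip install puts scripts under `~/.local/bin` on Linux); `link_youtube_dl` is a hypothetical helper, not part of NewsGrabber-Warrior.

```python
from pathlib import Path


def link_youtube_dl(
    warrior_dir: Path = Path("NewsGrabber-Warrior"),
    target: Path = Path.home() / ".local" / "bin" / "youtube-dl",
) -> Path:
    """Create warrior_dir/youtube-dl as a symlink to the user-level install.

    An existing file or link at that location is left untouched.
    """
    link = warrior_dir / "youtube-dl"
    if link.is_symlink() or link.exists():
        return link
    link.symlink_to(target)
    return link
```

Run it after `pip3 install --user youtube-dl`, from the directory that contains `NewsGrabber-Warrior`.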
Wurstsalathanks
will see if it grabs videos now
do you also have those task destroyed things?
[20:12]
Aoedenope... arkiver ^ [20:15]
WurstsalaHm [20:16]
INFO youtube-dl fetched \u2018https://rtax.criteo.com/delivery/rta/rta.js\u2019.
that sounds like it's working?
[20:23]
Aoedeyep \o/ [20:25]
Wurstsalathank you
I think something is still wrong as I do not seem to get any submissions through on the tracker
but we will see
or well, far fewer than everyone else. do you guys have some special setup with 30 nodes?
Guess it could have to do with those tasks getting destroyed every 20 - 30 seconds
[20:26]
......... (idle for 40mn)
***HarryCros has joined #newsgrabber [21:12]
HarryCros has quit IRC (Remote host closed the connection)
HarryCros has joined #newsgrabber
HarryCros has quit IRC (Read error: Connection reset by peer)
HarryCros has joined #newsgrabber
[21:21]
........ (idle for 38mn)
jrwrarkiver: for shame, you should have the setup pull youtube-dl
not hard to do it locally
[22:03]
***Wurstsala has quit IRC (Ping timeout: 268 seconds) [22:10]
