#newsgrabber 2018-01-01,Mon



Who | What | When
***odemg has joined #newsgrabber [00:19]
.... (idle for 19mn)
odemg has quit IRC (Read error: Operation timed out) [00:38]
............. (idle for 1h3mn)
odemg has joined #newsgrabber [01:41]
................................................................................... (idle for 6h53mn)
odemg has quit IRC (Ping timeout: 480 seconds) [08:34]
................................. (idle for 2h43mn)
HCross2arkiver: im popping a small Electroneum miner on newsbuddy (just to try to offset some of the cost)- if you need the CPU feel free to stop it [11:17]
.... (idle for 19mn)
JAAElectroneum? [11:36]
HCross2yep
or not.. as cmake doesnt want to play ball at all
[11:36]
JAAAh, another cryptocurrency.
We could make our own and call it ATC.
[11:36]
......... (idle for 43mn)
HCross2yeah.. gcc etc is broken on newsbuddy [12:20]
JAA:-| [12:22]
...... (idle for 29mn)
HCross2OVH.. take your kernel and fuck off [12:51]
........... (idle for 50mn)
***odemg has joined #newsgrabber [13:41]
........ (idle for 37mn)
odemg has quit IRC (Read error: Operation timed out) [14:18]
odemg has joined #newsgrabber [14:30]
odemg has quit IRC (Ping timeout: 260 seconds) [14:40]
.... (idle for 15mn)
odemg has joined #newsgrabber [14:55]
................ (idle for 1h19mn)
IglooHey, Does the warrior work with this now?
Or is that still broke?
[16:14]
HCross2im not 100% sure, but there hasnt been any update recently [16:17]
IglooIt seems to work OK, Until it gets a video one
Youtube-dl is missing
[16:18]
HCross2hmm, could we add a step to the warrior to download youtube-dl
[16:19]
IglooYup [16:19]
HCross2Igloo: you good with Python at all? [16:20]
IglooI can make it do what I want, Want me to look at sorting that? [16:20]
HCross2I was wanting someone to cast an eye over https://github.com/ArchiveTeam/NewsGrabber-Warrior/pull/3/files before I admit it
Would it just be a check for youtube-dl on https://github.com/ArchiveTeam/NewsGrabber-Warrior/blob/master/pipeline.py and then download it if we dont have it?
[16:21]
IglooYep, I believe so but it's been a while since I looked at it. Youtube DL is updated frequently, Might be better placed somewhere else if it runs for a long time? [16:23]
HCross2ive found that I need youtube-dl in the newsgrabber directory
I was just thinking "check if its there" "if not, get it there via wget"
[16:24]
IglooGet it via pip
So it's up to date.
[16:24]
JAA"< HCross2> ive found that I need youtube-dl in the newsgrabber directory" needs to be fixed then. It should just look at PATH. [16:25]
HCross2hmm, it may just be my broken setup for that
I think it uses PATH if it can
[16:26]
JAAI think someone else reported it before though. [16:26]
IglooI've had issues too with PATH [16:26]
HCross2I think that may be a wpull thing, as ArchiveBot has the same issues if I remember [16:26]
***JAA sets mode: +oo Igloo HCross2 [16:26]
IglooYeah it's AB where I had issues [16:27]
JAAI'm pretty sure wpull is just calling youtube-dl through subprocess.
You can give it an explicit path through --youtube-dl-exe, but by default it should just search PATH.
[16:27]
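
A minimal sketch of the lookup JAA describes above: search PATH by default, and only pass an explicit location when one is configured. The helper names `find_youtube_dl` and `wpull_youtube_dl_args` are hypothetical and not part of NewsGrabber-Warrior or wpull; only the `--youtube-dl-exe` option comes from the discussion. It assumes an explicit path, when given, should take precedence over the PATH search.

```python
import shutil
from typing import List, Optional


def find_youtube_dl(explicit_path: Optional[str] = None,
                    name: str = "youtube-dl") -> Optional[str]:
    """Return a usable path to the executable, or None if it is missing.

    Hypothetical helper: an explicit path wins, otherwise PATH is searched.
    """
    # shutil.which accepts both bare names (searched on PATH) and
    # paths with a directory part (checked directly for executability).
    return shutil.which(explicit_path if explicit_path else name)


def wpull_youtube_dl_args(exe_path: Optional[str]) -> List[str]:
    """Build the extra wpull arguments for an explicit youtube-dl location.

    Returns an empty list when no path is given, so wpull falls back to
    its default behaviour of searching PATH.
    """
    return ["--youtube-dl-exe", exe_path] if exe_path else []
```

With this shape, the copy of youtube-dl in the working directory becomes an optional override rather than a requirement.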
HCross2I just had a look - https://github.com/ArchiveTeam/NewsGrabber-Warrior/blob/master/pipeline.py#L278 might be a better place, so we dont get youtube-dl involved until we need to [16:28]
Kaznah, you want it to run once at the start of the script
rather than check for each item that needs it
[16:29]
HCross2good point [16:29]
IglooNah you want it to be checked for each item
As youtubedl updates frequently
[16:30]
JAAIgloo: ArchiveBot has an explicit call to ./youtube-dl apparently, for whatever reason. "Copy of youtube-dl, copied here so it doesn't need to be in PATH"... :-| [16:30]
IglooAs youtube changes their stuff (We don't do youtube per se, But good to have the new one) [16:30]
HCross2^ good point too - downloading it is a hack [16:30]
IglooJAA: Yep, Saw that. Weird. Must be a bug from some version of wpull
HCross2: we need to have pip re-get it for each video item
[16:31]
JAAYeah, possible. [16:31]
IglooIt's only a few seconds to check it. [16:31]
JAADoes that break anything if another item using youtube-dl is already running? [16:31]
IglooIt should be in memory [16:31]
HCross2doesnt it just cache the exe into ram? [16:31]
IglooSo "shouldn't"
But it'd have to be verified
[16:31]
JAAWell, it's not just a single executable.
The "executable" is simply a Python script which imports and runs stuff.
[16:32]
IglooNo but that part will be in memory and then just do local calls to the other imports and things [16:32]
JAAIf other parts of youtube-dl are only imported later and the installation is changed in between, maybe stuff might break.
Also, wpull executes youtube-dl once for each URL, so you'd have different youtube-dl versions within one item potentially. (But the WARC metadata record would only list the version used when wpull is launched.)
[16:32]
IglooTrue, That's more of a problem
We can do it when the script starts. But if it's out of date it won't change until the warrior reboots (or project is changed)
[16:33]
HCross2maybe do a "once every 10 video items, check for an update" [16:34]
JAAYeah, or periodically wait for all items to finish, update youtube-dl, restart processing items. [16:34]
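
The two ideas just above (check every N video items, but only update once nothing is running) could be combined roughly as follows. This is a sketch under stated assumptions, not code from the warrior pipeline: the `PeriodicUpdater` class and its method names are hypothetical, and the default update step assumes youtube-dl was installed via pip, as Igloo suggests.

```python
import subprocess
import sys
from typing import Callable


def pip_upgrade_youtube_dl() -> None:
    """Upgrade youtube-dl with pip (assumes a pip-managed install)."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--upgrade", "youtube-dl"]
    )


class PeriodicUpdater:
    """Hypothetical helper: request an update every `every` video items,
    but run it only when no item is in flight, so a running item never
    sees the installation change underneath it."""

    def __init__(self, every: int = 10,
                 update: Callable[[], None] = pip_upgrade_youtube_dl) -> None:
        self.every = every
        self.update = update
        self.video_items_seen = 0
        self.active_items = 0
        self.update_pending = False

    def item_started(self) -> None:
        self.active_items += 1

    def item_finished(self, was_video: bool) -> None:
        self.active_items -= 1
        if was_video:
            self.video_items_seen += 1
            if self.video_items_seen % self.every == 0:
                self.update_pending = True
        # Defer the update until every running item has finished.
        if self.update_pending and self.active_items == 0:
            self.update()
            self.update_pending = False
```

A real pipeline would also need to hold back new items while an update is pending; that part is left out here.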
Iglooyoutube-dl gets commits daily.
So it's not practicable to do that
Maybe we just do it on script start up and have done with it
The warrior install has youtube-dl in it
[16:36]
JAACommits, yes. But releases only happen every week or so (9 releases in the past two months).
I'm not very familiar with NewsGrabber. How long do individual items take usually/at most?
[16:42]
IglooDepends on the size
Videos some are 20G+
Depends on the internet connection at the warrior etc
[16:44]
Kazmost of the people using warriors will be restarting enough that it'll get updated frequently anyway
I'm sure there's plenty of *much older* versions of youtubedl on non-warrior grabbers, and it's never caused huge issues in the past
[16:48]
IglooYeah, If we're going to fix something fix it properly tho
youtube-dl is in the script, So I need to look into it
[16:52]
.......... (idle for 46mn)
The warrior code is really really old [17:38]
..... (idle for 20mn)
Kazi mean, if you want the *proper* fix, wpull needs to implement youtube-dl properly, rather than just calling it through subprocess or whatever.. but that's a fair bit more work [17:58]
Igloohttps://github.com/ArchiveTeam/NewsGrabber-Warrior/pull/4
Tested & working.
[18:04]
HCross2I dont have access to that repo
and arkiver doesnt seem to be around
[18:11]
Kazhttps://github.com/ArchiveTeam/NewsGrabber-Warrior/pull/4#pullrequestreview-86066907 [18:18]
.......... (idle for 48mn)
IglooKaz: needs to be merged when it's ready, HCross2's other work is there too.
It will just merge the change lines into that
[19:06]
HCross2Perfect. We just need permissions now
There's a load of services I'll accept once this is done
[19:07]
IglooIt should also mean that we can use the old warrior image
I'm going to work on upgrading the warrior image / updating the software update project
[19:14]
.... (idle for 15mn)
***Martle has joined #newsgrabber [19:30]
....... (idle for 34mn)
anonymoos has quit IRC (Ping timeout: 255 seconds) [20:04]
anonymoos has joined #newsgrabber [20:18]
.... (idle for 18mn)
jrwrHCross2: Igloo Im so glad that sketchcow let me work on the wiki
poor thing was getting pretty bad
[20:36]
HCross2I need to ask you about the news wiki actually.. [20:37]
jrwryes? (I've not been to it in some time) [20:37]
HCross2I need to get that thing sorted... And away from DontKnowWhatTheyreDoing247 [20:37]
jrwrAh
I've got a dreamhost Ill slap it on
its got a dozen other projects on it
[20:37]
HCross2Ta. Let me know what DNS you need [20:38]
jrwrwhat's the domain name? [20:38]
HCross2wiki.newsbuddy.net [20:38]
jrwrns1.dreamhost.com.
ns3.dreamhost.com.
ns2.dreamhost.com.
I already had the domain in there already (for master.news)
[20:40]
HCross2Hm. I can delegate the nameservers for just wiki. Would that work? [20:40]
jrwrya
thats fine
[20:41]
HCross2Doing it now. I pay for OVH anycast DNS for the main domain
All done
[20:41]
Igloojrwr: I'm hoping more and more "stuff" will be disseminated to others soon
So we can work on this without requiring just one or two people
[20:48]
jrwrYa, right now (No fault of arkiver) archive team's overall infra is pretty outdated for modern websites [20:49]
HCross2Igloo: I'm glad.. we're stuck now until we can get the code committed [20:49]
IglooI can do bits and pieces but testing is a royal pain in the arse
Especially at scale
[20:49]
jrwrand across so many websites
the inf loops are killing us
all the extra resources are as well
[20:50]
***anonymoos has quit IRC (Read error: Operation timed out) [20:55]
anonymoos has joined #newsgrabber [21:08]
..... (idle for 23mn)
Kazwe need a sub-org in github for newsgrabber-related crap
or a team or whatever they call it
[21:31]
........ (idle for 37mn)
***Igloo has quit IRC (Ping timeout: 250 seconds)
Aoede has quit IRC (Ping timeout: 250 seconds)
Igloo has joined #newsgrabber
svchfoo3 sets mode: +o Igloo
Aoede has joined #newsgrabber
[22:09]
...... (idle for 25mn)
JAAWho has access to the org anyway? [22:35]
.... (idle for 15mn)
***blitzed has quit IRC (Read error: Operation timed out)
blitzed has joined #newsgrabber
[22:50]
HCross2I do... In a limited fashion
jrwr: let me know when I can shut the VM down for the wiki
[22:53]
KazJAA: https://github.com/orgs/ArchiveTeam/people
anyone listed as owner
[22:56]
JAAI don't see anything distinguishing the people on that page (owners vs. other org members)...?
HCross2: Ah yeah, I meant admin access to the entire org.
[23:02]
HCross2Push comes to shove.. SketchCow might have access [23:04]
Kazah, you probably need to be in the org to see [23:04]
JAAI know arkiver does. yipdw maybe as well. [23:04]
Kazthere's a lot of people that do [23:04]
JAAAh, that would make sense. [23:04]
Kazseems fairly arbitrary as to who gets member vs owner
oh actually, there's only a couple of outliers
[23:05]
HCross2I don't have access to all the news repos [23:07]
Kazyeah, you're just a member
will see if anyone's willing to bump me to owner
[23:10]
