#newsgrabber 2018-01-31,Wed

***Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[00:01]
..................................... (idle for 3h3mn)
Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[03:05]
Aranje has quit IRC (Read error: Operation timed out)
Aranje has joined #newsgrabber
[03:10]
........ (idle for 38mn)
<anonymoos> sorry about the dedupe issues on warrior. i used python2 instead of python because some linux distros like arch have python3 as the default, and the python2 symlink exists on my ubuntu 14.04 vm. [03:48]
............ (idle for 57mn)
***qw3rty118 has joined #newsgrabber
qw3rty117 has quit IRC (Read error: Operation timed out)
Aranje has quit IRC (Quit: Three sheets to the wind)
[04:45]
.................. (idle for 1h28mn)
blitzed has quit IRC (Quit: Leaving) [06:19]
........................................ (idle for 3h19mn)
<Kaz> to be fair, that's probably the correct way to do it.. we're just having to work around the fact that warrior v2 is old and broken [09:38]
...... (idle for 27mn)
<Igloo> Yeah, don't stress about it anonymoos
We don't have a good testing method, or time
Mostly no time to look into anything
[10:05]
........ (idle for 39mn)
<Smiley> mls: that's fine
it sets WARRIOR to 2, checks for 3, if it finds python3 it uses warrior3
yeah, no one would have noticed if I hadn't been annoying ;D
Also, urlteam2 has been running all night, with project selected as ArchiveTeam's Choice
[10:44]
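Smiley's description of the startup check (WARRIOR defaults to 2 and is switched to 3 only when python3 is present) amounts to a simple feature test. Below is a minimal Python sketch of that decision. The real check lives in the warrior's own startup scripts. `choose_runner` and its mapping onto `run-warrior` / `run-warrior3` (both named later in this log) are hypothetical.

```python
import shutil


def choose_runner(which=shutil.which):
    """Pick the warrior runner the way Smiley describes it (hypothetical sketch).

    `which` is injectable so the logic can be exercised without touching PATH.
    """
    version = 2  # WARRIOR starts out as 2
    if which("python3") is not None:  # "checks for 3"
        version = 3  # python3 found, so use the v3 runner
    runner = "run-warrior3" if version == 3 else "run-warrior"
    return version, runner


print(choose_runner())  # result depends on whether python3 is on this machine's PATH
```

This is why newsgrabber ended up on the python3-based runner on v3: the decision is made per machine, not per project.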
<Kaz> stop breaking things Smiley [10:51]
<Smiley> hehe sry
i guess most of you don't run warriors, so this stuff doesn't get noticed ;D
Smiley tries just restarting the script
hmmmm weird, didn't work
maybe the docker?
The directory '/home/warrior/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
bleh?
Checking youtube-dl status
youtube-dl symlink exists
Test for Wpull returned 0
[10:51]
<Kaz> oh hmm
maybe it can't install requests module?
output of pip freeze?
[10:54]
<Smiley> looks like it
as root?
or warrior?
[10:55]
<Kaz> warrior if you can [10:55]
<Smiley> https://pastebin.com/6N7Cc24P
I can spam the channel if you'd prefer
[10:57]
who do the pip commands get called as? [11:03]
..... (idle for 22mn)
Kaz: so does the V3 warrior use the same dockerfile as we can run manually? [11:25]
<Kaz> honestly don't have a clue [11:26]
<Smiley> k [11:26]
<Kaz> I haven't touched warrior v3 yet, as v2 works on esxi for me
so requests is installed for warrior user.. idfk what's going on
[11:27]
<Smiley> RUN (cd /home/warrior && sudo -u warrior git clone -b docker https://github.com/ArchiveTeam/warrior-code2.git)
I think this needs -H on the sudo
but I'm grasping at straws
[11:28]
..... (idle for 20mn)
so the V3 warrior isn't starting newsgrabber
2018-01-31 11:33:12,794 - seesaw.warrior - DEBUG - Update warrior hq.
2018-01-31 11:33:12,794 - seesaw.warrior - DEBUG - Warrior ID '17878'.
2018-01-31 11:33:13,548 - seesaw.warrior - DEBUG - Select project newsgrabber
2018-01-31 11:43:12,793 - seesaw.warrior - DEBUG - Update warrior hq.
2018-01-31 11:43:12,794 - seesaw.warrior - DEBUG - Warrior ID '17878'.
2018-01-31 11:43:13,513 - seesaw.warrior - DEBUG - Select project newsgrabber
Smiley hits it
even rebooting it didn't help :/
I need the command that runs, which starts the pipeline, so i can try it manually
if it's just python3 ./pipeline.py then it's fooked
ImportError: No module named six.moves
`sudo -H pip3 install six` fixes it
Warcio was not imported correctly.
getting this over and over in logs now xD
Smiley ponders quietly
[11:50]
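Both symptoms here point at a package installed for a different interpreter or user than the one running the pipeline. The first was the suspicion that `requests` is missing. The second was `ImportError: No module named six.moves`. Running `pip freeze` as another user won't reveal this. Below is a hedged pre-flight sketch using `importlib.util.find_spec`. It works on Python 3 only, and `missing_modules` and the module list are illustrative assumptions.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when the parent package of a dotted name is
            # absent, or when a module in sys.modules has no __spec__
            found = False
        if not found:
            missing.append(name)
    return missing


print(missing_modules(["requests", "six", "warcio"]))
```

Run it with the same interpreter and user the pipeline uses. A non-empty result names what `sudo -H pip3 install ...` (or the python2 equivalent) still needs to provide.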
.... (idle for 17mn)
<mls> Smiley: Hey, still going at it on v3? [12:20]
<Smiley> yup [12:22]
<mls> Smiley: v3 beta is running newsgrabber through run-warrior3, which is python3 based and doesn't work. [12:22]
<Smiley> 'cept I don't know where to call the pipeline from, and it's moaning about warcio
mls: lol yeah, was trying to 'fix' it
[12:22]
<mls> Not for this project anyway. [12:22]
Smiley doesn't understand the warcio test
if not warcio.__file__ == os.path.join(os.getcwd(), 'warcio', '__init__.pyc'):
[12:22]
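A likely reason the warcio test trips under Python 3 is that it compares `warcio.__file__` against `<cwd>/warcio/__init__.pyc`. Since PEP 3147, a Python 3 module imported from source reports its `.py` file in `__file__`, and the bytecode goes to `__pycache__/`. The equality can therefore never hold, and the pipeline keeps printing "Warcio was not imported correctly." Below is a sketch of a version-independent check that compares directories instead. `imported_from` is a hypothetical name, not the project's code.

```python
import os


def imported_from(module, project_dir, package="warcio"):
    """Return True if `module` was loaded from <project_dir>/<package>/.

    Comparing the containing directory works whether __file__ ends in
    __init__.py (Python 3) or __init__.pyc (Python 2).
    """
    module_file = getattr(module, "__file__", None)
    if not module_file:
        return False  # built-in or namespace module: not the bundled copy
    loaded_dir = os.path.realpath(os.path.dirname(os.path.abspath(module_file)))
    expected_dir = os.path.realpath(os.path.join(project_dir, package))
    return loaded_dir == expected_dir
```

This only removes one false alarm. As the discussion that follows makes clear, the project as a whole is not Python 3 compatible.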
<Igloo> The problem is that this project isn't coded for python3 [12:24]
<mls> ^ That [12:24]
<Igloo> and so much is different between 3 and 2
It just will not work.
Complete re-code
[12:24]
<mls> I ran it manually with run-warrior on v3 and that defaults to python2
Wouldn't recommend it in your case Smiley
[12:24]
<Smiley> Ahhh ok
right fine. no problem then haha
Why didn't someone say that yesterday XD
[12:25]
<mls> I did [12:25]
<Smiley> oh :D
I didn't understand then ;D
[12:26]
<mls> But I didn't say it as clearly as Igloo did [12:26]
<Smiley> hehe
k, so running a v2 warrior doing newsgrabber, and a v3 warrior doing urlteam ;D
[12:27]
<mls> Dumped v2 just to see if it works with a fresh install [12:27]
<Smiley> hmmm V2, seems to be crashing
Smiley tries to get into the console
[12:31]
<mls> Ah right, the youtube-dl thing [12:32]
<Smiley> load of 4 xD
Smiley throws some more cpu its way
why on earth is wpull using like 100% cpu o_O
[12:32]
<mls> 'Cause it can [12:36]
<Smiley> seems to be related to the fact it can't find youtube-dl [12:36]
<mls> Right, newsgrabber is working on v2 except for the youtube-dl thingy [12:37]
<Smiley> is youtube-dl important? [12:37]
<mls> Yes [12:37]
<Smiley> k, so now i think about fixing this [12:38]
<Igloo> symlinks [12:39]
<mls> Indeed [12:39]
<Smiley> so the youtube-dl version between v3 and v2 hasn't changed? [12:40]
<Igloo> no
Well, not true, but it works with both warriors
It's updated nearly daily
I did update the warrior script some time ago to make it pull the latest each start
[12:40]
<Smiley> k, so we just need to link the newsgrabber code to call youtube-dl properly? [12:41]
<Igloo> symlink into the project directory
next to wpull
[12:41]
<Smiley> grr
can't do anything because it's too heavily loaded :/
[12:47]
brand new v2 warrior
youtube-dl just installed
the code creates a symlink, but that one doesn't work?
but even the PATH is fixed now it seems
errr ok there's a weird error about not finding GLIBC_2.13
Also, is this kernel non-smp ?
[12:55]
<mls> Appliance is single core, so I guess it wouldn't be
symlink is created just fine, only not where we need it to be and especially when
Already know when we want it, just not how (yet)
[13:04]
<Kaz> I've symlinked youtube-dl to project directory in /data, still doesn't work
this is v2 ^
unless I've got the location wrong?
[13:08]
<mls> The symlink should be in the same directory the pipeline is run from
So if the pipeline is run from /data/data/projects/newsgrabber-4e3dd61, the symlink needs to be created in that same directory
[13:11]
<Smiley> cd /data/data/projects/newsgrabber-hash
rm ./youtube-dl
ln -s "$(which youtube-dl)" ./youtube-dl
[13:12]
<Kaz> the link is created in /data/data/projects/newsgrabber-commit
Smiley: make that work in python
[13:13]
<Smiley> well, as long as we aren't going to move youtube-dl anywhere else again
it's just a link to /usr/local/bin/youtube-dl
so how do we create a symlink in python
os.symlink('/usr/bin/youtube-dl', os.getcwd() + '/youtube-dl')
[13:13]
<mls> subprocess.check_output('which youtube-dl', shell=True), or would that be cheating? [13:15]
should be os.symlink('/usr/local/bin/youtube-dl', os.getcwd() + '/youtube-dl')
boom
[13:15]
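Hard-coding `/usr/local/bin/youtube-dl` breaks wherever youtube-dl lives somewhere else, as mls points out right after. The sketch below resolves the path at runtime with `shutil.which` (Python 3.3+; on Python 2.7, `distutils.spawn.find_executable` plays a similar role). It then recreates the link in the directory the pipeline runs from. `link_youtube_dl` is hypothetical.

```python
import os
import shutil


def link_youtube_dl(project_dir, which=shutil.which):
    """Point <project_dir>/youtube-dl at whichever youtube-dl is on PATH.

    Returns the link target, or None if youtube-dl is not installed.
    """
    target = which("youtube-dl")
    if target is None:
        return None
    link = os.path.join(project_dir, "youtube-dl")
    # Remove a stale file or (possibly dangling) symlink from an earlier run,
    # otherwise os.symlink raises because the name already exists.
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(os.path.abspath(target), link)
    return target


# e.g. link_youtube_dl(os.getcwd()) from inside the pipeline directory
```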
<mls> Smiley: That's really nice, but youtube-dl doesn't exist in /usr/local/bin/ on v3 beta
And in the case of manually running it, it can't be breaking stuff =D
[13:16]
<Smiley> ffs [13:16]
<mls> lol [13:17]
<Smiley> how can we check which version? [13:17]
<JAA> Fix PATH then. Seriously. [13:18]
<Smiley> JAA: python doesn't use $PATH [13:18]
<JAA> subprocess sure does. [13:18]
<mls> How about wpull? [13:18]
<JAA> And wpull uses subprocess to call youtube-dl.
So unless you specify --youtube-dl-exe ./youtube-dl or something like that, it should just go look for it in the PATH.
[13:18]
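Here is JAA's point as a sketch. On POSIX, `subprocess` looks up a bare program name in the `PATH` of the environment it is handed, so adding the right directory to `PATH` is enough and no symlink is needed. This only helps if the caller actually goes through `subprocess`, which JAA questions for wpull 1.2.3 a few lines on. `run_with_path` and the example directories are assumptions.

```python
import os
import subprocess


def run_with_path(cmd, extra_dirs):
    """Run `cmd` with `extra_dirs` prepended to PATH and return its stdout.

    On POSIX, subprocess searches the PATH of the `env` mapping it is given
    when the program name contains no slash, so a bare "youtube-dl" is found
    without any symlink in the working directory.
    """
    env = dict(os.environ)
    parts = list(extra_dirs)
    if env.get("PATH"):
        parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(parts)
    return subprocess.check_output(cmd, env=env, universal_newlines=True)


# e.g. run_with_path(["youtube-dl", "--version"],
#                    ["/usr/local/bin", os.path.expanduser("~/.local/bin")])
```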
<Smiley> https://github.com/ArchiveTeam/NewsGrabber-Warrior/pull/9
there?
not sure if that'll work exactly, but it makes it easy to understand
as I'm unsure of the except: continuing?
or should I just have 2 try's and then an except?
[13:20]
<JAA> Oh wait, wpull 1.2.3 *doesn't* use subprocess. Hmm.
Newsgrabber uses 1.2.3, right?
[13:21]
<Smiley> JAA: see my change? it'll work i think. [13:21]
<mls> Yea
JAA: Explicitly
[13:21]
<JAA> Smiley: Hmm, doesn't os.path.isfile just return False if the target doesn't exist? [13:23]
<Smiley> does false not fail a try? [13:23]
<JAA> No
Only exceptions do.
[13:23]
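To make JAA's answer concrete: `os.path.isfile` reports a missing path by returning `False`, not by raising. An `except` branch wrapped around it therefore never runs. Plain `if` tests, as in Smiley's later "two if's" version, do what was intended. A small sketch follows; `first_executable` and the candidate paths are illustrative.

```python
import os


def first_executable(candidates):
    """Return the first path in `candidates` that is an executable file, else None.

    os.path.isfile() never raises for a missing path; it just returns False,
    so this is a job for if-tests rather than try/except.
    """
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# e.g. first_executable(["/usr/local/bin/youtube-dl", "/usr/bin/youtube-dl"])
```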
<Smiley> can you make it fail a try? D:
oh ffs
[13:23]
<JAA> Yeah, but you shouldn't. [13:24]
<Kaz> hey guys
who wants to rewrite the project?
[13:24]
<Smiley> :D [13:24]
<Kaz> :) [13:24]
<JAA> Seriously, why don't we use find_executable and pass that to wpull with --youtube-dl-exe? [13:24]
<Smiley> JAA: do a pull request?
I have no idea what find_executable is, nor how to use it :/
[13:24]
<JAA> Ok, just a second [13:25]
<Kaz> JAA: because as of right now, I'm the only goon with access to the repo [13:25]
<mls> I believe find_executable from seesaw.util expects a version
But there's test_executable
[13:26]
<Kaz> I think you can give it a minimum version
maybe..
[13:26]
<JAA> No, I don't think so, but you don't have to specify a version. [13:26]
<Smiley> there, done it with two if's [13:27]
<Kaz> oh, and bump the version number on your pull please [13:27]
<Smiley> didn't know there was one XD
fixed.
[13:28]
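JAA's alternative to symlinks is to locate youtube-dl once and hand the path to wpull. The sketch below uses `shutil.which` as a stand-in, because the exact signatures of seesaw's `find_executable` / `test_executable` aren't shown here. It also assumes the `--youtube-dl-exe PATH` option JAA names, so check `wpull --help` for the bundled 1.2.3 before relying on it. `youtube_dl_args` is hypothetical.

```python
import shutil


def youtube_dl_args(which=shutil.which):
    """Return extra wpull arguments pointing at youtube-dl, or [] if none found.

    Assumption: the wpull in use accepts `--youtube-dl-exe PATH`.
    """
    path = which("youtube-dl")
    if path is None:
        return []  # leave wpull's default behaviour untouched
    return ["--youtube-dl-exe", path]


# e.g. wpull_args = base_args + youtube_dl_args()
```

Passing the path explicitly avoids any dependence on the working directory, which is what made the symlink approach fragile.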
<mls> So what minimal version would we require for youtube-dl? [13:29]
<Smiley> this has not been tested btw [13:29]
<Kaz> whatever the current version is [13:29]
<mls> 2018.01.28 I believe [13:29]
<Kaz> Smiley: sounds like probably not the best idea for me to merge while at work then? [13:29]
<Smiley> Kaz: not yet.
I can't ssh into the damn warrior atm to easily put the code in to test
hmm, can I git pull the proposed patch?
[13:29]
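To enforce a minimum version like the 2018.01.28 mentioned above, youtube-dl's date-style version strings can be compared as integer tuples. The sketch below assumes that `youtube-dl --version` prints the bare version string. The helper names are hypothetical.

```python
import re
import subprocess


def parse_ytdl_version(text):
    """Turn a date-style version such as '2018.01.28' into (2018, 1, 28)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def is_recent_enough(text, minimum="2018.01.28"):
    """True if the version in `text` is at least `minimum`."""
    return parse_ytdl_version(text) >= parse_ytdl_version(minimum)


def installed_ytdl_version(exe="youtube-dl"):
    """Ask the installed youtube-dl for its version string."""
    return subprocess.check_output([exe, "--version"], universal_newlines=True).strip()
```

Comparing tuples of integers makes the ordering explicit instead of relying on every component being zero-padded.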
<JAA> Should the pipeline fail if no usable youtube-dl is found? [13:32]
<Smiley> it doesn't currently? [13:32]
<JAA> Or should it only fail when trying to process video jobs? [13:32]
<Smiley> oh wait [13:32]
<JAA> I have no idea, I'm not running this project on my machines. [13:32]
<Smiley> I'm running it on warrior-v2s
it runs, and gets STUCK currently
so i can hardly break it any worse xD
[13:33]
<Kaz> pipeline will run without youtube-dl, but video jobs will fail, yeah
normal jobs will complete just fine
[13:35]
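There are two policies on the table. The first, which Kaz describes, keeps running and lets only video jobs fail. The second, which JAA suggests just below, refuses to start at all. They differ only in whether a missing binary raises at startup. Below is a sketch; `MissingYoutubeDl` and `check_youtube_dl` are hypothetical names.

```python
import shutil


class MissingYoutubeDl(RuntimeError):
    """Raised when no usable youtube-dl is available (hypothetical name)."""


def check_youtube_dl(strict, which=shutil.which):
    """Return the youtube-dl path, or None when it is missing and strict is False.

    strict=True  -> fail the whole pipeline at startup.
    strict=False -> keep going; only video jobs will fail later.
    """
    path = which("youtube-dl")
    if path is None and strict:
        raise MissingYoutubeDl("youtube-dl not found on PATH")
    return path
```

Failing early trades a few completed non-video items for a much clearer error than a stuck or silently failing video job.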
<JAA> Hm, okay. [13:36]
<Smiley> So my code is ok maybe ;D
and my son just woke up, which kills the chance of me testing it yet :(
[13:36]
<JAA> Would be easier to just let it fail entirely. [13:36]
<Smiley> which would mean having to raise an exception? XD [13:36]
<JAA> Kaz: What do you think about always requiring youtube-dl? It's installed as part of the warrior setup anyway, right? [13:37]
<Smiley> Task was destroyed but it is pending!
wut is this?
[13:37]
<JAA> That's what you get when you destroy a task when it's pending.
:-P
[13:38]
<Smiley> :D [13:38]
<Kaz> JAA: honestly, as long as video jobs work on the warrior, anything's fine [13:38]
<Smiley> wow with like 4 cpus [13:39]
<Kaz> for now we're only supporting Warrior v2, so as long as it works there [13:39]
<Smiley> i can't run more than 2 grabs at once o_O
which process actually runs the webui?
and I'll renice it XD
[13:39]
<Kaz> not a clue [13:40]
<JAA> Smiley: Can you try https://github.com/JustAnotherArchivist/NewsGrabber-Warrior ?
I'll send a PR if it works.
[13:40]
<Smiley> not atm i can't
because the entire thing goes unresponsive
[13:41]
<mls> JAA: Works for me on v2 [13:51]
<JAA> Sweet, thanks.
Kaz: https://github.com/ArchiveTeam/NewsGrabber-Warrior/pull/10
[13:52]
<mls> In case of a user pip-installed youtube-dl, there's ~/.local/bin/youtube-dl [13:53]
<JAA> Hmm, right. [13:53]
<mls> But I guess that wouldn't be a problem since it's most likely a manual run [13:53]
<JAA> Yeah [13:54]
<Smiley> so what was wrong with my pr btw? [13:55]
<JAA> Nothing in particular, it's just way cleaner to pass the path to youtube-dl explicitly instead of creating symlinks.
And I just realised that I didn't remove that symlink code.
[13:57]
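This sketch combines JAA's explicit-path approach with mls's note about user installs. On Linux, `pip install --user` puts scripts in `~/.local/bin`, which is often not on the warrior's PATH. The sketch checks PATH first and then a few fallback directories. `find_youtube_dl` and the fallback list are assumptions, not something the project ships.

```python
import os
import shutil


def find_youtube_dl(extra_dirs=("~/.local/bin", "/usr/local/bin"), env_path=None):
    """Find youtube-dl on PATH, falling back to a few extra directories.

    env_path=None means "use the real PATH"; pass a string to override it.
    Returns the full path, or None if nothing executable is found.
    """
    found = shutil.which("youtube-dl", path=env_path)
    if found:
        return found
    fallback = os.pathsep.join(os.path.expanduser(d) for d in extra_dirs)
    return shutil.which("youtube-dl", path=fallback)
```

The result can be fed straight into wpull's arguments, so no symlink is needed in either case.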
<mls> JAA: Lines 32 through 37 can be removed, it's the symlink dealio
lol
[13:57]
<JAA> Hehe [13:58]
<mls> Was just looking at the source directory, ah.. youtube-dl, wait wut? [13:58]
<JAA> PR updated [13:59]
<Kaz> right [14:06]
<mls> So what about a per-project preferred python version? Useful, and if so, should warriorhq contain this information rather than the project itself? [14:06]
<Kaz> Probably best to not break the tracker too
not today at least
#10 merged
[14:07]
<mls> Oh ok, I won't.
=P
I just found out that all my dedupes have been failing as my distro has /usr/bin/python linked to python 3.x
Hurrah
[14:08]
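The dedupe failures mls describes come from assuming what `python` points to, and so does the python2-vs-python choice at the top of this log. The sketch below asks the interpreter directly. `python_major_version` and `pick_python2` are hypothetical helpers.

```python
import subprocess


def python_major_version(exe):
    """Return 2 or 3 for the interpreter `exe`, or None if it cannot be run."""
    try:
        out = subprocess.check_output(
            [exe, "-c", "import sys; print(sys.version_info[0])"],
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # not installed, not executable, or exited with an error
    return int(out.strip())


def pick_python2(candidates=("python2", "python")):
    """Return the first candidate that really is Python 2 (hypothetical helper)."""
    for exe in candidates:
        if python_major_version(exe) == 2:
            return exe
    return None
```

Checking `python2` first and only then `python` covers both Ubuntu-style systems and distros where `python` is Python 3.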
<JAA> As it should be. Python 2 needs to die already. [14:12]
<mls> But Warcio crapped out because of it [14:12]
<JAA> Yeah, I know. [14:12]
<mls> But but.. [14:12]
<JAA> It's why I don't run newsgrabber on my machines. [14:12]
<mls> Oh, fairynuf [14:12]
<JAA> All projects except this and Flickr (and Vidme) are Python 3 compatible.
(Flickr uses a Python 2 only package for processing WARCs, Vidme had the same thing as this.)
[14:12]
<Igloo> This needs some serious time spending on it [14:15]
<JAA> Yeah [14:15]
<Igloo> I'm busy rewriting my Cisco Firepower migration tool for work at the minute.
as the new version breaks *everything*
[14:15]
<mls> Ugh [14:15]
<Igloo> and i'm going python2 to 3 as well
So i've rm -rfd the directory and gone back at it.
[14:16]
<JAA> Fitting name though, if it sets everything on fire. :-) [14:16]
<Igloo> True [14:16]
<Smiley> POEWRRRRR!!!!
so my birthday pr didn't go in :P
Smiley ponders what to try and fix now
Also what's the easiest way of testing? - fire up a warrior, change the git origin?
[14:22]
<Kaz> it's a bit of a pain with warrior.. as it wipes the repo each reboot
and will pull the new repo from the data it gets from warriorhq
you really can't win
[14:24]
<Smiley> but for a quick test you could just edit the repo
git pull, see if it works?
obviously that's not tested on a newly booted warrior, but.
[14:28]
<Igloo> You can just edit the local file
and manually re-launch it
For testing.
[14:35]
.... (idle for 16mn)
<mls> I'd edit /etc/hosts and do some magic there in case you want to redirect to your local resources
Strictly for testing purposes
[14:51]
<midas3> wink wink [14:52]
Kaz screams internally [14:52]
***Kaz changes topic to: https://github.com/ArchiveTeam/NewsGrabber-Warrior // Warrior v3 isn't supported, use v2 [14:53]
<mls> ^_^ [14:53]
........ (idle for 39mn)
***blitzed has joined #newsgrabber [15:32]
<jrwr> Kaz: screams externally
arkiver: fix it :P
[15:32]
..... (idle for 20mn)
<Smiley> INFO Fetched \u2018https://static.xx.fbcdn.net/rsrc.php/v3ifM24/yJ/l/en_GB/CSSBackgroundPattern.art\u2019: 400 Bad Request. Length: 0 [text/html; charset=UTF-8].
I presume these are normal and ok?
[15:53]
<Igloo> Yeah [15:53]
<Smiley> or is the unicode still busted?
oh yeah tried to load it manually. deffo dead
Deduplicating digest sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ, url https://static.xx.fbcdn.net/rsrc.php/v3iQrP4/yd/l/en_GB/3d-bkgrd-btn-12-2x.jpg
so yey dedupe works
i believe yt-dl is now fixed...
progress smells good
It only took me crying about it 2 days :P
Does the V3 warrior handle high load better?
as I presume the web component is outside of the project?
[15:53]
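The "Deduplicating digest" line shows the usual WARC payload-digest form: `sha1:` followed by the Base32 encoding of the SHA-1 of the payload. Below is a sketch of computing it. Incidentally, `3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ` is the digest of an empty payload, so that particular fbcdn URL most likely returned a zero-length body. `warc_sha1_digest` is a hypothetical name.

```python
import base64
import hashlib


def warc_sha1_digest(payload):
    """WARC-style payload digest: "sha1:" + Base32(SHA-1(payload))."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


# Two payloads with the same digest are what typically lets a WARC writer
# record a revisit instead of storing the same body twice.
print(warc_sha1_digest(b""))  # -> sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
```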
also, we should offer the warrior as an internet traffic anonymisation assistance tool
anyone looking at my traffic is going to be going 'wtf?!'
Also, do the tasks that fail matter?
[16:08]
......................... (idle for 2h3mn)
<Kaz> yes but no
they can be requeued
[18:13]
................. (idle for 1h22mn)
***mls has left [19:35]
octothorp has quit IRC (Read error: Connection reset by peer)
octothorp has joined #newsgrabber
[19:46]
blitzed has quit IRC (Read error: Connection reset by peer) [19:57]
.... (idle for 15mn)
blitzed has joined #newsgrabber [20:12]
........... (idle for 54mn)
<HCross2> I logged into BunnyCDN to have a look... https://usercontent.irccloud-cdn.com/file/6Z4nfksX/image.png [21:06]
..... (idle for 22mn)
also. Discovery in Melbourne, Australia should be up soon [21:28]
***Atom-- has joined #newsgrabber [21:35]
Atom has quit IRC (Read error: Operation timed out) [21:41]
Atom-- has quit IRC (Read error: Operation timed out) [21:46]
.............. (idle for 1h5mn)
<Kaz> did youtube-dl get fixed today?
I'm so lost on where we are with this
[22:51]
<JAA> My PR should've fixed it, I think, but I don't know really. [22:54]
<Kaz> just waiting on a video item to hit my warrior [22:54]
looks okay, I don't get an error about missing youtube-dl now
thanks Jaa (And Smiley too, even though I didn't use your PR)
[22:59]
<JAA> Sweet [23:01]
