#newsgrabber 2017-11-02,Thu

*** svchfoo3 has quit IRC (Read error: Operation timed out) [01:07]
................ (idle for 1h15mn)
wolfkin has quit IRC (Ping timeout: 255 seconds) [02:22]
.................... (idle for 1h36mn)
wolfkin has joined #newsgrabber
wolfkin has quit IRC (Client Quit)
[03:58]
................................................................ (idle for 5h19mn)
logchfoo3 starts logging #newsgrabber at Thu Nov 02 09:21:08 2017
logchfoo3 has joined #newsgrabber
dxrt has joined #newsgrabber
arkiver has joined #newsgrabber
svchfoo1 sets mode: +o arkiver
[09:21]
.................... (idle for 1h35mn)
<Igloo> JAA: If we're losing pipelines, that offer still stands. I can add either a US one with ~100 GB of disk space and 1 Gbps, or, in a few days, an EU one with 2 TB and 100 Mbps. [10:59]
<JAA> Yeah, I'll ask David about it. [11:06]
.............................................. (idle for 3h48mn)
*** rm0 has joined #newsgrabber [14:54]
<rm0> hey! is this the right place to ask a question about running the warrior? [14:55]
<Igloo> Sure [14:56]
<rm0> I set up the Docker image, and it says it is running, with network traffic in the web interface graph in the corner and on the docker0 interface. I wanted to make sure things are really working, though, because my school network can be a bit picky. I went to http://tracker.archiveteam.org/newsgrabber/#show-all and did not see my nickname there. Is there a way to tell if what I'm doing is helping? [14:59]
<Igloo> How long has it been running? [14:59]
<rm0> several hours, less than 11, more than 6
it says "The warrior is beginning work on a project" and the current project tab is blank
[15:00]
<Igloo> OK, I'm not sure about the Docker image... But it *sounds* like it hasn't started yet [15:01]
<JensRex> Can you attach a terminal to Docker?
Off topic, but Docker seems a terrible thing to use for archiveteam.
rm0: I haven't used the new Docker image (and never will), but in the old Warrior, there was a "project" you ran to update the Warrior before it would run any other jobs.
Are there any projects of that sort for you to select?
In any case, something is wrong, because Newsgrabber would never run for 11 hours and not finish any jobs.
My server has finished two while I've been typing this.
[15:01]
<rm0> I ran the update project, it restarted.
I selected a project and it says "The warrior is idle. Select a project."
I have a shell attached
[15:07]
<JensRex> Blah. Haven't got my Warrior VM set up since the reinstall. [15:10]
<rm0> I saved settings a second time and it is now "beginning a project"
what would be really nice is something in the web interface saying what it's doing, like "getting urls; downloading newssite.com/news; uploading newssite.com/news.warc"
[15:10]
<JensRex> That is supposed to happen when a project is running. [15:12]
<Igloo> ^ there should be a web GUI :) [15:12]
<rm0> There is, but it is blank for some reason
http://imgur.com/aWyUFCal.png
[15:13]
<JensRex> If you have a shell attached, see if it has downloaded any project files. [15:14]
<rm0> where are those? [15:14]
<JensRex> I can't remember, and I just now discovered I *lost* my Warrior VM entirely in the reinstall.
Urgh.
https://archive.org/download/Warrior100G/Warrior-100G.ova <-- Is this the one you're using?
[15:15]
<rm0> I am using https://github.com/ArchiveTeam/warrior-dockerfile
I was trying to get a virtual machine running but it didn't work
[15:16]
<JensRex> The regular Warrior should work, but it's older than dirt.
I'm just running the scripts manually. Easier to debug.
The Docker nonsense hasn't been extensively tested as far as I know.
[15:17]
<rm0> where can I get the scripts? [15:18]
<JensRex> https://github.com/ArchiveTeam/NewsGrabber-Warrior [15:18]
<rm0> got it downloaded, had to replace "__init__.pyc" with "__init__.py" on line 21 of pipeline.py for it to start
it looks like it's running now
getting a lot of timeouts and 400 Bad Requests
also a unicode error, is this supposed to be python2?
[15:29]
<JensRex> Yes. [15:34]
<rm0> that was why I got the pyc error too [15:37]
<JensRex> Shouldn't need to edit pipeline.py [15:37]
<rm0> I set it back to normal and it's running now; it was a Python 2/3 thing [15:38]
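JensRex confirms the scripts expect Python 2, and the `__init__.pyc` problem turned out to be a Python 2/3 mismatch. A minimal sketch of a fail-fast interpreter check you could put at the top of a wrapper script, assuming you want a clear message instead of a confusing error later on. `is_python2` is a hypothetical helper, not part of NewsGrabber-Warrior:

```python
import sys


def is_python2(version_info=None):
    """Return True if `version_info` (default: the running interpreter's) is 2.x.

    Hypothetical helper: the NewsGrabber-Warrior scripts target Python 2,
    so running them under Python 3 can fail in confusing ways.
    """
    if version_info is None:
        version_info = sys.version_info
    # sys.version_info behaves like a tuple: (major, minor, micro, ...)
    return version_info[0] == 2
```

For example, a wrapper could call `if not is_python2(): sys.exit("run this with python2")` before starting the pipeline.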
<JensRex> I've been getting youtube-dl errors too. Unsure how to fix. [15:38]
<rm0> youtube-dl is so useful
I also need dnspython, right?
[15:42]
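Nobody answers whether the pipeline needs dnspython, so treat that as an assumption. If it does, one way to check is to try importing its package (dnspython installs as `dns`) with the same interpreter that runs the pipeline, since `python2` and `python3` keep separate site-packages. `module_available` is a hypothetical helper:

```python
import importlib


def module_available(name):
    """Return True if module `name` can be imported by *this* interpreter.

    python2 and python3 have separate site-packages, so a module installed
    with pip3 is invisible to python2 (and vice versa).
    """
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# dnspython is imported as "dns"; whether the pipeline needs it is an assumption here.
print("dnspython available: %s" % module_available("dns"))
```

Run it with `python2` to match the interpreter the scripts expect.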
<JensRex> arkiver: https://bpaste.net/show/d8c5359c68d3
Latest fail.
[15:42]
<rm0> oh I got that one [15:43]
<JensRex> So it's not just me being incompetent.
pip
meh... not my terminal
[15:43]
<rm0> I also don't have youtube-dl installed for some reason?
check if you do
[15:44]
<JensRex> I have. But not the one from apt, because it wants to pull 5000 packages, including X11.
Installed the one from pip2.
[15:44]
<rm0> do you have the command working because it looks like that's what it's looking for
do a which youtube-dl
I get "/usr/bin/youtube-dl" now
[15:45]
<JensRex> /usr/local/bin/youtube-dl [15:46]
<rm0> hm it should be able to find it
maybe that's not in the PATH subprocess is looking in?
I'm not sure how all that works
[15:46]
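rm0's guess is that `/usr/local/bin` (where JensRex's pip-installed youtube-dl lives) is not on the PATH the pipeline's subprocess sees. A minimal sketch of how a shell-style PATH lookup works: walk the PATH entries left to right and return the first executable match. It is POSIX-oriented (no Windows `PATHEXT` handling), works on Python 2 and 3, and `find_executable` is a hypothetical helper, roughly what `shutil.which` does on Python 3.3+:

```python
import os


def find_executable(name, path=None):
    """Return the full path of the first executable called `name` on PATH, or None.

    Walks the PATH entries left to right, like a POSIX shell resolving a bare
    command such as `youtube-dl`. Hypothetical helper; works on Python 2 and 3.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Must be a regular file (not a directory) that we are allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print("PATH seen by this process: %s" % os.environ.get("PATH"))
print("youtube-dl resolves to: %s" % find_executable("youtube-dl"))
```

Run it under the same interpreter and environment as the pipeline. If it prints `None` while `which youtube-dl` in your shell finds the program, the directory holding youtube-dl is missing from the PATH the pipeline inherits.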
<JensRex> Me neither. [15:47]
<rm0> run this: python2 -c "import subprocess; subprocess.call('youtube-dl -h', shell=True)" [15:49]
Everything's stuck on uploading, do you know what port it uses? [15:57]
rsync error: timeout in data send/receive (code 30) at io.c(195) [sender=3.1.1] [16:03]
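rm0 later reports that the rsync port is blocked on the school network. Assuming the upload target is an rsync daemon (an `rsync://` URL) on its default TCP port 873, a minimal sketch of a reachability check. `tcp_port_reachable` is a hypothetical helper, and any host name you pass it is your own choice; the real upload target is not named in this log:

```python
import socket

# Default TCP port of the rsync daemon protocol (rsync:// URLs).
RSYNC_DAEMON_PORT = 873


def tcp_port_reachable(host, port=RSYNC_DAEMON_PORT, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    A False result suggests a firewall, routing, or DNS problem; a True result
    only shows the port is open, not that an upload will succeed.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, socket.error):
        # socket.error also covers refused connections and failed DNS lookups
        # (socket.gaierror is a subclass of it).
        return False
    conn.close()
    return True
```

For example, `tcp_port_reachable("rsync.example.org")` (a placeholder host) returning `False` points at the network. A `True` result does not rule out the server side: a `timeout in data send/receive (code 30)` error can also come from a slow server, which fits HCross2's remark that the disks were busy.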
<JensRex> Nope. [16:10]
<rm0> did you get the youtube-dl issue fixed? [16:21]
<JensRex> No. [16:21]
......................... (idle for 2h4mn)
<arkiver> hello
Do NOT edit any of the scripts and run the project with the edited scripts
rm0: ^
[18:25]
<rm0> got it [18:26]
<arkiver> please stop running the project [18:26]
<rm0> it was not able to upload
nothing was broken
the rsync port is blocked for me
[18:27]
<arkiver> what is the error you get
are you running the edited scripts
[18:28]
<rm0> no, I am not running the edited scripts. the only thing I changed was where it looks for the files to import.
https://bpaste.net/show/4bef0595c2a1
[18:29]
<arkiver> HCross2: Do you have any idea about that rsync problem? [18:42]
<HCross2> Is it just one user?
The disks are doing a lot at the moment
[18:43]
......................... (idle for 2h0mn)
*** rm0 has quit IRC (Read error: Operation timed out) [20:44]
