#warrior 2018-12-09,Sun

Time Nickname Message
00:29 🔗 teej_ The Newsgrabber warrior project doesn't seem to do anything. I am using the latest VirtualBox appliance (archiveteam-warrior-v3-20171013.ova) that was provided.
00:31 🔗 JAA Already answered in #newsgrabber, but for completeness: Newsgrabber in the warrior is unsupported.
02:35 🔗 human4565 has joined #warrior
06:37 🔗 zera has joined #warrior
06:41 🔗 zera hey, not sure if this is the right place, but I've been trying to get tumblr-grab running and it seems like the AUR wget-lua package is failing during the build process
06:57 🔗 zera well guess I'll just use the warrior dockerfile
06:58 🔗 zera has quit IRC (Quit: Page closed)
09:20 🔗 jut I can't get the tumblr script to run on a new Ubuntu 16.04.5 install. I get "bash: run-pipeline: command not found"
09:20 🔗 jut I tried pip2
09:56 🔗 kiska Did you try pip install --upgrade seesaw?
09:56 🔗 kiska usually that works for me
09:59 🔗 jut I did, I'm installing 18.04 to see what happens
12:12 🔗 Silvan has quit IRC (Read error: Operation timed out)
12:19 🔗 SilSte has joined #warrior
12:26 🔗 nertzy has joined #warrior
12:59 🔗 nertzy has quit IRC (Quit: This computer has gone to sleep)
16:25 🔗 human4565 what is the proper procedure for finding out which tasks were aborted by a warrior shutdown, and for reporting them?
16:27 🔗 JAA human4565: You can ask someone to requeue all items associated with either your username or your IP address. Won't work if you run multiple instances on the same IP obviously.
16:27 🔗 human4565 JAA: I assume that will only requeue items which were not finished?
16:27 🔗 JAA But generally, don't worry about it; normally all items get requeued at the end anyway, assuming there's still time to do so.
16:27 🔗 JAA Correct.
16:28 🔗 human4565 I guess that leaves one concern. Is there any priority in queuing, such that more important items might be assigned early on?
16:28 🔗 JAA The internal term for this is a "claim": a pipeline claims an item and later reports it as done. If that doesn't happen, the item remains claimed under your username, and the "requeueing" actually just means releasing those claims.
16:29 🔗 JAA The same thing happens when an item fails, by the way.
16:29 🔗 JAA That item would just remain claimed and get released eventually.
16:30 🔗 JAA Whether items are queued by priority depends strongly on the project. If I were to run such a project, I'd make sure to requeue all unresolved claims for items of the highest priority before adding a batch of less-urgent items.
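The claim model JAA describes above can be illustrated with a small sketch. This is a hypothetical, in-memory toy written for illustration only; it is NOT the real Archive Team tracker, and the class name ToyTracker and its methods (add, claim, report_done, release_claims) are invented here. It assumes items carry a numeric priority, that a finished item is reported as done, and that a failed or aborted item is simply never reported and stays claimed until its claims are released by username or IP.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyTracker:
    """Hypothetical in-memory tracker illustrating claims; NOT the real Archive Team tracker."""
    # Unclaimed items, one FIFO queue per priority level (higher number = more urgent).
    queues: dict = field(default_factory=dict)
    # item -> (username, ip, priority) for items currently claimed by some pipeline.
    claims: dict = field(default_factory=dict)
    done: set = field(default_factory=set)

    def add(self, item, priority=0):
        self.queues.setdefault(priority, deque()).append(item)

    def claim(self, username, ip):
        # Hand out an item from the highest-priority non-empty queue.
        for priority in sorted(self.queues, reverse=True):
            if self.queues[priority]:
                item = self.queues[priority].popleft()
                self.claims[item] = (username, ip, priority)
                return item
        return None

    def report_done(self, item):
        # Only a finished item leaves the claim table. A failed or aborted item
        # is never reported, so it simply stays claimed until released.
        if self.claims.pop(item, None) is not None:
            self.done.add(item)

    def release_claims(self, username=None, ip=None):
        # "Requeueing" here just means releasing unresolved claims that match
        # the given username or IP address.
        released = [item for item, (user, addr, _) in self.claims.items()
                    if (username is not None and user == username)
                    or (ip is not None and addr == ip)]
        for item in released:
            _, _, priority = self.claims.pop(item)
            # Put released items at the front of their priority queue so they
            # go out again before fresh items of the same priority.
            self.queues.setdefault(priority, deque()).appendleft(item)
        return released
```

For example, if user alice claims a high-priority item and her warrior is shut down before reporting it, `release_claims(username="alice")` returns that item to the front of the high-priority queue, and the next `claim` hands it to another pipeline ahead of any less-urgent items. Releasing by IP works the same way, which is why it cannot tell apart several instances sharing one address.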
16:30 🔗 human4565 ok, I'm working on tumblr, which seems like it could be a bit rushed
16:31 🔗 JAA Right. Not sure how the item list was prepared there exactly.
16:31 🔗 human4565 ok
16:56 🔗 human4565 is it possible to open a GUI for a VirtualBox warrior that was started headless?
17:26 🔗 Atom__ has quit IRC (Read error: Operation timed out)
17:27 🔗 Atom__ has joined #warrior
17:34 🔗 sep332 I got this error on one warrior job but the others seem to be going ok:
17:34 🔗 sep332 rsync error: error in socket IO (code 10) at clientserver.c(128) [sender=3.1.1]
17:34 🔗 sep332 Is there a way to recover this or am I going to have to abandon it?
18:30 🔗 vectr0n has left
19:31 🔗 pukkie has joined #warrior
20:08 🔗 human4565 has quit IRC (Quit: Leaving)
20:43 🔗 pukkie has left
22:22 🔗 tuluu has joined #warrior
22:22 🔗 tuluu has quit IRC (Remote host closed the connection)
22:23 🔗 tuluu has joined #warrior
22:59 🔗 twoTBHetz has joined #warrior
23:02 🔗 twoTBHetz Hi, I just set up my first warrior for tumblr (in a headless VirtualBox). I gave it 2 GB of RAM and 1 CPU, and left the disk at the default 60 GB. My server is still idling at around 0.12 load with 4 cores (2 hyper-threads). Are there ways I can keep it busier?
23:03 🔗 elomatreb has joined #warrior
23:05 🔗 twoTBHetz Also, the "data transfer graph" in the web interface is flat at 0, but the logs suggest it is working
23:12 🔗 JAA twoTBHetz: If you want to get the most out of your hardware, don't use the warrior but run the scripts directly.
23:12 🔗 JAA And yeah, I think someone said that the transfer graph is broken.
23:12 🔗 twoTBHetz I'm on Solaris... I doubt they run there directly
23:15 🔗 twoTBHetz where can i get a tarball of the scripts and read about the dependencies?
23:16 🔗 JAA https://github.com/archiveteam/tumblr-grab#running-without-a-warrior
23:18 🔗 twoTBHetz great so i just have to build pip ;)
23:19 🔗 JAA Might be easier to set up a Debian VM and do it there. :-)
23:21 🔗 twoTBHetz how is a debian vm faster than warrior?
23:22 🔗 JAA Warrior is limited in terms of concurrency. If you have your own VM, you can do it all manually (and also get rid of the web dashboard).
23:24 🔗 twoTBHetz you mean like being limited to 6 parallel things?
23:24 🔗 JAA Yeah
23:24 🔗 JAA You can run at a higher concurrency and multiple pipelines in parallel.
23:25 🔗 twoTBHetz I see.
23:29 🔗 twoTBHetz preferred pip version?
23:29 🔗 JAA ¯\_(ツ)_/¯
23:29 🔗 JAA Shouldn't matter as long as it can install packages.
23:30 🔗 JAA As for Python 2/3, I think the Tumblr project should work with both. I'm using 3.4 on one machine, FWIW.
23:31 🔗 elomatreb I'm using the warrior to work on the tumblr project, and one of the items it picked up is running for 7 hours and 130k URLs by now, is that to be expected or did something break?
23:32 🔗 JAA No, that's expected. Some of those blogs are huge.
23:32 🔗 JAA Anything specifically about the Tumblr project in #tumbledown please.
23:32 🔗 elomatreb Oh, ok, sorry