Time |
Nickname |
Message |
00:29
🔗
|
teej_ |
The Newsgrabber warrior project doesn't seem to do anything. I am using the latest VirtualBox appliance (archiveteam-warrior-v3-20171013.ova) that was provided. |
00:31
🔗
|
JAA |
Already answered in #newsgrabber, but for completeness: Newsgrabber in the warrior is unsupported. |
02:35
🔗
|
|
human4565 has joined #warrior |
06:37
🔗
|
|
zera has joined #warrior |
06:41
🔗
|
zera |
hey not sure if this is the right place here, but I've been trying to get tumblr-grab running, but it seems like the aur wget-lua is failing during the build process |
06:57
🔗
|
zera |
well guess I'll just use the warrior dockerfile |
06:58
🔗
|
|
zera has quit IRC (Quit: Page closed) |
09:20
🔗
|
jut |
I can't get the tumblr script to run on a new Ubuntu 16.04.5 install. I get "bash: run-pipeline: command not found". |
09:20
🔗
|
jut |
I tried pip2 |
09:56
🔗
|
kiska |
Did you try pip install --upgrade seesaw? |
09:56
🔗
|
kiska |
usually that works for me |
09:59
🔗
|
jut |
I did, I'm installing 18.04 to see what happens |
12:12
🔗
|
|
Silvan has quit IRC (Read error: Operation timed out) |
12:19
🔗
|
|
SilSte has joined #warrior |
12:26
🔗
|
|
nertzy has joined #warrior |
12:59
🔗
|
|
nertzy has quit IRC (Quit: This computer has gone to sleep) |
16:25
🔗
|
human4565 |
what is the proper procedure to ascertain which tasks have been aborted through warrior shutdown and report them? |
16:27
🔗
|
JAA |
human4565: You can ask someone to requeue all items associated with either your username or your IP address. Won't work if you run multiple instances on the same IP obviously. |
16:27
🔗
|
human4565 |
JAA: I assume that will only requeue items which were not finished? |
16:27
🔗
|
JAA |
But generally, don't worry about it; all items get requeued at the end anyway normally, assuming there's still time to do so. |
16:27
🔗
|
JAA |
Correct. |
16:28
🔗
|
human4565 |
I guess that leaves one concern. Is there any priority in queuing, such that more important items might be assigned early on? |
16:28
🔗
|
JAA |
The internal term for this is a "claim": a pipeline claims an item and later reports it as done. If that doesn't happen, the item remains claimed under your username, and the "requeueing" actually just means releasing those claims. |
16:29
🔗
|
JAA |
The same thing happens when an item fails, by the way. |
16:29
🔗
|
JAA |
That item would just remain claimed and get released eventually. |
16:30
🔗
|
JAA |
Whether items are queued by priority depends strongly on the project. If I were to run such a project, I'd make sure to requeue all unresolved claims for items of the highest priority before adding a batch of less-urgent items. |
16:30
🔗
|
human4565 |
ok, I'm working on tumblr which seems it could be a bit rushed |
16:31
🔗
|
JAA |
Right. Not sure how the item list was prepared there exactly. |
16:31
🔗
|
human4565 |
ok |
16:56
🔗
|
human4565 |
is it possible to open a gui when a virtualbox warrior was started headless |
17:26
🔗
|
|
Atom__ has quit IRC (Read error: Operation timed out) |
17:27
🔗
|
|
Atom__ has joined #warrior |
17:34
🔗
|
sep332 |
I got this error on one warrior job but the others seem to be going ok: |
17:34
🔗
|
sep332 |
rsync error: error in socket IO (code 10) at clientserver.c(128) [sender=3.1.1] |
17:34
🔗
|
sep332 |
Is there a way to recover this or am I going to have to abandon it? |
18:30
🔗
|
|
vectr0n has left |
19:31
🔗
|
|
pukkie has joined #warrior |
20:08
🔗
|
|
human4565 has quit IRC (Quit: Leaving) |
20:43
🔗
|
|
pukkie has left |
22:22
🔗
|
|
tuluu has joined #warrior |
22:22
🔗
|
|
tuluu has quit IRC (Remote host closed the connection) |
22:23
🔗
|
|
tuluu has joined #warrior |
22:59
🔗
|
|
twoTBHetz has joined #warrior |
23:02
🔗
|
twoTBHetz |
Hi, I just set up my first warrior for #tumblr (in a headless VirtualBox). I gave it 2 GB of RAM, did not change the 60 GB disk size, and gave it 1 CPU. My server is still idling around at 0.12 load with 4 cores (2 hyper-threads). Are there some ways I can get it more busy? |
23:03
🔗
|
|
elomatreb has joined #warrior |
23:05
🔗
|
twoTBHetz |
Also the "data transfer graph" in the web interface is flat 0, but the logs suggest it is working |
23:12
🔗
|
JAA |
twoTBHetz: If you want to get the most of your hardware, don't use the warrior but run the scripts directly. |
23:12
🔗
|
JAA |
And yeah, I think someone said that the transfer graph is broken. |
23:12
🔗
|
twoTBHetz |
i am under solaris ... i doubt they run directly |
23:15
🔗
|
twoTBHetz |
where can i get a tarball of the scripts and read about the dependencies? |
23:16
🔗
|
JAA |
https://github.com/archiveteam/tumblr-grab#running-without-a-warrior |
23:18
🔗
|
twoTBHetz |
great so i just have to build pip ;) |
23:19
🔗
|
JAA |
Might be easier to set up a Debian VM and do it there. :-) |
23:21
🔗
|
twoTBHetz |
how is a debian vm faster than warrior? |
23:22
🔗
|
JAA |
Warrior is limited in terms of concurrency. If you have your own VM, you can do it all manually (and also get rid of the web dashboard). |
23:24
🔗
|
twoTBHetz |
you mean like being limited to 6 parallel things? |
23:24
🔗
|
JAA |
Yeah |
23:24
🔗
|
JAA |
You can run at a higher concurrency and multiple pipelines in parallel. |
23:25
🔗
|
twoTBHetz |
I see. |
23:29
🔗
|
twoTBHetz |
preferred pip version? |
23:29
🔗
|
JAA |
¯\_(ツ)_/¯ |
23:29
🔗
|
JAA |
Shouldn't matter as long as it can install packages. |
23:30
🔗
|
JAA |
As for Python 2/3, I think the Tumblr project should work with both. I'm using 3.4 on one machine, FWIW. |
23:31
🔗
|
elomatreb |
I'm using the warrior to work on the tumblr project, and one of the items it picked up is running for 7 hours and 130k URLs by now, is that to be expected or did something break? |
23:32
🔗
|
JAA |
No, that's expected. Some of those blogs are huge. |
23:32
🔗
|
JAA |
Anything specifically about the Tumblr project in #tumbledown please. |
23:32
🔗
|
elomatreb |
Oh, ok, sorry |
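
The claim lifecycle JAA describes between 16:27 and 16:30 (a pipeline claims an item, later reports it as done, and "requeueing" just releases any claims still outstanding) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ClaimTracker` and its method names are hypothetical, made up for illustration, and are not the real tracker's code or API.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimTracker:
    """Toy model of the claim lifecycle (hypothetical, not the real tracker)."""
    todo: list = field(default_factory=list)    # items waiting to be handed out
    claims: dict = field(default_factory=dict)  # item -> username holding the claim
    done: set = field(default_factory=set)      # items reported as finished

    def claim(self, username):
        """Hand the next queued item to a pipeline and record the claim."""
        if not self.todo:
            return None
        item = self.todo.pop(0)
        self.claims[item] = username
        return item

    def report_done(self, item):
        """A pipeline reports an item as finished; its claim is cleared."""
        self.claims.pop(item, None)
        self.done.add(item)

    def release_claims(self, username):
        """'Requeue': release every outstanding claim held by a username."""
        released = [i for i, u in self.claims.items() if u == username]
        for item in released:
            del self.claims[item]
            self.todo.append(item)
        return released


tracker = ClaimTracker(todo=["blog-a", "blog-b", "blog-c"])
first = tracker.claim("human4565")
second = tracker.claim("human4565")
tracker.report_done(first)  # this item finished normally
# The warrior shuts down before `second` is reported, so it stays claimed
# until someone releases the claims held under that username.
released = tracker.release_claims("human4565")
```

In this model only the unfinished item goes back into the queue, which matches JAA's "Correct." at 16:27: items already reported as done are not touched by a requeue.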