19:26 <ersi> so if you've got any more questions/thoughts, just shoot :-)
19:26 <shaqfu> re: project RAM use: is it possible for the tracker to pass smaller jobs to low-power instances?
19:26 <ersi> no such distinction today
19:27 <ersi> unless we intentionally make "smaller" jobs by knowing they'll probably be low-power tasks
19:27 <shaqfu> Hm, does the tracker split the whole task into jobs first, or does it do it on a per-request basis?
19:28 <ersi> one task = one job
19:28 <shaqfu> I mean, task as in "all of FormSpring"
19:29 <ersi> Ah. Well, that's entirely a human task currently, as far as I know
19:29 <ersi> i.e. the awesome person who researches the target, writes the project code and so on (most often, awesomely and much appreciatedly done by alard)
19:30 <shaqfu> It could be split between normal and low-power jobs on a per-request basis, but that seems like it may require totally restructuring the tracker
19:31 <ersi> I assume that's mostly a lot of difficult work estimating what's a "big" and what's a "small" task when breaking up the archival job
19:31 <shaqfu> It's not just an issue of "here's 1000 URLs" vs "here's 100"?
19:32 <ersi> it's impossible to know beforehand how many links each link will link to, how many resources there are per page, and so on
19:32 <shaqfu> Gotcha
19:32 <ersi> unless, you know, we crawl them all beforehand... but we might as well grab all of it as we go in that case
19:33 <shaqfu> We could just build the image with loads of swap ;)
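The swap idea is a one-time setup on the image. A minimal sketch, assuming a Debian-style Pi image with root access; the path and size here are illustrative assumptions, not from the log:

```shell
# Sketch: add a 1 GiB swap file to a Pi image (run as root).
# Path and size are illustrative.
dd if=/dev/zero of=/var/swapfile bs=1M count=1024
chmod 600 /var/swapfile   # swap files must not be world-readable
mkswap /var/swapfile      # format the file as swap space
swapon /var/swapfile      # enable it immediately
# To enable it on every boot, append to /etc/fstab:
#   /var/swapfile none swap sw 0 0
```

Worth noting that swapping to an SD card is slow and wears the card, so this trades RAM pressure for I/O pain.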
19:33 <ersi> Indeed, and/or make it default to only do one/two tasks at a time
19:33 <ersi> and people who host the Pis could turn it up if it works
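Capping the task count is a client-side knob in the seesaw toolkit that ArchiveTeam pipelines use, so a low-RAM image could simply default it to 1. A hedged sketch; the exact flag spelling depends on the seesaw version, and `pipeline.py`/`YOURNICK` are placeholders:

```shell
# Assumed seesaw invocation with concurrency capped at 1 for a
# low-RAM machine; hosts with more headroom can raise the number.
run-pipeline --concurrent 1 pipeline.py YOURNICK
```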
19:34 <ersi> I'm up for giving it a try without modifying any of the current infra; I've got a 512 MB RAM Pi lying about
19:34 <shaqfu> Same, but mine's running a fairly large job right now
19:35 <shaqfu> (only in #archiveteam is a 1.56M crawl "fairly large"...)
19:35 <shaqfu> Would it be tricky to port to ARM?
19:37 <ersi> the only thing we need to do is compile wget(-lua), as far as I know
19:37 <shaqfu> There are .debs for 1.13, but not 1.14
19:37 <shaqfu> That's something on my to-do list
19:37 <ersi> yeah, but we need wget-lua, which is something that alard has cooked up :)
19:37 <ersi> it's based on wget 1.14
19:38 <shaqfu> Getting to 1.14 should be first in line, then, if only as a public good :)
19:38 <ersi> sure it isn't available in experimental?
19:41 <shaqfu> Dunno if there's an experimental repo on Raspbian
19:42 <ersi> ah
19:48 <shaqfu> Seems like a reasonable first step
22:23 <shaqfu> wget 1.14 builds out of the box on Raspbian
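For reference, an out-of-the-box build of the kind shaqfu describes would look roughly like this. This is a sketch, not a recipe from the log: the GNU mirror URL is the standard tarball location, and the prerequisite packages are assumptions about a stock Raspbian install:

```shell
# Sketch: build GNU wget 1.14 from the release tarball on Raspbian.
sudo apt-get install -y build-essential libssl-dev  # assumed prerequisites
wget https://ftp.gnu.org/gnu/wget/wget-1.14.tar.gz
tar xzf wget-1.14.tar.gz
cd wget-1.14
./configure --with-ssl=openssl
make
sudo make install
```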
22:25 <shaqfu> Suppose I'll keep trying to build wget-lua until it stops yelling about dependencies :)
22:57 <shaqfu> Hm, error building wget-lua: couldn't find css.c
22:58 <Cameron_D> I remember getting that error a few times; can't remember what needed to be done to fix it, though :(
22:59 <shaqfu> Everything else seems to be going along very well
23:00 <shaqfu> Cameron_D: Was it a missing library?
23:01 <Cameron_D> I think so; it was a strange library. Found the solution somewhere deep in Google
23:04 <shaqfu> Trying it with flex
23:07 <shaqfu> Hm, nope
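One possibility, offered as an assumption rather than something confirmed in the log: in wget's source tree, `css.c` is generated from `css.l` by flex at build time (release tarballs ship it pre-generated, git checkouts don't), so installing flex after a failed configure may not help on its own. Rebuilding from a clean tree gives flex a chance to actually run:

```shell
# Hypothetical fix attempt: ensure flex is present, then rebuild from a
# clean state so css.c can be (re)generated from css.l.
sudo apt-get install -y flex
cd wget-lua            # placeholder path to the wget-lua source tree
make distclean || true # drop stale build products and configure results
./configure
make
```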