[04:47] So, the warrior is the greatest thing ever
[04:58] ^
[05:02] lol
[05:02] surprised we didn't think of this earlier
[05:16] We did!
[05:16] I announced the idea a while ago
[05:16] And pre-ideas existed
[05:16] But now it's just an unstoppable juggernaut
[05:16] The real puzzler now is coming up with tasks other than 'download this website'
[05:38] keeping up with the fanfiction.net scrape, and maybe fictionpress as well
[05:48] I mean, there are things like the indexer job
[05:48] but yeah, it would be nice to always keep them busy
[07:24] Can it do the small-time compression / file-splitting stuff?
[07:24] or are they too large to sensibly attempt?
[07:27] creating metadata for anything? (I don't know the structure of the data within the archives..)
[13:56] SmileyG: Alard had them do that for the tars
[13:56] but I'm trying to think of more jobs
[13:58] tbh I don't know much of the process... so I can't really help :(
[13:58] We'll just have to keep thinking.
[13:58] preemptive backups of the "sites to be watched"?
[13:59] The problem isn't just processing for determining the contents of items on the archive.
[17:34] if someone wants to adapt the fanfiction.net scraper I wrote for seesaw, that'd be nice
[17:34] they're both in Python, so it's not a language-translation job
[17:34] you'll also need a tracker, which can get a bit expensive to run, especially if you want to do continuous backup
[17:35] not sure if the AT shared tracker is up to it
[18:28] Is the source up on the AT github space?
[18:32] yes, ffnet-grab
[18:33] I make no claims for it being *good* Python, but I like to think it is at least straightforward
[18:33] heh
[18:47] yipdw: What does the tracker need to do that the shared tracker can't do?
[19:09] alard: nothing special, it's just a lot of data
[19:11] alard: more specifically, frequent updates -- ff.net has 2 million or so users; each user is a work item, and I've noticed that puts a lot of stress on the tracker when e.g. generating progress graphs
[19:11] I did implement a quick-and-dirty trimming method in a private copy of the tracker, but the one I implemented discarded history instead of making past history less granular
[19:12] I'm not sure if you or someone else has fixed that yet; still reviewing universal-tracker history
[19:32] Ah, yes, no, that's still the same.
[19:35] Although it's mainly the stats page that becomes slow; the tracker still works.
[19:40] ah, ok
[19:40] right
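
On the 17:34 suggestion of adapting ffnet-grab for seesaw: the port would amount to wrapping the scraper's per-user logic in a pipeline task. Here is a minimal sketch assuming seesaw-kit's Pipeline/SimpleTask interface; the tracker URL, task name, and fetch logic are placeholders, not ffnet-grab's actual code.

```python
# Sketch of a seesaw pipeline, run via seesaw-kit's run-pipeline harness.
# Tracker URL and all task contents are illustrative assumptions.
from seesaw.pipeline import Pipeline
from seesaw.task import SimpleTask
from seesaw.item import ItemValue
from seesaw.tracker import GetItemFromTracker, SendDoneToTracker

TRACKER_URL = "http://tracker.example.org/ffnet"  # hypothetical

class FetchUser(SimpleTask):
    """Scrape one fanfiction.net user; each user ID is one work item."""
    def __init__(self):
        SimpleTask.__init__(self, "FetchUser")

    def process(self, item):
        user_id = item["item_name"]
        item.log_output("fetching ff.net user %s\n" % user_id)
        # ...the actual per-user scraping from ffnet-grab would go here...
        item["stats"] = {}  # placeholder; real pipelines report byte counts

pipeline = Pipeline(
    GetItemFromTracker(TRACKER_URL, "downloader-nick", "20121001.01"),
    FetchUser(),
    SendDoneToTracker(tracker_url=TRACKER_URL, stats=ItemValue("stats")),
)
```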
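
On the scale concern at 19:11 (roughly 2 million users, one work item each): seeding a queue that size is cheap if the todo list lives in Redis, as universal-tracker's does, though the key name below is illustrative rather than its actual schema.

```python
# Generic sketch of bulk-seeding work items into a Redis-backed todo
# queue. The key name is an assumption, not universal-tracker's schema.
import redis

r = redis.StrictRedis(host="localhost", port=6379)
QUEUE_KEY = "ffnet:todo"  # hypothetical key

def seed(user_ids, batch=10000):
    """Push work items in batches; pipelining avoids 2M round trips."""
    pipe = r.pipeline(transaction=False)
    for i, uid in enumerate(user_ids, 1):
        pipe.rpush(QUEUE_KEY, str(uid))
        if i % batch == 0:
            pipe.execute()
    pipe.execute()

seed(range(1, 2000001))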
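
As for the trimming problem also raised at 19:11, the fix yipdw describes (making past history less granular rather than discarding it) is a downsampling pass over the stats samples. A generic sketch, not universal-tracker's actual data model; the sample format and bucket width are assumptions:

```python
# Keep recent samples at full resolution; collapse older samples into
# one representative per fixed-width time bucket. Requires samples to
# be sorted oldest-first.
from itertools import groupby

def thin_history(samples, keep_recent=1000, bucket=3600):
    """samples: list of (unix_timestamp, done_count) tuples, oldest first."""
    old, recent = samples[:-keep_recent], samples[-keep_recent:]
    thinned = [list(grp)[-1]  # last sample in each bucket
               for _, grp in groupby(old, key=lambda s: s[0] // bucket)]
    return thinned + recent
```

Because the done count is cumulative, keeping the last sample per bucket preserves the totals the progress graphs need; only intermediate points within each old bucket are lost.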