#warrior 2012-08-23,Thu


Time Nickname Message
04:47 🔗 SketchCow So, the warrior is the greatest thing ever
04:58 🔗 underscor ^
05:02 🔗 BlueMax lol
05:02 🔗 BlueMax surprised we didn't think of this earlier
05:16 🔗 SketchCow We did!
05:16 🔗 SketchCow I announced the idea a while ago
05:16 🔗 SketchCow And pre-ideas existed
05:16 🔗 SketchCow But now it's just an unstoppable juggernaut
05:16 🔗 SketchCow The real puzzler now is coming up with tasks other than 'download this website'
05:38 🔗 bsmith094 keeping up with fanfiction.net scrape, and maybe fictionpress as well
05:48 🔗 underscor I mean, there's things like the indexer job
05:48 🔗 underscor but yeah, would be nice to always keep them busy
07:24 🔗 SmileyG Can it do the small time compression / splitting files stuff?
07:24 🔗 SmileyG or are they too large to sensibly attempt?
07:27 🔗 SmileyG creating metadata for anything? (I don't know the structure of the data within the archives..)
13:56 🔗 underscor SmileyG: Alard had them do that for the tars
13:56 🔗 underscor but I'm trying to think of more jobs
13:58 🔗 SmileyG tbh I don't know much of the process... so I can't really help :(
13:58 🔗 SketchCow We'll just have to keep thinking.
13:58 🔗 SmileyG preemp backups of the "sites to be watched" ?
13:59 🔗 SketchCow The problem just isn't processing for determining the contents of items on archive.
17:34 🔗 yipdw if someone wants to adapt the fanfiction.net scraper I wrote for seesaw, that'd be nice
17:34 🔗 yipdw they're both in Python, so it's not a language translation
17:34 🔗 yipdw you'll also need a tracker, which can get a bit expensive to run, especially if you want to do continuous backup
17:35 🔗 yipdw not sure if the AT shared tracker is up to it
18:28 🔗 ersi Is the source up on the AT github space?
18:32 🔗 yipdw yes, ffnet-grab
18:33 🔗 yipdw I make no claims for it being *good* Python, but I like to think it is at least straightforward
18:33 🔗 yipdw heh
18:47 🔗 alard yipdw: What does the tracker need to do that the shared tracker can't do?
19:09 🔗 yipdw alard: nothing special, it's just a lot of data
19:11 🔗 yipdw alard: more specifically, frequent updates -- ff.net has 2 million or so users; each user is a work item, and I've noticed that that puts a lot of stress on the tracker when e.g. generating progress graphs
19:11 🔗 yipdw I did implement a quick-and-dirty trimming method in a private copy of the tracker, but the one I implemented discarded history instead of making past history less granular
19:12 🔗 yipdw I'm not sure if you or someone else has fixed that yet; still reviewing universal-tracker history
19:32 🔗 alard Ah, yes, no, that's still the same.
19:35 🔗 alard Although it's mainly the stats page that becomes slow; the tracker still works.
19:40 🔗 yipdw ah, ok
19:40 🔗 yipdw right
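yipdw contrasts two ways of trimming tracker history: discarding old samples outright, or keeping them at a coarser granularity. Below is a minimal sketch of the second approach. It assumes each progress counter's history is stored as (unix timestamp, cumulative item count) pairs. The function name, the sample format, and the 24-hour / 1-hour window sizes are all hypothetical and are not taken from universal-tracker.

```python
from typing import Dict, List, Tuple

# Hypothetical sample format: (unix_timestamp_seconds, cumulative_items_done).
# Illustrative sketch only, not universal-tracker code.
Sample = Tuple[int, int]


def downsample_history(samples: List[Sample], now: int,
                       recent_window: int = 86_400,
                       bucket_size: int = 3_600) -> List[Sample]:
    """Keep samples newer than `recent_window` seconds at full resolution.

    Older samples are thinned to one per `bucket_size`-second bucket, so past
    history becomes less granular instead of being thrown away.
    """
    cutoff = now - recent_window
    kept_old: Dict[int, Sample] = {}  # bucket index -> last sample in bucket
    recent: List[Sample] = []
    for ts, value in sorted(samples):
        if ts >= cutoff:
            recent.append((ts, value))
        else:
            # Samples are processed in time order, so later ones overwrite
            # earlier ones and each bucket keeps its latest cumulative count.
            kept_old[ts // bucket_size] = (ts, value)
    return [kept_old[b] for b in sorted(kept_old)] + recent
```

Because the counts are cumulative, keeping the last sample in each old bucket preserves the overall shape of a progress graph while shrinking the number of points a stats page has to render.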
