[06:09] Do we have any sites we use for testing spiders and software
[06:10] I know we use wget now but I know we will need more advanced stuff later. I am working on converting some stuff I wrote for my own data mining to be more generalized so everyone can use them
[06:10] Right now I just spider the wiki as a test
[08:27] omf_: Not really, that I know of. We just prod sites we're gonna fetch afaik
[11:33] alard: What's the reason for limiting the concurrent tasks to 6?
[12:25] besides the official reason, being nice to the site should be one :)
[12:39] I would imagine it being related to not loading the connection or disk of the VM too much.
[18:52] soultcer: Mainly memory (and disk, but that's less critical). And the limit prevents people from starting hundreds of tasks and then failing to complete any of them.
[19:11] alard: Well with all the ec2 machines running there will still be a lot of abandoned tasks. I assume the tracker will reassign them eventually, right?
[19:15] Oh
[19:16] What will happen with uploads that are incomplete, i.e. where the client shut off during the upload?
[19:17] On rsync, they go into a partial dir, which can be ignored when creating a pack
[19:17] But what happens on curlupload?
[20:09] soultcer: The abandoned tasks will be reassigned if someone clicks the button.
[20:09] HTTP uploads, that depends on the web server. The Nginx server we used so far kept uploads in a temporary directory, so that's fine.
[20:10] Good
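The six-task limit discussed above (bounded mainly by memory and disk, per [18:52]) amounts to a concurrency cap. A minimal sketch of that idea with a semaphore — hypothetical illustration only, not the actual client code; the names `MAX_CONCURRENT_TASKS` and `run_task` are made up:

```python
import threading

MAX_CONCURRENT_TASKS = 6  # the limit asked about at [11:33]

# Each running task holds one slot; queued tasks block until a slot frees up,
# so memory/disk use stays bounded even if hundreds of tasks are requested.
task_slots = threading.Semaphore(MAX_CONCURRENT_TASKS)

def run_task(task_id, work):
    """Run `work(task_id)` while holding one of the limited task slots."""
    with task_slots:
        work(task_id)
```

This also reflects the point at [18:52]: starting hundreds of tasks at once mostly guarantees none of them finish, so a small fixed cap is the safer default.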
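The rsync behaviour mentioned at [19:17] — incomplete uploads land in a partial dir that is skipped when building a pack — could be filtered like this. A sketch only: the directory name `.rsync-partial` and the helper `files_for_pack` are assumptions (rsync's `--partial-dir` can be any name the server configures):

```python
import os

def files_for_pack(upload_dir, partial_dirname=".rsync-partial"):
    """Collect completed uploads, skipping rsync's partial directory.

    Files inside `partial_dirname` are incomplete transfers (the client
    shut off mid-upload), so they must not end up in a pack.
    """
    result = []
    for root, dirs, files in os.walk(upload_dir):
        # Prune the partial dir so os.walk never descends into it.
        dirs[:] = [d for d in dirs if d != partial_dirname]
        for name in files:
            result.append(os.path.join(root, name))
    return result
```

For HTTP uploads the equivalent safety comes from the server side, as noted at [20:09]: Nginx buffers the request body in a temporary directory, so a half-finished upload never appears among the completed files.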