[06:09] Do we have any sites we use for testing spiders and software
[06:10] I know we use wget now but I know we will need more advanced stuff later. I am working on converting some stuff I wrote for my own data mining to be more generalized so everyone can use them
[06:10] Right now I just spider the wiki as a test
[08:27] omf_: Not really, that I know of. We just prod sites we're gonna fetch afaik
[11:33] alard: What's the reason for limiting the concurrent tasks to 6?
[12:25] besides the official reason, being nice to the site should be one :)
[12:39] I would imagine it being related to not loading the connection or disk of the VM too much.
[18:52] soultcer: Mainly memory (and disk, but that's less critical). And the limit prevents people from starting hundreds of tasks and then failing to complete any of them.
[19:11] alard: Well with all the ec2 machines running there will still be a lot of abandoned tasks. I assume the tracker will reassign them eventually, right?
[19:15] Oh
[19:16] What will happen with uploads that are incomplete, i.e. where the client shut off during the upload?
[19:17] On rsync, they go into a partial dir, which can be ignored when creating a pack
[19:17] But what happens on curlupload?
[20:09] soultcer: The abandoned tasks will be reassigned if someone clicks the button.
[20:09] HTTP uploads, that depends on the web server. The Nginx server we used so far kept uploads in a temporary directory, so that's fine.
[20:10] Good
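The six-task limit discussed above (bounded mainly by memory and disk, per [18:52]) amounts to a concurrency cap. A minimal sketch of that idea with a semaphore — hypothetical illustration only, not the actual client code; the names `MAX_CONCURRENT_TASKS` and `run_task` are made up:

```python
import threading

MAX_CONCURRENT_TASKS = 6  # the limit asked about at [11:33]

# Each running task holds one slot; queued tasks block until a slot frees up,
# so memory/disk use stays bounded even if hundreds of tasks are requested.
task_slots = threading.Semaphore(MAX_CONCURRENT_TASKS)

def run_task(task_id, work):
    """Run `work(task_id)` while holding one of the limited task slots."""
    with task_slots:
        work(task_id)
```

This also reflects the point at [18:52]: starting hundreds of tasks at once mostly guarantees none of them finish, so a small fixed cap is the safer default.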
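The rsync behaviour mentioned at [19:17] — incomplete uploads land in a partial dir that is skipped when building a pack — could be filtered like this. A sketch only: the directory name `.rsync-partial` and the helper `files_for_pack` are assumptions (rsync's `--partial-dir` can be any name the server configures):

```python
import os

def files_for_pack(upload_dir, partial_dirname=".rsync-partial"):
    """Collect completed uploads, skipping rsync's partial directory.

    Files inside `partial_dirname` are incomplete transfers (the client
    shut off mid-upload), so they must not end up in a pack.
    """
    result = []
    for root, dirs, files in os.walk(upload_dir):
        # Prune the partial dir so os.walk never descends into it.
        dirs[:] = [d for d in dirs if d != partial_dirname]
        for name in files:
            result.append(os.path.join(root, name))
    return result
```

For HTTP uploads the equivalent safety comes from the server side, as noted at [20:09]: Nginx buffers the request body in a temporary directory, so a half-finished upload never appears among the completed files.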