#warrior 2013-02-26,Tue


Time Nickname Message
06:09 🔗 omf_ Do we have any sites we use for testing spiders and software?
06:10 🔗 omf_ I know we use wget now, but we will need more advanced stuff later. I am converting some stuff I wrote for my own data mining to be more generalized so everyone can use it
06:10 🔗 omf_ Right now I just spider the wiki as a test
08:27 🔗 ersi omf_: Not really, that I know of. We just prod sites we're gonna fetch afaik
11:33 🔗 soultcer alard: What's the reason for limiting the concurrent tasks to 6?
12:25 🔗 ewook besides the official reason, being nice to the site should be one :).
12:39 🔗 ersi I would imagine it being related to not loading the connection or disk of the VM too much. </imagination>
18:52 🔗 alard soultcer: Mainly memory (and disk, but that's less critical). And the limit prevents people from starting hundreds of tasks and then failing to complete any of them.
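
[Editor's sketch] The cap alard describes can be illustrated with a bounded worker pool. This is not the actual warrior code, only a minimal sketch that assumes each task is a plain Python callable. The names `run_tasks` and `MAX_CONCURRENT` are hypothetical; only the value 6 comes from the log.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 6  # the limit mentioned in the log; the name is hypothetical


def run_tasks(tasks, limit=MAX_CONCURRENT):
    """Run callables with at most `limit` in flight.

    Returns (results in input order, peak number of simultaneously
    running tasks), so the cap can be checked.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def wrapped(task):
        # Track how many tasks are running at the same time.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool never runs more than `limit` workers, so extra tasks
    # simply wait in the queue instead of competing for memory and disk.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, state["peak"]
```

Queued tasks do not start until a slot frees up, so memory and disk use stay bounded by the cap no matter how many tasks are requested.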
19:11 🔗 soultcer alard: Well, with all the EC2 machines running there will still be a lot of abandoned tasks. I assume the tracker will reassign them eventually, right?
19:15 🔗 soultcer Oh
19:16 🔗 soultcer What will happen with uploads that are incomplete, i.e. where the client shut off during the upload?
19:17 🔗 soultcer On rsync, they go into a partial dir, which can be ignored when creating a pack
19:17 🔗 soultcer But what happens on curlupload?
20:09 🔗 alard soultcer: The abandoned tasks will be reassigned if someone clicks the button.
20:09 🔗 alard HTTP uploads, that depends on the web server. The Nginx server we used so far kept uploads in a temporary directory, so that's fine.
20:10 🔗 soultcer Good
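
[Editor's sketch] The reason a temporary upload directory is "fine" is the same as for rsync's partial dir: an upload only appears in the final directory once it is complete. The sketch below shows that general pattern and is not how Nginx implements it. The function `store_upload` and its parameters are hypothetical, and it assumes the temporary and final directories are on the same filesystem so the rename is atomic.

```python
import os
import tempfile


def store_upload(chunks, final_dir, name, tmp_dir):
    """Write an upload to tmp_dir, then move it into final_dir when complete.

    `chunks` is any iterable of bytes. If it raises part-way (for example
    because the client disconnected), nothing appears in final_dir.
    """
    os.makedirs(final_dir, exist_ok=True)
    os.makedirs(tmp_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        final_path = os.path.join(final_dir, name)
        # Atomic when both paths are on the same filesystem (assumed here).
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Discard the partial file so it can never be packed by mistake.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A packing job that reads only `final_dir` then never sees a half-written file, whether the upload arrived over rsync or HTTP.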
