[00:30] how much free memory does the redis server on http://tracker.archiveteam.org/ have?
[00:31] I'm going to have many gigabytes of feed URLs
[19:32] ivan`: The tracker has a few (undocumented) features that may be useful for your project.
[19:33] (Nice to see someone using the Lua extension.)
[19:36] GetItemFromTracker returns a json object to the warrior. The properties of that object end up in the item dictionary. One of them, "item_name", is set by default, but you can add custom keys.
[19:37] The custom data is generated by a little bit of Ruby that runs on the tracker. This script gets the item name and can fill the data object with other things.
[19:37] Because it's Ruby, it can also read a file.
[19:39] In your case, I think it would be handy to put the batch IDs as the items in the Redis queue, and write the url list for each batch to a file with a filename that corresponds to the batch ID.
[19:39] The extra-parameters script can then read that file and add the urls to the json response.
[19:40] You can remove the interesting Custom* contraptions from your pipeline. :)
[19:41] And it'll save a lot of Redis memory, since only the batch IDs are kept in memory.
[19:45] Last trick: you can give URL lists to Wget via the STDIN pipe. For example: https://github.com/ArchiveTeam/yahoo-upcoming-grab/blob/master/pipeline.py#L276-L281
[19:46] For the Upcoming project, the tracker would add a "URL1\nURL2\n...URLn\n" value in the item["task_urls"] field of the json response.
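
The [19:36] message describes the shape of the tracker's response. A minimal sketch of what that might look like on the warrior side; "item_name" is the documented default key, while "task_urls" and the URLs are assumed names for illustration:

```python
# Hypothetical shape of the json object GetItemFromTracker returns.
# "item_name" is set by default; "task_urls" is a custom key the
# tracker-side script would add (key name assumed for illustration).
response = {
    "item_name": "batch-00042",
    "task_urls": "http://example.com/feed1\nhttp://example.com/feed2\n",
}

# The properties end up in the pipeline's item dictionary, so a later
# task can read the custom key directly:
def process(item):
    urls = item["task_urls"].splitlines()
    ...
```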
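The batch scheme from [19:39] could be set up like this on the queue-filling side. A sketch assuming the redis-py client, a made-up batch directory, and a "myproject:todo" key name (the real tracker key may differ):

```python
import os

import redis

BATCH_DIR = "/path/to/batches"  # assumed location for the URL files

r = redis.Redis(host="localhost", port=6379)

def enqueue_batch(batch_id, urls):
    # Write the URL list to a file whose name is the batch ID...
    with open(os.path.join(BATCH_DIR, batch_id), "w") as f:
        f.write("\n".join(urls) + "\n")
    # ...and push only the short batch ID onto the todo queue, so Redis
    # never has to hold the gigabytes of URLs themselves.
    r.rpush("myproject:todo", batch_id)
```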
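The extra-parameters hook on the tracker is a little bit of Ruby, as the [19:37] messages say; this Python sketch only illustrates the logic that script would implement, with the directory and key name assumed as above:

```python
import os

BATCH_DIR = "/path/to/batches"  # assumed; must match where batches are written

def extra_parameters(item_name):
    # Read the URL file whose name corresponds to the batch ID and
    # return its contents as a custom key, to be merged into the json
    # response handed to the warrior.
    with open(os.path.join(BATCH_DIR, item_name)) as f:
        return {"task_urls": f.read()}
```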
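The Wget trick from [19:45] relies on wget's documented behavior that "-i -" reads the URL list from standard input. A minimal sketch using subprocess (the linked pipeline.py does this through seesaw's wget task instead):

```python
import subprocess

# In the real pipeline this string would come from item["task_urls"].
urls = "http://example.com/feed1\nhttp://example.com/feed2\n"

# "-i -" makes wget read its URL list from stdin, so the whole batch is
# piped in without writing a temporary file to disk.
subprocess.run(["wget", "-i", "-"], input=urls.encode(), check=True)
```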