[00:30] how much free memory does the redis server on http://tracker.archiveteam.org/ have?
[00:31] I'm going to have many gigabytes of feed URLs
[19:32] ivan`: The tracker has a few (undocumented) features that may be useful for your project.
[19:33] (Nice to see someone using the Lua extension.)
[19:36] GetItemFromTracker returns a json object to the warrior. The properties of that object end up in the item dictionary. One of them, "item_name", is set by default, but you can add custom keys.
[19:37] The custom data is generated by a little bit of Ruby that runs on the tracker. This script gets the item name and can fill the data object with other things.
[19:37] Because it's Ruby, it can also read a file.
[19:39] In your case, I think it would be handy to put the batch IDs as the items in the Redis queue, and write the url list for each batch to a file with a filename that corresponds to the batch ID.
[19:39] The extra-parameters script can then read that file and add the urls to the json response.
[19:40] You can remove the interesting Custom* contraptions from your pipeline. :)
[19:41] And it'll save a lot of Redis memory, since only the batch IDs are kept in memory.
[19:45] Last trick: you can give URL lists to Wget via the STDIN pipe. For example: https://github.com/ArchiveTeam/yahoo-upcoming-grab/blob/master/pipeline.py#L276-L281
[19:46] For the Upcoming project, the tracker would add a "URL1\nURL2\n...URLn\n" value in the item["task_urls"] field of the json response.
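
The [19:36] message describes the shape of the tracker's response. A minimal sketch of what that might look like on the warrior side; "item_name" is the documented default key, while "task_urls" and the URLs are assumed names for illustration:

```python
# Hypothetical shape of the json object GetItemFromTracker returns.
# "item_name" is set by default; "task_urls" is a custom key the
# tracker-side script would add (key name assumed for illustration).
response = {
    "item_name": "batch-00042",
    "task_urls": "http://example.com/feed1\nhttp://example.com/feed2\n",
}

# The properties end up in the pipeline's item dictionary, so a later
# task can read the custom key directly:
def process(item):
    urls = item["task_urls"].splitlines()
    ...
```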
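The batch scheme from [19:39] could be set up like this on the queue-filling side. A sketch assuming the redis-py client, a made-up batch directory, and a "myproject:todo" key name (the real tracker key may differ):

```python
import os

import redis

BATCH_DIR = "/path/to/batches"  # assumed location for the URL files

r = redis.Redis(host="localhost", port=6379)

def enqueue_batch(batch_id, urls):
    # Write the URL list to a file whose name is the batch ID...
    with open(os.path.join(BATCH_DIR, batch_id), "w") as f:
        f.write("\n".join(urls) + "\n")
    # ...and push only the short batch ID onto the todo queue, so Redis
    # never has to hold the gigabytes of URLs themselves.
    r.rpush("myproject:todo", batch_id)
```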
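The extra-parameters hook on the tracker is a little bit of Ruby, as the [19:37] messages say; this Python sketch only illustrates the logic that script would implement, with the directory and key name assumed as above:

```python
import os

BATCH_DIR = "/path/to/batches"  # assumed; must match where batches are written

def extra_parameters(item_name):
    # Read the URL file whose name corresponds to the batch ID and
    # return its contents as a custom key, to be merged into the json
    # response handed to the warrior.
    with open(os.path.join(BATCH_DIR, item_name)) as f:
        return {"task_urls": f.read()}
```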
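The Wget trick from [19:45] relies on wget's documented behavior that "-i -" reads the URL list from standard input. A minimal sketch using subprocess (the linked pipeline.py does this through seesaw's wget task instead):

```python
import subprocess

# In the real pipeline this string would come from item["task_urls"].
urls = "http://example.com/feed1\nhttp://example.com/feed2\n"

# "-i -" makes wget read its URL list from stdin, so the whole batch is
# piped in without writing a temporary file to disk.
subprocess.run(["wget", "-i", "-"], input=urls.encode(), check=True)
```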