[Twisted-Python] running 1,000,000 tasks, 40 at-a-time
itamar at itamarst.org
Wed Oct 26 10:24:34 EDT 2011
On Wed, 2011-10-26 at 10:02 -0400, Jason Rennie wrote:
> The background:
> I've been using DeferredSemaphore and DeferredList to manage the
> running of tasks with a resource constraint (only so many tasks can
> run at the same time). This worked great until I tried to use it to
> manage millions of tasks. Simply setting them up to run
> (DeferredSemaphore.run() calls) took appx. 2 hours and used ~5 gigs of
> ram. This was less efficient than I expected. Note that these
> numbers don't include time/memory for actually running the tasks, only
> time/memory to set up the running of the tasks. I've since written a
> custom task runner that uses comparatively little setup
> time/memory by adding a "manager" callback to each task which starts
> additional tasks as appropriate.
> My questions:
> * Is the behavior I'm seeing expected? i.e. are DS/DL only
> recommended for task management if the # of tasks is not too
> large? Is there a better way to use DS/DL that I might not be
> thinking of?
DeferredList is intended for the case where you want to wait for all
results to have arrived. Given its API, you basically *have* to create
all the millions of input Deferreds first (although not the tasks
themselves, if you're clever). So this is going to be slow, and use a
lot of memory... although 5 gigs is rather surprising, unless each task
has a lot of state.
> * Is there a Twisted pattern for managing tasks efficiently that
> I might be missing?
It seems like you've figured it out, if you've written a custom task
runner. Probably Twisted should include some better abstraction for
doing this sort of thing, since it does come up regularly.