[Twisted-Python] running 1,000,000 tasks, 40 at-a-time

Itamar Turner-Trauring itamar at itamarst.org
Wed Oct 26 10:24:34 EDT 2011


On Wed, 2011-10-26 at 10:02 -0400, Jason Rennie wrote:
> The background:
> 
> 
> I've been using DeferredSemaphore and DeferredList to manage the
> running of tasks with a resource constraint (only so many tasks can
> run at the same time).  This worked great until I tried to use it to
> manage millions of tasks.  Simply setting them up to run
> (DeferredSemaphore.run() calls) took appx. 2 hours and used ~5 gigs of
> ram.  This was less efficient than I expected.  Note that these
> numbers don't include time/memory for actually running the tasks, only
> time/memory to set up the running of the tasks.  I've since written a
> custom task runner that uses comparatively little setup
> time/memory by adding a "manager" callback to each task which starts
> additional tasks as appropriate. 
> 
> 
> My questions:
>       * Is the behavior I'm seeing expected?  i.e. are DS/DL only
>         recommended for task management if the # of tasks is not too
>         large?  Is there a better way to use DS/DL that I might not be
>         thinking of?

DeferredList is intended for the case where you want to wait for all
results to have arrived. Given its API, you basically *have* to create
all the millions of input Deferreds first (although not the tasks
themselves, if you're clever). So this is going to be slow, and use a
lot of memory... although 5 gigs is rather surprising, unless each task
has a lot of state.
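
For concreteness, the pattern you describe looks roughly like the sketch
below (do_task and items are just placeholders for your real work and
inputs). The point is that every sem.run() call creates a Deferred
immediately, so a million items means a million live Deferreds before any
work has even started:

    from twisted.internet import defer

    def do_task(item):
        # Placeholder for the real work; returns an already-fired Deferred.
        return defer.succeed(item)

    def run_all(items, limit=40):
        sem = defer.DeferredSemaphore(limit)
        # One Deferred per item is created right here, up front...
        ds = [sem.run(do_task, item) for item in items]
        # ...and DeferredList then has to keep every one of them alive
        # until all of them have fired.
        return defer.DeferredList(ds)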

>       * Is there a Twisted pattern for managing tasks efficiently that
>         I might be missing?

It seems like you've figured it out, if you've written a custom task
runner. Probably Twisted should include some better abstraction for
doing this sort of thing, since it does come up regularly.
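
One approach already in Twisted that avoids building all the Deferreds up
front is to feed the work from a generator into
twisted.internet.task.Cooperator, so only the in-flight tasks exist at any
moment. A rough sketch of that idea (not your custom runner; do_task and
items are again placeholders):

    from twisted.internet import defer, task

    def parallel(items, limit, do_task):
        coop = task.Cooperator()
        # The generator creates each task's Deferred only when a consumer
        # pulls it, so at most 'limit' of them exist at any one time.
        work = (do_task(item) for item in items)
        # 'limit' cooperative consumers share the generator; each waits
        # for its current Deferred to fire before pulling the next item.
        return defer.DeferredList(
            [coop.coiterate(work) for _ in range(limit)])

    # e.g. d = parallel(xrange(1000000), 40, do_task)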





