[Twisted-Python] Re: Memory size of Deferreds

Mon May 19 16:45:26 EDT 2008

"Michal Pasternak" <michal.dtz at gmail.com> writes:

Thanks for the reply!

> 2008/5/19 Martin Geisler <mg at daimi.au.dk>:
>> What worries me is the size of a single Deferred. One user wanted to
>> do computations on very large lists of Deferreds and reported that his
>> programs used more memory than expected.
>>
>> I tried to measure the size of a Deferred by creating large lists of
>> them and found that a single empty Deferred takes up about 200 bytes.
>> Adding a callback brings that up to about 500 bytes.
>
> As such thing as memory allocation is tightly tied to the runtime
> environment, it is a bit hard to give reasonable answers without any
> information about, at least, an OS and a CPU.

Ah, sorry: I tested this on an Athlon 64 X2 (but with 32-bit Debian). I
simply tested like this:

% python
Python 2.5.2 (r252:60911, Apr 17 2008, 13:15:05) 
[GCC 4.2.3 (Debian 4.2.3-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from twisted.internet.defer import Deferred
>>> a = [Deferred() for _ in xrange(100000)]
>>> b = [Deferred() for _ in xrange(100000)]
>>> c = [Deferred() for _ in xrange(100000)]

and noted how the memory usage in 'top' rose from 8 MiB to 30 MiB, then
to 52 MiB, and finally to 73 MiB. So that is about 22 MiB per list -- a
little over 200 bytes per Deferred. (I know there is some overhead for
the list.)
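
For a number that does not depend on reading 'top', the same experiment can be sketched with the standard-library tracemalloc module. This assumes a current Python 3 rather than the 2.5 used above, and Placeholder is a hypothetical stand-in class, not Twisted's Deferred; passing the real Deferred class as the factory should measure that instead. The figure includes the 8 or so bytes per entry that the list itself needs.

```python
import tracemalloc


class Placeholder(object):
    """Hypothetical stand-in for Deferred; swap in the real class to measure it."""

    def __init__(self):
        self.called = False
        self.callbacks = []


def bytes_per_instance(factory, count=100000):
    """Return the average number of bytes allocated per object made by factory."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objects = [factory() for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # The list is still referenced here, so nothing was freed before measuring.
    assert len(objects) == count
    return (after - before) / count


if __name__ == "__main__":
    print("about %.0f bytes per object" % bytes_per_instance(Placeholder))
```

The 'top' figures above work out the same way: 22 MiB spread over 100000 objects is roughly 230 bytes each.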

> Did the memory in the case you mention grow constantly? Is that user
> sure, that it was not a memory leak in his/hers code?

I think he was just concerned that his application used 450 MiB RAM, and
that made me try to see how big an empty Deferred is.

> Deferreds, as you all already know, are a core concept that Twisted is
> built upon. If you need to optimize such areas of Twisted, then, just
> my suggestion - maybe you are using a wrong tool to do the job. If the
> memory usage is a problem, then, well, maybe you are trying to do
> things at once, when you should do them sequentially (eg. using a
> generator).

Yes, you may be right... scheduling everything to run at once is
probably not necessary. It just makes things simple with my design.
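
A minimal sketch of the generator idea, in plain Python with no Twisted involved: the helper names in_batches and total_of_squares are made up for illustration, and the squaring is only a stand-in for real per-item work. The point is that only one batch of intermediate results is alive at any time.

```python
from itertools import islice


def in_batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def total_of_squares(values, batch_size=1000):
    """Hypothetical workload that keeps one batch of results in memory at a time."""
    total = 0
    for batch in in_batches(values, batch_size):
        results = [v * v for v in batch]  # stand-in for the per-item work
        total += sum(results)
    return total
```

Whether this fits depends on how much of the expression tree really has to exist at once; it trades some parallelism for a bounded working set.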

I would love to get some advice on the design of my library -- it is the
first thing I have made with Twisted. The goal of VIFF is to enable
shared computations -- think addition and multiplication. So the user writes

  x = add(a, mul(b, c))

where a, b, and c are Deferreds which will eventually hold integers. The
add and mul functions return new Deferreds which will eventually hold
the results. Addition is simple:

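  from twisted.internet.defer import gatherResults

  def add(x, y):
      # Fires with [x_value, y_value] once both operands are ready.
      total = gatherResults([x, y])
      total.addCallback(lambda results: results[0] + results[1])
      return total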

Multiplication is similar, but requires network communication -- that is
why it returns a Deferred.

This is the basic operation of the library: an expression tree is built
and evaluated from the leaves inwards. The important point is that as
much as possible is done in parallel, so that every operation runs as
soon as its operands are available. Twisted made this extremely easy
thanks to the Deferred and DeferredList abstractions.
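
To make the evaluation order concrete, here is a rough analogue that runs with the standard library only. It uses asyncio coroutines instead of Twisted Deferreds, so it is not VIFF code: asyncio.gather plays the role of gatherResults, the sleep in mul stands in for network communication, and later is a hypothetical helper for inputs that arrive after a delay.

```python
import asyncio


async def add(x, y):
    """Wait for both operands, then add them."""
    a, b = await asyncio.gather(x, y)
    return a + b


async def mul(x, y):
    """Wait for both operands, then multiply; the sleep stands in for network I/O."""
    a, b = await asyncio.gather(x, y)
    await asyncio.sleep(0)
    return a * b


async def later(value, delay):
    """Hypothetical input that becomes available after `delay` seconds."""
    await asyncio.sleep(delay)
    return value


async def main():
    a, b, c = later(2, 0.02), later(3, 0.01), later(4, 0.0)
    # The tree is a + (b * c); the multiplication runs as soon as b and c
    # are available, without waiting for a.
    return await add(a, mul(b, c))


def evaluate():
    """Build and evaluate the small expression tree, returning its value."""
    return asyncio.run(main())
```

Every node in such a tree holds a pending object until its operands arrive, which is why the per-object size matters when the tree is large.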

> Maybe you should rather look into other areas of the problem (eg. not
> try to optimize deferreds, but the code that uses them). Maybe you
> should use something else than Deferreds for that job. As for memory
> usage of such small objects like Deferreds, well... Python is, well,
> Python. Python string or integer uses much more memory, than C string
> or int. You can't tune Python string memory usage, and you hardly can
> do anything about size of Deferred if that becomes a problem.

There is the possibility of using the __slots__ attribute to avoid the
object dictionary. I don't know what negative side effects that might
have, though.
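
A small sketch of what __slots__ changes, written for current Python 3. Plain and Slotted are hypothetical classes with a few made-up attributes; they are not the real layout of Deferred.

```python
import tracemalloc


class Plain(object):
    """Hypothetical class with an ordinary per-instance __dict__."""

    def __init__(self):
        self.called = False
        self.paused = 0
        self.callbacks = None


class Slotted(object):
    """Same attributes, declared in __slots__, so there is no instance __dict__."""

    __slots__ = ("called", "paused", "callbacks")

    def __init__(self):
        self.called = False
        self.paused = 0
        self.callbacks = None


def bytes_per_instance(cls, count=50000):
    """Average bytes allocated per instance, including the list entry."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objects = [cls() for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objects
    return (after - before) / count


def can_add_attribute(obj):
    """Return True if an attribute not set in __init__ can be assigned."""
    try:
        obj.extra = 1
        return True
    except AttributeError:
        return False
```

The most visible side effect is the one can_add_attribute probes: code that attaches extra attributes to instances stops working. Slotted instances also cannot be weakly referenced unless "__weakref__" is listed in __slots__, and a subclass that does not declare __slots__ gets a __dict__ again.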

Also, I have yet to try the C implementation of Deferred to see how it
performs memory-wise.

> On the other hand, you should be able to write optimal code with
> Python faster and memory footprint shouldn't be a problem (from my
> experience, in many cases Twisted footprint is surprisingly low).
>
> In other words - if size of Deferred object is your only bottleneck,
> then congratulations :-)

Hehe, thanks! :-) I am certainly not claiming that Deferreds are the
only bottleneck or consumer of memory. I just figured that they would be
an easy starting point since my code (and as you say, Twisted in
general) uses them a lot.

> I'd rather suggest optimizing other parts of the code - as Deferreds
> are the core of Twisted, I suppose it could be a bit hard to do
> anything about them, in the same way you can't do anything about
> memory footprint of core Python objects.

Right... I'll try looking through the code to see whether I mistakenly
hold on to Deferreds or other objects longer than I have to.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.