[Twisted-Python] Re: Memory size of Deferreds

Steve Holden steve at holdenweb.com
Mon May 19 23:45:19 EDT 2008


Martin Geisler wrote:
> "Michal Pasternak" <michal.dtz at gmail.com> writes:
> 
> Thanks for the reply!
> 
>> 2008/5/19 Martin Geisler <mg at daimi.au.dk>:
>>> What worries me is the size of a single Deferred. One user wanted to
>>> do computations on very large lists of Deferreds and reported that his
>>> programs used more memory than expected.
>>>
>>> I tried to measure the size of a Deferred by creating large lists of
>>> them and found that a single empty Deferred takes up about 200 bytes.
>>> Adding a callback brings that up to about 500 bytes.
>> Since memory allocation is tightly tied to the runtime environment, it
>> is hard to give reasonable answers without at least some information
>> about the OS and CPU.
> 
> Ah, sorry: I tested this on an Athlon 64 X2 (but with 32-bit Debian). I
> simply tested like this:
> 
> % python
> Python 2.5.2 (r252:60911, Apr 17 2008, 13:15:05) 
> [GCC 4.2.3 (Debian 4.2.3-3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from twisted.internet.defer import Deferred
> >>> a = [Deferred() for _ in xrange(100000)]
> >>> b = [Deferred() for _ in xrange(100000)]
> >>> c = [Deferred() for _ in xrange(100000)]
> 
> and noted in 'top' how the memory usage rose from 8 MiB to 30 MiB, to
> 52 MiB, and then to 73 MiB. So that is about 22 MiB per list -- around
> 200 bytes per Deferred. (I know there is some overhead for the list.)
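> 
> The same measurement can be scripted by reading VmRSS from
> /proc/self/status (Linux-specific, and just a rough sketch -- the exact
> numbers will vary with the Python build and platform):
> 
>   from twisted.internet.defer import Deferred
> 
>   def rss_kib():
>       # Linux-specific: read the resident set size from /proc.
>       for line in open('/proc/self/status'):
>           if line.startswith('VmRSS:'):
>               return int(line.split()[1])  # the value is in KiB
> 
>   before = rss_kib()
>   deferreds = [Deferred() for _ in xrange(100000)]
>   after = rss_kib()
>   print "approx. bytes per Deferred:", (after - before) * 1024 / len(deferreds)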
> 
>> Did the memory in the case you mention keep growing? Is that user
>> sure it was not a memory leak in his own code?
> 
> I think he was just concerned that his application used 450 MiB RAM, and
> that made me try to see how big an empty Deferred is.
> 
>> Deferreds, as you all know, are a core concept that Twisted is built
>> upon. If you need to optimize that part of Twisted, then -- just my
>> suggestion -- maybe you are using the wrong tool for the job. If memory
>> usage is a problem, then maybe you are trying to do everything at once
>> when you should be doing things sequentially (e.g. using a generator).
> 
> Yes, you may be right... scheduling everything to run at once is
> probably not necessary. It just keeps my design simple.
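> 
> For example, with inlineCallbacks (available since Twisted 2.5) the
> operations could be handled one at a time instead of building every
> Deferred up front. An untested sketch, with compute() standing in for
> whatever returns a Deferred per item:
> 
>   from twisted.internet.defer import inlineCallbacks, returnValue, succeed
> 
>   @inlineCallbacks
>   def process(items):
>       # Handle one item at a time: only one Deferred is outstanding at
>       # any moment, so memory use stays flat instead of growing with
>       # the number of items.
>       results = []
>       for item in items:
>           value = yield compute(item)  # compute() returns a Deferred
>           results.append(value)
>       returnValue(results)
> 
>   def compute(item):
>       # Stand-in for the real per-item operation; here it just returns
>       # an already-fired Deferred.
>       return succeed(item * item)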
> 
> I would love to get some advice on the design of my library -- it is the
> first thing I have made with Twisted. The goal of VIFF is to enable
> shared computations -- think addition and multiplication. So the user writes
> 
>   x = add(a, mul(b, c))
> 
> where a, b, and c are Deferreds which will eventually hold integers. The
> add and mul functions return new Deferreds which will eventually hold
> the results. Addition is simple:
> 
>   from twisted.internet.defer import gatherResults
> 
>   def add(x, y):
>       # Fires with the sum once both operands have a result.
>       result = gatherResults([x, y])
>       result.addCallback(lambda results: results[0] + results[1])
>       return result
> 
> Multiplication is similar, but requires network communication -- that is
> why it returns a Deferred.
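> 
> Schematically, mul looks something like this -- a simplified sketch, not
> the actual VIFF code, with exchange_shares() as a placeholder for the
> network round-trip:
> 
>   from twisted.internet.defer import gatherResults
> 
>   def mul(x, y):
>       operands = gatherResults([x, y])
>       # Returning a Deferred from the callback makes the outer Deferred
>       # wait for the network result before it fires.
>       operands.addCallback(
>           lambda results: exchange_shares(results[0] * results[1]))
>       return operands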
> 
> This is the basic operation of the library: an expression tree is built
> and evaluated from the leaves inwards. The important point is that as
> much as possible is done in parallel: every operation runs as soon as
> its operands are available. Twisted made this extremely easy thanks to
> the Deferred and DeferredList abstractions.
> 
>> Maybe you should rather look into other areas of the problem (e.g. not
>> try to optimize Deferreds, but the code that uses them). Maybe you
>> should use something other than Deferreds for that job. As for the
>> memory usage of small objects like Deferreds, well... Python is
>> Python. A Python string or integer uses much more memory than a C
>> string or int. You can't tune Python string memory usage, and you can
>> hardly do anything about the size of a Deferred if that becomes a
>> problem.
> 
> There is the possibility of using the __slots__ attribute to avoid the
> object dictionary. I don't know what negative side effects that might
> have, though.
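> 
> For a plain class the difference looks like this (sketch only -- and note
> that __slots__ removes the per-instance dictionary only if no base class
> provides one, so simply subclassing Deferred and adding __slots__ would
> not help, since Deferred itself does not define __slots__):
> 
>   class Plain(object):
>       def __init__(self):
>           self.called = False
>           self.callbacks = []
> 
>   class Slotted(object):
>       __slots__ = ('called', 'callbacks')  # no per-instance __dict__
>       def __init__(self):
>           self.called = False
>           self.callbacks = []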
> 
> Also, I have yet to try the C implementation of Deferred to see how it
> performs memory-wise.
> 
>> On the other hand, you should be able to write well-optimized code
>> faster in Python, and the memory footprint shouldn't be a problem (in
>> my experience, the Twisted footprint is surprisingly low in many cases).
>>
>> In other words - if the size of the Deferred object is your only
>> bottleneck, then congratulations :-)
> 
> Hehe, thanks! :-) I am certainly not claiming that Deferreds are the
> only bottleneck or consumer of memory. I just figured that they would be
> an easy starting point since my code (and as you say, Twisted in
> general) uses them a lot.
> 
>> I'd rather suggest optimizing other parts of the code - as Deferreds
>> are the core of Twisted, I suppose it could be a bit hard to do
>> anything about them, in the same way that you can't do anything about
>> the memory footprint of core Python objects.
> 
> Right... I'll try looking through the code to see whether I accidentally
> hold on to Deferreds or other objects longer than necessary.
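> 
> One way to check would be to count the live Deferreds with the gc module
> at a few points in the program (rough sketch):
> 
>   import gc
>   from twisted.internet.defer import Deferred
> 
>   def count_deferreds():
>       # Walk the objects tracked by the collector and count the
>       # Deferreds that are still alive; a steadily growing number
>       # suggests something is holding on to them.
>       return sum(1 for obj in gc.get_objects() if isinstance(obj, Deferred))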
> 
Surely (he said, tongue firmly in cheek) the answer to having too many 
outstanding Deferreds is to use faster peer systems :-)

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/




