[Twisted-Python] Reducing memory footprint of twisted app?

Andrew Bennetts andrew-twisted at puzzling.org
Tue Aug 15 06:54:25 MDT 2006


On Tue, Aug 15, 2006 at 12:36:16PM +0200, Marcin Kasperski wrote:
> Not an exact question, but rather 'searching for ideas'.
> 
> I have some twisted app, which uses more memory than I would like it to.  I
> tried analysing it a bit (mainly using gc module object list and enumerating
> items of different types) and it seems to me that there is something
> 'twistedish' in it. My application uses the generator idiom in a lot of
> places (functions/methods which yield, wrapped with defer.deferredGenerator).
> And, as there seem to be a lot of anonymous functions and tuples allocated, I
> suspect that maybe those functions, deferreds and related params and closures
> live longer than I would like them to.

Well, objects in Python will live as long as they are referenced.  If you have
large objects (or many objects) referenced from a function scope or object
that's still live, then of course the referenced objects will still be live too.
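
For instance, `gc.get_referrers` can show what is still pointing at an object you expected to be freed. A minimal sketch (the `Holder` class and names here are purely illustrative):

```python
import gc

class Holder(object):
    pass

big = list(range(100000))   # stands in for a large object
holder = Holder()
holder.data = big           # an attribute reference keeps it alive

# gc.get_referrers lists the objects that still point at 'big';
# depending on the Python version the referrer shows up as the
# holder's attribute dict or as the holder itself.
referrers = gc.get_referrers(big)
found = any(r is holder or r is holder.__dict__ for r in referrers)
print(found)  # True
```

Walking the referrer chain by hand like this is tedious, but it answers the only question that matters: what is keeping this object alive?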

> Any ideas of what I could do to track it down? In particular, is it possible
> to somehow use introspection to find which lambdas and deferreds are allocated
> while the program is running? Are there any suggestions on how to code
> deferredGenerators to reduce allocated memory (maybe, for instance, I should
> try to turn local variables into object attributes, or the opposite, or ...)

Object attributes would tend to be worse than locals, because typically objects
(and thus their attributes) outlive a function's scope.

As a thought experiment, if you transform a generator function into a class,
moving the state from locals in the generator to instance variables of the
class, what have you changed about the lifetimes of those objects?  Answer:
nothing.  If some of those generator locals become locals in the __next__ and
other methods of the class, but *not* instance variables, then those lifetimes
will be shorter -- but you can achieve exactly the same effect by adding "del
foo" or "foo = None" statements to the original generator function.

Thinking about the problem as somehow inherent to generator functions (and, by
extension, deferredGenerator) is a red herring.

The best idea I can offer you is this: first find out what's taking the memory
before you try to change your code to fix it.  Blindly rewriting some code in a
different style without understanding why (or even if) it's taking up so much
memory will get you nowhere.  Even if you think you have a pretty good guess,
you're probably wrong (at least, I find that's what happens to me when I try to
optimise based only on guesses).

> Also, if anybody could point me to any interesting resources about tracking
> python memory usage, I would be grateful.
> 
> Tried googling for some time, but apart from zope trackRef I did not find
> anything.

I use http://twistedmatrix.com/users/spiv/countrefs.py occasionally when I'm
trying to figure out what's using memory in a Python program.  It uses the ref
count on class/type objects as an approximation of the number of instances,
which is close enough.  If there are 100000 references to a class, it's almost
certain that at least 99990 of them are instances of that class.
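
In the same spirit, a re-sketch of the idea (this is not the countrefs.py script itself) is to rank classes by the refcount on the class object:

```python
import gc
import sys

def approx_instance_counts(top=10):
    # The refcount on a class object approximates how many instances
    # of it exist, since each instance holds a reference to its class.
    counts = {}
    for obj in gc.get_objects():
        if isinstance(obj, type):
            counts[obj.__name__] = sys.getrefcount(obj)
    return sorted(counts.items(), key=lambda kv: -kv[1])[:top]

for name, n in approx_instance_counts():
    print(name, n)
```

Run it before and after the suspect workload; classes whose counts grow between snapshots are the ones to investigate.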

The other thing to do is to reproduce the problem as simply as possible.  Do you
have a test suite?  Does the memory usage get too high during the test run?

Also, can you reproduce it just by starting the web server?  If so, try running
just half the code involved to start it up -- still see it?  And so on.

Or, if it only consumes unacceptably large amounts of memory after serving 10000
requests, write a script to issue 10000 requests, change the server to do only
the first half of the processing, hit it with 10000 requests, and you'll see
whether the problem is in the first half or the second half.

You get the idea.  Reproduce your problem, then simplify things as much as
possible until you can analyse it.

I hope these ideas help you.

-Andrew.