[Twisted-Python] debugging a memory leak

Alec Matusis matusis at yahoo.com
Wed Mar 3 20:08:03 EST 2010


It turns out that the situation described in my last email, where the server
buffers strings because the socket Send-Q is saturated, never arises in our
production system; it was an artifact of the high-load testing in the lab
environment.

I "fixed" the memory leak, but I still do not understand what caused it. The
code looked like this:

File protocols.py:

import twisted.protocols.basic
import twisted.python.log

class MyProtocol(twisted.protocols.basic.LineOnlyReceiver):

    def lineReceived(self, line):
        def login_cb(session):
            self.session = session

        #this deferred chain creates an instance of Session in the end
        d = self.factory.service.login_deferred(Callback(self), line)
        d.addErrback(twisted.python.log.err)
        #adds some properties to the session instance
        d.addCallback(self.factory.service.authenticate)
        d.addErrback(twisted.python.log.err)
        d.addCallback(login_cb)
        d.addErrback(twisted.python.log.err)

class Callback:
    def __init__(self, protocol):
        self.protocol = protocol

File service.py (protocol.factory.service):

import twisted.python.log

class Session:
    def __init__(self, service, callback):
        self.callback = callback
        self.service = service

    def do_something_asynchronous(self):
        #this function creates the MEMORY LEAK
        def cb(result):
            self.some_property = result
            return self

        d = some_asynch_operation()
        d.addCallback(cb)
        d.addErrback(twisted.python.log.err)
        return d

class Service:

    def login_deferred(self, callback, line):

        def cb(result):
            s = Session(self, callback)
            return s

        d = asynch_operation(line)
        d.addCallback(cb)
        d.addErrback(twisted.python.log.err)
        return d

    def authenticate(self, session):
        #the deferred below caused the MEMORY LEAK; when its code was
        #moved into here from Session.do_something_asynchronous,
        #the leak disappeared
        d = session.do_something_asynchronous()
        return d

The problem was in the function Session.do_something_asynchronous, which
returned the instance of its own class.
Interestingly, even though this caused an unbounded memory leak visible from
the OS, the number of objects tracked by the garbage collector
(len(gc.get_objects())) remained stable, and none of the objects that support
len() were growing particularly long. Nothing showed up in Heapy either.
When I removed this function and moved its code into Service.authenticate,
the memory stabilized. I still do not quite understand why; I was just
acting on the intuition that when a function returns an instance of its own
class, it could create a reference cycle.
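
For completeness, this is roughly what Service.authenticate looks like after
the move (the body of cb() is my reconstruction of the same property-setting
logic, and some_asynch_operation() is still a placeholder for the real call):

class Service:

    def authenticate(self, session):
        def cb(result):
            #set the property directly and hand the session down the chain,
            #instead of having Session return itself
            session.some_property = result
            return session

        d = some_asynch_operation()
        d.addCallback(cb)
        d.addErrback(twisted.python.log.err)
        return d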

After the removal of this function, the server exhibits the following
behavior: when it starts, and after all clients connect and it reaches its
average load, the memory stays at 350MB for about 1.5 hours. Then it suddenly
jumps to 650MB and stays there. Since at 350MB the server already serves
the maximum number of connections (and is operating at its normal load), I
think the jump to 650MB is caused by some additional memory leak.
Could there be anything wrong with this code pattern, where I launch
deferreds from other deferreds (e.g. my function Service.login_deferred,
which is itself part of a deferred chain, launches another deferred and adds
its own callback cb() to it)?
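
To make the pattern concrete, here is a minimal standalone version of what I
mean (the names and the callLater timing are made up for the example):

from twisted.internet import defer, reactor
import twisted.python.log

def some_asynch_operation():
    #stand-in for a real asynchronous call: returns a Deferred that fires
    #on the next reactor iteration
    d = defer.Deferred()
    reactor.callLater(0, d.callback, 'result')
    return d

def launch_another(result):
    #a callback that itself launches another Deferred and returns it;
    #the outer chain waits until the inner Deferred fires
    return some_asynch_operation()

d = some_asynch_operation()
d.addCallback(launch_another)
d.addErrback(twisted.python.log.err)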

Also, this code has the following reference cycle: protocol.session ->
session.callback.protocol. This cycle has existed in the code for a long time
and never caused unbounded memory growth, even though the server creates on
the order of 100 Session() instances per second; they always seem to be
garbage-collected (judging by gc.get_objects()), even under Python 2.4,
before the improvements to the cyclic garbage collector.
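
(For reference, this is roughly how I watch the Session count;
count_instances is just a helper for this example, not part of the server:)

import gc

def count_instances(cls):
    #count live instances of cls among the objects tracked by the collector
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

#e.g. count_instances(Session) stays small even though ~100 are created
#per second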

> -----Original Message-----
> From: twisted-python-bounces at twistedmatrix.com [mailto:twisted-python-
> bounces at twistedmatrix.com] On Behalf Of exarkun at twistedmatrix.com
> Sent: Thursday, February 25, 2010 4:51 PM
> To: Twisted general discussion
> Subject: Re: [Twisted-Python] debugging a memory leak
> 
> On 01:24 am, johann.borck at densedata.com wrote:
> >Alec Matusis wrote:
> >>In desperation of not finding the real memory leak on the production
> >>server,
> >>
> >>I wrote a test server that I can push to arbitrary high RSS memory. I
> >>am far
> >>from sure if this the same leak that I observe in production, but I
> >>would
> >>like to understand what this one is.
> >>This is the server code:
> >>
> >>Server.py:
> >>
> >>import twisted.protocols.basic
> >>from twisted.internet.protocol import Factory
> >>from twisted.internet import reactor
> >>
> >>class HiRate(twisted.protocols.basic.LineOnlyReceiver):
> >>         MAX_LENGTH = 20000
> >>
> >>         def lineReceived(self, line):
> >>                 if line == 'get':
> >>                         out = 'a'*4000+'\r\r'
> >>                         self.transport.write(out)
> >>
> >>
> >>
> >>factory = Factory()
> >>factory.protocol = HiRate
> >>
> >>reactor.listenTCP(8007, factory, backlog=50, interface='10.18.0.2')
> >>reactor.run()
> >>[...]
> >>To reproduce the memory leak, I either need two machines with fast LAN
> >>between them (since the client program takes 100% CPU), or possibly
> >>one
> >>machine with a dual core CPU (I have not tried that). It is important
> >>that
> >>client.py is given a separate CPU to run.
> >>When the length of the response from the server is sufficient,  (out =
> >>'a'*4000+'\r\r' ,  4000 is enough in my case), the RSS of the server
> >>process
> >>starts to leak without a bound.
> >>If you introduce a small delay in the client (#time.sleep(.001)), the
> >>leak
> >>does not occur.
> >Hi Alec,
> >I wouldn't call that a memory leak. In your example  ( SC/s = (n* CS) /
> >s ;  n > 1, SC: bytes sent to the client, CS: bytes sent to the server
> >)
> >there is some CS that satisfies the condition SC/s > [available
> >bandwidth]. For all CS that exceed that limit, the data will be first
> >queued in the Send-Q of the socket and then inside twisted, because the
> >Send-Q is full. That's the reason why you see increasing memory usage,
> >but it's not a leak, it is a server that can not cope with this kind of
> >DOS attack. Twisted can not handle this situation for you because that
> >would mean it had to decide which packets, connections, etc. to drop.
> >Such decisions have to be made by your application.
> 
> You can get notifications about when your send buffers are full by
> registering a producer with the transport.
> http://twistedmatrix.com/documents/current/core/howto/producers.html
> describes the APIs involved.  This will help you make these decisions in
> your application.
> 
> Jean-Paul
> 
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
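
P.S. Noting Jean-Paul's suggestion above for later: a minimal sketch of
registering a push producer with the transport (the class and its "paused"
attribute are my own names for the example, not part of Twisted):

from zope.interface import implements
from twisted.internet import interfaces

class BufferWatcher(object):
    #minimal push producer whose only job is to be told when the
    #transport's write buffer fills up and when it drains again
    implements(interfaces.IPushProducer)

    def __init__(self, protocol):
        self.protocol = protocol
        self.paused = False

    def pauseProducing(self):
        #send buffer is full: stop queueing more data for this client
        self.paused = True

    def resumeProducing(self):
        #send buffer has drained: safe to write again
        self.paused = False

    def stopProducing(self):
        self.paused = True

#in the protocol, e.g. in connectionMade:
#    self.watcher = BufferWatcher(self)
#    self.transport.registerProducer(self.watcher, True)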



