[Twisted-Python] help with refcounts and memleaks

Andrea Arcangeli andrea at cpushare.com
Mon Dec 26 19:18:33 EST 2005


On Mon, Dec 26, 2005 at 12:21:29PM -0500, Jean-Paul Calderone wrote:
> Of course, in creating a cycle which contains an object with an 
> implementation of __del__, you have created a leak, since Python's 
> GC cannot collect that kind of graph.

Ah this explains many things. I didn't realize that having a __del__
callback made any difference from a garbage collection point of view, so
while trying to fix memleaks I probably added them ;).
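
A minimal sketch of how one could check this on a given interpreter,
assuming a hypothetical class Node that exists only for this test. The
result depends on the Python version: interpreters of this thread's era
leave such a cycle in gc.garbage, while Python 3.4 and later are able to
collect it, so the sketch reports what happened rather than assuming an
outcome.

```python
import gc
import weakref


class Node(object):
    """Hypothetical class with a finalizer, used only for this check."""

    def __init__(self):
        self.other = None

    def __del__(self):
        pass


def check_del_cycle():
    """Build a two-object cycle with __del__ and report what the GC did."""
    gc.collect()                      # start from a clean state
    before = len(gc.garbage)
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    probe = weakref.ref(a)            # lets us see whether 'a' was freed
    del a, b
    gc.collect()
    stuck = len(gc.garbage) - before  # objects the GC refused to free
    return probe() is None, stuck


freed, stuck = check_del_cycle()
print("freed: %s, left in gc.garbage: %d" % (freed, stuck))
```

If "stuck" is non-zero, the objects are still reachable through
gc.garbage and can be inspected there.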

Sorry for posting it here and not on a python list, but my basic problem
is to make sure the "protocol" object is being collected away, and the
protocol object is a very twisted thing, so I thought it would be on
topic here since every one of us needs the protocol object garbage
collected properly. Now it turned out to be more of a language thing
than I thought originally...

Ok, going back to how this thing started. I happened to allocate 50M of
ram somehow attached to a protocol object, and then I noticed that the
reconnectingclientfactory was leaking memory after a
disconnect/reconnect event. Every time I restarted the server, 50M were
added to the RSS of the task. That was definitely a memleak, and I never
had a __del__ method. Then I started adding debugging aids to figure out
what was going wrong. By removing the self and cross references the
memleak was fixed in the client. So then I figured out the same
self-references were in the server as well, and I added more debugging
in the server as well. That led me to the current situation.  So
something was definitely going wrong w.r.t. memleaks even before I
started messing with the __del__ methods.
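
A minimal sketch of what "removing the self and cross references" can
look like. Session and Owner are hypothetical stand-ins for a protocol
and its factory, not Twisted classes. The garbage collector is switched
off during the check, so on CPython the object can only be freed by
reference counting, which requires that no cycle is left behind.

```python
import gc
import weakref


class Session(object):
    """Hypothetical stand-in for a protocol object; not a Twisted class."""

    def __init__(self, owner):
        self.owner = owner             # reference back to the owner
        self.buffer = bytearray(1024)  # stands in for the large allocation

    def close(self):
        # Drop the back reference so no cycle survives the disconnect.
        self.owner = None


class Owner(object):
    """Hypothetical stand-in for a factory that remembers its session."""

    def __init__(self):
        self.session = None

    def connect(self):
        self.session = Session(self)
        return self.session

    def disconnected(self):
        # Forget the session and let it forget us.
        session, self.session = self.session, None
        if session is not None:
            session.close()


def session_freed_without_gc():
    """Return True if the session is freed by refcounting alone."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        owner = Owner()
        probe = weakref.ref(owner.connect())
        owner.disconnected()
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


print(session_freed_without_gc())
```

Without the two assignments that clear the references, the session
would stay alive until the cycle collector runs.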

But I'm very relieved to know that python gets it right if __del__
isn't implemented.

> Hopefully the __del__ implementation is only included as an aid to 
> understanding what is going on, and you don't actually need it in 
> any of your actual applications.  Once removed, the cycle will be 
> collectable by Python.

Correct, it was only an aid, it didn't exist until today.

> When you have "two cross referenced objects", that's a cycle, and 
> Python will indeed clean it up.  The only exception is if there is a 

Well, I never cared about cyclic references until today, because I
thought python would handle them automatically, as I think is in fact
possible.

But then while trying to debug the 50M leak in the client at every
server restart (so very visible), I quickly ran into this:

	http://www.nightmare.com/medusa/memory-leaks.html

class thing:
    pass

a = thing()
b = thing()
a.other = b
b.other = a

del a
del b    

Code like the above is very common in my twisted based server. Note that
there's no __del__ method in the class "thing". So what you say seems to
be in disagreement with the above url. Perhaps I got bitten by the
common mistake "I found it on the internet so it must be true"... I
really hope you're the one who is right, my code was all written with
your ideas in mind but that seems to collide strongly with the above
url. I guess I should have checked the date, it's from '99, perhaps it
was true a long time ago?
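
A minimal sketch for testing the claim directly, assuming a Python with
the cycle collector in the gc module. Thing mirrors the class above but
subclasses object; the weak reference is only a probe that shows
whether the object is still alive after collection.

```python
import gc
import weakref


class Thing(object):
    pass


def cycle_is_collected():
    """Return True if a plain two-object cycle is freed by gc.collect()."""
    a = Thing()
    b = Thing()
    a.other = b
    b.other = a
    probe = weakref.ref(a)  # does not keep 'a' alive
    del a
    del b
    gc.collect()            # force a full collection
    return probe() is None


print(cycle_is_collected())
```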

> __del__ implementation, as I mentioned above.  This is a general problem 
> with garbage collection.  If you have two objects which refer to each 
> other and which each wish to perform some finalization, which finalizer 
> do you call first?

Why would it matter which one you call first? Random, no? Better to call
them in random order than to leak memory, no? At least python should
spawn a gigantic warning that there's a cross reference leaking, instead
of silently not calling __del__.

> You might be surprised :)  These things tend to build up, if your process 
> is long-running.

I think you're right that there was no memleak generated by self/cross
cyclic references, but then the load is pretty low at the moment so I
could have overlooked it. I periodically monitor the rss of all tasks.
I never had problems before noticing the reconnectingclientfactory
memleak (which btw I can't reproduce anymore after removing the cross
references).
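
The RSS shows how much memory a task holds but not which objects hold
it. As a complement, here is a minimal sketch that counts live objects
by type name through gc.get_objects(). The helper live_instance_counts
and the class ExampleProtocol are hypothetical names, and only objects
tracked by the collector are seen.

```python
import gc
from collections import Counter


def live_instance_counts(type_names):
    """Count GC-tracked objects whose type name is in type_names."""
    wanted = set(type_names)
    counts = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        if name in wanted:
            counts[name] += 1
    return counts


class ExampleProtocol(object):
    """Hypothetical class standing in for a protocol."""

    def __init__(self):
        self.peer = None


# Keep three instances alive on purpose, then count them.
held = [ExampleProtocol() for _ in range(3)]
counts = live_instance_counts(["ExampleProtocol"])
print(counts["ExampleProtocol"])
```

Calling the helper before and after a disconnect/reconnect event would
show whether the number of live protocol objects keeps growing.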

> (You can probably guess what I'm going to say here. ;)  In general, I 
> avoid implementing __del__.  My programs may end up with cycles, but 
> as long as I don't have __del__, Python can figure out how to free the 
> objects.  Note that it does sometimes take it a while (and this has 
> implications for peak memory usage which may be important to you), but 
> if you find a case that it doesn't handle, then you've probably found 
> a bug in the GC that python-dev will fix.
> 
> Hope this helps, and happy holidays,

Thanks a lot, things look much better now, I'm relieved that python can
figure out how to free objects, I always thought it was able to do so
in fact ;). Happy holidays to you too.

So, I'll back out all my latest changes, and I'll try to find the real
cause of the reconnectingclientfactory memleak which definitely happened
even though there was no __del__ method implemented.



