[Twisted-Python] help with refcounts and memleaks
Jean-Paul Calderone
exarkun at divmod.com
Mon Dec 26 12:21:29 EST 2005
On Mon, 26 Dec 2005 17:07:35 +0100, Andrea Arcangeli <andrea at cpushare.com> wrote:
>Hello,
Hey,
This is really a question for a Python list. However, I've attached
some comments below.
>
>I was just shocked today when I noticed this:
>
>-------------------
>import sys
>
>class A(object):
>    y = None
>    def x(self):
>        pass
>    def __del__(self):
>        print 'deleted'
>
>
>a = A()
>print sys.getrefcount(a)
>if 1:
>    a.y = a.x
>print sys.getrefcount(a)
>del a
>-------------------
>
>I understood the cross references memleaks well, like "x.y = y; y.x = x;
>del x,y", but I didn't imagine that "a.y = a.x" would be enough to
>generate a memleak. "a.y = a.x" isn't referencing another structure,
>it's referencing itself only. In fact if I do this the memleak goes
>away!!
I'm not sure how far you've gotten into this, but here's the basic
explanation: "a.x" gives you a "bound method instance"; since you
might do anything at all with the object it evaluates to, it wraps
up a reference to the object "a" references, so it knows what object
to use as "self"; this has the effect of increasing the reference
count of "a", but it doesn't actually leak any memory.
Of course, in creating a cycle which contains an object with an
implementation of __del__, you have created a leak, since Python's
GC cannot collect that kind of graph.
Hopefully the __del__ implementation is only included as an aid to
understanding what is going on, and you don't actually need it in
any of your actual applications. Once removed, the cycle will be
collectable by Python.
Another strategy is to periodically examine gc.garbage and manually
break cycles. This way, if you do have any __del__ implementations,
they will no longer be part of a cycle, and Python will again be
able to collect these objects.
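Breaking the cycle by hand could look roughly like this. It is only a
sketch, assuming CPython's reference counting; the "deleted" list is a
stand-in for the print in your example, and the collector is switched
off just to show that it is not involved. (On interpreters of this
vintage the unbroken cycle would end up in gc.garbage; Python 3.4 and
later collect such cycles themselves.)

```python
import gc

deleted = []

class A(object):
    y = None
    def x(self):
        pass
    def __del__(self):
        deleted.append('deleted')

gc.disable()          # leave only reference counting at work
try:
    a = A()
    a.y = a.x         # instance -> bound method -> instance: a cycle
    del a.y           # break the cycle; a.y falls back to the class attribute
    del a             # last reference gone, so __del__ runs right away
finally:
    gc.enable()

print(deleted)
```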
>
>-------------------
>import sys
>
>class A(object):
>    def x(self):
>        pass
>    y = x
>    def __del__(self):
>        print 'deleted'
>
>
>a = A()
>print sys.getrefcount(a)
>a.x()
>a.y()
>print a.x, a.y
>del a
>-------------------
>
>Now the fact a static field doesn't generate a reference but a dynamic
>one does is quite confusing to me and it also opened a can of worms in
>my code. I can handle that now that I know about it, but I wonder what
>people recommend to solve memleaks of this kind.
This is an interesting case. Python does not do what you probably
expect here. When you define a class with methods, Python does not
actually create any method objects! It is the actual attribute lookup
on an instance which creates the method object. You can see this in
the following example:
>>> class X:
...     def y(self): pass
...
>>> a = X()
>>> a.y is a.y
False
>>> a.y is X.__dict__['y']
False
>>> X.__dict__['y'] is X.__dict__['y']
True
>>>
So when you added "y" to your class "A", Python didn't care, because
there aren't even any method objects until you access an attribute
which is bound to a function. Continuing the above example:
>>> import sys
>>> sys.getrefcount(a)
2
>>> L = [a.y, a.y, a.y, a.y]
>>> sys.getrefcount(a)
6
>>>
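The lookup step can also be spelled out by hand. This is a sketch that
assumes a new-style class, where functions take part in the descriptor
protocol through __get__; "func" and "bound" are illustrative names:

```python
class X(object):
    def y(self):
        pass

a = X()
func = X.__dict__['y']        # the plain function stored on the class
bound = func.__get__(a, X)    # roughly what evaluating "a.y" does

print(bound == a.y)           # True: same function bound to the same instance
print(bound is a.y)           # False: every lookup builds a fresh object
```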
>
>I'd also like to know how other languages like ruby and java behave in
>terms of self-references of objects. Can't the language understand it's
>a self reference, and in turn it's the same as an integer or a string,
>like it already does when the member is initialized statically?
I don't know Ruby well enough to comment directly, but I believe Ruby's
GC is much simpler (and less capable) than Python's. Java doesn't have
bound methods (or unbound methods, or heck, functions): the obvious way
in which you would construct them on top of the primitives the language
does offer seems to me as though it would introduce the same "problem"
you are seeing in Python, but that may just be due to the influence
Python has had on my thinking.
>
>In fact can't the language be smart enough to even understand when two
>cross referenced objects lost visibility from all points of view, and
>drop both objects even if they hold a reference on each other? I
>understand this is a lot more complicated but wouldn't it be possible in
>theory? Is the garbage collection of other languages like ruby and
>java the same as python's, or more advanced?
When you have "two cross referenced objects", that's a cycle, and
Python will indeed clean it up. The only exception is if there is a
__del__ implementation, as I mentioned above. This is a general problem
with garbage collection. If you have two objects which refer to each
other and which each wish to perform some finalization, which finalizer
do you call first?
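Here is a small sketch of the collector cleaning up such a cycle when no
__del__ is involved. It assumes CPython's gc module; "Node" and "probe"
are made-up names, and the weak reference is only there to observe the
result without keeping the object alive:

```python
import gc
import weakref

class Node(object):
    pass

x = Node()
y = Node()
x.other = y
y.other = x                # two cross referenced objects: a cycle

probe = weakref.ref(x)     # watches x without adding a strong reference
del x, y                   # refcounts stay above zero because of the cycle
gc.collect()               # the cycle detector frees both objects

print(probe() is None)     # True once the cycle has been collected
```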
>
>So far my python programs never really cared to release memory (so my
>not full understanding of python refcounts wasn't a problem), but now
>since I'm dealing with a server I must make sure that the "proto" is
>released after a loseConnection invocation. So I must cleanup all cross
>and self! references in loseConnection and use weakrefs where needed.
>
>Now those structures that I'm leaking (like the protocol object) are so
>tiny that there's no chance that I could ever notice the memleak in real
>life, so I had to add debugging code to trap memleaks. You can imagine
>my server code like this:
You might be surprised :) These things tend to build up, if your process
is long-running.
>
>class cpushare_protocol(Int32StringReceiver):
>    def connectionMade(self):
>        [..]
>        self.hard_handlers = {
>            PROTO_SECCOMP : self.seccomp_handler,
>            PROTO_LOG : self.log_handler,
>            }
>        [..]
>    def log_handler(self, string):
>        [..]
>    def seccomp_handler(self, string):
>        [..]
>    def __del__(self):
>        print 'protocol deleted'
>    def connectionLost(self, reason):
>        [..]
>        # memleaks
>        del self.hard_handlers
>        print 'protocol refcount:', sys.getrefcount(self)
>        #assert sys.getrefcount(self) == 4
>
>For things like hard_handlers (that are self-referencing callbacks) I
>can't even use the weakref.WeakValueDictionary, because it wouldn't hold
>itself, the object gets released immediately. So the only chance I have
>to release the memory of the protocol object when the connection is
>dropped, is to do an explicit del self.hard_handlers in loseConnection.
>
>I wonder what other twisted developers do to avoid those troubles.
>Perhaps I shouldn't use self referencing callbacks to hold the state
>machine, and do like the smtp protocol that does this:
>
>    def lookupMethod(self, command):
>        return getattr(self, 'do_' + command.upper(), None)
>
>basically working with strings instead of pointers. Or I can simply make
>sure to cleanup all structures when I stop using them (like with the del
>self.hard_handlers above), but then I'll lose part of the automatic
>garbage collection features of python. I really want garbage collection
>or I could have written this in C++ if I'm forced to cleanup by hand.
(You can probably guess what I'm going to say here. ;) In general, I
avoid implementing __del__. My programs may end up with cycles, but
as long as I don't have __del__, Python can figure out how to free the
objects. Note that it does sometimes take it a while (and this has
implications for peak memory usage which may be important to you), but
if you find a case that it doesn't handle, then you've probably found
a bug in the GC that python-dev will fix.
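If the __del__ is only there to trace deletions, a weak reference with a
callback may serve the same purpose without getting in the collector's
way. This is a swapped-in technique, not something from your code: the
sketch assumes CPython, and "Proto", "on_collect" and "events" are
hypothetical stand-ins for your protocol class and its print.

```python
import gc
import weakref

events = []

class Proto(object):
    def connectionMade(self):
        # self-referencing callbacks, as in hard_handlers above
        self.hard_handlers = {'log': self.log_handler}
    def log_handler(self, string):
        pass

def on_collect(ref):
    # called when the watched object is about to be finalized
    events.append('protocol deleted')

p = Proto()
p.connectionMade()
watcher = weakref.ref(p, on_collect)   # keep "watcher" alive, or no callback
del p                                  # the cycle keeps the object alive
gc.collect()                           # the collector frees it, callback fires

print(events)
```

The explicit gc.collect() is only there to make the timing visible; in
a long-running server the collector runs on its own schedule.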
Hope this helps, and happy holidays,
Jean-Paul