Opened 13 years ago

Last modified 12 years ago

#3853 defect new

Reference leak if function called by deferToThread() raises an error

Reported by: haypo Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: therve, Jean-Paul Calderone Branch:


In the following example, the DestroyMe object is not destroyed at inthread() exit.

It looks like a Failure() stores the Python frame and so stores the 'context' argument of inthread(). But the context object is not destroyed after displayError().

I think that the context object is still stored (reference leak)... somewhere. It's maybe referenced by the failure object, which is not destroyed directly? Another idea: the reference is maybe stored in the thread pool?

from twisted.internet.threads import deferToThread
from twisted.internet import reactor
from twisted.internet.defer import succeed

class DestroyMe:
    def __init__(self):
        print "CREATE DESTROY ME"

    def __del__(self):
        print "DELETE DESTROY ME"

def inthread(x, context):
    raise ValueError("error!")
    return x * 10

def func1(x, context):
    x = x + 1
    d = deferToThread(inthread, x, context)
    d.addErrback(displayError, "thread")
    return d

def printResult(x):
    print "Result = %s" % x
    print "test done"

def displayError(failure, where):
    print "FAILURE in %s: %s" % (where, failure)

def test1():
    print "test1"
    d = succeed(10)
    context = DestroyMe()
    d.addErrback(displayError, "test1")
    d.addCallback(func1, context)
    return d

def stop():
    print "Stop!"

reactor.callLater(0, test1)
reactor.callLater(4, stop)

I tested Twisted core 8.1.0 (Ubuntu Ibex).

Change History (8)

comment:1 Changed 13 years ago by therve

Cc: therve added

I think this patch solves the problem:

Index: twisted/python/
--- twisted/python/	(révision 26937)
+++ twisted/python/	(copie de travail)
@@ -228,6 +228,8 @@
           , log.err)
             del ctx, onResult, result
+            sys.exc_clear()
+            sys._getframe().f_locals
             o = self.q.get()

Python is terrible.

comment:2 Changed 13 years ago by haypo

I adapted the patch for Twisted 8.1:

  • twisted/python/

    old new  
    163163      , log.err)
    164164            self.working.remove(ct)
    165165            del o, ctx, function, args, kwargs
    166             sys.exc_clear()
    167             sys._getframe().f_locals
    168166            self.waiters.append(ct)
    169167            o = self.q.get()
    170168            self.waiters.remove(ct)

And yes, it looks like that the bug is fixed.

comment:3 Changed 13 years ago by Jean-Paul Calderone

Based on an explanation from therve on irc of why this happens, here is a minimal example which demonstrates the CPython behavior the above patches work around:

exarkun@charm:~$ python -c '
import weakref, inspect
class x:

def f():
    y = x()
    z = weakref.ref(y)
    print z()
    f = inspect.currentframe()
    d = f.f_locals
    print "y" in d
    del y
    print "y" in d
    print z()
    print "y" in d
    print z()

<__main__.x instance at 0xb7d3528c>
<__main__.x instance at 0xb7d3528c>

See frame_getlocals in frameobject.c to see why.

The broader implication here is that the stack walking combined with touching f_locals for each frame which Failure does makes it hideously unsafe for use in any program that is concerned with bounded memory use. Fixing this issue in fixes one particular cycle, but many others exist, both in Twisted and in applications.

This is the second case (that I can remember, I've probably forgotten some) where Failure was implicated in a reference leak. In the first, I accidentally avoided this issue by using local = None instead of del local to resolve the leak. I'm not sure if it was pure luck or if I would have noticed that del local didn't actually fix the problem, had I tried that (which I don't think I did, but who knows).

I'm going to suggest to the CPython developers that this is a misbehavior of f_locals. It would be better to have an API which either:

  • Gave you a copy of the locals at the current time and didn't pretend it would be automatically updated
  • Gave you a view onto the locals at the current time but didn't actually hold any references to the locals itself
  • Gave you a real dictionary of the locals but didn't hold a strong reference to that dictionary internally, so that if all external users of the dictionary went away, the frame itself would also stop referring to the locals.

comment:4 Changed 13 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added

I filed this upstream:

We probably need to work around this anyway, since I doubt seriously that upstream will be amenable to backporting the fix to Python 2.4 or Python 2.5 (and possibly even Python 2.6).

comment:5 Changed 13 years ago by haypo

I forget to say that I fixed my code to use weakref.ref() as a workaround.

comment:6 Changed 13 years ago by therve

Using a weakref should not make any difference, from what I understand.

comment:7 Changed 12 years ago by Glyph

Owner: changed from Glyph to therve

comment:8 Changed 11 years ago by <automation>

Owner: therve deleted
Note: See TracTickets for help on using tickets.