Opened 10 years ago

Last modified 10 years ago

#3853 defect new

Reference leak if function called by deferToThread() raises an error

Reported by: haypo Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: therve, Jean-Paul Calderone Branch:
Author:

Description

In the following example, the DestroyMe object is not destroyed at inthread() exit.

It looks like a Failure() stores the Python frame and so stores the 'context' argument of inthread(). But the context object is not destroyed after displayError().

I think that the context object is still stored (reference leak)... somewhere. It's maybe referenced by the failure object, which is not destroyed directly? Another idea: the reference is maybe stored in the thread pool?

from twisted.internet.threads import deferToThread
from twisted.internet import reactor
from twisted.internet.defer import succeed

class DestroyMe:
    def __init__(self):
        print "CREATE DESTROY ME"

    def __del__(self):
        print "DELETE DESTROY ME"

def inthread(x, context):
    raise ValueError("error!")
    return x * 10

def func1(x, context):
    x = x + 1
    d = deferToThread(inthread, x, context)
    d.addErrback(displayError, "thread")
    return d

def printResult(x):
    print "Result = %s" % x
    print "test done"

def displayError(failure, where):
    print "FAILURE in %s: %s" % (where, failure)

def test1():
    print "test1"
    d = succeed(10)
    context = DestroyMe()
    d.addErrback(displayError, "test1")
    d.addCallback(func1, context)
    d.addCallback(printResult)
    return d

def stop():
    print "Stop!"
    reactor.stop()

reactor.callLater(0, test1)
reactor.callLater(4, stop)

reactor.run()

I tested Twisted core 8.1.0 (Ubuntu Ibex).

Change History (8)

comment:1 Changed 10 years ago by therve

Cc: therve added

I think this patch solves the problem:

Index: twisted/python/threadpool.py
===================================================================
--- twisted/python/threadpool.py	(révision 26937)
+++ twisted/python/threadpool.py	(copie de travail)
@@ -228,6 +228,8 @@
                     context.call(ctx, log.err)
 
             del ctx, onResult, result
+            sys.exc_clear()
+            sys._getframe().f_locals
 
             self.waiters.append(ct)
             o = self.q.get()

Python is terrible.

comment:2 Changed 10 years ago by haypo

I adapted the patch for Twisted 8.1:

  • twisted/python/threadpool.py

    old new  
    163163                context.call(ctx, log.err)
    164164            self.working.remove(ct)
    165165            del o, ctx, function, args, kwargs
    166             sys.exc_clear()
    167             sys._getframe().f_locals
    168166            self.waiters.append(ct)
    169167            o = self.q.get()
    170168            self.waiters.remove(ct)

And yes, it looks like that the bug is fixed.

comment:3 Changed 10 years ago by Jean-Paul Calderone

Based on an explanation from therve on irc of why this happens, here is a minimal example which demonstrates the CPython behavior the above patches work around:

exarkun@charm:~$ python -c '
import weakref, inspect
class x:
    pass

def f():
    y = x()
    z = weakref.ref(y)
    print z()
    f = inspect.currentframe()
    d = f.f_locals
    print "y" in d
    del y
    print "y" in d
    print z()
    f.f_locals
    print "y" in d
    print z()

f()
'
<__main__.x instance at 0xb7d3528c>
True
True
<__main__.x instance at 0xb7d3528c>
False
None
exarkun@charm:~$ 

See frame_getlocals in frameobject.c to see why.

The broader implication here is that the stack walking combined with touching f_locals for each frame which Failure does makes it hideously unsafe for use in any program that is concerned with bounded memory use. Fixing this issue in threadpool.py fixes one particular cycle, but many others exist, both in Twisted and in applications.

This is the second case (that I can remember, I've probably forgotten some) where Failure was implicated in a reference leak. In the first, I accidentally avoided this issue by using local = None instead of del local to resolve the leak. I'm not sure if it was pure luck or if I would have noticed that del local didn't actually fix the problem, had I tried that (which I don't think I did, but who knows).

I'm going to suggest to the CPython developers that this is a misbehavior of f_locals. It would be better to have an API which either:

  • Gave you a copy of the locals at the current time and didn't pretend it would be automatically updated
  • Gave you a view onto the locals at the current time but didn't actually hold any references to the locals itself
  • Gave you a real dictionary of the locals but didn't hold a strong reference to that dictionary internally, so that if all external users of the dictionary went away, the frame itself would also stop referring to the locals.

comment:4 Changed 10 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added

I filed this upstream:

http://bugs.python.org/issue6116

We probably need to work around this anyway, since I doubt seriously that upstream will be amenable to backporting the fix to Python 2.4 or Python 2.5 (and possibly even Python 2.6).

comment:5 Changed 10 years ago by haypo

I forget to say that I fixed my code to use weakref.ref() as a workaround.

comment:6 Changed 10 years ago by therve

Using a weakref should not make any difference, from what I understand.

comment:7 Changed 10 years ago by Glyph

Owner: changed from Glyph to therve

comment:8 Changed 9 years ago by <automation>

Owner: therve deleted
Note: See TracTickets for help on using tickets.