[Twisted-Python] strange server crash

Alec Matusis matusis at yahoo.com
Sat Mar 21 23:02:03 EDT 2009


I have been operating a busy twisted TCP server for quite some time (1
year+).

Yesterday, it strangely crashed under peak load for that day (it has been
under more load in the past).  

 

# grep segfault /var/log/messages

Mar 20 19:12:15 serv2 kernel: [17687209.144548] twistd[10701]: segfault at 8
rip 41eccf rsp 7fff33675270 error 6

 

I cannot understand the reason of the crash, except that it’s some very rare
race condition.

While the code is quite cumbersome, here are some general details:

 

8 min before the crash, the server started giving repeated error on clients
disconnect, that I have never seen before in the last year.

 

It first produced this twice:

 

2009/03/20 19:04 -0700 [ClientProtocol,318930,70.132.227.150] Unhandled
Error

        Traceback (most recent call last):

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/application/a
pp.py", line 113, in runReactorWithLogging

            reactor.run()

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 220, in run

            self.mainLoop()

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 231, in mainLoop

            self.doIteration(t)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 181, in doPoll

            log.callWithLogger(selectable, _drdw, selectable, fd, event)

        --- <exception caught here> ---

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 48, in callWithLogger

            return callWithContext({"system": lp}, func, *args, **kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 33, in callWithContext

            return context.call({ILogContext: newCtx}, func, *args, **kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 59, in callWithContext

            return self.currentContext().callWithContext(ctx, func, *args,
**kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 37, in callWithContext

            return func(*args,**kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 212, in _doReadOrWrite

            self._disconnectSelectable(selectable, why, inRead)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 252, in _disconnectSelectable

            selectable.readConnectionLost(f)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 405, in readConnectionLost

            self.connectionLost(reason)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 416, in connectionLost

            protocol.connectionLost(reason)

          File "/nail/live/im/myserv/protocols.py", line 170, in
connectionLost

            self.session.logout()

          File "/nail/live/im/myserv/service.py", line 1199, in logout

            if self.nick in session.sellers:

        exceptions.AttributeError: Deferred instance has no attribute
'sellers'

 

and then a different error in the same line with the same call stack, that
was repeated about 1000 times for different logging out clients:

 

2009/03/20 19:04 -0700 [ClientProtocol,326158,99.245.246.187] Unhandled
Error

        Traceback (most recent call last):

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/application/a
pp.py", line 113, in runReactorWithLogging

            reactor.run()

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 220, in run

            self.mainLoop()

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 231, in mainLoop

            self.doIteration(t)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 181, in doPoll

            log.callWithLogger(selectable, _drdw, selectable, fd, event)

        --- <exception caught here> ---

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 48, in callWithLogger

            return callWithContext({"system": lp}, func, *args, **kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/log.py
", line 33, in callWithContext

            return context.call({ILogContext: newCtx}, func, *args, **kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 59, in callWithContext

            return self.currentContext().callWithContext(ctx, func, *args,
**kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/python/contex
t.py", line 37, in callWithContext

            return func(*args,**kw)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/epol
lreactor.py", line 212, in _doReadOrWrite

            self._disconnectSelectable(selectable, why, inRead)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/posi
xbase.py", line 252, in _disconnectSelectable

            selectable.readConnectionLost(f)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 405, in readConnectionLost

            self.connectionLost(reason)

          File
"/nail/encap/Twisted-2.5.0/lib/python2.4/site-packages/twisted/internet/tcp.
py", line 416, in connectionLost

            protocol.connectionLost(reason)

          File "/nail/live/im/myserv/protocols.py", line 170, in
connectionLost

            self.session.logout()

          File "/nail/live/im/myserv/service.py", line 1199, in logout

            if self.nick in session.sellers:

        exceptions.AttributeError: WLResultSet instance has no attribute
'sellers'

 

This last error repeated about 1000 times for different clients, after which
the segfault occurred.

The problem is, that “session” is an instance of a Session class that is
defined in the code, it is NEITHER a Deferred NOR an instance  of
WLResultSet class- these are completely different classes, and there’s no
possible way in the code for variable session to become an instance of
Deferred.

The code is approximately like this:

 

def logout(self):

self.on_logout(self.do_logout) 

for session in self.service.client_sessions: ß self.service.client_sessions
is a dictionary of custom Session class instances

if self.nick in session.sellers:  ß error here, it claims that session is a
Deferred or sometimes an instance of a completely unrelated class

                                                #Some statements 

 

The function self.on_logout sometimes (rarely) dispatches a deferred (it has
maybeDeferred in it), such that its argument self.on_logout is called when
that deferred returns (otherwise it’s called synchronously).

 

My question is, is it possible to muck up deferred in such a way, that after
dispatching a deferred chain inside  self.on_logout(self.do_logout) , the
remaining statements would be executed in some  unexpected local scope, such
that I would see these weird errors where my for loop iterator would
suddenly go over Deferred instances, instead of the dictionary of Session
instances that I define in the code ?? 

 

Also, why did these errors lead to segfault of the whole twisted process
after 8 minutes?

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://twistedmatrix.com/pipermail/twisted-python/attachments/20090321/825facf2/attachment.htm 


More information about the Twisted-Python mailing list