Opened 10 years ago

Closed 7 years ago

#3312 defect, closed (fixed)

Silent server crash with kqueue.reactor

Reported by: stupidInvaders
Owned by: stupidInvaders
Priority: highest
Milestone:
Component: core
Keywords:
Cc: Jean-Paul Calderone
Branch:
Author:

Description

Under high load, with ~1000 clients, the server crashes silently. The crash looks like this: the server appears to be running, but clients cannot connect to it. There are no errors in stderr or in the logs. We may possibly have an error in our code, but everything was written according to the manuals.

We use: FreeBSD 6.1 and 7.0; Python 2.4; Twisted 8.1 and Twisted 2.5.

import sys
import time
import md5

if "freebsd" in sys.platform:
    from twisted.internet import kqreactor
    kqreactor.install()
elif "linux" in sys.platform:
    from twisted.internet import epollreactor
    epollreactor.install()

from twisted.internet import protocol
from twisted.internet import reactor
#required for using threads with the Reactor
from twisted.python import threadable
threadable.init()

class Protocol(protocol.Protocol):
    """ClientProtocol"""
    data = ""
    maxConnections = 5000

    def connectionMade(self):
        self.factory.countConnections += 1
        if self.factory.countConnections > self.maxConnections:
            # Too many connections; ask the client to try again later
            self.transport.loseConnection()

    def connectionLost(self, reason):
        # The transport is already closed here, so calling
        # loseConnection() again is redundant; just update the counter.
        self.factory.countConnections -= 1

    def dataReceived(self, data):
        """As soon as any data is received, write it back."""
        self.data += data
        self.parseCommand()

    def parseCommand(self):
        """Must be overridden"""
        self.handleCommand()

    def handleCommand(self):
        """Must be overridden"""
        response = ""
        self.sendResponse(response)

    def sendResponse(self, response):
        """Must be overridden"""
        msg = self.data
        self.data = ""  # consume the buffer so old data is not echoed again
        print msg
        self.transport.write(msg)
        
class ClientFactory(protocol.ClientFactory):
    """ClientFactory"""
    countConnections = 0
    clients = {}

    def __init__(self, pr = Protocol):
        self.protocol = pr
    
    def clientConnectionFailed(self, connector, reason):
        print "Connection failed:", reason
        
class ReconnectingClientFactory(protocol.ReconnectingClientFactory):
    """ReconnectingClientFactory"""
    def __init__(self, pr = Protocol):
        self.protocol = pr
        self.initialDelay = 1
        self.delay = self.initialDelay
        self.maxDelay = 60
        self.factor = 1

    def clientConnectionLost(self, connector, reason):
        print "Lost connection. Reason:", reason
        # NOTE: time.sleep() blocks the entire reactor, and with
        # factor = 1 the delay never grows; the stock
        # ReconnectingClientFactory.retry() reconnects with backoff
        # without blocking.
        time.sleep(self.delay)
        self.delay = min(self.delay * self.factor, self.maxDelay)
        connector.connect()

    def clientConnectionFailed(self, connector, reason):
        print "Connection failed. Reason:", reason
        time.sleep(self.delay)
        self.delay = min(self.delay * self.factor, self.maxDelay)
        connector.connect()
    
class Server(object):
    """Server"""
    protocols = {}
    
    def __init__(self, listeningPorts = {}, connectingPorts = {}):
        self.listeningPorts = listeningPorts
        self.connectingPorts = connectingPorts
        self.factories = {}
        # Set of listening ports
        for protocolID in self.listeningPorts:
            protocol = self.protocols[protocolID]
            factory = ClientFactory(protocol)
            self.factories[protocolID] = factory
        # Set of connecting ports
        for protocolID in self.connectingPorts:
            protocol = self.protocols[protocolID]
            factory = ReconnectingClientFactory(protocol)
            self.factories[protocolID] = factory
        
    def __del__(self):
        pass
        
    def start(self):
        """Start the server"""
        self.__reactor = reactor
        self.__listener = {}
        self.__connector = {}
        for protocolID, (host, port) in self.listeningPorts.iteritems():
            self.__listener[protocolID] = self.__reactor.listenTCP(
                port, self.factories[protocolID])
        for protocolID, (host, port) in self.connectingPorts.iteritems():
            self.__connector[protocolID] = self.__reactor.connectTCP(
                host, port, self.factories[protocolID], timeout=0.5)
        self.__reactor.run(installSignalHandlers=0)
        
    def stop(self):
        """Correct way to stop the server"""
        self.__reactor.stop()

    def crash(self):
        """Crash the server. Data may be lost"""
        for protocolID in self.listeningPorts.iterkeys():
            self.__listener[protocolID].stopListening()
        for protocolID in self.connectingPorts.iterkeys():
            # was self.__listener; connecting ports are tracked in __connector
            self.__connector[protocolID].disconnect()
        self.__reactor.crash()
        
    def cold_restart(self):
        """May be overridden"""
        
    def soft_restart(self):
        """May be overridden"""
        
    def hard_restart(self):
        """May be overridden"""

    def status(self):
        """Server status (running or not)"""
        return self.__reactor.running

Change History (7)

comment:1 Changed 10 years ago by Itamar Turner-Trauring

  1. If a log observer raises an exception, it gets removed (see http://twistedmatrix.com/trac/ticket/1069) - is it possible an exception is being thrown there, causing logging to stop? One common cause is exceeding the recursion limit.
  2. Does this not happen with epoll? How about select()? Some Twisted reactors have issues when hitting their file descriptor limit, so maybe your kqueue settings limit you to 1024 fds.
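The first point is easy to miss because the failure mode is invisible. A minimal, stdlib-only sketch of the behavior described in #1069 (this mimics what Twisted's log publisher does when an observer raises; the `Publisher` class here is illustrative, not Twisted's real API):

```python
# Sketch of the observer-removal behavior from #1069: an observer
# that raises is silently dropped, and its output simply stops.
class Publisher(object):
    def __init__(self):
        self.observers = []

    def add(self, observer):
        self.observers.append(observer)

    def msg(self, event):
        # Iterate over a copy so removal during iteration is safe
        for observer in list(self.observers):
            try:
                observer(event)
            except Exception:
                # A broken observer is removed so it cannot wedge
                # logging, but nothing reports that this happened.
                self.observers.remove(observer)

seen = []
pub = Publisher()
pub.add(seen.append)
pub.add(lambda event: 1 / 0)   # broken observer (e.g. recursion limit hit)

pub.msg("first")    # the broken observer raises and is dropped
pub.msg("second")   # only the healthy observer remains
print(len(pub.observers))  # 1
print(seen)                # ['first', 'second']
```

If the file-log observer is the one that raises, the process keeps running but the log goes quiet, which matches "no errors in stderr and logs".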

comment:2 Changed 10 years ago by stupidInvaders

  1. We will try.
  2. We didn't check epoll.

select() crashes on Twisted 2.5.0 with many messages like this in stderr:

--- <exception caught here> ---
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 231, in mainLoop
    self.doIteration(t)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/selectreactor.py", line 97, in doSelect
    [], timeout)
  File "<string>", line 1, in fileno

  File "/usr/local/lib/python2.4/socket.py", line 136, in _dummy
    raise error(EBADF, 'Bad file descriptor')
socket.error: (9, 'Bad file descriptor')
Traceback (most recent call last):
  File "./MaffiaServerTwisted.py", line 116, in ?
    server.start()
  File "/usr/home/maffia/core/protocol.py", line 131, in start
    self.__reactor.run(installSignalHandlers=0)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 220, in run
    self.mainLoop()
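The `EBADF` in the traceback above is what select() raises when it is handed a file descriptor that has already been closed, e.g. one the reactor still holds in its readers set. A minimal reproduction (assumes a POSIX platform; written for modern Python):

```python
import errno
import select
import socket

sock = socket.socket()
fd = sock.fileno()     # remember the raw descriptor number
sock.close()           # then close it, invalidating the fd

failed = None
try:
    # Polling a stale descriptor fails immediately with EBADF,
    # which selectreactor surfaces as socket.error (9, ...)
    select.select([fd], [], [], 0)
except OSError as e:
    failed = e.errno

print(failed == errno.EBADF)  # True
```

So the select-reactor crash suggests a socket was closed without being removed from the reactor, rather than a problem in select() itself.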

The server didn't reach the limit of 1024 fds; ulimit is set to a value that greatly exceeds the required number of fds.
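The shell's ulimit and the limit the server process actually runs under can differ (for example when the server is started from an init script). A quick stdlib check of the in-process limit, which is what matters to the reactor (Unix-only; the `resource` module is unavailable on Windows):

```python
import resource

# Soft/hard per-process file-descriptor limits in effect right now
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft fd limit:", soft)
print("hard fd limit:", hard)

# ~1000 clients plus listening sockets, pipes and log files is
# already uncomfortably close to a 1024 soft limit.
if 0 < soft <= 1024:
    print("warning: soft limit may be too low for ~1000 clients")
```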

comment:3 Changed 10 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added
Milestone: Twisted-8.2

kqueuereactor in trunk isn't in good shape. It doesn't come close to passing the test suite. See #1918 and #3114 for links to branches or diffs which change it, perhaps for the better. If you're interested in KQueue support, any help you can provide in getting these tickets resolved would be appreciated.

Since this isn't a regression from a previous release (and officially, KQueue is not a "supported" event mechanism), I'm going to remove this ticket from the 8.2 milestone. Our usage of release milestones is to indicate tickets which must be resolved for that release. Generally, only regressions can block new releases.

comment:4 Changed 7 years ago by Thijs Triemstra

A new kqueue implementation was added in r33481 (#1918).

comment:5 Changed 7 years ago by Corbin Simpson

Resolution: invalid
Status: new → closed

If this bug still applies to the modern, rewritten kqueue reactor, feel free to open a new bug, but this one is being closed.

comment:6 Changed 7 years ago by Jean-Paul Calderone

Resolution: invalid
Status: closed → reopened

We generally shouldn't close tickets like this. The reporter gave us instructions for reproducing the issue. We can try to reproduce it. If it's really gone, we can close the ticket as fixed. If it's still there, we know it's a bug we should actually fix.

I'll try to go reproduce the issue now and report back.

comment:7 Changed 7 years ago by Jean-Paul Calderone

Resolution: fixed
Status: reopened → closed

Using more recent versions of everything involved, and a tiny extra main script (not included in the ticket description, unfortunately), I can't reproduce any problem with this code. So, really fixed. Thanks everybody.
