[Twisted-Python] help with ssl timeout and not reconnecting client factory
exarkun at divmod.com
Thu Mar 17 09:28:06 EST 2005
On Thu, 17 Mar 2005 13:40:56 +0100, Andrea Arcangeli <andrea at cpushare.com> wrote:
> From a client I'm getting this error:
> [snip - traceback and log]
> This is a reconnecting client factory; the Python version is 2.3.4 and the
> Twisted version is 1.3.0a. The socket should sit in an idle state. No
> communication over that socket will happen (the app is under development), but
> it should not time out unless the connection with the server ends and the
> keepalive events trigger a disconnect (I enabled keepalive at the TCP level).
> Even if it times out, it must try to reconnect immediately, whereas it seems
> to hang after the "Stopping factory".
> Earlier, when I got the connection timed out event (for no apparent good
> reason), at least it immediately tried to reconnect:
> [snip - traceback and log]
> 2005/03/14 17:59 CET [cpushare_protocol,client] <twisted.internet.ssl.Connector instance at 0x2aaaac28d950> will retry in 2 s
> 2005/03/14 17:59 CET [cpushare_protocol,client] Stopping factory <cpushare.proto.cpushare_factory instance at 0x2aaaad414290>
> 2005/03/14 17:59 CET [-] Starting factory <cpushare.proto.cpushare_factory instance at 0x2aaaad414290>
> So my first priority is to understand why it stopped trying to reconnect (which
> is the major bug) and the second priority is to understand why it was going in
> timeout in the first place. (I can't rule out that a temporary network
> disruption caused the keepalive to trigger the disconnect.)
For some reason unfathomable to me, ReconnectingClientFactory _stops_ trying to reconnect if a UserError is the cause of the failed connection. Further, for some reason, error.TimeoutError subclasses UserError, so a connection timeout also stops the retries. This has bitten at least one other project (buildbot).
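To make the interaction concrete, here is a small stand-alone model of it. The three exception classes and the should_retry helper are stand-ins invented for illustration; nothing here is imported from Twisted, and the real check inside ReconnectingClientFactory may be written differently.

```python
# Stand-in classes mirroring the hierarchy described above. They are
# NOT the classes from twisted.internet.error.
class StandInUserError(Exception):
    pass


class StandInTimeoutError(StandInUserError):
    # The timeout is a subclass of the user error, as described above.
    pass


class StandInConnectionRefusedError(Exception):
    # A failure that is unrelated to the user error.
    pass


def should_retry(exc):
    """Hypothetical model of the policy: give up on any user error."""
    return not isinstance(exc, StandInUserError)
```

Under this model a refused connection is retried, while a timeout is not, simply because of where the timeout class sits in the hierarchy.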
> Could this be a bug in 1.3.0a? I expect the client will be mostly run with
> 1.3.0a, only on the server side I use SVN + pending fixes.
I'm inclined to say that it is indeed a bug. I think ReconnectingClientFactory should always retry the connection, regardless of the exception with which the previous attempt fails. If a program wants to allow a user to interrupt the retry logic, there is a "stopTrying" method.
> This is the reconnecting code:
> class cpushare_factory(ReconnectingClientFactory):
>     maxDelay = 600 # limit the maximum delay to 10 min
>     protocol = cpushare_protocol
>     def buildProtocol(self, addr):
>         protocol = self.protocol()
>         assert not hasattr(protocol, 'factory')
>         protocol.factory = self
>         return protocol
>     def clientConnectionFailed(self, connector, reason):
>         print 'Connection failed. Reason:', reason
>         ReconnectingClientFactory.clientConnectionFailed(self, connector, reason)
If you look at twisted/internet/protocol.py for the definition of ReconnectingClientFactory.clientConnectionFailed, it should be pretty obvious how you want to redefine clientConnectionFailed to avoid the behavior you're seeing.
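One possible shape for that redefinition is sketched below as a mixin. It assumes the stock method only skips retry() when the failure wraps a UserError, and that the factory exposes the continueTrying and connector attributes and the retry() and stopTrying() methods. AlwaysRetryMixin, FakeReconnectingFactory and DemoFactory are hypothetical names; the fake base class exists only so the sketch runs without Twisted installed.

```python
class AlwaysRetryMixin(object):
    """Hypothetical mixin: retry after every failed connection attempt."""

    def clientConnectionFailed(self, connector, reason):
        # No test on the type of failure here, so a TimeoutError is
        # retried like any other failure. The stock method is NOT called.
        if self.continueTrying:
            self.connector = connector
            self.retry()


class FakeReconnectingFactory(object):
    """Stand-in for ReconnectingClientFactory, for demonstration only."""
    continueTrying = 1
    connector = None

    def __init__(self):
        self.retries = 0

    def retry(self, connector=None):
        # The real method schedules a delayed reconnect; this one counts.
        self.retries += 1

    def stopTrying(self):
        self.continueTrying = 0


class DemoFactory(AlwaysRetryMixin, FakeReconnectingFactory):
    pass
```

In a real program the mixin would be listed before ReconnectingClientFactory in the bases of the factory class, so that its clientConnectionFailed wins. Calling stopTrying() still ends the retries, because the mixin keeps the continueTrying check.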
> def connectionMade(self):
> Is the above correct? It works fine when the connection failed reason is
> "ConnectionRefusedError" instead of TimeoutError.
> What else should I do to prevent this error from leaving the factory stopped?
> 2005/03/17 06:31 CET [-] Connection failed. Reason: [Failure instance: Traceback: twisted.internet.error.TimeoutError, User timeout caused connection failure.
> Where does the "twisted.internet.error.TimeoutError" come from?
It's generated internally by Twisted when the allotted connection time has elapsed without a connection being established.
Most likely it _is_ a network-related problem that caused the connection to fail, but Twisted is certainly responsible for the decision to cease further reconnection attempts.