[Twisted-Python] Handling PBConnectionLost errors
Daniel Miller
daniel at keystonewood.com
Wed Jul 25 10:38:55 EDT 2007
Is this such a stupid question that it doesn't even warrant a response?
~ Daniel
On Jul 20, 2007, at 11:52 AM, Daniel Miller wrote:
> Hello,
>
> Twisted PB sometimes loses its connection to the server. When that
> happens, the pending remote call fails with a PBConnectionLost error
> on the client. It would be nice to implement a more fail-safe way of
> calling remote methods that would retry when necessary until the
> remote method has been called successfully and the result has been
> returned. Note that this is only necessary when the remote method
> should be invoked exactly once on the server (i.e. for POST-like
> calls that change server state). In the case of GET-like requests,
> a simpler retry mechanism will do.
>
> The motivation for this comes from my experience using Twisted
> in an application I am developing. The network communications all
> happen on a LAN. The good end of the network is connected
> directly to a 100Mbps switch at the server. Failures occur more
> frequently at the other end (my end) of the network, which is
> connected through a 10/100 hub that is connected to the main
> switch. I rigged up a quick test with a 1000-request sample size;
> failures ranged from 28/1000 on the good end of the network to
> 83/1000 on the bad end of the network. One request consists of a
> single remote method call through PB. A success was when I got the
> expected result; a failure was when I got a PBConnectionLost error.
>
> The following is pseudo code that I came up with to mitigate the
> problem.
>
> Simple request (GET - repeatedly call method until success or
> RETRY_LIMIT is reached)
> Client flow:
>     for x in range(RETRY_LIMIT)
>         invoke remote method without unique call identifier
>         if result is not PBConnectionLost
>             break
>     if result is PBConnectionLost
>         raise server not responding error
> Server flow:
>     (nothing special, just plain PB)
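
A minimal Python sketch of the simple retry flow, assuming a synchronous stand-in for the remote call so that it runs without a reactor. `call_with_retry`, `ServerNotResponding` and the value of `RETRY_LIMIT` are hypothetical, and the fallback class only imitates `twisted.spread.pb.PBConnectionLost` when Twisted is not installed. With real PB, `callRemote` returns a Deferred instead of raising, so the same loop would sit in an errback that calls `failure.trap(PBConnectionLost)` and reissues the call.

```python
try:
    from twisted.spread.pb import PBConnectionLost
except ImportError:  # stand-in so the sketch runs without Twisted
    class PBConnectionLost(Exception):
        pass

RETRY_LIMIT = 3  # hypothetical value


class ServerNotResponding(Exception):
    """Hypothetical error raised once every attempt has failed."""


def call_with_retry(call, *args, retry_limit=RETRY_LIMIT, **kwargs):
    """Invoke call(*args, **kwargs) until it succeeds or retry_limit is reached.

    Only safe for GET-like calls: a lost connection may hide the fact
    that the server already executed the method.
    """
    last_error = None
    for attempt in range(retry_limit):
        try:
            return call(*args, **kwargs)
        except PBConnectionLost as err:
            last_error = err  # connection dropped; try again
    raise ServerNotResponding(
        f"no response after {retry_limit} attempts") from last_error


# Usage: a simulated remote method that drops the connection twice.
attempts = []

def flaky_get(key):
    attempts.append(key)
    if len(attempts) < 3:
        raise PBConnectionLost("connection lost")
    return key.upper()

print(call_with_retry(flaky_get, "status"))  # STATUS
```
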
>
> Complex request (POST - server-side method is invoked exactly once)
> Client flow:
>     use simple retry method to get a unique call identifier from server
>         a timeout value is also sent along to tell the server how
>         long to hold the results of this request
>     for x in range(RETRY_LIMIT)
>         invoke remote method with identifier
>         if result is not PBConnectionLost
>             break
>     if result is PBConnectionLost
>         raise server not responding error
>     using simple retry method, tell server to discard unique call identifier
> Server flow:
>     receive request for unique call identifier
>         create and store identifier with UNCALLED token
>         schedule identifier to be discarded after the timeout
>         supplied by client
>         return identifier to client
>     receive remote method invocation with unique call identifier
>         branch on value stored with unique call identifier
>             if UNCALLED
>                 update identifier with CALLED token
>                 invoke method
>                 while result is deferred
>                     get deferred result
>                 store COMPLETED token and result with unique call identifier
>                 if there is another invocation WAITING
>                     (this means the connection was lost)
>                     signal the WAITING request with the result
>                 else
>                     return result to client
>             if CALLED
>                 store WAITING token with unique identifier (must not
>                 overwrite other call tokens)
>                 defer until COMPLETED
>             if COMPLETED
>                 return result to client
>             if unique call identifier does not exist
>                 raise error
>     receive request to discard unique call identifier
>         if identifier exists
>             discard identifier, tokens, and result
>         return True
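
The exactly-once server flow can be sketched as a small call table, under several assumptions: `CallTable`, `UnknownCallId`, `transfer` and the method signatures are hypothetical; methods report completion through a `done` callback instead of returning a Deferred, so the UNCALLED/CALLED/COMPLETED transitions are visible without a reactor; and expired identifiers are purged lazily on each access rather than by a scheduled call such as Twisted's `reactor.callLater`. A retry that arrives while the first invocation is still running is queued (the WAITING case) and answered when the method completes.

```python
import itertools
import time

UNCALLED, CALLED, COMPLETED = "UNCALLED", "CALLED", "COMPLETED"


class UnknownCallId(KeyError):
    """Raised when a call identifier has expired or was never issued."""


class CallTable:
    """Hypothetical server-side table of unique call identifiers."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._ids = itertools.count(1)
        self._calls = {}  # call id -> state, result, waiting replies, expiry

    def _purge(self):
        # Drop identifiers whose client-supplied timeout has passed.
        now = self._clock()
        for cid in [c for c, e in self._calls.items() if e["expires"] <= now]:
            del self._calls[cid]

    def new_call_id(self, timeout):
        self._purge()
        cid = next(self._ids)
        self._calls[cid] = {"state": UNCALLED, "result": None,
                            "waiting": [], "expires": self._clock() + timeout}
        return cid

    def invoke(self, cid, method, args, reply):
        """Run method(*args, done) at most once per cid; reply(result) answers."""
        self._purge()
        entry = self._calls.get(cid)
        if entry is None:
            raise UnknownCallId(cid)
        if entry["state"] == UNCALLED:
            entry["state"] = CALLED

            def done(result):
                entry["state"] = COMPLETED
                entry["result"] = result
                waiting, entry["waiting"] = entry["waiting"], []
                if waiting:
                    # A retry arrived, so the original connection was lost.
                    for waiter in waiting:
                        waiter(result)
                else:
                    reply(result)

            method(*args, done)
        elif entry["state"] == CALLED:
            entry["waiting"].append(reply)  # WAITING until COMPLETED
        else:
            reply(entry["result"])  # COMPLETED: replay, never re-run

    def discard(self, cid):
        self._calls.pop(cid, None)
        return True


# Usage: a state-changing method that completes asynchronously.
executions, pending = [], []

def transfer(amount, done):
    executions.append(amount)
    pending.append(lambda: done(f"transferred {amount}"))

table = CallTable()
cid = table.new_call_id(timeout=60)
lost, replies = [], []
table.invoke(cid, transfer, (10,), lost.append)     # first try; connection drops
table.invoke(cid, transfer, (10,), replies.append)  # retry while CALLED
pending.pop()()                                     # the method finishes
table.invoke(cid, transfer, (10,), replies.append)  # retry after COMPLETED
print(executions, lost, replies)
# [10] [] ['transferred 10', 'transferred 10']
table.discard(cid)
```

The usage simulates a dropped first connection: the method body runs once, the retry made while it is still running receives the result, and a later retry gets the stored result replayed instead of triggering a second execution.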
>
> I realize that implementing this would not eliminate network
> errors. It would simply reduce the likelihood of failed method
> calls due to dropped connections. If I have my math correct (I
> always struggle a bit with statistics), even a RETRY_LIMIT of 2
> would reduce the probability of a failed call to about 0.7% at the
> worst (<0.1% on the good end of the network).
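
The retry estimate can be reproduced with a few lines of Python (`residual_failure_rate` is just an illustrative helper). The calculation assumes each attempt fails independently with the measured per-request rate; if drops on the hub come in bursts, consecutive retries are more likely to fail together and the real figure would be higher.

```python
def residual_failure_rate(failures, requests, attempts):
    """Probability that all `attempts` tries fail, assuming each try fails
    independently with the measured rate failures/requests."""
    p = failures / requests
    return p ** attempts


for label, failures in [("good end", 28), ("bad end", 83)]:
    rate = residual_failure_rate(failures, 1000, attempts=2)
    print(f"{label}: {rate:.3%}")
# good end: 0.078%
# bad end: 0.689%
```
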
>
> I have two questions:
>
> 1. Does something like this already exist?
> 2. Is this a totally stupid idea? (would it be better to improve
> our physical network than to try to band-aid the problem with
> something like this?)