[Twisted-Python] Re: Handling PBConnectionLost errors

David Bolen db3l.net at gmail.com
Fri Jul 27 23:07:58 MDT 2007


Daniel Miller <daniel at keystonewood.com> writes:

> Is this such a stupid question that it doesn't even warrant a response?
>
> ~ Daniel

I agree with the other comment to the effect that the lack of response
may be due more to the underlying complexity of the question than to
lack of interest.  I know we definitely ran into similar issues in a
large PB-based system I worked on a while ago, and in the end
determined that we were best served by implementing our own system.

For example, your opening point about:

>>                       (...)                                 It
>> would be nice to implement a fail-safe(er) way of calling remote
>> methods that would retry when necessary until the remote method has
>> been called successfully and the result has been returned.  (...)

has an implicit assumption that the remote method will even continue
to exist once the disconnect has occurred - something that is by no
means guaranteed with PB.

That is, what if the method you are trying to call is on a
Referenceable you got back from the server, but it was to an object
instance on the server that was created just for your client
connection?  The connection breaking will destroy that remote object
and/or your ability to reconnect to it without special support on the
server to keep it persistent.  Not to mention however many other
references to that remote object you may have in existence on the
client which will no longer function even after a reconnect.

That's not to say that there aren't plausible ways to achieve what
you're looking for, but in general it becomes application specific,
since you'll need knowledge as to how state management on your server
is taking place, and what remote references are stable across
connections.  So if your use of random IDs and reconnect attempts is a
workable way for you to manage the server state in such a way that it
is reconnectable, then it may be perfectly good in your environment.
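
To make the random-ID idea concrete, here is a minimal sketch in plain
Python (no PB involved).  All names (DedupServer, call_with_retry, the
fail_first knob) are hypothetical and only illustrate one way a server
could make a retried call safe: it remembers results by call ID, so a
retry after a lost reply does not run the work a second time.  It
assumes the server-side object holding the cache survives the
disconnect, which is exactly the application-specific part.

```python
import uuid


class DedupServer:
    """Hypothetical server-side object that remembers results by call ID."""

    def __init__(self):
        self._results = {}   # call_id -> result of a completed call
        self.executions = 0  # how many times the real work actually ran

    def call(self, call_id, func, *args):
        # A retry of a call that already completed gets the cached
        # result instead of running the work again.
        if call_id in self._results:
            return self._results[call_id]
        self.executions += 1
        result = func(*args)
        self._results[call_id] = result
        return result


def call_with_retry(server, func, *args, attempts=3, fail_first=0):
    """Retry one logical call under a single random ID.

    fail_first simulates replies lost to a disconnect: the server ran
    the call, but the client never saw the result and must retry.
    """
    call_id = uuid.uuid4().hex
    for attempt in range(attempts):
        result = server.call(call_id, func, *args)
        if attempt < fail_first:
            continue  # pretend the reply was lost; try again
        return result
    raise ConnectionError("gave up after %d attempts" % attempts)
```

In a real PB client the retry would be driven by Deferred errbacks and
a reconnect step rather than a loop, and the cache would need some
expiry policy; both are left out here.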

Perhaps some earlier messages of mine, written when we had just
finished putting together the remote wrapping and reconnect support in
our system, would be useful.  See my responses to the thread at:

http://twistedmatrix.com/pipermail/twisted-python/2005-July/011030.html

and

http://twistedmatrix.com/pipermail/twisted-python/2005-July/011046.html

It hits on topics beyond that of just a reliable method call, but the
second message more specifically talks about the wrapper that
implements reconnections, and how we dealt with updating references
post-reconnect.  You can probably see how the design dovetailed with
our particular server side structure (the registry was persistent as
were the managers, so they provided the concrete point of
reattachment).  And the use of the wrappers around references meant we
could "correct" the wrappers for a new connection without having to
worry about what parts of the client application may have been holding
references.  Perhaps it will give you some other ideas in your own
system.
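
The wrapper idea can be sketched roughly as follows.  This is not the
code from our system; every name here (ReferenceWrapper, Connection,
FakeRegistry, Manager) is made up for illustration, and the calls are
synchronous stand-ins for what would really be callRemote() returning
Deferreds.  The point it shows is that the application only ever holds
wrappers, so after a reconnect the wrappers can be re-pointed at fresh
references looked up by a stable name in a persistent registry.

```python
class StaleReference(Exception):
    """Raised when a wrapper is used while its connection is down."""


class ReferenceWrapper:
    """Client-side stand-in that application code holds instead of a raw ref."""

    def __init__(self, name):
        self.name = name   # stable name known to the persistent registry
        self._ref = None   # current underlying reference, if any

    def call(self, method, *args):
        if self._ref is None:
            raise StaleReference(self.name)
        return getattr(self._ref, method)(*args)


class Connection:
    """Tracks every wrapper so all can be fixed up in one place."""

    def __init__(self):
        self._wrappers = []

    def wrap(self, name, registry):
        wrapper = ReferenceWrapper(name)
        wrapper._ref = registry.lookup(name)
        self._wrappers.append(wrapper)
        return wrapper

    def lost(self):
        # Old references died with the connection.
        for wrapper in self._wrappers:
            wrapper._ref = None

    def reconnected(self, registry):
        # Re-point each existing wrapper; holders of wrappers need no changes.
        for wrapper in self._wrappers:
            wrapper._ref = registry.lookup(wrapper.name)


class FakeRegistry:
    """Hypothetical persistent registry, keyed by stable name."""

    def __init__(self, objects):
        self._objects = objects

    def lookup(self, name):
        return self._objects[name]


class Manager:
    """Hypothetical persistent server-side manager object."""

    def __init__(self, tag):
        self.tag = tag

    def ping(self):
        return self.tag
```

A wrapper obtained before an outage keeps working afterwards without
the code that holds it knowing anything happened, provided the name it
was created with still resolves on the server.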

For your other points:

>> I have two questions:
>>
>> 1. Does something like this already exist?

There used to be a "sturdy" PB module in Twisted (looks like it's gone
in later releases) to attempt to provide a more persistent server
reference.  Also, if I recall correctly there's a
ReconnectingClientFactory class somewhere which, while not PB
specific, was a way to implement reconnections purely at the
factory/protocol level.  Of course, that's never really the complex
part in a PB application - it's figuring out what to do with your
remote references.

Some of the work in the publish.py and refpath.py PB modules is also
an attempt to solve some of the issues involved here.

But I'm not aware of any existing approach that is generally suitable
for any application.  I rather doubt any single generic approach would
be possible, since PB provides for many mechanisms of state
management and referenceability among servers and clients.

>> 2. Is this a totally stupid idea? (would it be better to improve
>> our physical network than to try to band-aid the problem with
>> something like this?)

It's never a stupid idea to engineer for network interruptions, but
like everything else a design must weigh benefits against
cost/development.  With that said, it might not be a bad idea to also
look into your network.  TCP connections are rather hard to break just
due to network transmission problems, and all your PB calls are going
across a single TCP session.  They might be significantly delayed on a
bad network, but the connection itself shouldn't fail unless something
more extreme (and unusual) is happening.  Given the level of problems
you're encountering, I wouldn't be surprised if something else was
awry.

Of course, that level of network troubleshooting can have its own
cost/benefit analysis, and it might just be simpler to engineer around
the problem at the application level as you are doing.

For example, our system above was used over a WAN, and we actually had
several relays each of which had their own wrappers for the next hop,
so it was very important that while it might be down during an outage,
it properly healed itself as soon as whichever segment had failed was
reconnected.  But we generally expected most outages to represent real
network failures for a period of time (or a server going down), and
less so a constant percentage of failing calls.  (Not that the
networks couldn't have packet loss, but network packet loss has to
reach several percent before really impacting TCP to the point where
we would notice.)

But another PB-based application I'm working on now is less crucial.
Should I lose the server connection, I basically close down active UI
windows that were working with previous references and notify the user
that a disconnect has occurred.  They can then initiate a new
connection when they want.

-- David




