[Twisted-Python] How to debug an AMP connection?

Peter Westlake peter.westlake at pobox.com
Wed Aug 5 11:59:07 EDT 2009


Background: I have a client program that calls callRemote, but the
Deferred that callRemote returns is not fired. This is an intermittent
error that only happens after some hours of traffic.

By putting some logging into AMP, it's apparent that the server gets
as far as sending the reply using BoxDispatcher._safeEmit. The
original version of that ignores connection errors, but I overrode it
with one that doesn't:

    trace_tags = False
 
    def _safeEmit(self, aBox):
        """
        Emit a box, ignoring L{ProtocolSwitched} and L{ConnectionLost}
        errors
        which cannot be usefully handled.
        """
        tag = None
        if amp.ANSWER in aBox:
            tag = aBox[amp.ANSWER]
        elif amp.ERROR in aBox:
            tag = aBox[amp.ERROR]
        if self.trace_tags and tag:
            log.debug('_safeEmit', tag)
        try:
            aBox._sendTo(self.boxSender)
        except (ProtocolSwitched, ConnectionLost):
            log.debug('ProtocolSwitched or ConnectionLost in _safeEmit',
            tag)

The message in the last line is never seen, nor is there anything else
in the logs on client or server to suggest that there has been any
problem with the connection.

On the client, BoxDispatcher._answerReceived logs all replies:

    def _answerReceived(self, box):
        """
        An AMP box was received that answered a command previously sent
        with
        L{callRemote}.

        @param box: an AmpBox with a value for its L{ANSWER} key.
        """
        b = box[ANSWER]
        question = self._outstandingRequests.pop(box[ANSWER])
        question.addErrback(self.unhandledError)
        log.msg('_answerReceived', b, question)
        question.callback(box)
        log.msg('_answerReceived', b, 'returning')

Likewise _errorReceived.

I always see both log.msgs or neither, so it isn't something going
wrong in the callback.

The client sends a request.
The server sends a reply with the same tag, and logs a message.
Until it goes wrong, the client receives the reply and logs it.
When it goes wrong, the client does not see the reply.

The protocol has the unfired Deferred in _outstandingRequests,
with the missing tag as key.

All this suggests that the problem is either in the low-level
network code, or somewhere in the network between the client
and server. But doesn't TCP/IP tell you if a packet doesn't
get through?

I tried tracing the packets using a python-pycapy script (attached),
but it showed more packets disappearing than actually were. That
could be for several reasons, including the packet sniffer missing
packets, and the _ask or _answer key not being the first thing in
the box. It also made the problem go away to a large extent, though
that may have been the "tail -F" I was running against the log. It
does take a lot longer for the bug to manifest when tracing it.

At this point, I'm stuck! If there isn't a solution to this,
the only answer will be to time out and retry the command.
But since TCP connections are meant to be reliable (or to
give an error if they fail), I hope it won't come to that.

Peter.



More information about the Twisted-Python mailing list