Opened 12 years ago

Closed 8 years ago

#4395 defect closed fixed (fixed)

conch appears to have a bug with rekeying the connection

Reported by: Michael Hudson-Doyle Owned by:
Priority: normal Milestone:
Component: conch Keywords:
Cc: Michael Hudson-Doyle, bshi, zooko@…, Fabian Sturm Branch: branches/conch-rekey-4395
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description

I don't have a test case that doesn't include all of Launchpad yet, but it certainly seems as if the conch server code has a bug -- possibly only with SSHProcessProtocol -- when the client rekeys the connection.

My theory is that conch just carries on pumping the data from the subprocess to the client, ballsing up the connection, but this is just a guess of course.

Attachments (1)

ssh_rekey_bug.py (5.7 KB) - added by ousado 11 years ago.
after openssh requests a key renegotiation, CPU goes up to 100% and the ssh protocol hangs

Download all attachments as: .zip

Change History (31)

comment:1 Changed 12 years ago by bshi

Cc: bshi added

comment:2 Changed 11 years ago by ousado

My observation is that

  1. SSHServerTransport doesn't adhere to RFC 4253 in that it does not send SSH_MSG_KEXINIT after receiving it:
  1. Key Re-Exchange [...] When this message is received, a party MUST respond with its own SSH_MSG_KEXINIT message, except when the received SSH_MSG_KEXINIT already was a reply. [...]

This seems to be an easy fix by calling self.sendKexInit() from SSHServerTransport.ssh_KEXINIT in case it doesn't handle the reply.

  1. SSHServerTransport does not make sure that during the key renegotiation no illegal packets (with message type other than given below) are sent.

7.1. Algorithm Negotiation:

[...] Once a party has sent a SSH_MSG_KEXINIT message for key exchange or re-exchange, until it has sent a SSH_MSG_NEWKEYS message (Section 7.3), it MUST NOT send any messages other than: Transport layer generic messages (1 to 19) (but SSH_MSG_SERVICE_REQUEST and SSH_MSG_SERVICE_ACCEPT MUST NOT be sent); Algorithm negotiation messages (20 to 29) (but further SSH_MSG_KEXINIT messages MUST NOT be sent); Specific key exchange method messages (30 to 49).

Changed 11 years ago by ousado

Attachment: ssh_rekey_bug.py added

after openssh requests a key renegotiation, CPU goes up to 100% and the ssh protocol hangs

comment:3 Changed 11 years ago by Michael Hudson-Doyle

comment:4 Changed 11 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/conch-rekey-4395

(In [30377]) Branching to 'conch-rekey-4395'

comment:5 Changed 11 years ago by Jean-Paul Calderone

(In [30396]) While key exchange is in progress, queue messages not allowed to be exchanged during that period.

refs #4395

comment:6 Changed 11 years ago by Jean-Paul Calderone

Keywords: review added
Owner: z3p deleted

Should be fixed now. Here are the build results.

Some explanation of what's going on, to perhaps make a reviewer's job easier:

  1. I refactored some existing tests in test_conch so that I could re-use the OpenSSH client setup code, as well as some server-side setup code. If we ever have support for configurable rekeying limits in the Conch client, then we could also set up a conch-to-conch test for this, but for now it's just openssh-to-conch.
  2. The new behaviour broke a lot of tests in test_transport. The setup for many of these tests created a protocol instance connected to a StringTransport. The state an SSHTransportBase is in immediately after connection is that it is waiting for key exchange to complete. Many of the tests then send packets that aren't related to key exchange, and which this branch no longer allows to be sent during key exchange (as they are forbidden by the spec). So there is now simulateKeyExchange and finishKeyExchange which do similar things but with minor differences. A handful of existing tests now call these methods to get into a state where they are allowed to do the other things they want to do.
  3. On the implementation side, there is now a state variable tracking what is going on with respect key exchange. Non-key exchange messages are queued while key exchange is happening, then flushed afterwards.
  4. Responses to KEXINIT messages are now generated, rather than assuming that a received KEXINIT must be a response to the KEXINIT message sent from connectionMade. This means that if a peer requests a new key exchange, Conch will do one. This is what the spec requires.
  5. Common parts of the client and server ssh_NEWKEYS implementations got factored up into the base class so that I could just add one _flushBlocked call instead of two.
  6. _ssh_KEXDH_INIT got factored out of ssh_KEX_DH_GEX_REQUEST_OLD with no real changes. I thought this would be necessary but it didn't turn out to be. However, I like the code more in this shape.
  7. Ditto for _ssh_KEXDH_REPLY / ssh_KEX_DH_GEX_GROUP

comment:7 Changed 11 years ago by Michael Hudson-Doyle

I haven't looked at the changes at all yet, but would it be possible to have some kind of notification that key exchange is happening? In the case where the ssh stream is connected to a subprocess, it would be nice to call pauseProducing on the right thing rather than buffer all the messages in memory. I guess this needn't happen on this branch...

comment:8 Changed 11 years ago by Jean-Paul Calderone

would it be possible to have some kind of notification that key exchange is happening?

Sure. On receipt of KEXINIT a callback could be fired into some application object (the transport only knows about the service, so probably the service). Likewise, NEWKEYS could do something similar. I don't think it's a major concern though, since key exchange is fast and only normally only happens after about a GB of traffic has been sent. I'll file a ticket for it soon, though (along with a few others I came across while working on this).

comment:9 Changed 11 years ago by Michael Hudson-Doyle

Owner: set to Michael Hudson-Doyle
Status: newassigned

comment:10 Changed 11 years ago by Michael Hudson-Doyle

Keywords: review removed
Owner: changed from Michael Hudson-Doyle to Jean-Paul Calderone
Status: assignednew

Hurrah! This looks very much like it will fix the problem we've been facing in Launchpad, so thanks for working on this (and explaining stuff to me on IRC). The test coverage looks good too, although the integration test is pretty horrid.

I don't have any real problems with the code, but I do have a number of questions and suggestions that might improve the clarity of the code. And about 3 bugs that code inspection as part of review revealed...

Apologies for any incoherence.

> === modified file 'twisted/conch/ssh/transport.py'
> --- twisted/conch/ssh/transport.py	2010-07-08 16:01:47 +0000
> +++ twisted/conch/ssh/transport.py	2011-01-05 22:05:40 +0000
> @@ -165,6 +165,18 @@
>          decrypting data twice, the first bytes are decrypted and stored until
>          the whole packet is available.
>
> +    @ivar _keyExchangeState: The current protocol state with respect to key
> +        exchange.  This is either C{_KEY_EXCHANGE_NONE} if no key exchange is in
> +        progress (and returns to this value after any key exchange completes),
> +        C{_KEY_EXCHANGE_REQUESTED} if this side of the connection initiated a
> +        key exchange, and C{_KEY_EXCHANGE_PROGRESSING} if the other side of the
> +        connection initiated a key exchange.

[1] Can you mention that this starts out as _KEY_EXCHANGE_NONE, but as connectionMade initiates key exchange, it will be _KEY_EXCHANGE_REQUESTED as soon as a connection gets going? Or words to that effect. Um, I guess this only applies to the server and that something different happens in the client, so maybe this is not the place to explain this...

> +    @ivar _blockedByKeyExchange: Whenever C{_keyExchangeState} is not
> +        C{_KEY_EXCHANGE_NONE}, this is a C{list} of pending messages which were
> +        passed to L{sendPacket} but could not be sent because it is not legal to
> +        send them while a key exchange is in progress.  When the key exchange
> +        completes, another attempt is made to send these messages.
>      """
>
>
> @@ -197,6 +209,17 @@
>      sessionID = None
>      service = None
>
> +    # There is no key exchange activity in progress.
> +    _KEY_EXCHANGE_NONE = '_KEY_EXCHANGE_NONE'
> +
> +    # Key exchange is in progress and we started it.
> +    _KEY_EXCHANGE_REQUESTED = '_KEY_EXCHANGE_REQUESTED'
> +
> +    # Key exchange is in progress and the peer started it.
> +    _KEY_EXCHANGE_PROGRESSING = '_KEY_EXCHANGE_PROGRESSING'

[2] As I've just waffled on IRC, I'm not sure that this is the most helpful way of labelling the key exchange state machine. Would it be possible to have the states be 1) kex not happening, 2) server requested kex, 3) kex in progress and mention that "client requested kex" isn't an interesting state because we immediately respond to kexinit with a kexinit of our own?

> +
> +    _keyExchangeState = _KEY_EXCHANGE_NONE
> +    _blockedByKeyExchange = None
>
>      def connectionLost(self, reason):
>          if self.service:
> @@ -218,6 +241,20 @@
>
>
>      def sendKexInit(self):
> +        """
> +        Send a I{KEXINIT} message to initiate key exchange or to respond to a
> +        key exchange initiated by the peer.
> +
> +        @raise RuntimeError: If a key exchange has already been started and it
> +            is not appropriate to send a I{KEXINIT} message at this time.
> +
> +        @return: C{None}
> +        """
> +        if self._keyExchangeState != self._KEY_EXCHANGE_NONE:
> +            raise RuntimeError(
> +                "Cannot send KEXINIT while key exchange state is %r" % (
> +                    self._keyExchangeState,))
> +
>          self.ourKexInitPayload = (chr(MSG_KEXINIT) +
>                 randbytes.secureRandom(16) +
>                 NS(','.join(self.supportedKeyExchanges)) +
> @@ -232,12 +269,45 @@
>                 NS(','.join(self.supportedLanguages)) +
>                 '\000' + '\000\000\000\000')
>          self.sendPacket(MSG_KEXINIT, self.ourKexInitPayload[1:])
> +        self._keyExchangeState = self._KEY_EXCHANGE_REQUESTED
> +        self._blockedByKeyExchange = []
> +
> +
> +    def _flushBlocked(self):
> +        """
> +        Re-send any messages which were queued up during key exchange.
> +        """
> +        self._keyExchangeState = self._KEY_EXCHANGE_NONE
> +        messages = self._blockedByKeyExchange
> +        self._blockedByKeyExchange = None
> +        for (messageType, payload) in messages:
> +            self.sendPacket(messageType, payload)

[3] I didn't think from the message name or docstring that this would reset _keyExchangeState. In fact, I don't really see why it's factored out of ssh_NEWKEYS...

> +
> +
> +    def _allowedKeyExchangeMessageType(self, messageType):
> +        """
> +        Determine if the given message type may be sent while key exchange is in
> +        progress.
> +
> +        @param messageType: The type of message
> +        @type messageType: C{int}
> +
> +        @return: C{True} if the given type of message may be sent while key
> +            exchange is in progress, C{False} if it may not.
> +        @rtype: C{bool}
> +        """
> +        if 1 <= messageType <= 19:
> +            return messageType not in (MSG_SERVICE_REQUEST, MSG_SERVICE_ACCEPT)
> +        if 20 <= messageType <= 29:
> +            return messageType not in (MSG_KEXINIT,)
> +        return 30 <= messageType <= 49

[4] This is peculiarly written. For a start, surely the first two conditions collapse to:

if 1 <= messageType <= 29:

return messageType not in (

MSG_SERVICE_REQUEST, MSG_SERVICE_ACCEPT, MSG_KEXINIT)

? And do we really not have some symbolic constants we can use instead of the magic numbers?

Finally, a reference to chapter and verse of the RFC for what's allowed and what's not would be appreciated. FWIW, it's section 7.1, and now I've read it, I understand why the method was written this way :-) So I'd be happy if you just added (and maybe quoted) the reference to the RFC.

>
>
>      def sendPacket(self, messageType, payload):
>          """
> -        Sends a packet.  If it's been set up, compress the data, encrypt it,
> -        and authenticate it before sending.
> +        Sends a packet.  If it's been set up, compress the data, encrypt it, and
> +        authenticate it before sending.  If key exchange is in progress and the
> +        message is not part of key exchange, queue it to be sent later.
>
>          @param messageType: The type of the packet; generally one of the
>                              MSG_* values.
> @@ -245,6 +315,11 @@
>          @param payload: The payload for the message.
>          @type payload: C{str}
>          """
> +        if self._keyExchangeState != self._KEY_EXCHANGE_NONE:
> +            if not self._allowedKeyExchangeMessageType(messageType):
> +                self._blockedByKeyExchange.append((messageType, payload))
> +                return
> +
>          payload = chr(messageType) + payload
>          if self.outgoingCompression:
>              payload = (self.outgoingCompression.compress(payload)
> @@ -389,6 +464,18 @@
>              self.sendUnimplemented()
>
>
> +    # Rekeying looks like this:

[5] In the case that the client initiates it, right?

> +    #
> +    #  C> MSG_KEXINIT
> +    #  S> MSG_KEXINIT
> +    #  C> MSG_KEX_DH_GEX_REQUEST  or   MSG_KEXDH_INIT
> +    #  S> MSG_KEX_DH_GEX_GROUP    or   MSG_KEXDH_REPLY
> +    #  C> MSG_KEX_DH_GEX_INIT     or   --
> +    #  S> MSG_KEX_DH_GEX_REPLY    or   --
> +    #  C> MSG_NEWKEYS
> +    #  S> MSG_NEWKEYS
> +    #
> +
>      def ssh_KEXINIT(self, packet):
>          """
>          Called when we receive a MSG_KEXINIT message.  Payload::
> @@ -454,6 +541,12 @@
>          log.msg('incoming: %s %s %s' % (self.nextEncryptions.inCipType,
>                                          self.nextEncryptions.inMACType,
>                                          self.incomingCompressionType))
> +
> +        if self._keyExchangeState != self._KEY_EXCHANGE_REQUESTED:
> +            self.sendKexInit()

[6] Given that this will raise RuntimeError if _keyExchangeState is _KEY_EXCHANGE_PROGRESSING (and this is the only vaguely plausible way that sendKexInit could be called with _keyExchangeState != _KEY_EXCHANGE_NONE), maybe the RuntimeError could be raised from this method and the error in sendKexInit changed to an AssertionError?

> +        else:
> +            self._keyExchangeState = self._KEY_EXCHANGE_PROGRESSING
> +
>          return kexAlgs, keyAlgs, rest # for SSHServerTransport to use
>
>
> @@ -617,6 +710,16 @@
>          self.sendPacket(MSG_NEWKEYS, '')
>
>
> +    def ssh_NEWKEYS(self, packet):
> +        log.msg('NEW KEYS')
> +        self.currentEncryptions = self.nextEncryptions
> +        if self.outgoingCompressionType == 'zlib':
> +            self.outgoingCompression = zlib.compressobj(6)
> +        if self.incomingCompressionType == 'zlib':
> +            self.incomingCompression = zlib.decompressobj()
> +        self._flushBlocked()

[7] I think a comment/docstring that this is not a complete implementation of handling NEWKEYS packets and that subclasses need to do something more would be good. Hmm, I think you can move the entire SSHServerTransport implementation of this method into the base class? That will get rid of the duplicate logging of 'NEW KEYS' too :-)

This also looks buggy if the client for whatever insane reason switches from 'zlib' compression to 'none' but that's a different bug I guess.

>      def isEncrypted(self, direction="out"):
>          """
>          Return True if the connection is encrypted in the given direction.
> @@ -741,17 +844,49 @@
>                  self.ignoreNextPacket = True # guess was wrong
>
>
> -    def ssh_KEX_DH_GEX_REQUEST_OLD(self, packet):
> +    def _ssh_KEXDH_INIT(self, packet):
...
> === modified file 'twisted/conch/test/test_conch.py'
> --- twisted/conch/test/test_conch.py	2009-07-20 15:41:45 +0000
> +++ twisted/conch/test/test_conch.py	2011-01-05 22:05:40 +0000
> @@ -1,15 +1,22 @@
>  # -*- test-case-name: twisted.conch.test.test_conch -*-
> -# Copyright (c) 2001-2008 Twisted Matrix Laboratories.
> +# Copyright (c) 2001-2010 Twisted Matrix Laboratories.
>  # See LICENSE for details.
>
>  import os, sys, socket
> +from itertools import count
> +
> +from zope.interface import implements
>
>  from twisted.cred import portal
>  from twisted.internet import reactor, defer, protocol
>  from twisted.internet.error import ProcessExitedAlready
> +from twisted.internet.task import LoopingCall
>  from twisted.python import log, runtime
>  from twisted.trial import unittest
>  from twisted.conch.error import ConchError
> +from twisted.conch.avatar import ConchUser
> +from twisted.conch.ssh.session import ISession, SSHSession, wrapProtocol
> +
>  try:
>      from twisted.conch.scripts.conch import SSHSession as StdioInteractingSession
>  except ImportError, e:
> @@ -254,26 +261,15 @@
>
>
>
> -class ForwardingTestBase:
> -    """
> -    Template class for tests of the Conch server's ability to forward arbitrary
> -    protocols over SSH.
> -
> -    These tests are integration tests, not unit tests. They launch a Conch
> -    server, a custom TCP server (just an L{EchoProtocol}) and then call
> -    L{execute}.
> -
> -    L{execute} is implemented by subclasses of L{ForwardingTestBase}. It should
> -    cause an SSH client to connect to the Conch server, asking it to forward
> -    data to the custom TCP server.
> -    """
> -
> +class ConchServerSetupMixin:
>      if not Crypto:
>          skip = "can't run w/o PyCrypto"
>
>      if not pyasn1:
>          skip = "can't run w/o PyASN1"
>
> +    realmFactory = ConchTestRealm
> +
>      def _createFiles(self):
>          for f in ['rsa_test','rsa_test.pub','dsa_test','dsa_test.pub',
>                    'kh_test']:
> @@ -301,7 +297,7 @@
>          Make a L{ConchTestServerFactory}, which allows us to start a
>          L{ConchTestServer} -- i.e. an actually listening conch.
>          """
> -        realm = ConchTestRealm()
> +        realm = self.realmFactory()
>          p = portal.Portal(realm)
>          p.registerChecker(ConchTestPublicKeyChecker())
>          factory = ConchTestServerFactory()
> @@ -331,6 +327,21 @@
>                  defer.maybeDeferred(self.echoServer.stopListening)])
>
>
> +
> +class ForwardingTestBase(ConchServerSetupMixin):

[8] I think it would be better style if this class was renamed to SomethingOrOtherMixin, like this other classes in this file.

[9] Also, the reference to ForwardingTestBase.setUp in OpenSSHClientMixin needs to be changed to a reference to ConchServerSetupMixin.setUp.

> +    """
> +    Template class for tests of the Conch server's ability to forward arbitrary
> +    protocols over SSH.
> +
> +    These tests are integration tests, not unit tests. They launch a Conch
> +    server, a custom TCP server (just an L{EchoProtocol}) and then call
> +    L{execute}.
> +
> +    L{execute} is implemented by subclasses of L{ForwardingTestBase}. It should
> +    cause an SSH client to connect to the Conch server, asking it to forward
> +    data to the custom TCP server.
> +    """
> +
>      def test_exec(self):
>          """
>          Test that we can use whatever client to send the command "echo goodbye"
> @@ -370,8 +381,95 @@
>
>
>
> -class OpenSSHClientTestCase(ForwardingTestBase, unittest.TestCase):
> -
> +class RekeyAvatar(ConchUser):
> +    """
> +    This avatar implements a shell which sends 60 numbered lines to whatever
> +    connects to it, then closes the session with a 0 exit status.
> +
> +    60 lines is selected as being enough to send more than 2kB of traffic, the
> +    amount the client is configured to initiate a rekey after.
> +    """
> +    # Not really a full ISession implementation, but enough for these tests.  It
> +    # is necessary to declare this so that the result of ISession(thisObj) is
> +    # thisObj, since that's how Conch gets the ISession object for an avatar.
> +    implements(ISession)

[10] My admittedly faint recollection is that it's not standard to have the avatars implement ISession themselves, and this wording slightly suggests that it is. If I'm right could you change it to something like:

# Although avatars are usually adapted into ISession, here we just # directly implement enough of it to pass these tests.

> +    def __init__(self):
> +        ConchUser.__init__(self)
> +        self.channelLookup['session'] = SSHSession
> +
> +
> +    def openShell(self, transport):
> +        """
> +        Write 60 lines of data to the transport, then exit.
> +        """
> +        proto = protocol.Protocol()
> +        proto.makeConnection(transport)
> +        transport.makeConnection(wrapProtocol(proto))
> +
> +        # Send enough bytes to the connection so that a rekey is triggered in
> +        # the client.
> +        def write(counter):
> +            i = counter()
> +            if i == 60:
> +                call.stop()
> +                transport.session.conn.sendRequest(
> +                    transport.session, 'exit-status', '\x00\x00\x00\x00')
> +                transport.loseConnection()
> +            else:
> +                transport.write("line #%02d\n" % (i,))
> +
> +        # The timing for this loop is an educated guess (and/or the result of
> +        # experimentation) to exercise the case where a packet is generated
> +        # mid-rekey.  Since the other side of the connection is (so far) the
> +        # OpenSSH command line client, there's no easy way to determine when the
> +        # rekey has been initiated.  If there were, then generating a packet
> +        # immediately at that time would be a better way to test the
> +        # functionality being tested here.
> +        call = LoopingCall(write, count().next)
> +        call.start(0.01)

Vomit! But I guess it works...

> +
> +
> +    def closed(self):
> +        """
> +        Ignore the close of the session.
> +        """
> +
> +
> +
> +class RekeyRealm:
> +    """
> +    This realm gives out new L{RekeyAvatar} instances for any avatar request.
> +    """
> +    def requestAvatar(self, avatarID, mind, *interfaces):
> +        return interfaces[0], RekeyAvatar(), lambda: None
> +
> +
> +
> +class RekeyTestsMixin(ConchServerSetupMixin):
> +    """
> +    TestCase mixin which defines tests exercising L{SSHTransportBase}'s handling
> +    of rekeying messages.
> +    """
> +    realmFactory = RekeyRealm
> +
> +    def test_clientRekey(self):
> +        """
> +        After a client-initiated rekey is completed, application data continues
> +        to be passed over the SSH connection.
> +        """
> +        process = ConchTestOpenSSHProcess()
> +        d = self.execute("", process, '-o RekeyLimit=2K')
> +        def finished(result):
> +            self.assertEquals(
> +                result,
> +                '\n'.join(['line #%02d' % (i,) for i in range(60)]) + '\n')
> +        d.addCallback(finished)
> +        return d

[11] Is there some way we can poke into the server and check that a rekey happened? I guess this answer is "yes, but it will be too horrible for words"

> +class OpenSSHClientMixin:
>      if not which('ssh'):
>          skip = "no ssh command-line client available"
>
> @@ -407,7 +505,26 @@
>
>
>
> +class OpenSSHClientForwardingTestCase(ForwardingTestBase, OpenSSHClientMixin,
> +                                      unittest.TestCase):
> +    """
> +    Connection forwarding tests run against the OpenSSL command line client.
> +    """
> +
> +
> +
> +class OpenSSHClientRekeyTestCase(RekeyTestsMixin, OpenSSHClientMixin,
> +                                 unittest.TestCase):
> +    """
> +    Rekeying tests run against the OpenSSL command line client.
> +    """
> +
> +
> +
>  class CmdLineClientTestCase(ForwardingTestBase, unittest.TestCase):
> +    """
> +    Connection forwarding tests run against the Conch command line client.
> +    """
>      if runtime.platformType == 'win32':
>          skip = "can't run cmdline client on win32"
>
> @@ -432,6 +549,3 @@
>          env['PYTHONPATH'] = os.pathsep.join(sys.path)
>          reactor.spawnProcess(process, sys.executable, cmds, env=env)
>          return process.deferred
> -
> -
> -
> === modified file 'twisted/conch/test/test_transport.py'
> --- twisted/conch/test/test_transport.py	2010-07-08 16:01:47 +0000
> +++ twisted/conch/test/test_transport.py	2011-01-05 22:05:40 +0000
> @@ -17,7 +17,7 @@
>
>  if pyasn1 is not None and Crypto is not None:
>      dependencySkip = None
> -    from twisted.conch.ssh import transport, common, keys, factory
> +    from twisted.conch.ssh import transport, keys, factory
>      from twisted.conch.test import keydata
>  else:
>      if pyasn1 is None:
> @@ -39,7 +39,7 @@
>  from twisted.python import randbytes
>  from twisted.python.reflect import qual
>  from twisted.python.hashlib import md5, sha1
> -from twisted.conch.ssh import service
> +from twisted.conch.ssh import service, common
>  from twisted.test import proto_helpers
>
>  from twisted.conch.error import ConchError
> @@ -347,11 +347,37 @@
>          self.proto.sendPacket = stubSendPacket
>
>
> +    def finishKeyExchange(self, proto):
> +        """
> +        Deliver enough additional messages to C{proto} so that the key exchange
> +        which is started in L{SSHTransportBase.connectionMade} completes and
> +        non-key exchange messages can be sent and received.
> +        """
> +        proto.dataReceived("SSH-2.0-BogoClient-1.2i\r\n")
> +        proto.dispatchMessage(
> +            transport.MSG_KEXINIT, self._A_KEXINIT_MESSAGE)
> +        proto._keySetup("foo", "bar")
> +        # SSHTransportBase can't handle MSG_NEWKEYS, or it would be the right
> +        # thing to deliver next.  Instead, touch some internals.

[12] Even without the change I made here, this isn't really true any more is it? I think you can just call ssh_NEWKEYS fine.

> +        proto._keyExchangeState = proto._KEY_EXCHANGE_NONE
> +
> +
>      def tearDown(self):
>          randbytes.secureRandom = self.oldSecureRandom
>          self.oldSecureRandom = None
>
>
> +    def simulateKeyExchange(self, sharedSecret, exchangeHash):
> +        """
> +        Finish a key exchange by calling C{_keySetup} with the given arguments.
> +        Also do extra whitebox stuff to satisfy that method's assumption that
> +        some kind of key exchange has actually taken place.
> +        """
> +        self.proto._keyExchangeState = self.proto._KEY_EXCHANGE_REQUESTED
> +        self.proto._blockedByKeyExchange = []
> +        self.proto._keySetup(sharedSecret, exchangeHash)

[13] I guess doing this by 'going in through the front door' would be pretty long and painful? You'd have to encode packets and stuff (although this reminds me that conch would be a lot nicer if it separated protocol parsing from protocol implementation!).

> +
> +
>
>  class BaseSSHTransportTestCase(TransportTestCase):
>      """
> @@ -361,6 +387,19 @@
>
>      klass = MockTransportBase
>
> +    _A_KEXINIT_MESSAGE = (
> +        "\xAA" * 16 +
> +        common.NS('diffie-hellman-group1-sha1') +
> +        common.NS('ssh-rsa') +
> +        common.NS('aes256-ctr') +
> +        common.NS('aes256-ctr') +
> +        common.NS('hmac-sha1') +
> +        common.NS('hmac-sha1') +
> +        common.NS('none') +
> +        common.NS('none') +
> +        common.NS('') +
> +        common.NS('') +
> +        '\x00' + '\x00\x00\x00\x00')
>
>      def test_sendVersion(self):
>          """
> @@ -384,6 +423,7 @@
>          """
>          proto = MockTransportBase()
>          proto.makeConnection(self.transport)
> +        self.finishKeyExchange(proto)
>          self.transport.clear()
>          message = ord('A')
>          payload = 'BCDEFG'
> @@ -399,6 +439,7 @@
>          """
>          proto = MockTransportBase()
>          proto.makeConnection(self.transport)
> +        self.finishKeyExchange(proto)
>          proto.currentEncryptions = testCipher = MockCipher()
>          message = ord('A')
>          payload = 'BC'
> @@ -406,7 +447,18 @@
>          proto.sendPacket(message, payload)
>          self.assertTrue(testCipher.usedEncrypt)
>          value = self.transport.value()
> -        self.assertEquals(value, '\x00\x00\x00\x08\x04ABC\x99\x99\x99\x99\x01')
> +        self.assertEquals(
> +            value,
> +            # Four byte length prefix
> +            '\x00\x00\x00\x08'
> +            # One byte padding length
> +            '\x04'
> +            # The actual application data
> +            'ABC'
> +            # Random padding
> +            '\x99\x99\x99\x99'
> +            # The MAC
> +            '\x02')

Argh, I hate tests that rely on such distant setup -- why is it correct to assume that the random padding is all \x99s? Oh because of monkey patching done in setUp. Ah well, I guess it's hard to avoid this sort of thing with something as brutally stateful as a crypto protocol.

[14] Anyway, not really asking you to redesign conch to support better testability here, but can you at least put 'Random' in scare quotes?

>
>      def test_sendPacketCompressed(self):
> @@ -416,6 +468,7 @@
>          """
>          proto = MockTransportBase()
>          proto.makeConnection(self.transport)
> +        self.finishKeyExchange(proto)
>          proto.outgoingCompression = MockCompression()
>          self.transport.clear()
>          proto.sendPacket(ord('A'), 'B')
> @@ -433,17 +486,27 @@
>          """
>          proto = MockTransportBase()
>          proto.makeConnection(self.transport)
> +        self.finishKeyExchange(proto)
>          proto.currentEncryptions = testCipher = MockCipher()
>          proto.outgoingCompression = MockCompression()
>          message = ord('A')
>          payload = 'BC'
>          self.transport.clear()
>          proto.sendPacket(message, payload)
> +        self.assertTrue(testCipher.usedEncrypt)
>          value = self.transport.value()
>          self.assertEquals(
>              value,
> -            '\x00\x00\x00\x0e\x09CBA\x66\x99\x99\x99\x99\x99\x99\x99\x99\x99'
> -            '\x01')
> +            # Four byte length prefix
> +            '\x00\x00\x00\x0e'
> +            # One byte padding length
> +            '\x09'
> +            # Compressed application data
> +            'CBA\x66'
> +            # Random radding

[15] Same here, but also fix the typo? radding -> padding

> +            '\x99\x99\x99\x99\x99\x99\x99\x99\x99'
> +            # The MAC
> +            '\x02')
>
>
>      def test_getPacketPlain(self):
> @@ -453,6 +516,7 @@
>          """
>          proto = MockTransportBase()
>          proto.makeConnection(self.transport)
> +        self.finishKeyExchange(proto)
>          self.transport.clear()
>          proto.sendPacket(ord('A'), 'BC')
>          proto.buf = self.transport.value() + 'extra'
> @@ -488,6 +552,7 @@
>          """
>          proto = MockTransportBase()
>          proto.makeConnection(self.transport)
> +        self.finishKeyExchange(proto)
>          self.transport.clear()
>          proto.outgoingCompression = MockCompression()
>          proto.incomingCompression = proto.outgoingCompression
> @@ -563,6 +628,78 @@
>          self.assertEquals(buf, '\x00' * 5)
>
>
> +    def test_receiveKEXINITReply(self):
> +        """
> +        Immediately after connecting, the transport expects a KEXINIT message
> +        and does not reply to it.
> +        """
> +        self.transport.clear()
> +        self.proto.dispatchMessage(
> +            transport.MSG_KEXINIT, self._A_KEXINIT_MESSAGE)
> +        self.assertEquals(self.packets, [])
> +
> +
> +    def test_sendKEXINITReply(self):
> +        """
> +        When a KEXINIT message is received which is not a reply to an earlier
> +        KEXINIT message which was sent, a KEXINIT reply is sent.
> +        """
> +        self.finishKeyExchange(self.proto)
> +        del self.packets[:]
> +
> +        self.proto.dispatchMessage(
> +            transport.MSG_KEXINIT, self._A_KEXINIT_MESSAGE)
> +        self.assertEquals(len(self.packets), 1)
> +        self.assertEquals(self.packets[0][0], transport.MSG_KEXINIT)
> +
> +
> +    def test_sendKexInitTwiceFails(self):
> +        """
> +        A new key exchange cannot be started while a key exchange is already in
> +        progress.  If an attempt is made to send a I{KEXINIT} message using
> +        L{SSHTransportBase.sendKexInit} while a key exchange is in progress
> +        causes that method to raise a L{RuntimeError}.
> +        """
> +        self.assertRaises(RuntimeError, self.proto.sendKexInit)
> +
> +
> +    def test_sendKexInitBlocksOthers(self):
> +        """
> +        After L{SSHTransportBase.sendKexInit} has been called, messages types
> +        other than the following are queued and not sent until after I{NEWKEYS}
> +        is sent by L{SSHTransportBase._keySetup}.
> +
> +        RFC 4253, section 7.1.
> +        """
> +        # sendKexInit is called by connectionMade, which is called in setUp.  So
> +        # we're in the state already.
> +        disallowedMessageTypes = [
> +            transport.MSG_SERVICE_REQUEST,
> +            transport.MSG_KEXINIT,
> +            ]
> +
> +        # Drop all the bytes sent by setUp, they're not relevant to this test.
> +        self.transport.clear()
> +
> +        # Get rid of the sendPacket monkey patch, we are testing the behavior of
> +        # sendPacket.
> +        del self.proto.sendPacket
> +
> +        for messageType in disallowedMessageTypes:
> +            self.proto.sendPacket(messageType, 'foo')
> +            self.assertEquals(self.transport.value(), "")
> +
> +        self.finishKeyExchange(self.proto)
> +        # Make the bytes written to the transport cleartext so it's easier to
> +        # make an assertion about them.
> +        self.proto.nextEncryptions = MockCipher()
> +
> +        # Deliver the peer's NEWKEYS message, which should flush the messages
> +        # which were queued above.
> +        self.proto.ssh_NEWKEYS('')
> +        self.assertEquals(self.transport.value().count("foo"), 2)

[16] This 2 should be len(disallowedMessageTypes) I guess? I guess it also relies on 'foo' not showing up in the key exchange packets...

> +
> +
>      def test_sendDebug(self):
>          """
>          Test that debug messages are sent correctly.  Payload::
> @@ -1295,9 +1432,9 @@
>          Test that _keySetup sets up the next encryption keys.
>          """
>          self.proto.nextEncryptions = MockCipher()
> -        self.proto._keySetup('AB', 'CD')
> +        self.simulateKeyExchange('AB', 'CD')
>          self.assertEquals(self.proto.sessionID, 'CD')
> -        self.proto._keySetup('AB', 'EF')
> +        self.simulateKeyExchange('AB', 'EF')
>          self.assertEquals(self.proto.sessionID, 'CD')
>          self.assertEquals(self.packets[-1], (transport.MSG_NEWKEYS, ''))
>          newKeys = [self.proto._getKey(c, 'AB', 'EF') for c in 'ABCDEF']
> @@ -1322,9 +1459,11 @@
>          self.assertIdentical(self.proto.outgoingCompression, None)
>          self.assertIdentical(self.proto.incomingCompression, None)
>          self.proto.outgoingCompressionType = 'zlib'
> +        self.simulateKeyExchange('AB', 'CD')
>          self.proto.ssh_NEWKEYS('')
>          self.failIfIdentical(self.proto.outgoingCompression, None)
>          self.proto.incomingCompressionType = 'zlib'
> +        self.simulateKeyExchange('AB', 'EF')
>          self.proto.ssh_NEWKEYS('')
>          self.failIfIdentical(self.proto.incomingCompression, None)
>
> @@ -1554,9 +1693,9 @@
>          Test that _keySetup sets up the next encryption keys.
>          """
>          self.proto.nextEncryptions = MockCipher()
> -        self.proto._keySetup('AB', 'CD')
> +        self.simulateKeyExchange('AB', 'CD')
>          self.assertEquals(self.proto.sessionID, 'CD')
> -        self.proto._keySetup('AB', 'EF')
> +        self.simulateKeyExchange('AB', 'EF')
>          self.assertEquals(self.proto.sessionID, 'CD')
>          self.assertEquals(self.packets[-1], (transport.MSG_NEWKEYS, ''))
>          newKeys = [self.proto._getKey(c, 'AB', 'EF') for c in 'ABCDEF']
> @@ -1576,24 +1715,25 @@
>              secure[0] = True
>          self.proto.connectionSecure = stubConnectionSecure
>
> -        self.proto.nextEncryptions = transport.SSHCiphers('none', 'none',
> -                                                          'none', 'none')
> +        self.proto.nextEncryptions = transport.SSHCiphers(
> +            'none', 'none', 'none', 'none')
> +        self.simulateKeyExchange('AB', 'CD')
> +        self.assertNotIdentical(
> +            self.proto.currentEncryptions, self.proto.nextEncryptions)
> +
> +        self.proto.nextEncryptions = MockCipher()
>          self.proto.ssh_NEWKEYS('')
> -
> -        self.failIfIdentical(self.proto.currentEncryptions,
> -                             self.proto.nextEncryptions)
> -
> -        self.proto.nextEncryptions = MockCipher()
> -        self.proto._keySetup('AB', 'EF')
>          self.assertIdentical(self.proto.outgoingCompression, None)
>          self.assertIdentical(self.proto.incomingCompression, None)
>          self.assertIdentical(self.proto.currentEncryptions,
>                               self.proto.nextEncryptions)
>          self.assertTrue(secure[0])
>          self.proto.outgoingCompressionType = 'zlib'
> +        self.simulateKeyExchange('AB', 'GH')
>          self.proto.ssh_NEWKEYS('')
>          self.failIfIdentical(self.proto.outgoingCompression, None)
>          self.proto.incomingCompressionType = 'zlib'
> +        self.simulateKeyExchange('AB', 'IJ')
>          self.proto.ssh_NEWKEYS('')
>          self.failIfIdentical(self.proto.incomingCompression, None)
>

Cheers, mwh

comment:11 Changed 11 years ago by Jean-Paul Calderone

(In [30415]) Mention what that the initial key exchange state is

refs #4395 (first review point)

comment:12 Changed 11 years ago by Jean-Paul Calderone

(In [30416]) Clarify the PROGRESSING state and add a comment about the fourth (unrepresented) state

refs #4395 (second review point)

comment:13 Changed 11 years ago by Jean-Paul Calderone

(In [30417]) Squish _flushBlocked into _newKeys, the new name for the base implementation of ssh_NEWKEYS

refs #4395 (third review point)

comment:14 Changed 11 years ago by Jean-Paul Calderone

(In [30418]) Reference the RFC to explain where these numbers came from

refs #4395 (fourth review point)

comment:15 Changed 11 years ago by Jean-Paul Calderone

(In [30419]) Clarify the rekeying interaction with respect to client- vs server-initiated

refs #4395 (fifth review point)

comment:16 Changed 11 years ago by Jean-Paul Calderone

(In [30420]) Rename ForwardingTestBase to follow the "...Mixin" convention

refs #4395 (eighth review point)

comment:17 Changed 11 years ago by Jean-Paul Calderone

(In [30421]) The server is set up by ConchServerSetupMixin now

refs #4395 (ninth review point)

comment:18 Changed 11 years ago by Jean-Paul Calderone

(In [30422]) Clarify comment about ISession/adapters/stuff

refs #4395 (tenth review point)

comment:19 Changed 11 years ago by Jean-Paul Calderone

(In [30423]) Clarify the "random" bit (ie, it is not)

refs #4395 (fourteenth and fifteenth review points)

comment:20 Changed 11 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Thanks for the review!

I've addressed a number of the review points, with revisions linked in the above comments. For the points I didn't address with a commit:

  1. I'd rather not put an AssertionError in anywhere. Also since sendKexInit is public, application code might want to call it, so having the extra check inside it makes sense to me.
  2. I really wanted to put the server implementation into the base. But in the client implementation, the check for a non-empty packet needs to come before the ... other check (which sort of means "we're not ready to deal with a NEWKEYS message" because there's a Deferred involved in the host key check. I did rename the base implementation to _newKeys though, since it isn't really a complete handler for MSG_NEWKEYS.
  3. It might be useful to have the transport expose this event to higher levels. The session key is probably within reach, but it would be a little gross to touch that directly.
  4. There's some state missing in order to be able to call _newKeys from finishKeyExchange, unfortunately. :/ Since sendKexInit is never called, _blockedByKeyExchange is never initiated.
  5. Yea, exactly. I'd much rather get into the right state the real way, but it would involve more big opaque blobs of crap. I can see how this might be improved, but this doesn't seem like the right ticket for it.

Build results.

comment:21 Changed 11 years ago by Jean-Paul Calderone

trac helpfully re-numbered my comments. Those are not in response to points 6 through 10, they are 6, 7, 11, 12, 13. :(

comment:22 Changed 11 years ago by Michael Hudson-Doyle

For the response to [12], I think the comment is a bit misleading then. Can you clarify it a bit? Something like "because sendKexInit has not actually been called, the instance state isn't actually correct to call _newKeys() here. Instead, touch some internals.

The only other truly minor quibble is that I'd like to see a black line in here:

    # There is a fourth conceptual state not represented here: KEXINIT received
    # but not sent.  Since we always send a KEXINIT as soon as we get it, we
    # can't ever be in that state.
    _keyExchangeState = _KEY_EXCHANGE_NONE

I parse that as the comment documenting the next line, which is not the case.

Fix both these things and land please!

comment:23 Changed 11 years ago by Michael Hudson-Doyle

Keywords: review removed
Owner: set to Jean-Paul Calderone

comment:24 Changed 11 years ago by Michael Hudson-Doyle

And, fwiw, I've confirmed in local testing that this fixes the problem we've seen with Launchpad (at least with the rekey limit lowered in the client, I haven't tried to transfer a gig across a connection).

comment:25 Changed 11 years ago by Jean-Paul Calderone

(In [30424]) Clarify two comments about _keyExchangeState

refs #4395

comment:26 Changed 11 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [30425]) Merge conch-rekey-4395

Author: exarkun Reviewer: mwhudson Fixes: #4395

Add support to SSHTransportBase for performing more than a single key exchange, and for buffering messages unrelated to key exchange while key exchange is taking place, as required by the specification.

comment:27 Changed 11 years ago by zooko

Cc: zooko@… added

FYI here is the documentation update for the SFTP feature of Tahoe-LAFS, describing this issue:

http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend?action=diff&version=61&old_version=57

And here is the ticket to track Tahoe-LAFS's dependency on a version of Twisted with this fix.

http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1297# sftp might hang or consume 100% CPU if the client tries to rekey

comment:28 Changed 11 years ago by <automation>

Owner: Jean-Paul Calderone deleted

comment:29 Changed 8 years ago by Fabian Sturm

Cc: Fabian Sturm added
Resolution: fixed
Status: closedreopened

Dear twisted devs!

Unfortunately this bug is still not fixed with Twisted 13.0.0. See https://github.com/paramiko/paramiko/issues/175 where I did some performance analysis and ended up triggering this bug again.

Any plans when to fix this?

comment:30 Changed 8 years ago by Jean-Paul Calderone

Resolution: fixed
Status: reopenedclosed

The bug is fixed, see the revision linked above. If there is another bug, please file a new ticket. Resolved tickets may *only* be re-opened if their resolution is reverted as well.

Note: See TracTickets for help on using tickets.