Opened 23 months ago

Last modified 5 months ago

#6024 enhancement new

add an event for "TLS handshake completed"

Reported by: glyph Owned by: amluto
Priority: normal Milestone:
Component: core Keywords:
Cc: luto@…, habnabit, adi@…, hs@… Branch: branches/tls-handshake-notification-6024-3
(diff, github, buildbot, log)
Author: exarkun, glyph, habnabit Launchpad Bug:

Description

As per #3687, it would be useful to know when TLS negotiation is completed so that application-level decisions which need to be made at the beginning of the connection (or the SSL connection, for startTLS) but take certificate information, SNI or other TLS parameters into account.

Attachments (6)

tls_notify.patch (14.9 KB) - added by amluto 10 months ago.
tls-notify-handshake-done-6204-1.patch
tls-when-handshake-done-6204-2-reviewfixes.patch (7.2 KB) - added by amluto 9 months ago.
follow-up patch to address some review comments
tls-when-handshake-done-6204-3-clean-up-tests.patch (8.4 KB) - added by amluto 9 months ago.
follow-up patch to remove HandshakeCallbackContextFactory
tls_notify_v3.patch (20.6 KB) - added by amluto 9 months ago.
Version 3, against trunk
tls_notify_6204_v4.patch (10.4 KB) - added by amluto 9 months ago.
Further fixes -- applies on top of svn://svn.twistedmatrix.com/svn/Twisted/branches/tls-handshake-notification-6024@40726 bbbe8e31-12d6-0310-92fd-ac37d47ddeeb
tls_notify_6204_v4_startTLS.patch (499 bytes) - added by shawnn 9 months ago.
Added support for whenHandshakeDone when using startTLS, applies on top of amluto's tls_notify_6204_v4.patch

Download all attachments as: .zip

Change History (60)

comment:1 Changed 23 months ago by glyph

Implementation thought: we should do this with an interface, in the style of IHalfCloseableProtocol, that gets invoked if it's provided.

comment:2 Changed 11 months ago by exarkun

#6769 was a duplicate of this.

comment:3 Changed 10 months ago by amluto

I think that we shouldn't be calling connectionMade until the handshake is done. Maybe this should be selectable.

Changed 10 months ago by amluto

tls-notify-handshake-done-6204-1.patch

comment:4 Changed 10 months ago by amluto

  • Cc luto@… added
  • Keywords review added

Here's a candidate fix (which could also be considered a fix for #6782).

comment:5 Changed 10 months ago by habnabit

  • Cc habnabit added

comment:6 follow-up: Changed 10 months ago by lvh

  • Keywords review removed
  • Owner set to amluto

Hi! Thanks for working on this, amluto!

A few nitpicks:

  1. Please document the types of the @ivars
  2. Please don't put empty lines in between ivar defs
  3. _handshakeDeferreds stops being a list when the handshake is done, but the ivar docs don't specify that
  4. The following thing is removed:
247	 	        <SNIP SNIP> This 
248	 	        is used to control error reporting behavior.  If the handshake has not 
249	 	        completed, the underlying L{OpenSSL.SSL.Error} will be passed to the 
250	 	        application's C{connectionLost} method.  If it has completed, any 
251	 	        unexpected L{OpenSSL.SSL.Error} will be turned into a 
252	 	        L{ConnectionLost}.  This is weird; however, it is simply an attempt at 
253	 	        a faithful re-implementation of the behavior provided by 
254	 	        L{twisted.internet.ssl}. 

Why? AFAICT it's still true (I'll happily admit it's noisy, but it should probably remain documented).

  1. def notifyHandshakeDone(self): is missing a docstring
  2. def __tryHandshake(self): please use a single underscore for private methods
  3. What happens to the deferred previously returned from HandshakeCallbackContextFactory?

Thanks again, and I look forward to merging this patch :)

comment:7 Changed 10 months ago by itamar

I would suggest renaming the transport method whenHandhakeDone.

comment:8 in reply to: ↑ 6 ; follow-up: Changed 10 months ago by amluto

Replying to lvh:

Hi! Thanks for working on this, amluto!

A few nitpicks:

  1. Please document the types of the @ivars

Will do.

  1. Please don't put empty lines in between ivar defs

Will do.

  1. _handshakeDeferreds stops being a list when the handshake is done, but the ivar docs don't specify that

Will do.

  1. The following thing is removed:
247	 	        <SNIP SNIP> This 
248	 	        is used to control error reporting behavior.  If the handshake has not 
249	 	        completed, the underlying L{OpenSSL.SSL.Error} will be passed to the 
250	 	        application's C{connectionLost} method.  If it has completed, any 
251	 	        unexpected L{OpenSSL.SSL.Error} will be turned into a 
252	 	        L{ConnectionLost}.  This is weird; however, it is simply an attempt at 
253	 	        a faithful re-implementation of the behavior provided by 
254	 	        L{twisted.internet.ssl}. 

Why? AFAICT it's still true (I'll happily admit it's noisy, but it should probably remain documented).

I found the comment a little confusing IIRC. I'll fix it up (and document its interaction with the new handshake state stuff).

  1. def notifyHandshakeDone(self): is missing a docstring

The docstring is in ISSLTransport. Should I duplicate it? Should I reference it? I tried to follow existing practice, but I may have gotten it wrong.

  1. def __tryHandshake(self): please use a single underscore for private methods

Will do. There are some methods in Twisted -- is this a case of "follow the style of the file you're editing"?

  1. What happens to the deferred previously returned from HandshakeCallbackContextFactory?

It works exactly the same way as before. It's arguably silly now, though -- should I try to get rid of it?

Thanks again, and I look forward to merging this patch :)

My pleasure :)

I'll submit an updated version soon (and probably change it to whenHandshakeDone -- I like that better).

comment:9 in reply to: ↑ 8 Changed 9 months ago by amluto

Replying to amluto:

Replying to lvh:

Hi! Thanks for working on this, amluto!

A few nitpicks:

  1. Please document the types of the @ivars

Will do.

I think I had them mostly documented. I improved _handshakeError.

  1. Please don't put empty lines in between ivar defs

Will do.

I'm confused. Where are these empty lines? (Maybe I'm just being dense today.)

  1. _handshakeDeferreds stops being a list when the handshake is done, but the ivar docs don't specify that

Will do.

Done.

  1. The following thing is removed:
247	 	        <SNIP SNIP> This 
248	 	        is used to control error reporting behavior.  If the handshake has not 
249	 	        completed, the underlying L{OpenSSL.SSL.Error} will be passed to the 
250	 	        application's C{connectionLost} method.  If it has completed, any 
251	 	        unexpected L{OpenSSL.SSL.Error} will be turned into a 
252	 	        L{ConnectionLost}.  This is weird; however, it is simply an attempt at 
253	 	        a faithful re-implementation of the behavior provided by 
254	 	        L{twisted.internet.ssl}. 

Why? AFAICT it's still true (I'll happily admit it's noisy, but it should probably remain documented).

I found the comment a little confusing IIRC. I'll fix it up (and document its interaction with the new handshake state stuff).

I may still be confused, but I don't think the comment was or is correct. Can you elaborate?

  1. def notifyHandshakeDone(self): is missing a docstring

The docstring is in ISSLTransport. Should I duplicate it? Should I reference it? I tried to follow existing practice, but I may have gotten it wrong.

  1. def __tryHandshake(self): please use a single underscore for private methods

Will do. There are some methods in Twisted -- is this a case of "follow the style of the file you're editing"?

Done.

  1. What happens to the deferred previously returned from HandshakeCallbackContextFactory?

It works exactly the same way as before. It's arguably silly now, though -- should I try to get rid of it?

Thanks again, and I look forward to merging this patch :)

My pleasure :)

I'll submit an updated version soon (and probably change it to whenHandshakeDone -- I like that better).

Changed 9 months ago by amluto

follow-up patch to address some review comments

Changed 9 months ago by amluto

follow-up patch to remove HandshakeCallbackContextFactory

comment:10 Changed 9 months ago by amluto

  • Keywords review added

comment:11 Changed 9 months ago by exarkun

  • Owner amluto deleted

comment:12 Changed 9 months ago by exarkun

  • Keywords review removed
  • Owner set to amluto

Neither of the most recent two patches applies against trunk.

Please:

  1. Specify what your patches are against
  2. Generate svn-style patches (no "a/" and "b/" prefixes)
  3. Put everything in a single patch file

Thanks!

Changed 9 months ago by amluto

Version 3, against trunk

comment:13 Changed 9 months ago by amluto

  • Keywords review added

tls_notify_v3.patch should apply against trunk. Aside from rebasing it and rolling up the changes, I added a missing newline and moved the topfile into the correct place (I think).

comment:14 Changed 9 months ago by habnabit

  • Owner changed from amluto to habnabit
  • Status changed from new to assigned

This patch contains no documentation outside of docstrings. There should be at least some prose documentation for this. It should probably go in doc/core/howto/ssl.xhtml, preferably including a self-contained runnable example. I haven't tried writing code using this patch yet myself, and it would be nice to be able to run an example.

Additionally, adding methods to an interface breaks backward compatibility with anything that's implemented that interface externally. Instead, there should be a new interface for something which can notify on TLS handshake completion, and then make TLSMemoryBIOProtocol implement it.

Other than that, this generally looks good. I haven't done a thorough check through yet and won't mark the ticket as reviewed until I do. This is just preliminary.

comment:15 Changed 9 months ago by amluto

I'll write up a patch to ssl.xhtml with a variant of the echo server.

Should the new interface be something general like IHandshakingTransport?

I think I'll wait to send more patches until this has a branch to minimize the amount of rebasing needed.

comment:16 Changed 9 months ago by habnabit

I'm terrible at naming. ITLSHandshakingTransport maybe?

I'll make a branch later today when I have a chance to do a more thorough review.

comment:17 Changed 9 months ago by amluto

If it were ITLSHandshakingTransport then it would be odd to reuse it for some other transport (SSH, SASL, QUIC, etc). If it has a more general name, then any transport that goes through a connected-but-not-really-set-up phase could use it.

comment:18 Changed 9 months ago by habnabit

I considered that, and I suppose the patch doesn't callback/errback with anything that's TLS-specific. So, okay, maybe this is generalized enough to be IHandshakingTransport. I'm fine with it.

comment:19 follow-up: Changed 9 months ago by exarkun

Additionally, adding methods to an interface breaks backward compatibility with anything that's implemented that interface externally.

This is entirely true but everyone else completely ignores this fact and adds things to interfaces all the time. It's not fair to apply this criterion in a selective fashion against only certain contributions.

comment:20 in reply to: ↑ 19 Changed 9 months ago by glyph

Replying to exarkun:

Additionally, adding methods to an interface breaks backward compatibility with anything that's implemented that interface externally.

This is entirely true but everyone else completely ignores this fact and adds things to interfaces all the time. It's not fair to apply this criterion in a selective fashion against only certain contributions.

+1.

We should, by all means, have some more brainstorming sessions about how to address this consistently in the future, somewhere other than a ticket, though.

comment:21 Changed 9 months ago by habnabit

  • Author set to habnabit
  • Branch set to branches/tls-handshake-notification-6024

(In [40725]) Branching to 'tls-handshake-notification-6024'

comment:22 Changed 9 months ago by habnabit

(In [40726]) Applying patch from amulto. refs #6024

comment:24 Changed 9 months ago by habnabit

  • Keywords review removed
  • Owner changed from habnabit to amluto
  • Status changed from assigned to new
  1. whenHandshakeDone in tls.py needs a docstring.
  2. Is _tlsShutdownFinished always called? Even if the connection is lost midway through a handshake? It didn't look like there was a test for that scenario.
  3. Could lines 490-507 of tls.py not be moved into _handshakeSucceeded? (After renaming it, of course.) It seems to be a bit of duplication.
  4. Nit: I would say 'when the SSL handshake succeeds or fails' in the topfile, to make it more clear.
  5. You have some pyflakes and twistedchecker errors.

Thanks again for the contribution! I'm pretty excited for this to land. (Also sorry for misspelling your name in the commit message; I feel bad. ):

comment:25 Changed 9 months ago by habnabit

Oh man I am just the worst at trac. Sorry about that. Let's try again:

  1. whenHandshakeDone in tls.py needs a docstring.
  2. Is _tlsShutdownFinished always called? Even if the connection is lost midway through a handshake? It didn't look like there was a test for that scenario.
  3. Could lines 490-507 of tls.py not be moved into _handshakeSucceeded? (After renaming it, of course.) It seems to be a bit of duplication.
  4. Nit: I would say 'when the SSL handshake succeeds or fails' in the topfile, to make it more clear.
  5. You have some pyflakes and twistedchecker errors.

comment:26 Changed 9 months ago by habnabit

I was also just reminded: use L{} for True, False, and None.

comment:27 follow-up: Changed 9 months ago by exarkun

Thanks for the continued work on this issue.

This is entirely true but everyone else completely ignores this fact and adds things to interfaces all the time. It's not fair to apply this criterion in a selective fashion against only certain contributions.

This comment (of mine) notwithstanding, if you're changing an interface then you *do* at least have to update the implementations of that interface in Twisted.

twisted.internet._oldtls is deprecated but hasn't actually been removed yet. It would be very wrong to change the interface while leaving the old, now incomplete, TLS implementation in place.

The howto-style documentation is still missing. I'm sure the plan of adding it before merging this change will be followed through but it's too bad documentation is only being added at such a late stage in the development process. Actually trying to *use* and *explain* an API often reveals shortcomings of the API and suggests changes that should be made to it. It's awfully convenient to learn about those things before going to all the trouble of implementing the API rather than afterwards.

For example, I strongly suspect this feature would better be exposed as a protocol method. Something like this new optional protocol interface:

class ITLSProtocol(Interface):
    def handshakeCompleted():
        pass

However, it's difficult to compare these two APIs without some example uses.

comment:28 in reply to: ↑ 27 Changed 9 months ago by amluto

Replying to habnabit:

  1. whenHandshakeDone in tls.py needs a docstring.

The copy in the interface definition has a docstring. What's the standard practice here? Should I reference one docstring from the other? Should I duplicate it?

  1. Is _tlsShutdownFinished always called? Even if the connection is lost midway through a handshake? It didn't look like there was a test for that scenario.

Connection failure mid-handshake presumably looks just like a failed handshake, but I haven't verified that. I'll add a test.

  1. Could lines 490-507 of tls.py not be moved into _handshakeSucceeded? (After renaming it, of course.) It seems to be a bit of duplication.

I could do something like def _handshakeDone(deferred_func, result) that would be passed either Defer.callback or Defer.errback. It would be shorted and less duplicated, but it would also be a bit odd. I'll play with it.

  1. Nit: I would say 'when the SSL handshake succeeds or fails' in the topfile, to make it more clear.
  2. You have some pyflakes and twistedchecker errors.

I'll fix those.

Replying to exarkun:

Thanks for the continued work on this issue.

This is entirely true but everyone else completely ignores this fact and adds things to interfaces all the time. It's not fair to apply this criterion in a selective fashion against only certain contributions.

This comment (of mine) notwithstanding, if you're changing an interface then you *do* at least have to update the implementations of that interface in Twisted.

twisted.internet._oldtls is deprecated but hasn't actually been removed yet. It would be very wrong to change the interface while leaving the old, now incomplete, TLS implementation in place.

Fair enough. I assume that removing _oldtls at the same time is not an option.

The howto-style documentation is still missing. I'm sure the plan of adding it before merging this change will be followed through but it's too bad documentation is only being added at such a late stage in the development process. Actually trying to *use* and *explain* an API often reveals shortcomings of the API and suggests changes that should be made to it. It's awfully convenient to learn about those things before going to all the trouble of implementing the API rather than afterwards.

For example, I strongly suspect this feature would better be exposed as a protocol method. Something like this new optional protocol interface:

class ITLSProtocol(Interface):
    def handshakeCompleted():
        pass

However, it's difficult to compare these two APIs without some example uses.

Hmm. The thing I don't like about that is that I don't see how a protocol would know whether to expect handshakeCompleted to be called. For example, with the whenHandshakeDone approach, a protocol could run over plain TCP or TLS and only wait for a handshake when running over TLS. Another downside: new code would fail silently if run on an old version of Twisted.

If we stick with whenHandshakeDone, it might pay to add renenogiate at the same time to avoid more interface churn. I suspect that renegotiate would be straightforward to implement on top of this code.

I'll write up some kind of example.

Changed 9 months ago by amluto

Further fixes -- applies on top of svn://svn.twistedmatrix.com/svn/Twisted/branches/tls-handshake-notification-6024@40726 bbbe8e31-12d6-0310-92fd-ac37d47ddeeb

comment:29 Changed 9 months ago by amluto

  • Keywords review added
  • Owner amluto deleted

I attached more fixes. This adds an example, fixes pyflakes and the relevant twistedchecker stuff (I think, although I haven't added docstrings to test cases).

The open issues are:

  1. I'm still using C{True} instead of L{True}. All of tls.py does that. I think that fixing it should be a separate change.
  2. I'm still extended ISSLTransport. I'd like to see some kind of consensus on what to do before changing that. (The example I added should help.)

comment:30 Changed 9 months ago by habnabit

As a note, new examples should be using twisted.internet.react instead of importing the global reactor.

comment:31 Changed 9 months ago by habnabit

er, sorry, twisted.internet.task.react.

Changed 9 months ago by shawnn

Added support for whenHandshakeDone when using startTLS, applies on top of amluto's tls_notify_6204_v4.patch

comment:32 follow-up: Changed 6 months ago by adiroiban

  • Cc adi@… added

Is the while True: code from _tryHandshake safe? I assume that for slow connections... or bad peers that handshake could take a while until in is done and this could block the thread.

I also wanted to know when handshake was done and I used the SSL.Context.set_info_callback method to hook a watcher and then SSL.Connection.set_app_data() to update the SSL.Connection state.

Maybe TLSMemoryBIOProtocol.makeConnection could set a deferred as app_data, and then the context info callback could call that deferred once handshake is done.

Does it make any sense?

comment:33 in reply to: ↑ 32 Changed 6 months ago by amluto

Replying to adiroiban:

Is the while True: code from _tryHandshake safe? I assume that for slow connections... or bad peers that handshake could take a while until in is done and this could block the thread.

The call to do_handshake is nonblocking, so nothing blocks. The only way the loop can keep looping is if pyOpenSSL keeps saying WantWriteError, which would mean that pyOpenSSL is continually writing handshake-related data. This can't happen, barring a major bug in pyOpenSSL.

I also wanted to know when handshake was done and I used the SSL.Context.set_info_callback method to hook a watcher and then SSL.Connection.set_app_data() to update the SSL.Connection state.

Maybe TLSMemoryBIOProtocol.makeConnection could set a deferred as app_data, and then the context info callback could call that deferred once handshake is done.

The whole point of this patch is to do this cleanly instead of using hacks like that.

--Andy

comment:34 Changed 6 months ago by exarkun

  • Owner set to exarkun
  • Status changed from new to assigned

comment:35 Changed 6 months ago by exarkun

(In [41791]) Apply tls_notify_6204_v4.patch

refs #6024

comment:36 Changed 6 months ago by exarkun

  • Author changed from habnabit to exarkun, habnabit
  • Branch changed from branches/tls-handshake-notification-6024 to branches/tls-handshake-notification-6024-2

(In [41793]) Branching to 'tls-handshake-notification-6024-2'

comment:37 follow-up: Changed 6 months ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to amluto
  • Status changed from assigned to new

Thanks all. Sorry about the delay in getting this reviewed.

I made some changes, mostly boring coding standard and formatting things. I noticed two things:

  • The WantWriteError exception handler in _tryHandshake appears to be untested.
  • shawnn's patch to make whenHandshakeDone work after startTLS seems like an important one (startTLS is supposed to make the transport provide ISSLTransport and whenHandshakeDone is now part of that interface...). His patch looks quite simple but I have no idea if it is correct/complete - it needs unit tests as well. :) I didn't apply his patch to the branch yet.

Thanks again.

comment:38 in reply to: ↑ 37 Changed 6 months ago by amluto

Replying to exarkun:

Thanks all. Sorry about the delay in getting this reviewed.

I made some changes, mostly boring coding standard and formatting things. I noticed two things:

  • The WantWriteError exception handler in _tryHandshake appears to be untested.

This may be rather difficult to test. I suspect that the only way is to force the bio buffer size to be extremely small. I'll fiddle with it.

  • shawnn's patch to make whenHandshakeDone work after startTLS seems like an important one (startTLS is supposed to make the transport provide ISSLTransport and whenHandshakeDone is now part of that interface...). His patch looks quite simple but I have no idea if it is correct/complete - it needs unit tests as well. :) I didn't apply his patch to the branch yet.

Should I look at that patch? It seems straightforward, but I'm a little vague on the whole starttls mechanism.

Is there any decision on what interface whenHandshakeDone should live in?

Thanks again.

comment:39 Changed 6 months ago by exarkun

Is there any decision on what interface whenHandshakeDone should live in?

I'm aware of two options that have been considered.

First, the option that is currently implemented in the branch: dd the method to the existing interface (what the branch does). Applications can discover if the feature is available using getattr or by checking which version of Twisted is being used.

Second, introduce a new interface that extends ISSLTransport with this new method (ISecureTransport? Or will we get sued for that... ;). Update the implementations in Twisted to declare they implement this new interface instead. Deprecate the old interface. Applications can discover if the feature is available by checking for the new interface instead of the old interface.

Third, play some tricks with interface versioning. This isn't something we've attempted in Twisted before. I have some untested ideas about adding a version Attribute to interfaces the first time we want to change them, requiring that its value be a monotonically increasing integer starting with 1 (version 0 is the version of the interface that is missing the attribute) and then documenting the rest of the interface in terms of that value (ie "this attribute was introduced in version 3 of the attribute"). Again, this involves breaking new ground and some amount of trial and error to figure out if the approach is workable at all, let alone whether we really like it or not.

By the way, you've reminded me of another potential issue. IReactorSSL only uses twisted.protocols.tls if pyOpenSSL 0.10 or newer is available. For older versions of pyOpenSSL it falls back to twisted/internet/_oldtls.py. This new feature hasn't been implemented in _oldtls.py so the TLS transport applications get when pyOpenSSL is older than 0.10 will be missing this feature. With the 2nd (or 3rd) option above this isn't necessarily a problem - at least _oldtls.py will still advertise that it does not implement this feature (by declaring that it only implements ISSLTransport and not the new interface). With the first option above, _oldtls.py is now lying about its capabilities. :(

Personally, I prefer the 2nd option for basically this reason. Changing an interface generally makes liars out of all existing code. Even if we fix _oldtls.py so that it really implements this new feature that won't fix any third-party implementations of the interface. There will be a window during which if ISSLTransport.providedBy(transport): transport.whenHandshakeDone() does *not* actually work reliably. Depending on how active the development of third-party libraries implementing this interface is, the window could stretch out for quite a long time.

However, as I've mentioned elsewhere, the Twisted project doesn't actually have an official policy of not adding new attributes or methods to interfaces. I would like it if it did but that's an issue to be resolved elsewhere, not on this ticket. :)

So I think the ISSLTransport change implemented in this branch (plus fixes for _oldtls.py (oh, and maybe twisted.web.test.requesthelper.DummyChannel.SSL - sorry)) are acceptable if that's they way you really want to go.

comment:40 Changed 6 months ago by exarkun

Should I look at that patch? It seems straightforward, but I'm a little vague on the whole starttls mechanism

If you decide to go with ISSLTransport.whenHandshakeDone then I think startTLS does need to be considered here, yes.

The purpose of startTLS is to upgrade the transport from something that does not provide ISSLTransport to something that does provide ISSLTransport. If the result was an upgrade to something that provides ISSLTransport-but-not-whenHandshakeDone I think that would be quite unfortunate.

comment:41 Changed 6 months ago by adiroiban

The call to do_handshake is nonblocking, so nothing blocks. The only way the loop can keep looping is if pyOpenSSL keeps saying WantWriteError, which would mean that pyOpenSSL is continually writing handshake-related data. This can't happen, barring a major bug in pyOpenSSL.

I will give this code a try and see how it behaves.

My problem is not that the loops runs forever, but that it can take some time for the actual handshake to finish.

From my understanding for server scenario is that after the SERVER-HELLO reponse, the server will wait for client to send the secret key.
For client scenario, the client first waits for SERVER-HELLO

I searched openSSL code but could not find the timeout for handshake operation. I guess it rely on TCP timeout.. so if client lags in sending the secret key, the handshake might lag before continuing. TCP default timeouts are quite big.

This is why I think that hooking into connection state via a callback is closer to general Twisted async model.

Sorry for the noise. I am not an SSL/TLS expert so maybe I am wrong.

  • The WantWriteError exception handler in _tryHandshake appears to be untested.

This may be rather difficult to test. I suspect that the only way is to
force the bio buffer size to be extremely small. I'll fiddle with it.

I got into this issue/error by closing the SSL connection without writing anything to it.

Thanks!

comment:42 Changed 6 months ago by exarkun

Sorry for the noise. I am not an SSL/TLS expert so maybe I am wrong.

Thanks for looking into this! Twisted's TLS code can only benefit from more eyes. :)

My problem is not that the loops runs forever, but that it can take some time for the actual handshake to finish.

I think I'm convinced this loop is okay. Let me try to rephrase the response amluto gave in his comment above.

do_handshake can basically do one of two things (actually it could do both of them but let's ignore that).

  1. It can decide that it is "our" turn to talk in the handshake (for example, "we" are the client and it is the beginning of the connection, in TLS the client speaks first so in this case it is time to send the "ClientHello" message). In this case what happens is that do_handshake "writes" some bytes to the BIO (since we use a "memory BIO" this really just means copying some bytes into a buffer that we're paying attention to somewhere else). This write has two possible results.
    1. It could succeed completely. At this point it is no longer "our" turn to talk. Therefore OpenSSL will try to read some bytes. This will fail with WantReadError ("error! I want to read but I cannot."). In this case we break out of the loop.
    2. Or it could fail because there isn't enough room in the buffer to do the write. This causes do_handshake to fail with WantWriteError ("error! I want to write but I cannot."). If this happens then _flushSendBIO is called. This is the method that takes things out of the "memory BIO" and puts them into another buffer (Twisted's own send buffer). This makes more room in the "memory BIO". The loop isn't broken out of this time, instead we call do_handshake again. This time it should be able to do the write and we go back to the case immediately above (where we break out of the loop).
  2. It can decide it is not "our" turn to talk in the handshake. In this case it will try to read some bytes from the "memory BIO". Either there will be enough bytes there to read or not:
    1. If there are enough bytes then the handshake will proceed a little further. This lets us proceed to the point where it's "our" turn to talk then we go back to the first case.
    2. If there are not enough bytes then do_handshake will raise "WantReadError". This is like 1.1 above and we break out of the loop.

Notice that in all of the WantReadError cases - the cases where it is now time to wait for more bytes to arrive from the network - we immediately break out of the loop. Therefore we don't spend that waiting time inside _tryHandshake, we spend it in the reactor and doing other useful work until something calls dataReceived on the protocol again.

Hope that helped.

comment:43 Changed 5 months ago by hynek

  • Cc hs@… added

comment:44 Changed 5 months ago by glyph

  • Author changed from exarkun, habnabit to exarkun, glyph, habnabit
  • Branch changed from branches/tls-handshake-notification-6024-2 to branches/tls-handshake-notification-6024-3

(In [41998]) Branching to tls-handshake-notification-6024-3.

comment:45 follow-up: Changed 5 months ago by glyph

Okay! So, at this point, I think that adiroiban's suggestion about info_callback is actually completely the right approach, and not a hack.

There are various reasons that it might not be optimal for the initial handshake (it's somewhat challenging to implement, for one thing), but there's another problem: renegotiation.

Renegotiation is "secure" these days, which I take to mean that the renegotiation packet in the TLS stream itself is now itself authenticated and encrypted so that you can't get told to renegotiate by an arbitrary man in the middle. Nevertheless, renegotiation can still result in compromised security; your peer can change their security parameters from those you find acceptable to those you find unacceptable. Fundamentally, OpenSSL is the thing in the stack here which actually understands the TLS bytes, and the "info callback" is the way that it tells us what's going on with those bytes. A renegotiation can happen at the wire level without any observable reason to call do_handshake up at our protocol's level.

The problem, of course, is that Twisted users who already needed this functionality will already be using the info_callback, and pyOpenSSL won't actually let us get at that callback to keep sending that feedback to them compatibly, so for a general solution to this problem we will need some way for the user to unambiguously indicate that they are interested in this callback and that we should manipulate the context for them, or perhaps put it in the context itself.

I'm already using the info_callback for #5190 because I couldn't reliably abort the handshake in the right place (post-verification but pre-any-application-bytes-getting-through) from the "outside", so I think we should abandon the implementation in the branch.

In the process of thrashing around and discovering this, I did implement a significant number of useful test refactorings, and those should probably be applied to at least one other ticket.

comment:46 Changed 5 months ago by glyph

Oops, ambiguous antecedent alert: when I said "it" might not be optimal for the initial handshake, I meant the strategy implemented in the branch.

comment:47 in reply to: ↑ 45 ; follow-up: Changed 5 months ago by amluto

Replying to glyph:

Okay! So, at this point, I think that adiroiban's suggestion about info_callback is actually completely the right approach, and not a hack.

There are various reasons that it might not be optimal for the initial handshake (it's somewhat challenging to implement, for one thing), but there's another problem: renegotiation.

Renegotiation is "secure" these days, which I take to mean that the renegotiation packet in the TLS stream itself is now itself authenticated and encrypted so that you can't get told to renegotiate by an arbitrary man in the middle. Nevertheless, renegotiation can still result in compromised security; your peer can change their security parameters from those you find acceptable to those you find unacceptable. Fundamentally, OpenSSL is the thing in the stack here which actually understands the TLS bytes, and the "info callback" is the way that it tells us what's going on with those bytes. A renegotiation can happen at the wire level without any observable reason to call do_handshake up at our protocol's level.

Your peer can always break the security of your connection by leaking the secret. There's nothing you, Twisted, or anything else can do about that.

The renegotiation attacks were based on starting with a connection that shouldn't be trusted and renegotiating it to turn it into something else. The key is that, in all those attacks, the initial connection was to the attacker.

I've never heard of an attack that allows an arbitrary MITM to force a renegotiation.

[...]

I'm already using the info_callback for #5190 because I couldn't reliably abort the handshake in the right place (post-verification but pre-any-application-bytes-getting-through) from the "outside", so I think we should abandon the implementation in the branch.

Why not?

If you want to prevent application bytes from being sent, don't send them until the handshake is complete.

If you want to prevent application bytes from being received, just buffer them until the handshake is complete. I can probably guarantee that no data will be received until the handshake reports itself complete, so this should be unnecessary.

comment:48 in reply to: ↑ 47 ; follow-up: Changed 5 months ago by glyph

Replying to amluto:

Replying to glyph:

Your peer can always break the security of your connection by leaking the secret. There's nothing you, Twisted, or anything else can do about that.

And yet, browsers still display warnings when "mixed content" is present, or simply block it entirely. There are more threats on the internet, Horatio, than are dreamt of in your mathematical trust model :-).

Consider attacks that are partially on the node itself. Maybe it's not totally pwned but the attacker can somehow convince its TLS layer to renegotiate down to the null cipher or something equivalent to it. The point is, if the application has reason to verify anything about the post-handshake state of the connection other than the certificate, it has might have reason to examine subsequent handshakes as well as the first one.

The renegotiation attacks were based on starting with a connection that shouldn't be trusted and renegotiating it to turn it into something else. The key is that, in all those attacks, the initial connection was to the attacker.

Yes, sorry, what I was trying to say with my comments about renegotiation attacks was that I was not specifically trying to defend against any existing, known attack, for which there are protocol-level fixes anyway.

I've never heard of an attack that allows an arbitrary MITM to force a renegotiation.

Neither have I.

I'm already using the info_callback for #5190 because I couldn't reliably abort the handshake in the right place (post-verification but pre-any-application-bytes-getting-through) from the "outside", so I think we should abandon the implementation in the branch.

Why not?

Well, I tried, and it was complicated and hard. Then I tried using info_callback, and it was simple and easy. So I'm going with that one.

If you want to prevent application bytes from being sent, don't send them until the handshake is complete.

How, exactly, do I tell that the handshake is complete? I know that do_handshake returned successfully, but I don't actually know

If you want to prevent application bytes from being received, just buffer them until the handshake is complete. I can probably guarantee that no data will be received until the handshake reports itself complete, so this should be unnecessary.

What do you mean you can "guarantee" it? Are you an OpenSSL developer, or are you saying you'll be contractually liable for any damages that Twisted sustains as a result of handshakes being implicitly terminated by SSL_read (as they are wont to do)?

Basically the problem here is that by doing all this stuff with do_handshake, we're replicating a state machine that already exists inside OpenSSL, and we can only do it partially. Using info_callback is instead quite explicit, and works well.

I'm already doing this in #5190 quite effectively for its use-case of verifying hostnames. The only problem right now with doing this generally for notifying protocols with an arbitrary contextFactory, as we've previously discussed on the pyOpenSSL tracker, is that existing applications which depend on set_info_callback will break. Since that has been the only advice from Twisted developers for how to do this fairly critical task in the meanwhile, it seems likely that there's at least one application floating around out there that we should try not to break.

comment:49 in reply to: ↑ 48 Changed 5 months ago by glyph

Replying to glyph:

How, exactly, do I tell that the handshake is complete? I know that do_handshake returned successfully, but I don't actually know

Oh, trac, one day we'll make it possible to edit comments. Or throw you in the garbage.

As I was saying, I know that do_handshake completed successfully, but I don't know exactly what state it's in. According to OpenSSL, the way to determine the exact state of the phases of the handshake is ... wait for it ... the info callback ;-).

Sorry about the truncation,

-glyph

comment:50 follow-up: Changed 5 months ago by exarkun

According to OpenSSL, the way to determine the exact state of the phases of the handshake is ... wait for it ... the info callback

Are the phases other than "the handshake is done" important?

The following return values can occur:

  1.  The TLS/SSL handshake was successfully completed, a TLS/SSL connection has been established.

Or are you referring to the fact that renegotiation is effectively restarting the handshake and the return value of SSL_do_handshake is useless in observing when this happens?

comment:51 in reply to: ↑ 50 ; follow-up: Changed 5 months ago by glyph

Replying to exarkun:

Or are you referring to the fact that renegotiation is effectively restarting the handshake and the return value of SSL_do_handshake is useless in observing when this happens?

Exactly. What I meant was: do_handshake will totally reliably tell you that the handshake completed, but how do you know you need to call it?

We already have working code that deals with handshake-in-progress error codes from arbitrary read and write calls; we could change that to have a bunch of special-case logic to keep calling do_handshake for the initial handshake, but, why? It doesn't seem like that lets us do anything except avoid the info callback, and there doesn't seem to be any good reason to avoid it.

comment:52 in reply to: ↑ 51 ; follow-up: Changed 5 months ago by amluto

Replying to glyph:

Replying to exarkun:

Or are you referring to the fact that renegotiation is effectively restarting the handshake and the return value of SSL_do_handshake is useless in observing when this happens?

Exactly. What I meant was: do_handshake will totally reliably tell you that the handshake completed, but how do you know you need to call it?

We already have working code that deals with handshake-in-progress error codes from arbitrary read and write calls; we could change that to have a bunch of special-case logic to keep calling do_handshake for the initial handshake, but, why?

That's exactly what this branch does. And it's not exactly special-case; it's just doing what the OpenSSL docs say to do.

It doesn't seem like that lets us do anything except avoid the info callback, and there doesn't seem to be any good reason to avoid it.

It lets us have a decent interface for determining when the handshake is done. The info callback is ugly, exposes OpenSSL internals, and, worst of all, it's a callback. That means that it's liable to be re-enter Python code at inopportune moments, such as when someone calls write.

Usually a handshake will complete as a result of a call from the reactor, but there's no guarantee that this is a case. Something could race against the reactor, call transport.write, and cause the handshake to finish. That will cause the info callback to fire. Ouch.

This is unlikely. It may even be impossible, but I doubt that. I can certainly see it causing hard-to-predict breakage.

This branch shouldn't do that. I'll take a look (probably in a week or two -- I'm currently swamped with other things) to make sure that it can't happen. But at least by using do_handshake, we control when things happen.

  • It's late. I could have some of the names wrong here.

comment:53 in reply to: ↑ 52 ; follow-up: Changed 5 months ago by glyph

Replying to amluto:

Replying to glyph:

Replying to exarkun:

Or are you referring to the fact that renegotiation is effectively restarting the handshake and the return value of SSL_do_handshake is useless in observing when this happens?

Exactly. What I meant was: do_handshake will totally reliably tell you that the handshake completed, but how do you know you need to call it?

We already have working code that deals with handshake-in-progress error codes from arbitrary read and write calls; we could change that to have a bunch of special-case logic to keep calling do_handshake for the initial handshake, but, why?

That's exactly what this branch does.

Except the branch doesn't quite work. Specifically, it doesn't seem to discard unauthenticated bytes that were written by a _tlsConnection.send or read by a _tlsConnection.recv. I know that in principle this is totally possible, but it's not working yet, and starting over with the much simpler info_callback just fixed the issue with far less of an impact (almost none, in fact) on the structure of twisted.protocols.tls.

And it's not exactly special-case; it's just doing what the OpenSSL docs say to do.

This is an appeal to authority; and not a particularly good authority at that. The OpenSSL docs generally think a long list of functions are an adequate replacement for a tutorial, and they also mostly assume you're doing synchronous I/O.

It doesn't seem like that lets us do anything except avoid the info callback, and there doesn't seem to be any good reason to avoid it.

It lets us have a decent interface for determining when the handshake is done.

This is begging the question. "Why is the info callback not good? Because the thing we would get if we used the info callback is bad."

The info callback is ugly,

Again: not a real argument. Details, please.

exposes OpenSSL internals,

With "public" APIs like int SSL_get_ex_new_index(long argl, char *argp, int (*new_func);(void), int (*dup_func)(void), void (*free_func)(void)) it's not clear to me that OpenSSL has anything but internals. We would be kidding ourselves to say that any of this code would be re-usable to other TLS implementations. (Most other TLS APIs are so much easier to work with, though, that there's no particular reason to want to share this code.)

and, worst of all, it's a callback. That means that it's liable to be re-enter Python code at inopportune moments, such as when someone calls write.

Most things in Twisted are callbacks. If we need to break out of it in order to avoid reentrant application code, then that's fine; we can easily enqueue whatever needs to be done into a call queue on the TLS protocol wrapper. But the work I've done so far in #5190 indicates that we probably don't need to worry about that.

Usually a handshake will complete as a result of a call from the reactor, but there's no guarantee that this is a case. Something could race against the reactor, call transport.write, and cause the handshake to finish. That will cause the info callback to fire. Ouch.

Uh... no? You're not allowed to diddle these objects from other threads. If you do, and it breaks, you can keep both pieces. The reactor is fully in control of all the I/O going into these things. What exactly do you imagine racing with it?

This is unlikely. It may even be impossible, but I doubt that. I can certainly see it causing hard-to-predict breakage.

I literally can't imagine any valid API usage that would cause it, so I honestly don't know what you're talking about.

This branch shouldn't do that. I'll take a look (probably in a week or two -- I'm currently swamped with other things) to make sure that it can't happen. But at least by using do_handshake, we control when things happen.

I'm sorry to say that in a week or two, this will all probably be finished and moot :-). (Although I've said similar things about 6 years ago on tickets that are still open, so...)

Nevertheless, if there's a good reason to re-factor the implementation to not use the info callback - and I'm still waiting to hear a good reason - it should definitely be possible to re-factor the internals to call do_handshake rather than depend on the info callback and provide the same interface to applications, and I'm happy for you to do that.

comment:54 in reply to: ↑ 53 Changed 5 months ago by amluto

Replying to glyph:

Replying to amluto:

Replying to glyph:

Replying to exarkun:

Or are you referring to the fact that renegotiation is effectively restarting the handshake and the return value of SSL_do_handshake is useless in observing when this happens?

Exactly. What I meant was: do_handshake will totally reliably tell you that the handshake completed, but how do you know you need to call it?

We already have working code that deals with handshake-in-progress error codes from arbitrary read and write calls; we could change that to have a bunch of special-case logic to keep calling do_handshake for the initial handshake, but, why?

That's exactly what this branch does.

Except the branch doesn't quite work. Specifically, it doesn't seem to discard unauthenticated bytes that were written by a _tlsConnection.send or read by a _tlsConnection.recv. I know that in principle this is totally possible, but it's not working yet, and starting over with the much simpler info_callback just fixed the issue with far less of an impact (almost none, in fact) on the structure of twisted.protocols.tls.

What call to _tlsConnection.recv? There's only one call site, and it's guarded by a check that the handshake is done?

And what call to _tlsConnection.send? Can you explain how an application that waits for the handshake to complete before calling transport.write can possibly call _tlsConnection.send before the handshake is done?

The info callback is ugly,

Again: not a real argument. Details, please.

I'm done with this argument. As far as I know, the only real issue with this branch is that the interface needs a bit of work. If you can solve the same problem with less code or cleaner code using info_callback, be my guest.

Usually a handshake will complete as a result of a call from the reactor, but there's no guarantee that this is a case. Something could race against the reactor, call transport.write, and cause the handshake to finish. That will cause the info callback to fire. Ouch.

Uh... no? You're not allowed to diddle these objects from other threads. If you do, and it breaks, you can keep both pieces. The reactor is fully in control of all the I/O going into these things. What exactly do you imagine racing with it?

This has nothing whatsoever to do with threads. Application code could call transport.write in response to an event on a different socket. That being said, I'm not immediately seeing how the particular issue can happen.

Nevertheless, if there's a good reason to re-factor the implementation to not use the info callback - and I'm still waiting to hear a good reason - it should definitely be possible to re-factor the internals to call do_handshake rather than depend on the info callback and provide the same interface to applications, and I'm happy for you to do that.

I *already did that*.

Note: See TracTickets for help on using tickets.