Ticket #686 (closed defect: fixed)

Opened 6 years ago

Last modified 2 years ago

[TEST] startTLS is broken if there's already data in the outgoing buffer.

Reported by: jknight Owned by: exarkun
Priority: high Milestone:
Component: core Keywords:
Cc: exarkun, itamarst, jknight, thijs, therve, pjd Branch: branches/starttls-with-buffered-686-3
Author: exarkun Launchpad Bug:

Description


Change History

  Changed 6 years ago by jknight

If you do write("lotsofdata"); startTLS(), sometimes twisted will do the startTLS() before the 
write("lotsofdata"). The condition for "sometimes" being whether the lotsofdata was large enough to 
cause the data to be buffered in twisted, instead of sent immediately, or buffered by the kernel.

Apparently this doesn't show up normally, because people don't often send much data before doing 
STARTTLS, and thus the data normally *is* sent out immediately.

See also twisted.test.test_ssl.SpammyTLSTestCase test cases.

  Changed 6 years ago by jknight

Propose this patch to make the problem more visible and less data-corruptful if you do happen to run 
into it:

Index: twisted/internet/tcp.py
===============================================================
====
--- twisted/internet/tcp.py     (revision 11439)
+++ twisted/internet/tcp.py     (working copy)
@@ -211,6 +211,17 @@

         def startTLS(self, ctx):
             assert not self.TLS
+            if self.dataBuffer:
+                written = self.writeSomeData(buffer(self.dataBuffer, self.offset))
+                offset = self.offset
+                dataLen = len(self.dataBuffer)
+                self.offset = 0
+                self.dataBuffer = ""
+                if isinstance(written, Exception) or (offset + written != len(dataBuffer)):
+                    warnings.warn("startTLS with unwritten buffered data currently doesn't work right. See 
issue #686. Closing connection.", category=RuntimeWarning, stacklevel=2)
+                    self.loseConnection()
+
+
             self.stopReading()
             self.stopWriting()
             self._startTLS()

  Changed 6 years ago by itamarst

This should be fixed correctly by 2.0.

  Changed 6 years ago by itamarst

Still urgent, but jknight explained that it's very unlikely to happen in
practice given his fixes, so it's not necessary for 2.0.

  Changed 6 years ago by radix

urgent *means* "next release", so it's not urgent (not to mention no failing
unit test), sorry :-)

follow-up: ↓ 7   Changed 6 years ago by exarkun

twisted.test.test_ssl.SpammyTLSTestCase covers this bug, I think.  Note that it
is set todo.

in reply to: ↑ 6 ; follow-up: ↓ 8   Changed 2 years ago by thijs

  • owner changed from jknight to exarkun
  • keywords review added
  • component set to core
  • cc thijs added

Replying to exarkun:

twisted.test.test_ssl.SpammyTLSTestCase covers this bug, I think. Note that it is set todo.

I don't see any TODOs in the SpammyTLSTestCase anymore, guess this means it's fixed?

in reply to: ↑ 7   Changed 2 years ago by thijs

  • owner changed from exarkun to jknight
  • keywords review removed

Replying to thijs:

I don't see any TODOs in the SpammyTLSTestCase anymore, guess this means it's fixed?

Oh damn, I now see the :( faces and todo properties.. sorry for the noise.

  Changed 2 years ago by exarkun

  • branch set to branches/starttls-with-buffered-686
  • author set to exarkun

(In [24382]) Branching to 'starttls-with-buffered-686'

  Changed 2 years ago by exarkun

I think this is basically resolved, but it's blocked on #2874.

  Changed 2 years ago by exarkun

  • branch changed from branches/starttls-with-buffered-686 to branches/starttls-with-buffered-686-2

(In [24414]) Branching to 'starttls-with-buffered-686-2'

  Changed 2 years ago by exarkun

  • keywords review added
  • owner jknight deleted

  Changed 2 years ago by therve

  • keywords review removed
  • owner set to exarkun

Great, nice fix. Some small comments:

Flakes:

twisted/test/test_ssl.py:8: 'trial_util' imported but unused
twisted/internet/tcp.py:20: 'warnings' imported but unused
+                self._tlsBufferedData = []
+                self._tlsContext = ctx
+                self._tlsExtra = extra

I think it may be better if one object could hold all this state, eg self._tlsDelayed = _TLSDelayed([], ctx, extra). What do you think?

  Changed 2 years ago by therve

  • cc therve added

  Changed 2 years ago by exarkun

(In [24509]) fix pyflakes warnings

refs #686

  Changed 2 years ago by exarkun

(In [24510]) replace four _tls* attributes with _tlsWaiting

refs #686

  Changed 2 years ago by exarkun

  • owner exarkun deleted
  • keywords review added

I've made those two changes in the two revisions linked above (no other changes since the last review). Here are the most current build results:

 http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/starttls-with-buffered-686-2

  Changed 2 years ago by therve

  • owner set to exarkun
  • keywords review removed

I'm ok with the branch, but the build results show some problems:

  • gtk2/glib2 is not happy on windows it seems
  • I've noticed that the runReactor method doesn't cancel the timeout :/ Should be pretty easy to fix, though.

  Changed 2 years ago by exarkun

  • branch changed from branches/starttls-with-buffered-686-2 to branches/starttls-with-buffered-686-3

(In [24525]) Branching to 'starttls-with-buffered-686-3'

  Changed 2 years ago by exarkun

  • owner changed from exarkun to therve
  • keywords review added

I think those points have been taken care of in trunk already. I merged forward, here are the results for the new branch:

 http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/starttls-with-buffered-686-3

I should have done that before, I knew there were still weird problems in the -2 branch.

runReactor still doesn't directly cancel its timeout, but unbuildReactor does. I think this is good enough until we have a case that demands something more.

  Changed 2 years ago by therve

  • owner changed from therve to exarkun
  • keywords review removed

Please merge.

  Changed 2 years ago by exarkun

  • status changed from new to closed
  • resolution set to fixed

(In [24527]) Merge starttls-with-buffered-686-3

Author: exarkun Reviewer: jknight, therve Fixes: #686

Change the implementation of ITLSTransport.startTLS so that if there is unsent data in the buffer, TLS negotiation is delayed until all of that data is sent. This changes the existing behavior which would log an error if the data could not be sent immediately and then drop the connection.

  Changed 2 years ago by pjd

  • cc pjd added

Note that this has some fallout: see Divmod ticket  #2673.

Callers can no longer safely assume that the TLS session is established once startTLS returns, and code that used to work due to startTLS opportunistically flushing the buffer can suddenly start failing.

I'm guessing that startTLS should be defined to return a Deferred (or provide some other mechanism) to allow callers to wait for the delayed TLS session.

(Should this be reopened?)

  Changed 2 years ago by exarkun

One small correction: callers could never assume that the TLS session was established immediately after startTLS returned; they could rely on one of two things - that the connection was closed if there was data in the application buffer which couldn't be sent, or that the TLS handshake had probably started. As it turns out, the handle returned by getHandle would also be a pyOpenSSL Connection object, but the getHandle documentation is pretty clear:

Return a system- and reactor-specific handle. This might be a socket.socket() object, or some other type of object, depending on which reactor is being used. Use and manipulate at your own risk. This might be used in cases where you want to set specific options not exposed by the Twisted APIs.

I consider the change made here, from always returning a pyOpenSSL Connection object immediately after a startTLS call returning one at a somewhat later time in some cases to be reasonable. I also don't expect that it will break anything except for problemsFromTransport (however, if it does break some other applications, then I'll happily consider making some further changes to avoid that).

I agree that there should be a callback to notify application code of successful (and likely failed) TLS initiation, but that's a task for a separate ticket. Additionally, in all real-world protocols that I'm aware of, such callbacks aren't particularly useful, since the protocol is designed to allow that information to be discovered in other ways.

Note: See TracTickets for help on using tickets.