Opened 6 years ago

Last modified 4 years ago

#5341 defect assigned

loseConnection is a no-op after loseWriteConnection with new TLS code (and loseWriteConnection should not even exist there)

Reported by: Glyph Owned by: Itamar Turner-Trauring
Priority: high Milestone:
Component: core Keywords:
Cc: therve, Jean-Paul Calderone, Glyph Branch: branches/halfclose-tls-5341
branch-diff, diff-cov, branch-cov, buildbot
Author: itamar

Description (last modified by Itamar Turner-Trauring)

Half-close is almost certainly broken in the new TLS code; it should be using TLS shutdown messages to half-close the connection, but right now it's using TCP FINs.

See also #5336 for a related issue (the exception mentioned in one of the comments below).

Change History (13)

comment:1 Changed 6 years ago by Itamar Turner-Trauring

Description: modified (diff)
Owner: set to Itamar Turner-Trauring
Status: newassigned

comment:2 Changed 6 years ago by itamarst

Author: itamarst
Branch: branches/halfclose-tls-5341

(In [32989]) Branching to 'halfclose-tls-5341'

comment:3 Changed 6 years ago by Itamar Turner-Trauring

Author: itamarstitamar

I was unable to reproduce the exception, but was able to prove half-close isn't working. Unfortunately making a working half-close implementation is not a half hour task, so this may take a bit of time to fix.

comment:4 Changed 6 years ago by Itamar Turner-Trauring

Description: modified (diff)
Summary: loseWriteConnection breaks new TLS codeloseWriteConnection is broken in new TLS code

comment:5 Changed 6 years ago by Itamar Turner-Trauring

Cc: Jean-Paul Calderone Glyph added

Graaah. I can't build pyOpenSSL 0.9 on this machine because modern Ubuntu doesn't have dev headers for a version of OpenSSL that it'll work with (apparently libssl 1.0 makes it unhappy?).

Anyone want to to see if the tests I added in the branch fail with old pyOpenSSL? If they do, we can probably safely say the old code is also broken with half-close, in which case this isn't a regression and shouldn't block 11.1.

comment:6 Changed 6 years ago by Itamar Turner-Trauring

Coding when tired is hard.

I'm pretty sure I have demonstrated that in trunk at least the old SSL code doesn't do half-close correctly. It does do it better than the new code, though: the new code loses bytes that get received after loseWriteConnection, unlike the old code. So in some sense this is a regression.

comment:7 Changed 6 years ago by Itamar Turner-Trauring

Milestone: Twisted-11.1
Type: regressiondefect

In some cases, however, the old code breaks even worse (connectionLost never gets called!). So, meh, we don't need this fixed in 11.1.

I'm wary of just disabling it since it's basically a loseConnection in the new code, which is kinda sorta what you want and probably better than just breaking live applications. E.g. in Glyph's case it's much better to have loseWriteConnection be equivalent to loseConnection than to have it blow up. OTOH, we need some way to tell people to beware, until this is fixed.

Maybe log a Warning to people who attempt to use loseWriteConnection, telling them it's known to be broken?

comment:8 in reply to:  7 Changed 6 years ago by Glyph

Replying to itamar:

In some cases, however, the old code breaks even worse (connectionLost never gets called!). So, meh, we don't need this fixed in 11.1.

Should #5336 be marked as a regression then? That traceback is the only visible symptom of a problem for me. For what it's worth, I haven't seen this code leak file descriptors before, which would be the main consequence of connectionLost never being called, right?

comment:9 Changed 6 years ago by Itamar Turner-Trauring

You would see occasional fd leaks unless you have a timeout that does loseConnection() if connectionLost isn't called soon enough. Possibly web2 has one of those? The leak condition may also be difficult to trigger in the real world.

#5336 is, really, a different issue. It would happen even if half-close was working correctly.

comment:10 Changed 6 years ago by Itamar Turner-Trauring

RFC 5246:
   Unless some other fatal alert has been transmitted, each party is
   required to send a close_notify alert before closing the write side
   of the connection.  The other party MUST respond with a close_notify
   alert of its own and close down the connection immediately,
   discarding any pending writes.  It is not required for the initiator
   of the close to wait for the responding close_notify alert before
   closing the read side of the connection.

The TLS 1.0 RFC has basically the same language.

Thus half-close is basically a violation of the TLS specs. So maybe loseWriteConnection should continue to behave the way loseConnection does, but warn that half-close is not actually supported.

comment:11 Changed 6 years ago by Itamar Turner-Trauring

Also, the TLS transport protocol shouldn't provide IHalfCloseableProtocol even if the protocol it is wrapping does.

comment:12 in reply to:  9 Changed 4 years ago by Glyph

Replying to itamar:

You would see occasional fd leaks unless you have a timeout that does loseConnection() if connectionLost isn't called soon enough. Possibly web2 has one of those? The leak condition may also be difficult to trigger in the real world.

Actually the FD leak issue is worse than that.

If you call loseWriteConnection, this loses the write connection on the underlying transport. OK, everything's all right so far, that's not totally a valid state per the spec, but whatever, it's a thing that can happen on the wire, so you make it happen.

Now, remember, once you've lost the write connection, transport.write silently does nothing.

You have your timeout, then you call loseConnection, and you expect your connection to go away. OK, also fine, we should totally drop the underlying connection after a timeout.

loseConnection issues a TLS close alert message, by calling _shutdownTLS which calls _tlsConnection.shutdown() and then _flushSendBIO, and delivers the closeAlert to self.transport.write.

_tlsConnection.shutdown() will never return an immediate success, because here you're the initiator of a close alert, and that close alert has to be relayed. So we're not going to call loseConnection, we're going to wait around for the ZeroReturnError case of _flushReceiveBIO to be called by dataReceived

but if the client is done sending data, dataReceived is never going to be called again, because we never sent the closeAlert in the first place, because transport.write silently did nothing.

abortConnection might do the job, but once you have called loseWriteConnection on a TLS-ified transport, loseConnection is effectively a no-op.

comment:13 Changed 4 years ago by Glyph

Summary: loseWriteConnection is broken in new TLS codeloseConnection is a no-op after loseWriteConnection with new TLS code (and loseWriteConnection should not even exist there)
Note: See TracTickets for help on using tickets.