[Twisted-Python] ITransport.write after IProtocol.connectionLost -- no failure/exception?

Sat Dec 17 16:18:54 MST 2016

> On Dec 17, 2016, at 6:11 AM, exvito here <ex.vitorino at gmail.com> wrote:
> 
> On Fri, Dec 16, 2016 at 8:27 PM, Glyph Lefkowitz
> <glyph at twistedmatrix.com <mailto:glyph at twistedmatrix.com>> wrote:
>> 
>>> On Dec 16, 2016, at 5:22 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
>>> 
>>>> On 15 Dec 2016, at 12:07, exvito here <ex.vitorino at gmail.com> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> The subject says most of it. While diagnosing a behavior on a somewhat
>>>> large codebase, I found out something somewhat surprising -- see
>>>> subject -- which is replicated with the minimal code example below.
>>>> 
>>>> It is a LineReceiver client that:
>>>> 1. Connects to 127.0.0.1:10000.
>>>> 2. Sends a line of text.
>>>> 3. Disconnects one second later.
>>>> 4. Tries to send another line of text, one second after disconnection.
>>>> 
>>>> I expected step 4 to fail somehow, given that Twisted promptly
>>>> detected that the connection closed and IProtocol.connectionLost was
>>>> called, as documented. Nevertheless, it does not fail.
>>> 
>>> I can’t speak for the design of Twisted, but this is definitely an intended implementation behaviour: because twisted’s write method is non-blocking, it is generally nicer to avoid write exploding in your face when disconnected.
>> 
>> I can! The main design feature of this approach is that if you have a loop like this:
>> 
>> for x in messages:
>>    self.transport.write(self.encode(x))
>> 
>> you should not have to write it to be:
>> 
>> for x in messages:
>>    if self._flagISetInConnectionLost:
>>        break
>>    self.transport.write(self.encode(x))
>> 
>> just to avoid seeing tracebacks in your logs.
>> 
>> If you care about the disconnection event, implement connectionLost to do the thing that needs to be done (in this case, stop your LoopingCall).  That's what that callback is there for!
> 
> 
> Thanks for your input Cory and Glyph.
> 
> I do agree that a well written protocol should not
> self.transport.write after connectionLost -- it does not make any kind
> of sense to do that. It's just that the one I was debugging was doing
> it in its app-level keep alive messages triggered by
> IReactorTime.callLater which wasn't properly cancelled on
> connectionLost. This, in turn, lead to unpexpected app-level keep
> alive timeouts after disconnection which, while acceptable (depending
> on app/protocol design), were having a negative impact on the overall
> connection/protocol teardown and cleanup.
> 
> Having said this, the fact that self.transport.write does not complain
> after connectionLost just made my analysis and debugging more
> difficult (yes, I was aware that it is non-blocking -- thanks Cory --
> and that when talking to the network the only confirmation of delivery
> must be obtained from a received message stating that fact).
> 
> All in all, I was just looking for confirmation from a conceptual /
> design standpoint. That was my purpose and it is now clear. Thanks.
> 
> Going a little bit further, in this context, I wonder if my
> understanding of the Protocol / Transport objects lifecycle is 100%
> clear. I think there is a one to one relationship in the sense that,
> simply put, on an incoming connection the underlying machinery
> instantiates a Protocol via the Factory's buildProtocol method,
> instantiates a Transport object and then calls Protocol.makeConnection
> to "attach" the transport to the protocol; when the connection is
> closed, the Transport calls connectionLost on the Protocol and, at
> some point, those two objects are no longer "usable" -- they've done
> their job and will be destroyed if no app level references are kept to
> them.

They're not "destroyed", exactly.

The way to think about this is that the reactor, while running, is always on the stack.  The stack is the root of what's currently 'active' in the garbage collector.  The reactor then has references to various things - in this case, TCP transports and timed DelayedCall objects.  The transport has a reference to the protocol, which is what keeps both of them "usable".  The DelayedCall has a reference to your function, and your function to anything it needs to do its work.

Things can be removed from and added back to the reactor at any time - for example, a transport may be removed without being closed by calling .stopReading() and .stopWriting() on it.  If it gets garbage collected, its socket will be closed, but because the reactor didn't close it, you won't get a normal connectionLost notification.

So rather than thinking of a Transport as having a precisely-defined lifecycle - although it does have certain states it goes through; connecting/connected/disconnecting/disconnected - the best way to think about it is as a graph of objects that the reactor is doing stuff to.

> If this is correct, in particular the fact that a given Transport
> instance is not reusable after disconnection (if it is, could you
> please point out such a case?), wouldn't it be valuable to log a
> warning when write after disconnect is called? (agreed, raising an
> Exception would be too much and break compatiblity). I find two
> reasons for that: on one hand, given that it would have made my
> debugging easier, it may help other developers pin point such cases;
> on the other, the fact that Twisted itself logs messages on several
> occasions, such as, for example, the default noisy Factory on doStart
> / doStop. Adding to that, the fact that, if I'm reading the code
> correctly, ITransport.write just throws away the data for TCP
> transports since it falls back to FileDescriptor.write that simply
> returns if "not self.connected or self._writeDisconnected" in
> src/twisted/internet/abstract.py.

If you start logging a warning when write is called on a closed transport, then that means any code that wants to be warnings-clean needs to check to see if the transport is writable before writing to it.  This adds a lot of complexity to that sort of message-formatting code, and potentially makes it less efficient. After all, if you have to check each write() call, that will strongly encourage you to make your message-formatting happen entirely in memory, so you always have a big string that represents the whole message, just in case a transport.write() in the middle would encounter an error.

More importantly, when users start seeing a warning like this, they often develop an incorrect intuition about what's going on with the bytes; one of the _most_ frequently asked questions about how to write Twisted protocols is "how do I tell if my bytes got to the other end without sending an application-level acknowledgement".  The answer, of course, is "that's impossible" but if users see that sometimes it warns them when the connection's closed, it reinforces the idea that there must be a way.

If you call .write, there is always the possibility that the connection _looks_ open to Twisted, but in fact is closed on the wire, and those bytes will be lost.  So it's important to me that we behave the same way in both cases.

That said, it's not like there's no room for improvement here!  The solution _I'd_ really like to see for this problem is improved visualization of what the reactor is doing.  Something that drew a graph of the reactor, and open connections, and pending timed calls would allow you to instantly see that your timed call was not connected to an active connection, and correct your code.

> Thoughts? (or corrections, of course?)
> 
> Again, thanks for your time and input.
> Regards,
> 
> PS: I don't mean to start a huge discussion on the topic or question
> the current design. It is very nice as it is, thank you! I'm just
> looking for improving my understanding and, humbly, contributing back.

Don't worry, the design is pretty fixed - and if you managed to provoke such a discussion, then maybe it does need to change!  But I doubt it :)

-glyph

-------------- next part --------------
An HTML attachment was scrubbed...
URL: </pipermail/twisted-python/attachments/20161217/06753011/attachment-0002.html>