Opened 11 years ago

Last modified 9 years ago

#1662 enhancement new

pauseProducing() should not block POLL_DISCONNECTED

Reported by: andrea Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: andrea, jknight, Glyph, Jean-Paul Calderone Branch:


This is an enhancement to consider for the mid/long term.

The basic problem is that I can't throttle the input/output from the network using the registerProducer (that calls pauseProducing()) without risking to hang forever. Hanging forever means pratically disabling the reconnectingclient behavior.

It's a significant problem because it's in the twisted core, in pollreactor and in general in the reactor API (I doubt it can be fixed without some changes to the reactor API).

In other words the deficiency is that writing a network app with twisted is like writing a network app in a OS that only supports POLLIN and POLLOUT. All other flags are unusable with twisted:


I need to poll the fds in pausedProducer state only for POLLERR|POLLHUP|POLLNVAL. Currently either I poll them for POLLIN or I don't poll them at all, this is a big limitation if you can't trust the producer not being maliciously written.

Currently if the producer is malicious, he can either hang the tcp connection permanently to the point that it won't even notice a close() on the other side, or without registering a producer twisted can leak an infinite amount of memory leading either ways to a local DoS. Not even the keepalive seem to wake it up, after all if it could notice it, he would have noticed it before the keepalive timer triggers.

I think the fix is to try by default to make registerProducer not to remove the file descriptor from the poll syscall, but to change it from POLLIN/POLLOUT to pollreactor.POLL_DISCONNECT.

pauseProducing doesn't mean "hang indefinitely", one should still be notified if the other side of the tcp closes the socket. So a new protocol.connectionLostAsync API could be added, to notify that there is still incoming data to read, but the other side has already closed the tcp connection. This way the connectionLostAsync could in turn call transport.loseConnection() or even transport.connectionLost(). The latter is needed to close the socket hard even if there's still data in the send queue, right?

Currently I don't strictly need this, because I've always "trusted" ssl connection, that I can use to call protocol.transport.loseConnection() on the "untrusted" protocol that is hanging in paused state. That is guaranteed to unblock it and make it notice that the other side disconnected, right? Or do I need to call protocol.transport.connectionLost(closeconnection) instead to be sure it unblocks?

Attachments (2)

aa-pause.txt (21.2 KB) - added by jknight 11 years ago.
aa-paused2.txt (21.6 KB) - added by andrea 11 years ago.
updated to add 'reason' failure (just in case we need it in the future, and to make it easier to fall through connectionLost)

Download all attachments as: .zip

Change History (10)

comment:1 Changed 11 years ago by jknight

Yes, this is a problem. I don't think it needs any new user-level APIs, only some new reactor functionality.

The solution is to keep a list of fds associated with the reactor separately from each one's read/write state. That is, #365 "Reactor should track associated sockets separately from read/write state."

comment:2 Changed 11 years ago by Glyph

Owner: changed from Glyph to jknight

Since this is a fairly direct dependent of #365 I am assigning it to you.

Is any work actually happening on #365?

Changed 11 years ago by jknight

Attachment: aa-pause.txt added

comment:3 Changed 11 years ago by jknight

Cc: andrea jknight Glyph Jean-Paul Calderone added

Andrea says via email: Here the final fix for allowing the apps to catch disconnects during read-throttling. This fixes a number of troubles in ssl too.

And also attached the file above. Because I think this feature is important, I've been nice enough to copy it to the bug tracker. I'm not going to make a habit of that, though.

Changed 11 years ago by andrea

Attachment: aa-paused2.txt added

updated to add 'reason' failure (just in case we need it in the future, and to make it easier to fall through connectionLost)

comment:4 Changed 11 years ago by jknight

As for specific comments on the patch, I don't like the addition of pauseReading/resumeReading. I'd like to see instead an API added to tell the reactor the total set of FDs it is supposed to deal with, independent of the reading/writing state. That is, in addition to [add/remove][Reader/Writer], I'd like an [add/remove]FD, which would be called from the appropriate places before calling startReading/startWriting.

I realize that's a bigger change, but I think it would result in a better and more useful API.

I pretty much said that in #365:

The only issue remaining out of the original bunch is that reactors should track

sockets they are associated with, not just sockets that are currently reading or writing.

comment:5 Changed 11 years ago by jknight

As andrea refuses to participate in a discussion about this in public, and I will only do so in public, progress on this issue is pretty much suspended. Too bad.

comment:6 Changed 11 years ago by andrea

This is not exact: I'm happy to discuss it in public, I'm not really willing to discuss it here instead.

Go read the mailing list and see the feedback I received from a number of individuals part of this community. Even if we ignore the personal levels insults (but if you read doc/fun/* you'll see they insulted pretty much anyone out there), despite sending fixes for years that they merged into their SVN repository they later say I don't useful contribute to the project:

"Sorry, you don't get to dictate the course of development of projects you don't even usefully contribute to."

And if it's web2 that he's talking about, I'm sure I contributed to web2 too, probably a lot more than they did because they've all invested on top of web1 but yet they pretend to have the right to dictate its course even if they admitted having to beg for donations in order to contibute to it (I am contributing to it in my spare time).

So let me don't contribute to this project anymore, since it won't change the bottom line, and there are millions of better things to do in my spare time than discussing anything here.

comment:7 Changed 9 years ago by Jean-Paul Calderone

(In [21301]) Merge epoll-hup-err-2809-2

Author: exarkun Reviewer: therve Fixes #2809 Refs #1780 Refs #1662 Refs #2833

Change the error (EPOLLHUP, EPOLLERR) checking code so that it can handle the case where they are both set, rather than only checking the cases where one or the other is set. This allows epollreactor to properly detect and report connections which are lost uncleanly and also are not being polled for input events.

The test case for this is slightly convoluted for a number of reasons:

  • gtkreactor cannot pass the new test, and I am not fixing that reactor in this branch. The test will skip when that reactor is in use. #2833 was created as a result of this in order to be able to remove the broken gtkreactor eventually.
  • If a file descriptor is being polled for neither read nor write events, it will not be polled for hang-up events either. This is a known issue, #1662. When it is fixed, the test will need to be updated.
  • In order to exercise the case where the event reported from epoll is only HUP|ERR (specifically, not IN), the test pauses one of the involved transports. However, due to #1780, it must use a timing hack to accomplish this. Once that issue is resolved, it should be possible to remove this hack.

comment:8 Changed 6 years ago by <automation>

Owner: jknight deleted
Note: See TracTickets for help on using tickets.