Ticket #78 (new defect )

Opened 5 years ago

Last modified 1 year ago

abortConnection() method for transports

Reported by: itamarst Assigned to: jknight
Type: defect Priority: high
Milestone: Component: core
Keywords: Cc: glyph, itamarst, jknight, naked, exarkun
Branch: Author:
Launchpad Bug:

Attachments

abortConnection.diff (1.8 kB) - added by naked 5 years ago.
abortConnection2.diff (4.1 kB) - added by jknight 4 years ago.

Change History

  2003-07-15 00:54:40+00:00 changed by itamarst

Similar to loseConnection(), but doesn't flush the write
buffers or wait for producer to finish. Useful for when you
know connection should die ASAP, because the other side is
totally screwed up anyway.

  2003-07-18 19:12:38+00:00 changed by glyph

I hope you don't object.  BUT I DON'T CARE IF YOU DO, HAHA

  2003-08-12 19:54:01+00:00 changed by glyph

This should probably just be an optional argument to
loseConnection, i.e.
    loseConnection(terminateTimeout=600)
A lot of code uses loseConnection() and expects the socket
to be closed eventually, but it is pretty easy to construct
an attack which will cause Twisted to never ever close a
socket this way.  By adding an optional argument to this
interface, we can enhance the semantics of the existing
method in a mostly-backwards-compatible way.

  2003-08-12 21:58:02+00:00 changed by itamarst

Sounds good. Maybe with a default timeout? Which should
probably be high enough that existing code won't be affected
by slow connections (and for producers this is especially an
issue, though I'm not sure if we have any code that depends
on loseConnection's producer interaction).

  2003-08-12 22:23:31+00:00 changed by glyph

I bet that our existing code deals with
loseConnection+Producers differently in different reactors.
 That should be on our list of things to clean up when we
revamp the Producer API.

  2003-08-12 23:28:14+00:00 changed by itamarst

Actually it's in tcp.py (or actually maybe abstract.py), so
the reactor shouldn't matter.

  2004-05-13 20:42:42+00:00 changed by itamarst

Hm. Wouldn't loseConnection time out potentially break existing code? Like,
"send really huge file and then loseConnection". Of course, if the default
timeout was large enough it'd mostly be OK...

  2004-05-13 20:49:10+00:00 changed by naked

  • attachment abortConnection.diff added

  2004-05-13 20:49:10+00:00 changed by naked

I wrote a patch to this effect, although the method is called abortConnection().
My needs were to explicitly close a connection without waiting for write
availability on the socket/whatever first. Also this means that with SSL, we
don't send or wait for the reply to close notify, which is important.
There is a problem though, that in the normal case, connectionLost() is invoked
from the reactor after doWrite or doRead has returned CONNECTION_LOST or
CONNECTION_DONE. In this case, connectionLost() is called directly as a result
of calling abortConnection(). I do not know enough about twisted internals to
say if this is a serious problem or not, and how it should best be fixed.

  2004-05-21 02:05:33+00:00 changed by glyph

We do have code that depends on that behavior.  Quite a bit, actually.  I think I prefer
abortConnection(), since terminateTimeout is a convenience and will have all the issues normally
associated with callLater.

  2004-05-21 02:06:50+00:00 changed by jknight

I think there's two issues here. One is a API to abort the connection. The other is the timeout
issue.
For the write timeout issue, I don't think an absolute timeout on loseConnection is the right
thing. Instead, it should have behavior like: "If there is data in the write buffer, and I have not sent any
bytes for the last minute, the connection is as good as dead." I believe this behavior can and should
default to on without compatibility problems.

  2004-11-24 07:13:16+00:00 changed by jknight

  • attachment abortConnection2.diff added

  2004-11-24 07:13:17+00:00 changed by jknight

Here's a different patch. transport.abortConnection() loses the connection and tcp's _closeSocket is
updated to do an abrupt TCP disconnect unless the reason is ConnectionDone. Seems like it works but
mostly untested.

  2006-07-02 18:00:41+00:00 changed by foom

(In [17458]) An abortConnection() function for transport, and some tests.

Refs #78

  2006-07-02 18:05:25+00:00 changed by jknight

  • owner changed from itamarst to jknight
  • summary changed from terminateConnection() method for transports to abortConnection() method for transports

  2007-10-03 00:44:35+00:00 changed by kvogt

This is actually a pretty evil security hole. I logged into a server to see why it was out of memory, and found 1434 open connections with no activity. A malicious user was opening hundreds of connections to a twisted server but not reading from them. Since loseConnection() won't close the sockets unless the write buffer is empty, I have no way to fight this attack at the application level.

We have had a twisted IRC server and an entire twisted live video cluster (93 nodes) get destroyed by this exact exploit once a certain hacking group figured it out. In the mean time I'm patching in abortConnection2.diff, but I thought this real world example might be useful information for people trying to fix this.

  2007-10-03 12:33:07+00:00 changed by itamarst

  • priority changed from normal to high
  • type changed from enhancement to defect

I'm torn between happiness at live video clustering and sadness about the exploit. Upping priority, changing to defect.

  2007-10-03 14:54:35+00:00 changed by glyph

I'm a little concerned about the form of the solution here. It means that every protocol that is not specifically designed to be resistant to this particular exploit (in other words, all the protocols in Twisted) will continue to be a problem. Perhaps a better way to deal with the exploit would be to have a default timeout on loseConnection if no data is being consumed by the client? Or an easy way to set up administrative limits on the number of simultaneous connections from a single IP?

I haven't thought too hard about the solution yet, but I definitely don't think we could consider this form of DoS dealt with if you have to hand-patch some code for every port that you want to open.

  2007-10-09 10:34:43+00:00 changed by exarkun

Mechanism separate from policy.

  2007-10-09 12:39:19+00:00 changed by itamarst

  • cc changed from glyph, itamarst, jknight, naked to glyph, itamarst, jknight, naked, exarkun

Obviously we should have abortConnection that is separate from loseConnection. The question is, should loseConnection additionally have some sort of a default timeout case where it calls abortConnection. loseConnection essentially is a policy, since it can be implemented in terms of producer API + abortConnection.

  2007-10-09 13:35:27+00:00 changed by glyph

I was not trying to make a point about the validity of this ticket.

I was responding to the comment by kvogt that "This is actually a pretty evil security hole". This ticket isn't going to fix any security hole (or, more accurately, DoS). It may be a necessary prerequisite, but we should probably have a separate issue to deal with the larger problem.

Note: See TracTickets for help on using tickets.