Opened 7 years ago

Last modified 6 years ago

#6287 defect new

epoll error: epollreactor stop UDP protocol while selectreactor connectionRefused

Reported by: huyx Owned by: Martin Gergov
Priority: normal Milestone:
Component: core Keywords: epoll udp
Cc: Branch: branches/epoll-udp-refused-6287
branch-diff, diff-cov, branch-cov, buildbot
Author: marto1_

Description

When i use epollreactor and send UDP data to server, i got error:

Unhandled Error
Traceback (most recent call last):
  File "echoclient_udp_epoll.py", line 35, in <module>
    main()
  File "echoclient_udp_epoll.py", line 32, in main
    reactor.run()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1173, in run
    self.mainLoop()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 1182, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 805, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "echoclient_udp_epoll.py", line 23, in sendDatagram
    self.transport.write(datagram)
exceptions.AttributeError: 'NoneType' object has no attribute 'write'

I used the following test code:

#!/usr/bin/env python

# from twisted.internet import selectreactor    # ****
# selectreactor.install()                       # ****

from twisted.internet.protocol import DatagramProtocol
from twisted.internet import reactor
from itertools import cycle

class EchoClientDatagramProtocol(DatagramProtocol):
    strings = cycle([
        "Hello, world!\n",
        "What a fine day it is.\n",
        "Bye-bye!\n"
    ])

    def startProtocol(self):
        self.transport.connect('127.0.0.1', 8000)
        self.sendDatagram()

    def sendDatagram(self):
        datagram = self.strings.next()
        self.transport.write(datagram)
        reactor.callLater(1, self.sendDatagram)

    def connectionRefused(self):
        print 'connectionRefused'

    def datagramReceived(self, datagram, host):
        print 'Datagram received: ', repr(datagram)

def main():
    protocol = EchoClientDatagramProtocol()
    t = reactor.listenUDP(0, protocol)
    reactor.run()

if __name__ == '__main__':
    main()

But when i uncommnet the * line, it runs as i expected, i thank it is a bug of epoll reactor, isn't it?

Attachments (1)

fix-epoll-not-calling-connectionRefused (3.6 KB) - added by Martin Gergov 6 years ago.
_doReadOrWrite modification was needed so that TCP preserved it's behaviour, although still looks like a hack.

Download all attachments as: .zip

Change History (8)

comment:1 Changed 7 years ago by Jean-Paul Calderone

Looks like epoll_wait signals this case using EPOLLERR. epollreactor handles EPOLLERR the same way it handles EPOLLHUP (so the reason the transport disappears is that the protocol's doStop method is called, which throws away the transport).

Perhaps a better thing for epollreactor to do would be to handle EPOLLERR like EPOLLIN, since "readable" is where all of the special UDP error handling logic is implemented (Port.doRead). I didn't check whether this change would have any other bad side-effects though.

comment:2 Changed 6 years ago by Martin Gergov

If you switch

    _POLL_DISCONNECTED = (EPOLLHUP | EPOLLERR)
    _POLL_IN = EPOLLIN
    _POLL_OUT = EPOLLOUT

with

    _POLL_DISCONNECTED = EPOLLHUP
    _POLL_IN = (EPOLLIN | EPOLLERR)
    _POLL_OUT = EPOLLOUT

in EPollReactor this will cause the expected behaviour for udp(connectionRefused is called, because host is unreachable), but twisted.test.test_tcp.WriteDataTestCase.test_writeAfterShutdownWithoutReading fails with:

twisted.trial.unittest.FailTest: 
Expected: (<class 'twisted.internet.error.ConnectionLost'>,)
Got:
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
]

Changed 6 years ago by Martin Gergov

_doReadOrWrite modification was needed so that TCP preserved it's behaviour, although still looks like a hack.

comment:3 Changed 6 years ago by Martin Gergov

Keywords: review added

The bug is fixed, I hope on all platforms, unfortunately TCP breaks and returns CONNECTION_DONE instead of CONNECTION_LOST so I had to touch posixbase. Other approach to the problem is to leave EPOLLERR in _POLL_DISCONNECTED events, but inform in some way udp.Port.doRead that this is a special case.

comment:4 Changed 6 years ago by Jean-Paul Calderone

Owner: set to Jean-Paul Calderone
Status: newassigned

comment:5 Changed 6 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/epoll-udp-refused-6287

(In [40128]) Branching to 'epoll-udp-refused-6287'

comment:6 Changed 6 years ago by Jean-Paul Calderone

(In [40129]) apply fix-epoll-not-calling-connectionRefused

refs #6287

comment:7 Changed 6 years ago by Jean-Paul Calderone

Author: exarkunmarto1_
Keywords: review removed
Owner: changed from Jean-Paul Calderone to Martin Gergov
Status: assignednew

Thanks.

The big thing I notice when looking at this change is that it doesn't seem right to skip this new test for those three reactors. It's true that they don't currently support the behavior being tested - but that's because the implementation shares exactly the bug that the epoll reactor has. Looking more closely, I see that all of these reactors define their "error" condition as part of the "disconnect" set. They can all be made to work by defining it as part of the "input" set instead. This suggests that it really might be better to teach _PollLikeMixin to treat the "error" condition in a special way. It does appear the issue can be resolved without adding this special treatment so I don't think this is a requirement to land these changes, just something to consider.

So, instead of skipping all the other poll-like reactors in this new test, I suggest running it against all of them and fixing them all (the fix for all of them is roughly the same, it seems, and only 2 or 3 lines each so this isn't going to explode the size of the changeset).

Here are some other minor issues:

  1. hasattr is broken and should essentially never be used. Also, _PollLikeMixin declares explicitly that it 'must' be mixed in to a PosixReactorBase subclass. So this makes _ContinuousPolling the offender here. Maybe the right thing to do is work around this mismatch in _PollLikeMixin but if so then the documentation needs to be updated. In fact, I think the right thing to do right now is to rename the _ContinuousPolling._readers attribute so that it agrees with _PollLikeMixin's expectations. And later perhaps rewriting _PollLikeMixin to not be a mixin for PosixReactorBase subclasses (ie, make it a helper function that takes some more arguments or do something else with composition) probably makes sense.
  2. The test method needs a docstring
  3. ReactorBuilder-style tests do their own cleanup: there's no need to return port.stopListening() at the end of the test.
  4. Ideally there would be an assertion in the test somewhere. I realize it's sometimes difficult to come up with one that makes sense in "does this event happen" tests but it's worth a good try.
  5. There's no guarantee port 8000 will not be accepting udp traffic on the test machine. An unbound UDP port should be used instead. You can find one by creating a UDP socket and asked what its port is.
  6. Why does there need to be a reactor.callLater(0, ...) in the test?
  7. The news fragment should talk about twisted.internet.epollreactor instead of Epoll reactor. Whenever possible, the subject of the fragment should be a Python FQPN so it is unambiguous and easy for people to investigate further.
Note: See TracTickets for help on using tickets.