Opened 13 months ago

Last modified 13 months ago

#9423 enhancement new

EPollReactor leaks epoll file descriptors

Reported by: mark williams Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch:
Author:

Description

epoll_create(2) returns a file descriptor that should have {{{close(2)}} called on it when it's no longer needed:

http://man7.org/linux/man-pages/man2/epoll_create.2.html#DESCRIPTION

EPollReactor doesn't clean do that. Finalizer-based resource deallocation is bad, and in this case has made the test suite unreliable. EPollReactor should close this descriptor via its stop and {{{crash}} methods.

Context, copied from a GitHub comment related to #5368:

test_closePeerOnEMFILE is a flaky test:

[FAIL]
Traceback (most recent call last):
  File "/buildslave/ubuntu16_04-py2_7-nodeps/Twisted/build/py27-nodeps-nocov-posix/local/lib/python2.7/site-packages/twisted/internet/test/test_tcp.py", line 1349, in test_closePeerOnEMFILE
    connect=self.connectToListener,
  File "/buildslave/ubuntu16_04-py2_7-nodeps/Twisted/build/py27-nodeps-nocov-posix/local/lib/python2.7/site-packages/twisted/internet/test/test_tcp.py", line 1184, in assertPeerClosedOnEMFILE
    noResult, "Server accepted connection; EMFILE not triggered.")
  File "/buildslave/ubuntu16_04-py2_7-nodeps/Twisted/build/py27-nodeps-nocov-posix/local/lib/python2.7/site-packages/twisted/trial/_synctest.py", line 384, in assertFalse
    super(_Assertions, self).assertFalse(condition, msg)
  File "/usr/lib/python2.7/unittest/case.py", line 416, in assertFalse
    raise self.failureException(msg)
twisted.trial.unittest.FailTest: Server accepted connection; EMFILE not triggered.

twisted.internet.test.test_tcp.TCPFDPortTestsBuilder_EPollReactorTests.test_closePeerOnEMFILE

I can reproduce this with trial -u and other transport tests; strace implies that epoll objects are getting garbage collected. That is, given:

strace trial -u twisted.internet.test.test_unix.UNIXPortTestsBuilder_EPollReactorTests.test_closePeerOnEMFILE |& tee fail.log

This appears in the trace:

dup(0)                                  = 1017
dup(0)                                  = 1018
dup(0)                                  = 1019
dup(0)                                  = 1020
dup(0)                                  = 1021
dup(0)                                  = 1022
dup(0)                                  = 1023
dup(0)                                  = -1 EMFILE (Too many open files)
close(5)                                = 0
close(6)                                = 0
close(10)                               = 0
close(12)                               = 0
close(13)                               = 0
close(14)                               = 0
close(15)                               = 0
close(16)                               = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2845, ...}) = 0
write(3, "2018-04-12 17:57:08-0700 [-] EMF"..., 77) = 77
getsockopt(24, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
connect(24, {sa_family=AF_UNIX, sun_path="./tmp2IDgWm.sock"}, 18) = 0
epoll_ctl(17, EPOLL_CTL_ADD, 24, {EPOLLIN, {u32=24, u64=139753940844568}}) = 0
epoll_wait(17, [{EPOLLIN, {u32=23, u64=139753940844567}}], 4, 29977) = 1
accept(23, {sa_family=AF_UNIX}, [110->2]) = 5

File descriptors 5, 6, 10, 12, 13, 14, 15, and 16 all represent epoll objects, e.g.;

...
epoll_create(1024)                      = 15
pipe([16, 17])                          = 0
...

The simplest fix I can think of is to put a gc.collect() at the beginning of test_closePeerOnEMFILE. This seems silly but probably generally the right thing to do, since who knows what files might be lurking.

trial -u with the collection has also successfully completed 10,000 runs without failing.

Change History (1)

Note: See TracTickets for help on using tickets.