Opened 15 years ago

Closed 15 years ago

#1953 enhancement closed fixed (fixed)

Epoll reactor

Reported by: therve Owned by:
Priority: highest Milestone:
Component: core Keywords:
Cc: therve, spiv, jknight, Jean-Paul Calderone, Ralph Meijer, PenguinOfDoom Branch:
Author:

Description

Using the work done here: source:sandbox/exarkun/epoll/ , I reached a working reactor (tests suite pass). I attach it here if anyone interested.

We can see #1930 for another alternative.

Attachments (9)

epollreactor2.py (5.8 KB) - added by therve 15 years ago.
Updated version with more docs and use of the C module
epoll.c (4.4 KB) - added by therve 15 years ago.
Epoll C wrapper
epollreactor.py (5.8 KB) - added by therve 15 years ago.
Add more docs
_epoll.pyx (4.4 KB) - added by therve 15 years ago.
test_epoll.py (3.2 KB) - added by therve 15 years ago.
1930_app.diff (536 bytes) - added by therve 15 years ago.
1930_setup.diff (669 bytes) - added by therve 15 years ago.
reactor_1953.diff (2.3 KB) - added by therve 15 years ago.
Diff agains branches/epollreactor-1953 to make the test suite pass
comment1_1953.diff (6.4 KB) - added by therve 15 years ago.

Download all attachments as: .zip

Change History (62)

comment:1 Changed 15 years ago by spiv

Cc: spiv added
Keywords: review added; epoll reactor removed

This deserves a review tag, even though it's not a proper diff, and clearly needs a bit of polish (allowing --reactor args of twistd, trial to use it, for instance, and updating the docs).

comment:2 Changed 15 years ago by Jean-Paul Calderone

It would be better for epollreactor to be edge triggered. The code in my sandbox was level triggered, and it looks like the attached reactor is as well.

comment:3 Changed 15 years ago by therve

It does need polish, and I'm ready to make it if it encounters some interest.

The attached reactor is indeed Level Triggered, it's just some modification away from the sandbox one, to make the tests suite pass. Creating an Edge Triggered reactor would be much more work: it's really different from select/poll, and I don't know if it performs much better. Some arguments ?

comment:4 Changed 15 years ago by jknight

Yays.

I think we should add a level triggered epoll reactor before going off and trying to make it edge triggered. (Which probably performs better but will likely be a *lot* of work to make work in twisted).

comment:5 Changed 15 years ago by jknight

Cc: jknight added

comment:6 Changed 15 years ago by therve

Also, I wanted to know if the pyrex extension was a requirement. I rewrote a wrapper in plain C which seems to be a lot simpler (even if it may introduce other problems).

comment:7 Changed 15 years ago by jknight

If it _doesn't_ introduce other problems that's fine with me...pyrex introduces many problems of its own. But it'll have to be carefully checked over to make sure it does all its refcounting/etc right.

Changed 15 years ago by therve

Attachment: epollreactor2.py added

Updated version with more docs and use of the C module

Changed 15 years ago by therve

Attachment: epoll.c added

Epoll C wrapper

comment:8 Changed 15 years ago by Jean-Paul Calderone

The epoll in my sandbox had unit tests for the low-level epoll wrapper. Does this C wrapper present the same API (and so is the existing test suite usable to test it)?

In order of preference, I think I'd rather have a wrapper:

  • based on rctypes
  • based on ctypes
  • written in pyrex, perhaps with the resulting C file including svn
  • wrapped directly with C

Modulo pyrex bugs (which may be significant and disqualify this argument), pyrex modules are much easier to write and maintain correctly than C wrappers.

And rctypes may be too raw to actually use :)

comment:9 Changed 15 years ago by therve

I guess the wrapper is close enough to adapt the tests. I'm ready to do it if necessary.

I thought a C wrapper might be better because of performance: my module is around 170 lines against 1000 for the pyrex compiled version. The epoll API is simple enough for making a thin wrapper, but I agree that a pyrex version is easier to maintain.

I'll try to have some performance numbers for C version against pyrex version, but, again, it's just an alternative, I'm sure the pyrex version works very well.

comment:10 Changed 15 years ago by Jean-Paul Calderone

I wouldn't worry about performance much here right now. It seems very unlikely to me that the bottleneck in any application is going to be in the epoll bindings.

comment:11 Changed 15 years ago by therve

You're right, there doesn't seem to be a performance difference between the two version. The choice is then more a maintenance question, so I guess the Pyrex version is more suitable for that.

So, anything else needed to push for its review?

comment:12 Changed 15 years ago by Jean-Paul Calderone

Keywords: review removed

Mostly just the tests, I think. There are a few other changes to make, likely: documentation, updating the reactors dictionary, etc, but that is all pretty simple. Re-add review keyword when you're ready.

Changed 15 years ago by therve

Attachment: epollreactor.py added

Add more docs

Changed 15 years ago by therve

Attachment: _epoll.pyx added

Changed 15 years ago by therve

Attachment: test_epoll.py added

Changed 15 years ago by therve

Attachment: 1930_app.diff added

Changed 15 years ago by therve

Attachment: 1930_setup.diff added

comment:13 Changed 15 years ago by therve

Keywords: review added

Attach updated version with more docs, and the tests previously done. Also add diffs to setup.py and app.py to build the module and add the epoll reactor to twistd/trial.

To build I only checked the platform for linux2, but I guess epoll is available on other platform (bsd? darwin?).

comment:14 Changed 15 years ago by Jean-Paul Calderone

Checked into svn in the epollreactor-1953 branch. The test suite has a large number of failures, though.

comment:15 Changed 15 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Glyph to therve

comment:16 in reply to:  14 Changed 15 years ago by therve

Replying to exarkun:

There are strange behaviours during the test suite. It seems each test run alone but not in a whole. I'll try to find a reproductible behaviour, but I guess it comes from some exceptions raised and bad managed in the reactor.

Changed 15 years ago by therve

Attachment: reactor_1953.diff added

Diff agains branches/epollreactor-1953 to make the test suite pass

comment:17 Changed 15 years ago by therve

Keywords: review added

comment:18 Changed 15 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added
Keywords: review removed

Hmm, I took some dubious looking error handling out of the branch and it broke a handful of tests. I'll take a look at what's going on there soon, but I haven't had a chance yet.

comment:19 in reply to:  18 Changed 15 years ago by therve

Replying to exarkun:

Hmm, I took some dubious looking error handling out of the branch and it broke a handful of tests. I'll take a look at what's going on there soon, but I haven't had a chance yet.

No problem. On my dev box the branch generates the same errors with epoll and with poll, so I thought it was spurious problems. I'll also have a look.

comment:20 in reply to:  18 Changed 15 years ago by therve

Replying to exarkun:

OK I have 5 failing tests at home; they come from the epoll_ctl call in _remove. The tests are the following:

  • twisted.test.test_tcp.LoopbackTestCase.testFailing
  • twisted.test.test_tcp.ConnectorTestCase.testReconnect
  • twisted.test.test_pb.NewCredTestCase.testClientConnectionLost
  • twisted.test.test_ftp.FTPServerPortDataConnectionTestCase.testPORTCannotConnect
  • twisted.test.test_application.TestInternet2.testConnectionGettingRefused

All comes from a problem in t.i.tcp.Client: it seems the socket is closed before being removed from the reactor. Moving the close after seems to do the trick for epoll, but I haven't tested more.

Index: twisted/internet/tcp.py
===================================================================
--- twisted/internet/tcp.py     (revision 18011)
+++ twisted/internet/tcp.py     (working copy)
@@ -457,6 +457,13 @@
             not hasattr(self, "connector")):
             return

+        self.connector.connectionFailed(failure.Failure(err))
+        if hasattr(self, "reactor"):
+            # this doesn't happen if we failed in __init__
+            self.stopReading()
+            self.stopWriting()
+            del self.connector
+
         try:
             self._closeSocket()
         except AttributeError:
@@ -464,13 +471,6 @@
         else:
             del self.socket, self.fileno

-        self.connector.connectionFailed(failure.Failure(err))
-        if hasattr(self, "reactor"):
-            # this doesn't happen if we failed in __init__
-            self.stopReading()
-            self.stopWriting()
-            del self.connector
-
     def createInternetSocket(self):
         """(internal) Create a non-blocking socket using
         self.addressFamily, self.socketType.

comment:21 Changed 15 years ago by therve

Keywords: review added

comment:22 Changed 15 years ago by Jonathan Lange

Priority: lowhighest

comment:23 Changed 15 years ago by Jean-Paul Calderone

Owner: therve deleted

comment:24 Changed 15 years ago by Jonathan Lange

So, basically, I'm not competent to review this branch: I've never done anything serious with pyrex or raw sockets, and it's too late in the day to puzzle things out. Here's what I got though.

Matters of form:

  • Add docstring for t.i.tcp.BaseClient.failIfNotConnected
  • 80 cols max in twisted/test/test_epoll.py
  • 80 cols max in twisted/internet/epollreactor.py
  • It's not clear why reads, writes etc are global and passed as default args. Please add a comment explaining why to epollreactor.py

Hunches:

  • The huge timeouts in test_epoll.py make me a little suspicious.
  • test_controlAndWait looks pretty big, is that really a unit test? It looks like five.

Questions:

  • Is "sys.platform == 'linux2'" an adequate test? I have no idea.
  • Do the tests pass using epollreactor?
  • Should there be an explicit test (presumably in test_tcp.py) for the change to tcp.py?

comment:25 Changed 15 years ago by Jonathan Lange

Oops. Forgot:

  • Please update the trial manpage with the additional reactor
  • Well documented branch, thanks :)

comment:26 in reply to:  24 Changed 15 years ago by therve

Replying to jml:

Add docstring for t.i.tcp.BaseClient.failIfNotConnected

Done.

80 cols max in twisted/test/test_epoll.py

Done.

80 cols max in twisted/internet/epollreactor.py

Done.

It's not clear why reads, writes etc are global and passed as default args. Please add a comment explaining why to epollreactor.py

It comes from pollreactor. I think it's mainly for performance purpose (bind local variables). I'll add a comment.

The huge timeouts in test_epoll.py make me a little suspicious.

It's miliseconds ? 1s doesn't seem too much.

test_controlAndWait looks pretty big, is that really a unit test? It looks like five.

Most of the work is to setup the sockets, but I guess it could be split in 2.

Is "sys.platform == 'linux2'" an adequate test? I have no idea.

No, it's not, because I think epoll is at least available on OS X. I added this test because it worked for me, but it needed a review by someone more experimented.

Do the tests pass using epollreactor?

Yes they do, once applied the patch on tcp.py.

Should there be an explicit test (presumably in test_tcp.py) for the change to tcp.py?

That'd be better I guess. I think other reactors should raise an error at the same place (in removeReader/removeWriter), but they don't seem to :).

Please update the trial manpage with the additional reactor

Done.

Well documented branch, thanks :)

I did my best :). I'll attach the patch when I'll get access to a working svn env.

comment:27 Changed 15 years ago by Glyph

Keywords: review removed

I'm taking the "review" keyword off until that patch is attached. Please re-apply the keyword when you attach the patch...

Changed 15 years ago by therve

Attachment: comment1_1953.diff added

comment:28 Changed 15 years ago by therve

Keywords: review added

Here's my patch. Still to do:

  • Find the good test in topfiles/setup.py to choose to build _epoll.
  • Test for failIfNotConnected if appropriate/necessary.

comment:29 Changed 15 years ago by Jean-Paul Calderone

Applied the patch to the branch.

comment:30 Changed 15 years ago by Ralph Meijer

Cc: Ralph Meijer added

So, if there is still a todo, is this branch up for review?

comment:31 Changed 15 years ago by therve

Keywords: review removed

I set the review tag hoping someone will find an answer to both questions. I guess I'll try to answer myself.

comment:32 Changed 15 years ago by Jean-Paul Calderone

Keywords: review added
Owner: set to Jean-Paul Calderone

I don't think the failIfNotConnected change needs explicit tests.

I changed setup.py to check for sys/epoll.h.

Ready for review, I think.

comment:33 Changed 15 years ago by Jean-Paul Calderone

Owner: Jean-Paul Calderone deleted

comment:34 Changed 15 years ago by PenguinOfDoom

Keywords: review removed
Owner: set to Jean-Paul Calderone

The "except IOError;" clause in doPoll should probably re-raise, just in case. setup.py still checks for linux2, and not sys/epoll.h Other than that, it looks fine.

comment:35 Changed 15 years ago by PenguinOfDoom

It should re-raise unless it's a EINTR, same as pollreactor.

comment:36 Changed 15 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Changes made, please re-review.

comment:37 Changed 15 years ago by Jean-Paul Calderone

Cc: PenguinOfDoom added

comment:38 Changed 15 years ago by PenguinOfDoom

Keywords: review removed
Owner: set to Jean-Paul Calderone

Looks fine! Fire the commit cannons.

comment:39 Changed 15 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [18461]) Merge epollreactor-1953

Author: therve, exarkun Reviewer: PenguinOfDoom Fixes #1953

Add a level-triggered epoll-based reactor.

Also re-order the removal and the shutdown of sockets from the reactor in the tcp code shared between reactors in order to avoid possibly passing a no-longer valid file descriptor to a control API such as epoll_ctl() or kevent().

comment:40 Changed 15 years ago by Jean-Paul Calderone

(In [18464]) Merge epollreactor-1953

Author: therve, exarkun Reviewer: PenguinOfDoom Fixes #1953

Add a level-triggered epoll-based reactor.

Also re-order the removal and the shutdown of sockets from the reactor in the tcp code shared between reactors in order to avoid possibly passing a no-longer valid file descriptor to a control API such as epoll_ctl() or kevent().

comment:41 Changed 15 years ago by Jean-Paul Calderone

(In [18465]) Revert r18464 - test suite regression

Refs #1953

comment:42 Changed 15 years ago by Jean-Paul Calderone

Resolution: fixed
Status: closedreopened

For those watching at home, the reasons this ticket should have failed review:

  • tests failed on OS X
  • tests failed on Win32
  • tests failed on Python 2.5

comment:43 Changed 15 years ago by jknight

I wonder if perhaps it'd be a good idea to have buildbot run tests for commits on all branches. How it is now, one has to press 12 build buttons to test a branch.

comment:44 Changed 15 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted
Status: reopenednew

Yes. That'd be a great buildbot feature.

Some changes committed to the branch:

  • Skip the tests if _epoll cannot be imported
  • Skip the tests if _epoll.epoll(n) fails with ENOSYS
  • Generate the .c using Pyrex 0.9.4.1 from the lxml svn repository, since it seems to have been updated to work with Python 2.5.

Please re-review. :(

comment:45 Changed 15 years ago by Stephen Thorne

Keywords: review removed
Owner: set to Jean-Paul Calderone

Following an interview with exarkun, exarkun is going to add some comments to the code based on some questions I asked, and i am running some tests ( http://twistedmatrix.com/buildbot/reactors/builds/1545 ).

If the tests pass, and the comments are committed, I'm prepared to accept this branch for merging. I'm removing the review flag pending either merging, or build failure.

comment:46 Changed 15 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [18475]) Merge epollreactor-1953

Author: therve, exarkun Reviewer: PenguinOfDoom, Jerub, jml Fixes #1953

Add a level-triggered epoll-based reactor.

Also re-order the removal and the shutdown of sockets from the reactor in the tcp code shared between reactors in order to avoid possibly passing a no-longer valid file descriptor to a control API such as epoll_ctl() or kevent().

comment:47 Changed 15 years ago by Jean-Paul Calderone

Keywords: review added
Resolution: fixed
Status: closedreopened

comment:48 Changed 15 years ago by Jean-Paul Calderone

Owner: Jean-Paul Calderone deleted
Status: reopenednew

comment:49 Changed 15 years ago by therve

It seems there was multiple attempts to commit. What's the last blocking problems ?

comment:50 Changed 15 years ago by Jean-Paul Calderone

The reason for the last revert was the code for skipping tests for the functionality on platforms which lack epoll was broken. It may be fixed in the branch now. Just waiting for more reviewer time.

comment:51 Changed 15 years ago by Ralph Meijer

Keywords: review removed
Owner: set to Jean-Paul Calderone

I visually and mechanically tested if the last change by exarkun makes sure the tests are skipped when _epoll is missing. Also made sure the epoll tests pass when _epoll is present.

Please merge.

comment:52 Changed 15 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [18507]) Merge epollreactor-1953-2

Author: exarkun, therve Reviewer: jml, jerub, penguinofdoom, ralphm Fixes #1953

Add a level-triggered epoll-based reactor.

Also re-order the removal and the shutdown of sockets from the reactor in the tcp code shared between reactors in order to avoid possibly passing a no-longer valid file descriptor to a control API such as epoll_ctl() or kevent().

comment:53 Changed 11 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.