[Twisted-Python] Integrating Twisted with ZeroMQ

exarkun at twistedmatrix.com
Mon Jun 7 00:54:14 EDT 2010


On 6 Jun, 07:59 pm, lvh at laurensvh.be wrote:
>Hey,
>
>
>For the Twisted folks: this thing has been reviewed by the ZeroMQ
>folks first because I wanted to be sure I got the technical details
>right on their side of things.
>
>I'd like to open up a discussion from a while back regarding the
>integration of ZeroMQ (a messaging system: similar to AMQP but with
>the intent to be simpler) into Twisted.
>
>The interested ZeroMQ people and the interested Twisted people (names
>withheld to protect the guilty) disagreed on what it should look like.
>I think that's mostly because neither party really understood what the
>other's software wanted to do. So, I'll try to give everyone a basic
>explanation without going too deep into either Twisted or ZeroMQ: my
>apologies if I spell out the basics of your thing too much and it gets
>boring :)
>
>ZeroMQ aims to be a thin layer above TCP, behaving like TCP but
>'better'. That sounds like a vague marketing statement, but it helps
>to understand some of the terminology if you keep that in the back of
>your head. (What exactly 'better' means is way beyond the current
>scope: basically, ZeroMQ wants to help socket programmers to stop
>reinventing the wheel by implementing common behavior such as pub/sub,
>request/reply...). Essentially AMQP but much simpler, and brokerless
>in most cases. This email is already going to go way over the sane
>character count; thankfully, the ZeroMQ webpage does a great job at
>explaining stuff :-)
>
>I think this highlights the main problem people had. There is a partial
>overlap between Twisted and ZeroMQ. The ZeroMQ implementation does
>things Twisted does too: it implements a bunch of low level networking
>stuff using e.g. epoll. It deals with real sockets, and Twisted wants to
>do that as well.
>
>ZeroMQ uses things called Sockets. They're similar but not the same
>thing as TCP sockets (instead delegating work to TCP eventually), so
>you can't use traditional methods like select or epoll with them,
>because, for example, they don't have file descriptors. Some
>underlying thing probably does have fds; but ZeroMQ worries about that
>for you under the hood, just like Twisted does for other TCP traffic.
>
>There are a couple of options for making ZeroMQ work with Twisted:
>
>1) implement everything in Python, using Twisted's TCP stuff. I think
>this is mostly a bad idea and the ZeroMQ people seem to agree: _lots_
>of work, ZeroMQ libs are stupidly fast already, Python not being the
>best tool for binary protocols...
>2) write a thin wrapper around the C(++) libs: great, as long as it
>never has to go into the Twisted trunk
>3) use pyzmq's thin wrapper around the C(++) libs: sounds like the
>best idea to me, again with reservations wrt the Twisted trunk
>
>Originally there was a fourth idea, which considered libzmq as a new
>mechanism: like epoll, so you'd have a ZMQ-specific reactor. A bunch
>of people didn't like this, and I can somewhat see the point: hard to
>integrate with other event loops like GUIs, for example.
>
>pyzmq offers something called select, which works just like select
>except it works on both file descriptors and ZeroMQ Sockets. It just
>delegates all of the work to libzmq. We could use
>ThreadedSelectReactor and have it use ZMQ's select. I'm not sure if it
>should use "normal" select everywhere else: because zmq's select is in
>fact much better than select.select (it just behaves like
>select.select in the sense that you give it three sets of fds and an
>optional timeout; under the hood it's actually epoll or kqueue or
>whatever) and it can handle plain old file descriptors just fine. So,
>you'd have a TSR with either 1 zmq.select running on everything or 1
>zmq.select running over Sockets and 1 select.select running over your
>classic fds. Personally I kind of like the idea of zmq's select taking
>over, but I don't know how well that works in practice.

A shortcoming of this approach is that much of the inefficiency of 
select(2) comes from its API.  If you have a select(2)-compatible API 
that's implemented in terms of epoll, you're still wasting a ton of 
effort that you could be skipping if you were using an epoll-compatible 
API instead.

But this is only an argument about performance, and likely no one is 
going to care about the poor performance of zmq.select anyway.
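
To make the API cost concrete, here is a rough sketch using only the 
Python standard library.  It says nothing about how zmq.select is 
actually implemented; the function names are made up for illustration. 
With a select-style call the whole interest set is handed over on every 
call, while a register-once interface (here the selectors module, which 
picks epoll or kqueue where available) takes the set up front and each 
wait only carries a timeout.

```python
import select
import selectors
import socket


def wait_select_style(socks, timeout):
    # select()-style: the full interest set is passed in (and scanned)
    # on every call, even when it has not changed since the last one.
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


def make_registered_selector(socks):
    # Register-once style: the interest set is handed over a single time.
    sel = selectors.DefaultSelector()
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
    return sel


def wait_registered_style(sel, timeout):
    # Each wait only names a timeout; the set lives inside the selector.
    return [key.fileobj for key, _ in sel.select(timeout)]


def demo():
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    sel = make_registered_selector([a, c])
    try:
        b.send(b"x")  # makes `a` readable; `c` stays idle
        first = wait_select_style([a, c], 1.0)
        second = wait_registered_style(sel, 1.0)
        return first == [a], second == [a]
    finally:
        sel.close()
        for s in (a, b, c, d):
            s.close()
```

Both styles report the same ready socket; the difference is only in how 
much has to be passed across the API boundary on each wait.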
>
>A potential option for Twisted, which some people don't quite like,
>would be to have a listenZMQ and connectZMQ, analogous to
>listenTCP/listenUDP/listenSSL and the respective connect*s. I think
>this makes more sense to the ZeroMQ people (who think of ZeroMQ as a
>layer "next to" TCP which happens to be implemented on top of TCP, on
>top of which you build your stuff) than the Twisted people (who think
>of ZeroMQ's protocol as yet another TCP-using protocol just like HTTP
>for example). Having worked with both pieces of software, the more I
>play with ZeroMQ the more I think listenZMQ/connectZMQ make sense.
>ZeroMQ really tries to be one of those things and it shows. What
>ZeroMQ wants to do is semantically much closer to the existing
>connects and listens. I'm not just making this up: the ZeroMQ people
>have reviewed this and this is really what ZeroMQ wants to be.

A shortcoming of this approach is that as a reactor method, you have to 
implement it for each reactor you want to support.  You covered this a 
bit earlier in your email, where you talked about GUI integration.  Do 
you want to maintain an implementation of {listen,connect}ZMQ for 
select(/whatever), Glib2, Gtk2, wxWidgets, Qt, and Windows?  That's a 
lot more work than just maintaining one implementation.
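
The "one implementation" alternative is to write the transport once 
against the small interface those reactors already share (addReader and 
friends) instead of adding a method to each reactor.  A toy sketch of 
that idea follows; the reactor class is a stand-in and not Twisted's, 
and every name in it is hypothetical.

```python
class FakeFDReactor:
    """Stand-in for any reactor that offers addReader/removeReader."""

    def __init__(self, name):
        self.name = name
        self.readers = []

    def addReader(self, reader):
        if reader not in self.readers:
            self.readers.append(reader)

    def removeReader(self, reader):
        if reader in self.readers:
            self.readers.remove(reader)


class OneTransport:
    """Written once; it does not care which reactor it is attached to."""

    def __init__(self, reactor, fd):
        self._reactor = reactor
        self._fd = fd
        # Register only after _fd is set, since a reactor may call fileno().
        reactor.addReader(self)

    def fileno(self):
        return self._fd

    def stopReading(self):
        self._reactor.removeReader(self)


def attach_everywhere(fd=7):
    # The same transport class is reused unchanged for each "reactor".
    reactors = [FakeFDReactor(n) for n in ("select", "glib2", "gtk2")]
    for r in reactors:
        OneTransport(r, fd)
    return [(r.name, r.readers[0].fileno()) for r in reactors]
```

Nothing reactor-specific had to be written to support the three 
stand-ins, which is the point of keeping the code out of the reactors.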
>
>Another argument for making ZMQ special is that TCP is just one of the
>things ZeroMQ works with. It also supports UNIX domain pipes, PGM
>reliable multicast, UDP PGM encapsulation, and even inter-thread
>communication.

You got this one backwards.  This is an argument for not implementing 
ZMQ at the same level as TCP and UNIX sockets.  This is an argument for 
implementing it *on top of* those things.  Of course, the main benefit 
of implementing it on top of them is that you don't have to write a 
bunch of code to support each transport.  And the ZMQ people did that 
already.

Here's how it should work (modulo stupid factoring issues that aren't 
really related to ZMQ issues), given that there's a big C library that 
already implements a bunch of stuff that you don't want to re-implement:

    from twisted.internet.interfaces import IReactorFDSet

    class ZMQTransport(object):
        def __init__(self, reactor, zmqSocket, protocol):
            self._zmqSocket = zmqSocket
            self._protocol = protocol
            self._transportPieces = []
            # On the next line, I use a method which I made up.  Maybe it
            # corresponds to some actual API ZMQ provides, maybe not, I
            # dunno.
            for fd in zmqSocket.allFileDescriptors():
                # Each piece gets the protocol so doRead can deliver to it.
                desc = _ZMQFileDescriptor(reactor, fd, zmqSocket, protocol)
                self._transportPieces.append(desc)

            self._protocol.makeConnection(self)


    class _ZMQFileDescriptor(object):
        def __init__(self, reactor, fd, zmqSocket, protocol):
            if not IReactorFDSet.providedBy(reactor):
                raise RuntimeError(
                    "This is the IReactorFDSet implementation; "
                    "use another reactor or another zmq transport.")

            self._reactor = reactor
            self._fd = fd
            self._zmqSocket = zmqSocket
            self._protocol = protocol
            # Register last: the reactor may call fileno() right away.
            self._reactor.addReader(self)

        def doRead(self):
            # Another made up method
            zmqEvents = self._zmqSocket.nonBlockingReadFrom(self._fd)
            if zmqEvents:
                self._protocol.zmqEventsReceived(zmqEvents)

        def doWrite(self):
            # One more, for luck.
            finished = self._zmqSocket.nonBlockingWriteTo(self._fd)
            if finished:
                self._reactor.removeWriter(self)

        def fileno(self):
            return self._fd

        def sendZMQEvents(self, events):
            # Whatever the API is.
            self._zmqSocket.sendZMQEvents(events)
            self._reactor.addWriter(self)


    class ZMQProtocol(object):
        def makeConnection(self, zmqTransport):
            self.zmqTransport = zmqTransport

        def zmqEventsReceived(self, zmqEvents):
            pass


    def connectZMQ(reactor, addrinfo, factory):
        # Blah blah blah - somehow get to the point where you have a
        # ZMQ Socket.
        d = ...
        def cbConnectionSetup(socket):
            ZMQTransport(
                reactor, socket, factory.buildProtocol(addrinfo))
        d.addCallback(cbConnectionSetup)

    def main():
        from twisted.internet import reactor
        from twisted.internet.protocol import ClientFactory
        f = ClientFactory()
        f.protocol = ZMQProtocol
        connectZMQ(reactor, ('example.com', 1234), f)
        reactor.run()


Okay, so that came out a little longer than I planned, but turnabout is 
fair play.  Anyway, this is a bog standard transport implementation. 
The only thing even remotely interesting is that it maps multiple file 
descriptors onto a single transport.  And that seems to be the
So, if the ZMQ library offers APIs like the ones used in this example, 
then you're all set.  With just a little more code, you can have an 
overlapped I/O version of this transport (for the one Twisted reactor 
that doesn't support IReactorFDSet).  And then you've got proper Twisted 
ZMQ support.
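
For the "multiple file descriptors onto a single transport" part, here 
is a small runnable sketch that uses only the standard library, with 
plain socket pairs standing in for whatever descriptors a ZMQ socket 
might own.  The class and function names are invented for the example, 
and the toy pump function stands in for a reactor.

```python
import selectors
import socket


class RecordingProtocol:
    def __init__(self):
        self.received = []
        self.transport = None

    def makeConnection(self, transport):
        self.transport = transport

    def dataReceived(self, data):
        self.received.append(data)


class MultiFDTransport:
    """One transport object fed by several underlying sockets."""

    def __init__(self, selector, socks, protocol):
        self._protocol = protocol
        for sock in socks:
            sock.setblocking(False)
            # Every descriptor points back at the same transport.
            selector.register(sock, selectors.EVENT_READ, self)
        protocol.makeConnection(self)

    def doRead(self, sock):
        data = sock.recv(4096)
        if data:
            self._protocol.dataReceived(data)


def pump(selector, timeout=0.5):
    # One turn of a toy event loop: hand each ready descriptor to the
    # transport stored as its registration data.
    for key, _ in selector.select(timeout):
        key.data.doRead(key.fileobj)


def demo():
    sel = selectors.DefaultSelector()
    left, left_peer = socket.socketpair()
    right, right_peer = socket.socketpair()
    proto = RecordingProtocol()
    try:
        MultiFDTransport(sel, [left, right], proto)
        left_peer.send(b"from-left")
        right_peer.send(b"from-right")
        for _ in range(5):
            if len(proto.received) >= 2:
                break
            pump(sel)
        return sorted(proto.received)
    finally:
        sel.close()
        for s in (left, left_peer, right, right_peer):
            s.close()
```

The protocol sees a single transport and a single stream of callbacks, 
regardless of which descriptor the data arrived on.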

If it *doesn't* offer APIs like these, then I'd say it's missing some 
pretty critical APIs.  After all, if you can't drive it this way, your 
chances of being able to write reasonable unit tests for ZMQ-based code 
are somewhat diminished (not out the window, but it'll be annoying).
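
As an illustration of the kind of test this makes possible, here is a 
minimal sketch that drives a protocol directly, with no reactor and no 
network.  It reuses the made-up method names from the example above 
(zmqEventsReceived, sendZMQEvents); the fake transport and the echo 
protocol are hypothetical.

```python
class FakeZMQTransport:
    """Hypothetical stand-in that records what the protocol sends."""

    def __init__(self):
        self.sent = []

    def sendZMQEvents(self, events):
        self.sent.extend(events)


class EchoProtocol:
    """Toy protocol: sends every received event back upper-cased."""

    def makeConnection(self, zmqTransport):
        self.zmqTransport = zmqTransport

    def zmqEventsReceived(self, zmqEvents):
        self.zmqTransport.sendZMQEvents([e.upper() for e in zmqEvents])


def run_test():
    transport = FakeZMQTransport()
    protocol = EchoProtocol()
    protocol.makeConnection(transport)
    # Deliver events exactly as a real transport's doRead would.
    protocol.zmqEventsReceived([b"ping", b"pong"])
    return transport.sent
```

The test feeds events in by hand and inspects what came out, which only 
works if the library can be driven without owning the event loop.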

And I don't understand how you would implement something like ZMQ in a 
way that *didn't* make it easy to do this.  *Particularly* since they 
have support for several different event notification APIs.  So 
hopefully the worst case is that there are no APIs like these, but it's 
a minor oversight because the authors thought no one would want them, 
but they can be added trivially because they map directly onto how the 
underlying implementation works.
>
>I know some Twisted people way smarter than me basically thought the
>connectZMQ/listenZMQ thing was a mistake, but I'm not sure to what
>extent that is because they were right and to what extent that was
>because they didn't really know very much about ZeroMQ and just went
>"it works on top of TCP so that's not where it goes". To Twisted folks
>that disagree: would you change your opinion if ZMQ was *really*
>something that's side-by-side with TCP instead of being implemented on
>top of it? Like, say, SCTP is? Does the fact that it can work on top
>of a bunch of stuff that isn't TCP change that?

If ZMQ were supported in the kernel with new syscalls to interface with 
it, then it would be nonsensical to talk about implementing it on top of 
Twisted's existing TCP support.  You simply couldn't, because all of the 
code would have been pushed into the kernel where it can't be used any 
other way.  This doesn't mean it would be a good idea overall to have 
ZMQ supported at the same level as TCP, though: it just means there 
would be no other alternative (aside from not supporting it - like what 
Twisted does for SCTP).

Whether or not it makes any sense to implement ZMQ in the kernel is 
something I have no opinion on, since I don't know nearly enough about 
the particular details of ZMQ.
>
>Talking with the ZeroMQ people has been a positive experience: they
>were very accessible and cooperative, and really just want a bigger
>market for their software (who doesn't?) so I hope something useful
>comes out of this :-)

Great!  Convince them to add the necessary APIs (if they don't exist 
already) from above and everything should be set. :)

Jean-Paul


