Opened 5 years ago

Last modified 20 months ago

#4387 enhancement new

API for distributing `ITransport` and `IListeningPort` implementations to different processes

Reported by: glyph Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: thomas@…, marcandre.lureau@…, bra@…, manishtomar.public@… Branch:
Author: Launchpad Bug:

Description

One design pattern for high-performance servers is to have a master process which accepts incoming connections, but then dispatches those connections to sub-processes or peer processes which handle the actual content of the connection, either immediately or after some initial handshake.

While it's always possible to simply forward traffic, it is of course more efficient for the master accepting process to just get out of the way completely and let the application process handle all system calls and buffering itself.

On UNIX, you can use the sendmsg and recvmsg functions along with the SCM_RIGHTS ancillary data type to transmit file descriptors between processes via an AF_UNIX socket.
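Here is a minimal sketch of the POSIX mechanism described above. It uses the sendmsg/recvmsg bindings in Python 3's socket module (added in 3.3), which are about as literal-minded as the wrappers discussed in this ticket. The helper names send_descriptors and recv_descriptors are hypothetical. The descriptors are packed as a C int array into a single SCM_RIGHTS control message.

```python
import array
import os
import socket


def send_descriptors(sock, data, fds):
    """Send `data` plus `fds` as one SCM_RIGHTS ancillary message (hypothetical helper)."""
    # unix(7): at least one byte of real data should accompany ancillary data;
    # on Linux UNIX-domain stream sockets this is required.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([data], ancillary)


def recv_descriptors(sock, bufsize, maxfds):
    """Receive up to `bufsize` bytes and up to `maxfds` descriptors (hypothetical helper)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(maxfds * fds.itemsize))
    for level, kind, payload in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore a trailing partial int, which can appear if the control data was truncated.
            usable = len(payload) - (len(payload) % fds.itemsize)
            fds.frombytes(payload[:usable])
    return data, list(fds)


if __name__ == "__main__":
    parent, child = socket.socketpair()
    read_end, write_end = os.pipe()
    send_descriptors(parent, b"x", [read_end])
    _data, (received,) = recv_descriptors(child, 1, 1)
    os.write(write_end, b"hello")
    # `received` is a new descriptor number referring to the same pipe.
    print(os.read(received, 5))
```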

On Windows, I believe the WSADuplicateSocket function serves the same purpose, albeit in a more restricted form (only sockets may be transferred, not arbitrary handles).

You can already implement bits and pieces of this using Twisted, by inspecting internal state in the reactor, but we should have a high-level, portable interface for establishing the channel to send sockets to a different process.

Fluendo has also generously voiced some interest in contributing their implementation of this idiom to Twisted.

I have also recently done some work on Darwin Calendar & Contacts Server to implement this feature. I can use that code as a basis for the API in Twisted as well.

While I tend to prefer my own implementation for its minimal use of C (Fluendo implements FD-sending at the C level, whereas I have only a very literal-minded sendmsg wrapper, with FD-sending implemented in Python), it isn't quite general enough to use as-is. At least, the high-level bits aren't: they rely on application-level (protocol-level) hooks to report status back to the parent process, whereas that should be the responsibility of the transport. However, as I said, the lowest-level bits are a fairly straightforward binding to the sendmsg and recvmsg C library functions; essentially a marginally improved rewrite of the sendmsg code which has been floating around in various people's sandboxes for quite a while.

While I mention Windows here to provide a reference point for the portability restrictions of a high-level API, we can definitely defer that work to a separate ticket.

Also, given that you can inherit the connection in several different ways (via file descriptor inheritance, via connect() to a numeric UNIX socket, via connect() to a named UNIX socket, or via a process identifier on Windows), we may need to provide some platform-specific bootstrapping APIs which result in a common interface once invoked.
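To make the "platform-specific bootstrapping, common interface" idea concrete, here is a sketch covering two of the POSIX paths: an inherited descriptor and a named UNIX socket. It assumes Python 3.9+ (for socket.send_fds and socket.recv_fds), and all function names are hypothetical. Both paths end in an ordinary socket.socket. Code after the bootstrap step therefore doesn't need to know how the descriptor arrived.

```python
import os
import socket
import tempfile
import threading


def adopt_inherited(fd):
    """Bootstrap path 1 (hypothetical): the fd was inherited, e.g. across fork/exec."""
    # With fileno=..., Python 3.7+ detects family, type and proto from the fd itself.
    return socket.socket(fileno=fd)


def adopt_from_unix_path(path):
    """Bootstrap path 2 (hypothetical): connect to a named UNIX socket, receive one fd."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as channel:
        channel.connect(path)
        _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if not fds:
        raise RuntimeError("peer closed the channel without sending a descriptor")
    return socket.socket(fileno=fds[0])


def offer_fd_once(listener, fd):
    """Parent side of path 2 (hypothetical): hand one fd to the next peer that connects."""
    conn, _ = listener.accept()
    with conn:
        # One byte of ordinary data carries the SCM_RIGHTS message.
        socket.send_fds(conn, [b"!"], [fd])


if __name__ == "__main__":
    a, b = socket.socketpair()

    # Path 1: simulate inheritance by duplicating the descriptor.
    s1 = adopt_inherited(os.dup(b.fileno()))

    # Path 2: publish b on a named UNIX socket and fetch it from there.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "handoff.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            t = threading.Thread(target=offer_fd_once, args=(listener, b.fileno()))
            t.start()
            s2 = adopt_from_unix_path(path)
            t.join()

    # Both paths yield a plain socket.socket wrapping the same connection.
    a.sendall(b"ping")
    print(s1.recv(4), s2.family)
```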

Change History (21)

comment:1 Changed 5 years ago by exarkun

This seems to be conflating two separate features:

  1. Passing an existing socket (or file descriptor) to another process; or receiving such a thing.
  2. Putting a socket (or file descriptor) into a reactor to be monitored for read and write events.

There should be no reason to require that these both be implemented at the same time, although clearly they are useful in conjunction with each other.
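The two features compose, but neither depends on the other. In the sketch below (Python 3.9+, hypothetical helper names), the stdlib selectors module stands in for a reactor; it is not Twisted's API. The receiving function knows nothing about event loops. The watching function doesn't care where the descriptor came from.

```python
import os
import selectors
import socket


def receive_descriptor(channel):
    """Feature 1 only: take one descriptor off a UNIX socket (no event loop involved)."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if not fds:
        raise RuntimeError("no descriptor received")
    return fds[0]


def watch_descriptor(selector, fd, on_readable):
    """Feature 2 only: monitor fd for readability (the fd's origin is irrelevant)."""
    # EVENT_READ | EVENT_WRITE would also report writability.
    selector.register(fd, selectors.EVENT_READ, on_readable)


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    read_end, write_end = os.pipe()
    socket.send_fds(sender, [b"!"], [read_end])

    fd = receive_descriptor(receiver)
    selector = selectors.DefaultSelector()
    chunks = []
    watch_descriptor(selector, fd, lambda: chunks.append(os.read(fd, 100)))

    os.write(write_end, b"data")
    for key, _events in selector.select(timeout=1):
        key.data()  # run the callback registered for this descriptor
    print(chunks)
```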

comment:2 Changed 5 years ago by glyph

Replying to exarkun:

This seems to be conflating two separate features:

There are several layers of features here. First, obviously, we need bindings for the low-level functionality that would enable us to implement the other bits, but I am most interested in the highest-level API; the one where I take an object supporting some interface, send it, and then receive it on the other end as the same kind of object. In other words, I want to be able to take a twisted.internet.tcp.Server, serialize it, and get a twisted.internet.tcp.Server out on the other end, properly disposing of the Server in the parent process along the way, at the correct time. In addition to transferring the socket, this requires some coordinating protocol between client and server.
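The "dispose at the correct time" part can be shown independently of Twisted's classes. This sketch assumes Python 3.9+. The names hand_off and adopt, and the one-byte acknowledgement, are hypothetical. In this coordinating protocol, the parent closes its copy of the connection only after the receiver confirms it has adopted the descriptor. A thread plays the other process so the example is self-contained.

```python
import socket
import threading

ACK = b"+"  # hypothetical one-byte "adopted" acknowledgement


def hand_off(channel, conn):
    """Parent side: pass conn's descriptor, then dispose of our copy once the peer confirms."""
    # One byte of ordinary data accompanies the SCM_RIGHTS message.
    socket.send_fds(channel, [b"T"], [conn.fileno()])
    if channel.recv(1) != ACK:
        # Peer went away (b"") or answered something else: we still own conn.
        return False
    conn.close()  # The peer now holds its own descriptor for the same connection.
    return True


def adopt(channel):
    """Child side: receive the descriptor, wrap it, then acknowledge."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    conn = socket.socket(fileno=fds[0])
    channel.sendall(ACK)
    return conn


if __name__ == "__main__":
    to_child, to_parent = socket.socketpair()
    client, accepted = socket.socketpair()  # stands in for an accepted TCP connection
    adopted = {}
    worker = threading.Thread(target=lambda: adopted.update(conn=adopt(to_parent)))
    worker.start()
    print("handed off:", hand_off(to_child, accepted))
    worker.join()
    client.sendall(b"hello")
    print(adopted["conn"].recv(5))
```

Once sendmsg returns, closing the parent's copy would already be safe at the kernel level, because a descriptor in flight holds its own reference. Waiting for the acknowledgement lets the parent keep serving the connection if the hand-off fails.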

So, as I see it, there is:

  1. Low-level bindings for passing an existing socket (on POSIX). We need this regardless of whatever else happens, so I made a separate ticket for it.
  2. Low-level bindings for passing an existing socket (on Windows). We clearly need these either way, so I made a separate ticket for that too. PyWin32 already wraps DuplicateHandle, but you can't use that to wrap sockets. To quote the documentation: "You should not use DuplicateHandle to duplicate handles to (...) sockets. No error is returned, but the duplicate handle may not be recognized by Winsock at the target process. Also, using DuplicateHandle interferes with internal reference counting on the underlying object. To duplicate a socket handle, use the WSADuplicateSocket function."
  3. Have a way of listening in the reactor for a received socket. On the POSIX side, we need something that calls recvmsg to get ancillary data in response to doRead; on the Windows side, it seems like the thing to do would just be a regular Protocol which would receive a message and then call WSASocket with some of the data that it just got.
  4. Have a way of associating that received socket with some metadata to describe the socket that got transmitted.
    1. What address family and socket type is it? We could implement this either by extending our C module to allow us to call getsockname without having to call socket.fromfd first (and therefore already know the address family), or by sending the data along another channel. AF_INET/SOCK_STREAM will be by far the most popular, I'm sure, but one day it would be nice to support AF_INET6 too.
    2. What application-level protocol should it be speaking? As the simplest example (the only one which my existing code actually implements): if you have a listening port speaking TLS and a port speaking plaintext, but both are sending connections via the same UNIX-domain socket, each one needs to be told what it is expected to speak.
    3. Is there any extra data that we want to re-enqueue? To allow the super-process to have a bootstrapping conversation, it may have already recv()'d some of the data intended for the subprocess, so we need to put that back so that the subprocess can handle it.
  5. As for "Putting a socket (or file descriptor) into a reactor to be monitored for read and write events.", it seems like that would just be startReading and startWriting, but I guess you meant something more like "an actual, public API for instantiating things like twisted.internet.tcp.Server from a file descriptor", since this currently relies on a private instance attribute of tcp.Port and is almost totally undocumented.
  6. Finally, of course, the high-level interface would need to have a separate, portable taxonomy of the objects required for sending and receiving connections, since WSADuplicateSocket and sendmsg are doing fairly different things and have fairly different requirements, under the covers.
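Items 4.1 through 4.3 can all travel in the same message as the descriptor. The sketch below assumes Python 3.9+ and a hypothetical envelope format: a struct header, then the protocol name, then any bytes the parent already read. It uses an AF_UNIX SOCK_DGRAM channel so that each envelope arrives as one message. For 4.1, socket.socket(fileno=...) (Python 3.7+) detects the family and type from the descriptor itself, so they need not be sent at all.

```python
import socket
import struct

# Hypothetical envelope layout: name length (2 bytes), leftover length (4 bytes),
# then the protocol name, then the already-read bytes.
_HEADER = struct.Struct("!HI")


def send_connection(channel, conn, protocol, leftover=b""):
    """Send conn's descriptor together with its protocol name and any pre-read bytes."""
    name = protocol.encode("ascii")
    envelope = _HEADER.pack(len(name), len(leftover)) + name + leftover
    socket.send_fds(channel, [envelope], [conn.fileno()])


def receive_connection(channel, maxsize=65536):
    """Return (socket, protocol name, leftover bytes) for one received connection."""
    envelope, fds, _flags, _addr = socket.recv_fds(channel, maxsize, 1)
    name_len, leftover_len = _HEADER.unpack_from(envelope)
    start = _HEADER.size
    protocol = envelope[start:start + name_len].decode("ascii")
    leftover = envelope[start + name_len:start + name_len + leftover_len]
    if not fds or len(leftover) != leftover_len:
        raise ValueError("truncated envelope or missing descriptor")
    # Item 4.1: family and type are detected from the fd (Python 3.7+),
    # so they are not part of the envelope.
    return socket.socket(fileno=fds[0]), protocol, leftover


if __name__ == "__main__":
    # SOCK_DGRAM keeps each envelope a single message on the channel.
    parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    client, accepted = socket.socketpair()  # stands in for an accepted connection
    send_connection(parent_end, accepted, "tls", leftover=b"\x16\x03\x01")
    conn, protocol, leftover = receive_connection(child_end)
    print(conn.family, conn.type, protocol, leftover)
```

The receiver cannot push `leftover` back into the kernel's buffer. Whatever handles the connection therefore has to consume those bytes before its first recv(). A TCP connection handed over this way would report AF_INET.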

The reason we might want to implement some of these features at the same time is to ensure that the medium-level hooks we implement are sufficient to support the highest-level interface.

comment:3 Changed 5 years ago by exarkun

I am most interested in the highest-level API; the one where I take an object supporting some interface, send it, and then receive it on the other end as the same kind of object.

That is definitely a cool use case which should be supported. Let's just make sure not to support it to the detriment of other use cases, like handing off a connection to a non-Twisted-based process, or accepting a file descriptor from a non-Twisted-based process.

comment:4 Changed 5 years ago by thomasvs

On the Flumotion side, I created https://code.fluendo.com/flumotion/trac/ticket/1406 to track this.

comment:5 Changed 5 years ago by thomasvs

  • Cc thomas@… added

comment:6 Changed 5 years ago by mlureau

  • Cc marcandre.lureau@… added

comment:7 Changed 5 years ago by bra

  • Cc bra@… added

This one is badly needed; I hope it won't sink and will turn into actual code.

comment:8 follow-up: Changed 4 years ago by alessandrod

Note that although it's far from being perfect, there's partial support for the low-level bits needed to transfer sockets in _multiprocessing.sendfd and _multiprocessing.recvfd.

comment:9 Changed 4 years ago by glyph

Huh. This raises the question: is it better to call this private function or to include even more C code in Twisted? I don't like either option :-\.

comment:10 Changed 4 years ago by alessandrod

Ideally, I would patch multiprocessing to make them public (optionally implementing something similar for Windows) and in the meantime use those functions even though they're private.

If someone really dislikes using those private functions, we could add a C extension to Twisted for now and then patch multiprocessing.

comment:11 follow-up: Changed 4 years ago by tantra

Do you have a patch that can be applied to the existing Twisted distribution?

comment:12 in reply to: ↑ 11 ; follow-up: Changed 4 years ago by glyph

Replying to tantra:

Do you have a patch that can be applied to the existing Twisted distribution?

Not yet, but I hope I'll have the time to do one soon.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by tantra

Replying to glyph:

Replying to tantra:

Do you have a patch that can be applied to the existing Twisted distribution?

Not yet, but I hope I'll have the time to do one soon.

And I have a follow-up question: in your model, where one process accepts all incoming connections, might it be easier to use haproxy (http://haproxy.1wt.eu/)?

comment:14 in reply to: ↑ 13 Changed 4 years ago by glyph

Replying to tantra:

Replying to glyph:

Replying to tantra:

Do you have a patch that can be applied to the existing Twisted distribution?

Not yet, but I hope I'll have the time to do one soon.

And I have a follow-up question: in your model, where one process accepts all incoming connections, might it be easier to use haproxy (http://haproxy.1wt.eu/)?

I don't really know much about haproxy, but I don't believe it has much to do with this ticket.

comment:15 Changed 4 years ago by tantra

Please forgive my obsession, but why not simply add support for fork? IMHO it's easier and more efficient. In this scheme there is no front-end process that accepts all connections; which process performs each accept is determined by OS internals (kqueue, epoll, etc.).
Yes, there are some issues in reactors, for example kqreactor (a kqueue is not inherited by the child after fork), but all of these things can be solved, for example by implementing a reactor function spawnWorker which does all the fork-based work. Perhaps I have misunderstood something and am wrong?

Thank you for your patience :-))

comment:16 Changed 4 years ago by exarkun

But why not simply add support for fork.

I think I answered that question fairly well in my recent post to the mailing list.

If you think some point made there is incorrect, feel free to explain why. If you think the answer is just that all the problems can be solved, then feel free to contribute a patch. :) I think it's more effort than it's worth, and offers no significant advantages over the APIs proposed to resolve this ticket, so I'm not going to work on it; I suspect Glyph and many others feel the same way.

comment:17 Changed 4 years ago by exarkun

In this scheme there is no front-end process that accepts all connections; which process performs each accept is determined by OS internals (kqueue, epoll, etc.)

Maybe I should respond to this as well. The APIs suggested to resolve this ticket do not require a special front-end process. Instead, the listening port can be shared amongst many processes and they can all accept connections from it directly. However, unlike a fork-based solution, an explicit fd passing API also allows a custom front end process which then distributes work to back end processes. Such a front end may be desirable, for example to ensure that all HTTP requests in a particular session are handled by the same process.
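Here is a sketch of the session-affine front end described above, assuming Python 3.9+. The names pick_worker and dispatch are hypothetical, as is sending the session ID as the message body. It uses zlib.crc32 rather than the built-in hash(), because str hashes are randomized per process and the worker choice must be stable. A real front end would typically have read part of the request to find the session, so it would also need to forward those bytes (item 4.3 in comment 2).

```python
import socket
import zlib


def pick_worker(session_id, worker_count):
    """Stable session-to-worker mapping; hash() would differ between processes."""
    return zlib.crc32(session_id.encode("utf-8")) % worker_count


def dispatch(conn, session_id, worker_channels):
    """Send conn's descriptor to the worker that owns session_id; return its index."""
    index = pick_worker(session_id, len(worker_channels))
    socket.send_fds(worker_channels[index], [session_id.encode("utf-8")], [conn.fileno()])
    # The descriptor in flight holds its own reference, so the front end can let go.
    conn.close()
    return index


if __name__ == "__main__":
    channels = [socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM) for _ in range(3)]
    front_ends = [front for front, _worker in channels]
    for _ in range(2):
        # Each iteration stands in for a new connection belonging to the same session.
        client, accepted = socket.socketpair()
        print("session-42 ->", dispatch(accepted, "session-42", front_ends))
```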

comment:18 Changed 4 years ago by <automation>

  • Owner glyph deleted

comment:19 in reply to: ↑ 8 Changed 3 years ago by exarkun

Replying to alessandrod:

Note that although it's far from being perfect, there's partial support for the low-level bits needed to transfer sockets in _multiprocessing.sendfd and _multiprocessing.recvfd.

http://bugs.python.org/issue11657 probably torpedoes this idea, at least for a while (because the fix will very likely not be backported to Python 2.5 and perhaps not even to Python 2.6).

comment:20 Changed 20 months ago by manishtomar

  • Cc manishtomar.public@… added

comment:21 Changed 20 months ago by exarkun

Just pointing out that the following exist now:

  1. IUNIXTransport.sendFileDescriptor, allowing file descriptors to be copied into new processes (on the same host, of course).
  2. twisted.protocols.amp.Descriptor, based on the above feature, allowing file descriptors to be passed as arguments in AMP commands and responses.
  3. IReactorSocket, allowing existing listening ports and stream-oriented connections, represented as file descriptors, to be added to the reactor.

Taken together, these seem to be the features this ticket is describing, or at least all of the features necessary to implement it.

So, I wonder if someone would like to outline what APIs are actually still missing and need to be added to consider this ticket complete.
