Ticket #4859 enhancement new
client endpoint: super-smart name-based TCP connection algorithm
| Reported by: | glyph | Owned by: | ashfall |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | core | Keywords: | endpoint ipv6 |
| Cc: | twistedmatrix.com@… | Branch: | branches/gai-endpoint-4859-4 |
| Author: | ashfall | Launchpad Bug: | |
Description (last modified by glyph)
The Goal
The user wants to enter the name of a particular host, and connect as quickly as possible. They may also want to enter a port number or service name.
The application developer wants to let the user do that, using a single, simple-to-construct endpoint that does all of the work involved.
The Problems
Name resolution and routing are not always sensibly connected. In particular, it is very common for networks to automatically configure their clients with local IPv6 addresses and to happily resolve remote IPv6 addresses, yet be misconfigured so that IPv6 is not routed past the border gateway. It isn't even that unusual for a network, or a particular host on it, to publish an internal IPv6 address that, for whatever reason, does not respond to IPv6 even locally.
Because IPv6 is not routed at all, you get no feedback that your connection attempt has failed other than the eventual timeout of your first SYN packet.
Of course, IPv6 isn't the only reason a network or nameserver may be misconfigured in this way. Unconnectable hosts happen all the time; this is just a particularly common problem that one hits when switching from a naive, IPv4-only approach to connecting to a more sophisticated multi-address-family approach. Even if you're doing IPv4 correctly, though, you'll hit it sometimes.
The Solution
We should follow the relevant specification and resolve all possible connectable addresses under the given host name / service name combination using getaddrinfo. (While we should not rule out a truly asynchronous version of getaddrinfo, this involves trying to parse a lot of platform-specific policy and it would be best to keep that work separate.)
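A minimal sketch of the resolution step, using Python's standard library rather than Twisted so that it runs on its own. It asks `socket.getaddrinfo` for every stream (TCP) address matching the host/service pair and keeps them in the order returned, dropping exact duplicates. Because `getaddrinfo` blocks, the sketch runs it in a worker thread, which is the compromise the paragraph describes; in Twisted that role would typically be played by `twisted.internet.threads.deferToThread`. The names `Address`, `resolve_all` and `resolve_all_async` are hypothetical.

```python
import asyncio
import socket
from typing import List, NamedTuple, Union


class Address(NamedTuple):
    """One connectable address, as reported by getaddrinfo (hypothetical type)."""
    family: int      # socket.AF_INET or socket.AF_INET6
    sockaddr: tuple  # (host, port) for IPv4; (host, port, flowinfo, scope_id) for IPv6


def resolve_all(host: str, service: Union[str, int]) -> List[Address]:
    """Return every TCP address for host/service, in getaddrinfo's preferred order."""
    addresses: List[Address] = []
    for family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        host, service, type=socket.SOCK_STREAM
    ):
        candidate = Address(family, sockaddr)
        if candidate not in addresses:  # skip duplicates, keep first-seen order
            addresses.append(candidate)
    return addresses


async def resolve_all_async(host: str, service: Union[str, int]) -> List[Address]:
    """Run the blocking lookup in the default thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, resolve_all, host, service)


# A numeric host avoids DNS traffic; a real caller would pass a name such as "example.com".
print(asyncio.run(resolve_all_async("127.0.0.1", 8080)))
```

A real host name would usually yield several entries, often IPv6 before IPv4, and that order is what the connection step should respect.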
Then, as that specification suggests, we should attempt to connect to them in the order in which they are returned, since that is the preferred order. However, because some addresses may not respond promptly enough, we should initiate several attempts in parallel.
If everything is working properly, the first attempt will complete quickly and we won't even make the second one. If there is a little lag, the first attempt should still have an advantage over the second: it was initiated earlier and lag should affect both equally, so it will complete first and we will cancel the second one.
In the case that one or more of the addresses is going to time out for some reason, the user won't have to wait for every one to time out in turn; they'll be timing out in parallel.
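A back-of-the-envelope comparison makes the benefit concrete. The sketch assumes the first `dead` addresses silently drop packets until `timeout` expires, that the next address answers after `rtt` seconds, and that staggered attempts start `delay` seconds apart. The 30 s timeout and 0.3 s delay below are illustrative values, not figures from this ticket, and `time_to_connect` is a hypothetical helper.

```python
from typing import Optional


def time_to_connect(dead: int, timeout: float, rtt: float,
                    delay: Optional[float] = None) -> float:
    """Seconds until a connection is established.

    delay=None models strictly sequential attempts: each dead address must
    time out before the next one is tried. Otherwise attempts are staggered
    `delay` seconds apart (assumes delay < timeout, and that dead addresses
    give no early failure that would start the next attempt sooner).
    """
    if delay is None:
        return dead * timeout + rtt
    return dead * delay + rtt


sequential = time_to_connect(dead=2, timeout=30.0, rtt=0.05)
staggered = time_to_connect(dead=2, timeout=30.0, rtt=0.05, delay=0.3)
print(f"sequential: {sequential:.2f} s, staggered: {staggered:.2f} s")
# sequential: 60.05 s, staggered: 0.65 s
```

Under the same model, when every one of `n` addresses is dead the staggered approach reports failure after roughly `(n - 1) * delay + timeout` rather than `n * timeout`.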
In order to conserve resources, and to avoid bugs where user code gets invoked twice, once one connection attempt has succeeded we should cancel all the outstanding ones.
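The strategy in the preceding paragraphs (start attempts in preferred order, overlap them, keep the first success, cancel the rest) can be sketched with Python's asyncio standing in for Twisted's reactor and Deferreds. This is an illustration under stated assumptions, not the ticket's implementation: each endpoint is modelled as a zero-argument coroutine function, `connect_staggered` and `fake_attempt` are hypothetical names, the `delay` between starts is an assumed tuning knob the ticket does not specify, and starting the next attempt early when a running one fails is a choice made here.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def connect_staggered(
    attempts: Sequence[Callable[[], Awaitable[T]]], delay: float
) -> T:
    """Return the result of the first connection attempt to succeed.

    Attempts start in the given (preferred) order. The next one starts after
    `delay` seconds or as soon as a running attempt fails, whichever is first.
    Once one succeeds, every attempt still running is cancelled.
    """
    if not attempts:
        raise ValueError("no addresses to connect to")
    pending = set()
    errors: List[BaseException] = []
    next_index = 0
    try:
        while True:
            if next_index < len(attempts):
                pending.add(asyncio.ensure_future(attempts[next_index]()))
                next_index += 1
            elif not pending:
                raise ConnectionError(f"all {len(attempts)} attempts failed: {errors!r}")
            # Wait for a result, but only `delay` seconds if another attempt is queued.
            timeout = delay if next_index < len(attempts) else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.cancelled():
                    continue
                if task.exception() is None:
                    return task.result()
                errors.append(task.exception())
    finally:
        # Cancel the losers so user code runs for only one connection.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


async def demo() -> tuple:
    started: List[str] = []
    cancelled: List[str] = []

    def fake_attempt(name: str, seconds: float, succeed: bool):
        # Hypothetical stand-in for an endpoint's connect(): sleep, then succeed or fail.
        async def run() -> str:
            started.append(name)
            try:
                await asyncio.sleep(seconds)
            except asyncio.CancelledError:
                cancelled.append(name)
                raise
            if not succeed:
                raise ConnectionRefusedError(name)
            return name
        return run

    # The IPv6 address black-holes packets; the IPv4 address answers in 10 ms.
    winner = await connect_staggered(
        [fake_attempt("v6", 10.0, False), fake_attempt("v4", 0.01, True)], delay=0.05
    )
    return winner, started, cancelled


print(asyncio.run(demo()))  # ('v4', ['v6', 'v4'], ['v6'])
```

In Twisted, the counterpart of `task.cancel()` here would be calling `cancel()` on each outstanding connection `Deferred`.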
It would be useful to represent this internally as one unit which converts the hostname/service pair into a list of endpoints, and a separate unit which implements connecting in parallel to a list of endpoints. It may be useful in the future to expand the name-resolution portion of this to generate endpoints which do something custom (for example: resolve "hostnames" by looking at an OpenSSH-format ssh_config file with Host lines in it, then repeating the process recursively to resolve the real underlying hostnames and using Conch to actually connect).
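The split into two units can be sketched as plain functions: a resolver maps a name and port to an ordered list of endpoints, and the connecting unit only ever sees that list. Below, a hypothetical `alias_resolver` layers an ssh_config-style alias table over any underlying resolver, following aliases recursively. It is greatly simplified (exact names only, no patterns, ports or other ssh_config options). `fake_dns` and its addresses, taken from the documentation ranges, are made up for the example, and an endpoint is represented only by the `(address, port)` pair it would dial.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for an endpoint: the (address, port) it would connect to.
Endpoint = Tuple[str, int]
# The name-resolution unit: (hostname, port) -> endpoints in preferred order.
Resolver = Callable[[str, int], List[Endpoint]]


def alias_resolver(aliases: Dict[str, str], underlying: Resolver) -> Resolver:
    """Expand aliases (like ssh_config Host -> HostName), then defer to `underlying`."""

    def resolve(name: str, port: int, seen: FrozenSet[str] = frozenset()) -> List[Endpoint]:
        if name in seen:
            raise ValueError(f"alias loop involving {name!r}")
        if name in aliases:
            # Recurse: the target may itself be another alias.
            return resolve(aliases[name], port, seen | {name})
        return underlying(name, port)

    return resolve


def fake_dns(name: str, port: int) -> List[Endpoint]:
    """Made-up lookup table standing in for getaddrinfo."""
    table = {"build.example.com": ["2001:db8::10", "192.0.2.10"]}
    return [(address, port) for address in table.get(name, [])]


resolve = alias_resolver({"ci": "build", "build": "build.example.com"}, fake_dns)
print(resolve("ci", 22))  # [('2001:db8::10', 22), ('192.0.2.10', 22)]
```

Because the result is just a list in preferred order, it can be handed straight to the parallel-connection unit, which never needs to know whether the names came from DNS or from a configuration file.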
