Opened 2 years ago

Closed 2 years ago

#5695 enhancement closed fixed (fixed)

Add IPv6 TCP Client Endpoint

Reported by: ashfall Owned by: ashfall
Priority: normal Milestone:
Component: core Keywords: endpoint ipv6
Cc: Branch: branches/ipv6-tcp-client-endpoint-5695-3
(diff, github, buildbot, log)
Author: ashfall Launchpad Bug:

Description

Narrowing down #4470 for clients.

Change History (24)

comment:1 Changed 2 years ago by ashfall

  • Author set to ashfall
  • Branch set to branches/ipv6-tcp-client-endpoint-5695

(In [34476]) Branching to 'ipv6-tcp-client-endpoint-5695'

comment:2 Changed 2 years ago by ashfall

  • Branch changed from branches/ipv6-tcp-client-endpoint-5695 to branches/ipv6-tcp-client-endpoint-5695-2

(In [34553]) Branching to 'ipv6-tcp-client-endpoint-5695-2'

comment:3 Changed 2 years ago by ashfall

  • Keywords review added
  • Owner ashfall deleted

comment:4 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner set to ashfall

Thanks! This looks really good. IPv6 has a number of subtleties, so I have quite a few review comments, but I want to emphasize that the code in the branch right now does look good in general and you're definitely going in the right direction. :)

  1. '::1' should not be special cased. Any other IPv6 address literal also does not need to be resolved using getaddrinfo. I think the proper check to use is isIPv6Address. If it is, you can proceed directly to using connectTCP. If it is not, you can use getaddrinfo to get an IPv6 address to use.
  2. Some additional checking of the result of getaddrinfo is required. It may return IPv4 addresses as well as IPv6 addresses. Consider the result of socket.getaddrinfo('www.google.com', 0). The ordering of the result list actually depends on local system configuration, but as of today most systems are probably configured to return IPv4 before IPv6 (and the order is supposed to indicate preference). Add at least one more test that simulates a result with ipv4 and ipv6 results and verify that the ipv6 address is used. (But also note that if you do pass an address family as the 3rd argument to getaddrinfo, it is supposed to only return results that match that address family, so you can use this feature to restrict the results to IPv6 if you want.)
  3. The endpoint probably shouldn't save the result of getaddrinfo. The address associated with a hostname may change while the program is running, and the program should have the possibility of using the new address rather than the old, saved address. Any caching that is appropriate belongs inside the implementation of getaddrinfo rather than in the endpoint.
  4. nameResolution should be a private method instead of a public one.
  5. host_tuple is misnamed according to the naming convention, it should be hostTuple. Also, since getaddrinfo returns a list, it's a bit of a misleading name. :)
  6. There's some trailing whitespace in the implementation of TCP6ClientEndpoint.
  7. _TCPClientEndpoint doesn't look like a useful base class in the latest version of the code. TCP6ClientEndpoint doesn't re-use any of its functionality. It should probably be removed and TCP4ClientEndpoint restored to how it was before.
  8. Also, note that "pass" is only required where Python syntax requires at least one statement and you don't have anything else to put there. A docstring counts as a statement, so if you have a class definition with a docstring, it doesn't also need "pass" after the docstring.
  9. The myGetaddrinfo argument to TCP6ClientEndpoint should be less intrusive. There are a couple things wrong with it now:
    1. It's a required argument. This means no one can use the endpoint unless they know to pass socket.getaddrinfo as the value for this argument. Apart from being unnecessary extra work for people, many people probably won't know that this is what to pass to make the endpoint work. The default should be to work correctly.
    2. It's part of the public interface. It's not clear there's any reason beyond testing for anyone to use this point of customization. So it should be private and out of the way. One common option to avoid users accidentally (or intentionally) using this is to just make it an attribute of TCP6ClientEndpoint, not an argument to __init__. The value of the attribute can always be the correct value by default, socket.getaddrinfo, but the unit tests can set a new value for the attribute before calling connect. The attribute can also be private (as it is now), keeping it out of the way of normal users. This also makes it easier for us to change how the hostname is looked up in the future. If myGetaddrinfo is part of the public interface, it's subject to the backwards compatibility policy, which makes changing it a lot more work.
  10. test_nameResolution should also verify that the correct arguments are passed to getAddress. This is important, since the real socket.getaddrinfo will only return the correct results if passed the correct arguments.
  11. Also, perhaps consider passing self._port to getaddrinfo instead of 0. The second argument is supposed to represent the port. I don't know what cases this makes a difference in, but it's easy to do so we might as well (plus unit test :).

Thanks again.

comment:5 Changed 2 years ago by exarkun

  1. Also, perhaps consider passing self._port to getaddrinfo instead of 0. The second argument is supposed to represent the port. I don't know what cases this makes a difference in, but it's easy to do so we might as well (plus unit test :).

Actually, let's not bother. I can't find any information about this argument mattering at all. 0 is good enough until someone can explain why anything else is any better.

comment:6 Changed 2 years ago by ashfall

  • Keywords review added
  • Owner ashfall deleted

Changed code as per review.

comment:7 Changed 2 years ago by ashfall

Filed #5719 for the string description plugin for this endpoint.

comment:8 follow-up: Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner set to ashfall

Cool, thanks! This looks really close now. A few points:

  1. _resolvedHostConnect can avoid the need for an isIPv6Address check if it just demands that, yes, its argument *must* be an IPv6 address. This is mostly done already, since connect also uses isIPv6Address and takes different code paths depending on the result. The only problem is that getaddrinfo doesn't return just the bare IPv6 address, it returns a big structure that somewhere down inside contains the desired IPv6 address. However, this can be fixed. A useful technique when using Deferreds is to use intermediate callbacks to restructure data, so it always ends up in the same shape, regardless of which code paths get followed (in particular, of which branches of an if get followed, in this case). Consider:
def foo():
    return succeed([100, 200])

def bar():
    return succeed(300)

def baz(quux):
    if quux:
        d = foo()
    else:
        d = bar()
    d.addCallback(foobarbaz)
    return d

def foobarbaz(result):
    if isinstance(result, list):
        result = result[1]
    print "The result is %d" % (result,)

Contrast this with:

def foo():
    return succeed([100, 200])

def bar():
    return succeed(300)

def baz(quux):
    if quux:
        d = foo()
        d.addCallback(lambda result: result[1])
    else:
        d = bar()
    d.addCallback(foobarbaz)
    return d

def foobarbaz(result):
    print "The result is %d" % (result,)

In the second version, the logic about the existence of a list and the code for dealing with it is kept with the code that already knows there will be a list. The ultimate callback, foobarbaz, is simplified by only having to deal with one kind of data, and the second conditional for determining what kind of data we have is eliminated by combining its code with the code already in the first conditional.

This produces simpler, more maintainable code compared to the other version. I suggest applying this idea to _nameResolution/_resolvedHostConnect.

  1. TCP6ClientEndpoint.__init__ could use some parameter documentation. Rather than duplicating the documentation for connectTCP, though, perhaps @see: L{twisted.internet.interfaces.IReactorTCP.connectTCP would be appropriate, with just an extra note that host should be an IPv6 address literal or a hostname with an IPv6 address.
  2. The docstring for TCP6ClientEndpoint._resolvedHostConnect would probably be better as the docstring for connect, maybe with an added note about resolving hostnames.
  3. in test_nameResolution, getAddress has a long line. Can you wrap it at 79 columns?
  4. All of the IPv6 addresses returned by getAddress in test_nameResolution are the same. Can you vary them, so the test reliably demonstrates that the endpoint selects the first? Also, it would be fine to use fake values, like "::2", "::3", etc. These are a bit easier on the eyes.
  5. Same comments for test_ipv6HostAddress - but, what is test_ipv6HostAddress testing, beyond what test_nameResolution tested?
  6. There's no test coverage for passing an IPv6 address literal (instead of a hostname) to TCP6ClientEndpoint.

Thanks again! This is shaping up well.

comment:9 in reply to: ↑ 8 Changed 2 years ago by ashfall

  • Keywords review added
  • Owner ashfall deleted

Replying to exarkun:

Thanks for the review! Made the changes.

  1. Same comments for test_ipv6HostAddress - but, what is test_ipv6HostAddress testing, beyond what test_nameResolution tested?

test_ipv6HostAddress was meant to fail if the endpoint is called with any address other than one belonging to the IPv6 address family. I guess we don't need it. Removed.

comment:10 Changed 2 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

Reviewing.

comment:11 Changed 2 years ago by glyph

New build results, since I couldn't tell about the provenance of the last one.

comment:12 follow-up: Changed 2 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to ashfall
  • Status changed from assigned to new

Hi ashfall,

First, the usual mandatory/coding-standard stuff:

  1. There are some merge conflicts now unfortunately, so you'll need to make sure to resolve those; might want to force a build before landing, just to make sure.
  2. There's still some undocumented stuff here. We should document all new public variables and private variables. For example, _myGetaddrinfo should probably explain that it's a hook used for testing. Documentation of stuff like this is useful for maintainers who come to the code a few years later and need to change something. (Tests are good for making sure the implementation behaves like it wants to behave, but documentation is good so the maintainer can understand why the implementation is the way it is.)
  3. I started off thinking that [0][1][0] was just overly magical and you should use some named constants there, like result[0][GAI_ADDRESS][GAI_ADDRESS_HOST] might be an improvement. It's always good to use sensible names rather than magic numbers, if you can. But then I realized that GAI_ADDRESS should actually be 4, not 1. I'm tired and I can't see the flaw in the test coverage right now, but all of test_endpoints passes if I change this to [0][781234][0], so clearly the tests are deficient in their coverage somehow!
  4. Tests should always refer to domains in example.com. Let's not refer to specific companies, even if Google is very nice to host the Summer of Code program :). It's not just a good idea, it's a documented Internet standard that must be followed!

And then some suggestions, which you may want to do something about or you may just want to ponder for future things:

  1. There's nothing that makes sure that, for example, the data structure returned by test_nameResolution is actually the right shape for a getaddrinfo result. Partly this is just Python's fault, for not actually making that a data type that you can instantiate, rather than a list of tuples. But it might be helpful to have some kind of function that returns test data, and then have a test making sure that it returns a result that at least resembles a real GAI result by getting, for example, the results for 'localhost' or some other it-should-always-work address; or if nothing should necessarily "always" work, perhaps detecting whether the should-usually-be-working address is resolvable and then skipping said test otherwise.
  2. Again, fewer magic numbers; many of the values in that big ball of constants there are actually logical values OR'd together, and some of those are actually already present as symbolic constants in the socket module, so it would be better to get those from the platform than to assume they're always unchanging.

Almost there! Good luck with the next round.

comment:13 Changed 2 years ago by ashfall

  • Branch changed from branches/ipv6-tcp-client-endpoint-5695-2 to branches/ipv6-tcp-client-endpoint-5695-3

(In [34844]) Branching to 'ipv6-tcp-client-endpoint-5695-3'

comment:14 in reply to: ↑ 12 Changed 2 years ago by ashfall

  • Cc ashwini.oruganti@… added
  • Keywords review added
  • Owner ashfall deleted

Replying to glyph:

Addressed the points.

And then some suggestions, which you may want to do something about or you may just want to ponder for future things:

  1. There's nothing that makes sure that, for example, the data structure returned by test_nameResolution is actually the right shape for a getaddrinfo result. Partly this is just Python's fault, for not actually making that a data type that you can instantiate, rather than a list of tuples. But it might be helpful to have some kind of function that returns test data, and then have a test making sure that it returns a result that at least resembles a real GAI result by getting, for example, the results for 'localhost' or some other it-should-always-work address; or if nothing should necessarily "always" work, perhaps detecting whether the should-usually-be-working address is resolvable and then skipping said test otherwise.

Noted, as discussed, leaving it for now.

Almost there! Good luck with the next round.

Hopefully.

comment:15 Changed 2 years ago by ashfall

  • Cc ashwini.oruganti@… removed

Alright, why did trac me as CC. I did not ask it to.

comment:16 Changed 2 years ago by exarkun

  • Owner set to exarkun
  • Status changed from new to assigned

comment:17 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to ashfall
  • Status changed from assigned to new

Thanks!

  1. twisted/internet/test/test_endpoints.py
    1. The docstring for assertConnectArgs has an error, it says "we don't only care". I think the "don't" is extra.
    2. Here are two points to consider, but probably not to do anything about, considering the point that will follow:
      1. The docstring for test_nameResolution should be more specific. This is a test for TCP6ClientEndpoint._nameResolution, so mention that method, rather than talking about "The endpoint" in general.
      2. test_ipv6AddressLiteral and test_resolvedHostConnect are playing that trick with return values. test_ipv6AddressLiteral is being particularly tricky, in changing the return type of connect from a Deferred to the value of the host argument. This is fragile: the implementation could change to assume that _resolvedHostConnect returns a Deferred`, which will cause this test to break.
    3. The tests inherited from ClientEndpointTestCaseMixin (via EndpointTestCaseMixin) cover the case where the host is an IPv6 literal. The extra tests added on TCP6EndpointsTestCase will cover a few cases where the host is a hostname. test_ipv6AddressLiteral seems like it's probably entirely redundant with one of the inherited test methods. This is potentially not ideal for two reasons:
      1. Having tests implemented in two different ways for these two different cases just makes the code more complicated. It's roughly twice as much code for a maintainer to understand and remember. It avoids benefiting from the mixin in the hostname case, because new tests added to the mixin won't be run against a hostname, only against an IPv6 address literal.
      2. The extra tests added on TCP6EndpointsTestCase rely on implementation details of the interface in a way that's really not necessary. The MemoryReactor is ultimately used by the endpoint implementation somehow. That's the really important part of the behavior. The things that happen in between the endpoint's connect and the reactor's connectTCP aren't specifically very important. As long as they happen somehow. The tests defined on the mixin only use the public interface of the endpoint - the interface we care about working in a specific way. That makes them more robust and easily maintainable.
    4. For those reasons, I think a better way to exercise the hostname cases of the endpoint would be to have another subclass of ClientEndpointTestCaseMixin with an implementation of createClientEndpoint that creates an endpoint pointed at a hostname instead of an IPv6 address literal. This should cover all of the cases, will involve having no extra, custom tests, and will allow both cases to be expanded by new tests added to ClientEndpointTestCaseMixin later on.
  2. twisted/internet/endpoints.py
    1. GAI_ADDRESS and GAI_ADDRESS_HOST don't need to be public.
    2. There's some trailing whitespace in the docstring for TCP6ClientEndpoint.__init__.

I think the testing change I suggest above will simplify the code overall quite a bit - enough so that I'm suggesting it here at this late stage in the development of this ticket. Ideally this would have been pointed out earlier in the process. However, I think the intermediate steps were a good learning process, so I don't think we should feel bad about this. :)

Thanks again! With the above changes made, I think this branch will be extremely close to finished!

comment:18 Changed 2 years ago by ashfall

  • Keywords review added
  • Owner ashfall deleted

Changes made as suggested. New build results

comment:19 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner set to ashfall

Thanks! This is extremely close now. :)

  1. GAI_ADDRESS and GAI_ADDRESS_HOST are still documented as public
  2. _myGetaddrinfo can probably just be _getaddrinfo. The "my" prefix is a C++-ism that doesn't see much usage in Python (it derives from the implicit this-> C++ supports, making it important to be able to easily differentiate between local variables and member variables; in Python, self. is mandatory).
  3. The unit tests look good now, except for the slight contortion TCP6EndpointNameResolutionTestCase.createClientEndpoint goes through. Instead of constructing the IPv6Address with a hostname and then re-assigning the host attribute, I suggest giving it the address literal to begin with. Apart from being simpler, this conforms to the requirements of IPv6Address more completely, since it should be impossible to ever get an IPv6Address instance with a host attribute that is not an IPv6 address literal (of course, you still want to pass a host name to the endpoint initializer).
  4. Most importantly, _nameResolution needs a unit test. The unit tests that do exist now do a good job of covering all of the rest of the implementation of the endpoint. However, since they monkey-patch _nameResolution, that method is never executed. There are two options.
    1. Write a dedicated unit test, exercising just _nameResolution. Below is a very rough sketch (with many problems, many coding standard violations, etc) of what this might look like.
    2. Monkey-patch at a lower-level, so that the actual _nameResolution does get called. This probably means monkey-patching deferToThread. A tricky thing here is that in this approach, it will be necessary to verify that deferToThread actually got called, not just that the correct results are produced. This is a slightly more attractive idea, but it's not immediately obvious to me how you would verify that deferToThread actually got called. Or... perhaps that's something assertConnectArgs could be in charge of. Perhaps that's a bit of a hack, but it gets the timing to work out right (in other words, assertConnectArgs must be called after deferToThread was called, so it's safe to assert that deferToThread was called (by checking some flag the fake deferToThread sets) inside assertConnectArgs). And now that I think about it, the fake deferToThread could also munge the result produces by the fake getaddrinfo. Re-order the results so a different address is selected, for example. That would demonstrate it is being used.
    def test_nameResolution(self):
        class StubThreadPoolReactor(MemoryReactor):
            def __init__(self):
                MemoryReactor.__init__(self)
                self._threadpool = object()


            def getThreadPool(self):
                return self._threadpool

        calls = []
        def fakeDeferToThreadPool(reactor, threadpool, f, *args, **kwargs):
            calls.append((reactor, threadpool, f, args, kwargs))
            return defer.Deferred()

        reactor = StubThreadPoolReactor()
        endpoint = endpoints.TCP6ClientEndpoint(reactor, 'ipv6.example.com', 1234)
        fakegetaddrinfo = object()
        endpoint._myGetaddrinfo = fakegetaddrinfo
        endpoint._deferToThreadPool = fakeDeferToThreadPool
        d = endpoint.connect(TestFactory())
        threadpool = reactor.getThreadPool()
        self.assertEqual(
            [(reactor, threadpool, simulated, ("ipv6.example.com", 0, AF_INET6), {})],
            calls)

Note that this is a version which assumes the implementation is switched to using deferToThreadPool instead of deferToThread. This is optional; I'm pretty sure the same idea applies if you keep using deferToThread.

comment:20 Changed 2 years ago by ashfall

  • Keywords review added
  • Owner ashfall deleted

Addressed the review points. New build results.

comment:21 Changed 2 years ago by ashfall

Filed #5823 to replace deferToThread in the endpoint, once it becomes possible to implement something non-threaded in future.

comment:22 Changed 2 years ago by exarkun

  • Owner set to exarkun
  • Status changed from new to assigned

comment:23 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to ashfall
  • Status changed from assigned to new

Thanks. This looks good to me now. Please go ahead and merge.

comment:24 Changed 2 years ago by ashfall

  • Resolution set to fixed
  • Status changed from new to closed

(In [35046]) Merge ipv6-tcp-client-endpoint-5695-3: An IPv6 TCP Client Endpoint

Author: ashfall
Reviewer: exarkun, glyph
Fixes: #5695

Added a new IPv6 stream endpoint, TCP6ClientEndpoint, in twisted.internet.endpoints, which accepts IPv6 host address literals as well as host names (which it resolves to an IPv6 address literal using socket.getaddrinfo).

Note: See TracTickets for help on using tickets.