Ticket #4888 enhancement new

Opened 2 years ago

Last modified 9 days ago

Hostname verification for HTTPS URIs in twisted.web.client.Agent.

Reported by: glyph Owned by:
Priority: normal Milestone:
Component: web Keywords:
Cc: ivank-twisted-bugs@… Branch: branches/hostname-verification-4888
Author: itamarst Launchpad Bug:

Description

When we implemented HTTPS for the new client API, we left out any form of hostname verification or certificate checking.

We should support it in some way.

Change History

1

  Changed 2 years ago by glyph

  • component changed from core to web

First we need to support just checking that the hostname matches as expected. Does Agent do that already? Then, we need to actually verify the peer's certificate.

My inclination would be, by default, to support it opportunistically, verifying when possible, emitting a warning when it's not possible.

There should be separate tickets for supporting each of: the MacOS X certificate store, the Linux conventional default (/etc/ssl/certs) , and the Windows thing, which there are all references to on #4023. This ticket could probably be for the Linux default. Or, maybe just for hostname checking against the (unverified) cert, if we don't do that yet.

There should be some kind of option, which says "for real, be secure, don't transmit my request in the clear, don't accept a response in the clear". In my opinion, this flag should be specified independently of the protocol, since maybe there could be other transports besides HTTPS that could be secure. Unfortunately for compatibility, this is probably what users mean when they take the care to type " https://" into a box that takes an URL.

I expect that there will be some further discussion first though. Thoughts?

2

  Changed 18 months ago by exarkun

Or, maybe just for hostname checking against the (unverified) cert, if we don't do that yet.

I think this is a good idea. It's not terribly useful to verify hostnames without verifying certificates, but the implementation of those two things are fairly independent.

3

  Changed 17 months ago by glyph

There's a  relevant RFC, 6125 that an implementor of this feature should be aware of.

4

  Changed 17 months ago by glyph

See also #5446 and #5445 for related tasks.

5

  Changed 17 months ago by ivank

  • cc ivank-twisted-bugs@… added

6

  Changed 3 months ago by itamarst

  • branch set to branches/hostname-verification-4888
  • branch_author set to itamarst

(In [37200]) Branching to 'hostname-verification-4888'

7

follow-up: ↓ 10   Changed 3 months ago by radix

This is a good feature, but we shouldn't have our own certificate database.

  • It's a liability to us in that we'd have to maintain it, even if we generate it from mozilla's DB. curl maintainers learned this the hard way. so are python-requests maintainers.

and it's a liability to our users:

  • the certificate store would become out of date (probably without them even knowing it)
  • administrators don't have management tools that work with this database included in Twisted

You should load certificates out of /etc/ssl/certs if it exists. If you want to solve this for users, then maybe include certs in the windows .exe distribution, but this shouldn't go into Twisted upstream.

8

follow-up: ↓ 9   Changed 3 months ago by radix

On top of that, why is the CA bundle being added in this branch? it seems like it should be a part of a different ticket and not be bound to HTTPS hostname verification.

9

in reply to: ↑ 8   Changed 3 months ago by glyph

Replying to radix:

On top of that, why is the CA bundle being added in this branch? it seems like it should be a part of a different ticket and not be bound to HTTPS hostname verification.

+1 to that. This should be split into as many discrete actions as possible, and shipping a cert bundle is obviously the most contentious.

10

in reply to: ↑ 7   Changed 3 months ago by glyph

Replying to radix:

This is a good feature, but we shouldn't have our own certificate database. * It's a liability to us in that we'd have to maintain it, even if we generate it from mozilla's DB. curl maintainers learned this the hard way. so are python-requests maintainers. and it's a liability to our users: * the certificate store would become out of date (probably without them even knowing it) * administrators don't have management tools that work with this database included in Twisted You should load certificates out of /etc/ssl/certs if it exists. If you want to solve this for users, then maybe include certs in the windows .exe distribution, but this shouldn't go into Twisted upstream.

I mostly agree, except for a few minor points:

Let's not the perfect be the enemy of the good here. We should not be in the cert-bundle shipping business if we can avoid it, but an out-of-date CA bundle is still better security in many cases than having to fall back to cleartext. For example, pip's reluctance to ship a CA bundle lead to years of everyone just downloading and executing stuff plaintext because security was too hard. It would have been a lot better for everyone if they - and we - had just shipped a CA bundle that wasn't getting updates and then worked to figure out a way to update it later. Thanks for the prompting though, because I noticed that the ticket for looking at the platform certificate store didn't actually have any API references on it and therefore it might not have been as clear as it could have what I was proposing there.

11

  Changed 3 months ago by glyph

As to the implementation in the branch itself; looking at the function implemented for matching certificate names, I'm wondering why we have one function for extracting the names and one function for matching the names, rather than a matcher object that wraps the certificate object.

For example, RFC 6125 seems very cagey on the concept of wildcard certificates, and wish that they weren't a thing, conceding that support is necessary only for "backward compatibility with deployed infrastructure". It seems quite likely to me that some future X509 certificate or certificate authority metadata would add a field saying "no wildcards please" or otherwise modify the matching algorithm, based on some information that can't be represented in a list of certificate names (plus wildcards) represented by the certificate. I would like this API to be as abstract – and therefore as future-proof – as possible, and so have an API which is just like "here's an object, it matches a hostname or it doesn't, don't worry about how".

(Also always include epytext @type fields on all parameters, especially parameters representing strings of characters or bytes.)

12

follow-up: ↓ 14   Changed 3 months ago by itamar

I should note that I'm doing this for one of my clients; this branch is just my way of dumping my first pass somewhere public. My plan therefore is to first get this to the point where it works, even if it's not mergeable. Later, in my free time or with the help of others, I will split this up into multiple branches as necessary, address review comments, write documentation etc..

13

  Changed 3 months ago by itamar

As far as shipping CA bundle: it's not ideal, but it's better than nothing (as glyph noted in the pip example.) My current thoughts are that we should use the platform cert database if possible, falling back to a bundled cert database that we refresh on every release (if not more frequenly). Initially it'd probably be easiest to only support some set of Linux distributions, but we could add better defaults for other platforms over time.

14

in reply to: ↑ 12   Changed 3 months ago by glyph

Replying to itamar:

I should note that I'm doing this for one of my clients; this branch is just my way of dumping my first pass somewhere public. My plan therefore is to first get this to the point where it works, even if it's not mergeable. Later, in my free time or with the help of others, I will split this up into multiple branches as necessary, address review comments, write documentation etc..

I thought something like this might be the case, since you seem to be loading up the branch with lots of code :). I'm offering comments now so there isn't one gigantic dump of review feedback at the end - and I thank radix for doing the same. Also, this is security code, and benefits from extra eyeballs even more than usual.

On that note, regarding another changeset:

  1. getCACertificates is a crappy name, in no small part because it starts with "get". Call it authorityCertificates or certificateAuthorityCertificates or trustedRoots or something.
  2. This shouldn't return X509 objects directly. They probably segfault less than we're used to, but we already have a perfectly reasonable wrapper, twisted.internet.ssl.Certificate. Return one of those; if you need to turn it back into an X509, you can.
  3. Oh my goodness, do not tell distributors to patch a function! Make a plugin, aggregate the certificates from the plugins.
  4. Loading multiple certificates from a single file is independently useful of loading platform CAs; in particular I believe this could be helpful when loading trust chains. That code should be in a different function. Certificate.manyFromFile perhaps.

15

follow-up: ↓ 16   Changed 3 months ago by exarkun

For example, pip's reluctance to ship a CA bundle lead to years of everyone just downloading and executing stuff plaintext because security was too hard. It would have been a lot better for everyone if they - and we - had just shipped a CA bundle that wasn't getting updates and then worked to figure out a way to update it later.

I don't think we should actually be in the business of managing and distributing such a bundle, but it sounds like enough other people are that it'll probably happen.

So. Since apparently pip, python-requests, and Twisted should all have been bundling their own private CA databases for years now, how about creating a new, shared, independent package that consists only of such a bundle and can be depended on by all of these projects? Hopefully the numerous advantages of such an approach over Twisted (and each of those other projects, perhaps plus others) are obvious, but if not:

  • It can be updated more frequently than Twisted itself is updated. This is fairly important. If you want to stop trusting a compromised authority, you want to stop trusted them today, not three months from now.
  • It provides consistent trust behavior across multiple projects. It is better if all HTTPS software on a host comes to the same conclusion about a particular certificate (ie, the system provides a CA database). Failing that, the more software agrees the better. Having Twisted and pip always agree is certainly preferable to allowing them to use desynchronized CA databases and come to different conclusions about potentially compromised hosts.
  • It is less work. It is one database to maintain and distribute instead of N.
  • There are more people to help with the work. Many more people use and care about pip than use and care about Twisted.
  • Platform vendors could potentially someday be some of the people helping. For example, Ubuntu could start maintaining an Ubuntu-specific version of this package which just points at the exact same certificates as are included in the ca-certificates package.

16

in reply to: ↑ 15   Changed 3 months ago by glyph

Replying to exarkun:

I don't think we should actually be in the business of managing and distributing such a bundle, but it sounds like enough other people are that it'll probably happen.

I would prefer to avoid this as well, and you make some good points. And we haven't even started talking about licensing issues! But, in any case, as we keep saying, this ticket is not about including a CA bundle or in fact any kind of trust configuration, but about hostname matching, so we should stop filling up the comments here. I have opened a new ticket specifically about this problem, described in such a way to make my feelings clear (i.e. we should not do it, I'm not entirely convinced that the ticket for fixing this the right way is actually sufficiently harder than resolving all these issues).

17

  Changed 3 months ago by glyph

PLATFORM = object() is pretty crummy. At the very least, there's no excuse for object() any more because we now have twisted.python.constants, and you can have a symbolic placeholder that actually has a nice stringification.

But, it would also be handy to say "the platform, plus some additional trusted roots that I added myself". Given that you're enumerating the full list of platform CA certs anyway, why not provide an object that offers at least an __iter__ if not a plain list of certs?

18

  Changed 3 months ago by glyph

"standard" security rules and "well known" certificate authorities are both kind of weasel words, too. Since there's no actual standard governing which CAs to use, maybe "BrowserLikeContextFactory" would be a better name for it? (LockIconContextFactory, ha ha?)

19

  Changed 9 days ago by dreid

Ok, this work actually looks pretty great. But I couldn't help but notice that StandardWebContextFactory does not:

  1. Document the level of verification it does (it only implies that it is similar to browsers) this makes making an informed decision about if the level of verification it does perform is suitable for your application just a little bit harder.
  2. Provide any mechanism for specifying the caCerts. Seems like StandardWebContextFactory.__init__ sould take a caCerts keyword argument and pass it to OpenSSLCertificateOptions with it defaulting to PLATFORM.
Note: See TracTickets for help on using tickets.