Opened 5 years ago

Closed 5 years ago

Last modified 3 years ago

#4023 enhancement closed fixed (fixed)

Support HTTPS URIs in `twisted.web.client.Agent`

Reported by: exarkun Owned by:
Priority: normal Milestone:
Component: web Keywords:
Cc: ivank, itamar, glyph, radix, jml Branch: branches/agent-https-4023-2
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description

#3987 introduced Agent with a request method that accepts URIs. It only implemented support for HTTP URIs, though. It should be extended to support HTTPS as well. This should include proper certificate checking (and a way to skip this, for when you really just want the page, darn it).

Change History (27)

comment:1 Changed 5 years ago by exarkun

  • Owner changed from jknight to exarkun
  • Status changed from new to assigned

comment:2 Changed 5 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/agent-https-4023

(In [28142]) Branching to 'agent-https-4023'

comment:3 Changed 5 years ago by ivank

  • Cc ivank added

comment:4 Changed 5 years ago by exarkun

The branch basically implements this feature now. I'm pausing to contemplate how to allow custom certificate verification. The obvious solution involves making Agent accept a context factory factory. If we can avoid the double factory though, that'd probably be nice.

comment:5 Changed 5 years ago by exarkun

  • Cc itamar glyph radix added

Option 1

context = appDefinedContextFactory()
agent = Agent(reactor)
d = agent.request(method, url, context=context, verifyHTTPSHostname=True)

Here, context is an OpenSSL.SSL.Context instance which the application has configured however it wants. By passing True (which may be the default) for verifyHTTPSHostname, the app is asking the agent to mutate that context object by adding a verification step that ensures the url hostname matches the cert hostname.

context also probably has a default which just does enough for SSL to happen.

Nice things about this option:

  • Disabling security is explicit, hard to do by accident
  • Allows almost arbitrary customization of the SSL context

Not as nice things:

  • Doesn't easily allow customization of the certificate verification step (either use twisted's logic or supply your own, and ne'er the 'twain shall meet)
  • Two new arguments to Agent.request (but at least they're optional)

Option 2

context = appDefinedContextFactory(url, Agent.standardHostnameVerificationMethod)
agent = Agent(reactor)
d = agent.request(method, url, context=context)

This differs from Option 1 in that the application is responsible for setting up hostname verification on the context object. Help is provided by making the standard verification scheme available as part of the public agent api, but actually gluing things together is the application's job.

Advantages:

  • Still somewhat secure by default
  • No mutation of the supplied context object

Disadvantages:

  • More work for the application than Option 1
  • One new (optional) request parameter
  • Same cross-platform issues in the default case
  • Handling redirects will be problematic, since a redirect to a different domain will require a different verification function, and the application won't be around to help provide it.
  • Security is easily disabled by accident if you're not paying attentioon

Option 3

class NewContextFactory:
    def getContext(self, url):
        return someContext

agent = Agent(reactor, contextFactory=NewContextFactory())
agent.request(url)

Advantages

  • Requires no context mutation
  • Supports redirects well
  • Doesn't involve any new parameters to request

Disadvantages

  • Requires new kind of context factory

Orthogonal to all the above choice, there's the question of whether or not to require certificates to validate by default in these APIs. New versions of pyOpenSSL offer an API to use platform-supplied CAs, but these APIs seem to only work on certain Linux distributions (I've tried Debian and Ubuntu). They do not work on Windows or OS X.

comment:6 Changed 5 years ago by exarkun

  • Branch changed from branches/agent-https-4023 to branches/agent-https-4023-2

(In [28821]) Branching to 'agent-https-4023-2'

comment:7 follow-up: Changed 5 years ago by exarkun

So, in a discussion with jknight on IRC, the topic of whether trying to use the platform-supplied CAs by default is even a useful feature. One might expect that it is, but slightly deeper consideration reveals some arguments to the contrary:

  1. http://lwn.net/Articles/380140/
  2. http://lwn.net/Articles/372264/

A quick look around will reveal more issues like this one. I think this suggests it's not worth trying to make the default be to verify certificates against a platform-supplied list of certificate authorities and apply the standard hostname checks. This will still be possible by providing a specific context factory, but I'm going to make the default be the same as for getPage or urllib.urlopen.

comment:8 in reply to: ↑ 7 Changed 5 years ago by glyph

Replying to exarkun:

So, in a discussion with jknight on IRC, the topic of whether trying to use the platform-supplied CAs by default is even a useful feature. One might expect that it is, but slightly deeper consideration reveals some arguments to the contrary:

  1. http://lwn.net/Articles/380140/
  2. http://lwn.net/Articles/372264/

In my opinion, the right thing to do about these is to fix the platform's list of trusted CAs, not to have every framework and application arbitrarily decide on some subset of those CAs or its own scheme for doing the validation.

The paid-CA model for security is broken; at this point, everybody knows that. These URLs don't sound like news to me; just a popularization of an attack that's been a problem for a long time.

A quick look around will reveal more issues like this one. I think this suggests it's not worth trying to make the default be to verify certificates against a platform-supplied list of certificate authorities and apply the standard hostname checks. This will still be possible by providing a specific context factory, but I'm going to make the default be the same as for getPage or urllib.urlopen.

Which is to say ... no verification at all? How is that better? This is not a rhetorical question, I recognize that it might actually be better in some ways. But this is a pretty important security issue, and the documentation needs to express a very clear rationale.

comment:9 follow-up: Changed 5 years ago by exarkun

The paid-CA model for security is broken; at this point, everybody knows that. These URLs don't sound like news to me; just a popularization of an attack that's been a problem for a long time.

It might not be news, but this idea of relying on the root CAs and claiming that provides you with security might bear re-evaluation now (or it mightbe past time to re-evaluate it).

Which is to say ... no verification at all? How is that better?

Yes, that was the plan. I can think of basically one reason that might be used as an argument that this is worse than respecting a handful of (widely recognized) root CAs. It lowers the bar for being MitM'd from:

  • anyone who can subvert your DNS and spend a few thousand dollars on bogus certs

to

  • anyone who can subvert your DNS

The two reasons that could be used as arguments that it is better rather than worse are:

  1. It allows everyone to use HTTPS, not just people who buy certificates from a root CA.
  2. It allows code to work across multiple platforms and against builds of OpenSSL w/o a root CA list.

The second of these two points could be mitigated slightly (by someone figuring out how to get the root CA list on Windows), but there'll still always be the possibility for an OpenSSL build on other platforms that's doesn't know where the root CA list is.

The first seems pretty fundamental though.

So which argument wins?

comment:10 in reply to: ↑ 9 Changed 5 years ago by glyph

Replying to exarkun:

The paid-CA model for security is broken; at this point, everybody knows that. These URLs don't sound like news to me; just a popularization of an attack that's been a problem for a long time.

It might not be news, but this idea of relying on the root CAs and claiming that provides you with security might bear re-evaluation now (or it mightbe past time to re-evaluate it).

I guess my argument isn't so much that it's not past time to re-evaluate it, but that it's not Twisted's job to re-evaluate it. There should at least be a convenient way to say "give me the platform's behavior, because I know what the platform does and I trust it". Personally I think that should be the default, and "give me the Twisted smarty-pants slightly-better-threat-model behavior, because I trust the Twisted guys more than I trust my platform" should be the thing you have to ask for.

My opinion is that both the platform and the application have way more information than Twisted about what's desirable from a security perspective. Both will routinely make wrong decisions about security, but I don't think we should monkey with those decisions unless we have a really good understanding of the security properties of the decisions that we're making. Mechanism not policy etc.

Which is to say ... no verification at all? How is that better?

Yes, that was the plan. I can think of basically one reason that might be used as an argument that this is worse than respecting a handful of (widely recognized) root CAs. It lowers the bar for being MitM'd from:

  • anyone who can subvert your DNS and spend a few thousand dollars on bogus certs

to

  • anyone who can subvert your DNS

This is definitely a smaller set :).

The two reasons that could be used as arguments that it is better rather than worse are:

  1. It allows everyone to use HTTPS, not just people who buy certificates from a root CA.

So, if we use the default CA list, couldn't we still get this benefit, because there are existing tools to add your own self-signed certificate to that list? (More below)

  1. It allows code to work across multiple platforms and against builds of OpenSSL w/o a root CA list.

This begs the question: if stock OpenSSL is built without a root CA list, is there a way to affect what it thinks the root CA list is, with runtime configuration? Environment variables, config files, or something like that?

The second of these two points could be mitigated slightly (by someone figuring out how to get the root CA list on Windows), but there'll still always be the possibility for an OpenSSL build on other platforms that's doesn't know where the root CA list is.

The first seems pretty fundamental though.

So which argument wins?

So, the behavior that I actually want is slightly different than any of these options.

For some platforms at least, there's some UI for customizing the platform-bundled list of certificates, selecting whether they are trusted or not, and adding your own. /etc/ca-certificates.conf, for example. I'm not 100% sure if that affects the behavior of OpenSSL directly, but at a minimum you can root around in /usr/share/ca-certificates and delete stuff.

I am not familiar with any of the details, but the behavior that I want is: if a site administrator is familiar with the platform tools for customizing SSL certificate trust options, and sets those options according to a (hopefully well-thought-out) security policy, Twisted should obey that policy rather than trying to determine its own.

There's nothing in particular about the 'root'-ness of a CA which makes it more trustworthy. For example: if one of them goes rogue in a well-known or well-publicised incident, every platform in the world has a security team set up to distribute rapid-response updates to events like that, and we don't.

So, while I'm not a big fan of "no verification at all", I do think it's a better option than "verify with our own ad-hoc certificate list".

I'm not intimately familiar with the specifics here though. Maybe the general concept of "platform CA list" doesn't map to "OpenSSL root CA list" in the same way that I'm thinking it does. If not, then I'll have to re-think in terms of those options.

(And finally, of course, it's worth stating this assumption: this API is really insufficiently wide to express the real right solution, which would be to introduce UI elements that allow the user or a trusted proxy for the user to make decisions about the security properties of a particular application-level transaction, but that's at least a dozen other hard problems...)

comment:11 Changed 5 years ago by exarkun

It might not be news, but this idea of relying on the root CAs and claiming that provides you with security might bear re-evaluation now (or it mightbe past time to re-evaluate it).

I guess my argument isn't so much that it's not past time to re-evaluate it, but that it's not Twisted's job to re-evaluate it. There should at least be a convenient way to say "give me the platform's behavior, because I know what the platform does and I trust it". Personally I think that should be the default, and "give me the Twisted smarty-pants slightly-better-threat-model behavior, because I trust the Twisted guys more than I trust my platform" should be the thing you have to ask for.

I'm not sure I agree with the premise here. Let's ignore all of the specifics of SSL for a moment. Consider a third-party library X which claims to provide security properties Y. Should a Twisted API which uses library X claim to provide security properties Y in the face of non-trivial evidence to the contrary Should the implementation of the Twisted API go out of its way to use library X in the way it claims are necessary to provide security properties Y, regardless of the answer to the previous question?

This is definitely a smaller set :).

Yes, and if there were no downsides it would be obvious what we should do.

The two reasons that could be used as arguments that it is better rather than worse are: 1. It allows everyone to use HTTPS, not just people who buy certificates from a root CA.

So, if we use the default CA list, couldn't we still get this benefit, because there are existing tools to add your own self-signed certificate to that list? (More below)

To some extent, certainly. However, for most people, what's going to happen is that an HTTPS request is going to fail and that's going to be the end of the story.

  1. It allows code to work across multiple platforms and against builds of OpenSSL w/o a root CA list.

This begs the question: if stock OpenSSL is built without a root CA list, is there a way to affect what it thinks the root CA list is, with runtime configuration? Environment variables, config files, or something like that?

I think that the SSL_CERT_DIR environment variable may specify a /etc/ssl/certs-style directory which will override the build-time options that control the behavior of the particular "use the platform-supplied CA certs" API which I know of.

I am not familiar with any of the details, but the behavior that I want is: if a site administrator is familiar with the platform tools for customizing SSL certificate trust options, and sets those options according to a (hopefully well-thought-out) security policy, Twisted should obey that policy rather than trying to determine its own.

That would be nice. I don't know how to implement it on Windows.

So, while I'm not a big fan of "no verification at all", I do think it's a better option than "verify with our own ad-hoc certificate list".

That's somewhat out of left field, so I'll assume it's an editing oversight. For what it's worth though, I agree, and I'm not proposing that Twisted ships with its own ad-hoc CA certificate list.

(And finally, of course, it's worth stating this assumption: this API is really insufficiently wide to express the real right solution, which would be to introduce UI elements that allow the user or a trusted proxy for the user to make decisions about the security properties of a particular application-level transaction, but that's at least a dozen other hard problems...)

The underlying OpenSSL API is wide enough to express this. I think pyOpenSSL might need to be improved a bit to expose this, though. And the proposed Agent API is wide enough, because it will let applications provide arbitrary SSL context objects.


To use the platform-supplied CA certs by default, one of two things needs to happen:

  1. Kick Windows into a second-class status, where by default either all HTTPS requests fail or they all succeed with no particular security guarantees.
  2. Find a way to ask for the platform-supplied CA certificates on Windows.

I have very little interest in the latter. Of the former, the first option strikes me as a really terrible behavior for an API to have and the latter seems to be a worse security guarantee than if the API did no certificate verification on any platform. Perhaps I am mistaken about one of those, though.

Hence my plan to not use the platform-supplied CA certificates by default in this API. This does not preclude the possibility of an application specifically asking for them, though.

Thanks for your comments.

comment:12 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted
  • Status changed from assigned to new

Further discussion of the defaults may be productive, but I don't want it to block this branch for too long. In that spirit, a version of the proposed change which implements the no verification default is now up for review in the branch. The only thing that should need to change in order for the default to become something else (like standard web browser cert verification, for example) is the default value for Agent.__init__'s contextFactory parameter.

comment:13 Changed 5 years ago by glyph

You're right that discussion of the default shouldn't block this branch.

As I have considered it more, the no-verification case may be the best of a bad lot of options. There are many other things to consider, if we wanted to provide a reasonable security guarantee on an HTTP API. For example, redirections between 'http' and 'https'; disallowing 'http' URLs entirely; requiring EV certificates or a particular certificate.

I did find these two links which may be helpful in determining how to implement platform-certificate-store access on Windows:

The good news is that it seems pretty straightforward, but the bad news is that it requires a ton of additional C code.

comment:15 Changed 5 years ago by exarkun

Note for later, when someone wants to implement a helper for hostname checking:
http://svn.osafoundation.org/m2crypto/trunk/M2Crypto/SSL/Checker.py

comment:16 Changed 5 years ago by jml

  • Owner set to jml

comment:17 Changed 5 years ago by jml

I've read through the comments, and on the understanding that we shouldn't block on the default, I've done the following review.

exarkun, the code is great, as is so frequently the case. Most of my comments below are about text and formatting.

In the tests:

  • The docstring for test_connectHTTPS says when passed a scheme of C{'http'}. It should say when passed a scheme of C{'https'}.
  • In the same test, the XXX comment "Can't I assert something better" isn't going to help anyone except you. Maybe not even you. Please either delete the comment or change it to say why what you've got sucks and what you'd expect in a better solution.
  • The docstring for test_connectHTTPSCustomContextFactory is technically accurate, but it would be great if you could break it up into multiple sentences.
  • In the same test, I don't think you need to re-assert the stuff about expectedHost and expectedPort from the previous test. Likewise for checking the protocol. I guess I might be missing something though.

In twisted/web/client.py

  • I dislike the indentation on the line following d = defer.fail(SchemeNotSupported(. I think it should be only indented by four spaces rather than the eight you have there. Doesn't matter though.

In the documentation change

  • Wow! I wasn't expecting this. Thanks!
  • The doc says "ssl context factory", "SSL" should be in caps.
  • "This allows it to return a context object which verifies the server's certificate matches the URL being requested." I think "verifies that" would make the whole sentence clearer.
  • Do we have any checks to make sure code in docs runs properly?
  • I guess we don't comply to the regular coding standard for code in docs?

I'll try to get to a follow-up review as soon as you reply.

comment:18 Changed 5 years ago by jml

  • Keywords review removed
  • Owner changed from jml to exarkun

comment:19 Changed 5 years ago by exarkun

  • Cc jml added
  • Keywords review added
  • Owner exarkun deleted

The docstring for test_ ...

Fixed.

In the same test, the XXX comment ...

Fixed.

The docstring for test_ ...

I changed it a bit. I think I need more practice in this area. I hope I did a good job, let me know if I didn't.

In the same test, I don't think you need to re-assert ...

Yea, I suppose it's not covering a different codepath from the previous test. Removed.

I dislike the indentation on the line following ...

That's where emacs wants to put it. Should I get a new python mode for emacs? Change some setting? I could just move it, but I'm as like as not to move it back again next time I touch this code if I don't change emacs somehow.

Wow! I wasn't expecting this. Thanks!

:)

The doc says "ssl context factory", "SSL" should be in caps.

Fixed.

"This allows it to return a context object which ...

Fixed.

Do we have any checks to make sure code in docs runs properly?

Nope, we're sorely lacking in that area. One possibility, though, is that I could dump the code listing into separate files (actual python files) and use a lore include-type dealie (lore has such a thing, used earlier in this doc). Then at least it's easier to run the example code and see if it doesn't work.

I guess we don't comply to the regular coding standard for code in docs?

For things like docstrings, unit tests, etc? Yea, not really. I'm open to discussion about it, but it's probably part of a larger discussion, not specifically about this ticket.

Fixes all in r28842.

Thanks very much for the review!

comment:20 Changed 5 years ago by jml

  • Keywords review removed
  • Owner set to exarkun

All the changes look good to me. I agree that the code-in-docs stuff is not in scope for this ticket.

Please land this change :)

comment:21 Changed 5 years ago by exarkun

(In [28843]) Fix an incompatibility with pyOpenSSL 0.9 and earlier, and another random bug pointed out by pyflakes

refs #4023

comment:22 Changed 5 years ago by jml

Changes look good. Please land.

comment:23 Changed 5 years ago by exarkun

(In [28844]) Skip the Agent HTTPS test which needs to create a Context if OpenSSL is not installed

refs #4023

comment:24 Changed 5 years ago by exarkun

(In [28846]) Fix three trivial doc issues

refs #4023

comment:25 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [28847]) Merge agent-https-4023-2

Author: exarkun
Reviewer: jml
Fixes: #4023

Add support for HTTPS URLs to twisted.web.client.Agent.

comment:26 Changed 4 years ago by <automation>

  • Owner exarkun deleted

comment:27 Changed 3 years ago by glyph

The security discussion here seemed to end halfway through, but some tickets have been filed for additional work in the area of verification and validation of certificates and certificate authorities. See #4888, #5446, and #5445 for more information.

Note: See TracTickets for help on using tickets.