Opened 10 years ago

Last modified 3 years ago

#4536 defect new

Credentials materials are compared unsafely throughout Twisted

Reported by: Jean-Paul Calderone Owned by:
Priority: highest Milestone:
Component: core Keywords: security
Cc: zooko@…, ivank, warner Branch: branches/password-comparison-4536-2
branch-diff, diff-cov, branch-cov, buildbot
Author: ivank, mithrandi

Description (last modified by Thijs Triemstra)

Twisted has many timing attack vulnerabilities identical to the one found in Keyczar:

Originally reported by Ivan Kozik.

Attachments (4)

fix-slowStringCompare-for-py24.patch (2.8 KB) - added by ivank 10 years ago.
apply on top of the branch
fix-slowStringCompare-for-py24-2.patch (2.7 KB) - added by ivank 10 years ago.
apply on top ot the branch; obsoletes older patch
fix-slowStringCompare-for-py24-3.patch (2.6 KB) - added by ivank 10 years ago.
Reduces redundant test code. Apply on top ot the branch; obsoletes older patch
test-timing.py (2.1 KB) - added by zooko 7 years ago.
measures the time it takes to compare strings

Download all attachments as: .zip

Change History (66)

comment:1 Changed 10 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/password-comparison-4536

(In [29523]) Branching to 'password-comparison-4536'

comment:2 Changed 10 years ago by Jean-Paul Calderone

Author: exarkunivank

comment:3 Changed 10 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Glyph deleted
Priority: highhighest

This is still mostly ivank's work, but I made a few changes, including completely deleting twisted/python/otp.py and adding some test coverage to modified code which was previously untested. Please review.

comment:4 Changed 10 years ago by zooko

Owner: set to zooko
Status: newassigned

comment:5 Changed 10 years ago by zooko

Cc: zooko@… added
Status: assignednew

comment:6 Changed 10 years ago by zooko

Status: newassigned

comment:7 Changed 10 years ago by zooko

Keywords: review removed
Owner: changed from zooko to Jean-Paul Calderone
Status: assignednew

Okay I've reviewed this patch and it is okay.

There are a couple of things that are changed to use slowStringCompare where I'm not sure that it is necessary:

http://twistedmatrix.com/trac/browser/branches/password-comparison-4536/twisted/conch/client/knownhosts.py?rev=29614#L266

Is it important to prevent someone who can submit matchesHost requests from learning the hashed hostname?

http://twistedmatrix.com/trac/browser/branches/password-comparison-4536/twisted/conch/checkers.py?rev=29675#L169

Is it important to prevent someone who can submit checkKey requests from learning the ssh public key? I think of ssh public keys as being public, so I don't think there's any harm in letting an attacker know the ssh public key.

comment:8 Changed 10 years ago by Jean-Paul Calderone

There are a couple of things that are changed to use slowStringCompare where I'm not sure that it is necessary:

I think you're right about these. I had the same doubts as I was looking over the code. At the time, I left the changes in because I thought it couldn't hurt to be overly cautious. Taken to the extreme, that would leave Twisted with no uses of == though. ;) Since you also doubt the use of these, I'm going to go ahead and switch the code back to naive string comparison.

Thanks for the review!

comment:9 Changed 10 years ago by Jean-Paul Calderone

(In [29826]) Remove two non-productive uses of slowStringCompare

  1. IKnownHostEntry.matchesHost is used to find a host key fingerprint in a local known_hosts files.
  2. SSHPublicKeyDatabase.checkKey checks the public part of an SSH keypair. Public keys are public.

refs #4536

comment:10 Changed 10 years ago by Jean-Paul Calderone

(In [29827]) @since

refs #4536

comment:11 Changed 10 years ago by Jean-Paul Calderone

Author: ivankivank, exarkun
Branch: branches/password-comparison-4536branches/password-comparison-4536-2

(In [29828]) Branching to 'password-comparison-4536-2'

comment:12 Changed 10 years ago by Jean-Paul Calderone

(In [29834]) valid epytext

refs #4536

comment:13 Changed 10 years ago by Jean-Paul Calderone

Most recent build results look good. Merging.

comment:14 Changed 10 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [29835]) Merge password-comparison-4536-2

Author: ivank, exarkun Reviewer: exarkun, zooko Fixes: #4536

Change comparisons of sensitive materials to use a helper function which is resistent against leaking timing information.

comment:15 Changed 10 years ago by Tristan Seligmann

Resolution: fixed
Status: closedreopened

Reverted in [29875]: the exception raised for unicode needs to be a warning.

comment:16 Changed 10 years ago by Tristan Seligmann

Keywords: review added
Owner: Jean-Paul Calderone deleted
Status: reopenednew

comment:17 Changed 10 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Tristan Seligmann

Thanks for fixing this. I definitely messed up merging the previous version.

A couple simple things:

  1. If you feel like it, make the test use TestCase.flushWarnings. If you can't find documentation talking about this, please also file a ticket for adding such. :) Or even if it exists but isn't easily discoverable, we should fix that too.
  2. The test method name looks like it should be changed.

This is going to result in deprecation warnings being emitted at the code in Twisted which directly calls slowStringCompare, even though this code is fine. It's the code that constructs UsernamePassword instances with unicode or calls checkPassword or similar APIs that's a problem. So we may also want to add a layer of deprecation warnings to those APIs so that applications get a warning pointing at their code. So file a ticket for that as well.

Please merge when you've addressed those points.

comment:18 Changed 10 years ago by Tristan Seligmann

(In [29877]) Change test name and use flushWarnings. (refs #4536)

comment:19 Changed 10 years ago by Tristan Seligmann

I filed #4607 to cover the latter issue.

comment:20 Changed 10 years ago by Tristan Seligmann

Resolution: fixed
Status: newclosed

(In [29879]) Merge password-comparison-4536-2

Author: ivank, exarkun, mithrandi Reviewer: exarkun, zooko Fixes: #4536

Change comparisons of sensitive materials to use a helper function which is resistent against leaking timing information.

comment:21 Changed 10 years ago by Tristan Seligmann

Resolution: fixed
Status: closedreopened

(In [29880]) Reopens #4536

Revert the merge *again*, sigh; u'\xff' == '\xff' is False, but slowStringCompare(u'\xff', '\xff') returns True in this form.

comment:22 Changed 10 years ago by Tristan Seligmann

Author: ivank, exarkunivank, exarkun, mithrandi
Keywords: review added
Owner: Tristan Seligmann deleted
Status: reopenednew

Okay, I've (hopefully) fixed the unicode comparison behaviour. Also, I'd appreciate some guidance as to whether the news fragment should mention the unicode deprecation thing.

comment:23 Changed 10 years ago by ivank

Cc: ivank added

comment:24 Changed 10 years ago by Screwtape

Keywords: added; review removed
Owner: set to Tristan Seligmann

Looks good to me. +1 for merging

comment:25 Changed 10 years ago by Tristan Seligmann

Resolution: fixed
Status: newclosed

(In [29906]) Author: ivank, exarkun, mithrandi Reviewer: exarkun, zooko, Screwtape Fixes: #4536

Change comparisons of sensitive materials to use a helper function which is resistent against leaking timing information.

comment:26 Changed 10 years ago by Tristan Seligmann

Resolution: fixed
Status: closedreopened

(In [29907]) Reopens #4536

The tests don't work on Python 2.4.

Changed 10 years ago by ivank

apply on top of the branch

comment:27 Changed 10 years ago by ivank

Keywords: review added; removed

Please review. (maybe apply that patch to the branch first?)

Changed 10 years ago by ivank

apply on top ot the branch; obsoletes older patch

comment:28 Changed 10 years ago by ivank

Owner: Tristan Seligmann deleted
Status: reopenednew

Changed 10 years ago by ivank

Reduces redundant test code. Apply on top ot the branch; obsoletes older patch

comment:29 Changed 10 years ago by Jean-Paul Calderone

(In [29956]) Apply fix-slowStringCompare-for-py24-3.patch

refs #4536

comment:30 Changed 10 years ago by Jonathan Lange

Keywords: review removed
Owner: set to Jean-Paul Calderone

Thanks for the work done so far everyone! Some questions:

  • I don't get why slowStringCompare emits a DeprecationWarning when passed unicode strings. It's new, it can do whatever it wants, surely.
  • I would like to see _compare in test_unicodeComparison split up. I think the test is too big & too hard to read. Unfortunately, no ideas leap to mind. Thoughts?
  • Do the tests actually pass?

Reassigning to exarkun since he touched it last.

comment:31 in reply to:  30 Changed 10 years ago by Tristan Seligmann

Replying to jml:

  • I don't get why slowStringCompare emits a DeprecationWarning when passed unicode strings. It's new, it can do whatever it wants, surely.

slowStringCompare is new, but all of the places where it is used are not new; putting the DeprecationWarning in slowStringCompare was easier than putting it in all of the places that it is now used. (Some of them probably aren't affected, but I think most of them are.)

comment:32 Changed 10 years ago by Jonathan Lange

Even so, it's a new public API in itself. Under what conditions would we actually remove unicode support from slowStringCompare?

Although it's more work, I do think it's more correct to have the deprecation warnings in the call sites.

comment:33 in reply to:  32 Changed 10 years ago by Tristan Seligmann

Replying to jml:

Even so, it's a new public API in itself. Under what conditions would we actually remove unicode support from slowStringCompare?

The original version raised an exception, instead of a warning. I reverted the branch and implemented the current fix once I noticed breakage in various software (such as axiom.userbase). I can't really comment too much on the design decisions beyond that, but I guess it's really hard to implement the function correctly for non-str comparisons?

comment:34 Changed 10 years ago by Jonathan Lange

slowStringCompare actually works when given unicode input. Why should it be concerned if others are passing around unicode when they ought to be using encoded bytes?

comment:35 Changed 10 years ago by ivank

The security properties of slowStringCompare are good when it compares strs in Python 2.5+ (which shares int objects up ~256 or so, unlike 2.4 which goes up to ~100). When comparing unicode, slowStringCompare may creates ints > 255, and this *might* leak information about the password.

At least, I *think* that was the problem. So, it's deprecated because it needs to work with unicode for compatibility, even though it doesn't work securely.

I would really like to see this in the next release. I think the current state of this branch is much-improved over using == everywhere. If we make slowStringCompare work securely with unicode in the future (assuming the above problem is real), we could take the unusual step of un-deprecating unicode comparisons.

comment:37 Changed 10 years ago by Jean-Paul Calderone

If we might decide not to deprecate unicode, then a different kind of warning might be more appropriate to put in in the first place?

I also think it would be great to see evidence that this change improves things (or doesn't make them worse, at least).

comment:38 Changed 9 years ago by <automation>

Owner: Jean-Paul Calderone deleted

comment:39 Changed 9 years ago by warner

Cc: warner added

comment:40 Changed 9 years ago by Itamar Turner-Trauring

URGH. I thought this was merged already!

What's left to do for this to get merged? There seems to be:

  1. Decide whether slowStringCompare should emit DeprecatingWarning.
  1. Split up _compare in the test function.
  1. Prove, somehow, that this is better - except I thought JP had already demonstrated that?

comment:41 Changed 9 years ago by Jean-Paul Calderone

Prove, somehow, that this is better - except I thought JP had already demonstrated that?

For a minute I thought I had, but my results were not reproducable.

comment:42 Changed 9 years ago by Thijs Triemstra

Description: modified (diff)

comment:43 in reply to:  41 Changed 9 years ago by Glyph

Replying to exarkun:

Prove, somehow, that this is better - except I thought JP had already demonstrated that?

For a minute I thought I had, but my results were not reproducable.

I asked exarkun what this meant, and he explained to me that the timing attacks don't actually appear to work against string comparison in Python, and he's been unable to fabricate one that does. So, basically, there's no bug here until someone can demonstrate that "==" is actually vulnerable in Python, or at least, there's too much noise from other stuff to do anything useful against a real server. Does anyone else have a pointer to a password-extraction attack against Twisted, in any configuration, with any checker, that we can use to validate that this is actually a bug and it's worth getting this branch merged again? I understand the attack in principle but unless we can validate it this just artificially limits the number of logins/sec that Twisted can process.

comment:44 Changed 9 years ago by zooko

Owner: set to zooko
Status: newassigned

I'll try to reproduce the issue.

comment:45 Changed 9 years ago by Jean-Paul Calderone

Author: ivank, exarkun, mithrandiivank, mithrandi

Fixing metadata. I checked some code into the branch, but I think ivank and mithrandi wrote it.

comment:46 Changed 8 years ago by zooko

Owner: zooko deleted
Status: assignednew

comment:47 Changed 7 years ago by Jean-Paul Calderone

#6783 was a duplicate of this.

Changed 7 years ago by zooko

Attachment: test-timing.py added

measures the time it takes to compare strings

comment:48 Changed 7 years ago by Tomas Touceda

I've been trying to create a PoC so this issue will be taken more seriously, but I haven't been able to. That being said, I'm not a experienced hacker or however you want to call it, so this doesn't mean it's not *possible* to create one.

Being that this is a security issue, the policy should be to err on the side of caution, which right now means stop ignoring this issue and patch at hand. How can I help this get solved after all this time?

comment:49 Changed 7 years ago by zooko

In my opinion it would be a good idea to provide a function which compares two strings and guarantees not to leak information about their content in its timing. In my opinion, "regular bugs" should usually not be fixed without a unit test that exercises the bug and the alleged fix, but security vulnerabilities often should. In this case, in particular, there is a risk that someone out there can figure out a way to exploit this timing issue, and instead of submitting a unit test, they use it to steal other people's passwords.

comment:50 in reply to:  49 Changed 7 years ago by Glyph

Replying to zooko:

In my opinion it would be a good idea to provide a function which compares two strings and guarantees not to leak information about their content in its timing. In my opinion, "regular bugs" should usually not be fixed without a unit test that exercises the bug and the alleged fix, but security vulnerabilities often should. In this case, in particular, there is a risk that someone out there can figure out a way to exploit this timing issue, and instead of submitting a unit test, they use it to steal other people's passwords.

As long as no-one can create one, then:

  1. It's possible that there is no attack, and that our current string-comparison strategy is secure, for some reason we don't understand.
  2. It's possible that the "fixed" version, by doing more work in Python, will leak timing information in some other way (again, for some reason we don't understand), and making this change will actually introduce a timing vulnerability. Given that the change does more operations in Python as opposed to C it is entirely plausible to me that it widens the window for timing attacks against some other object (namespace hashtables? I don't know).

So we currently require PoCs for security issues for the same reason that we require them for regular bugs: every change carries risk, and should be justified by demonstrable reward. In the case of security, the risk is that much greater. So, zooko, I understand the emotional appeal of "we should do something!" in the case where users might be at risk, but unless we can demonstrate that they are at risk, then we may be doing more harm than good.

But I'm open to being convinced, if you disagree with my assessment of possible risks or have some other objective model for evaluating the validity of an attack rather than a PoC. (Can we have some kind of mathematical proof of anything related to the execution semantics of Python? Is that even possible?)

comment:51 Changed 7 years ago by Glyph

Also, zooko: test-timing.py seems to have NameErrors and AttributeErrors and doesn't actually do anything when run (no __main__ block). Did you attach the wrong version?

comment:52 Changed 7 years ago by Glyph

For what it's worth I spent the better part of today trying to come up with a repeatable experiment, and the difference (if there is any) is way down below the threshold of noise, even if I disable frequency scaling and run tens of thousands of iterations per attempt, on both CPython and PyPy.

comment:53 Changed 7 years ago by zooko

I'm working on a PoC, but I really think that it should be fixed even if I can't come up with a working PoC. We know (from code inspection) that the timing information is there. So it is only a question of whether an attacker can exploit it. Requiring a PoC first is just saying "You have to be better at this than we are in order to harm our users." That's not a high-enough bar!

comment:54 Changed 7 years ago by Tomas Touceda

I'm also working on a PoC, and while I can reproduce it reliably with the first letter, the attack PoC fails from the second letter on. Here's basically what I'm doing:

max(min((timeit.timeit("a==b", setup="a='H'+'ello'*40; b=chr(%i)+'ello'*40" % c, number=10**7), chr(c)) for j in range(5)) for c in range(256))

comment:55 Changed 7 years ago by Jean-Paul Calderone

CPython 2.7 string equality is implemented like:

    if (op == Py_EQ) {
        /* Supporting Py_NE here as well does not save                                                                                                 
           much time, since Py_NE is rarely used.  */
        if (Py_SIZE(a) == Py_SIZE(b)
            && (a->ob_sval[0] == b->ob_sval[0]
            && memcmp(a->ob_sval, b->ob_sval, Py_SIZE(a)) == 0)) {
            result = Py_True;
        } else {
            result = Py_False;
        }
        goto out;
    }

Notice the special handling of the first byte and the use of memcmp. I think a common implementation of memcmp may be to compare bytes 4 (or even 8) at a time - so you may have to correctly guess blocks of 4 to see a timing difference. I haven't inspected any implementations of memcmp to verify this though.

comment:56 Changed 7 years ago by Tomas Touceda

Interesting, thanks for the insight. So... does that look like a proven timing leak to everyone? or just me? :)

comment:57 Changed 7 years ago by Jean-Paul Calderone

So... does that look like a proven timing leak to everyone? or just me? :)

Not to me, no. Without a demonstration that there's a leak there's still no way to evaluate a change to determine whether the leak has been fixed (or, say, made worse).

Requiring a PoC first is just saying "You have to be better at this than we are in order to harm our users." That's not a high-enough bar!

I'm not sure about this. Trying to fix it without a PoC seems much the same to me. If the fix is wrong and the attacker is smart enough to see it when we are not, the outcome is the same.

comment:58 in reply to:  57 Changed 7 years ago by Tomas Touceda

Replying to exarkun:

So... does that look like a proven timing leak to everyone? or just me? :)

Not to me, no. Without a demonstration that there's a leak there's still no way to evaluate a change to determine whether the leak has been fixed (or, say, made worse).

I showed you a PoC that clearly shows how Python leaks the first char (at least) in a == comparison. You showed the reason why that's the case for the first char... and you still say there is no PoC for a leak? I'm not following. Does it need to be big enough for this to be a security issue?

comment:59 in reply to:  53 Changed 7 years ago by Glyph

Replying to zooko:

I'm working on a PoC, but I really think that it should be fixed even if I can't come up with a working PoC. We know (from code inspection) that the timing information is there.

No, we don't. Timing information is leaked by hardware, not by algorithms. memcmp implementations differ, and it's possible that all the ones our users are practically using a memcmp implementation which does not leak for values within the range of the reasonable length of passwords. Maybe there's some crazy SIMD instruction that compares ten kilobytes at a time, maybe they're doing it on the GPU, nothing would surprise me at this point. CPUs are squirrely little bastards these days, and it seems entirely reasonable to me that even with a fairly straightforward byte- or word-at-a-time comparison algorithm, through a combination of instruction pipelining, variances in how microcode is executed, and noise introduced by operating system schedulers and network latency, there is no possible remote PoC.

So it is only a question of whether an attacker can exploit it. Requiring a PoC first is just saying "You have to be better at this than we are in order to harm our users." That's not a high-enough bar!

The thing is, if the attacker is better at this than we are, they do get to harm our users, whether we like it or not. They can as easily find a flaw that we can't recognize in the "fixed" version as they can find a flaw in the "broken" version that we can't prove.

For example, on Twitter you asked, I don't understand why not to do this, with a link to a constant_time_compare function within Tahoe-LAFS.

  • This calls os.urandom. Is entropy depletion a concern? If not, why not? Do all platforms promise that it won't be, both now and in the future?
  • Does SHA256 cache anything internally, so that hashing certain passwords might be faster than hashing others? Is promising never to do so part of its API contract?
  • This algorithm is still using Python string equality, and so it will still early out if the prefix of the hashes are the same. This is a pretty weak criticism, of course, because the hardness of finding common SHA hash prefixes is basically the entire foundation of bitcoin, and I'm sure those folks are smarter than I am.

But the point is, if we don't have a test-case that demonstrates that at even the problem we're trying to fix here is fixed, we're taking the risk of introducing other issues and maybe not even fixing the timing attack because we're adding some new timing channel we hadn't previously considered.

All we have a PoC for right now is that we're leaking timing information about the first letter of the password. That suggests to me that the safest fix is to do this:

def safeBytesCompare(a, b):
    return (b'\x00' + a) == (b'\x00' + b)

Now the first byte's always the same, and the only demonstrable attack has been defeated; we can much more plausibly estimate that + is not risky than we can about the entire attack surface of kernel urandom implementations, SHA256 and the Python wrappers for those things.

I'm still open to being convinced I'm in the wrong here, I am not a security expert, but just repeating "users are at risk, users are at risk" is not going to convince me, because empirically, science says that as far as we know, users are not at risk ("users are not at risk" is a falsifiable hypothesis and nobody's falsified it), and ethically, the precautionary principle says that the burden rests with those who say "we should do something" when both doing something and not-doing-something might be risky.

One thing that might be interesting to try would be to give this a spin on some alternate CPU architectures, like ARM, which, I am given to understand, are a bit more deterministic in their performance characteristics than Intel. Anybody have a raspberry pi or server-side ARM thing to look for a PoC on?

comment:60 Changed 7 years ago by Glyph

I've been corresponding with Nate Lawson (author of the articles referenced in the ticket's description). While he hasn't entirely managed to convince me that we should fix it without a PoC, he has both provided some thoughts and provoked some of my own:

  • We shouldn't be dealing with unicode and non-unicode in the same function. Encoding is potentially data-dependent, so that's just another place to leak some timing information. Instead, we should have a separate slowUnicodeCompare that does something with the unicode data structure. This could normalize just the attacker-controlled input, and require the data store to have pre-normalized its own input. At that point we should be able to just hash the underlying bytes without encoding, but as far as I can tell, py3 has taken away the ability to access those bytes because unicode objects implement the old, but not the new, buffer interface. Anyone know a way around that?
  • He recommended using an rdtsc instruction for doing timing rather than using a wall clock. Anyone tried that yet?
  • We support a fixed list of platforms, for which we have buildbots. We could potentially speed up the process of identifying any supported platforms with demonstrable timing leaks by running that test as part of a build of this branch.
  • If we're going to call os.urandom as zooko's implementation suggests, those bytes can be retrieved once on process startup. Given that sha hashing by itself ought to be constant time anyway, re-seeding the randomness on every comparison is overkill. (This significantly mitigates my entropy-leak concern.)
  • We could potentially instrument by counting instructions by tracing, rather than timing. Mr. Lawson suggested pintool for this purpose - anyone know how to use it?

Thanks everyone for your continued interest in making Twisted more secure.

comment:61 Changed 3 years ago by Glyph

Since this has come up again recently, it's worth pointing out that the right solution here might be to sidestep this issue entirely.

Secure (constant time) comparisons can only be done on secrets of a known length, which is why cryptography's bytes_eq and hmac.compare_digest only make promises about secrets of equivalent length. And there are numerous good reasons for that, but one is that if, at secret comparison time, you have to do data-dependent things based on both the secret and the input, you're in a bad situation.

The comparisons that are a problem here involve the credential checker having both the input _and the plain-text secret_ in memory at the same time. We could possibly eventually have a proof of concept that conclusively verifies that comparison is timing-dependent, but this buries the lede which is that in this case _we have a database full of plain text passwords_.

We should really have every checker insist upon a digest algorithm to be used when storing shared secrets at rest (and then we can rely on the existing research and tools related specifically to digest comparisons). The thing to do with plain-text checkers is to deprecate them entirely, not to try to make their comparisons better.

comment:62 Changed 3 years ago by Glyph

Oh also since I made some comments about "entropy depletion" earlier, I should say: that is a super dumb concern spurred on by the urban legends promulgated by the Linux /dev/urandom manpage. I was wrong. It is absolutely not a concern. See https://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

Note: See TracTickets for help on using tickets.