Opened 9 years ago

Closed 7 years ago

#5066 defect closed fixed (fixed)

Jabber SASL can't login with password which contains non-ASCII chars

Reported by: Sergey Owned by: ralphm
Priority: highest Milestone:
Component: words Keywords:
Cc: ralphm Branch: branches/xmpp-sasl-non-ascii-5066
branch-diff, diff-cov, branch-cov, buildbot
Author: ralphm

Description

If I trying to connect to my jabber-account where password contains non-ASCII characters I get the following error:

 File "/usr/lib64/python2.7/site-packages/twisted/words/protocols/jabber/sasl_mechanisms.py", line 218, in _gen_response
    a1 = "%s:%s:%s" % (H("%s:%s:%s" % (username, realm, password)),
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

It may seems to be strange because username and password are correctly encoded into "utf-8" with some lines before but realm is still in unicode and the whole string became to be unicode and exceptions is got.

To resolve this issue we must just encode realm too:

http://twistedmatrix.com/trac/browser/trunk/twisted/words/protocols/jabber/sasl_mechanisms.py#L207

after this line add: realm = realm.encode(charset)

Attachments (4)

5066_patch.diff (1.9 KB) - added by Sergey 8 years ago.
5066_patch_2.diff (5.0 KB) - added by Sergey 8 years ago.
The second try
5066_patch_3.diff (4.9 KB) - added by Sergey 8 years ago.
5066_patch_4.diff (5.7 KB) - added by Dmitry 8 years ago.

Download all attachments as: .zip

Change History (30)

comment:1 Changed 9 years ago by DefaultCC Plugin

Cc: ralphm added

comment:2 Changed 8 years ago by Sergey

Priority: highhighest

comment:3 Changed 8 years ago by Jean-Paul Calderone

Owner: set to Sergey

Changed 8 years ago by Sergey

Attachment: 5066_patch.diff added

comment:4 Changed 8 years ago by Sergey

Keywords: review added
Owner: Sergey deleted

comment:5 Changed 8 years ago by ralphm

Keywords: xmpp jabber sasl unicode encode realm removed
Owner: set to ralphm
Status: newassigned

comment:6 Changed 8 years ago by ralphm

Thanks for looking into this! I am in conversation with some people in the XMPP Standards Foundation on this. The problem is that I'm not sure if digest-uri and realm may be directly encoded to the encoding or that the domain labels in them must be in Punycode. The various specs (like rfc:2831) are not entirely clear on this, unfortunately.

comment:7 Changed 8 years ago by Sergey

I think that punycode would not be used anywhere except of DNS... But, sure thing, it would be better to check first.

Thanks for your work! Will wait about any progress.

comment:8 Changed 8 years ago by ralphm

Keywords: review removed
Owner: changed from ralphm to Sergey
Status: assignednew

So it turns out this is not a problem, most implementations seem to just encode it. My review notes:

  • Make it explicit in DigestMD5 that the __init__ parameters are supposed to be unicode objects (by documenting them) and make the format strings there unicode strings, too.
  • About: test_getResponseUnicode.
    • It lacks a docstring.
    • The new test doesn't actually test that those two parameters are in fact encoded. I.e. the only thing that is tested is that a unicode password now doesn't break anything. Could you add some tests around that?
  • Could you add a news file?

Thanks!

Changed 8 years ago by Sergey

Attachment: 5066_patch_2.diff added

The second try

comment:9 Changed 8 years ago by Sergey

Keywords: review added
Owner: changed from Sergey to ralphm

Tried to follow your suggestions, please take a look.

I don't actually get how the NEWS file should look, but I can try if patch itself is good :)

comment:10 Changed 8 years ago by ralphm

Author: ralphm
Branch: branches/xmpp-sasl-non-ascii-5066

(In [35331]) Branching to 'xmpp-sasl-non-ascii-5066'

comment:11 Changed 8 years ago by ralphm

Keywords: review removed
Owner: changed from ralphm to Sergey

I applied both patches to source:branches/xmpp-sasl-non-ascii-5066.

My comments:

  • I'd prefer the new test to be put at the bottom of the other ones testing getResponse (i.e. just before test__parse).
  • The change to a unicode string for the nested function KD is wrong as it calls H on it, and that in turn calls md5, which expects a bytes string.
  • The indent in the assignment for reponse was changed. This makes second argument to KD no longer line up below the first. I think it is easier to read that way, even though the spaces after the parentheses are not conforming to pep:8. It could be split up, but this is more or less literally how it is written in rfc:2831.
  • The duplication of the code from _gen_response in test_getResponseUnicode to compare the response parameter is undesirable. It doesn't sufficiently test _gen_response itself and I'd rather see that in a set of additional tests. E.g. it doesn't address the note on having to re-encode to ISO-8859-1 if all characters are in that set, even though 'UTF-8' was specified. There are more such horrible details, and this causes interoperability issues all over the place anyway. Don't waste your time with it, especially since Digest-MD5 has been obsoleted over a year ago (see rfc:6331). Or at least not as part of this ticket. For this test, you should just encode the results yourself and put the outcome in the test as literals, for both encodings.
  • See here for the documentation on writing ReviewProcess

Changed 8 years ago by Sergey

Attachment: 5066_patch_3.diff added

comment:12 Changed 8 years ago by Sergey

Keywords: review added
Owner: changed from Sergey to ralphm

Fixed all of these except of third because can't get it. I can't be sure that password was properly encoded anyway but calculate it myself, no?

comment:13 in reply to:  12 ; Changed 8 years ago by ralphm

Keywords: review removed
Owner: changed from ralphm to Sergey

Replying to binary:

Fixed all of these ..

Thanks!

.. except of third because can't get it. I can't be sure that password was properly encoded anyway but calculate it myself, no?

The point I was trying to make is that the code in the test is a literal copy from the actual implementation that is being tested, including a comment. I suggested replacing it with a literal value, but you are right that you'd have to calculate it yourself. That may also not be the best thing to do.

It basically comes down to this: for every piece of code that we change in Twisted, we like it to have high quality tests. Doing that for the Digest-MD5 SASL mechanism would require quite some more tests, and even more changes to the actual implementation. Even if you would eventually come up with a reasonable solution, it might not actually interoperate with other implementations. The list of reasons in rfc:6331 are the sad realization of this.

So really I'd not encourage you to do all that, unless you really really need to. Might I suggest you work on adding a SCRAM implementation instead (in another ticket)? See XMPP Core section 13.8.1 and SCRAM for details. There's also this nice whitepaper on SCRAM by a prominent XMPP server implementation.

Coming back to this ticket, I think it would be sufficient to test that the directives username, password and digest-uri are properly encoded and leave response untested.

If you really want to test them anyway, convince yourself that the current implementation is in fact correct (for these values) and check against the literal values 0a810f0437e4eeada7e403bcea3f772c and 4fd4b1d56845219cb4cb8dd13eaef214 respectively. Fortunately your choice of U+0418 CYRILLIC CAPITAL LETTER I completely dances around the ISO-8859-1 issue I mentioned.

comment:14 Changed 8 years ago by ralphm

I meant realm' instead of password`, of course.

comment:15 in reply to:  13 ; Changed 8 years ago by Sergey

Owner: changed from Sergey to ralphm

Replying to ralphm:

Replying to binary:

Fixed all of these ..

Thanks!

Thank you too! :)

So really I'd not encourage you to do all that, unless you really really need to. Might I suggest you work on adding a SCRAM implementation instead (in another ticket)? See XMPP Core section 13.8.1 and SCRAM for details. There's also this nice whitepaper on SCRAM by a prominent XMPP server implementation.

I can try but not immediately because I am planning to go to a vacation soon, but that's an interesting task.

Coming back to this ticket, I think it would be sufficient to test that the directives username, password and digest-uri are properly encoded and leave response untested.

If you really want to test them anyway, convince yourself that the current implementation is in fact correct (for these values) and check against the literal values 0a810f0437e4eeada7e403bcea3f772c and 4fd4b1d56845219cb4cb8dd13eaef214 respectively. Fortunately your choice of U+0418 CYRILLIC CAPITAL LETTER I completely dances around the ISO-8859-1 issue I mentioned.

But that's impossible to hardcode response values there because they are depend on random generated cnonce.

comment:16 in reply to:  15 ; Changed 8 years ago by ralphm

Owner: changed from ralphm to Sergey

Replying to binary:

Replying to ralphm:

Coming back to this ticket, I think it would be sufficient to test that the directives username, realm and digest-uri are properly encoded and leave response untested.

If you really want to test them anyway, convince yourself that the current implementation is in fact correct (for these values) and check against the literal values …

But that's impossible to hardcode response values there because they are depend on random generated cnonce.

Oh, of course. You have three options here:

  • Monkey patch _gen_nonce.
  • Factor out the core of this function in a new one that just calculates the response given nonce, cnonce, nc and the encoded username, realm, password and digest_uri parameters.

This makes it possible to test the encoding and unparsing separately from the actual response generation. For testing _gen_response you'd then override the new function. I believe it to be more proper to test the outcome of this new function against pre-calculated values, if only as a sanity check. You could use the example from RFC 2831 section 4 (and mention this in the docstring of the new test).

  • Not testing response at all.

comment:17 in reply to:  16 Changed 8 years ago by Sergey

Replying to ralphm:

Replying to binary:

Replying to ralphm:

Coming back to this ticket, I think it would be sufficient to test that the directives username, realm and digest-uri are properly encoded and leave response untested.

If you really want to test them anyway, convince yourself that the current implementation is in fact correct (for these values) and check against the literal values …

But that's impossible to hardcode response values there because they are depend on random generated cnonce.

Oh, of course. You have three options here:

  • Monkey patch _gen_nonce.
  • Factor out the core of this function in a new one that just calculates the response given nonce, cnonce, nc and the encoded username, realm, password and digest_uri parameters.

This makes it possible to test the encoding and unparsing separately from the actual response generation. For testing _gen_response you'd then override the new function. I believe it to be more proper to test the outcome of this new function against pre-calculated values, if only as a sanity check. You could use the example from RFC 2831 section 4 (and mention this in the docstring of the new test).

  • Not testing response at all.

Can you provide me with an advice how to do that better? :) The problem is being to be unsolved for a too long time, I think but I don't really understand what is the problem with the current patch.

Changed 8 years ago by Dmitry

Attachment: 5066_patch_4.diff added

comment:18 Changed 8 years ago by Sergey

Keywords: review added
Owner: changed from Sergey to ralphm

The next try.

comment:19 Changed 8 years ago by ralphm

Keywords: review removed

I applied the new patch to the branch, and forced a build. Unfortunately it fails. From the looks of it, this is because the arguments to DigestMD5 in this test are already encoded, where its __init__ assumes them to be unicode strings. I'll prepare a fix for it.

comment:20 Changed 8 years ago by ralphm

(In [37606]) Fix failing test, add tests for unicode domains and remove unused imports.

Re: #5066.

comment:21 Changed 8 years ago by ralphm

Keywords: review added
Owner: ralphm deleted

Forced another build with my changes. I also did some test by actually connecting to servers, with and without IDNs, and that seemed to be working as expected.

binary, could you check that my changes work nicely for your use, too?

Otherwise, I think this is ready for review, but with my changes I disqualified myself from doing that.

comment:22 Changed 8 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to ralphm

Thanks!

  1. in the tests:
    1. in test_getResponseNonAsciiRealm, can you write the assertions about directives using a single assertEqual call? That way if multiple items mismatch, a single test failure will report all of them.
    2. Can you expand the test_getResponseNonAsciiRealm docstring a little bit (ie, what aspect of the response is interesting in the case where the realm has non-ascii in it)? Also, sadly, I think it is "An encoded" rather than "A encoded". ;)
    3. The docstring test_getResponseNoRealmIDN claims something "works", but doesn't explain what that means. Can you expand it as well? Also, I'm confused why it says something is happening "without realm". "realm" seems to be the same as "host", and the test does supply a host to DigestMD5.
    4. test_calculate_response doesn't follow the naming convention (because _calculate_response doesn't follow it, I guess :/)
    5. and its docstring could be cleaned up a bit (in particular, it is usually a good idea to avoid having the active verb be "test").
    6. test_calculate_response also needs some more whitespace following it to separate it from test__parse.
  2. in the implementation:
    1. not introduced in this branch, but serv_type, serv_name, digest_uri are all named against the coding standard :/
    2. The DigestMD5.__init__ parameters should have more than just @type I suppose
    3. Not mandatory, but it might be nice to use b"" for the definitely-these-are-bytes literals in the code being changed/added (instead of "")
    4. _calculate_response and _gen_response should document their return value/type at least, if not also all their parameters
    5. The new realm = realm line in _gen_response is probably not needed
  3. Can you add a link to the relevant specification (or if there is none, an archive of a mailing list post?) so it's clear what this code is trying to accomplish?

These basically seem to be cosmetic issues. Please go ahead and merge after addressing them and forcing a build with green results.

comment:23 Changed 7 years ago by Jean-Paul Calderone

(In [41290]) Address test-related review comments.

refs #5066

comment:24 Changed 7 years ago by Jean-Paul Calderone

(In [41291]) Address implementation-related review comments.

refs #5066

comment:25 Changed 7 years ago by Jean-Paul Calderone

(In [41293]) twistedchecker fixes

refs #5066

comment:26 Changed 7 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [41294]) Merge xmpp-sasl-non-ascii-5066

Author: binary, ralphm, exarkun Reviewer: ralphm, exarkun Fixes: #5066

Add necessary encoding logic to the DigestMD5 SASL mechanism in twisted.words.protocols.jabber.sasl_mechanisms so that it supports unicode arguments and encoded non-ASCII challenges.

Note: See TracTickets for help on using tickets.