Opened 13 years ago

Closed 12 years ago

#3847 defect closed fixed (fixed)

twisted.words.test.test_xmlstream.XmlStreamTest.test_receiveBadXML fails with Python trunk@HEAD

Reported by: ivank Owned by:
Priority: normal Milestone: Python-2.7
Component: words Keywords:
Cc: Jean-Paul Calderone Branch: branches/jabber-hashpassword-3847
branch-diff, diff-cov, branch-cov, buildbot
Author: ralphm

Description

Python 2.7r72791:

[ERROR]: twisted.words.test.test_xmlstream.XmlStreamTest.test_receiveBadXML

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/words/protocols/jabber/client.py", line 119, in _cbAuthQuery
    digest = xmlstream.hashPassword(self.xmlstream.sid, password)
  File "/usr/local/lib/python2.7/site-packages/twisted/words/protocols/jabber/xmlstream.py", line 46, in hashPassword
    return sha1("%s%s" % (sid, password)).hexdigest()
exceptions.TypeError: Unicode-objects must be encoded before hashing

Attachments (1)

Twisted_jabber_xmlstream_ascii.patch (406 bytes) - added by ivank 12 years ago.

Download all attachments as: .zip

Change History (24)

comment:1 Changed 13 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added
Owner: changed from Jean-Paul Calderone to Ralph Meijer

Additionally, there seem to be no direct unit tests for hashPassword, and it doesn't document its parameters (in particular, it doesn't document whether they are to be str or unicode).

Changed 12 years ago by ivank

comment:2 Changed 12 years ago by ivank

The patch makes keeps supporting ascii-range user/password only (as before), except compatible with Python 2.7.

comment:3 Changed 12 years ago by Ralph Meijer

Status: newassigned

Note 7 in XEP-0078 reads:

In Digest authentication, .. non-US-ASCII characters MUST be encoded as UTF-8 since the SHA-1 hashing algorithm operates on byte arrays.

So, hashPassword should be modified to:

  • Document that it takes unicode parameters.
  • Encode those parameters to utf-8 prior to calculating the hash.

(Obviously the there must be tests for this as well)

comment:4 Changed 12 years ago by Ralph Meijer

Author: ralphm
Branch: branches/jabber-hashpassword-3847

(In [27712]) Branching to 'jabber-hashpassword-3847'

comment:5 Changed 12 years ago by Ralph Meijer

(In [27713]) Make sure input is encoded to UTF-8 before calculating hash.

Addresses #3741, #3742, #3847.

The fix for hashPassword was rather easy, but highlights issues in how the tests in test_jabberxmlstream work.

These tests work for the happy case, but exceptions can be raised asynchronously in XPath observers, causing spurious errors in unrelated tests (#3847) if the actual test also raises an exception causing the deferred to not be returned.

I addressed this by rewriting all of those tests so the handlers also trigger deferreds we can wait for.

comment:6 Changed 12 years ago by Ralph Meijer

Keywords: review added
Owner: Ralph Meijer deleted
Status: assignednew

According to http://buildbot.twistedmatrix.com/builders/ubuntu64-py2.7-select/builds/2, the changes in this branch indeed fix this ticket, and the related issues #3741 and #3742.

Please review.

comment:7 Changed 12 years ago by Jean-Paul Calderone

Owner: set to Jean-Paul Calderone
Status: newassigned

comment:8 Changed 12 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to Ralph Meijer
Status: assignednew

Thanks!

  1. All three modified files need to have their copyright date updated.
  2. As long as twisted.words.protocols.jabber.xmlstream.hashPassword is being modified, can you move the import to the top of the module?
  3. There's an unused import (reported by pyflakes) in twisted/words/protocols/jabber.py
  4. InitiatingInitializerHarness could use a class docstring
  5. IQAuthInitializerTest likewise
  6. It's not obvious what the comment at the top of onAuthGet in testFailAuth refers to
  7. BindInitializerTest is also missing a class docstring
  8. SessionInitializerTest as well
  9. A bunch of tests being modified are also not named according to the current naming convention. They also have somewhat non-optimal docstrings. For example, Test basic operations with plain text authentication. Set up a stream, and act as if authentication succeeds. The first sentence just restates what I learned from the test method name (although it does tell me that this is a test about authentication, which is a bit of new information). The second part is just implementation details of the test which would be more appropriate as comments in the method. What are "basic operations"? How is "plain text authentication" expected to work (it seems to involve sha1 - something one might not expect from something named "plain text authentication"!).
  10. Without the fix, twisted.words.test.test_jabberclient.IQAuthInitializerTest.testDigest ends up with two errors - or rather, one error twice. This isn't necessarily a problem, but it seems unusual.
  11. twisted.words.test.test_jabberxmlstream.HashPasswordTest.test_notUnicode failed on one of the builders, http://buildbot.twistedmatrix.com/builders/debian-py2.4-gtk2/builds/5/steps/gtk2/logs/problems
  12. Check out http://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles

comment:9 in reply to:  8 Changed 12 years ago by Ralph Meijer

Replying to exarkun:

Thanks!

  1. All three modified files need to have their copyright date updated.
  2. As long as twisted.words.protocols.jabber.xmlstream.hashPassword is being modified, can you move the import to the top of the module?

Done.

  1. There's an unused import (reported by pyflakes) in twisted/words/protocols/jabber.py

I assume you mean the import of STREAM_START_EVENT in twisted/words/protocols/jabber/xmlstream.py. This is for backwards compatibility and are used by external projects.

  1. InitiatingInitializerHarness could use a class docstring
  2. IQAuthInitializerTest likewise
  3. It's not obvious what the comment at the top of onAuthGet in testFailAuth refers to
  4. BindInitializerTest is also missing a class docstring
  5. SessionInitializerTest as well

Fixed.

  1. A bunch of tests being modified are also not named according to the current naming convention. They also have somewhat non-optimal docstrings. For example, Test basic operations with plain text authentication. Set up a stream, and act as if authentication succeeds. The first sentence just restates what I learned from the test method name (although it does tell me that this is a test about authentication, which is a bit of new information). The second part is just implementation details of the test which would be more appropriate as comments in the method. What are "basic operations"? How is "plain text authentication" expected to work (it seems to involve sha1 - something one might not expect from something named "plain text authentication"!).

I tried to document the whole test class more extensively. You probably misread the code for testBasic (now test_plainText), it does not mention sha1, unlike test_digest.

  1. Without the fix, twisted.words.test.test_jabberclient.IQAuthInitializerTest.testDigest ends up with two errors - or rather, one error twice. This isn't necessarily a problem, but it seems unusual.

Yes, this is an interesting interaction between the errbacked deferred returned from the observer being chained to d1 and d2. There is no harm in it I think. It could be resolved by having DeferredList eat secondary failures, but that might hide other future issues.

  1. twisted.words.test.test_jabberxmlstream.HashPasswordTest.test_notUnicode failed on one of the builders, http://buildbot.twistedmatrix.com/builders/debian-py2.4-gtk2/builds/5/steps/gtk2/logs/problems

I'm not sure why this happens. It could be that older python versions also accepted utf-8 encoded input to be hashed. If that's the case, I am not sure how to adjust the test. Bah. I'll look into it.

  1. Check out http://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles

Do I need to create just one file for this ticket, or also similar ones for the other tickets that this branch solves?

comment:10 Changed 12 years ago by Ralph Meijer

(In [27727]) Address review comments.

Re #3847.

comment:11 Changed 12 years ago by Jean-Paul Calderone

I assume you mean the import of STREAM_START_EVENT in twisted/words/protocols/jabber/xmlstream.py. This is for backwards compatibility and are used by external projects.

Yep. If you add the name to __all__ then (new enough versions of) pyflakes won't complain about it (and presumably human readers will have an easier time of it too :).

I'm not sure why this happens. It could be that older python versions also accepted utf-8 encoded input to be hashed. If that's the case, I am not sure how to adjust the test. Bah. I'll look into it.

Did you mean ... also accepted unicode input to be hashed.? If so, that seems likely. I guess doing an explicit type check in hashPassword would be the solution, in that case.

Do I need to create just one file for this ticket, or also similar ones for the other tickets that this branch solves?

Make one for each ticket that is going to be closed when you merge the branch.

comment:12 Changed 12 years ago by Ralph Meijer

(In [27772]) Explicitly test for unicode as input for hashPassword, add all.

Re #3741, #3742, #3847.

comment:13 in reply to:  11 Changed 12 years ago by Ralph Meijer

Keywords: review added
Owner: Ralph Meijer deleted

Replying to exarkun:

I assume you mean the import of STREAM_START_EVENT in twisted/words/protocols/jabber/xmlstream.py. This is for backwards compatibility and are used by external projects.

Yep. If you add the name to __all__ then (new enough versions of) pyflakes won't complain about it (and presumably human readers will have an easier time of it too :).

I added __all__ for this.

I'm not sure why this happens. It could be that older python versions also accepted utf-8 encoded input to be hashed. If that's the case, I am not sure how to adjust the test. Bah. I'll look into it.

Did you mean ... also accepted unicode input to be hashed.? If so, that seems likely. I guess doing an explicit type check in hashPassword would be the solution, in that case.

No, I actually meant utf-8 encoded input. I couldn't reproduce this until it occurred to me that if you have a Python Unicode build, with the locale set up for UTF-8, UTF-8 will be accepted when formatting to unicode.

I have now changed hashPassword to only accept unicode, or raise TypeError. I also adjusted the code using hashPassword to convert their input to unicode, to not break existing code that passes (ASCII-7) strings as secrets.

Do I need to create just one file for this ticket, or also similar ones for the other tickets that this branch solves?

Make one for each ticket that is going to be closed when you merge the branch.

Done.

Please review.

comment:14 Changed 12 years ago by Jean-Paul Calderone

No, I actually meant utf-8 encoded input. I couldn't reproduce this until it occurred to me that if you have a Python Unicode build, with the locale set up for UTF-8, UTF-8 will be accepted when formatting to unicode.

This is absolutely incomprehensible to me. :/

  • "utf-8 encoded input" is a byte string (type str). When is str not hashable?
  • What is a "Python Unicode build"?
  • Why is the locale relevant at all? There's no locale usage here. Are you talking about the return value of sys.getdefaultencoding()?
  • What does "formatting to unicode" mean?

comment:15 in reply to:  14 Changed 12 years ago by Ralph Meijer

Replying to exarkun:

No, I actually meant utf-8 encoded input. I couldn't reproduce this until it occurred to me that if you have a Python Unicode build, with the locale set up for UTF-8, UTF-8 will be accepted when formatting to unicode.

This is absolutely incomprehensible to me. :/

Hehe.

  • "utf-8 encoded input" is a byte string (type str). When is str not hashable?

It is always, that's why the test in http://buildbot.twistedmatrix.com/builders/debian-py2.4-gtk2/builds/5/steps/gtk2/logs/problems failed: it didn't raise a UnicodeError where I did expect it in the previous incarnation of the tests.

  • What is a "Python Unicode build"?
  • Why is the locale relevant at all? There's no locale usage here. Are you talking about the return value of sys.getdefaultencoding()?
  • What does "formatting to unicode" mean?

In sane installs of Python, this happens:

>>> u"%s%s" % ("123", "bla\xc3\xa9")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

However, the infamous site.py and sitecustomize.py allow one to make the default encoding different from 'ascii' through sys.setdefaultencoding, which is conveniently removed after use. This is a horrible practice, but I suspect that this has happened on this particular build slave. I am pretty sure that sys.getdefaultencoding returns 'utf-8' there. With the del in site.py removed, you can do this:

>>> import sys
>>> sys.setdefaultencoding('utf-8')
>>> u"%s%s" % ("123", "bla\xc3\xa9")
u'123bla\xe9'

This makes the test fail by not throwing UnicodeError.

All of this is now avoided by explicitly requiring unicode inputs.

comment:16 Changed 12 years ago by Jean-Paul Calderone

yes - gtk sets the default encoding to utf-8. Just to be clear though, this has nothing to do with "locales" - those are something else.

comment:17 Changed 12 years ago by Ralph Meijer

No, sure. The comment about locales came from my first glance at site.py. On another look, I saw that all that code has if 0: around it, so please disregard any of that :-)

comment:18 Changed 12 years ago by TimAllen

Owner: set to TimAllen

comment:19 Changed 12 years ago by TimAllen

Keywords: review removed
Owner: changed from TimAllen to Ralph Meijer
  1. In twisted/words/test/test_jabberclient.py, the method InitiatingInitializerHarness.gatherResults is mostly identical to defer.gatherResults() with a little bit tacked on the end. Maybe it should call that function, and/or be renamed to something different, like gatherResultsOrFailure?
  2. In IQAuthInitializerTest.testDigest(), the code does "self.assertEquals(sha1('12345secret').hexdigest(), unicode(iq.query.digest))". I'm pretty sure that .hexdigest() will return a byte-sequence, not a unicode object., so you shouldn't need to cast the expected digest value.
  3. I realise you didn't write these, but in BindInitializerTest both tests have a description of "Test basic operations." The second line of each docstring would make a better docstring, I think.
  4. Likewise, in SessionInitializerTest both tests have a description of "Test basic operations." - but SessionInitializerTest.testSuccess() mentions "resource binding", while .testFailure() claims to test whether session establishment "succeeds."

comment:20 in reply to:  19 Changed 12 years ago by Ralph Meijer

Keywords: review added
Owner: Ralph Meijer deleted

Replying to TimAllen:

  1. In twisted/words/test/test_jabberclient.py, the method InitiatingInitializerHarness.gatherResults is mostly identical to defer.gatherResults() with a little bit tacked on the end. Maybe it should call that function, and/or be renamed to something different, like gatherResultsOrFailure?

I opted for just using gatherResults. The look-alike was mostly instrumental while developing the new tests.

  1. In IQAuthInitializerTest.testDigest(), the code does "self.assertEquals(sha1('12345secret').hexdigest(), unicode(iq.query.digest))". I'm pretty sure that .hexdigest() will return a byte-sequence, not a unicode object., so you shouldn't need to cast the expected digest value.

The use of unicode() here is to extract the CDATA from the element, much like XPath's string() function. For ultimate correctness, I encode this to 'utf-8' before comparison now. Note that using str() wouldn't work, as this returns unicode objects, unfortunately. Backwards compatibility prevents us from fixing that any time soon.

  1. I realise you didn't write these, but in BindInitializerTest both tests have a description of "Test basic operations." The second line of each docstring would make a better docstring, I think.

I did write these and corrected them.

  1. Likewise, in SessionInitializerTest both tests have a description of "Test basic operations." - but SessionInitializerTest.testSuccess() mentions "resource binding", while .testFailure() claims to test whether session establishment "succeeds."

Likewise, I fixed these.

Please review.

comment:21 Changed 12 years ago by TimAllen

Keywords: review removed
Owner: set to Ralph Meijer

Looks good to me. Ready for merge!

comment:22 Changed 12 years ago by Ralph Meijer

Resolution: fixed
Status: newclosed

(In [27896]) Fix failing tests for Python 2.7, allow non-ascii secrets in hashPassword.

Author: ralphm. Reviewer: exarkun, TimAllen. Fixes #3741, #3742, #3847.

This lets twisted.words.protocols.jabber.xmlstream.hashPassword now only accept unicode as input. The input before hashing is encoded to UTF-8. Tests for hashPassword were added and a number of tests were rewritten to deal with asynchronous failures.

comment:23 Changed 11 years ago by <automation>

Owner: Ralph Meijer deleted
Note: See TracTickets for help on using tickets.