Opened 12 years ago
Closed 11 years ago
#3847 defect closed fixed (fixed)
twisted.words.test.test_xmlstream.XmlStreamTest.test_receiveBadXML fails with Python trunk@HEAD
Reported by: | ivank | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | Python-2.7 |
Component: | words | Keywords: | |
Cc: | Jean-Paul Calderone | Branch: |
branches/jabber-hashpassword-3847
branch-diff, diff-cov, branch-cov, buildbot |
Author: | ralphm |
Description
Python 2.7r72791:
[ERROR]: twisted.words.test.test_xmlstream.XmlStreamTest.test_receiveBadXML Traceback (most recent call last): File "/usr/local/lib/python2.7/site-packages/twisted/words/protocols/jabber/client.py", line 119, in _cbAuthQuery digest = xmlstream.hashPassword(self.xmlstream.sid, password) File "/usr/local/lib/python2.7/site-packages/twisted/words/protocols/jabber/xmlstream.py", line 46, in hashPassword return sha1("%s%s" % (sid, password)).hexdigest() exceptions.TypeError: Unicode-objects must be encoded before hashing
Attachments (1)
Change History (24)
comment:1 Changed 12 years ago by
Cc: | Jean-Paul Calderone added |
---|---|
Owner: | changed from Jean-Paul Calderone to ralphm |
Changed 12 years ago by
Attachment: | Twisted_jabber_xmlstream_ascii.patch added |
---|
comment:2 Changed 12 years ago by
The patch makes keeps supporting ascii-range user/password only (as before), except compatible with Python 2.7.
comment:3 Changed 12 years ago by
Status: | new → assigned |
---|
Note 7 in XEP-0078 reads:
In Digest authentication, .. non-US-ASCII characters MUST be encoded as UTF-8 since the SHA-1 hashing algorithm operates on byte arrays.
So, hashPassword
should be modified to:
- Document that it takes
unicode
parameters. - Encode those parameters to utf-8 prior to calculating the hash.
(Obviously the there must be tests for this as well)
comment:4 Changed 11 years ago by
Author: | → ralphm |
---|---|
Branch: | → branches/jabber-hashpassword-3847 |
(In [27712]) Branching to 'jabber-hashpassword-3847'
comment:5 Changed 11 years ago by
(In [27713]) Make sure input is encoded to UTF-8 before calculating hash.
Addresses #3741, #3742, #3847.
The fix for hashPassword
was rather easy, but highlights issues in how the
tests in test_jabberxmlstream
work.
These tests work for the happy case, but exceptions can be raised asynchronously in XPath observers, causing spurious errors in unrelated tests (#3847) if the actual test also raises an exception causing the deferred to not be returned.
I addressed this by rewriting all of those tests so the handlers also trigger deferreds we can wait for.
comment:6 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | ralphm deleted |
Status: | assigned → new |
According to http://buildbot.twistedmatrix.com/builders/ubuntu64-py2.7-select/builds/2, the changes in this branch indeed fix this ticket, and the related issues #3741 and #3742.
Please review.
comment:7 Changed 11 years ago by
Owner: | set to Jean-Paul Calderone |
---|---|
Status: | new → assigned |
comment:8 follow-up: 9 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from Jean-Paul Calderone to ralphm |
Status: | assigned → new |
Thanks!
- All three modified files need to have their copyright date updated.
- As long as
twisted.words.protocols.jabber.xmlstream.hashPassword
is being modified, can you move the import to the top of the module? - There's an unused import (reported by pyflakes) in
twisted/words/protocols/jabber.py
InitiatingInitializerHarness
could use a class docstringIQAuthInitializerTest
likewise- It's not obvious what the comment at the top of
onAuthGet
intestFailAuth
refers to BindInitializerTest
is also missing a class docstringSessionInitializerTest
as well- A bunch of tests being modified are also not named according to the current naming convention. They also have somewhat non-optimal docstrings. For example, Test basic operations with plain text authentication. Set up a stream, and act as if authentication succeeds. The first sentence just restates what I learned from the test method name (although it does tell me that this is a test about authentication, which is a bit of new information). The second part is just implementation details of the test which would be more appropriate as comments in the method. What are "basic operations"? How is "plain text authentication" expected to work (it seems to involve sha1 - something one might not expect from something named "plain text authentication"!).
- Without the fix,
twisted.words.test.test_jabberclient.IQAuthInitializerTest.testDigest
ends up with two errors - or rather, one error twice. This isn't necessarily a problem, but it seems unusual. twisted.words.test.test_jabberxmlstream.HashPasswordTest.test_notUnicode
failed on one of the builders, http://buildbot.twistedmatrix.com/builders/debian-py2.4-gtk2/builds/5/steps/gtk2/logs/problems- Check out http://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles
comment:9 Changed 11 years ago by
Replying to exarkun:
Thanks!
- All three modified files need to have their copyright date updated.
- As long as
twisted.words.protocols.jabber.xmlstream.hashPassword
is being modified, can you move the import to the top of the module?
Done.
- There's an unused import (reported by pyflakes) in
twisted/words/protocols/jabber.py
I assume you mean the import of STREAM_START_EVENT
in twisted/words/protocols/jabber/xmlstream.py
. This is for backwards compatibility and are used by external projects.
InitiatingInitializerHarness
could use a class docstringIQAuthInitializerTest
likewise- It's not obvious what the comment at the top of
onAuthGet
intestFailAuth
refers toBindInitializerTest
is also missing a class docstringSessionInitializerTest
as well
Fixed.
- A bunch of tests being modified are also not named according to the current naming convention. They also have somewhat non-optimal docstrings. For example, Test basic operations with plain text authentication. Set up a stream, and act as if authentication succeeds. The first sentence just restates what I learned from the test method name (although it does tell me that this is a test about authentication, which is a bit of new information). The second part is just implementation details of the test which would be more appropriate as comments in the method. What are "basic operations"? How is "plain text authentication" expected to work (it seems to involve sha1 - something one might not expect from something named "plain text authentication"!).
I tried to document the whole test class more extensively. You probably misread the code for testBasic
(now test_plainText
), it does not mention sha1, unlike test_digest
.
- Without the fix,
twisted.words.test.test_jabberclient.IQAuthInitializerTest.testDigest
ends up with two errors - or rather, one error twice. This isn't necessarily a problem, but it seems unusual.
Yes, this is an interesting interaction between the errbacked deferred returned from the observer being chained to d1
and d2
. There is no harm in it I think. It could be resolved by having DeferredList eat secondary failures, but that might hide other future issues.
twisted.words.test.test_jabberxmlstream.HashPasswordTest.test_notUnicode
failed on one of the builders, http://buildbot.twistedmatrix.com/builders/debian-py2.4-gtk2/builds/5/steps/gtk2/logs/problems
I'm not sure why this happens. It could be that older python versions also accepted utf-8 encoded input to be hashed. If that's the case, I am not sure how to adjust the test. Bah. I'll look into it.
Do I need to create just one file for this ticket, or also similar ones for the other tickets that this branch solves?
comment:11 follow-up: 13 Changed 11 years ago by
I assume you mean the import of STREAM_START_EVENT in twisted/words/protocols/jabber/xmlstream.py. This is for backwards compatibility and are used by external projects.
Yep. If you add the name to __all__
then (new enough versions of) pyflakes won't complain about it (and presumably human readers will have an easier time of it too :).
I'm not sure why this happens. It could be that older python versions also accepted utf-8 encoded input to be hashed. If that's the case, I am not sure how to adjust the test. Bah. I'll look into it.
Did you mean ... also accepted unicode input to be hashed.? If so, that seems likely. I guess doing an explicit type check in hashPassword
would be the solution, in that case.
Do I need to create just one file for this ticket, or also similar ones for the other tickets that this branch solves?
Make one for each ticket that is going to be closed when you merge the branch.
comment:12 Changed 11 years ago by
comment:13 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | ralphm deleted |
Replying to exarkun:
I assume you mean the import of STREAM_START_EVENT in twisted/words/protocols/jabber/xmlstream.py. This is for backwards compatibility and are used by external projects.
Yep. If you add the name to
__all__
then (new enough versions of) pyflakes won't complain about it (and presumably human readers will have an easier time of it too :).
I added __all__
for this.
I'm not sure why this happens. It could be that older python versions also accepted utf-8 encoded input to be hashed. If that's the case, I am not sure how to adjust the test. Bah. I'll look into it.
Did you mean ... also accepted unicode input to be hashed.? If so, that seems likely. I guess doing an explicit type check in
hashPassword
would be the solution, in that case.
No, I actually meant utf-8 encoded input. I couldn't reproduce this until it occurred to me that if you have a Python Unicode build, with the locale set up for UTF-8, UTF-8 will be accepted when formatting to unicode.
I have now changed hashPassword
to only accept unicode, or raise TypeError
. I also adjusted the code using hashPassword
to convert their input to unicode, to not break existing code that passes (ASCII-7) strings as secrets.
Do I need to create just one file for this ticket, or also similar ones for the other tickets that this branch solves?
Make one for each ticket that is going to be closed when you merge the branch.
Done.
Please review.
comment:14 follow-up: 15 Changed 11 years ago by
No, I actually meant utf-8 encoded input. I couldn't reproduce this until it occurred to me that if you have a Python Unicode build, with the locale set up for UTF-8, UTF-8 will be accepted when formatting to unicode.
This is absolutely incomprehensible to me. :/
- "utf-8 encoded input" is a byte string (type
str
). When isstr
not hashable? - What is a "Python Unicode build"?
- Why is the locale relevant at all? There's no locale usage here. Are you talking about the return value of sys.getdefaultencoding()?
- What does "formatting to unicode" mean?
comment:15 Changed 11 years ago by
Replying to exarkun:
No, I actually meant utf-8 encoded input. I couldn't reproduce this until it occurred to me that if you have a Python Unicode build, with the locale set up for UTF-8, UTF-8 will be accepted when formatting to unicode.
This is absolutely incomprehensible to me. :/
Hehe.
- "utf-8 encoded input" is a byte string (type
str
). When isstr
not hashable?
It is always, that's why the test in http://buildbot.twistedmatrix.com/builders/debian-py2.4-gtk2/builds/5/steps/gtk2/logs/problems failed: it didn't raise a UnicodeError
where I did expect it in the previous incarnation of the tests.
- What is a "Python Unicode build"?
- Why is the locale relevant at all? There's no locale usage here. Are you talking about the return value of sys.getdefaultencoding()?
- What does "formatting to unicode" mean?
In sane installs of Python, this happens:
>>> u"%s%s" % ("123", "bla\xc3\xa9") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
However, the infamous site.py
and sitecustomize.py
allow one to make the default encoding different from 'ascii'
through sys.setdefaultencoding
, which is conveniently removed after use. This is a horrible practice, but I suspect that this has happened on this particular build slave. I am pretty sure that sys.getdefaultencoding
returns 'utf-8'
there. With the del
in site.py
removed, you can do this:
>>> import sys >>> sys.setdefaultencoding('utf-8') >>> u"%s%s" % ("123", "bla\xc3\xa9") u'123bla\xe9'
This makes the test fail by not throwing UnicodeError
.
All of this is now avoided by explicitly requiring unicode
inputs.
comment:16 Changed 11 years ago by
yes - gtk sets the default encoding to utf-8. Just to be clear though, this has nothing to do with "locales" - those are something else.
comment:17 Changed 11 years ago by
No, sure. The comment about locales came from my first glance at site.py. On another look, I saw that all that code has if 0:
around it, so please disregard any of that :-)
comment:18 Changed 11 years ago by
Owner: | set to TimAllen |
---|
comment:19 follow-up: 20 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from TimAllen to ralphm |
- In
twisted/words/test/test_jabberclient.py
, the methodInitiatingInitializerHarness.gatherResults
is mostly identical todefer.gatherResults()
with a little bit tacked on the end. Maybe it should call that function, and/or be renamed to something different, likegatherResultsOrFailure
? - In
IQAuthInitializerTest.testDigest()
, the code does "self.assertEquals(sha1('12345secret').hexdigest(), unicode(iq.query.digest))
". I'm pretty sure that.hexdigest()
will return a byte-sequence, not a unicode object., so you shouldn't need to cast the expected digest value. - I realise you didn't write these, but in
BindInitializerTest
both tests have a description of "Test basic operations." The second line of each docstring would make a better docstring, I think. - Likewise, in
SessionInitializerTest
both tests have a description of "Test basic operations." - butSessionInitializerTest.testSuccess()
mentions "resource binding", while.testFailure()
claims to test whether session establishment "succeeds."
comment:20 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | ralphm deleted |
Replying to TimAllen:
- In
twisted/words/test/test_jabberclient.py
, the methodInitiatingInitializerHarness.gatherResults
is mostly identical todefer.gatherResults()
with a little bit tacked on the end. Maybe it should call that function, and/or be renamed to something different, likegatherResultsOrFailure
?
I opted for just using gatherResults
. The look-alike was mostly instrumental while developing the new tests.
- In
IQAuthInitializerTest.testDigest()
, the code does "self.assertEquals(sha1('12345secret').hexdigest(), unicode(iq.query.digest))
". I'm pretty sure that.hexdigest()
will return a byte-sequence, not a unicode object., so you shouldn't need to cast the expected digest value.
The use of unicode()
here is to extract the CDATA from the element, much like XPath's string()
function. For ultimate correctness, I encode this to 'utf-8' before comparison now. Note that using str()
wouldn't work, as this returns unicode
objects, unfortunately. Backwards compatibility prevents us from fixing that any time soon.
- I realise you didn't write these, but in
BindInitializerTest
both tests have a description of "Test basic operations." The second line of each docstring would make a better docstring, I think.
I did write these and corrected them.
- Likewise, in
SessionInitializerTest
both tests have a description of "Test basic operations." - butSessionInitializerTest.testSuccess()
mentions "resource binding", while.testFailure()
claims to test whether session establishment "succeeds."
Likewise, I fixed these.
Please review.
comment:21 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | set to ralphm |
Looks good to me. Ready for merge!
comment:22 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [27896]) Fix failing tests for Python 2.7, allow non-ascii secrets in hashPassword.
Author: ralphm. Reviewer: exarkun, TimAllen. Fixes #3741, #3742, #3847.
This lets twisted.words.protocols.jabber.xmlstream.hashPassword now only accept unicode as input. The input before hashing is encoded to UTF-8. Tests for hashPassword were added and a number of tests were rewritten to deal with asynchronous failures.
comment:23 Changed 10 years ago by
Owner: | ralphm deleted |
---|
Additionally, there seem to be no direct unit tests for
hashPassword
, and it doesn't document its parameters (in particular, it doesn't document whether they are to be str or unicode).