Opened 2 years ago

Closed 22 months ago

Last modified 22 months ago

#6245 regression closed fixed (fixed)

Enforcing byte strings in DNS names breaks XMPP example, Wokkel.

Reported by: ralphm Owned by: therve
Priority: high Milestone:
Component: names Keywords:
Cc: Branch: branches/srvconnect-warn-unicode-6245
(diff, github, buildbot, log)
Author: ralphm Launchpad Bug:

Description

The change for porting of twisted.names to Python 3 in #6057 breaks doc/words/examples/xmpp_client.py and more or less all of Wokkel. To get the host to connect to they use twisted.named.srvconnect.SRVConnector with the XMPP domain. In most cases, this is a unicode object, as gotten from JID.host.

Of course, this never worked correctly for domains with labels that are not all ASCII, because no encoding to punycode is done by Twisted Names.

Change History (27)

comment:1 Changed 2 years ago by ralphm

  • Status changed from new to assigned

I believe there are basically two approaches to getting this fixed:

  • Continue to expect byte strings in Twisted Names and expect all users to encode IDNs themselves. This implies fixing the XMPP example and Wokkel, and all of the older code to continue to be broken for ASCII-only-but-still-unicode domains.
  • Support IDNs in Twisted Names.

To translate a unicode domain name into a punycode encoded version, the algorithm goes like this:

from encodings import idna

def prepare(domain):
    result = []
    labels = idna.dots.split(domain)

    if labels and len(labels[-1]) == 0:
        trailing_dot = b'.'
        del labels[-1]
    else:
        trailing_dot = b''

    for label in labels:
        result.append(idna.ToASCII(label))

    return b'.'.join(result) + trailing_dot

comment:2 Changed 2 years ago by ralphm

  • Author set to ralphm
  • Branch set to branches/names-idn-6245

(In [36761]) Branching to 'names-idn-6245'

comment:3 Changed 2 years ago by ralphm

(In [36762]) Add utility function for encoding IDN domain names to ACE.

Refs: #6245.

comment:4 Changed 2 years ago by ralphm

  • Branch changed from branches/names-idn-6245 to branches/srvconnect-warn-unicode-6245

(In [36763]) Branching to 'srvconnect-warn-unicode-6245'

comment:5 Changed 2 years ago by ralphm

(In [36764]) Attempt to encode unicode domain to ASCII and issue a deprecation warning.

Refs: #6245.

comment:6 Changed 2 years ago by ralphm

  • Keywords review added
  • Owner ralphm deleted
  • Status changed from assigned to new

After some discussion with exarkun, this ticket should really only be about how passing a domain name as a unicode object is deprecated. The IDN support for Twisted Names could be another ticket at some point.

I changed the xmpp_client.py example to explicitly encode the host part of the JID to ASCII. I'm not even sure that you could pass a non-ASCII JID from the command line as Python 2 doesn't actually specify which encoding the bytes of sys.argv are in. This seems to have been fixed by Python issue 2128, for Python 3.

Please review.

comment:7 Changed 2 years ago by itamar

The docstrings/deprecations say "since Twisted 12.3.0", that should be probably 13.0 since 12.3 is already out.

comment:8 Changed 2 years ago by ralphm

But it was really deprecated by #6057. This change adds the deprecation warning and a fallback encoding to bytes so that existing applications don't break.

comment:9 Changed 2 years ago by ralphm

To be clear, I'd like to see if we can make a 12.3.1 release for this.

comment:10 Changed 2 years ago by therve

Making a point release is relatively expensive and boring, and it seems this bug is relatively easy to workaround in wokkel, so I'm not particularly motivated to do it personally.

comment:11 Changed 2 years ago by ralphm

I've made a Wokkel ticket to address this and released Wokkel 0.7.1 to address this.

I understand that making a new bug fix release is boring, and just hope we don't get in a situation where distributions ship Twisted 12.3.0 and Wokkel < 0.7.1.

comment:12 Changed 2 years ago by tom.prince

  • Keywords review removed
  • Owner set to ralphm
  • The test for deprecation should pass itself in offendingFunctions to flushWarnings.
  • It is customary to assert that only the expected warning was emitted (i.e. len(warningsShown) == 1)
  • There is a trivially failing test (buildbot run).

Thanks. Please merge after fixing.

comment:13 Changed 2 years ago by tom.prince

Also, I'm not sure, but it might be worthwhile to mention the regression in 12.3 related to this, in the news.

comment:14 Changed 2 years ago by ralphm

(In [36865]) Address review comments.

Refs: #6245.

comment:15 Changed 2 years ago by ralphm

(In [36866]) Also flush the warning issued (indirectly) by the other test.

Refs: #6245.

comment:16 Changed 2 years ago by ralphm

  • Resolution set to fixed
  • Status changed from new to closed

(In [36867]) Deprecate unicode domain names for SRVConnector.

Author: ralphm
Reviewer: tom.prince
Fixes: #6245

twisted.names.dns.Names was recently changed to reject unicode domain names,
breaking XMPP clients passing the domain part of a JID to SRVConnector. This
change issues a deprecation warning if a unicode domain name is passed and
attempts to convert the domain name to an all-ASCII byte string for backwards
compatibility.

Note that, to support Internationalized Domain Names (IDN), applications must
convert the domain name to its ASCII Compatible Equivalent (ACE) as specified
in RFC 3490. The result can then be used as the domain to connect to with
SRVConnector.

comment:17 Changed 23 months ago by ralphm

  • Resolution fixed deleted
  • Status changed from closed to reopened

(In [36923]) Revert r36867 - This fix doesn't address the core problem.

Reopens: #6245.

The core issue is that twisted.names.dns.Name changed to explicitly reject
unicode domain names. The fix done for SRVConnector should also be applied
there so that direct users of Name will also get the warning and fallback
behaviour.

The warning in SRVConnector might be kept to have the warning refer to its
caller rather than further down the call stack.

comment:18 Changed 23 months ago by ralphm

(In [37042]) Also deprecate unicode of twisted.names.dns.Name.

Refs: #6245.

comment:19 Changed 23 months ago by ralphm

  • Keywords review added
  • Owner ralphm deleted
  • Status changed from reopened to new

comment:21 follow-up: Changed 22 months ago by glyph

  • Keywords review removed
  • Owner set to ralphm

Test failures all look unrelated, so, good.

  1. Sorry to be so late to this discussion, but your claim on your initial comment looks to be incorrect. Python ships with a fully-functional IDNA encoding, which is distinct from its Punycode encoding. Observe:
    >>> u'x.\xe0.y.z'.encode('punycode')                                            
    'x..y.z-jta'                                                                    
    >>> u'x.\xe0.y.z'.encode('idna')                                                
    'x.xn--0ca.y.z'                                                                 
    
    This program behaves the same on 2.7 and 3.3. Couldn't we implement all of IDNA support for twisted.names just by changing the string 'ascii' to the string 'idna'? If not, please proceed without regard to this point; I can see why testing the code that you put into the first comment here would be tricky enough to warrant its own ticket. Otherwise, I'd really like to hear why you and exarkun thought that changing this string and un-deprecating unicode in twisted.names ought to be reserved for a future release.
  2. The example, at least, ought to be changed to use IDNA encoding; that is more correct than ASCII.
  3. The @type should never be str, as that's ambiguous. L{bytes} or L{unicode} (since you have to import it from py3k compat anyway).
  4. If you decide to address the first point by implementing IDNA, you should add some documentation to dns.Name explaining the encoding behavior of the __init__ argument (since the @ivar will in fact always be bytes). Otherwise you should add some documentation explaining the deprecation.

Otherwise, this looks fine, such as it is. If there's really a good reason to not just implement IDNA, go ahead; in either case, the change is small enough that I'm confident you can ensure it's in good shape before landing. I'm available tomorrow if you want to discuss whether IDNA support would actually be accomplished by changing that one string :).

comment:22 in reply to: ↑ 21 Changed 22 months ago by ralphm

Replying to glyph:

Test failures all look unrelated, so, good.

  1. Sorry to be so late to this discussion, but your claim on your initial comment looks to be incorrect. Python ships with a fully-functional IDNA encoding, which is distinct from its Punycode encoding. Observe:
    >>> u'x.\xe0.y.z'.encode('punycode')                                            
    'x..y.z-jta'                                                                    
    >>> u'x.\xe0.y.z'.encode('idna')                                                
    'x.xn--0ca.y.z'                                                                 
    
    This program behaves the same on 2.7 and 3.3. Couldn't we implement all of IDNA support for twisted.names just by changing the string 'ascii' to the string 'idna'? If not, please proceed without regard to this point; I can see why testing the code that you put into the first comment here would be tricky enough to warrant its own ticket. Otherwise, I'd really like to hear why you and exarkun thought that changing this string and un-deprecating unicode in twisted.names ought to be reserved for a future release.

I had a mental black-out the day I wrote that code, even though it is
partially from the stringprep support in Words. After I had released Wokkel
with this, I came to the same conclusion and .encode('idna') indeed does
exactly the same thing.

  1. The example, at least, ought to be changed to use IDNA encoding; that is more correct than ASCII.

Actually, given comment:6, I'm not sure how to make that work properly in
Python 2 for all cases. And then there's also the situation with the horribly
broken concept of the Digest-MD5 SASL mechanism, of which #5066 is only one
symptom, that would also make things break if you would actually try to use
IDNs there.

  1. The @type should never be str, as that's ambiguous. L{bytes} or L{unicode} (since you have to import it from py3k compat anyway).
  1. If you decide to address the first point by implementing IDNA, you should add some documentation to dns.Name explaining the encoding behavior of the __init__ argument (since the @ivar will in fact always be bytes). Otherwise you should add some documentation explaining the deprecation.

Otherwise, this looks fine, such as it is. If there's really a good reason to not just implement IDNA, go ahead; in either case, the change is small enough that I'm confident you can ensure it's in good shape before landing. I'm available tomorrow if you want to discuss whether IDNA support would actually be accomplished by changing that one string :).

In general (not taking the example into account) I think that's probably a
good idea. I think the only reason I didn't do that to begin with was that it
would introduce new code.

comment:23 Changed 22 months ago by therve

  • Keywords review added
  • Owner ralphm deleted

Back for review with idna encoding. Build results here: http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/539

comment:24 Changed 22 months ago by ralphm

  • Keywords review removed
  • Owner set to therve

I'm not sure if I should review this, but I want to note that I think that the change to the XMPP example should be reversed again, per comment:#6 and comment:22.

Looks good to me otherwise.

comment:25 Changed 22 months ago by glyph

That build results link is bogus; it's just a 3.3 build. Assuming the builds on other platforms look good though, I am also willing to sign off on the current contents of the branch. I don't see anything related to the XMPP example in http://twistedmatrix.com/~diffresource.twistd/6245 so I don't think that is relevant right now.

comment:26 Changed 22 months ago by therve

  • Resolution set to fixed
  • Status changed from new to closed

(In [37417]) Merge srvconnect-warn-unicode-6245

Authors: ralphm, therve
Reviewers: glyph, tom.prince
Fixes: #6245

Support unicode in twisted.names.srvconnect.SRVConnector and
twisted.names.dns.Names by automatically converting to bytes using the idna
encoding. This fixes a regression introduced by porting twisted.names code to
python 3.

comment:27 Changed 22 months ago by glyph

Now that this is closed, someone should likely revisit what needs to be done with an existing ticket for unicode / dns interaction.

Note: See TracTickets for help on using tickets.