Opened 5 years ago

Last modified 18 months ago

#5389 enhancement new

Unicode support in DNS lookups

Reported by: Itamar Turner-Trauring Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch:


reactor.resolve(), as well as connectTCP etc., should accept Unicode strings, and convert them to byte strings (using IDNA) for lookup.

Change History (5)

comment:1 Changed 5 years ago by philmayers

One note: you need to be careful about "false friends" - Unicode characters that look like other characters and allow domain impersonation attacks and the like. At the current time, I'm not aware of any consensus on the level (application, framework, etc.) at which this should be handled.

I will note that you can already trivially do this:

>>> u'some.domain£'.encode('idna')

...and just look up the IDNA-encoded name. It might be safest to punt on handling Unicode DNS names ;o)
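
As a minimal sketch of the encode-then-look-up approach described in this comment, written for Python 3 and using only the standard library's built-in "idna" codec (which implements the older IDNA 2003 rules): the helper name to_idna_bytes is hypothetical and not part of Twisted, and it performs no checking for look-alike characters.

```python
def to_idna_bytes(hostname):
    """Return an ASCII byte string suitable for a DNS lookup.

    Hypothetical helper, not a Twisted API. Byte strings are assumed
    to be encoded already and are passed through unchanged.
    """
    if isinstance(hostname, bytes):
        return hostname
    # The stdlib "idna" codec converts each non-ASCII label to its
    # Punycode ("xn--") form and leaves ASCII labels as they are.
    return hostname.encode("idna")


print(to_idna_bytes("bücher.example"))
```

The resulting bytes could then be passed to whichever lookup API the application already uses.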

comment:2 Changed 5 years ago by Jean-Paul Calderone

There's probably some value in centralizing the logic, if any, for defending against this kind of attack. Punting is "safer" for Twisted in that we avoid having code that might end up being vulnerable to an attack, but not safer over all since it means each application needs to implement a solution, each of which might be vulnerable to an attack.

Twisted also needs this functionality itself, for full HTTP client support, so that it can issue requests for URLs (IRIs) with a non-ASCII host (#5388).
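
To illustrate what the HTTP client case might involve, here is a rough Python 3 sketch that rewrites only the host part of an IRI into its ASCII form using the standard library. The function name iri_host_to_ascii is hypothetical, not a Twisted API. It assumes the stdlib "idna" codec (IDNA 2003) is acceptable, leaves IPv6 literals untouched, and does not percent-encode non-ASCII characters in the path or query, which a complete IRI-to-URI conversion would also need.

```python
from urllib.parse import urlsplit, urlunsplit


def iri_host_to_ascii(iri):
    """Return the IRI with its host IDNA-encoded (hypothetical helper)."""
    parts = urlsplit(iri)
    host = parts.hostname
    # No host, or an IPv6 literal: leave the input unchanged.
    if host is None or ":" in host:
        return iri
    ascii_host = host.encode("idna").decode("ascii")
    # Keep any "user:password@" prefix exactly as it was given.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + ascii_host
    if parts.port is not None:
        netloc += ":%d" % parts.port
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(iri_host_to_ascii("http://bücher.example:8080/path?q=1"))
```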

comment:3 Changed 5 years ago by philmayers

Maybe. If you get it right, and if there's even any consensus on what "right" means in this context!

Personally I think it would be a serious mistake to retrofit this support into reactor.connectTCP and the like, for two reasons:

  1. As is documented somewhere in the IPv6 plan, AIUI the long (long long long) term aim is to deprecate hostname lookup support in these APIs in favour of some better API (endpoints or whatever). I can't find the reference to that right now, but I'm sure it's written down in a ticket somewhere.
  2. If you transparently add support for unicode names to connectTCP, you risk changing behaviour. If this wasn't acceptable for IPv6, I fail to see why it's acceptable for unicode names.

However, it probably does make sense to support this in the web client. Just not at the TCP API level.

I note that, at least under Firefox and Chrome, a unicode domain entered in the URL bar doesn't stay that way - it is replaced with the ASCII, IDNA-encoded copy. This was precisely for the "false friends" reasons IIRC - the browser vendors were unable to come up with a satisfactory solution. This should indicate the difficulties involved.
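
To give a feel for why the "false friends" problem is hard, here is a deliberately crude Python 3 sketch that flags labels mixing characters from more than one script. It approximates a character's script by the first word of its Unicode name, which is an assumption made for illustration and not the real Unicode Script property. The function names are hypothetical, and this is not how any browser or Twisted handles the problem. It also misses look-alikes within a single script.

```python
import unicodedata


def scripts_in_label(label):
    """Return the set of approximate script names used by letters in label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """Heuristic: True if the label's letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


# "p\u0430ypal" uses CYRILLIC SMALL LETTER A in place of the Latin "a".
print(is_mixed_script("p\u0430ypal"), is_mixed_script("paypal"))
```

A check like this rejects some legitimate names and accepts some deceptive ones, which is part of why there is no agreed answer on where such logic belongs.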

comment:5 Changed 18 months ago by Adi Roiban

Ticket #7956 asks for better documentation for connectTCP. Is it still true that name resolution is going to be deprecated from connectTCP?
