Opened 5 years ago

Last modified 5 months ago

#7956 enhancement new

The host parameter of IReactorTCP.connectTCP in twisted.internet.interfaces should be documented as unicode instead of bytes.

Reported by: _sunu_ Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch:
Author:

Description

Here's a conversation between me and hawkowl about the issue.

<sunu> The connectTCP method of MemoryReactor which is used in tests seem to expect a unicode string as host argument. https://github.com/twisted/twisted/blob/trunk/twisted/test/proto_helpers.py#L489 <sunu> But the actual reactor needs the host argument to be byte string. https://github.com/twisted/twisted/blob/trunk/twisted/internet/interfaces.py#L652 <sunu> Also the MemoryReactor stuff doesn't look like Python3 ready(strings have not been marked as unicode or bytes) But dist3.py marks it as fully ported module. <hawkowl> sunu: some strings should be native strings <sunu> hawkowl: Ah, ok. What about the host argument thing? A bunch of tests are failing because it expects unicode string instead of byte string :( <hawkowl> sunu: hmm. i think they should be bytes <sunu> hawkowl: Exactly. But if I pass host as a byte string to https://github.com/twisted/twisted/blob/trunk/twisted/test/proto_helpers.py#L489 it fails at https://github.com/twisted/twisted/blob/trunk/twisted/test/proto_helpers.py#L495 <hawkowl> sunu: well, IPv4/6Address needs to be able to cope <hawkowl> sunu: so change it to handle both, I guess :)

Attachments (2)

reactor-doc.patch (398 bytes) - added by _sunu_ 5 years ago.
reactor-doc-1.patch (717 bytes) - added by _sunu_ 5 years ago.
Added topfiles

Download all attachments as: .zip

Change History (10)

comment:1 Changed 5 years ago by Adi Roiban

Thanks for taking the time to work on this!

connectTCP was documented here https://twistedmatrix.com/trac/ticket/1822 https://github.com/twisted/twisted/blame/trunk/twisted/internet/interfaces.py#L658 ... but I think that this is a bug... as I see no explicit request from a reviewer to bytes.

I think that test are fine and hostnames and address literals should always be text aka Unicode.

Do we know of people passing hostnames as bytes?

I would prefer to fix the documentation.

In case we really need to support bytes, I prefer to have deprecation warning each time bytes are passed as hostnames or address literals.

Thanks!

Changed 5 years ago by _sunu_

Attachment: reactor-doc.patch added

comment:2 Changed 5 years ago by _sunu_

Keywords: review added
Owner: _sunu_ deleted

I took a quick look around the code base. And it seems almost everywhere host is expected to be unicode. So I'm attaching a patch for the documentation here. Not sure if this is the right ticket to attach that patch though.

comment:3 Changed 5 years ago by Adi Roiban

The patch needs a news file fragment.

The title of this ticket talks about having an isIPv4Address which accepts both bytes and unicodes. With your patch isIPv4Address is documented as only accepting unicode.

Maybe you should update the title of this ticket.

I am leaving this on review so that other developer can check the change.

Meanwhile feel free to update the patch ... but if change is accepted I can also create the newsfile for you before the merge :)

Thanks for your contribution!

comment:4 Changed 5 years ago by _sunu_

Summary: isIPv6Address from twisted.internet.abstract should be able to address both unicode and bytes addressIReactorTCP's host argument should be documented as unicode instead of bytes.

Changed 5 years ago by _sunu_

Attachment: reactor-doc-1.patch added

Added topfiles

comment:5 Changed 5 years ago by _sunu_

Summary: IReactorTCP's host argument should be documented as unicode instead of bytes.The host parameter of IReactorTCP.connectTCP in twisted.internet.interfaces should be documented as unicode instead of bytes.

comment:6 Changed 5 years ago by Adi Roiban

Keywords: review removed
Owner: set to _sunu_

The argument is documented as Unicode, but do we have a test for the case when Unicode (with non-ascii characters are used) ?

Does the code work with any Unicode value / internationalized domain names?

I think that current documentation should state that it should be Unicode wish ascii only characters as IDNA (Punycode) is not yet tested ... and supported.

If we document it as Unicode users might think that IDNA is supported.

Either update the documentation to explain that IDNA are no supported yet or convert this ticket into a ticket implementing IDNA support. Python's idna package is already a soft dependency for the TLS code and we can make it a general soft dependency.

There is ticket #5389 which talks about adding IDNA support for connect and resolve methods. See this comment https://twistedmatrix.com/trac/ticket/5389#comment:3

It looks like the long, long term plan is to deprecate hostname lookup in connectTCP so maybe is ok to skip implementing IDNA support at this level and just document the current support.

Documentation should state that values should be text/Unicode but without non-ascii characters.

Thanks!

comment:7 Changed 5 months ago by Julian Berman

Owner: _sunu_ deleted

#9763 was closed as a duplicate.

comment:8 Changed 5 months ago by Jean-Paul Calderone

Why is the correct type unicode when it was always bytes on Python 2? This is an incompatible change to one of the oldest APIs in Twisted.

Note: See TracTickets for help on using tickets.