Opened 4 years ago

Closed 2 years ago

#6833 enhancement closed fixed (fixed)

Port twisted.protocols.amp to Python 3

Reported by: habnabit Owned by: Gavin Panella
Priority: normal Milestone: Python-3.x
Component: core Keywords:
Cc: ftsamis@…, Thijs Triemstra, gavin@… Branch: branches/amp-python3-6833-4
branch-diff, diff-cov, branch-cov, buildbot
Author: hawkowl, adiroiban

Description

twisted.protocols.amp is a commonly-used module, and it should run on Python 3.

Attachments (4)

amp-py3.patch (5.3 KB) - added by Thijs Triemstra 2 years ago.
amp-python3-6822-3.patch (74.2 KB) - added by Gavin Panella 2 years ago.
amp-python3-6822-3.1.patch (88.5 KB) - added by Gavin Panella 2 years ago.
amp-python3-6833-4.add-docstrings.patch (1.3 KB) - added by Gavin Panella 2 years ago.
Incremental patch to add missing test docstrings to branches/amp-python3-6833-4.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 4 years ago by habnabit

Author: habnabit
Branch: branches/amp-python3-6833

(In [40739]) Branching to 'amp-python3-6833'

comment:2 Changed 2 years ago by Fotis Tsamis

Cc: ftsamis@… added

Changed 2 years ago by Thijs Triemstra

Attachment: amp-py3.patch added

comment:3 Changed 2 years ago by Thijs Triemstra

Author: habnabit
Cc: Thijs Triemstra added
Keywords: review added
Owner: habnabit deleted

Attached a patch that fixes the most obvious python 3 errors. It now dies with this error:

Traceback (most recent call last):
  File "/Users/thijs/projects/twisted/twisted/trial/runner.py", line 784, in loadByName
    return self.suiteFactory([self.findByName(name, recurse=recurse)])
  File "/Users/thijs/projects/twisted/twisted/trial/runner.py", line 699, in findByName
    obj = reflect.namedModule(searchName)
  File "/Users/thijs/projects/twisted/twisted/python/reflect.py", line 154, in namedModule
    topLevel = __import__(name)
  File "/Users/thijs/projects/twisted/twisted/test/test_amp.py", line 17, in <module>
    from twisted.protocols import amp
  File "/Users/thijs/projects/twisted/twisted/protocols/amp.py", line 2114, in <module>
    _DescriptorExchanger):
  File "/Users/thijs/projects/twisted/twisted/protocols/amp.py", line 2406, in BinaryBoxProtocol
    StartTLS.responder(_defaultStartTLSResponder)
  File "/Users/thijs/projects/twisted/twisted/protocols/amp.py", line 1813, in responder
    CommandLocator._currentClassCommands.append((cls, methodfunc))
builtins.AttributeError: type object 'CommandLocator' has no attribute '_currentClassCommands'

Maybe this patch can be send to buildbot so we can start figuring out what's needed to port this.

comment:4 Changed 2 years ago by habnabit

That would be because of differing metaclass semantics, which is one of the reasons I gave up when I was working on this before. Since there's interest, here are the two major problems:

  1. This problem: metaclasses work differently. At the time I was working on this, six only had with_metaclass, which was insufficient. I think the newer @add_metaclass will do the right thing, though.
  2. AMP takes advantage of that python native strings are bytes. Since python 3 native strings are not bytes, there's a lot of code that breaks. With the size of AMP's implementation and tests, it's tedious to examine each string literal to determine "okay, is this supposed to be bytes or a native string?" I don't know if semantics were ever defined for non-ASCII identifiers in AMP, but it might be sufficient to force utf-8 for all keys.

comment:5 Changed 2 years ago by Adi Roiban

Author: adiroiban
Branch: branches/amp-python3-6833branches/amp-py3-6833

(In [46279]) Branching to amp-py3-6833.

comment:6 Changed 2 years ago by Adi Roiban

Branch: branches/amp-py3-6833branches/amp-python3-6833-2

(In [46280]) Branching to amp-python3-6833-2.

comment:7 Changed 2 years ago by Adi Roiban

Keywords: review removed
Owner: set to Thijs Triemstra

Patch applied and sent to buildbot.

Looking forward for the final patch.

Don't forget to add a news file fragment and described here http://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles

Thanks again!

Changed 2 years ago by Gavin Panella

Attachment: amp-python3-6822-3.patch added

comment:8 Changed 2 years ago by Gavin Panella

Keywords: review added
Owner: Thijs Triemstra deleted

This is a port of twisted.protocols.amp to Python 3. The tests pass on Python 2.7, 3.4 and 3.5. My branch is on GitHub, and a diff can be seen here:

https://github.com/twisted/twisted/compare/trunk...allenap:amp-python3-6833-3

I did see thijs's work on this before starting; his/her contribution was helpful. Thank you!

It began as fairly strict with regards to byte strings: if something MUST be a byte string or COULD be a byte string in Python 2, it MUST be a byte string in Python 3.

However, I relaxed that rule in several instances for Python 3, to create a more natural API and, in some instances, out of necessity:

  • AmpBox.__init__() accepts keyword arguments. In Python 3 these MUST be Unicode/native strings whereas in Python 2 they could be either byte strings or Unicode strings. In Python 3, therefore, keys are coerced to (ASCII) byte strings.

This poses a problem: AmpBox.__init__() has no command information so there's no way to make use of _wireNameToPythonIdentifier(). Coercing is all we can do.

However, this coercion is entirely for the benefit of tests. The implementation code does not use keyword arguments when creating instances of AmpBox and its derivatives. Eliminating the use of keyword arguments when creating AmpBox would address the need for coercion. There may be non-Twisted code around that creates AmpBox directly, so a compromise might be to forbid keyword arguments under Python 3 only.

  • BoxDispatcher._errorReceived() decodes error descriptions as UTF-8 text using the replace errors strategy. Python 3 does not permit byte strings as exception descriptions.

Would the surrogateescape errors strategy be better, so that decoding is loss-less?

  • BoxDispatcher._commandReceived() > formatError() encodes error descriptions as UTF-8 text using the replace errors strategy.

Would the surrogateescape errors strategy be better, so that encoding is loss-less?

At the same time as doing this port I've also been porting some not-in-Twisted code that uses t.p.amp. This highlighted some mistakes I made early on, and has helped this port feel "right".

I had to make many changes to the tests, mostly to add binary prefixes to strings. However, I have tested the ported t.p.amp module with the non-ported test module, and it worked (in Python 2.7 only, naturally).

There's one small thing that I spotted that I'm not sure how to fix, or even if we care to fix it:

  • Path (Argument subclass): A bytes-mode FilePath on one side will always be deserialised as a text-mode FilePath.

Of course, there may be others.

Please review!

comment:9 Changed 2 years ago by Adi Roiban

Branch: branches/amp-python3-6833-2branches/amp-python3-6833-3

Update branch

comment:10 Changed 2 years ago by Adi Roiban

I have never used AMP so I am just doing a high level review here.

Tests are green ... but the test code itself was also updated.. so I am not sure if actual application using AMP will still work after we merge these changes.


Do we still need the metaclass = type declaration?

Do we want/need byteString as a public API?

AmpBox.__init___ special Python3 handling needs a comment.

I know that you added the comments as a Trac comments but we need this info as close as possible to the source and future developers will not browser all past Trac conversation to get info about a certain change.

Also, please check Twisted naming conventions non_byte_names vs nonByteNames

Does the new AMP code only accept bytes or still works with 'str' on Py3 ? Please also update the docstrins to make it clear that we have 'bytes'.

While on Py2 str is just bytes, on Py3 str closer to unicode than bytes.

If a value should be unicode, please keep the u"ignored" marker as it will make sure it continue to be Unicode in Python2.


For BoxDispatcher._nextTag can we have some call which works on all systems... example explicit hexify ? From what I can see, there is no interpolation there, just a conversion to hex... which as a hack it was implemented using interpolation.

I would like to see as few _PY3 branches as possible.


In dispatchCommand the interpolation is just used as a concatenation. Can we replace the bytesString call with a bytes concatenation?


# Apply __metaclass__ to CommandLocator.

Is just stating the obvious, can we have a comment explaining why this is needed ? :)


For _wireNameToPythonIdentifier, why do we need the new assertion?

Can we just update the docstring to inform that bytes are required ?

This looks like a private method so we are in full control of the way in which it is called.


For Argument.fromBox / toBox we need to update the docstrings to explicilty say bytes.... and this is a general comments for all the current API.

But is it ok to force bytes ? What happen with existing apps which pass Unicode ?


I see a lot of new assert statments. Why do we need them?

I prefer to have better docstrings that those assertions.


Here is the code coverage

https://codecov.io/github/twisted/twisted/commit/a78c07355c15a7e7cb24b3beea7f0c32846800a1

Please note that the port should have at least 100% coverage ... so if the new code raises new exceptions (including assertions) please write tests for them.


Here are some style guide errors for our builder:

(view as text)
************* Module twisted.protocols.amp
W9201:293,0:byteString: The opening/closing of docstring should be on a line by themselves
W9202:293,0:byteString: Missing epytext markup @param for argument "encoding"
W9202:293,0:byteString: Missing epytext markup @param for argument "string"
C0103:624,12:AmpBox.__init__: Invalid name "non_byte_names" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:625,16:AmpBox.__init__: Invalid name "non_byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:626,16:AmpBox.__init__: Invalid name "byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
W9501:823,20:BoxDispatcher._nextTag: String formatting operations should always use a tuple for non-mapping values
************* Module twisted.test.test_amp
C0103:3068,8:DescriptorTests.test_roundTrip: Invalid name "name_as_bytes" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)



I am leaving this for review as I hope that this patch will get a second pair of eyes... but please consider my comments and update your branch.

Thanks!

comment:11 Changed 2 years ago by Gavin Panella

Cc: gavin@… added

comment:12 Changed 2 years ago by Adi Roiban

Keywords: review removed
Owner: set to Gavin Panella

Please check my comments and submit a new patch.

And I hope that we can find someone else to do the final review.

Thanks!

Changed 2 years ago by Gavin Panella

Attachment: amp-python3-6822-3.1.patch added

comment:13 Changed 2 years ago by Gavin Panella

Keywords: review added
Owner: changed from Gavin Panella to Adi Roiban

My comments below refer to the (quoted) review of amp-python3-6822-3.patch and my updates which are in amp-python3-6822-3.1.patch.

I have never used AMP so I am just doing a high level review here.

Tests are green ... but the test code itself was also updated.. so I am not sure if actual application using AMP will still work after we merge these changes.


Do we still need the __metaclass__ = type declaration?

For Python 2, yes, for the classes that are not defined with an explicit metaclass.

Do we want/need byteString as a public API?

No. I realised it was used in two places and neither greatly benefitted from it. I have removed it.

AmpBox.__init___ special Python3 handling needs a comment.

I know that you added the comments as a Trac comments but we need this info as close as possible to the source and future developers will not browser all past Trac conversation to get info about a certain change.

Good point. Done.

Also, please check Twisted naming conventions non_byte_names vs nonByteNames

Done.

Does the new AMP code only accept bytes or still works with 'str' on Py3 ? Please also update the docstrins to make it clear that we have 'bytes'.

In general it accepts native strings in Python 3, but coerces them to byte strings where necessary. In many places there's no need to coerce because _wireNameToPythonIdentifier is used to derive a key. When updating the docs I've drawn attention to places where this happens.

While on Py2 str is just bytes, on Py3 str closer to unicode than bytes.

If a value should be unicode, please keep the u"ignored" marker as it will make sure it continue to be Unicode in Python2.

I think there was just the one place where I had inadvertently removed the u prefix. Fixed, but let me know if you found others.


For BoxDispatcher._nextTag can we have some call which works on all systems... example explicit hexify ? From what I can see, there is no interpolation there, just a conversion to hex... which as a hack it was implemented using interpolation.

I would like to see as few _PY3 branches as possible.

hexlify operates on byte strings. hex(int) returns a native string so this would need to be encoded in Python 3 too (it also returns a leading "0x" which would need to be removed). We could use a Unicode string in all Python versions:

    def _nextTag(self):
        self._counter += 1
        return (u'%x' % self._counter).encode("ascii")

which would avoid the conditional on _PY3. What do you think?


In dispatchCommand the interpolation is just used as a concatenation. Can we replace the bytesString call with a bytes concatenation?

I've eliminated this. The description passed to RemoteAmpError should always be a native string, and it is elsewhere.


# Apply __metaclass__ to CommandLocator.

Is just stating the obvious, can we have a comment explaining why this is needed ? :)

Done.


For _wireNameToPythonIdentifier, why do we need the new assertion?

Can we just update the docstring to inform that bytes are required ?

This looks like a private method so we are in full control of the way in which it is called.

It was a belt-n-braces thing. It helped while I was working but can be removed now.


For Argument.fromBox / toBox we need to update the docstrings to explicilty say bytes.... and this is a general comments for all the current API.

Done.

But is it ok to force bytes ? What happen with existing apps which pass Unicode ?

I added most/all of the assertions to help with the port. Most of them I've now removed.

Unicode.toString()

In Python 2 folk could be taking advantage of implicit promotion from str to unicode. In Python 3 this will break if the argument is a byte string. The assertion is no longer necessary, and could break some valid (though dubious) scenarios in Python 2.

AmpList.__init__()

This checks that all sub-arguments are defined with byte string names. This helps catch mistakes early on; errors resulting from a mistake here can be hard to diagnose. I've improved the fail message.


I see a lot of new assert statments. Why do we need them?

I added most/all of the assertions to help with the port, explanations for those remaining are above.

I prefer to have better docstrings that those assertions.

I agree with the docstrings.


Here is the code coverage

https://codecov.io/github/twisted/twisted/commit/a78c07355c15a7e7cb24b3beea7f0c32846800a1

Please note that the port should have at least 100% coverage ... so if the new code raises new exceptions (including assertions) please write tests for them.

Done, I think. Most significantly I added new tests for the checks I previously added to Command.__metaclass__.


Here are some style guide errors for our builder:

************* Module twisted.protocols.amp
W9201:293,0:byteString: The opening/closing of docstring should be on a line by themselves
W9202:293,0:byteString: Missing epytext markup @param for argument "encoding"
W9202:293,0:byteString: Missing epytext markup @param for argument "string"
C0103:624,12:AmpBox.__init__: Invalid name "non_byte_names" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:625,16:AmpBox.__init__: Invalid name "non_byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:626,16:AmpBox.__init__: Invalid name "byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
W9501:823,20:BoxDispatcher._nextTag: String formatting operations should always use a tuple for non-mapping values
************* Module twisted.test.test_amp
C0103:3068,8:DescriptorTests.test_roundTrip: Invalid name "name_as_bytes" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)

I think I've addressed all of those.


I am leaving this for review as I hope that this patch will get a second pair of eyes... but please consider my comments and update your branch.

Thank you!

comment:14 Changed 2 years ago by hawkowl

Author: adiroibanhawkowl, adiroiban
Branch: branches/amp-python3-6833-3branches/amp-python3-6833-4

(In [46462]) Branching to amp-python3-6833-4.

comment:15 Changed 2 years ago by hawkowl

(In [46466]) applying patch from allenap, refs #6833

comment:16 Changed 2 years ago by hawkowl

Keywords: review removed
Owner: changed from Adi Roiban to Gavin Panella

Hi allenap!

Thanks for the patch! The coverage is looking very nice, so I'm confident that it'll work great when merged :)

Unfortunately twisted.protocols.amp relies on twisted.python._tzhelper, which is not a ported module. Since it's small enough, it should be able to be ported as a part of this ticket with no worries.

Could you port it (basing your patch off this branch, as I fixed up a future import or two) and put it up for review again?

Again, many thanks! This is an essential module that a lot of people use, and it being ported is a good first step for other modules which depend on it.

comment:17 Changed 2 years ago by Gavin Panella

Keywords: review added
Owner: Gavin Panella deleted

I'm happy to do that, but I think t.p._tzhelper has been ported already. It (and test_tzhelper) are in dist3.py, and the tests pass in Python 3.4 and 3.5.

comment:18 Changed 2 years ago by Adi Roiban

Thanks. I can confirm that tzhelper is in dist3.pyy

Waiting for Amber to come online just to double check.

comment:19 Changed 2 years ago by hawkowl

Keywords: review removed
Owner: set to Adi Roiban

Huh, seems it is. Doesn't have future imports, but if it's in dist3, it probably works.

Adi, go ahead and land it :)

comment:20 Changed 2 years ago by Adi Roiban

Owner: changed from Adi Roiban to Gavin Panella

Thanks, will merge.

Thanks Gavin for your contribution!

comment:21 Changed 2 years ago by Adi Roiban

Waiting for test results from latest commit... and there is a failing documentation builder.

Changed 2 years ago by Gavin Panella

Incremental patch to add missing test docstrings to branches/amp-python3-6833-4.

comment:22 Changed 2 years ago by Gavin Panella

Keywords: review added
Owner: changed from Gavin Panella to Adi Roiban

Incremental patch for docstrings, as requested in #twisted-dev. Plus I have only just realised that I had named my patches with the wrong ticket number. Oops, and sorry.

comment:23 Changed 2 years ago by Adi Roiban

Keywords: review removed
Owner: changed from Adi Roiban to Gavin Panella

Thanks. Latest patch looks good.

I will merge.

Many thanks!

comment:24 Changed 2 years ago by Adi Roiban

Resolution: fixed
Status: newclosed

(In [46516]) Merge amp-python3-6833-4: Port twisted.protocols.amp to Python 3.

Author: allenap Reviewer: adiroiban, hawkowl Fixes: #6833

Note: See TracTickets for help on using tickets.