Opened 3 years ago

Closed 7 months ago

#6833 enhancement closed fixed (fixed)

Port twisted.protocols.amp to Python 3

Reported by: habnabit Owned by: allenap
Priority: normal Milestone: Python-3.x
Component: core Keywords:
Cc: ftsamis@…, thijs, gavin@… Branch: branches/amp-python3-6833-4
branch-diff, diff-cov, branch-cov, buildbot
Author: hawkowl, adiroiban

Description

twisted.protocols.amp is a commonly-used module, and it should run on Python 3.

Attachments (4)

amp-py3.patch (5.3 KB) - added by thijs 8 months ago.
amp-python3-6822-3.patch (74.2 KB) - added by allenap 8 months ago.
amp-python3-6822-3.1.patch (88.5 KB) - added by allenap 8 months ago.
amp-python3-6833-4.add-docstrings.patch (1.3 KB) - added by allenap 8 months ago.
Incremental patch to add missing test docstrings to branches/amp-python3-6833-4.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 3 years ago by habnabit

  • Author set to habnabit
  • Branch set to branches/amp-python3-6833

(In [40739]) Branching to 'amp-python3-6833'

comment:2 Changed 10 months ago by ftsamis

  • Cc ftsamis@… added

Changed 8 months ago by thijs

comment:3 Changed 8 months ago by thijs

  • Author habnabit deleted
  • Cc thijs added
  • Keywords review added
  • Owner habnabit deleted

Attached a patch that fixes the most obvious python 3 errors. It now dies with this error:

Traceback (most recent call last):
  File "/Users/thijs/projects/twisted/twisted/trial/runner.py", line 784, in loadByName
    return self.suiteFactory([self.findByName(name, recurse=recurse)])
  File "/Users/thijs/projects/twisted/twisted/trial/runner.py", line 699, in findByName
    obj = reflect.namedModule(searchName)
  File "/Users/thijs/projects/twisted/twisted/python/reflect.py", line 154, in namedModule
    topLevel = __import__(name)
  File "/Users/thijs/projects/twisted/twisted/test/test_amp.py", line 17, in <module>
    from twisted.protocols import amp
  File "/Users/thijs/projects/twisted/twisted/protocols/amp.py", line 2114, in <module>
    _DescriptorExchanger):
  File "/Users/thijs/projects/twisted/twisted/protocols/amp.py", line 2406, in BinaryBoxProtocol
    StartTLS.responder(_defaultStartTLSResponder)
  File "/Users/thijs/projects/twisted/twisted/protocols/amp.py", line 1813, in responder
    CommandLocator._currentClassCommands.append((cls, methodfunc))
builtins.AttributeError: type object 'CommandLocator' has no attribute '_currentClassCommands'

Maybe this patch can be send to buildbot so we can start figuring out what's needed to port this.

comment:4 Changed 8 months ago by habnabit

That would be because of differing metaclass semantics, which is one of the reasons I gave up when I was working on this before. Since there's interest, here are the two major problems:

  1. This problem: metaclasses work differently. At the time I was working on this, six only had with_metaclass, which was insufficient. I think the newer @add_metaclass will do the right thing, though.
  2. AMP takes advantage of that python native strings are bytes. Since python 3 native strings are not bytes, there's a lot of code that breaks. With the size of AMP's implementation and tests, it's tedious to examine each string literal to determine "okay, is this supposed to be bytes or a native string?" I don't know if semantics were ever defined for non-ASCII identifiers in AMP, but it might be sufficient to force utf-8 for all keys.

comment:5 Changed 8 months ago by adiroiban

  • Author set to adiroiban
  • Branch changed from branches/amp-python3-6833 to branches/amp-py3-6833

(In [46279]) Branching to amp-py3-6833.

comment:6 Changed 8 months ago by adiroiban

  • Branch changed from branches/amp-py3-6833 to branches/amp-python3-6833-2

(In [46280]) Branching to amp-python3-6833-2.

comment:7 Changed 8 months ago by adiroiban

  • Keywords review removed
  • Owner set to thijs

Patch applied and sent to buildbot.

Looking forward for the final patch.

Don't forget to add a news file fragment and described here http://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles

Thanks again!

Changed 8 months ago by allenap

comment:8 Changed 8 months ago by allenap

  • Keywords review added
  • Owner thijs deleted

This is a port of twisted.protocols.amp to Python 3. The tests pass on Python 2.7, 3.4 and 3.5. My branch is on GitHub, and a diff can be seen here:

https://github.com/twisted/twisted/compare/trunk...allenap:amp-python3-6833-3

I did see thijs's work on this before starting; his/her contribution was helpful. Thank you!

It began as fairly strict with regards to byte strings: if something MUST be a byte string or COULD be a byte string in Python 2, it MUST be a byte string in Python 3.

However, I relaxed that rule in several instances for Python 3, to create a more natural API and, in some instances, out of necessity:

  • AmpBox.__init__() accepts keyword arguments. In Python 3 these MUST be Unicode/native strings whereas in Python 2 they could be either byte strings or Unicode strings. In Python 3, therefore, keys are coerced to (ASCII) byte strings.

This poses a problem: AmpBox.__init__() has no command information so there's no way to make use of _wireNameToPythonIdentifier(). Coercing is all we can do.

However, this coercion is entirely for the benefit of tests. The implementation code does not use keyword arguments when creating instances of AmpBox and its derivatives. Eliminating the use of keyword arguments when creating AmpBox would address the need for coercion. There may be non-Twisted code around that creates AmpBox directly, so a compromise might be to forbid keyword arguments under Python 3 only.

  • BoxDispatcher._errorReceived() decodes error descriptions as UTF-8 text using the replace errors strategy. Python 3 does not permit byte strings as exception descriptions.

Would the surrogateescape errors strategy be better, so that decoding is loss-less?

  • BoxDispatcher._commandReceived() > formatError() encodes error descriptions as UTF-8 text using the replace errors strategy.

Would the surrogateescape errors strategy be better, so that encoding is loss-less?

At the same time as doing this port I've also been porting some not-in-Twisted code that uses t.p.amp. This highlighted some mistakes I made early on, and has helped this port feel "right".

I had to make many changes to the tests, mostly to add binary prefixes to strings. However, I have tested the ported t.p.amp module with the non-ported test module, and it worked (in Python 2.7 only, naturally).

There's one small thing that I spotted that I'm not sure how to fix, or even if we care to fix it:

  • Path (Argument subclass): A bytes-mode FilePath on one side will always be deserialised as a text-mode FilePath.

Of course, there may be others.

Please review!

comment:9 Changed 8 months ago by adiroiban

  • Branch changed from branches/amp-python3-6833-2 to branches/amp-python3-6833-3

Update branch

comment:10 Changed 8 months ago by adiroiban

I have never used AMP so I am just doing a high level review here.

Tests are green ... but the test code itself was also updated.. so I am not sure if actual application using AMP will still work after we merge these changes.


Do we still need the metaclass = type declaration?

Do we want/need byteString as a public API?

AmpBox.__init___ special Python3 handling needs a comment.

I know that you added the comments as a Trac comments but we need this info as close as possible to the source and future developers will not browser all past Trac conversation to get info about a certain change.

Also, please check Twisted naming conventions non_byte_names vs nonByteNames

Does the new AMP code only accept bytes or still works with 'str' on Py3 ? Please also update the docstrins to make it clear that we have 'bytes'.

While on Py2 str is just bytes, on Py3 str closer to unicode than bytes.

If a value should be unicode, please keep the u"ignored" marker as it will make sure it continue to be Unicode in Python2.


For BoxDispatcher._nextTag can we have some call which works on all systems... example explicit hexify ? From what I can see, there is no interpolation there, just a conversion to hex... which as a hack it was implemented using interpolation.

I would like to see as few _PY3 branches as possible.


In dispatchCommand the interpolation is just used as a concatenation. Can we replace the bytesString call with a bytes concatenation?


# Apply __metaclass__ to CommandLocator.

Is just stating the obvious, can we have a comment explaining why this is needed ? :)


For _wireNameToPythonIdentifier, why do we need the new assertion?

Can we just update the docstring to inform that bytes are required ?

This looks like a private method so we are in full control of the way in which it is called.


For Argument.fromBox / toBox we need to update the docstrings to explicilty say bytes.... and this is a general comments for all the current API.

But is it ok to force bytes ? What happen with existing apps which pass Unicode ?


I see a lot of new assert statments. Why do we need them?

I prefer to have better docstrings that those assertions.


Here is the code coverage

https://codecov.io/github/twisted/twisted/commit/a78c07355c15a7e7cb24b3beea7f0c32846800a1

Please note that the port should have at least 100% coverage ... so if the new code raises new exceptions (including assertions) please write tests for them.


Here are some style guide errors for our builder:

(view as text)
************* Module twisted.protocols.amp
W9201:293,0:byteString: The opening/closing of docstring should be on a line by themselves
W9202:293,0:byteString: Missing epytext markup @param for argument "encoding"
W9202:293,0:byteString: Missing epytext markup @param for argument "string"
C0103:624,12:AmpBox.__init__: Invalid name "non_byte_names" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:625,16:AmpBox.__init__: Invalid name "non_byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:626,16:AmpBox.__init__: Invalid name "byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
W9501:823,20:BoxDispatcher._nextTag: String formatting operations should always use a tuple for non-mapping values
************* Module twisted.test.test_amp
C0103:3068,8:DescriptorTests.test_roundTrip: Invalid name "name_as_bytes" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)



I am leaving this for review as I hope that this patch will get a second pair of eyes... but please consider my comments and update your branch.

Thanks!

comment:11 Changed 8 months ago by allenap

  • Cc gavin@… added

comment:12 Changed 8 months ago by adiroiban

  • Keywords review removed
  • Owner set to allenap

Please check my comments and submit a new patch.

And I hope that we can find someone else to do the final review.

Thanks!

Changed 8 months ago by allenap

comment:13 Changed 8 months ago by allenap

  • Keywords review added
  • Owner changed from allenap to adiroiban

My comments below refer to the (quoted) review of amp-python3-6822-3.patch and my updates which are in amp-python3-6822-3.1.patch.

I have never used AMP so I am just doing a high level review here.

Tests are green ... but the test code itself was also updated.. so I am not sure if actual application using AMP will still work after we merge these changes.


Do we still need the __metaclass__ = type declaration?

For Python 2, yes, for the classes that are not defined with an explicit metaclass.

Do we want/need byteString as a public API?

No. I realised it was used in two places and neither greatly benefitted from it. I have removed it.

AmpBox.__init___ special Python3 handling needs a comment.

I know that you added the comments as a Trac comments but we need this info as close as possible to the source and future developers will not browser all past Trac conversation to get info about a certain change.

Good point. Done.

Also, please check Twisted naming conventions non_byte_names vs nonByteNames

Done.

Does the new AMP code only accept bytes or still works with 'str' on Py3 ? Please also update the docstrins to make it clear that we have 'bytes'.

In general it accepts native strings in Python 3, but coerces them to byte strings where necessary. In many places there's no need to coerce because _wireNameToPythonIdentifier is used to derive a key. When updating the docs I've drawn attention to places where this happens.

While on Py2 str is just bytes, on Py3 str closer to unicode than bytes.

If a value should be unicode, please keep the u"ignored" marker as it will make sure it continue to be Unicode in Python2.

I think there was just the one place where I had inadvertently removed the u prefix. Fixed, but let me know if you found others.


For BoxDispatcher._nextTag can we have some call which works on all systems... example explicit hexify ? From what I can see, there is no interpolation there, just a conversion to hex... which as a hack it was implemented using interpolation.

I would like to see as few _PY3 branches as possible.

hexlify operates on byte strings. hex(int) returns a native string so this would need to be encoded in Python 3 too (it also returns a leading "0x" which would need to be removed). We could use a Unicode string in all Python versions:

    def _nextTag(self):
        self._counter += 1
        return (u'%x' % self._counter).encode("ascii")

which would avoid the conditional on _PY3. What do you think?


In dispatchCommand the interpolation is just used as a concatenation. Can we replace the bytesString call with a bytes concatenation?

I've eliminated this. The description passed to RemoteAmpError should always be a native string, and it is elsewhere.


# Apply __metaclass__ to CommandLocator.

Is just stating the obvious, can we have a comment explaining why this is needed ? :)

Done.


For _wireNameToPythonIdentifier, why do we need the new assertion?

Can we just update the docstring to inform that bytes are required ?

This looks like a private method so we are in full control of the way in which it is called.

It was a belt-n-braces thing. It helped while I was working but can be removed now.


For Argument.fromBox / toBox we need to update the docstrings to explicilty say bytes.... and this is a general comments for all the current API.

Done.

But is it ok to force bytes ? What happen with existing apps which pass Unicode ?

I added most/all of the assertions to help with the port. Most of them I've now removed.

Unicode.toString()

In Python 2 folk could be taking advantage of implicit promotion from str to unicode. In Python 3 this will break if the argument is a byte string. The assertion is no longer necessary, and could break some valid (though dubious) scenarios in Python 2.

AmpList.__init__()

This checks that all sub-arguments are defined with byte string names. This helps catch mistakes early on; errors resulting from a mistake here can be hard to diagnose. I've improved the fail message.


I see a lot of new assert statments. Why do we need them?

I added most/all of the assertions to help with the port, explanations for those remaining are above.

I prefer to have better docstrings that those assertions.

I agree with the docstrings.


Here is the code coverage

https://codecov.io/github/twisted/twisted/commit/a78c07355c15a7e7cb24b3beea7f0c32846800a1

Please note that the port should have at least 100% coverage ... so if the new code raises new exceptions (including assertions) please write tests for them.

Done, I think. Most significantly I added new tests for the checks I previously added to Command.__metaclass__.


Here are some style guide errors for our builder:

************* Module twisted.protocols.amp
W9201:293,0:byteString: The opening/closing of docstring should be on a line by themselves
W9202:293,0:byteString: Missing epytext markup @param for argument "encoding"
W9202:293,0:byteString: Missing epytext markup @param for argument "string"
C0103:624,12:AmpBox.__init__: Invalid name "non_byte_names" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:625,16:AmpBox.__init__: Invalid name "non_byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
C0103:626,16:AmpBox.__init__: Invalid name "byte_name" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)
W9501:823,20:BoxDispatcher._nextTag: String formatting operations should always use a tuple for non-mapping values
************* Module twisted.test.test_amp
C0103:3068,8:DescriptorTests.test_roundTrip: Invalid name "name_as_bytes" for type variable (should match ([a-z_][a-zA-Z0-9]*)$)

I think I've addressed all of those.


I am leaving this for review as I hope that this patch will get a second pair of eyes... but please consider my comments and update your branch.

Thank you!

comment:14 Changed 8 months ago by hawkowl

  • Author changed from adiroiban to hawkowl, adiroiban
  • Branch changed from branches/amp-python3-6833-3 to branches/amp-python3-6833-4

(In [46462]) Branching to amp-python3-6833-4.

comment:15 Changed 8 months ago by hawkowl

(In [46466]) applying patch from allenap, refs #6833

comment:16 Changed 8 months ago by hawkowl

  • Keywords review removed
  • Owner changed from adiroiban to allenap

Hi allenap!

Thanks for the patch! The coverage is looking very nice, so I'm confident that it'll work great when merged :)

Unfortunately twisted.protocols.amp relies on twisted.python._tzhelper, which is not a ported module. Since it's small enough, it should be able to be ported as a part of this ticket with no worries.

Could you port it (basing your patch off this branch, as I fixed up a future import or two) and put it up for review again?

Again, many thanks! This is an essential module that a lot of people use, and it being ported is a good first step for other modules which depend on it.

comment:17 Changed 8 months ago by allenap

  • Keywords review added
  • Owner allenap deleted

I'm happy to do that, but I think t.p._tzhelper has been ported already. It (and test_tzhelper) are in dist3.py, and the tests pass in Python 3.4 and 3.5.

comment:18 Changed 8 months ago by adiroiban

Thanks. I can confirm that tzhelper is in dist3.pyy

Waiting for Amber to come online just to double check.

comment:19 Changed 8 months ago by hawkowl

  • Keywords review removed
  • Owner set to adiroiban

Huh, seems it is. Doesn't have future imports, but if it's in dist3, it probably works.

Adi, go ahead and land it :)

comment:20 Changed 8 months ago by adiroiban

  • Owner changed from adiroiban to allenap

Thanks, will merge.

Thanks Gavin for your contribution!

comment:21 Changed 8 months ago by adiroiban

Waiting for test results from latest commit... and there is a failing documentation builder.

Changed 8 months ago by allenap

Incremental patch to add missing test docstrings to branches/amp-python3-6833-4.

comment:22 Changed 8 months ago by allenap

  • Keywords review added
  • Owner changed from allenap to adiroiban

Incremental patch for docstrings, as requested in #twisted-dev. Plus I have only just realised that I had named my patches with the wrong ticket number. Oops, and sorry.

comment:23 Changed 7 months ago by adiroiban

  • Keywords review removed
  • Owner changed from adiroiban to allenap

Thanks. Latest patch looks good.

I will merge.

Many thanks!

comment:24 Changed 7 months ago by adiroiban

  • Resolution set to fixed
  • Status changed from new to closed

(In [46516]) Merge amp-python3-6833-4: Port twisted.protocols.amp to Python 3.

Author: allenap Reviewer: adiroiban, hawkowl Fixes: #6833

Note: See TracTickets for help on using tickets.