[Twisted-Python] PEP3131: non-ascii identifiers

exarkun at twistedmatrix.com exarkun at twistedmatrix.com
Mon Sep 8 04:54:44 MDT 2014


On 09:19 am, wolfgang.kde at rohdewald.de wrote:
>This does not seem to be supported by Python yet.
>
>Should that be enabled at all?
>If one process with PY3 sends such identifiers to
>a separate process with PY2, that will fail. I am not
>sure if that would be a problem, whoever uses this must
>make sure PY3 is used everywhere.

This is why we will *not* change the PB wire protocol as part of the 
porting work.  The wire protocol will remain the same whether you are 
using Python 2 or Python 3 to run your program.

This is the point of a protocol, after all.  It is to let two programs 
communicate with each other.
>If this should be forbidden, I will add a test to
>test_pb for this. And of course somebody should document that
>somewhere. The more PEP3131 is used, the more users will
>fall into this trap.

I'm not exactly sure what you mean here.  Using unicode where only bytes 
are allowed is probably already forbidden throughout PB.
>
>If this should be enabled (which I think is not difficult,
>at least for pb):
>
>At least the patch below will be needed (only for PY3),
>maybe it is already sufficient. Given that nativeString
>and networkString are always used (done that for pb).
>
>networkString may then return bytes with the high bit set

Definitely not.
>But since networkString is called in many places I want to ask and
>make sure that it may really be changed this way.
>
>
>https://twistedmatrix.com/documents/14.0.0/core/specifications/banana.html
>does not speak against it, so I wonder why networkString has that
>limitation to 7bit.

That is the sole purpose of `networkString`.  It is a work-around for 
the inconvenient fact that Python changed the meaning of the string 
literal syntax from bytes to unicode.
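
To make that concrete, here is a minimal sketch of the Python 3 behaviour, reconstructed from the unpatched lines of the diff quoted below. It is a standalone approximation, not the actual twisted.python.compat source, and the name `network_string` is only illustrative.

```python
def network_string(s):
    """
    Illustrative stand-in for the Python 3 branch of networkString,
    based on the unpatched lines in the quoted diff (an approximation).
    """
    if not isinstance(s, str):
        raise TypeError(
            "Can only convert text to bytes on Python 3, I got %r" % (s,))
    # Restricting to ASCII is the point: it checks that the text could
    # equally well have been written as a bytes literal.
    return s.encode("ascii")


# Pure-ASCII text converts to bytes.
ascii_result = network_string("getSimple")

# Text outside ASCII is rejected instead of being encoded some other way.
try:
    network_string("getSimple\u00e4")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

With the proposed patch, the second call would return UTF-8 bytes and the check would disappear.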
>
>concrete banana-encoded example, from modified test_pb: (the method 
>name is getSimpleä)
>test_pb still passes with patched nativeString/networkString (but I
>only have one test for this so far, test_refcount).
>
>b'\x07\x80\x07\x82message\x01\x81\x03\x82foo\x0b\x82getSimple\xc3\xa4\x01\x81\x01\x80\x05\x82tuple\x01\x80\n\x82dictionary'
>
>
>diff --git twisted/python/compat.py twisted/python/compat.py
>index 6f76c39..6919cf6 100644
>--- twisted/python/compat.py
>+++ twisted/python/compat.py
>@@ -348,10 +348,9 @@ def nativeString(s):
>         raise TypeError("%r is neither bytes nor unicode" % s)
>     if _PY3:
>         if isinstance(s, bytes):
>-            return s.decode("ascii")
>+            return s.decode("utf-8")
>         else:
>-            # Ensure we're limited to ASCII subset:
>-            s.encode("ascii")
>+            return s
>     else:
>         if isinstance(s, unicode):
>             return s.encode("ascii")
>@@ -428,7 +427,7 @@ if _PY3:
>     def networkString(s):
>         if not isinstance(s, unicode):
>             raise TypeError("Can only convert text to bytes on Python 3, I got %r" % (s,))
>-        return s.encode('ascii')
>+        return s.encode('utf-8')
>
>     def networkChar(integer):
>         """

This change definitely won't be acceptable.  It completely removes the 
feature `networkString` exists to provide: verifying that strings that 
might be either unicode or bytes can still be implicitly combined into 
bytes.

Can you point out the specific places where you think PB needs to start 
using UTF-8 instead of ASCII?  Those are the places that need to be 
fixed, not the underlying porting helpers they happen to use.
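
As a rough sketch of what fixing a call site might look like, assuming some specific PB field were agreed to carry UTF-8: the encoding would be stated explicitly at that place, and `networkString` would stay ASCII-only. The helper name `encode_method_name` is hypothetical and not part of Twisted, and whether any PB field should actually use UTF-8 is still the open question.

```python
def encode_method_name(name):
    """
    Hypothetical call-site helper (not a Twisted API): encode a method
    name explicitly as UTF-8 at the one place that is meant to allow it.
    """
    if isinstance(name, bytes):
        # Already bytes: pass through unchanged.
        return name
    if not isinstance(name, str):
        raise TypeError("%r is neither bytes nor text" % (name,))
    return name.encode("utf-8")


# The method name from the quoted banana example.
encoded = encode_method_name("getSimple\u00e4")
```

This yields the same bytes for the method name as in the banana-encoded example above, without altering the porting helpers that other code relies on.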

Jean-Paul



