[Twisted-Python] PEP3131: non-ascii identifiers

Wolfgang Rohdewald wolfgang.kde at rohdewald.de
Mon Sep 8 03:19:37 MDT 2014


Non-ASCII identifiers (PEP 3131) do not seem to be supported by Twisted yet.

Should that be enabled at all?
If a PY3 process sends such identifiers to a
separate PY2 process, that will fail. I am not
sure whether that would be a problem; whoever uses this must
make sure PY3 is used everywhere.

If this should be forbidden, I will add a test to
test_pb for this. And of course it should be documented
somewhere: the more PEP 3131 is used, the more users will
fall into this trap.

If this should be enabled (which I think is not difficult,
at least for pb):

At least the patch below will be needed (only for PY3);
it may already be sufficient, provided that nativeString
and networkString are always used (I have done that for pb).

With that change, networkString may return bytes with the high bit set.

But since networkString is called in many places, I want to ask first
and make sure that it may really be changed this way.
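To illustrate what "high bit set" means here, a minimal sketch in plain Python 3 (not Twisted code; the helper names are made up for this example). It compares the current ASCII conversion with the proposed UTF-8 conversion for a PEP 3131 method name:

```python
# Hypothetical helpers, not part of Twisted: they mimic the ASCII checks
# done by the unpatched networkString/nativeString on Python 3.
name = "getSimpleä"

def high_bit_bytes(data):
    """Return the bytes in `data` that have the high bit (0x80) set."""
    return [b for b in data if b & 0x80]

def encodes_as_ascii(text):
    """True if `text` survives the unpatched text.encode('ascii') step."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

def decodes_as_ascii(data):
    """True if `data` survives the unpatched data.decode('ascii') step."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True

utf8 = name.encode("utf-8")  # what the patched networkString would return
print(utf8, [hex(b) for b in high_bit_bytes(utf8)])
print(encodes_as_ascii(name))  # False: the unpatched networkString raises
print(decodes_as_ascii(utf8))  # False: an unpatched peer's nativeString raises
```

Only the two bytes of "ä" have the high bit set; every ASCII character still maps to a single byte below 0x80.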


The banana specification at
https://twistedmatrix.com/documents/14.0.0/core/specifications/banana.html
does not rule this out, so I wonder why networkString is limited
to 7-bit ASCII.

Here is a concrete banana-encoded example from a modified test_pb (the
method name is getSimpleä). test_pb still passes with the patched
nativeString/networkString, but so far I only have one test for this
(test_refcount).

b'\x07\x80\x07\x82message\x01\x81\x03\x82foo\x0b\x82getSimple\xc3\xa4\x01\x81\x01\x80\x05\x82tuple\x01\x80\n\x82dictionary'
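To check that this example is consistent with the banana specification, here is a minimal encoder sketch. It is not Twisted's implementation (the helpers b128 and encode are hypothetical), it only handles lists, non-negative ints and strings, and it assumes text is sent as UTF-8 as in the patch below. The element structure passed to it is my reading of the bytes above.

```python
# Minimal banana encoder sketch (hypothetical helpers, not twisted.spread.banana).
# Type bytes from the banana specification:
LIST, INT, STRING = b"\x80", b"\x81", b"\x82"

def b128(n):
    """Encode a non-negative int as base-128 digits, least significant first."""
    if n < 0:
        raise ValueError("negative ints are not handled in this sketch")
    if n == 0:
        return b"\x00"
    out = bytearray()
    while n:
        out.append(n & 0x7F)
        n >>= 7
    return bytes(out)

def encode(obj):
    """Encode a list, non-negative int, bytes or str as banana bytes."""
    if isinstance(obj, list):
        return b128(len(obj)) + LIST + b"".join(encode(o) for o in obj)
    if isinstance(obj, int):
        return b128(obj) + INT
    if isinstance(obj, str):
        # Assumption: text goes on the wire as UTF-8, like the patched networkString.
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        # The length header counts bytes, not characters.
        return b128(len(obj)) + STRING + obj
    raise TypeError("unsupported type: %r" % (obj,))

message = ["message", 1, "foo", "getSimpleä", 1, ["tuple"], ["dictionary"]]
print(encode(message))
```

Note the length prefix of getSimpleä: 0x0b (11), because the name has ten characters but eleven UTF-8 bytes.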


diff --git twisted/python/compat.py twisted/python/compat.py
index 6f76c39..6919cf6 100644
--- twisted/python/compat.py
+++ twisted/python/compat.py
@@ -348,10 +348,9 @@ def nativeString(s):
         raise TypeError("%r is neither bytes nor unicode" % s)
     if _PY3:
         if isinstance(s, bytes):
-            return s.decode("ascii")
+            return s.decode("utf-8")
         else:
-            # Ensure we're limited to ASCII subset:
-            s.encode("ascii")
+            return s
     else:
         if isinstance(s, unicode):
             return s.encode("ascii")
@@ -428,7 +427,7 @@ if _PY3:
     def networkString(s):
         if not isinstance(s, unicode):
             raise TypeError("Can only convert text to bytes on Python 3, I got %r" % (s,))
-        return s.encode('ascii')
+        return s.encode('utf-8')
 
     def networkChar(integer):
         """

-- 
Wolfgang