[Twisted-Python] Python 3: bytes vs. str in twisted.python.filepath

Harry Bock bock.harryw at gmail.com
Sun Jul 14 08:18:20 MDT 2013


On Sun, Jul 14, 2013 at 8:16 AM, Itamar Turner-Trauring
<itamar at itamarst.org> wrote:

> On 07/13/2013 10:00 PM, Harry Bock wrote:
>
>> Hi all,
>>
>> My name is Harry Bock.  I'm interested in helping out porting Twisted to
>> Python 3, and I've popped in IRC a few times to introduce myself and ask a
>> few questions. A few developers agreed that working on trial dependencies
>> would be a big help.
>>
>> In doing some porting work on trial, I stumbled upon a previous porting
>> effort (possibly by Itamar?) for twisted.python.filepath and related
>> modules.  It seemed like the porting effort included forcing all pathname
>> inputs to be byte strings instead of native strings.
>>
>
> You imply that this was a change, somehow, but it wasn't. The API was
> *always* bytes and it continues to be bytes on Python 3.
>

Ah, I understand now.  Since the native string type in Python 2 is a byte
string, the API has always taken bytes, so it follows that on Python 3 the
API should remain bytes.


>
> It's a common Python 3 porting mistake to change everything from bytes to
> unicode just because. E.g. Python standard library does this in many places
> for no good reason, resulting in bugs that are still being fixed (
> http://bugs.python.org/issue12411)
> or APIs that are less useful (zipfile docs explicitly state that there is
> no standard encoding in zip files, but Python 3 zipfile module only
> supports one specific encoding because they switched to Unicode and didn't
> bother reading the module's own docs). Our goal in porting was backwards
> compatibility with Python 2 code, so porters don't have to change
> everything, and correctness. And, in this particular case, to get something
> working in the minimal amount of time - *adding* Unicode support is useful
> and should be done.
>
>
>> After some investigation, I believe this is the wrong approach, but I
>> wanted to start a discussion here first.  Some thoughts:
>>
>> (a) As of Python 3.3, use of the ANSI API in Windows is deprecated[1], so
>> many functions in os and os.path raise DeprecationWarning when given byte
>> strings as input.  Although win32 is not an initial target of the porting
>> effort, we'll have to support it eventually and the API should be supported
>> before then.
>>
>> (b) Misunderstandings at the application level about the underlying
>> filesystem's path encoding are not the problem of the Twisted API.  Correct
>> me if I'm wrong, but that's the responsibility of the system administrator
>> or individual user (at least on UNIX) to set the LANG environment variable,
>> or for the application to call setlocale(3) to explicitly override it.
>>
> Given operating systems that don't really know about encodings on the
> filesystem level, forcing everything to be unicode doesn't make sense. I'm
> pretty sure you can end up with files in multiple different Unicode
> encodings on the same filesystem on Linux, for example.


This is very true and I didn't consider it in my initial investigation.
While I think it would be uncommon to have files in multiple encodings on
the same filesystem, it is certainly possible - to Tristan's point,
copying names from filesystem to filesystem could easily result in multiple
encodings.  The operating system may not need to understand the encodings,
but applications do in order to display them correctly.  Which leads to
your last point...
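
To make the mixed-encoding problem concrete, here is a minimal Python 3
sketch, assuming a UTF-8 locale.  The helper names decode_for_display and
encode_for_syscall are hypothetical, purely for illustration; they are not
part of Twisted or the standard library.  It shows one name that was written
as Latin-1 and one written as UTF-8, and how the "surrogateescape" error
handler lets an application decode both for display without losing the
original bytes:

```python
# Two byte names for the same word, as written by programs that assumed
# different encodings.  A POSIX filesystem stores both without complaint.
latin1_name = "caf\u00e9".encode("latin-1")  # b'caf\xe9'
utf8_name = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'


def decode_for_display(raw, encoding="utf-8"):
    """Hypothetical helper: decode a byte name for display.

    Bytes that are invalid in `encoding` become lone surrogates rather
    than raising UnicodeDecodeError.
    """
    return raw.decode(encoding, "surrogateescape")


def encode_for_syscall(text, encoding="utf-8"):
    """Hypothetical helper: recover the exact original bytes."""
    return text.encode(encoding, "surrogateescape")


shown_latin1 = decode_for_display(latin1_name)
shown_utf8 = decode_for_display(utf8_name)

# The UTF-8 name decodes cleanly; the Latin-1 name does not, but it still
# round-trips to the same bytes, so the file can still be opened.
round_tripped = encode_for_syscall(shown_latin1)
```

The decoded Latin-1 name is not displayable as intended, which is the
application-level problem; the byte form is what identifies the file.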

>
>> Thus, my vote is that on Python 2.x, Twisted should accept either the
>> native str or unicode types for path names, and on Python 3.x, only accept
>> the str type to prevent deprecation issues with system calls.  I have a
>> patch set that will make this happen including unittest modifications; if
>> there's a consensus I'm happy to open a ticket and submit the patches.
>>
>
> The ideal situation would be to support bytes and Unicode on Python 2
> *and* Python 3, for maximum compatibility. Even if deprecated on Windows,
> filesystem operations on Python 3 still do accept bytes (and they're not
> deprecated elsewhere). Given existing code that already takes bytes,
> switching to only doing Unicode on Python 3 would not be backwards
> compatible, so we can't really do that without a bunch of deprecation
> warnings and a few releases. Instead we should just do what Python does: if
> you start with bytes path you always get back bytes, if you start with
> Unicode path you always get back Unicode.
>
>
Yes, you're right, that's probably the best solution.  It would not be
terribly hard to do - then application developers can choose whether to
defer to the local user's locale settings by using Unicode paths, or to
explicitly use byte paths.  Thanks so much for your input!
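
For reference, the "bytes in, bytes out; Unicode in, Unicode out" rule is
how the Python 3 standard library's os functions already behave.  A rough
sketch follows; children is a hypothetical helper written for this example,
not the twisted.python.filepath API:

```python
import os
import tempfile


def children(path):
    """Hypothetical helper: list a directory's entries as full paths.

    os.listdir and os.path.join both preserve the type of their argument,
    so the result has the same type (bytes or str) as `path`.
    """
    return [os.path.join(path, name) for name in os.listdir(path)]


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    text_children = children(tmp)                # str path -> str results
    bytes_children = children(os.fsencode(tmp))  # bytes path -> bytes results
```

A path object that supports both types would presumably need to remember
which type it was constructed with and never mix the two internally.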

Is this something I can open a ticket for?


