[Twisted-Python] Python3: should paths be bytes or str?

anatoly techtonik techtonik at gmail.com
Mon Sep 8 04:26:58 MDT 2014


On Mon, Sep 8, 2014 at 5:14 AM,  <exarkun at twistedmatrix.com> wrote:
> On 01:26 am, wolfgang.kde at rohdewald.de wrote:
>>
>> The porting guide says
>>
>> No byte paths in sys.path.
>
>
> What porting guide is that?
>>
>>
>> doc for FilePath says
>>    On both Python 2 and Python 3, paths can only be bytes.
>>
>>
>> I stumbled upon this while trying to find out how much work it might be
>> to make bin/trial run with python3
>>
>> admin/run-python3-tests already passes for all twisted.spread related
>> tests but I still need to clean up a lot.
>>
>> after adding an assert to FilePath.__init__, python3 bin/trial ... gives
>>
>>  File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 601, in run
>>    config.parseOptions()
>>  File "/home/wr/ssdsrc/Twisted/twisted/python/usage.py", line 277, in
>> parseOptions
>>    self.postOptions()
>>  File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 472, in
>> postOptions
>>    _BasicOptions.postOptions(self)
>>  File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 382, in
>> postOptions
>>    self['reporter'] = self._loadReporterByName(self['reporter'])
>>  File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 369, in
>> _loadReporterByName
>>    for p in plugin.getPlugins(itrial.IReporter):
>>  File "/home/wr/ssdsrc/Twisted/twisted/plugin.py", line 209, in getPlugins
>>    allDropins = getCache(package)
>>  File "/home/wr/ssdsrc/Twisted/twisted/plugin.py", line 134, in getCache
>>    mod = getModule(module.__name__)
>>  File "/home/wr/ssdsrc/Twisted/twisted/python/modules.py", line 781, in
>> getModule
>>    return theSystemPath[moduleName]
>>  File "/home/wr/ssdsrc/Twisted/twisted/python/modules.py", line 702, in
>> __getitem__
>>    self._findEntryPathString(moduleObject)),
>>  File "/home/wr/ssdsrc/Twisted/twisted/python/modules.py", line 627, in
>> _findEntryPathString
>>    if _isPackagePath(FilePath(topPackageObj.__file__)):
>>  File "/home/wr/ssdsrc/Twisted/twisted/python/filepath.py", line 664, in
>> __init__
>>    assert isinstance(path, bytes), 'path must be bytes: %r' % (path,)
>> AssertionError: path must be bytes:
>> '/home/wr/ssdsrc/Twisted/twisted/__init__.py'
>
>
> If paths are being represented using unicode somewhere and you want to use
> them with FilePath then you have to encode them (or you have to add unicode
> path support to FilePath and let FilePath encode them).
>
> Unfortunately it's not entirely obvious how to make FilePath support unicode
> paths since not all platforms Twisted supports represent filesystem paths
> using unicode.
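
A minimal sketch of the "encode them" option, assuming Python 3 and only the
standard library. The literal path is the one from the traceback above, and the
FilePath call is left as a comment because it is not run here:

```python
import os

# str path as Python 3 reports it in module.__file__ (copied from the traceback)
text_path = '/home/wr/ssdsrc/Twisted/twisted/__init__.py'

# os.fsencode() converts str to bytes using the filesystem encoding.
bytes_path = os.fsencode(text_path)
assert isinstance(bytes_path, bytes)

# bytes_path is the kind of value a bytes-only FilePath would accept, e.g.
# FilePath(bytes_path)
```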

It really depends on the filesystem, not on the platform. The platform just
makes sure that you don't shoot yourself in the foot. So to behave well you
need to translate your knowledge about the path into knowledge about the
platform whenever you have to make a change.

In data transformation theory that may mean:
[ ] get data about path in native format
  [ ] detect the source encoding of filesystem
[ ] figure out if you can work with native format
  [ ] python 2 way - just work with bytes
  [ ] python 3 way - look if native filesystem format is convertible to unicode
    [ ] if conversion is symmetrical - operate in unicode
    [ ] if not convertible, alternatives (options, switches)
      [ ] fail and explain why to the user in an actionable manner (don't use ?)
      [ ] use some symmetrical mapping to unicode and mark path objects as
          `mapped` so that there is a trace of hacks on filepaths
      [ ] provide API on objects without ability to use names directly
      [ ] transform the name to a "safe" valid value, losing the original name,
          and explain to the user why and what happened to the old name
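
The checklist above could be sketched roughly like this. It is only an
illustration, not anything Twisted provides: TextPath, decode_path and the
`policy` switch are hypothetical names, and only the "fail" and "map" options
are covered.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextPath:
    """Hypothetical wrapper: decoded path plus a trace of how it was decoded."""
    text: str
    mapped: bool  # True if undecodable bytes were mapped to lone surrogates


def decode_path(raw: bytes, encoding: Optional[str] = None,
                policy: str = "fail") -> TextPath:
    """Decode a native bytes path; `policy` ("fail" or "map") is a made-up switch."""
    encoding = encoding or sys.getfilesystemencoding()
    try:
        text = raw.decode(encoding)  # strict decode
        if text.encode(encoding) == raw:
            # Conversion is symmetrical: operate in unicode.
            return TextPath(text, mapped=False)
        reason = "does not round-trip through %s" % encoding
    except UnicodeError as exc:
        reason = str(exc)
    if policy == "map":
        # Reversible mapping; the flag leaves a trace of the hack.
        return TextPath(raw.decode(encoding, "surrogateescape"), mapped=True)
    if policy == "fail":
        # Fail with a message the user can act on.
        raise ValueError("cannot use path %r as text: %s; rename the file or "
                         "pass policy='map'" % (raw, reason))
    raise ValueError("unknown policy %r" % (policy,))
```

With encoding="utf-8", decode_path(b"/tmp/caf\xc3\xa9") gives a plain unmapped
path, while b"/tmp/caf\xe9" either raises or, with policy="map", comes back
marked as mapped and can still be encoded back to the original bytes.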

> The choice python-dev made to bridge this gap was the creation of the
> "surrogateescape" error handler for the UTF-8 codec.  This lets you pretend
> that any time you need to convert between bytes and unicode the correct
> codec is UTF-8 (with this special error handler).
>
> It's not clear this was a good choice (since the result is unicode strings
> that may contain garbage which will confuse other software) but it's also
> not clear it's possible for Twisted to try to make any other choice (at some
> point Twisted has to interoperate with the path-related APIs in Python
> itself - `sys.path`, for example).
>
> Not sure if that helps you at all.  Maybe it outlines the problem a little
> more clearly, at least.
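
To make the "garbage" concrete, here is a small sketch of what the
surrogateescape handler does, using only built-in str/bytes methods. The path
is a made-up example of a Latin-1 file name that is not valid UTF-8:

```python
# Hypothetical file name: 0xE9 is "e acute" in Latin-1 but invalid as UTF-8.
raw = b"/srv/data/caf\xe9.txt"

# Decoding never fails; the bad byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")
assert text == "/srv/data/caf\udce9.txt"

# The mapping is reversible as long as the same handler is used to encode.
assert text.encode("utf-8", "surrogateescape") == raw

# Other software that encodes strictly will choke on the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

On POSIX, os.fsdecode() and os.fsencode() apply this same handler with the
filesystem encoding, which is how str paths from os.listdir() can carry
arbitrary bytes.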

I think that it should be the choice of application maintainers. If they want
to create files with dots at the end, Windows allows this, but the standard
WinAPI calls don't support it, because of FAT. You can still use the special
path prefix \\?\ to do this on NTFS or on a remote machine. In networking the
OS plays a smaller role, but the choice made for each platform's filesystem
is not clear to users. They don't realize where the problem comes from. In the
end it all depends on the FS first, then on the OS API (which can be skipped
thanks to direct disk access), then on the FS library, then on the
application. An application should be able to opt in to handling all possible
cases, or just a "safe" subset, or something in between, so the task of a
framework is just to describe the problem well and ensure that people can
make their choice.
-- 
anatoly t.



