[Twisted-Python] Changing supported configurations regarding Unicode handling on Windows

Amber Brown (hawkowl) hawkowl at atleastfornow.net
Fri Jun 19 05:37:21 MDT 2020


Hi all,

Over the past week or so, I noticed failures in the Azure Pipelines CI 
(see https://github.com/twisted/twisted/pull/1278, among others, for 
examples) that were due to Python + Windows falling apart on 
mgorny's name. After some debugging, I ascertained:

- The environment has Unicode strings in it (because environments are 
Unicode on Windows)
- but sys.stdout.encoding is cp1252, since 
https://www.python.org/dev/peps/pep-0528/ does not apply because the CI 
output is not an interactive console
- One of the characters in the environment is not printable under 
cp1252, which causes an exception.
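
A minimal sketch of the failure described in the list above. It does 
not touch the real sys.stdout; it wraps an in-memory buffer in a 
TextIOWrapper to stand in for a redirected stdout. The variable name 
BUILD_AUTHOR, its value, and the helper print_to_stream are made up for 
illustration; the only relevant fact is that U+0142 has no cp1252 
encoding.

```python
import io


def print_to_stream(text, encoding, errors="strict"):
    """Print text through a TextIOWrapper, like a redirected stdout would."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


# Hypothetical environment mapping; U+0142 is not representable in cp1252.
fake_environ = {"BUILD_AUTHOR": "Micha\u0142"}

try:
    print_to_stream(fake_environ, "cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    # This is the exception that print(os.environ) hits under cp1252.
    cp1252_failed = True

# The same print succeeds when the stream encoding is UTF-8.
utf8_bytes = print_to_stream(fake_environ, "utf-8")
```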

I think we should avoid running under ANSI-mode by default at all costs, 
since it causes non-obvious bugs like this (`print(os.environ)` causing 
an exception). This would also bring Windows in line with UNIX, where we 
basically assume a non-UTF-8 locale is more or less broken by design and 
we don't run the tests on it.

It also seems like Windows is heading in the direction of having console 
output be CP65001 (aka UTF-8), so I think this is a reasonable direction 
to go in as well. [1] [2] [3]

PEP-528 makes sys.stdout/sys.stdin use the W ("wide", aka UTF-16LE) 
APIs, as it's assumed that a human is on the other side of the console. 
For compatibility, it encodes Unicode to UTF-8 and passes it to 
WindowsConsoleIO, which then decodes it into UTF-16 and passes it to 
the console. This means that writing raw UTF-8 bytes to 
sys.stdout.buffer works as you'd expect on both Windows and UNIXes. We 
can enable UTF-8 text output universally with the environment variable 
`PYTHONIOENCODING=utf8:surrogateescape`. If a user wants ANSI output, 
they can set the "PYTHONLEGACYWINDOWSSTDIO" environment variable to 
make Python not perform the Unicode conversions for the console, so we 
could perhaps use this too, if someone is SURE they want ANSI output.
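
As a rough check of what `PYTHONIOENCODING=utf8:surrogateescape` does, 
the sketch below starts a child interpreter with that variable set and 
a piped (non-console) stdout, then reads back the encoding and error 
handler the child reports. The names child_code and env are only for 
this example, and the encoding name is normalised with codecs.lookup 
because the exact spelling reported can differ between Python versions.

```python
import codecs
import os
import subprocess
import sys

# Code run in the child: report the stdout settings, then print U+0142.
child_code = (
    "import sys\n"
    "print(sys.stdout.encoding, sys.stdout.errors)\n"
    "print('\\u0142')\n"
)

# Copy the current environment and add the variable under discussion.
env = dict(os.environ, PYTHONIOENCODING="utf8:surrogateescape")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

lines = result.stdout.decode("utf-8").splitlines()
encoding_name, errors_name = lines[0].split()
normalised_encoding = codecs.lookup(encoding_name).name
printed_character = lines[1]
```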

Python 3.7 has PEP-540's `-X utf8` mode, which also does this, more or 
less, but in a nicer way (no environment variables).
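
A similar sketch for `-X utf8`, assuming the interpreter running it is 
Python 3.7 or later. PYTHONIOENCODING is dropped from the child's 
environment here, since it would otherwise take precedence over UTF-8 
mode for the standard streams.

```python
import codecs
import os
import subprocess
import sys

# Remove PYTHONIOENCODING so only the -X utf8 option affects the child.
env = {k: v for k, v in os.environ.items() if k != "PYTHONIOENCODING"}

child_code = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

utf8_mode_flag, encoding_name = result.stdout.decode("ascii").split()
normalised_encoding = codecs.lookup(encoding_name).name
```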

Python 3.5 doesn't seem to work with either of these options. Not sure 
why. Maybe it's busted.

So, due to this, I would like to propose the following:

- On Windows, raising a deprecation warning when sys.stdout and 
sys.stderr are not UTF-8 AND the environment variable 
"PYTHONLEGACYWINDOWSSTDIO" is not set.
- Declaring said environments unsupported and running our tests with -X 
utf8/PYTHONIOENCODING=utf8, or with PYTHONLEGACYWINDOWSSTDIO (the latter 
will require skipping some Unicode tests that fail because CP1252 is bad).
- After the deprecation period, start issuing loud RuntimeWarnings 
saying that you're probably not doing the thing you want to be doing.
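
To make the first point concrete, here is one possible shape for the 
check. It is only a sketch, not existing Twisted code: the function 
names and the warning text are invented, and it assumes the warning 
should fire if either stream is non-UTF-8, which is one reading of the 
condition above. The platform, encodings and environment are passed in 
as arguments so the logic can be exercised on any OS.

```python
import codecs
import warnings


def is_utf8(encoding):
    """Return True if the encoding name resolves to the UTF-8 codec."""
    if encoding is None:
        return False
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


def needs_deprecation_warning(platform, stdout_encoding, stderr_encoding, environ):
    """Hypothetical check: Windows, non-UTF-8 streams, no legacy opt-in."""
    if platform != "win32":
        return False
    # A non-empty PYTHONLEGACYWINDOWSSTDIO means the user asked for ANSI.
    if environ.get("PYTHONLEGACYWINDOWSSTDIO"):
        return False
    # Assumption: warn if EITHER stream is not UTF-8.
    return not (is_utf8(stdout_encoding) and is_utf8(stderr_encoding))


def warn_if_unsupported(platform, stdout_encoding, stderr_encoding, environ):
    """Emit a DeprecationWarning when the check above says so."""
    if needs_deprecation_warning(platform, stdout_encoding, stderr_encoding, environ):
        warnings.warn(
            "Running with non-UTF-8 standard streams on Windows is deprecated "
            "(illustrative message only).",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False
```

In real use this would presumably be called with sys.platform, 
sys.stdout.encoding, sys.stderr.encoding and os.environ.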

Opinions?

- Amber

[1] 
https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
[2] 
https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?view=netcore-3.1#the-default-property-on-net-core
[3] 
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page


