[Twisted-Python] Bytes vs unicode in twisted.python.logfile's python3 porting
exarkun at twistedmatrix.com
exarkun at twistedmatrix.com
Sat Oct 26 19:19:00 MDT 2013
On 26 Oct, 09:49 pm, _ at lvh.io wrote:
>Hi everyone,
>
>
>I'm working on #6749 for porting t.p.logfile to python3. I'm dealing
>with
>some test failures, which you can see from the buildbot here:
>
>http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/1602/steps/shell/logs/stdio
>
>I have pasted the relevant bit into a gist here:
>
>https://gist.github.com/lvh/7174766
>
>I think what's happening is that LogFile.write should take native
>strings
>(since that's what log.msg takes). However, I'm opening all files in
>binary
>mode, since that's on the reviewer checklist (point 8) for the Python 3
>porting plan.
Argh. I was all set up to object to the "...should take native strings"
bit but then I reviewed some history.
If you haven't already, read
<https://twistedmatrix.com/trac/ticket/5932>. In hindsight, it would
have been nice if this branch had come with documentation of some kind.
From what I can tell, this means:
1. yep, `log.msg` takes native strings
2. LogPublisher doesn't do anything with encoding or decoding
3. a log observer can choose to do whatever it wants with what it gets
(but it should be prepared to handle bytes on python 2 and unicode
on python 3)
So... invent some policy. What encoding does Twisted's built-in file
log observer use for the files it writes? Up until now the answer has
been ASCII because it doesn't try to handle unicode so anything non-
ASCII results in exceptions (unless you're using PyGTK... let's not go
there).
UTF-8 is obviously the only possible correct answer, I guess. This
needs to be documented, of course.
And I predict that next someone will come along with a feature request
for a command-line option to twistd to make it write logs with a
different encoding. :(
Whether you open the file in binary mode or not is up to you in this
case, I think. You could do that and handle the encoding yourself or
you could open it in text mode with the right encoding and let it handle
the encoding.
The blanket statement on
https://twistedmatrix.com/trac/wiki/Plan/Python3 about open calls is
perhaps slightly too general. It seems like that should apply to cases
where the behavior is not supposed to be changing - which should be the
case for the majority of porting work. However, `log.msg` has already
changed behavior as part of the port.
*Or*, it now occurs to me, just stick with the ASCII-only policy that's
already in place. I'd even say this is more correct since porting isn't
supposed to change behavior. Leave support for some other codec for
another ticket (perhaps #989). Apart from being simpler (I hope) and
avoiding breaching the documented porting guidelines, this also means
someone will actually have to think about unicode support on Python 2 as
well. Saying we support unicode in the logging system is a lot better
than saying we support unicode in the logging system on Python 3 only.
Jean-Paul
More information about the Twisted-Python
mailing list