[Twisted-Python] Bytes vs unicode in twisted.python.logfile's python3 porting

exarkun at twistedmatrix.com
Sat Oct 26 19:19:00 MDT 2013


On 26 Oct, 09:49 pm, _ at lvh.io wrote:
>Hi everyone,
>
>
>I'm working on #6749 for porting t.p.logfile to python3. I'm dealing with
>some test failures, which you can see from the buildbot here:
>
>http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/1602/steps/shell/logs/stdio
>
>I have pasted the relevant bit into a gist here:
>
>https://gist.github.com/lvh/7174766
>
>I think what's happening is that LogFile.write should take native strings
>(since that's what log.msg takes). However, I'm opening all files in binary
>mode, since that's on the reviewer checklist (point 8) for the Python 3
>porting plan.

Argh.  I was all set up to object to the "...should take native strings" 
bit but then I reviewed some history.

If you haven't already, read 
<https://twistedmatrix.com/trac/ticket/5932>.  In hindsight, it would 
have been nice if this branch had come with documentation of some kind.

From what I can tell, this means:

  1. yep, `log.msg` takes native strings
  2. LogPublisher doesn't do anything with encoding or decoding
  3. a log observer can choose to do whatever it wants with what it gets
     (but it should be prepared to handle bytes on Python 2 and unicode
     on Python 3)
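
To make point 3 concrete, here is a minimal sketch of an observer that
copes with either type.  The names `to_text` and `make_observer` and the
UTF-8 default are assumptions for illustration, not Twisted API; the only
thing taken from Twisted is that the event dict carries the `log.msg`
arguments as a tuple under the "message" key.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is bytes.

    Hypothetical helper.  On Python 2 a native string is bytes, on
    Python 3 it is already text, so both branches get exercised.
    """
    if isinstance(value, bytes):
        # "replace" keeps the observer from raising on undecodable input.
        return value.decode(encoding, "replace")
    return value


def make_observer(sink):
    """Build an observer that appends one text line per event to sink."""
    def observer(event_dict):
        parts = event_dict.get("message", ())
        sink.append(u" ".join(to_text(part) for part in parts))
    return observer
```

Whether decoding with "replace" is acceptable is itself a policy
decision; a stricter observer could let the error propagate instead.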

So... invent some policy.  What encoding does Twisted's built-in file
log observer use for the files it writes?  Up until now the answer has
been ASCII, because it doesn't try to handle unicode, so anything
non-ASCII results in exceptions (unless you're using PyGTK... let's not
go there).

UTF-8 is obviously the only possible correct answer, I guess.  This 
needs to be documented, of course.

And I predict that next someone will come along with a feature request 
for a command-line option to twistd to make it write logs with a 
different encoding. :(

Whether you open the file in binary mode or not is up to you in this
case, I think.  You could do that and handle the encoding yourself, or
you could open it in text mode with the right encoding and let the file
object handle the encoding.
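
The two options can be sketched roughly as follows.  This is not the
logfile code; `write_binary` and `write_text` are hypothetical names and
UTF-8 is only an assumed default.  `io.open` is used because it accepts
an `encoding` argument on both Python 2 and Python 3.

```python
import io


def write_binary(path, text, encoding="utf-8"):
    """Open in binary mode and do the encoding by hand."""
    with open(path, "ab") as f:
        f.write(text.encode(encoding))


def write_text(path, text, encoding="utf-8"):
    """Open in text mode and let the file object do the encoding."""
    # newline="" turns off newline translation, so the bytes on disk
    # match what write_binary produces on every platform.
    with io.open(path, "a", encoding=encoding, newline="") as f:
        f.write(text)
```

Either way the same bytes should end up in the file; the difference is
only in where the encoding step lives.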

The blanket statement on 
https://twistedmatrix.com/trac/wiki/Plan/Python3 about open calls is 
perhaps slightly too general.  It seems like that should apply to cases 
where the behavior is not supposed to be changing - which should be the 
case for the majority of porting work.  However, `log.msg` has already 
changed behavior as part of the port.

*Or*, it now occurs to me, just stick with the ASCII-only policy that's 
already in place.  I'd even say this is more correct since porting isn't 
supposed to change behavior.  Leave support for some other codec for 
another ticket (perhaps #989).  Apart from being simpler (I hope) and 
avoiding breaching the documented porting guidelines, this also means 
someone will actually have to think about unicode support on Python 2 as 
well.  Saying we support unicode in the logging system is a lot better 
than saying we support unicode in the logging system on Python 3 only.
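
For illustration, an ASCII-only policy on top of a binary-mode file could
be as small as the sketch below.  `encode_ascii_only` is a hypothetical
name, not something in t.p.logfile; it just encodes with the strict error
handler, so non-ASCII text keeps raising an exception as it does today.

```python
def encode_ascii_only(text):
    """Encode text for a binary-mode log file under an ASCII-only policy.

    Hypothetical helper.  str.encode defaults to strict error handling,
    so any non-ASCII character raises UnicodeEncodeError.
    """
    return text.encode("ascii")
```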

Jean-Paul



