Opened 8 years ago

Closed 4 years ago

#1990 defect closed fixed (fixed)

StdioOnnaStick ignores sys.stdout.encoding, won't print unicode properly

Reported by: termie Owned by: itamar
Priority: normal Milestone:
Component: core Keywords:
Cc: spiv, exarkun, elsdoerfer@…, davidsarah Branch: branches/stdio-unicode-1990
(diff, github, buildbot, log)
Author: itamarst, termie Launchpad Bug:

Description

Twisted overloads stdout to go to its log, but the log ignores encoding and tends to choke on unicode even though my terminal can support it

Attachments (2)

log.patch (2.2 KB) - added by termie 8 years ago.
A patch to have StdioOnnaStick default to the sys.stdout.encoding and push that through to msg()
print.patch (795 bytes) - added by termie 8 years ago.
A patch fixing only the StdioOnnaStick side of things

Download all attachments as: .zip

Change History (14)

Changed 8 years ago by termie

A patch to have StdioOnnaStick default to the sys.stdout.encoding and push that through to msg()

comment:1 Changed 8 years ago by spiv

  • Cc spiv added
  • Keywords review added

comment:2 Changed 8 years ago by exarkun

  • Cc exarkun added
  • Keywords review removed
  • Owner changed from glyph to termie

Needs tests. I also don't think this is the right way to go about this. StdioOnnaStick can encode any unicode it sees: the rest of the logging system doesn't need to know anything about this.

comment:3 Changed 8 years ago by termie

glyph, safe_str is at fault here, too, as it by design only wants to handle ascii. So either safe_str should handle everything as unicode (which would be unsafe to most terminals and choke japanese), it should not be used, or it needs to have its encoding be

As StdioOnnaStick has a known encoding, sys.stdout.encoding, it is safe for things going through there to use that encoding, however other logging applications do not have the benefit and as long as msg() uses safe_str, which fails on different encodings, and nothing accepts an encoding argument, nobody gets a chance to output proper data.

comment:4 Changed 8 years ago by exarkun

If the purpose here is to fix `print', then my above comments apply.

If the purpose is to add unicode support to twisted.python.log, then there's a bit more work to do. StdioOnnaStick isn't the right place to determine the encoding, since the log may not be going to stdout. FileLogObserver does the actual write, so it should do the actual encoding. safe_str shouldn't know about encodings, but maybe it should return a unicode string or there should be a safe_unicode. The caller can worry about doing the encoding.

This all has the benefit of avoiding mixing byte strings with different encodings and passing an explicit encoding name through the logging system.

Changed 8 years ago by termie

A patch fixing only the StdioOnnaStick side of things

comment:5 Changed 5 years ago by exarkun

The patch should include unit tests. Thanks.

comment:6 Changed 5 years ago by miracle2k

  • Cc elsdoerfer@… added

comment:7 Changed 4 years ago by davidsarah

  • Cc davidsarah added

The absence of an encoding attribute on StdioOnnaStick instances caused Tahoe-LAFS bug #1099.

comment:8 Changed 4 years ago by itamarst

  • Author set to itamarst
  • Branch set to branches/stdio-unicode-1990

(In [32003]) Branching to 'stdio-unicode-1990'

comment:9 Changed 4 years ago by itamar

  • Author changed from itamarst to itamarst, termie
  • Keywords review added
  • Owner termie deleted

I wrote a bunch of tests for StdioOnnaStick, and then added the encoding attribute as well as code to encode unicode. The limited goal is to make print continue to work in the face of Twisted logging; if printing unicode used to work, it should continue to work.

comment:10 Changed 4 years ago by therve

  • Owner set to therve

comment:11 Changed 4 years ago by therve

  • Keywords review removed
  • Owner changed from therve to itamar
  1. You should use addCleanup in the tests, instead of try/finally.
  1. It would be nice to use better variable names than f and s.
  1. You could reduce the number of calls to sys.getdefaultencoding by making the default encoding argument to StdioOnnaStick.__init__ None, and use getdefaultencoding in that case.

Build results here: http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/stdio-unicode-1990

Thanks, please merge!

comment:12 Changed 4 years ago by itamarst

  • Resolution set to fixed
  • Status changed from new to closed

(In [32022]) Merge stdio-unicode-1990

Author: termie, itamarst
Reviewer: therve
Fixes: #1990

When the log system overrides sys.stdout and sys.stderr, continue to support standard Python behavior of encoding Unicode in print.

Note: See TracTickets for help on using tickets.