Ticket #1990 defect closed fixed

Opened 7 years ago

Last modified 2 years ago

StdioOnnaStick ignores sys.stdout.encoding, won't print unicode properly

Reported by: termie Owned by: itamar
Priority: normal Milestone:
Component: core Keywords:
Cc: spiv, exarkun, elsdoerfer@…, davidsarah Branch: branches/stdio-unicode-1990
Author: itamarst, termie Launchpad Bug:

Description

Twisted overloads stdout to go to its log, but the log ignores encoding and tends to choke on unicode even though my terminal can support it

Attachments

log.patch Download (2.2 KB) - added by termie 7 years ago.
A patch to have StdioOnnaStick default to the sys.stdout.encoding and push that through to msg()
print.patch Download (0.8 KB) - added by termie 7 years ago.
A patch fixing only the StdioOnnaStick side of things

Change History

Changed 7 years ago by termie

A patch to have StdioOnnaStick default to the sys.stdout.encoding and push that through to msg()

1

Changed 7 years ago by spiv

  • cc spiv added
  • keywords review added

2

Changed 7 years ago by exarkun

  • cc exarkun added
  • keywords review removed
  • owner changed from glyph to termie

Needs tests. I also don't think this is the right way to go about this. StdioOnnaStick can encode any unicode it sees: the rest of the logging system doesn't need to know anything about this.

3

Changed 7 years ago by termie

glyph, safe_str is at fault here, too, as it by design only wants to handle ascii. So either safe_str should handle everything as unicode (which would be unsafe to most terminals and choke japanese), it should not be used, or it needs to have its encoding be

As StdioOnnaStick has a known encoding, sys.stdout.encoding, it is safe for things going through there to use that encoding, however other logging applications do not have the benefit and as long as msg() uses safe_str, which fails on different encodings, and nothing accepts an encoding argument, nobody gets a chance to output proper data.

4

Changed 7 years ago by exarkun

If the purpose here is to fix `print', then my above comments apply.

If the purpose is to add unicode support to twisted.python.log, then there's a bit more work to do. StdioOnnaStick isn't the right place to determine the encoding, since the log may not be going to stdout. FileLogObserver does the actual write, so it should do the actual encoding. safe_str shouldn't know about encodings, but maybe it should return a unicode string or there should be a safe_unicode. The caller can worry about doing the encoding.

This all has the benefit of avoiding mixing byte strings with different encodings and passing an explicit encoding name through the logging system.

Changed 7 years ago by termie

A patch fixing only the StdioOnnaStick side of things

5

Changed 4 years ago by exarkun

The patch should include unit tests. Thanks.

6

Changed 3 years ago by miracle2k

  • cc elsdoerfer@… added

7

Changed 3 years ago by davidsarah

  • cc davidsarah added

The absence of an encoding attribute on StdioOnnaStick instances caused  Tahoe-LAFS bug #1099.

8

Changed 2 years ago by itamarst

  • branch set to branches/stdio-unicode-1990
  • branch_author set to itamarst

(In [32003]) Branching to 'stdio-unicode-1990'

9

Changed 2 years ago by itamar

  • keywords review added
  • owner termie deleted
  • branch_author changed from itamarst to itamarst, termie

I wrote a bunch of tests for StdioOnnaStick, and then added the encoding attribute as well as code to encode unicode. The limited goal is to make print continue to work in the face of Twisted logging; if printing unicode used to work, it should continue to work.

10

Changed 2 years ago by therve

  • owner set to therve

11

Changed 2 years ago by therve

  • owner changed from therve to itamar
  • keywords review removed

1. You should use addCleanup in the tests, instead of try/finally.

2. It would be nice to use better variable names than f and s.

3. You could reduce the number of calls to sys.getdefaultencoding by making the default encoding argument to StdioOnnaStick.__init__ None, and use getdefaultencoding in that case.

Build results here:  http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/stdio-unicode-1990

Thanks, please merge!

12

Changed 2 years ago by itamarst

  • status changed from new to closed
  • resolution set to fixed

(In [32022]) Merge stdio-unicode-1990

Author: termie, itamarst Reviewer: therve Fixes: #1990

When the log system overrides sys.stdout and sys.stderr, continue to support standard Python behavior of encoding Unicode in print.

Note: See TracTickets for help on using tickets.