Opened 15 years ago

Last modified 5 years ago

#989 defect new

Default log observer chokes on unicode messages

Reported by: Jean-Paul Calderone Owned by:
Priority: high Milestone:
Component: core Keywords:
Cc: Glyph, Jean-Paul Calderone, itamarst, jknight, bruce, ralphm, Thijs Triemstra, miracle2k, therve, oubiwann Branch: branches/unicode-logging-989
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun


Change History (16)

comment:1 Changed 15 years ago by Jean-Paul Calderone

When logging a unicode message, the default log observer will raise an
exception, causing itself to be removed from the logging system.

This fact is helpfully logged so that the developer or sysadmin can rectify the
problem, only there are often no log observers left to record this message...

1) We should support unicode messages

2a) There should be an ultra-safe fallback log observer which can be used to
record things like the default log observer failing and being removed


2b) The default log observer should never be permanently removed, even if it
starts raising exceptions

comment:2 Changed 15 years ago by bruce

I see a problem similar to this, but I don't use unicode, so it isn't exactly
this.  But I do see that *something* happens, and forever after that, my
twistd.log is sadly silent.

comment:3 Changed 15 years ago by jknight

[21:50] <foom> the proposal was to not remove the logger, until trying to log 
the exception first, and having that fail.
[21:51] <foom> as well ensuring that unicode strings in tracebacks don't screw 
things up or something
[21:51] <exarkun> tracebacks shouldn't be able to be unicode
[21:51] <foom> then the first should fix it
[21:51] <exarkun> of course that is only a moral stance, I don't know if it is 
[21:52] <foom> Removing the logger because the logged item is defective is 
wrong, so that's the first thing that should be fixed.
[21:52] <exarkun> Eh
[21:52] <exarkun> That's a funny way of looking at it.
[21:52] <foom> Well, if logging the exception raised when trying to log works, 
nothing needs to be removed.
[21:53] <foom> because the system is still in a workable state.
[21:53] <exarkun> right, it seems like it should work
[21:53] <exarkun> it's just seems funny :)
[21:53] <foom> if this proposal isn't in the bug report yet someone should add 

comment:4 Changed 15 years ago by Jean-Paul Calderone

Something of a fix committed at r13877.  This was not the fix discussed below,
because there are so many ugly cases handled in the code right now, it would
have been too difficult to implement.

As a long term solution, all of the stupid misfeatures of log.msg() should be
deprecated and well specified features added in their place (including unicode
support and interface-based observer delegation).

comment:5 Changed 15 years ago by bruce

My bug was something different and is now filed as issue #1069.

comment:6 Changed 11 years ago by Thijs Triemstra

Cc: Thijs Triemstra added

comment:7 Changed 10 years ago by miracle2k

Cc: miracle2k added

comment:8 Changed 10 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/unicode-logging-989

(In [29679]) Branching to 'unicode-logging-989'

comment:9 Changed 10 years ago by Jean-Paul Calderone

Keywords: review added
Owner: itamarst deleted


I added unicode support to the three log observers we have. This basically takes the form of:

  • Try to discover the encoding of the file we're logging to. This probably only works for stdout.
  • Add an option to specify the encoding. This can't be passed on the twistd command line, only specified in a tac file.
  • Force UTF-8 for syslog because the RFC said so
  • Just pass unicode through to the stdlib logging module because it seems to be able to handle it

I'm not happy there's no way to specify this except by writing a tac file, but the mess of twistd command line logging options isn't something I want to touch now.

comment:10 Changed 10 years ago by Jean-Paul Calderone

Build results, by the way.

comment:11 Changed 10 years ago by therve

Cc: therve added
Keywords: review removed
Owner: set to Jean-Paul Calderone
  • There should probably be something like safe_unicode, to prevent breakage from object with broken __unicode__. Or random exceptions should be logged.
  • textFromEventDict docstring should mention the fact that it can return unicode if encoding is _DO_NOT_ENCODE.
  • I suspect the current implementation may break existing correct code. For example, if I logging utf-8 encoded messages right now, and I use stdout, the cast to unicode will fail. You may have to check the type of the data you're getting.

comment:12 Changed 10 years ago by Jean-Paul Calderone

One thing which will simplify this is to actually deprecate and disable the longer-than-one tupleness of the message passed to observers. I filed #4584 and will resolve that before trying to go further on this.

comment:13 Changed 9 years ago by <automation>

Owner: Jean-Paul Calderone deleted

comment:14 Changed 9 years ago by Jean-Paul Calderone

#5056 was a duplicate of this.

comment:15 Changed 8 years ago by oubiwann

Cc: oubiwann added

comment:16 Changed 5 years ago by Wilfredo Sánchez Vega

FWIW, twisted.logger uses unicode for format strings, and really only tolerates bytes because str on Py2.

It's LogPublisher also logs an observer's raise exception without removing the observer permanently. A quick test using twisted.python.log shows that the legacy module is benefiting from that now.

This program:

from twisted.python.log import addObserver, msg

def raisingObserver(event):
    raise RuntimeError("I hate data.  Why you give data?  Shut up.")




Temporarily disabling observer LegacyLogObserverWrapper(<function raisingObserver at 0x1078aa050>) due to exception: [Failure instance: Traceback: <type 'exceptions.RuntimeError'>: I hate data.  Why you give data?  Shut up.
... (traceback follows)

So, you might call this fixed, though I have not added unit tests to the old system to verify this.

Note: See TracTickets for help on using tickets.