Opened 9 years ago

Last modified 2 years ago

#989 defect new

Default log observer chokes on unicode messages

Reported by: exarkun Owned by:
Priority: high Milestone:
Component: core Keywords:
Cc: glyph, exarkun, itamarst, jknight, bruce, ralphm, thijs, elsdoerfer@…, therve, oubiwann Branch: branches/unicode-logging-989
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description


Change History (15)

comment:1 Changed 9 years ago by exarkun

When logging a unicode message, the default log observer will raise an
exception, causing itself to be removed from the logging system.

This fact is helpfully logged so that the developer or sysadmin can rectify the
problem, only there are often no log observers left to record this message...

1) We should support unicode messages

2a) There should be an ultra-safe fallback log observer which can be used to
record things like the default log observer failing and being removed

or

2b) The default log observer should never be permanently removed, even if it
starts raising exceptions

comment:2 Changed 9 years ago by bruce

I see a problem similar to this, but I don't use unicode, so it isn't exactly
this.  But I do see that *something* happens, and forever after that, my
twistd.log is sadly silent.

comment:3 Changed 9 years ago by jknight

[21:50] <foom> the proposal was to not remove the logger, until trying to log 
the exception first, and having that fail.
[21:51] <foom> as well ensuring that unicode strings in tracebacks don't screw 
things up or something
[21:51] <exarkun> tracebacks shouldn't be able to be unicode
[21:51] <foom> then the first should fix it
[21:51] <exarkun> of course that is only a moral stance, I don't know if it is 
true
[21:52] <foom> Removing the logger because the logged item is defective is 
wrong, so that's the first thing that should be fixed.
[21:52] <exarkun> Eh
[21:52] <exarkun> That's a funny way of looking at it.
[21:52] <foom> Well, if logging the exception raised when trying to log works, 
nothing needs to be removed.
[21:53] <foom> because the system is still in a workable state.
[21:53] <exarkun> right, it seems like it should work
[21:53] <exarkun> it's just seems funny :)
[21:53] <foom> if this proposal isn't in the bug report yet someone should add 
it.

comment:4 Changed 9 years ago by exarkun

Something of a fix committed at r13877.  This was not the fix discussed below,
because there are so many ugly cases handled in the code right now, it would
have been too difficult to implement.

As a long term solution, all of the stupid misfeatures of log.msg() should be
deprecated and well specified features added in their place (including unicode
support and interface-based observer delegation).

comment:5 Changed 9 years ago by bruce

My bug was something different and is now filed as issue #1069.

comment:6 Changed 5 years ago by thijs

  • Cc thijs added

comment:7 Changed 5 years ago by miracle2k

  • Cc elsdoerfer@… added

comment:8 Changed 4 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/unicode-logging-989

(In [29679]) Branching to 'unicode-logging-989'

comment:9 Changed 4 years ago by exarkun

  • Keywords review added
  • Owner itamarst deleted

So...

I added unicode support to the three log observers we have. This basically takes the form of:

  • Try to discover the encoding of the file we're logging to. This probably only works for stdout.
  • Add an option to specify the encoding. This can't be passed on the twistd command line, only specified in a tac file.
  • Force UTF-8 for syslog because the RFC said so
  • Just pass unicode through to the stdlib logging module because it seems to be able to handle it

I'm not happy there's no way to specify this except by writing a tac file, but the mess of twistd command line logging options isn't something I want to touch now.

comment:10 Changed 4 years ago by exarkun

Build results, by the way.

comment:11 Changed 4 years ago by therve

  • Cc therve added
  • Keywords review removed
  • Owner set to exarkun
  • There should probably be something like safe_unicode, to prevent breakage from object with broken __unicode__. Or random exceptions should be logged.
  • textFromEventDict docstring should mention the fact that it can return unicode if encoding is _DO_NOT_ENCODE.
  • I suspect the current implementation may break existing correct code. For example, if I logging utf-8 encoded messages right now, and I use stdout, the cast to unicode will fail. You may have to check the type of the data you're getting.

comment:12 Changed 4 years ago by exarkun

One thing which will simplify this is to actually deprecate and disable the longer-than-one tupleness of the message passed to observers. I filed #4584 and will resolve that before trying to go further on this.

comment:13 Changed 4 years ago by <automation>

  • Owner exarkun deleted

comment:14 Changed 3 years ago by exarkun

#5056 was a duplicate of this.

comment:15 Changed 2 years ago by oubiwann

  • Cc oubiwann added
Note: See TracTickets for help on using tickets.