Opened 2 years ago

Last modified 5 months ago

#5696 defect new

PythonLoggingObserver does not log structured data.

Reported by: dreid Owned by: richzeng
Priority: normal Milestone:
Component: core Keywords:
Cc: chris@…, wsanchez@… Branch: branches/structured-log-logging-5696
(diff, github, buildbot, log)
Author: glyph Launchpad Bug:

Description

The stdlib logging module supports an extra keyword argument which is a dictionary that will be combined with the LogRecord. When PythonLoggingObserver calls self.logging.log it fails to pass the eventDict argument to emit as this extra keyword argument, thus all structured data that might be attached to the log message is lost.

The extra keyword argument is documented here: http://docs.python.org/library/logging.html#logging.Logger.debug

Attachments (3)

logging-extra.patch (1.2 KB) - added by chris- 2 years ago.
Pass eventDict along to stdlib logging as extras
logging-extra-v2.patch (2.6 KB) - added by chris- 2 years ago.
Replaces patch logging-extra.patch
logging-extra-v2.2.patch (799 bytes) - added by richzeng 2 years ago.
Meant to replace the original patch to twisted.python.log. Other processes make use of the eventDict, so we can't modify it in place. We need to make a copy of eventDict (in this case 'extraDict'). We also need to filter out the extraDict for certain keys that will make Python's stdlib complain when you try to pass them in with the 'extra' keyword.

Download all attachments as: .zip

Change History (14)

Changed 2 years ago by chris-

Pass eventDict along to stdlib logging as extras

comment:1 Changed 2 years ago by chris-

  • Cc chris@… added
  • Keywords review added

comment:2 follow-up: Changed 2 years ago by ralphm

  • Keywords review removed
  • Owner set to chris-

After quite some recent research into twisted.python.log and Python's logging, I am unconvinced about using extra as the way to pass additional data along so it can be used by Formatters and/or Handlers. Internally, everything from extra is bolted onto the LogRecord instance as additional attributes. This means you have to be careful about already existing keys. E.g. it will prohibit overriding message and asctime. However it doesn't provide any protection against overriding methods. Damaging example:

>>> import logging
>>> logging.basicConfig()
>>> logging.warning('hello', extra={'getMessage': 'something'})
Traceback (most recent call last):
  File "logging/__init__.py", line 846, in emit
    msg = self.format(record)
  File "logging/__init__.py", line 723, in format
    return fmt.format(record)
  File "logging/__init__.py", line 464, in format
    record.message = record.getMessage()
TypeError: 'str' object is not callable
Logged from file <stdin>, line 1

Oops!

I'll explain a bit about Python's logging facilities below given the signature of the generic Logging.log method:

    def log(self, level, msg, *args, **kwargs):
        """
        Log 'msg % args' with the integer severity 'level'.
        """

There are two different steps for formatting a log message through Formatter.format with the argument record, a LogRecord instance`.

  1. The record's getMessage is called. This basically does msg = msg % record.args, where record.args is usually what is passed in using the args in the above signature. But see below. The result is stored in record.message.
  2. The actual logged string is formatted. This is s = self._fmt % record.__dict__, where s is what is returned by format. If record.exc_info is set, a traceback will be rendered and added to s, too. The traceback is cached in record.exc_text.

If args has only item and the item is a dict and not empty, it is used instead (args = args[0]). This allows you to do this:

logging.log('WARNING', 'Hello, %{place}s!', {'place': 'World'})

In this usage, record.args will hold {'place': 'World'}, and record.getMessage yields Hello, World''.

Depending how you want to actually the additional data from eventDict determines how we should pass it. If you need to rely on the standard Formatter behaviour and want to refer to keys from eventDict in the pre-set format string, you probably should use extra. However, if you need to write your own formatter and/or handler, you have direct access to the full LogRecord instance, including args, so that you don't get bitten by key clashes.

On top of all that, you probably also want to treat the failure key in eventDict if isError is set, so that the embedded exception information is passed on to the log record, too. You can pass this along as a tuple with the exc_info keyword argument to Logging.log.

As for the patch itself, it lacks a news file and we should probably document the added behaviour in our prose docs.

Changed 2 years ago by chris-

Replaces patch logging-extra.patch

comment:3 in reply to: ↑ 2 Changed 2 years ago by chris-

  • Keywords review added
  • Owner chris- deleted

Replying to ralphm:

After quite some recent research into twisted.python.log and Python's logging, I am unconvinced about using extra as the way to pass additional data along so it can be used by Formatters and/or Handlers. Internally, everything from extra is bolted onto the LogRecord instance as additional attributes. This means you have to be careful about already existing keys. E.g. it will prohibit overriding message and asctime. However it doesn't provide any protection against overriding methods.

This is certainly true, but in my opinion, this is a problem that needs to be addressed in sdtlib logging, not by twisted.

Depending how you want to actually the additional data from eventDict determines how we should pass it. If you need to rely on the standard Formatter behaviour and want to refer to keys from eventDict in the pre-set format string, you probably should use extra. However, if you need to write your own formatter and/or handler, you have direct access to the full LogRecord instance, including args, so that you don't get bitten by key clashes.

This is true, but the PythonLogObserver should not rely on the Handler and Formatter to handle this special case. The way I see it the extra keyword is the recommended way to pass structured data to the LogRecord and as stated above, it should not be up to twisted to work around problems the stdlib logging module.

On top of all that, you probably also want to treat the failure key in eventDict if isError is set, so that the embedded exception information is passed on to the log record, too. You can pass this along as a tuple with the exc_info keyword argument to Logging.log.

Agreed, but when writing the patch, I did not consider this to be in the scope of the issue, especially since twisted already handles the traceback formatting for failures and exceptions. Passing the exc_info on as separate information would probably lead to logging each traceback twice.

As for the patch itself, it lacks a news file and we should probably document the added behaviour in our prose docs.

Attached patch v2 adds a topfile and a mention of the behaviour in the api docs.

comment:4 follow-up: Changed 2 years ago by richzeng

Please note that the current patch only deletes the 'message' key from the eventDict. There are several other keywords that Python stdlib's LogRecord keeps track of, and it complains when you try to pass them in using the 'extra' keyword argument.

They are documented here.
http://docs.python.org/library/logging.html#logrecord-attributes

Changed 2 years ago by richzeng

Meant to replace the original patch to twisted.python.log. Other processes make use of the eventDict, so we can't modify it in place. We need to make a copy of eventDict (in this case 'extraDict'). We also need to filter out the extraDict for certain keys that will make Python's stdlib complain when you try to pass them in with the 'extra' keyword.

comment:5 in reply to: ↑ 4 Changed 2 years ago by chris-

Replying to richzeng:

Please note that the current patch only deletes the 'message' key from the eventDict. There are several other keywords that Python stdlib's LogRecord keeps track of, and it complains when you try to pass them in using the 'extra' keyword argument.

They are documented here.
http://docs.python.org/library/logging.html#logrecord-attributes

  1. The 'message' key is the only key that is mandatory to twisted log and incompatible to twisted, thus twisted should take care of cleaning that one up. This is not much of a problem because the message is passed to logging using a separate argument, so no data is lost here.
  1. You assume that sdtlib logging is done with the root logger. Because of that, you do not handle attributes of data that conflicts with possible LogRecord subclasses, so those will still fail. This applies to the logger and Logger subclasses as well.

Bottom line, you are trying to take filtering that normally for good reason takes place at the end of the log record creation chain and put it to the top. IMO that is a bad idea because it changes the way stdlib logging is meant to be used. Twisted should not try to predict what is meant to be done with data it passes to non-twisted libs.

comment:6 Changed 2 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

Reviewing.

comment:7 Changed 2 years ago by glyph

  • Author set to glyph
  • Branch set to branches/structured-log-logging-5696

(In [34772]) Branching to 'structured-log-logging-5696'

comment:9 Changed 2 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to richzeng
  • Status changed from assigned to new

Thanks Rich,

  1. The approach taken to quash the problematic attributes in this patch has a number of problems.
    1. There's no tests for it. What if this list of names were totally different? Would the tests still pass?
    2. It's silently dropping data. Don't do that. The names of conflicting attributes should be mangled and then passed through.
    3. This last one is a silly nit-pick that I feel bad even making because I don't have benchmarks to back it up, but the list is re-created every time here rather than being stored in a global constant. This means that you do an extra heap allocation. It should be a set().
  2. Something weird happened when I applied the patch, I guess. It looks like logger.log gets invoked twice now. Please do another patch from trunk to avoid this (hopefully it wasn't in your code!). Tellingly, this does not cause any tests to fail, which is a clear deficiency in the test suite.
  3. Twisted's coding standard is a tiny bit different from PEP8; please put two blank lines between methods, three between top-level suites (i.e. classes and module-level functions).
  4. Similarly I don't think the monkeying with whitespace elsewhere in the file is necessary (although maybe this is more patch-application weirdness).
  5. When the docstring refers to the extra parameter, it should use formatting to indicate that it's talking about a variable. C{} is the epytext markup for fixed-width code text.

Sorry for all the back-and-forth on this, stdlib logging is pretty terrible (and ours is only slightly less bad).

comment:10 Changed 12 months ago by glyph

I'm working on this a bit in the context of #6750.

comment:11 Changed 5 months ago by wsanchez

  • Cc wsanchez@… added

#6750 isn't going to do everything here. See also its follow-up bug #7167.

Note: See TracTickets for help on using tickets.