Opened 15 months ago

Last modified 3 weeks ago

#6750 enhancement reopened

New Logging Hotness™

Reported by: wsanchez Owned by: wsanchez
Priority: normal Milestone: New Logging System
Component: core Keywords:
Cc: glyph, radix, exarkun, spam+twistedmatrix.com@… Branch: branches/log-booyah-6750-5
(diff, github, buildbot, log)
Author: wsanchez, glyph Launchpad Bug:

Description

Our logging system should have certain desirable attributes, so as:

  • Logging events, rather than messages. Events are structured data.
    • Events may provide a format string, to use when rendering as text. This is not, however, required.
  • Allow for arbitrary objects in events.
    • Which covers, but needs to be explicitly called out: allow for arbitrary identifiers in events.
  • Rendering to text should happen only when someone wants to write an event as text, not at the time that the event is created.
    • This prevents doing the work of rendering a message that no observer chooses to utilize, such as: log.debug(“%s” % (somethingExpensive(),))
  • Logging levels.
  • Logging namespaces. (a.k.a. “systems”).
  • Straightforward filtering mechanisms
    • Built-in filtering based on log levels and log namespaces.
  • Don’t use %-formatting, which is clearly no longer hip. Use PEP3101.
  • Straightforward mechanism for registering event observers
    • That may opt into or out of the filtering mechanisms, or build a separate filtering chain.
  • API for logging an event should be simple.
    • Logging a string should be very simple: log.info(“I peed my pants.”)
    • Logging data should be pretty simple: log.info(“{user} peed pants at {time}”, user=getUser(), time=getTime())
    • We can crush PEP3101 with EXTREME AWESOMENESS: log.info(“{self.user} peed pants at {self.getTime()}”, self=self) WHAT NO WAY BOOYAH
    • But you don’t need formatting if it’s not appropriate for you: log.info(obj=self); it just won’t necessarily be all pretty in a text file, is all.

Current logging does some, but not all of the above. The current API is also kind of a mess, where twisted.python.log.msg() takes *message, not message and OMG why? and global variables and stuff.

Attachments (1)

final-pass.png (4.9 KB) - added by glyph 6 months ago.
just sayin'

Download all attachments as: .zip

Change History (73)

comment:1 Changed 15 months ago by wsanchez

  • Owner set to wsanchez
  • Status changed from new to assigned

Much conversation on this went on in email: http://thread.gmane.org/gmane.comp.python.twisted/25777

comment:2 Changed 15 months ago by wsanchez

I’m gonna take a stab at this using code I already wrote which addresses, I think, the above desirable attributes:

comment:3 Changed 15 months ago by wsanchez

And by “I wrote”, I mean “I wrote with considerable help from Glyph”.

comment:4 Changed 15 months ago by wsanchez

  • Author set to wsanchez
  • Branch set to branches/log-booyah-6750

(In [40036]) Branch for #6750

comment:5 Changed 15 months ago by wsanchez

Committed on branch as twisted.python.logging, tests pass.

Next step is to implement twisted.python.log in terms of twisted.python.logging. That’ll have to include implementing some additional features, like the file log observer, in the new module’s terms; right now, the new module uses the old module for observers.

comment:6 Changed 14 months ago by wsanchez

Correction: I used twisted.python.logger as the module name.

comment:7 follow-up: Changed 14 months ago by jml

Does structlog have the attributes described in the OP?

comment:8 in reply to: ↑ 7 Changed 14 months ago by glyph

Replying to jml:

Does structlog have the attributes described in the OP?

Some of them, but not all of them. It also isn't attached as a patch to a Twisted ticket, so it's not really relevant here – but I know that it has generated a lot of excitement so perhaps it's best to have a brief comparison :-).

One of the major themes of my original post was making work-deferral easier, so that logging is intuitively cheap and we will generally do more logging within Twisted. Structlog doesn't appear to do anything more than our existing logging API to defer or skip computation, except for the fact that it provides a filtering API. Sadly, structlog also makes the curious decision to implement filtering by requiring that the filter raise an exception for each dropped message, which (as primitive operations go, anyway) is quite expensive.

The branch on this ticket addresses work deferral with idiomatic extensions to PEP3101. One benefit of this extended syntax (which allows for the suffix of () to invoke a callable, in addition to the standard PEP3101 getattr (.) and getitem ([]) syntax) is that it allows for you to specify a reduced set of potentially non-serializable objects in your event, and defer not simply serializing the serializable aspects of the event, but even the decision to retrieve said data. For example, you can do this:

from twisted.python.logger import Logger
class MyInterestingObject(object):
    log = Logger()
    def __init__(self, socket, value):
        self.socket = socket # Not serializable!
        self.value = value
    def doSomethingInteresting(self):
        self.log.info("{source} doing something interesting "
                      "to {source.value.computation()}.")

Presumably, source.value.computation() is expensive enough (i.e. it's a python function call so it's not completely free) that we don't want to do it unless someone actually wants to see this info message. But what's worth noting is that in this example, we don't even have to build a dictionary with a second key; until format-time, we just have an event dictionary with the built-in source key (and the descriptor on Logger takes care of providing self as the source key, so we could optimize that even further).

So, the format string isn't purely about the english text; it's a mini-language that lets you specify which parts of your log messages might be interesting to subsequent observers. In a live system, pre-serialization, it's helpful for log event objects to contain references to things like their emitter. This is useful for incident-reporting tools (like the one in Foolscap, described in some mailing list messages many aeons ago), which may or may not want to inspect those objects.

I guess you could accomplish this with structlog by building a custom processor for a particular module and then interposing it, but that requires careful consideration and application-specific observer tuning, both of which are possible with the existing logging system.

comment:9 follow-up: Changed 14 months ago by exarkun

I wonder how the logging use demonstrated by

     self.log.info("{source} doing something interesting "
                   "to {source.value.computation()}.")

will interact with structured log analysis tools. At first glance, the use of expressions like "source.value.computation()" inside the log message template and the explicit avoidance of constructing an extra dictionary with keys (thus, with keys into which any thought or planning has gone) means that apart from parsing the prose English version of this log message, there is no reliable way to write a tool that examines logs and treats the result of that computation semantically.

I'm sure this branch will come with copious documentation (and I admit I haven't read any of the code checked in so far (sorry)) so feel free to just point me at that for the answer.

Thanks for your work in this area!

comment:10 Changed 14 months ago by wsanchez

exarkun: That is true if source.value.computation() returns some random thing every time. But if emits, for example, a username, then this is totally parseable by a tool. The point is that the fetching username might be expensive for some reason, and best avoided if the log message will be dropped. Perhaps a clearer example:

self.log.debug("{log_source.user()} wrote data to {log_source.destination()}")

Also: this all assumes that you are looking at output from an observer that renders the format into text, a la FileLogObserver, of course.

glyph: All event keys added by the log system have a log_ prefix, so: log_source, not source. Nit, but it took me a sec to figure out what source was. Other keys provided by the system include: log_level, log_namespace, log_format, log_time (which is how FileLogObserver generates its prefix).

comment:11 follow-up: Changed 14 months ago by exarkun

exarkun: That is true if source.value.computation() returns some random thing every time. But if emits, for example, a username, then this is totally parseable by a tool. The point is that the fetching username might be expensive for some reason, and best avoided if the log message will be dropped.

Thanks. Examples are indeed helpful. How about this one - how would I find all users who have written data to a path beneath /foo/bar? Is there a solution that doesn't involve rb"([^ ]+) wrote data to (.+)"?

comment:12 in reply to: ↑ 11 Changed 14 months ago by wsanchez

Replying to exarkun:

Thanks. Examples are indeed helpful. How about this one - how would I find all users who have written data to a path beneath /foo/bar? Is there a solution that doesn't involve rb"([^ ]+) wrote data to (.+)"?

Slightly better would be to use !r in the format to make it more parse friendly, but you're still playing string parsing games.

A better option is don't use FileLogObserver, use an observer that serializes the event, and make sure your object (log_source) serializes usefully. The problem with FileLogObserver is that it's emitting text meant to be read by people, which is what the format argument is about.

What you can totally do in that case is:

self.log.debug()

Though that will be unfriendly to the observers that do want readable text.

comment:13 in reply to: ↑ 9 Changed 14 months ago by glyph

Replying to exarkun:

I wonder how the logging use demonstrated by

     self.log.info("{source} doing something interesting "
                   "to {source.value.computation()}.")

will interact with structured log analysis tools. At first glance, the use of expressions like "source.value.computation()" inside the log message template and the explicit avoidance of constructing an extra dictionary with keys (thus, with keys into which any thought or planning has gone) means that apart from parsing the prose English version of this log message, there is no reliable way to write a tool that examines logs and treats the result of that computation semantically.

Perhaps I should add yet more code to the branch to demonstrate a workable vision for this :-) but first I'll attempt to describe one.

We can write a serializing log observer that parses the format string far enough to notice that source.value.computation() is a thing that should be getattr'd and apply'd, then store the result of said application in a separate dictionary key, "formatted:source.value.computation()". The stored form of this will be a JSON dictionary or something, the actual smashing together of strings should be done post-serialization only when a human is actually reading it, and not when tools are analyzing it. Hence, no need for grep, just do self.logStorage.query(format="{source} doing something interesting to {source.value.computation()}.").

Note: no need for !r here :). Just make sure anything that ends up in a format key is a serialization-friendly value, and not a mutable object with some random structure, and you can get integers, lists, floats and the like back out with type information intact. The serializer will get to exercise some discretion on the level of fidelity it's going to preserve during serialization, but presumably as soon as we have one which does JSON the semantics will just converge on those semantics.

This does mean you'll need to have a little more public API for log analysis tools to either look up that key or compute it as needed, but presumably that'll be part of the log analysis module :).

I'm sure this branch will come with copious documentation (and I admit I haven't read any of the code checked in so far (sorry)) so feel free to just point me at that for the answer.

Thanks for your work in this area!

comment:14 Changed 14 months ago by wsanchez

Status: The new t.p.logger.py and tests are pretty much done, I think.

I've made a number of changes to t.p.log so that it's implementation uses the logger module where possible:

  • log.LogPublisher:
    • Now wraps it's observers with logger.LegacyLogObserverWrapper
    • msg() forwards events to logger.Logger.publisher
    • _err() is gone
  • log.FileLogObserver:
    • Forwards events to a contained logger.FileLogObserver
    • Time zone offset for the default format is now part of the format string (%z) instead of being tacked onto the result.
      • As a result, getTimezoneOffset() and formatTime() go away, as they are unused.
  • log. PythonLoggingObserver:
    • Is now just a subclass of logger.PythonLogObserver that adds emit() for compatibility.
  • logfile and logerr:
    • Are now instances of logger.LoggingFile instead of StdioOnnaStick.

Those changes verify that the new logging module implements the functionality of the old one, and does so compatibly, since the existing tests for the old module (and the rest of twisted) have to keep passing even after the old module is implemented in terms of the old module.

But the tests did have to change a bit:

  • test_erroneousErrors:
    • The delivery order for events when an observer raises was noted as being "slightly wrong". This is no longer so.
  • test_getTimezoneOffset:
    • Removed due to method removal.
  • test_timeFormatting:
    • Removed due to method removal.
  • test_microsecondTimestampFormatting:
    • Removed due to method removal.
  • First several tests in FileObserverTestCase:
    • Checks for specific exception error strings which have been changed a bit.
  • test_startLogging:
    • isinstance checks on sys.stdout and sys.stderr now check for logger.LoggingFile instead of log.StdioOnnaStick.
  • test_customLevel:
    • Use logging.ERROR instead of logging.DEBUG to avoid getting filtered.

Also, in test_endpoints:

  • test_logStderr:
    • Don't rely on checking event["format"] here; look at the rendered result message instead.

And there are some tests in test_log still in need of some review:

  • test_doubleErrorDoesNotRemoveObserver:
    • This expects log.LogPublisher._err() to be a thing and it's not any more. Need to re-read the test and see whether is still applies.
  • test_showwarning:
    • This test registers an observer that is already registered in setUp(); not sure what's up with that.
  • test_publisherReportsBrokenObserversPrivately:
    • Disabled due to side effects; need to find out what's up here.
    • Changed so as to not rely on publisher.observers being a list.
  • test_errorString:
    • Now calls self.flushLoggedErrors(). Unclear why that wasn't needed before.

And finally, a few are still failing; haven't figured out why yet:

  • These are all in twisted.trial.test.

For reference, here's the complete diff for the branch from creation to HEAD: https://twistedmatrix.com/trac/changeset?old_path=%2Fbranches%2Flog-booyah-6750&old=40036&new_path=%2Fbranches%2Flog-booyah-6750&new=HEAD

comment:15 Changed 14 months ago by exarkun

As a result, getTimezoneOffset() and formatTime() go away, as they are unused.

These are hooks for applications to override. How hard will it be to put them back?

comment:16 Changed 14 months ago by wsanchez

exarkun: not hard.

comment:17 Changed 14 months ago by wsanchez

exarkun: getTimezoneOffset() and formatTime() are back in r40183.

comment:18 Changed 14 months ago by glyph

I've fixed a couple of epytext issues in the branch and gotten the 'new pydoctor warnings' error down to just

twisted.python.logger.PythonLogObserver: invalid ref to logging

which I think should explicitly remain un-fixed, because it's a result of https://bugs.launchpad.net/pydoctor/+bug/1237284 which I think mwhudson already has a fix for. At least he showed me a one-line patch which seemed to address the issue.

comment:19 Changed 14 months ago by glyph

Looks like that pydoctor bug is fixed.

comment:20 Changed 14 months ago by wsanchez

HOWTO is underway:

branches/log-booyah-6750/doc/core/howto/logger.xhtml

Which is great because, for example, writing it got me a bit stuck on describing how to register observers, which I thought made more sense than I now think it does. (If it’s hard to document clearly, it’s probably not well-designed.)

Talked to Glyph at length about it, and I’m going to try a different approach there.

comment:21 Changed 14 months ago by glyph

  • Author changed from wsanchez to wsanchez, glyph
  • Branch changed from branches/log-booyah-6750 to branches/log-booyah-6750-2

(In [40290]) Branching to 'log-booyah-6750-2'

comment:22 follow-up: Changed 14 months ago by glyph

I've been trying to address failing tests. Many builders are passing now, including the documentation builder, so yay for that!

However, I'm still wrestling with some problems; here are some in case anyone is interested and would like to help out.

  1. The python 3.3 tests are at least running on the branch again now, whereas before they were just blowing up while bootstrapping. They still fail, but the failures all seem to somewhat straightforwardly indicate that we're just getting unicode wrong. Some of the fixes I made need their own tests, and should very definitely be moved onto their own branch to mitigate the giganticness of this already-too-large review.
  2. Something is causing the tests in the branch to consume so much memory that they blow up some of the less generously provisioned buildbots. Based on some random things I saw while debugging, my hunch is that this has to do with the log_trace key, which seems to blow up to a really surprising size. It could also be a bug related to the ring buffer though. This also appears to be affecting the wheel-building builder, which is extra weird because I didn't think that even imported any code from Twisted?
  3. I thought that dependence on the Olson timezone database was problematic, so I fixed the TZ values in the time-formatting test to be stand-alone, but FreeBSD appears to be unable to do mktime on any time longer than a year ago - or maybe it can't do it at all - so we need some other way to test the date formatting.

The new pyflakes errors appear to be completely spurious, as they're in a module this branch doesn't even touch.

comment:23 in reply to: ↑ 22 Changed 14 months ago by rwall

Replying to glyph:

  1. The python 3.3 tests are at least running on the branch again now, whereas before they were just blowing up while bootstrapping. They still fail, but the failures all seem to somewhat straightforwardly indicate that we're just getting unicode wrong. Some of the fixes I made need their own tests, and should very definitely be moved onto their own branch to mitigate the giganticness of this already-too-large review.

See #6570 for Python3 compatible constants. The branch for that ticket has been through a couple of reviews already, so should just need a quick final review.

comment:24 Changed 14 months ago by glyph

To follow up to my earlier comment:

  1. Python 3.3 tests pass now.
  2. The memory bug is fixed; I think it was an artifact of some GC interactions that were resolved (and caught by a test for trial itself, woo!).
  3. I worked around the TZ malfunction on FreeBSD manually.

The tests are in much better shape now, but it needs to be merged forward to get rwall's fix.

comment:25 Changed 14 months ago by glyph

  • Branch changed from branches/log-booyah-6750-2 to branches/log-booyah-6750-3

(In [40320]) Branching to 'log-booyah-6750-3'

comment:26 Changed 14 months ago by glyph

Merged forward to get rwall's fix, and wsanchez split up the module into a package to make the code easier to review and maintain. (The public API is still a single namespace, but the implementation is factored so it's clearer to see which pieces are core logging functionality, which pieces are integration with the stdlib logging module, etc.)

I've just forced a build to bring the buildslaves up to date with the latest code.

comment:27 Changed 14 months ago by glyph

Remaining issues before this is reviewable:

  • A couple of pydoctor issues.
  • Spurious pyflakes errors; I've asked someone to upgrade the buildbot, hopefully that'll take care of that red builder before the branch lands;
  • Many twistedchecker errors, many of which are legitimate coding-standard violations but should be easy to fix,
  • As raised on the callWithLogger ticket, we need to look a little more closely at the system-key functionality to see if it's worth reproducing. I'm thinking probably not but it would be good to do a bit of checking around to make sure.

comment:28 Changed 14 months ago by wsanchez

pydoctor issues fixed.
twistedchecker on twisted.python.logger is now clean.

comment:30 Changed 14 months ago by wsanchez

The only part of the system key functionality that is missing here is the part where the reactor magically fills it in via callWithContext; it can still be provided explicitly by applications.

In the new system it’s called log_system, as all keys added/used by the logging system itself now have a log_ prefix.

comment:31 follow-up: Changed 12 months ago by wsanchez

Glyph: there’s one test failure: twisted.test.test_log.FileObserverTestCase.test_startLogging fails because the encoding of (the new) sys.stdout (ASCII) is not he same as it was prior to calling startLogging() (UTF-8).

I’m not sure that’s unexpected, given that the test calls startLogging() with a StringIO object, and StringIO objects don’t have encodings, so we get the default which is ASCII (on Mac OS, anyway).

Is this test expecting startLogging() to ensure that the LoggingFile that replaces sys.stdout preserves the encoding?

comment:32 Changed 12 months ago by glyph

(In [40970]) Address gross, old, vague, sometimes-failing test with nice, new, specific, always-failing test refs #6750

comment:33 Changed 12 months ago by glyph

(In [40971]) Fix for that test. refs #6750

comment:34 in reply to: ↑ 31 Changed 12 months ago by glyph

Replying to wsanchez:

Glyph: there’s one test failure: twisted.test.test_log.FileObserverTestCase.test_startLogging fails because the encoding of (the new) sys.stdout (ASCII) is not he same as it was prior to calling startLogging() (UTF-8).

I’m not sure that’s unexpected, given that the test calls startLogging() with a StringIO object, and StringIO objects don’t have encodings, so we get the default which is ASCII (on Mac OS, anyway).

Is this test expecting startLogging() to ensure that the LoggingFile that replaces sys.stdout preserves the encoding?

Yes. The idea, I believe, is to ensure that startLogging manipulates only as much state as it needs in order to do its job: if the application had previously set up a particular stdout encoding, then that encoding ought to be preserved since it may be depending on that.

Interestingly, this test, since it was written in an old crummy way full of global state, wasn't failing for me because sys.stdout won't have an encoding at all if it's not a tty (as when it's a pipe, because it's run from an editor rather than a terminal).

In the two commits above, I've addressed this purely in the new system, with a much better (and clearer) test.

comment:35 Changed 12 months ago by glyph

Since the comment prompted me to look at this again, I updated the branch with a couple of other outstanding changes that I had, adding some stuff to the documentation.

I also added a negative observer to the filtering observer so that it's actually possible to implement the "unbuffer recent filtered stuff" thing we've discussed (as it was, there was no way to tell which things were filtered).

I think basically everything is covered now. I wish twistd were using the new APIs natively, but probably the thing to do at this point is to revert the changes in twisted.application.app and put that into a separate ticket, since the new API is already useful on its own even without direct tool support.

The old docs should be updated to point at the new ones, and reference limitations of the new system (like the fact that you still need old-style log observers to interact with twistd), and there should be ticket numbers for that stuff. I think it's just about ready for review.

comment:36 Changed 10 months ago by glyph

  • Keywords review added
  • Owner wsanchez deleted
  • Status changed from assigned to new

Notes for the reviewer:

This branch contains a lot of code. It's more code than one can reasonably review all at once. However, breaking the branch up into multiple changes will result in changes that don't fully implement something, and are, even at the smaller code size, not really worth the effort of reviewing. Since it wouldn't fully implement something, it would be hard to get a holistic view of the intended effect (for example, many of the small implementation modules don't expose any public API and therefore have no narrative documentation).

So, my suggestion for reviewers is this:

Select one of:

  1. a single private implementation module (and of course, its tests),
  2. the changes implemented outside of logging, or
  3. the public interface and documentation

and do a complete code review just on that piece. Feel free, of course, to "look aside" to examine how the bits of code that you're reviewing interact with the system as a whole, since that's the whole point of putting this into review all at once, but don't feel as though you need to comprehensively review anything other than the one thing you've selected.

As always, please separate your feedback into sections so that you make a distinction between things that you think must be addressed before landing, things that should be addressed before landing but could be addressed in a separate ticket, and things that you might want the change authors to consider.

For maximum accuracy on the high scores page, and so everybody gets credit on what is still a significant amount of review work, I suggest that each reviewer remove and then re-add the review keyword as they complete their review. If you're going to review a particular portion, say which one you're doing in a comment here first, so that others know.

I'm assuming that not all of these reviews are going to pass, and that there will be a brief second review portion. Once there's a code review in for each portion of the code, a subsequent reviewer can do the whole thing, just making sure that all of the feedback from the original reviews is dealt with, rather than trying to absorb the whole change.

comment:37 follow-up: Changed 10 months ago by dstufft

Maybe I'm wrong, but isn't this documentation not in sphinx format, which it should be in now?

comment:38 in reply to: ↑ 37 Changed 10 months ago by glyph

Replying to dstufft:

Maybe I'm wrong, but isn't this documentation not in sphinx format, which it should be in now?

Oh, good point. I guess we should translate that before landing. You can still read it though, it's HTML :-).

comment:39 follow-up: Changed 8 months ago by adiroiban

I agree with the requirements for the new logging system.

Does this new code really depends on twisted or deferred?

Why not create it as a separate project, independent of Twisted.


I guess that creating a generic logger to fit all use cases is hard and every project tend to reinvent its own logging system.

I have also re-invented the logging system in a project to support "event" based structured logs and option lazy-rendered text.

Instead of log levels and log namespaces I went for a flat "tag" like structure. Some tags have log level semantic while other have namespace semantic. In this way, I find that core code is much simple and flexible. Filter and observers could add custom behavior on top of those tags.

With events containing arbitrary data, it should be easy to ignore log levels and log system and just filter based on event tag data.

Hope this feedback is useful. Otherwise, feel free to ignore it.

comment:40 in reply to: ↑ 39 ; follow-up: Changed 8 months ago by wsanchez

Replying to adiroiban:

Does this new code really depends on twisted or deferred?

The only dependency I can think of is on Failure. I do think that losing that functionality would be a bummer, but it is a small part of the feature set.

Why not create it as a separate project, independent of Twisted.

That argument probably applies to most of twisted.python

comment:41 in reply to: ↑ 40 Changed 8 months ago by glyph

Replying to wsanchez:

Replying to adiroiban:

Does this new code really depends on twisted or deferred?

The only dependency I can think of is on Failure. I do think that losing that functionality would be a bummer, but it is a small part of the feature set.

The issue is that Twisted should depend on it, rather than the other way around.

Twisted needs a way to do its own logging in a way that communicates well with the myriad other logging systems available, and the totally unstructured, undocumented way that the current logging system is put together makes that problematic. You could log in a structured way, but there's no real incentive to use it correctly. This new system provides a number of cool features which would encourage both Twisted maintainers and application developers to use it right, even if their ultimate consumer is stdlib logging or syslog.

Why not create it as a separate project, independent of Twisted.

That argument probably applies to most of twisted.python

Definitely true. And we should have more projects like filepath and Deferred, possibly eventually breaking Twisted itself up into those projects.

However, doing all the work of splitting this stuff up involves dozens of complicated logistical problems related to documentation, issue tracking, provenance tracking, continuous integration infrastructure, package management, version control, contributor authentication, release management, and so on and so on. By all means, let's try to address those problems, but a drive-by comment on a single ticket is not the way to go about it :-). Let's leave further comments on this ticket for review comments, please.

comment:42 Changed 8 months ago by ralphm

  • Owner set to ralphm
  • Status changed from new to assigned

comment:43 Changed 8 months ago by ralphm

  • Keywords review removed
  • Owner changed from ralphm to glyph
  • Status changed from assigned to new

This is the first phase on my review on this branch, covering part 3 (public
interface and documentation), per comment:36.

  • In general I really like the new system and it is a definite improvement over the old one. Yay!
  • At first glance, log_source is confusing. In ad_hoc.py, I expected self and didn't immediately understood where log_source was coming from. Maybe it would help to have an explicit table listing all of the fields that might be added by the system.
  • There isn't a clear nomenclature for what goes in events. Names used to refer to mostly the same concept are: event keys, log keys, log values, event attributes, fields, objects.
  • In comment:13, there is brief mention of mutable objects and how they might be undesirable. Given that the format string isn't used until an observer wants to, effectively all field values are supposed to be unchanged until that happens. I think that passing dicts as values is going to be a common use case, so maybe this needs to spelled out more. Especially if there will be observers that queue up events to be formatted and rendered some time in the (near?) future.
  • The section on the global log publisher talks about the new concept publisher without really introducing it. I'm not sure if this is clear if you have no proper knowledge of the existing logging system.
  • Even though it is a public interface, Logger.emit isn't mentioned in the howto. I think it worth mentioning that it exists and that you can pass it the log level as the first argument.
  • The Python logging infrastructure makes it easy to pass exc_info to .info, .error, etc., or have it be extracted from sys.exc_info by passing True instead. I appreciate the explicitness of Logger.failure. The assumption that people want this logged at level critical doesn't match my experiences. I'm on the fence which of the two alternatives is a nicer API.

I also briefly looked at the bridge to the Python Logging Facility from the standard library (STDLibLogObserver) and think it is good start, but a bit lacking, like with the old logging system. A few comments:

  • The whole event is rendered as a string and only the log level is converted to the standard library counterpart. I think that we can do better by creating a LogRecord and calling logging.Logger.handle directly. This would also remove the need to explicitly overriding findCaller.
  • Our log_namespace could be used as the logger name.
  • Custom fields can be passed as .args. You can't really use extra because those are merged into LogRecord.__dict__. Yeah.
  • It would be nice to convert Failures into the exc_info used in LogRecord.
  • It would be nice to keep the original timestamp of the log event, especially in the face of possible delays before an event is passed to the Python logging facility.
  • Since the documentation mentions that observers must be thread-safe, why doesn't this use callInThread for Logger.log?

As a side note and not really part of the review, distinguishing conceptually different events, and the types of the values in fields, is important when sending log events to systems that analyze and/or store them.

For example, I intend to send logs (indirectly) via Logstash to Elasticsearch. The way Elasticsearch works is that a JSON structure gets interpreted as (nested) fields with (arrays of) values. Those fields are strongly typed using a so-called mapping. This mapping can be dynamic, depending on the very first value received for a particular field. In this case, if the first value is a number, all future events with a string value for that field will be rejected. However, a mapping is tied to a document type, with each type represented by a unique identifier (for that index). This means that you can have documents of different types, each with their own field mapping, in the same index.

It would be great if this was easy to do with the new logging system. Possible solutions would be an application specific field (like category) that is used for the document type. Another would be to use the format string and maybe hash that to arrive at a unique identifier to use for distinguishing document types. Finally, systems like Eliot have a way make the typing of log events explicit. My feeling is that this logger doesn't need any adjustments to accommodate, but I wanted to put these observations out there.

comment:44 Changed 8 months ago by ralphm

  • Keywords review added
  • Owner changed from glyph to ralphm
  • Status changed from new to assigned

comment:45 Changed 8 months ago by ralphm

  • Keywords review removed
  • Owner changed from ralphm to glyph
  • Status changed from assigned to new
  • Changes look good and comprehensive all around. Booyah!
  • Alas, as expected, the branch needs to be merged forward. The most exciting conflict is that of the howto, that needs to be transformed to Sphinx.
  • Tests pass!
  • Test coverage is not 100%. Besides the expected Python 3 support, there are also a few edge cases that miss coverage. Some places even have a FIXME comment (_json.py and test_logger.py).
  • Depend on dateutil instead of 'adding' _tzhelper?
  • In t.p.test.test_tzhelper.py:
    • There is a mention of pyflakes needing an upgrade. I notice that requireModule was recently added, but I'm not sure if that should exist and weather or not this import of tzset should use it.
    • Typo in addTZCleanup's docstring.
  • The header of _flatten mentions test_flatten, which doesn't exist. However, test_flatten.FlatFormattingTests tests all of the _flatten module, minus the above mentioned edge cases.
  • _flatten.py is not in setup3.py.
  • There are various changes to modules for better Python 3 compat and coding style, without those modules receiving changes related to this ticket. E.g. twisted.python.reflect, twisted.internet.test.test_unix, twisted.test.test_twistd.
  • twisted.test.test_log imports io halfway the module.
  • Missing news file.

This concludes my review. Overview:

  • The comments on documentation are mostly clarification and should be done as part of this ticket.
  • The comments on STDLibLogObserver are nice-to-have, except for the thread-safeness.
  • 100% test coverage seems attainable without too much effort and should be done as part of this ticket.
  • Obviously, the branch must be properly merged forward for a final short review.

Thanks!

comment:46 Changed 8 months ago by wsanchez

r42329 adds documentation for the log_* event keys added by the system.

r42332 cleans up the terminology around events.

r42333 adds more documentation around potential issues surrounding putting objects into events which may be later mutated.

r42334 adds an introduction to the Global Log Observer section.

r42335 notes the existence of Logger.emit().

Logger.failure() takes a level argument, if you don't like the default value of LogLevel.critical. The thinking around that default is that because this is meant to capture unhandled exceptions, that something when wrong that the developed of the software can't even describe, so an observer should probably show a traceback and complain very loudly. If you are logging an error which doesn't merit this sort of attention, you probably want log.error() or a more appropriate emitter. If this is an unpopular decision, we can add functionality to Logger to change this default to something else when it is created or something.

I agree that the integration with the stdlib logging system could be improved. I tried to make it a little bit better than our current logging, but I'm not terribly familiar with stdlib logging. Since this isn't worse than the status quo, and fixing these issues does not (I think) alter any other parts of the new logging system, I suggest we defer that work to a follow-on ticket, #7167.

On the topic of threading and requiring observers to be thread-safe, this system handles it as well (as poorly) at the old system; the docs just call this out a bit more prominently. Glyph thinks that this is addressable in a follow-on ticket, perhaps with decorators for observers. I opened #7168 to track that.

I agree with the idea of having some way to statically type values in events that are expected to conform to some interface. I think that can be added on, perhaps by providing events with a key that describes the expected value types for specific keys, etc. and then the loggin system can provide a way to verify conformance (eg. for tests).


r42347 rounds out the tests for the _json module.

r42348 adds missing tests for the magic log_source key.

r42349 puts tests for the _flatten module into their own file. …and r42350 adds the file I forgot to add in r42349.

I don't think dateutil quite provides the neatness of FixedOffsetTimeZone, and we really didn't want to add the dependency just for this. We did at least, replace another existing FixedOffsetTimeZone, so that there's only one.

r42351 removes the complaint about needing to update pyflakes.

r42352 fixes the typo in addTZCleanup's docstring.

r42353 add twisted.python.logger._flatten to setup3.py.

Glyph says the seemingly gratuitous Python 3 fixes were necessary to test this on Python 3.

r42355 moves the io import to the top of test_log.py.


Still to do:

  • Merge forward
  • Port the logger HOWTO to Sphinx
  • Verify coverage
  • Run tests in new branch
  • Update the old logging HOWTO to note that there's a new sherrif in town

comment:47 Changed 8 months ago by wsanchez

  • Branch changed from branches/log-booyah-6750-3 to branches/log-booyah-6750-4

(In [42394]) One more for the win

comment:48 Changed 8 months ago by wsanchez

And…

  • Merge forward
  • Port the logger HOWTO to Sphinx
  • Verify coverage
  • Run tests in new branch
  • Update the old logging HOWTO to note that there's a new sherrif in town
  • Add new HOWTO to index

Forcing build on final bits…

comment:49 Changed 8 months ago by wsanchez

  • Keywords review added
  • Owner changed from glyph to ralphm

comment:50 follow-up: Changed 7 months ago by zooko

People who are interested in this ticket would probably also be interested in the fact that Foolscap has a similarly designed logging system:

http://foolscap.lothar.com/docs/logging.html

It is similar in terms of logging structured "events" instead of strings.

Foolscap's logging system has many excellent innovations that have served us well for many years in use in Tahoe-LAFS:

  • "incident-triggered logging": it keeps one circular buffer in memory for each log level, so under normal operation those messages do not get stored to persistent state at all, but if an "incident" occurs (i.e. a sufficiently high-level log message, or the user clicks a "report a bug" button, or something) then the contents of those circular buffers get dumped to persistent storage. The result is that you do not incur any cost for normal logging noise during normal operation, and yet you still get all of the context about "what led up to this ??" when you're trying to diagnose an incident. This has worked beautifully for us and we'll never give it up.
  • remote logging; Foolscap is a remote object protocol, so it is natural and easy to set up a daemon which receives logging messages from one or many remote processes and aggregates them together. Again, we couldn't do without this, because we need to watch many processes distributed across many computers. This also means you can attach to a running foolscap process and start tailing its logs, which is another way to start looking at the detailed, noisy log messages (the ones that are getting buffered), even though under normal operation they do not get persisted or transmitted at all.
  • an attempt at causality-tracing; You know how in synchronous-concurrency languages people use call-stacks to keep track of causality? And when they switch to Twisted they have a problem, because although event B was caused by event A (e.g. by event A doing a reactor.callLater(event_B) or something), there's no evidence of that in any Python stack trace. So, Foolscap logging has an option to explicitly name which other thing is your causal-parent (or one of your causal-parents). Foolscap comes with an HTML renderer that shows those in tree structure. I say "an attempt" instead of a home run (like the first two features), because we've never used that optional feature consistently enough to make the results really useful to us.

comment:51 in reply to: ↑ 50 Changed 7 months ago by glyph

Replying to zooko:

People who are interested in this ticket would probably also be interested in the fact that Foolscap has a similarly designed logging system:

To give credit where credit is due, discussions with Brian about how Foolscap's logging works was an explicit inspiration for many aspects of this new system.

comment:52 Changed 7 months ago by wsanchez

ralphm: With Twisted 14.0.0 out, it's a good time to merge. Does this look OK to go now?

comment:53 follow-up: Changed 7 months ago by ralphm

I'll do a final pass this week.

comment:54 in reply to: ↑ 53 Changed 7 months ago by glyph

Replying to ralphm:

I'll do a final pass this week.

Believe me, I'll hold you to that :-).

Thanks.

comment:55 Changed 7 months ago by wsanchez

meep

Changed 6 months ago by glyph

just sayin'

comment:56 Changed 6 months ago by glyph

comment:57 Changed 6 months ago by glyph

Ralph, should we leave this ticket assigned or should I be trying to entice another reviewer?

comment:58 Changed 6 months ago by ralphm

  • Keywords review removed
  • Owner changed from ralphm to wsanchez

Woah, wasn't that a long week?

I have some final comments:

  • Docstring of Logger.__get__ should say its instead of it's.
  • LoggerTests.test_trace doesn't have full coverage. Both nested functions are never called, which means the test is a no-op. The cause is that it sets publisher on the logger, instead of observer. Changing that makes the test pass and have full coverage again. The wrapper function named publisher can be dropped.
  • LoggerTests.test_conflicting_kwargs has a FIXME comment.
  • In various test_interface methods: why the try/except around verifyObject?
  • LogBeginner.showwarning: the case where file is not None and the Python version >= 2.6 is not tested. The implementation is correct, though.

These all seem minor to address. When having done so, please merge!

Booyah!

comment:59 Changed 6 months ago by glyph

Thanks for doing this review, Ralph, I know it was a tough one! :-D

comment:60 Changed 6 months ago by wsanchez

  • Docstring of Logger.get should say its instead of it's. r42752
  • LoggerTests.test_trace doesn't have full coverage. r42753 (Great catch, thanks!)
  • LoggerTests.test_conflicting_kwargs has a FIXME comment. r42754 (No going to log conflicts, doesn't seem needed.)
  • why the try/except around verifyObject? (I did that to raise a trial assertion rather than some random exception. Perhaps that's too pedantic…)
  • LogBeginner.showwarning: the case where file is not None and the Python version >= 2.6 is not tested. The implementation is correct, though.

comment:61 Changed 6 months ago by wsanchez

  • LogBeginner.showwarning (r42755)

comment:62 Changed 6 months ago by wsanchez

  • Resolution set to fixed
  • Status changed from new to closed

(In [42756]) Merge log-booyah-6750-4: Brief description

Author: wsanchez, glyph
Reviewer: ralphm
Fixes: #6750

New logging system.

comment:63 Changed 6 months ago by wsanchez

  • Resolution fixed deleted
  • Status changed from closed to reopened

(In [42757]) Revert r42756: Add feature file and nothing else, whoops.

Committed only the topfile.

Reopens: #6750

comment:64 Changed 6 months ago by wsanchez

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [42758]) Merge log-booyah-6750-4: New logging system.

Author: wsanchez, glyph
Reviewer: ralphm
Fixes: #6750

New logging system: twisted.python.logger

comment:65 Changed 6 months ago by itamarst

  • Resolution fixed deleted
  • Status changed from closed to reopened

(In [42759]) Revert r42758, r42757, and r42756 (ideally in a history-fixing way)

The merge of the log-booyah-6750-4 branch was broken up across multiple
revisions in an extremely confusing (to humans and in metadata) way.
Clear out those changes so it can be merged normally.

Reopens: #6750

comment:66 Changed 6 months ago by exarkun

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [42760]) Merge log-booyah-6750-4: New logging system.

Author: wsanchez, glyph
Reviewer: ralphm
Fixes: #6750

New logging system: twisted.python.logger

comment:67 Changed 5 months ago by posita

  • Cc spam+twistedmatrix.com@… added

comment:68 Changed 3 weeks ago by hawkowl

  • Resolution fixed deleted
  • Status changed from closed to reopened

(In [43431]) Revert r42760: Revert #6750 and related tickets.

Please see http://twistedmatrix.com/pipermail/twisted-python/2014-November/028912.html.

Reopens: #6750

comment:69 Changed 3 weeks ago by wsanchez

  • Milestone set to New Logging System

comment:70 Changed 3 weeks ago by wsanchez

  • Branch changed from branches/log-booyah-6750-4 to branches/log-booyah-6750-5

(In [43432]) Nth time is the charm

comment:71 Changed 3 weeks ago by wsanchez

r43432 brings back the code that was on trunk. The following regressions resulted afterwards, resulting in a revert:

  • #7545 add default error reporting back to new logging system
  • #7548 Tracebacks added to Logger.failure() message text?
  • #7596 twisted.python.log.FileLogObserver no longer outputs timestamps
  • #7676 spread.pb: python.reflect.qual() gets a string which it cannot handle
  • #7551 logger-logged exceptions and log-logged exceptions look different to observers

comment:72 Changed 3 weeks ago by wsanchez

r43434 and r43435 merge the previously reviewed fixes for #7545 and #7548 respectively, which had also been on trunk.

That puts this branch on the same footing as trunk prior to the revert in r43431.

Buildbot is green except for some twistedchecker and pydoctor errors. r43436 cleans up three of the twisted checker errors. The rest are bogus: "too many blank lines" where there aren't, in fact, too many and pydoctor doesn't seem to know about NoneType.

So r43436 is our starting point for continuing from already-reviewed code.

Note: See TracTickets for help on using tickets.