Opened 7 years ago

Last modified 3 years ago

#8036 enhancement new

Document how the new logging system should be used and tested in new code

Reported by: Adi Roiban Owned by: Adi Roiban
Priority: normal Milestone:
Component: logger Keywords:
Cc: d.vinella@…, wsanchez@… Branch: new-log-usage-8036
branch-diff, diff-cov, branch-cov, buildbot
Author:

Description (last modified by Adi Roiban)

I still don't have a clear guideline about how the new logging systems should be used in Twisted to replace the old logging system.

  • Do we use namespaces?
  • Do we always use the global log publisher?
  • Do we document in the docstring/API what is logged by calling a method?
  • Is the logging part of the contract/API ?
  • How do we test the log text and data?
  • Do we want optional or mandatory check of log calls.
  • Do we check all structured logs ?

Some info here: https://twistedmatrix.com/trac/ticket/7897#comment:10


Problems with new name of the new log object https://twistedmatrix.com/trac/ticket/7983


From https://twistedmatrix.com/trac/ticket/8028#comment:4:

Regarding the logging part I based my choice looking at the only two other instances of the new logging system outside the logger package: defer.py and protocol.py . Looking at those files and these discussions: https://twistedmatrix.com/trac/ticket/7897#comment:10 https://twistedmatrix.com/trac/ticket/7983 it's pretty clear that the "common" accepted way is to use the global publisher, at module level, without a custom namespace. I'll just change the log symbol to _log.

Regarding the coverage of the formatting string I'd like to have some hints about which is the best solution to handle it, since the the "catch all" formatUnformattableEvent makes quite difficult to know if a log_format can't be properly formatted until the log is directed to a textfileobserver, requiring a not so nice "assertIn('expected_formatted_string', logline)". Besides being breakable by an innocuous rephrasing, this obviously requires a logging setup/teardown phase that can easily bloat the test code, e.g.

def test_getUserNonexistentDatabase(self):
    """
    A missing/inaccessible db file should cause a permanent rejection of
    authorization attempts.
    """

    fileHandle = io.StringIO()
    observer = textFileLogObserver(fileHandle)
    globalLogPublisher.addObserver(observer)
    self.addCleanup(lambda: globalLogPublisher.removeObserver(observer))

    self.db = checkers.FilePasswordDB('test_thisbetternoteverexist.db')
    self.failUnlessRaises(error.UnauthorizedLogin, self.db.getUser, 'user')

    self.assertIn("Unable to load credentials db: "
                  "IOError(2, 'No such file or directory')",
                  fileHandle.getvalue())

Maybe a reusable test utility like a context manager could help keeping the overhead code to the minimum; something like this:

class Capture(object):

    def __init__(self, testCase):
        self.testCase = testCase
        self.fileHandle = io.StringIO()
        self.observer = textFileLogObserver(self.fileHandle)

    def __enter__(self):
        globalLogPublisher.addObserver(self.observer)
        return self

    def __exit__(self, type_, value_, tb_):
        self.testCase.addCleanup(
            lambda: globalLogPublisher.removeObserver(self.observer))

    @property
    def logs(self):
        return self.fileHandle.getvalue()


def test_getUserNonexistentDatabase(self):
    """
    A missing/inaccessible db file should cause a permanent rejection of
    authorization attempts.
    """

    with Capture() as capture:

        self.db = checkers.FilePasswordDB('test_thisbetternoteverexist.db')
        self.failUnlessRaises(error.UnauthorizedLogin, self.db.getUser, 'user')

        self.assertIn("Unable to load credentials db: "
                      "IOError(2, 'No such file or directory')",
                      capture.logs)

Honestly I don't really like the string assertion and the fact that it seems that we are testing the logger's formatEvent method among cred tests, so I'd like to hear some advices from twisted devs about it.

Attachments (3)

log_capture.patch (2.5 KB) - added by Dario Vinella 7 years ago.
log_capture-2.patch (7.1 KB) - added by Dario Vinella 7 years ago.
logcapture-wsanchez.diff (5.4 KB) - added by Wilfredo Sánchez Vega 6 years ago.
wsanchez suggested edits

Download all attachments as: .zip

Change History (47)

comment:1 Changed 7 years ago by Adi Roiban

Description: modified (diff)
Summary: Document how the new logging system should be used in new codeDocument how the new logging system should be used and tested in new code

comment:2 Changed 7 years ago by Adi Roiban

Description: modified (diff)

comment:3 Changed 7 years ago by Adi Roiban

Description: modified (diff)

comment:4 Changed 7 years ago by Adi Roiban

If we use the global log observer, why do we need to instantiate a log object in each module?


Here is an idea about how to test the log messages

class TestCaseWithLogging(TestCase):
    """
    A test case which will capture the logs emitted using the new
    logging system.
    """

    def setUp(self):
        self.__observer = InMemoreWithTextFormatObserver()

    def tearDown(self):
        self.assertEqual(
            0, len(self.__observer.logs), 'There are still unchecked logs.')

    def assertLog(self, text, data):
        """
        Check the log message from the top of the queue.
        """
        actual_log = self.__observer.logs.pop(0)
        self.assertTrue(actual_log.valid_format)
        if isinstance(text, Contains):
            self.assertTrue(text in actual_log.text)
        else:
            self.assertEqual(actual_log.text, text)
        self.assertEqual(actual_log.data, data)


class TestSomeLogUsage(TestCaseWithLogging):

    def test_getUserNonexistentDatabase(self):
        """
        A missing/inaccessible db file should cause a permanent rejection of
        authorization attempts.
        """

        self.db = checkers.FilePasswordDB('test_thisbetternoteverexist.db')
        self.failUnlessRaises(error.UnauthorizedLogin, self.db.getUser, 'user')

        self.assertLog(
            text=(
                "Unable to load credentials db: "
                "IOError(2, 'No such file or directory')"
                ),
            data={
                'e': SomeError(),
                }
            )
        # or
        self.assertLog(
            text=Contains('Unable to load credentials db:')
            data={'e': SomeError()}
            )

comment:5 Changed 7 years ago by Adi Roiban

(00:18:51) adiroiban1: wsanchez: why do we need self._log or self.log ... why not use a singleton ?
(00:19:25) glyph: adiroiban1: 'self._log' helpfully curries in 'self'
(00:19:48) glyph: adiroiban1: so it's present automatically in the log event

comment:6 Changed 7 years ago by Adi Roiban

Comment by d.vinella https://twistedmatrix.com/trac/ticket/8028#comment:9

I hope the use of the _log attribute doesn't raise the same problems exposed in #7983, since FilePasswordDB is a good subclassing candidate. But if it is the agreed way of introducing the new logger, let's merge it.

Last edited 7 years ago by Adi Roiban (previous) (diff)

Changed 7 years ago by Dario Vinella

Attachment: log_capture.patch added

comment:7 Changed 7 years ago by Dario Vinella

Cc: d.vinella@… added

I'd like to reopen a discussion on this ticket, addressing at least one of the original four points, the 4th, with some code. I attached a reviewed implementation of my Capture class ( https://twistedmatrix.com/trac/ticket/8028#comment:4 ), extended with a capture of the raw event dicts. I think that the context manager is a leaner way to implement this utilty rather than introducing inheritance in test case classes.

One would simply import it from proto_helpers (maybe it is not the correct location but it's just a proposal) and use it like

    with LogCapture() as lc:
        [... code that logs events ...]

        self.assertIn('logline text', lc.asText)
        self.assertNotIn('Unable to format event', lc.asText)
        selc.assertEqual(lc.asEvents[0]['log_level'], Loglevel.debug)

All comments are welcome :)

comment:8 Changed 7 years ago by Dario Vinella

Keywords: review added

comment:9 Changed 7 years ago by Adi Roiban

Description: modified (diff)

d.vinella, many thanks for your patch.

Please update your patch to have a few real case examples.

For example you can use it to extend/refactor the tests from #8028 #7897 #7983


Before doing any code I would like to have glyph or wsanchez "approve" the plan and then we can start working on documentation + test helpers

I need them since they were involved in the design / development of the new system and should have a better understanding of why and how the new logging system is used.


If we fail to get some feedback from them and we have more than 1 person interested in this (for ex adiroiban and d.vinella) we can start and implement this as we see fit.


Thanks and sorry for all the confusion!

comment:10 Changed 7 years ago by hawkowl

From the questions at the top:

Do we use namespaces?

It automatically puts them in -- the class where it was defined. So we shouldn't need to do much about it.

Do we always use the global log publisher?

Yes, we should, because Twisted code may call into non-Twisted code which makes stdlib/legacy log messages, which then get redirected to the global logger.

Do we document in the docstring/API what is logged by calling a method?

No, unless it is a security feature (eg. "potentially insecure operation").

Is the logging part of the contract/API ?

No, unless it is a security feature (eg. this will log when potentially insecure operations are done).

How do we test the log text and data?

I don't think we should test the log representation, but that the keys in it are certain values. If it wasn't structured logging, it would make sense -- but as it is structured, if information is lost in the text representation (which it will nearly always be), developers can consume it in a structured format and make their own text representations.

As to how, I think the with LogCapture() would just add an observer to the global log publisher, record the events it hears, and then as the context manager exits it removes the observer to the global log publisher. Maybe LogCapture can take a param that only filters logs with a specific namespace (so, your hypothetical class under test).

That is, one of your examples would have:

self.assertEqual(logs[0].failure.value.args[0],
                "Unable to load credentials db: "
                "IOError(2, 'No such file or directory')")

Maybe. Not sure about that API. The filtering would make the indexing easier, but it would make it more fragile. Maybe there needs to be an API which is "assert that this param in any logs was this value", like

assertInALog(attr="failure.value.args", index=0,
             val=("Unable to load credentials db: "
                 "IOError(2, 'No such file or directory')")

This would look for attr in any log, with an index (in case it is a list, it would ignore it for non-indexable types), and then compare equality.

Do we want optional or mandatory check of log calls.

Optional. It might make sense to validate that something was logged if we explicitly say it will be (see above with security), but right now that's too much of a jump from not testing it at all.

Do we check all structured logs ?

Nah, I don't think we'll get much value out of checking every single log entry.

Changed 7 years ago by Dario Vinella

Attachment: log_capture-2.patch added

comment:11 Changed 7 years ago by Dario Vinella

Here it is a new LogCapture proposal with a couple of real use examples; it now supports setting a namespace filter, as suggested by hawkowl. Please review!

comment:12 Changed 7 years ago by Adi Roiban

Thanks.

Changes looks good.

Maybe we can put it into the TestCase class so that we don't have to import it in each test.

Maybe we can put the code in twisted/test/logger.py or something like that.

proto_helper should be for protocol related things... and the logger is not really a protocol.

The new LogCapture code needs dedicated test. We need to make sure that this code is solid and it don't just ignore stuff or add random stuff :)

You can look at twiste/test/iosim.py and the associated tests.

The usage of LogCapture in real test is there just to validate that the API is OK and helpful.

Beside creating the LogCapture helper we should also create a narrative documentation part which describe best practices in testing the code using new logging system.


Amber told me over IRC that we can not enforce all tests to always check for their emitted events... so that we should start with option log capturing in tests.

I would say that since we don't have many new places in Twisted using the new logging system we can start to enforce, ASAP, all logging calls made using the new logging system.

If we don't enforce this, we will grow a big tech debt and we will never have the time to pay it.


I have pushed your changes to buildbot :)

I will also push soon a commit which I hope will summarize the current recommendation and best practices for using and testing the new logging system in Twisted core.

Many thanks for your work.

comment:13 Changed 7 years ago by Adi Roiban

I have pushed a commit which tried to document / summarize the discussions from this ticket.

Please review it.

Thanks!

comment:14 Changed 6 years ago by hawkowl

Keywords: review removed
Owner: set to Adi Roiban

Hi Adi, it doesn't seem to show up on the ticket -- could you double check the branch is made/pushed, or add it to the ticket manually?

comment:15 Changed 6 years ago by Adi Roiban

Branch: branches/new-log-usage-8036
Keywords: review added
Owner: Adi Roiban deleted

comment:16 Changed 6 years ago by Adi Roiban

I have updated the branch name... I think that Trac hook failed and I did not checked before merge.

Please review.

Thanks!

comment:17 Changed 6 years ago by Wilfredo Sánchez Vega

Cc: wsanchez@… added
Keywords: review removed

I really like LogCapture(); that's such a nice way to test logging output.

Nit: I'd make the attributes set in __init__ private (eg. _events).

I would suggest that is only register one observer and return the captured events in an events() method. If you want that turned into text, it's pretty easy:

with LogCapture() as lc:
    ...
    self.assertIn('expected text', formatEvent(lc.events[0]))

formatEventAsClassicLogText, which is what textFileLogObserver uses, is also available if you really want the timestamp as well, but it seems unlikely that this would be useful in a test.

Some advantages to this are:

  • You are testing for emitted text in a specific event instead of in any event that may have been captured.
  • You are unlikely to be testing the timestamp and system prefix in the textFileLogObserver output, so I think formatEvent is usually more appropriate for testing.
  • The context manager doesn't need to store a StringIO with the rendered text, which may be a lot, depending on the test, and you may never use it anyway.

If we really want an asText() equivalent, it could return a sequence of messages instead of a single blob with all messages, so you could at least index into it; something like [formatEventAsClassicLogText(e) for e in self.events].

This is useful enough in general that I'd consider adding a twisted.logger._context and exporting the class.

Changed 6 years ago by Wilfredo Sánchez Vega

Attachment: logcapture-wsanchez.diff added

wsanchez suggested edits

comment:18 Changed 6 years ago by Wilfredo Sánchez Vega

Attached a patch with what I was suggesting for reference.

comment:19 Changed 6 years ago by Wilfredo Sánchez Vega

Other thing… I think hawkowl was referring to this in IRC…

I don't think we should suggest that people use self.noisy as a way to manage log output.

The new logging system uses log levels to this effect, so a more idiomatic (to twisted.logging) way of expressing that would be to use self._log.debug(). One reason one might wrap logging in a conditional is to avoid some expensive operation when not necessary:

if self.noisy:
    self._log.info("This happened: {thing}", thing=expensiveCall())

But that should be avoidable by deferring the calling to the rendering:

self._log.info("This happened: {thing()}", thing=expensiveCall)

But the main point is that by using log levels instead, you can control what gets logged by configuring the logging system, rather than configuring the object doing the logging, which I think is cleaner and more flexible, as setting an attribute on an object is less likely to be configurable at runtime than the logging system, and it's also a decision than can be left to observers to make individually.

Ideally twistd would get ported to the new logging system and set up a filtering observer chain that can be configured at runtime. (We do this in CalendarServer.) So we're not really set up for all this yet, but let's not encourage self.noisy as a convention.

comment:20 Changed 6 years ago by Adi Roiban

Keywords: review added

Thanks wsanchez for your patch and changes.

Thanks for your review.


I like that messages are broken in a list :)

Why not also make LogCapture.messages a property?


I like the LogCapture.messages as I hope it will not require to import formatEvent in every test.

I hope that we can extend our trial TestCase so that we don't have to import LogCapture and just have

` with self.logCapture() as lc:

... self.assertIn('expected text', formatEvent(lc.events[0]))

`


I think that moving LogCapture to a public API should be done in a separate ticket. And maybe we can use it for a while in the test and see if it requires any changes. After that we can make it public.


I understand the bad pattern in 'noisy'. I have remove it and add a section talking about this bad pattern... as I think it is already present is some code in trunk.


Please check latest changes.

Thanks!

comment:21 Changed 6 years ago by Wilfredo Sánchez Vega

I didn't make messages a property because initially I though it might want to take some flag parameters like prefix=True to throw in the timestamp and system bits if you cared. But it's not a big deal, and I don't think it'll be a common enough use case to matter.

I also avoid properties that do something expensive-ish, to make it a little more clear that if you want to access the result a bunch of times, you might want to store it in a local variable to avoid recomputing the event text:

with LogCapture() as lc:
   …
   self.assertEquals("…", lc.messages[0])
   self.assertEquals("…", lc.messages[1])
   self.assertEquals("…", lc.messages[2])
   self.assertEquals("…", lc.messages[3])

It's not a huge deal, though, and I agree it looks nicer as a property. It's also not impossible to optimize later if one cares.


I agree that making this public should wait for another ticket and a little baking-in time.

comment:22 Changed 6 years ago by Wilfredo Sánchez Vega

Keywords: review removed

I think this looks good except for the "self._log conflicts" bit:

from twisted.logger import _logFor

I don't think that _logFor actually exists in twisted.logger… I don't see it there, but I do see it in twisted.internet.protocol.

I know there was a conversation about the origins of this helper, but I missed it, so I'm not sure why that would be useful as API in twisted.logger. Is another work-around to just use a different attribute name, like self._logger?

comment:23 Changed 6 years ago by Adi Roiban

Keywords: review added
Owner: set to Wilfredo Sánchez Vega

@wsanchez you have removed the review tag, but did not assign the ticket and I am not sure what are the followup up steps for this ticket.

comment:24 Changed 6 years ago by Cory Benfield

Branch: branches/new-log-usage-8036new-log-usage-8036

Looks like the branch got out of step when we moved to GitHub. I've fixed it up.

comment:25 Changed 6 years ago by Wilfredo Sánchez Vega

Owner: changed from Wilfredo Sánchez Vega to Adi Roiban

Adi- my apologies for going dark on you here.

Summary of my thoughts:

  • I'm still unclear why an existing _log attribute needs _logFor, rather than picking a different name for the logger, but I don't object to it in principle. I do think _logFor has good use cases.
  • The docs says to import _logFor from twisted.logger. I don't see it defined there, so this wouldn't work.
    • I don't think we should private symbols at the package level.
    • I suggest from twisted.logger._logger import _logFor.

So the only thing left to do, I think, is put _logFor into _logger and update the docs to import form there. If you agree, then you're good to merge after that change. Ping me on IRC if not so we can discuss alternatives.

comment:26 Changed 6 years ago by Wilfredo Sánchez Vega

Keywords: review removed

comment:27 Changed 6 years ago by Adi Roiban

Keywords: review added
Owner: Adi Roiban deleted

I think that _logFor was added just as an idea of how a workaround might be... and to introduce it as private and see how well it can be used.

I have removed it from the example and suggested to use self._logger.

Please check that the branch is ok to be merged.

Thanks!


Instead of _logFor, for my project, I am using a simple method for all objects which can emit log... so if I want to emit a log as something.elseInstance, I use something.elseInstance.log()

the idea is that beside injecting the "self" instance, self.log() can also inject other instance specific attributes. For example, a protocol can always inject into the log the remote peer.

comment:28 Changed 6 years ago by Wilfredo Sánchez Vega

Keywords: review removed
Owner: set to Adi Roiban

OK this is totally merge-ready. Thanks, Adi.

comment:29 Changed 6 years ago by hawkowl

Except that _logFor exists because adding _logger on various Twisted things stomps on subclasses. This, for example, broke Autobahn (since we had our own _log/_logger on Protocol subclasses).

comment:30 Changed 6 years ago by Adi Roiban

Does _logFor exists ? Where ? :)

I was not able to find it in trunk.

Thanks!

comment:31 Changed 6 years ago by Adi Roiban

Owner: changed from Adi Roiban to hawkowl

Hm... I would not find where _logFor exists.

Amber, can you please shed some light here ?

Thanks!

comment:32 Changed 6 years ago by Wilfredo Sánchez Vega

Oh, I just noticed you still reference _logFor in a docstring on this branch: https://github.com/twisted/twisted/compare/trunk...new-log-usage-8036#diff-0c081f70298d151fd1af9f43fce1a89bR707

comment:33 Changed 6 years ago by hawkowl

It's called _loggerFor now, I think I moved it and renamed it.

comment:34 Changed 6 years ago by Adi Roiban

Owner: changed from hawkowl to Adi Roiban
Status: newassigned

Thanks.

from twisted.logger import _loggerFor

I will update the docs and will come back with an update

comment:35 Changed 6 years ago by Wilfredo Sánchez Vega

Well, it's presently in twisted.internet.protocol.

If we are going to use it more broadly, it should probably move, and I still vote or twisted.logger._logger.

I'm not fond of putting private symbols in the public module namespace; we've managed to avoid that so far. It's not pretty, but it's private.

comment:36 Changed 6 years ago by Wilfredo Sánchez Vega

Wait, I take it back, already moved.

But it *is* being imported into twisted.logger's namespace, and I'm sad about this. I'd prefer we remove that import (which should be OK because it's a private symbol) and document this instead:

` from twisted.logger._logger import _loggerFor `

And fix the import in twisted/internet/protocol.py and twisted/application/app.py.

I guess if you don't want to remove the import, that's OK, since that's in trunk and not directly required for this ticket, but I think the docs should use the above import so we don't have to fix more later.

comment:37 Changed 6 years ago by Adi Roiban

Keywords: review added
Owner: changed from Adi Roiban to Wilfredo Sánchez Vega
Status: assignednew

I have updated the branch. Please check latest changes on the associated branch.

The _loggerFor example is a bit "simple" but I hope that it will help people discover this helper and from there they will know what to do with it.

I don't know why https://twistedmatrix.com/documents/16.3.0/api/twisted.logger._logger.html has no API docs ... due to this I can not get the link to _loggerFor

Thanks!

comment:39 Changed 6 years ago by Glyph

I'll do a review with some comments on the PR.

comment:40 in reply to:  37 Changed 6 years ago by Glyph

Replying to adiroiban:

I don't know why https://twistedmatrix.com/documents/16.3.0/api/twisted.logger._logger.html has no API docs ... due to this I can not get the link to _loggerFor

The problem here is that _loggerFor is not documented at all. It's defined as an attribute (a lambda) rather than a proper function; _logger would need to have a module docstring that had a @var for it to be visible there. The fact that Logger doesn't show up is because it's been "moved" via an __all__ import to https://twistedmatrix.com/documents/16.3.0/api/twisted.logger.Logger.html

comment:41 Changed 6 years ago by Glyph

Keywords: review removed
Owner: changed from Wilfredo Sánchez Vega to Adi Roiban

Thanks for helping to pay down this technical debt, adi! A clear explanation of how to do this kind of testing, and how to help move us towards the new logging system generally, would be great.

I do have a little bit of housekeeping feedback: it looks like it needs a merge to update against trunk, and it seems to be failing on all the python 3 builders.

I did have quite a bit of feedback (as comments on the github PR, https://github.com/twisted/twisted/pull/375). Unfortunately I think there are still some open questions about the recommendations you've made in the docs and some of the placement of public API, so

On the plus side, I think this could be broken down into three very small PRs which would each be easier to address. The first would be LogCapture by itself, and tests, possibly even just as private API initially.

One thing to think about as you do that - one of Nathaniel and Augie's observations is that it's a smaller compatibility surface to expose a function that creates a provider of a documented interface, rather than exposing an actual class. This is a minor concern here, but it's a concern every time we expose a new type, so it's something to think about. You could write a public function captureLogs() which returns a LogCapture instance. For example, this is why optionsForClientTLS is a function, to provide flexibility in its implementation details, and so we can ignore the possibility that someone might subclass ClientTLSOptions in the future.

Please address the feedback so far, break up into some smaller chunks if you can, and re-submit.

Thanks again!

comment:42 Changed 5 years ago by Wilfredo Sánchez Vega

Component: corelogger

comment:43 Changed 5 years ago by Wilfredo Sánchez Vega

Milestone: New Logging System

Closing New Logging System milestone; use logger component for ongoing tickets.

comment:44 Changed 3 years ago by Wilfredo Sánchez Vega

I created a new ticket, #9617 for just the LogCapture class.

Note: See TracTickets for help on using tickets.