<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Jun 14, 2013, at 3:43 PM, Glyph <<a href="mailto:glyph@twistedmatrix.com">glyph@twistedmatrix.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-family: Menlo-Regular; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">I think that we can satisfy all of these requirements by expanding the notion of an "identifier" a bit.</div></blockquote></div><br><div>In our last episode, I mentioned that an identifier should have several properties:</div><div><br></div><div><ol class="MailOutline"><li>human readable (and potentially non-unique) for ease of writing messages and reading them to get a general sense of the behavior of a system</li><li>globally unique (and potentially hard to read) for ease of filtering and monitoring</li><li>long-term (and potentially unique to humans) for ease of writing tools to deal with particular messages, whose exact details may change over time</li></ol></div><div><br></div><div>I think we can achieve all of these things by making there be <i>multiple</i> ways to identify a log message.  Here's how that could work:</div><div><br></div><div>When you are first writing a log message, just do this, as you usually would; toss off some crummy english message:</div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div>log.info("hey check it out I just {verb}ed a {noun}", verb=request.method, noun=request.uri)</div></blockquote><div><br></div><div>Asking the author to write a log message with more detail and care than this is basically just a recipe for disappointment :).</div><div><br></div><div>Later, someone might want to instrument these messages.  We've all agreed that re.findall("hey check it out.*") is a crummy way to do this.  The obvious thing would be something like this:</div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div>tapEventsLike(myObserver, publisher, log_format="hey check it out I just {verb}ed a {noun}")</div></blockquote><div><br></div>That's fine to start with, of course, but it's not very resilient to changes in the code; the developer might later change this line to read:<div><br></div><div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div>log.info("hey check it out I just {verb}ed a {noun} and got a {result}", verb=request.method, noun=request.uri, result=request.response.status)</div></blockquote><div><br></div>Now the log tap is broken, and there's no reasonable way to fix it; the format string doesn't have a place that it could go.</div><div><br></div><div>But, the 'log' in the above example is a logger object associated with a class or module, and so has 3 persistent pieces of information it can use to build a long-term identifier that the event tap can use.  We could reflect that in an API like this:</div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div>from example.module import log</div><div>tapEventsLike(myObserver, publisher,</div><div>              log_id=log.idForOriginalFormat("hey check it out I just {verb}ed a {noun}"))</div></blockquote><div><div><br></div><div>which could be computed offline, and then specified like this, to be even more resilient to API changes:</div></div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div>tapEventsLike(myObserver, publisher, log_id='604e45b5-2d41-5ba5-9ae1-7205163c3e3f')</div></div></blockquote><div><br></div><div>That string, of course, is computed like this:</div><div><br></div><div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div>>>> import uuid</div><div>>>> from urllib import urlencode</div><div>>>> uri = "<a href="http://twistedmatrix.com/ns/log/?">http://twistedmatrix.com/ns/log/?</a>" + urlencode(dict(format="hey check it out I just {verb}ed a {noun}", module="example.module"))</div><div>>>> uuid.uuid5(uuid.NAMESPACE_URL, uri)</div><div>UUID('604e45b5-2d41-5ba5-9ae1-7205163c3e3f')</div></div></blockquote></div><div><div><br></div><div>(Although if we decide we like this system, we could of course go with a helpful command-line tool to generate these for you.)</div><div><br></div><div>As you evolve your codebase to take logging more seriously, you could change the original 'log.info' call to instead look like this:</div><div><br></div></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">verbANoun = log.declare(</blockquote><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">    "hey check it out I just {verb}ed a {noun} and got a {result}",</blockquote><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">    log_id='604e45b5-2d41-5ba5-9ae1-7205163c3e3f',</blockquote><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">)</blockquote><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"># ...</blockquote><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">verbANoun(verb=request.method, noun=request.uri, result=request.response.status)</blockquote><div><div><br></div><div>and then monitoring can be accomplished via public API:</div><div><br></div></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div>from example.module import verbANoun</div></div><div>verbANoun.tap(myObserver, theLogPublisher)</div></blockquote><div><div><br></div><div>So, after some evolution, our message has 3 identifiers:</div><div><br></div><div><ul class="MailOutline"><li>the original format string, which was easy to come up with on the spot, and need not be unique</li><li>a UUID, that was automatically generated but can be carried around to allow for cleaning up the codebase and the message without breaking any code doing monitoring</li><li>an API name that the logger and the logee can use to agree on the semantics of a particular message</li></ul><div><br></div></div><div>Best of all, you can get this <i>even for the plethora of existing log messages within Twisted that aren't even using structured values</i>, as we evolve them to have more metadata.  You also now have the ability to localize the format string, if you want to start presenting these log messages in contexts other than log files for system administrators.</div><div><br></div><div>Thoughts?</div><div><br></div><div>-glyph</div></div><div><br></div></body></html>