[Twisted-Python] logging

Sat Jun 15 09:46:07 MDT 2013

On Sat, Jun 15, 2013 at 6:37 AM, Glyph <glyph at twistedmatrix.com> wrote:
>
> On Jun 14, 2013, at 11:18 PM, Christopher Armstrong
> <radix at twistedmatrix.com> wrote:
>
> This conversation has gotten pretty sprawling; time to reel it in with
> some code.
>
> What do you think of this for an API that meets in the middle?
>
> https://gist.github.com/radeex/5787124
>
> This example implementation only concerns itself with the points under
> debate right now; obviously it's completely unusable in general. But
> anyway:
>
> 1. it supports English logs
> 2. it doesn't require you to specify a formatting if you want to just
> log a bunch of data
> 3. it makes it easy to specify a system (both manually and based on
> the module your code is in)
>
>
> First off, thanks again for taking the time to correspond about this, I'm
> very happy for this conversation to be attracting attention and making
> progress :).
>
> Moving right along...
>
> This code is relying on the semantics of the existing log.msg() call (which
> are bad, for the reasons I enumerated in the start of this thread), right
> down to the dependence on %-formatting.  The main thing I'm interested in at
> the moment is actually fixing those semantics to be more useful; the
> specific thing we're talking about here, persistent log identifiers, is
> interesting to me at the moment mostly as it relates to that.
>
> I realize you might just be relying on it for familiarity, you might not
> have had time to read the whole new thing, so I'm not trying to tear down
> your idea based on that alone :).

Yeah... I feel like a lot of the things you say in this email aren't
related to the point I was trying to get across, which was the
interface and effect, not the implementation of it.

>
> But Wilfredo already implemented a thing which addresses the
> namespacing-of-log-messaages issue, and it does about 2/3 of what your
> proposal already does (except it has, like, tests and stuff).  Right down to
> the _getframe() hack (although
> <http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py#L312>
> uses inspect.currentframe() which has a public-sounding name at least :)).
> As I mentioned in my immediately-previous reply, the logger's 'namespace'
> does not need to be the same as your module name, so it can be a "more
> persistent" identifier if you have one.
>
> More significantly, you're actually going to the trouble of constructing a
> format string.  This produces output in an ad-hoc format: '='-separated
> str()s of your log keys (although only the ones you pass directly in your
> message; those annotating or augmenting the message at subsequent points by
> other publishers / observers are ignored), which you are presumably relying
> upon being smooshed into a plain-text log file, right along side many other
> messages which do not fit this format.  Presumably this could lead to issues
> like traceback lines with '=' assignments in them showing up in whatever
> kind of ad-hoc log-analysis you want to do later, and ambiguous unparseable
> results if any fields in the log message have a '=' and a space in their
> repr.
>
> Let's not make it the responsibility of the emitter, or the format string,
> to get those values into a text log file that can be scanned with a regex
> (albeit in the case of your proposal, a much-simplified regex).  Let's push
> that responsibility down into the log observer, and actually save it as
> structured data and *parse* it later.
>
> As all the hip young kids will tell you, this type of data should be
> persisted with JSON, but we should probably have code to support multiple
> formats.  Another advantage of serializing and parsing the logs as
> structured data would allow tools written to work on a "live" log observer
> to also analyze persistent logs in at least a similar way, if not entirely
> the same way.
>
> (An object that has survived the grim ordeal of translation to and from JSON
> will surely be made less by the process, and can't be treated quite the same
> way by a log observer; but, as I'm sure you'll agree, this is probably
> better than the alternative of Pickle.  An unpickled object will be brought
> fully back to life, but, in the words of the recent NIN release, it will
> have "came back haunted".)
>
> Maybe we could even do a SQLite observer.  Honestly I'm not sure which
> format is best.

So, I responded to Hynek about this confusion already, so I'll let you
read that. I have always assumed that twistd.log would continue to be
vaguely human-readable, with special formatting for several of the
keys. I also assume that developers will be able to hook up additional
observers to save out .concatenated-json files, or stream to some
network protocol, or whatever. The little bit of formatting my example
does is just a minor convenience for the people still logging at
twistd.log files. The arguments you're making now sounds ilke you want
to et rid of that ad-hoc formatting; is that true? So, instead of

2013-06-15 10:24:21-0500 [-] Server Shut Down.

we should have this?

time=1371331461.0 system='-' msg='Server Shut Down.'

> Since it's also important to be able to *filter* logs, and avoid the work of
> formatting or serializing them at all, it would be best to push this work as
> far down the line as possible.  The construction of the format string is
> itself a whole ton of string-copying, the sort of thing I'd really like to
> avoid.  (Last I measured, dictionary operations that don't cause a
> reallocation - and most don't, unless you're inserting in a long loop - are
> cheaper than even apparently small string copies.)
>
> In fact, it strikes me that perhaps *this* is the feature we should be
> talking about implementing: writing a structured log observer and parser,
> firming up the contract between emitters and observers so it's clear what to
> do with data that can't be serialized - more than exactly which fields we
> use to identify messages that are interesting to us.
>
> Relatedly, we could make log analysis a first-class task; instead of
> assuming someone will grep the log file later, actually provide a simple
> log-query tool that users can plug their code into, presuming they're using
> a log format which is cleanly parseable.  Maybe have a little web resource
> that serves up nicely formatted views of your logs.
>
> I can implement the UUID thing as a completely separate ticket and we can
> discuss it on its own merits; meanwhile, you can happily add log IDs to
> everything in your applications.  I'd be quite happy to table this part of
> the conversation for a separate ticket; I just thought it would be a nice
> thing that could fall out of some of the other parts of a new logging
> system.

Yeah, I'm sorry that this has gotten so much scope-creep. I'll
reiterate that I think redoing what I've been speaking about as the
"system" key is most important (but maybe it should be called
something other than "system", since you've clarified what its intent
was in another email).

--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/