Opened 2 years ago

Closed 2 years ago

#5932 enhancement closed fixed (fixed)

Port twisted.python.log to Python 3

Reported by: itamar Owned by: itamar
Priority: normal Milestone: Python 3.3 Minimal
Component: core Keywords:
Cc: Branch: branches/log-python3-5932-2
(diff, github, buildbot, log)
Author: itamar Launchpad Bug:

Description

twisted.python.log needs to work on Python 3.

Change History (15)

comment:1 Changed 2 years ago by itamarst

  • Author set to itamarst
  • Branch set to branches/log-python3-5932

(In [35473]) Branching to 'log-python3-5932'

comment:2 Changed 2 years ago by itamar

We need to port threadable first (#5860).

comment:3 Changed 2 years ago by exarkun

Also untilConcludes and context. All three of those dependencies are up for review.

comment:4 Changed 2 years ago by itamarst

  • Branch changed from branches/log-python3-5932 to branches/log-python3-5932-2

(In [35504]) Branching to 'log-python3-5932-2'

comment:5 follow-up: Changed 2 years ago by itamar

  • Author changed from itamarst to itamar
  • Keywords review added
  • Owner changed from itamar to exarkun

Here is my proposed design; I would appreciate comments before proceeding:

  1. Log message dictionaries' keys will be native strings, i.e. Unicode on Python 3, bytes on Python 2.
  2. log.msg(*message) can be Unicode, as can the why argument to log.err(). Unicode messages they will be encoded to bytes using UTF-8.
  3. Documentation will note that if you mix bytes and Unicode, you should really make sure all your bytes messages are decodeable as ASCII or UTF-8.
  4. On Python 3, StdioOnAStick has to potentially deal with some other encoding for its output, in which case it will decode the message using UTF-8 and then write that to stdout. If they can't be decoded, we can just write the bytes as a fallback.

comment:6 Changed 2 years ago by itamar

The reasons for the first two points:

  1. Standard Python 3 porting idiom. Perhaps we should stick to bytes and provide constants, instead.
  2. Expanded functionality to support Python 3, which is also available in Python 2... while still being backwards compatible with existing observers.

The latter two should be obvious.

comment:7 in reply to: ↑ 5 Changed 2 years ago by glyph

Replying to itamar:

Here is my proposed design; I would appreciate comments before proceeding:

  1. Log message dictionaries' keys will be native strings, i.e. Unicode on Python 3, bytes on Python 2.

I have a feeling we should let people log unicode keys on Python 2, to prepare for their own transitions in advance.

  1. log.msg(*message) can be Unicode, as can the why argument to log.err(). Unicode messages they will be encoded to bytes using UTF-8.

At what stage? What do observers get?

  1. Documentation will note that if you mix bytes and Unicode, you should really make sure all your bytes messages are decodeable as ASCII or UTF-8.

There is no "or". Either they must be decodeable as UTF-8 or they are restricted to ASCII.

  1. On Python 3, StdioOnAStick has to potentially deal with some other encoding for its output, in which case it will decode the message using UTF-8 and then write that to stdout. If they can't be decoded, we can just write the bytes as a fallback.

I think that in any ambiguous situation we may need to guess for compatibility reasons, but un-decodeable bytes in this context should definitely emit a warning.

comment:8 Changed 2 years ago by itamar

Ah, to clarify point 2: the Unicode will be encoded to bytes inside msg() or err(), by the time they reach the observer they will be bytes. Thus existing observers can start accepting Unicode log messages with no code changes.

Re Unicode keys... it's not accurate to say Python 3 has Unicode keys, really. Rather, it has "native string" keys (more like Lisp symbols). As such making their type be "native string" actually is more idiomatic way of being compatible across versions, in so far as these effectively are opaque symbols and not strings you intend to manipulate. There's a utility function twisted.python.compat.nativeString which deals with conversions to this pseudo-type.

comment:9 Changed 2 years ago by itamar

glyph: perhaps the observer-registration API should say whether you want bytes or unicode. it allows for a clear expression of the caller's intent
glyph: and hey, this would be a great opportunity to introduce a new logging API where you just pass your unicode-ness preference once, and curry it into your log object so you don't need to say it every time...
glyph: c.f. http://twistedmatrix.com/trac/ticket/5872

comment:10 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to itamar

Log message dictionaries' keys will be native strings, i.e. Unicode on Python 3, bytes on Python 2.

Well, this is obviously an incompatible change in the API, but since preserving compatibility would be more work for a worse result I guess it's fine to do this instead. That probably sounds like a sarcastic comment, but there's nothing I can do about that. I am completely serious.

(reasoning) Standard Python 3 porting idiom. Perhaps we should stick to bytes and provide constants, instead.

I'm not sure what you mean by "provide constants" here. Sticking to bytes avoids introducing some problems for application developers, but are you suggesting that the logging API becomes..

    log.msg(**{log.FORMAT: "hello, world"})

I expect not, but I can't quite see another interpretation.

log.msg(*message) can be Unicode, as can the why argument to log.err(). Unicode messages they will be encoded to bytes using UTF-8. (to clarify point 2: the Unicode will be encoded to bytes inside msg() or err(), by the time they reach the observer they will be bytes. Thus existing observers can start accepting Unicode log messages with no code changes.)

I would say they must be unicode on Python 3, not merely that they can be. Also, I suggest ASCII only for this ticket. UTF-8, if necessary, can come later. Besides, the choice of encoding will ultimately be up to the log observer, not the log publisher. Perhaps someone wants to write out Latin-1 or MCBS log files (how else would you read them on Windows?).

Also it seems totally insane to say that the keys will become unicode but the values will be forced to remain bytes. This is just gratuitous complexity. I can think of tons of arguments not to do this, but I'm not going to bother to present them, because I see no rationale for going the route you described. Perhaps there's some alternate interpretation of these ideas that makes sense, but it's not clear from these comments what it is.

(reasoning) Expanded functionality to support Python 3, which is also available in Python 2... while still being backwards compatible with existing observers.

twisted.python.log already supports unicode on Python 2, so I'm not sure what the expanded functionality is.

Documentation will note that if you mix bytes and Unicode, you should really make sure all your bytes messages are decodeable as ASCII or UTF-8.

If we go with ASCII above, then we can just say ASCII here. Also, when we talk about "mixing" things to users, we should be clear on the mixing occurring in different log events, not in the same log event. Within a single event, there should really be consistency. Otherwise, fine.

On Python 3, StdioOnAStick has to potentially deal with some other encoding for its output, in which case it will decode the message using UTF-8 and then write that to stdout. If they can't be decoded, we can just write the bytes as a fallback.

This part doesn't make sense. You cannot write bytes to sys.stdout on Python 3. sys.stdout is a file in text mode, which means you can only write text to it. This is also a big part of why I think the log publisher doesn't need to be extremely concerned with what to encode text to: log files already specify an encoding and require text as input, not bytes.

Here are some comments about the code in the branch (though not a complete review), not about the above "design":

  1. Some of the newer tests in twisted/test/test_log.py instantiate a LogPublisher and exercise it, rather than using the global publisher in twisted.python.log. This would be a good way for just about all of the tests in this module to work. Notice this would remove the need for flushLoggedErrors, since the errors would all be logged on a LogPublisher that trial isn't even watching for errors.
  2. The twisted.python.threadable import in the middle of twisted/python/log.py can be moved to the top. With the twisted.python.log import now safe inside a function in failure.py, perhaps the failure import can be moved as well?

comment:11 Changed 2 years ago by itamar

  • Keywords review added
  • Owner changed from itamar to exarkun

Using a custom LogPublisher would be nice, but I didn't do it in order to minimize time spent on this ticket.

Ready for review, I think. There might be a build running here someday: http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/log-python3-5932-2

comment:12 Changed 2 years ago by exarkun

  • Status changed from new to assigned

comment:13 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to itamar
  • Status changed from assigned to new

Thanks.

  1. admin/_twistedpython3.py
    1. I propose that underscores sort like the letter following them (there seem to be two competing approaches in the lists right now).
    2. Why is twisted.test.test_context deleted from the ported list?
  2. in twisted/python/_reflectpy3.py, _PY3 is now unused
  3. twisted/python/failure.py
    1. missing __future__.division
    2. cStringIO.StringIO? I suspect this is fine, but I wonder why. Is it to preserve performance? Is it because we don't have twisted.python.compat.BytesIO?
    3. twisted.python.compat.unicode avoids problems with NameError, but the way unicode is used in failure.py is clearly total nonsense. That's a problem to be resolved when we actually port failure.py though, I guess?
  4. Regarding the use of LogPublisher instances in twisted/test/test_log.py, please file a ticket.
  5. twisted/python/log.py
    1. The docstring for msg ends with a sentence saying These forms but it should be This form now.
    2. The comment starting Some more sibling imports, should be deleted.

More generally, the _PY3 conditionals introduced here make me sad. I guess I'll just have to live with that, since I can offer no alternative suggestion which might be better.

Sadly, buildbot says the redhat slaves are still missing the necessary version of zope.interface, so it's not clear that this passes on all "supported" platforms. Oh well.

Please merge when all these points have been addressed. Thanks.

comment:14 Changed 2 years ago by itamar

  1. Done.
  2. Done.
  3. Fixed the first, will deal with other two in failure ticket.
  4. #5946.
  5. Done.

comment:15 Changed 2 years ago by itamarst

  • Resolution set to fixed
  • Status changed from new to closed

(In [35527]) Merge log-python3-5932-2

Author: itamar
Review: exarkun
Fixes: #5932

twisted.python.log now works on Python 3.

Note: See TracTickets for help on using tickets.