Opened 2 years ago

Last modified 21 months ago

#6070 enhancement new

Add a function like chr that always returns a byte string

Reported by: exarkun Owned by: meissenPlate
Priority: normal Milestone: Python-3.x
Component: core Keywords:
Cc: Branch:
Author: Launchpad Bug:

Description

Python 2 chr returns an instance of str (byte string). Python 3 chr returns an instance of str (text string). There's no obvious polyglot expression of the Python 2 version of the behavior, so we should have an API that takes care of the differences between Python 2 and Python 3 for this behavior.

At least twisted.python.randbytes and twisted.names.dns want to use this.

Attachments (4)

ticket_6070_patch_1.diff (6.3 KB) - added by meissenPlate 2 years ago.
Patch that adds a chr function and unittest.
ticket_6070_2.diff (8.1 KB) - added by meissenPlate 21 months ago.
Moved chr_ into compat and improved test. Added topfile.
ticket_6070_3.diff (6.3 KB) - added by meissenPlate 21 months ago.
A different way to generate _allASCIICharacters.
ticket_6070_4.diff (6.3 KB) - added by meissenPlate 21 months ago.
This version only tests 9 ASCII characters. See the discussion below.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 2 years ago by meissenPlate

  • Owner set to meissenPlate
  • Status changed from new to assigned

comment:2 Changed 2 years ago by meissenPlate

  • Keywords review added

Here's a patch that creates such a chr function. It tries to be an exact duplicate of 2x's chr, right down to the error messages. I stuck it in its own file, chr_.py, under twisted/python. This also means there's a new file of tests, test_chr_.py, under twisted/python/test. I could also see putting it in _util_py3.py, until such time as _util_py_3 is merged back into util.

unichr is a closely related function. 3x's chr is a wide 2x build's unichr. The narrow 2x build's unichr has no equivilent. It would make sense to smooth this over as well, though in a separate ticket. Unichr is only used in one place in Twisted, twisted.words.protocols.jabber.xmpp_stringprep, so it's not very important.

ord also changed a bit; it used to behave differently between wide and narrow builds and that distinction is gone now. But that doesn't really have anything to do with Twisted, so ord is fine the way it is.

Changed 2 years ago by meissenPlate

Patch that adds a chr function and unittest.

comment:3 Changed 2 years ago by exarkun

  • Owner meissenPlate deleted
  • Status changed from assigned to new

Assign tickets to noone when you submit them for review (follow the documented ReviewProcess).

comment:4 Changed 22 months ago by therve

  • Keywords review removed
  • Owner set to meissenPlate

Thanks for the patch! Some comments:

  1. It seems to be that the new function should be in the compat module itself.
  1. The __future__ imports in the test module seem unnecessary, as well as the compat import.
  1. The test method should be splitted in at least 3 different tests.
  1. The fail call in unnecessary in the tests.
  1. Please use camelCase for your variables.

comment:5 Changed 22 months ago by exarkun

The future imports in the test module seem unnecessary

These are part of the Python 3 plan itamar and I outlined. Since it sounds like the new test module should go away, I guess this doesn't really matter here - but just fyi. :)

The test method should be splitted in at least 3 different tests.

As long as I'm here... I might have said even more than 3 :)

Thanks to you both!

comment:6 Changed 21 months ago by meissenPlate

  • Keywords review added
  • Owner meissenPlate deleted
  1. Moved. Thanks for figuring that out.
  1. N/A
  1. I broke the test up.
  1. Yeah, sorry. I wrote this test before I learned this point of Twisted style. Fixed.
  1. Fixed.

I also added a topfile 6070.misc. I'm not sure whether I just forgot that before or what. Anyway it's there now.

Changed 21 months ago by meissenPlate

Moved chr_ into compat and improved test. Added topfile.

comment:7 follow-up: Changed 21 months ago by borko

can you create list of ascii elements in a compact way e.g. _allASCIICharacters = [chr(x) for x in range(126, 255, 1)] ? thx

comment:8 in reply to: ↑ 7 Changed 21 months ago by borko

Replying to borko:

can you create list of ascii elements in a compact way e.g. _allASCIICharacters = [chr(x) for x in range(126, 255, 1)] ? thx

(in tests of course)

comment:9 Changed 21 months ago by meissenPlate

Here's why I DIDN'T do it that way initially:

_allASCIICharacters needs to be full of bytestrings. In 2x that's what chr returns, but in 3x that's not. So we would need to do this:

if _PY3:
    _allASCIICharacters = [chr(i).encode("latin1") for i in range(256)]
else:
    _allASCIICharacters = [chr(i) for i in range(256)]

You'll notice that's pretty much exactly what chr_ does. So you'd be testing chr_ by comparing its output to ... essentially its own output.

I think that's pretty silly. However, I could totally be wrong. So I've attached a ticket that makes the test do it that way instead. :) Thanks for the feedback in any case.

Changed 21 months ago by meissenPlate

A different way to generate _allASCIICharacters.

comment:10 Changed 21 months ago by borko

oh, I see it now. So personally I would test only 126 and 255 for positive testcases, and 125 and 254 for negative scenarios (and all other negative scenarios). Can you create one function and move " if _PY3: " inside the function (and also move doc inside)?

comment:11 Changed 21 months ago by meissenPlate

So personally I would test only 126 and 255 for positive testcases, and 125 and 254 for negative scenarios (and all other negative scenarios).

I don't know what you mean here. chr_ should return a bytestring containing the corresponding ASCII (okay latin1, I'm being sloppy) character for any integer in [0, 255] inclusive. I don't understand why we shouldn't be testing lower than 126, and I don't understand what you mean by testing for negative scenarios for 125 and 254.

chr_(125) should, and does, yield '}' in 2x and b'}' in 3x. And note that these are the same thing. Well, almost the same thing. Index into a bytestring in 2x and you get a 1 character byte string. Index into a bytestring in 3x and you get an integer. But smoothing THAT over is more than a measly chr() replacement can do.

http://www.utoronto.ca/web/HTMLdocs/NewHTML/iso_table.html

Certainly there should be a negative result (a ValueError) for 256 and -1, and those checks are already in place. (Lines 565 and 566, _negativeIntegers and integersOver255.)

Incidentally, lower pane ASCII ends at the 128th character, that is, the character represented by the binary number 127. So the boundary is 127-128. Not 125-126.

Can you create one function and move " if _PY3: " inside the function (and also move doc inside)?

I've been told in the past to perform the 3x check outside of functions (http://twistedmatrix.com/trac/ticket/5897#comment:4), so I did the same here. In addition, this style is consistent with the definitions of iterbytes and a few other functions in twisted.python.compat, just above the definition of chr_.

Personally I agree, it looks just a little bit odd.

Again, I appreciate the feedback, especially since it was so surprisingly prompt :)!

comment:12 Changed 21 months ago by borko

Sorry for creating mess with the range ends. So if the range is 0-255 I would check just positive scenarios for beggining of range, maybe middle of the range and end of range (255) to create more compact tests. Not to check same case multiple times. But I guess it's just minor issue we can leave it as it is now.

comment:13 Changed 21 months ago by meissenPlate

I still favor the second version of the diff. Who cares if the test has a somewhat long line? But for argument's sake, here's a version with just 9 characters.

Changed 21 months ago by meissenPlate

This version only tests 9 ASCII characters. See the discussion below.

comment:14 Changed 21 months ago by lvh

  • Keywords review removed
  • Owner set to meissenPlate

Hey :) Thanks for your patch, looks pretty good to me so far. Some questions I had reading it:

  1. What's the purpose of the step size of 1 in the range for _allASCIICharacters? Does it do anything special vs just omitting it that I'm forgetting?
  2. Why is this chr_ instead of chr? At first I thought it may be to avoid stomping over existing names, but the rest of the compat module appears to have no qualms with that, Secondly I thought Py3 might've added a prohibition on renaming certain things like it does for None, True and False, but that's not the case, at least not in my version of 3.3 or 3.4a0.
  3. I'd chr_ = chr instead of defining chr_; but that's maybe being a bit pedantic.

More nitpicks than anything else, like I said, it looks good to me mostly :)

Note: See TracTickets for help on using tickets.