Opened 8 years ago

Closed 11 months ago

Last modified 11 months ago

#4053 enhancement closed invalid (invalid)

PEP 0290 Replace "x.has_key(y)" with "y in x"

Reported by: Screwtape Owned by:
Priority: normal Milestone: Python-3.x
Component: core Keywords:
Cc: Branch: has-key-4053
branch-diff, diff-cov, branch-cov, buildbot
Author:

Description (last modified by Craig Rodrigues)

In PEP 0290, has_key() was deprecated in Python 2.2 and higher:

In Python 3, dict.has_key() was removed:

PEP 0290 says that for testing dictionary membership, this:

if d.has_key(k):

should be changed to this:

if k in d:

Attachments (3)

fix_has_key_warnings.patch (59.0 KB) - added by Screwtape 8 years ago.
A patch that replaces instances of "has_key" with the "in" operator.
4053-fix-has_key-warnings.patch (57.2 KB) - added by Screwtape 7 years ago.
A patch that replaces instances of "has_key" with the "in" operator, attempt 2.
haskey.diff (51.0 KB) - added by Vladimir Perić 5 years ago.

Download all attachments as: .zip

Change History (46)

Changed 8 years ago by Screwtape

Attachment: fix_has_key_warnings.patch added

A patch that replaces instances of "has_key" with the "in" operator.

comment:1 Changed 8 years ago by Screwtape

Keywords: review added
Owner: Glyph deleted

comment:2 Changed 8 years ago by Antoine Pitrou

The change in twisted/mail/mail.py looks like you are fixing a separate bug.

comment:3 Changed 8 years ago by TimAllen

It does look like that, but note that while previously both twisted.mail.mail.DomainWithDefaultDict.has_key() and .__contains__() had identical implementations, now .__contains__() is clearly the authoritative implementation and .has_key() is a wrapper for compatibility with older versions.

comment:4 in reply to:  3 Changed 7 years ago by kelly

Replying to TimAllen:

It does look like that, but note that while previously both twisted.mail.mail.DomainWithDefaultDict.has_key() and .__contains__() had identical implementations, now .__contains__() is clearly the authoritative implementation and .has_key() is a wrapper for compatibility with older versions.

Wouldn't that still mean that has_key = __contains__ would be better, since that wouldn't change how the class functions (ie has_key always returns true)?

Other notes:

twisted/vfs/adapters/sftp.py:74:

if "uid" in attrs and attrs.has_key("gid"):

has_key hides behind short circuit evaluation - I wonder if that means there is a hole in the tests for this file...

twisted/lore/indexer.py:25:

if not text in entries:

The form "if text not in entries:" is used everywhere else.

twisted/names/authority.py:163:

if not 'zone' in l:

comment:5 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Screwtape

Looks like antoine and kelly did a decent review on this.

Changed 7 years ago by Screwtape

A patch that replaces instances of "has_key" with the "in" operator, attempt 2.

comment:6 Changed 7 years ago by Screwtape

Keywords: review added
Owner: Screwtape deleted

comment:7 Changed 7 years ago by Jean-Paul Calderone

Owner: set to Jean-Paul Calderone
Status: newassigned

comment:8 Changed 7 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/has-key-4053

(In [27503]) Branching to 'has-key-4053'

comment:9 Changed 7 years ago by therve

Cc: therve added

Just noting the errors that I saw:

-            if not rpcConf.services.has_key(name):
+            if rpcConf.name not in services:


-            if not inetd.internalProtocols.has_key(service.name):
+            if inetd.service.name not in internalProtocols:

-        if self.active and self.processes.has_key(name):
+        if self.active and self.name in processes:

-        if attrs.has_key("atime") and attrs.has_key("mtime"):
+        if "atime" in attrs and mtime in attrs:


-            self.failUnless(self.openfun_called.has_key(conn))
+            self.failUnless(conn in self.openfun_called)

-            assert d.has_key(k), "has_key() failed"
+            assert k in d, "has_key() failed"

-            assert not d.has_key(k), "has_key() even though we deleted it"
+            assert not k in d, "has_key() even though we deleted it"

-        self.failUnless(i.has_key("time"))
+        self.failUnless("time" in i)

Those should probably use assertIn.

comment:10 Changed 7 years ago by Jean-Paul Calderone

(In [27505]) Change not x in y to x not in y; use assertIn instead of failUnless where appropriate; change other minor coding standard issues near has_key changes; fix some mistakes in the has_key/in translation

refs #4053

comment:11 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Status: assignednew

I fixed those, as well as a few other less important things. I'm going to try to figure out how much of the changed code is actually covered by unit tests (if only warnings emitted by the test suite had been fixed, then this wouldn't be a problem, but I guess has_key uses were tracked down with grep instead for some reason).

comment:12 Changed 7 years ago by Jean-Paul Calderone

Author: exarkunScrewtape, exarkun
Owner: changed from Jean-Paul Calderone to Screwtape

These changes are uncovered:

  1. twisted/conch/client/options.py
  2. twisted/conch/openssh_compat/primes.py
  3. twisted/conch/scripts/conch.py
  4. twisted/conch/unix.py
    1. The change to setModes
    2. The change to UnixSFTPFile.__init__ (half covered)
  5. twisted/internet/main.py (half covered, shouldn't be an assert anyway)
  6. twisted/mail/imap4.py
    1. The two MemoryAccount.rename changes are each only half covered
  7. twisted/mail/pb.py - there are no unit tests for this module at all
  8. twisted/manhole/service.py - one of the three hunks has half coverage, the other two have none
  9. twisted/manhole/telnet.py - no coverage at all
  10. twisted/names/authority.py
  11. twisted/names/cache.py
  12. twisted/names/aot.py - unjellyFromSource change is only half covered
  13. twisted/persisted/dirdbm.py
  14. twisted/persisted/styles.py
    1. unpickleModule change is untested

... at this point I'm rather bored. I'll just assume that the rest of the changes are uncovered too. Feel like writing some tests? Or splitting the parts of the patch which change code that does have tests off from the rest?

comment:13 in reply to:  11 Changed 7 years ago by Screwtape

Owner: Screwtape deleted

Replying to exarkun:

(if only warnings emitted by the test suite had been fixed, then this wouldn't be a problem, but I guess has_key uses were tracked down with grep instead for some reason)

The reason being, if you had to track down up to 1,970 uses of 'has_key', would you want to do it one-by-one, running the entire test-suite in between each attempt, or would you just write a regexp, search-and-replace across the tree, and run the tests to shake out whatever corner-cases the regexp might not have handled correctly?

Also, I guess I assumed that Twisted had better test-coverage than it evidently does. Had I known about that, I might well have taken the one-at-a-time route.

If I should have a sufficiently lazy Sunday sometime soon, I'll try to make a reduced patch that only changes code that's actually tested. In the mean time, I'll assign this ticket to nobody so that people won't think I'm actively working on it, and somebody else with a lazy Sunday might pick up the challenge.

comment:14 Changed 7 years ago by Screwtape

It has been pointed out to me that replacing "x.has_key(y)" with "y in x" is a thing that will automatically be done by the 2to3 tool, so perhaps it's not a strictly-necessary precondition to attempting the 2-to-3 transation.

comment:15 Changed 7 years ago by Jean-Paul Calderone

The problem with using 2to3 for this (closely related to the problem with global search and replace) is that there's no way for an automated tool to actually know if a use of has_key should be replaced with a use of in. Not every object that has has_key called on it is a dict. A human needs to consider each case and make the decision.

comment:16 Changed 7 years ago by terrycojones

Author: Screwtape, exarkunScrewtape, exarkun, terrycojones

I took a look through all the changes and found the following errors:

  1. 12 should be a string:
    Index: twisted/words/protocols/toc.py
    ===================================================================
    --- twisted/words/protocols/toc.py      (revision 28367)
    +++ twisted/words/protocols/toc.py      (working copy)
    
    @@ -1078,7 +1078,7 @@
             func(user,cookie,seq,pip,vip,port,tlvs)
    
         def tocSEND_FILE(self,user,cookie,seq,pip,vip,port,tlvs):
    -        if tlvs.has_key('12'):
    +        if 12 in tlvs:
    
  2. There are two errors (gid, mtime) in the following:
    Index: twisted/vfs/adapters/sftp.py
    ===================================================================
    --- twisted/vfs/adapters/sftp.py        (revision 28367)
    +++ twisted/vfs/adapters/sftp.py        (working copy)
    @@ -71,11 +71,11 @@
             NOTE: this function assumes it runs as the logged-in user:
             i.e. under _runAsUser()
             """
    -        if attrs.has_key("uid") and attrs.has_key("gid"):
    +        if "uid" in attrs and gid in attrs:
                 os.lchown(path, attrs["uid"], attrs["gid"])
    -        if attrs.has_key("permissions"):
    +        if "permissions" in attrs:
                 os.chmod(path, attrs["permissions"])
    -        if attrs.has_key("atime") and attrs.has_key("mtime"):
    +        if "atime" in attrs and mtime in attrs:
                 os.utime(path, (attrs["atime"]. attrs["mtime"]))
    
  3. The following has self.name instead of name:
    Index: twisted/runner/procmon.py
    ===================================================================
    --- twisted/runner/procmon.py   (revision 28367)
    +++ twisted/runner/procmon.py   (working copy)
    
    -        if self.active and self.processes.has_key(name):
    +        if self.active and self.name in processes:
    

In addition, find . -name '*.py' | xargs grep has_key shows 30 hits, so someone is going to have to go change these too.

comment:17 Changed 6 years ago by <automation>

comment:18 Changed 6 years ago by Jean-Paul Calderone

This patch needs to be split into smaller pieces and spread across multiple tickets. We have much better success at accepting 10 small patches than 1 enormous one.

comment:19 Changed 5 years ago by Vladimir Perić

Cc: Vladimir Perić added
Keywords: review added

I have written a new patch that removes some (obvious) uses of has_key. Not all uses are fixed though, but in lieu of comment #18, I prefer to split it into smaller chunks. All of these changes are straight-forward. To quote my commit message:

Replace uses of dict.has_key(key) with key in d

The has_key syntax has been deprecated in favor of "key in d" and is no longer supported in Python 3. Fix most uses of the old syntax, by running the 2to3 fixer with the following command, which runs it with just the "has_key" fixer. The 2to3 fixer is smart so this is a safe change to make.

2to3 -w -n -f has_key ./

Changed 5 years ago by Vladimir Perić

Attachment: haskey.diff added

comment:20 Changed 5 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Vladimir Perić

Thanks for picking this up!

I notice that there's still a change being made to twisted/spread/ui/gtk2util.py, which I know doesn't have unit tests. Can you cut out the parts of this diff that change untested code (or add unit tests for the changed code)? Difficult to test areas of Twisted that need to be changed should be addressed individually (and under separate tickets), with a bit of thought as to how to get test coverage of them so we can be confident the changes don't break things.

Also, notice that there has been some effort put into this ticket already. Please make sure that starting over from scratch on this ticket is the best use of time. Many potential pitfalls may have been covered in previous code reviews (for example, the point about changing untested code has been raised on this ticket before). It would be a shame to have to re-discover all of those.

Thanks again!

comment:21 Changed 5 years ago by Jonathan Lange

exarkun, the implications of your comment seem to be:

  1. That each change of x.has_key(y) to y in x in untested code ought to be filed as a separate ticket and patch. That is, one ticket per call to has_key in untested code. Is that right?
  2. That vperic, if he is to close this master ticket, will have to write tests for each current call to has_key that lacks tests?

Is this correct? If so, it's a pretty hefty burden. Especially since there'll be similar issues for apply, print and perhaps others. Is there anything we can do to ease that burden? (Perhaps by relaxing some of the landing constraints, or finding classes of change where we could allow an exception to a rule, or some means of automating the tedium (e.g. a script for filing tickets)).

Thanks, jml

comment:22 Changed 5 years ago by Vladimir Perić

Just a note, ticket #5688 fixed all uses of has_key which were covered by tests. There's still about a hundred remaining, though. Also, I guess a concern was that Twisted classes define their own has_key methods, but this only happens in three cases, all three of which also provide the appropriate contains method.

comment:23 Changed 5 years ago by Vladimir Perić

Milestone: Python-3.x

comment:24 Changed 5 years ago by Jean-Paul Calderone

That each change of x.has_key(y) to y in x in untested code ought to be filed as a separate ticket and patch. That is, one ticket per call to has_key in untested code. Is that right?

It could be one per module, probably. Or one per-how-many-can-be-fixed-in-a-100-line-patch. The idea is to not overwhelm the author or reviewer with so much that they make unnecessary mistakes.

That vperic, if he is to close this master ticket, will have to write tests for each current call to has_key that lacks tests?

Yes.

Is this correct? If so, it's a pretty hefty burden. Especially since there'll be similar issues for apply, print and perhaps others. Is there anything we can do to ease that burden? (Perhaps by relaxing some of the landing constraints, or finding classes of change where we could allow an exception to a rule, or some means of automating the tedium (e.g. a script for filing tickets)).

Maybe dash can provide a command line tool for filing tickets. Otherwise, I don't see how this can be made easier. The has_key calls exist, they need to be changed, and tests are needed to demonstrate the new code works.

comment:25 Changed 4 years ago by Thijs Triemstra

Cc: Thijs Triemstra added

comment:26 in reply to:  18 Changed 4 years ago by Thijs Triemstra

Replying to exarkun:

This patch needs to be split into smaller pieces and spread across multiple tickets. We have much better success at accepting 10 small patches than 1 enormous one.

Opened few more (with easy keyword for gsoc):

Still many more to open..

comment:27 Changed 11 months ago by Craig Rodrigues

Description: modified (diff)

comment:28 Changed 11 months ago by Craig Rodrigues

Summary: Replace "x.has_key(y)" with "y in x"PEP 0290 Replace "x.has_key(y)" with "y in x"

comment:29 Changed 11 months ago by Craig Rodrigues

Owner: changed from Vladimir Perić to Craig Rodrigues

comment:31 Changed 11 months ago by Craig Rodrigues

Keywords: review added
Owner: Craig Rodrigues deleted

comment:32 Changed 11 months ago by Adi Roiban

Many thanks for you contribution.

We should have one PR associated with one ticket... and one ticket fully addressed by a single PR.

If the PR is to big, is OK to split them in multiple branches but please create a ticket for each PR so that we can track their progress.

In order to land, each branch need a ticket describing the scope and a news file fragment describing the change type.

Exarkun rejected the initial patch since some changes were pushed without having any test coverage.

rodrigc, in the past I have merged some of your changes even if they did not had 100% coverage, but I don't want to make this the norm.

I understand the need for fixing the syntax so that we have py3 valid syntax, but for anything else, I think that is better to just focus on a module and port it by making sure it is 100% py3 compatible and 100% py3 and py2 covered.

I am leaving this so that another developer can add another opinion, but I don't like the way these py3 changes are pushed to Twisted.

I am sorry ... but in the current format, I can not approve the proposed branches.

comment:33 Changed 11 months ago by Adi Roiban

Branch: branches/has-key-4053has-key-4053

comment:34 Changed 11 months ago by Adi Roiban

I have updated the old branch associated with this ticket... in case it helps

comment:35 Changed 11 months ago by Craig Rodrigues

I appreciate your assistance in reviewing, testing and taking in my patches to Twisted.

In this case, I don't agree with your judgment and would like second opinion from another another Twisted developer. These are very small changes which are easy to review, don't break Python 2.7 compatibility, and don't break any of the tests which I have run.

While I respect the Twisted development process, progress on these Python 3 tickets is taking multiple years, and I see no justification for the progress to be so slow.

Since I submitted separate pull requests, what I can do is create one ticket per pull request per pull request, so that things can be looked at separately.

comment:36 Changed 11 months ago by Adi Roiban

Sure. No problem. We should get feedback from at least another developer. This is why I have not rejected your changed and this is why I left this ticket in the review queue.

Regarding slow progress with python3... Amber and others have put a lot of effort in porting to py3 and getting good coverage, and even with a good coverage we still hit a few regressions.

This is the main reason why I would like to continue doing py3 poring based on this slow process. Otherwise I would image that we would have had many more regression and then we might hear people asking to do the port in a separate branch and not in trunk and mess with their production python 2.7 environments :)


Since the changes are not that big, I would expect to have them in a single branch :) ... there is the branch with "print statement to function" migration which is much larger than this :)

comment:37 Changed 11 months ago by Antoine Pitrou

Sorry to disrupt this conversation, but does anyone know why I'm receiving e-mail updates to this ticket? I don't see myself in the CC list.

comment:38 Changed 11 months ago by Craig Rodrigues

Thank for the review feedback. Again, I disagree with your judgment.

I have a lot of experience doing py3 porting of other non-trivial Python libraries. In my experience, what I have found is that py3 porting falls into three categories of fixes:

  1. Relatively small syntax changes, that can be done as one liner changes that work on Python 2.6/2.7 and Python 3.x.
  1. Harder changes, which may require different code for Python 2 and Python 3, and needs wrapping in:
    if __PY3
    

blocks.

  1. Really hard changes, which require major rewriting of code. For example, dealing with unicode vs. bytes, or dealing with differences

The current approach Twisted developers have used is to combine all types of changes in the same review. This makes reviewing things harder, and when you combine multiple types of changes into a single review, it actually makes the risks of regressions much higher.

By breaking things down into multiple smaller changes, things are easier to review and test, piece by piece. This can done in compliance with existing Twisted code review and testing processes.

comment:39 Changed 11 months ago by Craig Rodrigues

I split out this into multiple tickets:

comment:40 Changed 11 months ago by Adi Roiban

See https://twistedmatrix.com/trac/wiki/Plan/Python3, the initial strategy and methodology.

This is the "approved" method of porting.

If you want to have another method for the port, I think that this should be taken over the mailing list and see if the other twisted developers are happy with changing the way we port to python 3.

I would argue that only updating the code to allow it to be imported in py3 or to execute on python 3 is not enough, we also need to enable the tests in python3 and have a good coverage for python3

Please send an email to the mailing lists, explaining why and how we should update the plan for porting to python3.

This ticket is not a good place to argue over the current method of porting.

As soon as the plan is updated, I will do the reviews according to those new rules.

Thanks!

comment:41 Changed 11 months ago by Glyph

Author: Screwtape, exarkun, terrycojones
Cc: therve Vladimir Perić Thijs Triemstra removed
Keywords: review py3k removed

Small syntax changes are always a good idea and are totally acceptable. But huge hard-to-review diffs are not. I'm going to clear this from the review queue and close it, because "in review" means "has a patch waiting for feedback", and in this case, the patches waiting for feedback are now on the split-out tickets.

Thank you for doing the work of splitting up the changes, rodrigc. I hope we can start landing them soon :-).

(One minor procedural note: there's no point in attaching the random "py3k" keyword; the only keyword that has significance in Twisted-land is "review".)

comment:42 Changed 11 months ago by Glyph

Resolution: invalid
Status: newclosed

comment:43 Changed 11 months ago by Craig Rodrigues

Other tickets used py3k as a keyword: https://twistedmatrix.com/trac/query?keywords=~py3k hence my confusion. Thanks for clarifying. I will no longer use py3k keyword.

Note: See TracTickets for help on using tickets.