Opened 4 years ago

Last modified 4 years ago

#4613 enhancement new

FancyHashMixin: like FancyEqMixin but for __hash__

Reported by: lvh Owned by:
Priority: lowest Milestone:
Component: core Keywords:
Cc: awclinford@… Branch:
Author: Launchpad Bug:

Description

Right now FancyEqMixin has a somewhat strange (and I would argue broken) behavior:

>>> from twisted.python.util import FancyEqMixin
>>> class C(object, FancyEqMixin):
...     compareAttributes = "a",
...     
...     def __init__(self, a):
...         self.a = a
... 
>>> c1, c2 = C(1), C(1)
>>> c1 is not c2 and c1 == c2 and hash(c1) != hash(c2)
True

(The problem being that C.__hash__ == object.__hash__; which hashes on id.)

I'm fairly certain that c1 == c2 ==> hash(c1) == hash(c2). I propose that FancyEqMixin grows a __hash__ method that raises NotImplementedError. This will not break backwards compatibility; when it does break existing code that code was wrong anyway. The problem with that is that if something else *does* implement a working __hash__ it's hard to guarantee that the good __hash__ is lower in the MRO (eg gets used), plus it breaks cooperative MI if the good __hash__ uses __hash__es from further up in the MRO. Not sure what the right way to fix that is. I think you should implement __hash__ on the level you introduce the mixin anyway?

thrashold has suggested __hash__ = None on IRC, that would raise TypeError; not really the TypeError I would want (I would expect to see "type x is not hashable" rather than something about None not being callable, but apparently this is how you disable hash?

This is new in 2.6:


Changed in version 2.6: __hash__ may now be set to None to explicitly flag instances of a class as unhashable.


http://docs.python.org/reference/datamodel.html#object.__hash__

2.6 actually does the right thing automagically (which is *this*), but only on new-style classes, it appears.

Anyway, I propose a FancyHashMixin that has a hashAttributes classattr like FEM has a compareAttributes classattr and then hashes on the value of [getattr(self, v) for v in self.hashAttributes]. I think it should also subclass FEM; if we can count on cooperative MI (which we can't due to old-style classes) the default value for hashAttributes could be compareAttributes, since in 90% of cases they will be the same (as long as hashAttributes is a subset of compareAttributes it will still behave correctly).

Ceterum censeo geventem et medusam esse delendam.

Attachments (2)

fancyhashmixin-4613.diff (4.6 KB) - added by lvh 4 years ago.
fancyhashmixin-4613.2.diff (10.9 KB) - added by lvh 4 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 4 years ago by exarkun

  • Milestone regular-releases deleted
  • Owner changed from glyph to lvh

Yea, this makes sense. A patch would be great. :)

comment:2 Changed 4 years ago by lvh

I am working on adding some more tests to the implementation in txOAuth, and then I will make a patch to Twisted trunk.

Furthermore I think we should add __hash__ = None to FancyEqMixin (credits to Michael Foord for pointing this out to me). In new Pythons this is how you say "this thing is not hashable"; in older Pythons, you still get the appropriate TypeError.

Springing this on existing users is fine. If the old code didn't raise, it was silently broken anyway (see demonstration code snippet in ticket).

Thoughts?

Changed 4 years ago by lvh

comment:3 Changed 4 years ago by lvh

  • Keywords review added
  • Owner lvh deleted

Made the changes in lp:~lvh/twisted/fancyhashmixin-4613

Diff attached

comment:4 follow-ups: Changed 4 years ago by exarkun

  • Keywords review removed
  • Owner set to lvh
  • I wouldn't change FancyEqMixin's hashing behavior. It's backwards incompatible and it's not really related to the new feature. Existing code relying on the current behavior is not necessarily broken. Hashing by identity is frequently useful. Even if that's wrong and the behavior should change, mixing that behavior change with this new feature is a bad idea.
  • Stuff needs docstrings
  • Use assertEquals not assertEqual
  • Don't put underscores in test method names after the test_ prefix.
  • Don't use generator expressions in Twisted code
  • FancyHashMixin needs to be in __all__

I'm not sure about making FancyHashMixin a FancyEqMixin subclass. Can you elaborate on your explanation for doing that? I don't quite follow the logic in the ticket description.

comment:5 in reply to: ↑ 4 Changed 4 years ago by lvh

Fixed the low-hanging fruit; will comment on open questions when I have time to.

Replying to exarkun:

  • Use assertEquals not assertEqual

Done. http://bazaar.launchpad.net/~lvh/twisted/fancyhashmixin-4613/revision/15669

  • Don't put underscores in test method names after the test_ prefix.

Done. http://bazaar.launchpad.net/~lvh/twisted/fancyhashmixin-4613/revision/15668

  • Don't use generator expressions in Twisted code

As discussed on IRC, not removing because Twisted supports 2.4 now.

  • FancyHashMixin needs to be in __all__

Done. http://bazaar.launchpad.net/~lvh/twisted/fancyhashmixin-4613/revision/15667

comment:6 in reply to: ↑ 4 Changed 4 years ago by lvh

Replying to exarkun:

  • I wouldn't change FancyEqMixin's hashing behavior. It's backwards incompatible and it's not really related to the new feature. Existing code relying on the current behavior is not necessarily broken. Hashing by identity is frequently useful. Even if that's wrong and the behavior should change, mixing that behavior change with this new feature is a bad idea.

Okay, so the following argument is about the technical side of things (specifically, that c1 is not c2 and c1 == c2 and hash(c1) != hash(c2) should never happen), not about the administrative side (requiring a new ticket or not). I have no problem making a separate ticket for fixing that bug.

I am also commenting on the last part of your review in tandem, because I believe these two parts to be related.

I'm not sure about making FancyHashMixin a FancyEqMixin subclass. Can you elaborate on your explanation for doing that? I don't quite follow the logic in the ticket description.

It basically comes down to "hashability and equality are very related things, and separating them often leads to easily overlooked broken behavior."

Documents I'm citing here:

From the glossary, emphasis mine:


hashable
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

c1 and c2 from the example have a hash value which never changes during its lifetime: it has object's __hash__ method. It is also comparable. Hence, it is hashable. Hence, if it compares equal to a different object, those two objects must have the same hash value. It doesn't, so the object breaks a Python invariant. (Epigram 62 comes to mind)

About the *fix*, from the changelog:

Classes that inherit a __hash__() method from a parent class can set __hash__ = None to indicate that the class isn’t hashable. This will make hash(obj) raise a TypeError and the class will not be indicated as implementing the Hashable ABC.

You should do this when you’ve defined a __cmp__() or __eq__() method that compares objects by their value rather than by identity. All objects have a default hash method that uses id(obj) as the hash value. There’s no tidy way to remove the __hash__() method inherited from a parent class, so assigning None was implemented as an override. At the C level, extensions can set tp_hash to PyObject_HashNotImplemented(). (Fixed by Nick Coghlan and Amaury Forgeot d’Arc; issue 2235.)

The reason they recommend __hash__ = None is that it's deemed the nicest workaround for an unfortunate Python oversight; the inability to remove __hash__ on a type. 2.6 with only new-style classes will do the right thing and raise TypeError if a type claims being unhashable with __hash__ = None. Older versions will still raise the appropriate class, but for the wrong reason. It's not pretty, but at least it's not kaputt.

Another reason for issubclass(FancyHashMixin, FancyEqMixin) is practicality. An alternative implementation where FancyHashMixin is not a subclass of FancyEqMixin would let you make classes that aren't equal, but hash equal. That's not broken (hash equality vs equality is an implication relation, not an equivalence one). So, after trying (both with glyph if I remember correctly in #twisted and more recently in #python), we couldn't come up with any sane use cases:

  • You could use it to destroy set/dict's performance in a subtle fashion
  • You could use it to avoid a custom, expensive __eq__ by making hashing a much cheaper operation; however this is still treacherous since hashing on a limited subset means more collisions, and collisions mean doing that expensive __eq__ anyway; FancyHashMixin in its current implementation does not actually prevent this, since said expensive __eq__ would be implemented on a type that subclasses FancyHashMixin.

So, just pragmatism: if such a case exists it's probably well outside of the scope of FancyHashMixin. The already somewhat implausible use case similar to the second one above for hashing on a different set of attributes than you compare equal on, is directly supported by having compareAttributes and hashAttributes be distinct things.

comment:7 Changed 4 years ago by lvh

  • Keywords review added
  • Owner lvh deleted

So I did the docstring thing. In addition to the last comment, I think that concludes the points in the review, so I'm putting it back up.

Docstring changes:

New diff attached

Changed 4 years ago by lvh

comment:8 Changed 4 years ago by lvh

  • Keywords review removed

comment:9 Changed 4 years ago by awclin

  • Cc awclinford@… added

Could I ask someone to post a 'this is where we are and these are the next agreed steps' pls? Am doing some related work and might be able to build this in, if indeed, anything needs doing. TIA

comment:10 Changed 3 years ago by <automation>

Note: See TracTickets for help on using tickets.