Can we get rid of cooperative equality?

x == 150L returns true in Java. It won’t if one side uses new Long, though. Auto-boxing uses Long.valueOf, and the spec guarantees cached flyweight instances only for small values (−128 to 127). It would break if the Long were large enough, and this is a real problem in Java.
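The cache behaviour can be observed from Scala as well. A small sketch (the −128 to 127 range is the one the JLS guarantees; wider caching is JVM-dependent):

```scala
// Boxed Long flyweights, observed from Scala.
val a = java.lang.Long.valueOf(100L)
val b = java.lang.Long.valueOf(100L)
println(a eq b) // true: values in -128..127 are guaranteed to be cached

val c = java.lang.Long.valueOf(100000L)
val d = java.lang.Long.valueOf(100000L)
println(c eq d) // typically false: caching outside that range is not guaranteed
```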

Anyhow, most of the problems in Java are due to auto-boxing peculiarities that are not relevant to Scala if it is implemented well.

You are thinking that a user is someone writing a regular Scala program.
But that’s not the case in a REPL, in a DSL, in a Jupyter-style worksheet,
etc. And I was specifically talking about float, double, and int, i.e.
leaving off the f suffix.

best, .h.h.

Absolutely. The boxing side of things is behind the scenes and not fundamental. In Java’s case, auto-boxing sometimes causes a surprising switch from reference equality to numeric equality; Scala can avoid that.

We know a few important things:

AnyRef/Object must expose two notions of equality

  • An equivalence relation (of which structural equality is a special case)
  • Reference equality (which is itself a valid equivalence relation)

This leads to a simple solution:

final def ==(that: Any): Boolean =
  if (null eq this) null eq that else this equals that

And thus, for AnyRef, == is simply short-hand for null-safe equivalence.
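In today’s Scala this already gives the familiar null-safe behaviour for references:

```scala
val s: String = null
println(s == null)   // true: no NullPointerException from ==
println(s == "a")    // false
println("a" == "a")  // true, via equals
```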

Numerics must expose two notions of equality

  • An Equivalence relation, consistent with total Order
  • IEEE numeric equivalence, consistent with total order for Integral types and partial order for floating point types

And here is where the dilemma lies. For reference types, the ‘secondary’ notion of equivalence is a valid equivalence relation, which makes a default implementation of equals simple. However, IEEE numeric equivalence (and partial order) is not suitable for collection keys, be it for hash-based or order-based key identification. There is no default implementation that can satisfy both notions of equivalence for numeric values.
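To see why IEEE equivalence breaks key lookup, consider a naive association list looked up with primitive Double == (which is IEEE: NaN != NaN):

```scala
// A NaN key can never be found if lookup uses IEEE ==.
val pairs = List((Double.NaN, "hi"), (1.0, "one"))

def ieeeGet(key: Double): Option[String] =
  pairs.collectFirst { case (k, v) if k == key => v }

println(ieeeGet(1.0))        // Some(one)
println(ieeeGet(Double.NaN)) // None: NaN == NaN is false
```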

If we want to avoid Java’s issue where == means one thing in one place and something entirely different in another (numerics vs. references: numerics get boxed and suddenly == means something else), then the choice for == on numerics should be the same as for references: a null-safe equivalence relation. AnyVal types are never null, but they are sometimes boxed and could then be null at runtime, so the null-safe aspect is at least important as an implementation detail. Otherwise, it can be the same as with AnyRef; we just need to specify equals for each numeric type N as

def equals(that: Any): scala.Boolean = that match {
  case that: N => compare(that) == 0
  case _       => false
}

Or in English: equals is consistent with the total ordering if the type matches, and false otherwise. On the JVM this is effectively the body of each Java boxed type’s equals method.
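As a sketch of what this means for a floating-point type (MyDouble is a hypothetical wrapper, standing in for what the proposal would do to Double itself):

```scala
// equals defined via the total ordering, as java.lang.Double does.
final class MyDouble(val value: Double) {
  def compare(that: MyDouble): Int =
    java.lang.Double.compare(this.value, that.value) // total order: NaN == NaN, -0.0 < 0.0

  override def equals(that: Any): Boolean = that match {
    case that: MyDouble => compare(that) == 0
    case _              => false
  }

  override def hashCode: Int = java.lang.Double.hashCode(value)
}

println(new MyDouble(Double.NaN) == new MyDouble(Double.NaN)) // true
println(new MyDouble(0.0) == new MyDouble(-0.0))              // false, consistent with compare
```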

But what about IEEE?

The above defines == to be a null-safe equivalence. This is incompatible with IEEE’s == for floating-point numbers; it is, however, consistent with IEEE ==, <, and <= for integer types. I propose that we implement the IEEE 754 total ordering for floating-point numbers in these cases (which is what compareTo on Java’s boxed types does). In short, NaN == NaN. After all, most users would expect that. It is also very fast: comparison can be done by converting to the integer bit pattern and then comparing the integers, which at the assembly level is essentially a two’s complement integer compare.
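The bit-level trick can be sketched for equality like this (doubleToLongBits canonicalizes NaN, so all NaNs collapse to one bit pattern):

```scala
// Total-order equality of Doubles as a single long comparison.
def totalEq(x: Double, y: Double): Boolean =
  java.lang.Double.doubleToLongBits(x) == java.lang.Double.doubleToLongBits(y)

println(totalEq(Double.NaN, Double.NaN)) // true
println(totalEq(0.0, -0.0))              // false: distinct bit patterns, consistent with the total order
```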

I would not find it strange to specify that ==, <=, >=, <, and > in Scala represent equivalence and total order by default. It is what we do for reference types; why not numerics? That breaks away from Java, but I suspect it is more intuitive to most users and more consistent in the language. It is certainly more useful for collections.

For users who are more advanced with floating point, the IEEE tools are still available. It is only for floating-point types that there is a gap and we need parallel notions of equality to go with the partial order.

The users who truly want IEEE semantics for numeric operations on floating-point values must know what they are doing to succeed in writing an algorithm that depends on NaN != NaN anyway. For them, switching to some other syntax for IEEE comparisons will not be difficult. Perhaps there are different symbols, perhaps different names, or perhaps an import switches a scope to the IEEE definitions.

Proposal

Expression                                   Scala 2.11          Proposal
--------------------------------------------------------------------------------
1.0 == 1L                                    true                false or will not compile
(1.0: Any) == 1L                             true                false
(1.0: Any).equals(1L)                        false               false
Double.NaN == Double.NaN                     false               true
Double.NaN.equals(Double.NaN)                true                true
1.0F < Float.NaN                             false               true
1.0F > Float.NaN                             false               false
Set(1.0F, 1, 1L).size                        1                   3
Map(Double.NaN -> "hi").size                 1                   1
Map(Double.NaN -> "hi").get(Double.NaN)      None                Some(hi)
TreeMap(Double.NaN -> "hi").get(Double.NaN)  Some(hi)            Some(hi)
(1.0, 1) == (1, 1.0)                         true                false
Some(1.0) == Some(1)                         true                false
List(1, 1.0) == List(1, 1)                   true                false
BigInt(1) == 1                               true                false or will not compile
UnsignedInt(1) == 1                          N/A                 false or will not compile
Left(1) == Right(1)                          false w/ warning    will not compile?
List(1) == Vector(1)                         true                ???

The proposal boils down to a couple rules for consistency:

Two values of different nominal types are never equal. This holds for case classes today; the proposal makes it work consistently across tuples, case classes, and plain numeric types. The compiler can error when a result is guaranteed to be false due to mismatched types. It would also be consistent with Valhalla. I don’t have an opinion on what to do with List(1) == Vector(1); that is more a collection-design question than a language one.
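For illustration, a hypothetical UnsignedInt under this rule needs no cooperative cases at all; it only ever equals another UnsignedInt:

```scala
// Sketch: a new numeric type whose equals never crosses nominal types.
final class UnsignedInt(val bits: Int) {
  override def equals(that: Any): Boolean = that match {
    case that: UnsignedInt => this.bits == that.bits
    case _                 => false // never equal to Int, Long, Double, ...
  }
  override def hashCode: Int = bits
}

println(new UnsignedInt(1).equals(1))                  // false
println(new UnsignedInt(1).equals(new UnsignedInt(1))) // true
```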

For use cases where we want to compare across types in a cooperative way (perhaps the DSL / worksheet use case mentioned) one can either provide different methods, or use an import to switch the behavior. Or perhaps there are better ideas.

equals is consistent with ==. This leaves the definition of == as shorthand for null-safe equals (an equivalence relation) consistent with the Ordering for a type. The consequence is that NaN == NaN, and the default behavior conforms to the use of values as collection keys. Every other option I thought of was far more inconsistent overall. Give up on NaN != NaN and the rules are clean and consistent; otherwise you have to carve out an exception for floating-point numbers and either have collections avoid using == in some cases or make equals inconsistent with ==.

Combined, these two rules would make it much simpler to extend numeric types and add things like UnsignedInt: there is no quadratic explosion of complexity if equivalence is not cooperative.


Kudos for making the table, which really is the most important thing in this thread.
IMO the desired outcome would be for those samples to not compile, but the other result is okay too.
I’m sure this would not have much impact even on unityped programs (contentious NaN cases included).

Isn’t this trivially implemented by not exposing .equals and requiring an Equal typeclass and a === op, as per scalaz?


That would require each collection to store the Equal typeclass, which is impossible for java.util.* collections.


That’s not required. Scalaz doesn’t do this, and neither do any collections in Haskell. You just have Equal[A] => Equal[List[A]], etc.
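i.e. the element instance induces the collection instance. A minimal sketch of that derivation (names are illustrative, not the actual scalaz API):

```scala
trait Equal[A] {
  def equal(x: A, y: A): Boolean
}

object Equal {
  implicit val intEqual: Equal[Int] =
    (x: Int, y: Int) => x == y

  // Equal[A] => Equal[List[A]]: derived from the element instance,
  // not stored inside the list.
  implicit def listEqual[A](implicit ev: Equal[A]): Equal[List[A]] =
    (xs: List[A], ys: List[A]) =>
      xs.length == ys.length &&
        xs.zip(ys).forall { case (a, b) => ev.equal(a, b) }
}

def eqv[A](x: A, y: A)(implicit ev: Equal[A]): Boolean = ev.equal(x, y)

println(eqv(List(1, 2), List(1, 2))) // true
println(eqv(List(1, 2), List(1, 3))) // false
```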

That only works properly when you have globally unique type class instances, as in Haskell. Otherwise, you may run into inconsistencies if different instances are picked up in different contexts for the same data structure (think of a Set, which needs Equal as part of its semantics).


Storing the instances still doesn’t work if you don’t have globally unique type class instances, because there is no mechanism for testing instances for equality.

Edit: To make this clear. Say you add two lists together with different notions of equality. What is the notion of equality of the output list?

To be clear: I think that neither of these two naive approaches is a good solution to the problem. So even if we were willing to go that way and base the language on it, we’d need something better.
Anyway, this discussion is probably off topic: it seems to me that equality as a method in Scala is not going anywhere; removing it would probably break most code in existence. It’s not like we can just suddenly turn Scala into Scalaz.


There is a very interesting survey about primitive numeric type equality run by Brian Goetz that @odersky and others may want to have a look at. It gives some insight into how Java developers treat numbers, which can be useful data to weigh for this proposal. In Brian’s words:

This poll told us that: the Stockholm Syndrome is so strong that, when
given the option to treat boxed numbers as numbers, rather than
instances of accidental boxes, 85% chose the latter.


Very useful to know, thanks! We had not considered the implications of (non-)cooperative equality on pattern matching, but if we intend to change it, we need to take this into account.

Isn’t pattern matching on Any part of the problem? A pattern match on an Int makes no sense to me: an Int only ever has one pattern. You can match/switch on an Int, and you can match on the Int component of a compound match case, but the Int itself is always an atomic or leaf value. Therefore a match on an Int should not share the same syntax as a match on an AnyRef, or the same syntax as compound value types (Scala-native structs).

Can we have a strict comparison like === that behaves strictly, e.g. rejecting 1 == 1L at compile time, as wished for by lots of people in this thread, and make == consistent with Java or whatever? Using == in programs can introduce really unexpected bugs, and the language should offer a stricter version IMHO, so users don’t have to reach for scalaz for such basic stuff.


What would === mean? As noted earlier, == is in general equivalent to:

How would === be defined?

Well, most importantly, it would have the same type for both arguments:

final def === (that: A): Boolean = ...

where A is the type of this.

At best you could define that as an extension method:

implicit final class EqOps[A](val self: A) extends AnyVal {
  def === (that: A): Boolean = ??? // How is this defined though?
}

However, that can’t be defined on Any like == is.

Additionally, what is the right-hand side? Is it just a call to ==?

And what about collections? If you want to compare a List and a Vector, do you need to cast them to Seq?

I was looking into Float recently, so I’ll note this here.

scala> for {
         i <- 0 to 64
       } {
         val x: Int = Int.MaxValue - i
         val y: Float = (1L << 31).toFloat
         println(s"$i: x = $x; y = $y; x == y: ${x == y}; x.## == y.##: ${x.## == y.##}")
       }
0: x = 2147483647; y = 2.14748365E9; x == y: true; x.## == y.##: false
1: x = 2147483646; y = 2.14748365E9; x == y: true; x.## == y.##: false
2: x = 2147483645; y = 2.14748365E9; x == y: true; x.## == y.##: false
...
63: x = 2147483584; y = 2.14748365E9; x == y: true; x.## == y.##: false
64: x = 2147483583; y = 2.14748365E9; x == y: false; x.## == y.##: false

This shows that Int-to-Float conversion is lossy, which makes sense because a Float is a binary fraction with a 24-bit significand (23 bits explicitly stored).

scala> Set[Float](2147483584, 2147483645, 2147483646, 2147483647)
res1: scala.collection.immutable.Set[Float] = Set(2.14748365E9)

This phenomenon is not specific to Float; we can do the same with Double using Long.

scala> Set[Double](99999999991234561L, 99999999991234562L, 99999999991234563L)
res2: scala.collection.immutable.Set[Double] = Set(9.999999999123456E16)

I think the problem is not just cooperative equality, but also weak conformance involving IEEE floats.

I don’t think the integer-to-floating-point problem is particular to IEEE floating-point behaviour (or numeric equivalence); it is just that Floats and Doubles have fewer significand bits than Ints and Longs have bits to store the value.
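The cutoff is exactly the significand width: a Float represents every integer up to 2^24 exactly and starts rounding beyond that, e.g.:

```scala
val exact  = 1 << 24        // 16777216: representable exactly as a Float
val beyond = (1 << 24) + 1  // 16777217: rounds to the nearest even significand

println(exact.toFloat.toInt)  // 16777216
println(beyond.toFloat.toInt) // 16777216: the +1 is lost
```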

scalafix 0.6.0-M4+ lets you disable Object.equals/hashCode; e.g. see my config at https://gitlab.com/fommil/drone-dynamic-agents/blob/master/project/scalafix.conf. Further improvements in SemanticDB will ensure reliability improves over time.

Thus ends my interest in this topic.