I have been wanting to change this for a long time. This is a fantastic discussion. The proposal does not go far enough to fix the problems, but does describe them very well.
I will first address the following regarding 1 == 1L:
Well, Java says it, and I don’t think we should contradict it on this one
Java does not say that 1 == 1L, it says that 1 eq 1L!
Integer.valueOf(1).equals(1L) --> false
In my mind, Scala’s == is Java’s .equals(). Unboxed types in Java do not have equals(). It is unsound for Scala to say “Scala’s == is like Java’s equals() except for AnyVal”, which I’ll get into later. IMO Scala’s eq is analogous to Java’s ==: reference equality for objects, and IEEE equality for primitives. This is true for AnyRef/Object, why should it differ for AnyVal? Yes, Scala speaks of eq as reference equality and it is not defined for values, but in the case of numerics it can be bit equality and/or IEEE equality (like Java’s ==). And not having two separate notions of equality for numerics is exactly the root of the problem, performance-wise.
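The difference between the two notions can be checked directly. This is a minimal sketch, assuming Scala on the JVM; the object name BoxedVsPrimitive is only for illustration:

```scala
object BoxedVsPrimitive {
  // Java's boxed equality: Integer.equals returns false for any non-Integer argument.
  val javaBoxedEquals: Boolean =
    java.lang.Integer.valueOf(1).equals(java.lang.Long.valueOf(1L))

  // Scala's == on primitives widens the Int to a Long before comparing.
  val scalaPrimitiveEquals: Boolean = 1 == 1L

  // Scala's == on boxed numerics (static type Any) also compares by numeric value.
  val scalaBoxedEquals: Boolean = (1: Any) == (1L: Any)
}
```

Here javaBoxedEquals is false while both Scala comparisons are true, which is the mismatch between Scala's == and Java's .equals() described above.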
With that out of the way, I will describe my proposal, then justify it.
Sameness, and the == method
This is essentially what is used in Sets and Maps by default, and should satisfy:
identity (reflexivity): x == x
symmetric: if x == y then y == x
transitive: if x == y and y == z then x == z
This implies that it can not be the IEEE notion of equality for numerics, since IEEE equality on Float / Double breaks these laws. Luckily, this is highly performant within the same data type on numerics! Just look at how Java implements Double.equals: it compares the bit patterns returned by Double.doubleToLongBits.
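As a rough sketch of that bit-level comparison (sameBits is a hypothetical helper name, not an existing API; it only mirrors what java.lang.Double.equals does for two doubles):

```scala
object BitEquality {
  // Hypothetical helper: compares the canonical bit patterns, as java.lang.Double.equals does.
  def sameBits(a: Double, b: Double): Boolean =
    java.lang.Double.doubleToLongBits(a) == java.lang.Double.doubleToLongBits(b)
}
```

With this definition sameBits(Double.NaN, Double.NaN) is true and sameBits(0.0, -0.0) is false, the opposite of what IEEE == reports in both cases.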
What does this imply about how == functions between numeric types?
Well, identity can hold for every numeric type, if we compare bits like Java and do not do IEEE floating point == (where e.g. NaN != NaN).
Regarding transitivity, we quickly get in trouble if we try the following:
def x: Int
def y: Float
println(y == x)
println(x == y)
Float and Int do not have the same range. Furthermore, Int is not even a subset of Float. There are values in Int that can not be represented by Float and vice versa. Transitivity can hold only if x == y is true if and only if the value is in the range of both. This is possible, but highly confusing. One can do what Scala currently does, and coerce the data to Float, but that leads to interesting results:
scala> 123456789.toFloat
res2: Float = 1.23456792E8
scala> 123456789.toFloat == 123456789
res3: Boolean = true
scala> 123456789 == 123456789.toFloat
res4: Boolean = true
scala> 123456788 == 123456789.toFloat
res5: Boolean = false
scala> 123456790 == 123456789.toFloat
res6: Boolean = true
Maybe you can stomach that, but then we break transitivity trivially. NOTE the above has different output for Scala.js, can you guess what it is? The answer is at the end.
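To spell out the transitivity break using the same values as the session above, here is a small sketch, assuming Scala on the JVM (the object and value names are mine):

```scala
object TransitivityDemo {
  val f: Float = 123456789.toFloat // rounds to 1.23456792E8
  val a: Int = 123456789
  val b: Int = 123456790

  val aEqualsF: Boolean = a == f // true: a is widened to the same Float
  val fEqualsB: Boolean = f == b // true: b is widened to the same Float
  val aEqualsB: Boolean = a == b // false: two different Ints
}
```

So a == f and f == b both hold, yet a == b does not.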
You could convert both of these to Double, and since Double can fit the entire range of Int and Float in it, sanity will hold. But if you introduce Long into the mix, the same dilemma appears.
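The Long / Double version of the same problem can be sketched like this, assuming Scala on the JVM; the values are chosen by me around 2^53, where Double stops representing every integer:

```scala
object LongDoubleDemo {
  val a: Long = 1L << 53       // 9007199254740992, exactly representable as a Double
  val b: Long = (1L << 53) + 1 // not representable; rounds to the same Double as a
  val d: Double = a.toDouble

  val aEqualsD: Boolean = a == d // true
  val bEqualsD: Boolean = b == d // true: b is rounded to 2^53 when widened
  val aEqualsB: Boolean = a == b // false
}
```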
My proposal is simple: == between integral and floating point data types always returns false
The above examples are only the tip of the iceberg. Trying to have universal equality ‘work’ across all numeric types is fundamentally unsound in the current implementation.
Now sanity can be kept within integral or floating point types, provided we up-convert to the wider type and compare. A double that is out of range of a float will always compare as false with any float value. This is not consistent with Java and the JVM, and implies that 1L == 1 but 1L != 1.0f. I propose that this be dropped too, so that 1L != 1 and 1.0f != 1.0d, for the sake of consistency with the JVM and with the barrier between floating point and integral numbers, but it would not be unsound to allow it.
So, back to Any.==, I propose essentially the following:
- No change to == on AnyRef
- For AnyVal, == returns false if the numeric types are not the same, and otherwise conforms to Java’s boxed types and is reflexive, symmetric, and transitive.
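A rough model of what that rule would mean, written as an ordinary function rather than a change to == itself. The name proposedEquals is hypothetical, and only four numeric types are covered to keep the sketch short:

```scala
object ProposedEquality {
  // Hypothetical model of the proposed ==, limited to Int, Long, Float and Double.
  def proposedEquals(a: Any, b: Any): Boolean = (a, b) match {
    case (x: Int, y: Int)       => x == y
    case (x: Long, y: Long)     => x == y
    case (x: Float, y: Float)   =>
      // Bit comparison, as java.lang.Float.equals does.
      java.lang.Float.floatToIntBits(x) == java.lang.Float.floatToIntBits(y)
    case (x: Double, y: Double) =>
      // Bit comparison, as java.lang.Double.equals does.
      java.lang.Double.doubleToLongBits(x) == java.lang.Double.doubleToLongBits(y)
    case _                      => false // anything else, including mixed numeric types
  }
}
```

Under this model proposedEquals(1, 1L) and proposedEquals(1.0f, 1.0) are false, while proposedEquals(Double.NaN, Double.NaN) is true.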
Numeric equality, IEEE, and eq
Numeric values need two notions of equality, just like reference values do. One can not construct something that works with Sets/Maps and also works with IEEE equality. The simplest, but not only, reason is that NaN != NaN.
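To illustrate with Java's own collections (a small sketch; the object name is mine): a java.util.HashSet finds NaN again only because Double.equals and Double.hashCode work on bit patterns rather than IEEE equality.

```scala
object NaNInSets {
  // java.util.HashSet relies on Double.equals/hashCode, which compare bit patterns.
  private val javaSet = new java.util.HashSet[java.lang.Double]()
  javaSet.add(java.lang.Double.valueOf(Double.NaN))

  // Lookup succeeds because boxed NaN equals boxed NaN.
  val foundInJavaSet: Boolean = javaSet.contains(java.lang.Double.valueOf(Double.NaN))

  // IEEE equality says NaN is not equal to itself.
  val ieeeEqual: Boolean = Double.NaN == Double.NaN
}
```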
I propose that numerics get eq and that this be identical to Java’s IEEE based ==. Exposing this is critical / required for any serious floating point numeric libraries. It also means that 1 eq 1L and 1 eq 1.0f can be supported with well defined semantics.
Partial Order, Total Order, <, > and numerics
This may seem like a distraction, but it is fundamental to my proposal above. Numerics have two notions of equality. One of them is an equivalence relation: reflexive, symmetric, and transitive. This is the exact quality required in order for equality to be consistent with total order. That is, in my proposal above, == can be consistent with total order. IEEE equality and eq can not. In Java, Double.compare provides a total ordering, but < and == on a double do not. Scala needs to expose this as well, and hopefully in a much less confusing way than Java.
Current scala behavior
scala> Double.NaN == Double.NaN
res13: Boolean = false
scala> 1.0d < Double.NaN
res14: Boolean = false
scala> 1.0d > Double.NaN
res15: Boolean = false
These are analogous with Java, and are the IEEE implementations, which are not consistent with equals or total order on floating point numbers.
For <, <= and friends, there are two implementation possibilities, one that is a partial order, and consistent with IEEE, and another that is a total order, and consistent with equals.
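The two orders can be put side by side with what already exists on the JVM. A minimal sketch, where the object and value names are mine and java.lang.Double.compare stands in for the total order:

```scala
object OrderingDemo {
  val nan: Double = Double.NaN

  // IEEE partial order: every comparison involving NaN is false.
  val ieeeLess: Boolean = 1.0 < nan
  val ieeeGreater: Boolean = 1.0 > nan

  // Total order: NaN sorts above every other value and compares equal to itself.
  val totalLess: Boolean = java.lang.Double.compare(1.0, nan) < 0
  val nanEqualsNan: Boolean = java.lang.Double.compare(nan, nan) == 0

  // The total order also separates the two zeros, which IEEE treats as equal.
  val zerosDiffer: Boolean = java.lang.Double.compare(-0.0, 0.0) < 0
}
```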
I have a few possible proposals:
- Leave these implementations the same (which are consistent with partial order and eq), and add a new set that is consistent with ==, perhaps lt, gt, gte, etc.
- Rename the above symbolic implementations to lt, gt, etc., which are consistent with eq, and make new symbolic <, >, etc. consistent with ==
- Same as the first proposal, but also swap the meaning of eq and == on numerics.
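For a feel of what the first option could look like in user code, here is a sketch only. The lt / lte / gt / gte names follow the suggestion above but are hypothetical, they exist nowhere in the standard library, and I am assuming the total order is the one from java.lang.Double.compare:

```scala
object TotalOrderOps {
  // Hypothetical syntax for the first option: non-symbolic methods consistent with a total order.
  implicit class DoubleTotalOrder(private val self: Double) extends AnyVal {
    def lt(that: Double): Boolean  = java.lang.Double.compare(self, that) < 0
    def lte(that: Double): Boolean = java.lang.Double.compare(self, that) <= 0
    def gt(that: Double): Boolean  = java.lang.Double.compare(self, that) > 0
    def gte(that: Double): Boolean = java.lang.Double.compare(self, that) >= 0
  }
}
```

After import TotalOrderOps._, the expression 1.0 lt Double.NaN is true while 1.0 < Double.NaN stays false, which is exactly the split between symbolic and non-symbolic methods that makes this option a bit confusing.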
Each of these is insane in its own way. Sadly, I can not see any way to fix the problems with numerics in Scala without breaking code. But each has merits:
#1 is the most compatible with existing code, but is a bit confusing, as the symbolic <= would be consistent with eq but not the symbolic ==
#2 fixes the above problem, making all of the symbolic methods relate to Total Ordering and all of the non-symbolic ones relate to IEEE.
#3 is the inverse of #2, with symbolics being IEEE and non-symbols being related to Total Order. However, it implies that AnyVal and AnyRef are now at odds with each other, and a Set or Map would use == for ref types and eq for value types, which is awful, and really messes up the “what does == on Any mean” question, unless eq and == are swapped for AnyRef too… yikes.
TL;DR
The root of the problem: Numerics require two notions of equality, and Scala currently tries to squish both of these into ==. The requirement comes from the fact that we need one equality that is consistent with total order, and one that is consistent with partial order, in the case of floating point values. The one that is consistent with partial order is inadequate for use in sets/maps, etc. In some cases NaN must not equal itself, and in others it must!
The consequence of this is that numerics need eq too, and that overall language consistency demands considering how eq and == on numerics relate to <=.
I honestly think that Java got this right and Scala got it wrong, except for Java’s confusing and verbose API that means you have 1.0d == 1.0f in one place, and Double.compareTo in another, with .equals() and == being somewhat consistent with those, but due to auto-boxing it’s a bit of a mess.
Valhalla
When Project Valhalla lands, and composite value types on the JVM likely also get Java == definitions (but not .equals()), Scala will have even more problems of this sort. On the JVM, an unboxed value type with two numbers in it, x: (Int, Float) = (1, 1.0f), will likely NOT be equal to y: (Float, Int) = (1.0f, 1). They will need to define at minimum the equivalence consistent with boxed types, or raw bit-equality. They may also need to have one that is consistent with IEEE (so that composites with NaN don’t compare equal). Ironically, this is backwards for them: == for doubles does not satisfy identity, but sane value types on the JVM demand a value equality that does.
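For comparison, this is what today's Scala does with the same two composites when they are ordinary tuples, next to an element-wise boxed comparison. A sketch assuming Scala on the JVM; the boxed comparison only approximates what I expect from the JVM, since Valhalla's actual semantics are not settled:

```scala
object TupleEquality {
  val x: Any = (1, 1.0f) // (Int, Float)
  val y: Any = (1.0f, 1) // (Float, Int)

  // Tuple2.equals compares elements with Scala's ==, so boxed 1 and 1.0f count as equal.
  val scalaEquals: Boolean = x == y

  // Element-wise Java boxed equality: an Integer never equals a Float.
  val javaStyleEquals: Boolean =
    java.lang.Integer.valueOf(1).equals(java.lang.Float.valueOf(1.0f)) &&
      java.lang.Float.valueOf(1.0f).equals(java.lang.Integer.valueOf(1))
}
```

Scala currently answers true where boxed Java equality answers false, which is the semantic gap that library code would have to bridge.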
IMO, the existence of Valhalla means that Scala will be facing the music eventually, and be forced to either have a big breaking change W.R.T. semantics of equality on numeric values, OR have even crazier (and branchier, slower) library code to bridge the semantic gap.
Scala.js trivia
- Scala.js up-converts to Double, so it does not have inconsistencies for Int/Float but does for Long/Double
Apologies
I’m out of time for now but wish I had time to clean up my message above to be more clear and concise. I realize that this grows way out of scope from the initial proposal, but IMO once you go breaking how numbers compare with each other, you might as well fix all of the broken stuff, breaking it all at once.