Absolutely. The boxing side of things is behind the scenes and not fundamental. In Java’s case, auto-boxing sometimes causes a surprise switch from reference equality to numeric equality; Scala can avoid that.
We know a few important things:
AnyRef/Object must expose two notions of equality:
- An equivalence relation (of which structural equality is a special case)
- Reference equality (which is itself a valid equivalence relation)
This leads to a simple solution:

```scala
final def ==(that: Any): Boolean =
  if (null eq this) null eq that else this equals that
```
And thus, for AnyRef, `==` is simply short-hand for null-safe equivalence.
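As a quick sanity check, today’s reference semantics already behave this way (plain Scala, nothing hypothetical here):

```scala
// `==` on references is null-safe: it only calls equals when the
// receiver is non-null, so no NullPointerException is possible.
val s: String = null
assert(s == null)       // both sides null: equal
assert(!(s == "hi"))    // null vs non-null: false, not an NPE
assert("hi" == "hi")    // non-null receiver: delegates to equals
```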
Numerics must expose two notions of equality:
- An equivalence relation, consistent with a total order
- IEEE numeric equivalence, consistent with a total order for integral types and a partial order for floating-point types
And here is where the dilemma lies. For reference types, the ‘secondary’ notion of equivalence (reference equality) is itself a valid equivalence relation, which makes a default implementation for equals simple. However, IEEE numeric equivalence (and its partial order) is not suitable for collection keys, be it for hash-based or order-based key identification. No default implementation can satisfy both notions of equivalence for numeric values.
If we want to avoid Java’s issue where `==` means one thing in one place and something entirely different in another (numerics vs. references; numerics get boxed and suddenly `==` means something else), then the choice for `==` on numerics should be the same as for references: a null-safe equivalence relation. AnyVal types are never null, but they are sometimes boxed and can therefore be null at runtime, so the null-safe aspect matters at least as an implementation detail. Otherwise it can work the same as with AnyRef; we just need to specify equals for each numeric type N as:
```scala
def equals(that: Any): scala.Boolean = that match {
  case that: N => compare(that) == 0
  case _       => false
}
```
Or in English, equals is consistent with the total ordering if the type matches, and false otherwise. On the JVM this is effectively the contents of each Java boxed type’s equals method.
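To make that concrete, here is a sketch of the N = Double instantiation of the template, as a hypothetical wrapper class (the name MyDouble is invented for illustration); `compare` here is java.lang.Double.compare, which implements the total order:

```scala
// Sketch: equals delegating to the total-order comparison for Double.
// java.lang.Double.compare treats NaN as equal to NaN and -0.0 < 0.0.
class MyDouble(val value: Double) {
  override def equals(that: Any): Boolean = that match {
    case that: MyDouble => java.lang.Double.compare(value, that.value) == 0
    case _              => false
  }
  override def hashCode: Int = java.lang.Double.hashCode(value)
}

val nan1 = new MyDouble(Double.NaN)
val nan2 = new MyDouble(Double.NaN)
assert(nan1 == nan2)  // NaN equals NaN under the total order
assert(!(new MyDouble(0.0) == new MyDouble(-0.0)))  // signed zeros differ
```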
But what about IEEE?
The above defines `==` to be a null-safe equivalence. This is incompatible with IEEE’s `==` for floating-point numbers. It is, however, consistent with IEEE `==`, `<`, and `<=` for integer types. I propose that we implement the IEEE 754 total ordering for floating-point numbers in these cases (what Java’s compareTo on the boxed types does). In short, `NaN == NaN`. After all, most users would expect that. It is also very fast: comparison can be done by converting to integer bits and then comparing the integers, which at the assembly level is just a two’s-complement integer compare op.
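The bits trick can be sketched as follows; `totalOrderBits` and `totalCompare` are illustrative names (java.lang.Double.compare already produces the same ordering):

```scala
// Map a Double's bit pattern to a Long whose two's-complement order
// matches the total order (doubleToLongBits canonicalizes NaN).
def totalOrderBits(d: Double): Long = {
  val bits = java.lang.Double.doubleToLongBits(d)
  // For negatives, flip the 63 magnitude bits so that more-negative
  // doubles map to smaller longs; the sign bit keeps them below positives.
  if (bits < 0) bits ^ 0x7fffffffffffffffL else bits
}

def totalCompare(a: Double, b: Double): Int =
  java.lang.Long.compare(totalOrderBits(a), totalOrderBits(b))

assert(totalCompare(Double.NaN, Double.NaN) == 0)  // NaN == NaN
assert(totalCompare(-0.0, 0.0) < 0)                // -0.0 sorts below 0.0
assert(totalCompare(1.0, Double.NaN) < 0)          // NaN sorts above numbers
// Agrees with what Java's boxed compareTo does:
assert(java.lang.Double.compare(Double.NaN, Double.NaN) == 0)
```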
I would not find it strange to specify that `==`, `<=`, `>=`, `<`, and `>` in Scala represent equivalence and total order by default. It is what we do for reference types; why not numerics? That breaks away from Java, but I suspect it is more intuitive to most users and more consistent within the language. It is certainly more useful for collections.
Users who are more advanced with floating point can pull out the IEEE tools. It is only with floating-point types that there is a gap and we need parallel notions of equality to go with the partial order.
The users that truly want IEEE semantics for numeric operations on floating-point values must know what they are doing to succeed in writing an algorithm that depends on `NaN != NaN` anyway. For them, switching to some other syntax for IEEE will not be difficult. Perhaps there are different symbols, perhaps different names, or perhaps an import will switch a scope to the IEEE definitions.
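One possible shape for such an opt-in, sketched with invented names (`===`, `IeeeOps`, and the import are all hypothetical illustrations, not part of any proposal):

```scala
// Hypothetical opt-in: raw IEEE comparison behind a distinct operator.
object IeeeOps {
  implicit class IeeeDouble(private val self: Double) extends AnyVal {
    // Delegates to today's primitive ==, which is IEEE semantics:
    // NaN === NaN is false, and -0.0 === 0.0 is true.
    def ===(that: Double): Boolean = self == that
    def !==(that: Double): Boolean = !(self == that)
  }
}

import IeeeOps._
assert(!(Double.NaN === Double.NaN))  // IEEE: NaN is not equal to itself
assert(-0.0 === 0.0)                  // IEEE: signed zeros compare equal
```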
Proposal
| Expression | Scala 2.11 | Proposal |
| --- | --- | --- |
| `1.0 == 1L` | true | false or will not compile |
| `(1.0: Any) == 1L` | true | false |
| `(1.0: Any).equals(1L)` | false | false |
| `Double.NaN == Double.NaN` | false | true |
| `Double.NaN.equals(Double.NaN)` | true | true |
| `1.0F < Float.NaN` | false | true |
| `1.0F > Float.NaN` | false | false |
| `Set(1.0F, 1, 1L).size` | 1 | 3 |
| `Map(Double.NaN -> "hi").size` | 1 | 1 |
| `Map(Double.NaN -> "hi").get(Double.NaN)` | None | Some(hi) |
| `TreeMap(Double.NaN -> "hi").get(Double.NaN)` | Some(hi) | Some(hi) |
| `(1.0, 1) == (1, 1.0)` | true | false |
| `Some(1.0) == Some(1)` | true | false |
| `List(1, 1.0) == List(1, 1)` | true | false |
| `BigInt(1) == 1` | true | false or will not compile |
| `UnsignedInt(1) == 1` | N/A | false or will not compile |
| `Left(1) == Right(1)` | false w/ warning | will not compile? |
| `List(1) == Vector(1)` | true | ??? |
The proposal boils down to a couple of rules for consistency:

Two values of different nominal types are never equal. This holds for case classes today; the proposal makes it work consistently across tuples, case classes, and plain numeric types. The compiler can error when a result is guaranteed to be false due to mismatched types. It would be consistent with Valhalla. I don’t have an opinion on what to do with `List(1) == Vector(1)`; that is more collection design than language design.
For use cases where we want to compare across types in a cooperative way (perhaps the DSL / worksheet use case mentioned), one can either provide different methods or use an import to switch the behavior. Or perhaps there are better ideas.
`equals` is consistent with `==`.

This leaves the definition of `==` as short-hand for null-safe `equals` – an equivalence relation – consistent with the Ordering for a type. The consequence is that `NaN == NaN`, and the default behavior conforms to the use of values as collection keys. Every other option I thought of was just far more inconsistent overall. Give up on `NaN != NaN` and the rules are clean and consistent. Otherwise you have to carve out an exception for floating-point numbers and either have collections avoid using `==` in some cases, or make `equals` inconsistent with `==`.
Combined, these two rules would make it much simpler to extend numeric types and add things like UnsignedInt – there is no quadratic explosion of complexity if equivalence is not cooperative.
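For instance, a hypothetical UnsignedInt under these rules only has to know how to compare itself (all names here are invented for illustration):

```scala
// Sketch: with nominal-type equality, UnsignedInt needs no cooperative
// equals/hashCode cases for Int, Long, Double, BigInt, ... so adding
// numeric types does not grow the comparison surface quadratically.
final class UnsignedInt private (private val bits: Int) {
  def compare(that: UnsignedInt): Int =
    java.lang.Integer.compareUnsigned(bits, that.bits)
  override def equals(that: Any): Boolean = that match {
    case that: UnsignedInt => compare(that) == 0  // same nominal type only
    case _                 => false  // UnsignedInt(1) == 1 is simply false
  }
  override def hashCode: Int = bits
}
object UnsignedInt {
  def apply(n: Int): UnsignedInt = new UnsignedInt(n)
}

assert(UnsignedInt(1) == UnsignedInt(1))
assert(!(UnsignedInt(1) == (1: Any)))  // different nominal types: never equal
```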