Btw, we cannot turn overloading off for `==`; Scala has no way to achieve this. We can turn off overriding by making `==` final (and it is!), but that does not prevent us from overloading it.
But people don't overload equals that way for the most part; they do it like case classes do, precisely so that they avoid the confusing behavior that equality is not preserved by upcasting.
Correct. I believe this illustrates well the different concerns here. I see at least three:

1. What is an intuitive meaning of `==` from an API perspective?
2. What is the cleanest way to express `==` from a semantics perspective?
3. What is the most straightforward and efficient implementation?
I thought initially that (1) and (2) were aligned, that we needed co-operative equality to hide boxing for generics. I am now convinced that (2) and (3) are aligned. The only way to approach `==` in the Scala context is as an overloaded method, and that means co-operative equality only muddles the picture. The seeming inconsistencies we see are not due to `==` specifically but due to the fact that Scala has overloading as opposed to multi-methods. The same inconsistencies can be constructed for any other overloaded method, including `==` on other types.
I agree that (2) and (3) align better than (1) and (2) do.
But how much do (2) and (3) matter if we can't have (1)?
We already have a perfectly sensible method that can be used to align (2) and (3) in those special cases where (1) is not a primary concern: `equals`. The question is whether `==` should lose its special status in attempting to achieve (1).
(Also, as an aside, `def foo(final a: Any)` could be added as syntax that forbids overloading of method arguments.)
Multi-methods are handy; I appreciate them in Julia, for instance. But I'm not convinced yet that a best-effort manual emulation of them to preserve the intuitive meaning of `==` isn't worthwhile. Yes, it's a hassle. But it's an up-front hassle that simplifies all later thinking about how to use equality, which for scalability is a nice win, since you can concentrate on more relevant things than the difference between the intuitive `==` and the actual behavior in the face of the diversity of numeric types.
Shouldn't the signature of `==` be `def ==(other: Any)` instead of `def ==(other: A)`?
I have been wanting to change this for a long time. This is a fantastic discussion. The proposal does not go far enough to fix the problems, but does describe them very well.
I will first address the following regarding `1 == 1L`:

| Well, Java says it, and I don't think we should contradict it on this one

Java does not say that `1 == 1L`; it says that `1 eq 1L`!

Integer.valueOf(1).equals(1L) --> false
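To make the distinction concrete, here is a minimal, runnable plain-Java check of both sides of that claim (nothing here is Scala-specific):

```java
public class BoxedEquality {
    public static void main(String[] args) {
        // Primitive ==: the int is widened to long, so this is true
        System.out.println(1 == 1L);                       // true

        // Boxed equals: Integer.equals(Object) returns false for
        // anything that is not an Integer, including a boxed Long
        System.out.println(Integer.valueOf(1).equals(1L)); // false
        System.out.println(Integer.valueOf(1).equals(1));  // true
    }
}
```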
In my mind, Scala's `==` is Java's `.equals()`. Unboxed types in Java do not have `equals()`. It is unsound for Scala to say "Scala's `==` is like Java's `equals()` except for AnyVal", which I'll get into later. IMO Scala's `eq` is analogous to Java's `==`: reference equality for objects, and IEEE equality for primitives. This is true for AnyRef/Object; why should it differ for AnyVal? Yes, Scala speaks of `eq` as reference equality and it is not defined for values, but in the case of numerics it can be bit equality and/or IEEE equality (like Java's `==`). And not having two separate notions of equality for numerics is exactly the root of the problem, performance-wise.
With that out of the way, I will describe my proposal, then justify it.
Sameness, and the `==` method
This is essentially what is used in `Set`s and `Map`s by default, and should satisfy:
- reflexive: `x == x`
- symmetric: if `x == y` then `y == x`
- transitive: if `x == y` and `y == z` then `x == z`
This implies that it can not be the IEEE notion of equality for numerics, since all of that is destroyed by Float / Double. Luckily, this is highly performant within the same data type on numerics! Just look at how Java implements it: it compares the result of `Double.doubleToLongBits`.
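Concretely, `java.lang.Double.equals` is defined via `doubleToLongBits`, which buys an equivalence relation at the cost of diverging from IEEE `==` on NaN and on the signed zeros. A quick runnable check in plain Java:

```java
public class BitEquality {
    public static void main(String[] args) {
        // IEEE ==: NaN is not equal to itself
        System.out.println(Double.NaN == Double.NaN);                      // false
        // Boxed equals compares doubleToLongBits: NaN equals NaN
        System.out.println(Double.valueOf(Double.NaN).equals(Double.NaN)); // true
        // The reverse trade-off for signed zeros
        System.out.println(0.0 == -0.0);                                   // true
        System.out.println(Double.valueOf(0.0).equals(-0.0));              // false
    }
}
```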
What does this imply about how `==` functions between numeric types? Well, reflexivity can hold for every numeric type, if we compare bits like Java does and do not do IEEE floating-point `==` (where e.g. `NaN != NaN`).
Regarding transitivity, we quickly get in trouble if we try the following:
def x: Int
def y: Float
println(y == x)
println(x == y)
Float and Int do not have the same range. Furthermore, Int is not even a subset of Float. There are values in Int that can not be represented by Float, and vice versa. Transitivity can hold only if `x == y` is true if and only if the value is in the range of both. This is possible, but highly confusing. One can do what Scala currently does, and coerce the data to Float, but that leads to interesting results:
scala> 123456789.toFloat
res2: Float = 1.23456792E8
scala> 123456789.toFloat == 123456789
res3: Boolean = true
scala> 123456789 == 123456789.toFloat
res4: Boolean = true
scala> 123456788 == 123456789.toFloat
res5: Boolean = false
scala> 123456790 == 123456789.toFloat
res6: Boolean = true
Maybe you can stomach that, but then we break transitivity trivially. NOTE: the above has different output in Scala.js; can you guess what it is? The answer is at the end.
You could convert both of these to Double, and since Double can fit the entire range of Int and Float in it, sanity will hold. But if you introduce Long into the mix, the same dilemma appears.
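The transitivity failure in the Float/Int example above can be reproduced with Java's primitive widening alone; a runnable sketch (the nearest float to both ints below happens to be 123456792):

```java
public class FloatTransitivity {
    public static void main(String[] args) {
        float f = 123456789f;                       // stored as 123456792, the nearest float
        System.out.println(123456789 == f);         // true: the int widens to 123456792f
        System.out.println(123456790 == f);         // true: this int also widens to 123456792f
        System.out.println(123456789 == 123456790); // false: transitivity is broken
    }
}
```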
My proposal is simple: `==` between integral and floating-point data types always returns false.
The above examples are only the tip of the iceberg. Trying to have universal equality "work" across all numeric types is fundamentally broken in the current implementation.
Now sanity can be kept within integral or floating-point types, provided we up-convert to the wider type and compare. A Double that is out of range of a Float will always compare as false with any Float value. This is not consistent with Java and the JVM, and implies that `1L == 1` but `1L != 1.0f`. I propose that this be dropped too, so that `1L != 1` and `1.0f != 1.0d`, for the sake of consistency with the JVM and with the barrier between floating-point and integral numbers, but it would not be unsound to allow it.
So, back to `Any.==`, I propose essentially the following:
- No change to `==` on AnyRef
- For AnyVal, `==` returns false if the numeric types are not the same, and otherwise conforms to Java's boxed types and is reflexive, symmetric, and transitive.
Numeric equality, IEEE, and `eq`
Numeric values need two notions of equality, just like reference values do. One can not construct something that works with Sets/Maps and also works with IEEE equality. The simplest, but not only, reason is that `NaN != NaN`.
I propose that numerics get `eq`, and that this be identical to Java's IEEE-based `==`. Exposing this is critical / required for any serious floating-point numeric libraries. It also means that `1 eq 1L` and `1 eq 1.0f` can be supported with well-defined semantics.
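For reference, the Java primitive `==` that such an `eq` would mirror already has well-defined widening semantics, runnable as plain Java:

```java
public class PrimitiveEq {
    public static void main(String[] args) {
        System.out.println(1 == 1L);                  // true: int widens to long
        System.out.println(1 == 1.0f);                // true: int widens to float
        System.out.println(1L == 1.0);                // true: long widens to double
        System.out.println(Double.NaN == Double.NaN); // false: IEEE semantics
    }
}
```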
Partial Order, Total Order, <
, >
and numerics
This may seem like a distraction, but it is fundamental to my proposal above. Numerics have two notions of equality. One of them is reflexive, symmetric, and transitive. This is exactly the quality required for equality to be consistent with a total order. That is, in my proposal above, `==` can be consistent with a total order. IEEE equality and `eq` can not. In Java, `Double.compare` provides a total ordering, but `<` and `==` on a `double` do not. Scala needs to expose this as well, and hopefully in a much less confusing way than Java.
Current Scala behavior
scala> Double.NaN == Double.NaN
res13: Boolean = false
scala> 1.0d < Double.NaN
res14: Boolean = false
scala> 1.0d > Double.NaN
res15: Boolean = false
These are analogous to Java's, and are the IEEE implementations, which are not consistent with `equals` or with a total order on floating-point numbers.
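Java's total order, by contrast, lives in `Double.compare`, which treats NaN as equal to itself and greater than everything, and distinguishes the signed zeros; a runnable check:

```java
public class TotalOrder {
    public static void main(String[] args) {
        // 0: under the total order, NaN equals itself
        System.out.println(Double.compare(Double.NaN, Double.NaN));
        // true: NaN sorts above every other double
        System.out.println(Double.compare(1.0, Double.NaN) < 0);
        // true: -0.0 sorts strictly below 0.0
        System.out.println(Double.compare(-0.0, 0.0) < 0);
    }
}
```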
For `<`, `<=` and friends, there are two implementation possibilities: one that is a partial order, and consistent with IEEE; and another that is a total order, and consistent with equals.
I have a few possible proposals:

1. Leave these implementations the same (which are consistent with the partial order and `eq`), and add a new set that is consistent with `==`, perhaps `lt`, `gt`, `gte`, etc.
2. Rename the above symbolic implementations to `lt`, `gt`, etc., which are consistent with `eq`, and make new symbolic `<`, `>`, etc. consistent with `==`.
3. Same as the first proposal, but also swap the meaning of `eq` and `==` on numerics.
Each of these is insane in its own way. Sadly, I can not see any way to fix the problems with numerics in Scala without breaking code. But each has merits:
#1 is the most compatible with existing code, but is a bit confusing, as the symbolic `<=` would be consistent with `eq` but not with the symbolic `==`.
#2 fixes the above problem, making all of the symbolic methods relate to total ordering and all of the non-symbolic ones relate to IEEE.
#3 is the inverse of #2, with symbolics being IEEE and non-symbolics being related to total order. However, it implies that AnyVal and AnyRef are now at odds with each other, and a Set or Map would use `==` for ref types and `eq` for value types, which is awful, and really messes up the "what does `==` on Any mean" question, unless `eq` and `==` are swapped for AnyRef too… yikes.
TL;DR
The root of the problem: numerics require two notions of equality, and Scala currently tries to squish both of these into `==`. The requirement comes from the fact that we need one equality that is consistent with a total order, and one that is consistent with a partial order, in the case of floating-point values. The one that is consistent with the partial order is inadequate for use in sets/maps, etc. In some cases `NaN` must not equal itself, and in others it must!
The consequence of this is that numerics need `eq` too, and that overall language consistency demands considering how `eq` and `==` on numerics relate to `<=`.
I honestly think that Java got this right and Scala got it wrong, except for Java's confusing and verbose API, which means you have `1.0d == 1.0f` in one place and `Double.compareTo` in another, with `.equals()` and `==` being somewhat consistent with those; but due to auto-boxing it's a bit of a mess.
Valhalla
When project Valhalla lands, and composite value types on the JVM likely also get Java `==` definitions (but not `.equals()`), Scala will have even more problems of this sort. The JVM notion of equality for an unboxed value type with two numbers in it, `x: (Int, Float) = (1, 1.0f)`, will likely NOT be equal to a `y: (Float, Int) = (1.0f, 1)`. They will need to define, at minimum, the equivalence consistent with boxed types, or raw bit-equality. They may also need one that is consistent with IEEE (so that composites containing NaN don't compare equal). Ironically, this is backwards for them: `==` for doubles does not satisfy reflexivity, but in order to have sane value types on the JVM, value equality demands it.
IMO, the existence of Valhalla means that Scala will be facing the music eventually, and will be forced either to make a big breaking change w.r.t. the semantics of equality on numeric values, or to have even crazier (and branchier, slower) library code to bridge the semantic gap.
Scala.js trivia
- Scala.js up-converts to Double, so it does not have inconsistencies for Int/Float but does for Long/Double
Apologies
I'm out of time for now, but I wish I had time to clean up my message above to be more clear and concise. I realize that this grows way out of the scope of the initial proposal, but IMO once you go breaking how numbers compare with each other, you might as well fix all of the broken stuff, breaking it all at once.
One more quick example of numeric value equality insanity in Scala today, that my proposal fixes:
scala> case class FI(f: Float, i: Int)
defined class FI
scala> case class IF(i: Int, f: Float)
defined class IF
scala> FI(1.0f, 1) == IF(1, 1.0f)
res19: Boolean = false
scala> val intfloat = (1, 1.0f)
intfloat: (Int, Float) = (1,1.0)
scala> val floatint = (1.0f, 1)
floatint: (Float, Int) = (1.0,1)
scala> intfloat == floatint
res20: Boolean = true
My proposal would make case classes isomorphic to tuples at the data and equality level, which they are currently not. This is especially important once Valhalla arrives on the JVM.
Also, it solves the original problem cleanly, by solving the problem at the root:
1 == 1L // false
(1: Any) == (1L: Any) // false
(1: Any).equals(1L: Any) // false
new java.lang.Integer(1).equals(new java.lang.Long(1L)) // false
I believe value types are actually supposed to be allowed to define their own `equals` method (see here). I assume that `==` in Java will delegate to `equals` for value types (since there is no reference identity to compare), but I don't know.
The Valhalla team decided not to use tuples for value types, because they lack encapsulation and nominality. Thus, even two different value types with the same underlying representation would never evaluate as equal (unless for some reason the author went out of their way to make it so).
I think, as @sjrd said, it would be a much better idea to forbid comparing `1` and `1L` than to make it return `false`. Returning false will lead to a plethora of subtle, difficult-to-find bugs because you forgot to cast `1` to a `Long`.
@NthPortal That is good to know. If two value types with the same underlying structure can not be equated, providing the encapsulation and nominality, that puts some restrictions on the implementation and use sites, but is definitely cleaner.
The default implementation of equals / `==` (whichever syntax they pick) will likely compare the raw bits of the value, in which case the bits for `1.0f` are not the same as those for `1`; unless they require authors to define equals. Also, I'm sure there will be a VarHandle-ish API that lets one compare the raw bit values, which can lead to very good performance on equals/gt/lt etc. if leveraged.
I consider whether the compiler allows these things to be a somewhat independent consideration. I would agree that disallowing `1 == 1L` makes a lot of sense. Then we could optionally allow `1 eq 1L`, which is better defined in relation to IEEE and Java's `==`.
But if the decision is to allow `1 == 1L` because `==` is on Any and `1.==(1L)` conforms to `Any.==(other: Any)`, it should be false. `Set(1, 1L)` implies that `1 == 1L` exists, unless the scope is broadened further. It should return false.
Whether the user can type `1 == 1L` is a different issue. If we reject that, we should also reject `Some(1) == None` and `Left('a') == Right('a')`. Currently the compiler emits a warning saying these are always false when two case classes of different types are compared with `==`. I think this is essentially the same thing: comparing values of different types is false regardless of content.
Hmm, maybe that is another way to phrase my proposal: two values of different types are not equal, regardless of content. This would hold for `Int`, `Float`, `case class F(f: Float)`, `(Float, Int)`, `(Int, Int)`, `case class FI(f: Float, i: Int)`, etc. All are values with different types, and are thus not equal, regardless of content.
| But how much do (2) and (3) matter if we can't have (1)?
For me as a language designer and compiler writer, a whole lot! Performance matters as much as convenience, maybe more so. Semantic consistency and elegance also matter a lot to me, and act in this case as a clear tie-breaker.
There are also two other observations:
- Scala is the odd man out here: no other high-performance statically typed language implements co-operative equality in the way it does.
- Scala did not win itself a lot of friends with co-operative equality. Quite the contrary: a lot of people moved away, or found an excuse not to adopt Scala, because collection performance is so bad.
I would not take the popularity argument too seriously if I were convinced that we did the right thing from a language design and semantics standpoint. But I am now convinced we actually did the wrong thing from that standpoint.
That said, I still don't think we should change the way equality works for statically known numeric types. That's a huge distraction. There is no best way to do this, so Scala's choice to do it exactly like Java is valid and will stay like this.
| Java does not say that 1 == 1L, it says that 1 eq 1L!

No, it says `1 == 1L`. Let's not re-interpret what symbols mean. Your observation that there really need to be two notions of equality because of NaN is valid and important. In my mind, there's the equivalence relation, which is `equals`, and there's the ad-hoc equality, which is statically type-specific and is called `==`. Currently, we have:
scala> Double.NaN == Double.NaN
res5: Boolean = false
(as mandated by the IEEE standard)
scala> Double.NaN.equals(Double.NaN)
res6: Boolean = true
(as mandated by equivalence relation axioms)
scala> (Double.NaN: Any) == (Double.NaN: Any)
res7: Boolean = false
This is a problem, because it means that `NaN` cannot be stored as a key in Scala maps. If we drop co-operative equality, `==` on `Any` would be the same as `equals`, and the comparison would again do the right thing.
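Java's collections already behave this way, because lookups go through `equals`/`hashCode` rather than primitive `==`; a minimal runnable sketch:

```java
import java.util.HashMap;

public class NaNKey {
    public static void main(String[] args) {
        HashMap<Double, String> m = new HashMap<>();
        m.put(Double.NaN, "found");
        // Lookup succeeds: Double.equals treats NaN as equal to itself
        System.out.println(m.get(Double.NaN)); // found
    }
}
```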
There's one compromise proposal which might be worth investigating: keep `==` as it is, but systematically use `equals` and `hashCode` instead of `==` and `##` in the new collections. This is a more limited change, which does away with most of the performance penalty in practice. But if collections start to ignore the cooperative versions of `Any.##` and `Any.==`, the question remains why keep them around at all…
`==` behavior in Java is not that important, because we already break with Java behavior in many cases, e.g.:

new String("foo") == new String("foo") // Returns false in Java, true in Scala

I do not feel that `==` is frequently used for arbitrary types.
Is it possible to remove `Any.==`? Suppose only some types support `==`, e.g. `Iterable.==`, `StringOps.==`, `Product.==`, `Int.==`; then the Scala collections should simply use `equals` for internal comparison.
There is another advantage to removing `Any.==`: we could define `ArrayOps.==` for cases like:

Array(1, 2, 3) == Array(1, 2, 3) // should return true
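Such an extension method would presumably delegate to element-wise comparison, as Java's `Arrays.equals` does (runnable Java; the `ArrayOps.==` name above is the poster's hypothetical, not an existing API):

```java
import java.util.Arrays;

public class ArrayEq {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(a == b);              // false: reference comparison
        System.out.println(Arrays.equals(a, b)); // true: element-wise comparison
    }
}
```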
`==` is designed to support `null`, avoiding the `NullPointerException` of `null.equals(null)`.
However, it is unnecessary to make `Any.==` a compiler-intrinsic method, because this `NullPointerException` problem can be easily resolved by an `XxxOps.==` extension method.
I just don't see how semantic consistency and elegance point in the direction of "you must know, in your head, the static type of the arguments in order to know how `==` behaves".
Of course it's possible to come up with awkward and confusing APIs where you have overloaded methods that make static types essential to know. But just because you can do this doesn't, to me, indicate anything about whether one should. Failing to prevent misuse of a language feature is not the same as eagerly advocating it. The benefit of static types is, in large part, that the computer can keep track of things that I might make a mistake with. If the static type matters critically to how equality is interpreted, it means that the compiler is no longer helping me; the burden is now on me to get it right so that `==` has the meaning I intend. Fundamentally, I think this is inelegant, shifting the burden from computer to person when it should go the other way; and I think this promotes semantic inconsistency, because although the language is regular, it makes the code less consistent.
They don't have a top type, either. That's the real culprit: a top type that implements `==`. That's a huge burden to shoulder.
Java doesn't have a top type. It's got `Object`, which is the top type of the object hierarchy, but it also has the primitive types, which have no particular relationship with `Object`. You can't speak generically about `int` and `float` and so on; only about their boxed representations, `Integer`, `Float`, and so on. Despite some boxing and unboxing, there are nonetheless a myriad of difficulties dealing with boxed types vs. primitive types in Java.
C doesn't have objects at all.
C++ doesn't have a top type. Instead, templates form an entirely orthogonal way to get generic behavior, by deferring compilation to the point of usage and only complaining there if it doesn't make sense. It's a very different model, and of course equality works very differently also.
Haskell doesn't even have proper subtyping, so the idea of cooperative equality doesn't even make sense to me in that context.
Rust sort of doesn't have subtyping. (Technically it does, for lifetimes.) Anyway, the idea of cooperative equality doesn't make sense to me there either. In any case, you can't compare numerics of different primitive types; there's no implicit coercion, and equality is defined within-type only.
C# doesn't have a top type, and you can't apply equality to generics without taking an equality typeclass.
Anyway, I could go on, but the hard part for Scala is that `Any` exists and has `==` defined on it, which in Scala is presumably value equality. No other high-performance language has that.
You'd know better than I do, but it is hard to measure the people who stayed with and liked Scala because numbers "just work" with generics and collections, instead of being a pain point like in Java.
Also, this is hardly the only place where Scala collections are behind Java. And it's not even true that they're behind in general. I wrote `AnyRefMap` precisely for those people who wanted `java.util.HashMap`-like performance in Scala (and it delivers!), and `LongMap` for people who had primitive keys and wanted to beat Java (and it does!). But despite this, Java 8 Streams can deliver 5x-10x faster performance on common operations (map and filter and such) on primitives than Scala collections can, since Scala has to box primitives. Scala implementations are generally immutable, which in many cases results in 2-3x worse performance just due to differences in algorithms. If people want to compare Java to Scala, there are loads of ways for Scala to seem worse. So though I take performance very seriously in my day-to-day work, I have trouble viewing this as a critical performance issue, especially since you can always use `[A <: AnyRef]` (in your own code, not library code) and get back to Java-style equality.
Anyway, as you said, we shouldn't take the popularity argument too seriously. But if we're going to make it at all, I think it would be good to gather more empirical data, e.g. on whether people like how `==` works with numbers.
It doesn't, but it almost certainly will in the relatively near future. I don't know the current state of Valhalla, or whether you can actually test how it behaves when comparing `int` and `long` treated as the "any" type, but it would be interesting (and perhaps useful?) to see how it behaves.