Can we get rid of cooperative equality?


#21

Btw we cannot turn overloading off for ==. Scala has no way to achieve this. We can turn overriding of by making == final (and it is!) but that does not prevent us from overloading it.


#22

But people don’t overload equals that way for the most part; they do it like case classes do, precisely so that they avoid the confusing behavior that equality is not preserved by upcasting.


#23

But people don’t overload equals that way for the most part; they do it like case classes do, precisely so that they avoid the confusing behavior that equality is not preserved by upcasting.

Correct. I believe this illustrates well the different concerns here. I see at least three:

  1. What is an intuitive meaning of == from an API perspective?
  2. What is the cleanest way to express == from a semantics perspective?
  3. What is the most straightforward and efficient implementation?

I thought initially that (1) and (2) were aligned, that we needed co-operative equality to hide boxing for generics. I am now convinced that (2) and (3) are aligned. The only way to approach == in the Scala context is as an overloaded method and that means co-operative equality only muddles the picture. The seeming inconsistencies we see are not due to == specifically but due to the fact that Scala has overloading as opposed to multi-methods. The same inconsistencies can be constructed for any other overloaded method, including == on other types.


#24

I agree that (2) and (3) align better than (1) and (2) do.

But how much do (2) and (3) matter if we can’t have (1)?

We already have a perfectly sensible method that can be used to align (2) and (3) in those special cases where (1) is not a primary concern: equals. The question is whether == should lose its special status in attempting to achieve (1).

(Also, as an aside, def foo(final a: Any) could be added as syntax that forbids overloading of method arguments.)

Multi-methods are handy; I appreciate them in Julia, for instance. But I’m not convinced yet that a best-effort manual emulation of them to preserve the intuitive meaning of == isn’t worthwhile. Yes, it’s a hassle. But it’s an up-front hassle that simplifies all later thinking about how to use equality, which for scalability is a nice win since you can concentrate on more relevant things than the difference between intuitive == and the actual behavior in the face of the diversity of numeric types.


#25

Shouldn’t the signature of == be

def ==(other: Any)

instead of

def ==(other: A)


#26

I have been wanting to change this for a long time. This is a fantastic discussion. The proposal does not go far enough to fix the problems, but does describe them very well.

I will first address the following regarding 1 == 1L:

Well, Java says it, and I don’t think we should contradict it on this one

Java does not say that 1 == 1L, it says that 1 eq 1L!

Integer.valueOf(1).equals(1L)   --> false

In my mind, scala’s == is java’s .equals(). Unboxed types in Java do not have equals(). It is unsound for Scala to say “Scala’s == is like Java’s equals() except for AnyVal”, which I’ll get into later. IMO Scala’s eq is analagous to java’s == : reference equality for objects, and IEEE equality for primitives. This is true for AnyRef/Object, why should it differ for AnyVal? Yes, scala speaks of eq as reference equality and it is not defined for values, but in the case of numerics it can be bit equality and/or IEEE equality (like Java’s ==). And not having two separate notions of equality for numerics is exactly the root of the problem, performance wise.


With that ouf of the way, I will describe my proposal, then justify it.

Sameness, and the == method

This is essentially what is used in Sets and Maps by default, and should satisfy:

identity: x == x
reflexive: if x == y then y == x
transitive: if x == y and y == z then x == z

This implies that it can not be IEEE notions of equality for numerics, since all of that is destroyed by Float / Double. Luckily, this is highly performant within the same data type on numerics! Just look at how Java implements it, it is comparing the bit result of Double.toLongBits.

What does this imply about how == functions between numeric types?

Well, identity can hold for every numeric type, if we compare bits like Java and do not do IEEE floating point == (where e.g. NaN != NaN).

Regarding transitivity, we quickly get in trouble if we try the following:

def x: Int
def y: Float
println(y == x)
println(x == y)

Float and Int do not have the same range. Furthermore, Int is not even a subset of Float. There are value in Int that can not be represented by float and vice-versa. Transitivity can hold only if x == y is true if and only if the value is in the range of both. This is possible, but highly confusing. One can do what scala currently does, and coerce the data to float, but that leads to interesting results:

scala> 123456789.toFloat
res2: Float = 1.23456792E8

scala> 123456789.toFloat == 123456789
res3: Boolean = true

scala> 123456789 == 123456789.toFloat
res4: Boolean = true

scala> 123456788 == 123456789.toFloat
res5: Boolean = false

scala> 123456790 == 123456789.toFloat
res6: Boolean = true

Maybe you can stomach that, but then we break transitivity trivially. NOTE the above has different output for Scala.js, can you guess what it is? The answer is at the end.

You could convert both of these to Double, and since Double can fit the entire range of Int and Float in it, sanity will hold. But If you introduce Long into the mix the same dilemma appears.

My proposal is simple: == between integral and floating point data types always returns false

The above examples are only the tip of the iceberg. The unsoundness of trying to have universal equality ‘work’ across all numeric types is fundamentally broken in the current implementation.

Now sanity can be kept within integral or floating point types, provided we up-convert to the wider type and compare. A double that is out of range of a float will always compare as false with any float value. This is not consistent with Java and the JVM, and implies that 1L == 1 but 1L != 1.0f. I propose that this be dropped too, so that 1L != 1 and 1.0f != 1.0d, for the sake of consistency with the JVM and with the barrier between floating point and integral numbers, but it would not be unsound to allow it.

So, back to Any.==, I propose essentially the following:

  • No change to == on AnyRef
  • For AnyVal, == returns false if the numeric types are not the same, and otherwise conforms to Java’s boxed types and is reflexive and transitive.

Numeric equality, IEEE, and eq

Numeric values need two notions of equality, just like reference values do. One can not construct something that works with Sets/Maps and also works with IEEE equality. The simplest, but not only, reason is that NaN != NaN.

I propose that numerics get eq and that this be identical to Java’s IEEE based ==. Exposing this is critical / required for any serious floating point numeric libraries. It also means that 1 eq 1L and 1 eq 1.0f can be supported with well defined semantics.

Partial Order, Total Order, <, > and numerics

This may seem like a distraction, but it is fundamental to my proposal above. Numerics have two notions of equality. One of them is reflexive, transitive, and satisfies identity. This is the exact quality required in order for equality to be consistent with total order. That is, in my proposal above, == can be consistent with total order. IEEE equality and eq can not. In Java, Double.compare provides a Total Ordering, but < and == on a double does not. Scala needs to expose this as well, and hopefully in a much less confusing way than Java.

Current scala behavior

scala> Double.NaN == Double.NaN
res13: Boolean = false

scala> 1.0d < Double.NaN
res14: Boolean = false

scala> 1.0d > Double.NaN
res15: Boolean = false

These are analogous with Java, and are the IEEE implementations, which are not consistent with equals or total order on floating point numbers.

For <, <= and friends, there are two implementation possibilities, one that is a partial order, and consistent with IEEE, and another that is a total order, and consistent with equals.

I have a few possible proposals:

  1. Leave these implementations the same (which are consistent with partial order and eq), and add a new set that is consistent with ==, perhaps lt gt, gte etc.
  2. Rename the above symbolic implementations to lt, gt, etc which is consistent with eq, and make new symbolic <, >, etc consistent with ==
  3. Same as the first proposal , but also swap the meaning of eq and == on numerics.

Each of these are insane in their own way. Sadly, I can not see any way to fix the problems with numerics in Scala without breaking code. But each has merits:

#1 is the most compatible with existing code, but is a bit confusing, as the symbolic <= would be consistent with eq but not the symbolic ==
#2 fixes the above problem, making all of the symbolic methods relate to Total Ordering and all of the non-sumbolic ones relate to IEEE.
#3 is the inverse of #2, with symbolics being IEEE and non-symbols being related to Total Order. However, it implies that AnyVal and AnyRef are now at odds with each other, and a Set or Map would use == for ref types and eq for value types, which is awful, and really messes up the “what does == on Any mean” question, unless eq and == are swapped for AnyRef too… yikes.

TL;DR

The root of the problem: Numerics require two notions of equality, and Scala currently tries to squish both of these into ==. The requirement comes from the fact that we need one equality that is consistent with total order, and one that is consistent with partial order, in the case of floating point values. The one that is consistent with partial order is inadequate for use in sets/maps, etc. In some cases NaN must not equal itself, and in others it must!

The consequence of this is that numerics need eq too, and that overall language consistency demands considering how eq and == on numerics relates to <=.

I honestly think that Java got this right and Scala got it wrong, except for Java’s confusing and verbose API that means you have 1.0d == 1.0f in one place, and Double.compareTo in another, with .equals() and == being somewhat consistent with those, but due to auto-boxing its a bit of a mess.

Vallhalla

When project Vallhalla lands, and composite value types on the JVM likely also get Java == definitions (but not .equals()) Scala will have even more problems of this sort. The JVM notion of equality for an unboxed value type with two numbers in it: x: (Int, Float) = (1, 1.0f) will likely NOT be equal to a y: (Float, Int) = (1.0f, 1). They will need to define at minimum the equivalence consistent with boxed types, or raw bit-equality. They may also need to have one that is consistent with IEEE (so that composites with NaN don’t equal). Ironically, this is backwards for them; since == for doubles does not satisfy identity but in order to have sane value types in the jvm value equality demands it.

IMO, the existence of Vallhalla means that Scala will be facing the music eventually, and be forced to either have a big breaking change W.R.T. semantics of equality on numeric values, OR have even crazier (and branchier, slower) library code to bridge the semantic gap.

Scala.js trivia

  • Scala.js up-converts to Double, so it does not have inconsistencies for Int/Float but does for Long/Double

Apologies

I’m out of time for now but wish I had time to clean up my message above to be more clear and concise. I realize that this grows way out of scope from the initial proposal, but IMO once you go breaking how numbers compare with each other, you might as well fix all of the broken stuff, breaking it all at once.


#27

One more quick example of numeric value equality insanity in Scala today, that my proposal fixes:

scala> case class FI(f: Float, i: Int)
defined class FI

scala> case class IF(i: Int, f: Float)
defined class IF

scala> FI(1.0f, 1) == IF(1, 1.0f)
res19: Boolean = false

scala> val intfloat = (1, 1.0f)
intfloat: (Int, Float) = (1,1.0)

scala> val floatint = (1.0f, 1)
floatint: (Float, Int) = (1.0,1)

scala> intfloat == floatint
res20: Boolean = true

My proposal would make case classes isomorphic to tuples at the data and equality level, which they are currently not. This is especially important once Valhalla arrives on the JVM.

Also, it solves the original problem cleanly, by solving the problem at the root:

1 == 1L // false
(1: Any) == (1L: Any) // false
(1: Any).equals(1L: Any) // false
new java.lang.Integer(1).equals(new java.lang.Long(1L)) // false


#28

I believe value types are actually supposed to be allowed to define their own equals method (see here). I assume that == in Java will delegate to equals for value types (since there is no reference identity to compare), but I don’t know.

The Valhalla team decided not to use tuples for value types, because they lack encapsulation and nominality. Thus, even two different value types with the same underlying representation would never evaluate as equal (unless for some reason the author went out of their way to make it so).


#29

I think, as @sjrd said, it would be a much better idea to forbid comparing 1 and 1L than to make it return false. Returning false will lead to a plethora of subtle, difficult to find bugs because you forgot to cast 1 to a Long.


#30

@NthPortal That is good to know. If two value types with the same underlying structure can not be equated, providing the encapsulation and nominality, that puts some restrictions on the implementation and use sites, but is definitely cleaner.

The default implementation of equals / == (whichever syntax they pick) will likely compare the raw bits of the value then, in which case the bits for 1.0f are not the same as 1; unless they require authors to define equals. Also, I’m sure there will be a VarHandle -ish API that lets one compare the raw bit values, which can lead to very good performance on equals/gt/lt etc if leveraged.


#31

I consider whether the compiler allows these things to be a somewhat
independent consideration. I would agree that disallowing 1 == 1L makes
a lot of sense. Then we could optionally allow 1 eq 1L which is better
defined in relation to IEEE and Java’s ==.

But if the decision is to allow 1 == 1L because == is on Any and
1.==(1L) conforms to Any.==(other: Any) , it should be false.


#32

Set(1, 1L) implies that 1 == 1L exists, unless the scope is broadened further. It should return false.

Whether the user can type 1 == 1L is a different issue. If we reject that, we should also reject Some(1) == None and Left('a') == Right('a'). Currently the compiler emits a warning saying these are always false when two case classes of different types are compared with ==. I think this is essentially the same thing: comparing values of different types is false regardless of content.

Hmm, maybe that is another way to phrase my proposal. two values of different types are not equal, regardless of content. This would hold for:
Int, Float, case class F(f: Float), (Float, Int), (Int, Int), case class FI(f: Float, i: Int) etc.

All are values with different types, and are thus not equal, regardless of content.


#33

| But how much do (2) and (3) matter if we can’t have (1)?

For me as a language designer and compiler writer, a whole lot! Performance matters as much as convenience, maybe more so. Semantic consistency and elegance also matters a lot to me and acts in this case as a clear tie-breaker.

There are also two other observations:

  • Scala is the odd man out here, no other higher performance statically typed language implements co-operative equality in the way it does.
  • Scala did not make itself a lot of friends with co-operative equality. Quite the contrary, a lot of people moved away or found an excuse to not adopt Scala because collection performance is so bad.

I would not take the popularity argument too seriously, if I was convinced that we do the right thing from a language design and semantics standpoint. But I am now convinced we actually did the wrong thing from that standpoint.

That said, I still don’t think we should change the way equality works for statically known numeric types. That’s a huge distraction. There is no best way to do this, so Scala’s choice to do it exactly like Java is valid and will stay like this.


#34

Java does not say that 1 == 1L, it says that 1 eq 1L!

No, it says 1 == 1L. Let’s not re-interprete what symbols mean. Your observation that there really need to be two notions of equality because of NaN is valid and important. In my mind, there’s the equivalence relation, which is equals and there’s the ad-hoc equality, which is statically type-specific and is called ==. Currently, we have:

 scala> Double.NaN == Double.NaN
 res5: Boolean = false

(as mandated by IEEE standard)

 scala> Double.NaN.equals(Double.NaN)
 res6: Boolean = true

(as mandated by equivalence relation axioms)

 scala> (Double.NaN: Any) == (Double.NaN: Any)
 res7: Boolean = false

This is a problem, because it means that NaN cannot be stored as a key in Scala maps. If we drop co-operative equality, == on Any would be the same as equals and the comparison would do again the right thing.


#35

There’s one compromise proposal, which might be worth investigating: Keep == as it is, but systematically use equals and hashCode instead of == and ## in the new collections. This is a more limited change which does away with most of the performance penalty in practice. But if collections start to ignore the cooperative versions of Any.## and Any.==, the question remains why keep them around at all…


#36

== behavior in Java is not important because we had broken Java behavior in many cases, e.g.:

new String("foo") == new String("foo") // Returns false in Java, true in Scala

#37

I did not feel == is frequently used for arbitrary types.

Is it possible to remove Any.==?

Suppose there are only some of types support ==, e.g. Iterable.==, StringOps.==, Product.==, Int.==, then Scala collection should simply use equals for internal comparison.

There is another advantage to remove Any.==. We can define ArrayOps.== for cases like:

Array(1, 2, 3) == Array(1, 2, 3) // should return true

#38

== is designed to support null in case of NullPointerException in null.equals(null).

However, it is unnecessary to let Any.== become a compiler instruction method, because this NullPointerException problem can be easily resolved by a XxxOps.== extension method.


#39

I just don’t see how semantic consistency and elegance points in the direction of “you must know, in your head, the static type of the arguments in order to know how == behaves”.

Of course it’s possible to come up with awkward and confusing APIs where you have overloaded methods that make static types essential to know. But just because you can do this doesn’t, to me, indicate anything about whether one should. Failing to prevent misuse of a language feature is not the same as eagerly advocating it. The benefit of static types is, in large part, so that the computer can keep track of things that I might make a mistake with. If the static type matters critically in how equality is interpreted, it means that the compiler is no longer helping me; the burden is now on me to get it right so that == has the meaning I intend. Fundamentally, I think this is inelegant, shifting the burden from computer to person when it should go the other way; and I think this promotes semantic inconsistency because although the language is regular, it makes the code less consistent.

They don’t have a top type, either. That’s the real culprit: a top type that implements ==. That’s a huge burden to shoulder.

Java doesn’t have a top type. It’s got Object which is the top type of the object hierarchy, but it also has the primitive types that have no particular relationship with Object. You can’t speak generically about int and float and so on; only their boxed representations, Integer, Float, and so on. Despite some boxing and unboxing, there are nonetheless a myriad of difficulties dealing with boxed types vs. primitive types in Java.

C doesn’t have objects at all.

C++ doesn’t have a top type. Instead, templates form an entirely orthogonal way to get generic behavior by deferring the compilation to the point of usage, and only complaining there if it doesn’t make sense. It’s a very different model, and of course equality works very differently also.

Haskell doesn’t even have proper subtyping, so the idea of cooperative equality doesn’t even make sense to me in that context.

Rust sort of doesn’t have subtyping. (Technically it does for lifetimes.) Anyway, the idea of cooperative equality doesn’t make sense to me there either. In any case, you can’t compare numerics of different primitive types; there’s no implicit conformance and equality is defined within-type only.

C# doesn’t have a top type, and you can’t apply equality to generics without taking an equality typeclass.

Anyway, I could go on, but the hard part for Scala is that Any exists and has == defined on it, which in Scala is presumably value equality. No other high-performance language has that.

You’d know better than I do, but it is hard to measure the people who stayed and liked Scala because numbers “just work” with generics and collections instead of being a pain point like in Java.

Also, this is hardly the only case where Scala collections are behind Java. And it’s not even true that they’re behind in general. I wrote AnyRefMap precisely for those people who wanted java.util.HashMap-like performance in Scala (and it delivers!); and LongMap for people who had primitive keys and wanted to beat Java (and it does!). But despite this, Java 8 Streams can deliver 5x-10x faster performance on common operations (map and filter and such) on primitives than can Scala collections, since Scala has to box primitives. Scala implementations are generally immutable, which in many cases results in 2-3x worse performance just due to differences in algorithms. If people want to compare Java to Scala, there are loads of ways for Scala to seem worse. So though I take performance very seriously in my day-to-day work, I have trouble viewing this as a critical performance issue especially since you can always [A <: AnyRef] (in your own code, not library code) and get back to Java-style equality.

Anyway, as you said, we shouldn’t take the popularity argument too seriously. But if we’re going to make it at all, I think it would be good to gather more empirical data, e.g. on whether people like how == works with numbers.


#40

It doesn’t, but it almost certainly will in the relatively near future. I don’t know the current state of Valhalla, and whether or not you can actually test how it behaves when comparing int and long when treated as the ‘any’ type, but it would be interesting (and perhaps useful?) to see how it behaves.