Can we get rid of cooperative equality?


#12

It’s actually more like a factor of two on a fair comparison. I did these tests when creating AnyRefMap; switching from cooperative to non-cooperative equality (as possible when things are typed as AnyRef) saves about a factor of two in speed. Using primitives directly gives about another factor of two (hence LongMap), but that isn’t a fair comparison because we’re talking about the behavior of Any.

Absolutely! The opaqueness of boxing of numbers is the source of endless Java puzzlers. Intuitively, the number one is the number one, regardless of whether it happens to be stored in a 32 bit integer or a 64 bit floating point value or a byte. It’s just one. Because users can create their own numeric type (e.g. Rational) with their own representation of one, it is not practical to maintain “one is one” universally. But it’s still a huge and worthwhile simplification of the cognitive model needed for dealing with numbers.

This is an implementation detail, presumably for speed. It needn’t be done this way. The various equalsXYZ methods in scala.runtime can handle any comparison.

The current treatment is expensive, but makes numbers more regular than they would be otherwise, thus avoiding a class of bugs that people run into in Java.

Fundamentally, as long as we have weak conformance and such around numbers, it’s profoundly inconsistent to allow 1L + 1 but not say 1L == 1 is both valid and returns true.

Rust, for example, has decided to disallow all of these: you cannot write 1u64 + 1u32 or 1u64 == 1u32. This is consistent and reduces the chance of error, but is also something of a hassle. (Unadorned numeric literals will conform to the type expected to avoid making it much too much of a hassle.) But Rust has no top type, so there is no expectation that (1L: Any) == (1: Any) behaves the same as 1L == 1.

So if cooperative equality were removed, I think equality on Any would have to go away entirely.


#13

Basically, since Java primitives behave differently from Java boxed
numbers, we can’t have comparisons between different numeric types that
satisfy all three of these:

(1) Scala unboxed numbers behave like Java primitives
(2) Scala boxed numbers behave like Java boxed numbers
(3) Scala unboxed numbers behave like Scala boxed numbers

It is difficult to have good JVM performance unless Scala numbers behave
like Java numbers. Scala boxed and unboxed being different sounds insane.

The only sane and efficient option seems to be, as has been suggested, to
deprecate comparisons between different numeric types and instead require
conversion to larger types, like Long and Double. Since these days almost
every platform is 64 bit, Long and Double are natively efficient.

Side note: comparing floating points to anything is pure evil. It can only
be forgiven in rare circumstances, such as emulating a language that does
not have integer types, like JavaScript.


#14

I would simply argue that === defined in this way is a poor API because it does not conform to the intuitive notion of sameness.

When concepts are different, it’s a good idea to use different method names.

class Confusing {
  def buh(s: String) = Try{ (new File(s)).delete }.isSuccess
  def buh(f: File) = f.exists
}

(In fact, I’d suggest that this example is a good argument against allowing overloaded method names.)


#15

If you happen to not like this change, another way to think about it is:

The current way is odd and surprising.
The changed way is odd and surprising, but it’s faster.


#16

Can you post a REPL transcript of the odd and surprising behavior? (With ==, not equals?)


#17

Can you post a REPL transcript of the odd and surprising behavior? (With
==, not equals?)

scala> class A(val x: String) { def ==(that: A) = this.x == that.x }
defined class A
scala> val a = new A("")
val a: A = A@23b3aa8c
scala> val b = new A("")
val b: A = A@338cc75f
scala> a == b
val res2: Boolean = true
scala> (a: Any) == (b: Any)
val res3: Boolean = false
scala>

The example shows that == is an overloaded method, and behaves like one.
Except for numeric types
where we magically make it a multi-method.


#18

I would simply argue that === defined in this way is a poor API because it does not conform to the intuitive notion of sameness.

Poor API or not, that’s how == is defined! And there are many good reasons for that, starting with performance. Imagine if all primitive == comparisons delegated to Any


#19

So are we suggesting that overloading be turned off for ==? Right now best practice would be (barring an extra canEqual check) to:

class A(val x: String) { def equals(a: Any) = a match {
  case a2: A => x == a2.x
  case _    => false
}

which doesn’t have the odd and surprising behavior you demonstrated. At least a linter should by default complain if one overloads == in that way.


#20

So are we suggesting that overloading be turned off for ==? Right now best practice would be (barring an extra canEqual check) to:

No, the opposite. Keep overloading but don’t treat numeric types specially. I.e.

(1: Any) != (1L: Any)

just like

(a: Any) != (b: Any)

in my example.


#21

Btw we cannot turn overloading off for ==. Scala has no way to achieve this. We can turn overriding of by making == final (and it is!) but that does not prevent us from overloading it.


#22

But people don’t overload equals that way for the most part; they do it like case classes do, precisely so that they avoid the confusing behavior that equality is not preserved by upcasting.


#23

But people don’t overload equals that way for the most part; they do it like case classes do, precisely so that they avoid the confusing behavior that equality is not preserved by upcasting.

Correct. I believe this illustrates well the different concerns here. I see at least three:

  1. What is an intuitive meaning of == from an API perspective?
  2. What is the cleanest way to express == from a semantics perspective?
  3. What is the most straightforward and efficient implementation?

I thought initially that (1) and (2) were aligned, that we needed co-operative equality to hide boxing for generics. I am now convinced that (2) and (3) are aligned. The only way to approach == in the Scala context is as an overloaded method and that means co-operative equality only muddles the picture. The seeming inconsistencies we see are not due to == specifically but due to the fact that Scala has overloading as opposed to multi-methods. The same inconsistencies can be constructed for any other overloaded method, including == on other types.


#24

I agree that (2) and (3) align better than (1) and (2) do.

But how much do (2) and (3) matter if we can’t have (1)?

We already have a perfectly sensible method that can be used to align (2) and (3) in those special cases where (1) is not a primary concern: equals. The question is whether == should lose its special status in attempting to achieve (1).

(Also, as an aside, def foo(final a: Any) could be added as syntax that forbids overloading of method arguments.)

Multi-methods are handy; I appreciate them in Julia, for instance. But I’m not convinced yet that a best-effort manual emulation of them to preserve the intuitive meaning of == isn’t worthwhile. Yes, it’s a hassle. But it’s an up-front hassle that simplifies all later thinking about how to use equality, which for scalability is a nice win since you can concentrate on more relevant things than the difference between intuitive == and the actual behavior in the face of the diversity of numeric types.


#25

Shouldn’t the signature of == be

def ==(other: Any)

instead of

def ==(other: A)


#26

I have been wanting to change this for a long time. This is a fantastic discussion. The proposal does not go far enough to fix the problems, but does describe them very well.

I will first address the following regarding 1 == 1L:

Well, Java says it, and I don’t think we should contradict it on this one

Java does not say that 1 == 1L, it says that 1 eq 1L!

Integer.valueOf(1).equals(1L)   --> false

In my mind, scala’s == is java’s .equals(). Unboxed types in Java do not have equals(). It is unsound for Scala to say “Scala’s == is like Java’s equals() except for AnyVal”, which I’ll get into later. IMO Scala’s eq is analagous to java’s == : reference equality for objects, and IEEE equality for primitives. This is true for AnyRef/Object, why should it differ for AnyVal? Yes, scala speaks of eq as reference equality and it is not defined for values, but in the case of numerics it can be bit equality and/or IEEE equality (like Java’s ==). And not having two separate notions of equality for numerics is exactly the root of the problem, performance wise.


With that ouf of the way, I will describe my proposal, then justify it.

Sameness, and the == method

This is essentially what is used in Sets and Maps by default, and should satisfy:

identity: x == x
reflexive: if x == y then y == x
transitive: if x == y and y == z then x == z

This implies that it can not be IEEE notions of equality for numerics, since all of that is destroyed by Float / Double. Luckily, this is highly performant within the same data type on numerics! Just look at how Java implements it, it is comparing the bit result of Double.toLongBits.

What does this imply about how == functions between numeric types?

Well, identity can hold for every numeric type, if we compare bits like Java and do not do IEEE floating point == (where e.g. NaN != NaN).

Regarding transitivity, we quickly get in trouble if we try the following:

def x: Int
def y: Float
println(y == x)
println(x == y)

Float and Int do not have the same range. Furthermore, Int is not even a subset of Float. There are value in Int that can not be represented by float and vice-versa. Transitivity can hold only if x == y is true if and only if the value is in the range of both. This is possible, but highly confusing. One can do what scala currently does, and coerce the data to float, but that leads to interesting results:

scala> 123456789.toFloat
res2: Float = 1.23456792E8

scala> 123456789.toFloat == 123456789
res3: Boolean = true

scala> 123456789 == 123456789.toFloat
res4: Boolean = true

scala> 123456788 == 123456789.toFloat
res5: Boolean = false

scala> 123456790 == 123456789.toFloat
res6: Boolean = true

Maybe you can stomach that, but then we break transitivity trivially. NOTE the above has different output for Scala.js, can you guess what it is? The answer is at the end.

You could convert both of these to Double, and since Double can fit the entire range of Int and Float in it, sanity will hold. But If you introduce Long into the mix the same dilemma appears.

My proposal is simple: == between integral and floating point data types always returns false

The above examples are only the tip of the iceberg. The unsoundness of trying to have universal equality ‘work’ across all numeric types is fundamentally broken in the current implementation.

Now sanity can be kept within integral or floating point types, provided we up-convert to the wider type and compare. A double that is out of range of a float will always compare as false with any float value. This is not consistent with Java and the JVM, and implies that 1L == 1 but 1L != 1.0f. I propose that this be dropped too, so that 1L != 1 and 1.0f != 1.0d, for the sake of consistency with the JVM and with the barrier between floating point and integral numbers, but it would not be unsound to allow it.

So, back to Any.==, I propose essentially the following:

  • No change to == on AnyRef
  • For AnyVal, == returns false if the numeric types are not the same, and otherwise conforms to Java’s boxed types and is reflexive and transitive.

Numeric equality, IEEE, and eq

Numeric values need two notions of equality, just like reference values do. One can not construct something that works with Sets/Maps and also works with IEEE equality. The simplest, but not only, reason is that NaN != NaN.

I propose that numerics get eq and that this be identical to Java’s IEEE based ==. Exposing this is critical / required for any serious floating point numeric libraries. It also means that 1 eq 1L and 1 eq 1.0f can be supported with well defined semantics.

Partial Order, Total Order, <, > and numerics

This may seem like a distraction, but it is fundamental to my proposal above. Numerics have two notions of equality. One of them is reflexive, transitive, and satisfies identity. This is the exact quality required in order for equality to be consistent with total order. That is, in my proposal above, == can be consistent with total order. IEEE equality and eq can not. In Java, Double.compare provides a Total Ordering, but < and == on a double does not. Scala needs to expose this as well, and hopefully in a much less confusing way than Java.

Current scala behavior

scala> Double.NaN == Double.NaN
res13: Boolean = false

scala> 1.0d < Double.NaN
res14: Boolean = false

scala> 1.0d > Double.NaN
res15: Boolean = false

These are analogous with Java, and are the IEEE implementations, which are not consistent with equals or total order on floating point numbers.

For <, <= and friends, there are two implementation possibilities, one that is a partial order, and consistent with IEEE, and another that is a total order, and consistent with equals.

I have a few possible proposals:

  1. Leave these implementations the same (which are consistent with partial order and eq), and add a new set that is consistent with ==, perhaps lt gt, gte etc.
  2. Rename the above symbolic implementations to lt, gt, etc which is consistent with eq, and make new symbolic <, >, etc consistent with ==
  3. Same as the first proposal , but also swap the meaning of eq and == on numerics.

Each of these are insane in their own way. Sadly, I can not see any way to fix the problems with numerics in Scala without breaking code. But each has merits:

#1 is the most compatible with existing code, but is a bit confusing, as the symbolic <= would be consistent with eq but not the symbolic ==
#2 fixes the above problem, making all of the symbolic methods relate to Total Ordering and all of the non-sumbolic ones relate to IEEE.
#3 is the inverse of #2, with symbolics being IEEE and non-symbols being related to Total Order. However, it implies that AnyVal and AnyRef are now at odds with each other, and a Set or Map would use == for ref types and eq for value types, which is awful, and really messes up the “what does == on Any mean” question, unless eq and == are swapped for AnyRef too… yikes.

TL;DR

The root of the problem: Numerics require two notions of equality, and Scala currently tries to squish both of these into ==. The requirement comes from the fact that we need one equality that is consistent with total order, and one that is consistent with partial order, in the case of floating point values. The one that is consistent with partial order is inadequate for use in sets/maps, etc. In some cases NaN must not equal itself, and in others it must!

The consequence of this is that numerics need eq too, and that overall language consistency demands considering how eq and == on numerics relates to <=.

I honestly think that Java got this right and Scala got it wrong, except for Java’s confusing and verbose API that means you have 1.0d == 1.0f in one place, and Double.compareTo in another, with .equals() and == being somewhat consistent with those, but due to auto-boxing its a bit of a mess.

Vallhalla

When project Vallhalla lands, and composite value types on the JVM likely also get Java == definitions (but not .equals()) Scala will have even more problems of this sort. The JVM notion of equality for an unboxed value type with two numbers in it: x: (Int, Float) = (1, 1.0f) will likely NOT be equal to a y: (Float, Int) = (1.0f, 1). They will need to define at minimum the equivalence consistent with boxed types, or raw bit-equality. They may also need to have one that is consistent with IEEE (so that composites with NaN don’t equal). Ironically, this is backwards for them; since == for doubles does not satisfy identity but in order to have sane value types in the jvm value equality demands it.

IMO, the existence of Vallhalla means that Scala will be facing the music eventually, and be forced to either have a big breaking change W.R.T. semantics of equality on numeric values, OR have even crazier (and branchier, slower) library code to bridge the semantic gap.

Scala.js trivia

  • Scala.js up-converts to Double, so it does not have inconsistencies for Int/Float but does for Long/Double

Apologies

I’m out of time for now but wish I had time to clean up my message above to be more clear and concise. I realize that this grows way out of scope from the initial proposal, but IMO once you go breaking how numbers compare with each other, you might as well fix all of the broken stuff, breaking it all at once.


#27

One more quick example of numeric value equality insanity in Scala today, that my proposal fixes:

scala> case class FI(f: Float, i: Int)
defined class FI

scala> case class IF(i: Int, f: Float)
defined class IF

scala> FI(1.0f, 1) == IF(1, 1.0f)
res19: Boolean = false

scala> val intfloat = (1, 1.0f)
intfloat: (Int, Float) = (1,1.0)

scala> val floatint = (1.0f, 1)
floatint: (Float, Int) = (1.0,1)

scala> intfloat == floatint
res20: Boolean = true

My proposal would make case classes isomorphic to tuples at the data and equality level, which they are currently not. This is especially important once Valhalla arrives on the JVM.

Also, it solves the original problem cleanly, by solving the problem at the root:

1 == 1L // false
(1: Any) == (1L: Any) // false
(1: Any).equals(1L: Any) // false
new java.lang.Integer(1).equals(new java.lang.Long(1L)) // false


#28

I believe value types are actually supposed to be allowed to define their own equals method (see here). I assume that == in Java will delegate to equals for value types (since there is no reference identity to compare), but I don’t know.

The Valhalla team decided not to use tuples for value types, because they lack encapsulation and nominality. Thus, even two different value types with the same underlying representation would never evaluate as equal (unless for some reason the author went out of their way to make it so).


#29

I think, as @sjrd said, it would be a much better idea to forbid comparing 1 and 1L than to make it return false. Returning false will lead to a plethora of subtle, difficult to find bugs because you forgot to cast 1 to a Long.


#30

@NthPortal That is good to know. If two value types with the same underlying structure can not be equated, providing the encapsulation and nominality, that puts some restrictions on the implementation and use sites, but is definitely cleaner.

The default implementation of equals / == (whichever syntax they pick) will likely compare the raw bits of the value then, in which case the bits for 1.0f are not the same as 1; unless they require authors to define equals. Also, I’m sure there will be a VarHandle -ish API that lets one compare the raw bit values, which can lead to very good performance on equals/gt/lt etc if leveraged.


#31

I consider whether the compiler allows these things to be a somewhat
independent consideration. I would agree that disallowing 1 == 1L makes
a lot of sense. Then we could optionally allow 1 eq 1L which is better
defined in relation to IEEE and Java’s ==.

But if the decision is to allow 1 == 1L because == is on Any and
1.==(1L) conforms to Any.==(other: Any) , it should be false.