Can we get rid of cooperative equality?

You wouldn’t drop it on AnyRef; that’s well-defined already to be non-cooperative. Just on Any. You can always .asInstanceOf[AnyRef] when you need non-cooperative equality on Any (and a typeclass could make that better).

That sounds like one reasonable way to get some data on how widespread problems are.

The thing is, I don’t expect the problems to be very widespread, just rather dire; and they would tend to occur in places where people have done things which are valid but not best practice (hopefully rare in the community build).

For example, suppose there is a site that has User IDs that are given by number, but during account creation there are partial user records that are identified by username instead (which is also guaranteed to be unique). Someone writes

val users: Map[Any, UserRecord] = ...

It really should be Either[String, Long] or somesuch, but hey, it works.

Now suppose there is a set of admin users with predefined user numbers.

users.get(0)

Uh-oh. After the change to equality, the admin user lookups fail.
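To make the failure mode concrete, here is a minimal sketch of the comparison that changes (the `0L` admin key is hypothetical). Today, cooperative equality makes the boxed `Int` and `Long` compare equal; after the change, an `Any`-typed comparison would fall back to `equals`, and a boxed `Integer` never equals a boxed `Long`:

```scala
// With cooperative equality (current Scala), boxed 0 and 0L compare equal:
val cooperative = (0: Any) == (0L: Any)   // true today

// Plain equals, which lookups would effectively use after the change,
// distinguishes java.lang.Integer from java.lang.Long:
val strict = (0: Any).equals(0L: Any)     // false

println(cooperative) // true
println(strict)      // false
```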

In C# it doesn’t work in a generic context, but I don’t have enough experience with C# to really know whether there are equality pitfalls there.

In Java people do have problems with == vs. equals with stuff like

Long x = 150L; if (x != 150L) System.out.println("What the...?!");

at least judging from StackOverflow questions.

Java forces you to pay attention all the time to whether something is boxed or not in order to even know what method name to use. If you’re already doing that, it’s easy enough to cope with !((Object)1L).equals((Object)1) despite 1L == 1. Again, it’s not even the same method name!

The difference to Java is that in Java, it is always obvious whether a type
is unboxed or boxed, so at least people can more easily adapt to unboxed
and boxed types behaving differently.

But that would mean that e.g. HashMap could not use == anymore and would have to fall back on equals. We could do that but doing so would probably already cause most of the migration errors we would expect overall. So, if migration is our main concern, we might as well keep == for Any.

Perhaps you’re right–it wouldn’t be worth it to have the compiler help people catch errors in their own code when it’s usage of library code that is most likely to reveal the difference.

There’s no way we can drop == on Any or AnyRef. It plays a central role in almost every Scala program.

Why? IIRC, moving Any.== to Any.AnyOps does not break source-level compatibility.

The title of this thread was not meant as a rhetorical question. I started this thread because I was not sure whether I had all the arguments for co-operative equality. In the discussion that followed I did not see any new arguments for it, but several serious new arguments against.

Here’s the case against co-operative equality:

  • it is very slow
  • It is not an equivalence relation
  • It “breaks” the one operation that is fast and has a chance of being an equivalence: equals
  • It therefore “breaks” usage of Java collections from Scala.
  • it is a mess to specify correctly

“Break” means: We can construct examples where the outcome violates important laws.

Slow: Map get is at least twice as slow in Scala as in Java, because it has to use co-operative equality. Other operations are also affected.

Not an equivalence: The culprit here is NaN. The IEEE floating point standard mandates

NaN != NaN

and that’s what the JVM implements. One can have a philosophical discussion about whether that makes sense or not (and there are good arguments for both sides), but the fact is that we will not go against an established standard. The problem is then that with co-operative equality this irregularity, which was restricted to floating point comparisons only, now gets injected into our universal equality relation. I remember having seen bug reports about this. Users get bitten because

mutable.Map[Any, Int](NaN -> 1).get(NaN)

gives a None instead of a Some(1).
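For contrast, the JVM's boxed equals compares the underlying bits and does consider NaN equal to itself, which is why a java.util.HashMap has no trouble retrieving a NaN key:

```scala
// IEEE == says NaN is never equal to itself:
val ieee = Double.NaN == Double.NaN                                 // false

// but java.lang.Double.equals compares doubleToLongBits, so:
val boxed = java.lang.Double.valueOf(Double.NaN).equals(Double.NaN) // true
```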

Now things get ironic. People might turn to Java collections instead of Scala collections to solve the two problems above. Java collections are based on equals instead of ==. Unfortunately, cooperative equality means that equals in Scala is also broken! Consider:

scala> NaN equals NaN
res1: Boolean = true

scala> (NaN, 1) equals (NaN, 1)
res2: Boolean = false

Similarly, but dually,

scala> 1 equals 1L
res3: Boolean = false

scala> Some(1) equals Some(1L)
res4: Boolean = true

So, equals is not even a congruence anymore! In other words, our well-intentioned attempt to improve the API of == has actually ruined the API of equals! (and, no, there’s no easy way to fix this).

Breaks Java collections. The illogical implementation of equals is a problem if we want to use Java collections with Scala case classes as keys.

Messy to specify. That was my original complaint and I have already written too much about it.

For me the most enlightening comments in this thread were the one by @scottcarey where he showed that we need two notions of equality, one an equivalence and the other not, and the one by @LPTK where he showed the problems with equals.

So I am now convinced that we should do what we can to drop cooperative equality on Any (and by extension on all unbounded generic type parameters). As @Ichoran notes, the big problem here is migration. And I am not sure I have a good answer yet, except: try it out on large code bases and see what happens. Hopefully, the instances where the change matters will be few and far between.


For clarity: does this change mean that primitive 1 and 1L will be equal, but boxed 1 and 1L won’t?


That’s what it means, yes.

So these two methods would give different results for isOne(1L)? That’s also far from an ideal situation. Maybe better to go all the way and get rid of universal equality then.

def isOne(a: Long) = a == 1
def isOne[A](a: A) = a == 1

Perhaps even more confusing, specializing a class or method might also result in different behavior I guess.

def isOne[A](a: A) = a == 1

is a little bit odd. If you know it’s numeric, then you can do

def isOne[A: Numeric](a: A) = Numeric[A].toInt(a) == 1

instead. If it’s not numeric, then I’m not sure I get the point of the method.

For migration purposes, we could drop the usage of == in all case class code generation and in the Scala collections before taking other actions.

Another idea is making == a placeholder, whose implementation must be introduced by explicit importing.

import numericEquality._
Double.NaN == Double.NaN // false
(Double.NaN, 1) == (Double.NaN, 1) // Compilation error because there is no NumericEquality for Tuple2
null == Double.NaN  // Compilation error because there is no NumericEquality for Null
import structuralEquality._
Double.NaN == Double.NaN // true
(Double.NaN, 1) == (Double.NaN, 1) // true
null == Double.NaN // false
import referentialEquality._
Double.NaN == Double.NaN // Compilation error because Double is not an AnyRef
(Double.NaN, 1) == (Double.NaN, 1) // false
null == Double.NaN  // Compilation error because Double is not an AnyRef

structuralEquality can be implemented as equals with null checking, which is useful in case class code generation and Scala collections.
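A minimal sketch of what such a structuralEquality scope could look like. All names here are hypothetical, and since == itself cannot be redefined today, the sketch uses a === operator instead:

```scala
// Hypothetical structuralEquality: null-safe delegation to equals.
object structuralEquality {
  implicit class StructuralEq(private val self: Any) {
    def ===(that: Any): Boolean =
      if (self == null) that == null   // null is only equal to null
      else self.equals(that)           // boxed equals: NaN === NaN is true
  }
}

import structuralEquality._
Double.NaN === Double.NaN   // true
(null: Any) === Double.NaN  // false
```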

I agree it’s silly, but the point is that someone can write it and it will behave differently from what they expect because of boxing, which should be just an implementation detail.

Current:

// Scala
Set(1, 1.0).size // 1
Set(1, 1.0, 1L).size // 1

// Scala.js
Set(1, 1.0).size // 1
Set(1, 1.0, 1L).size // 1
1.isInstanceOf[Double] // true
1.isInstanceOf[Long] // false

// Java
Set<Object> set = new HashSet<>();
set.add(1);
set.add(1.0);
set.size();  // 2
set.add(1L);
set.size();  // 3

// Kotlin
setOf(1, 1.0).size // 2
setOf(1, 1.0, 1L).size // 3

// Kotlin-js
setOf(1, 1.0).size // 1
setOf(1, 1.0, 1L).size // 2, because Long in Kotlin is not mapped to any JavaScript object.
(1 as Any) is Double // true
(1 as Any) is Long // false

I think getting rid of cooperative equality is a good idea, because the behavior above in Scala would then match Java and Kotlin on the JVM.
Though the new behavior for collections on the JVM and in JavaScript becomes inconsistent, it is not bad to align with the platform's (JVM/JavaScript) collection behavior.

As some people have commented, reporting 1 == 1.0 as a compile error is a good idea IMHO. Kotlin doesn’t allow you to compare two different AnyVal instances (though Kotlin doesn’t have that type concept).

// kotlin
1 == 1.0 // compile error
1 as Any == 1.0 // false
1 == 1.0 as Any // false
1 as Int == 1.0 // compile error

fun <T> isOne(t: T): Boolean {
    return t == 1
}
isOne(1) // true
isOne(1.0) // false 

I think this is a better design, because if we can’t compare 1 to 1L, people will not be confused about why 1 == 1L is inconsistent with (1: Any) == (1L: Any), or about whether Set(1, 1L).size is 1 or 2. And we can get rid of 'A' == 65, as sjrd mentioned.
Though I know that letting 1 == 1L compile has better semantic consistency, the Kotlin way looks better in practice.
If some people still want to compare 1 to 1L, we can let them import something to enable this feature. We can also group this feature and any2stringadd into the same package, to tell people that the features in this package are not strict.

BigInt(1) == 1 being true looks like a bug to me, because BigInteger.valueOf(1) == 1 is false in Scala.
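This asymmetry is easy to check: scala.BigInt overrides equals to cooperate with primitive values, while java.math.BigInteger (an ordinary AnyRef) does not:

```scala
// scala.BigInt's equals special-cases primitives:
val scalaBig = BigInt(1) == 1                              // true

// java.math.BigInteger gets AnyRef's ==, i.e. plain equals:
val javaBig  = java.math.BigInteger.valueOf(1) == (1: Any) // false
```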


I think we need to look at it in another way. It’s not boxing that is to blame, but the interaction of overloading with type abstraction. Have a look at the example with === I gave towards the beginning of the thread. It behaves in exactly the same way as == without co-operative equality, yet there is no boxing. So this is a fundamental property (or shortcoming, depending on how you want to look at it) of systems that mix type abstraction and overloading.

I agree that it would probably be good to get rid of cooperative equality.

But I also agree with critiques of the fact that (1 == 1L) != ((1: Any) == 1L). This looks pretty surprising to me, so I’m in favor of disabling, or at least warning against, the numeric == overloads. As long as these overloads were just an optimization of the more general cooperative ==, they were fine; but without cooperative equality, their semantics become problematic.

It doesn’t matter that it makes perfect sense or that it’s regular from the point of view of language experts. It’s a problem because it flies in the face of intuition. It’s a problem for the same reason that the case classes with === described above are an anti-pattern. People should be discouraged from using anti-patterns (such as unrestricted implicit conversions, which are now behind a feature flag), so lifting that one anti-pattern into the language does not seem to be going in the right direction.

In IntelliJ, I can’t even control-click on == in 1 == 0.5. It is (or at least, it’s close to being) a built-in method, so a reasonable amount of magic is expected, but surprising behaviors are not. I’d wager that most current Scala users, regardless of expertise level, don’t even know that these overloads exist. The only reason I knew is that I’ve been working on a metaprogramming framework where they turned out to be relevant. That I only learned about them through metaprogramming is a good thing, because it means that overloading == was truly an implementation detail that people generally need not worry about. Making that gruesome detail suddenly relevant to all Scala users by making its behavior counter-intuitive will only add to the mental overhead that Scala developers already have to carry while working with the language.

Personally, I like statically-typed languages because the compiler and libraries can relieve some of the cognitive burden from my mind, not because they can add to it (via surprising overloads).


Going away from this might be ‘ok’ for library code, but not for user code. It’s a punch in the face of dynamic-feel applications where you don’t want the user to be confronted with 1f versus 1.0 versus 1.

Any user code that does not comprehend the difference between 1.0f, 1.0 and 1 is bound to be broken. I see making

def f: Float
f == 1L

fail to compile as a very, very good thing. For example,

scala> def f: Float = 1.2f - 1 
f: Float

scala> f * 5 == 1
res5: Boolean = false

scala> f * 5
res6: Float = 1.0000002

Equality on floating point numbers is an anti-pattern for most users, who don’t know enough about them to avoid traps like the one above.

Users will be confronted with 1.0f vs 1 whether they like it or not.


x == 150L returns true in Java. It won’t if one side uses new Long, though. Auto-boxing uses Long.valueOf, and 150 is guaranteed by the spec to be a flyweight instance because its magnitude is small. It would break if the Long were large enough, and this is a real problem in Java.

Anyhow, most of the problems with Java are due to auto-boxing peculiarities that are not relevant for Scala if implemented well.
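The flyweight behavior is observable from Scala, too. The JLS only guarantees caching of boxed values in -128..127, so reference equality of boxes silently depends on magnitude (the small-value assertion below is guaranteed by the spec; larger values are usually, but not necessarily, fresh boxes):

```scala
val a = java.lang.Long.valueOf(127L)   // within the guaranteed cache range
val b = java.lang.Long.valueOf(127L)
val c = java.lang.Long.valueOf(1000L)  // outside the guaranteed range
val d = java.lang.Long.valueOf(1000L)

a eq b  // true: cached flyweights, guaranteed by the JLS
c == d  // true: equals-based comparison is magnitude-independent
c eq d  // usually false: typically fresh boxes (unspecified by the JLS)
```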

you are thinking that a user is someone writing a regular Scala program. But that’s not the case in a REPL, in a DSL, in a Jupyter-type worksheet, etc. And I was specifically talking about float, double, int, i.e. leaving away the f suffix.

best, .h.h.

Absolutely. The boxing side of things is behind the scenes and not fundamental. In Java’s case, auto-boxing sometimes causes a surprise switch from reference equality to numeric equality; Scala can avoid that.

We know a few important things:

AnyRef/Object must expose two notions of equality

  • Equivalence Relation (of which, structural is a subset)
  • Reference equality (which is also a valid equivalence relation)

This leads to a simple solution:

final def == (that: Any): Boolean  =
  if (null eq this) null eq that else this equals that

And thus, for AnyRef, == is simply short-hand for null-safe equivalence.

Numerics must expose two notions of equality

  • An Equivalence relation, consistent with total Order
  • IEEE numeric equivalence, consistent with total order for Integral types and partial order for floating point types

And here is where the dilemma lies. For reference types, the ‘secondary’ notion of equivalence is a valid equivalence relation. This makes a default implementation for equals simple. However, IEEE numeric equivalence (and partial order) is not suitable for collection keys, be it for hash-based or order-based key identification. There is no default implementation that can satisfy both notions of equivalence for numeric values.

If we want to avoid Java’s issues, where == means one thing in one place and something entirely different in another (numerics vs. references; numerics get boxed and suddenly == means something else), then the choice for == on numerics should be the same as for references: a null-safe equivalence relation. AnyVal types are not null, but they are sometimes boxed and could be null at runtime, so the null-safe aspect is important at least as an implementation detail. But otherwise it can be the same as with AnyRef; we just need to specify equals for each numeric type N as

def equals(that: Any): scala.Boolean = that match {
  case that: N => compare(that) == 0
  case _       => false
}

Or in English, equals is consistent with the total ordering if the type matches, and false otherwise. On the JVM this is effectively the contents of each Java boxed type’s equals method.
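As a quick sanity check against the JVM's existing boxed types (this is current behavior, not something new):

```scala
// Boxed equals is exact on the nominal type, per the scheme above:
java.lang.Long.valueOf(1L).equals(1L)  // true: Long vs Long
java.lang.Long.valueOf(1L).equals(1)   // false: Long vs Integer
```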

But what about IEEE?

The above defines == to be a null-safe equivalence. This is incompatible with IEEE’s == for floating point numbers. It is, however, consistent with IEEE ==, <, <= for integer types. I propose that we implement the IEEE 754 total ordering for floating point numbers in these cases (what Java’s compareTo on the boxed types does). In short, NaN == NaN. After all, most users would expect that. Also, it is very fast: comparison can be done by converting to integer bits and then comparing the integers, at the assembly level just a two’s complement integer compare op.
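On the JVM this ordering is essentially what java.lang.Double.compare already provides (assuming we are happy with its treatment of NaN and signed zero):

```scala
// NaN equals itself and sorts above everything under the total order:
java.lang.Double.compare(Double.NaN, Double.NaN) == 0              // true
java.lang.Double.compare(Double.NaN, Double.PositiveInfinity) > 0  // true
// -0.0 sorts strictly below 0.0:
java.lang.Double.compare(-0.0, 0.0) < 0                            // true
// while IEEE == disagrees on both counts:
Double.NaN == Double.NaN                                           // false
-0.0 == 0.0                                                        // true
```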

I would not find it strange to specify that ==, <=, >=, <, and > in Scala represent equivalence and total order by default. It is what we do for reference types, so why not numerics? That breaks away from Java, but I suspect it is more intuitive to most users and more consistent in the language. It is certainly more useful for collections.

For users who are more advanced with floating point, they can pull out the IEEE tools. It is only with floating point types where there is a gap and we need parallel notions of equality to go with partial order.

The users that truly want IEEE semantics for numeric operations on floating point values must know what they are doing to succeed in writing an algorithm that depends on NaN != NaN anyway. For them, switching to some other syntax for IEEE will not be difficult. Perhaps there are different symbols, perhaps different names, or perhaps an import will switch a scope to the IEEE definitions.

Proposal

| Expression | Scala 2.11 | Proposal |
|---|---|---|
| `1.0 == 1L` | `true` | `false` or will not compile |
| `(1.0: Any) == 1L` | `true` | `false` |
| `(1.0: Any).equals(1L)` | `false` | `false` |
| `Double.NaN == Double.NaN` | `false` | `true` |
| `Double.NaN.equals(Double.NaN)` | `true` | `true` |
| `1.0F < Float.NaN` | `false` | `true` |
| `1.0F > Float.NaN` | `false` | `false` |
| `Set(1.0F, 1, 1L).size` | `1` | `3` |
| `Map(Double.NaN -> "hi").size` | `1` | `1` |
| `Map(Double.NaN -> "hi").get(Double.NaN)` | `None` | `Some(hi)` |
| `TreeMap(Double.NaN -> "hi").get(Double.NaN)` | `Some(hi)` | `Some(hi)` |
| `(1.0, 1) == (1, 1.0)` | `true` | `false` |
| `Some(1.0) == Some(1)` | `true` | `false` |
| `List(1, 1.0) == List(1, 1)` | `true` | `false` |
| `BigInt(1) == 1` | `true` | `false` or will not compile |
| `UnsignedInt(1) == 1` | N/A | `false` or will not compile |
| `Left(1) == Right(1)` | `false` w/ warning | will not compile? |
| `List(1) == Vector(1)` | `true` | ??? |

The proposal boils down to a couple rules for consistency:

Two values of different nominal types are never equal. This holds for case classes today; the proposal makes it work consistently across tuples, case classes, and plain numeric types. The compiler can report an error when a result is guaranteed to be false due to mismatched types. It would be consistent with Valhalla. I don’t have an opinion on what to do with List(1) == Vector(1); that is more collection design than language.

For use cases where we want to compare across types in a cooperative way (perhaps the DSL / worksheet use case mentioned) one can either provide different methods, or use an import to switch the behavior. Or perhaps there are better ideas.

equals is consistent with ==. This leaves the definition of == as short-hand for null-safe equals (an equivalence relation) consistent with the Ordering for a type. The consequence is that NaN == NaN, and the default behavior conforms to the use of values as collection keys. Every other option I thought of was far more inconsistent overall. Give up on NaN != NaN and the rules are clean and consistent. Otherwise you have to carve out an exception for floating point numbers and have collections avoid using == in some cases, or make equals inconsistent with ==.

Combined, these two rules would make it much simpler to extend numeric types, and add things like UnsignedInt – there is no quadratic explosion of complexity if equivalence is not cooperative.


kudos for making the table, which really is the most important thing in this thread
imo the desired outcome would be for those samples to not compile, but the other result is okay too
i’m sure this would not have much impact even on unityped programs (contentious NaN cases included)