Can we get rid of cooperative equality?

NthPortal · September 17, 2017, 5:41am

I think it’s worth noting that in Java, 1 == 1L is actually ((long) 1) == 1L, so it’s a (rather mundane) comparison of two longs. Comparing int and boolean (for example) is not allowed, which leads me to think that if, hypothetically, one could prevent the compiler from converting the int to a long, 1 == 1L would not be allowed either. All of this is to say that, if ((any) 1) == ((any) 1L) was somehow valid Java code, I think it would probably yield false, because the types are not comparable.

It is my impression from this discussion that, unlike Java, Scala treats 1 == 1L as a call to a method Int.==(Long), and not as 1.toLong == 1L. If the latter were the case, I think it would be obvious to say that (1: Any) != (1L: Any). However, even in the former case, it’s not clear to me that Any.==(Any) should jump through hoops to call Int.==(Long) if it’s an Int and its argument is a Long.
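To make the three comparisons in question concrete, here is a minimal sketch of how they behave while cooperative equality is in place (assuming current Scala 2.12/2.13 semantics; the object and value names are only illustrative):

```scala
object CooperativeEquality {
  // Statically typed: resolves to the overload Int.==(Long), a primitive comparison.
  val primitive: Boolean = 1 == 1L

  // Both sides widened to Any: Any.== is used, which cooperates across boxed numeric types.
  val viaAny: Boolean = (1: Any) == (1L: Any)

  // equals on the boxed value: java.lang.Integer.equals rejects a java.lang.Long.
  val viaEquals: Boolean = (1: Any).equals(1L)

  // ## is the hash that is kept consistent with cooperative ==.
  val hashesAgree: Boolean = (1: Any).## == (1.0: Any).##
}
```

Here primitive, viaAny and hashesAgree are true, while viaEquals is false. Dropping cooperative equality would make viaAny follow viaEquals.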

Sciss · September 17, 2017, 10:14am

Going away from this might be ‘ok’ for library code, but not for user code. It’s a punch in the face of dynamic-feel applications where you don’t want the user to be confronted with 1f versus 1.0 versus 1.

LPTK · September 17, 2017, 3:44pm

However, this does not solve the problem for composite keys like in Map[(Int,Int),Int] – when keys are case classes or tuples, they will still compare their components and compute hashes with == and ##… though I’m not sure how often people actually use composite keys.

> (1.0:Any) equals (1:Any)
res0: Boolean = false
> (1.0:Any, 2.0:Any) equals (1:Any, 2:Any)
res1: Boolean = true

This doesn’t seem consistent. For it to be, equals on Product classes should rely on equals of their components, and == should rely on == similarly.

odersky · September 17, 2017, 4:08pm

I see we have a disconnect on what semantic consistency and elegance means. What I mean by that is: “Be able to describe what the language does with as few principles as possible”. Pre co-operative equality we had

1. There is a method == in Any, defined as follows:

final def == (that: Any): Boolean  =
  if (null eq this) null eq that else this equals that

2. There are overloaded methods in numeric classes that define == for specific combinations of types.

From a spec perspective, (2) could be seen as libraries, so I really care only about (1), which is simple enough. Post co-operative equality things got much messier. One sign of this is that the spec is now actually wrong, in the sense that it does not describe what the actual implemented behavior is (see my earlier post). Even if we fixed the spec, it would have to specify the == method on Any as something like this:

final def ==(that: Any): Boolean  = this match {
  case this: Byte =>
    that match {
      case that: Byte   => this == that
      case that: Short  => this == that
      case that: Char   => this == that
      case that: Int    => this == that
      case that: Long   => this == that
      case that: Float  => this == that
      case that: Double => this == that   
      case _ => false
    }
  case this: Short =>
    ... 
    ... same as for Byte for all other numeric types
    ...
  case _ =>     
    if (null eq this) null eq that else this equals that
}

I guess you agree that’s cringeworthy. It’s bloated, and anti-modular in that it ties the definition of Any to the precise set of supported numeric classes. If we ever wanted to come back and add another numeric class, the definition would be invalid and would have to be rewritten. We could try to hide the complexity by specifying that == should behave like a multi-method. But that means we pull a rabbit out of our hat, because Scala does not have multi-methods. That’s actually another good illustration of the difference between (1) and (2). Multi-methods are very intuitive, so from the viewpoint of (1) they are desirable. But adding them to the semantics would be a huge complication.

odersky · September 17, 2017, 4:22pm

However, this does not solve the problem for composite keys like in Map[(Int,Int),Int] – when keys are case classes or tuples, they will still compare their components and compute hashes with == and ##…

Thanks for this observation! So because of co-operative equality equals and hashCode now turn out to be broken as well! This is a great demonstration that complexity breeds further complexity.

I think equals and hashCode for case classes need to be defined in terms of themselves. It’s weird that they should forward to == and ##. But of course, that would mean we need four instead of two methods per case class to implement equality and hashing.

Unless, of course, we get rid of co-operative equality. It seems the case for doing this gets ever stronger.

In light of this development we might actually need to do this for 2.13. The problem is that we cannot fix the new collections to be more performant and NaN safe without also fixing the generation of equals and hashCode for case classes.

NthPortal · September 17, 2017, 4:33pm

An interesting thing I discovered is that, if cooperative equality is removed (without changing anything else), symmetry for == will be broken for a small number of cases. Specifically:

> (1: Any) == (BigInt(1): Any)
res0: Boolean = false
> (BigInt(1): Any) == (1: Any)
res1: Boolean = true
> (1: Any) == (BigDecimal(1): Any)
res2: Boolean = false
> (BigDecimal(1): Any) == (1: Any)
res3: Boolean = true

In fact, the above cases already violate symmetry for equals (is that a bug?)

scala> 1 equals BigInt(1)
res4: Boolean = false

scala> BigInt(1) equals 1
res5: Boolean = true

(I’m not saying this is a reason to keep cooperative equality; I’m only noting that it may add complications.)

odersky · September 17, 2017, 5:48pm

In fact, the above cases already violate symmetry for equals (is that a bug?)

I would say, yes. If we want to stay consistent, we should have

 BigInt(1) == 1      == true     
 1 == BigInt(1)      == true
 BigInt(1).equals(1) == false
 1.equals(BigInt(1)) == false

NthPortal · September 17, 2017, 7:33pm

BigInt(1) == 1      == true

BigInt(1).equals(1) == false

Aren’t BigInt(1) == 1 and BigInt(1).equals(1) equivalent, assuming BigInt(1) isn’t null (which it isn’t)?

Ichoran · September 18, 2017, 1:52am

I agree with that!

I agree they’re messier in code. But the principle is really simple:

Equality behaves the same way regardless of context for standard library types.

You can rewrite it in pseudocode as

forall[A, B, C >: A, D >: B]{ (a: A) == (b: B) iff (c: C) == (d: D) }

if you want a formula. Despite the simple principle and simple formula, though, it’s quite hairy to implement it.

I do agree with all that. Despite being really nice to work with at the user level, it makes certain parts of the implementation very awkward to adjust, effectively freezing that part of the language in stone (or at least greatly raising the barrier to make changes, e.g. with the unsigned numeric types).

But I think the solution, if any, has to be to drop == on Any, because

I can’t think of any other way we can catch behavioral changes in existing code. Working code will just randomly and surprisingly fail (hopefully rarely!) as the behavior shifts, otherwise.
I can’t think of any other way that won’t create a perpetual source of bugs as people try equality in the context of different type information and get different results. (We can’t forbid overloading equals, but we certainly can have linters flag this as almost surely wrong and confusing.)

Do you have a better way to catch behavioral changes in existing code and prevent bugs in future code?

We could add a typeclass that would re-enable == on Any; effectively

trait Equivalence[A] {
  def apply(lhs: A, rhs: A): Boolean
}

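// Hypothetical: only meaningful once == is no longer a member of Any.
// A value class takes exactly one parameter, so the implicit moves to the method.
implicit class AnyHasEquals(val a: Any) extends AnyVal {
  def ==(that: Any)(implicit eql: Equivalence[Any]): Boolean = eql(a, that)
}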

One of the implicits one could use could forward to scala.runtime.BoxesRunTime.equals and then the existing behavior would continue (with speed penalty, but at least you can control your destiny then).

Of course, we’d have to make sure that this was typically zero-cost, to meet speed requirements. Implicit AnyVals leave a lot of crud behind in the bytecode presently.
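A rough sketch of what two such instances could look like in today's Scala. Since == cannot be redefined on Any at present, the sketch uses a hypothetical === operator instead; the names cooperative, strict and AnyHasTripleEquals are made up for illustration, and only scala.runtime.BoxesRunTime.equals is an existing runtime method:

```scala
object EquivalenceSketch {
  trait Equivalence[A] {
    def apply(lhs: A, rhs: A): Boolean
  }

  // Keeps the existing behavior by forwarding to the runtime helper
  // that backs cooperative == on boxed values.
  val cooperative: Equivalence[Any] = new Equivalence[Any] {
    def apply(lhs: Any, rhs: Any): Boolean =
      scala.runtime.BoxesRunTime.equals(lhs.asInstanceOf[AnyRef], rhs.asInstanceOf[AnyRef])
  }

  // Non-cooperative alternative: null-safe delegation to equals.
  val strict: Equivalence[Any] = new Equivalence[Any] {
    def apply(lhs: Any, rhs: Any): Boolean =
      if (lhs == null) rhs == null else lhs.equals(rhs)
  }

  // Hypothetical stand-in for the re-enabled == (named === to avoid clashing with Any.==).
  implicit class AnyHasTripleEquals(private val a: Any) extends AnyVal {
    def ===(that: Any)(implicit eql: Equivalence[Any]): Boolean = eql(a, that)
  }
}
```

With cooperative in implicit scope, (1: Any) === (1L: Any) holds; with strict in scope it does not, so the choice of instance is what decides the behavior.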

odersky · September 18, 2017, 4:46pm

There’s no way we can drop == on Any or AnyRef. It plays a central role in almost every Scala program.

But migration is indeed the crux of the matter. Would it be too risky to revert now? I don’t remember any breakage when we introduced co-operative equality (was it in 2.8?) so at least at the time few programs cared either way. I do remember people being bitten by NaN in collections, but here the situation would improve if we reverted.

Maybe we could put the new behavior under a command-line switch and try to do the community build with the new option? That would give us some indication how widespread problems would be.

I can’t think of any other way that won’t create a perpetual source of bugs as people try equality in the context of different type information and get different results. (We can’t forbid overloading equals, but we certainly can have linters flag this as almost surely wrong and confusing.)

I don’t think this will be much of an issue. Somehow people have no problem in Java or C# with this, nor do I remember our users having had a problem in Scala before we introduced the change.

soronpo · September 18, 2017, 5:32pm

An idea, but please forgive me if it is ridiculous. What if == returned a different boolean type when used between unrelated numerics? That would mean 1 == 1L returns a JBoolean. If someone wants to support cooperative equality, they can import a JBoolean => Boolean implicit.

Ichoran · September 18, 2017, 5:33pm

You wouldn’t drop it on AnyRef; that’s well-defined already to be non-cooperative. Just on Any. You can always .asInstanceOf[AnyRef] when you need non-cooperative equality on Any (and a typeclass could make that better).

That sounds like one reasonable way to get some data on how widespread problems are.

The thing is, I don’t expect the problems to be very widespread, just rather dire; and they would tend to occur in places where people have done things which are valid but not best practice (hopefully rare in the community build).

For example, suppose there is a site that has User IDs that are given by number, but during account creation there are partial user records that are identified by username instead (which is also guaranteed to be unique). Someone writes

val users: Map[Any, UserRecord] = ...

It really should be Either[String, Long] or somesuch, but hey, it works.

Now suppose there is a set of admin users with predefined user numbers.

users.get(0)

Uh-oh. After the change to equality, the admin user lookups fail.
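A small sketch of the scenario, assuming the IDs are stored as Long and the lookup is written with an Int literal. The UserRecord fields and the sample keys are invented for illustration, and a java.util.HashMap stands in for an equals-based lookup, which is roughly what a non-cooperative Map would do:

```scala
object AdminLookup {
  final case class UserRecord(name: String)

  // Keys are Long IDs for full accounts and usernames for partial ones.
  val users: Map[Any, UserRecord] = Map(
    0L -> UserRecord("root"),
    "pending-alice" -> UserRecord("alice")
  )

  // Int literal against a Long key: found today because the Scala Map compares keys with ==.
  val viaCooperative: Option[UserRecord] = users.get(0)

  // java.util.HashMap compares keys with equals, so a boxed Integer never matches a boxed Long.
  val javaUsers = new java.util.HashMap[Any, UserRecord]()
  javaUsers.put(0L, UserRecord("root"))
  val viaEquals: Option[UserRecord] = Option(javaUsers.get(0))
}
```

Under the current rules viaCooperative is Some(UserRecord("root")) and viaEquals is None. The worry is that the first lookup would silently start behaving like the second.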

In C# it doesn’t work in generic context, but I don’t have enough experience with C# to really know whether there are equality pitfalls there.

In Java people do have problems with == vs. equals with stuff like

Long x = 150L, y = 150L; if (x != y) System.out.println("What the...?!");

at least judging from StackOverflow questions.

Java forces you to pay attention all the time to whether something is boxed or not in order to even know what method name to use. If you’re already doing that, it’s easy enough to cope with !((Object)1L).equals((Object)1) despite 1L == 1. Again, it’s not even the same method name!

curoli · September 18, 2017, 6:53pm

The difference to Java is that in Java, it is always obvious whether a type is unboxed or boxed, so at least people can more easily adapt to unboxed and boxed types behaving differently.

odersky · September 18, 2017, 7:09pm

But that would mean that e.g. HashMap could not use == anymore and would have to fall back on equals. We could do that but doing so would probably already cause most of the migration errors we would expect overall. So, if migration is our main concern, we might as well keep == for Any.

Ichoran · September 18, 2017, 7:58pm

Perhaps you’re right–it wouldn’t be worth it to have the compiler help people catch errors in their own code when it’s usage of library code that is most likely to reveal the difference.

yangbo · September 18, 2017, 9:33pm

There’s no way we can drop == on Any or AnyRef. It plays a central role in almost every Scala program.

Why? IIRC, moving Any.== to Any.AnyOps does not break source-level compatibility.

odersky · September 21, 2017, 8:02am

The title of this thread was not meant as a rhetorical question. I started this thread because I was not sure whether I had all the arguments for co-operative equality. In the discussion that followed I did not see any new arguments for it, but several serious new arguments against.

Here’s the case against co-operative equality:

It is very slow
It is not an equivalence relation
It “breaks” the one operation that is fast and has a chance of being an equivalence: equals
It therefore “breaks” usage of Java collections from Scala.
It is a mess to specify correctly

“Break” means: We can construct examples where the outcome violates important laws.

Slow: Map get is at least twice as slow in Scala as in Java because it has to use co-operative equality. Other operations are also affected.

Not an equivalence: The culprit here is NaN. The IEEE floating point standard mandates

NaN != NaN

and that’s what the JVM implements. One can have a philosophical discussion whether that makes sense or not (and there are good arguments for both sides), but the fact is that we will not go against an established standard. The problem is then that with co-operative equality this irregularity, which was restricted to floating point comparisons only, now gets injected into our universal equality relation. I remember having seen bug reports about this. Users get bitten because

mutable.Map[Any, Int](NaN -> 1).get(NaN)

gives a None instead of a Some(1).
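For comparison, a minimal sketch that puts the same NaN key into a Scala mutable.Map and into a java.util.HashMap (assuming current Scala 2.12/2.13 collections; the object and value names are illustrative only):

```scala
import scala.collection.mutable

object NaNLookup {
  val NaN: Double = Double.NaN

  // Scala's map compares keys with cooperative ==, which follows IEEE for
  // boxed doubles, so a freshly boxed NaN never matches the stored NaN key.
  val scalaMap: mutable.Map[Any, Int] = mutable.Map[Any, Int](NaN -> 1)
  val scalaHit: Option[Int] = scalaMap.get(NaN)

  // java.util.HashMap compares keys with equals, and
  // java.lang.Double.equals treats NaN as equal to itself.
  val javaMap = new java.util.HashMap[Any, String]()
  javaMap.put(NaN, "one")
  val javaHit: Boolean = javaMap.containsKey(NaN)
}
```

The Scala lookup comes back empty, while the Java map finds the key.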

Now things get ironic. People might turn to Java collections instead of Scala collections to solve the two problems above. Java collections are based on equals instead of ==. Unfortunately, cooperative equality means that equals in Scala is also broken! Consider:

scala> NaN equals NaN
res1: Boolean = true

scala> (NaN, 1) equals (NaN, 1)
res2: Boolean = false

Similarly, but dually,

scala> 1 equals 1L
res3: Boolean = false

scala> Some(1) equals Some(1L)
res4: Boolean = true

So, equals is not even a congruence anymore! In other words, our well-intentioned attempt to improve the API of == has actually ruined the API of equals! (and, no, there’s no easy way to fix this).

Breaks Java collections. The illogical implementation of equals is a problem if we want to use Java collections with Scala case classes as keys.

Messy to specify. That was my original complaint and I have already written too much about it.

For me the most enlightening comments in this thread were the one by @scottcarey where he showed that we need two notions of equality, one an equivalence and the other not, and the one by @LPTK where he showed the problems with equals.

So I am now convinced that we should do what we can to drop cooperative equality on Any (and by extension on all unbounded generic type parameters). As @Ichoran notes, the big problem here is migration. And I am not sure I have a good answer yet, except, try it out on large code bases and see what happens. Hopefully, the instances where the change matters will be few and far between.

dwijnand · September 21, 2017, 8:31am

For clarity: does this change mean that primitive 1 and 1L will be equal, but boxed 1 and 1L won’t?

odersky · September 21, 2017, 4:25pm

That’s what it means, yes.

Jasper-M · September 21, 2017, 4:45pm

So these two methods would give different results for isOne(1L)? That’s also far from an ideal situation. Maybe better to go all the way and get rid of universal equality then.

def isOne(a: Long) = a == 1
def isOne[A](a: A) = a == 1

Perhaps even more confusing, specializing a class or method might also result in different behavior I guess.
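A sketch of that divergence, with the two overloads renamed so each can be called explicitly. isOneEquals is a hypothetical stand-in for what the generic version would do once == on an unbounded type parameter stops being cooperative; it assumes a non-null argument:

```scala
object IsOne {
  // Static type Long: resolves to Long.==(Int), a primitive comparison.
  def isOneLong(a: Long): Boolean = a == 1

  // Unbounded type parameter: goes through Any.==, cooperative today.
  def isOneGeneric[A](a: A): Boolean = a == 1

  // Stand-in for the proposed behavior: plain equals on the boxed value (a must not be null).
  def isOneEquals[A](a: A): Boolean = a.equals(1)
}
```

Today isOneLong(1L) and isOneGeneric(1L) both return true. isOneEquals(1L) returns false although isOneEquals(1) returns true, which is the split the generic method would inherit.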