Can we get rid of cooperative equality?

odersky · September 15, 2017, 8:14am

Scala has co-operative equality. This means that equality between numeric values is kept the same if the values are abstracted to Any:

scala> 1 == 1L
res0: Boolean = true

scala> (1: Any) == (1L: Any)
res1: Boolean = true

scala> (1: Any).equals(1L: Any)
res2: Boolean = false

The transcript shows that equality == on Any is not the same as equals. Indeed, the == operator is treated specially by the compiler and leads to quite complicated code sequences. The same holds for the hash operator ##, which is also more complex than hashCode. This has a price: it’s the primary reason why most sets and maps in Scala are significantly slower than equivalent data structures in Java (factors of up to 5 were reported, but I won’t vouch for their accuracy).
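A small sketch of the distinction drawn in this paragraph, comparing Scala’s == and ## with the underlying equals and hashCode on boxed values. The object and value names are illustrative. The results rely on the runtime normalizing the hash of a Double that holds an integral value to that integer’s hash, which is what keeps ## consistent with cooperative ==.

```scala
object CooperativeDemo {
  val i: Any = 1   // boxed as java.lang.Integer
  val d: Any = 1.0 // boxed as java.lang.Double

  // Cooperative ==: compares the numeric values, so this is true.
  val coopEq: Boolean = i == d
  // Plain equals: java.lang.Integer.equals rejects a Double, so this is false.
  val plainEq: Boolean = i.equals(d)

  // ## agrees with cooperative ==: both boxes hash to the same Int.
  val coopHashSame: Boolean = i.## == d.##
  // hashCode follows Java: Integer(1) hashes to 1, Double(1.0) does not.
  val plainHashSame: Boolean = i.hashCode == d.hashCode
}
```

Every hash-based Scala collection has to go through the ## path, which is where the extra cost comes from.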

Now, why was cooperative equality added to Scala? This was not my idea, so I can only try to reconstruct the motivation. I believe the main reason was that it was felt that 1 == 1L should be the same as (1: Any) == (1L: Any). In other words, boxing should be transparent.

The problem with this reasoning is that it tries to “paper over” the true status of == in Scala. In fact, == is an overloaded method. There is one version on Any, and others on Int, Long, Float, and so on. If we look at it in detail, the method called for 1 == 1L is this one, in class Int:

def ==(x: Long): Boolean

If we write (1: Any) == (1L: Any), it’s a different == method that is called. This used to be just the method postulated on Any:

final def == (that: Any): Boolean  =
  if (null eq this) null eq that else this equals that
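To see what the quoted definition yields on its own, here is a sketch that writes it as an ordinary function. The name plainEq is hypothetical, and because eq is only defined on AnyRef, the arguments are cast first. Under this definition alone, (1: Any) == (1L: Any) would be false, since java.lang.Integer.equals rejects a java.lang.Long.

```scala
object PlainAnyEquality {
  // Hypothetical: the pre-cooperative Any#== written as a plain function.
  def plainEq(x: Any, y: Any): Boolean = {
    val xr = x.asInstanceOf[AnyRef]
    val yr = y.asInstanceOf[AnyRef]
    if (xr eq null) yr eq null // null is only equal to null
    else xr.equals(yr)         // otherwise defer to the box's equals
  }
}
```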

But with co-operative equality, we assume there’s an override of this method for numeric value types. In fact, the SLS is wrong in the way this is specified. It says that equals is overridden for numeric types as follows:

That is, the equals method of a numeric value type can be thought of being defined as follows:

def equals(other: Any): Boolean = other match {
  case that: Byte   => this == that
  case that: Short  => this == that
  case that: Char   => this == that
  case that: Int    => this == that
  case that: Long   => this == that
  case that: Float  => this == that
  case that: Double => this == that
  case _ => false
}

This is demonstrably false:

scala> 1.equals(1L)
res3: Boolean = false

So, the conclusion seems to be that the compiler somehow treats == on Any as a combination of the numeric equals with the fallback case of general equals for non-numeric types.
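The combination described above can be sketched as an ordinary function. This is a hypothetical approximation (the name coopEquals and the ranking helpers are assumptions chosen to mirror observable behavior), not the compiler’s actual code path, which calls into runtime helpers in scala.runtime. Integral values, including Char, are compared as Long; if the widest side is a Float, both are compared as Float; if either side is a Double, both are compared as Double; anything non-numeric falls back to the null check and equals from Any. Unlike the real runtime, the sketch ignores BigInt and BigDecimal.

```scala
object CoopSketch {
  // Hypothetical ranking of boxes: 0 = integral (incl. Char), 1 = Float, 2 = Double, -1 = not numeric.
  private def rank(v: Any): Int = v match {
    case _: Byte | _: Short | _: Char | _: Int | _: Long => 0
    case _: Float                                        => 1
    case _: Double                                       => 2
    case _                                               => -1
  }

  private def toLong(v: Any): Long = v match {
    case b: Byte  => b.toLong
    case s: Short => s.toLong
    case c: Char  => c.toLong
    case i: Int   => i.toLong
    case l: Long  => l
    case _        => throw new IllegalArgumentException(s"not integral: $v")
  }

  private def toFloat(v: Any): Float = v match {
    case f: Float => f
    case other    => toLong(other).toFloat
  }

  private def toDouble(v: Any): Double = v match {
    case f: Float  => f.toDouble
    case d: Double => d
    case other     => toLong(other).toDouble
  }

  /** Hypothetical approximation of Any#== for boxed values. */
  def coopEquals(x: Any, y: Any): Boolean = {
    val rx = rank(x)
    val ry = rank(y)
    if (rx < 0 || ry < 0) {
      // At least one side is not numeric: Any's null check, then plain equals.
      val xr = x.asInstanceOf[AnyRef]
      if (xr eq null) y.asInstanceOf[AnyRef] eq null else xr.equals(y)
    } else {
      math.max(rx, ry) match {
        case 0 => toLong(x) == toLong(y)     // both integral: compare as Long
        case 1 => toFloat(x) == toFloat(y)   // widest is Float: compare as Float
        case _ => toDouble(x) == toDouble(y) // widest is Double: compare as Double
      }
    }
  }
}
```

Even this simplified version needs several type tests per call, which gives a feel for why the real == and ## are slower than a single virtual equals.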

The question is: Do we want to keep it that way? The current treatment seems to be both irregular and expensive. Are there other benefits that I have overlooked? And, how difficult would it be to move away from cooperative equality?

retronym · September 15, 2017, 9:12am

For reference, some prior discussions on equality: http://www.scala-lang.org/old/node/9423

The start of the change to the new scheme for 2.8: https://github.com/scala/scala/commit/b7772a6535b1d3989ad350069568b124619f2877

I’m also having trouble finding a particular example that makes the inconsistency between boxed and primitive hashing/equality harder to stomach in Scala than it is in Java.

My intuition is the ability to use primitives as type arguments sends a signal that Some[Long](x) == Some[Int](y) is morally equivalent to x == y. In Java, you’d have to explicitly use the box type as the type argument.
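The point about type arguments can be checked directly: generic containers compare their elements with ==, so the cooperative semantics show through Option and List. The object and value names are illustrative; the last value shows the boxed comparison Java itself would make.

```scala
object BoxedContainers {
  val longBox: Option[Long] = Some(1L)
  val intBox: Option[Int] = Some(1)

  // The payloads are compared as Any, i.e. cooperatively, so this is true.
  val someEq: Boolean = (longBox: Any) == (intBox: Any)
  // List equality compares elements with == as well.
  val listEq: Boolean = (List(1L, 2L): Any) == (List(1, 2): Any)

  // Java's boxed equality, for contrast: Long.equals(Integer) is false.
  val javaEq: Boolean = java.lang.Long.valueOf(1L).equals(java.lang.Integer.valueOf(1))
}
```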

lrytz · September 15, 2017, 9:33am

Just for reference, in Java

jshell> 1 == 1L
$1 ==> true

jshell> new Integer(1).equals(new Long(1))
$3 ==> false

sjrd · September 15, 2017, 9:48am

If 1 == 1L is true, then I strongly believe that (1: Any) == (1L: Any). However, nothing says that 1 == 1L needs to be true! We can instead make it false, or, even better, a compile error (this is achieved very easily with an @compileTimeOnly annotation on the forbidden overloads). If we had a clean slate, and Scala only compiled to the JVM and/or native code, I would 100% champion this specification. Btw, 'A' == 65 is an aberration.
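The @compileTimeOnly approach can be sketched on a user-defined type. The classes Meters and Feet are hypothetical; the annotation is scala.annotation.compileTimeOnly, which turns any reference to the annotated method that survives type checking into a compile error. The permitted overload works as usual, while a call such as Meters(1) === Feet(1) would be rejected at compile time, so it is deliberately left out of the runnable code.

```scala
import scala.annotation.compileTimeOnly

// Hypothetical unit types, used only to illustrate the mechanism.
final case class Feet(value: Int)

final case class Meters(value: Int) {
  // The permitted comparison between values of the same type.
  def ===(that: Meters): Boolean = value == that.value

  // Forbidden mixed-type overload: any call that resolves here fails to compile.
  @compileTimeOnly("compare Meters with Feet only after an explicit conversion")
  def ===(that: Feet): Boolean = ???
}
```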

Now, as the Scala.js author, I need to point out that cooperative equality among numeric types (not including Char) was a blessing for Scala.js. Indeed, Scala.js does not box primitive numbers, both for performance reasons and for interoperability with JavaScript. This means that (1: Any) and (1.0: Any) are indistinguishable at run-time (Longs are special), and that means that (1: Any) == (1.0: Any) just has to be true in Scala.js; there is no way around it. If cooperative equality is dropped on the JVM (and Native), this would make == between numeric types inconsistent across platforms.

That said, I do not think it is reason enough to prevent Scala/JVM from fixing this very bad performance bug. After all, primitive numeric types in Scala.js are already inconsistent with Scala/JVM when upcast to Any, for example wrt. pattern matching and isInstanceOf. Making == slightly different wouldn’t make that much worse, especially if the primitive equality test 1 == 1.0 is reported as a compile error rather than silently returning false.

odersky · September 15, 2017, 11:16am

If 1 == 1L is true, then I strongly believe that (1: Any) == (1L: Any)

But why? They are two overloaded methods, there is no inherent requirement it should be so.

However, nothing says that 1 == 1L needs to be true!

Well, Java says it, and I don’t think we should contradict it on this one.

odersky · September 15, 2017, 11:19am

I think that would be not ideal but still admissible. Essentially it says that you can’t rely on Any#== to have a particular behavior when called on values of mixed numeric types.

sjrd · September 15, 2017, 12:11pm

Because it’s bad API design? Because it’s super confusing? Using the “overloaded method” argument is not enough to explain away unnecessary surprising behavior.

Let me put it the other way around: what would be the argument in favor of still allowing 1 == 1L to pass the typechecker, if it would otherwise be inconsistent with (1: Any) == (1L: Any)?

If you consider a compile error too hard a breakage, let’s have it still return true but let’s deprecate it, on the grounds that it is inconsistent with the upcast version. Basically deprecating a bad API.

soronpo · September 15, 2017, 1:42pm

Question (pardon my ignorance): Why are == and != different from, say, a + operation?
The expected behavior I want is:

1 == 1L //fails compilation
1 + 1L  //fails compilation
1 == 1  //true
1 + 1   //2

If we want the first two examples to work, then we can introduce an implicit conversion into the scope.

dwijnand · September 15, 2017, 2:25pm

Shouldn’t boxing be transparent in Scala?

In what sense is it irregular?

I’m with Sébastien: == should return the same result for primitives and boxed types. So to preserve that and improve performance we should explore the idea of 1 != 1L, or 1 == 1L being a type error.

shawjef3 · September 15, 2017, 3:22pm

(1: Any).equals(1L: Any) yielding false is completely surprising to me. I expected Scala’s == to have the same semantics as Java’s Object#equals, always and without exception. For an Int, I’d expect == to be java.lang.Integer.equals.

I’m for this change.

odersky · September 15, 2017, 3:26pm

I have the impression the discussion got derailed. I did not propose 1 != 1L and in fact would strongly object to this. To show why co-operative equality is irregular even if it looks regular at first, let me simplify the question to some synthetic classes ANY, A, and B with a === method:

  class ANY {
    def ===(that: ANY) = this eq that
  }

  case class A(x: String) extends ANY {
    def ===(that: A) = this.x == that.x
    def ===(that: B) = this.x == that.x
  }

  case class B(x: String) extends ANY {
    def ===(that: A) = this.x == that.x
    def ===(that: B) = this.x == that.x
  }

  val a = A("")
  val b = B("")

  a === b   // --> true
  (a: ANY) === (b: ANY)  // --> false

That’s what we would expect from Scala’s behavior. The point is, === is an overloaded method, so the static types on which it is called matter. It also means that boxing is visible, because the static types change. If === were a multi-method, it would give true the second time as well, but Scala does not have multi-methods. For ==, on the other hand, we treat it as if it were a multi-method, or rather as if it had an extraordinarily complex and expensive implementation that makes it look like a multi-method for some types but not for others, where we still use the overloaded behavior. This is what’s irregular about it.
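For contrast, a multi-method can be emulated in Scala by matching on the runtime classes of both arguments. This is a hypothetical helper (multiEq is not a library method) that reuses the ANY, A and B shapes from the example; because the branch is chosen at run time, the upcast comparison also returns true.

```scala
object MultiDispatch {
  class ANY
  final case class A(x: String) extends ANY
  final case class B(x: String) extends ANY

  // Hypothetical multi-method: the branch depends on the runtime classes of both arguments.
  def multiEq(l: ANY, r: ANY): Boolean = (l, r) match {
    case (A(x), A(y)) => x == y
    case (A(x), B(y)) => x == y
    case (B(x), A(y)) => x == y
    case (B(x), B(y)) => x == y
    case _            => l eq r // fallback: reference equality, as in ANY.===
  }
}
```

Every call pays for the pair of type tests, which is the same kind of cost that cooperative == pays for numeric boxes.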

Ichoran · September 15, 2017, 3:34pm

It’s actually more like a factor of two on a fair comparison. I did these tests when creating AnyRefMap; switching from cooperative to non-cooperative equality (as possible when things are typed as AnyRef) saves about a factor of two in speed. Using primitives directly gives about another factor of two (hence LongMap), but that isn’t a fair comparison because we’re talking about the behavior of Any.

Absolutely! The opaqueness of boxing of numbers is the source of endless Java puzzlers. Intuitively, the number one is the number one, regardless of whether it happens to be stored in a 32 bit integer or a 64 bit floating point value or a byte. It’s just one. Because users can create their own numeric type (e.g. Rational) with their own representation of one, it is not practical to maintain “one is one” universally. But it’s still a huge and worthwhile simplification of the cognitive model needed for dealing with numbers.

This is an implementation detail, presumably for speed. It needn’t be done this way. The various equalsXYZ methods in scala.runtime can handle any comparison.

The current treatment is expensive, but makes numbers more regular than they would be otherwise, thus avoiding a class of bugs that people run into in Java.
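A minimal illustration of the class of Java bugs mentioned here, placing a java.util.HashSet (which relies on equals and hashCode of the boxes) next to a Scala Set (which uses == and ##). The names are illustrative; the element type is Any so that both collections see boxed values.

```scala
object SetLookups {
  // Java collection: lookups go through java.lang.Object.equals/hashCode.
  val javaSet = new java.util.HashSet[Any]()
  javaSet.add(1L)                                 // stored as java.lang.Long
  val javaContains: Boolean = javaSet.contains(1) // probes with java.lang.Integer

  // Scala collection: lookups go through == and ##, so boxing is transparent.
  val scalaSet: Set[Any] = Set(1L)
  val scalaContains: Boolean = scalaSet.contains(1)

  // 1, 1L and 1.0 collapse into a single element in a Scala Set.
  val scalaSize: Int = Set[Any](1, 1L, 1.0).size
}
```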

Fundamentally, as long as we have weak conformance and such around numbers, it’s profoundly inconsistent to allow 1L + 1 but not to say that 1L == 1 is both valid and true.

Rust, for example, has decided to disallow all of these: you cannot write 1u64 + 1u32 or 1u64 == 1u32. This is consistent and reduces the chance of error, but is also something of a hassle. (Unadorned numeric literals will conform to the type expected to avoid making it much too much of a hassle.) But Rust has no top type, so there is no expectation that (1L: Any) == (1: Any) behaves the same as 1L == 1.

So if cooperative equality were removed, I think equality on Any would have to go away entirely.

curoli · September 15, 2017, 3:33pm

Basically, since Java primitives behave differently from Java boxed numbers, we can’t have comparisons between different numeric types that satisfy all three of these:

(1) Scala unboxed numbers behave like Java primitives
(2) Scala boxed numbers behave like Java boxed numbers
(3) Scala unboxed numbers behave like Scala boxed numbers

It is difficult to have good JVM performance unless Scala numbers behave like Java numbers. Scala boxed and unboxed being different sounds insane.

The only sane and efficient option seems to be, as has been suggested, to deprecate comparisons between different numeric types and instead require conversion to larger types, like Long and Double. Since these days almost every platform is 64 bit, Long and Double are natively efficient.

Side note: comparing floating points to anything is pure evil. It can only be forgiven in rare circumstances, such as emulating a language that does not have integer types, like JavaScript.
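Both suggestions in this post can be made concrete. The first value converts explicitly before comparing, as proposed; the other two show why mixed floating-point comparisons are treacherous even on primitives, where the usual JVM numeric promotion applies. The names are illustrative.

```scala
object ExplicitWidening {
  val i: Int = 1
  val l: Long = 1L

  // Convert explicitly instead of relying on a mixed-type overload.
  val sameAsLong: Boolean = i.toLong == l

  // 0.1f widened to Double is 0.10000000149011612, not 0.1, so this is false.
  val floatVsDouble: Boolean = 0.1f == 0.1

  // 16777217 cannot be represented as a Float; it rounds to 16777216f, so this is true.
  val intVsFloat: Boolean = 16777217 == 16777216.0f
}
```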

Ichoran · September 15, 2017, 3:45pm

I would simply argue that === defined in this way is a poor API because it does not conform to the intuitive notion of sameness.

When concepts are different, it’s a good idea to use different method names.

import java.io.File
import scala.util.Try

class Confusing {
  def buh(s: String) = Try{ (new File(s)).delete }.isSuccess // deletes the file
  def buh(f: File) = f.exists                                 // only checks that it exists
}

(In fact, I’d suggest that this example is a good argument against allowing overloaded method names.)

shawjef3 · September 15, 2017, 3:49pm

If you happen to not like this change, another way to think about it is:

The current way is odd and surprising.
The changed way is odd and surprising, but it’s faster.

Ichoran · September 15, 2017, 4:12pm

Can you post a REPL transcript of the odd and surprising behavior? (With ==, not equals?)

odersky · September 15, 2017, 4:32pm

Can you post a REPL transcript of the odd and surprising behavior? (With ==, not equals?)

scala> class A(val x: String) { def ==(that: A) = this.x == that.x }
defined class A
scala> val a = new A("")
val a: A = A@23b3aa8c
scala> val b = new A("")
val b: A = A@338cc75f
scala> a == b
val res2: Boolean = true
scala> (a: Any) == (b: Any)
val res3: Boolean = false
scala>

The example shows that == is an overloaded method and behaves like one, except for numeric types, where we magically make it a multi-method.

odersky · September 15, 2017, 4:41pm

I would simply argue that === defined in this way is a poor API because it does not conform to the intuitive notion of sameness.

Poor API or not, that’s how == is defined! And there are many good reasons for that, starting with performance. Imagine if all primitive == comparisons delegated to Any…

Ichoran · September 15, 2017, 4:44pm

So are we suggesting that overloading be turned off for ==? Right now best practice would be (barring an extra canEqual check) to:

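class A(val x: String) {
  override def equals(a: Any): Boolean = a match {
    case a2: A => x == a2.x
    case _     => false
  }
  override def hashCode: Int = x.hashCode // keep hashCode consistent with equals
}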

which doesn’t have the odd and surprising behavior you demonstrated. At least a linter should by default complain if one overloads == in that way.

odersky · September 15, 2017, 4:46pm

So are we suggesting that overloading be turned off for ==? Right now best practice would be (barring an extra canEqual check) to:

No, the opposite. Keep overloading but don’t treat numeric types specially. I.e.

(1: Any) != (1L: Any)

just like

(a: Any) != (b: Any)

in my example.