Can we get rid of cooperative equality?

I think, as @sjrd said, it would be a much better idea to forbid comparing 1 and 1L than to make it return false. Returning false will lead to a plethora of subtle, difficult-to-find bugs caused by forgetting to widen an Int to a Long.
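To make the worry concrete, here is a sketch of the feared failure mode (hypothetical: it assumes the “return false” option were adopted for Any-typed comparisons; current Scala evaluates this comparison to true):

val ids: Set[Any] = Set(1L, 2L, 3L)
ids.contains(1)   // the Int 1 slipped in instead of 1L; under "return false" this is quietly false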


@NthPortal That is good to know. If two value types with the same underlying structure cannot be equated, preserving encapsulation and nominality, that puts some restrictions on implementation and use sites, but it is definitely cleaner.

The default implementation of equals / == (whichever syntax they pick) will likely compare the raw bits of the value, in which case the bits for 1.0f are not the same as those for 1; unless they require authors to define equals. Also, I’m sure there will be a VarHandle-ish API that lets one compare the raw bit values, which can lead to very good performance on equals/gt/lt etc. if leveraged.

I consider whether the compiler allows these things to be a somewhat independent consideration. I would agree that disallowing 1 == 1L makes a lot of sense. Then we could optionally allow 1 eq 1L, which is better defined in relation to IEEE and Java’s ==.

But if the decision is to allow 1 == 1L because == is on Any and 1.==(1L) conforms to Any.==(other: Any), it should be false.

Set(1, 1L) implies that 1 == 1L exists, unless the scope is broadened further. It should return false.
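For reference, what cooperative equality does here today (a sketch of current Scala 2 behaviour, as I understand it):

(1: Any) == (1L: Any)   // true today under cooperative equality
Set(1, 1L)              // inferred as Set[AnyVal]; the two values collapse into one element, Set(1)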

Whether the user can type 1 == 1L is a different issue. If we reject that, we should also reject Some(1) == None and Left('a') == Right('a'). Currently the compiler emits a warning saying these are always false when two case classes of different types are compared with ==. I think this is essentially the same thing: comparing values of different types is false regardless of content.

Hmm, maybe that is another way to phrase my proposal: two values of different types are not equal, regardless of content. This would hold for:
Int, Float, case class F(f: Float), (Float, Int), (Int, Int), case class FI(f: Float, i: Int) etc.

All are values with different types, and are thus not equal, regardless of content.


| But how much do (2) and (3) matter if we can’t have (1)?

For me as a language designer and compiler writer, a whole lot! Performance matters as much as convenience, maybe more so. Semantic consistency and elegance also matter a lot to me and act in this case as a clear tie-breaker.

There are also two other observations:

  • Scala is the odd one out here: no other high-performance statically typed language implements co-operative equality the way it does.
  • Scala did not win itself a lot of friends with co-operative equality. Quite the contrary: a lot of people moved away, or found an excuse not to adopt Scala, because collection performance is so bad.

I would not take the popularity argument too seriously if I were convinced that we had done the right thing from a language design and semantics standpoint. But I am now convinced we actually did the wrong thing from that standpoint.

That said, I still don’t think we should change the way equality works for statically known numeric types. That’s a huge distraction. There is no best way to do this, so Scala’s choice to do it exactly like Java is valid and will stay like this.

Java does not say that 1 == 1L, it says that 1 eq 1L!

No, it says 1 == 1L. Let’s not re-interpret what symbols mean. Your observation that there really need to be two notions of equality because of NaN is valid and important. In my mind, there’s the equivalence relation, which is equals, and there’s the ad-hoc equality, which is statically type-specific and is called ==. Currently, we have:

 scala> Double.NaN == Double.NaN
 res5: Boolean = false

(as mandated by IEEE standard)

 scala> Double.NaN.equals(Double.NaN)
 res6: Boolean = true

(as mandated by equivalence relation axioms)

 scala> (Double.NaN: Any) == (Double.NaN: Any)
 res7: Boolean = false

This is a problem, because it means that NaN cannot be used as a key in Scala maps: it can be stored, but never looked up again. If we drop co-operative equality, == on Any would be the same as equals and the comparison would again do the right thing.
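Concretely, with current behaviour (following from the NaN comparisons above):

val m = Map(Double.NaN -> "oops")
m.get(Double.NaN)        // None: the key is stored but can never be found again,
                         // because Double.NaN == Double.NaN is false
m.contains(Double.NaN)   // false, for the same reason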


There’s one compromise proposal, which might be worth investigating: Keep == as it is, but systematically use equals and hashCode instead of == and ## in the new collections. This is a more limited change which does away with most of the performance penalty in practice. But if collections start to ignore the cooperative versions of Any.## and Any.==, the question remains why keep them around at all…
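A minimal sketch of what that compromise would mean inside a hash-based collection (hypothetical helpers, not the actual collections code):

// Bypass the cooperative Any.== / Any.## and use the boxed values'
// own equals/hashCode for key comparison and hashing.
def sameKey(k1: Any, k2: Any): Boolean = {
  val r1 = k1.asInstanceOf[AnyRef]
  if (r1 eq null) k2.asInstanceOf[AnyRef] eq null
  else r1.equals(k2)                 // boxed Int 1 and boxed Long 1L are no longer equal here
}

def keyHash(k: Any): Int = {
  val r = k.asInstanceOf[AnyRef]
  if (r eq null) 0 else r.hashCode   // not k.##, so e.g. a boxed 1.0 no longer hashes like 1
}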

== behavior in Java is not that important, because we already break with Java’s behavior in many cases, e.g.:

new String("foo") == new String("foo") // Returns false in Java, true in Scala

I don’t feel == is frequently used on arbitrary types.

Is it possible to remove Any.==?

Suppose only some types supported ==, e.g. Iterable.==, StringOps.==, Product.==, Int.==; then the Scala collections could simply use equals for internal comparisons.

There is another advantage to removing Any.==: we could define ArrayOps.== for cases like:

Array(1, 2, 3) == Array(1, 2, 3) // should return true
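For comparison, today this is spelled with sameElements; an extension-based equality (a hypothetical sketch, using === only because == on Any cannot currently be overridden by an extension method) could provide the nicer syntax if Any.== were removed:

Array(1, 2, 3).sameElements(Array(1, 2, 3))   // true today: element-wise comparison

// Hypothetical sketch, names are illustrative only:
implicit class ArrayEq[A](self: Array[A]) {
  def ===(that: Array[A]): Boolean =
    (self eq that) || (self != null && that != null && self.sameElements(that))
}

Array(1, 2, 3) === Array(1, 2, 3)   // true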

== is designed to support null, avoiding the NullPointerException you would get from null.equals(null).

However, it is unnecessary to make Any.== a compiler-intrinsic method, because this NullPointerException problem can easily be solved by a XxxOps.== extension method.
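A rough sketch of that idea (names are hypothetical, and === stands in for the == such an extension would provide), showing that null handling does not require == to live on Any:

implicit class NullSafeEq[A <: AnyRef](self: A) {
  def ===(that: A): Boolean =
    if (self eq null) that eq null else self.equals(that)
}

(null: String) === null       // true, and no NullPointerException
"foo" === new String("foo")   // true, via String.equals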

I just don’t see how semantic consistency and elegance point in the direction of “you must know, in your head, the static type of the arguments in order to know how == behaves”.

Of course it’s possible to come up with awkward and confusing APIs where you have overloaded methods that make static types essential to know. But just because you can do this doesn’t, to me, indicate anything about whether one should. Failing to prevent misuse of a language feature is not the same as eagerly advocating it. The benefit of static types is, in large part, so that the computer can keep track of things that I might make a mistake with. If the static type matters critically in how equality is interpreted, it means that the compiler is no longer helping me; the burden is now on me to get it right so that == has the meaning I intend. Fundamentally, I think this is inelegant, shifting the burden from computer to person when it should go the other way; and I think this promotes semantic inconsistency because although the language is regular, it makes the code less consistent.
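To spell out the concern with a sketch (the Any-typed line shows the proposed behaviour, not what current Scala does):

val i = 1
val l = 1L

i == l                 // true: statically Int vs Long, the numeric comparison applies
(i: Any) == (l: Any)   // under the proposal this would be false (today it is true),
                       // so the result depends on the static types at the call site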

They don’t have a top type, either. That’s the real culprit: a top type that implements ==. That’s a huge burden to shoulder.

Java doesn’t have a top type. It’s got Object which is the top type of the object hierarchy, but it also has the primitive types that have no particular relationship with Object. You can’t speak generically about int and float and so on; only their boxed representations, Integer, Float, and so on. Despite some boxing and unboxing, there are nonetheless a myriad of difficulties dealing with boxed types vs. primitive types in Java.

C doesn’t have objects at all.

C++ doesn’t have a top type. Instead, templates form an entirely orthogonal way to get generic behavior by deferring the compilation to the point of usage, and only complaining there if it doesn’t make sense. It’s a very different model, and of course equality works very differently also.

Haskell doesn’t even have proper subtyping, so the idea of cooperative equality doesn’t even make sense to me in that context.

Rust sort of doesn’t have subtyping. (Technically it does for lifetimes.) Anyway, the idea of cooperative equality doesn’t make sense to me there either. In any case, you can’t compare numerics of different primitive types; there’s no implicit conformance and equality is defined within-type only.

C# doesn’t have a top type, and you can’t apply equality to generics without taking an equality typeclass.

Anyway, I could go on, but the hard part for Scala is that Any exists and has == defined on it, which in Scala is presumably value equality. No other high-performance language has that.

You’d know better than I do, but it is hard to measure the people who stayed and liked Scala because numbers “just work” with generics and collections instead of being a pain point like in Java.

Also, this is hardly the only case where Scala collections are behind Java. And it’s not even true that they’re behind in general. I wrote AnyRefMap precisely for those people who wanted java.util.HashMap-like performance in Scala (and it delivers!); and LongMap for people who had primitive keys and wanted to beat Java (and it does!). But despite this, Java 8 Streams can deliver 5x-10x faster performance on common operations (map and filter and such) on primitives than can Scala collections, since Scala has to box primitives. Scala implementations are generally immutable, which in many cases results in 2-3x worse performance just due to differences in algorithms. If people want to compare Java to Scala, there are loads of ways for Scala to seem worse. So though I take performance very seriously in my day-to-day work, I have trouble viewing this as a critical performance issue, especially since you can always use [A <: AnyRef] (in your own code, not library code) and get back to Java-style equality.
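A small sketch of the [A <: AnyRef] escape hatch (a hypothetical helper, not library code): with the bound there are no primitives in play, so the body can call equals directly, Java-style.

def indexOfRef[A <: AnyRef](xs: Array[A], x: A): Int =
  xs.indexWhere(elem => (elem eq x) || (elem != null && elem.equals(x)))

indexOfRef(Array("a", "b", "c"), "b")   // 1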

Anyway, as you said, we shouldn’t take the popularity argument too seriously. But if we’re going to make it at all, I think it would be good to gather more empirical data, e.g. on whether people like how == works with numbers.

It doesn’t, but it almost certainly will in the relatively near future. I don’t know the current state of Valhalla, and whether or not you can actually test how it behaves when comparing int and long when treated as the ‘any’ type, but it would be interesting (and perhaps useful?) to see how it behaves.

I think it’s worth noting that in Java, 1 == 1L is actually ((long) 1) == 1L, so it’s a (rather mundane) comparison of two longs. Comparing int and boolean (for example) is not allowed, which leads me to think that if, hypothetically, one could prevent the compiler from converting the int to a long, 1 == 1L would not be allowed either. All of this is to say that, if ((any) 1) == ((any) 1L) was somehow valid Java code, I think it would probably yield false, because the types are not comparable.

It is my impression from this discussion that, unlike Java, Scala treats 1 == 1L as a call to a method Int.==(Long), and not as 1.toLong == 1L. If the latter were the case, I think it would be obvious to say that (1: Any) != (1L: Any). However, even in the former case, it’s not clear to me that Any.==(Any) should jump through hoops to call Int.==(Long) if the receiver is an Int and its argument is a Long.
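For what it’s worth, that matches the standard library: scala.Int declares overloaded == methods for the numeric types, so 1 == 1L resolves statically without ever touching Any.== (signatures abridged below):

// Declared on scala.Int (abridged):
//   def ==(x: Int): Boolean
//   def ==(x: Long): Boolean
//   def ==(x: Double): Boolean
val sameOne: Boolean = 1 == 1L   // resolves to Int.==(Long): a widened primitive comparison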


Going away from this might be ‘ok’ for library code, but not for user code. It’s a punch in the face of dynamic-feel applications where you don’t want the user to be confronted with 1f versus 1.0 versus 1.

However, this does not solve the problem for composite keys like in Map[(Int,Int),Int] – when keys are case classes or tuples, they will still compare their components and compute hashes with == and ##… though I’m not sure how often people actually use composite keys.

> (1.0:Any) equals (1:Any)
res0: Boolean = false
> (1.0:Any, 2.0:Any) equals (1:Any, 2:Any)
res1: Boolean = true

This doesn’t seem consistent. For it to be consistent, equals on Product classes should rely on equals of their components, and == should similarly rely on == of the components.
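My understanding of where the inconsistency comes from (a simplified sketch, not the exact code the compiler generates): the synthesized equals of case classes and tuples compares fields with ==, so cooperative equality leaks into it.

case class Pair(x: Any, y: Any)
// The synthesized equals is roughly:
//   override def equals(other: Any): Boolean = other match {
//     case that: Pair => this.x == that.x && this.y == that.y
//     case _          => false
//   }
// Field comparison goes through cooperative ==, which is why
// Pair(1.0, 2.0) equals Pair(1, 2) comes out true even though
// (1.0: Any) equals (1: Any) is false.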


I see we have a disconnect on what semantic consistency and elegance mean. What I mean by that is: “Be able to describe what the language does with as few principles as possible”. Pre co-operative equality, we had:

  1. There is a method == in Any, defined as follows:

    final def == (that: Any): Boolean  =
      if (null eq this) null eq that else this equals that
    
  2. There are overloaded methods in numeric classes that define == for specific combinations of types.

From a spec perspective, (2) could be seen as libraries, so I really care only about (1), which is simple enough. Post co-operative equality, things got much messier. One sign of this is that the spec is now actually wrong, in the sense that it does not describe the actual implemented behavior (see my earlier post). Even if we fixed the spec, it would have to specify the == method on Any as something like this:

final def ==(that: Any): Boolean  = this match {
  case this: Byte =>
    that match {
      case that: Byte   => this == that
      case that: Short  => this == that
      case that: Char   => this == that
      case that: Int    => this == that
      case that: Long   => this == that
      case that: Float  => this == that
      case that: Double => this == that   
      case _ => false
    }
  case this: Short =>
    ... 
    ... same as for Byte for all other numeric types
    ...
  case _ =>     
    if (null eq this) null eq that else this equals that
}

I guess you agree that’s cringeworthy. It’s bloated, and anti-modular in that it ties the definition of Any to the precise set of supported numeric classes. If we ever wanted to come back and add another numeric class, the definition would be invalidated and would have to be rewritten. We could try to hide the complexity by specifying that == should behave like a multi-method. But that means we pull a rabbit out of our hat, because Scala does not have multi-methods. That’s actually another good illustration of the difference between (1) and (2). Multi-methods are very intuitive, so from the viewpoint of (1) they are desirable. But adding them to the semantics would be a huge complication.

However, this does not solve the problem for composite keys like in Map[(Int,Int),Int] – when keys are case classes or tuples, they will still compare their components and compute hashes with == and ##…

Thanks for this observation! So because of co-operative equality, equals and hashCode now turn out to be broken as well! This is a great demonstration that complexity breeds further complexity.

I think equals and hashCode for case classes need to be defined in terms of themselves, i.e. the equals and hashCode of their elements. It’s weird that they should forward to == and ##. But of course, that would mean we need four instead of two methods per case class to implement equality and hashing.

Unless, of course, we get rid of co-operative equality. It seems the case for doing this gets ever stronger.

In light of this development we might actually need to do this for 2.13. The problem is that we cannot fix the new collections to be more performant and NaN safe without also fixing the generation of equals and hashCode for case classes.

An interesting thing I discovered is that, if cooperative equality is removed (without changing anything else), symmetry for == will be broken for a small number of cases. Specifically:

> (1: Any) == (BigInt(1): Any)
res0: Boolean = false
> (BigInt(1): Any) == (1: Any)
res1: Boolean = true
> (1: Any) == (BigDecimal(1): Any)
res2: Boolean = false
> (BigDecimal(1): Any) == (1: Any)
res3: Boolean = true

In fact, the above cases already violate symmetry for equals (is that a bug?)

scala> 1 equals BigInt(1)
res4: Boolean = false

scala> BigInt(1) equals 1
res5: Boolean = true

(I’m not saying this is a reason to keep cooperative equality; I’m only noting that it may add complications.)

In fact, the above cases already violate symmetry for equals (is that a bug?)

I would say, yes. If we want to stay consistent, we should have

 BigInt(1) == 1        // true
 1 == BigInt(1)        // true
 BigInt(1).equals(1)   // false
 1.equals(BigInt(1))   // false

| BigInt(1) == 1        // true
| BigInt(1).equals(1)   // false

Aren’t BigInt(1) == 1 and BigInt(1).equals(1) equivalent, assuming BigInt(1) isn’t null (which it isn’t)?