Can we get rid of cooperative equality?


#61
def isOne[A](a: A) = a == 1

is a little bit odd. If you know it’s numeric, then you can do

def isOne[A: Numeric](a: A) = Numeric[A].toInt(a) == 1

instead. If it’s not numeric, then I’m not sure I get the point of the method.


#62

For migration purpose, we can drop the usage of == in all case class code generation and in Scala collections before other actions.

Another idea is making == a placeholder, whose implementation must be introduced by explicit importing.

import numbericEquality._
Double.NaN == Double.NaN // false
(Double.NaN, 1) == (Double.NaN, 1) // Compilation error because no NumbericEquality for Tuple2
null == Double.NaN  // Compilation error because no NumbericEquality for Null
import structuralEquality._
Double.NaN == Double.NaN // true
(Double.NaN, 1) == (Double.NaN, 1) // true
null == Double.NaN // false
import referentialEquality._
Double.NaN == Double.NaN // Compilation error because Double is not an AnyRef
(Double.NaN, 1) == (Double.NaN, 1) // false
null == Double.NaN  // Compilation error because Double is not an AnyRef

structuralEquality can be implemented as equals with null checking, which is useful in case class code generation and Scala collections.


#63

I agree it’s silly, but the point is that someone can write it and it will behave differently from what they expect because of boxing which should be just an implementation detail.


#64

Current:

// Scala
Set(1, 1.0).size // 1
Set(1, 1.0, 1L).size // 1

// Scala.js
Set(1, 1.0).size // 1
Set(1, 1.0, 1L).size // 1
1.isInstanceOf[Double] // true
1.isInstanceOf[Long] // false

// Java
Set<Object> set = new HashSet<>();
set.add(1);
set.add(1.0);
set.size();  // 2
set.add(1L);
set.size();  // 3

// Kotlin
setOf(1, 1.0).size // 2
setOf(1, 1.0, 1L).size // 3

// Kotlin-js
setOf(1, 1.0).size // 1
setOf(1, 1.0, 1L).size // 2, because Long in Kotlin is not mapped to any JavaScript object.
(1 as Any) is Double // true
(1 as Any) is Long // false

I think getting rid of cooperative equality is a good idea, because the behavior above in Scala would become the same to Java and Kotlin in JVM.
Though the new behavior above for collections in JVM and JavaScript becomes inconsistent, It is not bad to align with platform (JVM/JavaScript) collection behavior.

As some people comment, 1 == 1.0 is reported as a compile error is a good idea IMHO. Kotlin doesn’t allow you to compare two AnyVal instance (though Kotlin doesn’t have this type concept).

// kotlin
1 == 1.0 // compile error
1 as Any == 1.0 // false
1 == 1.0 as Any// false
1 as Int == 1.0 // compile error

fun <T> isOne(t: T): Boolean {
    return t == 1
}
isOne(1) // true
isOne(1.0) // false 

I think this is a better design, because if we can’t compare 1 to 1L, people will not confuse why 1 == 1L is not inconsistent to (1: Any) == (1L: Any), Set(1, 1L).size is 1 or 2. And we can get rid of 'A' == 65 as sjrd mentioned.
Though I know 1 == 1L compiles has better semantic consistency, the Kotlin way looks better in practice.
If some people still want to compare 1 to 1L, we can let he/she import something, and use this feature. We can also group this feature and any2stringadd into the same package to tell people the features in this package is not strict.

BigInt(1) == 1 // true look likes a bug to me. Because in BigInteger.valueOf(1) == 1 is false in Scala.


#65

I think we need to look at it in another way. It’s not boxing that is to blame, but the interaction of overloading with type abstraction. Have a look the example with === I gave towards the beginning of the thread. It behaves in exactly the same way as == without co-operative equality, Yet there is no boxing. So this is a fundamental property (or shortcoming, depending how you want to look at it) of systems that mix type abstraction and overloading.


#66

I agree that it would probably be good to get rid of cooperative equality.

But I also agree with critiques of the fact that (1 == 1L) != ((1:Any) == 1L). This looks pretty surprising to me, so I’m in favor of disabling or at least warning against the numeric == overloads. As long as these overloads were just an optimization of the more general cooperative ==, they were fine –– but without cooperative equality, their semantics becomes problematic.

It doesn’t matter that it makes perfect sense or that it’s regular from the point of view of language experts. It’s a problem because it flies in the face of intuition. It’s a problem for the same reason that the case classes with === described above are an anti-pattern. People should be discouraged from using anti-patterns (such as unrestricted implicit conversions, which are now behind a feature flag), so lifting that one anti-patterns into the language does not seems to be going in the right direction.

In IntelliJ, I can’t even control-click on == in 1 == 0.5. It is (or at least, it’s close to be) a built-in method, so a reasonable amount of magic is expected –– but surprising behaviors are not. I’d wager that most current Scala users, irrelevant of expertise level, don’t even know that these overloads exist. The only reason I knew is because I’ve been working on a metaprogramming framework where that turned out to be relevant. That I only learned about them through metaprogramming is a good thing, because it means that overloading == was truly an implementation detail that people generally need not worry about. Making that gruesome detail suddenly relevant to all Scala users by making its behavior counter-intuitive will only work to add to the mental overhead that Scala developers already have to carry while working with the language.

Personally, I like statically-typed languages because the compiler and libraries can relieve some of the cognitive burden from my mind, not because they can add to it (via surprising overloads).


#67

Going away from this might be ‘ok’ for library code, but not for user code. It’s a punch in the face of dynamic-feel applications where you don’t want the user to be confronted with 1f versus 1.0 versus 1.

Any user code that does not comprehend the difference between 1.0f, 1.0 and 1 is bound to be broken. I see making

def f: Float
f == 1L

fail to compile be a very, very good thing. For example,

scala> def f: Float = 1.2f - 1 
f: Float

scala> f * 5 == 1
res5: Boolean = false

scala> f * 5
res6: Float = 1.0000002

equality on floating point numbers is an anti-pattern for most users, who don’t know enough about them to avoid traps like the above.

Users will be confronted with 1.0f vs 1 whether they like it or not.


#68

x == 150L returns true in Java. It won’t if one side uses new Long though. Auto-boxing uses Long.valueOf and 150 is guaranteed by the spec to be a flyweight instance because its magnitude is small. It would break if the Long was large enough, and this is something that is a problem in Java.

Anyhow, most of the problems with java are due to auto-boxing peculiarities that are not relevant for Scala if implemented well.


#69

you are thinking that a user is someone writing a regular Scala program.
But that’s not the case in a REPL, in a DSL, in a Jupyter type worksheet
etc. And I was specifically talking about float, double, int, i.e.
leaving away an f suffix.

best, .h.h.


#70

Absolutely. The boxing side of things is behind the scenes and not fundamental. In Java’s case, auto-boxing sometimes causes a surprise switch from reference equality to numeric equality, Scala can avoid that.

We know a few important things:

AnyRef/Object must expose two notions of equality

  • Equivalence Relation (of which, structural is a subset)
  • Reference equality (which is also a valid equivalence relation)

This leads to a simple solution:

final def == (that: Any): Boolean  =
  if (null eq this) null eq that else this equals that

And thus, for AnyRef, == is simply short-hand for null-safe equivalence.

Numerics must expose two notions of equality

  • An Equivalence relation, consistent with total Order
  • IEEE numeric equivalence, consistent with total order for Integral types and partial order for floating point types

And here is where the dilemma lies. For reference types, the ‘secondary’ notion of equivalence is a valid equivalence relation. This makes a default implementation for equals simple. However, IEEE numeric equivalence (and partial order) is not suitable for collection keys, be it for hash-based or order based key identification. There is no default implementation that can satisfy both notions of equivalence for numeric values.

If we want to avoid Java’s issues where == means one thing in one place, and something entirely different in another (numerics vs references; numerics get boxed and suddenly == means something else) then the choice for == on numerics should be the same as for references: a null-safe equivalence relation. AnyVal types are not null, but are sometimes boxed and could be null at runtime, so the null-safe aspect is at least important as an implementation detail. But otherwise, it can be the same as with AnyRef, we just need to specify equals for each numeric type N as

def equals(that: Any): scala.Boolean = that match {
  case that: N => compare(that) == 0
  case _ => false
 }

Or in English, equals is consistent with the total ordering if the type matches, and false otherwise. On the JVM this is effectively the contents of each Java boxed type’s equals method.

But what about IEEE?

The above defines == to be a null safe equivalence. This is incompatible with IEEE’s == for floating point numbers. It however is consistent with IEEE ==, <, <= for integer types. I propose that we implement the IEEE754 total ordering for floating point numbers in these cases ( What Java’s compareTo on the boxed types do). In short, NaN == NaN. After all, most users would expect that. Also, it is very fast – comparison can be done by converting to integer bits and then comparing the integers - at the assembly level just using a two’s complement integer compare op.

I would not find it strange to specify that ==, <=, >=, <, and > in scala represent equivalence and total order by default. It is what we do for reference types, why not numerics? That breaks away from Java but I suspect it is more intuitive to most users and more consistent in the language IMO. It is certainly more useful for collections.

For users who are more advanced with floating point, they can pull out the IEEE tools. It is only with floating point types where there is a gap and we need parallel notions of equality to go with partial order.

The users that truly want IEEE semantics for numeric operations on floating point values must know what they are doing to succeed in writing an algorithm that depends on NaN != NaN anyway. For them, switching to some other syntax for IEEE will not be difficult. Perhaps there are different symbols, perhaps different names, or perhaps an import will switch a scope to the IEEE definitions.

Proposal

Expression Scala 2.11 Proposal
1.0 == 1L true false or will not compile
(1.0: Any) == 1L true false
(1.0: Any).equals(1L) false false
Double.NaN == Double.NaN false true
Double.NaN.equals(Double.NaN) true true
1.0F < Float.NaN false true
1.0F > Float.NaN false false
Set(1.0F, 1, 1L).size 1 3
Map(Double.NaN -> "hi").size 1 1
Map(Double.NaN -> "hi").get(Double.NaN) None Some(hi)
TreeMap(Double.NaN -> "hi").get(Double.NaN) Some(hi) Some(hi)
(1.0, 1) == (1, 1.0) true false
Some(1.0) == Some(1) true false
List(1, 1.0) == List(1, 1) true false
BigInt(1) == 1 true false or will not compile
UnsignedInt(1) == 1 N/A false or will not compile
Left(1) == Right(1) false w/ warning will not compile?
List(1) == Vector(1) true ???

The proposal boils down to a couple rules for consistency:

Two values of different nominal types are never equal. This holds for case classes today, the proposal makes it work consistently with tuples, case classes, and plain numeric types. The compiler can error when a result is guaranteed to be false due to mismatched types. It would be consistent with Valhalla. I don’t have an opinion on what to do with List(1) == Vector(1), that is more collection design than language.

For use cases where we want to compare across types in a cooperative way (perhaps the DSL / worksheet use case mentioned) one can either provide different methods, or use an import to switch the behavior. Or perhaps there are better ideas.

equals is consistent with == This leaves the definition for == as short-hand for null-safe equals – an equivalence relation – consistent with Ordering for a type. The consequence is that NaN == NaN and the default behavior is conformant to use of values as collection keys. Every other option I thought of was just far more inconsistent overall. Give up on NaN != NaN and the rules are clean and consistent. Otherwise you have to carve out an exception for floating point numbers and have collections avoid using == in some cases, or make equals inconsistent with ==.

Combined, these two rules would make it much simpler to extend numeric types, and add things like UnsignedInt – there is no quadratic explosion of complexity if equivalence is not cooperative.


#71

kudos for making the table, which really is the most important thing on this thread
imo the desired outcome would be for those samples to not compile, but the the other result is okay too
i’m sure this would not have much impact even on unityped programs (contentious NaN cases included)


#72

isn’t this trivially implemented by not exposing .equals and requiring a Equal typeclass and a === op? as per scalaz.


#73

That will requires each of collection stores the Equal typeclass, which is impossible for java.util.* collections.


#74

That’s not required. Scalaz doesn’t do this, neither do any collections in Haskell. You just have Equal[A] => Equal[List[A]], etc.


#75

That only properly works when you have globally-unique type class instances, as in Haskell. Otherwise, you may run into inconsistencies if different instances are picked up in different contexts for the same data structure (think of a Set, which needs Equal as part of its semantics).


#76

Storing the instances still doesn’t work if you don’t have globally unique type class instances, because there is no mechanism for testing instances for equality.

Edit: To make this clear. Say you add two lists together with different notions of equality. What is the notion of equality of the output list?


#77

To be clear: I think that none of these two naive approaches are good solutions to the problem. So even if we were willing to go that way and base the language on it, we’d need something better.
Anyway, this discussion is probably out of topic: it seems to me that equality as a method in Scala is not going anywhere –– removing it would probably break most code in existence… It’s not like we can just suddenly turn Scala into Scalaz.


#78

There is a very interesting survey about primitive numeric types equality run by Brian Goetz that @odersky and others may want to have a look. It gives some insights on how Java developers treat numbers, which can be useful data to weigh in for this proposal. In Brian’s words:

This poll told us that: the Stockholm Syndrome is so strong that, when
given the option to treat boxed numbers as numbers, rather than
instances of accidental boxes, 85% chose the latter.


#79

Very useful to know that, thanks! We have not considered the implication of (non)cooperative equality on pattern matching, but we intend to change it we need to take this into account.


#80

Isn’t pattern matching on Any part of the problem. A pattern match on an Int makes no sense to me. An Int only ever has one pattern. You can match /switch on an Int and you can match /switch on the Int component of a compound match case, but the Int itself is always an Atomic or Leaf Value. Therefore a match on an Int should not share the same syntax as a match on an AnyRef, or the same syntax as Compound value types (Scala native Structs).