Can we improve `CanEqual`?

If I know how to compare instances of A, and I have an instance a of A and an instance ab of A | B
It would make sense to be able to check if a is ab
Even more so in cases where A is a case of an enum “A | B”, since in those cases, I would expect pattern matching to work (and pattern matching to be re-writable as ifs)

This has caused issues in the past: How to improve strictEquality? Implementation of SIP-67: better pattern matching with `strictEquality` · Issue #22732 · scala/scala3 · GitHub
And still leads to issues: Pre-SIP: relaxed `strictEquality` for `==` and `!=`

And it feels like it will keep going, what if I don’t want to use Enums, but want to use a type alias of a union instead ?

The obvious solution is to do:

given CanEqual[A, A | B] = ???

And this does “solve” the issue, in the sense that now a and ab can be compared, but it also allows a to be compared against instance b of B, which have nothing to do with each other !
(b is widened to A | B, and then given resolution succeeds)

But there’s a trick we can do, if we check for both CanEqual and subtyping, we get what we want:

  1. a can be compared to ab
  2. a cannot be compared to b
  3. a cannot be compared to instances of other supertypes like Any
  4. ab cannot be compared to itself
    Scastie - An interactive playground for Scala.

The downside being, we lose transitivity: we can have situations where a == ab and ab == b are valid, but a == b is not.
I’m not sure this is actually a loss ?
We definitely cannot call it “multiversal” equality anymore though

There might definitely be a better way to do this, and I encourage you to tell me, and point any issues with this proposal
But I do think we should rethink CanEqual to try to make it easier to use without sacrificing safety

1 Like

I recall there was an issue.

2 Likes

Transitivity is really, really important for equality. Pretty much every time someone breaks it, it is a source of unending headaches. For instance, whenever people write longs as floats, if the comparison is done as long.toDouble == double, you have the infuriating

d == l1
d == l2
l1 != l2

which causes all sorts of problems with keys and so on. It’s a rare problem because you are rarely in a regime where this matters, but it is a problem; when you end up touching a case where this matters, bugs are very common.

If this becomes just part of the standard for CanEquals, I expect way more problems.

Furthermore, the soundness is questionable.

val a: A = someA()
val b: B = someB()
def test(a: A, ab: A | B): Boolean = a == ab
test(a, b)

Now your trick is thwarted.

Furthermore, it won’t necessarily even be false! test might be true if the two things are opaque types of things that could be compared. For instance, A might wrap lists, while B wraps Vectors, and the entire point of using opaque types is that you want B to be inscrutable. But now not only is the inscrutability broken, you find that the inscrutable B is the same as the differently-typed A.

So I doubt that this is an improvement. Making people implement CanEqual[A, A|B] encourages them to think through the logic–they could do it wrongly or confusingly, still, but at least they have a chance to decide what they want (and the compiler may help out by complaining about non-exhaustive matches or abstract type member).

1 Like

I disagree !
It is up to test to guarantee good properties through its interface !
And we might have a case like:

val abs = [A(), B(), B(), A()]
val filtered = abs.filter(test) // we want `B`s to sometimes be passed !

Note that the proposed approach adds a requirement, and doesn’t remove any, so people would still have to implement CanEqual[A, A|B] themselves !
(I do think we should provide it by default for case classes and enums however)

What I thought before testing:

But you make very good points about transitivity !
So maybe the solution is to simply be a lot more clear about transitivity and “multiversality” in the docs, something like:
To ensure transitivity is kept, the set of available CanEqual instances create little “universes” where any two things can be compared together:

given CanEqual[Cat, Mammal] = ??? // Cat and Mammal are in the same universe
given CanEqual[Human, Mammal] = ??? // Human and Mammal are in the same universe

// Therefore, Cat and Human are also in the same universe, and can be compared:
human == cat // works !

After testing:

You still do make some very good points about transitivity, sadly it’s already broken:

import scala.language.strictEquality

trait Mammal
trait Cat
trait Human

given CanEqual[Cat, Mammal] = ??? 
given CanEqual[Mammal, Cat] = ???

given CanEqual[Human, Mammal] = ???
given CanEqual[Mammal, Human] = ???

human == mammal
mammal == cat
human == cat // error Values of types Playground.Human and Playground.Cat cannot be compared with == or !=

So to me the feature seems fundamentally confusing, it has a clear mechanism on paper, but when combined with widening and other things it just leads to confusing results !

Therefore I think one of two things should be done:

  1. Make it truly multiversal (it does come from multiverse, right ?)
  2. Accept non-transitivity, and make it less clunky to use with enum-like constructs, including user-defined ones (for example with the initial proposal, but I would take anything else)

That’s a good point about the lack of transitivity already.

I guess it depends on what CanEqual is supposed to model. Is it there for efficiency/catching-silly-mistakes? Or is it there for correctness in unusual situations? Or is it for declaring the set on which an ordinary equivalence relation works?

Equivalence relations are lovely and intuitive to work with. But CanEqual doesn’t seem to be about equivalence relations in the mathematical sense, because a relation R is defined on a set and for every any a, b in the set, a R b is defined.

But that’s not true with CanEqual!

scala> object O:
     |   opaque type A = Int
     |   opaque type B = Long
     |   given CanEqual[A, A] = CanEqual.derived
     |   given CanEqual[B, B] = CanEqual.derived
     |   given CanEqual[A, B] = CanEqual.derived
     |   val a: A = 7
     |   val b: B = 8
     | 
// defined object O
                                                                                
scala> object P:
     |   import scala.language.strictEquality
     |   import CanEqual.given
     |   import O.{a, b}
     |   val aisa = a == a
     |   val bisb = b == b
     |   val aisb = a == b
     |   val bisa = b == a
     | 
-- [E172] Type Error: ----------------------------------------------------------
8 |  val bisa = b == a
  |             ^^^^^^
  |             Values of types O.B and O.A cannot be compared with == or !=
1 error found

So you have the bizarre situation that a == b is valid (and could be true) but you can’t even ask about b == a. Indeed, there is no point to having two type parameters unless you aren’t actually trying to establish an equivalence relation. You only need one parameter for that: CanEqual[S].

I had originally interpreted the second parameter as a poor approximation to writing CanEqual on the type union, but now I’m not so sure. I think the point of CanEqual is to use == for things that potentially aren’t equivalence relations.

For instance, technically it just doesn’t work to say 5.C == tempRecord where tempRecord is some class, if you store 5.C as an opaque type on an Int or Double. But you can write a working tempRecord == 5.C. And CanEqual allows you to express this: given CanEqual[TempRecord, Celsius] = CanEqual.derived, but without the corresponding CanEqual[Celsius, TempRecord].

So I think it is in the nature of CanEqual to take you out of simple and intuitive equivalence-relation land unless you write CanEqual[Q, Q]. Instead, CanEqual is its own unruly beast, because for good or ill, JVM equality is an unruly beast, and CanEqual needs to capture that. How unexpected things are depend on your discipline in using the beast, not because you cannot express unexpected things.

If we’re already in a regime where (a: A) == (a: (A | B)) could be valid but (a: (A| B)) == (a: A) could be invalid, which is exactly what CanEqual[A, A | B] (alone) expresses, so we lose symmetry, we can’t complain too much about losing transitivity. If we wanted it to be nice, we would have written CanEqual[A | B, A | B].

But the feature you’re asking for is basically

def check[C <: (A | B)](a: A, ab: C)(using A <:< C)

scala> def check[C <: (O.A | O.B)](a: O.A, ab: C)(using O.A <:< C) = true
def check[C <: O.A | O.B](a: O.A, ab: C)(using x$3: O.A <:< C): Boolean
                                                                                
scala> val ab: (O.A | O.B) = O.b
val ab: O.A | O.B = 8
                                                                                
scala> check(O.a, O.a)
val res1: Boolean = true
                                                                                
scala> check(O.a, O.b)
-- [E172] Type Error: ----------------------------------------------------------
1 |check(O.a, O.b)
  |               ^
  |               Cannot prove that O.A <:< O.B.
1 error found
                                                                                
scala> check(O.a, ab)
val res2: Boolean = true

But we can want either behavior. For instance, we might need asymmetric equals, but for convenience we want to do the test at runtime rather than having different compile-time paths, like we do with x: (1 | 2 | 3) compared to Int. Regular widening makes sense sometimes; excluding nonoverlapping sets also makes sense sometimes.

So I think that rather than stating a rule, we actually would need another trait of some sort to specify which behavior we wanted. I think the best would probably be a trait like Between[A, A | B] instead of A | B, where that type is an instruction to the compiler that when widening, it has to bind the type first and then check (as in the example check method), but when using it, it’s just A | B. That is, something like

opaque type Betweener[X, Y >: X] = Unit
object Betweener:
  inline def apply[X, Y >: X]: Betweener[X, Y >: X] = ()
  extension [X, Y >: X](bt: Betweener[X, Y >: X])
    inline def eval[Z >: X](z: Z)(using lt: Z <:< Y) = lt(z)

There may be some way to generalize this to be more useful, but acting as if the widening was done via Betweener (and failing in the cases it wouldn’t compile) ought to do the trick.

Without an opt-in like this, I’m nervous about any default behavior that is less obvious than a different default, but prevents the obvious behavior from being had at all.

(Edit: note that the justification document does not make these points, but rather discusses how to encode contains on collections. However, if you ignore the original rationale and just look at the consequences of what we’ve got, I think we have to either embrace the non-equivalence-relationness of CanEqual, or we need to migrate from CanEqual to something else that enforces the desirable properties. In particular, the key observation in the document that the CanEqual1[Any] fallback is always available could be resolved by making it not always available via some mechanism, for instance by having Eql[T] <: CanEql[T], where Eql[T] is what you use for tight type bounds but CanEql[T] is what has the Any fallback.)