Beyond Liskov: Type Safe Equality in Scala

lihaoyi · December 15, 2019, 3:02pm

Wrote a blog post that gives a new (as far as I can tell) view on the problem of universal-equality and type-safety in the Scala language

http://www.lihaoyi.com/post/BeyondLiskovTypeSafeEqualityinScala.html

Thought it might be of interest to the people here!

soronpo · December 15, 2019, 6:22pm

I made a related proposal to implement == as extension methods in dotty (in Predef), so they can be opt-out and also overridden via more specific type class equality.

LPTK · December 15, 2019, 6:51pm

Equality in Scala is only difficult because of the historical baggage we inherited from Java, where people cast and instance-check their way around, which is the only way to make a generic Java-compatible equals method work.

But I think that’s backwards. You shouldn’t have to match the runtime class of an Any parameter to be able to implement your equality — this style of programming is an anti-pattern in modern Scala.

Equality should be defined on classes of equatable types, based on their intended core semantics. For example different sequence types can be equated given the elements can be equated (making sure both the order and presence of elements is the same) and different set types too (disregarding any ordering), but a sequence and a set should not be equated, because they do not have comparable core semantics — there is more structure in a sequence.

It is for that reason that I do not think the proposals for Multiversal Equality or an implicit-based invariant Seq.contains are the right way to go: they’re each a relatively-specific hack for a relatively-specific manifestation of this problem, neither solving the core issue nor being based upon any strong theoretical foundation.

An equality type class (such as the one proposed in Multiversal Equality) also solves the contains problem. The current signature of contains is inadequate, as it relies on the assumption that anything can be compared.

Each adds a lot of additional rules to the relatively simple core concept of subtyping, along with their own edge cases, ergonomic issues, and complexity.

There is nothing special about an equality type class. It’s no more difficult than an Ordering one, which we already have.

The end goal of this should be clear: a heuristic, best-effort partial evaluator to help identify trivial snippets of code that are likely to be programmer errors. We should be able to add such functionality mostly by program analysis, without needing invasive changes to the language or core libraries.

I don’t think such a central and important concept as equality should be relegated to a likely-expensive and incomplete set of linter rules. Besides, compilation times are already slow enough.

nafg · December 15, 2019, 6:54pm

The idea that this is part of a broader problem gave me a thought. What if we describe the problem of e.g. “the answer to a == b is trivially known” as "the static type of a == b is the literal type false but your code clearly indicates that it’s handling all Boolean values (for instance it’s used in an if condition). This seems like a formal way to generalize this problem.

The first problem is that currently == doesn’t result in anything narrower than Boolean.

Second, it raises the question of what “clearly indicates” means. Certainly it implies that there shouldn’t be anything wrong with this code:

def doSomethingWithABoolean(parameter: Boolean): Unit = ???
doSomethingWithABoolean(1 == 2)

This code has a different, more general problem: it could be trivially simplified. That sounds like a reasonably thing to lint for. But what about these?

def thirdOfCircle = Math.Pi * 2 / 3
def timeToWaitInMillis = 5 * 60 * 1000

Those could be simplified but I don’t think they should (although an abstraction would be a good idea)

If we could somehow solve these issues then we might improve the quality of a lot of code.

oscar · December 15, 2019, 8:28pm

I’m kind of bummed the == problem wasn’t fixed in a way that didn’t require using def equals at all.

One approach would be that == in scala always looks for an Equiv[A] instance (which is already in the standard library). If you want java .equals(that: Any) use .equals.

By automatically generating Equiv instances (and ordering) for case classes and enums (which I think can be done with deriving), I think the usability is good.

I’m kind of bummed we have this complex two parameter type-class that is only a marker trait, and not really how the function is dispatched… It feels pretty hacky to me, and it would be really nice if we didn’t have to have a hacky solution and could just do the direct and obvious thing.

I guess this means that cats/scalaz === trick which does the above will continue to live on, which is a shame.

lihaoyi · December 15, 2019, 10:33pm

I am sympathetic to the desire for an equality type class, but in general my experience with typeclasses is that they interact poorly with subtyping, covariance, and contravariance that are core to Scala applications. My own typeclasses in e.g. uPickle consciously choose to be invariant to avoid weird inference problems, and AFAIK many of the FP libraries make the same choice: this works well in isolated cases, but I can’t imagine it working ergonomically with something like == which is expected to be well behaved for the open-inheritance-heavy Java/Javascript standard and third-party libraries that Scala leans heavily upon.

You disparage partial evaluation, but there is another name for an incomplete, best-effort partial evaluator: a typechecker! As Naftoli has pointed out, trying to partial evaluate booleans or other constants is nothing more than typechecking with singleton types or constant-folding, which Scala already supports.

“incomplete” is right: we already do not infer singleton false or true types even in cases where it is trivially obvious, and instead infer an unnecessarily wide Boolean. “likely-expensive” is also right: we know such partial evaluation is expensive, and we are already paying the cost every day.

In fact, it seems only three things are required for identifying such problematic “trivially constant” snippets:

The code in question is non-trivial: i.e. not just a single primitive literal. This is easy to check.
The code in question typechecks to a singleton type: we already have the ability to do this.
The code in question is pure: we do not currently have the ability to track this in the compiler

The only thing we need to add is some rudimentary purity analysis to catch problematic cases of ==, and perhaps some annotations @pure to handle things like .contains. That would give us all the safety we wanted, while being a much less invasive change than the other proposals I have seen.

(@pure annotations can lie, but then again so can generic types due to erasure, so in the end it’s a wash)

nafg:

This code has a different, more general problem: it could be trivially simplified. That sounds like a reasonably thing to lint for. But what about these?
def thirdOfCircle = Math.Pi * 2 / 3
def timeToWaitInMillis = 5 * 60 * 1000
Those could be simplified but I don’t think they should (although an abstraction would be a good idea)

If we could somehow solve these issues then we might improve the quality of a lot of code.

What if we required trivially-constant values to be wrapped in a marker method?

def thirdOfCircle = const{ Math.Pi * 2 / 3 }
def timeToWaitInMillis = const{ 5 * 60 * 1000 }

That would be enough to tell the compiler “I know this is a trivial computation, but let me do it”. I’d argue that providing such a marker function and making people use it would be an overall boost to readability overall: as a reader, I can immediately identify which portions of the code compute constants, rather than having to run a constant-folder in my head to try and figure out which snippets of code are going to be optimized away into constants.

This way, I get constant folding within const blocks, no constant folding outside, and both the compiler and programmer are 100% clear where constant folding exists and where it doesn’t. I see this as an absolute win

(This is great discussion, and I’ll add it to the blog post)

smarter · December 16, 2019, 10:50am

I’m interested in your weird inference problems :). The behavior of contravariance in implicit resolution was also changed in dotty to make things like equality typeclasses work better: dotty/compiler/src/dotty/tools/dotc/typer/Applications.scala at 6d3fe1db99c140258511782980ccac60d10b900a · lampepfl/dotty · GitHub

LPTK · December 16, 2019, 12:15pm

I’m not following this bit:

     *  `Cmp[T] <:s Cmp[U]` if `T <: U`. On the other hand, non-variant occurrences
     *  of parameters are not affected. So `T <: U` would imply `Set[Cmp[U]] <:s Set[Cmp[T]]`,
     *  as usual, because `Set` is non-variant.

It does not seem to follow from the definition. Since Set is invariant, flip(Set[Cmp[X]]) should be the same as Set[Cmp[X]], and T <: U would not imply Set[Cmp[U]] <:s Set[Cmp[T]]. Or has there been an undocumented generalization of flip to invariant parameters? (Does not seem to be the case from the implementation.)

smarter · December 16, 2019, 12:31pm

Hmm indeed, I think the comment is wrong here.

Naetmul · December 18, 2019, 6:39am

Interesting.

It reminds me of my years-ago SO question about the [difference between Any.== and Object.equals] and [boxing/unboxing].

…