Type-safe contains

Can we make something like .invariant.contains(...) work?

def doIt[A](l: List[A], a: A): Boolean = l.invariant.contains(a)

doIt(List(1, 2, 3), "hi")
// compiles

A <:< B is not the same as A <: B, that’s the issue here. You’re skirting around the problem, but it very much still exists, and the only reason this works is A <:< B is not considered by the compiler during subtyping checks.

If instead of having A <:< B, you have two concrete types A and B, the problem still exists. These are still specific instances of the problem, not the problem itself; I don’t really know how else to say this, though. Applying this to user code also requires replacing every instance of A <: B with A <:< B, which is not in general acceptable; <: is very useful.

We cannot simply say that “A <: B, except when it doesn’t make sense”. We have to do it when B is introduced, via some form of evidence, as @LPTK suggests.

That can be done easily by making contains an extension method instead of regular method:

trait List[+A] {
  private def unsafeContains(any: Any): Boolean
}
object List {
  implicit class ListOps[A](list: List[A]) {
    def contains(value: A): Boolean = list.unsafeContains(value)
  }
}

Now, List(1,2,3).contains("hi") won’t compile because covariant List[Int] will be lifted into invariant ListOps[Int] before typechecking based on argument type.

However, this still doesn’t solve the fundamental problem - snippets given by @edmundnoble will still compile.

1 Like

thanks @LPTK for the pointer to what looks like a promising longer-term solution.

I’m also interested in hearing about shorter-term partial solutions even if they only disarm the footgun in some situations. Do Scalafix (or wartremover) rules already exist in this area?

@mpilquist’s original example already warns under -Xlint, but I’m wondering whether other linters can already catch things like List(Nil).contains(Some(1)), which scalac doesn’t warn on.

Yes. Wartremover has a specific rules (“warts”) against inferred Any, AnyVal, JavaSerializable, and Product , which in combination can catch a lot of these. (Though as discussed above, the general problem exists for any common traits).

For Scalafix 0.8.0-RC1 that just came out I wrote a rule yesterday called NoInfer to catch specific inference, and blogged about it as stricter Scala with -Xlint, -Xfatal-warnings, and Scalafix.

[info] Running scalafix on 2 Scala sources
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:7:3: error: [NoInfer.Serializable] Serializable was inferred, but it's forbidden by NoInfer
[error]   List(Animal()).contains("1")
[error]   ^^^^^^^^^^^^^^^^^^^^^^^
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:8:3: error: [NoInfer.any2stringadd] any2stringadd was inferred, but it's forbidden by NoInfer
[error]   Option(1) + "what"
[error]   ^^^^^^^^^
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:9:3: error: [NoInfer.Product] Product was inferred, but it's forbidden by NoInfer
[error]   List(Nil).contains(Some(1))
[error]   ^^^^^^^^^^^^^^^^^^
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:9:3: error: [NoInfer.Serializable] Serializable was inferred, but it's forbidden by NoInfer
[error]   List(Nil).contains(Some(1))
[error]   ^^^^^^^^^^^^^^^^^^
[error] (Compile / scalafix) scalafix.sbt.ScalafixFailed: LinterError

In terms of the short-term, reducing foot-guns, I think Paul’s (implicit ev: T <:< T1) trick is worth considering.

1 Like

Note the project in which I encountered this a few days ago uses -Xlint and -Xfatal-warnings. Here’s a more representative example:

The warning isn’t given because it infers Serializable.

1 Like

There was a PR to add a flag to warn on inferred Product with Serializable, which maybe could be revived to lint for “not-quite-top” types like those (individually, too). But this is probably a digression.

Here’s a pull request to make Seq(1, 2, 3).contains("wat") not compile.

I initially tried (implicit ev: T <:< T1) trick, but accepting only supertypes of A was not usable.

Fundamentally, this is a broader issue than Seq.contains, and is fundamental to how subtyping works with the Liskov Substitution Principle (LSP).

The basic issue is that when performing a computation using Seq[Any].contains(x: NotFoo) can return either true or false, while Seq[Foo].contains(x: NotFoo) is guaranteed to only return false. And this is entirely expected: Seq[Foo] is a sub-type of Seq[Any], and false is a sub-type of true | false. This is is an inherent property of any computation: having fewer possible inputs results in fewer possible outputs. Usually this is OK, and entirely expected: it’s just that at some point, the number of possible outputs is small enough (1?) that we think there is a programmer error.

Seq.contains isn’t even the only issue in this class. Consider the following:

(x: Option[T]).isEmpty // This is normal
Some(1).isEmpty // This always returns false, just like Seq.contains!
None.isEmpty  // This always returns true

(x: Int) == 1 // This is normal 
2 == 1 // This always returns false
1 == 1 // This always returns true

(x: Seq[Int]).isInstanceOf[List[Int]] // this is typical
(x: Vector[Int]).isInstanceOf[List[Int]] // this always returns false, just like Seq.contains

(x: Any) match{case i: Int => true case _ => false} // this is typical
(x: String) match{case i: Int => true case _ => false} // this always returns false, just like Seq.contains, and is blocked by the compiler, but violates LSP!!!
((x: String): Any) match{case i: Int => true case _ => false} // the fact that the above fails to compile may not sound unreasonable, until you realize that by up-casting to `: Any`, it can compile again!

(x: Any).getClass // this is typical, can return anything
(x: CharSequence).getClass // this is more specific than above, can return fewer things
(x: String).getClass  // this can only ever return classOf[String], 
                     // just like Seq.contains sometimes only ever returns false!

The basic issue is:

  • LSP allows you to substitute subtypes where-ever a supertype is allowed. This is OK

  • Since the subtype has fewer possible values than the supertype, the set of possible result values is reduced. This is OK

  • At some point, the set of possible result values has cardinality 1. Only one possible result can be returned from the computation. This is also “OK”, from the computer’s point of view and from LSP’s point of view, but calls into question why the programmer wants to do such a trivial computation: likely it’s a bug!

This is a fundamental issue with subtyping and the Liskov Substitution Principle.

Covariance itself, is just a side-show: the only reason removing covariance from Seq/Set “fixes” the issue and makes the code not-compile is because removing covariance removes subtyping from Seq[T] and Seq[Any], and as I have shown before equivalent “this always returns a fixed answer” computations can be constructed that do not make use of covariance at all.

A proper fix for this would be to fully acknowledge the limitation of LSP: that while performing a computation using a subtype, the set of possible results is narrower than performing it using the supertype, but in the extreme yields a “trivial” result that can be inferred without even performing the computation at all. And there are many, many more “trivial” results than Seq[Foo].contains(x: NotFoo), such as the examples listed above.

This would apply to all sorts of scenarios and computations broader than the single Seq.contains/Set.contains case brought up in this post, and I would be not in favor of trying to hack some fix for Seq.contains without a proper understanding and generalized fix/mitigation of the broader issue at hand.

8 Likes

Ha, you gravely underestimate my ability to finger-point. I point at the signature of contains, which is not sensible and can’t be implemented unless you have universal equals.

With great power (like having universal equals) comes great responsibility (like not writing the signature of contains as it exists now)

2 Likes

Fair.

I can confirm this. There’s multiple data types in Cats that are covariant and have a contains method, but do not suffer from this issue, because they use Eq.

I agree that the current situation isn’t ideal and could benefit from a temporary fix until Dotty arrives :+1:

3 Likes

I don’t think a type inference trick is a good solution here. For starter, you’d have to make sure it works with Dotty and its improved inference capabilities (which might improve even more in the future). I’ve been thinking about this problem for some time and came up with a different idea. Fundamentally, the issue is that the type signature of a method serves two purposes:

  1. Limit which arguments can be passed to the method
  2. Limit how the passed arguments can be used inside the body of the method

We’d like to write:

def contains(elem: A): Boolean = ...

This is exactly what we want for 1. (I should only be able to check if a List of A contains elements of type A), but this is wrong for 2. (Because of covariance, I cannot assume that I’m actually being passed an A).
So instead we write:

def contains[B >: A](elem: B): Boolean = ...

Which satisfies 2. but not 1, and now we’re stuck.

I think the only way to get out of this conundrum is to add a way to decouple 1. from 2. For example, maybe we could write:

def contains(elem: +A): Boolean = ...

Where +A here would mean: when calling contains, the user must pass a value of type A, but in the body of contains, the type of elem is just an abstract type lower-bounded by A. This allows us to satisfy both 1. and 2.

2 Likes

I think this is a promising direction.

To be specific about this, I think we should allow A and all subtypes of A so:

List(Option(1)).contains(None) // compiles
List(Option(1)).contains(1) // doesn't compile
List(Some(1)).contains(None) // doesn't compile

I think variance notation is confusing enough in Scala to allow such a thing as : +A.

2 Likes

While I agree with you, the moment someone puts +A in their class they’ve already gone into the confusing rabbit hole. If one were to write def contains(elem: A): Boolean, the error message they would get is:

Error:(4, 11) covariant type A occurs in contravariant position in type A of value a

And as we’re seeing here, [A1 >: A] is just a cooler way of writing Any.

2 Likes

Yes, but when people begin with variance, some of their trial-error usages can involve putting +/- in different places (I need to find that @adriaanm quote), and I don’t think we should allow the parser to accept more places for +/-Type to exist.

Here’s Pierce’s definition of type system:

In this light, there are certain things that are incompatible with Liskov Substitution Principle, Scala’s current design, or the fewer-possible-output theory.

To me, the fact that 1 == 1 always returning true, or 1 + 1 always returning 2 is not a problem. The fact that List("abc").contains("abc") always returning true is some sort of logic level concern. However, the fact that 1 == Option(1) compiles is unsound. Sort of like asking someone “how did you kill Fred?”

The implementation of contains is underpinned by this Liskovian dystopia that is universal equality:

  def contains[A1 >: A](elem: A1): Boolean = exists (_ == elem)

This make unsound List(1, 2, 3).contains("wat") possible.

To some degree we need to keep living in the design, but for the safety improvement we would need to break apart from the hegemony of Any.

A known workaround for this that’s been tried by a few including Dotty is introducing Eq typeclass (Multiversal Equality). Typeclasses will select the most specific instance of Eq[A], === operator will behave differently based on lhs and rhs’s type. That would not be kosher according to Barbara either.

1 Like

This is an interesting analysis, I had not thought about it this way :+1: I like the solution too. The only drawback I can see is that it would not interact very well with the prefix types SIP but, given how important this problem is, I would be totally fine with your solution.

Y’know, consider SIPping this :wink: