Type-safe contains

eed3si9n · September 21, 2018, 5:51pm

background

List’s contains accepts arbitrary type into the parameter, which has bitten many of us over the years.
https://twitter.com/mpilquist/status/1042954337222369281

Let’s see Seth’s question from 2010. Why does Seq.contains take an Any argument, instead of an argument of the sequence type?

scala> val l = List(1,2,3,4)
l: List[Int] = List(1, 2, 3, 4)

scala> l.contains("wat")
res3: Boolean = false

2.13.x definition of contains method looks like this: scala/src/library/scala/collection/Seq.scala at v2.13.0-M5 · scala/scala · GitHub

  def contains[A1 >: A](elem: A1): Boolean = exists (_ == elem)

If we read the code’s intent, this should in theory prevent "wat" from getting accepted because "wat" does not satisfy String >: Int test.

analysis

The common explanation for this phenomena is because Seq is covariant. I think it’s a misleading argument since if we supply the type parameter A1 would actually do the job.

scala> l.contains[String]("wat")
                 ^
       error: type arguments [String] do not conform to method contains's type parameter bounds [A1 >: Int]

To me, this is a demonstration that the bug is actually in the type inferencer calculating least upper bound (LUB) too early. To quote Daniel Spiewak’s analysis from 2010:

The problem here is that we are calling the contains method passing a value of type String. Scala sees this and tries to infer a type for B which is a supertype of both String and A , which in this case is instantiated as Int . The LUB for these two types is Any (as shown earlier), and so the local type instantiation for contains will be Any => Boolean . Thus, the contains method appears to not be type safe.

(Ed. I flipped String and Int to match our example)

In other words, we can retain the current covariant collection design, but maybe by fixing the type inference, we can try to fix this thing.

scalafix-noinfer

As a desperate workaround at the tooling layer I’ve started looking into linting tool trying to catch bad type inference. See stricter Scala with -Xlint, -Xfatal-warnings, and Scalafix. But it would be better if we didn’t have to do this.

Paul Phillips knew how to fix this in 2012

See scare-quoted “type-safe” contains method for covariant collections.

class MyCovariantCollection[+T, +CC[X] <: Seq[X]](xs: CC[T]) {
  // Oops, our "evidence" seems to be useless, as what we want
  // to know is that T1 <:< T, but of course we can't do that because
  // of the variance position.
  def contains[T1](x: T1)(implicit ev: T <:< T1): Boolean = xs contains x
}

Let’s try this out.

scala> val xs = new MyCovariantCollection(List(1, 2, 3))
xs: MyCovariantCollection[Int,List] = MyCovariantCollection@177ad340

scala> xs.contains("wat")
                  ^
       error: Cannot prove that Int <:< String.

It works!

Paul provides the explanation as follows:

// But wait! Unspecified implementation vagaries to the rescue!
object Test extends App {
  val xs = new MyCovariantCollection(List(1, 2, 3))
  assert(xs contains "abc", "I'm sure it's in there somewhere")
}

// Thank-you type inference algorithm! Pick the most specific
// type while resolving the first parameter list without regard
// for the fact that we are doomed once you see the second
// parameter list.  That's what we want today.

Could we please use this trick for contains?

soronpo · September 21, 2018, 6:56pm

The error message should be improved, IMO (using another implicit with @implicitNotFound)

joshlemer · September 21, 2018, 7:17pm

@eed3si9n You should open a ticket for this in www.github.com/scala/bug as well, I think

eed3si9n · September 21, 2018, 8:09pm

There’s List(1,2,3,4).contains("hello") compiles · Issue #10831 · scala/bug · GitHub.

edmundnoble · September 21, 2018, 8:18pm

While I certainly sympathize with the motivation behind this, I can’t agree that this is a real solution or really anything but a hack. This was stated by Paul as well. List[+A] means that if I have a List[A], I must be able to treat it as a List[B] if A <: B. And that’s what contains currently does. This is explicitly broken by changing contains; it’s just pretending. The real problem still exists. The problem is covariance itself. You’re just making a syntactic adjustment. By merely slightly refactoring code, you can break this solution trivially, by making code create a List[Any] from a List[A], because I am fundamentally allowed to treat any List[A] as a List[Any]. If you think contains is broken right now, then you have nowhere to point a finger but to covariance. You cannot have your cake and eat it too; this is another special-case in the stdlib which relies on fragile and arguably broken behavior, is code that no user would write themselves (and no user should have to write themselves), and doesn’t fix the underlying issue, which can be uncovered (and will be uncovered) by users by accident during refactoring.

What I think would start us on a path to a real solution would be a form of “parametric top” as has been previously suggested; that would actually exclude Any. But that doesn’t solve everything, and can’t, because bad bounds are intrinsic to subtyping (think about Product, Serializable, user marker traits, etc.).

alexknvl · September 21, 2018, 8:55pm

You can not implement def contains[B >: A](a: A): Boolean on a List[A] if Any is parametric, so there would not be this problem in the first place.

You would have to supply A in a negative position somehow, like def find(f: A => Boolean): Boolean or def contains[B >: A : Eq](b: B): Boolean, both of which would not lead to the issue above.

Subtyping and variance are not fundamentally broken. Having seemingly broken contains is a direct consequence of having a non-parametric bound on both Int and String.

trait NonParametricTop {
  def equal(b : NonParametricTop): Boolean
}

sealed abstract class List[+A <: NonParametricTop] {
  def contains[B >: A <: NonParametricTop](x: B): Boolean = this match {
    case Nil => false
    // There is no way to implement this line 
    // without non-parametric top type:
    case h :: t => h.equal(x) || t.contains(x)
  }
  
  def :: [B >: A <: NonParametricTop](h : B): List[B] = new ::[B](h, this)
}
final case object Nil extends List[Nothing]
final case class ::[A <: NonParametricTop](head: A, tail: List[A]) extends List[A]

eed3si9n · September 21, 2018, 9:02pm

Yes, people tend to focus too much on Any and AnyVal, but the real life contains problem I’ve ran into has been more like:

scala> case class Address()
defined class Address

scala> List(Address()).contains("address")
res0: Boolean = false

in this case java.io.Serializable.

If we supply the type parameter here:

scala> List(Address()).contains[String]("address")
                               ^
       error: type arguments [String] do not conform to method contains's type parameter bounds [A1 >: Address]

This will go through A1 >: Address check, so I don’t think it’s about the type of A being +A. Because if that was the case List[Address] should’ve moved to List[java.io.Serializable] even given contains[String].

The problem is that A1 the type parameter for contains method is compromised to the LUB of Address and String before we can perform A1 >: Address test.

edmundnoble · September 21, 2018, 9:08pm

Reducing it to an issue of timing, as in the type inference doing things at the wrong time, is incorrect because as I said, I must be able to treat List[A] as List[B] if A <: B. This is what covariance is. Where type inference happens is immaterial, because if I call (l: List[Any]).contains(x), I immediately have the same problem, and that type ascription is implicitly applied all the time by type inference doing its job (correctly).

Look at it more generally: just because the code List(1, 2, 3).contains("hi") doesn’t compile, doesn’t mean you’ve fixed the problem. The problem underlies this. The problem is this:

def doIt[A](l: List[A], a: A): Boolean = l.contains(a)
doIt(List(1, 2, 3), "hi")

The problem is this, too:

def doIt[A, B <: List[A]](l: B, a: A): Boolean = l.contains(a)
doIt(List(1, 2, 3), "hi")

And also this:

def doIt[A](l: List[A], a: A): Boolean = l.contains(a)
val l = List(1, 2, 3)
doIt(l, "hi")

The problem is not the single line of code List(1, 2, 3).contains("hi"); that’s the only reason why I can think of type inference seeming like it’s the problem. I think we’re missing the forest for the trees; just because this is a nicely succinct example of the problem, doesn’t mean that it’s the problem. The solution to this problem is not arrived at by being able to tell the user “technically, that line of code is fixed”, but by fixing the conceptual issue.

LPTK · September 21, 2018, 9:25pm

There is prior discussions on this very issue there: Proposal: Make Set covariant

I agree with @alexknvl that the issue is due to the fact that contains currently assumes it can compare anything for equality – which it shouldn’t, even though lack of parametricity on the JVM happens to allow it.

For what it’s worth, I think the multiversal equality proposal for Dotty could fix this, as I have already discussed there: https://github.com/scala/collection-strawman/issues/100#issue-233863714

Of course, a more traditional and rigid Eq[T] type class would also work, but is probably off the table for the foreseeable future of Scala.

eed3si9n · September 21, 2018, 9:39pm

I think that shifts the problem to another place since you’re unifying both the types first. Here’s the fix:

scala> def doIt[A, B](l: List[A], b: B)(implicit ev: A <:< B): Boolean = l.contains(b)
doIt: [A, B](l: List[A], b: B)(implicit ev: A <:< B)Boolean

scala> doIt(List(1, 2, 3), "hi")
           ^
       error: Cannot prove that Int <:< String.

SethTisue · September 21, 2018, 9:41pm

Can we make something like .invariant.contains(...) work?

edmundnoble · September 21, 2018, 9:46pm

def doIt[A](l: List[A], a: A): Boolean = l.invariant.contains(a)

doIt(List(1, 2, 3), "hi")
// compiles

edmundnoble · September 21, 2018, 9:48pm

A <:< B is not the same as A <: B, that’s the issue here. You’re skirting around the problem, but it very much still exists, and the only reason this works is A <:< B is not considered by the compiler during subtyping checks.

If instead of having A <:< B, you have two concrete types A and B, the problem still exists. These are still specific instances of the problem, not the problem itself; I don’t really know how else to say this, though. Applying this to user code also requires replacing every instance of A <: B with A <:< B, which is not in general acceptable; <: is very useful.

We cannot simply say that “A <: B, except when it doesn’t make sense”. We have to do it when B is introduced, via some form of evidence, as @LPTK suggests.

ghik · September 21, 2018, 10:37pm

That can be done easily by making contains an extension method instead of regular method:

trait List[+A] {
  private def unsafeContains(any: Any): Boolean
}
object List {
  implicit class ListOps[A](list: List[A]) {
    def contains(value: A): Boolean = list.unsafeContains(value)
  }
}

Now, List(1,2,3).contains("hi") won’t compile because covariant List[Int] will be lifted into invariant ListOps[Int] before typechecking based on argument type.

However, this still doesn’t solve the fundamental problem - snippets given by @edmundnoble will still compile.

SethTisue · September 22, 2018, 12:44am

thanks @LPTK for the pointer to what looks like a promising longer-term solution.

I’m also interested in hearing about shorter-term partial solutions even if they only disarm the footgun in some situations. Do Scalafix (or wartremover) rules already exist in this area?

@mpilquist’s original example already warns under -Xlint, but I’m wondering whether other linters can already catch things like List(Nil).contains(Some(1)), which scalac doesn’t warn on.

eed3si9n · September 22, 2018, 2:08am

Yes. Wartremover has a specific rules (“warts”) against inferred Any, AnyVal, JavaSerializable, and Product , which in combination can catch a lot of these. (Though as discussed above, the general problem exists for any common traits).

For Scalafix 0.8.0-RC1 that just came out I wrote a rule yesterday called NoInfer to catch specific inference, and blogged about it as stricter Scala with -Xlint, -Xfatal-warnings, and Scalafix.

[info] Running scalafix on 2 Scala sources
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:7:3: error: [NoInfer.Serializable] Serializable was inferred, but it's forbidden by NoInfer
[error]   List(Animal()).contains("1")
[error]   ^^^^^^^^^^^^^^^^^^^^^^^
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:8:3: error: [NoInfer.any2stringadd] any2stringadd was inferred, but it's forbidden by NoInfer
[error]   Option(1) + "what"
[error]   ^^^^^^^^^
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:9:3: error: [NoInfer.Product] Product was inferred, but it's forbidden by NoInfer
[error]   List(Nil).contains(Some(1))
[error]   ^^^^^^^^^^^^^^^^^^
[error] /Users/eed3si9n/work/quicktest/noinfer/Main.scala:9:3: error: [NoInfer.Serializable] Serializable was inferred, but it's forbidden by NoInfer
[error]   List(Nil).contains(Some(1))
[error]   ^^^^^^^^^^^^^^^^^^
[error] (Compile / scalafix) scalafix.sbt.ScalafixFailed: LinterError

In terms of the short-term, reducing foot-guns, I think Paul’s (implicit ev: T <:< T1) trick is worth considering.

mpilquist · September 22, 2018, 3:29pm

Note the project in which I encountered this a few days ago uses -Xlint and -Xfatal-warnings. Here’s a more representative example:

The warning isn’t given because it infers Serializable.

hrhino · September 23, 2018, 3:59am

There was a PR to add a flag to warn on inferred Product with Serializable, which maybe could be revived to lint for “not-quite-top” types like those (individually, too). But this is probably a digression.

eed3si9n · September 23, 2018, 4:19am

Here’s a pull request to make Seq(1, 2, 3).contains("wat") not compile.

I initially tried (implicit ev: T <:< T1) trick, but accepting only supertypes of A was not usable.

lihaoyi · September 23, 2018, 5:30am

Fundamentally, this is a broader issue than Seq.contains, and is fundamental to how subtyping works with the Liskov Substitution Principle (LSP).

The basic issue is that when performing a computation using Seq[Any].contains(x: NotFoo) can return either true or false, while Seq[Foo].contains(x: NotFoo) is guaranteed to only return false. And this is entirely expected: Seq[Foo] is a sub-type of Seq[Any], and false is a sub-type of true | false. This is is an inherent property of any computation: having fewer possible inputs results in fewer possible outputs. Usually this is OK, and entirely expected: it’s just that at some point, the number of possible outputs is small enough (1?) that we think there is a programmer error.

Seq.contains isn’t even the only issue in this class. Consider the following:

(x: Option[T]).isEmpty // This is normal
Some(1).isEmpty // This always returns false, just like Seq.contains!
None.isEmpty  // This always returns true

(x: Int) == 1 // This is normal 
2 == 1 // This always returns false
1 == 1 // This always returns true

(x: Seq[Int]).isInstanceOf[List[Int]] // this is typical
(x: Vector[Int]).isInstanceOf[List[Int]] // this always returns false, just like Seq.contains

(x: Any) match{case i: Int => true case _ => false} // this is typical
(x: String) match{case i: Int => true case _ => false} // this always returns false, just like Seq.contains, and is blocked by the compiler, but violates LSP!!!
((x: String): Any) match{case i: Int => true case _ => false} // the fact that the above fails to compile may not sound unreasonable, until you realize that by up-casting to `: Any`, it can compile again!

(x: Any).getClass // this is typical, can return anything
(x: CharSequence).getClass // this is more specific than above, can return fewer things
(x: String).getClass  // this can only ever return classOf[String], 
                     // just like Seq.contains sometimes only ever returns false!

The basic issue is:

LSP allows you to substitute subtypes where-ever a supertype is allowed. This is OK
Since the subtype has fewer possible values than the supertype, the set of possible result values is reduced. This is OK
At some point, the set of possible result values has cardinality 1. Only one possible result can be returned from the computation. This is also “OK”, from the computer’s point of view and from LSP’s point of view, but calls into question why the programmer wants to do such a trivial computation: likely it’s a bug!

This is a fundamental issue with subtyping and the Liskov Substitution Principle.

Covariance itself, is just a side-show: the only reason removing covariance from Seq/Set “fixes” the issue and makes the code not-compile is because removing covariance removes subtyping from Seq[T] and Seq[Any], and as I have shown before equivalent “this always returns a fixed answer” computations can be constructed that do not make use of covariance at all.

A proper fix for this would be to fully acknowledge the limitation of LSP: that while performing a computation using a subtype, the set of possible results is narrower than performing it using the supertype, but in the extreme yields a “trivial” result that can be inferred without even performing the computation at all. And there are many, many more “trivial” results than Seq[Foo].contains(x: NotFoo), such as the examples listed above.

This would apply to all sorts of scenarios and computations broader than the single Seq.contains/Set.contains case brought up in this post, and I would be not in favor of trying to hack some fix for Seq.contains without a proper understanding and generalized fix/mitigation of the broader issue at hand.