Proposal: Make Set covariant

I’ve just made a PR that attempts to make Set covariant: https://github.com/scala/scala/pull/6728, if this is something you care strongly about, feel free to weight in with your opinion.

A Set in Scala, is something that can

  • be iterated: iterator: Iterator[A] (covariant);
  • test membership: contains(a: A): Boolean (contravariant).

So a Set should be invariant.

Also, a set implies an equivalence relation on its type parameter eq(a: A, b: A): Boolean to test distinctness. A better design would be using a cats.Eq[A] typeclass (or Hash[A] for HashSets; Order[A] for TreeSets) to specify its equivalence relation.

1 Like

To be fair, the same things are true of Seq.

1 Like

Membership testing is different for Set vs Seq.

The signature of Seq's contains method is

def contains[B >: A](b: B): Boolean

This is actually

seq.exists(a => a == b)

where the == relation is implicitly defined because that B >: A and because of the existence of the universal equality relation endowed on supertype B.

This is in stark contrast with Set, where the distinctness of the elements in the set is defined in the set itself. Note that in SortedSet[A], the distinctness is defined within the Ordering[A] of that object instead of universal equality.

scala> SortedSet("a", "b", "c")(Ordering.by(_.length)).size
res3: Int = 1

Consider the following:
Set[A] implicitly requires Eq[A]
HashSet[A] <: Set[A] implicitly requires Hash[A] <: Eq[A]
SortedSet[A] <: Set[A] implicitly requires Order[A] <: Eq[A]

None of these sets should be covariant.

3 Likes

I recall an example from very old (and partially obsolete) video of Paul Phillips. For now, invariant Sets do not allow to do meaningless checks like

def f(intSet: Set[Int]) = intSet contains "your mom" // doesn't compile :-)

But Seq’s contains with B >: A allows this:

def g(intSeq: Seq[Int]) = intSeq contains "your mom" // compiles :-( 

By the way, @smarter, as far as I understand, your pull request also makes such meaningless expressions compile.

2 Likes

I believe it does. And that being the case, the type parameter B serves no purpose; the signature could just as well be def contains(value: Any): Boolean.

1 Like

Sure, but that’s not a reason for Seq and Set to have different variance. That’s a reason to make Seq invariant as well. Or to implement contains as an extension method, but that’s also far from ideal.

Passing anything into contains that is not of type A is almost certainly a bug that I want the compiler to capture before runtime.

Seq[A]#contains is fundamentally different from Set[A]#contains.

Seq[A]#contains could be understood as

def contains[B >: A](b: B)(implicit B: Eq[B]) = this.exists(a => B.eqv(a, b))

where the equivalence relation Eq[B] is attached to the supertype B, not the internal element type A.

However, since Set[A] inherently contains the capability of deciding whether two elements are equivalent (universal equivalence, or by Ordering[A] as in SortedSet[A]), the contains method is just

def contains(a: A) // using internal Eq[A] to check distinctness

where the Eq[A] object could be considered as within the set (compare ordering: Ordering[A] in SortedSet[A]).

1 Like

But this implies non-covariancy, doesn’t it?

Maybe, we are having wrong abstractions? Maybe, it’s better to have sets for enumerating (iterating, if you like) and sets for existence checks to be distinguished types with appropriate variancy each?

This issue is much deeper and harder.

For instance, how do you compare two Sets with different equality definition?

Imagine, you compare Sets like sets of objects (considering, you have fail-prone universal equality, maybe). If so, it would imply that s1.size != s2.size would imply s1 != s2.

Consider now two Sets s1 == s2. It seems natural that for every normally-typed a s1 + a == s2 + a. But it is not depending on whether internally stored equality (or ordering) of these two sets (at runtime!) differ or not — at least, because sizes after addition can differ depending on the equality or ordering.

All that goes to a thought that again we have wrong (not precise enough) abstraction. In particular, that Set must have to have somewhat Eq at its type (and SortedSet must have Ordering at its type too), i.e. something like Set[A, E <: Eq[A]] and SortedSet[A, E <: Ordering[A]] or so.

2 Likes

I completely agree. I’m arguing for invariance.

Set[A] should be dependent on some Eq[-A] instance. Whether to encode this as a type or as a runtime value is dependent on how precise we want the types to be.

We should explore making contains an extension method for both sets and sequences. Covariance is nice, but detecting non-sensical contains expressions is also nice.

Btw: What about maps?

3 Likes

I object making contains an extension method for Set. It is the essential operation of sets.

Making Seq#contains an extension method is good, as it is not part of the core definition of a sequence.

If we do Map[K, +V] (invariance on K) then I think it naturally entails that Set[K] is invariant on K. Unless we want to do Map[+K, +V], not <: (K => V)?

How would that work? I remember trying to define a contains extension method for a covariant set-like data type. I tried all kinds of tricks (such as various implicit evidence schemes) to prevent it from upcasting the type argument on application, but did not find any way to do it – which is actually a testament to the completeness of the compiler’s type inference.

EDIT: as pointed out by @Jasper-M it turns out that this works and is probably a good solution to the problem.

Maybe I’m misunderstanding you, but what doesn’t work about:

implicit class HasSyntax[A](private val self: Seq[A]) extends AnyVal {
  def has(a: A): Boolean = self.contains(a )
}
scala> List(1,2,3).contains("a")
res9: Boolean = false

scala> List(1,2,3).has("a")
                       ^
       error: type mismatch;
        found   : String("a")
        required: Int
2 Likes

I can image that @LPTK does not like that self must have a suitable contains method that can be called by the extension ops:

trait Container[+A] {
  private[Container] def permissiveContains[B >: A](b: B): Boolean
}

object Container {

  implicit class ContainerOps[A](val container: Container[A]) {
    def contains(a: A): Boolean = container.permissiveContains(a)
  }

}

Hmm… strange. I must have misremembered my old experiments, then. Please disregard my objection :slight_smile:

1 Like

I checked. The no-covariance trick works in Scala 2 but not in Dotty, so probably @LPTK did the experiments in Dotty. Dotty’s type inference is more global, so that it can handle the case. This is good in principle, but it does deprive us of the trick in question.

1 Like

There is a way that seems to work in both. I don’t like it, but still.

implicit class HasSyntax[A](val s: Seq[A]) extends AnyVal {
  def has[B](a: B)(implicit ev: B <:< A): Boolean = s contains a
}

So, we’ve almost repeated that Seq is covariant by using <:< here.

Then those work well:

def u[A](s: Seq[A], a: A) = s has a // compiles :-)
def v(s: Seq[Int]) = s has 5 // compiles :-)

And even artificially introduced method-like covariancy works:

trait A
trait B extends A
  
def v(s: Seq[A], b: B) = s has b // compiles :-)

And this does not compile both in Scala 2 and Dotty:

def w(s: Seq[Int]) = s has "your mom" // does not compile :-)

But all this stuff that can be done only through extension methods (which seems to be not intended for this naturally) looks really weird. Isn’t there any way to set type inference bounds during declaration of methods/arguments? It looks like those cheats we are doing through extension methods should be done when declaring a class (but maybe I have the wrong feeling).

1 Like

I believe Kotlin has an escape hatch that allows the programmer to say “No really, allow this use of the type paramter in a variance incorrect location.” I’m not really a fan of this approach (basically break the type system to allow something that is incorrect, but “useful”). But if we are able to make it happen with extension methods anyway then maybe it’s a reasonable thing to do.

(PS: I was unable to find documentation of the Kotlin escape hatch, but I specifically remember hearing about it in a talk.)