I’ve just made a PR that attempts to make Set covariant: https://github.com/scala/scala/pull/6728, if this is something you care strongly about, feel free to weight in with your opinion.
A Set
in Scala, is something that can
- be iterated:
iterator: Iterator[A]
(covariant); - test membership:
contains(a: A): Boolean
(contravariant).
So a Set
should be invariant.
Also, a set implies an equivalence relation on its type parameter eq(a: A, b: A): Boolean
to test distinctness. A better design would be using a cats.Eq[A]
typeclass (or Hash[A]
for HashSet
s; Order[A]
for TreeSet
s) to specify its equivalence relation.
To be fair, the same things are true of Seq
.
Membership testing is different for Set
vs Seq
.
The signature of Seq
's contains
method is
def contains[B >: A](b: B): Boolean
This is actually
seq.exists(a => a == b)
where the ==
relation is implicitly defined because that B >: A
and because of the existence of the universal equality relation endowed on supertype B
.
This is in stark contrast with Set
, where the distinctness of the elements in the set is defined in the set itself. Note that in SortedSet[A]
, the distinctness is defined within the Ordering[A]
of that object instead of universal equality.
scala> SortedSet("a", "b", "c")(Ordering.by(_.length)).size
res3: Int = 1
Consider the following:
Set[A]
implicitly requires Eq[A]
HashSet[A] <: Set[A]
implicitly requires Hash[A] <: Eq[A]
SortedSet[A] <: Set[A]
implicitly requires Order[A] <: Eq[A]
None of these sets should be covariant.
I recall an example from very old (and partially obsolete) video of Paul Phillips. For now, invariant Set
s do not allow to do meaningless checks like
def f(intSet: Set[Int]) = intSet contains "your mom" // doesn't compile :-)
But Seq
’s contains
with B >: A
allows this:
def g(intSeq: Seq[Int]) = intSeq contains "your mom" // compiles :-(
By the way, @smarter, as far as I understand, your pull request also makes such meaningless expressions compile.
I believe it does. And that being the case, the type parameter B serves no purpose; the signature could just as well be def contains(value: Any): Boolean
.
Sure, but that’s not a reason for Seq
and Set
to have different variance. That’s a reason to make Seq
invariant as well. Or to implement contains
as an extension method, but that’s also far from ideal.
Passing anything into contains
that is not of type A
is almost certainly a bug that I want the compiler to capture before runtime.
Seq[A]#contains
is fundamentally different from Set[A]#contains
.
Seq[A]#contains
could be understood as
def contains[B >: A](b: B)(implicit B: Eq[B]) = this.exists(a => B.eqv(a, b))
where the equivalence relation Eq[B]
is attached to the supertype B
, not the internal element type A
.
However, since Set[A]
inherently contains the capability of deciding whether two elements are equivalent (universal equivalence, or by Ordering[A]
as in SortedSet[A]
), the contains
method is just
def contains(a: A) // using internal Eq[A] to check distinctness
where the Eq[A]
object could be considered as within the set (compare ordering: Ordering[A]
in SortedSet[A]
).
But this implies non-covariancy, doesn’t it?
Maybe, we are having wrong abstractions? Maybe, it’s better to have sets for enumerating (iterating, if you like) and sets for existence checks to be distinguished types with appropriate variancy each?
This issue is much deeper and harder.
For instance, how do you compare two Set
s with different equality definition?
Imagine, you compare Set
s like sets of objects (considering, you have fail-prone universal equality, maybe). If so, it would imply that s1.size != s2.size
would imply s1 != s2
.
Consider now two Set
s s1 == s2
. It seems natural that for every normally-typed a
s1 + a == s2 + a
. But it is not depending on whether internally stored equality (or ordering) of these two sets (at runtime!) differ or not — at least, because sizes after addition can differ depending on the equality or ordering.
All that goes to a thought that again we have wrong (not precise enough) abstraction. In particular, that Set
must have to have somewhat Eq
at its type (and SortedSet
must have Ordering
at its type too), i.e. something like Set[A, E <: Eq[A]]
and SortedSet[A, E <: Ordering[A]]
or so.
I completely agree. I’m arguing for invariance.
Set[A]
should be dependent on some Eq[-A]
instance. Whether to encode this as a type or as a runtime value is dependent on how precise we want the types to be.
We should explore making contains
an extension method for both sets and sequences. Covariance is nice, but detecting non-sensical contains
expressions is also nice.
Btw: What about maps?
I object making contains
an extension method for Set
. It is the essential operation of sets.
Making Seq#contains
an extension method is good, as it is not part of the core definition of a sequence.
If we do Map[K, +V]
(invariance on K
) then I think it naturally entails that Set[K]
is invariant on K
. Unless we want to do Map[+K, +V]
, not <: (K => V)
?
How would that work? I remember trying to define a contains
extension method for a covariant set-like data type. I tried all kinds of tricks (such as various implicit evidence schemes) to prevent it from upcasting the type argument on application, but did not find any way to do it – which is actually a testament to the completeness of the compiler’s type inference.
EDIT: as pointed out by @Jasper-M it turns out that this works and is probably a good solution to the problem.
Maybe I’m misunderstanding you, but what doesn’t work about:
implicit class HasSyntax[A](private val self: Seq[A]) extends AnyVal {
def has(a: A): Boolean = self.contains(a )
}
scala> List(1,2,3).contains("a")
res9: Boolean = false
scala> List(1,2,3).has("a")
^
error: type mismatch;
found : String("a")
required: Int
I can image that @LPTK does not like that self
must have a suitable contains method that can be called by the extension ops:
trait Container[+A] {
private[Container] def permissiveContains[B >: A](b: B): Boolean
}
object Container {
implicit class ContainerOps[A](val container: Container[A]) {
def contains(a: A): Boolean = container.permissiveContains(a)
}
}
Hmm… strange. I must have misremembered my old experiments, then. Please disregard my objection
I checked. The no-covariance trick works in Scala 2 but not in Dotty, so probably @LPTK did the experiments in Dotty. Dotty’s type inference is more global, so that it can handle the case. This is good in principle, but it does deprive us of the trick in question.
There is a way that seems to work in both. I don’t like it, but still.
implicit class HasSyntax[A](val s: Seq[A]) extends AnyVal {
def has[B](a: B)(implicit ev: B <:< A): Boolean = s contains a
}
So, we’ve almost repeated that Seq
is covariant by using <:<
here.
Then those work well:
def u[A](s: Seq[A], a: A) = s has a // compiles :-)
def v(s: Seq[Int]) = s has 5 // compiles :-)
And even artificially introduced method-like covariancy works:
trait A
trait B extends A
def v(s: Seq[A], b: B) = s has b // compiles :-)
And this does not compile both in Scala 2 and Dotty:
def w(s: Seq[Int]) = s has "your mom" // does not compile :-)
But all this stuff that can be done only through extension methods (which seems to be not intended for this naturally) looks really weird. Isn’t there any way to set type inference bounds during declaration of methods/arguments? It looks like those cheats we are doing through extension methods should be done when declaring a class (but maybe I have the wrong feeling).
I believe Kotlin has an escape hatch that allows the programmer to say “No really, allow this use of the type paramter in a variance incorrect location.” I’m not really a fan of this approach (basically break the type system to allow something that is incorrect, but “useful”). But if we are able to make it happen with extension methods anyway then maybe it’s a reasonable thing to do.
(PS: I was unable to find documentation of the Kotlin escape hatch, but I specifically remember hearing about it in a talk.)