Proposal to officialize and generalize name-based pattern matching

Hi Scala Community!

This thread is the SIP Committee’s request for comments on a proposal to officialize and generalize name-based pattern matching. You can find all the details here.

Summary

In Scala, pattern matching is based on the notion of extractors: objects with an unapply or unapplySeq method that satisfies certain conditions. In Scala 2, extractors are mostly type-based, in the sense that their signature must conform to certain built-in types: Options, tuples and Seqs. Roughly, the possibilities can be summarized as:

  • def unapply(x: T): Boolean: extractor for 0 inner values
  • def unapply(x: T): Option[U]: extractor for 1 inner value
  • def unapply(x: T): Option[(U1, ..., UN)]: extractor for exactly N inner values
  • def unapplySeq(x: T): Option[Seq[U]]: extractor for a variable number of values

Subtypes such as Some[U] also work, but for the most part specific built-in types are required. Since Scala 2.11, a certain form of name-based pattern matching has been supported; the PR that introduced it remains, to this day, the only real documentation of the feature.

In this proposal, we officialize name-based pattern matching, and in fact use it to entirely supersede the old type-based pattern matching:

  • Instead of Option, anything with get and isEmpty is valid.
  • Instead of tuples, anything with _1 through _N is valid.
  • Instead of Seq, anything with length, apply, drop and toSeq is valid.

Read the documentation linked above for the exact details.
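To make the bullet points above concrete, here is a small sketch (the names FullName and NameParts are invented for illustration, not part of the proposal) of an extractor whose unapply returns a custom type providing isEmpty, get, and _1/_2 instead of an Option of a tuple:

```scala
// Hypothetical custom result type standing in for Option[(String, String)].
final class NameParts(s: String) {
  private val i = s.indexOf(' ')
  def isEmpty: Boolean = i < 0   // no space: the match fails
  def get: NameParts = this      // `get` exposes _1/_2 for an arity-2 match
  def _1: String = s.substring(0, i)
  def _2: String = s.substring(i + 1)
}

object FullName {
  def unapply(s: String): NameParts = new NameParts(s)
}

"Jane Doe" match {
  case FullName(first, last) => println(s"first=$first last=$last")
  case _                     => println("no match")
}
```

If isEmpty returns true, the case fails and the next one is tried; otherwise the bound variables come from _1 and _2 on the result of get.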

Currently, the implemented feature still has a type-based dependency on Product, but that dependency should also be merged into the name-based system.

Implications

Name-based pattern matching offers the potential to define, when it matters, zero-cost extractors that avoid any allocation, unlike Options of tuples, which can incur several allocations.
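As an illustration of what "zero-cost" can look like (a sketch with invented names, not code from the proposal), making the result type an AnyVal lets the match proceed without allocating a wrapper object:

```scala
// Hypothetical allocation-free extractor: PositiveResult is a value class,
// so no wrapper object is allocated on the heap when the pattern is matched.
final class PositiveResult(val n: Int) extends AnyVal {
  def isEmpty: Boolean = n <= 0
  def get: Int = n
}

object Positive {
  def unapply(n: Int): PositiveResult = new PositiveResult(n)
}

42 match {
  case Positive(x) => println(s"positive: $x")
  case _           => println("not positive")
}
```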

The new rules are a superset of the rules of Scala 2, so they will allow all existing extractors to keep working.

We are opening this proposal for discussion by the community to get a wider perspective and gather use cases.

Are there plans to allow overriding the generated case class unapply methods too? (Technically we can define them now too, but those are not used during pattern matching in Scala 2.12.x.)

This looks great! Not all the ambiguities are listed, though. I’m assuming that the preference is (1) product, (2) name-based, (3) name-based Seq?

Will case classes no longer be special-cased for pattern matching?

Is it worth adding an explicit null-returning flavor (unapplyOrNull) for performance? You still have to pollute your interface if you want zero-cost classes: the result U has to be allocated, since unapply returns the specific S (rather than producing it given an input argument), so you are only zero-cost if U is S, meaning S ends up with an extraneous self-returning get method.

(Note: couldn’t this be implemented via the new structural types, except that the new structural types aren’t high-enough performance? Certainly not “zero cost”, anyway, if they’re reflective.)

Are there any plans to allow named and default arguments in pattern matching?

It can be rather difficult to use the “visitor pattern” when there is an extractor for more than n (3, 5, 10) inner values.

Name-based pattern matching seems to be a good way to go. The original proposal does not say what should happen when .get and .isEmpty are available only implicitly. In my opinion, this shouldn’t be allowed.

OFF-TOPIC

I know that this proposal is about something different, but I still hope this idea will be considered. There are lots of places where I decide to match case s: Person => instead of case Person(name, _, _, _), because when I change the Person class I also need to change ALL the places where it was matched. If a notation like case Person(name = name) were allowed, there would be no such problems with refactoring.

This could be implemented, for example, like this:

case class Team(manager: Person, name: String)
case class Person(firstName: String, lastName: String)

object Team {
  // for example, extracts a Person from its own toString implementation
  def unapplyObj(s: String): Option[Person] = ???
  // or, if we are sure that we can always do that:
  // def unapplyObj(s: String): Person = ???
}

val c = Team(Person(firstName = "N", lastName = "L"), "uglyteamname").toString

c match {
    // - bind `name` to `unapplyObj(c).firstName`
    // - test whether `"L"` is equal to `unapplyObj(c).lastName`
    //   (this would freeze what equality means :(
    //   and we really need to check that the types match here!)
    case Team(person.firstName = name, person.lastName = "L") => whatever(name)
    // or:
    // case Team(person = Person(firstName = name, lastName = "L")) => whatever(name)
}

This is off-topic. Sorry for that.

Agreed that it’s off-topic, but +1 to the request: this is the desirable solution to one of my more regular pain points in day-to-day Scala programming…

Wow, this really clears things up! I’ve been wondering for a while how these UnapplySeqWrappers (src/library/scala/collection/Factory.scala on the 2.13.x branch of scala/scala) work, since they didn’t seem to follow the documented behaviour for pattern matching.

We actually had to improve name-based pattern matching for 2.13 to make unapplySeq work efficiently for mutable collections now that Seq is immutable: https://github.com/scala/scala/pull/7068. This wasn’t advertised anywhere because - just like the original 2.11 implementation - name-based extractors are not an officially specified feature yet, but the 2.13 implementation follows the Dotty implementation and the proposed spec.

By the way, it gets weird at arity 1: Arity-1 name-based extractors should work like arity 2+ · Issue #9836 · scala/bug · GitHub.

The arity-1 problem can be avoided if we merge the product pattern into the name-based pattern, so that _1 will be used without calling get.

Two more comments:

Will it be possible to distinguish between an extractor extracting a Seq and an extractor extracting a variable number of elements (which was the purpose of unapplySeq)? If not, this will be a breaking change. When in Scala 2 people wrote case Foo(xs) => where xs was an extracted Seq[A], if Dotty merges the two concepts, won’t xs be typed A?

Yes, it’s possible to do it:

object Foo {
  def unapply(x: Any): Option[Seq[Int]] = ???
}

object Bar {
  def unapply(x: Any): Seq[Int] = ???
}

"" match {
  case Foo(xs) => // xs: Seq[Int]
  case Foo(x, y) => // x: Int, y: Int
  case Foo(x, xs: _*) => // x: Int, xs: Seq[Int]
  case Bar(xs: _*) => // xs: Seq[Int]
  case Bar(x) => // x: Int
  case Bar(x, y) => // x: Int, y: Int
  case Bar(x, xs: _*) => // x: Int, xs: Seq[Int]
}

Cool. But how do I match Foo with exactly one argument in the sequence? I would have to write case Foo(Seq(x)) => // x: Int, right?

Yes, that seems to be the only option. We keep unapplySeq for compatibility with Scala 2, to support existing usage.

For all new pattern matches on sequences, I assume that in most cases programmers can write the result type of unapply to conform to the following type X, instead of wrapping the result in Option[T]:

type X = {
  def lengthCompare(len: Int): Int // or, `def length: Int`
  def apply(i: Int): T1
  def drop(n: Int): scala.Seq[T2]
  def toSeq: scala.Seq[T3]
}
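As a sketch of that shape (hypothetical names, loosely mirroring the standard library's UnapplySeqWrapper approach), one wrapper type can satisfy both the isEmpty/get part and the sequence members above, here matching over the characters of a String:

```scala
// Hypothetical wrapper conforming to the sequence shape: it serves as its own
// `get`, so a sequence match over a String allocates only this one wrapper.
final class CharsWrapper(private val s: String) {
  def isEmpty: Boolean = false             // irrefutable: always matches
  def get: CharsWrapper = this
  def lengthCompare(len: Int): Int = s.length - len
  def apply(i: Int): Char = s.charAt(i)
  def drop(n: Int): Seq[Char] = s.drop(n).toList
  def toSeq: Seq[Char] = s.toList
}

object Chars {
  def unapplySeq(s: String): CharsWrapper = new CharsWrapper(s)
}

"ab" match {
  case Chars(x, y) => println(s"two chars: $x$y")
  case _           => println("something else")
}
```

The pattern Chars(x, y) checks lengthCompare(2) == 0, then reads the elements with apply(0) and apply(1); no intermediate Seq is built for fixed-arity patterns.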

That doesn’t seem right. If declaring a variadic extractor with unapply causes extracting 1 versus non-1 arguments to look different at the call site, something’s wrong. If that’s really how it works, I would push for removing that behavior and making unapplySeq the only way to define variadic extractors.

tl;dr: I agree with the conclusion above.

This is a concern only if programmers write def unapply(x: Any): Option[Seq[String]]. If they write def unapply(x: Any): Seq[String], there is no such concern.

A GitHub search plus a random check shows that real-world usages of def unapplySeq: Option[Seq[T]] can be written in an irrefutable way. Here are some examples:

Example 1:

  def unapplySeq(whole: String): Option[Seq[String]] =
    Some(whole.split("\\.").reverse)

Example 2:

    object GitUrl {
      def unapplySeq(s: Any): Option[List[String]] =
        NativeUrl.unapplySeq(s) orElse
          HttpsUrl.unapplySeq(s) orElse
          HttpUrl.unapplySeq(s) orElse
          SshUrl.unapplySeq(s)
    }

    args.headOption match {
      case Some(Github(_, _)) => true
      case Some(Local(_))     => true
      case Some(GitUrl(uri))  => uri.mkString("") endsWith (".g8.git")
      case _                  => false
    }

So, if in practice most usage of sequence patterns involves one of the following signatures:

  • def unapplySeq(whole: String): Some[Seq[String]]
  • def unapplySeq(whole: String): Seq[String]

then programmers may wonder: why not make it simpler and more symmetric with apply:

  • def unapply(whole: String): Seq[String]

That said, there does exist one strong argument for keeping unapplySeq: the case of product-seq patterns like Option[(Int, String, Seq[Int])]. Without unapplySeq, the behavior for matching 1 versus 2 elements of the sequence is inconsistent, which is annoying.
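For reference, a product-seq extractor along those lines looks like the following sketch (the Csv extractor and its field layout are invented for illustration): a fixed first component followed by a variable number of remaining elements.

```scala
// Hypothetical product-sequence extractor: a fixed header field
// followed by a variable number of remaining fields.
object Csv {
  def unapplySeq(line: String): Option[(String, Seq[String])] = {
    val parts = line.split(",").toList
    if (parts.isEmpty) None else Some((parts.head, parts.tail))
  }
}

"id,a,b,c" match {
  case Csv(header, fields @ _*) => println(s"$header: ${fields.length} fields")
  case _                        => println("no match")
}
```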

If we keep both unapply and unapplySeq, I agree it’s better to make their semantics orthogonal, by making unapplySeq the only way to define variadic extractors.

I am wondering if there is any strong reason for not allowing implicit parameters on the name-based extractors (isEmpty, get, _1, …).
The targeted use case would be the definition of extractors based on implicitly available typeclasses.

Right now, one can use the following encoding for, e.g., extracting integer fields from any type that implements Numeric:

class IntLike[N](private val n: N)(implicit ev: Numeric[N]) {
  def isEmpty: Boolean = n != ev.fromInt(ev.toInt(n))
  def get: Int = ev.toInt(n)
}
object IntLike {
  def unapply[N: Numeric](n: N): IntLike[N] = new IntLike(n)
}

2.0 match {
  case IntLike(i) => println(s"int: $i")
  case _ => println("Not an int")
}

This works pretty well in the small, but it has terrible performance due to the allocation of a new IntLike instance at every match.

The usual trick is to make IntLike an AnyVal. To ensure that the constructor has a single parameter, we need to push the ev: Numeric[N] implicit down to all the methods:

class IntLikeNoBox[N](private val n: N) extends AnyVal {
  def isEmpty(implicit ev: Numeric[N]): Boolean = n != ev.fromInt(ev.toInt(n))
  def get(implicit ev: Numeric[N]): Int = ev.toInt(n)
}
object IntLikeNoBox {
  def unapply[N: Numeric](n: N): IntLikeNoBox[N] = new IntLikeNoBox(n)
}

However, in 2.12.8, IntLikeNoBox is no longer a valid extractor, because it does not provide the def isEmpty: Boolean method:

2.0 match {
  case IntLikeNoBox(i) => println(s"no box: $i")
  // compile error: an unapply result must have a member `def isEmpty: Boolean`
}

I was really surprised by this since, as far as I am aware, every piece of syntactic sugar in Scala allows the use of implicit arguments.
Is there any reason for this irregularity?

The main use case for me would be the ability to inspect complex data structures in a completely generic way with near-optimal performance. Right now, the performance impact of allocating at each match makes this approach inapplicable in practice, and one has to either lose genericity or resort to a Java-like cascade of if statements.

See https://scastie.scala-lang.org/arthur-bit-monnot/mgvREvNqQFi1PQQszzEwBA/4 for an example of destructuring an arbitrary type that implements a binary tree type class.
Without the current limitation on implicit arguments, the extractors could be made AnyVal and would result in no allocation at all when destructuring a tree of an arbitrary type.

I like the example, it is a convincing use case.

However, if we relax the rules, there is a worry about ambiguity, as the following example shows:

trait Data

class Test {
  def get: Data = ???
  def get(implicit x: Int): Data = ???
}

There might be more cases like the above, which is a headache for specification, and makes language lawyers cautious.

Another technical problem in Dotty is that pattern matching is desugared after type checking: there is not enough context to resolve the implicits.

I would not be overly worried about this kind of ambiguity where two methods only differ by their implicit argument list since such methods can only be called by explicitly providing the implicit parameters.

Here, (new Test).get would fail with an ambiguous reference. Even with a third method taking another type of implicit argument, the second and third methods would still be ambiguous regardless of the implicits in scope.
In fact, I cannot see a way to invoke your first get method without introducing a superclass that defines it and then casting Test to that superclass.

That is to say, I believe cases where we would get such ambiguities should be extremely rare, if not completely absent. If this ever occurs, letting the compilation fail with an ambiguous reference seems perfectly sane to me =)