More useful pattern matching

Hey,

Consider this example:

class StripPrefix(prefix: String):
  def unapply(s: String): Option[String] =
    Option.when(s.startsWith(prefix)):
      s.stripPrefix(prefix)

It’s a simple class that can be used to define extractor objects that will match strings that start with a certain prefix and extract the remainder of the string.
It can be used like so:

val StripFoo = StripPrefix("Foo")
val end =
  "Foobar" match
    case StripFoo(x) => x
// `end` is now "bar"

This works, but having to give a name to the pattern every time you want to use StripPrefix is a hassle. I wonder if we can extend the language to somehow allow usage of objects that weren’t previously bound to an identifier as extractors? Do you think it would be useful? I certainly wished this were possible on more than one occasion.

5 Likes
val end = "Foobar" match
  case s"Foo$x" => x

More information you can find here https://github.com/scala/scala/blob/v2.13.14/src/library/scala/StringContext.scala#L95

4 Likes

String interpolation matching works for more simple scenarios but doesnt scale to more complex ones. Why should parameters be limited to literal strings?

Notably, Fsharp has exactly what @mberndt has proposed, called Parametrized Active Patterns Active Patterns - F# | Microsoft Learn

2 Likes

I wished to use a plain function or expression instead of extractor quite often, but some effort would be necessary to invent a syntax which fits well into the pattern matching.

1 Like

A workaround I use where applicable is a given parameter of the extractor, especially in combination with compile-time constants:

object ConstVal:
  opaque type ConstVal[T] <: T = T
  inline given constVal[T]: T =
    scala.compiletime.constValue[T]

import ConstVal.ConstVal


class StripPrefix[S <: String](using prefix: ConstVal[S]):
  def unapply(s: String): Option[String] =
    if s.startsWith(prefix)
    then Some(s.stripPrefix(prefix))
    else None

val end =
  "Foobar" match
    case StripPrefix["Foo"](x) => x

end // is now "bar"
5 Likes

A slightly simplified version:

class StripPrefix(prefix: String):
  def unapply(s: String): Option[String] =
    Option.when(s.startsWith(prefix))(s.stripPrefix(prefix))

 object StripPrefix:
   import scala.compiletime.constValue
   inline def apply[S <: String]: StripPrefix = new StripPrefix(constValue[S])

val end =
  "Foobar" match
    case StripPrefix["Foo"](x) => x

end // is now "bar"

https://scastie.scala-lang.org/AO4IKbKUTZiMULv7wnwXYw

2 Likes

Shorter still:

object StripPrefix:
  def unapply[S <: String : ValueOf](s: String): Option[String] =
    val prefix = valueOf[S]
    Option.when(s.startsWith(prefix))(s.stripPrefix(prefix))

val end = "Foobar" match
  case StripPrefix["Foo"](x) => x
3 Likes

While these examples based on singleton types are clever and cool and everything, I feel they focus too much on the specifics of this particular example, specifically the fact that the value that the extractor is parameterized with happens to be a singleton. So here’s another example: matching Strings with a DateTimeFormatter.

class DateTimeExtractor[A](d: DateTimeFormatter, q: TemporalQuery[A]):
  def unapply(s: String): Option[A] =
    Try(d.parse(s, q)).toOption

val BasicIsoDate =  DateTimeExtractor(
    DateTimeFormatter.BASIC_ISO_DATE,
    TemporalQueries.localDate())

Note that TemporalQueries.localDate() is a method call, so you’re not going to get far with singleton types.

Before the code golf contest rolls on (which is fine, it’s funny!), how would the wished for potential improvement to usability of pattern matching actually look like?

What is the core of the idea?

What I get is @mberndt wants to avoid to have to write something like:

val StripFoo = StripPrefix("Foo")
// …where "Foo" is not constant in all cases…

But what is the proposed alternative?

Did I miss that part somehow, or did I miss the idea in general?

2 Likes

I was describing the problem, not proposing a solution. If you want to know what a solution might look like, you could read the thread (gasp) and look at @lihaoyi’s link.

I discussed this with a friend and we noticed that we had independently come up with the same syntax to solve this problem, which is to allow multiple parameter lists for the unapply method of an extractor. The last one contains the object to be matched, and the ones before contain parameters on how to perform matching.

When matching, you can also supply multiple parameter lists, the last one of which is a pattern and the ones before are expressions that are passed to the unapply method.

object StripPrefix:
  def unapply(prefix: String)(s: String): Option[String] =
    Option.when(s.startsWith(prefix)):
      s.stripPrefix(prefix)

"Foobar" match
  case StripPrefix("Foo")(x) => x
// evaluates to "bar"

But while this is intuitive, I think it’s ultimately not viable because now suddenly a full-blown expression can occur inside a pattern and, crucially, the compiler has no way to know whether the thing after StripPrefix( is supposed to be parsed as an expression or as a pattern; it only knows that once it has reached the end of the pattern, and that is a complication that we should avoid.

So we need another approach, and I think the best way to do it is to have a syntax to tell the compiler “here comes an expression, evaluate it and use the result’s unapply method as an extractor”.

One possibility would be to use {} to surround the expression:

"Foobar" match
  case {StripPrefix("Foo")}(x) => x

Another idea would be to use a keyword, e. g. with:

"Foobar" match
  case x with StripPrefix("Foo") => x

Any other suggestions?

Syntax is one issue with this idea, but another is efficiency. Suppose we use the syntax with {}. Then we could use Regex like this:

"Foobar" match
  case {"Foo(.*)".r}(x) => x
// evaluates to "bar"

The problem is that the Regex pattern needs to be compiled, which is inefficient and shouldn’t be done every time the match is executed. So the compiler should hoist that expression out and compile it to something like this:

// on package level
val Pat = "Foo(.*)".r

// whereever the match is:
"Foobar" match
  case Pat(x) => x

But then, the expression inside the {} could be using symbols (vals, defs etc.) from an excluding scope, so it’s not always possible to hoist it to the package level.
We could hoist it to the outermost scope that’s possible given the symbols that are being used. For example, given an extractor like this:

class Surround(prefix: String, suffix: String):
  def unapply(s: String): Option[String] =
    Option.when(s.startsWith(prefix) && s.endsWith(suffix)):
      s.stripPrefix(prefix).stripSuffix(suffix)

Hoisting would happen at different levels for different examples.
Example 1:

def a(pre: String) =
  def b(suf: String) =
    "Foobar" match
      case {Surround(pre, suf)}(x) => x

// translates to
def a(pre: String) = 
  def b(suf: String) =
    val Pat = Surround(pre, suf) // hoisted to `b` because `suf` is local to `b`
    "Foobar" match
      case Pat(x) => x

Example 2:

def a(pre: String) =
  def b =
    "Foobar" match
      case {Surround(pre, "bar"}(x) => x

// translates to
def a(pre: String) =
  val Pat = Surround(pre, "bar") // hoisted to `a` because no variables in `b` are used
  def b =
    "Foobar" match
      case Pat(x) => x

This is no doubt possible, but I don’t know if it’s a good or a bad idea. On the one hand it’s the most flexible: you can simply use surrounding variables as you would expect and it would work. On the other hand, it’s a performance pitfall: seemingly minor changes to the code could lead to different hoisting and thus wildly different performance characteristics.

I’d like to hear the community’s thoughts about this.

I also don’t know if it’s a good idea, but I think defining semantics is “orthogonal” to mechanisms, and the emphasis on scope is misplaced. (Other language features are at odds with implementation details, such as nested classes accessing private members or left-to-evaluation of operands of right-associative ops.)

Hey @som-snytt, thanks for commenting. But I’m afraid I can’t follow what you mean. If we want this hoisting behaviour in some form (I think we do), we need to define where that expression is hoisted to and which variables should be accessible. And I wouldn’t consider it an implementation detail as it affects the resulting program both on a semantic and a performance level.

You’re right, I assumed we only care about “evaluate at most once” like lazy val.

I’m not quite awake but I did watch that fellow at the JVM language summit talk about String concat. One candidate for “instrinsification” is String.format, which is used by our expensive Cadillac f-interpolator, where the format string is needlessly reparsed. That’s a similar example where it would be nice to precompute or at least compute once, but I don’t necessarily care when.

In pattern matches, I would expect patterns that are never evaluated not to precompute eagerly.

Yes, “evaluate at most once” is the ideal case, but it’s not possible when local symbols are accessed.
Actually, we should probably limit such access (if we want to allow it at all – we might just decide that only global symbols may be accessed) to stable identifiers. Allowing access to defs would make this whole hoisting thing completely unpredictable.

I have had the problem of having to define a name before being able to match before.

However, when I think this through, I am not convinced that expressions in pattern matches are a good thing.

Structural pattern matching is nice, because it’s visually pretty clear what is going on.
This proposal seems to (ab)use pattern matching syntax as syntactic sugar, and I think doing so is a terrible idea in the case of for and this seems similar.

On a meta level, all the examples I see so far could also be written just calling unapply directly:

someMethod(params).unapply("value").foreach {x => ...}

Which also does not require finding a new name for someMethod(params).

I guess a hidden part of the motivation is to still have multiple patterns and select the first matching one?
But is that common? Are there examples where they would not become more readable by just rewriting things a bit, adding a convenience method?

(Not saying it isn’t so, but I don’t have any good examples handy, and having them might help figure out what the missing functionality/syntax could be)

2 Likes

Pattern matching is syntactic sugar for calling unapply (plus some magic), so I don’t see why that would be considered an abuse.

The point of pattern matching is that it works recursively and so you don’t need to name the intermediate results.

No, because that is already possible without this proposal, and if you’re just going to call unapply, you can simply use orElse to fall back to another pattern.

The point here is simply to remove an odd inconsistency and inconvenience in the language. Nowhere else in Scala am I forced to bind an expression to a name in order to use it, and there’s no reason why it should be that way for extractor objects.

I don’t think the intent behind pattern matching is to make unapply look better.
In fact, I do believe unapply only exists to support pattern matching, not the other way round.

It’s a quirk of the language from a different time and should certainly not be something to build on top without questioning if that is a good idea.

We don’t seem to have examples for this either.

I find that makes it hard to think about questions like:

Would that not make long and hard to read patterns?
Would you not just use something like this for simple cases:

pattern1.unapply(value).flatMap(pattern2.unapply)

And maybe prefer intermediate names for more compelex cases?

Not sure if the points you mention after “because” talk about the same thing I had in mind, but I guess that’s not important if you did not have multiple patterns in mind anyway.

If this is how you want to motivate this change, I am completely unconvinced.
A pattern is not an expression. Patterns allowing identifiers but not expressions is completely reasonable and not a quirk.

Actually, in Scala it is, and it’s called an extractor object.

Whether or not that is desirable is a separate issue, and if that is what you want to debate, I encourage you to do so in a separate thread.