Pre SIP: Demote match keyword to a method

Blaisorblade · October 17, 2018, 7:14am

We have two variants of the question:

can we actually move the implementation of pattern matching to libraries? I still think the answer is (very likely to be) no.
can we pretend that at least x.match looks like a normal method call? Yes if Dotty can be patched for your variant of the syntax — whether you turn it into a match call or just bring the grammars closer…

First, the spec on meta is outdated. And I don’t think pattern matching could be implemented in a Dotty library (macros or not) — if it is, I feel it’s likely a bug.

That’s because pattern matching is supported both in the typechecker, a compiler phase for translation relying on some lower-level nodes (our version of labeled defs — labeled blocks by @sjrd, but the difference shouldn’t concern us), and handling in the phases after translation for the new generated constructs.

While I’m still the youngest Dotty contributor inside EPFL, here’s (part of) the rationale for the new metaprogramming designs as far as I can tell:

The Dotty typechecker is more robust than Scalac’s, but a few things are still subtle. Among those, preventing cycles is still delicate business — and that’s complexity essential to the Scala spec. To remove it, you’d just have to limit mutual recursion, but that would be a different language.
Scala 2 whitebox macros are very expressive, but they allow any library to extend the typechecker — which has frozen all its internals. People at Lightbend aren’t maintaining Scalac, they’re maintaining Scalac, macros-paradise, all macros in the community build, and probably more ones. They have to, lest the ecosystem stops working. That’s not sustainable, and has prevented many bigger changes in Scalac. The strong feeling is that we don’t want to repeat that mistake.
Limiting access to the typechecker has been an explicit design goal for the whole time — the new macros will run after typechecking is done and types are set in stone. More controlled mechanisms are possible (see match types and all that), but nothing like whitebox macros. Even usual compiler plugins are more limited — “research plugins” have fewer limitations but are currently disabled on stable builds.
There is an in-progress lower-level macro API, but that’s run after typechecking and cannot affect types, and the current design gives you a snapshot of Tasty trees, so I’d even expect no access to labeled blocks. Even then, you don’t have full access to the internals, just restricted access to a controlled subset.
Since people can be determined, I suspect you could maybe break some of the restrictions of the lower-level macro API by linking against the compiler (unsafely) and using enough reflection, but there’s no support and no binary compatibility guarantees for internal compiler APIs. No scala-reflect with stronger guarantees for half of the internals.

EDIT: To conclude: does that prove you can’t implement pattern matching in a library somehow? Not quite — I haven’t checked if there’s some clever way to encode parts of it via match types and whatnot. But it’s not pattern matching as we have it today.
EDIT2: what has not been discussed are macros before Namer and Typer, that is, simple desugarings. They’d be extremely limited

odersky · October 17, 2018, 12:35pm

I think the point was to treat pattern matching closures as the primitive form; then match can indeed be a method. It probably still needs to be treated specially in the compiler to avoid the actual creation of a closure, but that’s a minor point.

However, I am not sure it’s worth the change, for two reasons:

you lose syntax highlighting
even if we made match a method, the main usage would still be like this
```
x match { ... }
```
rather than
```
x.match { ... }
```
It just looks better. But that would then rely on alphanumeric methods being written in infix form, and that’s what we might actually want to get away from since it gives a confusing, and mostly pointless, choice to people. I am not sure about this yet, but just to get more experience with it I have taken to write now
```
x `eq` y
```

instead of

   x eq y

and so on. As I said, so far it’s an experiment, nothing more. I am not sure we will reach a decision anytime soon. But since it affects the question of match I would like to clarify what Scala 3 should do about infix operators before deciding on a change of the status quo.

viktorklang · October 17, 2018, 12:56pm

Assuming that the pattern-matcher would generate the appropriate logic for creating PartialFunctions, and match being a method which takes a function from the “MyType” of the receiver to some type X, if the match method is marked as inline, and with an optimizer which can inline PartialFunction literals, wouldn’t we (at least theorhetically) be able to replace match as a keyword with match as a method, without necessarily incurring any performance implications?

odersky · October 17, 2018, 1:10pm

The problem is, what would it look like after inlining a match? At some point you need a primitive match expression. Except if the pattern matcher does the inlining and eliminates the closure; that would work. But, either way, that’s a minor issue.

viktorklang · October 17, 2018, 2:26pm

I’m not sure I understand the question, could you elaborate?

What I mean is that for a “partial function literal”, ex:

{ case Foo(_) => “bar” }

the compiler will generate an apply-method which performs the pattern match. Now, in order to reduce overhead, let’s say that match is loosely defined as follows:

trait Any { type Self =>
@inline def match[T](expr: PartialFunction[Self, T]): T = expr(this)
// you could technically have the following as well:
@inline def pipe[T](f: Self => T): T = f(this)
}

Now, a sufficiently advanced compiler would look at the following code:

1 match {
case 0 => “foo”
case _ => “bar”
}

And it would do the following rewrite:

({
case 0 => “foo”
case _ => “bar”
})(1)

Then during optimization, it would want to avoid megamorphic calls, or even invokeinterface ones, and it’d see that it’s a PF literal which is anonymous and only used once, and it would instead elide the allocation of the PartialFunction and apply (sic!) the body of its apply directly to its parameter.
Upon seeing that it is a primitive, it could elide all boxing and in the end you’d end up with something as performant and straightforward as the current match keyword.

All of the above makes a lot of assumptions, but at least it should outline the gist of my train of thought.

odersky · October 17, 2018, 2:48pm

Exactly. But what does the result of this look like? It would look like our good friend, the match expression unless you map at the same time to a lower level. These are all compiler games which can be played. Like I said, that would not be the problem.

shawjef3 · October 17, 2018, 4:53pm

If match were a method I would expect it to be overridable. I haven’t thought through this but I’m sure someone could come up for some bad ways to override match.

nafg · October 17, 2018, 4:59pm

FWIW you could define match to take a regular function, since the MatchError behavior will be the same in either case

viktorklang · October 18, 2018, 7:37am

Yes, if my assumptions hold.

viktorklang · October 18, 2018, 7:40am

Absolutely. I guess the difference would be that there would be no match keyword or expressions needed in the language spec as it has been unified with the “Partial Function literal” syntax.

Edit:

One big difference though is that PartialFunctions are composable whereas match expressions arent. (Unless you sort of surround them to catch MatchError and delegate to another match expression.)

jducoeur · October 18, 2018, 12:43pm

Opinion: I disagree here. While I do think that point-free syntax is often overused by beginners, I really like the ability to use it for more DSL-ish applications, and do so pretty often for certain use cases. I think it’s worth keeping.

eed3si9n · October 18, 2018, 1:32pm

(Minor nitpick: point-free style is a whole different thing, so let’s call it “infix” vs “dot-and-paren” notation.)

I agree with @jducoeur in terms of Scala’s ability to extend language-construct-like method being a strength of the language, especially the way the collection library demonstrates it.

To me, the infix makes feels natural when the argument is a lambda, which is also consistent with match:

List(1, 2, 3) filter { _ % 2 == 0 }
Option(1) match { case _ => "foo" }
List(1, 2, 3) map { case 1 => 1; case n => n + 1 }

I’d switch to dot-and-paren when you want to chain more than 1 (, which cannot do with match now).

odersky · November 21, 2019, 2:32pm

Coming back to this… In Scala 3, match is special since it can also be used in match types. So I believe it should still be a keyword. However, we might change the syntax rules for match so that it behaves like a left-associative infix operator with the same precedence as other alpha-numeric operators. That would make using match more flexible. In particular, we could chain matches:

a match 
  case P1 => b
  case P2 => c
match 
  case P3 => d
  case P4 => e

Another variant would be to also allow match to be used after dot. So match would behave like a normal method, except that it is a reserved word. That would help with embedding matches in long chains of conditions. Here’s an example, lifted from real code (the exhaustivity checker in our compiler):

  def canDecompose(tp: Type): Boolean =
    val cls = tp.classSymbol
    cls.is(Sealed)
    && cls.isOneOf(AbstractOrTrait)
    && !cls.hasAnonymousChild
    && cls.children.nonEmpty
    || tp.dealias.match
          case _: OrType => true
          case and: AndType => canDecompose(and.tp1) || canDecompose(and.tp2)
          case _ => false
    || tp.isRef(defn.BooleanClass)
    || tp.isRef(defn.UnitClass)
    || cls.isAllOf(JavaEnumTrait)

That’s actually quite readable taking into account how complex a condition it is. Compare with the original that uses old Scala 2 syntax:

  def canDecompose(tp: Type): Boolean = {
    val dealiasedTp = tp.dealias
    (tp.classSymbol.is(Sealed) &&
      tp.classSymbol.isOneOf(AbstractOrTrait) &&
      !tp.classSymbol.hasAnonymousChild &&
      tp.classSymbol.children.nonEmpty ) ||
    dealiasedTp.isInstanceOf[OrType] ||
    (dealiasedTp.isInstanceOf[AndType] && {
      val and = dealiasedTp.asInstanceOf[AndType]
      canDecompose(and.tp1) || canDecompose(and.tp2)
    }) ||
    tp.isRef(defn.BooleanClass) ||
    tp.isRef(defn.UnitClass) ||
    tp.classSymbol.isAllOf(JavaEnumTrait)
  }

morgen-peschke · November 21, 2019, 4:53pm

This would be amazing, the lack of chaining after match has been a frequent “gotcha” for onboarding new Scala programmers.

dwijnand · November 23, 2019, 11:57am

I believe that’s all that’s needed.

Similarly you sometimes have to wrap for expressions in parens:

(for {
  x <- xs
  y <- ys
} yield
  foo(x, y)
).toSeq

Sciss · November 23, 2019, 10:35pm

Are you saying dot should be allowed for yield as well? I think that would be a good second case:

    for(x <- xs; y <- ys).yield(foo(x, y)).toSeq

viktorklang · November 23, 2019, 10:54pm

My preference would be to move the yield into the for comprehension, since scope-wise that is where the symbols are defined.

for { f <- foo; b <- bar; yield baz(f,b) }

Then you could easily get the match-example working:

for {
  f <- foo
  b <- bar
  yield baz(f,b)
} match {
  case … => ???
}

morgen-peschke · November 23, 2019, 10:56pm

I’d be in favor of this, regardless of what happens with match. The current syntax looks like the definitions are escaping their defined scope. They aren’t, but boy does it look like it.

odersky · November 24, 2019, 1:01pm

See https://github.com/lampepfl/dotty/pull/7610

eed3si9n · November 25, 2019, 5:42pm

Thanks Martin!