Pre-SIP: Easier pattern dispatch

This is a pattern I have seen time and time again:

Enum Tree:
  case LeafA(x: Int)
  case LeafB(x: String)
  case NodeC(x: Tree)
  case NodeD(l: Tree, r: Tree)

def handle(t: Tree, value: SomeType) = t match
  case t: LeafA => handleLeafA(t, value)
  ...
  case t: NodeD => handleNodeD(t, value)

def handleLeafA(t: LeafA, value: SomeType) =
  val LeafA(x) = t
  ...
...
def handleNodeD(t: NodeD, value: SomeType) =
  val NodeD(left, right) = t
  ...

This usually happens when the cases of the pattern match become too large not to be their own functions

But this comes with big downsides, to keep the typing information and de-structuring, we have to repeat ourselves a lot, each case is repeated 5 times:

  1. case t: LeafA
  2. => handleLeafA
  3. def handleLeafA
  4. (t: LeafA)
  5. val LeafA(x) =

Here is my proposed solution:

case parameters / pattern dispatch

The initial example would become:

Enum Tree:
  case LeafA(x: Int)
  case LeafB(x: String)
  case NodeC(x: Tree)
  case NodeD(l: Tree, r: Tree)

def handle(case LeafA(x), value: SomeType) =
  ...
...
def handle(case NodeD(left, right), value: SomeType) =
  ...

I’ll call synonyms (name TBD) methods which have the same signature except for case parameters

The compiler would take all synonyms and turn them into a single method which takes their supertype (which exactly?), whose body is a big match with every case:

def handle(t: Tree, value: SomeType) = t match
  case LeafA(x) => // body of `def handle(case LeafA(x), value: SomeType)`
  ...
  case NodeD(left, right) => // body of `def handle(case NodeD(left, right), value: SomeType)

One way to achieve something similar today is to make handle an abstract member on Tree, and override it in the children.
This has multiple downsides:

  1. The method must be defined by the authors, and cannot be added externally
  2. The method (or a pointer to it) is stored inside each instance, which can have performance overhead. (I’m not super sure of the details, but I remember discussing it with @bishabosha regarding scalac3)
  3. The method cannot be shared between similar concrete classes, unless additional levels are added (see also multi-level enums)

For the last point, here is an example of how it can be done with this proposal:

def countLeaves(case LeafA(_) | LeafB(_)): Int = 1

def countLeaves(case NodeC(t)): Int = countLeaves(t)
def countLeaves(case NodeD(l, r)): Int = countLeaves(l) + countLeaves(r)

Maybe we could add a definition statement, to reduce the need to repeat the return types and declare the type(s) on which we match:

/**
 *  Some Documentation here, in one centralized spot
 */
def countLeaves(match Tree): Int

def countLeaves(case LeafA(_) | LeafB(_)) = 1

def countLeaves(case NodeC(t)) = countLeaves(t) // No need to specify return type, even though recursive
def countLeaves(case NodeD(l, r)) = countLeaves(l) + countLeaves(r) // No need to specify return type, even though recursive

There is one big problem with this proposal: What happens when multiple patterns match ?
Intuitively, patterns should be matched in the definition order of the methods, but this means:

  1. Changing the order of defs can impact performance
  2. Changing the order of defs can change behaviour !

Alternative Proposal

Here is a simpler proposal which avoids this problem, but is not as powerful

Accept case patterns (without if filters) instead of parameters:

def handleLeafA(LeafA(x), value: SomeType) =
  ...

Which would be equivalent to handleLeafA from the original example
Does not do pattern dispatch
Parameter type is taken to be the extractor’s scrutinee type, in this case LeafA
This has the benefits of being more broadly useful, at the cost of being less useful in solving this specific problem, since you would still need to do

def handle(t: Tree, value: SomeType) = t match
  case t: LeafA => handleLeafA(t, value)
  ...
  case t: NodeD => handleNodeD(t, value)
4 Likes

I didn’t find a neat way to include it, but pattern dispatch would allow if filters:

def example(case LeafA(x) if x > 0) = "Strictly Pos"
def example(case LeafA(x) if x <= 0) = "Not Strictly Pos"

This sounds like something that would be (partially) solved by SIP-76 - Unpack Case Classes into Parameter Lists and Argument Lists by lihaoyi · Pull Request #114 · scala/improvement-proposals · GitHub, letting you unpack he case class or enum case into the method call at the callsite and the method signature at the definition site, thus avoiding all the boilerplate of unpacking and repacking parameters

The broader “dispatch to different methods based on runtime values” pattern could also use some language support. We see it all over the ecosystem: from MainArgs dispatching to @main methods, Cask dispatching to @cask.{get,post} methods, Mill dispatching to Commands, and so on. Hopefully if we can come up with some pattern to handle this sort of thing, it would be generalizable enough to apply to all these cases as well rather than being limited to dispatching on sealed traits or enums

I think another reason these methods were lifted out is to have better jvm inlining performance (i.e. there is some maximum bytecode size which means a method is never inlined) (and jvm has a upper hard bytecode limit for any method) - so i would probably expect still the pattern match to dispatch to separate methods rather than inlining the body

TypeAssigner is an example overload that rubs me the wrong way for some reason.

An autogenerated solution could also rely on importing extension methods from known locations. Then the handle methods need not be overloaded in one class. That is, dispatch would have the form t.handle(value). (I haven’t tried it.)

That reminds me, case patterns should probably also be allowed in extension methods:

extension (case LeafA(x))
  def show = s"LeafA($x)"
  def leafCount = 1
...
extension (case NodeD(l, r))
  def show = s"NodeD($l,$r)"
  def leafCount: Int = l.leafCount + r.leafCount

or

extension (match Tree)
  def show: String
  def leafCount: Int

extension (case LeafA(x))
  def show = s"LeafA($x)"
  def leafCount = 1
...
extension (case NodeD(l, r))
  def show = s"NodeD($l,$r)"
  def leafCount = l.leafCount + r.leafCount
1 Like

Reminds me somewhat of Clojure’s multimethods: defmulti - clojure.core | ClojureDocs - Community-Powered Clojure Documentation and Examples

I admit that I’m not a big fan of that - it can be a bit annoying when debugging code to have to figure out which version of handle is being called.

But at least there’s some precedent there. It might be a good way to take some inspiration.

The compiler would take all synonyms and turn them into a single method which takes their supertype (which exactly?), whose body is a big match with every case:

One way to simplify this would be to force the definition before hand. Consider a defmulti definition:

Enum Tree:
  case LeafA(x: Int)
  case LeafB(x: String)
  case NodeC(x: Tree)
  case NodeD(l: Tree, r: Tree)

defmulti handle(tree: Tree, value: SomeType): Unit

def handle(case LeafA(x), value: SomeType): Unit = ???
def handle(case NodeD(left, right), value: SomeType): Unit = ???
def handle(case tree, value: SomeType): Unit = ???

Although I have to admit that I don’t particularly enjoy this suggestion (especially for adding more syntax)

1 Like

Thanks for the pointer !

I did mention something similar:

Note the match Tree which lets us know the scrutinee type, and that this is a “defmulti”

It’s true that with the status quo, you get to the dispatch method, and you can then “go to definition” of the one you think matches

I wonder if it would make sense to just add a facility that just generates the case distinction for an overloaded method.
For example just writing handle(match tree, value)to generate a pattern match that tries all overloads of handle:

tree match 
  case leaf: LeafA => handle(leaf, value)
  case node: NodeC => handle(node, value)
  ...

(This does not address the restructuring of case classes as part of the method definition, which seems separate)

Ah, sorry. This is what I get by skimming the posts on my phone :melting_face:

That match syntax is indeed clearer than what I proposed.