Pre-SIP(s): Improving the expressiveness of conditional control flow

alvae · January 18, 2024, 10:14am

I’m opening this thread to discuss improvements we should consider w.r.t. the constructs we can use in Scala to express control flow.

Since my list of grievances is quite long, I expect the features pitched in this thread to turn into one or several (or none!) [Pre-]SIPs, depending on feedback. Further, while my post does proposes solutions at the end, I’m more interested to know whether the community agrees that there is something to be improved before I invest more time into making sure we can come up with changes having the right syntax and semantics.

I note that most of the ideas I’m about to present have been workshopped with @EugeneFlesselle.

Motivation

This section makes the case for defining new constructs to express conditional control flow in Scala. I will start relating pattern matching to conditional expressions to justify the general need for “concise” match expressions. Then I’ll present use cases for new constructs.

The case for conditional expression

Let me start by doing something a little weird: justify the existence of if!

Pattern matching is one of the most expressive forms of conditional control flow. As such, it can be used as a building block to express many other forms of conditional statements. For example, if a then foo else bar is merely sugar:

a match
  case true => foo
  case _ => bar

As we can see, there is a trivial way to write a conditional expression as a match over a Boolean scrutinee and vice-versa. Given this observation, one may wonder then why having conditional expressions in the first place. There are several answers to this question:

First, a conditional expression is just more concise than its equivalent match expression, and concision matters because of language ergonomics. When we see if a then 1 else 2, we can immediately recognize that we’re looking at a dichotomous choice. The match expression is a more convoluted in contrast.

Second, a conditional expression is more “structured”. There are only two cases: either a holds or it does not, the branch that comes first (i.e., right after the condition) is always the one running if the condition holds, and the other branch is always the one running otherwise. Hence, we have just less to think about because the syntax of the construct is ruling out many scenarii that a general-purpose match doesn’t.

Third, a conditional expression doesn’t have to have an else branch:

def computeThings(): T =
  if shouldLog then println("about to compute things")
  proceed()

The equivalent match expression has a case false that just produces a unit. Writing such a case would simply introduce noise in our code.

Lastly, let’s recognize the convenience of “else if”. For example, consider the following conditional expression:

if a1 then
  foo
else if a2 then
  bar
else
  zim

There are two ways to rewrite it as match expressions:

// (1)
a1 match
  case true => foo
  case _ =>
    a2 match
      case true => bar
      case _ => zim

// (2)
(a1, a2) match
  case (true, _) => foo
  case (_, true) => bar
  case _ => zim

The first case isn’t ideal because it introduces excessive nesting, making it difficult to quickly identify control flow paths. The situation is manageable here but imagine that we had 3 other else if branches attached to the original expression!

The second case isn’t great either because it causes the reader to go back to the expression of the scrutinee for every case. Using else if, one can more easily focus on just the condition of the current branch while reading code from top to bottom. Further, the second version has slightly different semantics as the first one. In particular, a2 is evaluated “no matter what”, which may be undesirable because of performance and/or side effects.

Patterns in conditional expressions

Now that we’re hopefully convinced that conditional expressions are useful despite being merely syntax sugar, let’s talk about conditions. After all, not all scrutinees are Boolean values.

Let’s imagine we have a type Version defined as follows:

enum Version:
  case Legacy
  case Stable(major: Int, minor: Int)

Now, imagine we want to run code only for a stable version greater than or equal to 2:

v match
  case Version.Stable(major, _) if major >= 2 => foo
  case _ => bar

We had no choice but to use a match expression here. The problem is that our “condition” is not a simple Boolean value, and therefore we can no longer sweeten our program with those tasty conditional expressions. That is unfortunate because all the reasons we used to justify them in the previous section still apply here.

Though it is not a Boolean value, the pattern is notionally a predicate. In fact, we could rewrite as below, so that we could go back to a conditional expression:

extension (self: Version)
  def >= (m: Int): Boolean =
    self match
      case Version.Stable(major, _) if major >= m => true
      case _ => false

if v >= 2 then foo else bar

Sadly the “trick” is quite underwhelming as we’ve done nothing more than moving code around. Further, not having access to major in the then branch might be a loss, as we will discuss later.

Refining matches

One fundamental problem that match addresses and if doesn’t is that it lets us extract a value from the scrutinee and give it name for use in another expression. For example:

v match
  case Version.Stable(major, minor) if major >= 2 =>
    println(s"running version ${major}.${minor}")
  case _ => ()

But again, this match expression would be arguably better expressed as a conditional expression. In particular, since nothing is executed unless the condition holds, the “catch-all” branch is just pure noise.

It is interesting to observe that the inability to bind values in conditional expression also impedes expressiveness in pattern guards. For example, assuming v is an instance of Option[Version]:

v match
  case Some(x) =>
    arbitraryComputation(x) match
      case Version.Stable(major, minor) if major >= 2 =>
        println(s"running version ${major}.${minor}")
      case _ => ()
  case _ => ()

Because the version is wrapped in an Option, we now have to introduce additional nesting to first extract it and give it a name. We need this name because arbitraryComputation is just a function and not an extractor. And obviously, the problem grows with the number of values that must be unwrapped in general!

I suppose we could use some more dark extractor magic to use an extractor in this specific case. However, that would be “costly” for something that could be a one-liner with language support, as we’ll see later. Further, I am not convinced this magic would be powerful enough to deal with variables matched from different levels of nested matches.

Generalizing from the above example use case, we remark that we sometimes need to write match expressions that “refines” the result of a previous match. For example:

v match
  case Some(x) if arbitraryPredicate(x) =>
    arbitraryComputation(x) match
      case Version.Stable(major, minor) if major >= 2 =>
        foo(major, minor)
      case Version.Legacy =>
        bar
      case _ =>
        zim
  case _ =>
    zim

The above example is a slight variation of the one shown in the previous section. The most significant difference is that we actually want the conditional behavior on x to be expressed as a match expression as it involves a non-dichotomous choice. There are two cases of interest:

v is Some(x) and x is a stable version greater than or equal to 2.0
v is Some(x) and x is a legacy version

In any other case, we run zim, which here stands in for some kind of default behavior.

In addition to the excessive indentation caused by the inner match, the problem of this expression is that it needs two occurrences of zim. It is manageable in this small example, but zim could be a much larger expression, potentially spanning many lines of code. Further, the problem grows with the number of non-exhaustive nested matches that we introduce, as zim will have to appear in all “catch-all” cases.

Also note that while all “catch-all” cases are stacked at the end in the above example, it is easy to imagine situations where one zim would end up in the middle of the expression:

n match
  case A(x) =>
    f(x) match
      case Z => foo
      case X => bar
      case _ => zim // <- `zim` is in the middle
  case B(x) =>
    f(x) match
      case Y => foo
      case _ => zim
  case _ => zim

One solution is to refactor zim in a function that gets called in all “catch-all” cases. Unfortunately, that is inconvenient if zim uses many variables from the local scope and impractical if it modifies local state (ah, if only Scala supported call-by-value-result!)

Bailing out

It sometimes happens that we want to test for some conditions at the top of a function before running the remainder of its body. For example, we may want to check for some preconditions:

/** Computes very important things.
  *
  * @param v The client's version, must be greater than or equal to 2.0.
  * @param w The server's version, must be greater than or equal to the client's.
  */
def computeThings(v: Version, w: Version): T =
  v match
    case Version.Stable(client, _) if client >= 2 =>
      if w >= v then
        // many lines
        if client >= 3 then
          newTechnique()
        else
          oldTechnique()

      else throw InvalidServerVersion(w)
    case _ => throw InvalidClientVersion(v)

The function starts by verifying that its precondition are satisfied, throwing exceptions if they aren’t. Then, it goes on to run its logic, which happens to require a value extracted from v.

Having the code written that way is sub-optimal for two reasons:

Most of the function’s body is nested in a match case and excessive indentation is harmful.
The handling of precondition failures is far from precondition checks.

As we can see, both of these issues grow with the number of precondition checks. The body of the function is nested at least once per test (twice in the case of a match) and it gets difficult to relate the exception thrown to the corresponding test. Note also that both conditional and match expressions introduce these issues.

The setup of computeThings make it very difficult to refactor. Because we need the client value to run its core logic, there’s at least one level of indentation that is unavoidable. The conditional expression could be written differently though:

if w < v then throw InvalidServerVersion(w)
// many lines

By inverting the condition, we can “bail out” early and write the remainder of the function without additional nesting. That works because throwing an exception is a way to give up control flow. I’ll show how this idea can be generalized later.

For comprehension to the rescue?

At first glance, it may seem like at least some of the issues discussed above are addressed by for comprehensions. For example, one can eliminate a match expressions used to extract values from options:

def g(v: Option[Int], w: Option[Int]): Option[Int] =
  for x <- v; y <- w if y >= x yield
    x + y

Unfortunately, there are still quite a few problems that for does not solve:

We can’t write an “else” branch that would be executed if one of the enumerators does not produce a value;
We can’t use a for comprehension as a pattern guard;
Filtering does not generalize well to all patterns.

The last issue is particularly vexing because it is quite tempting to write the following:

for case Version.Stable(major, minor) <- v if (major >= 2) do
  println(s"running version ${major}.${minor}")

Note that it is possible to get this expression working if v is wrapped in a sequence. But this trick reveals that for, even with a couple of changes, is perhaps ill-suited to fill the design gaps that I’ve discussed so far. AFAIU, the feature is fundamentally about handling enumerators rather than performing arbitrary pattern matching.

What have others done?

Let’s make a detour to other languages before we discuss possible solutions for Scala.

Patterns in conditional expressions

In Swift, we can use patterns in conditional expressions, which then serve as conditions. The bindings introduced by the patterns are defined only in the first branch. Multiple conditions can appear separated by a comma, and mixing pattern matches and Boolean expressions is allowed.

if case Version.stable(let major, _) = v, major >= 2 {
  foo()
} else {
  bar()
}

Note extracted values can be bound to let or var (which are equivalent to Scala’s val and var, respectively) to control the mutability of the local variable.

Rust also admits patterns in conditional expressions, although multiple conditions are not supported. The language therefore suffers the same issue as Scala with respect to nested matches: zim has to be repeated.

if let Version::Stable(major, _) = v {
  if major >= 2 { foo() } else { zim() }
} else {
  zim()
}

In this paper, Lionel Parreaux describes the so-called “ultimate conditional syntax” for MLscript that supports patterns in conditional expressions. Multiple conditions can appear separated by and:

if v is Stable(major, _) and major >= 2
then foo()
else bar()

Refining matches

Since Parreaux’s conditional unifies the syntax of conditional and match expressions, nested patterns are expressed as naturally as and-separated sequences of conditions in a “multi-way conditional expression”:

if v is
  Some(x) and x is
    Stable(major, minor) then foo(major, minor)
    Legacy then bar()
else zim()

Bailing out

In Swift, a guard statement can be used to test for a condition and break control flow if it does not hold. It has a single branch, introduced by else.

func computeThings(_ x: Int) throws -> T {
  guard x >= 0 else { throw InvalidArgument(x) }
  // many lines
  return T()
}

Just like in regular conditional expressions, the condition can contain patterns. Bindings introduced by these patterns are available after the guard statement, but not in its branch.

func computeThings(v: Version, w: Version) -> T {
  guard case Version.stable(let client, _) = v, client >= 2 else { throw InvalidClientVersion(v) }
  guard w >= v else { throw InvalidServerVersion(w) }

  // many lines
  return if client >= 3 {
    newTechnique()
  } else {
    oldTechnique()
  }
}

Rust has a similar construction. Sadly it also suffers from the inability to write multiple conditions.

fn computeThings(v: Version, w: Version) -> i64 {
  let Version::Stable(major, _) = v else { panic!("invalid server version"); };
  // ...
}

Proposed solutions

Let’s now examine possible solutions to address the three shortcomings introduced in the first section. These are pretty straightforward and directly inspired by precedent in other languages.

Patterns in conditional expressions

In both Swift and Rust, pattern matches in conditional expressions require an inversion of the pattern and the scrutinee. For example, in Swift:

// match form
switch v {
  case Version.stable(let major, _) where major >= 2:
    print(major)
  default:
    break
}

// conditional form
if case Version.stable(let major, _) = v, major >= 2 {
  print(major)
}

This inversion may look surprising. Hence, it may be reasonable to try to avoid it in Scala, just like in Parreaux’s “ultimate conditional syntax”:

// without pattern/scrutinee inversion
with v match case Version.Stable(major, _) if major >= 2 then
  println(major)

The reason for introducing the construct with with is that an expression of the form if v match case ... is currently valid Scala, provided that the cases are wrapped in braces, only with different semantics. The additional condition on the result of the match is introduced with if for consistency with pattern guards.

If we conceded the inversion of the pattern and the scrutinee, then we could use a syntax similar as Swift and Rust:

// with pattern/scrutinee inversion
if val Version.Stable(major, _) = v if major >= 2 then
  println(major)

This form has a few advantages that perhaps make up for the surprising inversion:

if is a more recognizable introducer for a conditional expression;
val and var can be used to introduce immutable and mutable bindings, respectively;
the syntax is consistent with that of vals and vars introduced with irrefutable patterns;
both additional matches and Boolean expressions are introduced with the same keyword.

Refining matches

A case in a match expression can be followed by another match expression, introduced with with to avoid ambiguity with pattern guards:

v match
  case Some(x) with x match
    case Version.Stable(major, minor) =>
      foo(major, minor)
    case Version.Legacy =>
      bar
  case _ =>
    zim

The cases of the nested expressions need not be exhaustive. If none of them match, then control flow returns to the outer expression and proceeds as though the current case had not been matched.

The nested matches could obviously introduce side-effects, but that wouldn’t be new. One can already write arbitrary side effects in a pattern guard (and you better make sure they don’t change the scrutinee!)

Bailing out

A val or var introduced with a refutable pattern can be followed by an expression returning control flow, introduced by else. Conditions and nested matches can follow the pattern, introduced by if and with, respectively. The expression in the else branch must produce Nothing.

def computeThings(v: Version, w: Version): T =
  val Version.Stable(client, _) = v if client >= 2 else throw InvalidClientVersion(v)
  if w < v then throw InvalidClientVersion(v)

  // many lines
  if client >= 3 then
    newTechnique()
  else
    oldTechnique()

Sporarum · January 18, 2024, 12:06pm

I don’t think there is a problem to begin with, for example if I wanted a single match, I could rewrite:

v match
  case Some(x) if arbitraryPredicate(x) =>
    arbitraryComputation(x) match
      case Version.Stable(major, minor) if major >= 2 =>
        foo(major, minor)
      case Version.Legacy =>
        bar
      case _ =>
        zim
  case _ =>
    zim

Into:

v.filter(arbitraryPredicate(_)).map(arbitraryComputation(_)) match
  case Some(Version.Stable(major, minor)) if major >= 2 =>
    foo(major, minor)
  case Some(Version.Legacy) =>
     bar
  case _ =>
    zim

You’ll not in particular that this avoids the duplication of zim !

But this is accessory to a more important point:

I simply disagree, I think the two-tab indentation we have in scala, together with indent guides and sticky scroll (native settings in VS Code) make indented code a breeze to read and understand

And I think this can be highlighted by the fact Scala users, in my experience, prefer:

if isValid(v) then
  code()
else
 throw error()

Over:

if isNotValid(v) then
 throw error()
code()

Given indentation is not costly in Scala, the former allows us to be explicit about control flow, which makes the mental model clearer

(And so I would be against adding a construct to make the latter easier)

I think even the following has the advantage of being clear:

v match
  case Version.Stable(major, minor) if major >= 2 =>
    println(s"running version ${major}.${minor}")
  case _ =>

And for me the final nail in the coffin is this:
Right now it is relatively clear when something should be a match and when it should not
I don’t want to have a third option in between, which would make the decision harder !
(a sentiment I also expressed for named tuples funnily enough)

Sporarum · January 18, 2024, 12:09pm

My disagreement should not hide the fact that this is an extremely well asked question !

I appreciate in particular the exploration of other programming languages, and the numerous examples

alvae · January 18, 2024, 12:19pm

That only works in the narrow use case where the first match is done over an option. I used Option for convenience but it should be really stressed that any trick that uses the API of Option is not a valid counter-example to the problems I mentioned because such an API isn’t guaranteed to exist on arbitrary case classes.

Sporarum · January 18, 2024, 1:41pm

Sorry yes I should have been clearer, Options are so ubiquitous I thought it was still worthwhile to showcase

The other points of my argumentation are more pertinent

lihaoyi · January 18, 2024, 2:46pm

I like this proposal.

Doesn’t give any unique new capabilities, but shortens a very common idiom, and brings the syntax in line with where all other languages are converging upon. Apart from Swift, Python has a similar thing with the walrus operator

if foo := returns_foo_or_none():
  do_thing(foo)

yilinwei · January 18, 2024, 2:55pm

Having used Rust quite a bit, I think that the if let functionality is quite useful.

I am skeptical, however that the current proposed syntax under Refining matches is a good idea. Syntactically it is extremely similar to the case where with is swapped with => but has different behaviour. However, it is hard to judge these decisions without using them in anger. Are there any current languages which have this type of pattern match statement so I can see a larger sample of use?

yilinwei · January 18, 2024, 3:09pm

Right now it is relatively clear when something should be a match and when it should not

In the case of binding variables, I sympathize, but the dividing line was when I was matching on independent variables as part of a chain - constructing a tuple for that case obscures the logic in my opinion.

I found that,

(x, y, z) match {
  case (X(...), _, _) => ???
  case (_, Y(...), _) => ???
  case (_, _, Z(...)) => ???
}

conveys the intent of the code worse than,

if val X(...) = x {
  ???
} else if val Y(...) = y {
  ???
} else {
  val Z(...) = z
  ???
}

Sporarum · January 18, 2024, 3:21pm

That’s why I would do something like:

c match
  case X(...) =>
    ???
  case _ => y match
    case Y(...) =>
      ???
    case _ => z match
      case Z(...) => 
        ???
      case _ =>
        ???

Jasper-M · January 18, 2024, 3:28pm

alvae:

In this paper , Lionel Parreaux describes the so-called “ultimate conditional syntax” for MLscript that supports patterns in conditional expressions. Multiple conditions can appear separated by and:
if v is Stable(major, _) and major >= 2
then foo()
else bar()

You may be interested to know that this and other potential solutions in this problem space have been discussed before in this topic.

olhotak · January 19, 2024, 1:48am

In Scala we can bind a variable in the header of a for comprehension, then use that bound variable in both the header and the body, like x in the following example:

for
  y <- 1 to 5
  x = foo(y)
  if x < 45
yield
  x + 5

Such a facility would be useful to avoid recomputing (and repeatedly writing) the subexpression foo(42) in the following example:

if foo(42) < 80
then 10 + foo(42)
else 10 - foo(42)

Unfortunately, a variable bound in the condition of an if is not available in the scope of its then and else branches:

if {
  val x = foo(42)
  x < 80
} then 10 + x else 10 - x // ERROR: not found: x

In the case of an if, a possible workaround is to declare the variable outside the if (though then the scope of x is broader than may be desired):

val x = foo(42)
if x < 80 then 10 + x else 10 - x

But in the case of a while loop where we want to recompute x in each iteration, such a workaround is not available. If we bind x in the condition, it is not available in the body:

while {
  val x = foo(42)
  x < 80
} do 10 + x // ERROR: not found: x

If we bind x outside the loop, it is not recomputed in each iteration:

val x = foo(42)
while x < 80 do 10 + x

I encountered a need for such a construct in the Community Build when trying to ensure null safety for a loop like this:

while x.next() != null do x = x.next()

Ensuring that this is safe would require proving that the implementation of the next() method is deterministic, that multiple calls x.next() return the same value (until we mutate x). This would require careful analysis of the implementation of next(). A variant that doesn’t depend on proving the determinism of next() could be written as:

while { var next = x.next(); next != null } do x = next

But this would require the variable next bound in the condition to be available in the scope of the loop body.

In summary, it would be nice if an improvement in this area could somehow make it possible to bind variables in the conditions of if and while and have them be available in the scopes of their bodies, as is already the case in for comprehensions.

Ichoran · January 19, 2024, 3:35am

With the advent of inline and boundary, Scala 3 can handle all sorts of control flow in user space. I think it is a good idea to utilize this capability, and at that point I’m not convinced that the additional complexity in Scala matches would be pulling its weight.

alvae:

n match
  case A(x) =>
    f(x) match
      case Z => foo
      case X => bar
      case _ => zim // <- `zim` is in the middle
  case B(x) =>
    f(x) match
      case Y => foo
      case _ => zim
  case _ => zim

Let’s consider this case. Firstly, I don’t actually see a problem with this. It’s straightforward enough. But with the control flow constructs I’ve created for myself in Scala 3, there are several solutions, with not too much extra boilerplate.

(Using com.github.ichoran::kse3-flow:0.2.9.)

One flavor: attempt. I don’t actually have a bail-on-incomplete-match functionality, but it’s trivial to add one (I do have pack-in-alternate-on-incomplete-match):

extension [A](a: A)
  inline def inCase[B, Z](pf: PartialFunction[A, B])(using scala.util.Boundary[Attempt[Z]]) = a.isCase(pf).!

Now that I think about it, I’m going to add something like that. (Edit: I did. It’s in 0.2.10. Just need to import kse.flow.*) Anyway, given this we can simplify the code thusly:

attempt:
  n.inCase:
    case A(x) => f(x).inCase:
      case Z => foo
      case X => bar
    case B(x) => f(x).inCase:
      case Y => foo
.default:
  zim

Other things could be done, but anyway, my point is that since Scala 3 has such immense abstraction capabilities here, I’m not sure it’s worth adding language level features rather than coming up with a really pretty control flow library, and then seeing what’s left.

For instance, I’ve eliminated almost all monadic uses of Option or Either in favor of an unboxed favored-case sum type (“Or”) and Rust-style .? jumps to bail out on the disfavored case. It makes my code far more compact and easier for me, at least, to understand.

Here’s an example:

def example: Unit Or Err = Err.Or:
  var totalSize = 0L
  Resource.Nice(asource.openRead(262144))(_.close()){ in =>
    totalSize += (in sendTo zos).?
  }.?

Ichoran · January 19, 2024, 3:45am

I solve the while problem by using different control constructs. For instance, one could:

boundary:
  while true do
    val x = foo(42)
    if x < 80 then boundary.break()
    f(x + 10)

One could also abstract that into a loop and quit functionality if one wanted.

The problem with enabling this scope-jumping is that then you can’t replicate it in user space. If I write a method def thingy(p: A => Boolean)(f: A => B), then I can’t get variables to jump between p and f.

hepin1989 · January 19, 2024, 4:50am

I saw a code in Pekko Stream (Akka stream should be the same) which I want to change the takeWhile(..).map(..) to collectWhile(...)

The code is

  protected def zipAllFlow[U, A >: Out, Mat2](
      that: Graph[SourceShape[U], Mat2],
      thisElem: A,
      thatElem: U): Flow[Out @uncheckedVariance, (A, U), Mat2] = {
    case object passedEnd
    val passedEndSrc = Source.repeat(passedEnd)
    val left: Flow[Out, Any, NotUsed] = Flow[A].concat(passedEndSrc)
    val right: Source[Any, Mat2] = Source.fromGraph(that).concat(passedEndSrc)
    val zipFlow: Flow[Out, (A, U), Mat2] = left
      .zipMat(right)(Keep.right)
      .takeWhile {
        case (`passedEnd`, `passedEnd`) => false
        case _                          => true
      }
      .map {
        case (`passedEnd`, r: U @unchecked) => (thisElem, r)
        case (l: A @unchecked, `passedEnd`) => (l, thatElem)
        case t: (A, U) @unchecked           => t
      }
    zipFlow
  }

Hope your new sip cover this case.

Ichoran · January 19, 2024, 9:55am

Since I’ve been on a control structure binge lately, I can do this one too, at least for Array, but the logic could be extended to any collection:

import kse.basics._
import shortcut.{quit, skip}
val input = Array("salmon", "eel", "perch", "", "minnow")
input.breakable.copyWith:
  case s if s.isEmpty => quit()
  case "perch" => "cod"
  case s if s.length < 4 => skip()
  case s => s.toUpperCase

This produces Array("SALMON", "cod").