Pre-SIP: multiple assignments

alvae · December 15, 2023, 3:57pm

Introduction

This proposal discusses the syntax and semantics of a construct to assign multiple variables with a single expression. This feature would simplify the implementation of operations expressed in terms of relationships between multiple variables, such as std::swap in C++.

Motivation

It happens that one has to assign multiple variables “at once” in an algorithm. For example, let’s consider the Fibonacci sequence:

class FibonacciIterator() extends Iterator[Int]:

  private var a: Int = 0
  private var b: Int = 1

  def hasNext = true
  def next() =
    val r = a
    val n = a + b
    a = b
    b = n
    r

The same iterator could be rewritten more concisely if we could assign multiple variables at once. For example, we can write the following in Swift:

struct FibonacciIterator: IteratorProtocol {

  private var a: Int = 0
  private var b: Int = 1
  init() {}

  mutating func next() -> Int? {
    defer { (a, b) = (b, a + b) }
    return a
  }

}

Though the differences may seem frivolous at first glance, they are in fact important. If we look at a formal definition of the Fibonacci sequence (e.g., on Wikipedia), we might see something like:

The Fibonacci sequence is given by F(n) = F(n-1) + F(n+1) where F(0) = 0 and F(1) = 1.

Although this declarative description says nothing about an evaluation order, it becomes a concern in our Scala implementation as we must encode the relationship into multiple operational steps. This decomposition offers opportunities to get things wrong:

def next() =
  val r = a
  a = b
  b = a + b // invalid semantics, the value of `a` changed "too early"
  r

In contrast, our Swift implementation can remain closer to the formal definition and is therefore more legible and less error-prone.

Multiple assignments show up in many general-purpose algorithms (e.g., insertion sort, partition, min-max element, …). But perhaps the most fundamental one is swap, which consists of exchanging two values.

We often swap values that are stored in some collection. In this particular case, all is well in Scala because we can ask the collection to swap elements at given positions:

extension [T](self: mutable.ArrayBuffer[T])
  def swapAt(i: Int, j: Int) =
    val t = self(i)
    self(i) = self(j)
    self(j) = t

val a = mutable.ArrayBuffer(1, 2, 3)
a.swapAt(0, 2)
println(a) // ArrayBuffer(3, 2, 1)

Sadly, one can’t implement a generic swap method that wouldn’t rely on the ability to index a container. The only way to express this operation in Scala is to “inline” the pattern implemented by swapAt every time we need to swap two values.

Having to rewrite this boilerplate is unfortunate. Here is an example in a realistic algorithm:

extension [T](self: Seq[T])(using Ordering[T])
  def minMaxElements: Option[(T, T)] =
    import math.Ordering.Implicits.infixOrderingOps

    // Return None for collections smaller than 2 elements.
    var i = self.iterator
    if (!i.hasNext) { return None }
    var l = i.next()
    if (!i.hasNext) { return None }
    var h = i.next()

    // Confirm the initial bounds.
    if (h < l) { val t = l; l = h; h = l }

    // Process the remaining elements. Note that reading elements two by two
    // is actually faster than extracting them one at a time.
    def loop(): Option[(T, T)] =
      if (i.hasNext) {
        var n = i.next()
        var m = if (i.hasNext) { i.next() } else { n }
        if (m < n) { val t = n; n = m; m = t }
        if (n < l) { l = n }
        if (h < m) { h = m }
        loop()
      } else {
        Some((l, h))
      }
    loop()

Note: implementation shamelessly copied from swift-algorithms.

The swap occurs once in the middle of the method with the sequence of expressions val t = l; l = h; h = l and a second time in the loop. To borrow from the words of Edgar Dijskstra [1, Chapter 11]:

[that] is combersome and ugly compared with the [multiple] assignment.

We can find countless other examples of swap in languages that enjoy this operation, such as Swift, C++, and Rust. There are even programming idioms based on its use (e.g., exception safety in C++).

Proposed solution

The proposed solution is to add a language construct to assign multiple variables in a single expression:

var a = 2
var b = 4
(a, b) = (b, a)
println(s"$a$b") // 42

Using this construct, the above Fibonacci iterator can be rewritten as:

class FibonacciIterator() extends Iterator[Int]:

  private var a: Int = 0
  private var b: Int = 1

  def hasNext = true
  def next() =
    val r = a
    (a, b) = (b, a + b)
    r

Multiple assignments also alleviate the need for a swap method on collections, as the same idiomatic pattern can be reused to exchange elements at given indices:

val a = mutable.ArrayBuffer(1, 2, 3)
(a(0), a(2)) = (a(2), a(0))
println(a) // ArrayBuffer(3, 2, 1)

Detailed design

A multiple assignment is an expression of the form AssignTarget ‘=’ Expr where:

AssignTarget ::= ‘(’ AssignTargetNode {‘,’ AssignTargetNode} ‘)’
AssignTargetNode ::= Expr | AssignTarget

An assignment target describes a tree that can only be matched by a compatible composition of tuples. For example, the following program is legal.

def f: (Bool, Int) = (true, 42)
val a = mutable.ArrayBuffer(1, 2, 3)
var x = true

(x, a(0)) = (false, 1337)
(x, a(1)) = f
((x, a(1)), a(2)) = (f, 9000)
(x) = Tuple1(false)

A mismatch between the structure of a multiple assignment’s target and the result of its RHS is a type error. It cannot be detected during parsing because at this stage the compiler would not be able to determine the shape of an arbitrary expression’s result. For example, all multiple assignments in the following program are ill-typed:

def f: (Bool, Int) = (true, 42)
val a = mutable.ArrayBuffer(1, 2, 3)
var x = true

(a(1), x) = f               // type mismatch
(x, a(1), a(2)) = (f, 9000) // structural mismatch
(x) = false                 // structural mismatch
(x) = (1, 2)                // structural mismatch

Note that (x) = Tuple1(false) is not equivalent to x = Tuple1(false). The former is a multiple assignment while the latter is a regular assignment, as described by the current grammar (see Expr1). Though this distinction is subtle, multiple assignments involving unary tuples should be rare.

The operational semantics of multiple assignments (aka concurrent assignments) have been studied extensively in scienific literature (e.g., [1, 2]). A first intuition is that the most desirable semantics can be achieved by fully evaluating the RHS of the assignment before assigning any expression in the LHS [1]. However, additional considerations must be given w.r.t. the independence of the variables on the LHS to guarantee deterministic results. For example, consider the following expression:

(x, x) = (1, 2)

While one may conclude that such an expression should be an error [1], it is in general difficult to guarantee value independence in a language with pervasive reference semantics. Further, it is desirable to write expressions of the form (a(0), a(2)) = (a(2), a(0)), as shown in the previous section.

To address these issues, we define the following algorithm:

Compute the value of the RHS, which forms a tree of values
Traverse the LHS and RHS structures pairwise in inorder, performing assignment for each pair of leaves.

For instance, the evaluation of the expression ((x, a(1)), x) = (f, false) is as follows:

evaluate (f, 9000); the result is a tree ((true, 42), false)
assign x to true
assign a(1) to 42
assign x to false

After the assignment, x == false and a(1) == 42.

The compiler is allowed to modify this order for optimization purposes as long as it can guarantee that such a change is not observable. For example, the assignments of x and y in (x, y) = (1, 2) can be reordered or even performed in parallel.

Impact on existing code

One understandable concern of the proposed syntax is that the semantics of multiple assignments resembles that of pattern matching, yet it has different semantics. For example:

val (a(x), b) = (true, "!") // 1

(a(x), b) = (true, "!")     // 2

If a is instance of a type with a companion extractor object, the two lines above have completely different semantics. The first declares two local bindings x and b, applying pattern matching to determine their value from the tuple (true, "!"). The second is assigning a(x) and b to the values true and "!", respectively.

Though possibly surprising, the difference in behavior is easy to explain. The first line applies pattern matching because it starts with val. The second doesn’t because it involves no pattern matching introducer. Further, note that a similar situation can already be reproduced in current Scala:

val a(x) = true // 1

a(x) = true     // 2

References

Edsger W. Dijkstra: A Discipline of Programming. Prentice-Hall 1976, ISBN 013215871X
Ralph-Johan Back, Joakim von Wright: Refinement Calculus - A Systematic Introduction. Graduate Texts in Computer Science, Springer 1998, ISBN 978-0-387-98417-9

Sporarum · December 15, 2023, 8:02pm

On the principle, I’m not against this change, it would be a neat niche feature
(We probably disagree on whether it would be niche, but I believe the semantics are more important first)

I would like to clarify the exact semantics, in the case of (a(0), p.x) = (b, c),
Do we agree that the following happens:

Evaluate b, let’s call the result v_b
Evaluate c, let’s call the result v_c
Evaluate (v_b, v_c), let’s call the result v_pair
Evaluate a.update(0, v_pair._1)
Evaluate p.x_=(v_pair._2) in the case where it is the most applicable (which is not the common case)

(I skipped the evaluate a and evaluate p)

alvae · December 15, 2023, 9:29pm

Yes, that is the evaluation order I am proposing.

Sporarum · December 15, 2023, 11:03pm

Then I see no technical issues right now.

As I hinted at before I don’t really see a semantic or syntactic issue with it either, it feels like a very natural extension of the language !

So the only negative I see with it is the work necessary to implement and maintain it, and whether that work is worth it for this feature.

I see the usefulness as low (in part because I tend to gravitate to immutable structures), so the question becomes how hard is it to implement ?

If easy, I think this is worth it
If hard, maybe not so much

But I’m open to being convinced otherwise !

morgen-peschke · December 15, 2023, 11:05pm

Is the intent to bless the update method, or specialize this behavior to the standard library?

Sporarum · December 16, 2023, 11:04am

Just looking at potential interactions with other features

Sporarum · December 16, 2023, 11:14am

Thinking more on it, I think it could be more regular, when we do:

val Foo(x, y) = foo

We’re really just doing:

val temp = Foo.unapply(foo).get
val x = temp._1
val y = temp._2

I would therefore expect the following to work as well:

Foo(x, y) = foo

Which would desugar in a similar way:

val temp = Foo.unapply(foo).get
x = temp._1
y = temp._2

devlaam · December 16, 2023, 11:55am

I am in favour of this new language feature. It always stunned me that if you have some lengthy calculation that produces two values in a tuple:

def someCalc(x: Int): (Int,Int) = ???

you can state:

val (a,b) = someCalc(42)

but in case a and b are variables you must do

val aux = someCalc(42)
a = aux._1
b = aux._2

The new solution is much cleaner, and imho, what you naively expect.

alvae · December 16, 2023, 12:45pm

There are very clear use cases for this feature in the context of immutable data structures. In fact, I discovered the language limitation as I was trying to implement value semantics in Scala.

There is a straightforward correspondence between in-place mutation and so-called “functional updates”. Consider the following:

final class Vec2(var x: Int, var y: Int)
def scaleInPlace(v: Vec2, f: Int) =
  v.x *= f
  v.y *= f

@main def example() =
  var v = Vec2(1, 2)
  scaleInPlace(v, 2)

Immutability enthusiasts will probably be repulsed by the vars in Vec2 and offer this code instead:

final class Vec2(val x: Int, val y: Int)
def scale(v: Vec2, f: Int) =
  Vec2(v.x * f, v.y * f)

@main def example() =
  var v = Vec2(1, 2)
  v = scale(v)

Note: The real enthusiasts will also complain about the var in the main function and claim that we should declare another variable instead.

What’s interesting to observe, though, is that if one can guarantee that v is not aliased anywhere, then there the two programs are notionally equivalent. So if can you maintain value independence, that is if you can guarantee that “mutating” something has no observable effect on any other value, then mutation becomes completely harmless. To convince yourself that it’s true, you merely have to rewrite your in-place mutation into a functional update.

So how can we generalize the relationship between scaleInPlace and scale? Well, we simply have to say that every argument that is “mutated” in an in-place mutating function (A1, ..., An) -> R is returned along with R. And then you wish you had multiple assignments to remove the noisy boilerplate involve in extracting results and assign them back to the original variable.

I’m happy to expand on this topic but it would be a little tangential to this Pre-SIP. So if people are interested I’ll open a different thread. Alternatively one can have a look at this paper: Implementation Strategies for Mutable Value Semantics.

Sporarum · December 16, 2023, 1:00pm

I see what you mean, but what I’d really like is to write

@main def example() =
  val v1 = Vec2(1, 2)
  val v2 = scale(v1, 2)

And the compiler rewrites it as:

@main def example() =
  val v = Vec2(1, 2)
  scaleInPlace(v, 2)

(Or even just val v = Vec2(2, 4))

I.E. I really don’t want to deal with mutability as an end-user, as a library designer, if I have to use some to make things fast, I don’t mind some boilerplate

bjornregnell · December 16, 2023, 7:51pm

It works already, if this is what you want to do:

$ scala
Welcome to Scala 3.3.1 (17.0.8.1, Java OpenJDK 64-Bit Server VM).
Type in expressions for evaluation. Or try :help.
                                                                                
scala> def someCalc = (42, 43)
def someCalc: (Int, Int)
                                                                                
scala> var (a, b) = someCalc
var a: Int = 42
var b: Int = 43
                                                                                
scala> b += 1
                                                                                
scala> b
val res0: Int = 44

devlaam · December 16, 2023, 10:01pm

Yes, in this form it works:

def someCalc = (42, 43)
var (a, b) = someCalc
println(s"a=$a, b=$b")

but not in the form i used above:

def someCalc = (42, 43)                                                                            
var (a,b) = (0,0)
(a,b) = someCalc
println(s"a=$a, b=$b")

som-snytt · December 17, 2023, 6:19am

I think this was requested in the ancient past. Possibly a T rex requested it.

In honor of the current thread, I submitted a fix for the message when one of co-defined vars is unused or not updated:

private var (i, j) = init()

Instead of saying “make it a val, what’s the problem?”, it will suggest refactoring, i.e., re-engineering the definition. Probably you will say, Did you add a quick fix? and the answer is no, I didn’t think of doing that.

Edit: periodic reminder that tickets from 2013 are now ten years old. That is not their default. As my daughter would remind me, they did not ask to be born.

sjrd · December 17, 2023, 11:51am

One thing that will be particularly tricky to get right is evaluation order. The initial post already gives a pretty good definition, but it omits a discussion of the evaluation order of the left-hand-side. When the components of the left-hand-side are all local variables, that is not an issue, but when they are more complicated, it becomes difficult.

The current semantics of assignment in the presence of a complicate left-hand-side are as follows.

For

prefix.field = rhs
// desugars to
prefix.field_=(rhs)

the prefix must be evaluated first, then rhs, then the method field_= is called.

For

obj(arg1, ..., argN) = rhs
// desugars to
obj.update(arg1, ..., argN, rhs)

the obj is evaluated first, then the arg1, ..., argN, rhs, in order, then update is called.

If we put these two kinds of things into multi-assignment statements, it is not clear what the exact semantics will have to be:

(prefix.field, obj(arg1, ..., argN)) = (rhs1, rhs2)

what do we evaluate when?

To keep the overall left-to-right evaluation, it seems we should do

prefix
obj
arg1, ..., argN
rhs1
rhs2
call field_=
call update_=

There is also the NullPointerExceptions to think about. On the JVM it will naturally happen at the time of field_= / update_=. In JS however the NPE UB triggers earlier. In a method call x.m(args) it triggers right after evaluating x. We’ll probably want it to happen right after evaluating prefix and then after obj, to be consistent.