"Unpacking" classes into method argument lists

odersky · October 10, 2023, 1:38pm

I like the general direction in which this is going. But do we need a new spread operator? Maybe we can re-use x*? This would in each case pass the argument without unpacking.

odersky · October 10, 2023, 2:06pm

Mulling this over after discussing with @sjrd and some others at lunch today, I now think this can really be super simple. Here is an MVP (meaning minimum viable proposal).

Unpack modifier

Introduce a new unpack modifier that can be added to the last parameter of a method. Example:

```
case class Config(size: Int, label: String = "")
def request(body: Body, unpack config: Config) = ...
```

The type of an unpack parameter must be a statically accessible class. Inside the method, unpack has no significance; we simply have a parameter of the declared type.

When such a method is called, we first match arguments as usual. When it comes to matching the last formal parameter, we simply wrap the remaining actual arguments with the apply method of the class. Example:

```
request(body, 20, label = "abc")
```

would expand to

```
request(body, Config(20, label = "abc"))
```

The expansion is done before typing. So there could be several apply methods and overloading resolution will pick the correct one.
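
The wrapping step can be approximated in current Scala 3 with a hand-written forwarding overload. This is only a sketch of the intended expansion, not the proposed implementation: `Body` is a hypothetical stand-in type, the return type is chosen for illustration, and the second `request` overload plays the role of what the compiler would do for an `unpack` parameter.

```scala
// Hypothetical stand-in for the Body type used in the example above.
case class Body(text: String)
case class Config(size: Int, label: String = "")

// The method as seen from the inside: a plain parameter of type Config.
def request(body: Body, config: Config): String =
  s"${body.text} size=${config.size} label=${config.label}"

// Hand-written forwarder emulating the proposed expansion:
// the trailing arguments are wrapped with Config.apply.
def request(body: Body, size: Int, label: String): String =
  request(body, Config(size, label = label))

val body = Body("ping")
val viaForwarder = request(body, 20, label = "abc")        // flat call
val viaExplicit = request(body, Config(20, label = "abc")) // expanded call
```

Both calls end up in the same method with the same `Config` value, which is the equivalence the expansion is meant to provide.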

Reuse Spread Operator

If we have a call like

```
request(body, config*)
```

we pass the last argument “as is” without wrapping. This works in the same way for repeated parameters (where the argument must be a sequence or an array) and for unpack parameters, where the argument must simply match the formal parameter type.
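
For comparison, this is how the postfix `*` already behaves for repeated parameters in Scala 3; the proposal would give it the same "pass as is" meaning for `unpack` parameters. The names `total` and `numbers` are illustrative only.

```scala
// Existing Scala 3 behaviour for repeated parameters.
def total(xs: Int*): Int = xs.sum

// Individual arguments are wrapped into a sequence by the compiler.
val wrapped = total(1, 2, 3)

// With the postfix `*`, an existing sequence is passed as is, without wrapping.
val numbers = Seq(1, 2, 3)
val spread = total(numbers*)
```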

Notes

The unpack parameter class will often be a case class (since then it is easy to pick it apart) but this is no absolute requirement. In Scala 3, regular classes are provided with synthetic apply methods as well, and we can use them to wrap the arguments.
As always, the unpack class can itself have default arguments which then translate to default arguments of the unpacked version. So with the definitions above
```
request(body, size = 20)
```
would be legal and expand to
```
request(body, Config(size = 20))
```
which in turn expands to
```
request(body, Config(size = 20, label = ""))
```
The Config class can also have defaults for all parameters in which case we can leave out the argument completely. So given
```
case class Config(size: Int = 20, label: String = "abc")
```
we can write
```
request(body)
```
and this would expand to
```
request(body, Config())
```
I believe the question whether we want to accommodate multiple unpack parameters per method and/or multiple spread operators per class is orthogonal. This would probably raise a lot of tricky questions of how to expand and disambiguate. This proposal intentionally leaves that out. I would claim we thus get most of the benefit with very little additional language complication. But it could well be added later.
The proposal also intentionally does not allow you to use a spread operator to expand into multiple normal parameters of the called method. I believe such a feature would lead to too clever-by-half code which could become very confusing.
Interestingly the question of special named arguments turned out to be a red herring. The proposal works with named and positional arguments in the same way.
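
The default-handling steps in the notes above rely only on how `apply` already treats default arguments, which can be checked in current Scala. The sketch below does not use `unpack`; `FullDefaultConfig` is a hypothetical name, introduced only so that both variants of the class from the notes can coexist in one snippet.

```scala
// First definition from the notes: only `label` has a default.
case class Config(size: Int, label: String = "")

// Variant with defaults for every field (renamed for this sketch).
case class FullDefaultConfig(size: Int = 20, label: String = "abc")

// `label` is filled in by its default, as in Config(size = 20, label = "").
val partial = Config(size = 20)

// Every field is filled in by its default, as in the Config() expansion.
val allDefaults = FullDefaultConfig()
```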

lihaoyi · October 10, 2023, 4:33pm

I think these extensions are worth doing, but postponing them till later is fine. The minimal proposal would already provide a lot of value. Most of my own use cases - in requests-scala, os-lib, upickle, etc. - would be satisfied with a single explicitly-declared unpack parameter per method.

We can always loosen these restrictions later if we decide it’s worth it.

Bikeshedding, but I don’t like the asymmetry between unpack at def-site and * at callsite. We just went through a whole process in Scala 3 to consistently use * for positional unpacking, rather than :_* or @_*, so I feel we should not make the same mistake again so soon. ** (prefix or suffix), .., or ... would be options that would fit at both defsite and callsite.

Also, if we use * at callsite, isn’t there ambiguity if the class extends Seq?

Don’t we need to type the method being called in order to know which arguments to bundle up?

Can it be a generic class? Or an inner class? The “simple” case of a top-level class is straightforward, but I wonder where we should draw the line in how sophisticated a type we allow here.

lihaoyi · October 10, 2023, 4:41pm

This should work in class constructors as well, right? That would go a long way to DRYing up boilerplatey case class hierarchies that all share a few fields.
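
As a rough picture of the boilerplate in question, here is how shared fields can be grouped in current Scala 3, with a hand-written `apply` standing in for what an `unpack` constructor parameter might provide. `Meta`, `User`, and `Group` are hypothetical names, and the thread does not spell out the exact rules for constructors, so this is a sketch under those assumptions.

```scala
// Fields shared by several case classes (hypothetical example types).
case class Meta(id: Int, name: String)

case class User(meta: Meta, email: String)
object User:
  // Hand-written forwarder: the flat arguments are wrapped into Meta.
  def apply(id: Int, name: String, email: String): User =
    User(Meta(id, name), email)

case class Group(meta: Meta, members: List[User])
object Group:
  // Same forwarding boilerplate, repeated for every class sharing Meta.
  def apply(id: Int, name: String, members: List[User]): Group =
    Group(Meta(id, name), members)

val alice = User(1, "alice", "alice@example.com")
val team = Group(2, "team", List(alice))
```

The two companion objects are the part that would no longer need to be written by hand.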

alvae · October 10, 2023, 4:51pm

I would agree and add that a dedicated syntax might make this feature more easily recognizable. Stumbling on a call site of the form request(a, things...) would make it clear that I should be looking for a declaration that expands a class into its argument list.

I wonder if perhaps we wouldn’t want to make it an absolute requirement. That would restrict the scope of the feature to use cases we have actually identified. Starting from a more constrained design would make it easier to determine if and how it should be relaxed.

odersky · October 10, 2023, 6:35pm

The point is, for better or for worse we already have * as the spread operator. And it does basically the same thing in both cases, namely disable the wrapping into a Config object or Seq literal that would otherwise occur.

Aside: A more uniform design could drop the Int* syntax for vararg parameters, i.e. replace

```
def ingest(xs: Int*)
```

with

```
def ingest(unpack xs: Seq[Int])
```

The only difficulty then is that we’d somehow have to treat the vararg of Seq.apply specially, since otherwise we’d get an infinite recursion. But that’s just speculation, I am not proposing we change varargs again.

> Also, if we use * at callsite, isn’t there ambiguity if the class extends Seq?

This would be disambiguated by the formal parameter, whether it’s vararg or unpack.

> Can it be a generic class? Or an inner class? The “simple” case of a top-level class is straightforward, but I wonder where we should draw the line in how sophisticated a type we allow here.

It can certainly be a generic class or an inner class of a global class. The question is what to do when it is a class that is somehow visible from the receiver type of the method, but not directly addressable from the callsite. I think we can probably come up with reasonable rules that work in that case as well. But that’s not an MVP.

> This should work in class constructors as well, right? That would go a long way to DRYing up boilerplatey case class hierarchies that all share a few fields.

Yes, of course.

morgen-peschke · October 10, 2023, 7:05pm

Should recursive unpacking be allowed?

```
case class Foo(a: Int, b: String)
case class Bar(a: Int, unpack foo: Foo)

def baz(a: Int, unpack bar: Bar): Unit = ???

baz(a = 5, a = 6, a = 7, b = "hi")
// Desugared as baz(a = 5, Bar(a = 6, Foo(a = 7, b = "hi")))
```

I’d imagine this could get really hairy with inappropriate overloads of baz, Foo.apply, and Bar.apply.
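
To make the nesting concrete, the desugared form from the comment above can be written by hand in current Scala. The `unpack` modifiers are left out because they are not valid today, and the `String` return type of `baz` is an assumption made only so the result can be inspected.

```scala
case class Foo(a: Int, b: String)
case class Bar(a: Int, foo: Foo) // `unpack` omitted: not valid in current Scala

// Returns a string showing which value each of the three `a` fields received.
def baz(a: Int, bar: Bar): String =
  s"$a/${bar.a}/${bar.foo.a}/${bar.foo.b}"

// The fully desugared call, with the wrapping written out explicitly.
val result = baz(a = 5, Bar(a = 6, Foo(a = 7, b = "hi")))
```

Three different parameters are all called `a` here, so a flat call would have to decide by position alone which `a` goes to which level.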

lihaoyi · October 10, 2023, 7:30pm

I would imagine having the same name be used both as a top-level parameter and as an unpacked parameter should be a compile error, just as if you tried to define two top-level parameters with the same name

odersky · October 10, 2023, 7:52pm

I don’t see why not. It is usually harder to invent rules to prevent recursion than to simply allow it.

> I would imagine having the same name be used both as a top-level parameter and as an unpacked parameter should be a compile error, just as if you tried to define two top-level parameters with the same name

Yes, that seems reasonable.

bjornregnell · October 10, 2023, 7:54pm

I like the idea of having a soft keyword referring to the intent at call site. I’m not sure though if unpack is the best. Perhaps unwrap or even shorter: unbox.

I don’t mind so much if we reuse the config* syntax for this related feature. But if symmetry between def site and call site is really important, it could be box config, as the negation of unbox. But I like config* better as it is already in the language.

I think symbolic syntax here like .. or ** is too cryptic.

soronpo · October 10, 2023, 9:50pm

What is expected to happen if unpack is applied to an implicit argument (or more correctly, inside an implicit block)? Should that be allowed?

lihaoyi · October 11, 2023, 1:20am

These symbols are pretty widely used in the broader programming language ecosystem for this specific purpose:

| Language | Sequence unpacking | Key-value unpacking |
|---|---|---|
| Scala | postfix `*` | n/a |
| Python | prefix `*` | prefix `**` / `Unpack[T]` |
| Ruby | prefix `*` | prefix `**` |
| Javascript | prefix `...` | prefix `...` |
| Java | postfix `...` | n/a |
| Fsharp | `[<ParamArray>]` | prefix `...` (proposed) |
| Csharp | prefix `params` | n/a |
| Kotlin | prefix `vararg`/`*` | n/a |
| PHP | prefix `...` | n/a |
| Go | prefix (defsite), postfix (callsite) `...` | n/a |
| Swift | postfix `...` | n/a |

Different languages do it differently, but the common themes are:

- They mostly use some variant of *, **, or .... Some are postfix and some are prefix, but the operator used is surprisingly consistent.
- Using a standalone keyword like unpack is the odd one out, only done by Csharp (params) and Kotlin (vararg) out of all the languages I’m familiar with.
- ... is the most widely used by number of languages,
- though ** is used by both Python and Ruby, which together have a huge market share and mindshare.

We can expect people coming to Scala from basically any other programming language to already have some intuition for what ** or ... is meant to do, even without any formal training or education.

In the end I don’t think the choice of keyword or syntax is a blocker, but I personally prefer ** at both defsite and callsite, both for symmetry with the * we already have and because it follows Python, a popular language that uses it for (roughly) the same purpose and whose UX we are often trying to emulate (e.g. * syntax unification, import as syntax, indentation-based blocks, even this thread, which is inspired by Python 3.12’s Unpack).

odersky · October 11, 2023, 8:21am

One issue here is that varargs and unpack are specified differently. With varargs we write f(xs: Int*). That is, we write the element type and add a *. With unpack we write the container type instead. So I believe having a notation like ** that is too close to * would be misleading.
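
The difference between the two declaration styles can be seen in current Scala. In this sketch the `unpack` modifier is omitted from `g` because it is not valid today, and `Config` is redefined locally so the snippet stands on its own; `g` is a hypothetical name.

```scala
case class Config(size: Int, label: String = "")

// Repeated parameter: the declared type is the element type (Int),
// but inside the method `xs` is a Seq[Int].
def f(xs: Int*): Seq[Int] = xs

// An unpack parameter would be declared with the container type itself;
// here it is written as a plain parameter of type Config.
def g(config: Config): Config = config

val elems: Seq[Int] = f(1, 2, 3)
val conf: Config = g(Config(20))
```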

rssh · October 11, 2023, 12:38pm

I see this as a quite awful situation from the library author’s point of view.

So for each of my methods I have to choose: either use the new unpack feature, so that people can pass objects if needed, or write a method with named parameters. That is, yet another source of fragmentation.

And since allowing people to pass objects is good from the generalization and abstraction point of view, it looks better to describe my library method with the help of an additional case class. So, libraries will be filled with unpack definitions (and boilerplate case class code).

From my point of view, it is better to integrate with or extend existing features than to create yet another point of choice: which of the two features (existing calls with named arguments in the function definition, or unpack with the names defined in an additional case class) should I choose for my method?

Katrix · October 12, 2023, 7:25am

Would having multiple unpack parameters be okay if they were in different parameter blocks? We allow this with varargs, so I don’t see why not.
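
For reference, this is the existing varargs behaviour being referred to: current Scala accepts one repeated parameter per parameter list. The method name `tag` and its body are illustrative only.

```scala
// One repeated parameter in each of two parameter lists.
def tag(names: String*)(values: Int*): Map[String, Int] =
  names.zip(values).toMap

// Each argument list is collected into its own sequence.
val pairs = tag("a", "b")(1, 2)
```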

odersky · October 12, 2023, 9:52am

Yes, that would be no problem.

bjornregnell · October 12, 2023, 1:15pm

Many thanks for creating that comprehensive table.

So in the second column we only have 3 dynamic languages currently using a symbolic notation for key-value unpacking. I don’t think that prevents us from using a Scala-ish idiom of our own if we can find a really good one.

For a learner of Scala the unpack feature will be yet another thing to learn, and even if the learner does not use it, existing code might contain it, so a reader of code needs to be familiar with it or risk being surprised. I think we should, if possible, prioritize readability on the call site here, although readability in API implementation is also important.

Since a reader of code may already know about adapting sequences to a vararg using config*, I think that is best on the call site.

On the def site I would like a keyword that communicates intent when read out loud. Saving typed letters with a symbolic notation on the def site is not as important as being readable even if you haven’t used this feature much yourself.

Hence I think this is better than ** or *** or .. or ...:

```
def request(body: Body, unpack config: Config) = ???
```

but I think we should consider/brainstorm all reasonable alternative keywords etc.

By the way: is this not somehow related to unapply? Perhaps we could use the match keyword, as the fields are matched to params? Or something else related to the deconstruction happening thanks to unapply?

Just brainstorming here. Need to think a bit more…

bjornregnell · October 12, 2023, 1:18pm

Also, we could think more about whether this feature can be generalized somehow. Are there other places where a similar kind of automatic unboxing is useful? Perhaps when going back and forth between case classes and tuples?
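
For context, Scala 3 already offers explicit (not automatic) conversions between case classes and tuples through `Tuple.fromProductTyped` and the compiler-provided `Mirror`. The sketch below only shows what exists today and is not part of the proposal; `Config` is redefined locally so the snippet stands on its own.

```scala
import scala.deriving.Mirror

case class Config(size: Int, label: String)

// Case class -> tuple, keeping the element types.
val asTuple: (Int, String) = Tuple.fromProductTyped(Config(20, "abc"))

// Tuple -> case class, via the Mirror the compiler synthesizes for Config.
val fromTuple: Config =
  summon[Mirror.ProductOf[Config]].fromProduct((20, "abc"))
```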

bjornregnell · October 12, 2023, 1:34pm

I also have an itch that the keyword unpack is kinda used backwards… I mean: if this

```
request(body, size = 20)
```

is expanded to

```
request(body, Config(size = 20, label = ""))
```

then args are actually boxed into Config and not unboxed… So the intent at def-site is to allow for the call site to autobox args into this class by applying arguments to matching field names.

So perhaps it should be

```
def request(body: Body, pack config: Config) = ???
```

or similar…

some variants:

```
def request(body: Body, box config: Config) = ???

def request(body: Body, match config: Config) = ???

def request(body: Body, yield config: Config) = ???

def request(body: Body, apply config: Config) = ???
```

(I kinda like the last one as apply is normally invisible and so is config at call site.)

lrytz · October 12, 2023, 2:34pm

This looks like a generalization of current repeated parameters. I think we should aim for replacing and deprecating repeated parameters, otherwise we end up with two similar features with different syntax.

Are defaults allowed in methods with an unpack parameter? If defaults are used at call-site, how would you know where to insert Config( ... )? Scala 3 allows defaults in methods with a repeated parameter, and it takes it pretty far:

```
scala> def f(x: Int = 1, y: String*) = s"$x-" + y.mkString("-")
def f(x: Int, y: String*): String

scala> f(y = "a", x = 44, "b")
val res12: String = 44-a-b
```

Overloading resolution (and picking the most specific overload) will be affected, which is always non-trivial to spec and implement.

Tooling will be affected, IDEs have to learn to present unpack parameters nicely at call-site.