"Unpacking" classes into method argument lists

You typically wouldn’t store data in those, but rather use the type as a description of the case class’s shape, allowing the compiler to infer the proper way to instantiate the case class from foo(name="the meaning of life", year=1983), as well as to check that all the necessary fields have been provided. The Dynamic here is just the backing for foo, and only stores the information needed to construct Movie, not the data of Movie.

Using a Tuple as a store of HMap data is inefficient, though, even if it only stores that information at compile time. I would say that’s one of the main things that make this infeasible for widespread use.


Okay, this is a good point. It would be better to restrict it to cases where, for a type T, there is symmetry between T(a = x, ...): T and t.a: A, .... I’m not convinced that random inherited members are a problem in practice, but being able to create the thing you’re taking members from seems like a nice property.

It just needs to be flexible enough to handle syntactic upcasting. So if we have

case class Foo(i: Int, j: Long) {}
case class Bar(i: Int, j: Long, k: String) {}
def run(foo: ..Foo) = ???
val bar = Bar(1, 2L, "three")

then presumably we want

run(..bar)

to work, even though we can’t

def run(foo: ..Foo) =
  val b = Bar(..foo)
  ...

which would be the symmetric condition.
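To make the asymmetry concrete, here is a plain Scala 3 sketch of what run(..bar) would presumably expand to. The ..Foo / ..bar syntax is hypothetical, so run simply takes an ordinary Foo and the field mapping is written out by hand; widen is a hypothetical helper showing that the reverse direction, Bar(..foo), needs k to come from somewhere else.

```scala
// Scala 3 sketch: `..Foo` / `..bar` are not real syntax, so the
// "unpacked" call is written out by hand.
object UpcastSketch {
  case class Foo(i: Int, j: Long)
  case class Bar(i: Int, j: Long, k: String)

  // Stand-in for `def run(foo: ..Foo)`: an ordinary Foo parameter.
  def run(foo: Foo): String = s"${foo.i}/${foo.j}"

  val bar: Bar = Bar(1, 2L, "three")

  // What `run(..bar)` would presumably mean: copy the fields Foo declares,
  // by name, from bar, and silently drop bar.k.
  val result: String = run(Foo(i = bar.i, j = bar.j))

  // The symmetric direction, `Bar(..foo)`, has no source for k,
  // so k has to be supplied explicitly (hypothetical helper).
  def widen(foo: Foo, k: String): Bar = Bar(i = foo.i, j = foo.j, k = k)
}
```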

The whole reason that it works as syntactic sugar for extensible records is that it isn’t symmetric so you can map the matching fields / parameters.

I’m actually not sure we want that case to work. I’d want the “too few params in case class unpacked at callsite” scenario to work, since you can add named params to specify the others explicitly. But the “too many params in case class unpacked at callsite” scenario does not give any way of explicitly removing params, and removing them implicitly feels a bit off to me.

The precedent isn’t obvious here.

  • On one hand, Python’s unpacking into an untyped **kwargs does let you pass extra entries, which all get bundled into the dict.
  • On the other hand, unpacking a **kwargs dict with extra keys into a function that has no **kwargs of its own and does not define those named params is an error. So Python is flexible in one direction but not the other.

Since we’re working with case classes, they are typed and non-extensible by nature. And Scala does err on the side of “strict” more than Python does. So IMO erroring out on extra fields that do not correspond to a defined named parameter is the right thing to do.

Notably, a variant of this is being discussed in the F# community (just for records, since they don’t have named parameters in method calls), and it seems likely to make it in: Spread operator for F# · Issue #1253 · fsharp/fslang-suggestions · GitHub. They propose a ... operator, similar to what @Ichoran suggested.


I like the general direction in which this is going. But do we need a new spread operator? Maybe we can re-use x*? This would in each case pass the argument without unpacking.


Mulling this over after discussing with @sjrd and some others at lunch today, I now think this can really be super simple. Here is an MVP (meaning minimum viable proposal).

Unpack modifier

Introduce a new unpack modifier that can be added to the last parameter of a method. Example:

case class Config(size: Int, label: String = "")
def request(body: Body, unpack config: Config) = ...

The type of an unpack parameter must be a statically accessible class. Inside the method, unpack has no significance; we simply have a parameter of the declared type.

When such a method is called, we first match arguments as usual. When it comes to matching the last formal parameter, we simply wrap the remaining actual arguments with the apply method of the class. Example:

request(body, 20, label = "abc")

would expand to

request(body, Config(20, label = "abc"))

The expansion is done before typing. So there could be several apply methods and overloading resolution will pick the correct one.
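The expansion rule can be sketched in today’s Scala 3 by writing out what the compiler would generate. This is a sketch under assumptions: unpack does not exist yet, so request takes an ordinary Config parameter, and Body is a hypothetical stand-in case class; only the call-site rewriting comes from the proposal.

```scala
// Scala 3 sketch: `unpack` is proposed syntax, so the def-site is written as
// an ordinary parameter and the call site in its expanded form.
object UnpackSketch {
  // Hypothetical stand-in for the request body type used in the proposal.
  final case class Body(text: String)

  case class Config(size: Int, label: String = "")

  // Proposed: def request(body: Body, unpack config: Config) = ...
  def request(body: Body, config: Config): String =
    s"${body.text}: size=${config.size}, label=${config.label}"

  val body: Body = Body("payload")

  // Proposed call:  request(body, 20, label = "abc")
  // The remaining arguments are wrapped with Config's apply method:
  val expanded: String = request(body, Config(20, label = "abc"))
}
```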

Reuse Spread Operator

If we have a call like

request(body, config*)

we pass the last argument “as is” without wrapping. This works in the same way for repeated parameters (where the argument must be a sequence or an array) and for unpack parameters, where the argument must simply match the formal parameter type.
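The existing vararg splice shows the behaviour the proposal wants to reuse: * switches off the wrapping the compiler would otherwise insert. The sketch below is plain Scala 3; ingest and the String body are hypothetical simplifications, and the request(body, config*) form appears only in a comment because it is proposed syntax.

```scala
// Scala 3 sketch of the existing `*` splice, which the proposal would reuse.
object SpreadSketch {
  // Vararg parameter: separate arguments are wrapped into a Seq.
  def ingest(xs: Int*): Int = xs.sum

  val nums: Seq[Int] = Seq(1, 2, 3)
  val wrapped: Int = ingest(1, 2, 3) // compiler wraps the three Ints
  val spliced: Int = ingest(nums*)   // `*` passes nums as is, no wrapping

  case class Config(size: Int, label: String = "")

  // Proposed: def request(body: String, unpack config: Config)
  def request(body: String, config: Config): Int = config.size

  val config: Config = Config(20, "abc")
  // Proposed: request("payload", config*)
  // The argument already has the formal type, so nothing is wrapped:
  val passedAsIs: Int = request("payload", config)
}
```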

Notes

  1. The unpack parameter class will often be a case class (since then it is easy to pick it apart) but this is not an absolute requirement. In Scala 3, regular classes are provided with synthetic apply methods as well, and we can use them to wrap the arguments.

  2. As always, the unpack class can itself have default arguments which then translate to default arguments of the unpacked version. So with the definitions above

    request(body, size = 20)
    

    would be legal and expand to

    request(body, Config(size = 20))
    

    which in turn expands to

    request(body, Config(size = 20, label = ""))
    
  3. The Config class can also have defaults for all parameters in which case we can leave out the argument completely. So given

    case class Config(size: Int = 20, label: String = "abc")
    

    we can write

    request(body)
    

    and this would expand to

    request(body, Config())
    
  4. I believe the question of whether we want to accommodate multiple unpack parameters per method and/or multiple spread operators per class is orthogonal. This would probably raise a lot of tricky questions of how to expand and disambiguate. This proposal intentionally leaves that out. I would claim we thus get most of the benefit with very little additional language complication. But it could well be added later.

  5. The proposal also intentionally does not allow you to use a spread operator to expand into multiple normal parameters of the called method. I believe such a feature would lead to too-clever-by-half code that could become very confusing.

  6. Interestingly the question of special named arguments turned out to be a red herring. The proposal works with named and positional arguments in the same way.
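Notes 2 and 3 rely only on ordinary default arguments of the class’s apply method, so their expansions can be checked directly in Scala 3 today. In this sketch request and request2 are hypothetical stand-ins that just return the config they receive; the unpack forms appear only in comments.

```scala
// Scala 3 sketch of notes 2 and 3; `unpack` forms are shown only in comments.
object DefaultsSketch {
  // Note 2: one required field, one default.
  case class Config(size: Int, label: String = "")
  def request(body: String, config: Config): Config = config

  // Proposed: request(body, size = 20)
  // Expands to request(body, Config(size = 20)); the default fills label.
  val note2: Config = request("payload", Config(size = 20))

  // Note 3: every field has a default, so the argument can be left out.
  case class AllDefaults(size: Int = 20, label: String = "abc")
  def request2(body: String, config: AllDefaults): AllDefaults = config

  // Proposed: request(body)  expands to  request(body, Config())
  val note3: AllDefaults = request2("payload", AllDefaults())
}
```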


I think these extensions are worth doing, but postponing them till later is fine. The minimal proposal would already provide a lot of value. Most of my own use cases - in requests-scala, os-lib, upickle, etc. - would be satisfied with a single explicitly-declared unpack parameter per method.

We can always loosen these restrictions later if we decide it’s worth it.

Bikeshedding, but I don’t like the asymmetry between unpack at def-site and * at call-site. We just went through a whole process in Scala 3 to consistently use * for positional unpacking, rather than :_* or @_*, so I feel we should not make the same mistake again so soon. ** (prefix or suffix), .. or ... would be options that fit at both def-site and call-site.

Also, if we use * at call-site, isn’t there ambiguity if the class extends Seq?

Don’t we need to type the method being called in order to know which arguments to bundle up?

Can it be a generic class? Or an inner class? The “simple” case of a top-level class is straightforward, but I wonder where we should draw the line in how sophisticated a type we allow here.


This should work in class constructors as well, right? That would go a long way to DRYing up boilerplatey case class hierarchies that all share a few fields.

I would agree and add that a dedicated syntax might make this feature more easily recognizable. Stumbling on a call site of the form request(a, things...) would make it clear that I should be looking for a declaration that expands a class into its argument list.

I wonder if perhaps we wouldn’t want to make it an absolute requirement. That would restrict the scope of the feature to use cases we have actually identified. Starting from a more constrained design would make it easier to determine if and how it should be relaxed.


The point is, for better or for worse we already have * as the spread operator. And it does basically the same thing in both cases, namely disable the wrapping into a Config object or Seq literal that would otherwise occur.

Aside: A more uniform design could drop the Int* syntax for vararg parameters, i.e. replace

def ingest(xs: Int*)

with

def ingest(unpack xs: Seq[Int])

The only difficulty then is that we’d somehow have to treat the vararg of Seq.apply specially, since otherwise we’d get an infinite recursion. But that’s just speculation; I am not proposing we change varargs again.
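A small Scala 3 sketch of the two vararg spellings and of where the recursion would come from. ingest and ingestSeq are hypothetical names; the unpack form appears only as a comment because it is not real syntax.

```scala
// Scala 3 sketch: today's varargs versus the speculative `unpack xs: Seq[Int]`.
object VarargSketch {
  // Today: the element type followed by `*`.
  def ingest(xs: Int*): Int = xs.sum

  // Speculative: the container type, with arguments wrapped via Seq.apply.
  // Proposed: def ingest(unpack xs: Seq[Int])
  def ingestSeq(xs: Seq[Int]): Int = xs.sum

  val viaVararg: Int = ingest(1, 2, 3)

  // ingest(1, 2, 3) would expand to ingestSeq(Seq(1, 2, 3)) ...
  val viaUnpack: Int = ingestSeq(Seq(1, 2, 3))

  // ... but Seq.apply is itself a vararg method (apply[A](elems: A*)),
  // so if varargs were defined via unpack, expanding Seq(1, 2, 3) would
  // need Seq.apply again: the recursion that needs special treatment.
}
```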

Also, if we use * at call-site, isn’t there ambiguity if the class extends Seq?

This would be disambiguated by the formal parameter, whether it’s vararg or unpack.

Can it be a generic class? Or an inner class? The “simple” case of a top-level class is straightforward, but I wonder where we should draw the line in how sophisticated a type we allow here.

It can certainly be a generic class or an inner class of a global class. The question is what to do when it is a class that is somehow visible from the receiver type of the method, but not directly addressable from the call-site. I think we can probably come up with reasonable rules that work in that case too. But that’s beyond the MVP.

This should work in class constructors as well, right? That would go a long way to DRYing up boilerplatey case class hierarchies that all share a few fields.

Yes, of course.
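For constructors, the expansion would be the same wrapping applied to the class’s own parameter list. Here is a plain Scala 3 sketch, with hypothetical classes Meta, Doc and Image standing in for a hierarchy that shares a few fields; the unpack forms appear only in comments.

```scala
// Scala 3 sketch: shared fields factored into one class, expansions written out.
object ConstructorSketch {
  // Hypothetical fields that several case classes would otherwise repeat.
  case class Meta(id: Long, owner: String = "system")

  // Proposed: case class Doc(title: String, unpack meta: Meta)
  case class Doc(title: String, meta: Meta)
  // Proposed: case class Image(url: String, unpack meta: Meta)
  case class Image(url: String, meta: Meta)

  // Proposed: Doc("readme", 1L, owner = "alice")
  val doc: Doc = Doc("readme", Meta(1L, owner = "alice"))

  // Proposed: Image("logo.png", 2L), with owner taking its default
  val img: Image = Image("logo.png", Meta(2L))
}
```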

Should recursive unpacking be allowed?

case class Foo(a: Int, b: String)
case class Bar(a: Int, unpack foo: Foo)

def baz(a: Int, unpack bar: Bar): Unit = ???


baz(a = 5, a = 6, a = 7, b = "hi") 
// Desugared as baz(a = 5, Bar(a = 6, Foo(a = 7, b = "hi")))

I’d imagine this could get really hairy with inappropriate overloads of baz, Foo.apply, and Bar.apply.
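The desugaring in the comment above can be checked in plain Scala 3 by writing the nested constructors out by hand. Here baz is changed (an assumption, for illustration) to return the three a values instead of ???, so you can see which level each one lands in; the flat call with the repeated a appears only as a comment.

```scala
// Scala 3 sketch of recursive unpacking, with each level wrapped explicitly.
object RecursiveSketch {
  case class Foo(a: Int, b: String)

  // Proposed: case class Bar(a: Int, unpack foo: Foo)
  case class Bar(a: Int, foo: Foo)

  // Proposed: def baz(a: Int, unpack bar: Bar): Unit = ???
  // Returns the `a` of each level so the nesting is observable.
  def baz(a: Int, bar: Bar): List[Int] = List(a, bar.a, bar.foo.a)

  // Proposed flat call: baz(a = 5, a = 6, a = 7, b = "hi")
  // One wrapping per unpack parameter:
  val levels: List[Int] = baz(a = 5, bar = Bar(a = 6, foo = Foo(a = 7, b = "hi")))
}
```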

I would imagine having the same name be used both as a top-level parameter and as an unpacked parameter should be a compile error, just as if you tried to define two top-level parameters with the same name.

I don’t see why not. It is usually harder to invent rules to prevent recursion than to simply allow it.

I would imagine having the same name be used both as a top-level parameter and as an unpacked parameter should be a compile error, just as if you tried to define two top-level parameters with the same name.

Yes, that seems reasonable.


I like the idea of having a soft keyword referring to the intent at call site. I’m not sure though if unpack is the best. Perhaps unwrap or even shorter: unbox.

I don’t mind so much if we reuse the config* syntax for this related feature. But if symmetry between def-site and call-site is really important, it could be box config, as the negation of unbox. But I like config* better, as it is already in the language.

I think symbolic syntax here like .. or ** is too cryptic.

What is expected to happen if unpack is applied to an implicit argument (or, more precisely, inside an implicit parameter list)? Should that be allowed?

These symbols are pretty widely used in the broader programming language ecosystem for this specific purpose:

Language    Sequence unpacking                               Key-value unpacking
Scala       postfix *                                        n/a
Python      prefix *                                         prefix ** / Unpack[T]
Ruby        prefix *                                         prefix **
JavaScript  prefix ...                                       prefix ...
Java        postfix ...                                      n/a
F#          [<ParamArray>]                                   prefix ... (proposed)
C#          prefix params                                    n/a
Kotlin      prefix vararg (def-site) / * (call-site)         n/a
PHP         prefix ...                                       n/a
Go          prefix ... (def-site), postfix ... (call-site)   n/a
Swift       postfix ...                                      n/a

Different languages do it differently, but the common themes are:

  • they mostly use some variant of *, ** or ... (some postfix, some prefix), and the choice of operator is surprisingly consistent
  • using a standalone keyword like unpack is the odd one out, done only by C# (params) and Kotlin (vararg) out of all the languages I’m familiar with
  • ... is the most widely used by number of languages
  • though ** is used by both Python and Ruby, which together have huge market share and mindshare

We can expect people coming to Scala from basically any other programming language to already have some intuition for what ** or ... is meant to do, even without any formal training or education.

In the end I don’t think the choice of keyword or syntax is a blocker, but I personally prefer ** at both def-site and call-site: for symmetry with the * we already have, and to follow Python, a popular language that uses it for (roughly) the same purpose and whose UX we are often trying to emulate (e.g. the * syntax unification, import as syntax, indentation-based blocks, even this thread, which is inspired by Python 3.12’s Unpack).


One issue here is that varargs and unpack are specified differently. With varargs we write f(xs: Int*). That is, we write the element type and add a *. With unpack we write the container type instead. So I believe having a notation like ** that is too close to * would be misleading.

I see this as quite an awful situation from the library author’s point of view.

For each of my methods I have to choose: use the new unpack feature to let people pass objects if needed, or write a method with plain named parameters. That is yet another source of fragmentation.

And since letting people pass objects is good from a generalization and abstraction point of view, it looks better to describe my library method with the help of an additional case class. So libraries will be filled with Unpack definitions (and boilerplate case class code).

From my point of view, it is better to integrate or extend existing features than to create yet another point of choice, where I must pick one of two features for my method: the existing calls with named arguments in the function definition, or unpack with the names defined in an additional case class.

Would allowing multiple unpack parameters be okay if they were in different parameter lists? We allow this with varargs, so I don’t see why not.


Yes, that would be no problem.
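Since varargs are already allowed once per parameter list, the same shape for unpack would look like the following Scala 3 sketch. tally, send, Auth and Retry are hypothetical names, and the unpack forms appear only in comments.

```scala
// Scala 3 sketch: one vararg per parameter list today, and the analogous
// shape for unpack parameters with their expansions written out.
object MultiListSketch {
  // Already legal: a vararg in each of two parameter lists.
  def tally(xs: Int*)(ys: String*): String = s"${xs.sum}:${ys.mkString(",")}"
  val tallied: String = tally(1, 2)("a", "b")

  case class Auth(user: String, token: String = "")
  case class Retry(times: Int = 3)

  // Proposed: def send(unpack auth: Auth)(unpack retry: Retry)
  def send(auth: Auth)(retry: Retry): String = s"${auth.user} x${retry.times}"

  // Proposed: send(user = "bob")(times = 5)
  val sent: String = send(Auth(user = "bob"))(Retry(times = 5))
}
```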