SIP-XX - Unpack Case Classes into Parameter Lists and Argument Lists

Submitted another Pre-SIP. Please take a look!

This Pre-SIP consolidates the discussion we had in the earlier thread, "'Unpacking' classes into method argument lists", into one coherent proposal.

I like the idea in principle. It feels like it introduces several different concepts, though, each of which may deserve its own thought process.

For instance, there is the implicit idea of an erased class here. I don’t think “left up to the implementation” is really precise enough. If the idea is that we have field names as method arguments, and they really are just field names, then we should have a set of solid concepts around what exactly it’s sugar for, how the data is passed along, and so on.

There’s also an idea of unordered naming. That always happens when specifying explicit argument names. But, then, to make this work do we need the “argument names must be specified explicitly” feature activated?

We have introduced a feature where sometimes we want a spread and sometimes we don’t. But in this case (e.g. with Rust structs) you very often want to manually specify some fields and take others from an existing struct. Do we insist that you use copy in that case? connect(configParams.copy(url = "http://url.me")*)? Or do we want the partial spread operator that Rust has?

These are the three main things that I don’t think are put on a sufficiently principled basis in the proposal. There are a few small issues too (e.g. what happens if you have foo(unpack c1: C1, unpack c2: C2) and c2 has a field c1 in it?–on the one hand, the unpacked version has no conflict; but on the other hand, it would be weird if you couldn’t do foo(c2 = myc2*, c1 = myc1*)). But those can be bikeshedded easily enough later.

But the three main issues I see are, as I said,

  1. What is the class representation, and if it is not literally just a class being passed as an argument, what are the rules around erased classes?
  2. Having a good story around positional vs named arguments
  3. Nailing down whether we have partial spread or copy only

Finally, I think the proposal makes more sense for named tuples than case classes, but this is the one that we’ve got, and it’s sensible enough to start here since there are still rough edges on named tuples.

This is great. At work, we already use something like XXXParams to pack many parameters together anyway.

I think this could be very useful and goes in the right direction. Similar to what @ichoran wrote, I think an important question will be in what form these parameters are passed. In

case class RequestConfig(url: String, 
                         connectTimeout: Int,
                         readTimeout: Int)

def downloadSimple(unpack config: RequestConfig) = doSomethingWith(config)

are the arguments passed as a RequestConfig object or as three values? As I understand it, the proposal specifies three values, but then where does the config come from in doSomethingWith? I assume we have a definition like

def doSomethingWith(config: RequestConfig)

so we do need to re-constitute a RequestConfig. But if that’s necessary, we get lots of secondary questions:

  • what if RequestConfig takes further parameters or using clauses?
  • what if RequestConfig’s constructor has side effects?
  • do we cache the synthesized instance, or do we create a new one on each reference?

If we follow that scheme, I think it would be easier if we use named tuples instead of case classes since the points above would not be issues then.
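To make the reconstruction questions above concrete, here is a hand-written approximation in today's Scala of what the "three values" reading might expand to. This is only a sketch: the `constructed` counter, the body of doSomethingWith, and the choice to rebuild the instance once per call are my assumptions, not something the proposal specifies.

```scala
// Hypothetical counter, added only to make constructor side effects observable.
var constructed = 0

case class RequestConfig(url: String,
                         connectTimeout: Int,
                         readTimeout: Int) {
  constructed += 1 // a side effect in the constructor
}

// Illustrative body; the thread only gives the signature.
def doSomethingWith(config: RequestConfig): String =
  s"${config.url} (${config.connectTimeout}/${config.readTimeout})"

// Hand-written stand-in for `def downloadSimple(unpack config: RequestConfig)`:
// the fields arrive as three separate values and the instance is rebuilt inside.
def downloadSimple(url: String, connectTimeout: Int, readTimeout: Int): String = {
  val config = RequestConfig(url, connectTimeout, readTimeout) // re-constituted once per call
  doSomethingWith(config)
}

val result = downloadSimple("https://example.com", 5000, 500)
```

In this version the constructor runs exactly once per call. A scheme that re-created the instance on every reference to config would run the side effect once per reference instead, which is why the caching question matters.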

The alternative would be to state that a single config parameter of type RequestConfig is passed. Then it seems the natural rule would be to restrict the spread operator like in config* to places where an unpack parameter is required. But I do like that spread and unpack are independent so that would speak for the original scheme.

If spread and unpack are independent we could also accommodate partial spreads. Simply state that a spread will fill in only parameters that have not been filled before. E.g.

downloadSimple(newUrl, config*)

would expand to

downloadSimple(newUrl, connectTimeout = config.connectTimeout, readTimeout = config.readTimeout)
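Since neither unpack nor config* compiles today, the expansion can only be checked by writing it out by hand. A minimal sketch, assuming a flattened downloadSimple with an illustrative body and made-up values, comparing the partial-spread expansion with the copy-based alternative:

```scala
case class RequestConfig(url: String, connectTimeout: Int, readTimeout: Int)

// Stand-in for a method declared with `unpack config: RequestConfig` (body is illustrative).
def downloadSimple(url: String, connectTimeout: Int, readTimeout: Int): String =
  s"GET $url connect=$connectTimeout read=$readTimeout"

val config = RequestConfig("https://old.example", 5000, 10000)
val newUrl = "https://new.example"

// Hand-written form of `downloadSimple(newUrl, config*)` under the
// "a spread fills only parameters not filled before" rule:
val viaPartialSpread =
  downloadSimple(newUrl, connectTimeout = config.connectTimeout, readTimeout = config.readTimeout)

// The copy-based alternative, i.e. `downloadSimple(config.copy(url = newUrl)*)` written out:
val overridden = config.copy(url = newUrl)
val viaCopy =
  downloadSimple(overridden.url, overridden.connectTimeout, overridden.readTimeout)
```

Both calls pass the same three values. The difference is that the copy version allocates an intermediate RequestConfig and runs its constructor again.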

Parameter Passing Convention

While the current proposal does not specify exactly how the parameters are passed, and it’s possible that multiple overloads/passing-conventions are supported for performance, it seems like the requirements will force us to have at least one overload where the unpack is fully expanded and every parameter is passed separately in the bytecode method signature.

From a semantics perspective, I think that having the parameters all be separate is valuable for user-facing understandability.

  • It allows unpack and spread* to be totally independent, which broadens the use case of both features compared to having them only work together.

  • It gives a clear meaning to how things like @unroll interact with unpack: the unpack parameters get completely flattened out, potentially recursively, and only then any @unroll calls take effect

  • In general, “unpack copy pastes the parameter list of the case class into the place where it is used” seems like something that users would be able to grasp relatively easily, without needing to understand implementation details or constraints.

For all intents and purposes, we could specify “unpack re-constructs the case class every single time”, and the number of case classes with side-effecting constructors or other similar things is small enough users are unlikely to encounter those edge cases in typical use.

Multiple Parameter Lists

I think we can probably get away with speccing the feature to just say that we only allow case classes with a single parameter list as the unpack parameter type

An interesting question is the other side: would we allow unpack in the (using) parameter list? It seems like it could be useful sometimes, but I haven’t fully thought that through.

Performance

There’s some question of performance, but most idiomatic Scala code has tons of inefficiency anyway, so even if unpack is not able to provide peak allocation-free performance that’s probably ok. Traditional Varargs have a ton of overhead allocating an array that may not be acceptable in some situations, resulting in hard-coding intrinsic optimizations for List.apply in the Scala compiler, and proposals like Curried Varargs that attempt to improve upon it. Flexible varargs will probably end up allocating a new array each time. Common operations like array.map.filter.foreach generate tons of intermediate garbage and perform enormous numbers of megamorphic dispatches.

Maybe in the most ultra-optimized Scala code people wouldn’t want to use unpack, but ultra-optimized Scala code already avoids a ton of language and library features anyway, so needing to avoid unpack is not unusual at all. At least for all the scenarios discussed under Applications, the cost of unnecessarily constructing/deconstructing a case class is entirely negligible

Named Tuples

Named tuples almost work, but most parameter lists where something like unpack would be valuable have tons of default parameter values, which named tuples are currently unable to model.
So they could work, but they are insufficient to satisfy the requirements of the scenarios discussed in Applications.

Partial Spreads and Partial Unpacks

If spread and unpack are independent we could also accommodate partial spreads. Simply state that a spread will fill in only parameters that have not been filled before.

I don’t have a strong opinion here, but it seems like there’s a tradeoff between:

  • “It’s nice to let users specify explicit arguments first to override those that would be created by a spread*”
  • “Having spread* silently drop arguments is risky and seems like it can cause subtle bugs”

Both spread* and unpack could benefit from the flexibility of allowing us to do them on a partial basis. One possibility is we say:

  • In parameter lists, unpack foo allows an optional selector similar to export that specifies exactly what we want to unpack: unpack foo.{bar, qux}, unpack foo.{bar as _, *}

  • In argument lists, foo* is shorthand for unpack foo, which supports a similar set of qualifiers: unpack foo.{bar, qux}, unpack foo.{bar as _, *} for situations where people really want to customize what is being spread

I’m not sure we need to flesh all this out in the initial proposal, but it seems these approaches could work.

What if the default case class constructor is private?
I didn’t fully think it through, but can we have an Unpackable[T] typeclass with Fields and Defaults type members, both named tuples? That would make the implementation more generic and push most of it to the standard library instead of the compiler internals.

I had not considered default arguments. Yes, that rules out named tuples. I think we can restrict it to single argument list case classes. We might even be able to define that they need to be pure. We already have such a concept for inlining: We can reduce a case class constructor followed by a selection if the case class is guaranteed to be pure, which in this case means it does not have fields or statements and its superclasses are pure as well. Alternatively, stating that an unpack argument is internally treated as if it was by-name would work as well.

I think that’s mostly a problem if a partial spread* corresponds to an unpack parameter. But for unpacked classes we must require anyways that they don’t repeat parameter names. So in that case spread* would never drop fields.

EDIT: I now realize that was misleading. The “don’t repeat parameter names” ensures that we don’t accidentally drop elements from a spread* that appeared before. But one could very well supply some of the arguments of a spread* before it. I think that’s OK since it would be explicit enough.

Using a typeclass rather than hardcoding case class support could work. As you said, we basically just need type Fields + some kind of data structure representing the presence or absence of defaults for each field

If we do it generically, we could even use generic metaprogramming to transform, extend, or elide fields before performing an unpack or *. This could be handled purely by user-land code rather than being baked into the language syntax.

The only issue here would be that I am not personally familiar enough with how generic tuple/namedtuple metaprogramming works in Scala 3, so I can’t really propose anything concrete. Maybe someone else can chime in here?
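For what it's worth, the Fields half of such a typeclass can be roughly sketched in Scala 3 with scala.deriving.Mirror. Everything below is hypothetical: the member names unpack and pack, the use of plain tuples rather than named tuples, and the derivation itself. Mirror does not expose default values, so the Defaults half is left out entirely.

```scala
import scala.deriving.Mirror

// Hypothetical typeclass; only the field-types half is modelled here.
trait Unpackable[T] {
  type Fields <: Tuple
  def unpack(value: T): Fields
  def pack(fields: Fields): T
}

object Unpackable {
  // Derive an instance for any type with a product Mirror (e.g. case classes).
  given derived[T <: Product](using m: Mirror.ProductOf[T]): Unpackable[T] =
    new Unpackable[T] {
      type Fields = m.MirroredElemTypes
      def unpack(value: T): Fields = Tuple.fromProductTyped(value)(using m)
      def pack(fields: Fields): T = m.fromProduct(fields)
    }
}

case class RequestConfig(url: String, connectTimeout: Int, readTimeout: Int)

val cfg = RequestConfig("https://example.com", 5000, 10000)
val unpackable = summon[Unpackable[RequestConfig]]
val fields = unpackable.unpack(cfg)        // the fields as a tuple
val roundTripped = unpackable.pack(fields) // rebuilt through the mirror
```

User-land code could transform the tuple between unpack and pack. Note that pack goes through the mirror rather than a visible constructor call, so the private-constructor question would still need an answer.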

I agree. But I don’t think this is unpack.

To me, this seems like

def foo(config: erased ConfigClass): Unit = ...

where you can refer to the parameters as config.fieldname but they are all compiler fictions, and it’s really def foo(fieldname1: Type1, fieldname2: Type2, ...), and the defaults are taken from the ConfigClass constructor.

So if we understand what it means to have an erased class at use site, then we should be fine. If we don’t, and we ever get anything like erased classes, the features will be awkwardly redundant.

Given that this is most naturally modeled as an erased class, the spread seems sufficiently different that I don’t see why it shouldn’t just unpack by name and type, and for the class to not matter at all.

So foo(myCfg*) would simply attempt to call foo(field1 = myCfg.field1, field2 = myCfg.field2, ...) for some set of fields that either are intrinsically there or which are listed by type Fields in itself and/or a typeclass. This would make it easy to carry around partial sets of variables in named tuples, because they would be eligible also.

If you wanted to have a single case class to hold it all, that should work. But if it’s decomposed into these separate capabilities, it ought also to be possible to have no case class at all and have the apparent structness of it as a compiler fiction which would yell at you if you break the fiction.

Then you could have @unpack as an actual annotation if you wanted a bridge from a real class to a set of parameters. def foo(@unpack config: ConfigClass) would have the real class, but it would generate a synthetic forwarder def foo(field1: Type1, field2: Type2, ...) = foo(ConfigClass(field1, field2, ...)). If you happened to have a config class object, you could just pass it in directly: foo(config).
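Written out by hand in today's Scala, the forwarder idea looks roughly like the following. The field names, types, the default value, and the enclosing Api object are placeholders I made up for illustration; a real @unpack would presumably copy the defaults from the ConfigClass constructor.

```scala
case class ConfigClass(field1: String, field2: Int = 0)

object Api {
  // The "real" method takes the class instance.
  def foo(config: ConfigClass): String = s"${config.field1}:${config.field2}"

  // Hand-written version of the forwarder that `@unpack` would synthesize,
  // with the default mirrored from the ConfigClass constructor.
  def foo(field1: String, field2: Int = 0): String =
    foo(ConfigClass(field1, field2))
}

val viaFields   = Api.foo("a", 2)              // goes through the forwarder
val viaDefault  = Api.foo("a")                 // forwarder, using the default for field2
val viaInstance = Api.foo(ConfigClass("a", 2)) // passes the real class directly
```

Only one of the two overloads declares defaults, which Scala's overloading rules require. That constraint would also apply to any synthesized forwarder.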

I like it, but do we really need an extra keyword? Can this be done implicitly or is it a bad idea?

erased actually means something else, assuming you mean what’s under language.experimental.erasedDefinitions. An “erased” class is a class to which we do not have references at runtime because only erased code references the class. That’s different from saying that the class is decomposed into its members.

Yeah, the class itself is erased, but the fields are not. So, agreed, it’s a different concept than erased = there is nothing here at all. It’s still a type of erasure, but maybe the commonality is too low for the same keyword to be a good idea. I had intended to convey the difference by erased foo: Foo meaning that foo does not exist, but foo: erased Foo means that foo exists but the nature of its existence is the erasure of Foo to its component parts. Probably too subtle, and then we have two different ideas of “erased class” which are “erased completely” and “erased to components”. Unless the two can be unified, that’s probably a bad idea–and it might be a bad idea anyway if it makes things too obscure.

Nonetheless, “this class (at least as used here) is a compiler fiction” seems like its own capability that should be carefully designed if it’s going to be used here, and I don’t think “unpack” is the most obvious word for that. If inside the method you’re going to refer to things like config.url but there is no actual config object, how that works should be carefully specced so the corner cases are handled. Can you match on it? Can you call methods? Can it store state? Can you pass it to things where a real class is required? Can there be setters? Can it be created with error checking? With context parameters? If the type had to be a named tuple, the answers would almost all be obvious, but case classes are much richer so there is much more to answer.

Otherwise, absent this level of detail, the real function should take an instance of the case class, and there’s just syntactic sugar for overloading the function with another variant that takes all the arguments. That would be far simpler conceptually. There might be performance issues, but in the original proposal getting ideal performance was already stated to be a non-goal, so maybe that’s okay.

To me something feels wrong about this proposal, it feels “unlike Scala”
This is quite vague, so I will try to dig a bit deeper:

The status quo is not so bad

To me, the following feels fine:

class Requester{
  def stream(
    request: Request, // change: not unpack
    chunkedUpload: Boolean = false,
    redirectedFrom: Option[Response] = None,
    onHeadersReceived: StreamHeaders => Unit = null,
  ): geny.Readable = ...
}

r.stream(
  Request(
    "myUrl",
    cert = myCert,
    params=Seq(("hello", "world")),
  ),
  chunkedUpload = true,
)

It’s a bit more verbose, but it’s clear what’s going on
And in general, I prefer my code verbose rather than obscure

User PoV

Signatures in Scala tend to focus more on how the function is called than how things are stored, or how they are available on the inside of the function:

// xs is a Seq[T] inside, but looks like a bunch of `T`s outside
def foo[T : Showable](xs: T*) = xs.map(x => x.show)
// `T : Showable` is more debatable, but in most cases, you do not need to worry about providing the Showable instance

This proposal goes in the other direction:
To know the effective signature of downloadSimple, I have to look at how RequestConfig is defined.

If downloadSimple were to take RequestConfig as a regular parameter, the situation is different:
By looking at the signature, we see we need a RequestConfig as parameter, so we have two options:

  1. Create a dummy val cfg: RequestConfig = ??? and get to it later, or
  2. Look into creating an instance of RequestConfig, forgetting all about downloadSimple, and then come back to it once we have the instance

This allows the (possibly novice) user to do things step by step, with a fresh context for each


If one day we add a conversion of named tuple literals to case classes, then the initial example looks like:

r.stream(
  (
    url = "myUrl",
    cert = myCert,
    params = Seq(("hello", "world")),
  ),
  chunkedUpload = true,
)

Almost identical to this proposal !
But crucially there is a clear path to learn how to use Scala/the library:
First you make every instance explicitly, it’s intuitive and it works
After a while you get tired of typing so much, so you look for shortcuts, and you discover that conversion.
Notably this added complexity was not imposed upon you, you had the choice to find it by yourself.
(Which seems true of other features like SAM-conversion, pattern matching anonymous functions, etc)

Varargs

This proposal is similar to varargs, but differs in that there is an asymmetry between definition- and use-site:

// both use `*`
def foo(x: Int*)
foo(xs*)

// one uses `*`, the other `unpack`
def foo(unpack bar: Bar)
foo(bar*)

I prefer when “there is only one good way to do something”, that way I don’t have to think about it, and more broadly this avoids fragmentation in the ecosystem
With this proposal, there is the choice between direct parameters, and creating an instance of the unpack class, but you can also spread any other class !
See orthogonality, example after “And you can unpack a different case class”

Varargs on the other hand are much stricter about how they are used:

def foo(x: Int, y: Int)
foo(Seq(1,1)*)
// could fit, but instead:
// error: Sequence argument type annotation `*` cannot be used here: the corresponding parameter has type Int which is not a repeated parameter type

def foo(x: Int, y: Int)
foo((1,1)*)
// could be known to fit, but tuples cannot be spread

val xs: Seq[Int] = ???
def foo(x: String = "", ys: Int*) = ???
foo(xs*) 
// technically unambiguous, but instead:
// error: Sequence argument type annotation `*` cannot be used here: the corresponding parameter has type String which is not a repeated parameter type

Overall this means there is usually only one reason to use spreading: You got the value from somewhere else

Order

When spreading a case class instance, the order of members doesn’t matter, which makes sense to avoid surprises, and is what is done in Python’s PEP-692 given it is based on dicts.
(This might still be surprising to some, since when spreading seqs order matters)

When used as unpack, the order of members does matter, which makes sense because parameter lists are inherently ordered.

But together these two create some mental tension, especially in cases where the spreads do not correspond to the unpacks
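The tension can be seen with a hand-expanded example in today's Scala. This is a sketch under assumptions: A, its members m1 and m2, and the body of foo are made up here, and the spread is written out as the named-argument call it is described as expanding to.

```scala
case class A(m1: Int, m2: String)

// Stand-in for `def foo(unpack a: A)`: the parameters keep A's declaration order.
def foo(m1: Int, m2: String): String = s"$m1-$m2"

val a = A(1, "x")

// A spread `foo(a*)` is matched by name, so the order in which the
// fields are listed in the expansion does not matter:
val spreadLike = foo(m2 = a.m2, m1 = a.m1)

// A positional call against the same unpacked signature is order-sensitive;
// swapping the arguments here would not even type-check.
val positional = foo(1, "x")
```

Both calls produce the same result, but only the positional one depends on the order in which A declares its members.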

Questions about edge-cases

Is it possible to override a single parameter ?

def foo(unpack a: A)
foo(a.copy(m1 = 0)*) // should follow from the SIP as presented, but a bit clunky
foo(m1 = 0, a*) // invalid
foo(a*, m1 = 0) // invalid
foo(a?*, m1 = 0) // a possible solution, but a bit symbol-soup-y

Are the following allowed ?

def foo(x: Int, unpack a: A)
foo(0, a*)

def foo(unpack a: A, y: Int)
foo(a*, 0)

And more generally, how do positional parameters interact with spreading ?
Since the order does not matter for the latter

Conclusion

I absolutely agree that parameter clauses are one of the last remaining places where there is not a clean way to reduce boilerplate !
I have tried here to explain the reasons behind my gut feeling that “something isn’t right”, and I hope they will be useful in finding the correct solution to this problem, be it a different proposal, or a variant of this one

And to this end:

What I would change

  1. Use unpack to spread values (least sure about this change)
  2. Remove orthogonality/keep parallel with varargs, unpack a is only valid if there is an unpack _: A in the same position
    This removes a lot of the questions and edge-cases introduced by the proposal
    But there is the risk that eventually people ask for it anyways
  3. Add a way to override specific elements, ideas: unpack? A, m1! = 0, override m1 = 0