"Unpacking" classes into method argument lists

lihaoyi · October 2, 2023, 11:18pm

Python 3.12 just landed this ability, via the special Unpack[T] type that you can use in argument lists to “unpack” the fields of the type T into the argument list for people to call directly, in a statically-typed manner:

from typing import TypedDict, Unpack

class Movie(TypedDict):
    name: str
    year: int

def foo(**kwargs: Unpack[Movie]) -> None: ...

foo(name="The Meaning of Life", year=1983)  # OK!

kwargs: Movie = ...

foo(**kwargs)

This makes it much easier to

(a) define related methods that share some - but not all - of their arguments, without having to have the user manually bundle them up into a config object to pass in
(b) while still having the arguments & types be statically known

This is a big improvement in Python over plain **kwargs, which gives you (a) but not (b), and it would also be a big improvement in Scala, where you can do (b) but not (a). For example, this comes up in the com-lihaoyi/requests-scala code, where we have copy-pasted-but-slightly-different argument lists between Requester#apply, Requester#stream, and Request#apply, as well as some overloads where we do unpacking e.g. here
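As a rough sketch of point (a) on the Python side: two hypothetical functions, get and stream, share a RequestOptions TypedDict while each keeps parameters of its own. All names here are invented for illustration and are not taken from requests-scala.

```python
from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    # Unpack is only needed by the static type checker (PEP 692)
    from typing import Unpack


# Hypothetical shared options; total=False makes every key optional
class RequestOptions(TypedDict, total=False):
    timeout: float
    verify_ssl: bool


def get(url: str, **opts: "Unpack[RequestOptions]") -> str:
    timeout = opts.get("timeout", 10.0)
    return f"GET {url} timeout={timeout}"


def stream(url: str, chunk_size: int = 1024, **opts: "Unpack[RequestOptions]") -> str:
    # chunk_size is specific to stream; the shared options are forwarded unchanged
    return get(url, **opts) + f" chunk_size={chunk_size}"
```

Callers pass the shared options directly as keywords, e.g. stream("http://example.com", chunk_size=8, timeout=2.0), without building a config object first.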

How hard would it be to implement something like this in Scala? I feel like it would greatly improve the ergonomics of defining and managing many of these “direct-style” APIs where you call methods and pass parameters without having to manually construct elaborate trees of nested configuration objects to pass in.

odersky · October 3, 2023, 5:24pm

It would be interesting to see a proposal adapted to Scala. If I understand correctly, if I write

def showMovie(kwargs**)

then inside showMovie, kwargs is represented internally as a Map. Now if I give it a type Unpack[Movie], the kwargs given externally can be typed. For instance

showMovie(year = 1982, name = "E.T.")
showMovie(name = "E.T.")

are both well-typed, but showMovie(name = true) is ill-typed.

But how would kwargs be typed internally in showMovie? Is Map[String, Any] the best we can do?

sirocchj · October 3, 2023, 6:03pm

Wouldn’t it be more a heterogeneous list of tuples? I.e. (String, Int) *: (String, String) *: EmptyTuple or even (singleton string, A)?

odersky · October 3, 2023, 6:19pm

Not sure. In Python, it’s a dictionary. A Tuple Map would indeed give better types, but it’s a lot slower both at run time (linear instead of constant time lookup) and at compile time (type size is linear in the number of fields).
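To make the Python side concrete: at runtime kwargs is an ordinary dict, and Unpack[Movie] only informs the static type checker. A minimal sketch, where show_movie is a name made up here:

```python
from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    from typing import Unpack  # only the type checker needs this


class Movie(TypedDict):
    name: str
    year: int


def show_movie(**kwargs: "Unpack[Movie]") -> dict:
    # Inside the function, kwargs is a plain dict keyed by parameter name
    return kwargs


received = show_movie(name="E.T.", year=1982)
print(type(received).__name__)  # dict

# A type checker would flag this call, but nothing stops it at runtime
unchecked = show_movie(name=True)
```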

morgen-peschke · October 3, 2023, 6:56pm

I agree with @lihaoyi, this does feel a lot more like blessing LabelledGeneric from Shapeless to a language feature than something that a Map[String, Any] would satisfy, for two reasons:

If you’re dealing with a Map[String, Any], typing at the call site is nice but the method itself is going to still be a mess.
Bunch of handwaving follows (been a while since I’ve done shapeless), but this should be sort of close to what could be done in Scala 2, so settling for a Map[String, Any] feels like defeat.

def foo[A <: HList](args: A)(implicit gen: LabelledGeneric.Aux[Movie, A]): String -> Int = {
  val movie = gen.from(args)
  movie.name -> movie.year
}

foo("name" ->> "E.T." :: "year" ->> 1982 :: HNil)

If we can do the same thing as a language feature, it could be much less noisy:

def foo[T <: HList](args: T)(implicit unpack: Unpack[T, Movie]): String -> Int = {
  val movie = unpack(args)
  movie.name -> movie.year
}

// Compiler would convert this into the appropriate HList, similar to 
// how varargs are handled
foo(name = "E.T.", year = 1982)

odersky · October 3, 2023, 7:50pm

@lihaoyi Can you confirm that the rest of this post is indeed what you had in mind?

lihaoyi · October 3, 2023, 10:27pm

I was actually thinking it could be auto-boxed/unboxed via case class instances.

Python uses typed dictionaries, which Scala doesn’t have built in. Tuples are close, but not quite enough if we want field/param names to be meaningful, which they are for Python’s ** (though we could argue that a separate heterogeneous positional-argument unpacking via * could also make sense).
The closest thing we have in Scala for representing a “typed set of heterogeneous named values” is case classes, or traits.
We want to be able to both box and unbox them for the purpose of supporting this syntax, which makes case classes more appropriate since those come with constructors (we could new up traits with anonymous subclasses, but that would be a bit unusual).
Sure, we could use HMaps, but (a) Scala doesn’t have them and (b) they are an advanced technique not really comparable to Python’s typed dictionaries, which are relatively pedestrian.

So the Python code translated to Scala would look something like:

case class Movie(name: String, year: Int)

def foo(**kwargs: Movie): Unit = {
  // inside you get access to kwargs.name or kwargs.year,
  // since kwargs is a `Movie` instance
}

foo(name="The Meaning of Life", year=1983)  // OK!

val kwargs: Movie = ...

foo(**kwargs)

I left out the Unpack type, since Scala does not have an existing usage of ** that needs to be disambiguated from. I argue that this best carries the spirit and semantics of the original Python snippet, even if the implementation and details differ.

In fact, with this proposal case classes would potentially get some HMap-like properties for free, since you could unpack them into each other (pseudo-Scala syntax I just made up)

case class Foo(i: Int, s: String)
case class Bar(b: Boolean, d: Double)
case class Qux(i: Int, s: String, b: Boolean, d: Double)


val foo: Foo = ???
val bar: Bar = ???

val qux = Qux(**foo, **bar) // same as Qux(i = foo.i, s = foo.s, b = bar.b, d = bar.d)

Or even

case class Foo(i: Int, s: String)
case class Bar(b: Boolean, d: Double)
case class Qux(**foo: Foo, **bar: Bar){
  // here we can use foo.i, foo.s, bar.b, bar.d
}


val foo: Foo = ???
val bar: Bar = ???

val qux = Qux(**foo, **bar) // same as Qux(i = foo.i, s = foo.s, b = bar.b, d = bar.d)
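For comparison, the call-site form Qux(**foo, **bar) can already be approximated in Python at runtime with dataclasses.asdict, though without the static checking this proposal is after. A sketch reusing the names above, with dataclasses standing in for case classes:

```python
from dataclasses import asdict, dataclass


@dataclass
class Foo:
    i: int
    s: str


@dataclass
class Bar:
    b: bool
    d: float


@dataclass
class Qux:
    i: int
    s: str
    b: bool
    d: float


foo = Foo(i=1, s="one")
bar = Bar(b=True, d=2.5)

# asdict turns each instance into {field name: value}; ** spreads it into keywords
qux = Qux(**asdict(foo), **asdict(bar))
print(qux)
```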

One big point of Scala boilerplate is that you can’t abstract over portions of a method parameter list or portions of a class constructor. That forces you to either

copy-paste the list of relevant parameters (boilerplate at definition), and have some overload that takes a config case class anyway (more boilerplate at definition site). This is what requests-scala and os-lib and upickle chose
make the method/class only take a config object that has to be constructed and passed in (boilerplate at callsite). This is what scalaj-http or sttp chose

Neither option above is really satisfactory. In fact, this tradeoff is almost exactly why the com-lihaoyi ecosystem exists at all: to choose Option 1 where most other libraries choose Option 2. The “typed unpacking” described above would nicely give us a third way, letting us define re-usable portions of argument lists in a typed and familiar manner, without boilerplate at either callsite or definition site. We’d be able to get the best of both worlds without having to make a tradeoff.

There are binary compatibility concerns, but no more than those already present for method signatures and case classes, and could be solved by the same technique (unrolled/telescoping default arguments)

Ichoran · October 4, 2023, 12:27am

Bikeshedding a bit, I don’t think ** is a great term to use here because even though it doesn’t have a predefined meaning, it is legal syntax, so you’re liable to clobber some library somewhere.

I would instead favor the Rust struct .. syntax to fill in arguments by name, since it’s guaranteed not to clobber anything.

We could decide that this is completely general (but you have to name everything left over):

case class Things(i: Int, k: String) {}
def foo(i: Int, j: Boolean, k: String) = ???

val things = Things(1, "one")
val a = foo(j = true, ..things)

def bar(things: Things) = ???
bar(..(i = 0, k = "zero"))  // Explicit request for boxing
bar(i = 0, k = "zero")    // Fully implicit--is this okay?

case class More(i: Int, j: Boolean, k: String) {}
val more = More(j = false, ..things)
val moremore = More(i = 9, j = false, ..things)  // i.e. k = things.k

If we go this way, function signatures don’t have to change at all!

Alternatively, if we think this is a little too permissive,

case class Things(i: Int, k: String) {}
def bar(things: ..Things) = ???

val things = Things(2, "two")
bar(things)  // Direct call with correct type fine
bar(i = 2, k = "two")  // Named call with pieces also fine

case class More(i: Int, j: Boolean, k: String) {}
def bippy(more: ..More) = ???

val m = More(0, false, "zero")

bippy(i = 1, j = true, k = "one")  // Fine
bippy(j = false, ..things)  // Also fine
bippy(m)  // Still good

but stuff like def baz(c: Char, s: String) = ??? would still be entirely free of case classes.

If, like Rust, we introduce a Default trait, we could interpret any missing parts as missing = summon[Default[Things]].missing, so partial matches would be okay.

Anyway, the key idea here is that .. becomes the universal “fill in things by keyword” prefix.

Depending on how expansive we wanted to be, we could generalize beyond case classes to any type T that can create an instance using T(a = x, ...) via apply on a companion or a constructor, and to any accessor (whether a method or extension method) t.a: A with the type needed.

lihaoyi · October 4, 2023, 12:43am

I don’t think being this permissive will work, because you then start getting into issues like

def bar(things: Things, i: Int) = ???

bar(i = 0, k = "zero")

Where the i from Things and the i in the parameter list next to it will collide. This is also a problem with an explicit marker .. or **, but at least that only becomes a problem for new code with the marker, whereas trying to make it fully implicit will cause ambiguity in existing code.
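For reference, with explicit call-site unpacking Python does not pick a winner when the same name arrives twice; it raises an error at call time. A small sketch, in which the function and dict are purely illustrative:

```python
def bar(i: int, **things: object) -> tuple:
    return i, things


packed = {"i": 1, "k": "zero"}

try:
    # "i" is supplied both explicitly and through the unpacked dict
    bar(i=0, **packed)
    outcome = "accepted"
except TypeError as err:
    outcome = f"rejected: {err}"

print(outcome)
```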

On the other hand, the other example you gave:

bar(..(i = 0, k = "zero"))  // Explicit request for boxing

The syntax is awful (it means the exact opposite of what most people would expect .. to mean) but the concept is not unusual. This is basically the Target-Typed New from C#

Author author = new("Matt Eland"); // Note: Author not repeated for the constructor

Dictionary<string, int> playerGoals = new();

Ichoran:

Depending on how expansive we wanted to be, we could generalize beyond case classes to any type T that can create an instance using T(a = x, ...) via apply on a companion or a constructor, and to any accessor (whether a method or extension method) t.a: A with the type needed.

That’s reasonable. The important properties are that (a) it’s statically typed and (b) the set of “known fields” is well defined, e.g. we don’t want to accidentally be unpacking things like toString or hashCode. case classes give us both these things, but with a bit of work it could certainly be generalized a bit

Ichoran · October 4, 2023, 1:32am

lihaoyi:

I don’t think being this permissive will work, because you then start getting into issues like
def bar(things: Things, i: Int) = ???

bar(i = 0, k = "zero")
Where the i from Things and the i in the parameter list next to it will collide.

That’s easy: i in the argument shadows i in Things. So either Things had better have a default value for i, or don’t write your method like that if you want it used with k = "zero". You get to name the arguments, after all!

Fair enough. I didn’t like it either. But one can imagine alternatives if the idea is sound (use-site declaration of sugar).

Um, why not? def foo(toString: String, hashCode: Int) = ??? seems awfully suggestive, no? What else would you want it to do?

lihaoyi · October 4, 2023, 2:53am

I think you may be right that this can be made technically unambiguous and backwards compatible, but it still feels like it’s too ambiguous and confusing from a human perspective. Explicit unpacking of key-value pairs via ** or .. or whatever is common in programming languages, and even the fact that we’re allowing typed heterogeneous key-value pairs is no longer unique since Python has it. But implicitly taking key-value arguments and constructing parameters that seem to match is not common, and AFAICT does not exist elsewhere at all.

I’d want it to unpack a foo(**myFoo) if-and-only-if myFoo is of a case class defined as:

case class Foo(override val toString: String, override val hashCode: Int)

Doing it implicitly based on java.lang.Object#toString or java.lang.Object#hashCode, inherited from some parent class or trait, seems like a bad idea:

It’s not just toString and hashCode: what about equals? clone? notify? notifyAll? wait? or even productArity? productIterator? productPrefix? productElementNames? Other things their case class may inherit from other upstream traits? We cannot expect developers to be aware of all the things that their class inherits, so making these random zero-arg methods unpack into argument lists is guaranteed to be surprising. Effectively you have an unbounded laundry list of parameter names you have to ban if you want to avoid accidental unpacking. But we can expect people to know the fields of the case classes they use, which are all listed in one place, so this is less of a problem there.
It breaks symmetry between packing and unpacking: just because you can query the .toString of a case class instance doesn’t mean you can construct an instance with a given toString. Maybe it’s not strictly necessary to have packing/unpacking be symmetric, but it’s a really nice property from a human-understandability perspective. For example:

val myFoo: Foo = ???
def foo(toString: String, hashCode: Int) = ???
foo(**myFoo) // This would work

def foo(**myFoo: Foo) = ???
foo(toString = "hello", hashCode = 123) // This wouldn't work

It breaks symmetry with pattern matching: pattern matching gives you the case class fields, it does not let you pattern match on toString or hashCode or whatever, even if technically it could be implemented.
Because every case class has a whole zoo of irrelevant inherited members, the only way we could make unpacking of inherited members work is by silently discarding members that do not match a named parameter. That kind of “silent” compatibility could work, but it’s definitely a sacrifice in strictness.

Overall, there are lots of downsides to allowing unpacking of inherited members, and I’m not seeing any upside vs. only unpacking things in the case class’s primary parameter list.
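The distinction between declared fields and inherited members has a rough Python analogue, sketched here with a dataclass (chosen only for illustration, since the argument above is about Scala case classes): dataclasses.fields lists just what the class declares, while dir also reports everything inherited from object.

```python
from dataclasses import dataclass, fields


@dataclass
class Foo:
    i: int
    s: str


# Only the fields written in the class body, in declaration order
declared = [f.name for f in fields(Foo)]
print(declared)  # ['i', 's']

# dir() also reports inherited members such as __repr__ and __eq__
inherited_extras = [name for name in dir(Foo) if name not in declared]
print("__repr__" in inherited_extras)  # True
```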

MateuszKowalewski · October 4, 2023, 7:41am

I like the general direction.

But this looks like just another special case.

Why can’t we have “HMaps” / “tuples with field names” / “anonymous case classes” and make parameter lists a first-class construct based on that?

Given some structural sub-typing on the “HMaps” / “tuples with field names” / “anonymous case classes” this feature here would emerge on its own, I think.

(And an unrelated tangent: I also like the mentioned Default type-class idea from Rust!)

lavrov · October 4, 2023, 9:00am

MateuszKowalewski:

Why can’t we have “HMaps” / “tuples with field names” / “anonymous case classes” and make parameter lists a first-class construct based on that?

You forgot to mention “extensible records”
I believe it is long overdue for having this feature in the language. The space of application is huge. I personally miss it when working on front-end projects in ScalaJS. It is quite normal for visual components to have many parameters some of which are shared. The most annoying case is when components share almost everything but one parameter. There are ways to deal with that but they all look a bit awkward in some way or another.

lihaoyi · October 4, 2023, 9:52am

This is exactly the use case my proposal would solve! You can have the shared parameters in a separate case class, and unpack them into the various components’ constructor definition sites in addition to the parameters unique to each one. And the same applies to users: if they want to pass the same set of shared parameters around, they can use the same case class or define their own, instantiate it, and unpack the instance into all their component call-sites.

In that way, both definition-site and call-site become boilerplate free, without needing to construct config objects all the time, while still giving the flexibility to abstract over the parameter sets in a way that copy-pasting parameter lists does not provide.

markehammons · October 4, 2023, 11:04am

Can’t HMaps already be encoded in tuples (including field/param names being meaningful)? A tuple like (("name", String), ("year", Int)) would be the tuple/hmap representation of Movie’s data. Using Dynamic or programmatic structural types you can get something like foo working for this type too.

lihaoyi · October 4, 2023, 11:14am

Sure. But how many Scala developers are storing their data in (("name", String), ("year", Int))? Probably somewhere around 0%. The same can be said for how much of the Scala community uses Dynamic, or structural types (programmatic or not). In contrast, somewhere around 100% of Scala developers use case classes.

Replacing case classes with a bunch of nested tuples throughout the Scala ecosystem would be several orders of magnitude more difficult than adding syntax for unpacking in definitions and callsites.

markehammons · October 4, 2023, 11:53am

You typically wouldn’t store data in those, but rather use the type as a description for the case class’s shape, allowing the compiler the ability to infer the proper way to instantiate the case class from foo(name="the meaning of life", year=1983), as well as check if all the necessary fields have been provided. The Dynamic here is just the backing for foo, and only stores the information on construction of Movie, not the data of Movie.

Tuple as a store of HMap data is inefficient though, even if it’s only storing that information at compile-time. I would say that’s one of the main things that makes this unfeasible for massive usage.

Ichoran · October 4, 2023, 5:52pm

Okay, this is a good point. It would be better to restrict it to cases where there is symmetry for a type T between T(a = x, ...): T and t.a: A, .... I’m not convinced that random inherited members is an in-practice problem, but being able to create the thing you’re taking members from seems like a nice property.

It just needs to be flexible enough to handle syntactic upcasting. So if we have

case class Foo(i: Int, j: Long) {}
case class Bar(i: Int, j: Long, k: String) {}
def run(foo: ..Foo) = ???
val bar = Bar(1, 2L, "three")

then presumably we want

run(..bar)

to work, even though we can’t

def run(foo: ..Foo) =
  val b = Bar(..foo)
  ...

which would be the symmetric condition.

The whole reason that it works as syntactic sugar for extensible records is that it isn’t symmetric so you can map the matching fields / parameters.

lihaoyi · October 4, 2023, 11:30pm

I’m actually not sure we want that case to work. I’d want the “too few params in case class unpacked at callsite” scenario to work, since you can add named params to specify the others explicitly. But the “too many params in case class unpacked at callsite” scenario does not give any way of explicitly removing params, and removing them implicitly feels a bit off to me.

The precedent isn’t obvious here.

On one hand, in Python, unpacking into untyped **kwargs does let you add extra stuff that all gets bundled into the dict.
On the other hand, unpacking **kwargs with extra fields into non-**kwargs functions that do not define those named params is an error. So they’re flexible in one way but not the other
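Both behaviours are easy to check in Python. In this sketch, loose and strict are made-up functions: the first collects unknown names into its **kwargs dict, the second rejects them.

```python
def loose(**kwargs: object) -> dict:
    # Untyped **kwargs accepts any keyword and bundles it into the dict
    return kwargs


def strict(name: str, year: int) -> str:
    return f"{name} ({year})"


movie = {"name": "E.T.", "year": 1982, "rating": 5}

bundled = loose(**movie)  # the extra key "rating" is simply carried along

try:
    strict(**movie)  # "rating" matches no parameter of strict
    result = "accepted"
except TypeError as err:
    result = f"rejected: {err}"

print(sorted(bundled))
print(result)
```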

Since we’re working with case classes, by their nature they’re typed and non-extensible. And Scala does err on the side of “strict” more than Python does. So IMO erroring out on extra fields that do not correspond to a defined named parameter is the right thing to do

lihaoyi · October 7, 2023, 6:16am

Notably, a variant of this is being discussed in the F# community (just for records, since they don’t have named parameters in method calls), and it seems likely to make it in: see “Spread operator for F#”, Issue #1253 in fsharp/fslang-suggestions on GitHub. They propose ... similar to what @Ichoran suggested.