A syntax for aggregate literals
Hey there,
this thread was born out of a recent discussion on the Scala Discord.
Motivation
Unlike most other programming languages in use today, such as ECMAScript or C++, Scala does not have a literal syntax for collections and objects. It makes up for this with (potentially variadic) apply methods on the relevant types’ companion objects, e. g. List(1, 2, 3) or SomeCaseClass("foo"). This works, but it means that it is often necessary to spell out the name of a type (or rather, its companion object) when it could easily be inferred by the compiler. This makes many very common kinds of code clunky and verbose, and it is in stark contrast to other parts of the language where types don’t need to be given explicitly when they can be inferred from the context: val definitions don’t need a type, the type of a lambda parameter is usually inferred, and even the type that the lambda expression as a whole evaluates to can be inferred, e. g. it’s possible to pass a lambda expression to a function that expects a Runnable.
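To make the contrast concrete, here is a small sketch of the inference that Scala already performs today. The names runTwice and demo are made up for illustration, and the Scala 3 indentation syntax is assumed.

```scala
// A function that expects a Java SAM type.
def runTwice(task: Runnable): Unit =
  task.run()
  task.run()

def demo(): Int =
  var runs = 0
  // The lambda is converted to a Runnable purely because
  // that is the type expected in this position.
  runTwice(() => runs += 1)
  runs

val numbers = List(1, 2, 3)            // inferred as List[Int]
val doubled = numbers.map(n => n * 2)  // n is inferred as Int
```

None of these lines names Runnable, Int or List[Int] at the point of use, yet the companion object List still has to be spelled out.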
Another problem is that having to spell out the type of a collection makes code harder to refactor. When you change the type of a function parameter from List to ArraySeq, all call sites need to be rewritten even though no semantically relevant change happened.
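A minimal sketch of that refactoring problem, using made-up function names (totalBefore, totalAfter) to stand for the same function before and after the parameter type changes:

```scala
import scala.collection.immutable.ArraySeq

// Before the refactoring: the parameter is a List.
def totalBefore(values: List[Int]): Int = values.sum
val before = totalBefore(List(1, 2, 3))

// After the refactoring: only the parameter type changed,
// yet the call site has to be edited as well.
def totalAfter(values: ArraySeq[Int]): Int = values.sum
val after = totalAfter(ArraySeq(1, 2, 3))
```

Both calls compute the same thing. The only reason the call site changes is that it names the collection's companion object.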
An example from Li Haoyi’s os-lib library is the os.proc syntax. Since it is impossible for a method to have both parameters with defaults and a variadic argument list, it is necessary to work around this with an additional method:
os
  .proc("ls", "abc") // variadic function call
  .call(cwd = ???) // optional named args
It should be possible to write this as a single method call, and it could be if the command to be executed could be written as a collection literal.
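As a rough sketch of what such a single method could look like today: describeCommand below is a hypothetical helper invented for illustration, not part of os-lib. It takes the command as an ordinary Seq so that it can coexist with a default parameter.

```scala
// Hypothetical helper for illustration only; this is NOT os-lib's API.
// A Seq parameter (instead of varargs) can sit next to a default argument.
def describeCommand(command: Seq[String], cwd: String = "."): String =
  val joined = command.mkString(" ")
  s"$joined (cwd = $cwd)"

val withDefault = describeCommand(Seq("ls", "abc"))
val withCwd = describeCommand(Seq("ls", "abc"), cwd = "/tmp")
// With a collection literal, the call might read:
//   describeCommand(["ls", "abc"], cwd = "/tmp")
```

The price today is the extra Seq(...) at every call site, which is presumably why a variadic method plus a second method was chosen instead.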
But it gets worse when you want to create objects of a deeply nested structure of case classes.
List(
  Person("Martin", Birthday(1958, 9, 5)),
  Person("Matthias", Birthday(???, 7, 11)),
)
This is very typical when writing tests, and it’s clearly very redundant. I often found myself working around this using the only class type that Scala does have literal syntax for, tuples, and then mapping over the list, which is quite clunky. It is compounded by the fact that a common technique to avoid name clashes is to nest the definitions of types inside the companion object of the types that they occur in. E. g. Id is a name that is likely to cause clashes, so people write it like this:
case class FooRequest(foo: FooRequest.Id)
object FooRequest:
  case class Id(i: Int)
def callFoo(r: FooRequest) = ???
callFoo(FooRequest(FooRequest.Id(42)))
This is very common in generated code, such as from the Guardrail code generator for OpenAPI schemas. Not only is this code mind-numbingly redundant, you are also going to need a large number of import statements to import all the relevant companion objects into your file.
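For reference, the tuple workaround mentioned above could look roughly like this today. The Person and Birthday definitions are assumed from the earlier example, and the names and dates in the list are placeholder data.

```scala
case class Birthday(year: Int, month: Int, day: Int)
case class Person(name: String, birthday: Birthday)

// Write the data as nested tuples, then map them into case classes.
// The values are illustrative placeholders.
val people: List[Person] =
  List(
    ("Alice", (1990, 1, 2)),
    ("Bob", (1985, 6, 30))
  ).map { case (name, (y, m, d)) => Person(name, Birthday(y, m, d)) }
```

The data itself becomes terse, but the mapping step has to repeat the whole structure once more, and it needs to be rewritten for every new shape of data.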
Proposed solution
The solution I propose is to have square brackets as a new syntax to signify a call to the apply method of a type’s companion object. Which companion object is called is determined by the type expected in that position. Using the syntax in a position where no expected type can be determined is a compile-time error.
In the simplest case that is just a val definition with a type annotation:
val ints: List[Int] = [1,2,3] // desugars to List(1,2,3) because a list is expected
But you could also be calling a function that expects a certain type:
def frobnicate(ints: List[Int], someArg: Int = 42) = ???
frobnicate([1,2,3]) // desugars to frobnicate(List(1,2,3))
Note that this cannot be done with a variadic function: the optional someArg parameter makes that impossible, hence the awkward os.proc(…).call(…) syntax in os-lib. A variadic function also doesn’t allow more than one list argument:
def dotProduct(
  as: List[Double],
  bs: List[Double]
) = ???
dotProduct(
  [1, 2],
  [3, 4]
) // desugars to dotProduct(List(1, 2), List(3, 4))
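Since the bracket syntax does not exist yet, the sketch below only shows the desugared calls that compile today, with the proposed spelling in comments. The method bodies are filled in purely for illustration; the examples above leave them as ???.

```scala
// Illustrative bodies; the original examples leave these as ???.
def frobnicate(ints: List[Int], someArg: Int = 42): Int =
  ints.sum + someArg

def dotProduct(as: List[Double], bs: List[Double]): Double =
  as.zip(bs).map { case (a, b) => a * b }.sum

// Proposed: frobnicate([1, 2, 3])
val frobnicated = frobnicate(List(1, 2, 3))

// Proposed: dotProduct([1, 2], [3, 4])
val product = dotProduct(List(1.0, 2.0), List(3.0, 4.0))
```

In both calls the expected parameter type already tells the compiler that a List is needed, which is the information the proposed syntax would rely on.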
The List[Person] example could easily be rewritten either like so:
List[Person](
  ["Martin", [1958, 9, 5]],
  ["Matthias", [???, 7, 11]]
)
or like so:
[
  ["Martin", [1958, 9, 5]],
  ["Matthias", [???, 7, 11]]
]: List[Person]
The FooRequest example is also much simpler:
callFoo([[42]])
In addition to not having to spell out the object types here, you also don’t need to import them into your file any longer, which cuts down the boilerplate even more. With this syntax you will no longer miss the forest for the trees and instead focus on what’s important: 42, the answer to life, the universe and everything.
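Spelled out, callFoo([[42]]) would stand for the call below, which is what has to be written today. The body of callFoo is an assumption added only so that the sketch runs (Scala 3 syntax).

```scala
case class FooRequest(foo: FooRequest.Id)
object FooRequest:
  case class Id(i: Int)

// Illustrative body; the original example leaves it as ???.
def callFoo(r: FooRequest): Int = r.foo.i

// Outer brackets -> FooRequest.apply (expected type of the parameter r)
// Inner brackets -> FooRequest.Id.apply (expected type of the field foo)
val answer = callFoo(FooRequest(FooRequest.Id(42)))
```

Each pair of brackets is resolved from the type expected at that position, working from the outside in.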
Points to clarify
- How does this work for non-case classes? They don’t have an apply method on the companion object!
  - It’s fine, just call the constructor instead, just like leaving out the new keyword already does in Scala 3.
- What if the context of the expression doesn’t require any type, e. g. a val without type annotation?
  - We could define a default for that case, like Seq, which is already special because of variadic functions. But I feel that’s just too arbitrary.
- Can this be made to work for types like java.time.LocalDate, where you need to call LocalDate.of instead of LocalDate.apply?
  - I wouldn’t know how.
- Can we eliminate the parens when calling a function with this syntax? E. g. foo[a] rather than foo([a])?
  - This clashes with the syntax to apply type parameters. We could use a different syntax than [], but it would be very unfamiliar to most users. {} is already taken, and <> would likely clash with the less/greater-than operators. I think [] is probably best despite the paren issue.
- What about named parameters and Map initialization?
  - Everything between the [] works like it always did: ["foo" -> 3, "bar" -> 5] for a Map, [answer = 42] for named parameters.
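For orientation, these are the present-day Scala 3 forms that the points above refer to. Point and Config are hypothetical types made up for this sketch.

```scala
import java.time.LocalDate

// A plain class without a companion apply: Scala 3 lets you omit `new`.
class Point(val x: Int, val y: Int)
val point = Point(1, 2)

// Map initialization and named arguments as they are written today;
// the proposal would keep the part between the brackets unchanged.
val counts: Map[String, Int] = Map("foo" -> 3, "bar" -> 5)

case class Config(answer: Int)
val config = Config(answer = 42)

// LocalDate is created via a static factory method rather than apply,
// which is why it does not fit the proposed desugaring as-is.
val date = LocalDate.of(1958, 9, 5)
```

The first three definitions map directly onto the proposed brackets; the last one is the open question from the list.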
Prior art
There is ample precedent for this kind of feature. C# has Collection Expressions and Target Typed New (thanks to Li Haoyi for pointing these out). Scala allows us to cover both of these with a single syntax as collections and case classes are initialized the same way: by calling the companion object’s apply method. C++ has list initialization, which covers both collections and classes/structs.
Q & A
- Won’t this make code less readable? I won’t be able to see which objects are being created!
  - We have experience from both other languages (like C#) and other language features (lambda expressions for SAM types) to suggest that this will improve readability by eliminating distracting redundancy: it is easier to find the important bits in the source code if there are fewer irrelevant ones.
  - Tooling like Metals can provide visibility into this (i. e. show you which companion object is called), much like it can show inferred types and implicit parameters today.
  - You still have the option to use the explicit syntax in places where you feel that it helps readability.
- Do we really need this just to avoid typing Vector every now and then?
  - This also works for the initialization of case classes, of which there tend to be more than list literals, and I think it can cut down massive amounts of boilerplate in that area.
  - It also makes code easier to refactor (e. g. changing a function parameter’s collection type or renaming a case class).
I’d love to hear your thoughts and feedback on this proposal. I’d also like to thank Fabio (SystemFW), Haoyi and Luis (BalmungSan) for their invaluable input that led to this proposal.