Pre-SIP: A Syntax for Collection Literals
Scala so far lacks a concise way to write collection literals, which makes it an outlier compared to many other popular languages. We propose to change this by introducing a special syntax for such literals. The syntax is quite conventional: a sequence is written as a comma-separated list of elements enclosed in square brackets. For instance, here is a diagonal matrix of rank 3:
[[1, 0, 0],
 [0, 1, 0],
 [0, 0, 1]]
This Pre-SIP is a follow-up to a previous thread, which received a large number of comments discussing many different alternatives. I am starting a new thread to focus on a concrete proposal that differs in some aspects from the original one. Some of the previously proposed alternatives are discussed below.
Why?
One reason Scala is such a latecomer to collection literals is that its apply methods already offer a reasonably concise alternative. For instance, we'd express the diagonal matrix above in Scala like this:
Vector(
  Vector(1, 0, 0),
  Vector(0, 1, 0),
  Vector(0, 0, 1))
This uses the standard convention of apply methods taking vararg arguments. Nevertheless, the new syntax has clear advantages:
- It is shorter and more readable.
- It leaves implementation details open, such as the precise implementation type of the collection. These can be injected from the context.
- It is more familiar to developers who come from other languages or know standard data formats like JSON.
What
Collection literals are comma-separated sequences of expressions enclosed in square brackets, like these:
val oneTwoThree = [1, 2, 3]
val anotherLit = [math.Pi, math.cos(2.0), math.E * 3.0]
val diag = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
val empty = []
val mapy = [1 -> "one", 2 -> "two", 3 -> "three"]
The type of a collection literal depends on the expected type. If there is no expected type (as in the examples above), a collection literal is of type Seq, except if it consists exclusively of elements of the form a -> b, in which case it is of type Map. These types are the ones from package scala.collection.immutable. An implementation is free to choose more efficient conformant types for the actual representation of such literals.
For instance, the literals above would get inferred types as follows.
val oneTwoThree: Seq[Int] = [1, 2, 3]
val anotherLit: Seq[Double] = [math.Pi, math.cos(2.0), math.E * 3.0]
val diag: Seq[Seq[Int]] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
val empty: Seq[Nothing] = []
val mapy: Map[Int, String] = [1 -> "one", 2 -> "two", 3 -> "three"]
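For comparison, here is roughly what the default scheme corresponds to in today's Scala, written with explicit apply calls. This is only a sketch in current syntax (the bracket literals themselves are not used), and the object name DefaultTypingToday is made up for illustration.

```scala
// Today's counterparts of the literals above, written with apply methods.
// Unqualified Seq and Map resolve to the scala.collection.immutable versions.
object DefaultTypingToday:
  val oneTwoThree: Seq[Int] = Seq(1, 2, 3)
  val anotherLit: Seq[Double] = Seq(math.Pi, math.cos(2.0), math.E * 3.0)
  val diag: Seq[Seq[Int]] = Seq(Seq(1, 0, 0), Seq(0, 1, 0), Seq(0, 0, 1))
  val empty: Seq[Nothing] = Seq()
  // Elements written with -> are the case that would default to Map.
  val mapy: Map[Int, String] = Map(1 -> "one", 2 -> "two", 3 -> "three")
```

The difference under the proposal is that the collection type no longer has to be spelled out at each nesting level.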
If there is an expected type E, the compiler will search for a given instance of the type class ExpressibleAsCollectionLiteral[E]. This type class is defined in package scala.compiletime as follows:
trait ExpressibleAsCollectionLiteral[+Coll]:
  /** The element type of the created collection */
  type Elem
  /** The inline method that creates the collection */
  inline def fromLiteral(inline xs: Elem*): Coll
If a best matching instance ecl is found, its fromLiteral method is used to convert the elements of the literal to the expected type. If the search is ambiguous, this is reported as an error. If no matching instance is found, the literal is typed by the default scheme, as if there were no expected type.
The standard library contains a number of given instances for standard collection types. To avoid the need for given imports, these instances are preferably either in companion objects of the implemented collections or in the companion of ExpressibleAsCollectionLiteral.
For instance, there would be:
given vectorFromLiteral: [T] => ExpressibleAsCollectionLiteral[Vector[T]]:
  type Elem = T
  inline def fromLiteral(inline xs: T*) = Vector[Elem](xs*)
Hence, the definition
val v: Vector[Int] = [1, 2, 3]
would be expanded by the compiler to
val v: Vector[Int] = vectorFromLiteral.fromLiteral(1, 2, 3)
After inlining, this produces
val v: Vector[Int] = Vector[Int](1, 2, 3)
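The expansion can be tried out by hand in current Scala. The sketch below makes several assumptions: the type class is redefined locally because it is not available in released Scala versions, the inline modifiers are dropped so that fromLiteral is an ordinary method, the given uses the older "with" syntax, and the wrapper object LiteralExpansionSketch is invented for the example. The nested case further assumes that the expected element type is passed down to inner literals.

```scala
object LiteralExpansionSketch:
  // Local stand-in for the proposed type class (assumption: simplified,
  // without the inline modifiers used in the actual proposal).
  trait ExpressibleAsCollectionLiteral[+Coll]:
    type Elem
    def fromLiteral(xs: Elem*): Coll

  given vectorFromLiteral[T]: ExpressibleAsCollectionLiteral[Vector[T]] with
    type Elem = T
    def fromLiteral(xs: T*): Vector[T] = Vector[Elem](xs*)

  // Hand-written equivalent of: val v: Vector[Int] = [1, 2, 3]
  val v: Vector[Int] = vectorFromLiteral[Int].fromLiteral(1, 2, 3)

  // Hand-written equivalent of: val diag: Vector[Vector[Int]] = [[1, 0], [0, 1]]
  val diag: Vector[Vector[Int]] =
    vectorFromLiteral[Vector[Int]].fromLiteral(
      vectorFromLiteral[Int].fromLiteral(1, 0),
      vectorFromLiteral[Int].fromLiteral(0, 1))
```

Because this version is not inline, the elements are passed through a vararg Seq at runtime, which is the detour the inline design is meant to avoid.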
Using this scheme, the literals we have seen earlier could also be given alternative types like these:
val oneTwoThree: Vector[Int] = [1, 2, 3]
val anotherLit: Vector[Double] = [math.Pi, math.cos(2.0), math.E * 3.0]
val diag: Array[Array[Int]] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
val empty: ArrayBuffer[Object] = []
val mapy: HashMap[Int, String] = [1 -> "one", 2 -> "two", 3 -> "three"]
Question: Is ExpressibleAsCollectionLiteral too long as a name? Are there shorter alternatives that convey the meaning well?
Notes
- Since the fromLiteral method in ExpressibleAsCollectionLiteral is an inline method with inline arguments, given instances can implement it as a macro. This can yield more efficient direct implementations with no need for the detour of a Seq passed in a vararg.
- The precise meaning of "is there an expected type?" is as follows: There is no expected type if the expected type known from the context is under-specified, as it is defined for implicit search. That is, an implicit search for a given of the type would not be attempted because the type is not specific enough. Concretely, this is the case for wildcard types ?, Any, AnyRef, unconstrained type variables, or type variables constrained from above by an under-specified type.
- The precise rules for when a Map instead of a Seq is used as the default type are as follows. A collection literal is of type Map if there is no expected type and all elements are of the form a -> b, where each -> resolves to the -> method defined in Predef that is used to build a pair (a, b). Other elements (including expressions of type Tuple2) will create literals of type Seq.
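The last rule distinguishes two spellings that produce the same value in today's Scala. A small sketch in current syntax, with the proposed literals shown only in comments (the object name ArrowVersusTuple is invented for the example):

```scala
object ArrowVersusTuple:
  // In current Scala both spellings build the same Tuple2 value.
  val viaArrow: (Int, String) = 1 -> "one"
  val viaTuple: (Int, String) = (1, "one")

  // Proposed: [1 -> "one", 2 -> "two"] with no expected type would be a Map.
  val asMap: Map[Int, String] = Map(1 -> "one", 2 -> "two")

  // Proposed: [(1, "one"), (2, "two")] with no expected type would be a Seq of pairs.
  val asSeq: Seq[(Int, String)] = Seq((1, "one"), (2, "two"))
```

So the default is decided by how the elements are written, not by their type.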
Syntax
SimpleExpr ::= ...
             | ‘[’ ExprInParens {‘,’ ExprInParens} ‘]’
Alternatives
There were extensive discussions about this scheme in a previous thread. Some of the alternatives that were proposed are briefly discussed here:
Syntax Alternatives
There was some concern that square brackets would syntactically be too close to type arguments. Several alternatives were proposed, including
- Parentheses (a, b, c). This has the problem that single element collections cannot be defined without introducing possibly far-reaching and unwanted conversions from element types to collection types.
- Parentheses with some prefix or suffix, such as #(a, b, c) or (a, b, c)*. This has the problem of being less familiar and harder to read than the [a, b, c] notation, in particular for nested literals.
To be sure, there is no actual parsing ambiguity between collection literals and type arguments. If a function takes a collection literal as argument, the literal still needs to be placed in parentheses. So f[a] is always an instantiation with a type argument, whereas f([a]) would be a function application with a collection literal as argument.
In my opinion, the experience in other languages shows that we don't need to be concerned too much about syntax clashes. JavaScript, Python, PHP, C#, TypeScript, Swift, Objective-C, Rust, and Dart all have bracket-enclosed literals and at the same time have index expressions that also use brackets in the same places where Scala uses type arguments. So one might think this would give similar scope for confusion. But in practice it does not seem to be a problem.
Typing Alternatives
There was some debate about the degree to which the new scheme should require opt-in for adapting to an expected type. One alternative was to always adapt to a type C if C's companion object has an apply method that would be applicable to the collection elements. This looks simple, powerful, and very backwards compatible, since the new syntax can be used for existing libraries without having to change them. But that aspect of the scheme is also its biggest problem, since we would then introduce a new and shorter way to invoke arbitrary apply methods. Since the new syntax is shorter, it is likely to be misused widely, even against the intention of library designers. Scala previously committed a similar design mistake by allowing unrestricted infix syntax for all methods. In practice that led to splits in the ecosystem where one group of developers could not read the other's code.
By contrast, type classes require explicit opt-in from library designers, with the ability of explicit retrofits through given imports. I believe this strikes a better balance between the need to keep the ecosystem consistent and the desire for flexibility.
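To illustrate the opt-in and retrofit story, here is a sketch of how a given for a collection type that has not opted in could be supplied separately and brought into scope with a given import. Everything in it is hypothetical: Bag, BagLiterals and OptInSketch are invented names, the type class is a simplified local copy without inline, and the older "with" syntax for givens is used. The lookup that the compiler would perform for an expected type is imitated with summon.

```scala
object OptInSketch:
  // Simplified local copy of the proposed type class (assumption: no inline).
  trait ExpressibleAsCollectionLiteral[+Coll]:
    type Elem
    def fromLiteral(xs: Elem*): Coll

  // Hypothetical library collection whose author has not opted in.
  final case class Bag[T](items: List[T])

  // A retrofit kept in a separate object, so that users must import it explicitly.
  object BagLiterals:
    given bagFromLiteral[T]: ExpressibleAsCollectionLiteral[Bag[T]] with
      type Elem = T
      def fromLiteral(xs: T*): Bag[T] = Bag(xs.toList)

  import BagLiterals.given

  // Imitates what would happen for: val b: Bag[String] = ["a", "b"]
  val b: Bag[String] =
    summon[ExpressibleAsCollectionLiteral[Bag[String]]].fromLiteral("a", "b")
```

Without the given import, the summon call would not compile, which mirrors the point that no literal syntax is enabled for a type unless someone deliberately provides an instance.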
There were proposals to use implicit conversions from some new "collection literal type" instead of type classes. Of course in Scala 3 implicit conversions are also type classes, which come with more strings attached (i.e. need to enable them explicitly at the use-site). It seems that the current restrictions for conversions are not helpful in the case of collection literals and the need for a separate literal type makes the scheme more complicated. Regular type classes are the simpler and more straightforward alternative.
Implementation
The scheme was implemented as a draft PR. The implementation was straightforward; no difficulties were encountered.