Pre-SIP: A Syntax for Collection Literals

Scala so far lacks a concise way to write collection literals. This makes it an outlier
compared to many other popular languages. We propose to change this by introducing a special syntax for such literals. The syntax is quite conventional: a sequence is written as a comma-separated list of elements enclosed in square brackets. For instance, here is a diagonal matrix of rank 3:

  [[1, 0, 0],
   [0, 1, 0],
   [0, 0, 1]]

This pre-SIP is a follow-on to a previous thread, which received a large number of comments discussing many different alternatives. I am starting a new thread to focus on a concrete proposal that differs in some aspects from the original one. Some of the previously proposed alternatives are discussed below.

Why?

One reason Scala is such a latecomer to collection literals is that it already offers, with apply methods, an alternative that is reasonably concise. For instance, we would express the diagonal matrix above in Scala like this:

  Vector(
    Vector(1, 0, 0),
    Vector(0, 1, 0),
    Vector(0, 0, 1))

This uses the standard convention of apply methods taking vararg arguments. Nevertheless,
the new syntax has clear advantages:

  • It is shorter and more readable.
  • It leaves open implementation details such as the precise implementation type of the collection. These can be injected from the context.
  • It is more familiar to developers who come from other languages or know
    standard data formats like JSON.

What

Collection literals are comma-separated sequences of expressions, like these:

  val oneTwoThree = [1, 2, 3]
  val anotherLit  = [math.Pi, math.cos(2.0), math.E * 3.0]
  val diag        = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
  val empty       = []
  val mapy        = [1 -> "one", 2 -> "two", 3 -> "three"]

The type of a collection literal depends on the expected type. If there is no expected type (as in the examples above), a collection literal is of type Seq, except if it consists exclusively of elements of the form a -> b, in which case it is of type Map. These types are the ones from package scala.collection.immutable. An implementation is free to choose more efficient
conformant types for the actual representation of such literals.

For instance, the literals above would get inferred types as follows.

  val oneTwoThree: Seq[Int]   = [1, 2, 3]
  val anotherLit: Seq[Double] = [math.Pi, math.cos(2.0), math.E * 3.0]
  val diag: Seq[Seq[Int]]     = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
  val empty: Seq[Nothing]     = []
  val mapy: Map[Int, String]  = [1 -> "one", 2 -> "two", 3 -> "three"]

If there is an expected type E, the compiler will search for a given instance of the
type class ExpressibleAsCollectionLiteral[E]. This type class is defined in package scala.compiletime as follows:

  trait ExpressibleAsCollectionLiteral[+Coll]:

    /** The element type of the created collection */
    type Elem

    /** The inline method that creates the collection */
    inline def fromLiteral(inline xs: Elem*): Coll

If a best matching instance ecl is found, its fromLiteral method is used to convert
the elements of the literal to the expected type. If the search is ambiguous, it will be
reported as an error. If no matching instance is found, the literal will be typed by the default scheme as if there was no expected type.

The standard library contains a number of given instances for standard collection types. To avoid the need for given imports, these
instances should preferably live either in the companion objects of the collections they implement or in the companion object of ExpressibleAsCollectionLiteral.

For instance, there would be:

  given vectorFromLiteral: [T] => ExpressibleAsCollectionLiteral[Vector[T]]:
    type Elem = T
    inline def fromLiteral(inline xs: T*) = Vector[Elem](xs*)

Hence, the definition

  val v: Vector[Int] = [1, 2, 3]

would be expanded by the compiler to

  val v: Vector[Int] = vectorFromLiteral.fromLiteral(1, 2, 3)

After inlining, this produces

  val v: Vector[Int] = Vector[Int](1, 2, 3)

Using this scheme, the literals we have seen earlier could also be given alternative types like these:

  val oneTwoThree: Vector[Int]   = [1, 2, 3]
  val anotherLit: Vector[Double] = [math.Pi, math.cos(2.0), math.E * 3.0]
  val diag: Array[Array[Int]]    = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
  val empty: ArrayBuffer[Object] = []
  val mapy: HashMap[Int, String] = [1 -> "one", 2 -> "two", 3 -> "three"]
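To illustrate the opt-in nature of the scheme, here is a sketch of how a library type could provide its own instance. This uses the proposal's type class and given syntax, so it does not compile today; Deque and its vararg constructor are made up purely for illustration:

```scala
// Hypothetical: a library collection opting in to collection literals.
// Neither Deque nor ExpressibleAsCollectionLiteral exists today; this
// is written in the proposal's (not yet supported) syntax.
class Deque[T](val elems: T*)

object Deque:
  given dequeFromLiteral: [T] => ExpressibleAsCollectionLiteral[Deque[T]]:
    type Elem = T
    inline def fromLiteral(inline xs: T*) = Deque[T](xs*)

// With the instance in Deque's companion, no given import is needed:
//   val d: Deque[Int] = [1, 2, 3]   // would expand to Deque[Int](1, 2, 3)
```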

Question: Is ExpressibleAsCollectionLiteral too long as a name? Are there shorter alternatives that convey the meaning well?

Notes

  • Since the fromLiteral method in ExpressibleAsCollectionLiteral is an inline method with inline arguments, given instances can implement it as a macro. This can yield more efficient direct implementations with no need for the detour of a Seq passed in a vararg.

  • The precise meaning of “is there an expected type?” is as follows: There is no expected
    type if the expected type known from the context is under-specified, as it is defined for
    implicit search. That is, an implicit search for a given of the type would not be
    attempted because the type is not specific enough. Concretely, this is the case for wildcard types ?, Any, AnyRef, unconstrained type variables, or type variables constrained from above by an under-specified type.

  • The precise rules when a Map instead of a Seq is used as the default type are as follows. A collection literal is of type Map if there is no expected type and all elements are of the form a -> b, where each -> resolves to the -> method defined in Predef that is used to build a pair (a, b). Other elements (including expressions of type Tuple2) will create literals of type Seq.
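The Map rule leans on a fact about today's library that is easy to check: Predef's -> (via ArrowAssoc) just builds a Tuple2, so at runtime a -> b and (a, b) are indistinguishable, and the proposed Map default keys off the syntactic form alone. A minimal check in current Scala:

```scala
// In current Scala, `a -> b` is Predef.ArrowAssoc's -> and produces a
// plain Tuple2 -- at runtime it is identical to writing (a, b).
// The proposal's Map default therefore rests on syntax, not on values.
val viaArrow = 1 -> "one"
val viaTuple = (1, "one")
assert(viaArrow == viaTuple)
assert(viaArrow.getClass == viaTuple.getClass)
```

This is why, as noted above, the distinction must be resolved at the point where -> is written, not from the element values or their types.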

Syntax

SimpleExpr       ::=  ...
                   |  ‘[’ [ExprInParens {‘,’ ExprInParens}] ‘]’

Alternatives

There were extensive discussions about this scheme in a previous thread. Some of the alternatives that were proposed are briefly mentioned and discussed here:

Syntax Alternatives

There was some concern that square brackets would syntactically be too close to type arguments. Several alternatives were proposed, including

  • Parentheses (a, b, c). This has the problem that single element collections cannot be defined without introducing possibly far-reaching and unwanted conversions from element types to collection types.
  • Parentheses with some prefix or suffix, such as #(a, b, c) or (a, b, c)*. This has the problem of being less familiar and harder to read than the [a, b, c] notation, in particular for nested literals.

To be sure, there is no actual parsing ambiguity between collection literals and type arguments. If a function takes a collection literal as argument it still needs to be placed in parentheses. So, f[a] is always instantiation with a type argument whereas f([a]) would be a function taking a collection literal as argument.

In my opinion, the experience in other languages shows that we don’t need to be too concerned about syntax clashes. JavaScript, Python, PHP, C#, TypeScript, Swift, Objective-C, Rust, and Dart all have bracket-enclosed literals and at the same time have index expressions that also use brackets in the same places where Scala uses type arguments. So one might think this would give similar scope for confusion, but in practice it does not seem to be a problem.

Typing Alternatives

There was some debate about the degree to which the new scheme should require opt-in for adapting to an expected type. One alternative was to always adapt to a type C if C’s companion object has an apply method that would be applicable to the collection elements. This looks simple, powerful, and very backwards compatible, since the new syntax could be used with existing libraries without having to change them. But that aspect of the scheme is also its biggest problem, since we would then introduce a new and shorter way to invoke arbitrary apply methods. Since the new syntax is shorter, it is likely to be misused widely, even against the intention of library designers. Scala previously committed a similar design mistake by allowing unrestricted infix syntax for all methods. In practice that led to splits in the ecosystem where one group of developers could not read the other’s code.

By contrast, type classes require explicit opt-in from library designers, with the ability of explicit retrofits through given imports. I believe this strikes a better balance between the need to keep the ecosystem consistent and the desire for flexibility.

There were proposals to use implicit conversions from some new “collection literal type” instead of type classes. Of course, in Scala 3 implicit conversions are themselves type classes, just with more strings attached (e.g. they need to be enabled explicitly at the use site). The current restrictions on conversions are not helpful in the case of collection literals, and the need for a separate literal type makes the scheme more complicated. Regular type classes are the simpler and more straightforward alternative.

Implementation

The scheme was implemented as a draft PR. The implementation was straightforward; no difficulties were encountered.

16 Likes

This creates a map, but I guess a list of tuples can also be created with this syntax in case of explicit type, is it correct?

val list: List[(Int, String)]  = [1 -> "one", 2 -> "two", 3 -> "three"]

How does the new proposal weigh against the design principle that was expressed in this earlier post?

This creates a map, but I guess a list of tuples can also be created with this syntax in case of explicit type, is it correct?

Yes, exactly.

3 Likes

I was considering IArray instead of Seq as the default type for a sequence literal. But we hit a snag for the empty listeral []. That would translate to IArray() which does not typecheck.

scala> IArray()
-- [E172] Type Error: ----------------------------------------------------------
1 |IArray()
  |        ^
  |        No ClassTag available for Any
1 error found

Maybe this issue is fixable. But for now it looks like a blocker.
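For reference, the snag is the usual ClassTag requirement of array factories: with no elements and no expected type, the element type of IArray() is inferred as Any, for which the compiler will not materialize a ClassTag. Pinning the element type sidesteps it, as this check in current Scala 3 shows:

```scala
// IArray.apply needs a ClassTag for the element type. With a bare
// IArray(), nothing constrains that type, so it is inferred as Any
// and no ClassTag is materialized. Spelling the type out compiles:
val empty = IArray[Int]()
assert(empty.length == 0)

// ...as does providing an expected type, which is exactly the case
// the literal proposal cannot rely on for the default of [].
val alsoEmpty: IArray[Int] = IArray()
assert(alsoEmpty.isEmpty)
```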

I think this looks great.

The default type for non-target-typed sequence literals is up for debate, but I think Seq and Map probably win just by default. All possible types have tradeoffs, but Seq and Map are the most consistent with the data structures people already create most often.

Perhaps in another timeline Scala’s default collection would be IndexedSeq or IArray or ArrayDeque, but that’s not the timeline we live in. And anyway, in cases where another target type is expected, this works out of the box (similar to Swift).

It would take some experimentation to believe that the proposed typeclass approach works in the presence of implicit conversions and other Scala language features (e.g. Mill would want it to work for Task[Seq[T]]), but that seems solvable.

3 Likes

I find this proposal quite interesting. Here are a few points I noticed:

  • Will this literal syntax also be enabled for Set?
  • It might be interesting to add flexibility by allowing the choice of collection type with a prefix, similar to string interpolation. For example, something like vec[1, 2, 3]. (But, I think this is too much and too concrete.)
  • Will this literal support destructuring? For instance, val [first, second] = xs. I believe this could be a very powerful feature to abstract away the underlying implementation of collections effectively.
1 Like
  • Will this literal syntax also be enabled for Set?

If an explicit type is given that has the right type class instance, yes. You could write

   val s: Set[Int] = [1, -2, 11]

if there is an ExpressibleAsCollectionLiteral[Set[Int]] instance.

  • Will this literal support destructuring?

It would be an interesting idea. But this has not been considered so far.

3 Likes

I am not in favor.

  • Going from one Scala codebase to the next can already feel like a different language. Having even more ways to express the same thing will make it worse.
  • I actually believe it can be quite confusing that both these can be correct
val x: Vector[Int] = [1, 2, 3]
val y: List[Int] = [1, 2, 3]

I think this is much harder to explain than just List(1,2,3) and Vector(1, 2, 3). I definitely think it can be confusing for beginners that this is actually type safe.

  • [1, 2, 3] might be familiar, but [1 -> "one", 2 -> "two"] certainly is not common in other languages.
  • Using type classes is nicely extensible, but can easily lead to being interpreted as arbitrary data structures, while in general Scala has been moving away from crazy DSLs.

Though I can see the value in less verbose data structures, I think the gain does not outweigh the cost.

8 Likes

Here are my couple cents:

Motivation & Technical Details

I found this very striking at first, but looking back at it, here’s what I see when I read them:

[  [1, 0, 0],
   [0, 1, 0],
   [0, 0, 1]]

//vs
~~~~~~(
    ~~~~~~(1, 0, 0),
    ~~~~~~(0, 1, 0),
    ~~~~~~(0, 0, 1))

The same square matrix structure remains, which for me means the difference in readability is not that great.
It is undeniably shorter, but I am not sure that is a positive (nor necessarily a negative).

  1. I would say probably yes, my brain just skips over the name without reading it
  2. “Bracket literal” instead of “collection literal” (shorter, fewer syllables, and explicit reference to the syntax) ? FromCollectionLiteral, AsCollectionLiteral, ConvFromCollectionLiteral ?

It’s not easy to find a shorter name because:

  • Typeclasses tend to be named like properties (Showable), as opposed to actions (Show)
  • There is no conventional shortening of Collection or Expressible
  • There are no synonyms for either (at least I could not find any)

Very good point

Overall I agree on pretty much all points shown so far

Syntax

As I have mentioned before, this is where the trouble starts for me

  1. I agree it’s slightly less familiar, but I disagree that this is a problem
    • Especially in the case of spreading (*), since that already exists in a different context, in both Scala and other languages
  2. I really don’t think it’s harder to read
(
  (1, 0, 0)*,
  (0, 1, 0)*,
  (0, 0, 1)*,
)*

The context would make it even more clear, probably val identity: Matrix =

Fully agree on the technical part, however:

f[a] // type application
// vs
f [a] // term application of collection literal

Is not machine-ambiguous, but might very well be human-ambiguous, in particular personWhoDoesNotKnowScala-ambiguous

There are two important differences:

  1. Collection creation is more similar to collection access than it is to type application
  2. These languages pretty much all agree on the meaning of f [a]:
    • Python, PHP, C#, TypeScript, Swift, Objective-C, Rust, Dart: f[a]
    • JS: something like f(a), but not sure
      (From quick experiments on online interpreters, I am an expert in none of these)

Let’s put aside JS, as it is the only one to do things differently.
The above result stems from none of these languages allowing f a to mean f(a) (at least in the special case where a is an index literal), which is the exact opposite of what Scala does.
This might explain why in those languages no one gets confused by the double usage of square brackets, but it means we cannot generalize that finding to Scala!

For me these are absolute deal-breakers for the bracket based syntax, which is why I recommend either:

  • No new syntax, List(1,2,3) already reads pretty well
  • Parens-based syntax, in particular (1, 2, 3)* for regularity, or someId(1,2,3) if we do not want to change the compiler (done through the standard library)
4 Likes

“listeral” has the advantage of being very short, which I believe is a good fit for the type class:
ExpressibleAsListeral
I believe however we should maybe not make it the default name to talk about the feature, due to it sounding and looking very similar to “literal”
(Like “courrier” vs “courriel” in French)

No, both are type applications.

Oh my bad, I was sure f a worked in Scala, but it does not

The follow-up question is infix calls, which do work like this; there again it does seem like there won’t be any issues:

l.++[1](List[1](1))

l ++ [1] // expression expected but '[' found
1 Like

I am not in favor either (like in the previous thread). Agreed with @bmeesters and @Sporarum 's points. The cost outweighs the benefits. It makes things a bit more confusing and less readable (except in a few very special cases of bulk data where context makes things super clear, like writing down a matrix etc., but we should shy away from bulk data in code). One of Scala’s nicest syntax features is the near uniformity of apply constructors: Name(args), and this would hurt that. It made me fall in love with the language (no more [] vs {} vs () vs <> of other languages). It will also make Scala look and feel like an untyped language where you can freely mix and match; it would be a confusing hurdle to newcomers from those languages who expect it to work the same (I bet they would also try {} to define a set or a dictionary to see it doesn’t work). I think Scala should stick to its guns and its original vision.

12 Likes

While I agree most of the time, there’s a very common use case for it: tests. I define bulk data in tests all the time, and IMO this would tend to be a readability improvement.

I’m not hard-over one way or t’other, but overall I like the look of this proposal, and would probably use it at least a moderate amount.

I’m not much of a fan of the [] flavor of the proposal, but I don’t think it would be a disaster.

I think it would make code less clear (but more compact); harder to learn (but more familiar); big impact on language with small increase in features (but it’s arguably an important one). Generally moves Scala away from “powerful orthogonal features” and towards “special-case for this thought-to-be-important use”. But arguably Scala 3 has already been doing that.

So it could be okay. I’m skeptical, but even if it’s a mistake in retrospect, I don’t think it would be a very bad mistake.

However:

This would be a very bad mistake. Types randomly materializing depending on which givens happen to be in scope is exactly why implicit conversions got a bad name to begin with.

I really only see two ways out.

  1. [x, y] is a fixed type, maybe an “inline IArray”, that has automatically-enabled conversions so you always and only get an IArray (or Array) when no type is expected, and otherwise it will use a (possibly inline macro) conversion from a type that is maximally-aligned to what is expressed in the code (which is a fixed-length immutable indexed collection that you do not need to modify).
  2. [x, y] doesn’t work at all unless the collection type is known.

To give an example of how the existing proposal–if the example quoted above actually works–leads even out-of-the box to confusing code, consider

val singletons = [1, 2, 3]
val pairs = [(1, 2), (3, 2), (1, 3)]
val triples = [(1, 2, 3), (2, 1, 3), (2, 3, 1)]

singletons has length 3. triples has length 3. pairs has…no length method at all! It’s a size 2 map with either (1, 2) or (1, 3) missing.

Edit: this is assuming you don’t use trait-defying a -> b introspecting magic to magically concoct a Map. If this is kosher, and can be done in general, then we have all manner of cool but extremely unexpected possibilities, like val x = ["Hello.", "Bye now!"] turning into "Hello.\nBye now!".

This is going to resolve into a huge mess, getting messier as more libraries use it, after which best practice will be to never use it at all.

If we love element-type inference so much that we can’t bear to write val xs: List[Int] = [1, 2, 3], then maybe we want to consider being able to express the case that we know the collection but not the element type, e.g. val xs: List = [1, 2, 3]. That isn’t an improvement over List(1, 2, 3), but

val matrix: Vector[Vector] =
  [ [1, 2, 3],
    [0, 4, 5],
    [0, 0, 6] ]

arguably is. Though, honestly, I don’t think matrix: Vector[Vector[Int]] is so terrible.

The lesson from implicits and implicit conversions should be: having the compiler know what the type is is not enough. If the programmer doesn’t also know the type without much difficulty (including that it tends to be stable to remote changes), it makes programming too difficult.

8 Likes

As mentioned in the previous thread, I have mixed feelings about this proposal, especially due to all the “magic” that it brings.

I would prefer the whole thing to be a library-level implementation behind an explicit import to avoid accidents.

I think most of my main concerns have already been expressed here, so I won’t repeat those, but one other thing that I believe is debatable is the special casing of maps… I think it wouldn’t be that odd if (like Sets) Maps just required an expected type.

That would avoid quite some surprises… Like, what happens if I have:

val scala = "scala" -> "programming language"
val python = "python" -> "programming language"

val tools = [scala, python, "vim" -> "editor"]

Is this a Map? Is it only a Map if I inline scala and python (or use inline val)?

And I guess there’s also the question of custom Predefs, although I think not many people actually use that.

8 Likes

This feels deeply weird:

val expected = List((1, 1), (2, 2), (1, 2))
[(1, 1), (2, 2), (1, 2)] == expected 
List(1 -> 1, 2 -> 2, 1 -> 3) == expected 
[1 -> 1, 2 -> 2, 1 -> 3] != expected 
[1 -> 1, 2 -> 2, 1 -> 3].toList.sorted != expected 
([1 -> 1, 2 -> 2, 1 -> 3]: List[(Int, Int)]) == expected 

Having the type (and in this case runtime value) depend on the difference between _ -> _ and (_, _) is, at least as far as I’m aware, unprecedented in Scala.

2 Likes

isn’t Vector(1, 2, 3) doing the job already?

as for controlling collection literal type, i’m thinking something like this would be helpful:

val matrix: Vector[Vector[_compiler_please_fill_this_part_in]] = [
  [0, 1, 2],
  [3, 4, 5],
  [6, 7, 8]
]

i.e. require the type ascription, but only the part that defines the collection types and let compiler infer the innermost element type. (edit: ah, this is what @Ichoran already wrote)

then the methods for converting collection literals to collections should be put into companion objects of mentioned collection types. no random implicits / givens needed.

2 Likes

No evidence has been presented so far to support the statement that this would be “more readable”. The only thing that has been presented is arguments of the form “language XYZ does this, and it is considered readable”. The conclusion that therefore [] is readable is a non-sequitur.

Well, XML literals seemed like a good idea at one point, and that didn’t really work out either. Nor did symbol literals, i. e. the 'Symbol syntax that Scala used to have. So maybe the lesson to draw from that is not to clutter the language with all kinds of literal syntax but instead come up with a more general way of allowing users to define their own literal syntaxes. We have string interpolators that kind of do that, but they require those ugly quotes and $ symbols – maybe we can come up with something better.
More generally, I don’t like the logic of “everybody else does this, so we should too”. There’s nothing wrong with stealing ideas from popular languages, but when we do, we should do it because they’re good features that solve an actual problem, not because other languages have them.

And this is where it gets really annoying to me personally, because you’re misconstruing the # proposal as a mere syntactic variant of list syntax. What it actually is is a much more general feature to allow access to the companion object of the expected type. It can be used as #(…), but also as #.of(2025, 1, 16) (for e. g. LocalDate) and in a myriad of other ways.

I strongly dislike this way of thinking because it implies that somehow library authors are entitled to decide for their users how their libraries should be used, which isn’t the case nor should it be. And it’s contradictory because “the ability of explicit retrofits through given imports” was touted as one of the advantages of this proposal, so library designers are actually not in charge. So whose intention are we actually talking about here? It seems like it’s the language designers’ intentions.

It should also be noted that this proposal opens the door to inconsistent (performance) behaviour between the current way of creating collections, i. e. List(…), and the new way. It would be better to instead make List.apply etc. work better for everyone, rather than telling people they need to switch over to a new syntax if they want better performance.

Overall this proposal creates a considerable amount of complexity, especially for beginners who will be confronted with […] expressions, and to understand what they mean they need to know about all sorts of advanced topics like typeclasses, the now-special meaning of -> and inline methods. At the same time, the utility is extremely limited: you save a total of 3 characters per Seq literal. And if you really want to, you can just import scala.{Seq as S} and you’re down to one character. But oddly enough, nobody does that, and that tells me everything I need to know about the utility of this feature (or rather, the lack thereof). And it also interacts badly with implicit conversions. It was said that this is a “solvable” problem, but it’s just better not to create the problem in the first place than it is to create it and then hope you can find a solution to it later.

What I want in a programming language is a set of flexible tools that can be used in orthogonal ways, and I think that’s also the people whom Scala has historically attracted. I find it sad that we seem to be moving further and further away from that in a vain effort to make the language more palatable to developers from other languages, which is what this proposal is really about. And it’s going to be in vain because what those people are actually looking for is better tooling and documentation, not trivial syntactic conveniences.

4 Likes