Pre-SIP: A Syntax for Collection Literals

When people talk about complexity in this fashion, they are not talking about the complexity of implementing the feature within the compiler (many “users” of Scala see the compiler as a black box and hence don’t really care about it, as they are not exposed to it), but rather about how the added feature interplays with all of the existing features of the language.

You do, however, explain this in great detail later, and I agree with everything that’s said here.

I agree that we need to avoid language fragmentation and have argued exactly that point in the original precursor thread. But for collection literals as proposed here, I expect them to be clearly better than the alternatives when applicable. Except if you want to repeat yourself with the type, e.g.:

val x: Seq[Int] = Seq(1, 2, 3)

(or something more subtle). It really comes down to how much explicit and repeated type info one requires. Is it important to know whether [1, 2, 3] is a Seq or a List or an IArray as long as we know that the program typechecks with it? We don’t actually do anything with it, we just pass it somewhere where some collection is required. And, is the actual type hard to infer from the context?

There are some situations where the answer is yes to both and then using the companion object is the best choice. But in most cases collection literals will be the preferred choice. They are the preferred choice in all these other languages, so why would we expect Scala to be different?

I believe there’s a general rule for literals at play here: Generally, we prefer to omit type info for them when it can be inferred from the context. So, for instance we say

val x: Float = 2.3

and not

val x: Float = 2.3f

since the f is redundant, it just adds noise. And that even though 2.3 alone is a Double, not a Float. Collection literals are similar: they let you drop duplication in the type information and that’s usually a desirable thing.

3 Likes

Yes, sadly. This is actually exactly the argument I’ve been making for more than a decade about Seq itself. There is no circumstance under which you don’t care about the performance of the underlying data structure you’re touching outside of trivial scripting situations like Mill and Sbt.

This is easily illustrated in other languages by noting how in Java, people often use java.util.List, but they ubiquitously assume ArrayList, and their code usually behaves poorly if someone surreptitiously feeds them a LinkedList or similar. Meanwhile, Clojure very famously pushed hard on its sequence abstraction and the introduction of the conj form, but its polymorphic behavior is extremely perplexing in practice and, generally, people tend to be quite explicit about whether they expect a list or a vector.

In other words, Seq is a leaky abstraction, not just in Scala but everywhere it has been tried. It is far more appropriate and idiomatic in Scala to be explicit about your types when dealing with seq-like things (we don’t tend to do the same with sets or maps, since it really is safe to assume the hashing implementations there except when explicitly asking for associative trees). If you take a quick scan over the community build, you’ll find that the vast, vast majority of the current Scala corpus uses List/Vector/Array rather than Seq, particularly better maintained and more widely used code. This literals proposal flouts conventional Scala usage in that way, and I broadly predict that they’ll be linted out of use in mainline code by many organizations once the tooling to do so is available; numerical literals really are not an analogous situation.

13 Likes

That’s almost never what we write, though. We write it as

val x = Seq(1, 2, 3)

or

val x = Set(1, 2, 3)

whose collection literal equivalent would be

val x: Set[Int] = [1, 2, 3]

which is strictly more verbose.

Same. We write neither of those things. We write

val x = 2.3f

which is the most concise and explicit form.

They are the only choice in the languages that actually have reasonable collection literals (see all the answers that have been analyzing this for real). For example, there is no way in JS to create an array with its members other than with [1, 2, 3].

17 Likes

Most of these arguments apply to all type-agnostic ways of writing collections, and not only to the bracket syntax.

We should always remember 3 things are being discussed here:

  • How much do we need a short, type-agnostic way of writing collections?
  • Is the bracket syntax worth that price?
  • Is there no better alternative?

And for me, the answers are:

  • Not that much
  • Absolutely not
  • Nope: someSymbol() or ()* are way better IMO (I won’t repeat the reasons here)
4 Likes

Yes, and yes. It’s common to get (basic, simple) type errors from fluctuating between Seq/List/IndexedSeq etc. when doing some problem solving, for example. Although I admit this is more about method return types (Array also keeps sneaking in there).

I believe those languages all “copied the wrong answer” from each other due to cultural / sociological reasons (like how Java copied curly braces and semicolons to attract / ease C/C++ users), but Scala actually got it right instead. Also, hasn’t Scala been different from the very beginning? There’s nothing quite like it :wink: Isn’t that a good thing? We should have more confidence in ourselves…

9 Likes

Vector and Array are rarely used. The most popular are Seq and List.

I’m wondering if you would change your mind about that if the default inference for val a = [1, 2, 3] were List[Int] instead of Seq[Int]?

1 Like

This doesn’t actually work when you are defining public APIs, which by most conventions need to be type-annotated. Then you’re back to:

val x: Seq[Int] = Seq(1, 2, 3)
val x: Set[Int] = Set(1, 2, 3)

Another big case that we do write it that way is when passing the sequence to a function:

foo([1, 2, 3])

val bar = Bar(
  numbers = [1, 2, 3]
)

In these scenarios, you know the function signature, and thus know the parameters.

A third set of use cases is when overriding and providing method return value:

trait Qux {
  def qux: List[Int]
}
object QuxImpl extends Qux {
  override def qux = [1, 2, 3]
}

Here, we have the target type from the inherited abstract method.

At a high level, a big part of the original proposal is target typing: the proposed shorthand would be valuable in scenarios where target typing is present. Now, target typing of course isn’t present everywhere, so there will be scenarios where this isn’t useful; your post here demonstrates that neatly. But we all know that target typing is pervasive in the Scala language, with all sorts of things like type inference and implicit resolution dependent on it! All those are scenarios where the collection literal shorthand would benefit.
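
For concreteness, a sketch of target typing as it already exists in Scala today (the names twice and r are hypothetical, for illustration only): the parameter type of a lambda is inferred from the expected function type, much as a collection literal’s factory would be inferred from the expected collection type.

```scala
// Target typing today: the lambda parameter `x` gets its type Int from
// the expected type `Int => Int` in twice's signature.
def twice(f: Int => Int): Int = f(1) + f(1)

val r = twice(x => x * 2) // x: Int inferred; r == 4
```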

The claim of this proposal is that it will help the cause of precise collection types you describe here, rather than hurt it. The status quo is:

  • People use Seq because it’s convenient. It’s convenient to define methods taking Seq, and it’s convenient to call methods taking Seq.
  • People can be precise and define APIs taking Vector or List or ArrayDeque if they want, but it inconveniences downstream API users, who need to remember to pass in Vector or List or ArrayDeque to your various methods.

So method authors are encouraged to use the short/consistent/meaningless type Seq, to make things easier for the callers, who then have to use a meaningless Seq. Or worse, use val s = Seq; s(1, 2, 3) like many people in this thread have already suggested!

With this proposal, you can define methods taking Vector/List/ArrayDeque, and in the common case where someone is passing a literal, it is in fact more convenient than passing in a Seq today! So people will be incentivised to define methods with more precise collection types as parameters, since they won’t be penalizing users by making them juggle collection types unnecessarily. Defining and calling

def foo(xs: Vector[Int], ys: Vector[Int])
def bar(xs: List[Int], ys: ArrayDeque[Int])

foo([1, 2, 3], [4, 5, 6])
bar([1, 2, 3], [4, 5, 6])

will thus be both more precise and more convenient than the status quo of

def foo(xs: Seq[Int], ys: Seq[Int])
def bar(xs: Seq[Int], ys: Seq[Int])

foo(Seq(1, 2, 3), Seq(4, 5, 6))
bar(Seq(1, 2, 3), Seq(4, 5, 6))

This goes back to a point @sjrd recently mentioned: are there scenarios where using collection literals is strictly better than the status quo? I argue that in these scenarios, it is:

  • The caller of foo and bar above does not care about the collection type, because how foo and bar make use of the collections internally is not their concern. So eliding Vector or List or ArrayDeque when passing a literal at the call site loses nothing, and in fact clarifies the code by omitting details the caller explicitly doesn’t care about. And so having precise types + collection literals becomes more concise and an easier sell than the status quo of precise types + explicit collection constructors that you are advocating for today.

  • The definer of foo and bar does care that they get the precise collection type because, as you said, it’s better to be precise about the collection you need than to pass around opaque Seqs everywhere and have unpredictable performance problems pop up, either during usage of the Seq or due to the cost of turning the Seq into some other, stricter collection with more predictable performance characteristics. And so having precise collection types + collection literals is superior to the common current style of Seq everywhere that you rightly deride.

I’d argue that what you really want is that “collections are precisely and unambiguously typed”, rather than “collections are syntactically constructed using a variety of different factory methods”. Collection literals make the former easier by letting us skip the latter, but only in the (common) case where the collection factory is unambiguous thanks to the target type (which, as I mentioned, is the case for a lot of Scala syntactic shortcuts).

Of course, you would still need explicit .toVector calls when passing a values: List[Int] from somewhere else to foo(xs: Vector[Int]), but that’s the case with the status quo as well, and it’s arguably desired, because conversion between collections has runtime overhead, so you don’t want to inject conversions automatically. But choosing the right collection type for a literal to instantiate has no runtime overhead at all, so we can safely infer it without the performance problems that mysterious implicit conversions can cause.
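
As a minimal sketch of that point (foo, values, and n are hypothetical names, not from the proposal):

```scala
// Converting between concrete collection types stays explicit, because
// it has a runtime cost; only the choice of factory for a literal is free.
def foo(xs: Vector[Int]): Int = xs.sum

val values: List[Int] = List(1, 2, 3)
val n = foo(values.toVector) // explicit, visibly O(n) conversion at the call site
```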

4 Likes

To be frank, in situations where I am exposing a field (in this case a value) as a public API, I will always give it a type ascription so it’s abundantly clear what that API is, whether it’s this proposed collection literal or something else.

But most glaringly, the situations where I would want to make a proposed collection literal public are few and far between; in fact, this is the least likely type of field/member to be made public in the first place.

I feel like you are making a point for the sake of making a point (i.e. showing cases where a type ascription is good practice, such as public fields/methods), but it’s not really giving credence to collection literals, as it’s the exact same problem there. If I am at the point of giving a collection literal an explicit type ascription, then you have already lost much of the touted brevity.

4 Likes

For me, the biggest practical win in ergonomics of the proposal would probably be avoiding deep parenthesis nesting in a single call (where all parens look the same), i.e. to the point where I have to stare for a longer time and maybe even use paren highlighting in the editor to see which things are within which parens.

compare e.g.:

obj.call(Seq(param1.m1(), param2.m1(Seq(param1.m2()), param3.m3())), param2.m2())

vs

obj.call([param1.m1(), param2.m1([param1.m2()], param3.m3())], param2.m2())

Of course, extracting intermediate arguments into local variables would give the greatest readability wins, but in practice people tend to write long one-liners.

2 Likes

But in the common case where someone is passing a literal, it rarely matters what the literal is unless it is intended to be very high-performance and it is in fact Array.

So just make it Seq and don’t worry about it, in the case where literals are the norm. Or Iterable. Doing otherwise is premature optimization.

Or if you do worry about performance, do the right thing and make it Array or IArray (unless 0/1 element of a non-primitive is the norm, in which case maybe List is better). Literals are literally a perfect match for arrays: you have, at compile time, n items where n can’t vary, and you probably aren’t appending to it; and if you have [1, 2, 3], boxing all the numbers is almost always going to give you Python-like performance instead of Go/Java-like performance.

2 Likes

In regard to other languages that have bracket-based collection literals: it is worth mentioning that such languages almost always do it for a reason. The brackets in the literals do not come out of the blue but are part of a more extensive syntax around collections. For example, brackets can be involved in array type declarations (int[] arr = ...), array indexing (arr[0] = ...), array slicing (arr[1:2:3]) and even some more advanced syntax (e.g. [x for x in arr if x % 2 == 0]).

In Scala, on the other hand, these brackets would exactly “come out of the blue”: no other collection-related syntax uses brackets. Instead, in Scala brackets are entirely dedicated to another part of the language: generics. Now, imagine a newcomer to Scala seeing this:

val seq: Seq[Int] = [1, 2, 3]

The brackets are on both sides, left and right, but for two completely unrelated reasons. What would that poor soul think about that? I bet in many cases it would be something like “wtf, why are they doing this to me?”

10 Likes

I have seen a lot of comments mentioning Swift, and most of them are ill-informed or outright wrong. So I’d like to offer my perspective as a primarily Swift developer looking at this feature for Scala.

TL;DR: I don’t think collection literals are right for today’s Scala, but I also don’t think they are as catastrophic for the language as some people have suggested.

Collection literals work really, really well in Swift. In fact, the support for literals in general (e.g., including numbers) is one of the things Swift really nailed down compared to e.g. C++, Rust, and yes to some extent Scala.

You can look at the following for an example. Regardless of how it makes you feel, my point is that no experienced Swift developer would find it surprising, confusing, or hard to read.

Some standard-looking Swift code
/// Returns a table mapping each element in `xs` to its number of occurrences.
func histogram<C: Collection>(_ xs: C) -> [C.Element: Int] where C.Element: Hashable {
  var result: [C.Element: Int] = [:]
  for x in xs { result[x, default: 0] += 1 }
  return result
}

let h0: [Int8: Int] = histogram([1, 2, 1, 1, 2])
  .merging([3: 10, 4: 2]) { (lhs, _) in lhs }
print(h0) // Prints [3: 10, 4: 2, 1: 3, 2: 2]

Note the “literal syntax” in several places:

  • [1, 2, 1, 1, 2] is an array literal
  • [3: 10, 4: 2] is a map literal
  • [:] is an empty map literal

Also note that the type of the array in the call to histogram has been inferred from context (to Array<Int8>), driving the inference of the number literals in the sequence 1, 2, 1, 1, 2, which otherwise would have defaulted to Int.

Swift defaults [x, y] to Array<T>, which is like a mutable ArrayBuffer, and it defaults [a: x, b: y] to Dictionary<T, U>, which is like a mutable HashMap. If you’re alarmed by the mention of “mutable”, keep in mind that Swift has a very different approach to mutation that does not come with the usual foot guns.

Swift has an empty map literal that is distinct from an empty array literal (i.e., [:] vs []). In the presence of an expected type, Swift will look for an instance of the corresponding ExpressibleByXXXLiteral type class, so one can customize this system at will. Swift does not infer the type of an empty collection literal. But the language has a very different relationship to generics and erasure, so this design choice makes a lot of sense there.


Now, perhaps surprisingly, as a Swift developer who loves collection literals in Swift, I don’t think they are a good fit for Scala. At least, I think they can no longer be retrofitted into the language without causing harm.

My two main arguments are:

  • In a language with pervasive use of variadic arguments (which confuses no experienced Scala developers regardless of how they make me feel), Array(1, 2, 3) does not read worse than [1, 2, 3].
  • Looking at this thread and the one before, I bet that collection literals will be seen as a needless addition by at least a good third of Scala’s community and that will create different schools of style.

I’ll also add that I don’t find the use cases presented so far to be particularly compelling. Note that Swift will struggle to infer the type of @odersky’s example without annotations, despite its strong support for collection literals. That is because typing JSON requires a schema, not a single concise annotation on a val.

Also, JSON is not code; it is the product of a world where everything is dynamically typed. I do not believe it is a good benchmark for evaluating a statically typed language. In fact, I am convinced my life would be easier if JSON had type annotations, so that I would not spend precious hours guessing the format of my CI’s configuration files. Commenting on one of the proposal’s features: I actively dislike that val x = [] can type check.

More generally, I think that the importance of writing embedded DSLs is often overblown. An embedded DSL must embed into its host language; otherwise it’s just a DSL (which is fine!). If one must support syntax that is too alien for the host, then one should write a compiler. It’s not that hard … :lying_face:

One interesting argument was the lowering of Scala’s “differentness” for folks coming from Python, JavaScript, etc. but:

  • I think no one is surprised that there’s a cost to moving from one language to the next, which includes learning new syntax.
  • I doubt that people coming from Python to Scala will leave because they must write List(1, 2, 3). The language has one zillion other reasons to stay (or leave) if that’s the background that you have.
  • As many have pointed out and @sjrd has eloquently summarized, adding choice to the language can sometimes make it harder to pick, not simpler.

That being said I do not think the feature will be nearly as disruptive as braceless syntax because, in general, it only concerns tiny spots in the code. I personally don’t write that many collection literals in standard code so I can easily believe that the feature will be most often used for quick scripts and configuration files, which I guess are the relevant use cases driving the proposal. So at the end of the day, if you don’t like it, you can simply ignore it.

Finally, unlike other syntax changes/additions, this one is very easy to recognize/transform, so I bet tooling can adapt very quickly. I have no doubt that corporate code has countless tool-enforced guidelines that can effectively ban collection literals from a codebase.

12 Likes

I think this is a really fantastic idea, because I don’t think named tuples are really pulling their weight so far, and this is a compelling use for them. In particular, the really nice feature would be for nesting to work:

case class Point(x: Int, y: Int) {}
case class Rect(ul: Point, br: Point) {}
val box: Rect = (
  ul = (x =  5, y =  12),
  br = (x = 25, y = 192)
)

This is considerably clearer than the status quo:

val box = Rect(
  Point( 5,  12),
  Point(25, 192)
)

which might be more compact, but it’s not nearly so obvious what is going on.

It would be nice if rather than this magically working for case classes and nothing else, there was some annotation or given one could apply to one’s own stuff so it could work that way too: companion object apply can be called with names whenever this annotation/given is present (which case classes would get automatically).

However, there are three wrinkles.

(1) If (x = 5, y = 12) >: (5, 12) then it would be weird if ul = (x = 5, y = 12) works and ul = (5, 12) doesn’t.

(2) If val pt = (x = 5, y = 12), then it would be weird if Rect(ul = (x = 5, y = 12), ...) works but Rect(ul = pt, ...) doesn’t.

(3) If you can Point(y = 12, x = 5) or Point(x = 5, y = 12) then surely (y = 12, x = 5) should work just as well as (x = 5, y = 12). But then if you def setPoint(pt: (y: Int, x: Int)), to set your point, and you call setPoint((5, 12)), your Rect is going to be set backwards.

So, following the principle of least surprise leads to…surprises.

Sufficient thought should be put into this feature to try to minimize the surprises. The easiest answer I can think of is “no subtyping relationship”; that fixes all the surprises here, but if there are handy explicit conversion methods, some of the surprise comes back. (Though then there’s a stronger argument for “don’t use the converter in cases that will surprise you”.)

4 Likes

On the Java subreddit:
https://www.reddit.com/r/java/comments/1i3mwzq/why_java_doesnt_have_collections_literals/

1 Like

Yes, nesting is essential. Fortunately, it already works as you describe it.

That’s what the implementation does: no subtyping relationship or implicit conversions between named tuples and case classes. Instead, we interpret named tuple literals depending on the target type. That’s a general principle; we do this for other literals as well. I have tried to point that principle out in this thread before. Maybe it did not get through well, since it was shot down by some snotty comments, so I won’t go into it more deeply here.

Here I find this quote from Brian Goetz from 2 days ago: "And while “Collection Literals” is a sensible feature – just one that hasn’t yet made it to the top of the priority list – ". So I guess we will put the idea to rest and wait for Java to have collection literals first, maybe then we get more agreement to go ahead? :stuck_out_tongue_winking_eye:

9 Likes

As proposed, I don’t think it even compiles unless scala and python are inline.
So it would be a Map, though it sounds like that part of the proposal is off the table.

I’m not sure if this would scale, but your particular example could be solved with method like this one:

import scala.collection.Factory

def col[T, C](values: T*)(using f: Factory[T, C]): C =
  f.newBuilder.addAll(values).result()

def foo(xs: Vector[Int], ys: Vector[Int])
def bar(xs: List[Int], ys: ArrayDeque[Int])

foo(col(1, 2, 3), col(4, 5, 6))
bar(col(1, 2, 3), col(4, 5, 6))

It even works for maps:

def baz(m: HashMap[String, Int])

baz(col("a" -> 1, "b" -> 2))

It does not, however, work for nested collections, due to Scala’s type inference limitations:

def qux(v: Vector[List[Int]])

qux(col(col(1, 2), col(3, 4))) // does not compile
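
That said, if one is willing to spell out the outer type arguments, the nested case can be made to compile with the same col helper; a hedged sketch (qux here is a hypothetical consumer that sums the elements, so the result is checkable):

```scala
import scala.collection.Factory

// Same helper as above: builds any collection C of elements T via Factory.
def col[T, C](values: T*)(using f: Factory[T, C]): C =
  f.newBuilder.addAll(values).result()

def qux(v: Vector[List[Int]]): Int = v.map(_.sum).sum // hypothetical consumer

// Explicit type arguments give the inner col calls their expected type:
val nested = qux(col[List[Int], Vector[List[Int]]](col(1, 2), col(3, 4)))
```
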
1 Like

Map building in Java is a joke, so map-building shorthands are a much more pressing issue in Java than in e.g. Scala.

Example: “Can I declare and initialize a map in Java with a literal?” on Stack Overflow.

Before Java 9, initializing a map in plain Java (without helper utils or libraries) looked like:

Map<String, String> setup = new HashMap<>();
setup.put("1", "one");
setup.put("2", "two");
setup.put("3", "three");

In Java 9 they added this, for up to 10 entries:

Map<String, String> example = Map.of("1", "one", "2", "two");

and this for any number of elements:

Map<String, String> example = Map.ofEntries(
    Map.entry("1", "one"),
    Map.entry("2", "two"),
    Map.entry("3", "three"),
    //...
);

Java doesn’t even have tuples, let alone a method for converting a sequence of tuples to a map, unless we treat Map.Entry as a tuple.
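
For contrast, the Scala side of the comparison is a single expression in each case (a sketch using only the standard library; setup and fromPairs are hypothetical names):

```scala
// The same three-entry map, built in one expression:
val setup: Map[String, String] =
  Map("1" -> "one", "2" -> "two", "3" -> "three")

// And the tuple-sequence-to-map conversion that Java lacks:
val fromPairs: Map[String, String] =
  List("1" -> "one", "2" -> "two", "3" -> "three").toMap
```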

Scala is way ahead of Java in the collections department, so I think adding Java to the picture only adds noise. I think somebody who really wants “compact Java” chooses Kotlin, as Kotlin is much closer to Java than Scala is.

6 Likes