Pre-SIP: A Syntax for Collection Literals

bishabosha · January 16, 2025, 8:29am

I would be (Edit: kind-of) in favor if the map syntax didn’t use arrows - I think = is fine:

val foo = [1 = "abc", 2 = "xyz"]

but for that case, introduce a second type class that explicitly deals with Key/Value.

Using -> to mean “pair” isn’t harmed by this imo, because the above reads as “map literal”, rather than the current proposal’s “sequence literal of pairs”.
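A minimal sketch of what a separate key/value type class could look like next to a sequence one. All names here (FromSeqLiteral, FromMapLiteral, the given instances) are hypothetical and not taken from the proposal, and since the bracket syntax does not exist, the expansion is written out by hand:

```scala
// Hypothetical type classes: none of these names come from the proposal.
trait FromSeqLiteral[A, C]:
  def fromLiteral(elems: A*): C

trait FromMapLiteral[K, V, C]:
  def fromLiteral(entries: (K, V)*): C

given vectorFromSeqLiteral[A]: FromSeqLiteral[A, Vector[A]] with
  def fromLiteral(elems: A*): Vector[A] = elems.toVector

given mapFromMapLiteral[K, V]: FromMapLiteral[K, V, Map[K, V]] with
  def fromLiteral(entries: (K, V)*): Map[K, V] = entries.toMap

// Hand-written stand-in for what [1 = "abc", 2 = "xyz"] might expand to.
val foo: Map[Int, String] =
  summon[FromMapLiteral[Int, String, Map[Int, String]]]
    .fromLiteral(1 -> "abc", 2 -> "xyz")

// Hand-written stand-in for what [1, 2, 3] might expand to.
val nums: Vector[Int] =
  summon[FromSeqLiteral[Int, Vector[Int]]].fromLiteral(1, 2, 3)
```

With two type classes, a key/value literal is never resolved through the sequence instance, so -> keeps its ordinary meaning of "pair".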

Sporarum · January 16, 2025, 8:35am

I forgot about this part, and I think it should not be overlooked

Explanation:

The apply method already exists on collections, and it uses a Seq as input, so the following is not compiled in a smart way:

IArray(a, b, c)
// desugars to
IArray.apply(Seq(a, b, c)) // constructs a Seq
// and not to some optimised memory creation

But this would not (necessarily) be the case for brackets:

This means people who care about this kind of impact in performance will have to use the bracket syntax, even in cases where the old way is more readable

This only applies to libraries which have to stay binary compatible with their version before this update, but that might be a lot of libraries, notably the standard one!
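To illustrate the varargs point, here is a small sketch with a made-up VarargsProbe object (not a real library API). It only shows that a repeated parameter is seen as a Seq inside the method; it says nothing about how a given collection implements apply internally:

```scala
// Hypothetical helper mirroring the shape of a typical companion `apply`.
object VarargsProbe:
  // Inside the method, the repeated parameter `elems` has type Seq[A].
  def arrivesAsSeq[A](elems: A*): Boolean =
    (elems: Any).isInstanceOf[Seq[?]]

  // The Seq wrapper is what the method body iterates over.
  def count[A](elems: A*): Int = elems.length
```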

gaeljw · January 16, 2025, 9:00am

I tend to agree with the reasons already mentioned above against the proposal.

IMHO it brings a tiny improvement for very few users compared to the confusion and language complexity it creates.

sjrd · January 16, 2025, 9:46am

It was mentioned before, but I’ll add some more opposition to the “default to Map if the arguments have the a -> b shape of tree” aspect of the proposal.

We have no precedent in Scala to do something different based on the shape of subtrees (and other than some corner cases in JavaScript about eval, I don’t know of any other language that does something like that).
It doesn’t generalize to 0: ["a" -> 1, "b" -> 2] is a map, ["a" -> 1] is a map, but [] is a seq!
None of the arguments about familiarity from other languages apply to it: other languages explicitly have a different syntax (often {}-based) for maps/dictionaries.

odersky · January 16, 2025, 12:16pm

I don’t think that follows. The apply method in a companion object can be a macro just as well as the fromLiteral method can be a macro.

majk-p · January 16, 2025, 12:24pm

I appreciate the desire to make Scala a better language, but I’ve got some concerns about this proposal. Firstly, it may create additional work for tooling maintainers. According to the recent announcement from Scala Space (Scala Space in 2025), they intend to focus on stability (they mention Metals, but that’s many tools under the hood), and introducing new syntax might not align with that goal.

Additionally, the extra syntax could pose a mental burden for developers, increasing the learning curve. While there might be benefits to the new syntax, I’m worried that they may not outweigh the associated costs.

channingwalton · January 16, 2025, 12:24pm

I have no objection to the idea in itself, although it’s incredibly rare for me to have to create collections like this in practice in the kinds of systems I work on (order management, commerce, finance for example).

My objection is that this isn’t the kind of feature that time should be spent on.

The biggest problems facing developers are compiler performance and IDE support, both of which are major impediments to adoption in my experience, and probably the main reason for Scala’s falling popularity.

Basically, there are more important things to work on.

prolativ · January 16, 2025, 12:40pm

Why not have dedicated types for collection literals like

sealed trait CollectionLiteral
trait SeqLikeLiteral[+A] extends CollectionLiteral // e.g. ["foo1", "foo2"]
trait MapLikeLiteral[+K, +V] extends CollectionLiteral // e.g. [foo = "bar"]

final class EmptyCollectionLiteral extends SeqLikeLiteral[Nothing],
                                           MapLikeLiteral[Nothing, Nothing] // []

and then convert them to appropriate runtime implementations when necessary? Then [] could work both for an empty Seq or Map depending on context.
I used the syntax for maps mentioned by @bishabosha above, I guess this approach wouldn’t work well for ["foo" -> "bar"] as a Map. If one needs a key that is not an identifier-like string literal maybe something like [(foo) = bar] would work, given that foo is some val or def.
Nevertheless I’m more a fan of using (...) instead of [...] and trying to integrate the collection literals with tuples (named or unnamed) somehow.
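One way the conversion step could look, as a rough sketch only. The traits are renamed (SeqLit, MapLit, EmptyLit) and given element accessors purely for illustration, and asSeq / asMap are hypothetical helpers standing in for whatever the compiler or library would do:

```scala
// Hypothetical variants of the literal types above, carrying their elements.
sealed trait Lit
trait SeqLit[+A] extends Lit:
  def elems: List[A]
trait MapLit[+K, +V] extends Lit:
  def entries: List[(K, V)]

// Stands for []: usable wherever a SeqLit or a MapLit is expected.
object EmptyLit extends SeqLit[Nothing], MapLit[Nothing, Nothing]:
  val elems: List[Nothing] = Nil
  val entries: List[(Nothing, Nothing)] = Nil

// Conversions to runtime collections, picked by the context.
def asSeq[A](lit: SeqLit[A]): Seq[A] = lit.elems
def asMap[K, V](lit: MapLit[K, V]): Map[K, V] = lit.entries.toMap
```

Because both traits are covariant, EmptyLit conforms to SeqLit[Int] as well as MapLit[String, Int], which is what lets the same empty literal become either an empty Seq or an empty Map.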

lihaoyi · January 16, 2025, 1:23pm

I agree that the map literal syntax is probably the weakest part of this proposal, for reasons already given. IMO it can be elided without much loss of functionality.

In all target-typed scenarios, constructing a Map with a collection literal containing arrows works anyway without special casing. And for non-target-typed scenarios, I would guess that Seq are instantiated maybe an order of magnitude more than Map, so it would be reasonable to have one and not the other.

I’d suggest we drop the Map special casing, which is clearly the most controversial and possibly least important part of this proposal, and focus on the rest of it.
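A sketch of the target-typed case, assuming a single hypothetical FromLiteral type class (the name and shape are illustrative, not the proposal's exact API). The expected type alone selects the instance, so pairs written with arrows end up in a Map without any special rule:

```scala
// Hypothetical single type class: the element type and target type pick the instance.
trait FromLiteral[Elem, C]:
  def fromLiteral(elems: Elem*): C

given seqFromLiteral[A]: FromLiteral[A, Seq[A]] with
  def fromLiteral(elems: A*): Seq[A] = elems.toList

given mapFromLiteral[K, V]: FromLiteral[(K, V), Map[K, V]] with
  def fromLiteral(elems: (K, V)*): Map[K, V] = elems.toMap

// Same elements, different target types; written out by hand in place of ["a" -> 1, "b" -> 2].
val asMapTarget: Map[String, Int] =
  summon[FromLiteral[(String, Int), Map[String, Int]]].fromLiteral("a" -> 1, "b" -> 2)

val asSeqTarget: Seq[(String, Int)] =
  summon[FromLiteral[(String, Int), Seq[(String, Int)]]].fromLiteral("a" -> 1, "b" -> 2)
```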

Sporarum · January 16, 2025, 1:39pm

What I tried to say is:
New libraries can 100% use macros for their apply methods, giving no advantage to the bracket syntax
However, old libraries, such as the standard library, cannot (without breaking bincompat). But these libraries will have no problem using macros with bracket syntax and thus will do so, leading to a performance benefit for bracket syntax in some cases, again notably with the standard library!

(Of course this assumes it is not possible to evolve an old library to use macros for its already existing apply methods; if that is not the case, feel free to correct me.)

fanf · January 16, 2025, 1:54pm

From a user perspective, I don’t understand this proposal given the goals of:

making the language simpler. It adds one more way of doing same things, one more major difference between code base conventions, one more place where we wonder if we look at Scala or some other language.
easing the pressure on tool developers.

And having maps share the same syntax is extremely confusing.

Plus, I remember when lots of syntax changes were rushed into Scala 3 “because it was the last time books will be written and syntax will change”. It caused a lot of harm at the time; at least make it not be ignored now.

Finally, I think in my code base context, it is just never a problem.
If I need lots of data, I default to json or yaml, in strings or external files, so that the dataset can be exchanged with other languages in a consistent way.
If I don’t need lots of data, it doesn’t matter to add Map or Seq.

So, losing a lot more of “Scala code base inter-consistency” to what gain?

Well, perhaps just add first class json string and be done? At least it’s just a string parser, not syntax.

eed3si9n · January 16, 2025, 3:32pm

Given Scala’s type inference, in a real code base [0, 1, 0] would appear without obvious type ascriptions.

It is shorter, but less readable for beginners because they would have no idea what the type is. Is it a Vector? List? Set? Some other esoteric datatype?
This is the return of the infamous CanBuildFrom in a sheep’s skin, or Pythonic skin. The open aspect makes this hard to read/understand.
It makes Scala harder to learn, because this breaks the regularity of Scala that parentheses are for terms and brackets are for types.

Side note: tooling support

Tooling is somewhat open-ended so concretely, I’m thinking about formatting, linting, code completion, syntax highlighting on partially edited code. Scala 3.6 rolled out (partly accidentally) without coordination with IntelliJ, Metals, tree-sitter-scala etc working.

I’ve defended Scala 3.6 syntax changes by implementing tree-sitter-scala changes in a hurry, but the syntax change should come with environmental impact analysis (if not actual pull requests) to the tooling ecosystem, assuming people care to have Scala code highlighted correctly on GitHub and editors, and IDEs to function.

Summary

With or without tooling, I personally think the proposed syntax change makes Scala 3.x harder to learn because it adds yet another way of doing the same thing Vector(0, 1, 0), and the semantics is completely opaque, which is harder for beginners.

Neither Python nor Rust does this.

djspiewak · January 16, 2025, 4:10pm

I strongly agree with all of Eugene’s points here, but I want to particularly call attention to this:

I recently went through and (finally) updated Sublime Text’s Scala mode to support Scala 3 syntax. It was… nightmarish. There are still elements of it which don’t work correctly because I essentially gave up (in particular, case statements can be highlighted somewhat strangely since we don’t know if they’re indented match/case/partial-function thingies or if they’re members of an enum). Additionally, given shows up in way, way too many places, so any pretense of it being a soft keyword goes out the window. There are loads of other corners and angles (like self-types being inexpressible without braces), but that’s not what I wanted to talk about.

The underlying point here is that syntax changes need to be considered in sympathy with the tooling. The way in which a compiler parses a language is fundamentally different than the way an editor parses it. This is true for several reasons, but one easy one to conceptualize is the fact that the normal state for an editor is that the buffer does not parse into a valid syntax tree, but the parser (and other semantic tooling) must still handle every line reasonably. This is very different from a compiler, which generally fails on the first parse error, and even the very few compilers which do better than this still don’t need to do anything other than identify subsequent errors.

At a minimum, all proposed syntax changes should come not just with refinements to the compiler’s grammar, but also a tree sitter patch and ideally some experimental results on how the editor experience feels in the presence of the new syntax (interactions with similarly-parsed constructs are often terribly suboptimal, as my case example demonstrates).

I also generally think we should not be piling so much specialized syntax on top of an already very syntactically-complex language, but if we’re going to keep tweaking things in this fashion, at least it must be done in tandem with the tooling rather than implicitly hand-waving the problem away as being analogous to the compiler.

tgodzik · January 16, 2025, 4:14pm

And I can imagine that would add to people writing their code differently between codebases thus splitting the community even further. Honestly, I don’t really see it as such an improvement as to warrant the amount of work to go with it.

When I was learning Scala a while back I never really had issues with writing List() etc. I didn’t really understand what was going on underneath, but it was simple to learn how to write lists, maps etc.

Also, using syntax typically associated with types seems problematic, and its only advantage is to be like Python.

bishabosha · January 16, 2025, 5:42pm

I’m changing my favorability somewhat.

If we do this thing - I think it should have a fixed type and that’s it.

e.g. [1,2,3] is fixed to always be Seq[T] and you can debate what is the best allrounder underlying class for performance characteristics.

Then the use case is specifically for DSLs that are designed around raw data (basically for config).

Have explicit conversions which can maybe be inlineable.
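As a rough illustration of the fixed-type idea, using only constructors and conversions that exist in the standard library today. Seq(1, 2, 3) stands in for what [1, 2, 3] would always produce under this suggestion:

```scala
// Stand-in for the literal [1, 2, 3] with a fixed Seq[Int] type.
val fixed: Seq[Int] = Seq(1, 2, 3)

// Explicit conversions to other collections when a different type is wanted.
val asVector: Vector[Int] = fixed.toVector
val asSet: Set[Int] = fixed.toSet
val asList: List[Int] = fixed.to(List)
```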

odersky · January 16, 2025, 6:09pm

I am happy to disregard map literals for the time being. They were anyway only a marginal part of the proposal.

About tooling concerns: Isn’t that a bit overblown? It’s a trivial syntax change. Took me about 5 minutes to change the grammar and the parser to support it. I agree that recent changes to givens and named tuples did pose deeper tooling challenges. But this?

One can also see it the other way. Collection literals would be a big help in approachable DSLs for tooling. For instance, I was told that they would be great for simplifying Mill build scripts. So a more approachable syntax also leads to more approachable tools. And this is the kind of simplification that matters for these tools. HKT or not, who cares? But a straightforward way to define a bunch of things that does not mention implementation details is needed in lots of places.

About the concerns of learnability: It’s evidently not a problem in a dozen other languages. Almost every future programmer will come from Python where collection literals are everywhere. These future programmers will be pleased if they find the same syntax in Scala. They will be put off if it’s absent because we insist that collection literals are too hard to learn.

djspiewak · January 16, 2025, 6:35pm

I can’t speak for Eugene, but my point is that this type of handwave is exactly the problem. Neither you nor I know whether this concern is overblown because we haven’t actually tried to modify the tools (at first glance, my guess is that this is going to cause problems with parsing function type parameter applications, and the solution will likely involve some whitespace sensitive weirdness that will break with symbolic methods, but that’s just a guess). What I can say is that several of the recent syntax adjustments that seem very trivial are in fact incredibly difficult to handle. I refer back to my note about how editor parsers are very fundamentally different from compiler parsers. What’s trivial to do in scalac may be super hard in an editor and vice versa.

As a trivial and off topic example of this, remember that editors don’t generally have the ability to pretokenize, so we don’t have indent/dedent tokens to work with. This in turn means that any syntax which is sensitive to significant indentation or whitespace in general becomes really challenging to handle (braceless match types and their interaction with semicolon inference come to mind as a good example, but braceless enums are even more insidious).

At any rate, I’m not trying to spread FUD, just encourage the following concrete action: along with the grammar modifications in these proposals, please also take the time to write and test a tree sitter patch which implements the same syntax. (by “test” I also mean “try typing in some source files”, because the incremental buffer experience is usually part of the problem here) For a lot of syntactic changes, this will be easy; for some it will be impossible. If you take the time to do this, it will close the loop on language design and put a halt to these situations which have arisen consistently during Scala 3’s existence where a syntactic change is made to the language which is done in such a way that it is extremely difficult or impossible for the tooling to match.

tgodzik · January 16, 2025, 6:40pm

For me a bigger concern would be that beginners could for example find both
List(1,2,3) and [1,2,3] in the wild, which will be nothing if not confusing.

There will be two ways of doing something very simple and used heavily in codebases.

Tooling concerns are relevant, but for sure if we were cautious and rolled it out slowly, we would probably be safe on that account. That will however be additional work for tooling authors, which they will spend time on instead of working on fixes etc. It seems minimal gain really for a lot of work and a lot of confusion.

I am not really convinced that it would simplify how build definitions look. Writing Seq was never an issue there. No one complained about that.

SethTisue · January 16, 2025, 6:59pm

I would like to join the chorus of voices in opposition to this proposal. It solves a problem we do not have and adds yet more syntax to a language which already has plenty of it. It will make Scala harder to teach, harder to read, harder to build tooling for. And the gain is marginal. Let’s please not do this.

Ichoran · January 16, 2025, 6:59pm

The collection literal [1, 2] in Python creates a List which is a mutable efficiently-indexable datastructure with an ArrayBuffer-like implementation under the hood.

If the future programmers are pleased to type val x = [1, 2] instead of x = [1, 2] but then try val y = [z*z for z in x] and it doesn’t work, will they be pleased? If they try

for w in ws:
  x.append(w)

and it doesn’t work, not even if they type var x = [1, 2] to start, will they be pleased?

I don’t think we can reason about things directly in this way. Python programmers like things to just work without hassle, but they put up with numpy.array([2, 3]) all the time (well, usually np.array because import numpy as np).

It’s very worthwhile to think about a top-quality experience, but we have to make sure it works well all the way through with Scala or it will worsen the experience of everyone and not meaningfully ease the migration of Python-only programmers.
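For comparison, a sketch of how the two Python snippets above would be written in Scala today, assuming the mutable ArrayBuffer is the closest match to a Python list. The helper names squares and appendAll are made up for the example:

```scala
import scala.collection.mutable.ArrayBuffer

// Scala counterpart of the Python comprehension [z*z for z in x].
def squares(xs: Seq[Int]): Seq[Int] =
  for z <- xs yield z * z

// Scala counterpart of the Python loop calling x.append(w); needs a mutable buffer.
def appendAll(target: ArrayBuffer[Int], ws: Seq[Int]): ArrayBuffer[Int] =
  for w <- ws do target.append(w)
  target
```

Neither form follows from a bracket literal alone: the comprehension syntax differs, and appending requires that the literal produce a mutable collection.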