Pre-SIP: A Syntax for Collection Literals

philwalk · February 8, 2025, 6:23pm

A translator tool can be helpful (are there any good ones?). The scala output of even a good translator might enhance the perception that scala is verbose.

A person familiar with python/numpy can easily come to the (false) conclusion that scala is verbose or complicated. In general, the opposite is true, but I wonder how many give up before having a positive experience.

Translating numpy code to scala/breeze creates the wrong expectations about other literal declarations:

val mat = DenseMatrix((1.0, 2.0), (3.1, 4.5), (-1.0, 3.4))

A natural but false expectation is that ordinary array declarations might work similarly:

val arr = Array((1, 2), (3, 4), (5, 6))
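To make the mismatch concrete, here is a minimal sketch in plain Scala 3 (no Breeze needed). The names `pairs`, `grid`, `fromPairs` and `fromGrid` are made up for illustration.

```scala
// With a plain Array, the tuple syntax is taken literally:
// this is a one-dimensional array of pairs, not a 3-by-2 matrix.
val pairs: Array[(Int, Int)] = Array((1, 2), (3, 4), (5, 6))

// A nested array has to spell out the inner constructor each time.
val grid: Array[Array[Int]] = Array(Array(1, 2), Array(3, 4), Array(5, 6))

// Element access differs accordingly.
val fromPairs: Int = pairs(1)._2 // second field of the second pair
val fromGrid: Int = grid(1)(1)   // row 1, column 1
```

Both lookups reach the value 4, but only `grid` behaves like the two-dimensional structure a numpy user would expect.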

philwalk · February 8, 2025, 6:44pm

Most (all?) tutorials introduce collections with literals, so they have disproportionate influence on the perception of newcomers.

satorg · February 8, 2025, 9:12pm

In Scala 3 tuples are collections, therefore all the below declarations are generally equivalent from that perspective:

Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))
Seq((1, 2, 3), (4, 5, 6), (7, 8, 9))
(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))
((1, 2, 3), (4, 5, 6), (7, 8, 9))

– they all represent a collection of nested collections, all immutable.
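As a rough illustration of the "tuples are collections" point, the sketch below uses a few collection-like operations that Scala 3 tuples support. The value names are hypothetical, and type annotations are left off so the compiler infers them.

```scala
val t = (1, 2, 3)

val tupleSize = t.size   // number of elements
val asList = t.toList    // a List whose element type is the union of the field types
val longer = 0 *: t      // prepend an element, giving a four-element tuple
val joined = t ++ (4, 5) // concatenate two tuples
```

Unlike `Seq`, every one of these operations tracks the element types position by position, which is part of why the two are not interchangeable in practice.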

On the other hand, Array is different because it is mutable, and so are lists in Python. Therefore, to resemble the [] syntax in Python, a [] syntax in Scala should also produce mutable sequences. And, to be fair, it should also start supporting the []-style indexing, slicing and comprehensions that Python has.
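The mutability difference can be sketched as follows. `mutateDemo` is a hypothetical helper, used only so the statements have somewhere to live.

```scala
def mutateDemo(): (List[Int], List[Int], List[Int]) =
  val arr = Array(1, 2, 3)
  arr(0) = 99                      // in-place update, as with a Python list

  val seq = Seq(1, 2, 3)
  val updated = seq.updated(0, 99) // returns a new Seq; `seq` itself is unchanged

  (arr.toList, seq.toList, updated.toList)
```

A bracket literal that defaulted to an immutable `Seq` would therefore behave like `seq` here, not like the Python list it resembles.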

But I personally doubt it should ever be a goal, because converting some code from one language to another with some tooling is generally easier and has less impact on the ecosystem than embedding an alien syntax into the language itself.

bjornregnell · February 13, 2025, 10:34am

Thanks everyone for contributing to this thread - it is so nice to see all the engaged replies on this hot topic. As the rate of incoming comments tends to slow down, I think it might be good to try to do a quick summary of the main pros and cons from the above discussions on introducing (Python-like) collection literals. (Let me know if I missed some important pros/cons.)

Regarding the concept of collection literals:
Pros:

Conciseness and ergonomics. Many argue that collection literals combined with target typing and a default collection target will make constructing collections more readable and higher level (no implementation details shown).
Support adoption of Scala. Developers are already familiar with this from other languages.

Cons:

Risk of bugs/issues when being non-explicit about the collection type.
There are already ways of doing this. Adding more ways of doing the same thing is not good. This may coincide with the narrative of “Scala having too many ways of doing things” and thus hamper adoption.
Risk for tooling evolution, esp. in terms of interaction with other language constructs and increased complexity in grammar/semantics. There are problems with implementing this in IDEs and it may mean that resources are forced to be spent on this rather than more important things.

Regarding the surface syntax based on brackets specifically:
Pros:

Similar to other languages. Developers can continue with known syntax.
More concise than apply-syntax. In nested structures the benefit of conciseness is amplified.

Cons:

The bracket syntax is alien to Scala in term position. Brackets are currently clearly reserved for types. Confusing this may actually hamper learnability and adoption.
The similarity to other languages is dubious as one may assume semantics that are actually false in Scala.
The bracket syntax competes and interacts with the tuple syntax. Why not use tuple syntax instead, as it is more native to Scala in term position?
Risks of splitting the community and code-bases between apply-syntax and bracket syntax.

My current, personal view, given all the insightful discussions here, is that, if doing the pros/cons tradeoff then bracket-based syntax is probably not worth it, but tuple-based syntax might be (we would benefit from a solution to the Tuple0 and Tuple1 syntax problem anyway…). But we would need extensive experimentation and investigations on feature interaction e.g. with named tuples etc. in order to validate that it’s worth it.

rjolly · February 16, 2025, 12:40pm

This is a serious issue, and the solution might be to require an expected type. It would solve the problem of what collection to use in the absence of expected type, and also the problem of maps with the awkward use of a -> b.

Also, following up on the idea above to also use brackets for case class literals, it would solve the problem of what actually is the type of:

[a = "str", b = true]

The error could be “type ascription needed” or somesuch.

Edit : note there is precedent (albeit still experimental) with generic number literals. This does not typecheck in the absence of type ascription:

val a = 11111111111

nightscape · February 16, 2025, 8:08pm

Just to throw in a fresh idea:
How about not doing this on the file-level, but rather have the option that your editor displays it differently than it is stored on disk.
The entire discussion reminds me a little of the issue of aligning e.g. the => arrows for match-case. It’s nicer to read when it’s aligned, but causes problems when the length of the longest line changes and suddenly you have a git diff where all case lines changed because the formatter needed to re-align everything. If the editor were smart enough to show the arrows aligned, but on disk they are stored unaligned you would get the best of both worlds.

In the same way, editors could display a nicer-to-read syntax to the user than what is actually stored on disk. Kind of like font ligatures, but specific for a language.

Of course this doesn’t bring all the benefits of changing the syntax itself:

Users of more text-focused editors like VIM, Helix, Emacs, … might be left out.
When users look at the code on GitHub it looks different than when they look at it in their editor.
When you write code, you (probably, unless your editor is very smart) have to use the current syntax.
It would need to be supported by editors.

On the other hand:

It would be an improvement on the status quo (if you consider the shorter syntax as an improvement).
Everybody could turn it off or on depending on their liking.
Everybody would get to pick the symbol they like, instead of forcing square brackets. I’m sure there are unicode symbols close to but still distinguishable from square brackets.
It would have few of the mentioned negative effects.
It could open up a way to add more nicer-syntax features without much impact.
It might be standardized and unified with the inline display of implicit expansions which Intellij and Metals both added independently. Maybe even as an extension to LSP.

Sporarum · February 16, 2025, 8:42pm

To be clear, what you are proposing is that

val xs = Seq(1, 2, 3)

is written in the .scala file, but the following is displayed to the user (if they so wish):

val xs = [1, 2, 3]

And in particular, this latter option would not be valid in a .scala file.
(This is the kind of thing that Obsidian’s live preview does for Markdown, and even then it can be confusing to navigate.)

I’ve given this kind of choose-your-own-syntax some thought before and while initially very optimistic, I am now very much opposed.
(Even for things like non-standard tab lengths or programmer ligatures)

The issue is that most code is viewed outside of an editor.
Chiefly, there are online tutorials, forums like this one, videos and Stack Overflow.
This is especially true while learning, which is when syntax is the most important!

And this would be true even if everyone used the same editor, and if that editor was perfect.
And neither is the case currently; one of the issues raised in this thread is precisely the latter: there’s not enough tooling support.

As such, I believe this idea is a trap at the best of times, and in the current context, would just spread the little support we have way too thin

(I don’t think this applies to aligning the match arrows, as the displayed code is valid and equivalent. This is similar to the “display inferred type” that some tools provide.)

bjornregnell · February 18, 2025, 11:33am

Following up on @Sporarum’s comment: I agree that there is great value in a true correspondence between real syntax and rendering in IDEs, to avoid unnecessary confusion.

I also think that there are already ways of being more concise when constructing collections, that to a large extent address the motivating example by @odersky. For example, with

val diag = Vector(
    Vector(1, 0, 0),
    Vector(0, 1, 0),
    Vector(0, 0, 1))

we could simply write (as pointed out by @tarsa here):

val V = Vector
val diag = V( V(1, 0, 0), V(0, 1, 0), V(0, 0, 1) )

and still have all the IDE goodies work, thus having and eating almost the whole cake.

bjornregnell · February 18, 2025, 11:58am

Also the motivating example here by @lihaoyi

val json0 = ujson.Arr(
  ujson.Obj("myFieldA" -> ujson.Num(1), "myFieldB" -> ujson.Str("g")),
  ujson.Obj("myFieldA" -> ujson.Num(2), "myFieldB" -> ujson.Str("k"))
)

could be made concise and more readable by a simple import:

import ujson.*
val json0 = Arr(
  Obj("myFieldA" -> Num(1), "myFieldB" -> Str("g")),
  Obj("myFieldA" -> Num(2), "myFieldB" -> Str("k")),
)

and I think that there are readability and safety benefits in having Obj, Num, Str written explicitly, compared to a bracket salad.

bjornregnell · February 19, 2025, 3:08pm

Also, all the arguments here by @lihaoyi are valid for a tuple-based syntax, as far as I can tell.

Another problem I have with the proposal is that, after going through coding examples in my teaching material for beginner-level programming for first-year computer science and engineering students here…

(sorry for my Swedish, a translator might help but much of the code is in English)

…then I actually found no real benefit from a teaching perspective if I change to collection literals, but sometimes rather the opposite as there is more to explain about target typing etc and things get less explicit etc. And this is despite the readability argument in the proposal… (but might depend on the audience being beginner programmers).

lihaoyi · February 21, 2025, 4:16pm

The tuple-based syntax would indeed be perfect, except for one wrinkle: () and (1) already mean something that is not a tuple. () is Unit, which can be worked around by compiler magic without too much issue. But (1) means 1, and this is so deeply embedded into every non-LISP programming language (not to mention everyone’s primary-school education!) that trying to change that to mean Seq(1) is basically impossible. And as @odersky has said earlier, having an implicit conversion from t: T to Seq(t): Seq[T] is probably far too powerful when what we really want is just a lightweight syntax for collection literals.
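The wrinkle can be seen in today's Scala 3 with a small sketch (the value names are mine, chosen for illustration):

```scala
val unit: Unit = ()                   // empty parentheses are the Unit value
val one: Int = (1)                    // parentheses around one expression are just grouping

val single: Tuple1[Int] = Tuple1(1)   // a real one-element tuple needs an explicit constructor
val alsoSingle = 1 *: EmptyTuple      // the same value built with the tuple cons operator
val empty: EmptyTuple = EmptyTuple    // the zero-element tuple has its own name, distinct from ()
```

So the zero- and one-element cases are exactly where parenthesis syntax and tuple values part ways.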

Given that, square brackets [] are the next best thing. There is some similarity to type parameters that could be theoretically confusing, but Python has already proved out the “are square brackets in types and square brackets in collections ambiguous” question, and the answer seems to be empirically “it is not a problem in reality”. Yes, Python uses [] for both indexing and list definition, so [1, 2, 3][0] in Python would be [1, 2, 3](0) in Scala, but I don’t think that’s a fatal difference.
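For comparison, this is how the two roles are split in current Scala, where brackets only ever carry types and indexing goes through `apply` with parentheses. It is a minimal sketch with made-up value names.

```scala
val xs: Seq[Int] = Seq(1, 2, 3)        // brackets in type position: a type argument
val first: Int = xs(0)                 // parentheses in term position: indexing via apply
val firstInline: Int = Seq(1, 2, 3)(0) // construction followed directly by indexing
```

Under the proposal, only the construction part would change shape; the trailing `(0)` would stay as it is.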

Sporarum · February 21, 2025, 5:32pm

I agree; however, that can be fixed, which would mean less additional syntax and would benefit other features (tuples and named tuples).

Which btw is a real downside of the bracket syntax: it’s not at all unlikely that people will start to do this:

val t: Tuple = [
  1,
// 2,
]

And this is just absurd, but of course it makes a lot of sense given the “proper way” doesn’t work:

val t: Tuple = (
  1,
// 2,
) // Got: Int, Expected: Tuple

Additionally, the original message probably also meant tuple-like syntaxes like @(1), which do not suffer from this issue.

jeremyrsmith · February 21, 2025, 8:26pm

Given that T and Tuple1[T] are isomorphic, I don’t think it would be crazy to have an implicit conversion from T to Tuple1[T]. I can see how it could cause problems sometimes, but in rare places where it causes problems you could just accept the fact that shortcut syntax is unavailable.
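A user-level version of such a conversion can already be sketched in Scala 3. This is not part of any proposal, and `toTuple1` and `wrapped` are hypothetical names chosen here.

```scala
import scala.language.implicitConversions

// Hypothetical conversion: wrap any value in a one-element tuple
// whenever a Tuple1 is the expected type.
given toTuple1[T]: Conversion[T, Tuple1[T]] = (x: T) => Tuple1(x)

// The expected type triggers the conversion.
val wrapped: Tuple1[Int] = 42
```

Because the conversion applies to every type `T`, it can kick in wherever a `Tuple1` is expected, which is the kind of occasional surprise the post alludes to.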

Or, we could try for a language solution to the final problem, which is lack of existing syntax for Tuple1. I think this is a smaller problem than adding an entirely new syntax (with all its problems) to the language. For example, maybe the lone expression ((x)) could unambiguously mean Tuple1(x).

Or, just live with singleton sequences being slightly more inconvenient and don’t change the language at all.

I understand the desire to have beautiful syntax that fits perfectly with your use case. But that’s why there’s https://racket-lang.org/.

bjornregnell · February 23, 2025, 12:36pm

So if we don’t want conversion, we can simply have a suitable, concise extension, and then in those cases we could write:

val xs: Seq[Int] = (1).tup   // Seq(1)
val ys: Seq[Int] = ().tup    // Seq()
val zs: Seq[Int] = (1, 2)    // Seq(1, 2)

Pretty OK, in my humble opinion. And safe. And readable.
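Part of this can be tried out today. The following is a minimal sketch of a single-element `tup` extension only, written under my own assumptions about what it should do. It does not cover the other two lines of the example: with this definition `().tup` would be a `Seq[Unit]` holding the unit value, not an empty sequence, and `(1, 2)` would still need a tuple-to-Seq conversion that does not exist in the standard library.

```scala
// Hypothetical extension: turn any single value into a one-element Seq.
extension [T](x: T)
  def tup: Seq[T] = Seq(x)

val xs: Seq[Int] = (1).tup // parentheses are plain grouping here
val ys: Seq[Int] = 42.tup  // the same call without the parentheses
```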

rjolly · February 24, 2025, 9:23am

Maybe it’s because you do not teach data values, which in turn is because it’s not used in the industry, because it’s currently unsuitable(?) As someone working in the industry, I can tell you that being able to use the same language as for business logic for data and data validation would be immensely useful.

bjornregnell · February 24, 2025, 10:41am

We use case classes, tuples, enums, etc a lot and named tuples will be very useful as we introduce them in the coming fall semester. What I mean is that I don’t see improvement with the bracket-based syntax and in most cases it’s perfectly OK and more readable (in my subjective view - I haven’t tested empirically with students…) to be explicit and just use a Seq or a Vector or a case class or a tuple or an enum etc.

odersky · February 24, 2025, 12:09pm

My main problem is with data modelling. There is a very concrete notion of “a sequence of things”, never mind how it’s implemented, which comes up in lots of places. When I write an algorithm abstractly I would use [a, b, c] to express such a sequence. And then I’m sad that I can’t write the same in Scala.

bjornregnell · February 24, 2025, 1:03pm

But could not tuples do what you want?

odersky · February 24, 2025, 1:42pm

So far, I have found none of the arguments for tuple syntax convincing. Sequences with one element are the blocker, and all constructions for disambiguation look more cumbersome than what the syntax is worth. By contrast, brackets are universally established and I have also found none of the arguments that they would cause syntax clashes convincing. Lots of other languages prove that this is not the case.

bjornregnell · February 24, 2025, 2:15pm

If you think 42.tup is too cumbersome, what about the already available toString for tuples (42,) as syntax for seqs of one elem, which does not look so cumbersome to me as it is just one little tiny comma?

I think that if the only blocker for tuple-based syntax is the single elem seq case, then we should solve that rather than introduce a whole new syntactic thing and multiple ways of doing things. Or else, if we embrace bracket syntax, we should be regular and use brackets also for tuples in term position (named and unnamed). But I cannot see how brackets for tuple types could be used unambiguously in type position…

“Lots of other languages prove that this is not the case.”

But lots of other languages have lots of different semantics for brackets, and lots of other languages do not have named tuples using brackets for types and parens for tuples both in type and term position.