Pre-SIP: A Syntax for Collection Literals

satorg · January 18, 2025, 12:46am

Good point, but let me disagree on that. If it was true, then, for example, LabelledGeneric would have never been created in Shapeless (why?). Named tuples are not just about removing some friction – this feature introduces a possibility to create a type that can represent a nested data structure including field names and moreover can be shared among all data types with the same set of fields. For example, with named tuples we can do something like this:

type MyData = (
  aaa: (
    bbb: Seq[(ccc: Int, ddd: String)],
    eee: (
      fff: Seq[Double]
    )
  ),
  ggg: Boolean
)

Note that it is a type, not a class. Case classes, on the other hand, would require to have a separate class instance at each level of nesting. Moreover, we can use named tuples to run transformations between different case classes. (I’m not sure though if has been supported in Scala already, but if not, I believe it is a matter of time). We cannot use regular case classes for that.

But also as consequence of the above, yes, named tuples can reduce some friction too, which is a nice perk, but not the selling point, I think.

Collection literals, on the other hand, do not introduce anything new apart of the ability to shortcut Seq(1, 2, 3) to [1, 2, 3].

PS. It’s just appeared to me that even with named tuples we have to use Seq[Whatever] if we want to define a type for a collection of items. Therefore, if we really want to make collection literals blended into Scala syntax nicely, then we may need to get something like this:

type MyData = (
  aaa: (
    bbb: [(ccc: Int, ddd: String)],
    eee: (
      fff: [Double]
    )
  ),
  ggg: Boolean

Otherwise, it would be yet another point of confusion:

(123, "abc") has type (Int, String)
(name = "John", age = 33) has type (name: String, age: Int)
Seq(1, 2, 3) has type Seq[Int]
[1, 2, 3] sorry, but only Seq[Int] too

From this point of view, the entire collection literal feature does not seem belonging in Scala language in general, if you will.

tarsa · January 18, 2025, 12:56am

if the main point of this feature it to lure random programmers into scala then this should be widely surveyed. discussion between scala programming experts probably isn’t representative of average python programmer that scala architects want to lure into scala.

i guess that it would at least have to be presented on a couple of conferences around the world and talked about with conference goers before deeming it useful.

mdedetrich · January 18, 2025, 12:57am

I am not a real fan of this, the conciseness is not worth it when it comes to the extra complexity added to the language vs the current apply and there are also gnarly exceptions like with IArray

nikitaga · January 18, 2025, 1:02am

People already use Seq all the time for “don’t care”, and the proposed syntax is a more powerful “don’t care”

True, but people use Seq(1, 2, 3) for collection literals without much thought because they know that they’ll get a decent immutable Seq. Whereas the [1, 2, 3] syntax doesn’t even guarantee that it’s immutable. So for example if I’m calling a function that expects a mutable collection, (because it will mutate its input, and expects me to read its mutations this way) and I provide it with [1, 2, 3] as the argument, now I probably made a bug that would not have happened if I provided it with Seq(1, 2, 3) instead (which would not compile). So with [1, 2, 3] the “don’t care” use case kind of breaks down, because now I need to care where I use it. It’s not safe to use wherever, like Seq(1, 2, 3) is. If Seq(1, 2, 3) doesn’t work somewhere, the compiler will tell me. Annoying sometimes, but a good time to catch a potential bug. But with [1, 2, 3], the compiler will just adapt its type instead, assuming that I know better. But I may not. Especially if I’m a newbie Scala user coming from a different programming language – the motivating target audience for this new feature.

How often do people have problems with 1 :: 2 :: 3 :: Nil vs List(1, 2, 3) vs Seq(1, 2, 3).toList? Do people have problems with new Runnable{ def run() = ??? } vs () => ???? Or x => x.foo vs _.foo?

In each of these sets, each option is functionally equivalent, so the choice is entirely stylistic, depending on whether you want to be more concise or more descriptive. In contrast, [1, 2, 3] and Seq(1, 2, 3) are not the same. The former expression is not necessarily a Seq, and behaves in ways that other Scala expressions generally don’t (it can be one of different unrelated types, based on type ascription, but it can also work without any type ascription).

SAM types perhaps come closest to this kind of type-adaptive literal syntax sugar, but that’s an advanced language feature that is used much more judiciously than [1, 2, 3] would as a whatever collection. And even then, my experience with SAM on user types has been hit and miss in terms of ergonomics. It’s fine for Runnable, but only because Java forces a very verbose Runnable on us so we need to deal with it somehow. But existing collection literals don’t have nearly the same problem with verbosity. They’re nice and short, especially Seq.

tarsa · January 18, 2025, 1:06am

charpov:

What attracted me the most about the proposal initially was the idea of specifying a sequence type only at the definition site, not at the call site. So, I could write f([1,2,3]) and not care if f requires a List or an IndexedSeq or changes over time. But even that is of limited benefit. Once I start to write:
val nums = [1,2,3]
f(nums)
the compiler won’t be able to infer a suitable type for nums based on the definition of f.

rust supports that. here’s the claim that it’s due to hindley-milner type system: How does Rust's type inference work across multiple statements? - Stack Overflow

official example with explanation Inference - Rust By Example :

fn main() {
    // Because of the annotation, the compiler knows that `elem` has type u8.
    let elem = 5u8;

    // Create an empty vector (a growable array).
    let mut vec = Vec::new();
    // At this point the compiler doesn't know the exact type of `vec`, it
    // just knows that it's a vector of something (`Vec<_>`).

    // Insert `elem` in the vector.
    vec.push(elem);
    // Aha! Now the compiler knows that `vec` is a vector of `u8`s (`Vec<u8>`)
    // TODO ^ Try commenting out the `vec.push(elem)` line

    println!("{:?}", vec);
}

since it works in that rapidly gaining popularity language, we could ask seasoned rust devs how much value this provides.

otoh, python, javascript and many other languages with rich collection literals are dynamically typed, so they aren’t as good comparison reference as statically typed rust (for this particular case, i.e. inferring the type of collection).

mdedetrich · January 18, 2025, 1:09am

If Scala is arguably dying it would be for completely different reasons vs this one and I don’t see why slightly simpler collection literals should be a hill to die over.

In fact you can make a strong argument for the opposite, Scalas issue is that it has gone overly complex over time with too many ways to do the same thing (this is the most common complaint I have seen from new people) and this proposal actually makes the situation worse as we have added yet another way to create collections.

If you want to make Scala simpler to learn then you need to remove various alternatives and emphasize orthogonality and consistency and not just keep on adding things to try and lure people.

Also plenty of successful languages have been successful despite following the status quo of other languages. Look at Lua which is a language that counts from 1 instead of 0 (which is completely alien to almost every other language I know) and yet it’s highly successful, being the most popular inbuilt embedded language for games and engines.

lihaoyi · January 18, 2025, 2:21am

Lots of folks have mentioned that collection literals are a dynamic-language single-collection-type thing, which is not true. Modern static languages have them too, with rich collection hierarchies, and they work basically identically to the proposal here target-typing and all

C#: Collection expressions (Collection literals) - C# reference | Microsoft Learn

// Initialize private field:
private static readonly ImmutableArray<string> _months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// property with expression body:
public IEnumerable<int> MaxDays =>
    [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

public int Sum(IEnumerable<int> values) =>
    values.Sum();

public void Example()
{
    // As a parameter:
    int sum = Sum([1, 2, 3, 4, 5]);
}

Swift: Documentation

var someInts: [Int] = []

var favoriteGenres: Set = ["Rock", "Classical", "Hip hop"]

var airports: [String: String] = ["YYZ": "Toronto Pearson", "DUB": "Dublin"]

Collection literals aren’t a new idea, even in statically typed languages. Every other language has something like them to varying degrees. The only languages that don’t are Scala and Racket.

Cross-posting from the other thread the top 20 languages in the Redmonk June 2024 Ranking:

[...]: Javascript, Python, PHP, C#, Typescript, Ruby, Swift, Kotlin (https://youtrack.jetbrains.com/issue/KT-43871), Rust, Dart
{...}: Java, C++, C (in limited scenarios)
[...]int{...}: Go
c(...): R
@[...]: Objective-C
@(...): Powershell

Languages that don’t:

Seq(...): Scala
(list ...): Racket

Scala shouldn’t blindly copy everything other languages do, but when Scala is the odd one out it’s worth asking: is Scala is truly so special that its requirements are so different from everyone else? Or did Scala simply got it wrong when Martin Odersky designed it 2 decades ago, and maybe it’s worth trying to evolve the language to get it right?

hepin1989 · January 18, 2025, 2:42am

I just have one question, can we have this in 3.7 ?

mberndt · January 18, 2025, 5:22am

So basically your point is that because other languages have syntax like @(…), @[…] and c(…), the &(…) syntax enabled by @JD557’s library sketch isn’t good enough and therefore we need […].

The fact that the proponents of this syntax keep ignoring the best alternative that has been proposed so far makes me extremely suspicious. What are you really after here? Saving one character per list literal? Look like Python?

And when it comes to Go, their stuff is clearly worse than the Scala status quo because you need to specify the element type. And despite of that, Go seems to be doing fine in terms of adoption, so if anything, that is a point against adopting the proposed syntax.

satorg · January 18, 2025, 5:45am

I think it is a little of overstatement. Although I personally appreciate you brought it up, but the comparison is not very accurate and sometimes not fair. I cannot say for all the languages, but let me highlight out some of them.

Kotlin I’m not sure why you mark it as one that supports collection literals, but it seems that as for now, no, it doesn’t:
https://play.kotlinlang.org
Even if I choose the most recent beta version of Kotlin, this doesn’t compile:
```
val x = [1, 2, 3]
println(x)
```
Moreover, the ticket you’re referring to is still in the “Open” state, not even “In progress” or something. Yes, there are a plenty of discussion in the ticket, and many concerns out there are similar to ones in this thread – the feature doesn’t seem adding a lot of value to the language. I might be missing something though, but that is what I found out.
C# First of all, this language is long-known for its pretty extensive strategy of incorporating almost every feature straight into the language core: properties, events, SQL-like queries, async execution, etc. I don’t believe it is the way-to-go for Scala. Moreover, bracket-based collection literals in C# pair well with the array syntax, which is also bracket-based. Moreover, C# collection literals have a lot of pretty advanced syntax (like value spreading) which does add some value indeed (comparing to simple [1, 2, 3]). So if Scala wants all that, well, ok then, but I believe it should be planned thoroughly and accordingly in that case (rather than “hey, let’s give it a shot and then think”).

Nevertheless, here is a catch:
```
ImmutableArray<int> x = [1, 2, 3]; // compiles
var x = [1, 2, 3]; // OOPS: Compilation error (line 10, col 11): There is no target type for the collection expression.    
```
In other words, yes, C# does have collection literals, but as for now it is not exactly superior to the syntax that Scala already has.
C/C++/Java You mentioned that “in limited scenarios”, but in fact it is so limited to array initialization only. It stems from the very legacy array initialization in C and doesn’t allow anything apart of exactly that:
```
// Java
int[] a = {1, 2, 3}; // works, but hold on...

var a = {1, 2, 3}; // nope!

void foo(int[] a) { ... }
foo({1, 2, 3}) // nope!
```
In other words, fairly speaking, we cannot say that Java supports collection literals, because the level of support for collections in Scala is way more superior to one that Java has currently.

Bottomline: perhaps, there is a language that provides better support for collection initialization than currently Scala does (Rust maybe?), but in my opinion it is just not fair to outcast Scala as a language that doesn’t have support collection literals, taking into account that even without that feature Scala outperforms many other languages in terms of conciseness and clarity.

ssdeep · January 18, 2025, 5:52am

CanBeCollectionLiteral

dos65 · January 18, 2025, 7:14am

I would say that one of the biggest Scala issues is that community formed a big complexity myth and keep repeating it for years.

This forms how Scala is seen outside and affects the flow of newcomers.

I don’t get the following thing. Why people belive in complexity of lang and at the same time continue writing in it for years?
Why they worry about its future and keep spreading the negative view?
How do they think it will help to the language?

I think that one of the keys to the bright future of Scala is to find a key to a community to reduce this wrong believe.
I have no idea how let's stop doing any changes might inspire people to join the lang.

satorg · January 18, 2025, 9:02am

Hey guys… It has just appeared to me… Why do we need a new syntax for collection literals again? It seems that Scala already does have such syntax. I mean it. Literally Scala already has syntax for collection literals. Pun intended. Watch this:

val x = (1, 2, 3)

Now, x gets type (Int, Int, Int) or Int *: Int *: Int *: EmptyTuple, if you will.

Tuples are collections. They can be heterogenous though but don’t have to.

Thereby, all that Scala needs is to be taught how to initialize collections from tuples. That’s it. No new controversial syntax required. Here a quick and dirty example follows:

//> using scala 3.6.3

given emptyTupleAsList[A]: Conversion[EmptyTuple, List[A]] with
  def apply(t: EmptyTuple): List[A] = Nil

given tupleAsList[A, T <: Tuple](using
    c: Conversion[T, List[A]]
): Conversion[A *: T, List[A]] with
  def apply(t: A *: T): List[A] = t.head :: c(t.tail)

Having that we can get:

def printSeq[A](seq: Seq[A]): Unit =
  println(seq.mkString("[", ", ", "]"))

@main
def helloCollectionLiterals(): Unit =
  val none = Tuple() // yeah, this one is ugly
  val ints = Tuple(12345) // and this one too
  val strs = ("one", "two")
  val dbls = (12.3, 4.56, 7.89)
  val chrs = ('a', 'b', 'c', 'd')
  printSeq(none) // prints: []
  printSeq(ints) // prints: [12345]
  printSeq(strs) // prints: [one, two]
  printSeq(dbls) // prints: [12.3, 4.56, 7.89]
  printSeq(chrs) // prints: [a, b, c, d]

This one works too, but the inferred type will be Seq[Any] apparently:

  val notExactlySeq = (1, "two", 3.0, '4')
  printSeq(notExactlySeq)

You see, it works nice with a caveat that we cannot use ( ... ) syntax for tuples with arity 0 and 1. Which returns me back to the point I made in my very first message in this thread:

Scala does need a universal and palatable syntax for tuples, not collections. Tuples imply collections. Like, always. But we need a nice syntax for tuples of all arities starting from 0.

UPD.: Actually, there’s toList in Tuple already, so the conversion (if requred) can be even simpler. Anyway and moreover, I don’t see a reason to duplicate tuple literals (1, 2, 3) with the alternative, controversial and more restricted collection literal syntax [1, 2, 3].

rjolly · January 18, 2025, 9:53am

Maybe a crazy idea, but it would look even better if the syntax for named tuples was with square brackets. (I insist on named, of course).

sjrd · January 18, 2025, 1:24pm

It is interesting to note how many users mention “complexity of the language” about this feature, despite the fact that it is “very easy to implement and does not make the compiler any more complex”.

I think the most important underlying reason is choice. The “several ways to do one thing” argument has been brought up already, of course, but why? Choice is supposed to be good, right? Well, not always.

When introducing a new feature that does something we could already do before (and TBH, that’s most features), the important question is: when that feature is applicable, is it (almost) always the best choice? If yes, then a feature does not introduce more choice paralysis or “language complexity”. It is easy to tell learners when to use which features. However, this is clearly not the case for this proposal. There are plenty of situations where a collection literal would be applicable but wouldn’t be the best choice.

Looking back at some prominent changes from the past few years, I can see a pattern: features that were the most controversial were the ones that introduced new choice without always being better, and conversely.

Some very important changes that were not controversial. When applicable, these new features are always better than the old way of doing things:

enum: always better than sealed abstract class/sealed trait when your structure fits in an enum.
extension defs: always better than the implicit class extends AnyVal dance of Scala 2.

Nobody says enums or extensions made the language more complex. On the contrary. Yet, these features contain enormous complexity in the compiler implementation!

Some very important changes that were controversial (and still are): even when applicable, they are not unambiguously better than the old ways:

Indentation syntax (yes, that one!), and other syntax changes that came with it like if..then.
All the things that are supposed to be better than implicit conversions, but with limitations that make them not actually better in many situations.

Language stagnation is not what we want. We want progress. But when we introduce new features, they should be better than the old ways, every time they are applicable. If they are merely different than the old way in a non-negligible number of situations, that introduces unwanted choice. Unwanted choice leads to language fragmentation. That, we must avoid.

arturopala · January 18, 2025, 1:41pm

odersky:

val b1: BuildDescription = (
  declarationMap = true,
  esModuleInterop = true,
  baseUrl = ".",
  rootDir = "typescript",
  declaration = true,
  outDir = pubBundledOut,
  deps = [junitInterface, commonsIo],
  plugins  = [
    ( transform = "typescript-transform-paths" ),
    ( transform = "typescript-transform-paths",
      afterDeclarations = true
    )
  ],
  aliases = ["someValue", "some-value", "a value"],
  moduleResolution = "node",
  module = "CommonJS",
  target = "ES2020"
)

This is not a valid JSON syntax, and this is very good because we can use (..) for sequences and maps, and devise a very simple and orthogonal rule out of it:

one can omit collection or case clase name if the target type is known

that way, it will be up to the code author whether to write

plugins  = (
    ( transform = "typescript-transform-paths" ),
    ( transform = "typescript-transform-paths",
      afterDeclarations = true
    )
  )

or

plugins  = Seq(
    Plugin( transform = "typescript-transform-paths" ),
    Plugin( 
      transform = "typescript-transform-paths",
      afterDeclarations = true
    )
  )

that way it will nicely fit in a Scala way of doing things IMHO

hepin1989 · January 18, 2025, 1:47pm

I think we should move faster and yes, make the new syntax LLM-friendly too:)
I would like to have this in 3.7, and we will see, Java will start to add this in Java 26.

channingwalton · January 18, 2025, 3:48pm

I don’t think this is a compelling example because, in practice, this just isn’t how JSON is built in commercial systems for all kinds of reasons. We use libraries like circe to map from case classes because we need a rich model.

tarsa · January 18, 2025, 3:52pm

javascript, typescript: doesn’t have rich collection types. has specialized syntax for maps and sets separate from sequences. map retains insertion order so can be sorted, but there’s no separate sorted map type.
python: similarto js/ts, but also doesn’t even allow to sort a map.
php: from documentation: " An array in PHP is actually an ordered map". weird language skip.
kotlin: unfinished feature.
java, c++, c: there is only raw array initializer (so absolutely no choice of collection type) and you have to write full type next to it anyway. how is new int[] {1, 2, 3} better than Seq(1, 2, 3)?
rust: the [...] syntax always creates arrays. no collection type choice afaik (but i’ve forgotten rust somewhat since i’ve last used it).
dart: the built in syntax results in fixed (unadapted) collection types. SplayTreeMap<String, int> sortedMap = {'a': 1, 'c': 2, 'b': 3}; gives Error: A value of type 'Map<String, int>' can't be assigned to a variable of type 'SplayTreeMap<String, int>.
swift: doesn’t have rich collection types? is there even a sorted map? anyway, by looking at examples in documentation, it requires known target type first. no fallback default in cases like var whatever = [1, 2, 3].
c#: has rich collection types, static typing, generics, method overloading, etc just like scala and it probably the best reference comparison here.

to expand on c# in separate paragraph:

has no fallback defaults. if you don’t provide collection type in any way then the code won’t compile - i think that’s very good. the default could be bad anyway, e.g. i think Seq being List is a wrong default, since operations on singly linked list often unnecessarily degrade to O(n). something like a combination of current List (for low memory usage on small collections) + Vector (for balanced performance on non-trivial collections) would be a best default.
has rich collection types, static typing, generics, method overloading, etc so it’s one of the closest counterparts to scala from the list above
c# is a kitchen sink of programming language features, but still the c# authors often do some lengthy discussions (with many people involved) before deciding what to add to the (ever increasing) mix
has to deal with complicated rules of method overloading.

Many APIs are overloaded with multiple collection types as parameters. Because a collection expression can be converted to many different expression types, these APIs might require casts on the collection expression to specify the correct conversion. The following conversion rules resolve some of the ambiguities:

A better element conversion is preferred over a better collection type conversion. In other words, the type of elements in the collection expression has more importance than the type of the collection. These rules are described in the feature spec for better conversion from collection expression.

Conversion to Span, ReadOnlySpan, or another ref struct type is better than a conversion to a non-ref struct type.

Conversion to a noninterface type is better than a conversion to an interface type.

When a collection expression is converted to a Span or ReadOnlySpan, the span object’s safe context is taken from the safe context of all elements included in the span. For detailed rules, see the Collection expression specification.

how does the ‘simple’ prototype implementation for scala deal with ambiguities during method overloading? is this precisely specified? what about project caprese? will capabilities dictate what overload is chosen?

also the automatic conversion of collection literal to required target type sounds awfully close to implicit conversions which scala 3 want to get rid of.

if we go with c# route and require that compiler knows the target type before converting collection literal to that target type then we can easily support sets and maps, sorted and unsorted. i think that rule (compiler knows target type before converting collection literal to it) is essential to make the complexity (cognitive load) manageable.

note that requiring knowing target type first will probably make the json examples less feasible, but imho that’s not a problem, since json can be (and should be) handled by string interpolation and/or serialization. the json example is overall weird to me, since the json-like scala syntax is completely incompatible with json. there’s no subset of proposed scala syntax and existing json syntax that would compile under both languages. what value is here if we need to adapt the text representation when moving it from scala to json and vice versa?

mdedetrich · January 18, 2025, 4:12pm

Im sorry but Scala being more complex is an objective fact, its like the C++ of high level languages. It mixes a lot of ideas and also has a lot of paradigms while also dealing with being built ontop of another language/ecosystem (i.e. Java and JVM) which brings its own complexity. Note that a language being complex doesn’t prevent writing simple programs in it, but it does mean that there are often many ways to solve a single problem (due to the large amount of paradigms).

Maybe because there are advantages to complexity because it allows you to both express things and solve complex programs in an elegant manner? A good example is the automatic resource management in libraries like cats-effect/zio and a good counter example as to why being “dumb simple” isn’t always great is too see the vast amounts of boilerplate in languages like Go.

Being honest about a languages shortcomings is the only way to improve it, drinking the kool aid does not.

By refocusing on what a lot of people think actually matters. I could very easily claim as to how introducing another syntax for expressing something that we can currently already in 4 different ways is entirely solving the wrong problem.

I, as well as others, can hardly see how writing [1,2,3,4] vs List(1,2,3,4) is even a minor (let alone major) reason why Scala is not doing well.

No one is saying not to do any changes, quite the opposite. People are saying that doing THIS specific change is not really helping, if anything its arguably making things worse.

I mean as @tarsa is pointing out, calling this change simple is also highly deceptive as it opens the door to so many complexities, largely to do with the fact that fundamentally Scala is a language that contains many rich data structures out of the box as part of stdlib and because of that this feature will create more confusion in non trivial cases which is not something we really want unless there is a massive benefit and slightly shorter syntax doesn’t count as a massive benefit.