Pre-SIP: a syntax for aggregate literals

lihaoyi · December 24, 2024, 11:49pm

This is true. We do not yet have experience with any new syntax, but one thing we can do here is to look at other languages which do have experience with it.

Python has been using square brackets for both types and collections for a few years now. I am not aware of any complaints about the reuse of the syntax. There are lots of other complaints about Python syntax and Python types in general, but the reuse of the square bracket syntax just does not seem to be an issue in practice.

hepin1989 · December 25, 2024, 7:50am

A new language, MoonBit, uses aggregate literals too:
https://docs.moonbitlang.com/en/latest/language/fundamentals.html#array
e.g.

let map : Map[String, Int] = { "x": 1, "y": 2, "z": 3 }

mberndt · December 25, 2024, 2:45pm

It is completely implausible that the way to make a language that is infamous for having multiple ways to do the same thing easier to learn would be to add more ways to do the same thing. Beginners will still have to learn the current way to create collections since that’s what every Scala codebase out there is using now, so the proposed new syntax is just one additional thing to learn, and not a simple one since you need to understand typeclasses to really grok what it does. Justifying these collection literals as being “easier to learn” is a post-hoc rationalization of a feature with no value beyond aesthetics.

The way to make the language easier to learn is to make it more compact, regular and orthogonal, not to add random syntactic sugar. If we want a language that is easier to learn, we need to push the Scala community towards a more unified style. This requires providing guidance in the form of documentation (some PEP-8 kind of thing for Scala), as well as tooling (compiler diagnostics, quickfixes and rewrites) to make it easier for people to adhere to these conventions. What doesn’t help is collection literals.

ragnar · December 25, 2024, 4:03pm

Using syntax that is familiar to developers from other languages is no indication that it is easy to learn.

To the best of my knowledge, learnability of languages is not well researched.
One direction I am aware of is Felienne Hermans’s work. One interesting point she made is that being able to read code aloud is quite helpful for beginners, one reason being that it allows them to talk about code:
Code Phonology – Felienne Hermans
Her teaching language Hedy introduces syntax stepwise until it ends up teaching Python. It begins by using just , to define lists to avoid syntactic overhead, and introduces []-based syntax much later (in level 16).

I think List(1, 2, 3) is still better than [1, 2, 3] in regard to pronounceability.

The other direction about teachability is “How To Design Programs” which uses restricted variants of Racket (S-Expressions) to essentially have no syntax variations at all, because any additional syntax is just distracting for learners.

See @mberndt’s elaboration why adding syntax is unlikely to make things easier to learn.

jducoeur · December 25, 2024, 4:16pm

While that argument is intellectually plausible, it kind of says that the only good language for learning is Scheme. (Which many people would agree with, but it’s a bit of a reductio ad absurdum.) Empirically, I’d say the history of programming languages and what has proven popular suggests that this is a relatively minor consideration in practice.

mberndt · December 25, 2024, 4:19pm

No it doesn’t, because it very much depends on what concepts you want to learn about. You’re not going to learn pointers or System-F-style type systems with Scheme. And there is no interesting concept in programming that collection literals enable you to learn about.

MateuszKowalewski · December 26, 2024, 1:59am

This listing is very misleading.

JavaScript uses […] for arrays, not collections in general. Specialized collections like Map have actually their own regular constructors (like in Scala). But JS has object literals, something I actually miss in Scala more than collection literals. But that’s another story.

Python’s […] is just a shorthand for list, and again not a general “collection literal”.

Typescript? That’s just JavaScript syntax (+ type annotations)…

Better not talk about Ruby. Placing this as “good example” is kind of a joke, I guess.

Swift has only an array literal… That’s again not a collection literal.

Kotlin? A few lines later we get to know that this is not implemented, and not even decided.

Objective-C? What?

Rust’s […] syntax is also just for arrays. To create a proper collection type such as Vec<T> with shorthand syntax, you need macros, which then look like a kind of weird constructor call, e.g. vec![…].

In Dart […] is the List literal. (Sets and Maps actually use {} syntax in Dart, and there is also no “collection literal”.)

Java’s {…} is again just the array literal.

C doesn’t have this feature at all as it does not have collections, and not even proper array types.

All the rest are just symbolic constructor calls! (Something that could be trivially simulated in Scala with an alias.) Maybe besides Go, where this is syntactically not even a proper constructor call but some “funny” special syntax which is even more irregular than anything elsewhere. But at this point it’s obligatory to mention that Go is considered “easy to learn”. Despite its weird collection constructors.
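
To illustrate the remark about simulating symbolic constructor calls with an alias, here is a minimal sketch in current Scala 3. The alias names `L` and `M` are made up for this example and are not part of any proposal; the point is only that a short name bound to a companion object already gives a compact constructor call.

```scala
// Hypothetical short aliases for the List and Map companion objects.
// The names `L` and `M` are illustrative only.
val L = List
val M = Map

// Each call below goes through the companion's ordinary `apply` method.
val numbers = L(1, 2, 3)
val sushi = M("eel" -> 2, "salmon" -> 5)
```

Because the aliases are plain values, no new syntax or compiler support is involved.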

So in fact only C#, PHP, and Ruby from the list above have a feature like that, and Perl, which was not mentioned, where PHP and Ruby actually got this syntax from. I wouldn’t call any of these languages a pinnacle of language design…

BTW: C# uses implicit conversions and target typing to implement its collection literals. Just saying…

Oh, and there is C++ of course. Where no sane person knows all the syntactic variants to initialize something. The C++ sequence syntax is just one more annoyance in that regard.

If we want a shorthand, then only integrating this feature with (some variant of) tuple syntax makes sense in the context of Scala, imho.

I would especially not copy anything from the Perl tradition! But that’s exactly where such kind of symbolic bracket syntax is from.

Instead I would argue to make (named) tuples more powerful. So we get well-working generic heterogeneous “collections”, where a collection containing only values of the same type becomes just a specialized case. This would make the language more orthogonal, and reduce the set of basic constructs one needs to understand: “Everything is a tuple.” The syntax is then of course obvious.

MateuszKowalewski · December 26, 2024, 2:42am

Not only for beginners.

Go read some ancient code in, say, APL or Perl, and then compare it to some code from the same era written in COBOL or Ada. I guess I know which is easier to understand ad hoc…

Symbols may be “obvious” if you know them already, but if you don’t know them, only fully written-out words are readable (and likely understandable without much up-front learning).

This observation actually includes also the use of abbreviations in code. Something that should be considered hostile obfuscation imho; exactly as overuse of symbols. With modern code editors / IDEs there is absolutely no excuse to use abbreviations in code!

Scala code is actually especially terrible in both regards. It has a history of symbol overuse, and to make things even worse “typical Scala” uses brainless abbreviations everywhere, which make Scala code as hard to read as old C/C++.

That’s imho the true reason why Scala is considered hard to get into. It always has vibes of submissions to the obfuscated C contest, simply because of the typical code style everybody is aping. (Just look, for example, for abbreviations in the compiler codebase itself. They are everywhere, usually paired with such well thought out variable names as “a”, “l”, “nss”, or other meaningless ASCII salad. Such an adverse code style makes it really hard for outsiders to contribute! (Re-)Naming things is really easy. Just press F2. And in case you can’t come up with a name, ask AI. That’s one of the few things it’s actually reasonably good at.)

All that said, I would welcome some “data literals”, as I said already before. But not at the price of misusing type syntax, and especially not if all we get is just a shorthand for writing Seq or Map, and nothing more. @mberndt is right here, this is a useless no-feature, which will only add more special syntax to learn—for absolutely no expressiveness gain.

Adding stuff almost never makes something simpler! Simplicity is only reached if you can’t remove anything any more. Adding stuff (especially if it’s redundant stuff!) makes things only more complicated usually.

Ichoran · December 26, 2024, 5:30am

I don’t remember why we didn’t want to use the spread operator for this.

val x: Seq[Int] = (1, 3, 5, 2, 7)*

creates a tuple literal, then, if there’s an appropriate typeclass, spreads it into a constructor for Seq[Int].

As a nice perk, there’s no reason this shouldn’t work for conversion between sequence types too.

val x: DataEntry = (name = "John", age = 42, color = Web.PaleVioletRed)*

packs into a case class with names as defined. (Need to decide whether unnamed tuples come along for free or whether you have to add an extra typeclass.)

And, I guess,

val sushi: Map[String, Int] = ("eel" -> 2, "salmon" -> 5)*

can work too, using the same scheme as [].

It’s a rather less dramatic syntactic change, and it’s more explicit that magic is happening because you say the magic everywhere.

Because we already have the idea of the spread operator, it’s a less dramatic change than mixing type notation and instance notation. (Again, unless we view [1, 5, 2] as a type literal from which we can summon an instance.)
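
The idea of spreading a tuple into a constructor can be roughly approximated in today's Scala 3, without any new syntax. The sketch below is only an assumption about how such a conversion might behave: the helpers `spreadInto` and `spreadToSeq` are hypothetical names standing in for the proposed `(...)*` suffix, `DataEntry` is simplified to two fields, and neither helper performs the static element-type checking that a real typeclass-based design would presumably provide.

```scala
import scala.deriving.Mirror

// Simplified, hypothetical version of the DataEntry class from the post.
final case class DataEntry(name: String, age: Int)

// Hypothetical stand-in for `(...)*` with a case class as the target.
// NOTE: `fromProduct` does not statically check the tuple's element types.
def spreadInto[A](t: Tuple)(using m: Mirror.ProductOf[A]): A =
  m.fromProduct(t)

// Hypothetical stand-in for `(...)*` with a sequence as the target.
// NOTE: the cast is unchecked; this is a sketch, not a safe implementation.
def spreadToSeq[A](t: Tuple): Seq[A] =
  t.productIterator.map(_.asInstanceOf[A]).toSeq

val x: Seq[Int] = spreadToSeq[Int]((1, 3, 5, 2, 7))
val entry: DataEntry = spreadInto[DataEntry](("John", 42))
val sushi: Map[String, Int] =
  spreadToSeq[(String, Int)](("eel" -> 2, "salmon" -> 5)).toMap
```

The target type still has to be spelled out at each call site here, which is exactly the part the proposal would infer from the expected type.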

lihaoyi · December 26, 2024, 12:24pm

ragnar:

To the best of my knowledge, learnability of languages is not well researched.
One direction I am aware of is Felienne Hermans’s work. One interesting point she made is that being able to read code aloud is quite helpful for beginners, one reason being that it allows them to talk about code:
Code Phonology – Felienne Hermans
Her teaching language Hedy introduces syntax stepwise until it ends up teaching Python. It begins by using just , to define lists to avoid syntactic overhead, and introduces []-based syntax much later (in level 16).

I think List(1, 2, 3) is still better than [1, 2, 3] in regard to pronounceability.

The other direction about teachability is “How To Design Programs” which uses restricted variants of Racket (S-Expressions) to essentially have no syntax variations at all, because any additional syntax is just distracting for learners.

See @mberndt’s elaboration why adding syntax is unlikely to make things easier to learn.

Trying unusual things in new teaching languages is fine, in fact it’s par for the course. But how do we measure how successful a language is at being learnable? One way, of course, is to count how many people actually successfully learned the language. By that measure, Python has done a great job at getting a wide range of not-even-really-programmers to write code, far more than Scala or Racket or what have you.

If we want Scala to be as popular as Racket, then we should by all means follow Racket’s lead in how to design the programming language. If we want Scala to be an experimental teaching language, then by all means we should follow what other experimental teaching languages are doing. But I think Scala can do better than that, and I think many in the Scala community agree that “as popular as Racket” or “experimental teaching language” isn’t where we should be aiming.

If “adding syntax is unlikely to make things easier to learn”, the easiest language to learn would be MIT Scheme, because that has the least syntax and is indeed an extremely elegant language. x86 machine code has pretty minimal syntax as well; who even needs an assembler, since it just adds additional syntax? But the fact that we’re not all writing MIT Scheme or x86 machine code demonstrates that yes, additional syntax can in fact make things better.

Adding new syntax can also make things worse. But there is no reason to believe that the Scala language syntax @odersky came up with in 2004 is the paragon of optimal language syntax. So all this talk of “we should change nothing because everything we do would make Scala worse” is really not productive. If we want to improve the Scala language, this sort of “We’ve tried nothing and we’re all out of ideas!” discussion is not the way to do it.

odersky · December 26, 2024, 7:41pm

Maybe the most immediate use of collection literals is in a public scripting setting where we want code to read well to people who don’t know much about the language. If a total newcomer to Scala sees code and comes away thinking “that’s actually very legible”, it’s a win. And I believe simple collection literals help, mostly because they are familiar. Seq(...) and Map(...) are perfectly fine once you have learned the principle of apply methods and vararg arguments. But before that, they look more mysterious than a simple [...], which by now is universally recognized as a sequence literal (or maybe array literal, it really does not matter!).
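
For readers who have not yet met the principle mentioned here, the following short sketch shows what Seq(...) and Map(...) expand to in current Scala. It uses only standard library calls; the value names are arbitrary.

```scala
// Seq(...) is shorthand for calling the companion object's varargs `apply`.
val a = Seq(1, 2, 3)
val b = Seq.apply(1, 2, 3)

// `key -> value` builds a pair, so Map(...) is a varargs call taking tuples.
val m1 = Map("eel" -> 2, "salmon" -> 5)
val m2 = Map.apply(("eel", 2), ("salmon", 5))

// An existing sequence can be passed to a varargs parameter with `*`.
val c = Seq(a*)
```

Each pair of definitions produces equal values, which is the background knowledge a newcomer needs before `Seq(...)` stops looking like special syntax.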

If we want to bank that win, it’s important that the syntax is as familiar as possible. I was very intrigued by the (...)* proposal, but in the end it also does not pass the familiarity test. So I think [...] wins, and it’s very important that it returns by default a Seq or a Map without having to require an expected type.

Anecdotally, I remember a conversation with Rich Hickey a long time ago, where he asked me a bit incredulously why Scala did not have collection literals. It seemed such an obvious feature to him.

hepin1989 · December 26, 2024, 8:26pm

I think the JSON like one is great, where Array is [...] and Map is {...}
(...)* is weird.
https://learnxinyminutes.com/nim/

And I sometimes want to have something like:

case class Person(name:String, age:Int)
val p: Person = {name:"Alice", age:18}
// will look even better when nested:
val persons: Seq[Person] = [
  {name:"Alice", age:18},
  {name:"Bob", age:19}
]

You can see that Dart is pretty simple here:

var gifts = {
  // Key:    Value
  'first': 'partridge',
  'second': 'turtledoves',
  'fifth': 'golden rings'
};

var nobleGases = {
  2: 'helium',
  10: 'neon',
  18: 'argon',
};

So I would suggest we follow what Dart does, plus:

val persons: Seq[Person] = [
  {name:"Alice", age:18},
  {name:"Bob", age:19}
]

hepin1989 · December 26, 2024, 8:56pm

Why would a user write List(1,2,3)?
It needs:

`Shift + L`, i, s, t, `Shift + (`, 1, 2, 3, `Shift + )`

vs

[ 1 , 2 , 3]

life is short!!!

megri · December 27, 2024, 2:48pm

Regarding the relation between a and Tuple1(a), why are they not just the same thing? Seems like wrapping a value in Tuple1 is mostly just wasteful and makes the value harder to work with.

megri · December 27, 2024, 2:56pm

I think one issue that makes collection literals less of a clear win in Scala is that Scala exposes a lot more variants and expects users to make an informed decision. Vector, List, Array, ArrayList; things that indirectly become these like Seq, IndexedSeq, IArray, Builders and so on.
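
As a small illustration of the choice described above, here are the same three elements built with several standard Scala 3 collection types. The value names are arbitrary; the sketch only shows that each representation is currently selected by naming it.

```scala
// The same three elements, with the representation chosen explicitly each time.
val asList: List[Int] = List(1, 2, 3)       // immutable linked list
val asVector: Vector[Int] = Vector(1, 2, 3) // immutable indexed sequence
val asArray: Array[Int] = Array(1, 2, 3)    // mutable JVM array
val asIArray: IArray[Int] = IArray(1, 2, 3) // immutable array (Scala 3)

// Seq is an interface; in the current standard library Seq(...) builds a List.
val asSeq: Seq[Int] = Seq(1, 2, 3)

// Converting after the fact is explicit as well.
val converted: Vector[Int] = asSeq.to(Vector)
```

A bracket literal would have to pick one of these by default or rely on the expected type, which is the decision the post is pointing at.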

By comparison JavaScript (I’m not familiar enough with Clojure to make an informed statement but being a LISP it seems like a somewhat safe assumption) has arrays and dicts and that’s pretty much it. It makes sense to have literals for these since there’s not really a decision to make.

felher · December 27, 2024, 3:14pm

Just wanted to add my voice to the list of opinions about the Pre-SIP:

I like the initial proposal with rewrites to apply, and not only collection literals. In fact, I’ve already written code similar to it:

perlinNoise(
  seed = 0,
  sideLength = 512,
  layers =
    (frequency = 2 , persistence = 0.5  ),
    (frequency = 10, persistence = 0.25 ),
    (frequency = 20, persistence = 0.125),
    (frequency = 40, persistence = 0.125)
)

This is from some recent code that uses named tuples and I find it very readable. In my opinion, the missing “constructor” that would be there if it used a case class for a perlin noise layer would not add to legibility. In fact, I find both the naive case class approach, i.e.

perlinNoise(
  seed = 0,
  sideLength = 512,
  layers =
    PerlinLayerConfig(2,  0.5),
    PerlinLayerConfig(10, 0.25),
    PerlinLayerConfig(20, 0.125),
    PerlinLayerConfig(40, 0.125)
)

as well as a “complete” approach, i.e.

perlinNoise(
  seed = 0,
  sideLength = 512,
  layers =
    PerlinLayerConfig(frequency = 2,  persistence = 0.5),
    PerlinLayerConfig(frequency = 10, persistence = 0.25),
    PerlinLayerConfig(frequency = 20, persistence = 0.125),
    PerlinLayerConfig(frequency = 40, persistence = 0.125)
)

less readable. I think with the initial proposal, [] as syntax and no varargs it would be written as

perlinNoise(
  seed = 0,
  sideLength = 512,
  layers = [
    [frequency = 2 , persistence = 0.5  ],
    [frequency = 10, persistence = 0.25 ],
    [frequency = 20, persistence = 0.125],
    [frequency = 40, persistence = 0.125]
  ]
)

which is very close to my initial code and I find it still very readable.

With that out of the way, a couple of points. I like the initial proposal, because:

It is close to code I’ve already written
Even though it can be misused, like any feature, I don’t think it lends itself overly easily to misuse, even though it can be used in a wide range of circumstances.
I too find it uses existing Scala infrastructure pretty well and is easy to understand as soon as you understand .apply, which is a must for Scala developers anyways.
I seem to find visual noise much more detrimental than most of you, I think, and I find [] to be less visual noise than named constructors, …
Scala is my language of choice, but I find other languages (like Typescript) more lightweight. I think this would help narrow the gap.

I’m aware that the points I made above are largely subjective. Just wanted to add my opinion into the mix.

morgen-peschke · December 31, 2024, 1:13am

I highly doubt that any intrinsic characteristic of Python has done more to get it picked up by not-really-programmers than its being embedded in a variety of useful contexts.

Not-really-programmers pick up what’s available in the environment they need to do stuff, and the language features don’t make that much of a difference (as long as they’re not actively and overwhelmingly hostile to users).

JS is massively popular despite being a hot mess of a language for exactly this reason, and I would strongly suggest we don’t take design cues from JS (and I’m personally of the opinion that crediting too much of Python’s popularity to its design is a mistake driven by survivorship bias).

hkt · December 31, 2024, 1:27am

While following this SIP thread, I was thinking the same: are we giving Python so much credit that we end up borrowing some of its bad parts?

It should be the other way round: other languages should look to Scala as an inspiration. I am not saying that none currently do.

nafg · December 31, 2024, 2:03am

Could we
(a) not make radical new language changes until IntelliJ has caught up with the existing ones
(b) take a survey of the wider Scala community (not everyone does the forums) to get opinions – not just a binary who is for or against, but how strongly people feel. For instance, if 5% feel like Scala urgently needs this, 75% think it is simply a nice-to-have, and 20% feel it would be an absolutely terrible idea, maybe we’re better off leaving it out.

mberndt · December 31, 2024, 7:36pm

That is an excellent point. If we want to make the language easier for beginners (and again, I don’t think that is actually the intention here, I think that’s a post-hoc rationalization), then copying features from other languages is not the way to do it. You make things easier for beginners by carefully observing what they’re actually struggling with, and then carefully thinking about how to prevent those problems. More often than not it’s actually tooling and getting their basic setup working. After that, simple parser-level syntax errors like forgetting a comma. Maybe what we really need to make Scala more beginner friendly is LLM integration for Metals that helps them balance their parens.

I struggled so much with Scala, I just couldn’t remember the syntax for Lists. Was it (1,2,3)List? Or (List, 1, 2, 3)? Dangit!

… said nobody anywhere ever.