Pre-SIP: A Syntax for Collection Literals

I already did that, kind of! A library solution is mostly adequate.

The strategy I took was to define a slice as an opaque type over a Long (yes, this means that you can’t stride, but those cases are comparatively rare) and have one type be “collection relative”, where if you want to be relative to the end you use End. Then I decorated Array with a ton of methods that take these.

It’s quite nice! I rarely miss Python slices any longer. It’s not quite perfect, because I have to avoid clashing with existing names.

But all over my code now I have things like

val ys = xs.select(1 to End-1)
ys.edit(3 to 10): (y, i) =>
  if y*i > 10 then -y else 2*y
ys(End-5 to End-2) = 0
ys(_ < -2) = -2
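
To make the approach concrete, here is a minimal, self-contained sketch of how End-relative indices could be encoded in an opaque type over a Long. The encoding (a high bit marking “counted from the end”) and all names here are my own illustration, not the library described above:

```scala
object EndRelative:
  // An index is a Long; one high bit marks "relative to the end".
  opaque type Ix = Long
  private val EndBit = 1L << 62

  object End:
    // End - k denotes "k elements before the end".
    def -(k: Int): Ix = EndBit | k.toLong

  extension (i: Int)
    // A plain absolute index.
    def ix: Ix = i.toLong

  extension (idx: Ix)
    // Turn an index into a concrete offset for a collection of this length.
    def resolve(length: Int): Int =
      if (idx & EndBit) != 0 then length - (idx & ~EndBit).toInt
      else idx.toInt
```

With this, `(End - 2).resolve(10)` gives 8, and slice methods on Array could accept pairs of such Ix values.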

Anyway, my conclusion from this is that you don’t really need the language to do anything for you. Regular syntax is enough unless you actually want to just copy the exact Python syntax so you don’t need to relearn it. Otherwise, I find 3 to End a lot clearer than 3:.

(If one wanted to intercept the usual 1 to 5 by 2 syntax of ranges via a macro, one could also do that. I find it still a bit too much work to make everything a macro, so I did without.)

Runnable version: Scastie - An interactive playground for Scala.

8 Likes

Having in-line XML in Scala made my keystone Scala project possible. Having in-line JSON would be amazing. I’ve worked on many projects that needed short snippets of these standard external formats for communicating with other systems. Maybe if this feature were packaged up as part of full support for JSON in Scala, it would be more attractive. Just checking json"{...}" format at compile time would be valuable.

However, something that “kinda acts like JSON” but isn’t “paste in JSON” doesn’t provide that value, and has none of the appeal. A non-standard, internal format doesn’t fit this common use case as well as json"".

In contrast, Seq(a,b,Seq(c,d)) structures have worked really well when I’m teaching. The better students hit command-B on Seq and discover apply() methods. (“What the heck is unapply()?” - it’s been that good.) Square brackets offer far less for them to find, especially if some unseen implicit holds everything together. Adding a 1991 Python feature, I think, takes teaching in the wrong direction.
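
For readers following along, the symmetry those students discover looks like this, using only the standard library:

```scala
// Seq(1, 2, Seq(3, 4)) is sugar for Seq.apply(1, 2, Seq(3, 4))...
val xs = Seq(1, 2, Seq(3, 4))

// ...and pattern matching runs the dual extractor, Seq.unapplySeq:
val described = xs match
  case Seq(a, b, inner: Seq[?]) => s"$a, $b, then a nested Seq of ${inner.length}"
  case _                        => "no match"
// described == "1, 2, then a nested Seq of 2"
```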

The other big use case discussed is build systems. Bleep uses a .yaml file for input - and simple Scala trait implementations for its plugins. The kids have no problem switching between the two. Bleep is a joy to use for projects that fit in its nascent ecosystem. Maybe an external .yaml file for lists of common things is a better general path forward for the build system use case.

3 Likes

It depends on what you mean by checking. Just checking syntax is not enough. I don’t know JSON very well but in XML you have schemas that you can validate against. In Scala this would be done by type checking against a type ascription:

val x: MyCaseClassOrCollectionThereof = json"{...}"

I’m not sure how this could be done concretely, without substantial macro machinery, and even then, do we have expected type information in macros? Again, I don’t know enough of macros to tell.

Well, they came across to me as a voice of reason. I believe we have evolved as a species to solve problems through diverse personality types. So even if you are disposed to see the negative over the positive, it’s always good to have at least one person with that disposition in a decision-making group.

So I would say please keep up the snarkiness, even though I don’t doubt that down the road it will be one of my bright ideas that you’re throwing cold water on.

As someone outside the main contributors circle, it seems like Scala has gone from one extreme to the other. For years it seemed virtually impossible to get anything changed. Now everything has to change, and by last week too. In the past it seemed that everything was sacrificed on the altar of “simplicity” and orthogonality; no special case could ever be considered. Now it seems like every special case under the sun must be catered to.

It’s good that we realised that people don’t want simple languages. I never believed that the attacks on Scala’s complicatedness were made in good faith. But it’s as if we’ve done an about-turn and are now deliberately trying to make the language more complicated.

Code and Data are different. Code needs a higher level of verbosity than Data, because Code can be anything, whereas Data can be succinct: the user of the data knows what it means and knows the types of the Data being parsed.

5 Likes

(This is all a bit of a tangent from Collection Literals at this point.)

We demonstrated “just checking the format” for XML was very valuable for projects with lots of little, boring snippets of XML. Those are more rare in 2025, out-competed in the ecosystem by smaller, still boring snippets of json.

We have a healthy ecosystem of libraries for processing json. I agree that handling json really should be in the province of those libraries, even at compile time.

I think tasks to support that work would add more value than the proposal here might add.

I am mildly curious whether there has been any more convincing (or unconvincing) around the previous suggestion to develop special-syntax string templates and interpolation as a way to “embed” literal syntax. To recapitulate some points in its favour:

  1. it follows the precedent of JSON literals in circe or SQL literals in doobie (to name but a few);
  2. it ring-fences (in a triple-quote ring) new syntax literals from the rest of the language, so the new syntax can be as feature-rich as desired without fear of harm to the old;
  3. it is library based, even standard-library based, which makes it easier to iterate, experiment, refine, or deprecate features or designs. Indeed, someone could write and publish one such template for each proposal and release them “next week”.

Edit: In short: why just use the same syntax that simpler languages do, rather than use a feature (typed string interpolation) that puts Scala ahead of Haskell or Java?

1 Like

I’m still not sure if string interpolation can undergo target typing. @odersky?

It can.
In the end, it’s just a “normal” method

implicit class Foo(sc: StringContext):
  def n[T:Numeric](args: Any*) = summon[Numeric[T]].zero

val i: Int = n""
val bi: BigInt = n""
1 Like

Interesting. So a dummy implementation of the present topic could be:

trait ExpressibleAsCollectionLiteral[+Coll]:
  type Elem
  inline def fromLiteral(inline xs: Elem*): Coll

given [T] => ExpressibleAsCollectionLiteral[Seq[T]]:
  type Elem = T
  inline def fromLiteral(inline xs: T*) = xs

extension (sc: StringContext)
  inline def coll[Coll : ExpressibleAsCollectionLiteral as e](inline args: e.Elem*) = e.fromLiteral(args*)

coll"[${1}, ${2}, ${3}]": Seq[Int] // ArraySeq(1, 2, 3)
1 Like

But, it does not work for case class literals. Suppose we have:

case class Person(name: String, age: Int)

To construct it, we would have to use something like [name=$name, age=$age] but, as per the string interpolation mechanism, which is based on varargs, the type of the parameters would have to be Any - that is, the LUB - thus losing type information.
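
The loss is easy to demonstrate with a toy interpolator (rec and its names are hypothetical, for illustration only):

```scala
case class Person(name: String, age: Int)

extension (sc: StringContext)
  // A varargs interpolator: the compiler widens all arguments to Any here.
  def rec(args: Any*): Vector[Any] = args.toVector

val name = "Martin"
val age  = 42
val fields = rec"[name=$name, age=$age]"

// fields is a Vector[Any]; the only way back to a Person is via runtime casts:
val p = Person(fields(0).asInstanceOf[String], fields(1).asInstanceOf[Int])
```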

So Martin’s data values proposal is still relevant.

I agree; the point isn’t to make conversion easier, but to reduce the confusion and frustration experienced by experienced developers coming from other languages. Converting working Python or other code to Scala exposes newcomers to various confusing and non-orthogonal syntax.

Not per se: you can define your interpolator however you like.
Want name and age? works!

implicit class Foo(sc: StringContext):
  def person(name: String, age: Int) = Person(name, age) // omitting checks for the string parts in sc for brevity

person"${"Martin"}${42}"

Interesting, but we’d need to be able to abstract over any case class.
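
One possible route, sketched here as an assumption rather than a worked-out design: scala.deriving.Mirror can already construct any case class from a typed tuple of its fields, which is exactly the type information the varargs encoding loses.

```scala
import scala.deriving.Mirror

case class Person(name: String, age: Int)

// Build any case class P from a tuple matching its field types exactly.
def fromFields[P <: Product](using m: Mirror.ProductOf[P])(fields: m.MirroredElemTypes): P =
  m.fromProduct(fields)

val martin: Person = fromFields[Person](("Martin", 42))
// fromFields[Person](("Martin", "42")) would not compile: field types are preserved.
```

The remaining work would be parsing the string parts into field positions at compile time; the construction side, at least, abstracts over any case class.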

Use-cases certainly drive language development, but I also strongly believe that language features attract use-cases.

One of the things Python did well was to provide a simple, concise, boilerplate-free language for talking about (numerical) data, which the community as a whole has fully embraced and could rely upon for external libraries as well, bringing simplicity and clarity to code. Chief amongst those language constructs are the list syntax and the slicing notation.

Considering only the list syntax, take the following pandas example:

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

And how it’d look in current Scala (using tabula):

val ds = Dataset(
    Map(
        "Name" -> Seq(
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ),
        "Age" -> Seq(22, 35, 58),
        "Sex" -> Seq("male", "male", "female"),
    )
)

Besides making things shorter and clearer, I also wanted to emphasise that a syntax for collection literals would enable focusing on intent and structure, ignoring the mechanism through which the data that you declare is passed.

If you’re not versed in the DS world, after seeing this short snippet you might doubt the significance of the impact the new syntax would bring. After all, is it worthwhile to bring in this addition to improve 4-5 lines of code? But this is not about beautifying a few lines of code here and there. Creating numpy arrays, selecting pandas columns, indexing pytorch tensors, … In the Data Science world your codebase is littered with such code, and as such it’s hard to make a case for Scala given the verbosity, even for someone like me who loves the language.

I truly believe this is one among several low-hanging fruits that Scala can pick to improve the language and make it more attractive for a wider audience.

Can we help drive this forward? I’d be keen to try it out in 3.8!

2 Likes

If Tabula wants to make you type extra stuff, because the culture in Scala is to type less stuff than Java but more stuff than Python, that’s its prerogative. However, there is no reason why you can’t get from that to something that is at least as compact as R.

val ds = Dataset(
  "Name" -> c(
    "Braund, Mr. Owen Harris",
    "Allen, Mr. William Henry",
    "Bonnell, Miss. Elizabeth",
  ),
  "Age" -> c(22, 35, 58),
  "Sex" -> c("male", "male", "female")
)

is a trivial amount of code away:

def c[A](elements: A*) = elements.toSeq
extension (dsc: Dataset.type)
  def apply(kvs: (String, Seq[?])*) = Dataset(kvs.toMap)
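
A self-contained version of that sketch, with a stub Dataset standing in for tabula’s (whose actual API differs) and a plain object instead of the extension, to show that the shorthand typechecks:

```scala
// Stub: the real tabula Dataset wraps more than a Map.
case class Dataset(columns: Map[String, Seq[Any]])

// R-style `c`: collect varargs into a Seq.
def c[A](elements: A*): Seq[A] = elements.toSeq

// Stand-in for an `extension (dsc: Dataset.type)` forwarder.
object Ds:
  def apply(kvs: (String, Seq[Any])*): Dataset = Dataset(kvs.toMap)

val ds = Ds(
  "Age" -> c(22, 35, 58),
  "Sex" -> c("male", "male", "female")
)
```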

With a lot more effort, one could make it work with tuples. With more effort yet, one could make it work with named tuples.

val ds = DataSet((
  name = (
    "Braund, Mr. Owen Harris",
    "Allen, Mr. William Henry",
    "Bonnell, Miss. Elizabeth",
  ),
  age = (22, 35, 58),
  sex = ("male", "male", "female")
))

We can do this already. Someone just has to want to enough.

Language features do attract use cases. But they can also clash with the existing feel. And that’s where the current proposal is iffy. It’s not that the feature itself is bad; I have hardly seen anyone saying, “I love extra boilerplate!” Rather, it’s that there don’t seem to be good choices that don’t induce some sort of pretty fierce clash between how the language feels and the new feature.

And, furthermore, we don’t seem to have a clear consensus on how to minimize the clash.

For example, I think that the correct way to interpret [22, 35, 58] is as a type, because [...] is always a type in Scala, but which can be reified into the corresponding collection. This would go for [xs.length, 2 + 7, foo(foo(foo(1)))], too; but that would require an extension to the type system to allow code literals to have a corresponding type, and for a code literal type to be treated as an inlineable piece of code. I haven’t worked through all the details, but I think this is a sound and self-consistent, albeit rather ambitious way to make the [...] syntax make sense within Scala. It’s a huge amount of work. Other people have other ideas, with their own arguments for and against. But overall, the biggest barrier is the friction, because (...) already means something, [...] already means something, and {...} already means something.

There are various things that don’t mean anything, but they also don’t mean anything in other languages, so the familiarity is zero. For instance, this is totally compatible, but I’m not sure anyone would go for this:

val ds = Dataset $
  "Name" -> $
    "Braund, Mr. Owen Harris"
    "Allen, Mr. William Henry"
    "Bonnell, Miss. Elizabeth"
  "Age" -> $ 22; 35; 58
  "Sex" -> $ "male"; "male"; "female"

Here $ would be a multi-expression token, which opens a block where the value from every statement is returned, with the whole thing typed as a tuple or array depending on expected type.

No language I know of has anything like this. It’s arguably even cleaner than Python. But it looks super weird.

10 Likes

I hear you when you say that there are alternative, less disruptive, ways to achieve compactness. However, I tried to highlight above (maybe unsuccessfully) that the new syntax truly delivers on its full potential only when we consider its value proposition as a whole:

  1. Shorter / boilerplate-free
  2. Clearer / more focused, by virtue of being shorter and using a familiar syntax
  3. More intentful, by delegating the choice of collection / mechanism to the definition

As such:

You achieve shortness, but you fall short on clarity and intentfulness. This can’t be solved at the library level: even if we abandoned our intentfulness goals and dismissed the need for language changes, the clarity gains would only come from a standard that all users and libraries can rely upon.
But then couldn’t we just implement c in the standard library? As has already been stated in the SIP, I think it’s fair to say that it’d still fall short on clarity / familiarity.

Although having spent a significant amount of time abroad helps me soften / erase the perceived friction, I can totally sympathise with this argument. Nevertheless, we wouldn’t be transgressing by extending the meaning of a syntactic element. Examples abound in other languages, from the concurrent dedication of {} to arrays and other, unrelated semantics, to the successful use of [] for both types and collections, as mentioned in the proposal.
And it’s not just other languages; to take a single example, how many uses do we have for parentheses?

val a: (Int, Int) = (3 * (identity(1) + 2), 3)

As such, having multiple semantics for a single syntactic element is not only pretty common and well accepted but also probably unavoidable, especially as languages grow and improve.

Probably, then, the main issue is that culturally, typing is a big deal in Scala and, historically, type parameters have enjoyed the privilege of being the sole owner of the square brackets.

So how do we proceed? Given the multiple potential benefits such a feature would have, its ease of implementation and its overall harmlessness (especially with regard to existing code), I think it would be, at the very least, a missed opportunity to dismiss it. And given the cultural / psychological nature of the friction, we may have to just try it out, see it in our code, get familiar with it and, based on that, decide whether to definitively adopt it.

1 Like

This is a good point, but I would counter that it’s actually rather annoying that () has multiple uses.

For instance, it makes it impossible to cleanly specify a one-element or zero-element tuple, despite these nominally being distinct things in the language that are not identical to an ordinary item or the unique Unit value (which is also spelled ()). And there’s the ongoing frustration regarding argument lists and tuples, which is mostly resolved in Scala 3, but largely by restricting what would otherwise be a completely okay thing to do.
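
Concretely, the overloading means parentheses alone cannot spell the small tuples:

```scala
val notATuple: Int = (1)              // (1) is just the Int 1
val one: Tuple1[Int] = Tuple1(1)      // a 1-tuple needs its constructor spelled out
val unit: Unit = ()                   // () is the Unit value...
val empty: EmptyTuple = EmptyTuple    // ...so the empty tuple needs a name too
```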

So, yes, I agree that [...] would be picked up pretty easily, and after some frustration people who picked it up easily would stop trying to index into things with [i]. And people would unlearn that [...] automatically means type and slow down a bit and figure out which it is.

What I don’t think is so clear is that this makes the language better overall. We already group things with (...), and the “magically [...] will be what is needed” aspect is hard to get right without magic causing mystery and mystery causing code with higher levels of technical debt.

I don’t have a very strong opinion either way. Mostly I am convinced of the extra complexity, but unconvinced of the need given the features Scala already has to make such things easy if one wants.

This is the part that I think is wrong. I think it is slightly harmful to all Scala code everywhere because of the extra intricacies of what syntax means. So to my eye, the feature had better be well worth it, because the cost is individually small but paid by everyone for all time, which together is a pretty big cost.

7 Likes

This is of course hard to assess. If Scala were the first language to introduce brackets for both collections and type parameters I would weigh this issue much higher. But it isn’t, there’s ample precedent in other languages where we do not see a mental clash.

I believe part of the problem to make progress here is choice fatigue. There are so many possibilities to invent a new syntax for collection literals! I did not want to monopolize this thread further so have held back with postings. But after very careful consideration of all points raised in the many posts here I must say that the observations of @datalin are spot on. It should be brackets or nothing, let’s restrict it to this binary choice.

6 Likes

The only language I know of that had both from the beginning is Nim. Python added type hints with square brackets, but the usage is far less widespread: you use them in optional type hints, and functions aren’t parameterized by type. The capacity for types and array-values to interleave is far less than in Scala because Python is very light on types.

Nim is a pretty good example. I don’t think it’s “ample precedent”, however.

I’m still not entirely convinced, though. Scala has types do a lot of heavy lifting. That type parameters are visually highly distinct I think helps.

Python restricts type hints to variable declarations, arguments, and return types. Nim does also. Scala can ascribe type hints anywhere to anything, and embed type parameters wherever needed in expressions.

The amount of context you need to gather to understand what is going on is an important consideration. Right now, if you see [Foo] you know it’s a type parameter of type Foo. But after the proposal, [Foo] might also be a length-one collection containing the Foo companion object. In a language that didn’t intentionally conflate the names of types and values for practically every concrete class, that wouldn’t be an issue. It is difficult to imagine that this isn’t going to raise the cognitive overhead. Maybe not by much, given that you don’t have to look much farther to discover whether it’s likely a type argument or in term position, but it’s hard to see how the overhead could be nothing.

So I don’t want to overstate the difficulty. I think it’s very surmountable. But I also don’t want to understate the difficulty, and therefore I think the feature needs to be pretty compelling.

To me the potential advantages are

(1) Works for length 0 and 1. Because () and (x) have their own meanings, parens don’t generalize to collections of length less than two.

(2) Slightly cleaner due to sharp edges and not needing one extra character.

(3) More visually distinct warning that magic might be hidden here, if we enable magic. Any character we used to dispatch a builder could be chosen by someone else to mean something else.

(4) Possible to use for multi-expression blocks, which parens can’t be:

val x = [
  [1, 2, 3]
  [4, 5, 6]
]

is syntactically available. It might be inadvisable because

val y = [
  if foo() then
    bar()
  else
    baz()
  quux()
]

seems pretty incomprehensible, even if the compiler would parse it cleanly and unambiguously as a two-element collection with the first element of either bar or baz, and a second element of quux (because those are the two expressions in the block).

Honestly, the most appealing thing to me on that list is handling the 0- and 1-element cases well. That’s a constant source of friction. The rest seems either of dubious value (multi-expression), an unexciting degree of improvement on something we could already do at the library level (val a: Vector[Int] = [1, 2] vs. val a: Vector[Int] = &(1, 2), as per JD557’s example), or a poor integration with the usual way to do this (e.g. can I write [Short][1, 2, 3] to make sure the element type is Short? I certainly could make &[Short](1, 2, 3) work!).

4 Likes

I don’t remember where we landed on the “map literal” syntax, but for the empty map you could follow what Swift does (its empty dictionary literal is [:]) and use the arrow alone: [->] for the empty map, with syntax like ['a' -> 23] for non-empty ones.

2 Likes