Pre-SIP: a syntax for aggregate literals

Hey @sjrd,

Thank you for your encouraging comment. I was talking to @lihaoyi on Discord earlier, who also encouraged me to pursue this further and gave some very valuable advice.

It seems that the “placeholder for companion object” idea is considered too easy to abuse by too large a faction of the Scala community to be viable. I find this unfortunate because it is extremely versatile: it can be used to construct collections, case classes, and objects built via factory methods like of or fill; it plays nicely with explicit type parameters, multiple parameter lists and using clauses; and it can be used for collection conversions (e.g. (1 to 10).to(#)) or even enum constructors. It does all of that with a single new expression, #, whose meaning is determined from the context in a way that largely reuses the rules already in place for _ lambda expressions. I think that’s a pretty high power-to-weight ratio, which I thought would appeal to people, but alas, many seem to have concerns about it.
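To illustrate the versatility (a sketch in the proposed, hypothetical syntax — none of this compiles today, and Point is an assumed case class):

```scala
case class Point(x: Int, y: Int)

val xs: Vector[Int] = #(1, 2, 3)      // # stands for the Vector companion
val p: Point = #(x = 1, y = 2)        // # stands for the Point companion
val ys: List[Int] = (1 to 10).to(#)   // # stands for the List companion object
```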

I also feel a bit misunderstood since @odersky used the phrase “the original proposal to have special forms for collection literals” – but it was never intended to be limited to collections. It was always meant to work for collections and objects, which is why I called it aggregate literals, not collection literals.

But at the end of the day, the bait needs to taste good to the fish, not to the angler, so it doesn’t really matter how much I like it. I need to think about this a bit more and see if I can come up with a proposal that is more palatable to the community. I haven’t completely given up yet.

1 Like

Yes, sorry, it wasn’t clear. I meant that any mechanism based on expected types is brittle and bound to fail in many cases:

  1. When methods are overloaded: then there is simply no such thing as an expected type
  2. When for some type T the expected type is not T but something T can be brought to, like Option[T], as was already pointed out

So, more often than not we are forced to make the type explicit, as in val a: T = thing instead of val a = T(thing), whereas a big part of the language is dedicated to type inference. And I personally always try to hide types as much as possible.

This also impacts implicit conversions, but at least those can chain, which addresses point 2). Point 1) regularly causes implicit conversions to fail, as implicit resolution and overload resolution seem difficult to coordinate.

This might also explain the difficulties of the generic number literals proposal, which is nice on paper but, due to the above, fails in too many cases to be really viable.

1 Like

By “chaining”, do you mean that when there’s an implicit conversion from A to B and from B to C, then these can be used together as an implicit conversion from A to C? Because Scala doesn’t actually do that.

You are right, you have to arrange one of them accordingly, as in:
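One way to arrange it (a sketch of my own; the types A, B, C are placeholders): instead of a plain B-to-C conversion, write a conditional conversion that accepts anything convertible to B, so that resolving A => C goes through B.

```scala
import scala.language.implicitConversions

class A
class B
class C(val b: B)

given Conversion[A, B] = _ => B()

// Conditional conversion: anything convertible to B converts to C.
// This is what makes the A => B => C "chain" work.
given [T](using toB: Conversion[T, B]): Conversion[T, C] = t => C(toB(t))

val c: C = A()   // resolved as: the A is converted to B, which is wrapped in a C
```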

But in practice this is not a problem (I use it regularly).

The other use case is simpler output in REPL, where people often ask for less name verbosity.

Is anyone here volunteering to get IntelliJ up to speed on all of the Scala 3 features up to and including this one?

I don’t think it’s a good idea to divorce the language more and more from what, at least in my view, is the only really good IDE. And while it’s appreciated that JetBrains, despite maintaining the only close competitor to Scala, pays people to work on the Scala plugin, I worry that if we keep making their job harder, at some point they may decide it just isn’t worth it. What if they conclude that the only way to make the plugin usable (perhaps for a future release of Scala) is well beyond whatever budget they feel Scala support justifies?

5 Likes

Coming late to the thread. I think the originally proposed [1, 2, 3] syntax is fine. My main reservation is that I foresee many uses where it will obscure rather than clarify things. E.g.

case class Point(x: Int, y: Int)
def foo[T](xs: T)(ys: Array[T]): Unit = ...
val origin = Point(0, 0)
foo(origin)([[3, 4], [5, 6], [-2, 4]])

There’s no indication that the subsequences [3, 4], ... are Points! One could say devs should stay away from obscure code like this, but it’s a fact that people will naturally flock to the shortest solution. So if the shortest solution is obscure, we have a problem and should resist enabling that style.

On the other hand, the title of this thread is “A syntax for aggregate literals” and I believe the proposed syntax is perfectly good for that. So I would support it with a restriction: Assume we have [x1, ..., xN] where T is the common type of all elements. We look at the expected type C. If that has a companion object with an apply taking a vararg parameter of type T*, map it to C.apply(x1, ..., xN).

This makes perfect sense for the internal compiler AST since we already have a node class SeqLiteral to which [x1, ..., xN] could map. And SeqLiteral exists specifically to express a bunch of arguments passed to a vararg parameter. So I would expect that change to be quite straightforward, which would also alleviate concerns about updating the tooling.
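Concretely, the restricted translation would work like this (hypothetical syntax; Point is an assumed case class whose apply is not vararg):

```scala
val xs: Vector[Int] = [1, 2, 3]    // expected type Vector[Int]  =>  Vector.apply(1, 2, 3)
val ys: List[String] = ["a", "b"]  // expected type List[String] =>  List.apply("a", "b")
val p: Point = [1, 2]              // error: Point.apply takes (Int, Int), not Int*
```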

You might ask: why does the restriction help against obscure code? Let’s look at the previous example again:

def foo[T](xs: T)(ys: Array[T]): Unit = ...
val twoNumbers: List[Int] = [1, 2]
foo(twoNumbers)([[3, 4], [5, 6], [-2, 4]])

We still don’t see directly what the class of the literals [3, 4], … is, but that arguably does not matter much, since we know they are conceptually sequences. We could also explicitly require C <: Seq for the translation to work. Then we could be sure they are sequences and nothing else. That’s something to discuss.

We could go further and also allow sequence literals when there is no expected type. In that case we just assume Seq. So

val xs = [1, 2, 3]

would expand to

val xs = Seq(1, 2, 3)

I think that makes sense. And a similar trick would not work if we mapped into arbitrary apply methods.

4 Likes

A nice addition would be to allow it in pattern position too, as in:

val [a, b, c] = xs
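Presumably this could desugar to the Seq extractor that already exists today:

```scala
val xs = Seq(1, 2, 3)

// The @unchecked ascription silences the warning Scala 3 emits
// for pattern bindings that may fail at runtime.
val Seq(a, b, c) = xs: @unchecked   // binds a = 1, b = 2, c = 3
```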
2 Likes

The rule for expanding [a, b, c] to Seq(a, b, c) and allowing it in varargs seems simple and explainable, and it makes the syntax unambiguous. The only downsides I can think of right now are

  • It overloads the meaning of […], which is currently reserved for type parameters, so wherever learners see a […] they know it’s something to do with types.
  • Yet another syntax to learn and explain.

I guess these downsides are outweighed by the upsides of convenience and boilerplate reduction.

Another downside is the cost of “unpacking” the Seq, for example:

Array(1, 2, 3)

is not compiled to an “array literal”, i.e. a contiguous chunk of memory pre-filled with the values.
Instead, the apply method of Array creates an array at runtime and fills it element by element with the literals.

1 Like

If we use [a, b, c] as syntax, we’re going to get dangerously close to C++’s level of syntax density. By that I mean the ratio of “random strings that happen to be valid syntax” to “random strings”.

For example, a C++ syntax density issue that many of my students have trouble with:

int x;    // defines a variable x of type int
int x();  // declares a function named x returning an int
int x(0); // defines a variable x of type int, initialized to 0

Now in Scala you can have fun too:

foo bar [a, b, c] // infix call with a seq literal as argument; a, b, c are terms
// versus
foo.bar [a, b, c] // dotted call with type arguments; a, b, c are types

(and no, the extra space doesn’t give it away; it takes a while for students to get the sense that formatting is remotely relevant)

10 Likes

(let’s not forget int x {0} vs int x[] {0})

I understand the concerns about syntactic overloading. On the other hand, it works exactly the same way in Python and does not seem to cause a lot of problems there, so I think it might work in Scala as well. But agreed, it’s a tradeoff. I don’t feel very strongly either way, for or against including the feature.

But in Python the following is invalid:
foo bar [a, b, c]
since function application only takes the form a(b)

(IIRC)

2 Likes

A similar ambiguity already exists with parentheses and tuples/named tuples, where the interpretation differs depending on whether the method is alphanumeric or symbolic:

scala> "%s".format(1, 2, 3) // multiple positional arguments
val res0: String = 1

scala> "%s".format(args = 1) // 1 named argument
val res1: String = 1

scala> "%s" format (args = 1) // 1 positional named tuple argument
val res2: String = (1)

scala> "%s" +  (1, 2, 3) // 1 positional argument that is a tuple
val res3: String = %s(1,2,3)

scala> "%s" +  (x = 1, y = 2, z = 3) // 1 positional argument that is a named tuple
val res4: String = %s(1,2,3)

And the behaviour even differs between Scala 3.5.1 (above) and earlier versions, e.g. 3.3.1 (below):

scala> "%s" +  (x = 1, y = 2, z = 3) // 3 named arguments
-- Error: ----------------------------------------------------------------------
1 |"%s" +  (x = 1, y = 2, z = 3)
  |         ^^^^^
  |method apply in object Tuple3: (_1: T1, _2: T2, _3: T3): (T1, T2, T3) does not have a parameter x
-- Error: ----------------------------------------------------------------------
1 |"%s" +  (x = 1, y = 2, z = 3)
  |                ^^^^^
  |method apply in object Tuple3: (_1: T1, _2: T2, _3: T3): (T1, T2, T3) does not have a parameter y
-- Error: ----------------------------------------------------------------------
1 |"%s" +  (x = 1, y = 2, z = 3)
  |                       ^^^^^
  |method apply in object Tuple3: (_1: T1, _2: T2, _3: T3): (T1, T2, T3) does not have a parameter z
3 errors found

scala> "%s" format (args = 1) // 1 named argument
val res2: String = 1

I don’t bring this up to say that confusion does not accumulate: I agree that confusion is cumulative. But I don’t think there has been a major issue with this similarity in the past, and we have even changed the semantics of some of this syntax in the past 5 years. Through all of this, although the edge cases exist, I don’t think they have caused major hardship for Scala users. Given that, I don’t think adding a bit more density by using square brackets for sequence literals will be a significant burden.

On the other hand, aligning the sequence literal syntax with almost every other programming and config language out there (Python, Javascript, Ruby, C#, Swift, PHP, Kotlin-KT-43871, Haskell, F#, OCaml, Rust, Dart, JSON, YAML, TOML, …) will very likely save on much more confusion than the increased density would cause.

If you look at the top 20 languages in the Redmonk June 2024 Ranking, the syntax breaks down as follows:

  • [...]: Javascript, Python, PHP, C#, Typescript, Ruby, Swift, Kotlin, Objective-C, Rust, Dart
  • {...}: Java, C++, C (in limited scenarios)
  • [...]int{...}: Go
  • c(...): R
  • @[...]: Objective-C
  • @(...): Powershell
  • Seq(...): Scala

From this, it is clear that Scala is the odd one out with a weird list literal syntax. Basically the entire programming community uses [...], a few C-family languages use {...}, and then there are the oddballs, among which Scala’s syntax is the second most verbose in the Redmonk top 20!

You have people coming to Scala from all these other programming languages, and having them type [1, 2, 3] rather than Seq(1, 2, 3) will be a huge win for familiarity and first-glance fluency. And this familiarity benefit is on top of the verbosity improvement, which will also significantly increase quality of life.

3 Likes

I find that a surprising idea coming from you, since you recently motivated the choice of Array as the return type for an enum’s values method with a desire not to couple the language to the standard library too much – a notion that I actually agree with. Speaking of Array: accepting this only for Seq would mean it doesn’t work for Array, which is just weird. I would also like to avoid any kind of “blessing” of Seq as some sort of “standard” type, as it encourages people to use that type more than they probably should. For example, Seq is just a bad choice in programs that use cats, because cats doesn’t define typeclass instances for Seq but does for List and Vector.

And there’s another thing that bothers me about the idea in its current form: it is focused too much on collections. Being able to create objects is IMO an absolutely crucial part of this feature, because one of the main use cases that motivated me to propose this in the first place is its use as a “data language” for things like zio-k8s. There’s no reason why Scala should be any worse than YAML or TOML for defining data!

I’d like to know what you think about the following idea: we could have two different forms of aggregate literals: sequence aggregates and object aggregates.

  • sequence aggregates have no named arguments and can only be used to call vararg apply methods on the companion object
  • object aggregates have only named arguments and can only be used to call non-vararg apply methods on the companion object

An example would be

val x: List[Point] = [
  [x=1, y=2],
  [x=3, y=4]
]

This mirrors the well-established data model of JavaScript/JSON: objects have field names, arrays don’t.
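Under this scheme, the example above would presumably expand to plain companion applies, i.e. code that is already valid today:

```scala
case class Point(x: Int, y: Int)

val x: List[Point] = List(
  Point(x = 1, y = 2),
  Point(x = 3, y = 4)
)
```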

That would give us almost everything we need to make zio-k8s as pleasant to use as Kubernetes YAML… with the exception of a better solution for optional parameters. But perhaps it’s best to deal with that separately.

2 Likes

I think we could enable a short key-value syntax that builds a Map:
["k1" = "v1", "k2" = "v2"] is the same as Map("k1" -> "v1", "k2" -> "v2")
[1 = "v1", 2 = "v2"] is the same as Map(1 -> "v1", 2 -> "v2")
Then you can use explicit conversion functions to construct whatever from the Map.

If Point is a named tuple (x: Int, y: Int), then you can write:
val x = [(x = 1, y = 2), (x = 3, y = 4)]
If Point is a case class, then all you need is a named tuple to case class conversion
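Such a conversion can already be written by hand (a sketch, assuming a Scala version with named tuples, i.e. 3.7+; Point and the given are illustrative):

```scala
import scala.language.implicitConversions

case class Point(x: Int, y: Int)

// Hand-written conversion from the named tuple (x: Int, y: Int) to Point:
given Conversion[(x: Int, y: Int), Point] = nt => Point(nt.x, nt.y)

val p: Point = (x = 1, y = 2)   // adapted via the given conversion
```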

2 Likes

Why not ["k1" -> "v1", "k2" -> "v2"]? We already have Map taking a varargs of tuples, so it should work out of the box with no additional special syntax.
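Indeed — since -> just builds a Tuple2 and Map.apply takes tuple varargs, the vararg-apply translation would cover Maps for free. What already works today:

```scala
// `->` is ordinary sugar for building a Tuple2,
// and Map.apply takes a vararg of (K, V) pairs:
val m: Map[String, String] = Map("k1" -> "v1", "k2" -> "v2")
```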

4 Likes

Why is ["k1" -> "v1", "k2" -> "v2"] not considered to be Seq("k1" -> "v1", "k2" -> "v2")?