Hmm. Could be special-cased? Normal expressions don’t start and end with < and > respectively…
I would love to be able to show my (simple) Scala code to other language developers and make them feel like they already know it and understand how it works without digging into the specs. Scala is already vanishing from the project and job markets, and any new fancy syntax will only accelerate this.
Yes, any syntax that deviates from the norm (in cases where there’s a norm) will have a huge cost in terms of familiarity, so there better be an extremely good reason for it.
I’ve toyed around with the example code and found that it doesn’t seem to support nesting, i.e. this doesn’t compile:
val u: Seq[Seq[Int]] = seqLit(seqLit(1, 2, 3))
Neither do other scenarios where the expected type needs to be (partially) known, e.g.
val w: Seq[Int => Int] = seqLit(_ => 42)
Yes, this looks like a fundamental problem. We don’t propagate the expected type into the elements. We have to see whether this is fixable with a different typeclass scheme or not. Since Swift seems to use something like that it would be good to find out how it’s done there.
I think that my proposed solution of simply desugaring it to an apply call on the companion object does not have this problem.
val x: Seq[Seq[Int]] = [[42]]
would be desugared to Seq([42]) first and then to Seq(Seq(42)) in the next step. Unlike the typeclass solution, it isn’t required to know the element type of the collection, because the companion object will be the same regardless.
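For what it’s worth, the fully desugared form is plain Scala that already compiles today, which is why nesting is unproblematic under this scheme:
// end result of the two desugaring steps; the companion object picked
// from the expected type does not depend on the element type
val x: Seq[Seq[Int]] = Seq(Seq(42))
val y: Seq[Seq[String]] = Seq(Seq("a", "b"))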
I also think that the idea of using the information from implicit conversions in the companion object would be a viable solution to the Task/Optional problem. Or my other idea of having a special type alias that tells the compiler which companion object to delegate to.
I like the typeclass approach, but I am still uneasy about what kind of thing [42, 16] is.
Can we formalize [a], applied to nothing, as a singleton type literal, with the tuple [(a0, a1, ..., aN)] having syntactic sugar to drop the ( ), so just [a0, a1, ..., aN]?
If we do, then [42, 16] has exactly the same meaning it has now: it represents two type arguments, the first of which is the type of 42 and the second of which is the type of 16. Because they’re singletons, there’s a canonical mapping between types and values, so you can have the values whenever you want.
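For reference, singleton literal types and the canonical type-to-value mapping already exist in Scala 3 via ValueOf; a minimal illustration:
// 42 used as a type: the singleton type inhabited only by the value 42
val a: 42 = 42
// ValueOf witnesses the canonical mapping from a singleton type back to
// its unique value
val b: Int = summon[ValueOf[42]].value // b == 42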
I’m not sure how to handle the variable arity issue. In one sense, the tuple idea bypasses it. But I’ve also used the Tuple machinery with type lambdas, and did not find the overall experience particularly enjoyable.
But the biggest aversion I have to [42, 16] in Scala is that [A, B] is a pair of types, not a pair of values. With the singleton type literal idea, it still is a pair of types, just very specific ones. Then we still have to ask how to use [42, 16]. But at least we know what it is, and it’s the most consistent thing it could be.
This fails once you want to do the obvious next thing, which is to put arbitrary expressions in the sequences, such as [x + 1, StdIn.readLine().toInt].
I did some more digging into why nested applications of seqLit don’t work. The first reason is that seqLit is defined like this:
inline def seqLit[T: ClassTag](inline xs: T*)(using inline fsl: FromSeqLit[T, C]): C =
This means the implicit argument is resolved after we typecheck the first xs argument, so we don’t have an expected type for it. We can turn it around, so that the using clause comes first:
inline def seqLit[T: ClassTag](using inline fsl: FromSeqLit[T, C])(inline xs: T*): C =
That still does not work since we don’t propagate the expected result type into the function of an application. One reason we don’t do that is that there could be an implicit conversion inserted around the application, so we’d risk rejecting valid programs. That’s one of the areas where we can do better once we reject implicit conversions without a language import instead of just producing a feature warning.
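To illustrate the hazard with a contrived example (Wrapper and mk are made up for the illustration): the expected type of an application need not be the result type of the function, because a conversion may be wrapped around the call afterwards.
import scala.language.implicitConversions

case class Wrapper(xs: Seq[Int])
given Conversion[Seq[Int], Wrapper] = Wrapper(_)

def mk(xs: Int*): Seq[Int] = xs.toSeq

// valid today: mk is typechecked without the expected type Wrapper,
// then the conversion Seq[Int] => Wrapper is inserted around the call;
// propagating Wrapper into the call would wrongly reject this program
val w: Wrapper = mk(1, 2, 3)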
But anyway, in the end we’ll treat aggregate literals as their own syntax instead of emulating them as applications. Then we have a lot more flexibility in when and how to do implicit searches, so the typeclass idea might still work out.
Speaking of the above signature, we have to be able to get rid of the ClassTag requirement. Otherwise we wouldn’t be able to construct a sequence such as a List out of unconstrained T elements.
The ClassTag requirement should only apply when building an Array, not other kinds of sequences.
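A minimal sketch of the distinction (listLit and arrayLit are illustrative names, not the strawman’s API):
import scala.reflect.ClassTag

// building an ordinary sequence works for any unconstrained T
def listLit[T](xs: T*): List[T] = List(xs*)

// only Array genuinely needs the runtime class of its elements
def arrayLit[T: ClassTag](xs: T*): Array[T] = Array(xs*)

val xs = listLit(1, 2, 3) // fine without a ClassTag
val ys = arrayLit(1, 2, 3) // ClassTag[Int] summoned here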
I have no doubt that it can be made to work. But if aggregate literals are going to be special anyway, I would like to understand why you insist on a typeclass-based solution and what the problem is with the original idea of rewriting it to a call to the companion object’s apply method.
Regarding nesting: we already have a construct where the expected type is propagated towards nested expressions, namely lambdas, i.e. code like this compiles fine:
val f: Int => Int => Int =
  x => y => x + y
I think it should be possible to make a similar scheme work for aggregate literals.
I also see significant advantages for my idea in terms of adoption: no typeclass instances are required; you simply update your Scala version and the new syntax works even for existing 3rd-party libraries like ZIO’s Chunk. The typeclass idea requires explicit library support, so every time you want to use it with a library you’re left guessing whether that library supports it.
It’s also very consistent, as an aggregate literal is guaranteed to behave the same as a conventional apply call on the companion object. It was proposed to rewrite aggregate literals to the corresponding builder code, but doing it that way implies that people who for whatever reason aren’t using the new syntax won’t get the performance benefits. Why not just improve the apply method and have it also work for those people who don’t want to rewrite every collection expression in their code?
I’m sure there are good reasons for pushing the typeclass idea, and I would like to understand what they are.
If we want expressions,
[x + 1, StdIn.readLine().toInt]
is
val $0 = x + 1
val $1 = StdIn.readLine().toInt
[$0, $1]
We already get all of the A² type stuff, so it’s not entirely without precedent, though I agree it’s a change to how we infer singletons now. But inferring singletons within [] could be slightly more expansive.
I mentioned that already. I am very concerned about abuse. Even a lot of the examples shown in this thread were quite obscure, certainly not a showcase of good Scala style! If we allow [...] for arbitrary apply methods, people will use that a lot since it’s shorter than the alternative, and clarity will be lost.
We need to restrict this to collection-like builders. Restricting to apply methods with vararg arguments is better, but risks having both false positives (a vararg might be used where no collection-like meaning is intended) and false negatives (an apply method might have a different form, maybe a fixed number of arguments or a vararg + default parameters). So it’s better to make the meaning explicit: aggregate literals can be used as constructors of a type if the type defines an instance of a typeclass like ExpressibleAsCollectionLiteral. There’s precedent in Swift, where this approach seems to work well.
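For concreteness, a rough Scala sketch of what such a typeclass could look like (purely illustrative; any actual design may differ):
// hypothetical typeclass: a type C opts in to the literal syntax by
// declaring how to build a C from a sequence of elements
trait ExpressibleAsCollectionLiteral[C]:
  type Elem
  def fromLiteral(elems: Elem*): C

// example instance for List
given [T]: ExpressibleAsCollectionLiteral[List[T]] with
  type Elem = T
  def fromLiteral(elems: T*): List[T] = List(elems*)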
I don’t think you have explained before what the objection is against the idea of restricting it to variadic apply methods – perhaps I have missed it.
Thank you for clarifying this. While I don’t agree with the reasoning, I do understand it now.
Here’s a draft implementation of collection literals: A strawman for aggregate literals by odersky · Pull Request #21993 · scala/scala3 · GitHub. The PR contains a doc page outlining what is included.
The syntax uses [...], which is the most common choice in other languages. The cognitive clash with type parameters is a concern, but not a show stopper, IMO. Python also uses [...] for both collection literals and type parameters.
The syntax can be used for all types that implement the ExpressibleAsCollectionLiteral type class. The name is quite similar to the one in Swift; thanks @alvae for pointing out Swift’s scheme. If no such instance is found, it falls back to Seq or Map as the default.
Where the docs require for Map literals “the form a -> b”, will (a, b) also be considered to satisfy the form? So far those two forms were equivalent as far as expression construction goes.
Currently only if there is an expected type:
scala> val map = [("hello", 123)]
val map: Seq[(String, Int)] = List((hello,123))
scala> val map: Map[String, Int] = [("hello", 123)]
val map: Map[String, Int] = Map(hello -> 123)
That seems enough for practical use. Still, wouldn’t it be more consistent if any [Tuple2, Tuple2, ..., Tuple2] would produce a Map and not a Seq unless there is an expected type dictating the contrary? The rules seem easier for me to understand when talking about types and not “forms”.
This would make a lot of sense, but IMO this is a place where convenience may trump simplicity. Just as Seqs are a pretty fundamental data type, so are Maps, so having a special syntax for both of them is not unreasonable.
The basic idea that this proposal comes from is that there are some fundamental data structures that are broadly useful across all programming languages and environments, and deserve a literal syntax. Scala already inherited literal numbers, literal strings, and literal booleans from Java, and this proposal would add literal sequences and (maybe) literal maps. It’s always debatable where to draw the line, but IMO the general popularity of key-value maps in all configuration formats and all scripting languages does indicate that they are broadly useful enough to deserve a bit of special-casing.
Just to chip in from a library author’s perspective, having the aggregate literal syntax work for nested literals is table stakes for most of my use cases. E.g. you can currently write
val json0 = ujson.Arr(
  ujson.Obj("myFieldA" -> ujson.Num(1), "myFieldB" -> ujson.Str("g")),
  ujson.Obj("myFieldA" -> ujson.Num(2), "myFieldB" -> ujson.Str("k"))
)
val json = ujson.Arr( // The `ujson.Num` and `ujson.Str` calls are optional
  ujson.Obj("myFieldA" -> 1, "myFieldB" -> "g"),
  ujson.Obj("myFieldA" -> 2, "myFieldB" -> "k")
)
And part of the draw for this syntax is to be able to concisely write:
val json0: ujson.Value = [
  ["myFieldA" -> 1, "myFieldB" -> "g"],
  ["myFieldA" -> 2, "myFieldB" -> "k"]
]
This also applies to heterogeneous use cases, so that rather than writing
val expected = ujson.Obj(
  "hello" -> ujson.Arr("hello", "world"),
  "hello2" -> ujson.Obj("1" -> "hello", "2" -> "world")
)
I would want to be able to write
val expected: ujson.Value = [
  "hello" -> ["hello", "world"],
  "hello2" -> ["1" -> "hello", "2" -> "world"]
]
if using [].
The following code may lead to ambiguity, since [1, 2] is already legal today as a list of singleton type arguments:
def f[X <: Int, Y <: Int] = ???
val v = f[1, 2]
object Arr:
  def apply[X <: Int, Y <: Int] = ???

val a = Arr[1, 2]