Pre-SIP: a syntax for aggregate literals

som-snytt · July 14, 2024, 4:11am

This is a fundamental design choice.

Odersky said several years ago that he thought Scala was not about safety but about enabling what was not otherwise possible.

Scala 3 is “opinionated”, albeit not necessarily in the name of safety, but Odersky recently commented (or capitulated) that Scala devs “like their guard rails.”

Personally, I prefer a relaxed language (with no warnings) but with a robust linter (that uses whatever tech is necessary to encode the community’s institutional knowledge).

I don’t know what percentage of code is literals. I know “inline” literals harm readability, such as f(true) instead of f(reverse = true).

Probably aggregate literals are especially useful for initializing. Perhaps in method applications they call for controls (syntax at definition or even at use site akin to named booleans).

rjolly · July 15, 2024, 1:15pm

Note that constructor inference already exists in the case of new:

trait MyFavoriteThing:
  val id: Int
  val metadata: MyFavoriteThing.Whatever

object MyFavoriteThing:
  trait Whatever:
    val salt: String

val x: MyFavoriteThing = new:
  val id = 2
  val metadata = new:
    val salt = "eee"

rjolly · July 15, 2024, 2:01pm

I like the idea:

case class MyFavoriteThing(id: Int, metadata: infer MyFavoriteThing.Whatever)

object MyFavoriteThing:
  case class Whatever(salt: String)

val x: infer MyFavoriteThing = (2, ("eee"))

infer would desugar to into plus a synthetic implicit conversion from tuple to constructor - thus enabling implicit conversion from the declaration site, which will be the norm anyway.
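A rough approximation of that desugaring can be written by hand in current Scala 3. This is only a sketch: the `infer` keyword does not exist, the given named `tupleToThing` is a hypothetical stand-in for the synthetic conversion, and a plain `Conversion` is used instead of `into`.

```scala
import scala.language.implicitConversions

case class Whatever(salt: String)
case class MyFavoriteThing(id: Int, metadata: Whatever)

// Hypothetical, hand-written stand-in for the synthetic tuple-to-constructor conversion.
// Implicit conversions do not chain, so the nested Whatever is built explicitly here.
given tupleToThing: Conversion[(Int, String), MyFavoriteThing] =
  t => MyFavoriteThing(t._1, Whatever(t._2))

// ("eee") is just a parenthesized String, so this literal is an (Int, String).
val x: MyFavoriteThing = (2, "eee")
```

The expected type on `x` is what triggers the conversion; without the annotation, `x` would simply be a tuple.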

lihaoyi · July 15, 2024, 2:55pm

rjolly:

Note that constructor inference already exists in the case of new:

trait MyFavoriteThing:
  val id: Int
  val metadata: MyFavoriteThing.Whatever

object MyFavoriteThing:
  trait Whatever:
    val salt: String

val x: MyFavoriteThing = new:
  val id = 2
  val metadata = new:
    val salt = "eee"

That’s true! The main issue with constructor inference is that constructors are often not what you want. In Scala, you often want the apply factory method instead, whether to construct collections or to construct case classes. Especially for collections, when initializing them you almost always use apply instead of new, so extending constructor inference to apply inference may help make the feature a ton more useful.

OndrejSpanel · July 15, 2024, 3:50pm

Kind of. What is inferred is a base type for an anonymous class:

class MyFavoriteThing

val x: MyFavoriteThing = new {}

is not the same as:

class MyFavoriteThing

val x: MyFavoriteThing = new MyFavoriteThing
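The difference can be observed at runtime. The following is a small sketch (the val and method names are made up for illustration) that compares the runtime class of each value with MyFavoriteThing itself:

```scala
class MyFavoriteThing

// With an expected type, `new {}` creates an anonymous subclass of MyFavoriteThing.
val viaAnonymousClass: MyFavoriteThing = new {}

// Plain instantiation creates an instance of MyFavoriteThing itself.
val viaConstructor: MyFavoriteThing = new MyFavoriteThing

// True only when the value's runtime class is exactly MyFavoriteThing.
def isExactlyMyFavoriteThing(value: MyFavoriteThing): Boolean =
  value.getClass == classOf[MyFavoriteThing]
```

So the inferred `new` is closer to "extend the expected type" than to "call its constructor", which matters for final classes and for anything that inspects the runtime class.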

spamegg1 · July 22, 2024, 10:01pm

mberndt · July 22, 2024, 11:53pm

I’d like to write down a few more thoughts about this that have crossed my mind recently. I’ve come to the conclusion that it’s probably overambitious to try and cover the wide range of problems that have been discussed with a single language feature. Instead, we should consider several smaller changes and extensions.

FP fundamentals should just work

As a functional language, Scala should be able to express typical FP idioms as elegantly and concisely as possible. By this I mean code like the following prototypical implementation of map:

  extension[A, B](l: List[A]) def map(f: A => B): List[B] =
    l match
      case head :: next => f(head) :: next.map(f)
      case Nil => Nil

(Let’s ignore for now that List already has map and that this is not stack safe). This is the nice, idiomatic functional code that the language is intended for and that we want people to be able to write.

This code works today because

Nil and :: aren’t scoped inside List like they would be if an enum had been used, which would be the idiomatic way to declare types like this
List has a method called :: defined on it, and there’s a weird syntax rule that says that some identifiers when used in infix are looked up on the right operand rather than the left one

Neither of these would be the case if we had just declared List like so:

enum List[+A]:
  case Nil
  case ::(head: A, next: List[A])

And I feel strongly that the above implementation of map should just work without any ceremony. To me, any uglification like List.::, @:: or ..:: (ew!) is an unacceptable step backwards, and so would be importing Nil and ::.

It follows that we need new name lookup rules for at least some identifiers. When matching against an enum type, its cases must automatically be in scope without imports or the like, and in a position where an enum is expected, its cases should also be in scope. And for binary symbolic cases, it should also be possible to apply them using infix syntax.
The same thing is true when matching against a sealed type: the derived case classes and case objects should be in scope both for matching and construction. In fact, I recently changed a bunch of code from sealed trait (with derived types declared in the same scope, not in the companion object) to enum, and while this did improve the declaration of those types, it made using them much less pleasant.
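For comparison, here is a minimal sketch of what the status quo looks like for an enum-based list. MyList, Empty, Cons and mapAll are hypothetical names chosen to avoid clashing with the standard library; the point is only that every case has to be qualified (or imported via `import MyList.*`) in both patterns and expressions.

```scala
enum MyList[+A]:
  case Empty
  case Cons(head: A, next: MyList[A])

// Without `import MyList.*`, each case must be written with the enum prefix.
def mapAll[A, B](l: MyList[A])(f: A => B): MyList[B] =
  l match
    case MyList.Cons(head, next) => MyList.Cons(f(head), mapAll(next)(f))
    case MyList.Empty            => MyList.Empty
```

Under the lookup rules suggested above, the `MyList.` prefixes would become unnecessary because the scrutinee and the expected result type are both known to be MyList.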

The need to define a separate :: method to construct these things is also a wart, and it requires a weird syntax rule to even work. Maybe we can say that in places where an enum type is expected, the enum cases with a symbolic name (e. g. ::) can be applied with infix syntax?

Too much of a good thing?

Assuming that we can agree on the above, it’s easy to jump to the conclusion that everything should be looked up in the companion object scope when the companion object is known. This is tempting because it makes some things very easy. Want to make a LocalDate? Easy, just type of(…) and you’re done!
But many types’ companion objects declare dozens or even hundreds of methods, and that could easily lead to an unacceptable level of namespace pollution.

OTOH, I still feel that we should have some way to make the companion object more accessible, and I think a good compromise is to allow unqualified lookup for enum case/case class/case object symbols while requiring explicit syntax like .. for all other symbols in the companion object scope.
..of(1958, 9, 5) is a reasonable syntax to make a LocalDate, and at the same time there’s a visual clue that some special name lookup is going on.

The apply method

As a special case, apply is used more often than any other name for “factory” functions, for the obvious reason that it is treated specially in various ways by the language. A syntax like ..apply(1,2,3) would defeat the purpose. It was suggested that we might abbreviate this to ..(1,2,3). But when I look at @Ichoran’s example from earlier…

    ..(II, ..(..("rrf-3", ..("b", 26))),
    ..(IV, ..(..("fem-1", ..("hc", 17))),

… then I can’t help but feel that it looks excruciatingly ugly, and it’s the dots that are bothering me. I really think that

    [II, [["rrf-3", ["b", 26]]],
    [IV, [["fem-1", ["hc", 17]]],

looks much cleaner. Regarding multiple parameter lists and using clauses, I think it’s fairly simple to solve: [stuff] is really just a syntax for ..apply(stuff), and so if you need more parameter lists or using clauses, you just do this:
[stuff](bla)(using blub), which would desugar to ..apply(stuff)(bla)(using blub). [stuff][A] would desugar to ..apply(stuff).apply[A].
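To make the desugaring target concrete, this is what such a call looks like when spelled out in today's Scala. Job, Config and their fields are hypothetical; the bracket form appears only in a comment because it is proposed syntax, not valid Scala.

```scala
case class Config(verbose: Boolean)

class Job(val name: String, val retries: Int, val verbose: Boolean)

object Job:
  // A companion apply with several parameter lists and a using clause.
  def apply(name: String)(retries: Int)(using cfg: Config): Job =
    new Job(name, retries, cfg.verbose)

// Proposed form (not valid today): val job: Job = ["build"](3)(using Config(verbose = true))
// It would desugar to the explicit call below, with Job taken from the expected type.
val job: Job = Job.apply("build")(3)(using Config(verbose = true))
```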

In addition to the motivation that we already discussed (e. g. k8s objects), I ran into another case recently where this kind of syntax just makes sense, and it’s the zio-aws project. Basically all methods in this project look something like this:

  def createBucket(
      request: CreateBucketRequest
  ): IO[AwsError, zio.aws.s3.model.CreateBucketResponse.ReadOnly]

There are hundreds (or even thousands?) of these, and of course everybody and their dog is using AWS these days, so a simpler syntax to create those requests really would make the language feel a lot more light-weight for a large number of people. What would you rather read and write: s3.createBucket(CreateBucketRequest(bucket = BucketName("foo"))) or s3.createBucket([bucket = ["foo"]])? I know which one I’d pick, and it’s not the first one!

Summary

My current feeling is that we’ve been trying to cram too much functionality in one feature, so I would propose four separate ones:

In expressions where an enum or sealed type is expected, the relevant case/case class/case object identifiers can be used unqualified. The same applies when matching an enum or sealed type: case/case class/case object names can be used unqualified
Symbolic binary case/case class names can be written with infix syntax.
..foo (or @foo or whatever else we agree on) is a syntax to modify name lookup to occur in the companion object of the expected type.
[stuff] is syntactic sugar for ..apply(stuff)

mberndt · July 23, 2024, 12:40am

That’s actually an interesting example, because it demonstrates that spelling out types is often not helpful at all. Or is anybody going to argue that f(true: Boolean) is any easier to read?

som-snytt · July 23, 2024, 2:24am

well, I would write it f(true: true)…

pronounced, “true, too (two) true”. Insert ruefully shaking head.

spamegg1 · July 23, 2024, 5:28am

MyFavoriteThing(id = 2, metadata = MyFavoriteThing.Watevr(salt = "eee")),
…really doesn’t make the code any clearer.

It really does actually. By a lot. If we are bothered by length / verbosity we can use imports and renaming to a shorter name. (This would still be a trade-off, losing some clarity.)

import {MyFavoriteThing => MyFvTh}
import MyFavoriteThing.*

Yep! Data in code is a bad practice / smell. Best to keep data in files and read using helpers / libraries. There are many Scala libraries that read data into specialized efficient classes with tons of features.

lihaoyi · July 24, 2024, 4:00am

It’s always possible to come up with synthetic examples to make any language feature look arbitrarily bad. That’s a pretty meaningless discussion that will get us nowhere.

Here’s some real-world code I just came across that would benefit from this language feature being proposed:

  def pomSettings: PomSettings = PomSettings(
    description = artifactName(),
    organization = "com.lihaoyi",
    url = "https://github.com/com-lihaoyi/scalasql",
    licenses = Seq(License.MIT),
    versionControl = VersionControl.github(
      owner = "com-lihaoyi",
      repo = "scalasql"
    ),
    developers = Seq(
      Developer(id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")
    )
  )

Perfectly idiomatic Scala: immutable case classes, collections, factory methods, named parameters, explicit type annotations. This is as vanilla Scala as it gets.

But when you look at the code, it’s kind of clunky. licenses = Seq(License...), versionControl = VersionControl..., developers = Seq(Developer(...)). There’s tons of boilerplate and unnecessary duplication, and def pomSettings: PomSettings = PomSettings(...) is the kind of verbosity that you find in Java that Scala programmers like to feel smug about avoiding. It’s true that Scala is often pretty concise, but in the very common scenario of constructing nested data structures, Scala is pretty verbose as well.

I would much prefer to write something like:


  def pomSettings: PomSettings = (
    description = artifactName(),
    organization = "com.lihaoyi",
    url = "https://github.com/com-lihaoyi/scalasql",
    licenses = [MIT],
    versionControl = github(
      owner = "com-lihaoyi",
      repo = "scalasql"
    ),
    developers = [
      (id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")
    ]
  )

mberndt · July 24, 2024, 8:26am

With my latest ideas, that would be something like

  def pomSettings: PomSettings = [
    description = ..artifactName(),
    organization = "com.lihaoyi",
    url = "https://github.com/com-lihaoyi/scalasql",
    licenses = [..MIT],
    versionControl = ..github(
      owner = "com-lihaoyi",
      repo = "scalasql"
    ),
    developers = [
      [id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi"]
    ]
  ]

That looks pretty good to me, and it’s explicit about the lookup of items like github or MIT. Though the more I look at it, the more I feel that .. really is a terrible eyesore. Maybe it’s time to get the bikeshedding started on that one? I feel that @ would be OK, but I’m not sure if it would conflict with annotations or pattern aliases.

I had considered using () (tuple syntax) instead of [] for the “companion object apply” syntax, but I don’t like the idea because so far, wrapping any single expression in parens is always a no-op, and I think it should stay that way. I also like [] because then it’ll match the list syntax of most other languages.
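The parenthesization rule referred to here is easy to check in current Scala. A small sketch (the val names are arbitrary):

```scala
// Parentheses around a single expression do nothing: this is a String, not a tuple.
val plain: String = ("eee")

// A one-element tuple has to be spelled out explicitly.
val wrapped: Tuple1[String] = Tuple1("eee")

// Consequently (2, ("eee")) is an ordinary pair of Int and String.
val pair: (Int, String) = (2, ("eee"))
```

If () were reused for the companion apply syntax, a one-argument form such as ("eee") would have to mean something different from the plain expression, which is the ambiguity [] avoids.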

soronpo · July 24, 2024, 10:16am

I think there is a way that can satisfy most concerns raised here and on the relative-scoping thread. Here is what I think we should do:

Relative-scoping

Relative scoping, as in accessing the target type’s companion object methods/values, will only be available for named argument placement. Due to this restriction, I think we can reduce the relative scoping token to a single ., since the ambiguity is removed, IIUC. For discussion: with this restriction, we can even consider removing the need for a leading relative scoping token entirely.

case class Foo(arg: Int, bar: Bar)
object Foo:
  def func(): Foo = ???
enum Bar:
  case Bar1, Bar2, Bar3

val foo1: Foo = Foo(arg = 0, bar = .Bar1)  //OK! (we can even consider removing the need for leading `.`)
val foo2: Foo = Foo(0, .Bar1) //error (relative scoping only available for explicit named arguments)
val foo3: Foo = .func() //error (we could allow this, but I think not)

Note: Due to the above restriction, we need to have named pattern matching officially in the language to have relative scoping within pattern matching.

Aggregated literals

Aggregated literals, as in dropping the explicit type constructor name when invoking apply, will be possible under the following restrictions:

Only named values or argument placement.
The syntax of [] is used, and must always have named argument positioning, unless there is a single varargs argument.

case class Foo(arg: Int, bar: Bar)
enum Bar:
  case Bar1, Bar2, Bar3
case class Baz(arg: Int*)

val foo1: Foo = [arg = 0, bar = .Bar1]  //OK!
val foo2: Foo = [0, Bar.Bar1]  //error: missing argument names
val baz: Baz = [0, 1, 2, 3] //OK!

mberndt · July 24, 2024, 10:35am

I’m opposed to restricting relative scoping to named arguments because it’s simply unnecessary in many (perhaps even most) cases. .of(year = 1958, month = 9, day = 5) isn’t clearer than .of(1958, 9, 5), it’s less clear because the signal-to-noise ratio is worse. And worse than that, named parameters don’t even work with Java methods.

What is so wrong with simply allowing the developers to make their own decisions, like adults do? I’m honestly sick and tired of being told that we can’t give powerful features to capable and responsible developers because of a few fools who might abuse them to shoot themselves in the foot. Especially given that pretty much the worst thing that can happen is that somebody needs to enable parameter name hints in their editor.

We don’t know better than other people how they should write their code.

soronpo · July 24, 2024, 10:39am

Actually, it’s much clearer to me. But what I proposed does not prevent you from writing .of(1958, 9, 5). The named argument placement is a restriction for invoking relative scoping. As in, Foo(arg = 0, date = .of(1958, 9, 5)) works, but Foo(0, .of(1958, 9, 5)) won’t.

mberndt · July 24, 2024, 11:46am

I find that hard to believe given that nobody writes dates like that anywhere ever. Humans are really very good at figuring things out from context and world knowledge, and that’s why ISO date format is 1958-09-05 and not year: 1958, month: 9, day: 5. Because everybody can figure out it’s a date just by glancing at it.

Why, why complicate things with additional arbitrary rules that make them less orthogonal than they need to be? Why this insistence that you know better than the people using the language how they should be writing their code? I’m sorry I’m getting emotional here, but this idea that adults are really children who need to be protected from themselves is spreading everywhere, and it needs to stop. You don’t make the world a better place by preventing every bad thing that could happen, however minor. You make it a better place by giving the capable and well-intentioned all the possibilities to make good things happen. Especially when the worst possible downside can easily be mitigated by decent tooling.

But getting back to the technical side: invoking relative scoping only for named parameters means I can’t use it for variable definitions, I can’t use it in a List, I can’t use it for a function’s return value, I can barely use it in a tuple (_2 = .Bar1, srsly?). To me, that’s unacceptable.

The restriction of limiting unqualified lookup to enum cases and case classes that inherit from a sealed trait should really be enough.

soronpo · July 24, 2024, 12:02pm

That is untrue. Without the context I really had no idea what you meant by .of(1958, 9, 5). That’s why date = .of(1958, 9, 5) makes sense, but without any reference to what those numbers mean it’s an absolute hell. You need some kind of anchor for the reader to understand context. If we take away full constructors, we need to at least leave the argument name.

mberndt · July 24, 2024, 1:55pm

I’m going to once again come back to a point that I made earlier, which is that while you’re placing all your focus on this .of(...) expression, you’re neglecting all the cases where you have even less information about what the thing you’re passing to the Person constructor might be. Nobody is complaining about the fact that Person("Martin", _) is a perfectly valid expression to construct a function of type LocalDate => Person, nor did anybody ever insist that this must be written Person("Martin", birthday = _). The reason is that the language couldn’t possibly understand all the context that a human reader can factor in while reading the code, and thus whether to make this explicit is a choice that the programmer must make. In an expression like Person("Martin", .of(1958, 9, 5)), it is relevant context that I know what Martin looks like and therefore have a rough idea of what his year of birth might be. Humans do this kind of subconscious cross-referencing all the time, and it’s not something that any programming language will ever understand. Another example of context is variable names. What if the expression isn’t .of(1958, 9, 5) but .of(year, month, day)? It’s just not possible to argue that anybody could mistake that for anything other than a date.

The irony here is that I probably would use a named parameter for this case in order to distinguish between a person’s birthday and other possible relevant dates (like wedding date, signup date or whatever). But when I just write LocalDate.of(y, m, d) I don’t have to do that either, so adding this rule for a relative scoping expression doesn’t really solve the problem, especially given that people can just import LocalDate.of. So you’re not enforcing readable code, you’re just enforcing longer import lists.
And there’s a lesson here: you cannot enforce readable code through language rules. Readable code is the result of developers giving a fuck, and no amount of language legislation is going to change that.
Another example is my zio-aws example from earlier. s3.createBucket([bucket = ["foo"]]) is good code, adding a parameter name is pure noise, and adding noise makes code worse, not better.
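The import-based workaround and the placeholder form mentioned above both compile today. A minimal sketch, assuming a Person case class with a birthday field (the class and the val names are illustrative only):

```scala
import java.time.LocalDate
import java.time.LocalDate.of

case class Person(name: String, birthday: LocalDate)

// With the static method imported, the call site has no prefix and no argument names.
val martin: Person = Person("Martin", of(1958, 9, 5))

// Placeholder syntax yields a function; nothing requires `birthday = _` here.
val mkMartin: LocalDate => Person = Person("Martin", _)

// Named arguments are unavailable for Java-defined methods, so
// of(year = 1958, month = 9, dayOfMonth = 5) would not compile.
```

Neither form is prevented by the language today, which is why a named-argument requirement on relative scoping would mostly push people toward longer import lists.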

When it helps, developers have the possibility to use named parameters. I like named parameters, I probably use them more than the average developer. But I don’t want to have the language tell me when I need to use them, and to me, all these arbitrary restrictions (arbitrary as in not forced by technical reasons) frankly just feel like somebody else trying to force their ideas of what good code should look like on me, when they have no idea what my project or the people working on it are like.
It also shouldn’t be forgotten that these additional restrictions make the language not only harder to learn (because there’s more arbitrary rules to memorize) but also less fun to learn. We should strive to have a language where, while you’re learning it, you have those moments where it clicks and you realize that, wait, you can put those two things together in that way too? How cool is that? And that moment shouldn’t be destroyed because your mommy comes in and tells you, no, you can’t do that, it’s too dangerous for you.

I also see that you haven’t considered my other points. If relative scoping is only permitted for named parameters, relatively scoped expressions

can’t be used in val definitions
… or return values
… or Lists
don’t work in Java methods (no named parameters)
are ugly in tuples

Perhaps some teams like to put guardrails like that around themselves, and that’s fine, they can write a scalafix rule for it. I see this kind of rule firmly in the territory of linting tools, not the language proper.

mberndt · July 24, 2024, 1:59pm

Oh and one more thing: while it’s probably pretty obvious that I don’t agree with everybody on everything, I very much do appreciate the civil discussion of ideas and the time that people have been putting into it. Thanks everybody for your continued engagement.

ragnar · July 24, 2024, 5:51pm

Adding new syntax and an entirely new way to impose restrictions seems suboptimal. If someone encounters these examples in a codebase, they won’t easily understand what is happening. The code is non-discoverable and difficult to search for.

I prefer the simplicity of @mberndt’s earlier proposal, which recommends using a new symbol to infer the companion object without any magic – just an import:

import scala.compiletime.companion as <>

val l: List[Int] = <>(1,2,3,4)

(Note: <> is chosen randomly since @ is not an allowed symbol)

Delegating the choice of something short to reduce boilerplate is slightly cheating, but it is still an improvement because this renaming import is needed only once, not per constructor.

Specifically, (companion.abc(xyc): T) would compile to T.abc(xyc), which seems nearly achievable with macros, except the inferred return type seems inaccessible.

This approach is clear in terms of documentation and how to find it. There are no new principles to learn, just some inferred types and forwarded methods.

It is not as concise as some other proposals, but it seems to be an improvement without real downsides, except possibly conflicting with a better proposal.