Pre-SIP: a syntax for aggregate literals

Thanks for the examples! So let’s think about the simplest rule that would pretty much work.

Let’s say that is it. There’s nothing else going on: it is entirely a syntactic, not a semantic desugaring, save that we need the semantic awareness of whether a type is already expected at a position.

We probably need [...] to function as a type inference barrier. Although one can imagine a solver that would figure out from the types inside [...] what it could possibly match, and from that figure out what calling types could be expected, and so on, it would be extremely opaque to programmers even if the compiler could often figure out the puzzle.

Furthermore, we’re expecting [...] to parallel method arguments (granted, only on the apply method, but that can be as general as any method). That means we need to figure out what to do about named arguments, multiple parameter blocks, and using blocks. For example, what if we have

def apply(foo: Foo)()(bar: Bar)(using baz: Baz) = ???
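To make this concrete, here is a minimal sketch of such a signature in today's Scala (Foo, Bar, Baz and Thing are invented for illustration), together with the fully spelled-out call that any [...] form would have to abbreviate:

```scala
// Hypothetical types, just to exercise the tricky signature shape:
case class Foo(s: String)
case class Bar(n: Int)
case class Baz(n: Int)

class Thing(val foo: Foo, val bar: Bar, val baz: Baz)

object Thing:
  // multiple parameter lists, an empty one, and a using clause
  def apply(foo: Foo)()(bar: Bar)(using baz: Baz): Thing =
    new Thing(foo, bar, baz)

given Baz = Baz(42)

// the full call today; the question is what the [...] form of this looks like
val t = Thing(Foo("a"))()(Bar(1))
```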

But hang on! Relative scoping was already suggesting that we use . (or ..) to avoid having to specify the class name over and over again. Bare . in the right scope would then just be…apply!

So if we write

val xs: Array[Person] = .(
  .("John", .(1978, 5, 11)),
  .("Jane", .(1987, 11, 5))
)

it’s arguably the exact same feature, and since we are literally using the same syntax for the constructor/apply call (with . in for the class name), there aren’t any weird gotchas to think through. Everything already works; the only thing we need to specify is where the relative scope is “we expect this type”.
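For reference, here is what that example would desugar to in plain Scala 3, assuming Person and Birthday case classes along these lines (the exact field types are my assumption):

```scala
case class Birthday(year: Int, month: Int, day: Int)
case class Person(name: String, birthday: Birthday)

// the fully spelled-out form that the .(...) literals would abbreviate
val xs: Array[Person] = Array(
  Person("John", Birthday(1978, 5, 11)),
  Person("Jane", Birthday(1987, 11, 5))
)
```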

This was suggested for things like .Id(3), where a single dot looks reasonable, but to me anyway it looks extra-weird without any identifier. Even though it’s longer, the double dot feels better to me:

val xs: Array[Person] = ..(
  ..("John", ..(1978, 5, 11)),
  ..("Jane", ..(1987, 11, 5))
)

So I think the two features end up completely unified at this point.

class Line(val width: Double)

class Color(r: Int, g: Int, b: Int)
object Color:
  val Red = Color(255, 0, 0)

class Pencil(line: Line, color: Color)

val drawing = Pencil(..(0.4), ..Red)

would just work, all using the same mechanism.
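For comparison, here is a self-contained, spelled-out version of that example in today's Scala (with vals added so the fields are visible):

```scala
class Line(val width: Double)

class Color(val r: Int, val g: Int, val b: Int)
object Color:
  val Red = Color(255, 0, 0)

class Pencil(val line: Line, val color: Color)

// spelled-out equivalent of Pencil(..(0.4), ..Red)
val drawing = Pencil(Line(0.4), Color.Red)
```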

Furthermore, it is hard to see why [0.4] should work but .Red or some such should not: the "the type is known, so saying the name over again is redundant" argument applies just as much to both.

Catching two birds with one net seems appealing to me.

6 Likes

Although I agree with Ichoran's points, I oppose both this and the Relative scoping idea… they are both horribly confusing and unreadable. Once one (or both) of these features is out, they will spread everywhere: everyone will use them where they are not needed at all, purely out of sheer laziness and silly minimalist aesthetic reasons. (We live in an age where people cannot be bothered to spell out full words and instead abbreviate them to their first 3-4 letters.)

Just leave it as it is, I have no problem writing Array and Person. There is such a thing as too much conciseness. It would be fine as a library, but should not be made part of the language (even as opt-in).

2 Likes

To me, a sign of a good feature is that it is useful.

2 Likes

I am now thinking it is probably good to consider this alongside the spread heterogeneous arguments proposal

edit: this one

Hey @Ichoran,

Thanks for engaging once again!

We probably need [...] to function as a type inference barrier.

Oh absolutely, every other way lies madness.

Furthermore, we’re expecting [...] to parallel method arguments (granted, only on the apply method, but that can be as general as any method). That means we need to figure out what to do about named arguments, multiple parameter blocks, and using blocks.

Good point. While I had thought about named parameters (and came to the conclusion that [foo = 42] should work fine), I hadn't considered multiple parameter lists or using clauses. Would they be problematic, though? It seems straightforward enough:
[foo = 42][bar][using baz].

The only potential issue here is that putting [bar] after an expression usually means "call this method and supply the type parameter bar". But I think it's not a problem, because calling a method on a […] expression doesn't make sense: you need to know what the type of an expression is to call a method on it, but […] expressions don't know their own type; they need to have it imposed from the outside. So it should all be fine.
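For comparison, the existing meaning of postfix brackets, sketched in today's Scala (id is a made-up method):

```scala
// Brackets after an ordinary expression supply a type argument:
def id[A](a: A): A = a

val f = id[Int]   // eta-expands to a function value of type Int => Int
val n = f(3)
// A bare [...] literal, by contrast, would have no type of its own until
// one is imposed from the outside, so there is nothing to call [bar] on.
```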

it’s arguably the exact same feature, and since we are literally using the same syntax for the constructor/apply call (with . in for the class name), there aren’t any weird gotchas to think through. Everything already works; the only thing we need to specify is where the relative scope is “we expect this type”.

This is actually a really interesting thought, which led me to another idea. We already have abbreviated lambda syntax with the _ placeholder. When a _ occurs in an expression, it is desugared by replacing the _ with an identifier, say x, and then prefixing the whole thing with x =>. So _ + 3 becomes x => x + 3.
We already have a set of rules here to sort out the scoping details (e.g. does f(g(_)) mean x => f(g(x)) or f(x => g(x))? It's the latter).
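Those rules can be checked in today's Scala: the underscore expands at the smallest enclosing expression, so f(g(_)) passes a function to f (f and g here are made-up examples):

```scala
def g(x: Int): Int = x + 1
def f(h: Int => Int): Int = h(10)

val add3: Int => Int = _ + 3   // desugars to x => x + 3
val r = f(g(_))                // desugars to f(x => g(x)), not x => f(g(x))
```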
What if we used the exact same set of rules, but with some other token – @, say – that is then replaced with the expected type?
val x: List[Int] = @(1,2,3) would desugar to val x: List[Int] = List(1,2,3). This much is equivalent to the [] syntax we had been discussing so far, but unlike that proposal, it isn't limited to mere function application. For example, this would work:

def foo(date: java.time.LocalDate) = ???

foo(@.of(1958, 9, 5))
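For comparison, the same call spelled out in today's Scala (with a trivial body instead of ??? so it runs):

```scala
import java.time.LocalDate

def foo(date: LocalDate): LocalDate = date

// what foo(@.of(1958, 9, 5)) would desugar to
val d = foo(LocalDate.of(1958, 9, 5))
```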

The downsides are that it’s a tiny bit more verbose and that we wouldn’t be using [] for lists like most languages do – but Scala isn’t other languages, so that part is probably fine.
The upside is that it would be more flexible (the LocalDate thing) and that we’d be reusing a set of existing syntax rules from abbreviated lambdas, so it should be easy to teach.
I have to say this feels absolutely right to me, I love this idea! Thanks for coming up with that (after all it’s essentially the same as the ..(1978, 5, 11) syntax that you proposed).

Re merging into a different proposal: I had proposed that to @soronpo, but he preferred to have two separate proposals – I’m prepared to discuss this when he is. As for the heterogeneous spread thing – I need to look into it.

1 Like

As discussed on Discord, I proposed to expand the experimental Generic Numeric Literals into Generic Constructor Literals.
So similarly we can define an implicit of something like:

trait FromLiteralList[T]:
  def fromList(args: Any*): T

It would trigger for literal lists defined by [] (e.g., ["hello", [1, 2, some_param]]).
Then we can add a specific FromLiteralList for case classes that enforces the types and can recursively summon FromLiteralList or FromDigits for each of the target case class arguments.

case class Foo(arg1: Int, arg2: String, bar: Bar)
case class Bar(arg1: String, list: List[Int])

val foo: Foo = [1, "2", ["bar1", [1, 2, 3, 4, 5, 6]]]
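To illustrate the mechanism, here is a hand-written instance for Bar in today's Scala; this is my own sketch, not the proposed derivation, and the hypothetical literal ["bar1", [1, 2]] would desugar to the fromList call at the end:

```scala
trait FromLiteralList[T]:
  def fromList(args: Any*): T

case class Bar(arg1: String, list: List[Int])

// hand-rolled; the proposal would derive such instances for case classes
given FromLiteralList[Bar] with
  def fromList(args: Any*): Bar =
    Bar(args(0).asInstanceOf[String], args(1).asInstanceOf[List[Int]])

// ["bar1", [1, 2]] would then desugar to something like:
val bar = summon[FromLiteralList[Bar]].fromList("bar1", List(1, 2))
```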
2 Likes

This, to me, is a much better way to implement this feature: it's more general and has clearer semantics.

As for whether this feature should exist in the first place, I must say I'm not super convinced.

A point to remember is that in C++, when declaring a variable, you always have to specify a type, even if it's just auto.
It therefore makes a lot of sense there to avoid repetition by removing the need to spell out the class name twice.
That is not something we have in Scala, since variables get their types inferred.

Furthermore, having used this feature in C++, I find it very easily leads to code that's obvious when you write it, while you have all the context in your head, but very hard to read afterwards.
Of course this can be managed with code etiquette, best practices, etc., but that's yet another choice, another burden on the programmer.

I have never thought of doing this before, but I must say it seems very elegant (as long as it is declared very close to the use-site).
It has the advantage that the context is spelled right in front of you, making re-reading easier.
Its syntactic weight is also an upside for me: it nudges users into not using it unless there's really a lot of data.
It's also easier to refactor out: just replace all bs with Birthday and you're done!

There's also the tooling-support question: when you have a name, you can Ctrl-click it and it brings you to the definition; with just a pair of parentheses or brackets, that's not as clear.
Especially since it's harder to aim at a single character than at a full class name.

1 Like

underrated point - how does visual studio / jetbrains solve hover/goto definition with it in C#? or c++

3 Likes

From what I remember it doesn’t ^^’
At least in VS Code

1 Like

I’m also not super convinced, but if we’re going to do it, I want the best version of it that we can think of, and I think the aggregate literal = relative scoping unification is a step in that direction.

I haven’t been able to think of intentional restrictions that increase the safety and clarity. And yes, I use C++ also, and yes, I also find that the clarity suffers in many instances. Part of that is C++'s propensity for adding a new feature if there is any use case covered by it, and thus tending to accumulate multiple alternatives for doing the same thing, but mostly it’s just that {2, 5, true} is pretty opaque, whether you spell it with braces, brackets, or double-dot parens.

One could try unlocking it only in a varargs context, so

val p: Person = ..("Leslie", ..(1994, 7, 8))

wouldn’t work, but

val ps: Seq[Person] = ..(
  ..("Leslie", ..(1994, 7, 8)),
  ..("Alex", ..(1993, 7, 8)),
)

would. But this doesn’t make Person("Chris", Birthday(1949, 12, 31)) any less annoying to type.

And since you couldn't stop the apply version for relative scoping if everything were fair game, you would have to do something like restricting it to stable identifiers (vals and objects) to get

shape.draw(Line(2.5), ..Red)

to work (and the Line part wouldn’t work because it’s not varargs).

You could require opting in with a keyword, but that would be practically useless, because you can't expect the entire ecosystem to change, and there are lots of places where you would otherwise have a lot of redundancy.

So I think the problem is that the feature requires the kind of care that is honestly pretty difficult to apply at the time, because of course you know the types, this is super obvious…while you’re writing it. The new hire, two years later, does not find it obvious. Neither do you, when you come back to it after two years and they’re all, “What even is this?!”

So that leaves gating it behind a language.relativeScoping import, and hoping that this induces people to only reach for the feature in the cases where it’s most justified. And we maybe could provide a rewrite tool where the compiler would fill in the .. (or . or whatever we decide, if we do this) as a backstop against uninterpretability.

That’s all possible and has some precedent, which leaves me on the fence as to whether this is a good idea or not.

2 Likes

I guess another possibility is to require argument names except for varargs. This wouldn't give you less boilerplate, but it would allow you to pick your poison; the programmer would never be burdened with a choice between being clear and being obscure.

val preferCaseClassNames = List(
  Person("Martin", Birthday(1958, 9, 5)),
  Person("Meryl", Birthday(1949, 6, 22))
)

val preferArgumentNames = List[Person](
  ..(name = "Martin", ..(year = 1958, month = 9, day = 5)),
  ..(name = "Meryl", ..(year = 1949, month = 6, day = 22))
)

But to me this seems worse in almost every case. It’s extremely verbose.

With a bit of tweaking, it’s basically JSON:

val preferArgumentNames: List[Person] = (
  (name = "Martin", (year = 1958, month = 9, day = 5)),
  (name = "Meryl", (year = 1949, month = 6, day = 22))
)

JSON and its family of data formats (YAML, TOML, Jsonnet, etc.) are basically the most popular way of writing out hierarchical data on the planet. JSON is often how you specify hierarchical data structures in Python, how you specify them in Javascript, and how you specify them in most other languages by parsing external files.

It turns out that "positional arrays and key-value objects" is a very universal pattern for programming languages and data structures; consider your first "Java 101" course where someone learns about classes with named fields and positional arrays, or a "C 101" course with structs and arrays. Sometimes it's a bit of a stretch (e.g. do you want a syntax for Sets?), but overall JSON has been incredibly successful. And this proposal does provide an answer to the question of how to square a JSON-ish anemic syntax with Scala's rich collection of data structures, using target typing.

The basic issue comes down to a question of being "data first" vs. being "name first". Following @Ichoran's example, why have tuples at all when you can just define class p(val v1: Foo, val v2: Bar)? Why have apply method sugar when you can just def b(...) and call foo.b() all the time? Why have singleton object syntax when people can just define their own public static Foo v()?

The answer is that we used to do all these things in Java 6, but there are scenarios where the name is not meaningful, and forcing people to come up with short meaningless names is worse than having no name at all. Being able to smoothly transition from "name first" to "data first" depending on the context is valuable. The proposed feature here, allowing developers to smoothly transition from name-first object instantiations to some kind of data-first definition of data structures, is just another step in the direction Scala has been moving in for decades, and it has plenty of precedent elsewhere in the programming ecosystem.

3 Likes

But this is exactly named tuples. So is this suggesting that all we need is for automatic adaptation specifically of named tuples into classes with apply methods with corresponding names?

That is a pretty hard-to-abuse feature, I agree. It’s far less ambitious, though.

5 Likes

For me, a lot of this amounts to implicit conversions … with all their greatness and pitfalls.

In fact, with a combination of inline conversions dutifully macro-generated, and then imported into context, you can approximate this feature with minimal boilerplate.

Sketching this out,

// library code
type FromTuple[T] = Conversion[NamedTupleOf[T], T] 

object FromTuple:
  inline def gen[T]: FromTuple[T] = ???

// user-defined allowed tuple/varargs-to-type conversions
package com.quick.profit
given FromTuple[Person] = FromTuple.gen
given FromTuple[Birthday] = FromTuple.gen

// profit!
import scala.language.implicitConversions
import no.scruple.varargs.given
import com.quick.profit.given

val preferArgumentNames: List[Person] = (
  (name = "Martin", (year = 1958, month = 9, day = 5)),
  (name = "Meryl", (year = 1949, month = 6, day = 22))
)
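A minimal concrete version of the mechanism, without macros or named tuples: hand-writing the conversion that FromTuple.gen is imagined to derive (Birthday is an assumed case class):

```scala
import scala.language.implicitConversions

case class Birthday(year: Int, month: Int, day: Int)

// hand-written stand-in for a derived tuple-to-case-class conversion
given Conversion[(Int, Int, Int), Birthday] with
  def apply(t: (Int, Int, Int)): Birthday = Birthday(t._1, t._2, t._3)

val b: Birthday = (1958, 9, 5)   // adapted via the given Conversion
```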
4 Likes

Regarding tooling, I think it’s not difficult to solve. The moment we have any token to indicate “apply relative scoping here”, whether that’s [ or @ or .. or whatever, you can Ctrl-click on that to get to the definition.

Regarding legibility, I think there are enough cases where the meaning of the data can be deduced from the data itself. When you see ..("Alex", ..(1993, 7, 8)) then you need neither field nor type names to know what's going on. And it's like that in many cases. Something like ["US78378X1072"] is immediately obvious to anybody who's worked in finance: it's an ISIN, and uglifying it to ISIN("US78378X1072") or [isin = "US78378X1072"] doesn't make the code better, it makes it worse.

We shouldn’t assume that we know better than Scala developers how to best make their code easy to read, also because this encourages workarounds that are often worse.
“Wait, I need to spell out all the field names? Ah whatever, I’ll just use tuples then”

List(
  ("Alex", 1993, 7, 8)
).map((n, y, m, d) => Person(n, Birthday(y, m, d)))

This is worse than whatever could be done with liberal use of the proposed syntax because tooling cannot help you any more once you do this, whereas X-Ray mode in IntelliJ can already show you method parameter names today. And to be honest, that alone should be enough to dispel concerns about readability.

I also strongly dislike the idea of making this opt-in via a given or something. It makes it harder to use for data types that you don’t control (from libraries), it has unnecessary run-time overhead and it adds more distracting boilerplate (like the given declarations themselves but potentially also import statements to make them available) when the idea was to have less of that.

To me, the core of this proposal is two things:

  1. Automatic adaptation of named tuples (or equivalent syntax) into case classes
  2. Automatic adaptation of positional tuples (or equivalent syntax) into Scala collections

Both of these can be based on target typing, and would give a concise JSON-ish way of declaring hierarchical data in a Scala program while still letting it be coerced into nominally-typed data structures.

IMO it’s not that much of a stretch to go one step further:

  3. Automatic adaptation of positional tuples into case classes

To me, the core of the proposal is really to allow some kind of lightweight anonymous notation for hierarchical data structures that fits into Scala’s nominally-typed case classes and rich collections library. Target typing seems like it should get us most of the way there.

The exact syntax doesn’t matter so much for me, but given we already have positional-tuple and named-tuple syntax, it seems most straightforward to re-use it rather than coming up with a new square-bracket-based syntax

As @mberndt mentions, IntelliJ has X-Ray mode to “desugar” a lot of Scala language features already: implicits, type inferences, and so on. Having such a feature apply to target-typed hierarchical data is very natural, and would give us the best of both worlds: concise declaration of hierarchical data structures while still giving the programmer visibility into all the nominal inferred types along the way

5 Likes

The way I see it by now, the core of this proposal is one thing: making it easier/terser to refer to the type expected where the expression is located. Let’s say we use @ as a placeholder for that type. If a List is expected, then @(1,2,3) will be List(1,2,3). If a Person is expected, @("Matthias", @.of(???, 7, 11)) will be Person("Matthias", LocalDate.of(???, 7, 11)) (assuming that the second field of Person has type LocalDate).

This syntax doesn't know or care whether it's being used to build a case class, a collection, or even something else like a LocalDate (which is a Java class and hence neither of the two). This is very simple to explain and teach, and I would even say that it's probably easy to implement as well, because a lot of the machinery is already in place. The types of lambda parameters are deduced from the type expected in that position, so the Scala compiler already has some notion of that expected type. And the rules for how the whole "placeholder" thing works are also already specified, because we already have a kind of expression that uses a placeholder, namely the abbreviated lambda syntax with _ as a placeholder.

To make the syntax even cleaner, we could say that @foo is equivalent to @.foo, e.g. @of(???, 7, 11) for LocalDate.

At the risk of tooting my own horn here: to me, this feels just right. It’s simple in every way I can think of, it’s quite flexible and general, and it has the potential to massively cut down on boilerplate. What’s not to like?

Whoa, hang on there! You just threw away all the identifying information and called it “not much of a stretch”. That’s like saying it’s not much of a stretch to go from

[
  { "name": "Martin", "DOB": { "year": 1958, "month": 9, "day": 5 } },
  { "name": "Meryl", "DOB": { "year": 1949, "month": 6, "day": 22 } }
]

to

[
  ["Martin", [1958, 9, 5]],
  ["Meryl", [1949, 6, 22]]
]

But, in fact, in key-value-land (JSON specifically) you ubiquitously don’t see that.

So I reject the premise. This is an enormous, game-changing step. You go from naming what you’re talking about to failing to name what you’re talking about.

Named tuples are arguably isomorphic to the structural type (i.e. data interface) of a data-bearing class. Positional arguments are already how you call varargs anyway–it’s just a sequential list. But positional tuples neither mention the class nor the interface. Is (2, 3) a 2-arg vector? Start and end indices? Start index and length? The two axes of an ellipse? Month and day? No clue!

It might be a great improvement to the game when context makes the meaning of your (2, 3) clear. But even though you can describe the three adaptations with sentences that are superficially very similar, the consequences are vastly different.

And this simplification is what (IMO) is now not much of a stretch: writing @(2, 3) even when you aren't completely solid on the context.

4 Likes

Sure, you don’t see it in key-value JSON land. But you do see it in the next most popular data format on the planet: CSV. And there’s a whole family of similar formats with unlabelled values (e.g. .xls)

A CSV with a header row of labels and rows of unlabelled data is similar to a type signature followed by a big list of unlabelled tuples. Obviously not 100% identical, but close. And like CSV vs JSON, either style is useful in different scenarios depending on how much you value explicitness vs compactness

Given that Scala traditionally treats param names as optional in method calls, constructors, and soon tuples, having the names be optional here seems very reasonable to me.

If we were discussing Swift, with its mandatory names at every callsite, then I would agree that the names in this proposal should be mandatory. But Scala has a different convention

1 Like

But, in fact, in key-value-land (JSON specifically) you ubiquitously don’t see that.

Actually, what you see in JSON is that ~nobody encodes dates as {year: 1958, month: 9, day: 5}; everybody uses "1958-09-05" instead, because you don't need field labels to see that it's a date.

@(2, 3) when you aren’t completely solid on the context.

Sure: if you don't know the context, and whoever wrote the code decided not to put field names, and for whatever reason you can't or won't use a tool to help you out, then it's hard to read. And if, in addition to all that, you assume that the author of that code isn't going to work around any restrictions that we impose on this feature (e.g. by using tuples instead), then enforcing field names might help make this code easier to read. But to me, those are too many ifs to force mandatory field names on the many more cases where that is not necessary.