Pre-SIP: a syntax for aggregate literals

With a bit of tweaking, it’s basically JSON:

val preferArgumentNames: List[Person] = (
  (name = "Martin", (year = 1958, month = 9, day = 5)),
  (name = "Meryl", (year = 1949, month = 6, day = 22))
)

JSON and its family of data formats (YAML, TOML, Jsonnet, etc.) are basically the most popular way of writing out hierarchical data on the planet. JSON is often the way you specify hierarchical data structures in Python, the way you specify hierarchical data structures in JavaScript, and the way you specify hierarchical data structures in most languages through parsing external files.

It turns out that “positional arrays and key-value objects” is a very universal pattern for programming languages and data structures; consider your first “Java 101” course where someone learns about classes with named fields and positional arrays, or “C 101” course where someone learns about structs and arrays. Sometimes it’s a bit of a stretch (e.g. do you want a syntax for Sets?) but overall JSON has been incredibly successful. And this proposal does provide an answer to the question of how to square a JSON-ish anemic syntax with Scala’s rich collection of data structures, using target-typing.

The basic issue comes down to a question of being “data first” vs. being “name first”. Following @Ichoran’s example, why have tuples at all when you can just define class p(val v1: Foo, val v2: Bar)? Why have apply method sugar, when you can just def b(...) and call foo.b() all the time? Why have singleton object syntax, when people can just define their own public static Foo v()?

The answer is that we used to do all these things in Java 6, but there are scenarios where the name is not meaningful, and forcing people to come up with short meaningless names is worse than having no name at all. Being able to smoothly transition from “name first” to “data first” depending on the context is valuable. The proposed feature, which lets developers move smoothly from name-first object instantiations to some kind of data-first definition of data structures, is just another step in the direction Scala has been moving in for decades, and has plenty of precedent elsewhere in the programming ecosystem.

3 Likes

But this is exactly named tuples. So is this suggesting that all we need is for automatic adaptation specifically of named tuples into classes with apply methods with corresponding names?

That is a pretty hard-to-abuse feature, I agree. It’s far less ambitious, though.

5 Likes

For me, a lot of this amounts to implicit conversions … with all their greatness and pitfalls.

In fact, with a combination of inline conversions dutifully macro-generated, and then imported into context, you can approximate this feature with minimal boilerplate.

Sketching this out,

// library code
type FromTuple[T] = Conversion[NamedTupleOf[T], T] 

object FromTuple:
  inline def gen[T]: FromTuple[T] = ???

// user-defined allowed tuple/varargs-to-type conversions
package com.quick.profit
given FromTuple[Person] = FromTuple.gen
given FromTuple[Birthday] = FromTuple.gen

// profit!
import scala.language.implicitConversions
import no.scruple.varargs.given
import com.quick.profit.given

val preferArgumentNames: List[Person] = (
  (name = "Martin", (year = 1958, month = 9, day = 5)),
  (name = "Meryl", (year = 1949, month = 6, day = 22))
)
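
For positional tuples only, the `???` in a generator like the one above could perhaps be filled in with a `Mirror`. This is a minimal sketch under assumptions: `PositionalFromTuple` and `birthdayFromTuple` are hypothetical names, `Birthday` is assumed to be a plain case class, and named tuples and nesting are not handled.

```scala
import scala.deriving.Mirror
import scala.language.implicitConversions

// Assumed shape of the Birthday type used in the thread's examples.
case class Birthday(year: Int, month: Int, day: Int)

object PositionalFromTuple:
  // Hypothetical helper: builds a conversion from the tuple of P's field types to P,
  // using the compiler-provided Mirror of the case class.
  def gen[P <: Product](using m: Mirror.ProductOf[P]): Conversion[m.MirroredElemTypes, P] =
    new Conversion[m.MirroredElemTypes, P]:
      def apply(t: m.MirroredElemTypes): P = m.fromProduct(t)

// Opt-in: one given per target type.
given birthdayFromTuple: Conversion[(Int, Int, Int), Birthday] =
  PositionalFromTuple.gen[Birthday]

// Target typing: the expected type Birthday triggers the conversion.
val birthday: Birthday = (1958, 9, 5)
```

Because `fromProduct` is an ordinary runtime call, this version is not zero-overhead; an inline or macro-based generator would be needed for that.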

4 Likes

Regarding tooling, I think it’s not difficult to solve. The moment we have any token to indicate “apply relative scoping here”, whether that’s [ or @ or .. or whatever, you can Ctrl-click on that to get to the definition.

Regarding legibility, I think there are enough cases where the meaning of the data can be deduced from the data itself. When you see ..("Alex", ..(1993, 7, 8)) then you need neither field nor type names to know what’s going on. And it’s like that for many cases. Something like ["US78378X1072"] is immediately obvious to anybody who’s worked in finance: it’s an ISIN, and uglifying it to ISIN("US78378X1072") or [isin = "US78378X1072"] doesn’t make the code better, it makes it worse.

We shouldn’t assume that we know better than Scala developers how to best make their code easy to read, also because this encourages workarounds that are often worse.
“Wait, I need to spell out all the field names? Ah whatever, I’ll just use tuples then”

List(
  ("Alex", 1993, 7, 8)
).map((n, y, m, d) => Person(n, Birthday(y, m, d)))

This is worse than whatever could be done with liberal use of the proposed syntax because tooling cannot help you any more once you do this, whereas X-Ray mode in IntelliJ can already show you method parameter names today. And to be honest, that alone should be enough to dispel concerns about readability.

I also strongly dislike the idea of making this opt-in via a given or something. It makes it harder to use for data types that you don’t control (from libraries), it has unnecessary run-time overhead and it adds more distracting boilerplate (like the given declarations themselves but potentially also import statements to make them available) when the idea was to have less of that.

To me I think the core of this proposal is two things:

  1. Automatic adaptation of named tuples (or equivalent syntax) into case classes
  2. Automatic adaptation of positional tuples (or equivalent syntax) into Scala collections

Both of these can be based on target typing, and would give a concise JSON-ish way of declaring hierarchical data in a Scala program while still letting it be coerced into nominally-typed data structures.

IMO it’s not that much of a stretch to go one step further:

  1. Automatic adaptation of positional tuples into case classes

To me, the core of the proposal is really to allow some kind of lightweight anonymous notation for hierarchical data structures that fits into Scala’s nominally-typed case classes and rich collections library. Target typing seems like it should get us most of the way there.

The exact syntax doesn’t matter so much for me, but given we already have positional-tuple and named-tuple syntax, it seems most straightforward to re-use it rather than coming up with a new square-bracket-based syntax

As @mberndt mentions, IntelliJ has X-Ray mode to “desugar” a lot of Scala language features already: implicits, type inferences, and so on. Having such a feature apply to target-typed hierarchical data is very natural, and would give us the best of both worlds: concise declaration of hierarchical data structures while still giving the programmer visibility into all the nominal inferred types along the way

5 Likes

The way I see it by now, the core of this proposal is one thing: making it easier/terser to refer to the type expected where the expression is located. Let’s say we use @ as a placeholder for that type. If a List is expected, then @(1,2,3) will be List(1,2,3). If a Person is expected, @("Matthias", @.of(???, 7, 11)) will be Person("Matthias", LocalDate.of(???, 7, 11)) (assuming that the second field of Person has type LocalDate).

This syntax doesn’t know or care whether it’s being used to build a case class, a collection, or even something else like a LocalDate (which is a Java class and hence neither of the two). This is very simple to explain and teach, and I would even say that it’s probably easy to implement as well, because a lot of the machinery is already in place. The types of lambda parameters are deduced from the type expected in that position, so the Scala compiler already has some notion of that expected type. And the rules for how the whole “placeholder” thing works are also already specified, because we already have a type of expression that uses a placeholder, namely the abbreviated lambda syntax with _ as a placeholder.

To make the syntax even cleaner, we could say that @foo is equivalent to @.foo, e. g. @of(???, 7, 11) for LocalDate.

At the risk of tooting my own horn here: to me, this feels just right. It’s simple in every way I can think of, it’s quite flexible and general, and it has the potential to massively cut down on boilerplate. What’s not to like?

Whoa, hang on there! You just threw away all the identifying information and called it “not much of a stretch”. That’s like saying it’s not much of a stretch to go from

[
  { "name": "Martin", "DOB": { "year": 1958, "month": 9, "day": 5 } },
  { "name": "Meryl", "DOB": { "year": 1949, "month": 6, "day": 22 } }
]

to

[
  ["Martin", [1958, 9, 5]],
  ["Meryl", [1949, 6, 22]]
]

But, in fact, in key-value-land (JSON specifically) you ubiquitously don’t see that.

So I reject the premise. This is an enormous, game-changing step. You go from naming what you’re talking about to failing to name what you’re talking about.

Named tuples are arguably isomorphic to the structural type (i.e. data interface) of a data-bearing class. Positional arguments are already how you call varargs anyway–it’s just a sequential list. But positional tuples neither mention the class nor the interface. Is (2, 3) a 2-arg vector? Start and end indices? Start index and length? The two axes of an ellipse? Month and day? No clue!

It might be a great improvement to the game when context makes the meaning of your (2, 3) clear. But even though you can describe the three adaptations with sentences that are superficially very similar, the consequences are vastly different.

And this simplification is what (IMO) is now not much of a stretch.

@(2, 3) when you aren’t completely solid on the context.

4 Likes

Sure, you don’t see it in key-value JSON land. But you do see it in the next most popular data format on the planet: CSV. And there’s a whole family of similar formats with unlabelled values (e.g. .xls)

A CSV with a header row of labels and rows of unlabelled data is similar to a type signature followed by a big list of unlabelled tuples. Obviously not 100% identical, but close. And like CSV vs JSON, either style is useful in different scenarios depending on how much you value explicitness vs compactness
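
To make the analogy concrete, here is a small sketch in current Scala 3, where the case class plays the role of the header row and each positional tuple is one unlabelled data row. The `Person` fields are borrowed from examples elsewhere in this thread, and the explicit `map` is what target typing would make unnecessary.

```scala
// The case class is the "header row": it names and types the columns once.
case class Person(name: String, age: Int, height: Double)

// Each positional tuple is one unlabelled "data row".
val rows: List[(String, Int, Double)] = List(
  ("Tom", 34, 1.75),
  ("Bob", 23, 1.72),
  ("Joe", 45, 1.81)
)

// Today the mapping from rows to the nominal type has to be written out by hand.
val persons: List[Person] =
  rows.map((name, age, height) => Person(name, age, height))
```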

Given that Scala traditionally treats parameter names as optional, in method calls, constructors, and soon tuples, having the names be optional here seems very reasonable to me.

If we were discussing Swift, with its mandatory names at every callsite, then I would agree that the names in this proposal should be mandatory. But Scala has a different convention

1 Like

But, in fact, in key-value-land (JSON specifically) you ubiquitously don’t see that.

Actually what you see in JSON is that ~nobody encodes dates as {year: 1958, month: 9, day: 5}, everybody uses "1958-09-05" instead, because you don’t need field labels to see that it’s a date.

@(2, 3) when you aren’t completely solid on the context.

Sure: if you don’t know the context and whoever wrote the code decided not to put field names and for whatever reason can’t or won’t use a tool to help you out, then it’s hard to read. And if in addition to all that you assume that the author of that code isn’t going to work around any restrictions that we impose on this feature (e. g. by using tuples instead), then enforcing field names might help to make this code easier to read. But to me, those are too many ifs to force mandatory field names on the many more cases where that is not necessary.

I think the most general solution would be to use tuples for everything, with automatic conversion to expected target type:

val xs: List[Int] = (3, 10, 7, 5, 18)
val xs: List[Int] = (3, 10, "foo", 5, 18) // type error

val xs = (3, 10, 7, 5, 18)
val ys: List[Int] = xs

case class Person(name: String, age: Int, height: Double)

val tom: Person = ("Tom", 34, 1.75)
val tom: Person = ("Tom", 34, 175) // type error

val tom = ("Tom", 34, 1.75)
val person: Person = tom

val tom: Person = (name = "Tom", age = 34, height = 1.75)
val tom: Person = (name = "Tom", age = 34, weight = 1.75) // type error

val tom = (name = "Tom", age = 34, height = 1.75)
val person: Person = tom

val persons: List[Person] =
  (
    ("Tom", 34, 1.75),
    ("Bob", 23, 1.72),
    ("Joe", 45, 1.81)
  )

Whether this is technically possible I don’t know, but from a user’s perspective it would be very convenient.

@kavedaa

Putting aside the obvious type safety issues for a moment, the “everything is a tuple” approach isn’t enough to solve the problem that relative scoping solves. It doesn’t work with types like LocalDate that you need to create with a factory method (LocalDate.of), and it doesn’t solve the problem that creating enum values is currently too verbose.

With relative scoping you can do this:

enum Shape:
  case Rectangle(height: Int, width: Int)
  case Circle(radius: Int)

def f(s: Shape) = ???

// current syntax
f(Shape.Rectangle(x, y))
// proposed syntax
f(@Rectangle(x, y))
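
For comparison, the closest thing available in current Scala 3 is importing the enum cases. A small sketch follows; the `area` function and the literal arguments are made up for illustration. Note that the import applies to the whole scope instead of being driven by the expected type at the call site.

```scala
enum Shape:
  case Rectangle(height: Int, width: Int)
  case Circle(radius: Int)

// Hypothetical consumer of Shape, standing in for f above.
def area(s: Shape): Double = s match
  case Shape.Rectangle(h, w) => h.toDouble * w
  case Shape.Circle(r)       => math.Pi * r * r

// Fully qualified, as required today without an import.
val qualified = area(Shape.Rectangle(2, 3))

// Importing the cases shortens every call in scope, whether or not a Shape is expected there.
import Shape.*
val imported = area(Rectangle(2, 3))
```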

1 Like

We could go with “everything is a tuple” for apply constructors, and handle the rest of the constructors with relative scoping, with or even without a leading token (., .., @, @., $.).

I think it’s definitely worth exploring how far we can get without changing the language. e.g., with named tuples and positional tuples and generic tuples, and implicit conversions, could we implement most of this in user-land as implicit conversions?

The scope-injection hierarchical/relative-scoping stuff definitely can’t be done without language changes, but the “positional tuple is converted to collection” “named tuple is converted to case class” “positional tuple is converted to case class” stuff all seems like it could be done via implicit conversions. With macros, they could even be done without runtime overhead.

It’s unclear to me where the limits of implicit conversions are, but with the new generic-tuple/named-tuple stuff it seems like we should even be able to make nested hierarchical data implicitly convertible based on target typing.

4 Likes

The problem is that an implicit conversion cannot rely on another implicit conversion. So if a constructor relies on an implicit conversion for one of its arguments, that will not work when the constructor call itself comes from a tuple that was implicitly converted. Maybe we can recursively summon Conversion[] for each argument in an inline function. Not sure if it works.
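
A hand-written sketch of that idea, without any inline or macro machinery: the outer conversion looks up the inner one explicitly for the nested field. `Birthday`, `Person` and the given names are assumptions for illustration, and a generic version would have to do this lookup for every field.

```scala
import scala.language.implicitConversions

// Assumed shapes of the types used in the thread's examples.
case class Birthday(year: Int, month: Int, day: Int)
case class Person(name: String, birthday: Birthday)

given birthdayFromTuple: Conversion[(Int, Int, Int), Birthday] =
  t => Birthday(t._1, t._2, t._3)

// The compiler will not chain two conversions by itself, so the conversion for the
// outer tuple summons the inner conversion and applies it to the nested element.
given personFromTuple: Conversion[(String, (Int, Int, Int)), Person] =
  t => Person(t._1, summon[Conversion[(Int, Int, Int), Birthday]].apply(t._2))

val martin: Person = ("Martin", (1958, 9, 5))
```

Without `personFromTuple`, the last line has no applicable conversion, because the expression is typed as (String, (Int, Int, Int)) before any conversion to Person is searched for.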

Well, one thing that can’t be solved with the tuple conversion strategy is the “LocalDate” problem, i. e. objects that need to be created via a factory method. This also affects pure-Scala classes like cats’ NonEmptyList, which is created using the of method in the companion object. Except of course if you provide some way to extend this to user-defined types, in which case it won’t work uniformly and every time you want to use it you get to guess whether the library you’re using supports it or not.

Another thing that won’t work, as @Ichoran has pointed out, is multiple parameter lists and using clauses.

1 Like

I don’t think we should consider these a priority at all. A simple feature that can cater to the common case is better than a complex feature that can tend to all. We’re trying to better represent data structures via Scala. Multiple parameter lists and using clauses are outside of that scope.

2 Likes

A simple feature that can cater to the common case is better than a complex feature that can tend to all.

True, but I don’t think that having @ (or # or whatever syntax we decide on) as an alias for the expected type is particularly complex. In fact, I think that it’s simpler than a bunch of macro-based tuple conversions, and more importantly, it’s quite explicit and communicates intent to both humans and non-humans. You won’t be able to get e. g. parameter assistance from your editor if your editor thinks you’re just writing a tuple, and your editor won’t be able to show you parameter names in something like X-Ray mode either. If you do spell them out and make a typo, you won’t get a clear error message along the lines of “parameter f0o doesn’t exist, did you mean foo?”, it would just show you a generic type error, and now good luck figuring out which field you misspelled.
The best an IDE could do to show you what’s going on is to show you the implicit conversion and allow you to Ctrl-click on that. Then you’re taken to some macro definition that is likely going to be impenetrable to most users.

I’m aware that I’m biased here since I’m the one who started this thread… But I do think that a syntax extension is warranted here for all these reasons, also because we’re going to need one anyway for the reason that Haoyi mentioned: scope injection of enum constructors etc…

1 Like

I think this is a tempting but ultimately counterproductive way to think about it.

Scala’s best meta-feature is that its features work together. Practically every time this isn’t true, it rankles.

Take context functions, for instance. Methods take context but functions can’t?! This cripples your ability to create abstractions, and forces you to fall back on trait FunctionWithTheContextIWant { def apply(foo: Foo, bar: Bar)(using MyContext): Baz }. Generalizing has been a big win, I think.
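
As a reminder of what that generalization looks like, here is a small sketch of a context function type replacing the hand-written trait. `Foo`, `Bar`, `Baz` and `MyContext` are hypothetical stand-ins for the names in the paragraph above, and `combine` is made up for illustration.

```scala
// Hypothetical stand-ins for the types named above.
case class Foo(n: Int)
case class Bar(s: String)
case class Baz(text: String)
case class MyContext(prefix: String)

// A function value whose result still needs a MyContext, supplied implicitly.
val combine: (Foo, Bar) => MyContext ?=> Baz =
  (foo, bar) => Baz(summon[MyContext].prefix + bar.s * foo.n)

given MyContext = MyContext("> ")

// The expected type Baz makes the compiler pass the given MyContext automatically.
val baz: Baz = combine(Foo(2), Bar("ab"))
```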

Scala has plenty of expressive power to handle the fewer-explicit-type-mentions-for-data feature. But I don’t think we should bolt on something simple and consider the more general case as “not a priority at all”. We should get the general case completely clear, then if we think it’s too powerful to unleash or too hard to implement, we can take the easy case first.

In particular, we have four different features we could build off of in the data case.

  1. We can view it as implicit conversion of tuples.
  2. We can view it as a spread operation a la varargs xs* (with or without a spread operator).
  3. We can view it as a particular case of relative scoping (with or without a scoping operator).
  4. We can view it as novel literal syntax, unlike everything else.

If it is an implicit conversion of tuples, then we aren’t dealing just with ("Leslie", (1966, 9, 15)), but also ("Leslie", dob) where dob is a tuple type not a DateOfBirth class. We might not unlock it, but that’s the generalization.

If it is a spread operation, then you’d expect it to spread wherever you need it, at least if the feature is generalized at some point. So s.substring(r*) should work, where r is a 2-tuple of ints and * is our spread operator. Maybe Array[Int](p*, p*, (3, 5)*) should work too–it’s common for spreads to expand into varargs. Maybe ("Leslie", (ym*, 15)*)* should work, where ym is a 2-tuple containing two ints. Also common for spreads.
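
For reference, this is roughly what exists today: the `xs*` spread works for sequences passed to varargs, and the nearest equivalent for a tuple is `tupled` on a function value. The names `total`, `substringOf` and the sample values are made up for illustration.

```scala
// Existing spread: a Seq expands into a varargs parameter.
def total(ns: Int*): Int = ns.sum

val xs = Seq(3, 5)
val sum = total(xs*)

// No tuple spread exists today, so s.substring(r*) has to be spelled differently:
// wrap the call in a function value and use Function2.tupled.
val r = (1, 3)
val substringOf: (Int, Int) => String =
  (from, until) => "herring".substring(from, until)
val sub = substringOf.tupled(r)
```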

If it is relative scoping, then ..of(1958, 9, 15) should work too (where .. is the prefix relative scoping operator)–again, if we decide to go for more generalization.

If it’s its own special snowflake, unrelated to everything else, then it should be expected to pass a much higher bar because you’re introducing a new feature that intentionally doesn’t have any broader use, anything that helps you reason about it. It’s just yet another thing to learn, for one particular use case. Scala is already perilously heavy on separate things to learn–all for good reason, pretty much, but we can’t discount the burden. Enums? Match types? summonInline? Context bounds? Context functions? Named tuples, maybe?

Thus, I disagree that

is a good policy from which to approach holistic language design. You absolutely do want to cover the common use case well, but if you’re spending your force-programmers-to-learn-a-new-thing budget, it should be a wise expenditure. This means considering very carefully whether one can solve other pressing problems with the same concept. Especially since the other pressing problems which are related are on the table right now.

So I advocate, strongly, for considering all the possible generalizations even if the feature for now ends up just being val CaseClass = (5, 1, 2, "herring", true)–purely literals, only in named cases, etc… If we haven’t thought through the generalizations, and tentatively picked one of which we’re implementing a special case, we won’t know how to set it up for potential future language development. In the not-very-long-run this leads to a kitchen sink language or a static language.

8 Likes

Agree with the whole message, I’d like to add a little to it

For me, this is an absolute no-no: if you’re going to re-use a value like dob, you should make it strongly typed.
This has a downside when refactoring:

List(
  ("Leslie", (1966, 9, 15)),
  ("Johnie", (1966, 9, 15)),
)
// refactored to
val dob = (1966, 9, 15)
List(
  ("Leslie", dob), // error
  ("Johnie", dob), // error
)

You’ll note that refactoring is basically the only time when this would happen, furthermore, when refactoring, we tend to pay close attention to the compiler’s output.
Therefore a better solution would be a kind of “quick action” coming with the error, in the same way git has in VS Code.
It could say something like:
“dob is of inferred type (Int, Int, Int), where DateOfBirth was expected, however the definition of dob, (1966, 9, 15), is a valid literal for type DateOfBirth, do you want to add the explicit type DateOfBirth to dob?”

2 Likes

If it was made in user-land, could it be added to the standard library?
(Asking mainly SIP folks)

That would strike a good balance between “not complicating the language further” and “there’s 20 libraries that work differently”

2 Likes