Pre-SIP: a syntax for aggregate literals

Sure, you don’t see it in key-value JSON land. But you do see it in the next most popular data format on the planet: CSV. And there’s a whole family of similar formats with unlabelled values (e.g. .xls)

A CSV with a header row of labels and rows of unlabelled data is similar to a type signature followed by a big list of unlabelled tuples. Obviously not 100% identical, but close. And like CSV vs JSON, either style is useful in different scenarios depending on how much you value explicitness vs compactness
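To make the analogy concrete, here is a sketch in today's Scala (the names `Person` and `rows` are illustrative, not part of any proposal): the case class plays the role of the header row, and the tuples play the unlabelled data rows.

```scala
// CSV analogy: one labelled "header" (the type), then unlabelled "rows".
case class Person(name: String, yearOfBirth: Int) // the header row

val rows: List[(String, Int)] = List( // the unlabelled data rows
  ("Leslie", 1966),
  ("Martin", 1958)
)

// Today the labels must be reattached explicitly:
val people = rows.map((name, year) => Person(name, year))
```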

Given that Scala traditionally treats parameter names as optional in method calls, constructors, and soon tuples, having the names be optional here too seems very reasonable to me.

If we were discussing Swift, with its mandatory names at every callsite, then I would agree that the names in this proposal should be mandatory. But Scala has a different convention

1 Like

But, in fact, in key-value-land (JSON specifically) you ubiquitously don’t see that.

Actually, what you see in JSON is that ~nobody encodes dates as {year: 1958, month: 9, day: 5}; everybody uses "1958-09-05" instead, because you don’t need field labels to see that it’s a date.

@(2, 3) when you aren’t completely solid on the context.

Sure: if you don’t know the context and whoever wrote the code decided not to put field names and for whatever reason can’t or won’t use a tool to help you out, then it’s hard to read. And if in addition to all that you assume that the author of that code isn’t going to work around any restrictions that we impose on this feature (e.g. by using tuples instead), then enforcing field names might help to make this code easier to read. But to me, those are too many ifs to force mandatory field names on the many more cases where that is not necessary.

I think the most general solution would be to use tuples for everything, with automatic conversion to expected target type:

val xs: List[Int] = (3, 10, 7, 5, 18)
val xs: List[Int] = (3, 10, "foo", 5, 18) // type error

val xs = (3, 10, 7, 5, 18)
val ys: List[Int] = xs

case class Person(name: String, age: Int, height: Double)

val tom: Person = ("Tom", 34, 1.75)
val tom: Person = ("Tom", 34, 175) // type error

val tom = ("Tom", 34, 1.75)
val person: Person = tom

val tom: Person = (name = "Tom", age = 34, height = 1.75)
val tom: Person = (name = "Tom", age = 34, weight = 1.75) // type error

val tom = (name = "Tom", age = 34, height = 1.75)
val person: Person = tom

val persons: List[Person] =
  (
    ("Tom", 34, 1.75),
    ("Bob", 23, 1.72),
    ("Joe", 45, 1.81)
  )

Whether this is technically possible I don’t know, but from a user’s perspective it would be very convenient.
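Whether the fully automatic version is possible or not, for any single known type something close to it is already expressible in today's Scala with an implicit `Conversion` (a hedged sketch, not the proposal itself; `Person` is redeclared to keep it self-contained):

```scala
import scala.language.implicitConversions

case class Person(name: String, age: Int, height: Double)

// One hand-written instance of the "tuple adapts to the expected type"
// idea; the proposal would make this automatic for every type.
given Conversion[(String, Int, Double), Person] =
  (t: (String, Int, Double)) => Person(t._1, t._2, t._3)

val tom: Person = ("Tom", 34, 1.75) // tuple converted via the given
```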

@kavedaa

Putting aside the obvious type safety issues for a moment, the “everything is a tuple” approach isn’t enough to solve the problem that relative scoping solves. It doesn’t work with types like LocalDate that you need to create with a factory method (LocalDate.of), and it doesn’t solve the problem that creating enum values is currently too verbose.

With relative scoping you can do this:

enum Shape:
  case Rectangle(height: Int, width: Int)
  case Circle(radius: Int)

def f(s: Shape) = ???

// current syntax
f(Shape.Rectangle(x, y))
// proposed syntax
f(@Rectangle(x, y))
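For contrast, the closest current workaround is an import, which shortens the call but leaks every case into the enclosing scope instead of staying local to one expression (sketch; `describe` is an illustrative name):

```scala
enum Shape:
  case Rectangle(height: Int, width: Int)
  case Circle(radius: Int)

def describe(shape: Shape): String = shape match
  case Shape.Rectangle(h, w) => s"rect ${h}x${w}"
  case Shape.Circle(r)       => s"circle $r"

// Today's workaround: import the cases, losing expression-local scoping.
import Shape.*
val d = describe(Rectangle(3, 4))
```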
1 Like

We could go with “everything is a tuple” for apply constructors, with the rest of the constructors in effect handled by relative scoping, with or even without a relevant leading token (., .., @, @., $.).

I think it’s definitely worth exploring how far we can get without changing the language: with named tuples, positional tuples, generic tuples, and implicit conversions, could we implement most of this in user-land?

The scope-injection hierarchical/relative-scoping stuff definitely can’t be done without language changes, but the “positional tuple is converted to collection”, “named tuple is converted to case class”, and “positional tuple is converted to case class” stuff all seems like it could be done via implicit conversions. With macros, it could even be done without runtime overhead.

It’s unclear to me where the limits of implicit conversions are, but with the new generic-tuple/named-tuple stuff it seems like we should even be able to make nested hierarchical data implicitly convertible based on target typing.
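As a feasibility check for the conversion route, here is a hedged user-land sketch of “positional tuple is converted to case class” built on a `Mirror`; `tupleToCaseClass` is an illustrative name, and the exact implicit-search behaviour across Scala versions would need verifying:

```scala
import scala.deriving.Mirror
import scala.language.implicitConversions

// One generic Conversion: any tuple whose element types match the
// fields of a case class converts to that case class.
given tupleToCaseClass[T <: Tuple, P <: Product](using
    m: Mirror.ProductOf[P] { type MirroredElemTypes = T }
): Conversion[T, P] =
  (t: T) => m.fromProduct(t)

case class Person(name: String, age: Int, height: Double)

val tom: Person = ("Tom", 34, 1.75) // converted via the generic given
```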

5 Likes

The problem is that an implicit conversion cannot rely on another implicit conversion. So if an explicit constructor relies on an implicit conversion for one of its arguments, that will not work when the argument itself is a tuple being implicitly converted for the constructor. Maybe we can recursively summon a Conversion[] for each argument in an inline function. Not sure if it works.

Well, one thing that can’t be solved with the tuple conversion strategy is the “LocalDate” problem, i.e. objects that need to be created via a factory method. This also affects pure-Scala classes like cats’ NonEmptyList, which is created using the of method in the companion object. Unless, of course, you provide some way to extend this to user-defined types, in which case it won’t work uniformly, and every time you want to use it you get to guess whether the library you’re using supports it or not.
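To illustrate that caveat: a user (or library) could opt a factory-built type in by hand, but nothing guarantees any given library ships such an instance, so support would be patchy (sketch):

```scala
import java.time.LocalDate
import scala.language.implicitConversions

// Hand-written opt-in for a type that must go through a factory method.
given Conversion[(Int, Int, Int), LocalDate] =
  (t: (Int, Int, Int)) => LocalDate.of(t._1, t._2, t._3)

val birthday: LocalDate = (1958, 9, 5)
```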

Another thing that won’t work, as @Ichoran has pointed out, is multiple parameter lists and using clauses.

1 Like

I don’t think we should consider these a priority at all. A simple feature that can cater to the common case is better than a complex feature that can tend to all. We’re trying to better represent data structures via Scala. Multiple parameter lists and using clauses are outside of that scope.

2 Likes

A simple feature that can cater to the common case is better than a complex feature that can tend to all.

True, but I don’t think that having @ (or # or whatever syntax we decide on) as an alias for the expected type is particularly complex. In fact, I think that it’s simpler than a bunch of macro-based tuple conversions, and more importantly, it’s quite explicit and communicates intent to both humans and non-humans. You won’t be able to get e.g. parameter assistance from your editor if your editor thinks you’re just writing a tuple, and your editor won’t be able to show you parameter names in something like X-Ray mode either. If you do spell them out and make a typo, you won’t get a clear error message along the lines of “parameter f0o doesn’t exist, did you mean foo?”; you would just get a generic type error, and good luck figuring out which field you misspelled.
The best an IDE could do to show you what’s going on is to show you the implicit conversion and allow you to Ctrl-click on that. Then you’re taken to some macro definition that is likely going to be impenetrable to most users.

I’m aware that I’m biased here since I’m the one who started this thread… But I do think that a syntax extension is warranted here for all these reasons, also because we’re going to need one anyway for the reason that Haoyi mentioned: scope injection of enum constructors etc…

1 Like

I think this is a tempting but ultimately counterproductive way to think about it.

Scala’s best meta-feature is that its features work together. Practically every time this isn’t true, it rankles.

Take context functions, for instance. Methods take context but functions can’t?! This cripples your ability to create abstractions, and forces you to fall back on trait FunctionWithTheContextIWant { def apply(foo: Foo, bar: Bar)(using MyContext): Baz }. Generalizing has been a big win, I think.

Scala has plenty of expressive power to handle the fewer-explicit-type-mentions-for-data feature. But I don’t think we should bolt on something simple and consider the more general case as “not a priority at all”. We should get the general case completely clear, then if we think it’s too powerful to unleash or too hard to implement, we can take the easy case first.

In particular, we have four different features we could build off of in the data case.

  1. We can view it as implicit conversion of tuples.
  2. We can view it as a spread operation a la varargs xs* (with or without a spread operator).
  3. We can view it as a particular case of relative scoping (with or without a scoping operator).
  4. We can view it as novel literal syntax, unlike everything else.

If it is an implicit conversion of tuples, then we aren’t dealing just with ("Leslie", (1966, 9, 15)), but also ("Leslie", dob) where dob is a tuple type not a DateOfBirth class. We might not unlock it, but that’s the generalization.

If it is a spread operation, then you’d expect it to spread wherever you need it, at least if the feature is generalized at some point. So s.substring(r*) should work, where r is a 2-tuple of ints and * is our spread operator. Maybe Array[Int](p*, p*, (3, 5)*) should work too; it’s common for spreads to expand into varargs. Maybe ("Leslie", (ym*, 15)*)* should work, where ym is a 2-tuple containing two ints. Also common for spreads.
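For reference, the spread Scala has today only targets varargs parameters (existing feature, shown with illustrative names):

```scala
// Today, * spreads a sequence only into a varargs parameter.
def greet(names: String*): String = names.mkString("Hello ", ", ", "!")

val team = Seq("Leslie", "Martin")
val msg = greet(team*) // spread into varargs
```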

If it is relative scoping, then ..of(1958, 9, 15) should work too (where .. is the prefix relative scoping operator)–again, if we decide to go for more generalization.

If it’s its own special snowflake, unrelated to everything else, then it should be expected to pass a much higher bar because you’re introducing a new feature that intentionally doesn’t have any broader use, anything that helps you reason about it. It’s just yet another thing to learn, for one particular use case. Scala is already perilously heavy on separate things to learn–all for good reason, pretty much, but we can’t discount the burden. Enums? Match types? summonInline? Context bounds? Context functions? Named tuples, maybe?

Thus, I disagree that

A simple feature that can cater to the common case is better than a complex feature that can tend to all.

is a good policy from which to approach holistic language design. You absolutely do want to cover the common use case well, but if you’re spending your force-programmers-to-learn-a-new-thing budget, it should be a wise expenditure. This means considering very carefully whether one can solve other pressing problems with the same concept. Especially since the other pressing problems which are related are on the table right now.

So I advocate, strongly, for considering all the possible generalizations even if the feature for now ends up just being val x: CaseClass = (5, 1, 2, "herring", true), purely literals, only in named cases, etc. If we haven’t thought through the generalizations, and tentatively picked one of which we’re implementing a special case, we won’t know how to set it up for potential future language development. In the not-very-long run this leads to a kitchen-sink language or a static language.

9 Likes

Agree with the whole message; I’d like to add a little to it.

For me, this is an absolute no-no: if you’re going to re-use a value like dob, you should make it strongly typed.
This has a downside when refactoring:

List(
  ("Leslie", (1966, 9, 15)),
  ("Johnie", (1966, 9, 15)),
)
// refactored to
val dob = (1966, 9, 15)
List(
  ("Leslie", dob), // error
  ("Johnie", dob), // error
)

You’ll note that refactoring is basically the only time this would happen; furthermore, when refactoring, we tend to pay close attention to the compiler’s output.
Therefore a better solution would be a kind of “quick action” attached to the error, like the ones Git integration offers in VS Code.
It could say something like:
“dob is of inferred type (Int, Int, Int), where DateOfBirth was expected; however, the definition of dob, (1966, 9, 15), is a valid literal for type DateOfBirth. Do you want to add the explicit type DateOfBirth to dob?”

2 Likes

If it were made in user-land, could it be added to the standard library?
(Asking mainly SIP folks)

That would strike a good balance between “not complicating the language further” and “there’s 20 libraries that work differently”

2 Likes

IMO this proposal is beneficial only for the ~5% of time spent with a piece of code when writing it. Everything else (reading, maintaining, refactoring, diffing) is made worse. The impact would be more severe than implicits (because it would be used a lot more widely) without any gain to language expressiveness.

Also, tooling around Scala is already lagging and this would throw it back further. IDEs are not the only thing, e.g. code reviews.

Scala is already a very concise language. Hiding away what data types are being used makes it a lot harder to read and to reason about.

9 Likes

That is temporary and an excuse not to change anything in the language.

This is a good point. But we need to make this point about every substantive change to syntax.

The proposal right now is to change givens to given [A] => Ord[A] => Ord[List[A]]:. The tooling comment hasn’t come up there in the discussion (but maybe it has in the meetings?).

Named tuples are already in as an experimental feature, and that is also a syntax change that tooling won’t immediately support.

So, while true, I don’t think this should count extra against this specific proposal save to the extent that this is a tough one to align with tooling.

Actually, the main draw for me is in reading.

val x = Array(
  MyFavoriteThing(2, MyFavoriteThing.Watevr("eee")),
  MyFavoriteThing(174, MyFavoriteThing.Watevr("gg")),
  MyFavoriteThing(48, MyFavoriteThing.Watevr("m"))
)

is such a pain to read. Creating it is easy: write one line, cut and paste, edit the parts you want to be different. But you can’t easily get 2, "eee" back out when reading, and really that’s what matters.

This is much easier to read yet has no ambiguity about what the types are:

val x = Array[MyFavoriteThing](
  ..(2, ..Watevr("eee")),
  ..(174, ..Watevr("gg")),
  ..(48, ..Watevr("m"))
)

It’s easier yet to read if you vertically align the values. That is commonly considered bad practice, but the idea that it’s bad practice seems itself to be bad practice, given that it favors writing over reading:

val x = Array[MyFavoriteThing](
  ..(  2, ..Watevr("eee")),
  ..(174, ..Watevr("gg")),
  ..( 48, ..Watevr("m"))
)

There isn’t, if done carefully, anything bad about this that isn’t already a problem with type inference. This is just constructor inference, which is the dual of type inference.

val m: List[Int] = List[Int](1, 2, 3)
val m = List[Int](1, 2, 3)   // Type inference
val m: List[Int] = (1, 2, 3) // Constructor inference

How can line 3 be any harder to read than line 2? We had unnecessary redundancy; in one case we removed it on the left, and in the other on the right.

The trick with this proposal is to make sure it is used to remove unnecessary redundancy that just makes everything harder for humans (albeit maybe easier for compiler and tooling developers, and at some point that’s the overriding concern), so that enough redundancy is removed to make it worth bothering but not so much as to reduce comprehensibility. And that is the hard part.

7 Likes

This is the one rather annoying thing in current Scala; explicit types on public API are very much recommended but sometimes you end up having to write the type twice.

The proposal would fix that.

The issue is when the expected type is not visible (which is most of the time).

foo(..(..(x,y), ..(a -> b)))

I have no clue what the argument is. I need to look up the signature of foo, see that the argument type is Bar, and then go look up the argument types of Bar.
The IDE can help, but code is still often consumed outside IDEs.

Another argument against this proposal is refactoring. The proposal would take Scala a huge step closer to untyped languages where refactoring with confidence is extremely hard. The most basic promise of a type system is that I can change an expected type (parameter type of foo) and the compiler will tell me where I need to go and apply fixes. That breaks apart.

9 Likes

That’s certainly a concern. What about a proposal that scrupulously avoided the cases where expected types were not visible anywhere?

1 Like

I agree with Lukas here.
Readability and refactoring is really important.

In the case of [] for Seq it might be simple: it would just call Seq.apply.

But with case classes, there are infinitely many types.
What happens if you have

case class A(x: Int)
case class B(x: Int)
type T = A | B

val t: T = [x = 5]

Should this be disallowed?
Similar could happen with ADTs…
There are probably many other edge cases which will pop up and make scala more complicated.

2 Likes

OK, so let’s compare it to what it would look like without the proposed syntax:

foo(Foo(Foo.Bar(x,y), Map(a -> b)))

Did that fix the problem? No, because names like foo, x, y, a, b don’t tell you anything, and adding more Foos and Bars doesn’t do anything to change that. It’s fine to argue against this proposal, but if you’re going to give an example, then please choose one that actually demonstrates your point, meaning one where adding type information that can be inferred actually helps. And my point is that it’s actually not that easy to find one.

Let’s look at some code that somebody might actually write:
os.proc(‥("ls", "-l"))
and compare it to what we have today:
os.proc(List("ls", "-l"))
No, that’s not easier to read.

Or let’s make up an example:

timeUntilBirthday(‥now(), ‥("Martin", ‥of(1958, 9, 5)))

compared to

timeUntilBirthday(ZonedDateTime.now(), Person("Martin", LocalDate.of(1958, 9, 5)))

No, this isn’t easier to read, in fact it’s harder to read because there’s more clutter distracting from the meaningful bits. It’s easy to notice when not enough information is available, whereas it’s very hard to notice when you miss something important because you’re being drowned in information that might be relevant in some circumstances but most of the time is not. That shouldn’t lead us to the conclusion that more information is always better – it clearly isn’t. We already have many places in Scala where we can add redundant information if we believe that it helps the reader, but we usually have the compiler infer it:

  • type parameters for function calls
  • named parameters for function calls
  • type annotations for variable declarations
  • type annotations for function return types
  • lambda argument types

The exact same points that are being made here could be made for all of these: “what do you mean the compiler will infer type parameters for method calls? That’s going to be so hard to read!”. But that’s not how most of the community feels about this. Editors these days can display most of this stuff for you if you want. The fact that most people don’t permanently run their editor in such a mode is telling us something: it’s more distracting than helpful most of the time.
Once again, it’s fine to argue against this proposal. But when you do, please don’t use arguments that apply to so many other features in the language that are already there and are generally considered a good idea.

You could argue that there’s some kind of sweet spot in the “explicit vs. inferred” spectrum, i.e. that all the features we have now are fine, and maybe relative scoping by itself would be fine too if one or several of the others weren’t there. But I don’t buy that either, because there are many different kinds of code that people write, and consequently they need different tools to make their code readable. One Size Does Not Fit All.

Besides, not spelling out the name of the type you’re passing to a function is actually the far more common case. You don’t see the type when you’re passing any other kind of expression like a variable or another function call to the function:
timeUntilBirthday(today, martin). You could spell that out with type ascriptions: timeUntilBirthday(today: ZonedDateTime, martin: Person), but nobody ever does, and that should tell us something. What you’re trying to argue here is that expressions like p (one-letter variables are very common in lambdas) don’t need a type ascription, whereas an expression like ‥("Martin", ‥of(1958, 9, 5)) somehow does, despite the fact that it contains way more clues as to what’s going on than an expression like p does.

I’m sorry, it doesn’t compute, and the more I think about this argument, the less I buy it. I’m now more convinced than ever that this would be a huge boon to the language.

1 Like