Pre-SIP: a syntax for aggregate literals

Agree with the whole message; I’d like to add a little to it.

For me, this is an absolute no-no: if you’re going to re-use a value like dob, you should make it strongly typed.
This has a downside when refactoring:

List(
  ("Leslie", (1966, 9, 15)),
  ("Johnie", (1966, 9, 15)),
)
// refactored to
val dob = (1966, 9, 15)
List(
  ("Leslie", dob), // error
  ("Johnie", dob), // error
)

You’ll note that refactoring is basically the only time this would happen; furthermore, when refactoring, we tend to pay close attention to the compiler’s output.
Therefore a better solution would be a kind of “quick action” attached to the error, in the same way Git integration provides them in VS Code.
It could say something like:
“dob has the inferred type (Int, Int, Int), where DateOfBirth was expected; however, the definition of dob, (1966, 9, 15), is a valid literal for type DateOfBirth. Do you want to add the explicit type DateOfBirth to dob?”
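
For concreteness, here is roughly what such a quick fix might produce, assuming the proposal’s tuple-to-case-class adaptation and hypothetical definitions of DateOfBirth and Person (neither is spelled out in the thread):

// Hypothetical definitions, for illustration only:
case class DateOfBirth(year: Int, month: Int, day: Int)
case class Person(name: String, dob: DateOfBirth)

// After the quick fix adds the explicit type, the literal (1966, 9, 15) is
// adapted to DateOfBirth at the definition site (proposal syntax, not current
// Scala), and the two List entries compile again:
val dob: DateOfBirth = (1966, 9, 15)
val people: List[Person] = List(
  ("Leslie", dob),
  ("Johnie", dob),
)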

2 Likes

If it were made in user-land, could it be added to the standard library?
(Asking mainly the SIP folks)

That would strike a good balance between “not complicating the language further” and “there are 20 libraries that work differently”.

2 Likes

IMO this proposal is beneficial only for the ~5% of time spent with a piece of code that goes into writing it. Everything else (reading, maintaining, refactoring, diffing) is made worse. The impact would be more severe than that of implicits (because it would be used a lot more widely), without any gain in language expressiveness.

Also, tooling around Scala is already lagging, and this would throw it back further. IDEs are not the only concern; think of code reviews, for example.

Scala is already a very concise language. Hiding away what data types are being used makes it a lot harder to read and to reason about.

9 Likes

That is temporary and an excuse not to change anything in the language.

This is a good point. But we need to make this point about every substantive change to syntax.

The proposal right now is to change givens to given [A] => Ord[A] => Ord[List[A]]:. The tooling comment hasn’t come up there in the discussion (but maybe it has in the meetings?).
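
For readers who haven’t followed that thread, the change being referenced is roughly the following, sketched from the syntax quoted above (the details belong to that SIP discussion, not this one):

trait Ord[T]:
  def compare(x: T, y: T): Int

// Current conditional-given syntax:
given listOrd[A](using ord: Ord[A]): Ord[List[A]] with
  def compare(x: List[A], y: List[A]): Int = ??? // body elided

// Proposed syntax from the givens discussion:
given [A] => Ord[A] => Ord[List[A]]:
  def compare(x: List[A], y: List[A]): Int = ??? // body elided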

Named tuples are already in as an experimental feature, and that is also a syntax change that tooling won’t immediately support.

So, while true, I don’t think this should count extra against this specific proposal save to the extent that this is a tough one to align with tooling.

Actually, the main draw for me is in reading.

val x = Array(
  MyFavoriteThing(2, MyFavoriteThing.Watevr("eee")),
  MyFavoriteThing(174, MyFavoriteThing.Watevr("gg")),
  MyFavoriteThing(48, MyFavoriteThing.Watevr("m"))
)

is such a pain to read. Creating is easy: write one line, cut and paste, edit the parts you want to be different. But you can’t easily get 2, "eee" out, and really that’s what matters.

This is much easier to read yet has no ambiguity about what the types are:

val x = Array[MyFavoriteThing](
  ..(2, ..Watevr("eee")),
  ..(174, ..Watevr("gg")),
  ..(48, ..Watevr("m"))
)

It’s easier still to read if you vertically align the values. That is commonly considered to be bad practice, but the idea that it’s bad practice seems itself to be bad practice, given that it favors writing over reading:

val x = Array[MyFavoriteThing](
  ..(  2, ..Watevr("eee")),
  ..(174, ..Watevr("gg")),
  ..( 48, ..Watevr("m"))
)

There isn’t, if done carefully, anything bad about this that isn’t already a problem with type inference. This is just constructor inference, which is the dual of type inference.

val m: List[Int] = List[Int](1, 2, 3)
val m = List[Int](1, 2, 3)   // Type inference
val m: List[Int] = (1, 2, 3) // Constructor inference

How can line 3 be any harder to read than line 2? We had unnecessary redundancy; in one case we removed it on the left, in the other on the right.

The trick with this proposal is to make sure it is used to remove unnecessary redundancy that just makes everything harder for humans (albeit maybe easier for compiler and tooling developers, and at some point that’s the overriding concern), so that enough redundancy is removed to make it worth bothering but not so much as to reduce comprehensibility. And that is the hard part.

7 Likes

This is the one rather annoying thing in current Scala; explicit types on public API are very much recommended but sometimes you end up having to write the type twice.

The proposal would fix that.
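
A minimal sketch of the repetition in question (Config and its fields are made up for illustration):

case class Config(host: String, port: Int)

// Today: the recommended explicit type on a public member forces you to
// repeat the type name on the right-hand side.
val defaultConfig: Config = Config("localhost", 8080)

// Under the proposal (one of the notations discussed; not current Scala):
// val defaultConfig: Config = ("localhost", 8080)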

The issue is when the expected type is not visible (which is most of the time).

foo(..(..(x,y), ..(a -> b)))

I have no clue what the argument is. I need to look up the signature of foo, see that the argument type is Bar, and then go look up the argument types of Bar.
The IDE can help, but code is still often consumed outside IDEs.

Another argument against this proposal is refactoring. The proposal would take Scala a huge step closer to untyped languages where refactoring with confidence is extremely hard. The most basic promise of a type system is that I can change an expected type (parameter type of foo) and the compiler will tell me where I need to go and apply fixes. That breaks apart.
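
A sketch of that worry, with made-up types (the proposed literal notation appears only in comments, since it is not current Scala):

case class Target(id: Int, label: String)
case class Replacement(id: Int, label: String)

def foo(t: Target): Unit = ()

// With today's Scala, foo(Target(1, "x")) stops compiling the moment foo's
// parameter type is changed to Replacement, pointing you at every call site.
// Under the proposal, a call written as
//   foo(..(1, "x"))
// could silently re-adapt to Replacement after the same refactoring, because
// the two case classes happen to have compatible shapes.
foo(Target(1, "x"))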

8 Likes

That’s certainly a concern. What about a proposal that scrupulously avoided the cases where expected types were not visible anywhere?
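
One way to read that restriction, sketched with made-up types: the adapted literal would be allowed only where the expected type is spelled out at the use site.

case class Person(name: String, dob: (Int, Int, Int))

// Allowed under the restricted variant: the expected type is written right there.
// val martin: Person = ..("Martin", (1958, 9, 5))

// Still disallowed: the expected type is only visible in foo's signature,
// somewhere else entirely.
// def foo(p: Person): Unit = ()
// foo(..("Martin", (1958, 9, 5)))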

1 Like

I agree with Lukas here.
Readability and refactoring are really important.

In the case of [] for Seq it might be simple: it would just call Seq.apply.

But with case classes, there are infinitely many types.
What happens if you have

case class A(x: Int)
case class B(x: Int)
type T = A | B

val t: T = [x = 5]

Should this be disallowed?
Something similar could happen with ADTs…
There are probably many other edge cases that will pop up and make Scala more complicated.

2 Likes

OK, so let’s compare it to what it would look like without the proposed syntax:

foo(Foo(Foo.Bar(x,y), Map(a -> b)))

Did that fix the problem? No, because names like foo, x, y, a, b don’t tell you anything, and adding more Foos and Bars doesn’t do anything to change that. It’s fine to argue against this proposal, but if you’re going to give an example, then please choose one that actually demonstrates your point, meaning one where adding type information that can be inferred actually helps. And my point is that it’s actually not that easy to find one.

Let’s look at some code that somebody might actually write:
os.proc(..("ls", "-l"))
and compare it to what we have today:
os.proc(List("ls", "-l"))
No, that’s not easier to read.

Or let’s make up an example:

timeUntilBirthday(..now(), ..("Martin", ..of(1958, 9, 5)))

compared to

timeUntilBirthday(ZonedDateTime.now(), Person("Martin", LocalDate.of(1958, 9, 5)))

No, this isn’t easier to read, in fact it’s harder to read because there’s more clutter distracting from the meaningful bits. It’s easy to notice when not enough information is available, whereas it’s very hard to notice when you miss something important because you’re being drowned in information that might be relevant in some circumstances but most of the time is not. That shouldn’t lead us to the conclusion that more information is always better – it clearly isn’t. We already have many places in Scala where we can add redundant information if we believe that it helps the reader, but we usually have the compiler infer it:

  • type parameters for function calls
  • named parameters for function calls
  • type annotations for variable declarations
  • type annotations for function return types
  • lambda argument types
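
For example, all of the following compile today, with the inferred or defaulted parts elided (plain current Scala, nothing from the proposal):

val xs = List(1, 2, 3)                            // variable type inferred
def twice(x: Int) = x * 2                         // return type inferred
val ys = xs.map(x => x + 1)                       // lambda parameter type inferred
val s  = xs.mkString("[", ", ", "]")              // parameter names (start, sep, end) not spelled out
val m  = Map.empty[String, Int].getOrElse("k", 0) // getOrElse's type parameter inferred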

The exact same points that are being made here could be made for all of these: “what do you mean the compiler will infer type parameters for method calls? That’s going to be so hard to read!”. But that’s not how most of the community feels about this. Editors these days can display most of this stuff for you if you want. The fact that most people don’t permanently run their editor in such a mode is telling us something: it’s more distracting than helpful most of the time.
Once again, it’s fine to argue against this proposal. But when you do, please don’t use arguments that apply to so many other features in the language that are already there and are generally considered a good idea.

You could argue that there’s some kind of sweet spot in the “explicit vs. inferred” spectrum, i.e. that all these features that we have now are fine, and maybe relative scoping by itself would be fine too if one or several of the others weren’t there. But I don’t buy that either, because there are many different kinds of code that people write, and consequently they need different tools to make their code readable. One Size Does Not Fit All.

Besides, not spelling out the name of the type you’re passing to a function is actually by far the more common case. You don’t see the type when you’re passing any other kind of expression, like a variable or another function call, to the function:
timeUntilBirthday(today, martin). You could spell that out with type ascriptions: timeUntilBirthday(today: ZonedDateTime, martin: Person), but nobody ever does, and that should tell us something. What you’re trying to argue here is that expressions like p (one-letter variables are very common in lambdas) don’t need a type ascription, whereas an expression like ..("Martin", ..of(1958, 9, 5)) somehow does, despite the fact that it contains far more clues as to what’s going on than an expression like p does.

I’m sorry, it doesn’t compute, and the more I think about this argument, the less I’m buying it. I’m now more convinced than ever that this would be a huge boon to the language.

1 Like

Yes, that would be disallowed. Non-opaque type aliases are generally transparent, so the code should behave the same whether you use a type alias or substitute its definition. So val t: T = [x = 5] would be like val t: A | B = [x = 5]. But this can’t be desugared, because the type A | B doesn’t have a companion object, and […] (in the original proposal) is specified as a call to the companion object’s apply method.
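
A minimal sketch of that companion-apply rule, with the bracket notation shown only in comments since it is not current Scala:

case class A(x: Int)
case class B(x: Int)

// val a: A = [x = 5]      // would desugar, per the original proposal, to:
val a: A = A.apply(x = 5)

// val t: A | B = [x = 5]  // rejected: the union type A | B has no companion
//                         // object whose apply method the literal could call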

No it couldn’t; it’s quite clear. The proposal is: an expression like ..Foo in a position where a value of type Bar is expected (type parameters being treated transparently) is equivalent to Bar.Foo. It works like this:

enum Either[A, B]:
  case Left(a: A)
  case Right(b: B)

val e: Either[Int, String] = ..Right("foo") // equivalent to:
val e: Either[Int, String] = Either.Right("foo")

There are probably many other edge cases that will pop up and make Scala more complicated.

Name one.

Yeah as @Ichoran mentioned, converting

val x: Array[MyFavoriteThing] = (
  ..(2, ..("eee")),
  ..(174, ..("gg")),
  ..(48, ..("m")),
  ..(2, ..("eee")),
  ..(174, ..("gg")),
  ..(48, ..("m"))
)

or

val x: Array[MyFavoriteThing] = (
  (id = 2, metadata = (salt = "eee")),
  (id = 174, metadata = (salt = "gg")),
  (id = 48, metadata = (salt = "m")),
  (id = 2, metadata = (salt = "eee")),
  (id = 174, metadata = (salt = "gg")),
  (id = 48, metadata = (salt = "m"))
)

to

val x = Array(
  MyFavoriteThing(2, MyFavoriteThing.Watevr("eee")),
  MyFavoriteThing(174, MyFavoriteThing.Watevr("gg")),
  MyFavoriteThing(48, MyFavoriteThing.Watevr("m")),
  MyFavoriteThing(2, MyFavoriteThing.Watevr("eee")),
  MyFavoriteThing(174, MyFavoriteThing.Watevr("gg")),
  MyFavoriteThing(48, MyFavoriteThing.Watevr("m"))
)

or

val x = Array(
  MyFavoriteThing(id = 2, metadata = MyFavoriteThing.Watevr(salt = "eee")),
  MyFavoriteThing(id = 174, metadata = MyFavoriteThing.Watevr(salt = "gg")),
  MyFavoriteThing(id = 48, metadata = MyFavoriteThing.Watevr(salt = "m")),
  MyFavoriteThing(id = 2, metadata = MyFavoriteThing.Watevr(salt = "eee")),
  MyFavoriteThing(id = 174, metadata = MyFavoriteThing.Watevr(salt = "gg")),
  MyFavoriteThing(id = 48, metadata = MyFavoriteThing.Watevr(salt = "m"))
)

…really doesn’t make the code any clearer. I agree that Scala is generally a concise language. But there are some notable exceptions, such as the scenarios being addressed by this proposal.

There are cases where the type is important for readability, and cases where the type hinders readability, depending on how “obvious” it is. This is always a use-site decision, because how obvious the types are depends on the surrounding context at the use site, and cannot be determined up front at the declaration site or at the language level.

In general, Scala has type inference with optional type annotations to let the developer make the decision on a case-by-case basis. This would be a different mechanism, but spiritually it’s basically type inference, just for the constructor type rather than for param/type-param/variable/return types.

As @lrytz mentions, this proposal (or some variant thereof) solves a bunch of conundrums:

  1. We are encouraged to put explicit type ascriptions on public definitions, but that causes verbose repetitiveness when you need to repeat the type on the RHS, which encourages us to leave the type ascription off.

  2. We are encouraged to use meaningful type names to convey meaning, but when these type names get repeated over and over in large data structures, it encourages us to use short type names instead.

Unlike some others, I don’t think import MyFavoriteThing.{apply => p} and calling p() is a good compromise. That looks like a worst-of-all-worlds compromise that can work, but giving things short/arbitrary/inconsistent/meaningless names just to avoid verbosity is not an ideal outcome if the need for a meaningless name could be avoided entirely.

Scala generally does not force you to put types in a lot of places. You can easily have huge method chains without type annotations, or complicated nested lambdas without type annotations. Equivalent code to foo(..(..(x,y), ..(a -> b))), where the lack of types causes problems, is always possible using existing language features. I don’t think it’s consistent to say “people are too stupid to figure out when to elide constructor types where appropriate” while also saying “we love eliding param/type-param/variable/return types where appropriate”. In the end, it’s just type inference, and developers who can use their judgement on one can use their judgement on the other.
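
For reference, here is the kind of already-legal, type-name-free call site that the renamed-apply trick produces (Foo, Bar, foo, and the value names are made up to mirror the earlier example):

object Example:
  case class Bar(x: Int, y: Int)
  case class Foo(bar: Bar, settings: Map[String, String])
  def foo(f: Foo): Unit = ()

  import Foo.{apply as mkFoo}
  import Bar.{apply as mkBar}

  val (x, y, a, b) = (1, 2, "key", "value")

  // No constructor names at the call site, yet this compiles today:
  foo(mkFoo(mkBar(x, y), Map(a -> b)))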

3 Likes

Sure, but let’s also not build the case around hard-coding lots of data in our source files. That’s not something that should be done often anyway.

The code that I wrote for this isn’t public, so I’ll be going through the gist of it rather than the actual code, but I do have examples for where name elision would have been a terrible idea.

Furthermore, it would be a terrible idea only upon revisiting the code, not when I wrote it initially. That is, it’s exactly the sort of maintenance trap, enabled by the language, that must be resisted by forethought and good practices. Which is to say, it’s a trap that people are going to fall into frequently.

So, suppose we come across some code–let’s suppose we’ve used ..( ) notation rather than bare tuples for the feature–and read

val strain: Strain = ..(
  "CF512",
  ..(
    ..(..II, ..(..("rrf-3", ..("b", 26)))),
    ..(..IV, ..(..("fem-1", ..("hc", 17)))),
  )
)

Now, my background makes it trivial for me to tell what’s going on here. I know exactly what this gobbledygook means. Good luck to anyone else figuring it out, though! Just to be sure that it isn’t the clutter of ..( ), let’s do it again with [ ] and no .. for relative scoping.

val strain: Strain = [
  "CF512",
  [
    [II, [["rrf-3", ["b", 26]]]],
    [IV, [["fem-1", ["hc", 17]]]],
  ]
]

No, that doesn’t help. Let’s do it with full field annotations instead:

val strain: Strain = ..(
  name = "CF512",
  genotype = ..(
    ..(csome = ..II, mutations = ..(..(name = "rrf-3", allele = ..(code = "b", n = 26)))),
    ..(csome = ..IV, mutations = ..(..(name = "fem-1", allele = ..(code = "hc", n = 17)))),
  )
)

Well, that’s better. It seems to have something to do with genetics? Pretty obscure what any of the data types are actually called, though–you’d better have an awesome IDE or be ready to do a lot of digging. But at least you’re kind of oriented now.

What about the version with types, though?

val strain = Strain(
  "CF512",
  Linkage(CSome.II, Gene("rrf-3", Allele("b", 26)) :: Nil) ::
  Linkage(CSome.IV, Gene("fem-1", Allele("hc", 17)) :: Nil) ::
  Nil
)

That’s better yet! We now have some vague idea that it’s important to group things by “linkage” which seems to have something to do with CSome (probably an enum?), and we’ve got genes with (probably) names and (certainly) alleles. If you’re not using an IDE or browsing on GitHub or something, you probably know to look in Gene.scala to see what’s up with genes.

This is better yet, and it’s pretty much how I wrote it; but if we’re not careful, I would have no way to prevent someone from turning it into the number-and-string soup that we started with.

And if we’d had both types and field names, it would have been even clearer.

I understand the number and string soup, though. There’s no reason for me not to write it, except that it makes everything more impenetrable. It’s usually a mistake, but oh is it tempting!

But it’s also always dependent on the context of who the programmer is. One of the big challenges in software development is to guide ourselves towards writing code that works for others, including ourselves later on.

You can write beautiful, clear, maintainable Perl code. But practically nobody did because the language so richly rewards you for doing what you understand right now in the moment, not what you or anyone else can figure out once the context that you had in your head when you wrote it is lost.

So I can read ["rrf-3", ["b", 26]] and go: oh, of course, that’s a gene name with its allele (with the lab code and allele number separated for some reason). But whether I can read it isn’t the right question if anyone else is ever going to look at this code. The question is whether you can.

For the record, although I brought this up, I also do not think this is a good general-purpose compromise. It’s the kind of thing that one would occasionally reach for in special cases.

This is why it’s important to not just do it, but rather, if it’s going in at all, to figure out what kind of feature it actually is and what the bounds of it are. Depending on how it is conceived, it may or may not have other edge cases. I previously discussed four different conceptualizations of what this feature could be.

I don’t think we need to be afraid of edge cases in general, though. We need to know what we’re doing, and then think things through–maybe not enough to find every edge case, but enough to know how many there are going to be. If we hit a conceptualization which risks a large number of edge cases, then yes, it’s cause to at least be wary.

There’s a very important exception to this: newtypes which are there solely to prevent errors.

If I have a class

class OrbiterAcceleration(value: NewtonSecondsSquared, timestamp: TimeSinceLaunch)

I sure as heck don’t want [a], [t] for those values, even if NewtonSecondsSquared(x) and TimeSinceLaunch(t) work. The entire point is to have an extra safeguard. This is very explicitly and intentionally something that is NOT intended to be context-dependent. The context is that you don’t want to lose your spacecraft. The context is that you must state that you have the units right every time you have to touch this.

But

val x: NewtonSecondsSquared = ..(2.7)

is still fine, because we made the statement.

4 Likes

I’m also not sure it’s a use-site decision.

We have infix and into to control usages. Maybe infer could modulate whether the special syntax is allowed for inferred types.

scalafmt -rewrite to convert between styles.

Currently, I don’t have a case class that says MyFavoriteThing(id = 2) without extra work.

Maybe the better-literals syntax plugin should be required.

Or remove Numeric from genericNumericLiterals.

Instead of FromDigits, require FromBrackets.

In addition, FromBraceless for tables formatted with tabs.
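
For readers who haven’t used it, FromDigits is the existing experimental hook for generic number literals that the FromBrackets analogy builds on; a rough sketch of how it works today (Nat is a made-up type):

import scala.language.experimental.genericNumberLiterals
import scala.util.FromDigits

case class Nat(value: BigInt)

given FromDigits[Nat] with
  def fromDigits(digits: String): Nat = Nat(BigInt(digits))

// The integer literal is routed through the FromDigits[Nat] instance:
val n: Nat = 12345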

2 Likes

That feels pretty disingenuous: you chose those names to be meaningless!

And even like that, it’s at least clear that the a -> b is for a Map (or at least for some type which requires pairs).

A more realistic example could be:

foo(Alignment(Alignment.FromCoordinates(x,y), Map(a -> b)))

It’s instantly much clearer what this does than the original (this is just the first example that kind of fits the original one; there are surely much better examples).

I think this is a very important point, and your two examples support it really well, I have to say I completely agree

So far the only use case I can fully get behind is the following:

val x: AnExplicitType = (my, parameters, for_the_type)

And maybe one of these (regardless of keyword):

val x: Array[MyFavoriteThing] = (
  (2, ("eee")),
  (174, ("gg")),
  (48, ("m")),
  (2, ("eee")),
  (174, ("gg")),
  (48, ("m"))
)

or the equivalent with named fields, as quoted earlier.

This is in contrast to integer literals, where the following is completely fine for me:

x.max(9_000_000_000_000_000_000_000)

Sure, it’s a BigInt, but it’s still a number. Compare this to the following:

entity.move(..(a, b))

It could be a Position or a Vector, and each of these could be in 2D or more, in Cartesian or in polar coordinates.
(As an example, in Unity, a Vector3 can be initialized with only the x and y coordinates, which is very useful when making 2.5D games.)

Of course you would know that if you know the signature of move, but even then, it could depend on entity (for example, whether it is an Object2D or an Object3D).

4 Likes

So you call me disingenuous for going along with somebody else’s foo/x/y/a/b style example, and then you come up with an example where the function name is still foo while picking descriptive names for the types and call that a realistic example…

Come on, let’s try to argue in good faith here.

This is in good faith. foo could just as well be named process; you can’t expect the method name to contain enough information to drastically change the understanding.

I used foo, x, y, a and b as they were part of the original example:

I therefore decided not to change those parts of the expression; this was probably overly zealous of me.

But I realise now that you maybe wanted to criticize the original example for its lack of relevant names, and to argue that it’s therefore no surprise the example is unreadable.

If that is the case, I apologize for misrepresenting you and for calling your example disingenuous, as it was actually fully in good faith.

With this new understanding, here’s a new example:

updateTelescope(Alignment(Alignment.FromCoordinates(x,y), Map(setting -> newSetting)))
// vs
updateTelescope(..(..(x,y), ..(setting -> newSetting)))

I claim the second expression is much less clear than the first, even with better names.

Of course we can have names precise enough that we really don’t need the types:

Instead of something like entity.move(..(a, b)), use:

entity2d.moveToPosition(..(a,b))

Or even if we want to be more pedantic:

entity2d.moveToCartesianPosition(..(a,b))

But this feels like exactly what a strong type system and overloading are here to avoid.

Again, here’s this example with names:

import myLibrary.cartesian.* // cartesian positions and vectors

// potentially many lines of code

entity.move(Position(a, b))
//or maybe more pedantic:
entity.move(Position2d(a, b))

4 Likes

Yes, exactly. I’m glad we were able to clear this up.

I’m going to have to come back to a point that I made earlier: in many (most?) cases you’re not going to construct an Alignment locally, you’re going to pass a variable or some other kind of expression, and the lack of type ascriptions in that case doesn’t seem to bother anybody. And yet, when you’re constructing the object right there and thus have more information about what’s contained in it, you suddenly insist on spelling out the types – this is a discrepancy that I think needs to be explained. And actually I still think that the real issue here is poor function naming – it should be named alignTelescope. Similarly, for the entity.move example, I think the solution isn’t entity.move(Position(a, b)) or entity.move(Vector(a, b)) but entity.moveTo(..(a, b)) or entity.moveBy(..(a, b)).

Really? I mean, be honest: how long does it take for someone new to the project to learn that something like rrf-3 is a gene? Because that looks like the kind of domain knowledge that you would pick up in the first week or so, like the fact that US67079K1007 is an ISIN. Would they actually have to discover that this project is dealing with genes, or would they know that before they even join the project? And actually, assuming I had some domain knowledge, I think I’d actually prefer the ..(..II, ..(..("rrf-3", ..("b", 26)))), variant that you described as “number and string soup”, because I can immediately see the important parts and I don’t have to search for them amidst the clutter.
Maybe at this point it comes down to preferences and priorities of different people. Apparently some people are less sensitive to clutter than I am. And apparently some people place higher priority on making code understandable to people who lack basic domain knowledge and also can’t or won’t use an editor that can show this information to them. But to be honest I think this isn’t something that we should place a priority on. I’d rather get rid of the clutter.

That’s just one example. I mean, be honest: how long does it take to read the long form–again, for one example?

If you’re not routinely putting data in your code, what’s the big deal? If you are putting data in your code, maybe that’s the problem?

Of course people can gain expertise and then be productive, but there’s a bigger barrier to everything. At some point you stop being willing to look at the code because it’s too hard to understand. You get even more panicked when the person who knows that stuff finds another opportunity somewhere else.

The question is how to get the balance right: make as little stuff as possible needlessly hard for anyone, and then when a tradeoff is unavoidable, try to empower the community as much as possible (and yes, restricting features so that you can better understand what is going on is a type of empowerment).

1 Like

Well, for one thing, the long form doesn’t actually fit on a line where I first read it, which was on my phone. You could say that phones aren’t suitable devices to read code on, but then, why read code outside of a proper editor that can show you what the compiler is inferring? I cannot understand why somebody would optimize a language for being read with unfit tools.

It’s not only about data; it’s about constructing objects, and that is a major part of every program I’ve ever worked on. But actually, even placing some amount of data inside programs is common. Tests are a very common case, and as I mentioned earlier, my team decided to go with TypeScript + cdk8s rather than Scala + zio-k8s to generate our k8s manifests specifically because TypeScript doesn’t require you to spell out the types. I don’t see anything wrong with either of those, and I’d like them to be convenient and free of unnecessary clutter. There’s a reason why basically every modern programming language has some form of collection literal.

I think that’s a fundamental philosophical difference between us. I see the developer as responsible first and foremost, and I think it’s not the language’s (or the language designer’s) job to protect developers from themselves. And yes, restrictions are useful, but the language’s job isn’t to place arbitrary restrictions on developers; rather, it’s to provide facilities for the developer to restrict himself when he feels that’s useful.

And again, all the points that are being made here have been made before about type inference and other similar features (like not needing to spell out function parameter names). I think they were wrong then and I think they’re wrong now, too, and I’m still waiting for a reason why passing a variable to a function without ascribing a type is fine, whereas passing a freshly-constructed object without a type ascription is somehow a problem. I believe it’s not, and I think the concerns are more due to the “this is new and weird” feeling than anything else.

Anyway, I feel we’re going in circles a bit here regarding this “readability” point.