Pre SIP: Named tuples

lrytz · November 25, 2024, 12:34pm

I know it was discussed before and you disagree, but since you say it’s either the current design or reversed subtyping: there’s also the option to have no subtyping at all (or not yet), have extension methods for both conversions, and convert literals to named tuples by expected type.

It could be less of an irregularity in the context of Pre-SIP: a syntax for aggregate literals.

soronpo · November 26, 2024, 12:27am

Currently this fails to compile. Should it?

val x = summon[Tuple <:< NamedTuple.AnyNamedTuple]

Ichoran · November 26, 2024, 7:55am

In my prototype of named tuples as tuples of named types, the named types had no subtyping relationship with the original type and on balance I found that this resulted in the kind of safety I wanted named types for. In particular, a key use-case was in making sure that a value of a particular type is used correctly–e.g. range(index: Int \ "index", len: Int \ "len") would make sure you didn’t range(5, 10) thinking it meant from 5 until 10. The corresponding tuple, (Int \ "index", Int \ "len"), has the same problem: avoiding the ambiguity of (5, 10) was the main point for me.

I found greater friction in unwrapping named types than in wrapping them, suggesting that the more convenient subtyping relationship is named <:< unnamed, but there were cases where doing this automatically would have led to error, so I left it without subtyping.

To the extent that named tuples are for convenience at the expense of safety, the answer may be different there. But I heartily recommend no subtyping relationship, with explicit methods to convert in either direction.

odersky · November 26, 2024, 3:51pm

In fact, I was wrong. @EugeneFlesselle pointed out to me that we do automatic .toTuple insertion. But the logic was missing cases, in particular it did not follow aliases. This is fixed in Fix .toTuple insertion by odersky · Pull Request #22028 · scala/scala3 · GitHub.

bishabosha · November 29, 2024, 8:24am

Since no one updated here: it seems a decision was made to revert Named Tuples back to an experimental feature in 3.6.2-RC3.

odersky · November 29, 2024, 11:37am

Yes, NamedTuples will stay experimental for the 3.6 cycle. They are accepted as an addition to the language, so the SIP is closed. But we will keep them experimental for this minor version to be able to tweak APIs without binary backwards compatibility constraints should the need arise.

jeremyrsmith · December 2, 2024, 10:37pm

To the first point, the SIP committee is ten people, right? I guess what I’m saying is: some SIP details need wider discussion. It wasn’t meaningfully (“at length”) discussed outside of the committee.

To the second point, it’s not a simple matter for someone to “step forward” and implement something in the compiler, no matter how strongly they may feel that it should be considered. Most of the people that could do this are probably either on the SIP committee, or are in a bubble with folks that are. Would anybody seriously look at a branch from someone outside the core group? The early rhetoric implied that the decision wasn’t open for consideration unless a core contributor would dissent.

I really believe that this subtyping direction will be one of those decisions that you will eventually reverse on and regret. Much like implicit conversions or other things that seemed obvious from a practical standpoint at some point, but on further reflection (or attempts at improvement) made soundness much harder to really achieve. And if we’re still using Scala in 2030, we’ll be cursing the decision and wondering what could have been if it had only gone the other way. It really is that big of a deal, in my opinion.

FWIW, you do have @lrytz (from the SIP committee) arguing for removing the subtyping relationship initially. This would be a more prudent design for now, that would allow further time to consider ergonomic and theoretical implications of the options.

Ichoran · December 3, 2024, 1:15am

You don’t need any compiler help to implement subtyping of labeled types!

I’ve played with all three. Here’s all you need. Change names as you prefer.

object NamedTypes {
  /** A stable identifier to disambiguate types by label */
  type Named = String & Singleton

  /** A named type; create with `val x: Int \ "eel" = \(5)`; access with `x ~ "eel"`
    * change to <: A = A or >: A = A for subtyping!
    */
  opaque infix type \[+A, N <: Named] = A
  object \ {
    inline def apply[A, N <: Named](a: A): (A \ N) = a
    extension [A, N <: Named](na: A \ N)
      inline def ~(n: N): A = na
  }
}
extension [A](a: A) {
  /** Associate a compile-time name with this value by giving the other (singleton) value */
  inline def \[N <: NamedTypes.Named](n: N): NamedTypes.\[A, N] = NamedTypes.\(a)
}
inline def literal[N <: NamedTypes.Named]: N = compiletime.constValue[N]
import NamedTypes.{ Named, \ }

The usage is pretty simple for singleton types.

val eels = 5 \ "eel"    // type Int \ "eel"
val nEels = eels ~ "eel"
// val x = eels ~ "cod" does not compile

def recipe(l: Int \ "lemon", c: Int \ "cod"): Int \ "dinner" =
  (l ~ "lemon" min 2*(c ~ "cod")) \ "dinner"

val food = recipe(5 \ "lemon", 3 \ "cod")
val notFood = recipe(4 \ "lemon", eels)         // Fails
val whoKnows = recipe(6, 2)                     // Fails
val nFood: Int = recipe(5 \ "lemon", 3 \ "cod") // Fails

If you add subtyping in one direction or the other, some of the explicit naming of types with \ or extraction of values with ~ go away.

But I like it best like this because there is little reason to use this aside from safety. Regular types will catch anything that isn’t ambiguous, so who needs names? It’s only when the types don’t help but the identity really matters that this is important.

If you choose <: A = A, then your use cases all get dangerous. In the case of function parameters, which are already named, no big deal. But what if you have

def recipe2(ingredients: (Int \ "lemon", Int \ "cod")): Int \ "dinner" = ???

Nothing helps you use ingredients._1 and ingredients._2 properly. You can be sure you’re passing in lemons and cod, but within the method, all bets are off. Same deal if you destructure the ingredients; nothing requires you to get the labels right. In particular, if you change your interface and then need to fix it then things will silently be wrong.

>: A = A is the other option. Your usage is now correct inside recipe2, but you can pass in any old (5, 3). Is it an offset and length? Eels and more eels? Lemon and cod? Doesn’t matter! Again, if you change your interface and need to fix it, the compiler won’t help you.
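The three variants can be put side by side in a small sketch. This is not the NamedTypes code itself: Variants, Down, Up, Iso and the helper names are hypothetical, the type parameters are left invariant for simplicity, and the only purpose is to show which assignments the compiler accepts under each bound.

```scala
object Variants:
  type Label = String & Singleton

  // named <: unnamed: a labeled value is accepted wherever a plain A is expected.
  opaque type Down[A, N <: Label] <: A = A
  // unnamed <: named: a plain A is accepted wherever a labeled value is expected.
  opaque type Up[A, N <: Label] >: A = A
  // No relationship: both directions need an explicit call.
  opaque type Iso[A, N <: Label] = A

  def down[A, N <: Label](a: A): Down[A, N] = a
  def iso[A, N <: Label](a: A): Iso[A, N] = a
  extension [A, N <: Label](u: Up[A, N]) def upValue: A = u
  extension [A, N <: Label](i: Iso[A, N]) def isoValue: A = i

import Variants.*

val d: Down[Int, "len"] = down(10)
val asInt: Int = d                 // accepted only because Down <: A
val u: Up[Int, "len"] = 10         // accepted only because Up >: A
val i: Iso[Int, "len"] = iso(10)
// val bad1: Int = i               // does not compile: no subtyping
// val bad2: Iso[Int, "len"] = 10  // does not compile: no subtyping
val back: Int = i.isoValue         // explicit unwrapping
```

With Down, the label can be dropped silently; with Up, it can be invented silently. Only Iso forces every crossing between labeled and unlabeled values to be visible in the source.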

So, for me, for named types, the right answer for correctness is clearly to have no subtyping relationship. (lemon = 5, cod = 3) to me should be unrelated to (5, 3) unless you call a conversion method. It should have accessors .lemon and .cod and that’s it. If we want to declare (5, 3) to be (_1 = 5, _2 = 3) that’s fine with me.

But the cool thing for me is that my flavor works out of the box already. I don’t need any compiler support to get things safe. If the compiler supports named tuples, and the named tuples don’t pull their weight, I don’t have to rely on them to get something like the feature. The syntax is a little bit more awkward. But it’s really not a big deal.

jeremyrsmith · December 3, 2024, 1:27am

I appreciate that you’ve experimented with it. And I understand your conclusion that we can already do “named tuples”, so in some sense it doesn’t matter if Scala gets it wrong.

But we do have just one shot here for the “blessed” named tuples, that will be supported by obvious syntax (i.e. foo = x vs your Int \ "foo" or Int ~ "foo"). Introducing the syntax first (with no subtyping relationship baked in) would be pretty uncontroversial, but that’s not the way it’s going to happen. That’s what I’m complaining about.

MateuszKowalewski · December 9, 2024, 2:12am

I for my part see named tuples as a convenience feature.

So I actually want some auto-“adaptation”.

As said before, I don’t care much how it’s implemented. I’m really unsure this will matter at all in the end. (But we don’t know yet!)

So I’m fine with sub-typing and / or conversions in any variant.

But as others have strong feelings about that, how about the following compromise that will likely make the people who want strong type safety and the people who want convenience happy at the same time:

If named tuples had no sub-type relation to “unnamed tuples” (whatever this is, as “unnamed tuples” actually have named “fields”, like _1, _2, etc.), but there were conversions available that you could just import, wouldn’t this make everybody happy?

So the default would be you need to call conversion methods yourself. That’s safe, but depending on use-case annoying.

But you could import conversions; either in the one direction, or in the other, or even both at once (maybe a combining shorthand actually)—depending on what makes sense in context. The result, in that scope, would be less safe (we all know the traps of implicit conversion) but as convenient as having a built-in sub-type relation.

Maybe this here is in fact one of the seldom cases where employing implicit conversions is a good design choice? That’s the whole point of implicit conversions: They offer auto-“adaptation” among “unrelated” types.
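A rough sketch of what such opt-in conversions could look like. It deliberately does not use the real named tuple API: Named is a hypothetical stand-in wrapper with no subtyping relation to its underlying type, and NamedConversions, dropName, addName and lengthOf are made-up names for illustration only.

```scala
import scala.language.implicitConversions

type Label = String & Singleton

// Hypothetical stand-in for a named value; unrelated to A by subtyping.
final case class Named[A, N <: Label](value: A)

object NamedConversions:
  // named -> unnamed, only active where imported
  given dropName[A, N <: Label]: Conversion[Named[A, N], A] = _.value
  // unnamed -> named, only active where imported
  given addName[A, N <: Label]: Conversion[A, Named[A, N]] = Named(_)

def lengthOf(len: Named[Int, "len"]): Int = len.value

object Strict:
  // Default: the caller has to wrap explicitly.
  val explicit = lengthOf(Named(10))
  // lengthOf(10) does not compile here.

object Relaxed:
  // Opt in to both directions for this scope only.
  import NamedConversions.given
  val wrapped = lengthOf(10)
  val unwrapped: Int = Named[Int, "len"](7)
```

A single direction could be enabled with `import NamedConversions.addName` or `import NamedConversions.dropName`. Note that a conversion this generic applies to every type in scope, which is exactly the kind of trap mentioned above.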

Because besides convenience I really see no reason to have a sub-typing relation. Actually both sub-typing directions can be argued against, but also argued for, so the sub-typing story is imho very murky all in all. But I like in fact having auto-“adaptation”, otherwise the feature won’t feel “lightweight” and won’t be nice to use in quick-and-dirty scenarios. I’m OK if that part is solely implemented as implicit conversions. See no evil here.

What do you think?

PS: Thanks to the Scala team for reconsidering rushing this out!

I’m really looking forward to this feature, but I prefer to wait a little bit longer until being able to use it in production than to have something half-baked stick “forever”.

bishabosha · December 10, 2024, 9:49am

One point against sub typing to “regular” tuples (and Product transitively): There is no runtime representation of field labels, meaning that the productElementNames method would be completely incorrect!

as it stands, you can’t call any Product methods on a named tuple, and this is correct imo
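A minimal sketch of the concern, using only a case class and a plain tuple (Person is a hypothetical example type). Since a named tuple has the same runtime representation as the underlying tuple, exposing Product methods on it would presumably report the generic names from the second line rather than the labels:

```scala
case class Person(name: String, age: Int)

// A case class carries its field labels at runtime.
val fromCaseClass = Person("bob", 72).productElementNames.toList // List("name", "age")

// A plain tuple only knows the generic names.
val fromTuple = ("bob", 72).productElementNames.toList           // List("_1", "_2")
```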

lrytz · December 10, 2024, 10:43am

I see this as a limitation / trade-off of the implementation.

If we were designing named tuples without thinking about how to implement them efficiently, there’s no reason for them not to implement Product.

I agree that it is a strong argument against the reversed subtyping direction, but not really an argument for the current one.

mberndt · December 10, 2024, 1:13pm

OK, but do we really care? I think those were a mistake to begin with, a Mirror should be used instead to access the field labels.
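For reference, a small sketch of reading field labels through a Mirror at compile time instead of through Product. Person and fieldLabels are hypothetical names; the sketch is shown for a case class and makes no claim about how mirrors behave for named tuples.

```scala
import scala.deriving.Mirror
import scala.compiletime.constValueTuple

case class Person(name: String, age: Int)

// Materializes the compile-time label tuple of T as a runtime list of strings.
inline def fieldLabels[T](using m: Mirror.ProductOf[T]): List[String] =
  constValueTuple[m.MirroredElemLabels].productIterator.map(_.toString).toList

val personLabels = fieldLabels[Person] // List("name", "age")
```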

Fwiw, I find the current subtyping direction, i. e. named tuples as supertypes of regular tuples, outright bizarre. Normally supertypes contain less information than their subtypes – but named tuples contain more information, i. e. field labels. Normally it’s easy to add more subtypes to a type (just extend that type) and harder to add a supertype (need to add an extends clause to that type) – but here it’s the other way around. Normally supertypes have fewer methods than their subtypes, but again, in the current proposal it’s the other way around. I find this highly unintuitive.

The whole problem with tuples is that they make it easy to confuse the fields because they don’t have descriptive names – and now there’s a subtyping rule whose only purpose is that you don’t need to label the fields. That doesn’t make sense to me. The essential argument for the current subtyping relation from the spec seems to be this:

Addendum: Turning things around, adopting named <: unnamed for the subtyping and unnamed → named for the conversion leads to weaker typing with undetected errors. Consider:
type Person = (name: String, age: Int)
val bob: Person
bob.zip((firstName: String, agee: Int))
This should report a type error. But in the alternative scheme, we’d have (firstName: String, agee: Int) <: (String, Int) by subtyping and then (String, Int) -> (name: String, age: Int) by implicit naming conversion. This is clearly not what we want.

By contrast, in the implemented scheme, we will not convert (firstName: String, agee: Int) to (String, Int) since a conversion is only attempted if the expected type is a regular tuple, and in our scenario it is a named tuple instead.

IMO, that is not an argument in favour of the current subtyping rule but against this use of implicit conversions.

The specification suggests that the only alternatives are to have either both the subtyping rule and the implicit conversion, or neither.

Looking at precedent in other languages it feels like we do want some sort of subtyping for easy convertibility and an implicit conversion in the other direction. This proposal picks unnamed <: named for the subtyping and named → unnamed for the conversion.
[…]
A possibly simpler design would be to drop all conformance and conversion rules. The problem with this approach is worse usability and problems with smooth migration. Migration will be an issue since right now everything is a regular tuple.

I think that’s a false dichotomy, having the other subtyping rule (i. e. NamedTuple <: Tuple) and no implicit conversion seems like a viable (and more intuitive) approach to me.

Maybe I’m late to the party, but I agree with @lrytz. For now, subtyping and implicit conversions should be removed, and potentially added again in a separate SIP at a later stage.

sideeffffect · December 10, 2024, 4:46pm

I also think that we can postpone any subtyping or any other fancy conversion features.

Let’s just have bare named tuples for a while and then see after things settle down.

MateuszKowalewski · December 11, 2024, 2:35am

I’m still of the opinion that this feature is incomplete, up to unusable for its purpose (which is for me an ad hoc, quick-and-dirty replacement for regular case classes) without some auto-adaptation.

Therefore there should be conversions (in both directions) available in the implementation.

But you would still need to import these conversions manually.

That makes the feature useful in the situations I like to use it, but it would be still “safe” by default.

So please include conversions!

I know it’s trivial to add your own, but imho this should come with the std. lib. (Especially as people will add such conversions themselves if not available, and then the migration story will become more difficult if the std. lib. would also provide such conversions at some later point in time.)

I see no real cost to have such conversions in the std. lib. (Do I overlook something?) So imho this should be a no-brainer. The one direction is anyway already there.

Besides that: I’m definitely in the camp of people who think that a sub-typing relation makes no sense; either way.

(The argument with “has named fields” and “does not have named fields” makes no sense to me as “unnamed tuples” actually have field names; just generic ones.)

Regardless, I have to state that I like the approach that the names are just a compile time fiction. This is imho a good design as it resembles C structs, where “field names” are also just compile time sugar for offsets. That similarity could make named tuples a feature that maps nicely to anonymous structs in Scala Native (likely with some more magic added around, of course). In that case you likely wouldn’t want automatic conversions to “regular tuples”. That’s another reason why it should be optional I think.

The main difference is though that a debugger for a native language like C/C++/Rust/Zig sees the names as they’re part of the debug information whereas there is nothing like external debug information in the case of the JVM. So you won’t see the names in the debugger, or would you? That’s kind of an issue, I think. (One could build a Scala debugger capable of reading something like DWARF, and generate that even for the JVM, but this looks like a very big adventure. Or maybe it would be enough to link a Scala/JVM debugger somehow to TASTy? Doesn’t the new Scala debugger do that actually? I’m not sure how it works in detail. But this goes now quite off-topic anyway…)

lrytz · December 12, 2024, 10:06am

It was pointed out at the core meeting yesterday that the SIP is already “accepted for shipping” (Process Specification | Scala Documentation) by the committee. So at this point it’s about testing for bugs, the design is done.

The SIP communication was lacking lately: meeting summaries were not published, pages (List of All SIPs | Scala Documentation) and PRs (https://github.com/scala/improvement-proposals/pull/72#event-15618867923) were not updated. So it was not possible for people outside the committee to know about the status. We need to improve this situation. This ties in to the upcoming “preview” stage for language features in the compiler (Introduce concept of Preview Features as a feature stabilization period · Issue #22044 · scala/scala3 · GitHub).

I guess in practice it doesn’t matter much whether a conversion is by subtyping or through an implicit conversion. The toTuple implicit conversion is enabled by default anyway, so one can go back and forth between named and unnamed.

scala> val p = (x = 1, y = 2)
val p: (x : Int, y : Int) = (1,2)

scala> p.x
val res1: Int = 1

scala> ((p: (Int, Int)): (y: Int, x: Int)).x
val res2: Int = 2

IIUC, the argument for unnamed <: named as implemented currently is the analogy to argument lists. If we had first class argument lists, we would certainly allow val args = ("bob", 72); foo@args, the names are inferred. This makes sense to me.

But it bothers me from a “types == sets of values” perspective. (String, Int) is any String / Int pair, or alternatively the set of unnamed String / Int pairs. The type (name: String, age: Int) is pairs with name and age fields.

There’s also some inconsistency in pattern matching: a named tuple matches an unnamed pattern, but not the other way around. But that won’t change, no matter if there’s subtyping or not.

scala> val p = (1, 2)
val p: (Int, Int) = (1,2)

scala> val n: (x: Int, y: Int) = p
val n: (x : Int, y : Int) = (1,2)

scala> p match { case (x = 1, y = 2) => true }
-- Error: ----------------------------------------------------------------------
1 |p match { case (x = 1, y = 2) => true }
  |                ^^^^^
  |               No element named `x` is defined in selector type (Int, Int)
-- Error: ----------------------------------------------------------------------
1 |p match { case (x = 1, y = 2) => true }
  |                       ^^^^^
  |               No element named `y` is defined in selector type (Int, Int)
2 errors found

scala> n match { case (1, 2) => true }
val res0: Boolean = true

ragnar · December 12, 2024, 12:52pm

I read that multiple times in this thread, but I find that is only one perspective.

As an alternative, I think it’s perfectly fine to think about named tuple types and the corresponding unnamed tuple type to all have the same values. There is nothing strange about this from a type theoretic perspective, and this is also how opaque types usually work.

Sure, one could make an argument that one would prefer to have separate values (and I mean values semantically here, not in a “specific JVM representation” way) for named and unnamed tuples, but both are choices and I have not seen any argument why one is fundamentally better than the other.

In regard to subtyping, as both sets of values are the same, both subtyping directions are valid, and which (if any) to pick depends on usability question.

So the criticism I do agree with is that the question of usability is one that needs time and examples to figure out.
Given that the SIP was seemingly approved before a full implementation existed (as far as I can tell, the implicit conversion from named → unnamed did not work until 2 weeks ago or so) I do not believe that anyone in the SIP committee had a full understanding of the usability questions.

Thus, I think the following position:

Looks pretty bad to anyone following the language evolution. In particular with the admission that the whole process is nearly non-transparent to anyone not on the committee. I would suggest revising it to something along the lines of

The SIP as implemented in Scala 3.6.2 is accepted, please do thoroughly test the feature for your usecases. If no major concerns are raised, the feature is on track to become non experimental in Scala 3.7.

To be clear, I think the design of the feature is good, at least the parts I have tried; I personally would have been fine with releasing this as is in 3.6. But given that it remains experimental (after raised concerns), I don’t see a world where admitting that you are still open to address significant concerns/issues is disadvantageous.

If you want me to make this complaint more formal, stage 3 (Implementation) of the SIP process is “Provide an Experimental implementation of the changes in the compiler. Evaluate how they hold up in practice. Get feedback from implementers and users.” I seriously question that this happened, given the lack of a full implementation before the vote on acceptance.

mberndt · December 12, 2024, 6:51pm

I couldn’t agree more. What’s the point of having an experimental implementation in the compiler when the outcome of the experiment has been pre-determined by the committee to be positive?

Ichoran · December 12, 2024, 8:10pm

This is one main reason why I implemented my named tuples as tuples of named types. If (name: String) is a type, then it’s pretty obvious how to think about (name: String, age: Int).

I’ve been trying to port some of my named type code to named tuples. Mostly I use named types for safety, and named tuples with subtyping and conversion are a low-safety feature because of bidirectional conformance. So I won’t be porting that unless conversions are optional and can be turned off.

But in other cases I parse through tuples of named types to extract names (e.g. for command-line argument parsing). When I use my named types, aside from the occasional lack of compiler support, it’s really no different than regular tuple destructuring because every element has its own name. Effectively, ((street: String), (zipcode: Int)) destructures with e *: tp to e: (street: String) and tp: ((zipcode: Int)). And I can get "street" using compiletime.constValue.

How is one supposed to do this with named tuples?

Do I destructure the names and values separately, using [N <: Tuple, V <: Tuple] and just compiletime.error all the impossible cases where N and V are different lengths? Should I be destructuring recursively with SplitAt and size instead? Is tuple-destructuring a bad idea anyway and everything should be done with index-walking if possible?

Going from “hey, named tuples work!” to “here’s something general-purpose that really shows the utility of named tuples” is a bit less obvious than I’d hoped. (Maybe because I was already doing it wrong.)
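One possible shape of the first option, as a hedged sketch only: the names are walked at the type level while the values are read at runtime. It works on two separate tuple types N and V passed explicitly rather than on the real named tuple type, labelsOf and describe are hypothetical helpers, and the compile-time check for mismatched lengths is left out.

```scala
import scala.compiletime.{constValue, erasedValue}

// Recursively turns a tuple of singleton string types into a list of strings.
inline def labelsOf[N <: Tuple]: List[String] =
  inline erasedValue[N] match
    case _: EmptyTuple => Nil
    case _: (n *: ns)  => constValue[n].toString :: labelsOf[ns]

// Pairs each compile-time label with the runtime value at the same position.
// Assumption: N and V have the same length; this sketch does not verify it.
inline def describe[N <: Tuple, V <: Tuple](values: V): List[String] =
  labelsOf[N].zip(values.productIterator).map((name, value) => s"$name=$value")

val described = describe[("street", "zipcode"), (String, Int)](("Main St", 12345))
```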

MateuszKowalewski · December 13, 2024, 4:33pm

That’s of course the case, and at least I said that I would be OK with it.

But the point is, you can make implicit conversion optional (you could turn it off by not importing it) but that does not work for subtyping. So it’s only the same in case you want the conversion. For the people who don’t want it it makes a real difference, and it’s not the same.

That’s why I’ve proposed the compromise of doing both directions with an (optional!) conversion. Would make me happy (I like the conversions) and would make the people happy who don’t want it (by default).

I see no real drawback. Besides additional effort on the feature, of course. But it seems the topic is so controversial that it would be worth it.

I just don’t see any winning argument. All presented arguments seem valid and none invalidates the others for good.

That’s a very interesting viewpoint! I think it didn’t come up until now.

But there is also another:

If you see (String, Int) as (_1: String, _2: Int) then it’s quite “obvious” that this is a completely unrelated type to (name: String, age: Int) (besides having the same runtime representation).

There are so many conflicting viewpoints that I think the safe treatment is to not specify any subtype relation. At least for now. It’s just too murky. None of the viewpoints is imho fundamentally better than the others. At least nobody presented arguments that invalidate most of the viewpoints so only one remains.

Regardless, thanks for keeping the discussion still open!