Pre SIP: Named tuples

Ichoran · December 4, 2023, 7:00pm

I completely agree that named <:< unnamed is a bad subtyping relationship, for the reasons stated.

But I’m not sold on unnamed <:< named either. Method names are in covariant position, which means that adding them should make you a subtype. That’s how it always works with everything else, and it sure feels like named tuples are “unnamed tuples but with method names added”. It’s really hard to shake that impression.

// How can you avoid thinking this??
class (foo: Foo, bar: Bar) extends (Foo, Bar):
  def foo = _1
  def bar = _2

// Especially given this:
// scala> ("eel", "cod").getClass.getMethods.find(_.getName == "_1")
// val res11: Option[java.lang.reflect.Method] =
//   Some(public java.lang.Object scala.Tuple2._1())

Furthermore, unnamed <:< named is useful in some situations but not in others. For example, if the entire point of (start: Int, length: Int) vs (start: Int, end: Int) is to make sure you interpret the pair of integers correctly, do we really just want to blithely pass in (3, 7)?!

And what is the interaction with structural typing?

def bippy(withFoo: { def foo: Foo }) = ???

val nt = (foo = Foo(), why = "example")
bippy(nt)  // Does this work?  If not, why not?

val ut = (Foo(), "example")
bippy(ut)  // We don't seriously want this to work, do we?!

This suggests to me that as tempting as subtyping might be to get the methods on named tuples to support names, we probably don’t want unnamed tuples as subtypes of named tuples.

Instead, I think the best analogy from within the language as it stands now is that named tuples are automatically generated opaque types over regular tuples.

opaque type Tuple2_foo_Foo_bar_Bar = (Foo, Bar)
object Tuple2_foo_Foo_bar_Bar {
  inline def fromTuple(fb: (Foo, Bar)): Tuple2_foo_Foo_bar_Bar = fb
  extension (named: Tuple2_foo_Foo_bar_Bar)
    inline def toTuple: (Foo, Bar) = named
    inline def foo: Foo = named._1
    inline def bar: Bar = named._2
}

plus compiler support for matching and so on. This also has downsides–you won’t get the names printed, for instance–but I think it’s the best straightforward compromise, and it has the advantage that you don’t need to unpack and repack objects just to get the used names straight.
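As a minimal runnable sketch of that analogy, assuming hypothetical case classes standing in for Foo and Bar, a shorter alias name (FooBar), and an enclosing object (NamedPairs) to keep the alias opaque at the use site, the wrapper might look as follows. The inline modifiers are dropped here only to keep the sketch simple.

```scala
// Hypothetical stand-ins for the thread's Foo and Bar.
final case class Foo(id: Int)
final case class Bar(label: String)

object NamedPairs:
  // Outside this object, FooBar is not known to be (Foo, Bar).
  opaque type FooBar = (Foo, Bar)

  object FooBar:
    // Explicit entry point from a plain tuple.
    def fromTuple(fb: (Foo, Bar)): FooBar = fb

  extension (named: FooBar)
    def toTuple: (Foo, Bar) = named
    def foo: Foo = named._1
    def bar: Bar = named._2

import NamedPairs.*

// Accepts only the named wrapper; passing a bare (Foo, Bar) is a compile error.
def describe(fb: FooBar): String = s"${fb.foo.id}:${fb.bar.label}"

val plain: (Foo, Bar) = (Foo(1), Bar("one"))
val wrapped: FooBar = FooBar.fromTuple(plain)
val described: String = describe(wrapped)
```

The wrapper and the plain tuple are the same object at runtime, so converting in either direction allocates nothing.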

Anyway, I’m not sure about this as a solution. I’m pretty convinced by your arguments that named tuples as subtypes of unnamed tuples isn’t really what we want, but I am also convinced by other arguments that unnamed tuples as subtypes of named tuples has substantial problems too.

In the absence of a way around one or the other, the only solution is to not have a subtyping relationship.

odersky · December 4, 2023, 7:13pm

In fact, yes, as the example of Lukas showed. I do want to pass tuples to named tuples. If it was not for subtyping it would have to be a conversion (which I can’t define generically, so it would have to be compiler provided).

You can think of an unnamed tuple as a tuple with uncommitted names. It conforms to any named tuple of the same arity and value types, but once you have picked names you can’t change them anymore. The type Nothing behaves like that.

Ichoran:

And what is the interaction with structural typing?

def bippy(withFoo: { def foo: Foo }) = ???

val nt = (foo = Foo(), why = "example")
bippy(nt)  // Does this work?  If not, why not?

It does not work. Structural typing assumes some form of reflection or runtime dictionary that can retrieve elements based on names. Named tuples don’t give you that.
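For contrast, here is a small sketch of the reflection-based case that structural typing does support in Scala 3. The HasFoo class and the Int result of bippy are assumptions made for illustration; the point is only that the call goes through a real runtime method named foo, which a tuple does not have.

```scala
import scala.reflect.Selectable.reflectiveSelectable

// Hypothetical stand-in for the thread's Foo.
final case class Foo(id: Int)

// A class with a real `foo` method that Java reflection can find at runtime.
final class HasFoo(n: Int):
  def foo: Foo = Foo(n)

// The member selection below is compiled to a reflective lookup of "foo".
def bippy(withFoo: { def foo: Foo }): Int = withFoo.foo.id

val viaClass: Int = bippy(new HasFoo(7))
```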

alvae · December 4, 2023, 8:27pm

@Ichoran makes a good point, though. Just because it’s seemingly convenient in @lrytz’s example doesn’t mean that in general it’s a good idea to be able to carelessly convert (T, U) to (a: T, b: U) (or the other way around).

Here’s an example of how we can get into trouble:

case class Rectangle(origin: (x: Int, y: Int), dimensions: (width: Int, height: Int))

def makeRectangle: Rectangle = {
  val o: (Int, Int) = computeOrigin()
  val d: (Int, Int) = computeDimension()
  Rectangle(d, o) // Oops
}

I’ll grant that maybe it wasn’t the brightest idea to define the properties of the case class as named tuples, but we can expect this kind of code and I’m sure people will come up with more compelling examples in the same vein. It is not hard to imagine two tables of a database having homogeneous records of the same arity.

While I still firmly believe that conversions are error prone regardless of the direction, at the very least with (a: T, b: U) to (T, U) we don’t “invent” new constraints on APIs, we only erase the ones we have.

soronpo · December 4, 2023, 8:47pm

Regarding the implementation, there is a slight limitation for opaque types to have wild-card arguments:

opaque type NamedTuple[N <: Tuple, +V <: Tuple] >: V = V
transparent inline def mymacro: NamedTuple[?, ?] = ??? //error: unreducible application of higher-kinded type NamedTuple to wildcard arguments

What happens if I write:

transparent inline def mymacro: (name: ?, value: ?) = ???

devlaam · December 4, 2023, 9:21pm

odersky:

Either all elements of a tuple are named or none are named. It is illegal to mix named and unnamed elements in a tuple. For instance, the following is in error:
val illFormed1 = ("Bob", age = 33)  // error

What is the motivation for forbidding this? After all, from the outside a named tuple is just as much a tuple with names as it is a class without a name. And in the latter it is allowed to mix positional and named parameters.

Taking this analogy a little further, one could even allow for predefined or default parameters like this:

type Person = (name: String = "Bob", age: Int, length: Double)
val bobJunior: Person = (age = 33, length=1.82)
val bobSenior: Person = (age = 88, length=1.69)

You could debate whether it should be allowed to redefine the value for name. Both options may have their advantages, I guess.

Ichoran · December 4, 2023, 9:29pm

But this is exactly the kind of thing that is a problem!

That something is convenient doesn’t make it a good idea. The only reason why r as a return value is safe is because you actually don’t need the names to do anything for safety. It’s purely documentation for the receiver. The feature could just as well be

def m(xs: List[Int]): @tooltip("sum", "log") (Int, String) = ...

I have no problem making named tuples zero-cost (the names being just compiler fictions), but I need a more compelling argument before I’m convinced that one shouldn’t have to do something like

  def m(xs: List[Int]): (sum: Int, log: String) = {
    val r = xs.foldLeft((0, "")) {
      case ((acc, d), x) => (acc + x, s"$d$x")
    }
    // some work with `r`
    r.named
  }

where

extension [A, B](t2: (A, B))
  inline def named[Na = A, Nb = B]: (Na = A, Nb = B) = t2.asInstanceOf[(Na=A, Nb=B)]

or somesuch was (effectively) in scope.

Also, giving an example with 3N object creations isn’t the most compelling to use when arguing against one more.

odersky · December 4, 2023, 9:29pm

I just tested it. It looks like it works.

import language.experimental.namedTuples

val x: (?, ?) = (1, 2)
val y: (name : ?, age : ?) = (1, 2)

Here’s the expansion:

    val x: Tuple2[? >: Nothing <: Any, ? >: Nothing <: Any] =
      Tuple2.apply[Int, Int](1, 2)
    val y:
      NamedTuple.NamedTuple[Tuple2["name".type, "age".type],
        Tuple2[? >: Nothing <: Any, ? >: Nothing <: Any]]
     = Tuple2.apply[Int, Int](1, 2)

Ichoran · December 4, 2023, 9:37pm

Sure, I can think of that, but that also means that the easy thing to do (not provide names) is also the dangerous thing to do (no typechecking).

That isn’t what I generally want out of a typesystem. lrytz’s feature looks to me like an argument against using named tuples.

In contrast, if you can take an unnamed thing, surely you can’t complain if you get it from having names that you discard. You already didn’t care about names, only positions and types, so who cares what they were?

alvae · December 4, 2023, 9:37pm

I don’t think we should understand named fields in opposition to positional ones. Names are just another way to refer to a position in a tuple. Otherwise we’re no longer talking about a tuple.

Isn’t that re-inventing case classes?

case class Person(name: String = "Bob", age: Int, length: Double)
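A short sketch of how that case class already covers the default-value usage suggested above, using the values from the earlier post (the "Alice" instance is an added illustration):

```scala
final case class Person(name: String = "Bob", age: Int, length: Double)

// Named arguments let the caller skip the defaulted first field.
val bobJunior = Person(age = 33, length = 1.82)
val bobSenior = Person(age = 88, length = 1.69)

// Positional construction still works and overrides the default.
val alice = Person("Alice", 41, 1.70)
```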

odersky · December 4, 2023, 9:42pm

But that’s the crux of the matter! r.named is not definable, it needs a magic conversion. The definition you tried to give for named is not legal code.

I’ll shut up now. I think we are turning in circles.

bishabosha · December 4, 2023, 11:18pm

it should be defined as

extension [Ts <: Tuple](ts: Ts)
  inline def named[Ns <: Tuple]: NamedTuple[Ns, Ts] = ts.asInstanceOf[NamedTuple[Ns, Ts]]

and well if there’s no expected type then it will infer as NamedTuple[Nothing, Ts] which will I guess confuse the type printer
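A self-contained sketch of this explicit opt-in, assuming a hypothetical Named type as a stand-in for the proposal’s NamedTuple (here without any subtyping relation to plain tuples) and passing the names explicitly rather than relying on the expected type:

```scala
object NamedSketch:
  // Hypothetical stand-in for NamedTuple: same runtime value, names tracked only in the type.
  opaque type Named[Ns <: Tuple, +Vs <: Tuple] = Vs

  extension [Ts <: Tuple](ts: Ts)
    // Explicit, zero-cost conversion from a plain tuple to a named one.
    def named[Ns <: Tuple]: Named[Ns, Ts] = ts

  extension [Ns <: Tuple, Vs <: Tuple](nt: Named[Ns, Vs])
    // Explicit way back to the underlying plain tuple.
    def toTuple: Vs = nt

import NamedSketch.*

type SumLog = Named[("sum", "log"), (Int, String)]

def m(xs: List[Int]): SumLog =
  val r = xs.foldLeft((0, "")) {
    case ((acc, d), x) => (acc + x, s"$d$x")
  }
  // `r` is a plain (Int, String); the names are attached only here.
  r.named[("sum", "log")]

val result: (Int, String) = m(List(1, 2, 3)).toTuple
```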

Ichoran · December 5, 2023, 12:16am

That’s because I was writing pseudocode. I thought the intent was clear enough.

Here’s a working library-level implementation of named 2-tuples which are opaque wrappers around 2-tuples but are not subtypes of tuples and use non-magical .named exactly as indicated in lrytz’s code: Scastie - An interactive playground for Scala.

(To be fair, transparent inline is hard to distinguish from magic sometimes. But anyway, it’s not novel magic; it’s the magic we already have.)

Because it’s library-level, I can’t use .name but instead have to use $ "name". That’s where a little compiler magic would be needed. (R uses $ instead of . for member access.)

The issue isn’t whether it can be implemented, but rather how we wish for the feature to work. unnamed <:< named is elegant but facilitates usage that is the polar opposite of one main use case of names: to safely disambiguate.

MateuszKowalewski · December 5, 2023, 3:34am

I disagree with the claims that named tuples need to provide some high level of type safety.

The opposite is the case: (Named) tuples are a construct for the case where you didn’t commit to strong typing (yet)!

They are for “liquid structures”—for explorative, or very generic work.

You start out with a tuple, just defining the types of your fields. In the next step you commit to some names for the fields as you figured that out. As things evolve further you give the baby a name and create a case class…

The whole point is that you don’t need to commit to the strongest of typing right from the get-go!

That’s exactly the issue with strong typing: It’s hard to use in cases you haven’t figured out everything already, or the structures you work with are very dynamic / generic.

I want named tuples so I have some middle ground between tuples and case classes. But they aren’t a replacement for case classes of course.

I also don’t buy any examples like:

case class Rectangle(origin: (x: Int, y: Int), dimensions: (width: Int, height: Int))

def makeRectangle: Rectangle = {
  val o: (Int, Int) = computeOrigin()
  val d: (Int, Int) = computeDimension()
  Rectangle(d, o) // Oops
}

If you want strong typing just make origin and dimensions take better defined types.

With the argument above you would actually need to disallow positional arguments to functions entirely. All calls would need to spell out the parameter names. Always. Because otherwise there is the potential that you confuse some arguments.

Think for example of a Dimension case class like this here:

case class Dimension(width: Int, height: Int)

The compiler won’t prevent me from writing something like:

def getDimension =
   val w: Int = computeWidth()
   val h: Int = computeHeight()
   Dimension(h, w) // Oops!

When you use completely underspecified types like Ints or Strings, that’s always the danger with positional arguments. Still, nobody ever proposed getting rid of them just because they’re obviously error prone.

Instead you would refactor to stronger types step by step when the need arises. You could for example introduce opaque type aliases for your loosely typed Int fields…
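One way that refactoring step could look, as a sketch: Width, Height, the Units object and the accessor names are hypothetical, chosen only to show that swapped arguments stop compiling once the two Int fields have distinct opaque types.

```scala
object Units:
  // Both are Ints at runtime, but distinct types outside this object.
  opaque type Width = Int
  opaque type Height = Int

  object Width:
    def apply(n: Int): Width = n
  object Height:
    def apply(n: Int): Height = n

  extension (w: Width) def widthValue: Int = w
  extension (h: Height) def heightValue: Int = h

import Units.*

final case class Dimension(width: Width, height: Height)

def area(d: Dimension): Int = d.width.widthValue * d.height.heightValue

val ok = Dimension(Width(3), Height(4))
// Dimension(Height(4), Width(3)) is rejected by the compiler: the types no longer line up.
```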

lihaoyi · December 5, 2023, 3:49am

I agree that Nothing, overloading, and implicits are existing cases of “compile-time” LSP violation. But I argue that they indeed fit the bill for “endless stream of edge cases”! We were just discussing two different edge cases around Nothing inference a few days ago, and I think most would agree that the existence of overloading makes everything more complicated, with tons of features that don’t work well together with it (default values, target typing, …). I don’t think adding more such compile-time LSP violations is desirable if there are easy alternatives available.

I think there is enough confusion in this thread to suggest that despite the technical validity of unnamed <: named, it’s pretty counterintuitive to a majority of users. There is certainly a mental model where unnamed <: named makes sense, but it seems that it is simply not the model that everyone already has in their heads.

Given unnamed <: named seems empirically confusing, assuming we don’t want to go with python-ish named <: unnamed, I like @soronpo’s idea of just having all tuples be named, with unnamed tuples just being named tuples with the names "_1", "_2", etc.:

That would both keep things simple since there’s only one real implementation of named tuples with a thin desugaring for unnamed tuples, but also keep the two types distinct so named =!= unnamed.
We can add conversions in either direction, between named -> unnamed, unnamed -> named, and even between different named -> named tuples. Given that they are all the same runtime values, these conversions would be zero runtime cost, and IMO having people explicitly opt-in to convert between them is a very good idea for reasons @Ichoran and others have already brought up.
It opens up the possibility of mixed named/unnamed tuples for free, since in the end they’re all just named tuples with some of the names being "_n", there’s no additional complexity to worry about: you can append them, mix them in any order, and it should “just work”

MateuszKowalewski · December 5, 2023, 7:31am

Nevertheless, nobody ever proposed to not have overloading…

The reason is simple: People expect that overloading works. Actually, people even expect that “something like overloading” works in case of extension methods! Because overloading “feels natural”, no matter the technical challenges.

People also expect that they can use positional arguments in functions, even though all functions have named parameter lists! And that’s despite it being “unsafe”, because you can confuse different parameters when they have the same types.

What’s the “easy alternative”?

Nobody proposed anything like that in that thread here until now.

The most realistic counter-proposal so far was actually to deliver a crippled feature—mostly like what they did in C#—and “just” make Tuples and NamedTuples completely unrelated types. This feels very unsatisfying!

You’re able to deduce that from a sample of around 10 people? Respect for that statistical insight. (And not even here we have an unambiguous “majority” btw. But decisions about technical innovations aren’t something you can go by majority vote anyway. Otherwise we would still program in assembly to this very day… ).

People who ever used positional arguments in functions would disagree…

Besides that: This feature is not out, so such claims make no sense anyway. You can’t have a mental model of something you never used.

But there is actually a very strong analogy to positional arguments in functions! So my guess would be that most people wouldn’t be surprised that you can pass an unnamed tuple where a named one is expected.

But given the previous experience with (named) parameter lists in functions while calling them with positional parameters I guess people would be very surprised if that wouldn’t work. (At least not without additional ceremony. Ceremony that can’t be explained by the language, but needs outright magic that only the compiler could provide in some ad hoc way.)

Come on! Not even I would reach for such a stretch! (And I really love to exaggerate in arguments).

Emm, no.

That’s a contradiction.

If unnamed tuples are named tuples, there can’t be any distinction on the type level.

I hope that’s the case anyway. Because at runtime that should be just “flat structs” no matter what.

Again that argument?

Consequently you would need to ask users to explicitly “convert” positional parameter lists to named parameters on every function call.

That’s just plain ridiculous.

Well, that’s just restating my observation that the super-type of named tuples are ParameterLists… Because all the things mentioned here are already possible with parameter lists.

But I clearly see that reworking the language in that direction is out of scope for this proposal here.

That’s not an issue as this could be done in some follow up (maybe Scala 4, or so).

I would disagree. There is plenty of subtle complexity in here!

Parameter list handling in Scala is already super complex. You would need the exact same complexity to handle “mixed” tuples.

I don’t say that’s undesirable, but that’s nothing you would do in the first iteration of the feature. That’s just too much to ask for in one iteration.

som-snytt · December 5, 2023, 7:42am

said nobody ever. Or do I mean never.

Nobody expects that overloading works, and it doesn’t feel natural.

Everybody knows that overloading is evil and is unnatural demon spawn.

My poignant observation is that unnamed <: named is the new PartialFunction <: Function.

lrytz · December 5, 2023, 7:48am

Yeah, seeing r.named working I think I prefer keeping the types separate. That would actually minimize compiler / language magic; either direction of subtyping has to be built in.

EDIT: though the ergonomics still matter - I think everyone agrees that an unnamed tuple literal can be used where a named tuple type is expected.

alvae · December 5, 2023, 8:03am

If names matter so little, then I don’t think we need named tuples in the first place.

I have already stated my position and won’t repeat my arguments but there are a couple of statements that rubbed me the wrong way:

I once gave a talk in which I said we should get rid of overloading in PLs because that’s a confusing feature. I’m sure I’m not the only one.

Nobody proposed anything that you found satisfying. Making unnamed tuples and named tuples unrelated is an alternative and several people have put arguments forward to justify this approach. It is a little disappointing to see these arguments pushed aside because “[it] feels very unsatisfying”.

I have enough respect for the people on this forum to trust that they have good intuition about language design. If a feature seems so divisive to 10 of these people then I’d also conclude that something is fishy.

Named tuples have been used in other languages for years. That is enough time to build a mental model of the feature.

While breaking with tradition should always be on the table, it should also require substantial evidence of being beneficial. I don’t think we gathered enough understanding about the consequences of unnamed <: named to make definitive statements here.

The fact that you’re not moved by that argument doesn’t make it invalid. It has been put forward by several people who seemed genuinely worried about a loss of type safety. This question sounds like you’re asking these people to just forget about it and move on.

This statement might be infuriating to Swift users. As I mentioned in an earlier post, Swift arguments are always positional and those that have labels must always be “named”. Millions of Swift users are used to this approach and at least half of them find it useful.

MateuszKowalewski · December 5, 2023, 8:45am

You’re definitely not the only one.

There are people who reject overloading because of its theoretical challenges. For example, the Flix language doesn’t include overloading for this exact reason. But Flix solves this differently, by allowing functions to take parameters of structural types, and these structural types (records) are first-class constructs. In Flix, named parameters are modeled by passing in row-polymorphic records.

https://doc.flix.dev/records.html#named-parameters-with-records

But there are also more than enough people who think that Java-like overloading isn’t even enough. People built whole languages on so-called multi-methods, which are kind of “maximally overloaded” functions. That’s the other end of the spectrum…

That’s a much better argument. The former tried “statistics”…

First of all a lot of people didn’t use these languages. Not everybody used all kinds of languages.

But I used some C# actually. Nothing exciting, just a few kLOC. But I’m quite sure that I don’t think C# is very exciting after this experience. It has a lot of checkboxes ticked in the feature list, but almost everything “feels” like an 80% solution. Including named tuples. (The lack of Nothing, or actually any advanced type system features may be a reason for that.)

Where do you want to take that evidence from?

You would need to push that feature out to the people so you could gather some feedback from field usage.

I think we saw some very compelling arguments.

Calling functions is one of the most fundamental things in programming languages, especially in functional ones. Having features in your toolbox that allow for a kind of “dynamic” programming in a statically typed language is one of the selling points for Scala.

Scala already has plenty of tools to assure the highest level of type safety. But it lacks a little bit on the other side of the spectrum, where you’re interested in more dynamic data. Scala is actively losing to Python for exactly this reason! It’s a pain to do explorative data processing in the stiff corset of a strongly typed language (when you don’t have any escape hatch).

Scala is trying to cover this use-case with features like structural types, and now named tuples.

Things in that space also need to work in loose, “quick and dirty” ways. You don’t want to do type acrobatics to transform some data that doesn’t have a proper schema!

That’s why I think it’s important to make the feature very convenient to use. Even if this means losing out in terms of absolute “safety”.

odersky · December 5, 2023, 8:58am

I completely agree we need to research links to data frames and relational algebra. I made a start in my PR where I implemented one kind of join in file named-tuple-strawman-2.scala. It would be great if others could help and add to this.

What do you mean by “built in”? It’s a simple lower bound declaration in a library.
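To make the lower-bound idea concrete, here is a minimal sketch. Named is a hypothetical stand-in for the library type, and ByLength and ByEnd are illustrative aliases borrowed from the start/length versus start/end example earlier in the thread.

```scala
object LowerBound:
  // The `>: Vs` lower bound is what makes every plain tuple conform to the named type.
  opaque type Named[Ns <: Tuple, +Vs <: Tuple] >: Vs = Vs

  extension [Ns <: Tuple, Vs <: Tuple](nt: Named[Ns, Vs])
    def values: Vs = nt

import LowerBound.*

type ByLength = Named[("start", "length"), (Int, Int)]
type ByEnd    = Named[("start", "end"), (Int, Int)]

val raw: (Int, Int) = (3, 7)

// Both assignments type-check: the unnamed tuple has not committed to names yet.
val a: ByLength = raw
val b: ByEnd = raw

// val c: ByEnd = a   // does not compile: once names are picked they cannot change
```

No conversion method is involved; the subtyping comes entirely from the bound on the opaque type.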

I hear the arguments against subtyping, but I don’t find them convincing. I believe library operations will be a lot more pleasant to use if there is subtyping. And the safety concerns look overblown to me. There’s also the point that all other languages that have named tuples do allow automatic injection. I am not arguing that we should in any case copy what others do, but if we deviate we have a burden of proof that explains why. This burden of proof has not been met, IMO.

If I write

val bob: Person = ("Bob", 33)
bob.zip(("Miller", "years"))

I find it un-Scala like to demand that one writes either

val bob: Person = (name = "Bob", age = 33)
bob.zip((name = "Miller", age = "years"))

or

val bob: Person = ("Bob", 33).named
bob.zip(("Miller", "years").named)

It’s more verbose and ugly and gains us nothing (and, no, making an exception for literals won’t cut it, it would just introduce another irregularity). It feels to me we have lost the spirit of Scala. Why do this here, but not for named arguments? It’s just an arbitrary restriction to make people’s lives harder in the interest of some hypothetical safety. By the very same argument you could say it’s easy to accidentally swap arguments to functions, so let’s make named arguments mandatory! You can have that opinion, but that language would not be Scala.

You can dismiss all that and weigh your priorities differently. But in the face of that and the evidence that all the other languages do the same, I think we have to agree to disagree on this point.