Pre SIP: Named tuples

soronpo · December 4, 2023, 1:12pm

odersky:

To repeat, the following model holds for the type structure at compile time:
(name: String, age: Int)   ~~   (("name", "age"), (String, Int))
(Int, String)              ~~   ((Nothing, Nothing), (Int, String))
Here, "name" and "age" are the singleton types with these string literals as values.

I’m not considering the (current draft) implementation in my argument, nor do I think it should be part of the conversation.

odersky · December 4, 2023, 1:16pm

My modeling was not primarily related to the implementation. It’s the modeling we need to keep in our heads to understand the semantic intuition of the proposal. The implementation just follows that model (and quite loosely at that, since instead of a structural expansion like this it uses an opaque type that reflects it).
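The model can be approximated with ordinary Scala 3 tuples by keeping a tuple of string-literal singleton types for the names next to a tuple of element types. This is only a sketch of the mental model, not of the actual implementation. The aliases `PersonNames` and `PersonValues` and the object `ModelSketch` are hypothetical, and `scala.compiletime.constValueTuple` is used to turn the name types back into a value.

```scala
import scala.compiletime.constValueTuple

// Hypothetical aliases mirroring (name: String, age: Int) ~~ (("name", "age"), (String, Int)).
type PersonNames  = ("name", "age")
type PersonValues = (String, Int)

object ModelSketch:
  // The names exist only as singleton types; constValueTuple materializes them as a value.
  val names: PersonNames = constValueTuple[PersonNames]
  val values: PersonValues = ("Amy", 33)

  // Pair each name with the element at the same position.
  val fields: List[(String, Any)] =
    names.productIterator.map(_.toString).toList.zip(values.productIterator.toList)

@main def modelSketch(): Unit =
  println(ModelSketch.fields) // List((name,Amy), (age,33))
```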

alvae · December 4, 2023, 1:37pm

Swift and labeled types

I just want to add a few clarifications about Swift and its treatment of named and positional things.

In Swift, functions always have positional arguments only. If you write this:

func divide(a: Int, b: Int) -> Int { a / b }

The compiler will enforce that a appears before b at all call sites.

divide(b: 2, a: 6) //  error: argument 'a' must precede argument 'b'

The reason is that Swift lets you add labels on positional parameters. Those labels don’t even have to match the name of the parameters:

func divide(dividend a: Int, divisor b: Int) -> Int { a / b }
print(divide(dividend: 6, divisor: 2))

Why is this important? Because we can also understand the “names” of a tuple the same way. They are just labels for positional elements. All tuples in Swift, named or otherwise, can be accessed via integer indices.

func divide(_ a: Int, by b: Int) -> (quotient: Int, remainder: Int) {
  (quotient: a / b, remainder: a % b)
}

let x = divide(6, by: 2)
print(x.0)         // 3
print(x.remainder) // 0

So one way to think about named tuples is to see them as just tuples with some information that lets the compiler relate a name to a position. This interpretation gives us more lenience to define subtyping and/or conversion.
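For comparison, here is a Scala rendering of the Swift example above. It is a sketch that assumes a Scala 3 compiler where named tuples need no import (3.7 or later; earlier versions required an experimental language import), and the object and value names are illustrative. The label selects by name, while `toTuple` exposes the underlying positional tuple.

```scala
// Assumes Scala 3.7+, where named tuples are a standard feature.
object LabelsSketch:
  def divide(a: Int, b: Int): (quotient: Int, remainder: Int) =
    (quotient = a / b, remainder = a % b)

  val x = divide(6, 2)
  val byName: Int = x.remainder      // selected through the label
  val byPosition: Int = x.toTuple._1 // selected by position on the underlying tuple

@main def labelsSketch(): Unit =
  println(LabelsSketch.byPosition) // 3
  println(LabelsSketch.byName)     // 0
```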

About subtyping and conversions

I have to admit that I’m not particularly moved by the beauty of the theory and/or implementation of named <: unnamed or unnamed <: named. What matters to me is how useful the relation will be in my programs, and part of that includes intuition. It does not take mental gymnastics to understand that throw E has type Nothing, regardless of what that particular interpretation of throw buys for the calculus or the implementation. The same can’t be said about the arguments that have been presented in favor of unnamed <: named.

The subtyping relationship in Swift between types with labels is murky, so I don’t think it is necessarily something to emulate. My (likely unpopular here) personal opinion is that not having a subtyping relationship at all might not be a bad bet. I’d also add that it’s always easier to relax constraints than to tighten the screws after the fact. So perhaps it would be best to start without subtyping and identify where exactly that choice causes unbearable pain.

That being said, one thing that gets annoying without implicit conversion is the boilerplate necessary to create new tuple instances. Let me rewrite my Swift example in Scala to illustrate:

def divide(a: Int, b: Int): (quotient: Int, remainder: Int) = {
  (quotient = a / b, remainder = a % b)
}

I claim that it would be mighty convenient if we didn’t have to repeat the labels/names of the tuple in the return value. Similarly, if we have a method def f(x: (a: Int, b: Int)) we probably want to be able to call f((1, 2)). But I also claim that the conversion isn’t that important in other use cases. For example, I don’t think it is the end of the world if the compiler complains when I write this:

def foo(p: (Int, Int)): Int = ???
foo(divide(6, 2))

After all, there is a very real possibility that I misused the result of divide. Having to pause and say “here’s how you get from (quotient: Int, remainder: Int) to (Int, Int)” might actually be beneficial for the understandability of this code.

I think one way to define very simple conversions between named and unnamed tuples is to restrict them to tuple literal expressions. If the compiler is able to infer locally the names that we left out, then all is well. Otherwise, (a: T, b: U) is neither a supertype nor a subtype of (T, U), and the user must take the appropriate step to convert their types. We can always bikeshed syntax for that.

About the motivations for named tuples

There are many reasons why I’m not riding the subtyping train, but for tuples specifically, one is that I don’t think tuples should be a substitute for named types. In fact, I claim that the opening example shows a bad use case for named tuples:

type Person = (name: String, age: Int)
val amy: Person = (name = "Amy", age = 33)

What is the argument for not having defined a case class here? For essentially the same number of keystrokes, we get a type that also supports pattern matching and for which subtyping is clearly defined and unambiguous. So if we want to do fancy things with implicit conversions on assignment or at function boundaries, we already have the right tools for the job.
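To make the keystroke comparison concrete, here is a sketch with both forms side by side. It assumes Scala 3.7 or later for the named-tuple alias; `PersonSketch` and `describe` are hypothetical names. The case class comes with an extractor for pattern matching, while the named tuple offers field selection over a plain tuple.

```scala
// Assumes Scala 3.7+ for the named-tuple alias.
object PersonSketch:
  type PersonTuple = (name: String, age: Int)
  final case class Person(name: String, age: Int)

  val amyTuple: PersonTuple = (name = "Amy", age = 33)
  val amyClass: Person = Person(name = "Amy", age = 33)

  // The case class comes with an extractor for pattern matching.
  def describe(p: Person): String = p match
    case Person(n, a) if a >= 18 => s"$n is an adult"
    case Person(n, _)            => s"$n is a minor"

@main def personSketch(): Unit =
  println(PersonSketch.describe(PersonSketch.amyClass)) // Amy is an adult
  println(PersonSketch.amyTuple.name)                   // Amy
```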

The fact that a case class has a heavier bytecode footprint isn’t a very compelling argument to me either. It’s good to know if I have to optimize my code one day but otherwise I’ll always lean on the side of using fewer features.

What I think is far more compelling is to have

a convenient lightweight way to return multiple results from a function

This use case doesn’t deserve a sophisticated subtyping relationship, only a simple way to create instances, match on them, and select their members. The simple conversion scheme that I described above is sufficient for that.
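A minimal sketch of that lightweight use, assuming Scala 3.7 or later: create a named result, select its members by name, and match on it positionally through `toTuple`. The helper `extremes` is hypothetical and, like `List.min`, throws on an empty list.

```scala
// Assumes Scala 3.7+. `extremes` is a hypothetical helper for illustration.
object MultiResultSketch:
  def extremes(xs: List[Int]): (smallest: Int, largest: Int) =
    (smallest = xs.min, largest = xs.max)

  val e = extremes(List(3, 1, 4, 1, 5))

  // Select members by name,
  val spread: Int = e.largest - e.smallest

  // or match positionally on the underlying plain tuple.
  val asText: String = e.toTuple match
    case (lo, hi) => s"[$lo, $hi]"

@main def multiResultSketch(): Unit =
  println(MultiResultSketch.spread) // 4
  println(MultiResultSketch.asText) // [1, 5]
```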

FWIW, I’ll add that in my experience with Swift, a lot of code starts with a tuple (labeled or not) and ends with a named struct because eventually one wants to properly document a type and its properties. So most uses of tuples in Swift are at function boundaries in things like Dictionary.init(uniqueKeysWithValues: Sequence<(key: Key, value: Value)>).

I’m not at all familiar with database-oriented applications, so I won’t comment on them. I’ll only say that I strongly suspect database people have thought of ways to deal with records sharing names, and that is where we should look for answers if we haven’t yet.

Other possibly terrible ideas

If we adopt the view that “names” are merely labels for positional things, then there are a few restrictions we can lift.

For example:

It is illegal to mix named and unnamed elements in a tuple

Why? That is perfectly fine in Swift:

let x: (a: Int, String) = (1, "hello")
print(x.1)

We can just get tuples that happen to not have labels for some specific elements. Anyway, we can still access those elements using their position, as shown in the example.

or to use the same name for two different elements.

Why?

That’s a little more experimental (at least we can’t do it in Swift) but we could simply say that if multiple elements have the same label, then the compiler reports an ambiguity if we try to use it. Again, all elements can be unambiguously accessed by their position anyway.

I think this approach also solves the problem of concatenating two tuples with overlapping names. We just get one whose unambiguous elements can be accessed by name, while the others must be accessed by position. The label information is still useful because if we later split the combined tuple, we might be able to unambiguously name its parts.

Inventing syntax and APIs because I don’t know how to express these operations in Scala:

val x = (a = 1, b = 2)
val y = (c = 3, b = 4)

// z has type (a: Int, b: Int, c: Int, b: Int)
val z = x ++ y
// compile-time error
print(z.b)
// OK
print(z(1))

// w has type (a: Int, b: Int, c: Int)
val (w, _) = z.splitAt(3)
// OK
print(w.b)
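The ambiguity rule can be prototyped at runtime to check that it behaves sensibly under concatenation and splitting. This is a hypothetical model, not a proposed Scala API: `Labeled`, `byName` and `splitAt` are invented for illustration, labels are plain strings, and the "compile-time error" becomes a `Left` value.

```scala
// Hypothetical runtime model of "labels are positional metadata"; not a Scala API.
final case class Labeled(labels: Vector[String], values: Vector[Any]):
  require(labels.size == values.size, "one label per element")

  // Concatenation keeps every label, duplicates included.
  def ++(that: Labeled): Labeled =
    Labeled(labels ++ that.labels, values ++ that.values)

  // Lookup by name fails if the label is missing or shared by several elements.
  def byName(name: String): Either[String, Any] =
    labels.indices.filter(i => labels(i) == name) match
      case Seq(i) => Right(values(i))
      case Seq()  => Left(s"no element labeled '$name'")
      case _      => Left(s"ambiguous label '$name'")

  // Lookup by position always works.
  def apply(i: Int): Any = values(i)

  def splitAt(n: Int): (Labeled, Labeled) =
    val (l1, l2) = labels.splitAt(n)
    val (v1, v2) = values.splitAt(n)
    (Labeled(l1, v1), Labeled(l2, v2))

object LabeledSketch:
  val x = Labeled(Vector("a", "b"), Vector(1, 2))
  val y = Labeled(Vector("c", "b"), Vector(3, 4))
  val z = x ++ y            // labels: a, b, c, b
  val (w, _) = z.splitAt(3) // labels: a, b, c

@main def labeledSketch(): Unit =
  println(LabeledSketch.z.byName("b")) // Left(ambiguous label 'b')
  println(LabeledSketch.z(1))          // 2
  println(LabeledSketch.w.byName("b")) // Right(2)
```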

Sporarum · December 4, 2023, 1:46pm

For what it’s worth, my intuition was also that named <: unnamed, and it seems to make more sense

I was about to write something along the lines of:

But Alvae beat me to it ^^’

There is precedent for this:

val x: Double = 2 // implicit conversion from int literal to double literal

See Dropped: Weak Conformance - More Details

odersky · December 4, 2023, 2:01pm

For what it’s worth I also had the opposite intuition about subtyping initially so I am not surprised that people find it puzzling. But if you think things through, there is no other way.

alvae:

For example, I don’t think it is the end of the world if the compiler complains when I write this:
def foo(p: (Int, Int)): Int = ???
foo(divide(6, 2))
After all, there is a very real possibility that I misused the result of divide. Having to pause and say “here’s how you get from (quotient: Int, remainder: Int) to (Int, Int) ” might actually be beneficial for the understandability of this code.

Exactly, that’s why we have unnamed <: named, and not the other way round.

alvae · December 4, 2023, 2:16pm

No. I claim that both these programs would be equally prone to misunderstandings:

// P1
def f(): (x: Int, y: Int) = ???
def g(a: (Int, Int)): Int = ???
g(f())

// P2
def f(): (Int, Int) = ???
def g(a: (x: Int, y: Int)): Int = ???
g(f())

In fact, I think P2 is even worse
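One way to make the intent explicit in P1 is to drop the names at the call site. A sketch, assuming Scala 3.7 or later, where a named tuple exposes its positional tuple through `toTuple`; the bodies replacing `???` are placeholders.

```scala
// Assumes Scala 3.7+; the bodies replacing ??? are placeholders.
object P1Explicit:
  def f(): (x: Int, y: Int) = (x = 1, y = 2)
  def g(a: (Int, Int)): Int = a._1 * 10 + a._2

  // The named -> unnamed step is spelled out at the call site.
  val result: Int = g(f().toTuple)

@main def p1Explicit(): Unit =
  println(P1Explicit.result) // 12
```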

odersky · December 4, 2023, 2:23pm

Here’s a more useful variant of P2:

val x: (a: Int, b: Int) = (a = 1, b = 2)
val y: (a: Boolean, b: Boolean) = (true, false)
val z = ("one", "two")

assert( x.zip(y) == (a = (1, true), b = (2, false)) )  // OK, names match
assert( x.zip(z) == (a = (1, "one"), b = (2, "two")) ) // Also OK, names are assumed.

It’s also worth noting that both for spec and implementation, admitting the subtyping relationship is cheaper than not admitting it. If you insist on dropping subtyping, the language will get more complicated and less expressive at the same time.

lrytz · December 4, 2023, 2:25pm

A practical aspect: I assume returning a named tuple is going to be very common. Within the implementation one might not want to deal with the element names.

  def m(xs: List[Int]): (sum: Int, log: String) = {
    val r = xs.foldLeft((0, "")) {
      case ((acc, d), x) => (acc + x, s"$d$x")
    }
    // some work with `r`
    r
  }

unnamed <: named allows the above without the runtime cost of a conversion.

Sporarum · December 4, 2023, 2:27pm

Note that there probably wouldn’t be a runtime cost one way or another, as one is an opaque type of the other

odersky · December 4, 2023, 2:31pm

Great example, which shows why the idea of only converting literals is not expressive enough.

And if we talk about an implicit conversion from unnamed to named, how would you even define it (generically, once and for all)? You can’t. It has to be compiler magic. Which just shows how we are digging ourselves into another rabbit hole doing this.

The subtyping relationship is a single four-character addition to the library. We write

opaque type NamedTuple[N <: Tuple, +V <: Tuple] >: V = V

instead of

opaque type NamedTuple[N <: Tuple, +V <: Tuple] = V

Now we are talking about magic compiler-generated conversions to replace it. Not an improvement in my book.
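To see what the `>: V` bound buys, here is a minimal sketch with a hypothetical stand-in type `Named` (not the library's `NamedTuple`), using only Scala 3 opaque types; variance on `V` is dropped for simplicity. It replays lrytz's example: the unnamed fold result is accepted as the named type with no conversion, while the reverse assignment is rejected.

```scala
// Hypothetical stand-in for the library's NamedTuple, used only to show the effect of `>: V`.
object NamedModel:
  opaque type Named[N <: Tuple, V <: Tuple] >: V = V

  extension [N <: Tuple, V <: Tuple](t: Named[N, V])
    // Inside NamedModel the alias is transparent, so this is an identity at runtime.
    def values: V = t

import NamedModel.*

object LowerBoundSketch:
  // The body works with an unnamed tuple; the result is accepted as the named
  // type because the lower bound gives (Int, String) <: Named[N, (Int, String)].
  def m(xs: List[Int]): Named[("sum", "log"), (Int, String)] =
    val r: (Int, String) = xs.foldLeft((0, "")) {
      case ((acc, d), x) => (acc + x, s"$d$x")
    }
    r

  // The reverse direction is rejected, since Named[N, V] has no upper bound V:
  // val back: (Int, String) = m(List(1, 2))

@main def lowerBoundSketch(): Unit =
  println(LowerBoundSketch.m(List(1, 2, 3)).values) // (6,123)
```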

alvae · December 4, 2023, 2:33pm

“More complicated” is arguable, given the number of people in this thread who find your subtyping direction unintuitive; and expressiveness is a cruel mistress.

Sure, there are programs like your x.zip(z) that we can’t write. In exchange we don’t have to think about the union of Nothing with an invisible type describing the tuple’s names to understand how we can convert from one tuple type to another.

It is not expensive to add the names to construct the initial value of the accumulator. Then we can use name inference of literal tuples to cover the return values in your lambda.
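A sketch of that suggestion, assuming Scala 3.7 or later: the names appear once on the initial accumulator, so the fold's accumulator type is the named tuple itself. This version also spells out the names in the lambda rather than relying on name inference for literals; `NamedAccSketch` is an illustrative name.

```scala
// Assumes Scala 3.7+. The names are written on the initial accumulator,
// so the fold's accumulator type is the named tuple itself.
object NamedAccSketch:
  def m(xs: List[Int]): (sum: Int, log: String) =
    xs.foldLeft((sum = 0, log = "")) { (acc, x) =>
      (sum = acc.sum + x, log = acc.log + x.toString)
    }

@main def namedAccSketch(): Unit =
  val r = NamedAccSketch.m(List(1, 2, 3))
  println(s"${r.sum} ${r.log}") // 6 123
```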

alvae · December 4, 2023, 2:37pm

My argument is that the fact that a scheme is expressible in a type system doesn’t make it automatically better. We make design decisions that are incredibly hard to implement every day because we think they will help users.

odersky · December 4, 2023, 2:45pm

Yes, but here they will in fact not help users, just make their lives more difficult. In

  def m(xs: List[Int]): (sum: Int, log: String) = {
    val r = xs.foldLeft((0, "")) {
      case ((acc, d), x) => (acc + x, s"$d$x")
    }
    // some work with `r`
    r
  }

it’s hard to explain why this should not work. And it’s also not easy to see that

  def m(xs: List[Int]): (sum: Int, log: String) = {
    val r = xs.foldLeft((sum = 0, log = "")) {
      case ((acc, d), x) => (acc + x, s"$d$x")
    }
    // some work with `r`
    r
  }

would fix it. Given the intricacies of type inference, I am not sure it would infer names all the way, and I am sure there are other examples where that would fail. It depends on when each type variable is instantiated. Subtyping, by contrast, is well understood and much more robust, since we have already solved the question of how it interacts with variable instantiation.

Users at first glance will expect to just be able to convert everything to everything, so I expect they would be upset if that were not true since the language designers could not agree on a semantic intuition. Having one half work by subtyping, and the other half work by an easy conversion which is trivial to define and could be made implicit, seems to fit the user’s mental model much better.

MateuszKowalewski · December 4, 2023, 2:55pm

Something very similar (just with fewer parens) is already possible in current Scala:

// P3
def f: (Int, Int) = ???
def g(x: Int, y: Int): Int = ???
g.tupled(f)

I don’t think that’s terrible.
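The adaptation in P3 comes from the ordinary `tupled` method on function values. A sketch with placeholder bodies replacing `???`; here the method `g` is first eta-expanded into a `Function2` value, which then provides `tupled`.

```scala
object TupledSketch:
  def f: (Int, Int) = (2, 3)
  def g(x: Int, y: Int): Int = x * y

  // Eta-expand the method into a Function2 value, which provides `tupled`.
  val gFun: (Int, Int) => Int = g
  val result: Int = gFun.tupled(f)

@main def tupledSketch(): Unit =
  println(TupledSketch.result) // 6
```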

alvae · December 4, 2023, 2:58pm

If inference fails, then the user has to deal with it. When I write this:

val x = 1
def f(y: Short) = y

@main def main() = {
  f(x) 
}

The compiler complains that it found Int and expected Short and I have to deal with it. In fact I’m glad that I do, because Short != Int and it is valuable to ask the user what was their intent.

It is possible, yes. I’m saying it would be valuable to start with the more restrictive approach, identify these examples, and evaluate whether the pain is actually unbearable or if on the contrary the explicitness makes the code clearer.

Yes, and they will want to implicitly convert (a: T, b: U) ~> (T, U) and will be frustrated that they can’t. Now it’s anyone’s guess as to whether they will be more or less frustrated if the arrow is reversed.

Instead of picking one direction and saying that it must be right because it makes the type system happy, I think that being more conservative will let us better understand what use cases need fixing, if any.

alvae · December 4, 2023, 3:01pm

I think it would only be the same thing if Scala had argument labels à la Swift, i.e., labels that are mandatory and positional. Without these restrictions, there’s no way to encode a function that would require a named tuple as argument anyway. If we had them from the get-go, then I’d claim your example should not compile, and I would be confused if it did.

MateuszKowalewski · December 4, 2023, 3:05pm

The point was that the compiler happily “implicitly converts” from (unnamed) tuples to a (named) parameter list when you call a function through tupled.

So there is precedent in the language for such a semantic.

devlaam · December 4, 2023, 4:18pm

odersky:

The accessors _1, …, _22 are not defined for named tuples. It’s wrong to equate
  (Int, String)    with    (_1: Int, _2: String)
If you do that, you will no longer be able to map between named and unnamed tuples at all.

Would it not be better to remove builtin support for _1, _2, etc. on unnamed tuples altogether? Allowing tuple._1, which is essentially a name on an unnamed tuple, feels unnatural somehow. And since we can access the items by index as well (tuple(0), tuple(1), etc.), it is also unnecessary.

Of course this may require some deprecation path, since it will break code. For example, an annotation such as @numberedNames could convert the tuple (Int, String) into (_1: Int, _2: String), which is then treated as a regular named tuple.
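For reference, current Scala 3 tuples already offer both access styles, and the `_N` accessors double as element names. A small sketch; `AccessorsSketch` is an illustrative name, and the index passed to `apply` is 0-based.

```scala
object AccessorsSketch:
  val t: (Int, String) = (1, "one")

  val viaAccessor: String = t._2                    // builtin _N accessor
  val viaIndex: String = t(1)                       // Scala 3 index access, 0-based
  val elementName: String = t.productElementName(1) // the accessor name doubles as a label

@main def accessorsSketch(): Unit =
  println(AccessorsSketch.elementName) // _2
```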

odersky · December 4, 2023, 5:00pm

I agree it would be good to deprecate _1, …, _n. But that problem goes beyond tuples. Currently, every product class defines these accessors.

oscar · December 4, 2023, 6:25pm

I am excited at the possibility to have named tuples.

Spark is still a major motivator for Scala usage. It would be a shame not to work through how named tuples could help us have well-typed dataframes with Spark, and to see whether that exercise informs any of the design.

In a dataframe setting, each part of a query will have a different type, and defining case classes for each part is tedious. It seems like named tuples could be an opportunity to improve safety and usability at the same time.

A related problem would be libraries that model database tables with Scala types.
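A rough sketch of how successive query steps could be typed with named tuples instead of one case class per step. It assumes Scala 3.7 or later and uses plain `List`s as a stand-in for dataframes; no Spark API is involved, and `QuerySketch` and its fields are hypothetical.

```scala
// Assumes Scala 3.7+. Plain Lists stand in for dataframes; no Spark API is used.
object QuerySketch:
  type Person = (name: String, age: Int)

  val people: List[Person] =
    List((name = "Amy", age = 33), (name = "Bob", age = 17))

  // Each step yields a new row shape without declaring a case class for it.
  val adults: List[(name: String, decade: Int)] =
    people.filter(_.age >= 18).map(p => (name = p.name, decade = p.age / 10))

@main def querySketch(): Unit =
  println(QuerySketch.adults.map(_.name)) // List(Amy)
```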