Pre SIP: Named tuples

Of course that’s possible:

def divmod(x; Double, y: Double): (divisor: Double, modulo: Double) =
  (x / y, x mod y)
3 Likes

That’s excellent. However I’m not sure about the sub typing relationship. To me it would seem natural to have named tuples as a sub type of their unnamed tuple structures. But maybe there’s major use case’s that this would cause a problem for that I’m not aware of.

There’s along discussion about this at https://github.com/lampepfl/dotty/pull/19075.

I can’t think of a use case for mixing named and unnamed off the top of my head, but is there a specific reason why it can’t be allowed? Couldn’t the numeric position names be synthetically added to any params without an explicit name?

So:

val bob = (name = "Bob", 47)
val bob2 = (name = "Bob", _2 = 47)
assert( bob == bob2 )

I’ve just seen this comment from the PR discussion:

In the current implementation, we require all elements of a tuple to be named or none to be named. maybe we can relax this and only inquire that a tuple type consists of either all named fields or no named fields

While that isn’t exactly what I asked, it does seem to address my underlying thought. I like the idea of tuple types requiring names, but values being flexible. That aligns very nicely with the parameter list analogy.

FWIW, see also previous discussion Named Tuple Arguments / Anonymous Case Classes

1 Like

It’d be good to see how this interacts with typeclass instance definitions. In particular, I’d expect the standard library to let me do:

type Person = (name: String, age: Int)
val bob: Person = (name = "Bob", age = 33)
val alice: Person = (name = "Alice", age = 35)

List(bob, alice).sorted // List(alice, bob)

Which looks like it would require a new instance:

given lexicographic[N <: Tuple, V <: Tuple]: Ordering[NamedTuple[N, V]] with
  def compare(x: NamedTuple[N, V], y: NamedTuple[N, V]): Int =
    summon[Ordering[Tuple]].compare(x.toTuple, y.toTuple)

… except this doesn’t work because we’re already missing a generic instance for Ordering on Tuple, and I’m not exactly sure what’s the best way to define one (and how it would interact with the existing instances for Ordering on specific Tuples).

1 Like

Finally!

Thank you so much, I love it! :heart_eyes:

The first question that comes to my mind: Could named tuples actually become “proper structs” on Scala Native?

The other thing is, as far as I understand this proposal, it’s not like this would somehow “close the road to enhanced structural types” at all. Quite the contrary, I think: Named tuples could in theory become the underlying structure for some future “HMaps”, so we could get structural types like in TypeScript also on top. It would take some (type level?) conversion mechanism to make this happen, where you “materialize” records / structural types as named tuples for the runtime representation. But this looks doable in the next step.

Nevertheless some possible future extension, named tuples are a great addition on their own. It was proposed so often. Great Scala listens once more to its community and makes such superb improvements! The language would get with this improvement one step closer to perfection, imho. We really just need a marketing department to spread the word.

1 Like

While the symmetry between tuples and parameter lists is nice, I do think the relationship is fundamentally the wrong way around. I agree with @sjrd in https://github.com/lampepfl/dotty/pull/19075#issuecomment-1827403119 when he said that the direction of sub-typing is confusing: subtypes should have stronger contracts (whether encoded in the type system or not), more information known about them, and define more methods than their supertypes. In this case:

  1. Named tuples do seem like they have a more specific semantic contract: you cannot pass (x: Int, y: Int) in place of (y: Int, x: Int), whereas all (Int, Int)s are interchangeable
  2. We know more about each value, semantically (e.g. .x is not just a Int, but a Int along the X-axis)
  3. They define more “methods”: you can access a named tuple’s fields positionally or via their names, but can only access a positional tuples fields positionally

By this logic, named tuples should definitely be a sub-type of positional tuples.

@odersky mentioned in Named tuples experimental first implementation by odersky · Pull Request #19075 · lampepfl/dotty · GitHub that Python is not statically typed, and thus does not have precedence here, but that is incorrect: Python’s namedtuple is unambiguously a subtype of tuple

  • When you look at it from a dynamic perspective an instance of namedtuple supports both a namedtuple’s .foo/.bar/.qux, as well as a positional tuple’s [0], [1], [2]. By duck-typing, namedtuple can be passed anywhere a tuple can be, and is thus a subtype

  • namedtuples have a dynamic isinstance relationship with tuple, and not the other way around

from collection import namedtuple

Foo = namedtuple("Foo", "a b c")

isinstance(Foo(1, 2, 3), tuple) # true

isinstance((1, 2, 3), Foo) # false
  • The standard (or as standard as exists) MyPy static typechecker treats NamedTuples as subtypes of Tuples of the same arity:
from typing import NamedTuple, Tuple

# Define a named tuple
class Point(NamedTuple):
    x: int
    y: int

# Function expecting a positional tuple
def process_tuple(t: Tuple[int, int]):
    print(t[0], t[1])

# Create a named tuple
point = Point(1, 2)

# Call the function with the named tuple
process_tuple(point) # OK


# Function expecting a named tuple
def process_named_tuple(t: Point):
    print(t.x, t.y)
    
# Call the function with a positional tuple
process_named_tuple((1, 2)) # main.py:23: error: Argument 1 to "process_named_tuple" has incompatible type "tuple[int, int]"; expected "Point"  [arg-type]

IMO we should follow the Python convention that NamedTuple <: Tuple, rather than the Scala argument-list convention that positional arguments can be used anywhere a named argument can be used.

Long-term, IMO we should probably have a stricter separation of position and named arguments the same way Python/Swift have. The status quo way we can swap named/positional parameters is looser than it should be:

  1. Python allows you to define named-only or positional-only parameters using the / and * syntax
  2. Swift goes one step further and mandates that parameters be only named-only or positional-only
  3. @odersky mentioned in person how in Dotty they have linters/stylistic-conventions for some parameters being positional-only (e.g. those with a single character name)

Rather than twisting the named/positional-tuple subtyping relationship to match what we currently have for named/positional arguments, IMO we should follow Python’s lead for the subtyping of named/positional-tuples and slowly twist the named/positional arguments to be more Python/Swift-like instead.


Notable, Swift allows unrestricted conversions in both directions between named and un-named tuples, so both assignments below work.

var tuplePositional: (String, Int) = ("Suresh Dasari", 200)
var tupleNamed: (name:String, id:Int) = tuplePositional
print("Hello World \(tupleNamed)")


var tupleNamed: (name:String, id:Int) = (name: "Suresh Dasari", id: 200)
var tuplePositional: (String, Int) = tupleNamed
print("Hello World \(tuplePositional)")

I can understand why we may not want to do that at the subtype level, because subtyping transitivity would mean that they’re the same type. But we could conceivably provide implicit conversions since those are non-transitive by default. But we probably should just stick to a one-directional subtyping relationship and provide an explicit case/conversion method to go the other way


On an unrelated note, named tuples has a very nice synergy with https://contributors.scala-lang.org/t/unpacking-classes-into-method-argument-lists/6329/93:

  1. Named tuples provide a lighter weight alternative to case classes as a type to be passed to unpack
  2. unpack makes conversion between named tuples and case classes trivial, you would be able to call MyCaseClass(*myNamedTupleInstance) or (*myCaseClassInstance) to convert between them without needing any explicit library or language support. These conversions would “just work”
5 Likes

For reference, C# also allows conversions in both directions (but don’t ask how it’s implemented, IDK):

using System;
					
public class Program
{
	public static void Main()
	{
		(String, int) tuplePositional = ("Suresh Dasari", 200);
		(String Name, int Id) tupleNamed = tuplePositional;
		Console.WriteLine($"Hello World {tupleNamed}");
		
		(String Name, int Id) tupleNamed2 = (Name: "Suresh Dasari", Id: 200);
		(String, int) tuplePositional2 = tupleNamed2;
		Console.WriteLine($"Hello World {tuplePositional2}");
	}
}

[ see: C# Online Compiler | .NET Fiddle ]

Note: In C# the names seem to be just some compile time entity. The code above prints two times the same.

Scala could do better, as I understand this proposal here.


OTOH I would argue against imitating Python. Their *args / **kwargs distinction always felt like a major kludge to me. It looks like something that was made this way because it wasn’t feasible to hide this distinction when Python was created. But our computers and software is much more powerful today. Just let the machine figure out how to convert those structures as needed, and hide the tedious details from the user. (Also having static types helps with that, I guess. Something Python didn’t have when *args / **kwargs was invented. Otherwise this would be just different types of tuples / arguments, I think…)

In general I like the idea to unify parameter lists and (named) tuples very much. Imho parameter lists are nothing else than some funky kind of named tuple(s). But there need to be conversions in all directions to make something like that convenient to use. Forcing end-users to fiddle with *args / **kwargs is the opposite of convenience!

1 Like

Conversions by subtyping in both directions are fundamentally wrong, since they make all names meaningless (detailed argument in the linked thread). Subtyping is supposed to be transitive. So with subtyping directions in both ways we get:

  (foo: Int, bar: Int) <: (Int, Int) <: (baz: Int, bam: Int)

That’s clearly not something we want to have.

That leaves one direction available. Arguably unnamed <: named is much more useful than the other way around.

Critique of named <: unnamed

If we assume named <: unnamed then named tuples are regular tuples. The names are conceptually on the side and enable us to also do selection by name. We can use all tuple functions including _1, _2 on named tuples. But that means we can also use functions like ++ on tuples. Both of the following would be OK:

Bob ++ (1, true)
(1, true) ++ Bob

and would yield an unnamed tuple. Furthermore, we can also write the erroneous Bob ++ Bob and instead of reporting an error about duplicate names this would be forced to strip the names and return an unnamed 4-tuple. I have the impression this would lead to brittle code. Imagine concatenating two long tuples from a database that accidentally share a name and instead of an error you get an unnamed tuple!

Moreover we can re-define none of the operations of Tuple. An example: Arguably map on a named tuple should return a named tuple. But the map operation is already taken on Tuple, and it is an inline function, so we cannot override it. This means we need a new operation like namedMap to do the “correct” map on NamedTuple. And a user naively using map on a named tuple would lose all the names. Yuck!

Defense of unnamed <: named

By contrast unnamed <: named is undeniably useful and has none of the shortcomings of the other direction. But what is its semantic intuition?

If unnamed <: named then tuples are named tuples. Semantically, a named tuple is modeled as a type level tuple of singleton types representing its names and a regular tuple of values. So, what are the names of an unnamed tuple? By the subtyping relation we are forced to have them be Nothing. So an unnamed tuple is a named tuple where each name type is Nothing. Once you think of it, it makes sense: There is no name, so the type of the name is Nothing. Just like the type of head on an empty list is Nothing since it does not return a result.

Nothing is not intuitive at all at first, but arguably Scala made some breakthroughs in PL design by embracing it. This seems to be another case where Nothing is really useful!

As an aside, Kotlin creator Andrej Breslav gave a nice talk recently that recounted that C# specifically suffered from not having Nothing, which meant that throw had to be a statement instead of an expression. Funny that C# and Typescript fall into the same trap again for named tuples.

Closing Argiument

I have shown that if we take subtyping seriously then unnamed <: named is the only relation that is useful and makes sense. The reverse relation named <: unnamed is actually hurtful, since it forces us to re-use all operations on tuples unchanged also on named tuples even where this does not make sense. I also want to already stave off any arguments that we should fiddle with subtyping, for instance by adding some additional rules or restrictions. IMO that’s a rabbit hole not worth jumping into.

Alternative

My argument was strictly about subtyping. Conversions between tuples are another topic entirely. Of course you should always be able to map named to unnamed tuples by an explicit conversion. In fact it already exists:

val pair: (String, Int) = Bob.toTuple

We could add that functionality to a Conversion on tuples, so the conversion could be used implicitly.

Should we forgo subtyping altogether and instead define implicit conversions in both directions? Arguably, that’s a bad idea since we would then be never sure what conversion the compiler applied, which would in turn determine the type of a tuple. We quickly realized that bijective implicit conversions were a really bad idea when we defined JavaConversions. Let’s not make the same mistake here!

7 Likes

Stripping names is what Python does when you append tuples and namedtuples. I agree it’s not an ideal result.

Would one solution to have some kind of “partially named” tuples? That’s effectively what we have with argument lists, where you can call foo(1, 2, 3, x = 4, y = 5, z = 6). What if we define (1, 2, 3) ++ (x = 4, y = 5, z = 6) to be (1, 2, 3, x = 4, y = 5, z = 6)? Would such a thing be possible?

Moreover we can re-define none of the operations of Tuple . An example: Arguably map on a named tuple should return a named tuple. But the map operation is already taken on Tuple , and it is an inline function, so we cannot override it. This means we need a new operation like namedMap to do the “correct” map on NamedTuple . And a user naively using map on a named tuple would lose all the names. Yuck!

This seems like an implementation issue. Not to say those aren’t real, but I wonder if a way could be found to work around it? e.g.

  • Could .map be made into a normal method or extension method, rather than inline, such that named tuples could do a covariant override with a more specific return type?
  • Or could .map be made transparent inline, such that when called on a named tuple it would use an inline if to also return a named tuple?

Using Nothing works, but I’m not sure it’s good. By that logic, any type is a subtype of anything else!

  • A positional scala.Tuple2[T1, T2] is a sub-type of scala.Tuple3[T1, T2, Nothing]
  • A java.lang.Object is a sub-type of java.lang.CharSequence, with covariant overrides def length(): Nothing = ??? and def charAt(index: Int): Nothing = ???

In effect, using Nothing is just throwing out static typing and going to a Pythonic “call whatever method you want, it’ll just blow up with an exception if it’s invalid”. The fact that Nothing can be used in types/expressions more convenient than throw being only usable in statements, but typically it’s still just throwing exceptions at runtime, and is a code smell that usually indicates you have your subtyping relationships wrong or backwards.

I’m not sure if there’s some clever Scala 3 type-level logic stuff or compiletime.ops stuff going on that will mean this isn’t as dangerous as normal usages of Nothing in Scala code. And sometimes you really can’t capture everything you want in the type system, and have to resort to Nothing/throw-ing in corner cases. But the fact that the proposal intentionally uses Nothing in a type definitely seems suspicious and suggests that we’re doing it wrong.


Basically the objections to NamedTuple <: Tuple seem like implementation restrictions that in theory could be made to work while preserving the Liskov Substitution Principle, while the implementation of Tuple <: NamedTuple using Nothing seems like a direct violation of LSP. It’s possible that there may be be other factors that may weigh in on this decision, but the issues with Tuple <: NamedTuple definitely seem more serious: implementations can change and be tweaked and restrictions lifted, but LSP is pretty fundamental and isn’t going way anytime soon, and typically violating LSP means an endless stream of edge cases that will never work quite right

Just thinking out loud:

What if all functions would just take NamedTuples as arguments?

If you pass the subtype, an (unnamed) Tuple, it can always be “up-cast” to the named version, as you can map the positional arguments onto the names (the tuples need to have same length to be even considered related by subtyping). The “up-cast” seems also somehow related to the fact that function params are contra-variant.

Passing positional arguments to functions (which have param lists, which have named arguments) works just fine in today’s Scala. The names of function parameters are forgotten / erased at runtime. The same would be the case with NamedTuples at runtime.

Tuples in general look really very related to param lists…

Thinking in subtypes, I guess, it would look something like:

Tuple <: NamedTuple <: ParameterList

Functions always take ParameterLists. :smile:

But it’s always fine to pass a subtype (LSP). So passing positional arguments (a Tuple) is OK, and will never do harm.

2 Likes

No, Nothing used in that way does not compromise type safety. My argument did not assume that any part of Scala changes. In particular, a type with a field x: Nothing was never equivalent to a type without that field and isn’t now either. So the argument that this amounts to dynamic typing is wrong.

No, it isn’t.

No, that does not hold either.

Is LSP violated? That’s an interesting question. LSP is about runtime behavior, where all tuples are equivalent, so that means LSP must hold in that sense. But there is something interesting going on with Nothing when it comes to compile time.

If I say, x: String and then I refine to x: Nothing, can I still do everything with it I could do before? Of course not: For instance I can’t take the length of x if its type is Nothing. Even though I could upcast x to String and then take the length again. So that shows that LSP does not work for compile-time subtyping. (there are other violations as well which are not linked to Nothing for instance connected to overloading or implicit resolution.)

If unnamed <: named then what I would have expected semantically is an unnamed tuple is a private case of a named tuple where all the fields are named _1, _2, .... So (Int, Int) <: (arg1: Int, arg2: Int) does not make sense to me unless also (otherName1: Int, otherName2: Int) <: (arg1: Int, arg2: Int).
So intuitively I would expect either all tuples with the same field types are considered the same type or that named tuples are subtypes of their unnamed counterparts.

The accessors _1, …, _22 are not defined for named tuples. It’s wrong to equate

  (Int, String)    with    (_1 = Int, _2 = String)

If you do that, you will no longer be able to map between named and unnamed tuples at all.

To repeat, the following model holds for the type structure at compile time:

(name: String, age: Int)   ~~   (("name", "age"), (String, Int))
(Int, String)              ~~   ((Nothing, Nothing), (String, Int))

Here, "name" and "age" are the singleton types with these string literals as values.

1 Like

I’m not considering the (current draft) implementation in my argument, nor do I think it should be part of the conversation.

My modeling was not primarily related to the implementation. It’s the modeling we need to keep in our heads to understand the semantic intuition of the proposal.The implementation just follows that model (and quite loosely, at that, since instead of structural expansion like this it uses an opaque type that reflects it).

Swift and labeled types

I just want to give a few precisions about Swift and its treatment of named and positional things.

In Swift, functions always have positional arguments only. If you write this:

func divide(a: Int, b: Int) -> Int { a / b }

The compiler will enforce that a appears before b at all call sites.

divide(b: 2, a: 6) //  error: argument 'a' must precede argument 'b'

The reason is that Swift lets you add labels on positional parameters. Those labels don’t even have to match the name of the parameters:

func divide(dividend a: Int, divisor b: Int) { a / b }
print(divide(dividend: 6, divisor: 2))

Why is this important? Because we can also understand the “names” of a tuple the same way. They are just labels for positional elements. All tuples in Swift, named or otherwise, can be accessed via integer indices.

func divide(_ a: Int, by b: Int) -> (quotient: Int, remainder: Int) {
  (quotient: a / b, remainder: a % b)
}

let x = divide(6, by: 2)
print(x.0)         // 3
print(x.remainder) // 0

So one way to think about named tuples is to see them as just tuples with some information that lets the compiler relate a name to a position. This interpretation gives us more lenience to define subtyping and/or conversion.

About subtyping and conversions

I have to admit that I’m not particularly moved by the beauty of the theory and/or implementation of named <: unnamed or unnamed <: named. What matters to me is how useful the relation will be in my programs, and part of that includes intuition. It does not take mental gymnastic to understand that throw E returns Nothing, regardless of what that particular interpretation of throws buys for the calculus or the implementation. The same can’t be said about the arguments that have been presented in favor of unnamed <: named.

The subtyping relationship in Swift between types with labels is murky so I don’t think it is necessarily something to emulate. My (likely unpopular here) personal opinion is that not having a subtyping relationship at all might not be a bad bet. I’d also add that it’s always easier to relax constraints than tightening the screws after the fact. So perhaps it would be best to start without subtyping and identify where exactly that choice causes unbearable pain.

That being said, one thing that gets annoying without implicit conversion is the boilerplate necessary to create new tuple instances. Let me rewriting my Swift example in Scala to illustrate:

def divide(a: Int, b: Int) -> (quotient: Int, remainder: Int) = {
  (quotient = a / b, remainder = a % b)
}

I claim that it would be mighty convenient if we didn’t have to repeat the labels/names of the tuple in the return value. Similarly, if we have a method def f(x: (a: Int, b: Int)) we probably want to be able to call f((1, 2)). But I also claim that the conversion isn’t that important in other use cases. For example, I don’t think it is the end of the world if the compiler complains when I write this:

def foo((Int, Int)) -> Int = ???
foo(divide(6, 2))

After all, there is a very real possibility that I misused the result of divide. Having to pause and say “here’s how you get from (quotient: Int, remainder: Int) to (Int, Int)” might actually be beneficial for the understandability of this code.

I think one way to define very simple conversions between named and unnamed tuples is to restrict them to tuple literal expressions. If locally the compiler is able to infer the named that we left out, then all is well. Otherwise, (a: T, b: U) is neither super type nor subtype of (T, U) and the user must take the appropriate step to convert their types. We can always bikeshed syntax for that.

About the motivations for named tuples

There are many reasons why I’m not riding the subtyping train, but for tuples specifically, one is that I don’t think tuples should be a substitute for named types. In fact, I claim that the opening example shows a bad use case for named tuples:

type Person = (name: string, age: Int)
val amy: Person = (name = "Amy", age = 33)

What is the argument for not having defined a case class here? For essentially the same number of keystrokes, we get a type that also supports pattern matching and for which subtyping is clearly defined and unambiguous. So if we want to do fancy things with implicit conversions on assignment or at function boundaries, we already have the right tools for the job.

The fact that a case class has a heavier bytecode footprint isn’t a very compelling argument to me either. It’s good to know if I have to optimize my code one day but otherwise I’ll always lean on the side of using fewer features.

What I think is far more compelling is to have

a convenient lightweight way to return multiple results from a function

This use case doesn’t deserve a sophisticate subtyping relationship, only a simple way to create instances, match on them, and select their members. The simple conversion scheme that I described above is sufficient for that.

FWIW, I’ll add that in my experience with Swift, a lot of code starts with a tuple (labeled or not) and ends with a named struct because eventually one wants to properly document a type and their properties. So most uses of tuples in Swift are at function boundaries in things like Dictionary.init(uniqueKeysWithValues: Sequence<(key: Key, value: Value)>).

I’m not at all familiar with database oriented applications so I won’t comment on it. I’ll only say that I strongly suspect database people have thought of ways to deal with records sharing names and that is where we should look for answers if we haven’t yet.

Other possibly terrible ideas

If we adopt the view that “names” are merely labels for positional things, then there are a few restrictions we can lift.

For example:

It is illegal to mix named and unnamed elements in a tuple

Why? That is perfectly fine in Swift:

let x: (a: Int, String) = (1, "hello")
print(x.1)

We can just get tuples that happen to not have labels for some specific elements. Anyway, we can still access those elements using their position, as shown in the example.

or to use the same same name for two different elements.

Why?

That’s a little more experimental (at least we can’t do it in Swift) but we could simply say that if multiple elements have the same label, then the compiler reports an ambiguity if we try to use it. Again, all elements can be unambiguously accessed by their position anyway.

I think this approach also solves the problem of concatenating two tuples with overlapping names. We just get one whose unambiguous elements can be accessed by name and the other must be accessed by position. The label information is still useful because if we later split the combined tuple we might be able to unambiguously name its parts.

Inventing syntax and APIs because I don’t know how to express these operations in Scala:

val x = (a = 1, b = 2)
val y = (c = 3, b = 4)

// z has type (a: Int, b: Int, c: Int, b: Int)
val z = x ++ y
// compile-time error
print(z.b)
// OK
print(z(1))

// w has type (a: Int, b: Int, c: Int)
val (w, _) = z.splitAt(3)
// OK
print(z.b)
4 Likes

For what it’s worth, my intuition was also that named <: unnamed, and it seems to make more sense

I was about to write something along the lines of:

But Alvae beat me to it ^^’

There is precedent for this:

val x: Double = 2 // implicit conversion from int literal to double literal

See Dropped: Weak Conformance - More Details