Pre SIP: Named tuples

Summary

We propose to add new form of tuples where the elements are named.

type Person = (name: String, age: Int)
val Bob: Person = (name = "Bob", age = 33)

Bob match
  case (name, age) =>
    println(s"$name is $age years old")

val persons: List[Person] = ...
val minors = persons.filter: p =>
  p.age < 18

Named bindings in tuples are similar to function parameters and arguments. We use name: Type for element types and name = value for element values. It is illegal to mix named and unnamed elements in a tuple, or to use the same same name for two different elements.

Fields of named tuples can be selected by their name, as in the line p.age < 18 above.

Motivation

  1. Tuples are a convenient lightweight way to return multiple results from a function. But the absence of names obscures their meaning, and makes decomposition with _1, _2 ugly and hard to read. The existing alternative is to define a class instead. This does name fields, but is more heavy-weight, both in terms of notation and generated bytecode. Named tuples give the same convenience of definition as regular tuples at far better readability.

  2. Named tuples are an almost ideal substrate on which to implement relational algebra and other database oriented operations. They are a good representation of database rows and allow the definition of generic operations such as projections and joins since they can draw on Scala 3’s existing generic machinery for tuples based on match types.

  3. Named tuples make named pattern matching trivial to implement. The discussion on SIP 43 showed that without them it’s unclear how to implement named pattern matching at all. With them, the delta in the prototype implementation is about 10 lines of code.

Alternatives considered

We also considered to expand structural types. Structural types allow to abstract over existing classes, but require reflection or some other library-provided mechanism for element access. By contrast, named tuples have a separate representation as tuples, which can be manipulated directly. Since elements are ordered, traversals can be defined, and this allows the definition of type generic algorithms over named tuples. Structural types don’t allow such generic algorithms directly. Be could define mappings between structural types and named tuples, which could be used to implement such algorithms. These mappings would certainly become simpler if they map to/from named tuples than if they had to map to/from user-defined "HMap"s.

By contrast to named tuples, structural types are unordered and have width subtyping. This comes with the price that no natural element ordering exist, and that one usually needs some kind of dictionary structure for access. We believe that the following advantages of named tuples over structural types outweigh the loss of subtyping flexibility:

  • Better integration since named tuples and normal tuples share the same representation.
  • Better efficiency, since no dictionary is needed.
  • Natural traversal order allows the formulation of generic algorithms such as projections and joins.

Details of the Proposal

The proposal has been implemented as an experimental extension of Scala in #19174. The following sections draw from the doc page of that PR.

Conformance

The order of names in a named tuple matters. For instance, the type Person above and the type (age: Int, name: String) would be different, incompatible types.

Values of named tuple types can also be be defined using regular tuples. For instance:

val x: Person = ("Laura", 25)

def register(person: Person) = ...
register(person = ("Silvain", 16))
register(("Silvain", 16))

This follows since a regular tuple (T_1, ..., T_n) is treated as a subtype of a named tuple (N_1 = T_1, ..., N_n = T_n) with the same element types. On the other hand, named tuples do not conform to unnamed tuples, so the following is an error:

val x: (String, Int) = Bob           // error: type mismatch

One can convert a named tuple to an unnamed tuple with the toTuple method, so the following works:

val x: (String, Int) = Bob.toTuple // ok

Note that conformance rules for named tuples are analogous to the rules for named parameters. One can assign parameters by position to a named parameter list.

  def f(param: Int) = ...
  f(param = 1)   // OK
  f(2)           // Also OK

But one cannot use a name to pass an argument to an unnamed parameter:

    val f: Int => T
    f(2)         // OK
    f(param = 2) // Not OK

The rules for tuples are analogous. Unnamed tuples conform to named tuple types, but the opposite does not hold.

Pattern Matching

When pattern matching on a named tuple, the pattern may be named or unnamed.
If the pattern is named it needs to mention only a subset of the tuple names, and these names can come in any order. So the following are all OK:

Bob match
  case (name, age) => ...

Bob match
  case (name = x, age = y) => ...

Bob match
  case (age = x) => ...

Bob match
  case (age = x, name = y) => ...

Expansion

Named tuples are in essence just a convenient syntax for regular tuples. In the internal representation, a named tuple type is represented at compile time as a pair of two tuples. One tuple contains the names as literal constant string types, the other contains the element types. The runtime representation of a named tuples consists of just the element values, whereas the names are forgotten. This is achieved by declaring NamedTuple
in package scala as an opaque type as follows:

  opaque type NamedTuple[N <: Tuple, +V <: Tuple] >: V = V

For instance, the Person type would be represented as the type

NamedTuple[("name", "age"), (String, Int)]

NamedTuple is an opaque type alias of its second, value parameter. The first parameter is a string constant type which determines the name of the element. Since the type is just an alias of its value part, names are erased at runtime, and named tuples and regular tuples have the same representation.

A NamedTuple[N, V] type is publicly known to be a supertype (but not a subtype) of its value paramater V, which means that regular tuples can be assigned to named tuples but not vice versa.

The NamedTuple object contains a number of extension methods for named tuples hat mirror the same functions in Tuple. Examples are
apply, head, tail, take, drop, ++, map, or zip.
Similar to Tuple, the NamedTuple object also contains types such as Elem, Head, Concat
that describe the results of these extension methods.

The translation of named tuples to instances of NamedTuple is fixed by the specification and therefore known to the programmer. This means that:

  • All tuple operations also work with named tuples “out of the box”.
  • Macro libraries can rely on this expansion.

Restrictions

The following restrictions apply to named tuple elements:

  1. Either all elements of a tuple are named or none are named. It is illegal to mix named and unnamed elements in a tuple. For instance, the following is in error:
    val illFormed1 = ("Bob", age = 33)  // error
    
  2. Each element name in a named tuple must be unique. For instance, the following is in error:
    val illFormed2 = (name = "", age = 0, name = true)  // error
    
  3. Named tuples can be matched with either named or regular patterns. But regular tuples and other selector types can only be matched with regular tuple patterns. For instance, the following is in error:
    (tuple: Tuple) match
        case (age = x) => // error
    

Syntax

The syntax of Scala is extended as follows to support named tuples:

SimpleType        ::=  ...
                    |  ‘(’ NameAndType {‘,’ NameAndType} ‘)’
NameAndType       ::=  id ':' Type

SimpleExpr        ::=  ...
                    |  '(' NamedExprInParens {‘,’ NamedExprInParens} ')'
NamedExprInParens ::=  id '=' ExprInParens

SimplePattern     ::=  ...
                    |  '(' NamedPattern {‘,’ NamedPattern} ')'
NamedPattern      ::=  id '=' Pattern
32 Likes

This feels like a great addition to the language! The proposed syntax seems intuitive to me. Nicely done!

2 Likes

Another way to consider this (which is almost done here) is to consider “named type” as a variety of type. For example, foo: String is itself a type, which is a subtype of String. Viewed this way, a named tuple is just an ordinary tuple with specific types (and one could even mix named and non-named types in a tuple). It would also mirror FieldType in shapeless really well and put a nice bow on the isomorphism between case classes and records, and would be cleaner to deal with in macros than the tuple-of-tuples.

But, however it’s accomplished, I like the idea of tuples with compile-time names.

Edit: By “almost done here” I mean that “NameAndType” is introduced in the syntax, but isn’t given a correspondent in the type system.

3 Likes

I like this idea. I wonder if it would be possible to declare them in line so speak, to declare them within the return type of a method, as often you will only ever want to return a named Tuple from a single method.

Of course that’s possible:

def divmod(x; Double, y: Double): (divisor: Double, modulo: Double) =
  (x / y, x mod y)
3 Likes

That’s excellent. However I’m not sure about the sub typing relationship. To me it would seem natural to have named tuples as a sub type of their unnamed tuple structures. But maybe there’s major use case’s that this would cause a problem for that I’m not aware of.

There’s along discussion about this at https://github.com/lampepfl/dotty/pull/19075.

I can’t think of a use case for mixing named and unnamed off the top of my head, but is there a specific reason why it can’t be allowed? Couldn’t the numeric position names be synthetically added to any params without an explicit name?

So:

val bob = (name = "Bob", 47)
val bob2 = (name = "Bob", _2 = 47)
assert( bob == bob2 )

I’ve just seen this comment from the PR discussion:

In the current implementation, we require all elements of a tuple to be named or none to be named. maybe we can relax this and only inquire that a tuple type consists of either all named fields or no named fields

While that isn’t exactly what I asked, it does seem to address my underlying thought. I like the idea of tuple types requiring names, but values being flexible. That aligns very nicely with the parameter list analogy.

FWIW, see also previous discussion Named Tuple Arguments / Anonymous Case Classes

1 Like

It’d be good to see how this interacts with typeclass instance definitions. In particular, I’d expect the standard library to let me do:

type Person = (name: String, age: Int)
val bob: Person = (name = "Bob", age = 33)
val alice: Person = (name = "Alice", age = 35)

List(bob, alice).sorted // List(alice, bob)

Which looks like it would require a new instance:

given lexicographic[N <: Tuple, V <: Tuple]: Ordering[NamedTuple[N, V]] with
  def compare(x: NamedTuple[N, V], y: NamedTuple[N, V]): Int =
    summon[Ordering[Tuple]].compare(x.toTuple, y.toTuple)

… except this doesn’t work because we’re already missing a generic instance for Ordering on Tuple, and I’m not exactly sure what’s the best way to define one (and how it would interact with the existing instances for Ordering on specific Tuples).

1 Like

Finally!

Thank you so much, I love it! :heart_eyes:

The first question that comes to my mind: Could named tuples actually become “proper structs” on Scala Native?

The other thing is, as far as I understand this proposal, it’s not like this would somehow “close the road to enhanced structural types” at all. Quite the contrary, I think: Named tuples could in theory become the underlying structure for some future “HMaps”, so we could get structural types like in TypeScript also on top. It would take some (type level?) conversion mechanism to make this happen, where you “materialize” records / structural types as named tuples for the runtime representation. But this looks doable in the next step.

Nevertheless some possible future extension, named tuples are a great addition on their own. It was proposed so often. Great Scala listens once more to its community and makes such superb improvements! The language would get with this improvement one step closer to perfection, imho. We really just need a marketing department to spread the word.

1 Like

While the symmetry between tuples and parameter lists is nice, I do think the relationship is fundamentally the wrong way around. I agree with @sjrd in https://github.com/lampepfl/dotty/pull/19075#issuecomment-1827403119 when he said that the direction of sub-typing is confusing: subtypes should have stronger contracts (whether encoded in the type system or not), more information known about them, and define more methods than their supertypes. In this case:

  1. Named tuples do seem like they have a more specific semantic contract: you cannot pass (x: Int, y: Int) in place of (y: Int, x: Int), whereas all (Int, Int)s are interchangeable
  2. We know more about each value, semantically (e.g. .x is not just a Int, but a Int along the X-axis)
  3. They define more “methods”: you can access a named tuple’s fields positionally or via their names, but can only access a positional tuples fields positionally

By this logic, named tuples should definitely be a sub-type of positional tuples.

@odersky mentioned in Named tuples experimental first implementation by odersky · Pull Request #19075 · lampepfl/dotty · GitHub that Python is not statically typed, and thus does not have precedence here, but that is incorrect: Python’s namedtuple is unambiguously a subtype of tuple

  • When you look at it from a dynamic perspective an instance of namedtuple supports both a namedtuple’s .foo/.bar/.qux, as well as a positional tuple’s [0], [1], [2]. By duck-typing, namedtuple can be passed anywhere a tuple can be, and is thus a subtype

  • namedtuples have a dynamic isinstance relationship with tuple, and not the other way around

from collection import namedtuple

Foo = namedtuple("Foo", "a b c")

isinstance(Foo(1, 2, 3), tuple) # true

isinstance((1, 2, 3), Foo) # false
  • The standard (or as standard as exists) MyPy static typechecker treats NamedTuples as subtypes of Tuples of the same arity:
from typing import NamedTuple, Tuple

# Define a named tuple
class Point(NamedTuple):
    x: int
    y: int

# Function expecting a positional tuple
def process_tuple(t: Tuple[int, int]):
    print(t[0], t[1])

# Create a named tuple
point = Point(1, 2)

# Call the function with the named tuple
process_tuple(point) # OK


# Function expecting a named tuple
def process_named_tuple(t: Point):
    print(t.x, t.y)
    
# Call the function with a positional tuple
process_named_tuple((1, 2)) # main.py:23: error: Argument 1 to "process_named_tuple" has incompatible type "tuple[int, int]"; expected "Point"  [arg-type]

IMO we should follow the Python convention that NamedTuple <: Tuple, rather than the Scala argument-list convention that positional arguments can be used anywhere a named argument can be used.

Long-term, IMO we should probably have a stricter separation of position and named arguments the same way Python/Swift have. The status quo way we can swap named/positional parameters is looser than it should be:

  1. Python allows you to define named-only or positional-only parameters using the / and * syntax
  2. Swift goes one step further and mandates that parameters be only named-only or positional-only
  3. @odersky mentioned in person how in Dotty they have linters/stylistic-conventions for some parameters being positional-only (e.g. those with a single character name)

Rather than twisting the named/positional-tuple subtyping relationship to match what we currently have for named/positional arguments, IMO we should follow Python’s lead for the subtyping of named/positional-tuples and slowly twist the named/positional arguments to be more Python/Swift-like instead.


Notable, Swift allows unrestricted conversions in both directions between named and un-named tuples, so both assignments below work.

var tuplePositional: (String, Int) = ("Suresh Dasari", 200)
var tupleNamed: (name:String, id:Int) = tuplePositional
print("Hello World \(tupleNamed)")


var tupleNamed: (name:String, id:Int) = (name: "Suresh Dasari", id: 200)
var tuplePositional: (String, Int) = tupleNamed
print("Hello World \(tuplePositional)")

I can understand why we may not want to do that at the subtype level, because subtyping transitivity would mean that they’re the same type. But we could conceivably provide implicit conversions since those are non-transitive by default. But we probably should just stick to a one-directional subtyping relationship and provide an explicit case/conversion method to go the other way


On an unrelated note, named tuples has a very nice synergy with https://contributors.scala-lang.org/t/unpacking-classes-into-method-argument-lists/6329/93:

  1. Named tuples provide a lighter weight alternative to case classes as a type to be passed to unpack
  2. unpack makes conversion between named tuples and case classes trivial, you would be able to call MyCaseClass(*myNamedTupleInstance) or (*myCaseClassInstance) to convert between them without needing any explicit library or language support. These conversions would “just work”
5 Likes

For reference, C# also allows conversions in both directions (but don’t ask how it’s implemented, IDK):

using System;
					
public class Program
{
	public static void Main()
	{
		(String, int) tuplePositional = ("Suresh Dasari", 200);
		(String Name, int Id) tupleNamed = tuplePositional;
		Console.WriteLine($"Hello World {tupleNamed}");
		
		(String Name, int Id) tupleNamed2 = (Name: "Suresh Dasari", Id: 200);
		(String, int) tuplePositional2 = tupleNamed2;
		Console.WriteLine($"Hello World {tuplePositional2}");
	}
}

[ see: C# Online Compiler | .NET Fiddle ]

Note: In C# the names seem to be just some compile time entity. The code above prints two times the same.

Scala could do better, as I understand this proposal here.


OTOH I would argue against imitating Python. Their *args / **kwargs distinction always felt like a major kludge to me. It looks like something that was made this way because it wasn’t feasible to hide this distinction when Python was created. But our computers and software is much more powerful today. Just let the machine figure out how to convert those structures as needed, and hide the tedious details from the user. (Also having static types helps with that, I guess. Something Python didn’t have when *args / **kwargs was invented. Otherwise this would be just different types of tuples / arguments, I think…)

In general I like the idea to unify parameter lists and (named) tuples very much. Imho parameter lists are nothing else than some funky kind of named tuple(s). But there need to be conversions in all directions to make something like that convenient to use. Forcing end-users to fiddle with *args / **kwargs is the opposite of convenience!

1 Like

Conversions by subtyping in both directions are fundamentally wrong, since they make all names meaningless (detailed argument in the linked thread). Subtyping is supposed to be transitive. So with subtyping directions in both ways we get:

  (foo: Int, bar: Int) <: (Int, Int) <: (baz: Int, bam: Int)

That’s clearly not something we want to have.

That leaves one direction available. Arguably unnamed <: named is much more useful than the other way around.

Critique of named <: unnamed

If we assume named <: unnamed then named tuples are regular tuples. The names are conceptually on the side and enable us to also do selection by name. We can use all tuple functions including _1, _2 on named tuples. But that means we can also use functions like ++ on tuples. Both of the following would be OK:

Bob ++ (1, true)
(1, true) ++ Bob

and would yield an unnamed tuple. Furthermore, we can also write the erroneous Bob ++ Bob and instead of reporting an error about duplicate names this would be forced to strip the names and return an unnamed 4-tuple. I have the impression this would lead to brittle code. Imagine concatenating two long tuples from a database that accidentally share a name and instead of an error you get an unnamed tuple!

Moreover we can re-define none of the operations of Tuple. An example: Arguably map on a named tuple should return a named tuple. But the map operation is already taken on Tuple, and it is an inline function, so we cannot override it. This means we need a new operation like namedMap to do the “correct” map on NamedTuple. And a user naively using map on a named tuple would lose all the names. Yuck!

Defense of unnamed <: named

By contrast unnamed <: named is undeniably useful and has none of the shortcomings of the other direction. But what is its semantic intuition?

If unnamed <: named then tuples are named tuples. Semantically, a named tuple is modeled as a type level tuple of singleton types representing its names and a regular tuple of values. So, what are the names of an unnamed tuple? By the subtyping relation we are forced to have them be Nothing. So an unnamed tuple is a named tuple where each name type is Nothing. Once you think of it, it makes sense: There is no name, so the type of the name is Nothing. Just like the type of head on an empty list is Nothing since it does not return a result.

Nothing is not intuitive at all at first, but arguably Scala made some breakthroughs in PL design by embracing it. This seems to be another case where Nothing is really useful!

As an aside, Kotlin creator Andrej Breslav gave a nice talk recently that recounted that C# specifically suffered from not having Nothing, which meant that throw had to be a statement instead of an expression. Funny that C# and Typescript fall into the same trap again for named tuples.

Closing Argiument

I have shown that if we take subtyping seriously then unnamed <: named is the only relation that is useful and makes sense. The reverse relation named <: unnamed is actually hurtful, since it forces us to re-use all operations on tuples unchanged also on named tuples even where this does not make sense. I also want to already stave off any arguments that we should fiddle with subtyping, for instance by adding some additional rules or restrictions. IMO that’s a rabbit hole not worth jumping into.

Alternative

My argument was strictly about subtyping. Conversions between tuples are another topic entirely. Of course you should always be able to map named to unnamed tuples by an explicit conversion. In fact it already exists:

val pair: (String, Int) = Bob.toTuple

We could add that functionality to a Conversion on tuples, so the conversion could be used implicitly.

Should we forgo subtyping altogether and instead define implicit conversions in both directions? Arguably, that’s a bad idea since we would then be never sure what conversion the compiler applied, which would in turn determine the type of a tuple. We quickly realized that bijective implicit conversions were a really bad idea when we defined JavaConversions. Let’s not make the same mistake here!

7 Likes

Stripping names is what Python does when you append tuples and namedtuples. I agree it’s not an ideal result.

Would one solution to have some kind of “partially named” tuples? That’s effectively what we have with argument lists, where you can call foo(1, 2, 3, x = 4, y = 5, z = 6). What if we define (1, 2, 3) ++ (x = 4, y = 5, z = 6) to be (1, 2, 3, x = 4, y = 5, z = 6)? Would such a thing be possible?

Moreover we can re-define none of the operations of Tuple . An example: Arguably map on a named tuple should return a named tuple. But the map operation is already taken on Tuple , and it is an inline function, so we cannot override it. This means we need a new operation like namedMap to do the “correct” map on NamedTuple . And a user naively using map on a named tuple would lose all the names. Yuck!

This seems like an implementation issue. Not to say those aren’t real, but I wonder if a way could be found to work around it? e.g.

  • Could .map be made into a normal method or extension method, rather than inline, such that named tuples could do a covariant override with a more specific return type?
  • Or could .map be made transparent inline, such that when called on a named tuple it would use an inline if to also return a named tuple?

Using Nothing works, but I’m not sure it’s good. By that logic, any type is a subtype of anything else!

  • A positional scala.Tuple2[T1, T2] is a sub-type of scala.Tuple3[T1, T2, Nothing]
  • A java.lang.Object is a sub-type of java.lang.CharSequence, with covariant overrides def length(): Nothing = ??? and def charAt(index: Int): Nothing = ???

In effect, using Nothing is just throwing out static typing and going to a Pythonic “call whatever method you want, it’ll just blow up with an exception if it’s invalid”. The fact that Nothing can be used in types/expressions more convenient than throw being only usable in statements, but typically it’s still just throwing exceptions at runtime, and is a code smell that usually indicates you have your subtyping relationships wrong or backwards.

I’m not sure if there’s some clever Scala 3 type-level logic stuff or compiletime.ops stuff going on that will mean this isn’t as dangerous as normal usages of Nothing in Scala code. And sometimes you really can’t capture everything you want in the type system, and have to resort to Nothing/throw-ing in corner cases. But the fact that the proposal intentionally uses Nothing in a type definitely seems suspicious and suggests that we’re doing it wrong.


Basically the objections to NamedTuple <: Tuple seem like implementation restrictions that in theory could be made to work while preserving the Liskov Substitution Principle, while the implementation of Tuple <: NamedTuple using Nothing seems like a direct violation of LSP. It’s possible that there may be be other factors that may weigh in on this decision, but the issues with Tuple <: NamedTuple definitely seem more serious: implementations can change and be tweaked and restrictions lifted, but LSP is pretty fundamental and isn’t going way anytime soon, and typically violating LSP means an endless stream of edge cases that will never work quite right

Just thinking out loud:

What if all functions would just take NamedTuples as arguments?

If you pass the subtype, an (unnamed) Tuple, it can always be “up-cast” to the named version, as you can map the positional arguments onto the names (the tuples need to have same length to be even considered related by subtyping). The “up-cast” seems also somehow related to the fact that function params are contra-variant.

Passing positional arguments to functions (which have param lists, which have named arguments) works just fine in today’s Scala. The names of function parameters are forgotten / erased at runtime. The same would be the case with NamedTuples at runtime.

Tuples in general look really very related to param lists…

Thinking in subtypes, I guess, it would look something like:

Tuple <: NamedTuple <: ParameterList

Functions always take ParameterLists. :smile:

But it’s always fine to pass a subtype (LSP). So passing positional arguments (a Tuple) is OK, and will never do harm.

2 Likes

No, Nothing used in that way does not compromise type safety. My argument did not assume that any part of Scala changes. In particular, a type with a field x: Nothing was never equivalent to a type without that field and isn’t now either. So the argument that this amounts to dynamic typing is wrong.

No, it isn’t.

No, that does not hold either.

Is LSP violated? That’s an interesting question. LSP is about runtime behavior, where all tuples are equivalent, so that means LSP must hold in that sense. But there is something interesting going on with Nothing when it comes to compile time.

If I say, x: String and then I refine to x: Nothing, can I still do everything with it I could do before? Of course not: For instance I can’t take the length of x if its type is Nothing. Even though I could upcast x to String and then take the length again. So that shows that LSP does not work for compile-time subtyping. (there are other violations as well which are not linked to Nothing for instance connected to overloading or implicit resolution.)

If unnamed <: named then what I would have expected semantically is an unnamed tuple is a private case of a named tuple where all the fields are named _1, _2, .... So (Int, Int) <: (arg1: Int, arg2: Int) does not make sense to me unless also (otherName1: Int, otherName2: Int) <: (arg1: Int, arg2: Int).
So intuitively I would expect either all tuples with the same field types are considered the same type or that named tuples are subtypes of their unnamed counterparts.

The accessors _1, …, _22 are not defined for named tuples. It’s wrong to equate

  (Int, String)    with    (_1 = Int, _2 = String)

If you do that, you will no longer be able to map between named and unnamed tuples at all.

To repeat, the following model holds for the type structure at compile time:

(name: String, age: Int)   ~~   (("name", "age"), (String, Int))
(Int, String)              ~~   ((Nothing, Nothing), (String, Int))

Here, "name" and "age" are the singleton types with these string literals as values.

1 Like