Pre SIP: Named tuples

AMatveev · December 8, 2023, 2:31pm

It ofcourse solves the task but it does not shine at all. It leads to decoupling declaration with usage and makes code harder to read and refactor it forces using magic names and so on.

It is just little toothache. Ok, somebody does not need it. It is normal. But It is really easy to understand , just do not use anonymous functions in functional approach. A named function allows to do the same, does not it?

julienrf · December 8, 2023, 2:45pm

I agree. I think any argument in favor of named tuples (or structurally-typed records) should not involve the definition of a type alias, otherwise the benefits over case classes are too small.

arturopala · December 8, 2023, 3:07pm

can you elaborate more on this, please?

julienrf · December 8, 2023, 3:33pm

In your example, what are the drawbacks of using a case class?

In my opinion, tuples are useful in situations where you would not (or could not) use a case class. Otherwise, you would simply use a case class, no?

A typical example is a foldLeft call as shown in Pre SIP: Named tuples - #28 by lrytz. Other examples are projects like Iskra or frameless, although named tuples alone may not be enough to support them, as shown in this section about projections between data types.

bishabosha · December 8, 2023, 3:52pm

As a general test of usability I have been using Named Tuples exclusively for Advent of Code 2023 solutions: GitHub - bishabosha/advent-of-code-2023: Advent of Code challenges 2023 (exercising the sub typing relationship, field selection, pattern match etc.) I did not find myself missing case class, particularly as for these kinds of problems methods are not really better than top level functions

arturopala · December 8, 2023, 4:25pm

My instinct would be to use named tuples with type aliases instead of unnamed ones, especially when more than one pair or when nested tuples, instead of ad-hoc case classes. Reason: cleaner code, better names.

sjrd · December 8, 2023, 5:11pm

This has been a long thread, and it has also been accompanied by extensive – hours-long – discussions off-line back here at EPFL. It is time that I provide my analysis and opinion. There are several broad areas I want to touch upon:

Motivation: use cases, motivating example, benefit over case classes
Migration story
The infamous subtyping direction

I will post each of these areas as separate posts, because I suspect that different sets of people will want to like–or not like–them separately.

Use cases

Often, SIPs and Pre-SIPS start with terrible motivating examples. Usually this is because they focus on simple “how does it work” examples rather than “why would I use it” examples. This Pre-SIP is no exception. “Why would I use it” examples are typically longer but much more important.

I see 3 areas where named tuples bring substantial value over what we currently have:

Methods returning multiple result
Enabler for named pattern matching
Data type for intermediate results in operations that compute types (notably, database-like manipulations)

Multiple-result methods

Methods that return multiple results are not rare. In the collections library, for example, we can find partition, span, splitAt, partitionMap. These methods really want to return two results. They use a tuple to do so because that’s what the language offers to do so, not because the result is semantically a pair. (Contrast with unzip or unzip3, for which the result being a tuple makes inherent sense.)

I’ll choose partition as the canonical example. It is a good example because I’ve seen on numerous occasions developers saying that they never remember which side is which:

val (underagePersons, offAgePersons) = persons.map(_.age >= 18)
// oops, got it wrong

The confusion is not surprising: there is no fundamental reason that matching elements should go to the left and non-matching elements should go the right. The Saladoc of course says which is which, but it would be better if the API itself would provide the information.

This would naturally be achieved with named tuples. We would define partition as:

def partition(p: A => Boolean): (matching: List[A], nonMatching: List[A]) = ???

and then the confusion would immediately disappear.

Note that in argument position, we would use separate parameters for this, each with its own name. We only do this in result position because we have no other choice.

Also note that the named tuple type is used exactly once. That’s because it’s barely a type at all; it is almost only part of that single method’s signature. Defining a case class for that would make no sense.

Enabler for named pattern matching

Named pattern matching is very desirable. It has been coming up regularly over the years. Why has Scala never added support for it? Because we never figured out we could do it. The latest SIP on the topic came close, but did not succeed in the end.

With named tuples, we finally have a good answer to named pattern matching: if an extractor’s unapply method returns a named tuple or an Option of a named tuple, we can use its names in the pattern. For example:

object Duration {
  def unapply(s: String): Option[(length: Long, unit: TimeUnit)] = = ...
}

input match {
  case Duration(length = len, unit = TimeUnit.Seconds) => s"$len seconds"
}

Once again, this type is written exactly once: in the signature of the unapply method. It would never need a type alias or any other kind of explicit name.

This use case cannot be replaced by case classes. If we tried to do that, we would not be able to explain case classes in terms of other language features. They would have to be truly magic, and that is something we always wanted to avoid.

Computed intermediate types

I am not myself a user of database operations or any other thing like that, so I will refrain from motivating why we want those operations in the first place. However, using named tuples for them is a true enabler.

Operations like joins take two sets of rows and produce a new set of rows. The unique aspect here is that the resulting type can be generically computed from the types of the inputs.

Type computations cannot create classes (nor traits or any other form of nominal types). However, they can produce new types that structurally compose other types. This applies whether or not the type computations are in the language (e.g., with match types) or in macros. Therefore, using case classes here is also a complete no-go.

Existing solutions try to use structural types, but these are notoriously difficult to handle. In particular, their unordered nature makes it sketchy to destructure and compute upon (although we can construct them).

Named tuples, with their ordered, static list of name-value pairs provide a unique solution to this category of problems.

Anti use cases

As several people have already observed, as soon as you have to define a type alias for your named tuple, the usefulness compared to case classes is debatable at best. I will go as far as to say it is actively harmful, for several reasons:

Lack of a place to put associated documentation for each field,
Potential to mix and match, by mistake, two types that are structurally the same (but not semantically), and
The sheer decision factor of having to choose between case classes and named tuples.

Therefore, in my opinion, defining a type alias over a named tuple should be seen as a code smell. It may happen very sporadically in some situations (every code smell has its exceptions), but the overwhelming majority of cases should not do that. It might be good to lint against it. It certainly does not serve this proposal that the explainer puts this “use case” forward. We should never show this; not encourage developers to do it by giving them bad examples like that.

The fact that this is an anti-use-case does not undermine the value of the actual, good use cases I have elaborated on above, though.

sjrd · December 8, 2023, 5:12pm

Migration stories

If you are a regular reader of this forum and GitHub PRs, you probably know that I am “the binary/TASTy compatibility guy”. Everything I lay my eyes on, I immediately see the potentials for incompatibilities.

In the case of named tuples, you’ll tell me: “what’s the problem? They’re a new kind of type, they won’t pose any compatibility issue!” And you’ll be right, as long as we talk about the feature itself. But it goes further.

I mentioned earlier the example of partition. Surely, if and when we do get named tuples in the language, we’ll want to improve the existing collection API with a named tuple as the result of partition. The same goes for the other multi-result methods I mentioned. It will also likely happen in other third-party libraries.

Likewise, we will want to improve existing extractor unapply methods to return named tuples instead of unnamed ones.

To be able to do that without breaking compatibility (for the stdlib, that means to be able to do it at all), those changes must preserve binary and TASTy compatibility.

To preserve binary compatibility, the erasure of the named tuples, and their run-time behavior, must be the same as unnamed tuples. This is indeed preserved by the current proposal, and by several other encodings we can think of. (It wouldn’t be preserved if we used a run-time pair of names/values, or a separate class hierarchy, for example.)

Preserving TASTy compatibility is trickier. In general, changing the result type of a method for something that is not equivalent (both subtype and supertype) is not allowed: it must be a subtype because callers need that, and it must be a supertype because overriders need that. However, for final methods (or methods of final classes), we only need the newer-is-subtype-than-older direction.

(Btw, if you don’t know yet, these conditions can be checked automatically for you library using tasty-mima.)

If we want to improve partition, which is effectively final in List, to use named tuples, we would need a named tuple to be a subtype of the corresponding unnamed tuple. Likewise, for extractors found everywhere, we need that subtype direction. Note that a conversion (implicit or explicit) does not work here.

You might say: this cuts both ways. If I want to improve a tuple parameter to become a named tuple, we need the other subtyping direction. This is however not true for 2 reasons: a) we would not have used tuples in that case but rather several parameters in the first place; and b) parameters of methods are invariant anyway.

sjrd · December 8, 2023, 5:12pm

The infamous subtyping direction

This brings me to the controversial question: should we have subtyping between named and unnamed tuples, and if yes, in which direction?

The theoretic arguments

By now, I think we have thoroughly established that either choice is sound.

I have seen (and used myself) arguments based on the Liskov-Substitution-Principle (LSP) to argue that named <: unnamed is the only one that makes sense. Such arguments can be countered by saying that LSP is about the run-time behavior and the absence of ClassCastExceptions, not about “contracts”; or by saying that structural names do not carry contracts in the first place.

I have also seen arguments based on how some theory or another interacts with Nothing to necessarily imply that unnamed <: named is the only one that makes sense. Such arguments can be countered by observing that it is incidental to the particular choice of encoding we made: an equally valid encoding of (foo: Int, bar: String) would be (Int, String) { type Names = ("foo", "bar") }, which would share the run-time representation property but clearly be in favor of named <: unnamed.

Since theoretical arguments cannot choose an answer for us, we are forced to (or I would rather say: we get the freedom to) choose based on practical arguments: what does either subtyping relationship lets us do.

The practical arguments

Clearly, there are good arguments for unnamed <: named. When implementing partition (the naive way), I probably don’t want to repeat the names twice close to each other:

def partition(p: A => Boolean): (matching: List[A], nonMatching: List[A]) =
  (matching = filter(p), nonMatching = filterNot(p))

That’s boilerplate: it makes me write more for no benefit in terms in safety. I would rather write

def partition(p: A => Boolean): (matching: List[A], nonMatching: List[A]) =
  (filter(p), filterNot(p))

and let the compiler figure it out.

This particular example embodies, in my opinion, a large number of comments in this thread. Namely, the argument that Scala should help me write obvious code without bothering me. I agree with that sentiment; I also think it should not be at the cost of undermining safety in larger contexts: I want Scala to let me do obviously correct things, and have my back when it comes to the more complex things.

Note that, in the same line of arguments, I am not at all convinced by what appears to be a clear variant of the above: what we saw in the initial explainer:

type Person = (name: String, age: Int)
val x: Person = ("Laura", 25)

Why? Because it matches none of the 3 (IMO) valid use cases for named tuples. The type alias is already a code smell to me.

On the other side of the equation, there is named <: unnamed. What would that alternative let us do?

I have actually already made what is IMO the most compelling argument here: it would let us gradually improve existing APIs to benefit from named tuples. Here, actual subtyping is required; it cannot work with conversions (whether implicit or explicit) nor adaptation (type inference / expected types).

If we do choose this one, however, does that mean that we must write the names twice in the definition of partition (and every other similar method)? No, it doesn’t.

If we come back to my 3 use cases, in practice, named tuple types will almost only appear in the result type of methods. That means we’ll write actual instances of the named tuple in the last expression(s) of our methods, where the expected type is obvious. The compiler would have no more trouble adapting an unnamed tuple into a named one in that position than it has to adapt 5 to an expected Byte type.

arturopala · December 8, 2023, 7:47pm

Since when is naming things a code smell?

alvae · December 8, 2023, 8:34pm

It is not the naming that is smelly but the fact that the naming was done by creating an alias on a named tuple.

@sjrd’s point is that we should not encourage people to define type aliases on named tuples because type Person = (name: String, age: Int) is better expressed as case class Person(name: String, age: Int). Instead, named tuples should be used in signatures only.

saulpalv · December 8, 2023, 8:59pm

Named tuples relate to case classes, in a similar way lambdas relate to classes with a single method, in other words, allows to interface Classes by its shape/structure instead of its Class Name.

soronpo · December 8, 2023, 9:18pm

If we really want to, we can make it not pass grammar (parser), no?

Sporarum · December 8, 2023, 9:41pm

We could, but that would be irregular, I can’t think of a single type which cannot be put on the rhs of a type alias.

And I’d argue it’s not necessary, linters and documentation would do the job.
(And I think a linter should be included with the compiler, or metals as a default in the LSP, but that’s for another topic)

soronpo · December 8, 2023, 9:46pm

Yeah, and it’s also can be workaround easily with dependent typing to extract it, so it does not really make sense to create such a limit. Linting is an interesting option.

odersky · December 8, 2023, 10:26pm

Well, that 2 element person case class is 3371 bytes of byte code. Which needs to be loaded and jitted. By contrast, the code overhead of a named tuple is zero. Which might not matter in many situations, but might make a difference in others, in particular if there are many of these.

The other argument is that once we start to compute with named tuples (as in relational algebra), of course we will want to name our row types. So it all depends, and certainly there are good use cases for type aliases of named tuples.

Ichoran · December 8, 2023, 11:25pm

This is a really good set of examples, but I think we also should see how close we can get without explicit compiler support to determine whether explicit compiler support is desirable (and also to see if we can play with some of the concepts absent a compiler).

Being able to tag anything with a string constant type refinement is I think a fundamental generalization of named tuples. This was Martin’s earlier implementation method, and I return to that (essentially) here, with no subtyping relationships.

I’ve tossed together a hasty and brittle but working implementation into my library so people can play with it if they want to. (I want to.) The implementation only works for small sizes of tuples, because each time I ran into some problem that surprised me, I just adopted a simpler strategy which scaled less well. Anyway, you need a 3.4 flavor to try it out. The following at the top of a .scala file run with scala-cli will do the trick:

//> using scala 3.4.0-RC1-bin-20231114-18ada51-NIGHTLY
//> using dep com.github.ichoran::kse3-basics:0.2.7
//> using mainClass example.Main

package example

import scala.language.experimental.relaxedExtensionImports
import kse.basics.{given, _}
import kse.basics.labels.{given, _}

The key idea in this implementation is that you can take any type, let’s say Boolean, and annotate it with a string constant using \, e.g. Boolean \ "read-only". This is an opaque type that does nothing but apply the string literal as a type parameter. This works for values as well as types, so true \ "read-only" is a Boolean \ "read-only" with a value of true.

Now, we can consider the case of fully-labeled tuples, e.g. (String \ "name", Int \ "age"). One can write extension methods for such types specifically (extension [A, La, B, Lb, ...](tuple: (A \ La, B \ Lb, ...))), so I did.

Multiple-result methods

This immediately solves the multiple-result methods issue. It also allows one to force disambiguation of parameters:

def mutate(s: Array[Byte] \ "source", d: Array[Byte] \ "dest") = ...
mutate(xs, ys)  // Fails
mutate(xs \ "dest", ys \ "source") // Fails
mutate(xs \ "source", ys \ "dest") // Succeeds

One has the same decision about subtyping of A vs A \ "label", but note that in the case of function returns one can write a method that conforms to the desired labels (I call it .label) so it isn’t much work.

Enabler for pattern matching

Full pattern matching isn’t something I’ve been able to get working smoothly. But you can have pattered extractors quite easily.

If you define a method ~ that accesses a labeled value by label, then you have assurance at use-site that you know what it is:

def foo(ro: Boolean \ "read-only") =
  if ro ~ "read-only" then ???

This can be extended to extract one (~) or many (~~) items from a named tuple.

val stuff = (5 \ "eel", true \ "salmon", 'b' \ "bass")
val salmon = stuff ~ "salmon"  // true
val (eel, salmon, bass) = stuff ~~ ("eel", "salmon", "bass")

It’s not quite as nice as pattern matching, but it’s pretty decent.

Computed intermediate types

With every item labelled, creating intermediate types is really easy (maybe too easy–you can create some invalid stuff).

The key is, how do you work with these things with compiler-known types that you might not know?

Well, you can get things by name, if you know the name, with ~.

You can also define a variant of ~~ that allows subsampling:

val etc = stuff.pick("bass", "eel")  // ('b' \ "bass", 5 \ "eel")

Or you can define a variant that applies whichever values you have got to a default record, leaving alone any that aren’t mentioned:

val default = (true \ "happy", "ss" \ "bass", false \ "salmon")
default.updatedBy(stuff)  // (true, 'b', true), with same labels

Next steps

What would be great is if we could come up with use-cases where the library-only approach is clearly not good enough. Here’s a complete example which shows a not-completely-horrible better-than-no-library-status-quo of the three goals you’ve listed:

//> using scala 3.4.0-RC1-bin-20231114-18ada51-NIGHTLY
//> using dep com.github.ichoran::kse3-basics:0.2.7
//> using mainClass example.Main

package example

import scala.language.experimental.relaxedExtensionImports

import kse.basics.{given, _}
import kse.basics.labels.{given, _}

object Main {
  type Person = (String \ "name", Int \ "age")

  def part[A](xs: List[A])(p: A => Boolean): (List[A] \ "yes", List[A] \ "no") =
    xs.partition(p).label
  
  def multiResult: Unit =
    val people =
      ("Joe" \ "name", 33 \ "age") ::
      ("Sue" \ "name", 17 \ "age") :: 
      Nil
  
    val (mature, juvenile) = part(people)(_ ~ "age" >= 18) ~~ ("yes", "no")
    
    println("Old: " + mature)
    println("Not: " + juvenile)

  def patternAlternative: Unit =
    val joe = ("Joe" \ "name", 33 \ "age")
    val joeName = joe ~ "name"
    val joeAge = joe ~ "age"
    val (name, age) = joe ~~ ("name", "age")
    println(s"Joe's name is $joeName, yes, $name")
    println(s"Joe's age is $joeAge, yes, $age")
    
  def intermediateTypes: Unit =
    val joe = ("Joe" \ "name", 33 \ "age")
    val unhelpful = joe ++ (true, false)
    val weird = joe ++ joe
    // unhelpful ~ "name" fails; not all elements are labelled
    // weird ~ "name" fails with a redundant label message
    val more = joe ++ ("Big Avenue" \ "street", true \ "votes")
    val joeStreet = more ~ "street"
    println(s"${more ~ "name"} lives on ${more ~ "street"}")
    
    // Get some things
    val info = more.pick("name", "votes")
    println(s"Voter status: $info")
    
    // Initialize a subset of fields
    val default = (22.0 \ "C", "J. Doe" \ "name", false \ "retired", false \ "student")
    val fromJoe = default.updatedBy(joe)
    println(fromJoe)  // name is now "Joe"

  def main(args: Array[String]): Unit =
    multiResult
    patternAlternative
    intermediateTypes
}

Not shown there, you can drop labels with .unlabel.

So, what is the killer feature of having it in the compiler?

Maybe it’s needed for efficiency or scaling; to get around things I didn’t immediately understand, I make the compiler work rather preposterously hard to implement the above (and then, only for things up to total size 9 or so, so if you are picking 3 of 6 it works but not 5 of 8). But I’m sure it could be written far better even without macros (which I did not use).

Maybe it’s needed for syntax. It looks rather, well, weird the way I’ve written it. But people can get used to weird things as long as they don’t work the opposite way of other weird things that they’ve also gotten used to. (Unlearning is hard.) I’m already pretty comfortable with the syntax after a couple days.

Maybe the reason to put it in the compiler is just so people will actually use it, because we think it will help transform the ecosystem in positive ways, and we want it to be adopted as widely and quickly as possible.

But, anyway, I think it’s a good idea to have something from which we can construct “we can do this better” examples, so I have this sort-of strawman implementation that might be useful (I’ll probably use it), but at least might help focus thinking on what really must be at the compiler level.

Note: it’s probably also worth considering what could be accomplished with an InlineDynamic feature; in that case the string constant bits could look like ordinary method calls because person.name would resolve to person.applyInlineDynamic("name") which is basically equivalent to person.~("name") which is what I’ve defined.

987Nabil · December 9, 2023, 6:55am

Maybe I missed that, but did somebody ask/think about if case class User(name:String, age:Int) <:< (name: String, age: Int) ?

arturopala · December 9, 2023, 12:49pm

I would argue that we should have a way to use named tuples to construct case classes from property lists. It is a standard problem business application developers face when building and processing large APIs.

The simplest way would be to use inheritance:

type CommonApiFields  = {requestId: String, origin: String, /** etc */}
type PostApiFields  = {nonce: Int, /** etc */}

case class GetARequestPayload( a: String, b: Int) extends CommonApiFields 
case class PostBRequestPayload( c: String) extends CommonApiFields, PostApiFields

or a dedicated marker type:

case class GetARequestPayload( a: String, b: Int) extends CaseClassFields[CommonApiFields]
case class PostBRequestPayload( c: String) extends CaseClassFields[(CommonApiFields, PostApiFields)]

or a spread operator

case class GetARequestPayload( a: String, b: Int, ...CommonApiFields)
case class PostBRequestPayload( c: String, ...PostApiFields, ...CommonApiFields)

tarsa · December 9, 2023, 9:26pm

Tuple is a class with parameters too (like case class), so inheritance can’t efficiently work here. The reason that can’t be efficient is that tuples aren’t specialized, so they box primitives (or, more precisely, box all values classes).