Pre SIP: Named tuples

AMatveev · December 7, 2023, 1:30pm

//I need fast iteration over 1 000 000 rows with zero copy
stream[(a:String,b:Int,...,z:Int)]("""
select a,b,c ..., z from very_large_table
""").foreach{row =>
 println(s"""I am happy to get value from iterator without data copy, 
   I just can get it by ${row.a()}""")
}

And please do not tell me that I can live without it, I know it )))

bishabosha · December 7, 2023, 3:49pm

I think “IDE doesn’t support it yet, therefore we can never have it” might be a bit of an unreasonable argument.

As the current implementation exists, it requires an @experimental language feature import, so it is not even usable without a nightly/snapshot compiler, and it probably wouldn’t become stabilised (even with SIP approval) until IDEs, linters etc. can support it.

The primary use case as mentioned is for some form of container with typed field selection that can be easily constructed by macros. E.g. parsing some schema into a data structure, or dataframe-like APIs - macros are not allowed to generate classes where the API is not already defined e.g. in a trait. However they would be able to construct named tuples. Other libraries already exist to do a similar thing with structural types, but these are not integrated as well with the compiler, and need to reinvent the wheel to be flexible.

dejvid · December 7, 2023, 5:29pm

I think we should stick to case classes. As Nabil said, we do not need a third way to construct product types. Any decent code will use case classes. If someone is too lazy to define one, let them deal with _1 and _2. I think Scala needs to work on tooling more than on named tuples. I don’t see the benefits.

dejvid · December 7, 2023, 5:33pm

AMatveev:

//I need fast iteration over 1 000 000 rows with zero copy
stream[(a:String,b:Int,...,z:Int)]("""
select a,b,c ..., z from very_large_table
""").foreach{row =>
 println(s"""I am happy to get value from iterator without data copy, 
   I just can get it by ${row.a()}""")
}

And please do not tell me that I can live without it, I

This can be done with a macro and structural types from the library that provides the stream function. Why complicate the language? We have all the tools available.
stream(stream"""
select a,b,c ..., z from very_large_table
""").foreach{ row =>
println(s"${row.a()}")
}

So yes I can say you can live without them.

SethTisue · December 7, 2023, 10:48pm

2 posts were split to a new topic: Tooling support

AMatveev · December 7, 2023, 8:52pm

If I am not missing something, it cannot be done without disadvantages.

SethTisue · December 8, 2023, 3:45pm

2 posts were merged into an existing topic: Tooling support

arturopala · December 8, 2023, 1:37pm

I see a massive benefit in having a lightweight way to define intermediary record-like types without paying the cost of a class at each step. It isn’t about being a lazy programmer but about the performance and readability.

Consider the following code:

type NameWithAge = (name: String, age: Int)
def collectNameWithAge: Seq[NameWithAge] = ???

def format(record: NameWithAge): String = s"${record.name} is ${record.age} old"

val report = collectNameWithAge.map(format).mkString("\n")

IMHO it is much cleaner than the alternatives, and more performant at runtime than the case class.

bishabosha · December 8, 2023, 2:00pm

To be clear, tuples are syntax sugar for a case class, so I am not sure this example shows the performance benefit you claim. Perhaps it would make more sense if you do a bunch of conversions that end up merely casting the named tuple, rather than allocating a new case class with a different name (but the same fields).

Ichoran · December 8, 2023, 2:05pm

arturopala:

Consider the following code:
type NameWithAge = (name: String, age: Int)
def collectNameWithAge: Seq[NameWithAge] = ???

def format(record: NameWithAge): String = s"${record.name} is ${record.age} old"

val report = collectNameWithAge.map(format).mkString("\n")
IMHO it is much cleaner than the alternatives, and more performant at runtime than the case class.

I don’t think any of this is right. This is exactly where case classes shine.

case class NameWithAge(name: String, age: Int) {}
// vs
type NameWithAge = (name: String, age: Int)

// Everything else exactly the same either way
// And the case class, unlike the tuple, does NOT have `age` boxed!

You need an example more like

def joe(n: Int): (name: String, age: Int) = ("Joe", n)

// vs

case class NameWithAge(name: String, age: Int) {}
def joe(n: Int) = NameWithAge("Joe", n)

By the time you define a type alias, you’ve already hit the same syntactic complexity as a case class (saving only four letters: type = vs case class).

arturopala · December 8, 2023, 2:18pm

I would expect this to use a generic Tuple2 class and be optimized further by escape analysis on the JVM.

arturopala · December 8, 2023, 2:21pm

I don’t need a full-blown case class for my example. I’m saving more than a few letters.

AMatveev · December 8, 2023, 2:31pm

It of course solves the task, but it does not shine at all. It decouples the declaration from its usage and makes the code harder to read and refactor; it forces the use of magic names, and so on.

It is just a little toothache. OK, somebody does not need it. That is normal. But it is really easy to understand: just try not using anonymous functions in a functional approach. A named function allows you to do the same, doesn't it?

julienrf · December 8, 2023, 2:45pm

I agree. I think any argument in favor of named tuples (or structurally-typed records) should not involve the definition of a type alias, otherwise the benefits over case classes are too small.

arturopala · December 8, 2023, 3:07pm

can you elaborate more on this, please?

julienrf · December 8, 2023, 3:33pm

In your example, what are the drawbacks of using a case class?

In my opinion, tuples are useful in situations where you would not (or could not) use a case class. Otherwise, you would simply use a case class, no?

A typical example is a foldLeft call as shown in Pre SIP: Named tuples - #28 by lrytz. Other examples are projects like Iskra or frameless, although named tuples alone may not be enough to support them, as shown in this section about projections between data types.

bishabosha · December 8, 2023, 3:52pm

As a general test of usability I have been using named tuples exclusively for my Advent of Code 2023 solutions (bishabosha/advent-of-code-2023 on GitHub), exercising the subtyping relationship, field selection, pattern matching, etc. I did not find myself missing case classes, particularly as for these kinds of problems methods are not really better than top-level functions.

arturopala · December 8, 2023, 4:25pm

My instinct would be to use named tuples with type aliases instead of unnamed ones or ad-hoc case classes, especially when there is more than one pair or when tuples are nested. Reason: cleaner code, better names.

sjrd · December 8, 2023, 5:11pm

This has been a long thread, and it has also been accompanied by extensive – hours-long – discussions off-line back here at EPFL. It is time that I provide my analysis and opinion. There are several broad areas I want to touch upon:

Motivation: use cases, motivating example, benefit over case classes
Migration story
The infamous subtyping direction

I will post each of these areas as separate posts, because I suspect that different sets of people will want to like–or not like–them separately.

Use cases

Often, SIPs and Pre-SIPS start with terrible motivating examples. Usually this is because they focus on simple “how does it work” examples rather than “why would I use it” examples. This Pre-SIP is no exception. “Why would I use it” examples are typically longer but much more important.

I see 3 areas where named tuples bring substantial value over what we currently have:

Methods returning multiple results
Enabler for named pattern matching
Data type for intermediate results in operations that compute types (notably, database-like manipulations)

Multiple-result methods

Methods that return multiple results are not rare. In the collections library, for example, we can find partition, span, splitAt, partitionMap. These methods really want to return two results. They use a tuple because that’s what the language offers for this purpose, not because the result is semantically a pair. (Contrast with unzip or unzip3, for which the result being a tuple makes inherent sense.)

I’ll choose partition as the canonical example. It is a good example because I’ve seen on numerous occasions developers saying that they never remember which side is which:

val (underagePersons, ofAgePersons) = persons.partition(_.age >= 18)
// oops, got it wrong

The confusion is not surprising: there is no fundamental reason that matching elements should go to the left and non-matching elements should go to the right. The Scaladoc of course says which is which, but it would be better if the API itself provided the information.

This would naturally be achieved with named tuples. We would define partition as:

def partition(p: A => Boolean): (matching: List[A], nonMatching: List[A]) = ???

and then the confusion would immediately disappear.
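As a rough illustration of the idea, here is a minimal sketch assuming Scala 3.7 or later, where named tuples are a stable feature. `partitionNamed` and `Person` are hypothetical names introduced only for this example; the standard library's `partition` still returns an unnamed pair.

```scala
// Assumes Scala 3.7+ (named tuples stable).
// `partitionNamed` is a hypothetical helper, not a standard-library method.
extension [A](xs: List[A])
  def partitionNamed(p: A => Boolean): (matching: List[A], nonMatching: List[A]) =
    val (yes, no) = xs.partition(p)
    (matching = yes, nonMatching = no)

final case class Person(name: String, age: Int)

val persons = List(Person("Ann", 34), Person("Bob", 12), Person("Cid", 18))

// The call site reads the two results by name, so the sides cannot be swapped by accident.
val split  = persons.partitionNamed(_.age >= 18)
val adults = split.matching.map(_.name)
val minors = split.nonMatching.map(_.name)
```

The named tuple type appears only in the signature of `partitionNamed`; callers never have to write it down.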

Note that in argument position, we would use separate parameters for this, each with its own name. We only do this in result position because we have no other choice.

Also note that the named tuple type is used exactly once. That’s because it’s barely a type at all; it is almost only part of that single method’s signature. Defining a case class for that would make no sense.

Enabler for named pattern matching

Named pattern matching is very desirable. It has been coming up regularly over the years. Why has Scala never added support for it? Because we never figured out how we could do it. The latest SIP on the topic came close, but did not succeed in the end.

With named tuples, we finally have a good answer to named pattern matching: if an extractor’s unapply method returns a named tuple or an Option of a named tuple, we can use its names in the pattern. For example:

object Duration {
  def unapply(s: String): Option[(length: Long, unit: TimeUnit)] = ???
}

input match {
  case Duration(length = len, unit = TimeUnit.Seconds) => s"$len seconds"
}
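A fuller version of that extractor might look like the sketch below. It assumes Scala 3.7 or later and that extractors returning an Option of a named tuple support named patterns as described above. The "<number> <UNIT>" input format and the `describe` function are assumptions made for this example, and `TimeUnit` is taken to be `java.util.concurrent.TimeUnit`, whose constants are spelled `SECONDS` rather than `Seconds`.

```scala
import java.util.concurrent.TimeUnit
import scala.util.Try

object Duration:
  // Hypothetical parser: accepts inputs such as "30 SECONDS".
  def unapply(s: String): Option[(length: Long, unit: TimeUnit)] =
    s.split(" ") match
      case Array(n, u) =>
        for
          len <- n.toLongOption
          tu  <- Try(TimeUnit.valueOf(u)).toOption
        yield (length = len, unit = tu)
      case _ => None

def describe(input: String): String = input match
  // Fields are selected by name, not by position.
  case Duration(length = len, unit = TimeUnit.SECONDS) => s"$len seconds"
  case _                                               => "unsupported input"
```

Here `describe("30 SECONDS")` takes the first branch, while anything that does not parse, or parses to another unit, falls through to the second.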

Once again, this type is written exactly once: in the signature of the unapply method. It would never need a type alias or any other kind of explicit name.

This use case cannot be replaced by case classes. If we tried to do that, we would not be able to explain case classes in terms of other language features. They would have to be truly magic, and that is something we always wanted to avoid.

Computed intermediate types

I am not myself a user of database operations or any other thing like that, so I will refrain from motivating why we want those operations in the first place. However, using named tuples for them is a true enabler.

Operations like joins take two sets of rows and produce a new set of rows. The unique aspect here is that the resulting type can be generically computed from the types of the inputs.

Type computations cannot create classes (nor traits or any other form of nominal types). However, they can produce new types that structurally compose other types. This applies whether or not the type computations are in the language (e.g., with match types) or in macros. Therefore, using case classes here is also a complete no-go.

Existing solutions try to use structural types, but these are notoriously difficult to handle. In particular, their unordered nature makes it sketchy to destructure and compute upon (although we can construct them).

Named tuples, with their ordered, static list of name-value pairs provide a unique solution to this category of problems.
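To make the "computed type" point concrete, here is a small sketch assuming Scala 3.7 or later and the `++` concatenation that `scala.NamedTuple` provides there. The row contents are invented for illustration; a real join would also involve a key and a collection of rows.

```scala
// Assumes Scala 3.7+; `++` concatenates two named tuples with disjoint field names.
val person  = (name = "Ann", age = 34)
val address = (city = "Lausanne", zip = 1015)

// The result type is computed from the two input types; no class is declared for it.
val joined: (name: String, age: Int, city: String, zip: Int) = person ++ address

val summary = s"${joined.name}, ${joined.age}, lives in ${joined.city}"
```

Because the fields are ordered and statically known, the combined row can still be accessed by name or destructured like any other tuple.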

Anti use cases

As several people have already observed, as soon as you have to define a type alias for your named tuple, the usefulness compared to case classes is debatable at best. I will go as far as to say it is actively harmful, for several reasons:

Lack of a place to put associated documentation for each field,
Potential to mix and match, by mistake, two types that are structurally the same (but not semantically), and
The sheer decision factor of having to choose between case classes and named tuples.

Therefore, in my opinion, defining a type alias over a named tuple should be seen as a code smell. It may happen very sporadically in some situations (every code smell has its exceptions), but the overwhelming majority of cases should not do that. It might be good to lint against it. It certainly does not serve this proposal that the explainer puts this “use case” forward. We should never show this, nor encourage developers to do it by giving them bad examples like that.

The fact that this is an anti-use-case does not undermine the value of the actual, good use cases I have elaborated on above, though.

sjrd · December 8, 2023, 5:12pm

Migration stories

If you are a regular reader of this forum and GitHub PRs, you probably know that I am “the binary/TASTy compatibility guy”. Everything I lay my eyes on, I immediately see the potential for incompatibilities.

In the case of named tuples, you’ll tell me: “what’s the problem? They’re a new kind of type, they won’t pose any compatibility issue!” And you’ll be right, as long as we talk about the feature itself. But it goes further.

I mentioned earlier the example of partition. Surely, if and when we do get named tuples in the language, we’ll want to improve the existing collection API with a named tuple as the result of partition. The same goes for the other multi-result methods I mentioned. It will also likely happen in other third-party libraries.

Likewise, we will want to improve existing extractor unapply methods to return named tuples instead of unnamed ones.

To be able to do that without breaking compatibility (for the stdlib, that means to be able to do it at all), those changes must preserve binary and TASTy compatibility.

To preserve binary compatibility, the erasure of the named tuples, and their run-time behavior, must be the same as unnamed tuples. This is indeed preserved by the current proposal, and by several other encodings we can think of. (It wouldn’t be preserved if we used a run-time pair of names/values, or a separate class hierarchy, for example.)
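The run-time side of this can be observed with a short sketch. It assumes Scala 3.7 or later and the encoding in which a named tuple is represented by the corresponding plain tuple; the values are made up for illustration.

```scala
// Assumes Scala 3.7+.
val named = (matching = List(1, 2), nonMatching = List(3))

// `toTuple` drops the names at the type level only.
val plain: (List[Int], List[Int]) = named.toTuple

// At run time the named tuple is an ordinary Tuple2 instance.
val isPlainTuple2 = (named: Any).isInstanceOf[Tuple2[?, ?]]
val sameValue     = plain == (List(1, 2), List(3))
```

Since the names exist only at compile time, bytecode compiled against a method returning the unnamed pair would keep linking if the result type became a named tuple with the same element types.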

Preserving TASTy compatibility is trickier. In general, changing the result type of a method for something that is not equivalent (both subtype and supertype) is not allowed: it must be a subtype because callers need that, and it must be a supertype because overriders need that. However, for final methods (or methods of final classes), we only need the newer-is-subtype-than-older direction.

(Btw, if you don’t know yet, these conditions can be checked automatically for your library using tasty-mima.)

If we want to improve partition, which is effectively final in List, to use named tuples, we would need a named tuple to be a subtype of the corresponding unnamed tuple. Likewise, for extractors found everywhere, we need that subtype direction. Note that a conversion (implicit or explicit) does not work here.

You might say: this cuts both ways. If I want to improve a tuple parameter to become a named tuple, we need the other subtyping direction. This is however not true for 2 reasons: a) we would not have used tuples in that case but rather several parameters in the first place; and b) parameters of methods are invariant anyway.