Pre SIP: Named tuples

I would expect that when the word type is in front of it, certainly. Otherwise, it seems to me as good a candidate as any for a general-purpose naming operator.

RE the proposal for using named tuples for unapply methods, could we perhaps extend NamedTuple[Labels, T] to accept any T?

say you have type Point:

case class Point(x: Int, y: Int)

define its unapply as such

object Point:
  def unapply(p: Point): NamedTuple[("x", "y"), Point] = NamedTuple.apply(p)

there could even be syntax sugar possibly?

object Point:
  def unapply(p: Point): Point(x: Int, y: Int) = NamedTuple.apply(p)

Possibly as an extra make NamedTuple a subtype of a generic NamedType?

My concern is avoiding wrapping objects just to get labelled patterns. Why did we introduce by-name pattern match extractors if we lose their benefits?

Note that in Scala 3 the unapply method of Point returns simply this, so I can’t see how we can improve that.

But there could be other benefits if we had a cheap way to map a case class to a named tuple. The tricky bit is that tuples have a lot of essential infrastructure (e.g. they are considered subtypes of pair types A *: B), and it’s not clear to me how to extend that infrastructure to all products. Which we’d have to do if the case class → named tuple mapping should not copy case class fields.

The reason I don’t agree with your premise is that points (2) and (3) aren’t choices we’ve already made. You’re arguing that named types would lead inexorably to them, and I don’t agree.

On point (2), I guess the injection you’re referring to is eta-expansion. And I guess you’re arguing that eta-expansion must preserve argument naming, but I don’t think that’s necessarily true. In fact, it wouldn’t be possible – FunctionN is defined with its apply method, and that already has names (even though they’re not useful). I’m not sure what surjection you’re proposing that would make a bijection here. But certainly, reworking anonymous functions is not necessary for named types to be useful. And if we wanted to support that with named types, we could add a parallel NamedFunctionN or something, with non-optional names at the call site (and a different way of eta-expanding so you could choose to preserve names). This wouldn’t be burdensome, because it would be opt-in.

On point (3), there is a bijection (up to a certain arity, anyway) via FunctionN#tupled and Function.untupled. And I think you’re arguing that these things should preserve naming (which presupposes point (2)). Again, this could be supported by parallel, opt-in “named functions”, which have their own tupled and untupled mechanisms where names are preserved and non-optional.
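To make points (2) and (3) concrete, here is a small sketch of how names behave today under eta-expansion and tupling. The method name triangleArea is only an illustration; everything else is standard library.

```scala
def triangleArea(base: Double, height: Double): Double = base * height / 2

// Eta-expansion yields a Function2; `base` and `height` are not part of its type.
val f: (Double, Double) => Double = triangleArea

// FunctionN#tupled and Function.untupled convert between the two shapes.
val tupled: ((Double, Double)) => Double = f.tupled
val untupled: (Double, Double) => Double = Function.untupled(tupled)

val viaMethod   = triangleArea(base = 4.0, height = 3.0) // method parameter names are usable
val viaFunction = f.apply(v1 = 4.0, v2 = 3.0)            // only Function2.apply's own names remain
val viaTupled   = tupled((4.0, 3.0))
val viaUntupled = untupled(4.0, 3.0)
```

All four calls compute the same value; only the first can use the original parameter names, which is the gap an opt-in "named function" would have to fill.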

I think trying to take the “named type” too far and deep into the existing type system – by replacing all instances of labeling/naming values with named types instead – leads astray and is unnecessary. It can be strictly additive. It doesn’t have to directly unify named tuples with named method arguments.

I also like your naming experiment. It’s similar to FieldType from shapeless – something that’s already a pretty successful and useful implementation of “named types”, where the named type is a subtype of its corresponding type:

type FieldType[T, N] = T with Tagged[N]

// where Tagged is just:
trait Tagged[N]
// so you can cast anything to it

Since the named type is a subtype, you can pass named values to things that don’t require names, without having to explicitly discard the name. I’d just argue that it should get a first class syntax (like base: Double in type position and base = double in value position). And then named tuples fall out of that, and they’re a subtype of unnamed tuples.
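As a rough illustration of "the named type is a subtype of its corresponding type", here is a Scala 3 sketch. Note that it swaps in a different technique from shapeless: an opaque type with an upper bound instead of a cast to T with Tagged[N]. The names naming, Named and scaleBy are hypothetical.

```scala
object naming:
  // A T carrying a compile-time label N. The bound `<: T` makes it a subtype of T,
  // so the label can be dropped implicitly but never added implicitly.
  opaque type Named[N <: String, T] <: T = T

  object Named:
    def apply[N <: String, T](value: T): Named[N, T] = value

import naming.*

// A method that knows nothing about labels.
def scaleBy(value: Double, factor: Double): Double = value * factor

val base: Named["base", Double] = Named["base", Double](4.0)

// A named value is accepted wherever the plain type is expected.
val plain: Double = base
val scaled: Double = scaleBy(base, 1.5)
```

The reverse direction does not compile: a plain Double is not a Named["base", Double] unless it goes through Named.apply.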


Maybe one more question for @odersky here:

Unnamed tuples support code like (1,2).copy(_2=3). How will named tuples work with .copy? Will they even work with .copy? Will .copy allow names? Would that require special handling in the compiler? Can other user-defined non-tuple data types make use of that to make their copy methods, or methods in general, more flexible?

Perhaps this also applies to the named tuple getters: given this proposal, can I define my own class Foo[T <: String] and have some way of saying that, given val foo: Foo["x"], I can refer to named properties foo.x, or call methods foo.bar(x = ???), or pattern match case Foo(x = y) =>?

I imagine there are a lot of use cases where you want something like a named tuple, but not a scala.TupleN itself. e.g. a database library may have a Row type, a RPC library may have a Request type and Response type, etc. These would be statically-typed, flexibly named key-value mappings that look like named tuples but may need to be their own subclass. Is there some way we could support such use cases?


Expanding/changing `Selectable` based on upcoming Named Tuples feature is the most promising thing I’ve seen for that.

Structural types allow you to build facades over existing class types. I think the strength of a named tuple is that there is only one implementation, so it’s easy to build new named tuples in operations like joins.

It will be straightforward to define translations between structural types and named tuples. AFAIK, macros to do this already exist between structural types and unnamed tuples, so adding names should not be problematic.
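For reference, this is roughly what the existing Selectable-based structural types look like for a Row-like facade. It is a minimal sketch: Row, CityRow and the field names are made up for illustration, and the cast to the refined type is unchecked.

```scala
// A generic record backed by a map; field selection is routed through selectDynamic.
class Row(fields: (String, Any)*) extends Selectable:
  private val byName = fields.toMap
  def selectDynamic(name: String): Any = byName(name)

// The structural type gives the generic Row a statically typed view.
type CityRow = Row { val name: String; val population: Long }

// Unchecked cast: nothing verifies that the map really has these fields.
val oslo = Row("name" -> "Oslo", "population" -> 700000L).asInstanceOf[CityRow]

val cityName: String = oslo.name       // compiles to selectDynamic("name") plus a cast
val population: Long = oslo.population
```

The types are checked at the use site, but building a new structural type from two existing ones (as a join would need) is where a single named tuple representation looks simpler.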

Will this fix any of these cases:

def foo(x: Int, y: String = "") = ???
val bar = foo
val baz = foo(_, "z")

foo(x = 1)
// bar(x = 2) // fail
// baz(x = 3) // fail

Sorry, a small off-topic note about the current state of name bindings in function parameters:

type F = (Int, Int) => Int
type F2 = (a: Int, b: Int) => Int

trait C extends F  //< compiles

trait C2 extends F2  //< fails with message
// "(a: Int, b: Int) => Int is not a class type"

One important use case for named tuples could be addressing, in a simpler way, the current problem of expressing data structures with mapped field types. For example, the scalasql library uses the higher-kinded data pattern:

case class City[T[_]](
    id: T[Int],
    name: T[String],
    countryCode: T[String],
    district: T[String],
    population: T[Long]
)
object City extends Table[City]

This pattern allows a case class to serve as a model (with identity HKT) and the dsl query structure (where fields are wrapped with a sql-specific HKT).

One could argue that the possibility of mapping data structure fields should not be required to be written out explicitly, especially with the complex concept of HKT. It could be assumed as a property of this class by some language features. Named tuples seem to be a really good fit for this feature, as we already have match types operating on tuples in Scala, and they could work as well on the named tuples:

case class City(
  id: Int,
  name: String,
  countrycode: String,
  district: String,
  population: Long
)

type AsSql[Xs <: NamedTuple] = Xs match
  case x *: xs => Sql[x] *: AsSql[xs]
  case EmptyNamedTuple => EmptyNamedTuple

/* Assuming that a case class is a subtype of a specific NamedTuple whenever the field contract holds: */
def sql[A <: NamedTuple]: AsSql[A] =
  ??? /* some dynamic magic */

/* If a case class is not a subtype of a specific NamedTuple, we could instead write: */
transparent inline def sql[A <: Product]: AsSql[ToNamedTuple[A]] =
  ??? /* some dynamic magic */

sql[City] // => sql: (id: Sql[Int], name: Sql[String], countrycode: Sql[String], ...)

This way, the model can be kept simple, and the type system would allow for this kind of flexibility without exposing the user to the internal works of it. There would be the match type - but hidden, instead of exposed HKT. For the user, it would be just a type.
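The match-type half of this idea can already be tried on ordinary (unnamed) tuples. The sketch below assumes a placeholder Sql[T] wrapper that just holds a column name; it is not the scalasql type, and the named-tuple version remains hypothetical.

```scala
// Placeholder for a SQL expression of type T (an assumption, not a real library type).
final case class Sql[T](column: String)

// Wrap every element type of a tuple in Sql.
type AsSql[Xs <: Tuple] <: Tuple = Xs match
  case x *: xs    => Sql[x] *: AsSql[xs]
  case EmptyTuple => EmptyTuple

val columns: (Sql[Int], Sql[String], Sql[Long]) =
  (Sql[Int]("id"), Sql[String]("name"), Sql[Long]("population"))

// Compiles only if the match type reduces to the tuple type written above.
val viaMatchType: AsSql[(Int, String, Long)] = columns
```

The user only ever sees the resulting tuple type; the match type stays inside the library.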

The main difference between named tuples and structural types in that case is that named tuples are aware of being algebraic products (with all the benefits and flexibility that brings).


I was thinking along the same lines. While I like the general approach of ScalaSql, I have reservations about its user-facing higher-kinded types.

There should be a straightforward way to map between case classes and named tuples. It would be good if people who have more domain knowledge on schema mappings would work out what we need here.


What about performance?

I tried to follow the thread, and saw references to zero-cost abstraction and to the fact that one of the motivations is to avoid the cost of a full class declaration.
But I’m not clear on the cost of named tuples relative to tuples (the same, I think) or to classes.

Nor whether it could be a better path toward a real immutable struct in Scala, taking advantage of recent/future JVM optimizations on that topic, like Valhalla’s flat memory layout and immutable structs (I think I saw in another thread that case classes are not a good candidate for that, but I can’t find it again, so perhaps I’m plainly misunderstanding here).

I don’t see named tuples as being a feasible replacement for higher-kinded case classes in ScalaSql. They would be a great addition, but I don’t think they can serve as a replacement.

ScalaSql uses “higher kinded” data throughout, not just in case classes. The mapping is something like

| SQL (Q) | Scala (R) |
|---|---|
| Expr[T] | T |
| (Q1, Q2, ...Qn) | (R1, R2, ...Rn) |
| MyCaseClass[Expr] | MyCaseClass[Sc] |
| Query[Q] | Seq[R]/geny.Generator[R] |

Named tuples would be a great addition in that they would provide the flexibility of tuples with the usability benefit of having names. This is exactly analogous to the benefit named tuples provide over case classes or normal tuples in normal Scala. So we may have an additional row, mapping “SQL query” named tuples to “normal Scala” named tuples:

| SQL (Q) | Scala (R) |
|---|---|
| (foo=Q1, bar=Q2, ...qux=Qn) | (foo=R1, bar=R2, ...qux=Rn) |

But neither use case of named tuples here replaces the higher-kinded case classes in ScalaSql, any more than named tuples replace case classes in normal Scala. All the standard reasons apply: named tuples do not have short aliases, working with large named tuples and ascribing them as types gets very verbose, you can’t define helper methods, etc.

As @sjrd mentioned earlier, type-aliasing named tuples and using them as a replacement for case classes is a bit of a non-goal: the whole point of named tuples is that they are anonymous and save us the hassle of defining them somewhere before use. If we didn’t care about that, we would just define a case class.

So while named tuples would be an excellent addition to a library like ScalaSql, I don’t see any way it can reasonably replace the higher-kinded case classes.

Maybe there are other ways to replace the higher-kinded case classes, with clever whitebox macros or structural types, but that’s all still at a “research” level of maturity and not something I’d bet a company or codebase on. In contrast, higher-kinded types - despite all their shortcomings - are widely used in the wild and well understood by all existing tooling. For example, I don’t want to bet the farm on whitebox macros that haven’t had IntelliJ support for the past decade and don’t have a viable roadmap for IntelliJ support in the next decade, when a supermajority of the community relies on IntelliJ for their day-to-day work.

The question is: Do users of the library interact directly with both SQL and Scala types? Or are the SQL types used by the mapping software only?

If it’s the latter, I’d argue that we should try not to push internal concerns into the simplest user-facing API. And I am saying that as someone who helped pioneer this style 10 years back with our work on LMS and Virtual Scala. My experience then was that HKTs were tempting at first, but that in the end they added too much user-facing complexity. I am not saying this is necessarily the same for ScalaSQL; I know too little about it and the field in general to be able to judge. Just this: be careful and wary, and make sure there really is no alternative.

If users do need to interact directly with SQL types, I’d need some examples to understand better.

I did not mean for named tuples to replace case classes, but we should investigate whether we can create a standard way to map between case classes and named tuples that can be exploited by query libraries. Named tuples have the considerable advantage over case classes that we can express the type of the result of arbitrary joins and projections. So in that sense they are much closer to relational algebra and SQL itself.

P.S. And I would also try to avoid whitebox macros, for the reasons you have given. But structural types do work, and have been shown to play well with IDEs.


For my own library, DataPrism, named tuples would work wonderfully to complement the higher kinded data. Today DataPrism automatically converts tuples between types like (T1, T2, ..., TN) and (F[T1], F[T2], ..., F[TN]) for any type F[_]. Extending this to named tuples would be fairly simple.
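For readers unfamiliar with that kind of conversion, the standard library already has the unnamed version of it. This is only a sketch of the general shape using Tuple.Map and Tuple#map with F = Option; it is not DataPrism's actual machinery, and CityTuple is an illustrative alias.

```scala
type CityTuple = (Int, String, Long)

// Type level: (T1, ..., TN) becomes (F[T1], ..., F[TN]) with F = Option.
type Wrapped = Tuple.Map[CityTuple, Option]

val row: CityTuple = (1, "Oslo", 700000L)

// Value level: a polymorphic function is applied to every element.
val wrapped: Wrapped = row.map[Option]([t] => (x: t) => Option(x))
```

A named variant would need the same mapping applied to the value types while the labels are carried along unchanged.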

If named tuples were added as an opaque type as outlined above, I do think we should already have everything we’d need in Mirror to convert a case class to a named tuple. I do not, however, think I would make use of this in DataPrism. Either the user would define a higher kinded case class with the required implicit instances, or they’d use a (potentially named) tuple and let the library derive the needed stuff. At most I could see having a function to convert a named tuple to a case class if names and types match.
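A minimal sketch of what Mirror provides today, using a plain tuple plus a separate list of labels, since named tuples are only proposed in this thread. Point and labelsOf are illustrative names.

```scala
import scala.deriving.Mirror
import scala.compiletime.constValueTuple

case class Point(x: Int, y: Int)

// Field names, recovered at compile time from the Mirror's label tuple.
inline def labelsOf[A](using m: Mirror.ProductOf[A]): List[String] =
  constValueTuple[m.MirroredElemLabels].toList.map(_.toString)

val p = Point(1, 2)

// Field values as a precisely typed tuple (this copies the fields).
val values: (Int, Int) = Tuple.fromProductTyped(p)
val labels: List[String] = labelsOf[Point]
```

Pairing MirroredElemLabels with MirroredElemTypes is essentially the information a named tuple type would carry.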

I mostly just want named tuples because you make a lot of tuples when making SQL queries. That means you either have to live with _n to access their elements, or you have to add a bit of boilerplate for an actual case class.


From what I understand, it seems higher-kinded classes in schema definitions would not be needed if query results returned named tuples instead of MyCaseClass[Sc]. Additionally, standardizing the query result types to tuples would make them consistent with projections other than SELECT *, which already return tuples today.

In fact, the line MyCaseClass[Expr] => MyCaseClass[Sc] would not be necessary in the mapping:

| SQL (Q) | Scala (R) |
|---|---|
| Sql[T] | T |
| (Q1, Q2, ...Qn) | (R1, R2, ...Rn) |
| (foo: Q1, bar: Q2, ...qux: Qn) | (foo: R1, bar: R2, ...qux: Rn) |

A schema definition would look like the following:

object City extends Table[(
  id: Sql[Int],  // Maybe `id: Column[Int]` would read better, although it is longer
  name: Sql[String],
  countryCode: Sql[String],
  district: Sql[String],
  population: Sql[Long]
)]

Thus, there is no need to “compute” the type of a SELECT query, it’s the type we just passed to the Table class, which could take its Mirror as a context parameter to retrieve the column names.

Then we could do a projection like the following:

val cityNamesQuery: Query[(name: Sql[String])] =
  for city <- City.select yield (name = city.name)
// Or, City.select.map(city => (name = city.name))

// And then (same as today)
val cityNames: Seq[(name: String)] =
  db.run(cityNamesQuery)

I wonder if that could work out.


I agree that this is the crux of the question. I don’t actually know the answer here, as it is an empirical question and ScalaSql is relatively new and hasn’t seen enough usage “in the wild” to answer from experience, and we do not have any experience with using named-tuples/structural-types as alternatives.

It is true that named-tuples/structural-types more accurately model the flexible, flat nature of SQL rows vs. the nested case classes and normal-tuples that ScalaSql currently uses.

The long-term ergonomics of named-tuples/structural-types are still unknown. For example, if I JOIN two tables which both have an id: Int column (i.e. basically all of them), the current nested approach is able to give them different column aliases and place the respective ids in the proper ._1.id and ._2.id fields of the nested data structure. How would JOINing two named-tuples/structural-types handle this, given they represent a flat namespace where keys have to be unique? This is only one issue I could think of off the top of my head; I’m sure there are others.

Going full-on named-tuples in both SQL-query and normal-Scala code is probably not possible, because ascribing type parameters with database row named tuples (which would typically be very wide) would be very verbose. And in normal-Scala code, it is common to pass database row objects all over the place and into helper methods and utils modules.

But it’s possible we could use a case class to model the normal-Scala data type, and use some language feature to synthesize a corresponding SQL-query named tuple to work with there. There’s still the issue of not being able to conveniently write helper methods for the SQL-query (as it would require ascribing the SQL-query named-tuple as a parameter type), but it would be less of a problem than being unable to ascribe parameter types in plain-Scala code. Whether that’s a blocking issue, or just a mild inconvenience, is something we’d need to determine empirically based on usage patterns and experience.

I’m trying to get help getting ScalaSql ported onto Scala 3 (should be relatively simple, just one 100LOC macro), after which it should be possible to experiment with this stuff in a more concrete fashion.


Is it really about unnamed vs. named? Could we think of it differently perhaps? After all, current tuple elements have names like _1, _2, etc. But to me it seems that the more interesting question is whether position is significant. For instance, if in one place I return a (name: String, age: Int) and in another place I return a (age: Int, name: String), are those the same type?

If position remains significant, then I think this is not really about named vs. unnamed but rather default-named vs. custom-named, in which case they are unrelated types. (1, "hi") should be an instance of (_1: Int, _2: String) and that should be as related to (a: Int, b: String) as that is to (x: Int, y: String).

But if this is about unordered tuples then I think ordered tuples need to be a subtype of at least the unordered tuple with the same names (_1, _2, etc.)

Either way I don’t see a sense in which current tuples lack a name. The problems described by @sjrd seem to be more about current tuples’ names being fixed and not conveying domain-specific meaning.

We could still call current tuples unnamed in the sense that they lack a customized name, but in terms of subtyping I don’t see how they have any fewer names than the new tuples.


One could handle this easily with type aliases. I find it quite appealing to define tables and query results in terms of named tuples, since that’s the database view, after all. However, we’d need then a way to “import” a query result into a program where we typically would use classes to describe things that come from the database.

So here’s an idea: Could we use the spread operator for that? I.e. if C is a class and
x: (name: String, age: Int), then C(x*) would expand to

C(name = x.name, age = x.age)

That would be checked for matching names and types, of course. More generally this would work for all function calls, not just class creations.
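Until something like C(x*) exists, the closest existing approximation is probably a Mirror-based helper over an unnamed tuple. This is only a sketch: into and Person are hypothetical names, and unlike the proposed spread it checks positions and types but not field names.

```scala
import scala.deriving.Mirror

case class Person(name: String, age: Int)

// Build a C from a tuple whose element types match C's fields, in order.
def into[C](using m: Mirror.ProductOf[C])(row: m.MirroredElemTypes): C =
  m.fromProduct(row)

val queryResult: (String, Int) = ("Ada", 36)
val person: Person = into[Person](queryResult)
```

Passing a tuple with the wrong element types, such as (36, "Ada"), is rejected at compile time; a spread over a named tuple would add the name check on top of that.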


I am very confident that we can express this specific concern and probably the others as well. Join types are match types, so you can have a rich set of computations to determine the result type of a join.
