Better support for structurally typed records

A couple of months ago, I took some time to assess the state of structural typing in Scala 3.3.0. My feeling is that the language could benefit from the addition of structurally typed records with a bespoke syntax for both types and terms.

Unfortunately, I don’t have much time currently to complete this investigation, so the purpose of this post is to:

  • acknowledge the work done by Olof Karlsson and Philipp Haller
  • share the status of my work
  • see if there is any interest in continuing this work, both from the SIP Committee side and from the industry side

I will summarize my thoughts here and invite interested readers to check out my full write-up on the matter.

Structural typing is useful when you want to manipulate structured data without defining a class for it (e.g., because it is a temporary data structure used just once, which does not justify the “cost” of defining a class). Examples of Scala projects that involve structural typing include chimney, Iskra, and Tyqu.

Today, Scala supports structural types via type refinements. As an example, here is a method divide that returns a structurally-typed result:

def divide(dividend: Int, divisor: Int): Record { val quotient: Int; val remainder: Int }

In Scala 2, calling a structurally-typed member (e.g., calling .quotient on the result of the method divide) involves reflection, and is discouraged. Scala 3 provides a way to implement the type Record in an efficient way that does not involve reflection. It relies on a special type Selectable.
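To make this concrete, here is a minimal sketch of a Map-backed Record, closely following the pattern from the Scala 3 reference documentation (the type alias and the cast are mine, for illustration):

```scala
// Minimal Map-backed record in the style of the Scala 3 reference docs.
// Member selection on a Selectable compiles to a selectDynamic call,
// so no runtime reflection on the class is involved.
class Record(elems: (String, Any)*) extends Selectable:
  private val fields = elems.toMap
  def selectDynamic(name: String): Any = fields(name)

type DivisionResult = Record { val quotient: Int; val remainder: Int }

def divide(dividend: Int, divisor: Int): DivisionResult =
  Record("quotient" -> dividend / divisor, "remainder" -> dividend % divisor)
    .asInstanceOf[DivisionResult]

val result = divide(7, 2)
// result.quotient == 3, result.remainder == 1
```

Note that the library author writes the `asInstanceOf` cast once, inside the constructor; callers then get fully type-checked member selection.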

The special type Selectable solves the problem of calling the members of structurally-typed values, but it does not address the problem of constructing or transforming structurally-typed values (e.g., concatenating two records). In 2018, O. Karlsson and P. Haller published a paper, Extending Scala with records: design, implementation, and evaluation, to address that problem. They proposed a very lightweight extension to the language: no new syntax; what they introduced in the compiler was only the ability to synthesize implicits that compute the types resulting from record manipulations, plus some rules to ensure the correctness of extending records. They also verified that the run-time performance of their Records implementation was good.
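To give a flavor of what that means, the operations can be approximated in plain library code today; the sketch below hand-writes the refined result type that their compiler extension would compute automatically (the `->>` and `+` spellings follow the examples shown later in this thread; everything else is illustrative):

```scala
// Hedged sketch of extensible records: `->>` builds a labelled field and
// `+` extends a record. The refined result type (Point) is written by
// hand here; the point of the Karlsson/Haller extension is that the
// compiler synthesizes that type for you.
class Record(private val fields: Map[String, Any]) extends Selectable:
  def selectDynamic(name: String): Any = fields(name)
  def +(field: (String, Any)): Record = new Record(fields + field)

object Record:
  def apply(fields: (String, Any)*): Record = new Record(fields.toMap)

extension (label: String)
  def ->>[V](value: V): (String, V) = (label, value)

type Point = Record { val x: Int; val y: Int }

val point: Point =
  (Record() + ("x" ->> 1) + ("y" ->> 2)).asInstanceOf[Point]
// point.x == 1, point.y == 2
```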

My intuition was that their work did not get the visibility it deserved, and that the implementations of the aforementioned libraries involving structural types (chimney, Iskra, and Tyqu) would be simpler if extensible records were part of the language.

To check this hypothesis, I rebased their work on top of the main branch of the Scala 3 compiler, and, after solving 5 years of Git conflicts, I tried to write minimal, record-based, re-implementations of chimney, Iskra, and Tyqu as tests.

My analysis is that only the chimney implementation would really benefit from the extensible records proposal. The benefits to Iskra’s and Tyqu’s implementations were quite marginal (more details in the full paper).

Another observation is that the extensible records proposal would not really improve the usability of those libraries. In my opinion, the main reason for that is that the syntax for writing record types and for manipulating record values is too verbose and non-idiomatic. See:

val point: Record { val x: Int; val y: Int } =
  Record("x" ->> 1, "y" ->> 2)

Let’s compare that with tuples:

val point: (Int, Int) =
  (1, 2)

Or case classes:

case class Point(x: Int, y: Int)
val point: Point =
  Point(x = 1, y = 2)

The thing is that the main purpose of structural types is to serve when defining a class is not possible or convenient. So workarounds such as defining a class or a type alias (type Point = Record { val x: Int; val y: Int }) miss the point.

My main conclusion is that if we want to make structural types (or structurally-typed records) usable in Scala, they need to have a concise syntax.

In the paper, I propose the following syntax, which looks like tuples, but with named fields instead of positional fields:

val point: (x: Int, y: Int) =
  (x = 1, y = 2)

def divide(dividend: Int, divisor: Int): (quotient: Int, remainder: Int) =
  (quotient = dividend / divisor, remainder = dividend % divisor)

I believe more research is needed to validate that it would really work in practice and improve the usability.

My second idea is that a better support of generic programming (Mirrors) with records could simplify the implementation of Iskra and Tyqu (and the other libraries that involve structural types), by providing automatic conversions between case classes and records, for instance (more details in the full paper). However, here too, more work is needed to validate this hypothesis.
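As a taste of what such generic support could enable, one direction (case class to record) can already be sketched with nothing more than the Product interface; the Record class and the cast below are illustrative, and the reverse direction (record to case class) is where a Record-specific Mirror would actually be needed:

```scala
// Hedged sketch: converting any case class to a Map-backed record using
// only the Product interface (productElementNames / productIterator).
class Record(private val fields: Map[String, Any]) extends Selectable:
  def selectDynamic(name: String): Any = fields(name)

def toRecord(p: Product): Record =
  new Record(p.productElementNames.zip(p.productIterator).toMap)

case class Point(x: Int, y: Int)

val r = toRecord(Point(1, 2)).asInstanceOf[Record { val x: Int; val y: Int }]
// r.x == 1, r.y == 2
```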


Haven’t read the paper yet, but I’m quite interested in this part:

providing automatic conversions between case classes and records

When I first got interested in structural typing in Scala 3, this was what drove me away from experiments. Namely, Selectable-based records are doomed to remain second-class citizens compared to case classes (the first-class record type). For example, how would one define a type class instance for them? If they’re that special, there’s a very limited scope where I’d prefer them over case classes. Structurally-typed records must provide some compatibility layer with case classes to avoid becoming a niche feature that nobody cares about.

A couple of questions:

  1. Would it be possible to define a type alias for the record with the proposed syntax? E.g. type Division = (quotient: Int, remainder: Int)
  2. Are proposed records ordered (like case classes and tuples) or unordered (like typical Selectable)?

Although, if Record is ordered, it seems like a good addition to the Product family:

  1. Tuple - name-free, label-free
  2. Record - name-free, label-aware
  3. case class - name-aware, label-aware

Another thing that could drastically increase the value is row polymorphism. Even though it seems out of the scope of the paper, I think structurally-typed records could be the basis for later row polymorphism, and it’s important to plan what it could look like so as not to break anything in the first stages.

Although, on the other hand, if records are added to the language the right way, we should be able to write something like the following with just library code:

type Person = (name: String, age: Int, citizenship: Country)

def isMajor[P <: Record](person: P)(using Polymorphic[P, Person]): Boolean =
  person.citizenship match
    case US => person.age >= 21
    case _ => person.age >= 18

Yes. The type (quotient: Int, remainder: Int) would be a shorthand for the regular type Record { val quotient: Int; val remainder: Int }, which can be aliased like any other type.

That’s a good question, and no, records are not ordered. This is why they would need a special type of Mirror.

That’s interesting to think about as well! I don’t think your example needs row polymorphism, though; subtyping seems to be enough:

type Person = Record { val name: String; val age: Int; val citizenship: Country }

def isMajor(person: Person): Boolean =
  person.citizenship match
    case US => person.age >= 21
    case _  => person.age >= 18

I don’t think your example needs row polymorphism, though; subtyping seems to be enough

Ah sure, it just wasn’t complete enough. This one better reflects what I meant:

type NoName = Record { val age: Int; val citizenship: Country }
type Person = Record { val name: String; val age: Int; val citizenship: Country }
case class PersonClass(name: String, age: Int, citizenship: Country)

// All of these compile:
isMajor[NoName](noName)
isMajor[Person](person)
isMajor[PersonClass](personClass)

Related discussion: Named Tuple Arguments / Anonymous Case Classes

In your proposal you suggest a new trait Mirror.Record with a fromFields(fields: Map[String, Any]) method. However, this could be done with an alternative implementation of fromProduct (and we could change the implementation in the Synthesiser based on MirroredType):

def fromProduct(p: Product): Record { val x: Int; val y: Int } =
  val fieldIndices = p.productElementNames.zipWithIndex.toMap
  val x = p.productElement(fieldIndices("x")).asInstanceOf[Int]
  val y = p.productElement(fieldIndices("y")).asInstanceOf[Int]
  Record() + ("x" ->> x) + ("y" ->> y)

Even better: if Product had a productElement(name: String) method, that would avoid the intermediate step (at the cost of more bytecode in every case class, of course).

Edit: I am coming back to this thought. If the purpose of the original fromFields method was to allow conversion from other Records, then I imagine that would be an identity operation, unless dropping fields is desired; in that case an extra overload may be desirable if we don’t make Record a Product.


The concise syntax feels very natural. The question is, should it define some structural type or a named tuple? The difference between the two is:

  • Structural types are unordered and have width subtyping. They don’t support generic tuple operations out of the box, but such operations can be made available via conversions. Structural types support type members and type-dependencies between members.
  • Named tuples are ordered and don’t support width subtyping, type members, or type dependencies. On the other hand, all generic tuple operations are applicable.
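The width-subtyping half of this comparison can be demonstrated with today’s structural types (the sketch below uses the reflection-based `reflectiveSelectable` purely for illustration; the type and class names are made up):

```scala
import scala.reflect.Selectable.reflectiveSelectable

// Width subtyping: a class with *more* members conforms to a structural
// type listing fewer. Named tuples, being ordered and fixed-width,
// would not allow this.
type HasAge = { val age: Int }

class Citizen(val name: String, val age: Int)

def isAdult(p: HasAge): Boolean = p.age >= 18

val ok = isAdult(Citizen("Ada", 36)) // accepted: Citizen has age, and more
```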

The proposed syntax is a natural extension of tuples, so maybe its semantics should be named tuples as well?


Though I understand (and somewhat agree with) the conclusion, it also means that we are back at square one regarding structural types. If I want to create a method that takes some data structure that has a name (String) and an id (UUID), for example, I cannot really do that with named tuples, since any case class or tuple that has those fields and more will be rejected. I expect the area where structural types are useful is much bigger than where named tuples are useful (though maybe with type classes and derivation you could mimic the behavior).


Why not the following syntax then?
{ x: Int, y: Int }
It better communicates the idea that there is no order.

It has the downside, however, of potentially looking too similar to { val x: Int; val y: Int }, which it aims to (more or less) replace anyway.

It also loses the correspondence with normal classes:

class A {
  val x: Int
  val y: Int
}
// semantically similar to
{
  val x: Int,
  val y: Int
}

// but

class A {
  x: Int
  y: Int
}
// not really semantically similar to
{
  x: Int,
  y: Int
}

Yes, structural types do that today. But that’s tied to reflection, so it’s inherently incompatible with the idea that there’s a lightweight way to construct and access them.

So, if you don’t want to access classes but instead treat the construct in isolation, what is more important for you?

Tuples:
+ smallest footprint, fastest access, and a large library of generic operations
- field order matters; no width subtyping (but both can be addressed with explicit conversion functions)

Structural types:
+ fields can be permuted and dropped in supertypes
- labels have to be stored with the data, so they are bulkier and slower to access than tuples
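The “explicit conversion functions” mentioned for tuples can be as simple as hand-written projections that drop or permute fields (a trivial illustrative sketch):

```scala
// For plain tuples, width "subtyping" and field reordering are recovered
// with explicit conversions: here we drop the third field and swap the rest.
def toAgeName(t: (String, Int, String)): (Int, String) = (t._2, t._1)

val full   = ("Ada", 36, "UK")
val narrow = toAgeName(full) // (36, "Ada")
```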


Yes, structural types do that today. But that’s tied to reflection, so it’s inherently incompatible with the idea that there’s a lightweight way to construct and access them.

Does it necessarily have to be the existing reflection-based structural types, or could first-class records be considered?

First-class records are the “structural types treated in isolation” of my last post. I should have been clearer about this. The pros and cons remain as stated.

There is an old idea of implementing records without reflection. Fields are resolved statically, so it does not have the cons of structural types (reflection, storing field names). Unfortunately, that thread does not reach a conclusion about the feasibility of such a design. @odersky, why was this idea not pursued?

Thanks for the pointer, I had not remembered that particular proposal. It’s a lot more expensive than named tuples, since we have to generate these Labeled classes on the fly and store labels in values.


It would be nice to have both Named Tuples and Structural Records. But if it’s either-or, in my experience Structural Records would bring much more practical value in industry.

There, you very often work with data that is under-defined and open to future extension, and where you care about only a few fields at a time. Structural Records would be a great fit for this use case.


Can you give some use cases where first-class records are better suited than named tuples, and which at the same time are not already covered by Scala’s existing structural types?

If I understand it right, first-class records open the door to row polymorphism; i.e., with first-class records we can say that:

type Person1 = { name: String, age: Int }
type Person2 = { age: Int, name: String }
// Person1 =:= Person2

And if they’re named tuples, they must be unrelated.

Also, IMO the records syntax is more concise and refactoring-proof: you don’t have to change all call sites if you change your definition.

Are there any plans to implement named tuples in Scala?

I am exploring the idea. If it pans out we’d come back with a Pre SIP.


Also, records would be extensible, allowing subtyping between them:

type Person1 = { age: Int }
type Person2 = { age: Int, name: String }
// Person2 <:< Person1

Right now, longer tuples are not subtypes of shorter tuples, and I assume the same would apply to hypothetical named tuples. I don’t see any fundamental blockers to making tuples subtypes of each other, though, for both positional and named tuples, so maybe that’s something we could add?
