Pre-SIP: Structural data structures that can evolve in a binary-compatible way

julienrf · April 19, 2022, 4:58pm

Case classes are often preferred to simple classes by Scala developers to model data structures. However, case classes have a drawback: they can’t evolve in a binary compatible way. Some Scala developers created workarounds based on code generation (e.g., contraband), or macro annotations (e.g., data-class, scalameta). Other developers just manually write simple classes (e.g., Scala.js, endpoints4s), but that requires a lot of undesired boilerplate.

I believe there is a need for a middle ground between case classes and regular classes, with some of the features of the case classes (mainly structural equality) but without compromising the possibility of making binary compatible evolutions. Let’s call them “data classes”.

This post details the motivation for data classes, and proposes a couple of ideas to get more support for them at the language level. Please let me know what you think of the proposed ideas, or if you see another path!

Motivation

Case classes are often preferred to simple classes by Scala developers to model data structures. However, case classes have a drawback: they can’t evolve in a binary compatible way (we can’t add or remove optional fields, or fields with a default value).

For instance, consider the following case class definition:

case class User(name: String, age: Int)

Developers can write programs that use User, as follows:

val julien = User("Julien", 36)
val sebastien = User("Sébastien", 32)
assert(julien != sebastien)
val updatedJulien = julien.copy(age = julien.age + 1)

Let’s say that the class User is shipped in a library, and that at some point we want to add an optional field email:

case class User(name: String, age: Int, email: Option[String] = None)

This change is not backward binary compatible. The above program will have to be re-compiled with the new version of the class User, even though the change is source compatible! It is not backward binary compatible because the signature of the constructor has changed, and so has the signature of the copy method.
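To see the signature change concretely, one can compare the JVM-level parameter types of the synthesized copy method in both versions. This is only an illustration: the two releases are given the hypothetical names UserV1 and UserV2 so that they can coexist in one file, and the helper copyParamTypes is made up for the example.

```scala
// Two releases of the same data type, renamed (hypothetically) so they can coexist.
case class UserV1(name: String, age: Int)
case class UserV2(name: String, age: Int, email: Option[String] = None)

// Erased parameter types of each public `copy` method, as seen by the JVM.
def copyParamTypes(cls: Class[?]): List[List[String]] =
  cls.getMethods.toList
    .filter(_.getName == "copy")
    .map(_.getParameterTypes.toList.map(_.getName))

// Bytecode compiled against UserV1 calls copy(String, int). UserV2 only offers
// copy(String, int, Option), so that call would fail to link at run time.
```

The same reasoning applies to the constructor: old bytecode refers to a `(String, int)` constructor that the new class no longer has.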

However, there are ways to add an optional field to a data type without breaking the binary compatibility.

Indeed, Scala developers have been using the following techniques:

code generation (e.g., contraband), which requires a build tool with a specific setup, which is sometimes not supported well by IDEs, and which makes the code harder to navigate through,
macro annotations (e.g., data-class, scalameta), which are currently holding back the adoption of Scala 3 (IMHO), and generally make the code harder to navigate through since the macro annotations generate code that is not seen in the source files,
manually write simple classes (e.g., Scala.js, endpoints4s), which requires a lot of undesired boilerplate.

I believe there is a need for a middle ground between case classes and simple classes, with some of the features of case classes but without compromising the possibility of making binary compatible evolutions.

The features I would like to retain for data classes are the following:

field accessors
structural implementation of equals and hashCode
structural implementation of toString
Java serialization
lean syntax for creating and copying instances
support for “named field patterns” in match expressions (if it becomes implemented… Let’s put aside this item for now, since it refers to a language feature that does not exist yet)

Status Quo

Currently, to benefit from the aforementioned features on the class User without relying on macros or code generation, developers have to write the following:

  class User(val name: String, val age: Int) extends Serializable:
    private def copy(name: String = name, age: Int = age): User = new User(name, age)
    override def toString(): String = s"User($name, $age)"
    override def equals(that: Any): Boolean =
      that match
        case user: User => user.name == name && user.age == age
        case _ => false
    override def hashCode(): Int =
      37 * (37 * (17 + name.##) + age.##)
    def withName(name: String): User = copy(name = name)
    def withAge(age: Int): User = copy(age = age)

The following snippet illustrates how to construct instances, to copy them, and to compare them:

val alice = User("Alice", 36)
val bob   = User("Bob", 42)
// structural `toString`
println(bob) // "User(Bob, 42)"
// lean syntax for copying instances
val bob2 = bob.withAge(31)
// structural equality
assert(bob2 == User("Bob", 31))

Then, one can publish a new version of the data type User, with an additional (optional) field email. That new version of User is binary compatible with the previous one:

  class User private (val name: String, val age: Int, val email: Option[String]) extends Serializable:
    def this(name: String, age: Int) = this(name, age, None) // public constructor that matches the signature of the previous primary constructor
    private def copy(name: String = name, age: Int = age, email: Option[String] = email): User = new User(name, age, email)
    override def toString(): String = s"User($name, $age, $email)"
    override def equals(that: Any): Boolean =
      that match
        case user: User => user.name == name && user.age == age && user.email == email
        case _ => false
    override def hashCode(): Int =
      37 * (37 * (37 * (17 + name.##) + age.##) + email.##)
    def withName(name: String): User = copy(name = name)
    def withAge(age: Int): User = copy(age = age)
    def withEmail(email: Option[String]): User = copy(email = email)

So, for every added field, we have to remember to update the implementation of toString, copy, equals, and hashCode.
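One way to shrink (but not eliminate) this bookkeeping by hand is to route equals, hashCode and toString through a single private tuple of the fields, so that adding a field touches that one line plus the constructor, copy and withXxx methods. This is a sketch of a common manual idiom, not something proposed in this thread; the helper name `fields` is hypothetical, and the resulting toString format differs slightly from the one above.

```scala
class User private (val name: String, val age: Int, val email: Option[String]) extends Serializable:
  // Public constructor matching the signature of the previous primary constructor.
  def this(name: String, age: Int) = this(name, age, None)

  // Single place listing the fields: equals, hashCode and toString derive from it.
  private def fields: (String, Int, Option[String]) = (name, age, email)

  private def copy(name: String = name, age: Int = age, email: Option[String] = email): User =
    new User(name, age, email)

  override def equals(that: Any): Boolean = that match
    case user: User => user.fields == fields
    case _          => false
  override def hashCode(): Int = fields.##
  override def toString(): String = s"User$fields"

  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)
  def withEmail(email: Option[String]): User = copy(email = email)
```

Because `private` in Scala is class-private, `user.fields` is accessible on the other instance inside equals.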

The problem statement is: how to keep supporting the use-case of defining a data structure that can evolve in a binary compatible way, while significantly reducing the associated burden?

After some internal discussions, I saw the following possible solutions, which are detailed further in the next sections.

The first approach would be to introduce a new kind of class definition that would support exactly this use-case. Developers would write data class definitions, which, like case classes, would expand to serializable class definitions with structural equality and public field accessors, but unlike case classes would have synthetic methods like withName and withAge to transform instances (no public copy method), and would have a mechanism to ensure that the public constructor remains backward binary compatible over time. That approach would require the least effort from end-developers, but it raises some technical challenges (how do we manage the compatibility of the public constructor?), and the specification of the desugaring of data classes would be more complex than that of the alternative approaches.

The second approach would be to focus on the more general use case of defining “structural” data types. Developers would write class definitions that would expand to serializable class definitions with structural equality and public field accessors, but nothing more. Such structural classes could be used to support our main use-case by manually adding transformation methods like withName and withAge.

The last approach would be to build on the existing case class feature, which already does exactly what we want when we define the primary constructor to be private, except that it also defines a public extractor that would break backward compatibility if the class evolves. Thus, the last approach would be to change the semantics of case classes with private constructors to also make their extractor private. This approach is the most “conservative” one in the sense that it does not introduce a new language feature.

The next sections discuss the proposed approaches in more detail.

Fully Fledged Data Classes

In this approach, our User class definition would look like the following:

data class User(name: String, age: Int)

The compiler would expand it to:

  class User(val name: String, val age: Int) extends Serializable:
    private def copy(name: String = name, age: Int = age): User = new User(name, age)
    override def toString(): String = s"User($name, $age)"
    override def equals(that: Any): Boolean =
      that match
        case user: User => user.name == name && user.age == age
        case _ => false
    override def hashCode(): Int =
      37 * (37 * (17 + name.##) + age.##)
    def withName(name: String): User = copy(name = name)
    def withAge(age: Int): User = copy(age = age)

The desugaring would be exactly what we would write manually with plain classes: the compiler would implement structural equality and toString, it would define public accessors for the fields, and it would define public transformation methods (withName and withAge).

The main challenge is to deal with the binary compatibility of the class constructor if we want to publish a new version of User with new fields with default values. We need to find a way to tell the compiler what was the type signature of the previous version of the User data type.

One possibility would be to handle data class fields with a default value in a special way. For instance, if a developer writes a new version of User that includes the optional email, they would write the following:

data class User(name: String, age: Int, email: Option[String] = None)

And the compiler would desugar it to the following:

class User private (val name: String, val age: Int, val email: Option[String]):
  // public constructor that calls the primary (private) constructor
  def this(name: String, age: Int) = this(name, age, None)
  // ... then, just like the above desugaring

A data class that has fields with default values would have a private primary constructor and a public secondary constructor taking as parameters only the fields that don’t take default values, and calling the primary constructor with the default values for the remaining parameters.
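To illustrate the rule, here is a hand-written sketch of what a hypothetical later version with two defaulted fields, `data class User(name: String, age: Int, email: Option[String] = None, admin: Boolean = false)`, might expand to. The `admin` field is an assumption added for the example, and equals, hashCode and toString are omitted for brevity.

```scala
class User private (val name: String, val age: Int, val email: Option[String], val admin: Boolean):
  // The only public constructor takes the fields without defaults, so its
  // signature stays (String, Int) however many defaulted fields are added.
  def this(name: String, age: Int) = this(name, age, None, false)

  private def copy(
      name: String = name,
      age: Int = age,
      email: Option[String] = email,
      admin: Boolean = admin
  ): User = new User(name, age, email, admin)

  def withEmail(email: Option[String]): User = copy(email = email)
  def withAdmin(admin: Boolean): User = copy(admin = admin)
```

A consequence of this rule is that non-default values for the defaulted fields can only be set through the withXxx methods.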

That mechanism would allow developers to introduce new fields, but not to remove optional fields. To support this use case, developers would have to manually re-introduce the accessor of the removed field to return the previously defined default value, and to manually re-introduce the withXxx transformation method. In the case of email, this would look like the following:

// Removal of `email` field
data class User(name: String, age: Int):
  def email: Option[String] = None
  def withEmail(email: Option[String]): User = this

Structural Classes

A simpler approach (from the perspective of the language design) would be to focus on the more general use case of defining “structural” data types. That is, type definitions that support structural equality and toString.

The language would support the concept of structural class definitions, which would provide half of the features of case class definitions:

structural class User(name: String, age: Int)

This would desugar to:

  class User(val name: String, val age: Int) extends Serializable:
    def copy(name: String = name, age: Int = age): User = new User(name, age)
    override def toString(): String = s"User($name, $age)"
    override def equals(that: Any): Boolean =
      that match
        case user: User => user.name == name && user.age == age
        case _ => false
    override def hashCode(): Int =
      37 * (37 * (17 + name.##) + age.##)

Structural classes would be very similar to case classes. The main difference is that they would not synthesize an extractor (an unapply method in the companion), meaning that we could not use “constructor patterns” on instances of structural classes. Other differences are that they would not extend Product and CanEqual, but that point is open to discussion, see below.
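Without a synthesized unapply, instances can still be matched with a type pattern combined with guards over the public accessors. In this sketch a plain class stands in for the (hypothetical) `structural class User(name: String, age: Int)`, and the `describe` function is made up for the example.

```scala
// Stand-in for a structural class: public accessors, no `unapply` in the companion.
class User(val name: String, val age: Int)

def describe(x: Any): String = x match
  // `case User(name, age)` would not compile here, since there is no extractor;
  // a type pattern plus accessors still works, and keeps working if fields are added.
  case user: User if user.age >= 18 => s"adult ${user.name}"
  case user: User                   => s"minor ${user.name}"
  case _                            => "not a user"
```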

To define a data type that can evolve in a backward compatible way, developers could use a structural class with a private primary constructor, add transformation methods, and add a public “smart constructor”:

structural class User private (name: String, age: Int):
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)

object User:
  def apply(name: String, age: Int): User = new User(name, age)

Note that the visibility of the copy method would be the same as the visibility of the primary constructor, private. (This is already the case, currently, with case classes.)

A backward binary compatible version of User with an optional email field could be defined as follows:

structural class User private (name: String, age: Int, email: Option[String]):
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)
  def withEmail(email: Option[String]): User = copy(email = email)

object User:
  def apply(name: String, age: Int): User = new User(name, age, None)

Note that the public constructor (the apply method in the companion object User) has the same signature as before, but it now provides a default None value for the email field.

This solution is more verbose than the previous one because of the explicit definitions of the transformation methods withName, withAge, and withEmail. However, the fact that transformation methods are defined explicitly also provides more flexibility. For instance, one could define more specific transformation methods for optional fields or collection fields:

def withEmail(email: String): User = copy(email = Some(email))
def withoutEmail: User = copy(email = None)
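Putting these pieces together, the following is a hand-written approximation of what the second version of the structural class might desugar to, including the field-specific transformation methods. It is a sketch under the assumption that the desugaring matches the one shown earlier; the hashCode here hashes a tuple of the fields instead of using the explicit formula above.

```scala
// Approximation of `structural class User private (name: String, age: Int, email: Option[String])`.
class User private (val name: String, val age: Int, val email: Option[String]) extends Serializable:
  private def copy(name: String = name, age: Int = age, email: Option[String] = email): User =
    new User(name, age, email)
  override def toString(): String = s"User($name, $age, $email)"
  override def equals(that: Any): Boolean = that match
    case user: User => user.name == name && user.age == age && user.email == email
    case _          => false
  override def hashCode(): Int = (name, age, email).##

  // Hand-written transformation methods, including field-specific ones.
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)
  def withEmail(email: String): User = copy(email = Some(email))
  def withoutEmail: User = copy(email = None)

object User:
  // Smart constructor: same signature as in the previous version of User.
  def apply(name: String, age: Int): User = new User(name, age, None)
```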

A challenge raised by this solution is that we need a way to make the private constructor effectively private at the bytecode-level. Indeed, since it is actually called from the companion object, it can’t really be private at the bytecode-level. At least, this is currently not the case for case classes with private constructors (see below). I see several possible solutions to this problem. The first solution was proposed by @smarter and consists of emitting the constructor as ACC_SYNTHETIC to make it effectively invisible from Java. Another solution could be to define the first “version” of User with a public constructor (and copy method), and then make them private in the second version only:

// v1
structural class User(name: String, age: Int)
// v2
structural class User private (name: String, age: Int, email: Option[String]):
  // re-introduce the old public constructor and copy method, for compatibility
  def this(name: String, age: Int) = this(name, age, None)
  def copy(name: String = name, age: Int = age): User = copy(name = name, age = age, email = email) // use the generated private copy method

In this version, the private constructor is really private because it is not called from the companion.

We might consider alternative keywords instead of structural. Maybe product would be a good one (and in such a case, the class may also extend Product, see also the discussion point below). Or data.

One thing that I like about structural classes is that they can be seen as an intermediate step between plain classes and case classes. Indeed, case classes are structural classes with an extractor. And structural classes are plain classes with structural implementation of toString, equals, and hashCode, and a copy method.

Case Classes with Private Constructors

As described in the previous section, the main difference between “structural” classes and case classes would be that structural classes would not have an unapply method in their companion. It made me think that maybe case classes alone would be enough to support our use-case. Indeed, if we changed the semantics of case classes with private constructors to also have a private unapply method (as is already the case for their apply method), then we would not even need to introduce the concept of structural classes to the language. We could just use case classes with private constructors to support our use-case.

Our running example rewritten with a case class with a private constructor would look very similar to the structural class with private constructor:

case class User private (name: String, age: Int):
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)
  
object User:
  def apply(name: String, age: Int): User = new User(name, age)

We could then define a new version of User with an additional optional field as follows:

case class User private (name: String, age: Int, email: Option[String]):
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)
  def withEmail(email: Option[String]): User = copy(email = email)
  
object User:
  def apply(name: String, age: Int): User = new User(name, age, None)

Currently, this new version of User is not backward binary compatible with the previous one for two reasons. First, the private constructor is not really private at the bytecode-level, see the discussion point in the previous section. Second, the compiler emits a public unapply extractor that allows users to write code like user match { case User(name, age) => ... }, which would crash on the new version of User.

So, the main question about this design is “should case classes with a private constructor also have a private extractor?”. The answer may not be obvious. Maybe there is a real need for defining data structures that need a controlled way to be constructed, but that are fine to be pattern matched on?

In any case, if we decide to now change the compiler to emit private unapply methods when the primary constructor is private, it would still be possible for users who want a public unapply to define it explicitly:

def unapply(user: User): User = user

That would also allow programs compiled with the new version of the compiler to be compatible with what the old version of the compiler used to produce.

Another argument is that the purpose of the case keyword is to enable pattern matching. It would look weird to define something with the syntax case class that does not support pattern matching.

Open Questions

Should data classes and structural classes also implement Product? I would say yes, but I haven’t thought about it further.

Should the compiler synthesize “generic” Mirrors for them, like it does with case classes? Maybe, but only the fields that don’t have a default value should be mirrored.

bmeesters · April 20, 2022, 8:51am

I am very much in favor of doing something in this area since all current solutions are either inadequate (much boilerplate) or complicated and not portable (macros, codegen). I am not sure which of the proposed changes is best. But I would like to go for a solution with minimal (preferably no) syntax changes. I think that needing to explain the differences between structural/data class and case class makes the language more complicated (though both names might actually be better than case class…).

If the private constructor option is not desirable, I think I would rather go for mixing in some marker trait in normal classes or case classes that influences the bytecode, instead of introducing new keywords. I think the use case is important, but the amount of library code that needs it is small compared to the amount of application code that does not need it.

Jasper-M · April 20, 2022, 12:30pm

IMO a private constructor restricts how objects can be constructed. I don’t see a reason why it would affect destructuring / pattern matching.

julienrf:

In any case, if we decide to now change the compiler to emit private unapply methods when the primary constructor is private, it would still be possible for users who want a public unapply to define it explicitly:
def unapply(user: User): User = user

You could turn it around and if you want an evolvable case class define the private unapply method explicitly:

private def unapply(user: User): User = user

This is the most minimal solution and it already works today. Not extremely practical, but luckily in Scala 3 you don’t have to remember to update the unapply method if you adapt the case class.
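A sketch of that trick applied to the running example, assuming (as stated above) that an explicitly defined unapply in the companion suppresses the synthesized public one. Structural equality, toString and copy are still provided by the case class.

```scala
case class User private (name: String, age: Int):
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)

object User:
  def apply(name: String, age: Int): User = new User(name, age)
  // Explicitly defined and private, so no public extractor is exposed:
  // `case User(n, a)` outside this object no longer compiles.
  private def unapply(user: User): User = user
```

Clients can still match with type patterns such as `case u: User => u.name`.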

Also, are you sure that pattern matching is not backwards binary compatible? Doesn’t user match { case User(name, 32) => ... } translate to something like this?

val x0 = User.unapply(user)
if (x0 != null && x0._2 == 32) {
  val name = x0._1
  ...
}
else throw new MatchError

Shouldn’t that still work if User is recompiled with an extra field?
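For reference, in Scala 3 (unlike Scala 2, where unapply returns an Option of a tuple) the synthesized unapply of a case class returns the instance itself, and a constructor pattern reads the fields through the `_1`, `_2`, ... accessors. The sketch below hand-expands a match in that style; `matches32` is a hypothetical helper and the snippet only shows the mechanism, not a verdict on binary compatibility.

```scala
case class User(name: String, age: Int)

// Roughly what `user match { case User(name, 32) => ... }` does in Scala 3:
// call the extractor, then read the fields through `_1` and `_2`.
def matches32(user: User): Option[String] =
  val x0 = User.unapply(user) // returns `user` itself
  if x0._2 == 32 then Some(x0._1) else None
```

Adding a third field would keep `_1` and `_2` with the same types, which is what the question above relies on.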

arkban · April 20, 2022, 2:14pm

Personally I’d love to see the structural keyword as a stand-alone addition to simplify creating abstract base classes for hierarchies of case classes.

rssh · April 21, 2022, 4:55am

Imagine the fun somebody will have explaining to novice Scala programmers why the word ‘structural’ in Scala has nothing to do with structural typing (which is denoted by the Selectable trait, not a keyword).

sjrd · April 27, 2022, 2:51pm

This is definitely an important problem. As demonstrated in the original post, it is significant enough that several library authors have found various solutions, all to provide what is essentially the same public API. Putting a good solution directly in the language would definitely be an improvement.

I believe the analysis of the 3 possible solutions is pretty good. I will add a few considerations.

About `Product` and `Mirror`s

It is clear that data classes must not receive Mirrors. A Mirror statically exposes a type member with the Tuple of its element types. That type alias would change from one tuple type to another when adding a field to the data class, breaking the binary API of anyone using said mirror. So that is not an option.
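The type member in question can be observed on the mirror that the compiler synthesizes for a case class. In this sketch, `MirrorDemo` is a made-up wrapper, and the `=:=` evidence only compiles because the element types are exactly `(String, Int)`.

```scala
import scala.deriving.Mirror

case class User(name: String, age: Int)

object MirrorDemo:
  // The synthesized mirror exposes the field types as a tuple type member.
  val m = summon[Mirror.ProductOf[User]]
  // Compiles only while MirroredElemTypes is (String, Int); adding a field to
  // User would change this type, and with it the API seen by mirror users.
  val elemTypes = summon[m.MirroredElemTypes =:= (String, Int)]
```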

Mirroring only the fields without a default value doesn’t make sense either. Mirror-based equality would be wrong. Mirror-based serialization would not round-trip. Etc.

I don’t have a clear argument against extending Product. I would err on the side of not extending Product, because IMO a Product is supposed to have a fixed, determined number of elements. It seems weird to me that, as we add fields to a data class, it won’t be a Product of the same number of elements. But there’s nothing that fundamentally goes against it either. I think we should see concrete use cases to be able to decide this.

`case class`es with `private` constructors

I think this solution has several significant issues.

First, it does constitute a backward source breaking change. It changes the meaning of source programs that already have case classes with private constructors, since they won’t define an unapply anymore. In addition, as I explained above, data classes must not have Mirrors. That is a second breakage compared to the status quo. While these things can be recovered by writing them out explicitly at the time we upgrade the compiler, it is a significant burden, and definitely does not play in this solution’s favor.

Second, the original post argues that

the main difference between “structural” classes and case classes would be that structural classes would not have an unapply method in their companion. It made me think that maybe case classes alone would be enough to support our use-case.

As explained above, there is at least one additional strong difference, namely the lack of Mirrors. Perhaps also the lack of extending Product. At this point, it becomes quite hard to justify how the private-ness of the constructor has an influence on so many different, unrelated things.

Third, I am convinced that we should reserve the concept of case classes for the things that we are going to use in case clauses of pattern matching. It doesn’t make sense to me any other way.

Fully Fledged Data Classes

While I want to like this solution the most, there is at least one awkward problem with it. The compiler has to come up with the name of the withX methods. While it seems obvious to create it as "with" + fieldName.head.toUpperCase + fieldName.tail, there are at least two problems with that:

If the field name is symbolic, like %, the generated name with% would be illegal. It would have to be with_% instead, but that becomes weird.
More annoyingly, if the first “word” of the field name is actually an acronym, like httpHeaders, the generated name withHttpHeaders would clash with coding styles that recommend acronyms to be all caps. In such a coding style, the proper name would be withHTTPHeaders. There is no precedent in the Scala language for desugarings that, by necessity, impose such choices.
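Both problems follow directly from the naive naming rule, sketched here as a hypothetical helper `withMethodName`:

```scala
// Naive withX naming: "with" followed by the field name with its first
// character upper-cased (StringOps.capitalize).
def withMethodName(fieldName: String): String = "with" + fieldName.capitalize
```

For `httpHeaders` it yields `withHttpHeaders` rather than `withHTTPHeaders`, and for `%` it yields `with%`, which is not a legal Scala identifier (`with_%` would be).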

Structural Classes

Because of the above, I believe this to be the best solution (defining syntax notwithstanding).

I would suggest some amendments to the proposed spec, however.

First, I think the copy method should always be private, irrespective of the visibility of the constructor. There will be many good use cases for having a public constructor that accepts all the fields. Even when adding fields, the new primary constructor could be public to allow users to create an instance with all the new fields specified, while the old signature would be preserved as a secondary constructor. If we make the copy method public in those cases, we are preventing this use case, and additionally opening again the binary compatibility risks that case classes pose today. This does not seem wise to me.
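A hand-written sketch of that shape, using a Boolean `admin` field as the newly added field (an assumption for the example): the full constructor is public, the old signature survives as a secondary constructor, and copy stays private.

```scala
// Version 2 of a data type that gained an `admin` field.
class User(val name: String, val email: String, val admin: Boolean) extends Serializable:
  // Secondary constructor preserving the version-1 signature.
  def this(name: String, email: String) = this(name, email, false)

  // Always private: its signature changes whenever a field is added.
  private def copy(name: String = name, email: String = email, admin: Boolean = admin): User =
    new User(name, email, admin)

  def withName(name: String): User = copy(name = name)
  def withEmail(email: String): User = copy(email = email)
  def withAdmin(admin: Boolean): User = copy(admin = admin)

  override def equals(that: Any): Boolean = that match
    case user: User => user.name == name && user.email == email && user.admin == admin
    case _          => false
  override def hashCode(): Int = (name, email, admin).##
  override def toString(): String = s"User($name, $email, $admin)"
```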

A challenge raised by this solution is that we need a way to make the private constructor effectively private at the bytecode-level.

While I wouldn’t mind having a solution to this issue, I certainly don’t think it is specific to the “Structural Classes” design, nor that it is blocking. There are plenty of cases where things not accessible to Scala become accessible to Java, and they are often ignored for the purposes of binary compatibility, on the grounds that Java consumers should be dealing with those problems, not Scala consumers.

About the motivating example

I think the motivating example should add a field whose type does not need to be an Option for the example to make sense. Sure, fields that are Options (often) have a meaningful default value, which is None. But those are not the only cases. In fact, most of the time, the fields I add in my hand-written data classes are not Options. They just have a meaningful default value that corresponds to the previous semantics.

In this case, I would suggest adding a field admin: Boolean. Its default value would be false. This assumes that the previous system had no notion of admin users, so all existing users are necessarily non-admins. Newly created users may be admin or not, though.

It’s better than an Option, but still a bit misleading because the default value is the obvious “zero” value of the type. But that’s probably fine.

Speaking of default values

The original post says the following about default values in the section “Fully Fledged Data Classes”, although it would equally apply to the other solutions:

One possibility would be to handle data class fields with a default value in a special way.

I think this would be a mistake. The proposed desugaring is at odds with the desugaring of default values everywhere else in the compiler. Since the changes are user-visible (an overload is not the same thing as one method with default params), that will necessarily have to complicate the spec. Using the same desugaring as the rest of the compiler is not an option, since that is known to pose problems of binary compatibility when evolving the APIs.
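The difference between the two desugarings is visible in the compiled code. In this sketch (the objects and the `methodNames` helper are hypothetical), a method with a default parameter compiles to one method plus a synthetic default getter, whereas an overload compiles to two methods with distinct JVM signatures.

```scala
object WithDefault:
  // One method; the default value lives in a separate synthetic getter.
  def greet(name: String, punctuation: String = "!"): String = name + punctuation

object WithOverload:
  // Two distinct methods, each with its own JVM signature.
  def greet(name: String): String = greet(name, "!")
  def greet(name: String, punctuation: String): String = name + punctuation

// Names of the declared methods related to `greet`, as seen through reflection.
def methodNames(obj: AnyRef): List[String] =
  obj.getClass.getDeclaredMethods.toList.map(_.getName).filter(_.startsWith("greet")).sorted
```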

Therefore, I would stay well clear of assigning any specific meaning to default values in data classes. Providing constructors with the old signatures must remain the sole responsibility of the library maintainer.

smarter · April 27, 2022, 4:10pm

If both copy and withX lead to issues, perhaps we could rely on syntactic sugar instead of code generation:

struct class User(name: String, email: String)
val user = User("bob", "bob@example.com")

user.with(name = "Bob") // equivalent to:
User(name = "Bob", email = user.email)

More generally given

class C[T](val x_1: T_1, ..., val x_N: T_N) (or something equivalent, like a struct or case class) and

p: C[S] (with the uniqueness condition: if p: D[U] is also valid then C[S] <: D[U]), then

p.with(x_i = e_i) would desugar into

C(x_1 = p.x_1, ..., x_i = e_i, ..., x_N = p.x_N)
(and it’s easy to generalize that to allow multiple arguments).

sjrd · April 27, 2022, 4:18pm

That creates a binary and TASTy dependency from the call site to the particular current shape of the primary constructor, even if that constructor is private. That’s not acceptable.

smarter · April 27, 2022, 4:28pm

How so? That desugaring would be done in typer and should be followed by another desugaring to replace C(...) by either C.apply(...) or new C(...), if the result typechecks then we’re not depending on any private API (we could also directly generate calls to C.apply(...)). I guess it can be confusing in that changing a private constructor and keeping everything else the same could be a source-breaking change for callers. That could be avoided by restricting this feature to classes where there is always an apply method that matches the constructor (so case classes and struct classes presumably).

sjrd · April 27, 2022, 4:34pm

Actually, it’s much worse than that. It’s also problematic for completely public constructors, in a more insidious way.

If you start with

struct class User(name: String, email: String)
val user: User = ???

user.with(name = "Bob") // equivalent to:
User(name = "Bob", email = user.email)

and then you add an admin field like so:

struct class User(name: String, email: String, admin: Boolean):
  def this(name: String, email: String) = this(name, email, false)

then the already generated code

User(name = "Bob", email = user.email)

will incorrectly reset admin = false, instead of copying it from user.admin.
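The problem can be reproduced with plain classes. In this sketch a regular class stands in for the hypothetical `struct class`, and `renameCompiledAgainstV1` is a made-up name for the code that `user.with(name = ...)` would have expanded to when compiled against the first version.

```scala
// Version 2 of User, after adding `admin`.
class User(val name: String, val email: String, val admin: Boolean):
  def this(name: String, email: String) = this(name, email, false)

// Expansion produced against version 1, where the constructor was (name, email):
// it now resolves to the secondary constructor, which sets admin = false.
def renameCompiledAgainstV1(user: User, newName: String): User =
  new User(name = newName, email = user.email)
```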

smarter · April 27, 2022, 4:41pm

Good point, I withdraw my proposal then.
The other option I see is having struct classes generate overloaded copy methods so adding a field just adds an overload, but our current default parameter scheme doesn’t support overloads with defaults so that would require more special-casing.

morgen-peschke · April 27, 2022, 5:01pm

Not sure how feasible this would be for highly performance sensitive code, however one way around the copy issue is to have a method which takes and applies a bunch of lenses. ScalaPB goes this route, and it provides a nice alternative to copy that remains compatible when new fields are added.
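A minimal sketch of the idea, assuming a hand-rolled lens type: the `Lens` class and `update` method below are hypothetical and are not ScalaPB's API. The point is that `update` has a signature independent of the fields, and adding a field only adds a new lens.

```scala
// Minimal lens: a getter plus a setter that returns a new value.
final case class Lens[S, A](get: S => A, set: (S, A) => S):
  def :=(value: A): S => S = s => set(s, value)
  def modify(f: A => A): S => S = s => set(s, f(get(s)))

final class User private (val name: String, val age: Int):
  // Private, so its signature can change when fields are added.
  private def copy(name: String = name, age: Int = age): User = new User(name, age)
  // Applies a sequence of updates; this signature does not depend on the fields.
  def update(fs: (User => User)*): User = fs.foldLeft(this)((user, f) => f(user))

object User:
  def apply(name: String, age: Int): User = new User(name, age)
  // One lens per field (the companion may call the private copy).
  val name: Lens[User, String] = Lens(_.name, (user, n) => user.copy(name = n))
  val age: Lens[User, Int] = Lens(_.age, (user, a) => user.copy(age = a))
```

For example, `User("Bob", 42).update(User.name := "Alice", User.age.modify(_ + 1))` yields a user named Alice aged 43. Each update allocates a new instance, which is where the performance concern comes from.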

bishabosha · April 28, 2022, 7:52am

Could this be solved by desugaring after reading from TASTy? (The primary constructor and fields would be known at this point.)

sjrd · April 28, 2022, 8:06am

It would work for TASTy compatibility, but not for binary compatibility. And we’re still a long way away from an ecosystem that is exclusively built on TASTy compatibility.

lihaoyi · April 29, 2022, 1:50am

IMO if we want to make adding optional/default-valued params binary-compatible, we should make sure to do so consistently across:

Case class constructors
Normal class constructors
Case class .copy methods
Normal methods

These are all logically the same thing: method calls that can take a combination of positional and/or named arguments, some of which have default values and are optional. Having special-case syntax for some, but not others, is a recipe for confusion.

Here’s one option I haven’t seen brought up:

We limit binary-compatible modification to only adding new parameters with default values on the right.
We add an opt-in annotation, that automatically generates telescoping forwarders of the method, one for each default parameter.

That is to say, we have this:

@telescoping
def foo(a: Int, b: String = "", c: Boolean = true) = ???

expand into:

@synthetic def foo(a: Int) = foo(a, $defaultBlahBlah(), $defaultBlahBlah())
@synthetic def foo(a: Int, b: String) = foo(a, b, $defaultBlahBlah())
def foo(a: Int, b: String = "", c: Boolean = true) = ???

The telescoping methods could be flagged as @synthetic or something so they don’t affect typechecking and are ignored by the Scala compiler. Their sole purpose is to provide something for third-party code compiled against an older version of the method to link against.
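Written by hand today, the forwarders that such a (hypothetical) `@telescoping` annotation would generate look like the overloads below. Unlike the proposed @synthetic ones, these are visible to source code too, so this sketch drops the default arguments and spells the default values out in the forwarders; `Api` is a made-up name.

```scala
object Api:
  // Current signature, with all parameters.
  def foo(a: Int, b: String, c: Boolean): String = s"foo($a, $b, $c)"
  // One forwarder per trailing default, so callers compiled against
  // older releases (foo(Int) or foo(Int, String)) still link.
  def foo(a: Int, b: String): String = foo(a, b, true)
  def foo(a: Int): String = foo(a, "", true)
```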

This can apply to case classes as well, with telescoping def this constructors and telescoping .copy methods.

@telescoping
case class User(name: String, age: Int, email: Option[String] = None)

expands into:

case class User(name: String, age: Int, email: Option[String] = None) {
  def this(name: String, age: Int) = this(name, age, $defaultBlahBlah())

  @synthetic def copy(name: String, age: Int) = copy(name, age, $defaultBlahBlah())
  def copy(name: String = this.name, age: Int = this.age, email: Option[String] = this.email) = new User(name, age, email)
}

Telescoping method definitions and constructors are already the de facto way people evolve things in a binary-compatible manner, not just in Scala but in Java as well. Their limitation (only adding new parameters with default values on the right) is widely understood. It seems like something we could automate for convenience without needing to come up with an entirely novel encoding.
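Written out by hand, Java-style, the telescoping pattern looks like the following sketch. The User class and its greeting method are hypothetical; the proposed annotation would generate forwarders like these instead of requiring them to be written manually. Each forwarder keeps an old signature alive and fills in the "default" for the new parameter.

```scala
// Hypothetical version 2 of a library class. Version 1 was:
//   class User(val name: String, val age: Int) {
//     def greeting(prefix: String): String = s"$prefix $name"
//   }
class User(val name: String, val age: Int, val email: Option[String]) {
  // Telescoping constructor with the exact v1 signature, so code compiled
  // against v1 still finds a User(String, Int) constructor.
  def this(name: String, age: Int) = this(name, age, None)

  // The method gained a `suffix` parameter in v2...
  def greeting(prefix: String, suffix: String): String = s"$prefix $name$suffix"

  // ...and the old arity is kept as a forwarder that preserves v1 behavior.
  def greeting(prefix: String): String = greeting(prefix, "")
}
```

New parameters can only be appended this way: changing the type or position of an existing parameter still breaks the old signature.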

This doesn’t solve the extractor binary and source compatibility problem (we’d need named-field patterns or something similar for that), but it would let us evolve methods, constructors, and copy in a binary-compatible fashion without overhauling the user experience with withFoo methods.
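A small sketch of the extractor problem, using a hypothetical v2 User case class that gained an optional field. A positional pattern written against the two-field version no longer compiles, while a type pattern that reads fields by name survives the change. This is only a workaround available today, not the named-field patterns mentioned above.

```scala
// Hypothetical version 2 of a case class that gained an optional field.
case class User(name: String, age: Int, email: Option[String] = None)

object Patterns {
  // Written against v1, this positional pattern no longer compiles against v2
  // (wrong number of sub-patterns), so it is left commented out:
  //   def describeV1(u: User): String = u match { case User(n, a) => s"$n ($a)" }

  // Arity-independent alternative: match on the type, then read fields by name.
  def describe(value: Any): String = value match {
    case user: User => s"${user.name} (${user.age})"
    case _          => "unknown"
  }
}
```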

sjrd · April 29, 2022, 9:01am

We would still need something that generates neither an unapply nor Mirrors. So you would still have to combine that with at least one of the approaches in the original post to achieve that.

Note that the forwarders would have to be hidden from source typechecking, but still be available to TASTy retypechecking, for cases where they are used in an inline method. These two things happen in the same phase, so it would probably be much trickier to implement correctly than one may think.
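For comparison, a sketch of the kind of hand-written class the original post refers to: structural equals, hashCode and toString, but no unapply, no public copy, and no Mirror (Scala 3 only synthesizes Mirror instances for case classes and similar). The User class and its withX methods are hypothetical.

```scala
// Hand-written "data class": structural equality without unapply, public
// copy, or Mirror, so fields can be added later without breaking binary
// compatibility.
final class User private (val name: String, val age: Int, val email: Option[String]) {
  // One withX method per field replaces copy; adding a field adds a method.
  def withName(name: String): User = copy(name = name)
  def withAge(age: Int): User = copy(age = age)
  def withEmail(email: Option[String]): User = copy(email = email)

  // Private, so its signature is free to change between versions.
  private def copy(
      name: String = this.name,
      age: Int = this.age,
      email: Option[String] = this.email
  ): User = new User(name, age, email)

  override def equals(other: Any): Boolean = other match {
    case that: User => name == that.name && age == that.age && email == that.email
    case _          => false
  }
  // Consistent with equals: hash the same three fields.
  override def hashCode(): Int = (name, age, email).##
  override def toString: String = s"User($name, $age, $email)"
}

object User {
  // Public factory for the mandatory fields only; optional ones start empty.
  def apply(name: String, age: Int): User = new User(name, age, None)
}
```

This is exactly the boilerplate that the proposals in this thread try to have the compiler generate.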

As the original post showed, none of this is a novel encoding (or public API, actually). The proposed public API is that which is already used, one way or another, by major libraries who expose structural classes that need to evolve in binary-compatible ways.

This is a more severe limitation than one might think. Sure, as long as we’re only truly adding fields, restricting that to be at the end is viable. But there are other kinds of evolution that the proposed schemes allow, and that this would prevent. For example, expanding the set of possible values for a field requires changing its name and type, and providing separate getters and setters for the old name/type. Here is a concrete example where we did this:

Commit in github.com/scala-js/scala-js by sjrd, 12 Apr 2021, 07:29 PM UTC (+760 -180):

Allow to specify target versions of ECMAScript > 2015.

We introduce a new setting ESFeatures.esVersion to select specific ES versions to target for compliance. Values of `ESVersion` are defined from ES 5.1 up to ES 2020.

The setting `useECMAScript2015` is remapped to be the same as `esVersion >= ESVersion.ES2015`. Since it represented two purposes (relying on ES 2015 features, and enabling the ES 2015 semantics of Scala.js language features), we introduce a separate setting `useECMAScript2015Semantics`, and deprecate `useECMAScript2015` in favor of either `esVersion` or `useECMAScript2015Semantics`. We adapt and clean up the linker code to reflect those two purposes.

For now, we do not allow `useECMAScript2015Semantics` to be directly configured. It is always derived from `esVersion >= ESVersion.ES2015`. We do this not to introduce unnecessary configuration for now, but we could make it configurable on its own in the future.

We leverage `esVersion >= ESVersion.ES2017` in one place in the core JS lib. More tests like that could be introduced in the future.

We add `esVersion` in the generated `BuildInfo` for the test suite, and use it to better constrain some of our tests of JS features. The `esVersion` is also made available to user-space libraries through `LinkingInfo.esVersion`. `assumeES6` is deprecated in favor of `esVersion`. We also introduce `LinkingInfo.useECMAScript2015Semantics`, to match the setting from `ESFeatures`.

We enhance `NodeJSEnvForcePolyfills` and our Jenkins scripts to test with various target versions of ECMAScript.

We expanded the set of possible target ES versions, from a single useECMAScript2015: Boolean for ES2015 (true) and ES 5.1 (false), to a full-fledged esVersion: ESVersion with more values. And we provided getters/setters for the old useECMAScript2015.
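A simplified sketch of that kind of evolution, with hypothetical classes loosely modeled on the setting names above (this is not the actual Scala.js code). The new field is esVersion, and the old Boolean getter and setter are kept as deprecated views over it, something "add a new parameter on the right" alone cannot express.

```scala
// Ordered set of target versions; `edition` is the ECMA-262 edition number.
final class ESVersion private (val edition: Int, val name: String)
    extends Ordered[ESVersion] {
  def compare(that: ESVersion): Int = Integer.compare(edition, that.edition)
  override def toString: String = name
}

object ESVersion {
  val ES5_1: ESVersion  = new ESVersion(5, "ECMAScript 5.1")
  val ES2015: ESVersion = new ESVersion(6, "ECMAScript 2015")
  val ES2017: ESVersion = new ESVersion(8, "ECMAScript 2017")
}

final class ESFeatures private (val esVersion: ESVersion) {
  def withESVersion(esVersion: ESVersion): ESFeatures = new ESFeatures(esVersion)

  // Old Boolean getter, kept for compatibility and derived from the new field.
  @deprecated("use esVersion instead", "")
  def useECMAScript2015: Boolean = esVersion >= ESVersion.ES2015

  // Old Boolean setter, mapped onto the new field.
  @deprecated("use withESVersion instead", "")
  def withUseECMAScript2015(use: Boolean): ESFeatures =
    withESVersion(if (use) ESVersion.ES2015 else ESVersion.ES5_1)
}

object ESFeatures {
  val Defaults: ESFeatures = new ESFeatures(ESVersion.ES2015)
}
```

Callers of the old API keep working (with deprecation warnings), while new callers can select any version.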

This is not something we could do by only adding fields, assuming that some copy method was exposed as well. So what should we have done in that situation if we had used your proposed scheme before? We would have had to, again, completely revert to using a custom class, taking care to implement by hand everything that was generated before.

lihaoyi · April 29, 2022, 11:56pm

I don’t disagree that “only add new things with defaults, only on the right” is a major limitation, but I would argue that it is a common limitation in the field of data/schema evolution:

Language-agnostic data serialization frameworks like Protobuf/Avro/Thrift have this as the officially recommended way of evolving the schema (see “Schema evolution in Avro, Protocol Buffers and Thrift” on Martin Kleppmann’s blog).
Scala libraries like uPickle already have this as the way to evolve case classes (the dictionary-based encoding only allows adding new fields with defaults; the tuple-based encoding only allows adding new fields on the right with defaults).
When working with SQL databases, a common recommendation is to only add new columns with defaults during migrations, migrate the application code over, and then asynchronously clean up the old column some time later, rather than altering columns in place (changing their types, etc.).

If we have to draw a line somewhere, this seems like a very reasonable place to draw it. It’s a line already understood and practiced by a wide range of people, with well-known limitations and practices around it.

julienrf · October 20, 2022, 3:47pm

Thank you everyone for this discussion. Based on your comments I have submitted this proposal.

hamnis · October 31, 2022, 9:46pm

As an alternative to the original data-class, I have implemented it as a Scalafix rule here.
