Pre-SIP : Proper Specification for Match Types

Currently, match type reduction is not specified, and its implementation is by nature not specifiable.

This is an issue because match type reduction spans across TASTy files (unlike, for example, type inference or GADTs), which can and will lead to old TASTy files failing to link in future versions of the compiler.

We propose a proper specification for match types, which does not involve type inference.
It is based on baseType computations and subtype tests involving only fully-defined types.
That is future-proof, because baseType and subtyping are defined in the specification of the language.

The proposed specification defines a subset of current match types that are considered legal.
Legal match types use the new, specified reduction rules.

Illegal match types are rejected. This is a breaking change, but the previous behavior can be recovered under -source:3.3.

In exchange, the proposal promises that legal patterns will actually keep working in a backward-compatible way in the future (which is not the case with the current implementation, even for the legal subset).

See the full proposal already in the SIP repo. Its specification-heavy nature made it impractical to write the proposal in this forum without more infrastructure.

Comments welcome.


For how long (how many minor versions) will this flag be available?

The flag itself will remain available “forever”, but the real question is for how long will it still apply the old match type reduction. That I cannot say at this point. Likely as long as maintenance is not actively hampered by it.

That is fair. I’m all for “warn early, deprecate late” when it comes to language features that need to be mitigated or removed. Like with any language feature, who knows how match types are “abused” in the wild. Maybe there are valid use cases that didn’t make the cut, and we can find a better home for them somewhere else or somehow expand upon the legally specified feature.

For once, thanks to the quantitative analysis we conducted, we actually know how they were “abused” in the entire open-source (published on Maven Central) ecosystem. Of course, we never know about closed-source software, nor did we analyze source code available on the net but without any published binaries.

From what I saw in tests (but not in published libraries), the most reasonable use case is to typetest+capture using either an & type (which only ever worked by chance) or a constrained type alias like type IsSeq[t <: Seq[Any]] = t (which is a bit better). This use case has a standard workaround: decompose into two nested match types, one for typetesting and one for capturing.
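As a sketch, the decomposition could look like this (ElemOrSelf and ElemOf are made-up names for illustration, not code from the SIP):

```scala
// Two-step workaround: instead of typetesting and capturing in a single
// case, split the match type into two nested ones.

// Step 1: typetest only -- this case captures no type variable.
type ElemOrSelf[X] = X match
  case Seq[Any] => ElemOf[X]
  case _        => X

// Step 2: capture only, on a scrutinee already known to be a Seq.
type ElemOf[X] = X match
  case Seq[t] => t
```

With this split, ElemOrSelf[List[Int]] reduces to Int, while ElemOrSelf[String] reduces to String.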

If there’s one direction in which to consider expanding the set of legal match types, I would go for a proper solution to this use case. And while doing so, using dedicated syntax for it like case List[t <: Bound] => would be better IMO.


Nice proposal! It reminds me that provablyDisjoint calls TypeOps.refineUsingParent, and the latter has some hacky logic to infer the prefix of a subclass from a parent type.

The complex and hacky machinery there can be avoided if we impose the following restriction on sealed classes/traits:

The subclass symbols of sealed traits/classes must be reachable from the base class symbol without going through C.this types.

Sealed traits/classes can still be defined as inner classes or local classes. But the following pattern will be rejected:

sealed trait Base

class A:
  class X extends Base

class B:
  class Y extends Base

The reason is that the subclasses X and Y are not reachable from Base without going through A.this and B.this.

Conceptually, it makes the specification for sealed simpler: sealed provides a way to reliably partition a type into subtypes. In addition, it avoids the anonymous-class problem which happens with sealed classes/traits. It will also be simpler for exhaustivity checking, typeclass derivation, and match types.

In terms of implementation, registering child types becomes possible, in contrast to registering child symbols as is done now in Dotty. Even if we continue to register symbols, there will be no need to do complex prefix & parameter inference in TypeOps.refineUsingParent; some substitution seems to suffice.

Just some tentative thoughts for discussion.

Thanks for the comments.

I happen to have just updated the proposal with a proposed specification for provablyDisjoint. In it, I make the sealed treatment even simpler than what you propose: it relies only on the relationships between classes, and disregards prefixes entirely. It seems to be largely enough for normal use cases where we rely on disjointness.


If anyone wishes to comment on the SIP, please do so soon (either in this thread or the SIP’s PR). We will discuss it during the upcoming SIP meeting that will take place before the Scala Days event in Madrid, and it will be best if we can sum up community feedback by then.

Keep up the good work @sjrd! Having specs is always good. Thanks! :+1:

I’m most likely not the right person to comment on this as I’m still struggling with match types. (Every time I try to do something with them I get “Note: a match type could not be fully reduced; failed since selector X does not match case ___ => ___ and cannot be shown to be disjoint from it either.” But that’s just me, I’m obviously still lacking some intuition.)

But there is one thing about this spec that I don’t get really, already on the basic conceptual level:

The spec says it is needed as TASTy doesn’t contain the fully resolved typing information. This was quite a “raised eyebrow moment” for me.

My understanding was until now that TASTy is “fully typed”. (Even the name points in this direction.) But it turns out that you need to “reduce match types” (which is equivalent to “compute types”, as this is the very purpose of match types) even when you already have TASTy?

Why can’t the resolved (“reduced”, “computed”) type be included in TASTy? Wouldn’t inserting a synthetic type alias, mapping the name of the match type to the computed type, directly before the place where the (original) match type was applied do the trick? This synthetic type alias could “shadow” the original match type, and simply resolve to the computed type directly in a local scope.

This kind of resolution would not change with different compiler versions reading TASTy as the resolved type would now be included.

Where is the flaw in my reasoning?


That is a very thoughtful comment. I understand why you may come to such a conclusion. After all, putting the resolved types in TASTy is exactly what we do for term-level match expressions (for GADTs, notably), or for type inference. Unfortunately, that idea does not extend to match types (or types in general).

Types are universal. They don’t have an identity, and they don’t have a location in the source code. If you create the “same” type twice in different portions of the codebase (including across separate compilation), the two must behave in the same way. If they don’t, bad things happen (broken subtyping, broken or phantom overriding relationships, etc.). To take a concrete example, you can have List[Int] on the one hand, and List[X] that has been transformed by substituting X := Int on the other hand. These two types, even when coming from separate compilation, must behave the same. In particular, you could take List[X] with X := Int from a library, and also take List[X] from the library but perform the substitution yourself, and you must end up with two types that behave exactly the same.

If we put the resolved (reduced, to be precise) type of a match type into TASTy, we would still not be protected against the last example. Consider a match type MT[X]. If you get an MT[Int] from a library (with its reduced form read from TASTy), and you also take MT[X] from that library and do the substitution X := Int yourself (with your own run of the reduction algorithm), then you must still get two types that always behave the same. If your algorithm for the reduction does not give the same result as the reduced form you read from TASTy, you’ll have a discrepancy, and the two types will not behave the same.
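To make this concrete, here is a sketch (the cases of MT are made up for illustration):

```scala
// A concrete shape for the MT[X] discussed above (cases are made up):
type MT[X] = X match
  case Int    => String
  case String => Int

// The library could store in TASTy that MT[Int] reduces to String here...
def fromLibrary(): MT[Int] = "stored"

// ...but a client can also take MT[X] and perform the substitution
// X := Int itself, e.g. via a generic method. It must then reach the
// same reduction on its own, stored result or not:
def viaSubstitution[X](x: X): MT[X] =
  fromLibrary().asInstanceOf[MT[X]]  // cast only to keep the sketch small

val direct: String = fromLibrary()
val substituted: String = viaSubstitution(1)  // MT[Int] must reduce to String here too
```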

Therefore, putting the reduced form in TASTy might avoid some computations (at the cost of some storage), but it does not remove the requirement to have a fully-specified, reproducible and stable reduction procedure.


A concrete example (thanks @sjrd for the offline discussion): assume a library defines

class C[X](x: X):
  def f: MT[X] = ...
  def g: MT[String] = ...

for some match type MT, and we have client code

class D[X](x: X) extends C[X](x):
  override def f: MT[String] = ...

The signature of g in C is calculated when compiling the library, using a compiler 3.x. The client code is compiled with a different compiler 3.y. If the two match type reductions yield different results, that is a problem.

Another example: for an expression C("").f in the client code, the inferred type should not change when upgrading the compiler version in the project.


Before I say more, I want to clarify once more that I’m very happy that more parts of the compiler get specced. This is completely unrelated to whether the specs are strictly needed. More specs == generally better!

But let’s play devil’s advocate.

Let’s pick up the last example.

Well, actually that’s not a big deal.

TASTy is versioned, like JVM bytecode. The “3.y” compiler would of course just bump the TASTy version when it makes incompatible changes. So TASTy created by “3.x” would just not link against TASTy created by a later version “3.y”. There is no issue. (Besides that you then need to fully compile from source again, of course…)

I understand that bumping the TASTy version is like breaking binary compatibility was before TASTy. So this should not be done lightly. TASTy also has the advantage that you don’t need to break it in case you just want to change code-gen (which of course breaks binary compatibility, but that’s exactly the case TASTy can mitigate).

With the spec, the situation is actually quite similar: in case the spec needs to evolve in the future, you would of course need to bump the TASTy version, and everything is then as described above.

Having the spec helps “to not forget” to bump the TASTy version in case changes affect anything specced there. So things won’t break silently, and that’s very good!

The other point I wanted to comment on: wouldn’t it be much more preferable for users of TASTy if it lived up to its name and actually provided a fully typed view of Scala code? For example, a TASTy interpreter (or some other TASTy-consuming tool) wouldn’t need to reimplement parts of the type-checker. As the computed types are only valid for one TASTy version anyway, and the versioning would catch possibly incompatible implementations interacting / linking, why not put the computed types into TASTy for the convenience of TASTy consumers? It would make consumers much simpler by not requiring them to redo some of the work of the typer phase (which comes before the TASTy-handling compiler middle-end, which could then handle a really fully typed tree; this would even simplify the compiler itself a little bit, I think).

So regarding this SIP, I would ship it by all means. Having some more specs is always good.

But I still think putting the fully reduced types into the typed AST representation would be a good idea.

The spec actually enables this.

Recall that, without the new spec, the reduction of match types was completely subject to the whims of type inference. Type inference is something that changes, and breaks, often. Why? Because every “bug fix” or “improvement” to type inference has a negative impact on some other code. We accept some breakage in (hopefully) rare situations to get improvements in (hopefully) more situations. This we do in minor releases of Scala 3.x. It is fine because, everywhere in the language, such changes only break source compatibility … except for current match types.

If we take your suggestion to its conclusion, we will have to make a hard breakage of TASTy versioning (basically a major language version) every time we change type inference in the slightest. That would be a major step back in terms of compatibility guarantees. I don’t think anyone wants a new major version every 3-6 months.

As I have demonstrated earlier, even if we did store more stuff into TASTy, it would not relieve us from the requirement of a reliable, stable reduction algorithm. So it would really only be convenience in some cases.

Note that “users of TASTy” are basically 2 things: the compiler and tasty-query. Every non-compiler tool that wants to read TASTy should do so through tasty-query. tasty-query implements the spec so that you get all that convenience. In fact, this whole undertaking about match types that led to SIP-56 originated in tasty-query: I was faced with an insurmountable problem, namely that match type reduction was not specifiable, and hence not implementable in tasty-query.

Now, you’ll say: but I wouldn’t have had to implement it at all if it had been stored! Except I still would have. Type substitution and subtyping (but not type inference) are essential to understanding TASTy programs (yes, I’m talking about TASTy as if it were a language, because it is, in a sense). There is no way to compute overriding relationships without type substitution and type equivalence. There are substitutions that you have to come up with at the time someone decides to use the API of tasty-query, and that you cannot predict in advance in the compiler to store into TASTy. So again, we’re back to square one: we need a stable reduction algorithm.
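As a tiny illustration of the overriding point (Box and IntBox are hypothetical names):

```scala
// A minimal sketch: overriding cannot be determined without substitution.
trait Box[X]:
  def get: X

class IntBox extends Box[Int]:
  // To see that this implements Box#get, a TASTy consumer must
  // substitute X := Int in Box's `def get: X` and compare the
  // resulting types. That substitution is not stored in TASTy;
  // it has to be computed by following the spec.
  def get: Int = 1
```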

TASTy does store everything non-specced that the compiler comes up with: overload resolution, type inference, implicit resolution, etc. It does not store anything that can be reliably computed from other things by following the spec. In some cases, I think a few more things could have been stored to make it faster or easier to compute certain things (notably, the Signatures of methods at definition site, or direct symbolic links to everything that’s private :roll_eyes:). But I have not yet encountered anything that would make something possible that is not currently possible.


Just to be sure, what effect would this change have on the performance of match types? I’ve previously had problems with their performance, and I would prefer if this change came at most at a negligible compile-time cost.

I don’t expect any change to their performance, in either direction.

Thanks a lot for this in depth answer. It had some very interesting insights in it! :+1:

Like I said, I’m not opposing this spec. Quite the contrary.

I was mostly thinking out of the perspective of some TASTy consumer, which isn’t the compiler or tasty-query.

My understanding was until now that TASTy is “fully typed”, so you don’t need to do any type-related computations anymore when you have TASTy in hand. This assumption seems wrong, as I understand now. Thanks a lot for clarifying!

Yes, TASTy is a language. And I was thinking about a stand-alone interpreter for it.

My assumption was that this shouldn’t be too difficult as “the hard part” was already done by the compiler, and one could just walk TASTy and directly interpret it without much further effort.

But now I start to see that this won’t be as easy. Good to know! :smile:

Thanks again for your effort, and ship your spec soon!


What about custom error messages for match types?

type MyData[T] <: T = T match
  case Int => T
  case String => T
  case _ => MatchError["My Data only supports Int or String"]

That is an interesting idea, but I feel like it could be a standalone proposal. We already have term-level means to emit custom errors in inline methods. Perhaps we could have a type-level equivalent.
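For reference, the existing term-level mechanism looks roughly like this (myData is a made-up name; a sketch, not a proposal):

```scala
import scala.compiletime.error

// Term-level counterpart of the idea above, using the existing
// scala.compiletime.error facility inside an inline method:
inline def myData[T](inline t: T): T =
  inline t match
    case _: Int    => t
    case _: String => t
    case _         => error("MyData only supports Int or String")
```

Here myData(42) and myData("hi") compile fine, while e.g. myData(true) fails at compile time with the custom message. The question is what a type-level equivalent of error would look like.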

How will this interact with opaque types?

For example, I have a match type defined like this in a project:

opaque type RefinedType[A, C] = ...

trait RefinedTypeOpsImpl[A, C, T]:
  //Utility methods like smart constructors etc...

type RefinedTypeOps[T] = T match
  case RefinedType[a, c] => RefinedTypeOpsImpl[a, c, T]

opaque type Temperature = RefinedType[Int, Positive]
object Temperature extends RefinedTypeOps[Temperature]

This currently compiles but has problems with incremental compilation.

In this example, I would like the RefinedTypeOps[Temperature] to reduce to RefinedTypeOpsImpl[Int, Positive, Temperature]. What will happen according to this SIP?

Was this published? Then it was likely tested.

The SIP says it tested all public libs. (Which is quite impressive actually.)