Scala 3, macro annotations and code generation

smarter · December 12, 2022, 3:01pm

Hi all,

Back in 2018, the post “Macros: the Plan for Scala 3” on the Scala Programming Language blog described what we expected macros, and in particular macro annotations, to eventually look like in Scala 3:

[Macros] will run after the typechecking phase is finished because that is when Tasty trees are generated and consumed. Running macro-expansion after typechecking has many advantages

it is safer and more robust, since everything is fully typed,

it does not affect IDEs, which only run the compiler until typechecking is done,

it offers more potential for incremental compilation and parallelization.

Since we recently merged a first draft of macro annotation support, I wanted to revisit the issue of porting existing macro annotations that expect to run during typechecking because they add new members to classes. For example, take @alexarchambault’s data-class:

Use a @data annotation instead of a case modifier, like
import dataclass.data
@data class Foo(n: Int, s: String)
This annotation adds a number of features, that can also be found in case classes:

sensible equals / hashCode / toString implementations,

apply methods in the companion object for easier creation,

[…] It also adds things that differ from case classes:

add final modifier to the class,

for each field, add a corresponding with method (field count: Int generates
a method withCount(count: Int) returning a new instance of the class with
count updated).

For many years, our answer to “How do I do this in Scala 3?” has been “Use code generation”. But it seems that no popular code generation framework for Scala has emerged during this time (scalagen seemed interesting but was archived).
More recently, we’ve been considering having something like @data built into the language, but there are some concerns that this would bloat the language, and it wouldn’t help with other macro annotations.

Meanwhile, the Scala 3 compiler grew a -rewrite flag which can be used to automatically fix errors. For example,

def f(): Unit = ()
f

does not compile in Scala 3, but if I pass -source 3.0-migration -rewrite to the compiler, the source file will be patched to obtain:

def f(): Unit = ()
f()

Currently, the rewrite mechanism is only used to ease migrations, and it is difficult to trigger since it requires fiddling with compiler flags. But in the future we should be able to expose this better to the outside world so that you can apply rewrites from the comfort of your IDE by clicking a button.

This brings me back to macro annotations: even if we cannot add new definitions visible during typechecking, we could have the macro just check whether the appropriate definitions exist, and emit an error with an appropriate automatic rewrite if they don’t. For example, given

@data class Foo(n: Int, s: String)

Running this code in my IDE should give me a red underline. Clicking on the “fix it” button could then rewrite the code as follows:

@data final class Foo(n: Int, s: String) {
  def withN(n: Int) = data.generated()
  def withS(s: String) = data.generated()

  override def equals(x: Any): Boolean = data.generated()
  override def hashCode: Int = data.generated()
  // ...
}
object Foo {
  def apply(n: Int, s: String): Foo = data.generated()
  // ...
}

where data.generated is an inline def which generates the correct method body depending on the context (we could also generate the actual method body inline, but relying on an intermediate method keeps the amount of generated code to the minimum needed).
If at a later point I decide to add an extra field x to Foo, I would then get a new error, and clicking on the “fix it” button for that error would add the necessary withX method to the class while leaving everything else as-is. While this is more laborious than what was possible with Scala 2 macros, it means that the generated APIs are now easily readable for both humans and computers without having to understand macro code.
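As a rough illustration of what the data.generated() placeholders stand for, here is a hand-written sketch of bodies they might expand to. The concrete implementations below, and the choice to make the fields public vals, are assumptions made for illustration; they are not taken from the prototype or from data-class itself.

```scala
// Hypothetical expansion: each body is what `data.generated()` could plausibly
// produce for this class. Written by hand here so the sketch is self-contained.
final class Foo(val n: Int, val s: String) {
  // One `with` method per field, returning a copy with that field replaced.
  def withN(n: Int): Foo = new Foo(n, s)
  def withS(s: String): Foo = new Foo(n, s)

  // Structural equality over all fields.
  override def equals(x: Any): Boolean = x match {
    case that: Foo => n == that.n && s == that.s
    case _         => false
  }
  // Hash consistent with equals: equal fields give equal hashes.
  override def hashCode: Int = (n, s).hashCode
  override def toString: String = s"Foo($n, $s)"
}

object Foo {
  // Factory method so instances can be created without `new`.
  def apply(n: Int, s: String): Foo = new Foo(n, s)
}
```

With these bodies, Foo(1, "a").withN(2) compares equal to Foo(2, "a"). Adding a field x would mean one more parameter in every constructor call above plus a new withX method, which is the part the proposed “fix it” rewrite would take care of.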

In other words, I’m suggesting we use existing facilities in the compiler to turn it into a code generation tool. This means we wouldn’t have to worry about setting up a separate tool and integrating it into our build pipelines. To ease cross-compilation, existing Scala 2 macro annotations could also be adapted to allow this style (by just not doing anything when detecting methods with the correct signature and body).

The main thing needed to make this practical is a set of convenience methods in the reflection API for doing code generation (this won’t be completely trivial since we’ll have to handle transforming classes with existing definitions, and ideally avoid using fully-qualified names where possible for readability).

Before exploring this further, I’d be interested in hearing from implementers of macro annotations: would you be interested in using this pattern? For example, scio defines some powerful macro annotations for generating full case classes from a schema, which seem like they could fit into this pattern, but I’m not familiar with how they’re used in practice.

Let me know what you think!

bjornregnell · December 12, 2022, 3:44pm

This sounds really useful!

What would it take to implement something like the @data annotation using the cool new stuff? (I’m asking for a simple sketch, if that’s possible, to get a feeling for what an implementation looks like…)

arturopala · December 12, 2022, 3:58pm

Automatic derivation of obvious but laborious code is a powerful tool. In Scala 3, we can now easily derive specific functions (in the form of type classes), but data structures lack similar flexibility in shaping and transforming based on metadata. I believe we should allow annotation macros to expand before typechecking to realize their potential fully.

odersky · December 12, 2022, 9:46pm

I think an approach based on suggested rewrites strikes a nice balance between predictability and convenience:

we still maintain the invariant that every visible symbol has an explicit definition to which we can navigate.
at the same time, we relieve the developer from having to write lots of boilerplate code.

arturopala · December 13, 2022, 6:18am

@smarter how should this codegen scheme work when the macro definition gets updated?

rcano · December 13, 2022, 9:37pm

I think the obvious complaint you’ll get from users is “I already added the annotation indicating what I want to do, why do I have to click a button (which is IDE dependent) to make it do the thing? Also, generated code is boilerplate that I now have to maintain.”

I’m against this as a solution to macro annotations (for whatever my opinion is worth). It is not an improvement over just not having them, especially because of how it binds the whole language to an IDE (it needs one to be remotely practical; without one it’s even worse).
As a concept in general though, about the compiler realizing some common mistakes or being able to ship rewrite rules in libraries that end up making the IDE nudge you in the right direction, I love it! But it really isn’t a solution to macro annotations.

arturopala · December 14, 2022, 11:05am

It would be interesting to work out what kind of code generation we need most.

Automatic derivation of type classes in Scala 3 solves, IMHO, the problem of providing all types of capabilities (functions). The example of the @data macro shows that we need something similar in power for data structures.

As an example, I would like to have the possibility to write a @withOptional and @compose macro:

trait HasId { val id: String}
@withOptional trait HasAmount { val amount: Int}

@compose class StateA extends HasId
@compose class StateB extends HasId with HasAmountOptional
@compose class StateC extends HasId with HasAmount

to produce at the compile time:

trait HasId { val id: String}
trait HasAmount { val amount: Int}
trait HasAmountOptional { val amountOpt: Option[Int]}

case class StateA(id: String) extends HasId
case class StateB(id: String, amountOpt: Option[Int]) extends HasId with HasAmountOptional
case class StateC(id: String, amount: Int) extends HasId with HasAmount

odersky · December 14, 2022, 12:44pm

I believe the way it is intended, a macro annotation can check that a class conforms to a certain schema, but cannot generate definitions that make it conform. So if the macro gets updated, the new check might fail and another action to fix it might be proposed.

It’s similar to @tailrec. No magic, just an assertion that the annotated construct has certain properties.
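To make the comparison concrete, here is a small sketch of how @tailrec behaves as a pure check. The function sumTo is a made-up example; only scala.annotation.tailrec is a real standard library name.

```scala
import scala.annotation.tailrec

// @tailrec generates nothing: it only asserts that the recursive call is in
// tail position, and the compiler reports an error if it is not.
@tailrec
def sumTo(n: Int, acc: Int = 0): Int =
  if n <= 0 then acc
  else sumTo(n - 1, acc + n) // tail call, so the check passes

// By contrast, annotating the following definition with @tailrec would be
// rejected at compile time, because the addition happens after the call:
//   def sumToNaive(n: Int): Int = if n <= 0 then 0 else n + sumToNaive(n - 1)
```

Removing the annotation does not change what sumTo computes, which is the sense in which a checking-only @data would add no members of its own.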

smarter · December 16, 2022, 6:14pm

I’ve prototyped a subset of @data in lampepfl/dotty pull request #16545, “[Proof of Concept] Code generation via rewriting errors in macro annotations”, to show what’s possible with current APIs. Besides exposing -rewrite, the main thing missing is safe code generation, which I assume could be done by taking inspiration from scalameta.

hypatian · December 17, 2022, 3:24am

I’m not ever super keen on the idea of auto-generated code that is added to source files, but I understand the concern here about macros happening during more phases and doing more magical things.

It’s of high importance that anything like this can be done with command-line tools and not just an IDE, and that it can be done in a way that only makes the one specific change asked for and no others.

Some worries: what if someone adds new code or comments to an area of generated code? What are the rules about what happens then? Is it standard or up to each code generator to do it the way they think is “right”?

For example, if someone wants to add to a definition of withX so that it does some additional work along with the default work? Wants to change hashCode to account for known properties of this data?

(My feeling is that if definitions are present in the code, they should be changeable. If they’re not able to be changed, and the implementations are obscured like the above, why are they even there? But if the generated methods have strong correctness requirements that they never be changed or replaced…they still show up here.)

steinybot · April 13, 2023, 9:25pm

Aren’t transparent inline methods expanded during type checking? Does that mean that the original premise about when macros will run is wrong? Does that have any implications for the future macro annotations and whether they will be able to produce definitions that are available to the typer?

I’m not sure what benefit this proposal has over scalafix. I would have thought that, being a rewrite, it would run automatically rather than needing to be triggered from an IDE. Since macro annotations still cannot produce definitions, code generation is still the only solution for those cases.

The way I see it there are two different sources for code that we want to be produced automatically. If it is some external source such as a text file, database schema, GraphQL schema, TypeScript definition, then using code generation is what I would expect. I generate this once before I start writing my code that uses it. Perhaps it updates and I have to regenerate the code but it updates with a different cadence to my user written code. The second source is other user written code. To have to write code, run a code generator that uses that code, then write more code that uses that generated code, is totally unpalatable. Meta-programming has to be the answer to this second case.

While meta-programming has to be the answer to code that is produced from other user written code, that doesn’t necessarily mean that we need it to produce new definitions. If type lambdas were as powerful as TypeScript’s Mapped Types then I think that would solve almost all cases (plus a whole lot more that we need for ScalablyTyped). It should be trivial to define a type lambda for @data and @withOptional and then generate the instance with a macro def or macro annotation.

smarter · April 14, 2023, 1:23pm

Looks like I forgot to answer this post earlier, my apologies.

This would technically be up to each code generator, but the idea would be to have some high-level API in the standard library for macro authors to use to ensure some level of regularity between macros. I think the default should be to emit an error in case of any user-generated code that differs from what the compiler would generate.

I think this could be enabled by adding an escape hatch, similar in spirit to @unchecked.

smarter · April 14, 2023, 1:31pm

You’re absolutely right, transparent inline methods are an intentional escape hatch from the usual restrictions on the interaction of macros and typechecking, but the current idea for macro annotations is still to not widen that escape hatch further to allow API additions.
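For readers unfamiliar with that escape hatch, the following minimal sketch shows a transparent inline method influencing typechecking at the call site. The method name pick and the two vals are made up for illustration.

```scala
// The declared result type is Any, but because the method is `transparent`,
// each call is typed according to the branch selected at compile time.
transparent inline def pick(inline b: Boolean): Any =
  inline if b then 1 else "one"

// These ascriptions only typecheck because the call was expanded while
// typing: a plain `inline def` would leave both calls typed as Any.
val asInt: Int = pick(true)
val asString: String = pick(false)
```

Note that this only refines the type of an expression; it does not add new members to any class, which is the restriction macro annotations are meant to keep.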

TypeScript’s mapped types are definitely interesting, but can you actually do something equivalent to @data using them? It seems like they only allow you to transform member types, not to define new members or even to provide an implementation of existing members.

steinybot · April 19, 2023, 3:32am

You can define a Data type:

interface Foo {
  n: number
  s: string
}

type Data<A> = A & {
  [Property in keyof A as `with${Capitalize<string & Property>}`]: (value: A[Property]) => Data<A>
}

declare const fooData: Data<Foo>

const fooData2 = fooData.withN(123).withS("abc")

If we could do something like this with a type lambda in Scala 3 then that would be amazing. Once we are able to define the type then a blackbox macro could implement it no problem.

hypatian · April 22, 2023, 5:38am

This seems to assume that the compiler is guaranteed to have access to both the old and the new versions of the macro invocation (and possibly both old and new versions of the macro itself) so it can check whether the text is the same as old_macro(old_invocation) before it replaces it with new_macro(new_invocation).

Or do you mean that it would just say “that doesn’t match, you should re-run the macro” and if the user clicks the button any “customizations” are lost?

Which I guess is “fine” because it would never have compiled with customizations anyway. But… again, that leaves me wondering why the un-changeable macro output is included in the first place (rather than being an optional gloss that can be asked for.)

smarter · April 27, 2023, 12:33pm

Yes, that’s what I had in mind.

It serves as documentation for the user and, compared to running user-written macros during typechecking, it limits the ways in which things can go wrong (e.g., cyclic errors because the macro tried to access the type of something which is in the process of being typechecked).

smarter · April 27, 2023, 12:59pm

Thanks for the example! I think the closest analog in Scala 3 would be match types. Right now one can write:

trait Foo {
  val x: Int
}

type HasX[T] = { val x: T }
type WithX[Base, Impl] = Base match
  case HasX[t] => Base { def withX(x: t): Impl }

class FooImpl(val x: Int) extends Foo { // Ideally this would be: WithX[Foo, FooImpl]
  def withX(x: Int): FooImpl = new FooImpl(x) // ... so this can be blackbox-macro-generated
}

val test: WithX[Foo, FooImpl] = new FooImpl(1)

To make this practical we would need at least:

To allow writing class FooImpl(val x: Int) extends WithX[Foo], or more generally, allow extending any refinement type, with the refinements becoming abstract members of the class we’re defining.
To allow matching on the name of variables and not just their types, and to have a mechanism for splicing variable names. That would be much more controversial since there’s no precedent for this outside of quotes. Also, match types are already known to have tricky typing issues (e.g., see lampepfl/dotty pull request #17180, “Refine criterion when to widen types”, by odersky), which we really need to figure out before considering extending them further.

I think to motivate this kind of language change we would need a lot of convincing use cases besides the usual @data-like one.

jxnu-liguobin · June 12, 2023, 3:18am

jeremyrsmith · June 12, 2023, 8:28pm

As a macro fan, wanted to add two cents: IMHO, the strength of macros has always been that they help eliminate boilerplate. They accomplish that by (either directly, for custom macros; or indirectly, in shapeless and magnolia) generating the boilerplate at compile time, from the smallest and/or most idiomatic way to express that boilerplate. For example, writing a case class (a small and idiomatic thing) can get you the boilerplate of a JSON serializer, or a pretty-printer, or what have you. Annotation macros are in a similar vein, but for use cases where the boilerplate can’t be captured in a value and must be a generated definition.

Physically generating the boilerplate and writing it back to the source file is not remotely the same thing. It solves the problem of writing the boilerplate, but that was only a small problem to start with compared to the problem of reading the boilerplate. After all, code is read many more times than it’s written. The reduced form of the boilerplate (e.g. the case class) is much more readable than the expanded boilerplate itself. Evidence for this can be seen in Java projects which use code generation via annotation processors – the generated code is never checked in to source control; it’s instead generated only during the compilation pipeline.

I think the idea of using the rewrite system as a code generation tool is clever (smarter, even), but it won’t find much adoption unless it can be a transparent part of the compilation pipeline (i.e. it does not modify source files, and only modifies the AST that goes to the next compilation phase, which is basically what annotation macros do in Scala 2).

MateuszKowalewski · June 12, 2023, 11:26pm

I think there is a kind of tension here:

On the one hand generated code is “just noise” and usually nothing you would like to maintain manually, or even read in a lot of cases.

On the other hand, having “invisible code” interfere with the rest of your code base is an issue. This was a constant complaint about the old annotation macros… Everything’s cool as long as everything works. But in case it doesn’t, have fun debugging “invisible” code!

I think what Java annotation processors do is mostly sane: They generate code to disk so it can be read and understood. Also the code can be debugged easily this way. But it’s a kind of “don’t touch” code that doesn’t get versioned and shouldn’t be edited manually therefore.

I admit that the “invisible code” problem may be a pure tooling issue. But I’m not sure about that. Tooling would need to have the generated code somewhere anyway (at least in memory, likely even on disk). So in the end there is no big difference from the annotation processor approach either way. Only the internal implementation would differ. But with purely “virtual” code the tooling would need to be quite complex, I guess…

To avoid having a code-gen facility touch or change anything in manually written compilation units, I want to point once more to what C# did in this regard. This seems smart, imho.