Scala 3, macro annotations and code generation

Hi all,

Back in 2018, Macros: the Plan for Scala 3 | The Scala Programming Language described what we expected macros, and in particular macro annotations, to eventually look like in Scala 3:

[Macros] will run after the typechecking phase is finished because that is when Tasty trees are generated and consumed. Running macro-expansion after typechecking has many advantages

  • it is safer and more robust, since everything is fully typed,
  • it does not affect IDEs, which only run the compiler until typechecking is done,
  • it offers more potential for incremental compilation and parallelization.

Since we recently merged a first draft of macro annotation support, I wanted to revisit the issue of porting existing macro annotations that expect to run during typechecking because they add new members to classes. For example, take @alexarchambault’s data-class:

Use a @data annotation instead of a case modifier, like

import dataclass.data
@data class Foo(n: Int, s: String)

This annotation adds a number of features that can also be found in case classes:

  • sensible equals / hashCode / toString implementations,
  • apply methods in the companion object for easier creation,

[…] It also adds things that differ from case classes:

  • add final modifier to the class,
  • for each field, add a corresponding with method (field count: Int generates
    a method withCount(count: Int) returning a new instance of the class with
    count updated).

For many years, our answer to “How do I do this in Scala 3?” has been “Use code generation”. But it seems that no popular code generation framework for Scala has emerged during this time (scalagen seemed interesting but was archived).
More recently, we’ve been considering having something like @data built into the language, but there are concerns that this would bloat the language, and it wouldn’t help with other macro annotations.

Meanwhile, the Scala 3 compiler grew a -rewrite flag which can be used to automatically fix errors. For example,

def f(): Unit = ()
f

does not compile in Scala 3, but if I pass -source 3.0-migration -rewrite to the compiler, the source file will be patched to obtain:

def f(): Unit = ()
f()

Currently, the rewrite mechanism is only used to ease migrations, and it is difficult to trigger since it requires fiddling with compiler flags. But in the future we should be able to expose this better to the outside world so that you can apply rewrites from the comfort of your IDE by clicking a button.

This brings me back to macro annotations: even if we cannot add new definitions visible during typechecking, we could have the macro just check whether the expected definitions exist, and emit an error with an appropriate automatic rewrite if they don’t. For example, given:

@data class Foo(n: Int, s: String)

Running this code in my IDE should give me a red underline; clicking on the “fix it” button could then rewrite the code as follows:

@data final class Foo(n: Int, s: String) {
  def withN(n: Int) = data.generated()
  def withS(s: String) = data.generated()

  override def equals(x: Any): Boolean = data.generated()
  override def hashCode: Int = data.generated()
  // ...
}
object Foo {
  def apply(n: Int, s: String): Foo = data.generated()
  // ...
}

where data.generated is an inline def which generates the correct method body depending on the context (we could also generate the actual method body inline, but relying on an intermediate method keeps the amount of generated code to the minimum needed).
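
To make that more concrete, here is a minimal sketch of what data.generated could look like. The type parameter, the use of Symbol.spliceOwner, and the dispatch-on-method-name idea are assumptions of this sketch, not the actual prototype:

import scala.quoted.*

object data:
  inline def generated[T](): T = ${ generatedImpl[T] }

  private def generatedImpl[T: Type](using Quotes): Expr[T] =
    import quotes.reflect.*
    // The enclosing method (withN, equals, hashCode, ...) can be recovered
    // from the splice owner; a real implementation would dispatch on its
    // name and on the fields of the enclosing class to build the body.
    val method = Symbol.spliceOwner.owner
    report.errorAndAbort(s"no generator for ${method.name} in this sketch")
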
If at a later point I decided to add an extra field x to Foo, I would then get a new error, and clicking on the “fix it” button for that error would add the necessary withX method to the class while leaving everything else as-is. While this is more laborious than what was possible with Scala 2 macros, it means that the generated APIs are now easily readable by both humans and computers without having to understand macro code.

In other words, I’m suggesting we use existing facilities in the compiler to turn it into a code generation tool. This means we wouldn’t have to worry about setting up a separate tool and integrating it into our build pipelines. To ease cross-compilation, existing Scala 2 macro annotations could also be adapted to allow this style (by simply doing nothing when they detect methods with the correct signature and body).

The main thing needed to make this practical is a set of convenience methods in the reflection API for doing code generation (this won’t be completely trivial, since we’ll have to handle transforming classes with existing definitions, and ideally avoid fully-qualified names where possible for readability).
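
As a rough illustration of the checking side, a macro annotation could verify that the expected members are present and report errors otherwise. The sketch below assumes the experimental MacroAnnotation API from the recently merged draft; the exact transform signature and the checks shown are illustrative, not the actual prototype, and a real version would also attach the suggested rewrite to the error:

import scala.annotation.{MacroAnnotation, experimental}
import scala.quoted.*

@experimental
class data extends MacroAnnotation:
  def transform(using Quotes)(tree: quotes.reflect.Definition): List[quotes.reflect.Definition] =
    import quotes.reflect.*
    tree match
      case cls: ClassDef =>
        // Check that every constructor field has a corresponding withX method.
        val fields = cls.symbol.primaryConstructor.paramSymss.flatten.filterNot(_.isTypeParam)
        for field <- fields do
          val expected = "with" + field.name.capitalize
          if !cls.symbol.declaredMethods.exists(_.name == expected) then
            report.error(s"@data class ${cls.name} is missing a `$expected` method", cls.pos)
        List(tree)
      case _ =>
        report.error("@data can only be applied to a class")
        List(tree)
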

Before exploring this further, I’d be interested in hearing from implementers of macro annotations: would you be interested in using this pattern? For example, scio defines some powerful macro annotations for generating full case classes from a schema, which seem like they could fit into this pattern, but I’m not familiar with how they’re used in practice.

Let me know what you think!

12 Likes

This sounds really useful!

What would it take to implement something like the @data annotation using the cool new stuff? (I’m asking for a simple sketch (if that’s possible) to get a feeling for what an implementation looks like…)

Automatic derivation of the obvious but laborious code is a powerful tool. In Scala 3, we can now easily derive specific functions (in the form of typeclasses), but data structures lack similar flexibility in being shaped and transformed based on metadata. I believe we should allow annotation macros to expand before typechecking to realize their potential fully.
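
For instance, deriving a typeclass instance is already a one-liner (CanEqual is just a standard-library example of a derivable typeclass), whereas there is no analogous mechanism for reshaping the data structure itself:

// Typeclass derivation: the compiler generates the given instance for us.
case class Temperature(celsius: Double) derives CanEqual

// There is no comparable `derives`-style mechanism that would add fields or
// methods (e.g. a withCelsius method or an optional-field variant) to the class.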

I think an approach based on suggested rewrites strikes a nice balance between predictability and convenience:

  • we still maintain the invariant that every visible symbol has an explicit definition to which we can navigate.
  • at the same time, we relieve the developer from having to write lots of boilerplate code.
2 Likes

@smarter how should this codegen scheme work when the macro definition gets updated?

I think the obvious complaint you’ll get from users is “I already added the annotation indicating what I want to do, why do I have to click a button (which is IDE-dependent) to make it do the thing? Also, the generated code is boilerplate that I now have to maintain”

As a solution to macro annotations, I’m against it (for whatever my opinion is worth). This is not an improvement over just not having them, especially because of how it binds the whole language to an IDE (which is needed to make it remotely practical; without one it’s even worse).
As a general concept though, having the compiler recognize some common mistakes, or being able to ship rewrite rules in libraries that end up making the IDE nudge you in the right direction, I love it! But it really isn’t a solution to macro annotations.

9 Likes

It would be interesting to work out what kind of code generation we need most.

Automatic derivation of typeclasses in Scala 3 solves, IMHO, the problem of providing all kinds of capabilities (functions). The example of the @data macro shows that we need something similarly powerful for data structures.

As an example, I would like to be able to write @withOptional and @compose macros:

trait HasId { val id: String}
@withOptional trait HasAmount { val amount: Int}

@compose class StateA extends HasId
@compose class StateB extends HasId with HasAmountOptional
@compose class StateC extends HasId with HasAmount

to produce at compile time:

trait HasId { val id: String}
trait HasAmount { val amount: Int}
trait HasAmountOptional { val amountOpt: Option[Int]}

case class StateA(id: String) extends HasId
case class StateB(id: String, amountOpt: Option[Int]) extends HasId with HasAmountOptional
case class StateC(id: String, amount: Int) extends HasId with HasAmount

I believe the way it is intended, a macro annotation can check that a class conforms to a certain schema, but cannot generate the definitions that make it conform. So if the macro gets updated, the new check might fail, and another action to fix it might be proposed.

It’s similar to @tailrec. No magic, just an assertion that the annotated construct has certain properties.
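
For example, the first definition below is accepted because the property holds, while the second would be rejected with an error rather than rewritten:

import scala.annotation.tailrec

// Accepted: the recursive call is in tail position, so the assertion holds.
@tailrec
def sum(xs: List[Int], acc: Int = 0): Int = xs match
  case Nil       => acc
  case x :: rest => sum(rest, acc + x)

// Rejected: `1 + length(rest)` is not a tail call, so @tailrec reports an
// error instead of transforming the code.
// @tailrec
// def length(xs: List[Int]): Int = xs match
//   case Nil       => 0
//   case _ :: rest => 1 + length(rest)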

1 Like

I’ve prototyped a subset of @data in [Proof of Concept] Code generation via rewriting errors in macro annotations by smarter · Pull Request #16545 · lampepfl/dotty · GitHub to show what’s possible with current APIs. Besides exposing -rewrite, the main thing missing is safe code generation, which I assume could be done by taking inspiration from scalameta.

I’ve never been super keen on the idea of auto-generated code being added to source files, but I understand the concern here about macros happening during more phases and doing more magical things.

It’s of high importance that anything like this can be done with command-line tools and not just an IDE, and that it can be done in a way that only makes the one specific change asked for and no others.

Some worries: what if someone adds new code or comments to an area of generated code? What are the rules about what happens then? Is it standard or up to each code generator to do it the way they think is “right”?

For example, what if someone wants to add to a definition of withX so that it does some additional work along with the default work? Or wants to change hashCode to account for known properties of their data?

(My feeling is that if definitions are present in the code, they should be changeable. If they can’t be changed, and the implementations are obscured like the above, why are they even there? But if the generated methods have strong correctness requirements that they never be changed or replaced… they still show up here.)

6 Likes