Scala 3, macro annotations and code generation

For the record, this basically matches what the current experimental support for macro annotations in Scala 3 lets you do (you can in fact add new definitions, but because macro annotations are expanded in a compiler phase that takes place after typer, they’re not visible outside of the macro expansion).

Maybe we need Scala Poet for code gen

1 Like

Only that we don’t need any “builders”, as we already have Expr[?]s.

All that’s needed is to “export” (some of) the output so it can be picked up by the next compiler “stage” (as in multi-stage programming).

Using codegen is fine, as long as it does not influence the typing of the rest of the program. Like I said, @main is fine. @data in the original meaning is not fine, since it would create a new sort of case class that offers methods different from those of case classes in the same compilation unit. That’s for all intents and purposes a dialect. Somebody coming new into a Scala codebase that uses @data has to know what definitions it generates, just like they have to know what definitions a case class generates.

@data can be reasonably supported under the restriction/rewrite model since the definitions it generates are straightforward. In that case, even if you don’t know about @data, you can understand the class just the same by looking at the explicit definitions. But it’s still important that these definitions are there, both for tooling (to have something to navigate to) and for understanding.
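To make that concrete, here is a purely illustrative sketch (the @data annotation and the members it generates are made up for this example, not an existing library) of what the restriction/rewrite model implies: the generated definitions end up spelled out in the source, so a reader who has never heard of @data can still understand the class just by reading it.

import scala.annotation.StaticAnnotation

// Hypothetical marker annotation, defined only for this sketch.
class data extends StaticAnnotation

@data class User(name: String, age: Int):
  // generated by @data under the rewrite model, but visible right here in the source:
  def withName(name: String): User = User(name, age)
  def withAge(age: Int): User = User(name, age)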

Now, if you want to create something much more complicated than that and want it to be hidden from the eyes of the programmer, but you still require that the new definitions are somehow understood to be there, be callable from Scala code, and so on, then you are in effect creating a dialect: a language that cannot be understood without precise knowledge of what the annotation does. And the tooling experience will be substandard too, because of these hidden definitions.

2 Likes

The editor integration is not part of Scala. It can be offered in Scala tooling, just like GitHub can offer Copilot. It’s not linting baked into the language either. Rather, it’s the code of the macro annotation that can do the checks.

To give some ideas what a macro annotation can do:

  • Generate definitions that are not directly accessible from the same Scala program, but that can be used for e.g. FFIs or host embeddings. Example: @main
  • Check the code of the annotated definition in some sense. Example: @tailrec
  • Change the body of the annotated definition without changing its signature. Example: @optimized
  • Serve as markers for external tools. Basically that’s what Java annotation processors are. For instance, we could have a codegen tool based on TastyQuery that takes an annotated file and produces companion units that add new definitions. With a bit more effort we could let such a tool even produce Tasty directly, so no string concatenation would be needed for this form of codegen. That tool does not exist currently, but the foundations to develop it are in place. If there’s enough interest, we as a community can try to find the resources to develop it.
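For the second bullet, here is a minimal sketch of what such a check-only annotation could look like; the annotation and the check are made up, and it assumes the single-argument transform signature of the initial experimental MacroAnnotation API, which may differ between compiler versions:

import scala.annotation.{experimental, MacroAnnotation}
import scala.quoted.*

// Hypothetical check in the spirit of @tailrec: inspect the annotated
// definition, possibly report an error, and return the tree unchanged,
// so nothing new becomes visible to the rest of the program.
@experimental
class nonUnit extends MacroAnnotation:
  def transform(using Quotes)(tree: quotes.reflect.Definition): List[quotes.reflect.Definition] =
    import quotes.reflect.*
    tree match
      case d @ DefDef(name, _, returnTpt, _) if returnTpt.tpe =:= TypeRepr.of[Unit] =>
        report.error(s"$name is annotated @nonUnit but returns Unit", d.pos)
      case _ => ()
    List(tree)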

With such a definition, I can say that any sufficiently advanced framework defines a language dialect.

4 Likes

I think it would be good to work from concrete examples. Where does the dotty repo use codegen? And how is codegen used in your projects?

At least I’m not arguing for any hidden (or, as I called it elsewhere, “virtual”) code.

Generated code needs to be easily accessible and introspectable for tooling and humans. That’s for sure!

I think that generation of “virtual”, invisible code was a flaw in the old macro annotations. On this point I’m fully with you!

But just dumping the generated code into sources that are otherwise meant to be maintained directly by humans also seems wrong.

And I don’t buy the “dialect” part. Under such a definition almost every Java framework would constitute a “Java dialect”. Nobody has ever called it that, or anything close. Or: do Rust macros create “Rust dialects”? Honest question.

You always need to know what’s happening under the hood when you use some framework or feature that does seemingly “magic things”. But just having some “magic things” around doesn’t create a language “dialect”. (In the end it makes no difference to argue this part. Words can be defined arbitrarily, so the proposed definition is arbitrary, and “we don’t want ‘dialects’” still needs some concrete justification, as such “dialects” are obviously harmless. Almost no language with meta-programming facilities broke because people wrote meta-programs! LISP may be an exception to this rule, as it offers very powerful rewriting at the bare syntax level without any safety net of semantic checks, and especially no type checks. But other languages with meta-programming systems don’t suffer from this phenomenon. You can’t call a list as a function by accident in a typed language just because you prepended some symbol to that list that’s bound to a function in some scope. But in LISP exactly that can happen. All you have is s-expression soup. More modern languages have different types of expressions, so messing things up by accident is very unlikely.)

Yeah, checks.

But not the actual work that’s the sole reason to use some facility like that.

That goes into the direction I’m after.

What I don’t understand: Why external tools? Why TastyQuery?

We already have a very fine DSL built into the language to abstract over code creation: macros! The new quote stuff is impressive and more advanced than what, for example, the mentioned Kotlin compiler and tooling offer. All that’s needed now is a way to export definitions from the macro scope into the outer program.

Such an export should of course not interfere with type checking inside the compilation unit where it gets imported in any way that a regular imported compilation unit (code in a different file) couldn’t. Otherwise we would have the previous mess, with invisible “action at a distance”.

Java is already very dynamic regarding imports, and Scala has excellent support for separate compilation. So one could relatively easily export definitions from macro scope, dump the results to disk, and make them available as an otherwise normal external compilation unit for a “second stage of compilation”. (That’s more or less my understanding of how the export macros @littlenag is building would work, except that the “dump stuff to disk” part isn’t currently planned afaik; please correct me if I misunderstood.)

1 Like

I’ve lately linked something in another post.

I don’t know what he’s doing. But so we can discuss concrete examples, here is what I had in mind to build using meta-programming in Scala 3: I’m crying for finally-sane code-gen because I want to generate whole client and server stubs, with all the marshaling in between, just from some simple data definitions. I also want to abstract away the persistence layer for this data completely. Whoever has built web software where the “business logic” is mostly CRUD knows that most of the code is completely repetitive and differs mostly only in the names and structure of some entity classes. The rest is almost completely mechanical. One could rightly say that 90+% of such a project consists of boilerplate…

The amount of copy-paste in such projects is hilarious, because you can’t abstract anything away without resorting to the dirtiest “tricks”, like creating (string) templates for code files that get filled in by some external scripts.

This kind of code-gen would create a lot of code. The generated code would be one or two orders of magnitude larger than the hand-written parts. Of course you need all kinds of definitions. Actually I want to generate whole implementation packages, likely across Scala platforms, so that matching Scala.js and Scala JVM code gets generated. The code would mostly not be meant to be touched by humans. (But it of course needs to provide some extension points, so hand-written code can be hooked in.)

As very large parts of the whole code would be generated, a good debugging story is vital. Being able to read and test the code during development of the “templates” is also important. So “virtual” code is no good.

Macro annotations as such play only a small role in this scenario. They would be only the trigger points for code-gen. Convenience of writing “templates” and the tooling support around that are the main concerns here.

OTOH I don’t need any “checking of validity” triggered by the macros.

Is this a workable example?

1 Like

So if I understand correctly, what you are after is a high-level annotation processor that can produce .scala files and other artifacts? I agree that this would be useful to have. It could probably be implemented as a compiler plugin using quotes.reflect as a base layer.
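For what it’s worth, a very rough skeleton of how such a plugin could be wired up; the plugin and phase names are invented, the compiler-internal plugin API shown here has changed between compiler versions, and the actual rendering of generated sources (e.g. via quotes.reflect pretty-printing) is only hinted at in a comment:

import dotty.tools.dotc.ast.tpd
import dotty.tools.dotc.core.Contexts.Context
import dotty.tools.dotc.plugins.{PluginPhase, StandardPlugin}

// Hypothetical "annotation processor" plugin: a phase after typer that looks
// for annotated definitions and could write generated .scala sources into a
// directory picked up by a later compilation.
class AnnotationProcessorPlugin extends StandardPlugin:
  val name = "annotation-processor"
  val description = "writes generated sources for annotated definitions"
  def init(options: List[String]): List[PluginPhase] = List(new GenerateSourcesPhase)

class GenerateSourcesPhase extends PluginPhase:
  val phaseName = "generateSources"
  override val runsAfter = Set("typer")

  override def transformTypeDef(tree: tpd.TypeDef)(using Context): tpd.Tree =
    // If tree.symbol carries the marker annotation, render the companion
    // source here and write it out; the tree itself is left untouched.
    tree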

3 Likes

Yes, something like that! :smiley:

How it works under the hood in the end, I don’t actually care.

But the “templating” needs to be sane, safe, and convenient even for less skilled people.

My impression was that the current quote stuff in Scala, with its Expr[?] abstraction, would make a really great “templating language”. It’s the best I’ve seen so far, as it’s type-safe!
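As a small illustration of why Expr[?] feels like a type-safe template language (this uses only the standard quoting API; the timed macro itself is just an example):

import scala.quoted.*

// The quote below is effectively a template: ordinary, type-checked Scala
// code with a typed hole (${ body }) that is spliced in at compile time.
inline def timed[A](inline body: A): A = ${ timedImpl('body) }

def timedImpl[A: Type](body: Expr[A])(using Quotes): Expr[A] =
  '{
    val start = System.nanoTime()
    val result = ${ body }
    println("took " + (System.nanoTime() - start) + " ns")
    result
  }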

The trigger points that would deliver the data to the “templates” would be hand-written annotated definitions (of, for example, case classes).

The results of the triggered code-gen need to be “material”, as it would otherwise be way too much opaque magic that can’t be debugged reasonably.

And yes, such a feature would be extremely useful! The lives of lesser beings consist in large part of writing repetitive boilerplate code. Cutting this down to the bare minimum would make Scala especially attractive to the John-Doe-average-programmer. It would be almost a killer feature for some jobs, making mundane tasks really easy, without compromising on safety or tooling support (unlike the stringy code templates that are currently the only way to achieve the stated goal in Scala).

Just think about the large share of poor web devs, working in all kinds of languages, who do mostly nothing but write this kind of “boilerplate”: defining entities, the code that brings them over the wire, and the code that persists them on the server side. Most of this is copy-paste with just the entity and field names replaced. A framework that could abstract this away would be a game changer! A Spring killer…

Thanks a lot for trying to understand what the pain points are, and what would make things substantially better! That’s something I love Scala for. People are listening. (You sometimes just need to cry loud enough… :grin:)

3 Likes

It’s worth noting that annotation processors are among the most popular tools right now in Java-land (MapStruct, Immutables, Micronaut), and that somehow the annotation processor produces Java files with code that is visible in the same compilation unit: you are able to use the generated definitions in the same file where you introduced the annotations that produce the generated code.
I don’t know how this magic happens, but it is there, and it is very necessary for the general usage of annotation processors, in Java at least.

1 Like

You can use Kotlin compiler plugins in other contexts too, including Maven and REPL, but it is nice that IDEA’s error highlighting doesn’t get too confused by the syntactic absence of generated stuff.

Someone already linked the Dotty example. Some examples from my own code:

  1. uPickle generates JSON serializers for each arity of tuple (upickle/build.sc at f9bf9984e5175e5f4b2020db17f99e26a3037250 · com-lihaoyi/upickle · GitHub). Not sure if this will fully go away with Scala 3, or whether we’ll need to keep the current implementation for performance.

  2. Templatized generics: files like upickle/ujson/templates-jvm/DoubleToDecimalElem.java at main · com-lihaoyi/upickle · GitHub have their Elem string replaced by Byte or Char, effectively specializing/monomorphizing them and avoiding the boxing that would arise with generics. This is similar to what’s done in Java-land for specialized collections like fastutil or Koloboke Collections.

  3. IDL codegen: at work we do build-time codegen from .proto schemas, OpenAPI specs, and AWS API specs to provide typed RPCs. The goal here is primarily to provide type-safe access to something defined outside the Scala codebase. In my personal projects, I’ve also used ScalablyTyped, which works via codegen.

There are also places where I haven’t bitten the bullet to use codegen, but there’s a ton of boilerplate that cannot be made to go away:

  1. Defining a whole bunch of related case classes with the same field, e.g. all Exprs in the Sjsonnet config compiler have a pos: Position field (sjsonnet/sjsonnet/src/sjsonnet/Expr.scala at master · databricks/sjsonnet · GitHub). They all extend a trait, so .pos can be used seamlessly in downstream code, but it’s tedious to have to include pos: Position in every single case class declaration when there are a lot of them (a condensed sketch of this pattern follows after this list).

  2. Injecting implicits throughout all methods in an object, e.g. FastParse’s def number[$: P] context bound. Having a context bound or implicit/using param isn’t a big deal when you have a few of them, but when you have hundreds of them, one on every line, even the smallest amount of boilerplate gets old.

    • Multiple implicits can easily be combined into a single implicit via wrappers (e.g. here), but in many cases - such as for FastParse rules - even a single implicit is a ton of boilerplate when it appears on every single line (hence the contortions around context bounds to try and minimize it).
  3. Dependency injection via implicits, somewhat similar to above. I wrote a compiler plugin back in the day to automatically add (implicit foo: Foo) to all definitions in annotated files (GitHub - lihaoyi/sinject: SInject is a Scala compiler plugin which helps auto-generate implicit parameters), to remove the boilerplate of tediously declaring the implicit over and over and over.

  4. Re-using parameter lists between functions, without forcing the user to construct and pass in a config object. E.g. in Requests-Scala, the same parameter list is copy-pasted 4 times (1 2 3 4) with minor tweaks. There are some other places where the copy-pasta happens in expressions that aren’t particularly amenable to being solved via macros (1 2).

    • If the method signatures are exactly the same, then I could get away with defining a class Foo { def apply(...) } and instantiating Foo multiple times. That is what is done for requests.get/post/etc. to re-use the signatures. But in the case of .get/.get.stream/Request(...), the signatures are slightly different, which means I have to copy-paste-edit the whole thing each time I want a new one.

    • These cases could be resolved by some kind of **kwargs keyword-argument-expansion language feature as exists in Python: both at the call site, “expanding” a case class via foo("hello", **bar) into a bunch of keyword arguments foo("hello", qux = bar.qux, baz = bar.baz), and at the definition site, where one could define def foo(s: String, bar: MyCaseClass**) and have it automatically expand into a bunch of keyword parameters (with types and defaults): def foo(s: String, qux: Int, baz: String).
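As promised above, a condensed sketch of the pattern from point 1 (names simplified, not the actual Sjsonnet definitions):

// Every node repeats the same `pos: Position` field, even though downstream
// code only ever accesses it through the shared trait.
case class Position(line: Int, col: Int)

sealed trait Expr { def pos: Position }
case class Num(value: Double, pos: Position) extends Expr
case class Str(value: String, pos: Position) extends Expr
case class Id(name: String, pos: Position) extends Expr
case class BinaryOp(op: String, lhs: Expr, rhs: Expr, pos: Position) extends Expr
// ...and so on for dozens of node types, each repeating `pos: Position`.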

I also do a bunch of faux macro annotations that don’t need to introduce stuff visible to the typer, but bundle up metadata or definitions for use at runtime. These requirements are probably satisfied by the “transparent” macro annotations that run purely after typer:

  1. mainargs @main
  2. Cask @get, @post, @postJson, @websockets, etc.
  3. Mill def myCommand() = T.command{ ... } (not quite an annotation - it is discovered based on return type instead - but it works basically the same way)

Now, I won’t say that the way Scala allows you to abstract over definitions is bad. You can get surprisingly far with traits, type parameters, higher-kinded types, implicits, and so on. But there’s definitely a gap there.

In other languages you might not even notice this boilerplate, because everything is so boilerplatey it kind of blends together. But in Scala, given how nice we can make a lot of our expression-related code with functions, HoFs, by-name params, and macros, these areas of clunkiness really stand out and are probably the motivation for a lot of the requests for macro annotations.

5 Likes

Perhaps another set of places where the current Scala features for abstracting over definitions are insufficient is around ORMs:

Slick

final case class Coffee(name: String, price: Double)
// Next define how Slick maps from a database table to Scala objects
class Coffees(tag: Tag) extends Table[Coffee](tag, "COFFEES") {
  def name  = column[String]("NAME")
  def price = column[Double]("PRICE")
  def * = (name, price).mapTo[Coffee]
}
// The `TableQuery` object gives us access to Slick's rich query API
val coffees = TableQuery[Coffees]

ScalikeJDBC

import java.time._
case class Member(id: Long, name: Option[String], createdAt: ZonedDateTime)
object Member extends SQLSyntaxSupport[Member] {
  override val tableName = "members"
  def apply(rs: WrappedResultSet) = new Member(
    rs.long("id"), rs.stringOpt("name"), rs.zonedDateTime("created_at"))
}

In general, ORMs need a few things:

  1. They need some kind of case class representing a row in the database table, with each field in the case class representing a single entry in that database column

  2. They need some kind of object representing the database table itself, with each field in that object representing the entire database column as a whole. This may have table-level or column-level configuration, and support table-level or column-level operations.

The case class and the object usually have a lot of similarities, but it is impossible to encapsulate this boilerplate using normal Scala language features.

  1. You cannot, for example, define a trait and use that to auto-generate the case class signature and object members.

  2. You might be able to use a sufficiently abstract trait to enforce that the case class and object have matching sets of column definitions. But as has been discussed earlier, merely enforcing that the boilerplate matches a particular pattern is not enough. People want to encapsulate the boilerplate so they don’t see it!
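To illustrate the second point, a rough, made-up sketch of such an enforcing trait (not any existing library’s API): the compiler checks that the table object declares one member per column, but the column list is still written out twice.

final case class Coffee(name: String, price: Double)

trait Column[A]

// The "sufficiently abstract trait": one abstract member per column.
trait CoffeeColumns:
  def name: Column[String]
  def price: Column[Double]

// A mismatch with CoffeeColumns fails to compile, but the columns are still
// duplicated relative to the case class above.
object CoffeeTable extends CoffeeColumns:
  def name = new Column[String] {}
  def price = new Column[Double] {}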

What ends up happening is one of two things:

  1. Listing out all the columns in the database table twice: once for the case class and once for the object.

  2. Moving the configurability from the object into magic annotations on the case class, and generating the object using an expression macro. This gives up considerable flexibility and introduces its own weird DSL: there is no standard for annotations, and annotations can do just about anything. This is what Squeryl and Quill do with their table[T] and query[T] macros respectively:

Squeryl

class Book(
  val id: Long,
  var title: String,
  @Column("AUTHOR_ID") // the default 'exact match' policy can be overridden
  var authorId: Long,
  var coAuthorId: Option[Long]
) {
  def this() = this(0, "", 0, Some(0L))
}

val books = table[Book]

Squeryl also supports more dynamic configuration via schema objects, a sort of look-aside table containing an odd DSL:

object Library extends Schema {    
  on(borrowals)(b => declare(  
    b.numberOfPhonecallsForNonReturn defaultsTo(0),  
    b.borrowerAccountId is(indexed),  
    columns(b.scheduledToReturnOn, b.borrowerAccountId) are(indexed)  
  ))

  on(authors)(s => declare(  
    s.email is(unique, indexed("idxEmailAddresses")), // indexes can be named explicitly
    s.firstName is(indexed),
    s.lastName is(indexed, dbType("varchar(255)")), // the default column type can be overridden
    columns(s.firstName, s.lastName) are(indexed)  
  ))  
}

Quill

case class Circle(radius: Float)

val areas = quote {
  query[Circle].map(c => pi * c.radius * c.radius)
}

Quill goes a different way: instead of annotations, configuration gets pulled in via implicit resolution:

def example = {
  implicit val personSchemaMeta = schemaMeta[Person]("people", _.id -> "person_id")

  ctx.run(query[Person])
  // SELECT x.person_id, x.name, x.age FROM people x
}

These workarounds work, but they’re not ideal. People want to define their database table as a case class and object pair: the object representing the entire table and the case class representing a row within it, with many similarities but many differences. You can configure either separately if you need something unusual.

People don’t want to jump through hoops with annotations that get read by magic expression-macros to do their thing, or be forced to define their config in some look-aside data structure, or have the configuration of their database table be pieced together via implicit resolution. But given the boilerplate of duplicating all definitions N times to set up the case class/object pair, the weird ad-hoc workarounds become attractive.

If we could allow users to write an annotation macro that expands predictably into a case class/object pair, with some programmable defaults and user-definable overrides, that would obviate the need for a lot of the crazy contortions that ORM libraries go through to let users define and configure their schema in a type-safe way.
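To make the wish concrete, a purely hypothetical sketch of what the use site could look like; the @table and @column annotations below are stubs invented for this illustration, not an existing library:

import scala.annotation.StaticAnnotation

// Stub annotations, defined only for this sketch.
class table(name: String) extends StaticAnnotation
class column(name: String) extends StaticAnnotation

@table("COFFEES")
final case class Coffee(
  name: String,
  @column("PRICE_EUR") price: Double // per-column override of the default mapping
)

// The annotation macro would then expand, visibly and predictably, into
// roughly the Slick-style pair shown earlier:
//   class Coffees(tag: Tag) extends Table[Coffee](tag, "COFFEES") { ... }
//   val coffees = TableQuery[Coffees]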

9 Likes

Thank you for expanding in such detail on my remark that “I want to abstract away the persistence layer”!

It shows exactly where all the boilerplate is. :+1:

To give the example more weight: imagine something like a web CMS. There you often have dozens or even hundreds of flat tables. All the “logic” operating on them is almost always the same: basically CRUD, with some hooks.

In Scala you would currently need to write all the semi-complex repetitive code out by hand.

Compare with something like Java’s JPA:

@Entity
public class T {
   @Id private K id;
   // … rest of Java boilerplate for data type
}

@Repository
public interface TRepository extends JpaRepository<T, K> {}

// somewhere else:
private final TRepository tRepository; // gets injected…
// …
tRepository.findById(id);

(And the above could be abstracted even further, in the case of something like the mentioned CMS, if you had some kind of code templating… Just spit out this code snippet for all kinds of Ts.)

You get basically everything for free, just from some magic annotations. That’s why people use stuff like Spring. Even a junior dev can be very productive with it because things are really simple and straightforward!

Java-land is code-gen land. Same for Go (even if it got better since they got “generics”).

The many uses of Rust macros are also prominent examples.

1 Like

As a small addition, since I just realized that this wasn’t mentioned here anywhere:

Python, a language marketed as simple and approachable even for beginners, also has excellent meta-programming features that everybody uses on a day-to-day basis.

https://python-3-patterns-idioms-test.readthedocs.io/en/latest/PythonDecorators.html

https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Metaprogramming.html

The whole purpose of meta-classes is of course the programmatic introduction of new class definitions.

Also, you can see “decorators” (~ macro annotations) everywhere in Python! From data classes to serialization frameworks, through all kinds of boilerplate reduction, down to validation, logging, or debugging aids.

It’s really hard to write real-world Python without at least some decorators.

People seem to love them.

Please note that decorators are quite similar in some regards to Scala’s implicit functions. Over there, people think you can’t live without them and use them everywhere. In Scala, implicit functions are “dreaded” because someone overused them somewhere in the past and someone else complained very loudly… And what happened then was the complete overreaction in Scala already mentioned above.

(The same goes, by the way, for implicit conversions: when you look at them in C#, everybody loves them! But Scala has been trying to fight them lately… The “issues” with them are imho mostly a marketing thing. At some point Google’s auto-suggest even returned the word “bad” as the top continuation of the query “Scala implicit conversions”; but for “C# implicit conversion” it spat out positively framed and helpful content every time I tried.)

2 Likes

Eureka! After (literal) years of experimentation—though, luckily for my psyche, extremely intermittent experimentation—I’ve stumbled upon a solution* to this problem. *At least, a solution for my particular problem subspace.

The Problem

I’ve been pining for one specific aspect of Scala 2’s meta-programming: The ability to outfit a companion object with a set of methods, loosely derived from the structure of the underlying trait or case class.

I’m recalling the anguish of writing Lenses in longhand:

case class Person(name: String, age: Int)

object Person:
  val name = Lens[Person](_.name)(p => name => p.copy(name = name))
  val age  = Lens[Person](_.age)(p => age => p.copy(age = age))

Whereas, in Scala 2, we had:

@deriveLenses
case class Person(name: String, age: Int)

Person.name // Lens[Person, String]
Person.age  // Lens[Person, Int]

Similarly, a common pattern of boilerplate besmirching many a ZIO codebase is that of “accessors”:

trait ExampleService:
  def add(x: Int, y: Int): Task[Int]

object ExampleService:
  def add(x: Int, y: Int): ZIO[ExampleService, Throwable, Int] = 
    ZIO.serviceWithZIO(_.add(x, y))

So much needless RSI, when we simply could’ve written:

@deriveAccessors
trait ExampleService:
  def add(x: Int, y: Int): Task[Int]

Solution

I’ve taken five or six abortive stabs at this problem throughout the years. The closest I’d found previously is the Selectable pattern. The pattern, described in this issue, didn’t work at first, due to the lack of autocomplete support—hence the issue, which has since been addressed :partying_face:.

So, until now, the best I’d had was this (copy-pasted from the issue, so ignore the commented caveat):

case class Person(name: String, age: Int)

object Person {
  val lenses = Lenses.gen[Person]
}

Person.lenses.name // For this to be tenable, this would need to autocomplete with the type Lens[Person, String]

You know, this ain’t too bad. But that little gap of convenience, of needing to call through some intermediate Selectable value, has been gnawing at me. I wanted to call Person.name or ExampleService.method directly.

And so, I’ve finally concocted a way of doing this. I’m surprised it works at all, to be honest. It is, essentially, the daisy-chaining together of the Selectable pattern with a Conversion and a given macro, allowing for arbitrary macro-generated extension methods. It’s a neat trick and I’m lucky to have found it, because there are about 12 subtle variations which all fail spectacularly. I was on the verge of giving up when it finally compiled.

With this trick in place, we get the following:

case class Person(name: String, age: Int, isAlive: Boolean)
object Person extends DeriveLenses

@main
def example(): Unit =
  val person  = Person("Alice", 42, true)
  val name    = Person.name.get(person)
  val age     = Person.age.get(person)
  val isAlive = Person.isAlive.get(person)

  println(s"Name: $name, Age: $age, Is Alive: $isAlive")

It’s still not perfect, as one must extend the companion object, which means one must still define the companion object even if it’s otherwise unnecessary. Yet, save for that blemish, this long-sought-after syntactic summit is finally reachable.

It works for the ZIO accessor pattern as well:

trait ExampleService:
  def launchRockets(): Task[Unit]
  def addNumbers(a: Int, b: Int): UIO[Int]

object ExampleService extends DeriveAccessors

object Example extends ZIOAppDefault:
  val program: ZIO[ExampleService, Throwable, Unit] =
    for
      _ <- ExampleService.addNumbers(1, 2)
      _ <- ExampleService.launchRockets()
    yield ()

  val run =
    program.provide(ExampleServiceLive.layer)

The other downside, of course, is that the transparent inline defs required to make this work only truly work with Metals. So IDEA is decidedly uninvited to the party. I really hope this changes before long, but that’s a separate issue.

Code

The implementation of DeriveLenses is here: quotidian/examples/shared/src/main/scala/quotidian/examples/lens/LensMacros.scala at main · kitlangton/quotidian · GitHub

As you can see, it’s a thin, yet necessary, wrapper around some other macro-generated bits.

trait DeriveLenses:
  given conversion( 
      using
      cc: CompanionClass[this.type],
      lenses: LensesFor[cc.Out]
  ): Conversion[this.type, lenses.Out] =
    _ => lenses.lenses

I hope the pattern can be simplified somewhat (open to suggestions!), but at least it’s nice and clean at the call-site.

Final Entreaty

Of course, what I’d really love is the reinstatement of this particular subset of annotation macros. It sure would be neat if they could once again extend the companion object with arbitrary helper methods.

Luckily, one can achieve the same effect with this combination of non-experimental Scala 3 macros + other mechanisms. Therefore, if anyone fears the consequences of such a feature, well: be afraid now! :stuck_out_tongue_winking_eye: The only issue is that we’re about 2% shy of syntactic perfection.

Anyhow, thanks for reading! I hope this was useful/entertaining/distracting-from-some-chronic-pain-now-reminded-of. And, just to end on a positive note: Any frustration I express, now or ever, over the Scala 3 macro system is born of pure joy and love. It’s been so fun messing around with (and trying to break) it over all these years. :heart: Endless gratitude to all who build and maintain it.

20 Likes

I believe you could use the “computed field names” support in dotty/docs/_docs/reference/experimental/named-tuples.md at named-tuples-2 · dotty-staging/dotty · GitHub to avoid the conversion.
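A rough sketch of that suggestion, assuming the experimental named-tuples support where a Selectable can declare its members through a Fields type member (the Person and Lens types below are simplified stand-ins for the ones in the post above):

import scala.language.experimental.namedTuples

final case class Person(name: String, age: Int)
final class Lens[S, A](val get: S => A)

// With a named tuple as the Fields type member, selections like `.name`
// type-check (and can autocomplete) without any Conversion in between.
class PersonLenses(values: Map[String, Any]) extends Selectable:
  type Fields = (name: Lens[Person, String], age: Lens[Person, Int])
  def selectDynamic(fieldName: String): Any = values(fieldName)

val personLenses = PersonLenses(Map(
  "name" -> Lens[Person, String](_.name),
  "age"  -> Lens[Person, Int](_.age)
))

// personLenses.name: Lens[Person, String]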

3 Likes