Scala 3, macro annotations and code generation

Over a program’s lifecycle, code is read far more often than it is written. Generated boilerplate that is visible adds cognitive load to every read, making the code much harder to work with.

6 Likes

I had a need for a macro annotation that adds (private) implicit arguments. IIUC, this is not a language dialect, but it’s also something that must occur before typer.

I said it over in the other thread, but do folks think that forbidding macros from synthesizing declarations but allowing them to fill in arbitrary definitions is a reasonable compromise? This would mean that @data would not be able to generate an apply or withX methods, but it could still fill in equals and hashCode. I think the withX methods could be done lens style with .with(_.x)(value) if there is a trait Data with def with[T](field: Self => T)(value: T): Self. Okay, Self doesn’t exist in Scala 3 (right?), but that’s a separate issue :slight_smile:
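To make the lens-style idea a bit more concrete, here is a minimal hand-written sketch. Everything in it (Lens, Data, updated, Point) is a hypothetical name for illustration, not an existing API. It sidesteps two things: `with` is a reserved word in Scala, so the method is called updated here, and turning a selector like _.x into a setter would need a macro, so explicit lens values stand in for it.

```scala
// Hypothetical sketch: none of these names exist in the standard library.
final case class Lens[S, T](get: S => T, set: (S, T) => S)

trait Data[Self]:
  this: Self =>
  // Stand-in for the `.with(_.x)(value)` idea; `with` itself is a keyword.
  def updated[T](field: Lens[Self, T])(value: T): Self = field.set(this, value)

final case class Point(x: Int, y: Int) extends Data[Point]

object Point:
  // Written by hand here; under the proposal a macro could only fill in
  // bodies of members that are already declared.
  val x: Lens[Point, Int] = Lens[Point, Int](_.x, (p, v) => p.copy(x = v))
  val y: Lens[Point, Int] = Lens[Point, Int](_.y, (p, v) => p.copy(y = v))

@main def lensDemo(): Unit =
  println(Point(1, 2).updated(Point.x)(10))
```

The point is only that a single declared method on a shared trait can replace a family of synthesized withX methods; whether the selector form is achievable depends on what the macro is allowed to do.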

C# had the same problem when it had to generate code from XAML and merge it with the user’s C#.

They solved it with partial classes.

1 Like

Let me try to clarify what I’m after.

I didn’t care so much about macro annotations so far. I thought they were on a good path, especially as they’re already useful. Since you can now generate “internal” definitions (not visible to the rest of the program), I was under the impression that the next logical step would be to make those definitions somehow visible outside of the macro (e.g. “export” them to the outer scope) to achieve what I’m actually after.

I’m after some nice code-gen facility, coherent with the rest of the language.

That’s exactly the point!

And you can’t, by any means, do code-gen without introducing new definitions. That would fully defeat the point in most cases.

But Scala still has no proper facility for code-gen.

I really don’t understand why such an important feature is neglected!

I’ve been waiting for this feature more or less since macros were announced in Scala 2. I knew that those would go away and never invested in learning them. But since then I’ve been waiting for “the real thing”. And now? Nothing?

I say it once more: Concatenating strings into source files is a joke. More or less everything else is better. Even C++ templates are better in this regard. (And C++ template magic is actually hell).

Quite a lot of other people were so desperate that they used the interim solution in Scala 2. Why did they? Why would anybody invest in a “throw away” solution? Especially as this meant dabbling in complex compiler internals without any stability guarantee.

Also, why are there still so many macros left that can’t be ported to Scala 3? Why did everybody use those even though you could be sure that this would mean trouble when updating to a future Scala version?

The answer is imho quite clear: People desperately need code-gen.

Code-gen is everywhere. Whole frameworks in all kinds of languages are built upon it. Even the Scala compiler requires it. More or less everything prominent in the Java world is highly dependent on code-gen (as otherwise Java would be even more boilerplately than it already is). But in Java almost nobody complains, even though Java code-gen is inconvenient and unsafe. Most people actually praise that Java kluge and love their magic annotations in frameworks like, for example, Spring.

Rust has a quite “primitive” token stream macro system (not much better than gluing untyped symbols together). Still the Rust macros are hyped as one of the most important features in Rust. For many it’s even one of the absolute killer features that Rust has and others don’t.

Could we please recognize that code-gen is a game-changing facility when built into a language? People are lining up to use such a feature because it solves real problems!

What I finally want in Scala 3 is a safe and convenient way to generate code from code that works fine in an IDE. I strongly suspect a lot of other people also want that. Everybody who needs to port code-gen macros to Scala 3 is waiting for it!

This self-imposed constraint was a good idea to build a very nice foundation for Scala’s new macros. I’m glad it was done this way as it resulted in a very clean solution, so far.

But this constraint needs to be lifted in some form to make code-gen possible. As I said, code-gen is pointless when you can’t generate anything other than implementations of already existing symbols / definitions. Scala still offers nothing in this regard when it comes to meta-programming code-gen.

Also, the proposal at hand seems imho a little bit self-contradictory: How does checking whether some types are present and correctly implemented not affect type checking? What kind of errors should we expect if a, let’s call it, “constraining macro annotation” doesn’t find the types it requires to exist (which the compiler needs to compute anyway, btw.)? My best guess would be that this results in type errors at the use site because expected definitions / implementations are missing… So a “constraining macro annotation” would affect type checking, wouldn’t it?

I think both points are valid.

Extended editor support to look at desugarings is imho a very good idea, but orthogonal to some code-gen facility.

Code-gen should imho always end up in some generated code on disk. Only not in files that are meant to be edited by humans, land in version control, and show up in reviews! That’s the most terrible part of this proposal, imho.

But I’m also dreaming of better introspectability of some magic the compiler does under the hood. Something like that would make tooling really valuable.

One could go even one step further and enable more of this kind of editor magic. I would love if the Scala tooling could implement something like “code portals”. This is an ingenious idea nobody ever picked up, which is a shame as it fits especially well with the evaluation by substitution model of Scala. (VSCode code-lenses aren’t interactive, and can’t nest like portals).

Code-gen on the other hand is often a kind of build step. It should be independent of the IDE / editor someone uses.

This would just mean that all kinds of tools would become part of Scala the language. We’re back to the 90’s where your language was sold together with an IDE. Moving away from this IDE meant a substantial rewrite of your code (if it was even possible to reasonably move away from the IDE; think VB6, or so).

This doesn’t seem relevant as the Kotlin compiler is tightly bound to the JetBrains IDE.

In Scala an IDE doesn’t “see” anything a compiler plugin does. Quite the contrary…

Also the initial sentiment doesn’t look very honest given this here:

They’re literally selling a meta-programming toolkit… So the “it’s too complicated for the tooling developers” argument falls apart.

That’s a good idea!

Generated “invisible” code is problematic on all kinds of axes. You can’t navigate to it or inspect it, and debugging it is a horror, as you can’t even see it to figure out what’s going on.

I don’t see this. At least this doesn’t fit my definition of “fragmentation”.

It makes absolutely no difference whether I write some boilerplate code by hand or let a robot do the work. In both cases the result will be vanilla Scala code. Code that can be fed into the currently existing compiler without issues.

Using code-gen can’t lead to “dialects”. At least this doesn’t fit my definition of “dialect”…

Only if it were possible to add or modify syntax, or change the semantics of built-in language constructs, would this result in “dialects”. But nothing like that is possible through code-gen that is embedded into the language. All you have are the expression and declaration types the language offers anyway. Nothing can be changed or added there. It’s just a robot writing vanilla Scala at the end of the day!

One can’t build a macro that adds for example “do notation” to the language. Or build a macro that would finally allow me to use emojis as symbol names. That would create dialects as such code wouldn’t be recognized by the currently existing Scala compiler. But just letting a robot write some definitions won’t create “dialects” whatsoever.

Yes it’s clever. A little bit too clever for the liking of some here, I think…

That’s not an “added convenience”. That’s what a computer is actually for: It should do the tedious work! That’s the main reason to use a computer, namely to automate things away.

I already know what needs to be there. I don’t need the compiler to check that. I need the machine to do the actual work and implement what needs to be there. That’s the whole point of automation. It’s nuts when the actual reason to use code-gen gets relabeled as “an added convenience”.

This is asking for trouble.

Someone could change generated code in ways that break its intent but don’t break its interface (which is all the compiler can reliably check). Then have fun debugging.

Generated code would be the last place to look, as it’s usually reasonable to assume that a code-gen tool works fine; otherwise other people would run into the same issues at the same time, which you would easily find out, for example by looking at the bug tracker.

This doesn’t “work well”. All this generated code needs to be read and maintained together with the handwritten parts. Actually it’s not even clear from looking at that code which parts are auto-generated.

The very next question: what about updates to the generated code? Now you need to use refactoring tools… Because you can’t distinguish the parts that are generated from those that are hand written. Just changing something in the code-gen templates and regenerating code is not possible after the initial generation of code. Alternatively you have // DO NOT EDIT THIS IT IS GENERATED CODE ANY CHANGES WILL BE LOST blocks everywhere, so you can see what you shouldn’t edit. Only that it’s hard to enforce that. So even more tooling with more complex features is needed…

But that’s exactly the scenario code-gen is used for!

Nobody uses such a heavyweight feature to generate a few simple lines.

When you reach for code-gen you will usually generate a lot of code, often quite complex. Code that otherwise nobody would like to write by hand. And now exactly this kind of unwieldy code is all over the place… Come on.

But that’s the exact reason to use code-gen in the first place!!!

Just write the generated code to disk. Problem solved.

That’s also easy for tooling, as tooling almost doesn’t need to be aware of any “magic” going on. It just gets some additional folder full of source files. Java’s annotation processors work that way, and it works fine, is simple, and easy to grasp for developers.
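A minimal sketch of the “additional folder full of source files” part, assuming nothing beyond the JDK. The names renderSource and writeGenerated and the generated UserId shape are made up for illustration, and the string template is only a placeholder for whatever (ideally typed) mechanism actually produces the source text.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical renderer: a string template standing in for the real generator.
def renderSource(entity: String): String =
  s"""// GENERATED FILE, DO NOT EDIT. Change the generator and regenerate instead.
     |package generated
     |
     |final case class ${entity}Id(value: Long)
     |""".stripMargin

// Writes the file into `outDir`, a folder kept apart from hand-written sources,
// and returns the path of the written file.
def writeGenerated(outDir: Path, entity: String): Path =
  Files.createDirectories(outDir)
  val target = outDir.resolve(s"${entity}Id.scala")
  Files.write(target, renderSource(entity).getBytes(StandardCharsets.UTF_8))
```

A build tool would then add that folder as one more source directory, so the compiler and the IDE see ordinary files, while the folder itself stays out of version control and reviews.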

Code-gen is like multi-stage programming. Only that instead of having templates that are embedded in compiler output and then specialized at runtime before the actual execution, you move everything a step back and have “templates” (or the like) in your source code which get “specialized” (filled in) at compile-time into source files on disk, so those can then be picked up in the “next phase of compilation”.

But when you write the code out by hand it’s not a “dialect”? Come on…

THIS!

And what about other Java compiler plugins? For example:

Java is really flexible in this regard. You can even change syntax!

Nevertheless nobody ever complained about “Java dialects”.

That’s not an issue as long as you don’t have to deal with some very low-level data structure representing your code.

But this could easily happen if you need to combine some string based Scala source rewrites when there is no other safe facility to achieve some code-transformation. Back to “meta-programming with sed”.

At least not in Kotlin…

Implemented as a compiler plugin.

Exactly! We’re back in the 90’s tied to some special sauce in our IDEs.

Exactly!

You put in the annotation and nothing happens. It does not do the hard work for you as expected. That’s more than confusing. It’s frustrating.


Of course, it makes no difference how much I write here. I see, the post got way too looong anyway. Maybe because I’ve been waiting such a long time for proper code-gen and can’t stand that this long-awaited feature may fall apart in the last few meters before the finish line.

But what I can say for sure: if this “export macros” thingy does get implemented but is only available in a fork of the compiler, I know what I’m going to use, unless there is an adequate alternative in the official Scala release. I’m a simple grug brained developer, I will use the tool that solves my problems. (And I guess the average dev out there thinks the same. Otherwise we wouldn’t have all the currently unportable macros everywhere).

At that point we can then start a serious discussion about “fragmentation” of the language, I guess.

7 Likes

LSP code actions don’t normally do file I/O; instead they perform editor actions that modify your editor buffers in the same way that a user writing the code by hand would. If regular LSP code actions work in your setup, then the stuff being worked on in Roadmap for actionable diagnostics should also work. Notebooks are an issue as always, but I just saw that Jupyter[Lab] Language Server Protocol (Language Server Protocol integration for Jupyter[Lab]) exists and seems to at least support renaming, so maybe notebooks won’t be second-class citizens eventually!

Anyway, irrespective of the codegen proposal (which I’m not planning to spend more time on at this point), I encourage you to give feedback on Roadmap for actionable diagnostics since we’re still figuring out the details. Also if Databricks is interested in better integration between Databricks notebooks and LSP/BSP, perhaps it’s something the Scala Center could help with (the inability to run tools such as scalafix on notebooks has come up before and I think is something we’d all benefit from solving).

1 Like

For the record, this basically matches what the current experimental support for macro annotations in Scala 3 lets you do (you can in fact add new definitions, but because macro annotations are expanded in a compiler phase that takes place after typer, they’re not visible outside of the macro expansion).

Maybe we need Scala Poet for code gen

1 Like

Only that we don’t need any “builders” as we already have Expr[?]s.

All that’s needed is to “export” (some of) the output so it can be picked up by the next compiler “stage” (as in multi-stage programming).

Using codegen is fine, as long as it does not influence the typing of the rest of the program. Like I said, @main is fine. @data in the original meaning is not fine since it would create a new sort of case class that offers methods different to case classes in the same compilation unit. That’s for all intents and purposes a dialect. Somebody new to a Scala codebase that uses @data has to know what definitions it generates, just like they have to know what definitions a case class generates.

@data can be reasonably supported under the restriction/rewrite model since the definitions it generates are straightforward. In that case, even if you don’t know about @data, you can understand the class just the same by looking at the explicit definitions. But it’s still important that these definitions are there, both for tooling (to have something to navigate to) and for understanding.

Now, if you want to create something much more complicated than that and want it to be hidden from the eyes of the programmer, but you still require that the new definitions are somehow understood to be there, be callable from Scala code, and so on you are in effect creating a dialect: a language that cannot be understood without precise knowledge of what the annotation does. And the tooling experience will be substandard too, because of these hidden definitions.

2 Likes

The editor integration is not part of Scala. It can be offered in Scala tooling, just like Github can offer Copilot. It’s not linting baked into the language either. Rather, it’s the code of the macro annotation that can do the checks.

To give some ideas what a macro annotation can do:

  • Generate definitions that are not directly accessible from the same Scala program, but that can be used for e.g. FFIs or host embeddings. Example: @main
  • Check the code of the annotated definition in some sense. Example: @tailrec
  • Change the body of the annotated definition without changing its signature. Example: @optimized
  • Serve as markers for external tools. Basically that’s what Java annotation processors are. For instance, we could have a codegen tool based on TastyQuery that takes an annotated file and produces companion units that add new definitions. With a bit more effort we could let such a tool even produce Tasty directly, so no string concatenation would be needed to do this form of codegen. That tool does not exist currently, but the foundations to develop it are in place.
    If there’s enough interest, we as a community can try to find the resources to develop it.
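For reference, the first two kinds of annotation in the list above already exist in the standard library and look like this in use. The function names sumTo and sumDemo are made up for the example.

```scala
import scala.annotation.tailrec

// @tailrec only checks: compilation fails if the recursive call
// is not in tail position. It adds no new definitions.
@tailrec
def sumTo(n: Int, acc: Long = 0L): Long =
  if n <= 0 then acc else sumTo(n - 1, acc + n)

// @main makes the compiler generate a program entry point that parses
// the command-line argument into `n` and then calls this method.
@main def sumDemo(n: Int): Unit =
  println(sumTo(n))
```

Neither annotation changes what the rest of the program can see or call at the type level, which is the property the list is built around.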

With such a definition, I can say that any sufficiently advanced framework defines a language dialect.

4 Likes

I think it would be good to work from concrete examples. Where does the dotty repo use codegen? And how is codegen used in your projects?

At least I’m not arguing for any hidden (or like I called it elsewhere “virtual”) code.

Generated code needs to be easily accessible and introspectable for tooling and humans. That’s for sure!

I think that generation of “virtual” invisible code was a flaw in the old macro annotations. In this point I’m fully with you!

But just dumping the generated code into some sources that are otherwise meant to be maintained by humans directly seems just wrong also.

And I don’t buy that “dialect” part. Under such a definition almost every Java framework would constitute a “Java dialect”. Nobody ever called it that; not even something close. Or: Do Rust macros create “Rust dialects”? Honest question.

You always need to know what’s happening under the hood when you use some framework / feature that does seemingly “magic things”. But just having some “magic things” around doesn’t create a language “dialect”. (But in the end it makes no difference to argue this part. Words can be defined arbitrarily. So the proposed definition is arbitrary, and “we don’t want ‘dialects’” still needs some concrete justification, as such “dialects” are obviously harmless. Almost no language with meta-programming facilities broke because people created meta programs! LISP may be an exception to this rule as it offers very powerful rewriting on the bare syntax level without any safety net of semantic checks, and especially no type checks. But other languages with meta-programming systems don’t suffer from this phenomenon. You can’t call a list as a function by accident in a typed language only because you prepended some symbol to that list that’s bound to a function in some scope. But in LISP exactly this can happen. All you have is s-expression soup. More modern languages have different types of expressions, so messing things up by accident is very unlikely.)

Yeah, checks.

But not the actual work that’s the sole reason to use some facility like that.

That goes into the direction I’m after.

What I don’t understand: Why external tools? Why TastyQuery?

We already have a very fine DSL built into the language to abstract over code creation: macros! The new quote stuff is impressive and more advanced than what, for example, the mentioned Kotlin compiler / tooling offers. All that’s needed now is a possibility to export definitions from the macro scope into the outer program.

Such an export should of course not interfere with type checking inside the compilation unit where it gets imported in any other way than some regular compilation unit (code in a different file) that gets imported can. Otherwise we would have the previous mess, with invisible “action at a distance”.

Java is already very dynamic regarding imports, and Scala has excellent support for separate compilation. So one could relatively easily export definitions from macro scope, dump the results to disk, and make them available as an otherwise normal external compilation unit for a “second stage of compilation”. (That’s more or less my understanding of how the mentioned export macros @littlenag is building would work, only that the “dump stuff to disk” part isn’t planned currently afaik; please correct me if I misunderstood).

1 Like

I recently linked something in another post.

I don’t know what he’s doing. But here is what I had in mind, so we can discuss a concrete example of what I wanted to build using meta-programming in Scala 3: I’m crying for finally sane code-gen because I want to generate whole client and server stubs, with all the marshaling in between, just from some simple data definitions. Also I want to abstract away the persistence layer for this data completely. Whoever has built some web software where the “business logic” is mostly CRUD knows that most of the code is completely repetitive and differs mostly only by the names and structure of some entity classes. The rest is almost completely mechanical. One could rightly say that 90%+ of the project consists of boilerplate…
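This is not the code-gen being asked for, but it may help to show how mechanical the “names and structure” part is. The sketch below uses Scala 3’s Mirror-based derivation to read a case class’s field names at compile time and render a naive key=value form. User and render are hypothetical names, and the encoding is deliberately simplistic. Note what it cannot do: it derives behaviour from an entity, but it cannot introduce new definitions such as whole stub classes.

```scala
import scala.compiletime.constValueTuple
import scala.deriving.Mirror

// Hypothetical entity, used only for illustration.
final case class User(id: Long, name: String)

// Renders any case class as "field=value" pairs joined by '&'.
// The field labels come from the compiler-provided Mirror, not from reflection.
inline def render[T <: Product](value: T)(using m: Mirror.ProductOf[T]): String =
  val names = constValueTuple[m.MirroredElemLabels].toList.map(_.toString)
  names.zip(value.productIterator).map((k, v) => s"$k=$v").mkString("&")

@main def renderDemo(): Unit =
  println(render(User(1L, "ada")))
```

Adding a field to User changes the output without touching render, which is the kind of repetition that currently gets copy-pasted per entity.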

The amount of copy-paste in such projects is hilarious! Because you can’t abstract anything away without resorting to the most dirty “tricks” like creating (string) templates for code files that get filled in by some external scripts.

This kind of code-gen would create a lot of code. The generated code would be one or two orders of magnitude larger than the hand written parts. Of course you need all kinds of definitions. Actually I want to generate whole implementation packages. Likely across Scala platforms. So having Scala.js and Scala JVM code generated that matches each other. The code would be mostly not meant to be touched by humans. (But it needs of course to provide some extension points, so hand written code could be hooked in).

As very large parts of the whole code would be generated a good debugging story is vital. Also being able to read and test the code during development of the “templates” is important. So “virtual” code is no good.

Macro annotations as such play only a small role in this scenario. They would be only the trigger points for code-gen. Convenience of writing “templates” and the tooling support around that are the main concerns here.

OTOH I don’t need any “checking of validity” triggered by the macros.

Is this a workable example?

1 Like

So if I understand correctly, what you are after is a high-level annotation processor that can produce .scala files and other artifacts? I agree that this would be useful to have. It could probably be implemented as a compiler plugin using quotes.reflect as a base layer.

3 Likes

Yes, something like that! :smiley:

How it works in the end under the hood, I don’t care actually.

But the “templating” needs to be sane, safe, and convenient even for less skilled people.

My impression was that the current quote stuff in Scala, with its Expr[?] abstraction, would make a really great “templating language”. It’s the best I’ve seen so far as it’s type safe!

The trigger points that would deliver the data to the “templates” would be hand written annotated definitions (of for example case classes).

The results of the triggered code-gen need to be “material”, as this would otherwise be way too much opaque magic that can’t be debugged reasonably.

And yes, such a feature would be extremely useful! The lives of lesser beings consist in large parts of writing repetitive boilerplately code. Cutting this down to the bare minimum would make Scala especially attractive to Jon-Doe-average-programmer. It would be almost a killer feature for some jobs, making mundane tasks really easy—without compromise on safety or tooling support (like in the case of stringy code templates that are the only way to achieve the stated goal currently in Scala).

Just think about the large market share of poor web devs working with all kinds of languages who do mostly nothing else than write this kind of “boilerplate”: defining entities, code that brings them over the wire, and code that persists them on the server side. Most of this is copy-paste, just replacing entity and field names. A framework that could abstract this away would be a game changer! Spring killer…

Thanks a lot for trying to understand what the pain points are, and what would make things substantially better! That’s something I love Scala for. People are listening. (You sometimes just need to cry loud enough… :grin:)

3 Likes

It’s worth noting that annotation processors are among the most popular tools right now in java-land (mapstruct, immutables, micronaut), and that somehow the annotation processor produces Java files whose code is visible in the same compilation unit, because you are able to use the generated definitions in the same file where you introduced the annotations that produce the generated code.
I don’t know how this magic happens, but it is there and it is very necessary for the general usage of annotation processors, in Java at least.

1 Like

You can use Kotlin compiler plugins in other contexts too, including Maven and REPL, but it is nice that IDEA’s error highlighting doesn’t get too confused by the syntactic absence of generated stuff.