Code generation

On the topic of macro annotations and AST generation, it has often been said that this is undesirable and that code generation would be a better approach. In Scala 2 we do have AST manipulation¹, but in Scala 3 we don’t. We also don’t have tools for code generation, so we are left with no migration path.
TASTy introspection is nice, but it depends on the code already being compiled. Both Java and Kotlin support annotation processors, which allow you to generate more source code during compilation. That code in turn gets compiled in the same compilation run, allowing cyclic dependencies between the two. With this one can, more or less, replace annotation macros that alter the AST.

Can this be realistically supported by a compiler plugin, or does this really require direct compiler support?


  1. Meaning the ability to introduce new symbols that are visible in the compilation. Macro annotations in Scala 3 do allow us to change the AST, but not the API visible before the macro runs.

AFAIK a “research plugin” can do whatever it likes, but “normal” plugins can only trigger after typer.

https://docs.scala-lang.org/scala3/reference/changed-features/compiler-plugins.html

Of course it would be trivial to subvert any protection against that, either with plain reflection or with some bytecode manipulation (I didn’t check the details), as the JVM is a dynamic runtime. But I guess that’s not the desired way to do things. The restriction for non-research plugins is there for a reason. It’s just that if it turns out to be too restrictive, people will find workarounds. That would be a bad outcome, so I think there should be some way to get in the features that users demand.

Regarding the actual topic: I have voiced my opinion on code generation more than often enough, so I think I don’t have to repeat the plea. As said elsewhere, I would like to have a simple, type-safe code templating system with very good IDE support. Maybe something on the basis of “typed holes” could work for that?


What is the difference between such a “type safe” templating system and writing some library that can convert Expr[T] to text and put it in a .scala file?

The difference is ergonomics and approachability.

What you propose actually “works fine”. I’m doing exactly this in one place.

But you need to handle “code” (Expr[?] expressions) programmatically. There is no way to do it declaratively. A quasi-quotes feature is missing.

I mean a quasi-quotes feature that supports “holes” in the expressions (the “variables” in the “template”) which can be filled later on. The holes need to have types too, so you can’t “render” your “template” by placing wrongly typed expressions in the holes, as this would result in non-compilable code or, worse, in wrong / nonsensical but compiling generated code. (If the types match it could still be nonsensical code. But one can always write nonsensical code; types help only as far as they can. I can always put the wrong string in a String variable…)

But that wouldn’t be enough, as “normal” macros (even written with the help of quasi-quotes) wouldn’t be able to create new code. But right now, writing Scala files and then compiling them again as part of the project is an external build-tool “hack”. The compiler is not aware of that, so IDE features (like navigating from and to your “templates”) won’t work. You just end up in the generated code, without the compiler knowing that it’s coming out of “rendered” Expr[?]s somewhere. The link is missing.

Also, filling “holes” this way does not really work. If the “rendered” code is parametric, have fun transforming it programmatically. And of course the current API has no built-in support for “holes”, which makes things even more fun.

To make the “just write Scala files from Expr[?]s” idea usable at all, the compiler needs to be fully aware of this whole generate-write-read-include-compile cycle. Otherwise it’s a big messy hack! (That’s what it looks like; I know, as I have constructed such a thing.)

Being aware of such macro expansions likely also needs to work without the help of the build tooling (as otherwise Scala would be tightly coupled to whichever build tool does that).

As I’ve also said elsewhere, for me code-gen is strictly the variant that actually “materializes” the generated code. Otherwise generated code is indeed not manageable, as you can’t handle code you can’t “see” or “touch” because it’s just some data in memory during compilation. That was one of the problems with how code-gen through old Scala macros worked: the results couldn’t be debugged in any meaningful way. Debugging would work fine if the generated code actually “materialized”.

So I think this “just write Scala files from Expr[?]s” is actually the right way to do it at the core. But it needs machinery around it to make it actually usable!

Of course, with materialized generated code there would also be a strong desire for something like partial classes from C#. (Handling generated code is actually one of the main motivations for having that feature there.)

Sorry for the long post, again a wall of text, with mostly things I’ve said already a few times elsewhere. I don’t want to sound annoying! It’s just that proper macros are imho really one of the most glaring holes (no pun intended) in the Scala language. It’s a joke that what we have is even worse than good old CPP (CPP has proper IDE support…). To “template” code you need to do obscure hacks… Usually involving raw strings… In a language which has one of the most advanced staged compilation features. That’s a major joke!

(Also I’ve posted the link to that Hazel language for a reason. Having first class support for holes in the language in general, not only in “code templates”, would enable IDE features not seen until today anywhere; I mean besides in that Hazel language; I recommend everybody to have a look. It’s fascinating! One step closer to a real structural editor for code! :vulcan_salute:)

I still don’t see what is missing ergonomically there: you can make a function Expr[Foo] => Expr[Bar] and splice the argument into the result. What power isn’t there? About quasi-quotes, I thought your complaint was that we shouldn’t be working in strings?
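For readers less familiar with the quote API, the function shape mentioned above might look like the following minimal sketch. Foo, Bar, fooToBar and toBar are hypothetical names chosen for illustration; only scala.quoted comes from the standard library. The splice is typed: ${foo} only accepts an Expr[Foo].

```scala
import scala.quoted.*

// Hypothetical domain types, named after the Foo and Bar in the question.
final case class Foo(n: Int)
final case class Bar(label: String)

// A staged function Expr[Foo] => Expr[Bar]:
// the argument expression is spliced into the quoted result.
def fooToBar(foo: Expr[Foo])(using Quotes): Expr[Bar] =
  '{ Bar("n = " + ${ foo }.n.toString) }

// Inline entry point that expands fooToBar at each call site.
inline def toBar(inline foo: Foo): Bar = ${ fooToBar('foo) }
```

The inline entry point has to be called from a different source file than the one defining fooToBar. The result exists only as a tree during compilation; nothing is written to disk.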

If quasi-quotes are strings, then Scala already has quasi-quotes… :joy:

But maybe (made up) examples are more helpful.

val classNames = List("Foo", "Bar")
val code: ClassName => Expr[ClassDef] = s"""
class $_:
   def baz = println("doing work")
"""
classNames
   .map: templateVariable =>
      code(templateVariable.toClassName)
   .foreach:
      _.materialize

Would such code work?

How does the IDE work inside the “quasi-quote”?

Does the compiler complain if I use $_ at any other place than where a class name is expected and valid?

What does materialize actually do? Where will I find the Scala files with the generated classes? How are they included in my current project?

When I’m inside the generated classes (wherever they are), will the IDE be able to navigate back to the “quasi-quote” that defined them, or find use sites?

The current reality is that this does not work like that.

To do the same as that hypothetical code I need to

  • Create a “TemplateClass”, which is actually a real class in some sub-project that doesn’t get published and is there only for the compile time magic
  • Compile that TemplateClass, and then reread the generated TASTy
  • Walk the TASTy, programmatically change the class name so it matches the “template variable”, and then “pretty print” that TASTy back to some files on disk in another project; do that for all “template variables” [I don’t even remember whether this was possible for class names, or whether I had to fall back to string replacement in the pretty-printed code snippets; I would need to look up again how I did it, but not now]
  • Compile that project and depend on it where I need to use the generated code
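The last two steps of this list boil down to writing source text into another project’s source directory. A minimal sketch of that part, assuming the source text has already been produced somehow (the TASTy walking and pretty printing are deliberately left out, and writeGenerated is a hypothetical helper, not part of any tool):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: writes one <name>.scala file per entry into outputDir
// and returns the written paths, sorted by name for a stable order.
// The keys of `sources` are assumed to be valid file and class names.
def writeGenerated(outputDir: Path, sources: Map[String, String]): List[Path] =
  Files.createDirectories(outputDir)
  sources.toList.sortBy(_._1).map { (name, source) =>
    val target = outputDir.resolve(s"$name.scala")
    Files.write(target, source.getBytes(StandardCharsets.UTF_8))
  }
```

The build then has to treat outputDir as a source directory of the second project. That wiring lives in the build tool, outside the compiler’s knowledge, which is exactly the missing link complained about here.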

Don’t ask me how to do that if there were circular dependencies between generated code and its use site…

You do all of that of course without nice IDE support, that could for example navigate from that generated code back to the actual “TemplateClass”.

Also everything is just stringly typed. A ClassName does not exist. It’s just a string, which I could use somewhere in some API call or constructor.

The “template variables” are of course much more involved in reality. You have at least some Maps of Lists, and you need to walk the TASTy a few times until all is replaced and transformed so you can generate and write out one instance of the code.

The whole approach is imho a hack. It works, yes, but it’s far from optimal. It’s definitely not ergonomic. It’s very fragile. (Change the “TemplateClass” or its surroundings and you will likely break the TASTy-walking code-generation code.)

It’s also not declarative. I can’t put some placeholder at the place of the definition of the name of my “TemplateClass”. I need to call that class something, and then fish for exactly that string in the TASTy. There is no API for placeholders (like the class name in this example, but it could also be method names, parameter names, type names, package names, and maybe some other things I forgot) which could be filled with (type safe) template variables.

A declarative approach would also be much more robust. Changing the templates would not break everything; it would just keep working.

Because doing this with the above approach is so extremely involved people just use string templates for code instead. I’m also back to doing that. Because it needs less machinery, and it’s almost trivial to declaratively replace some placeholder in a string template as Scala has a built-in feature for that. But it does not have that for code, despite “powerful macro features”.
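For comparison, the string-template fallback might look like the following minimal sketch. ClassName here is a hypothetical wrapper, not a compiler or library type: it only validates the text, and the compiler still does not check where the placeholder ends up.

```scala
// Hypothetical wrapper around a validated class name (an assumption for
// illustration; nothing like this exists in the macro API).
final case class ClassName private (value: String)

object ClassName:
  private val Identifier = "[A-Z][A-Za-z0-9]*".r

  // Accepts only simple upper-camel-case identifiers.
  def parse(raw: String): Either[String, ClassName] =
    if Identifier.matches(raw) then Right(new ClassName(raw))
    else Left(s"not a valid class name: $raw")

// The "template": an ordinary interpolated string with one placeholder.
def classTemplate(name: ClassName): String =
  s"""class ${name.value}:
     |  def baz = println("doing work")
     |""".stripMargin

@main def renderTemplates(): Unit =
  List("Foo", "Bar", "not valid").map(ClassName.parse).foreach {
    case Right(name) => println(classTemplate(name))
    case Left(error) => println(s"skipped: $error")
  }
```

The wrapper catches obviously invalid names, but the template body is still plain text, so a typo inside it only shows up once the generated file is compiled.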

Of course everything is then just a string and you have no IDE support at all, but that’s also the status quo if you do it the involved way.

What I’ve described is of course half a compiler pipeline, just in user space, held together by some build scripts… I think it would be much simpler to just use an already existing compiler, which has all the machinery already available, in much better shape than whatever one could hack into existence. Also, the compiler already has an API to feed info back into the IDE, something the home-made solution can’t provide with realistic effort.

Another aspect I was thinking of:

Code generation can actually be seen as part of staged compilation, just that code-gen happens at “negative stages”. The above example would expand the code template at stage “-1”. (In theory one could think of generated code that generates code, which would then be something happening at stage “-2”. I have never seen a use case for that. Still, if the machinery were there, I guess this wouldn’t be too difficult to have as well.) The point is: maybe this would fit nicely into the current theoretical framework? “Just” extend it to negative stages and get code-gen with superior compiler / IDE support for free. (OK, it needs to do all the things I’ve described above, and that’s not “free”. But I think the building blocks are already there. If you can do it in user space, it should be even simpler to implement with the tools in the compiler.)