Code generation

rcano · November 3, 2024, 8:23pm

On the topic of macro annotations and AST generation, it has often been mentioned that this is undesirable and that a better approach would be code generation. In Scala 2 we do have AST manipulation¹, but in Scala 3 we don’t; we also don’t have tools for code generation, so we are left with no migration path.
TASTy introspection is nice, but it depends on the code already being compiled. Both Java and Kotlin support annotation processors, which allow you to generate more source code during compilation; that code is in turn compiled in the same compilation unit, allowing cyclic dependencies between the two. With this one can, more or less, replace annotation macros that alter the AST.

Can this be realistically supported by a compiler plugin, or does this really require direct compiler support?

Meaning the ability to introduce new symbols that are visible in the compilation. Macro annotations in Scala 3 do allow us to change the AST, but not the API visible before the macro runs.

MateuszKowalewski · November 10, 2024, 12:27am

AFAIK a “research plugin” can do whatever it likes, but “normal” (standard) plugins can only run after typer.

https://docs.scala-lang.org/scala3/reference/changed-features/compiler-plugins.html

Of course it would be trivial to subvert any protection against that, either with plain reflection or with some bytecode manipulation (I didn’t check the details), as the JVM is a dynamic runtime. But I guess that’s not the desired way to do things. The restriction for non-research plugins is there for a reason. It’s just that if it turns out to be too restrictive, people will find workarounds. That would be a bad outcome, so I think there should be some way to get in features that users demand.

Regarding the actual topic: I have voiced my opinion on code generation more than once, so I think I don’t have to repeat the plea. As said elsewhere, I would like to have a simple, type-safe code templating system with very good IDE support. Maybe something based on “typed holes” could work for that?

bishabosha · November 10, 2024, 1:46pm

what is the difference between such a “type safe” templating system, and writing some library that can convert Expr[T] to text and put it in a .scala file?

MateuszKowalewski · November 10, 2024, 11:43pm

The difference is ergonomics and approachability.

What you propose actually “works fine”. I’m doing exactly this in one place.

But you need to handle “code” (Expr[?] expressions) programmatically. There is no way to do it declaratively. A quasi-quotes feature is missing.

A quasi-quotes feature that supports “holes” in the expressions (the “variables” in the “template”) which can be filled later on. The holes need to have types too, so you can’t “render” your “template” with wrongly typed expressions placed in the holes, as this would result in non-compilable code or, worse, in wrong / nonsensical but compiling generated code. (If the types match it could still be nonsensical code. But one can always write nonsensical code; types help only as far as they can. I can always put the wrong string in a String variable…)
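For illustration only: part of the “typed holes” idea can be imitated in plain Scala 3 today by giving each kind of hole its own wrapper type, so a template cannot be filled with the wrong kind of value. The names TypeName, StringLit and tableTemplate are hypothetical, not an existing API, and the sketch still renders to a String, so none of the compiler or IDE awareness asked for here is provided.

```scala
// Hypothetical sketch, not an existing API: every kind of hole has its own
// type, so the compiler rejects a template filled with the wrong kind of value.
final class TypeName private (val value: String)
object TypeName:
  // Accept only simple identifiers that start with an upper-case letter.
  def from(s: String): Option[TypeName] =
    if s.matches("[A-Z][A-Za-z0-9_]*") then Some(new TypeName(s)) else None

final class StringLit private (val value: String)
object StringLit:
  // Escape backslashes and double quotes so the rendered literal stays well-formed.
  def of(s: String): StringLit =
    new StringLit("\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")

// A "template" is just a function whose parameters are the typed holes.
// Passing a StringLit where a TypeName is expected does not compile.
def tableTemplate(tpe: TypeName, tableName: StringLit): String =
  s"""case class ${tpe.value}()
     |
     |object ${tpe.value}:
     |  val tableName = ${tableName.value}
     |""".stripMargin
```

A call such as tableTemplate(StringLit.of("users"), StringLit.of("users")) is rejected by the type checker, which is the property the holes are meant to have. Whether the rendered text is valid Scala beyond that is still up to whoever writes the template.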

But that wouldn’t be enough, as “normal” macros (even written with the help of quasi-quotes) wouldn’t be able to create new code. And right now, writing Scala files and then compiling them again as part of the project is an external build-tool “hack”. The compiler is not aware of it, so IDE features (like navigating from and to your “templates”) won’t work. You just end up in the generated code, without the compiler knowing that it came out of “rendered” Expr[?]s somewhere. The link is missing.

Also, filling “holes” this way does not really work. If the “rendered” code is parametric, have fun transforming it programmatically, of course without any built-in support for “holes” in the current API, to make things even more fun.

To make the “just write Scala files from Expr[?]s” idea usable at all, the compiler needs to be fully aware of this whole generate-write-read-include-compile cycle. Otherwise it’s a big messy hack! (That’s what it looks like; I know, as I have constructed such a thing.)

Awareness of such macro expansions likely also needs to work without the help of build tooling (otherwise Scala would be tightly coupled to whichever build tool does that).

As I’ve also said elsewhere, for me code-gen strictly means the variant that actually “materializes” the generated code. Otherwise generated code is indeed not handleable: you can’t handle code you can’t “see” or “touch”, because it’s just some data in memory during compilation. That was one of the problems with how code-gen through old Scala macros worked: the results couldn’t be debugged in any meaningful way, which would work fine if the generated code actually “materialized”.

So I think this “just write Scala files from Expr[?]s” approach is actually the right way to do it at the core. But it needs machinery around it to make it actually usable!
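A minimal sketch of just the “write” part of that cycle, assuming a hypothetical materialize helper and an output directory chosen by the caller. It records the originating template only as a comment, which illustrates the missing link: nothing in the compiler or IDE would read that header.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical "materialize" step: write rendered code into a package
// directory under outDir and return the written file. The header only
// names the originating template in a comment; no compiler or IDE reads it.
def materialize(
    outDir: Path,
    pkg: String,
    name: String,
    templateOrigin: String,
    body: String
): Path =
  require(pkg.nonEmpty, "a package name is required")
  // "demo.gen" becomes outDir/demo/gen
  val dir = pkg.split('.').foldLeft(outDir)((d, segment) => d.resolve(segment))
  Files.createDirectories(dir)
  val file = dir.resolve(s"$name.scala")
  val content = List(
    "// GENERATED FILE, do not edit.",
    s"// Source template: $templateOrigin",
    s"package $pkg",
    "",
    body
  ).mkString("\n")
  Files.write(file, content.getBytes(StandardCharsets.UTF_8))
  file
```

Everything around this step (adding the directory to the sources, recompiling, mapping the file back to its template) is exactly the machinery that would still have to come from the build tool or, ideally, the compiler.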

Of course, with materialized generated code there would be also a strong desire for something like partial classes from C#. (Handling generated code is actually one of the main motivations to have that feature there).

Sorry for the long post, again a wall of text, with mostly things I’ve said already a few times elsewhere. I don’t want to sound annoying! It’s just that proper macros are imho really one of the most glaring holes (no pun intended) in the Scala language. It’s a joke that what we have is even worse than good old CPP (CPP has proper IDE support…). To “template” code you need to do obscure hacks… Usually involving raw strings… In a language which has one of the most advanced staged compilation features. That’s a major joke!

(Also, I’ve posted the link to that Hazel language for a reason. Having first-class support for holes in the language in general, not only in “code templates”, would enable IDE features not seen anywhere until today, except in that Hazel language; I recommend everybody take a look. It’s fascinating! One step closer to a real structural editor for code!)

bishabosha · November 11, 2024, 7:51am

I still don’t see what is missing ergonomically there, you can make a function Expr[Foo] => Expr[Bar] and splice the argument into the result? what power isn’t there? about quasi quotes, I thought your complaint was we shouldn’t be working in strings?

MateuszKowalewski · November 12, 2024, 4:13am

If quasi-quotes are strings Scala has already quasi-quotes…

But maybe (made up) examples are more helpful.

val classNames = List("Foo", "Bar")
val code: ClassName => Expr[ClassDef] = s"""
class $_:
   def baz = println("doing work")
"""
classNames
   .map: templateVariable =>
      code(templateVariable.toClassName)
   .foreach:
      _.materialize

Would such code work?

How does the IDE work inside the “quasi-quote”?

Does the compiler complain if I use $_ at any other place than where a class name is expected and valid?

What does actually materialize do? Where will I find the Scala files with the generated classes? How are they included in my current project?

When I’m inside the generated classes (wherever they are), will the IDE be able to navigate back to the “quasi-quote” that defined them, or find use sites?

The current reality is that this does not work like that.

To do the same as that hypothetical code I need to

Create a “TemplateClass”, which is actually a real class in some sub-project that doesn’t get published and is there only for the compile time magic
Compile that TemplateClass, and then reread the generated TASTy
Walk the TASTy, programmatically change the class name so it matches the “template variable”, and then “pretty print” that TASTy back to some files on disk in another project; do that for all “template variables” [I don’t even remember whether this was possible for class names or whether I had to fall back to string replacement in the pretty-printed code snippets; I would need to look up again how I did it, but not now]
Compile that project and depend on it where I need to use the generated code
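The string-replace fallback mentioned in step 3 might look roughly like this; renameIdent is a hypothetical helper, not taken from the code described in the post. Because it matches text rather than symbols, it also renames occurrences inside string literals and comments, one concrete source of the fragility discussed here.

```scala
import java.util.regex.{Matcher, Pattern}

// Hypothetical fallback: rename every whole-word occurrence of `from` in
// pretty-printed source. The lookarounds stop matches inside longer
// identifiers (e.g. TemplateClassHelper), but string literals and comments
// are renamed too, since this works on text, not on symbols.
def renameIdent(source: String, from: String, to: String): String =
  val identChar = "[A-Za-z0-9_$]"
  val pattern = Pattern.compile(s"(?<!$identChar)${Pattern.quote(from)}(?!$identChar)")
  pattern.matcher(source).replaceAll(Matcher.quoteReplacement(to))
```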

Don’t ask me how to do that if there were circular dependencies between the generated code and its use sites…

You do all of that of course without nice IDE support, that could for example navigate from that generated code back to the actual “TemplateClass”.

Also everything is just stringly typed. A ClassName does not exist. It’s just a string, which I could use somewhere in some API call or constructor.

The “template variables” are of course much more involved in reality. You have at least some Maps of Lists, and you need to walk the TASTy a few times until all is replaced and transformed so you can generate and write out one instance of the code.

The whole approach is imho a hack. It works, yes, but it’s far from optimal. It’s definitely not ergonomic. It’s very fragile. (Change the “TemplateClass” or its surroundings and you likely break the TASTy-walking code-generation code.)

It’s also not declarative. I can’t put a placeholder where the name of my “TemplateClass” is defined. I need to name that class somehow, and then fish for exactly that string in the TASTy. There is no API for placeholders (in this example the class name, but it could also be method names, parameter names, type names, package names, and maybe some other things I forgot) which could be filled with (type-safe) template variables.

A declarative approach would be also much more robust. Changing the templates would not break everything, it would just keep working.

Because doing this with the above approach is so extremely involved people just use string templates for code instead. I’m also back to doing that. Because it needs less machinery, and it’s almost trivial to declaratively replace some placeholder in a string template as Scala has a built-in feature for that. But it does not have that for code, despite “powerful macro features”.
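The string-template status quo, sketched with plain s-interpolation for the same made-up Foo/Bar example as earlier in the thread (classCode is a hypothetical name). Nothing stops a caller from passing a string that is not a valid class name.

```scala
// The status quo in plain Scala: the template is an ordinary s-interpolated
// string, and the "template variables" are plain Strings.
def classCode(className: String): String =
  s"""class $className:
     |  def baz = println("doing work")
     |""".stripMargin

// One rendered class per template variable.
val generated: List[String] = List("Foo", "Bar").map(classCode)
```

classCode("Foo Bar") happily renders class Foo Bar:, which only fails later, when the generated file is compiled.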

Of course everything is then just a string and you have no IDE support at all, but that’s also the status quo if you do it the involved way.

What I’ve described is of course half a compiler pipeline, just in user space, held together by some build scripts… I think it would be much simpler to just use an already existing compiler, which has all the machinery available in much better shape than whatever one could hack into existence. The compiler also already has an API to feed info back into the IDE, something a home-made solution can’t provide with realistic effort.

Another aspect I was thinking of:

Code generation can actually be seen as part of staged compilation, just that code-gen happens at “negative stages”. The above example would expand the code template at stage “-1”. (In theory one could think of generated code that generates code, which would then happen at stage “-2”. I’ve never seen a use case for that. Still, if the machinery were there, this wouldn’t be too difficult to support as well, I guess.) The point is: maybe this would fit nicely into the current theoretical framework? “Just” extend it to negative stages and get code-gen with superior compiler / IDE support for free. (OK, it needs to do all the things I’ve described above, and that’s not “free”. But I think the building blocks are already there. If you can do it in user space, it should be even simpler to implement with the tools in the compiler.)

sake92 · December 28, 2024, 1:56pm

Just my 2 cents on this subject.
Scalafix for example is great when you want to do linting or refactoring of every file.
But we don’t have a tool to say “I want this file to have this, this and this; refactor it only if needed”.
Made a library for exactly that: GitHub - sake92/regenesca: Refactoring Generator of Source Code for Scala
Example: regenesca/example/src/example/Example.scala at main · sake92/regenesca · GitHub

Used them for 2 (re)generators:

squery SQL models/daos Code Generation - Squery
openapi generator GitHub - sake92/openapi4s: openapi4s

There is certainly a number of ad-hoc rules for replacing definitions, but for my use cases it works surprisingly well.
And mostly simple to understand, not a lot of code really.
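To make the idea concrete, here is a heavily simplified model of “make this file contain these definitions, and touch it only if needed”: a file is an ordered list of named top-level definitions, wanted definitions replace same-named ones, missing ones are appended, and hand-written ones are kept. Defn, merge and needsRewrite are hypothetical names; this is not regenesca’s API or algorithm, which works on real source code rather than on this toy model.

```scala
// Toy model: a source file as an ordered list of named top-level definitions.
final case class Defn(name: String, code: String)

def merge(existing: List[Defn], desired: List[Defn]): List[Defn] =
  val wanted = desired.map(d => d.name -> d).toMap
  // Replace definitions that are wanted; keep hand-written ones untouched.
  val updated = existing.map(d => wanted.getOrElse(d.name, d))
  val present = existing.map(_.name).toSet
  // Append wanted definitions the file does not have yet, in desired order.
  updated ++ desired.filterNot(d => present(d.name))

// Only rewrite the file when merging would actually change it.
def needsRewrite(existing: List[Defn], desired: List[Defn]): Boolean =
  merge(existing, desired) != existing
```

Running the merge a second time on its own output is a no-op, which is what makes this kind of regeneration safe to repeat.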

MateuszKowalewski · December 30, 2024, 9:10pm

Thanks for sharing!

This looks very much like what I want to have. I need to see whether I can adapt it to my use case. But this here looks exactly like what I’ve described above:

val typeName = Type.Name(sqlTable.capitalize)
val termName = Term.Name(sqlTable.capitalize)
val tableNameLit = Lit.String(sqlTable)
source"""
    case class ${typeName}()

    object ${termName} {
      val tableName = ${tableNameLit}
    }
"""

It’s of course very unfortunate you need to use Scalameta (which is imho a big hack, constantly lagging behind current compiler development) instead of some compiler provided machinery, like for example something called “macros” in other languages.

sake92 · December 31, 2024, 5:42pm

Scalameta is a bit clunky to write, but at least you know that the file will be syntactically correct 99% of the time.
Compare that with manually stitching strings…
It nicely handles reserved keywords like “type”: it generates val `type`, which is allowed.
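The quoting behavior described above can be sketched without Scalameta: backquote any name that is a Scala 3 hard keyword or not a plain alphanumeric identifier. quoteIdent is a hypothetical helper; the keyword set is Scala 3’s list of hard keywords, and soft keywords such as using or inline are deliberately left unquoted, since they are legal identifiers in most positions.

```scala
// Scala 3 hard keywords: these can never be used as plain identifiers.
val hardKeywords: Set[String] = Set(
  "abstract", "case", "catch", "class", "def", "do", "else", "enum",
  "export", "extends", "false", "final", "finally", "for", "given", "if",
  "implicit", "import", "lazy", "match", "new", "null", "object",
  "override", "package", "private", "protected", "return", "sealed",
  "super", "then", "throw", "trait", "true", "try", "type", "val", "var",
  "while", "with", "yield"
)

// Backquote names that are hard keywords or not simple alphanumeric
// identifiers; everything else is emitted unchanged. Simplified: operator
// names such as + are also backquoted, which is legal but unnecessary.
def quoteIdent(name: String): String =
  val simple = name.matches("[A-Za-z_][A-Za-z0-9_]*")
  if simple && !hardKeywords.contains(name) then name
  else s"`$name`"
```

With this, s"val ${quoteIdent("type")} = 1" renders as val `type` = 1, matching the output described in the post.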

Hopefully they will manage to publish the Scala 3 parser from the scala3 repo itself, so that Scalameta can just depend on it.
There was a thread/issue somewhere…

som-snytt · December 31, 2024, 7:44pm

At one time, it was deemed useful to have more than one implementation of the compiler.

Now the pendulum has swung.

Parsing is supposed to be easy, as far as phases go, so it is especially too bad to suggest that everyone must depend on the dotty parser because it’s just too tricky.

I leave the rest of this post as whitespace in tribute to that feature.

MateuszKowalewski · January 4, 2025, 10:47pm

Almost no language can afford more than one toolchain.

Of course it would be nice in theory to have that, but that just isn’t realistic for Scala right now.

Even among the few languages in use that have such luxury, namely the ones supported by both GCC and LLVM, only those two implementations still matter at this point. (One could argue that there is also this Microsoft thingy, but it’s nothing else than a Microsoft thingy, so it doesn’t count imho.)

OK, there is also JS. But the runtimes used outside of the browser (where also just two remained) are more or less just forks of the same code base. All other implementations are as insignificant as alternative C implementations.

Does anybody know more examples where there are competing language implementations in usage? (Please don’t say RegEx… )

Besides that, implementing a parser is likely not rocket science. The actual issue is keeping all of the implementations in sync! That doesn’t even work for stuff where there is an “official” ISO standard. The different implementations still often disagree, and then you need to find out where the bug actually is: the one implementation, the other implementation, or the spec. That’s why C++26 is coming soon, but not even C++17 works flawlessly across all implementations. I really don’t want to have that in Scala!

ragnar · January 19, 2025, 12:11pm

Common Lisp has many implementations that are used.
Haskell used to have many in use, but GHC seems to have won?
JRuby and Jython both have users, as do TruffleRuby and GraalPy.
Java has many different implementations.
C# has multiple.
Unix shell has many different supersets.
SQL.

It’s also extremely common that parsers for languages are rewritten for every editor with syntax highlighting, because rewriting parsers within the existing parsing infrastructure tends to be simpler than somehow fitting existing parsers into that infrastructure.

There often is one implementation that is normally used, with other implementations being more special purpose. But a lot of the alternative implementation would be dearly missed if you took them away from their users.

In some ways Scala is a bit unusual in that the compiler supports JVM, Native, and JS backends, and it was not necessary to create completely different implementations for each scenario.