Scala 3, macro annotations and code generation

I’m not saying this hack wouldn’t work, somehow.

But it’s imho a major hack nevertheless.

Some formatting change, or even a comment on the wrong line, could break the whole thing. Especially funny when this happens in CI (people run all kinds of crazy stuff in CI, like code formatters). Now all kinds of tooling need to be aware of the hack. One would also need sophisticated “tree-diffing” which accounts for all the possible breakage caused by otherwise harmless changes.

And what about compile times, when the compiler needs to regenerate code on every run just to do the actual diff (which isn’t free either)? Code-gen can be quite heavyweight; there is a reason it usually isn’t done on every recompile. But with this proposal the compiler would constantly need to check whether the // DO NOT EDIT THIS IT IS GENERATED CODE ANY CHANGES WILL BE LOST blocks are still intact. Additionally, “tree-diffing” can become quite complex.

But mixing hand-written code and generated code is a K.O. anyway, imho. Nobody likes to see changing // DO NOT EDIT THIS IT IS GENERATED CODE ANY CHANGES WILL BE LOST blocks in diffs (and reviews). It would also make things like git bisect more complex, I guess.
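
To illustrate what is being objected to: a marked region in a source file would presumably look something like this (purely illustrative; the begin/end suffixes and the generated method are made up, not part of any concrete proposal):

case class Person(name: String):
  // DO NOT EDIT THIS IT IS GENERATED CODE ANY CHANGES WILL BE LOST -- begin
  def withName(name: String): Person = copy(name = name)
  // DO NOT EDIT THIS IT IS GENERATED CODE ANY CHANGES WILL BE LOST -- end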

Those issues could of course be fixed if the code were generated externally (as everybody else does). But then the question remains: why go the route of a hack with all its issues instead of building a clean solution directly on top of the current macro system, as proposed in the other, previously linked thread?

I understand that this proposal likely seemed simple to implement at first. So it’s a smart hack! It’s just not the right thing™ in my opinion.

If we needed something that works somehow, as fast as possible, maybe this hack would even be bearable. But why rush things? There is no reason to. Whatever we choose will be around for a long time, I guess. So settling for a mediocre solution with a lot of design warts (like the old macros), one that may be simple to implement in isolation but will make life with tooling and automation very hard in the end, is not optimal.

If it did, that would be a deal-breaker indeed, but I think in practice we could make it work (you can check out the PR I linked to earlier and try to break it).

Just generating the code isn’t the expensive part; it’s the rest of the compiler pipeline operating on the generated code that takes time. It’s a common trap for people to accidentally use macros in a way that generates a ton of code and then be confused by their suddenly increased compile times.

I’m not trying to rush anything, just proposing something that fits our self-imposed constraints of “macros are not allowed to create new definitions that affect subsequent typechecking” and doesn’t require new language features. Everyone is welcome to make their own proposals and explore other parts of the design space of course.

For me the big upside of this proposal is the following: it shows the user what is going on, so that they can have a mental model of what these macros do, which is useful, for example, when debugging code.

But in my opinion, this should not be done by editing source files!

Since we kind of assume IDE support, I think we should approach the problem from a different angle: develop tools that allow the user to inspect the generated code with their IDE.

This has the benefit of also applying to already existing constructs like inline and macros, or even case classes, for and match!

I think this would be a very valuable tool for teaching as well: students could write the for they want and see how it is desugared.
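
As a rough sketch of the kind of expansion such a tool could show (the value names are just for illustration; the desugaring into withFilter/flatMap/map is the standard one the compiler performs):

// What the student writes:
val pairs =
  for
    x <- List(1, 2, 3)
    if x % 2 == 1
    y <- List("a", "b")
  yield (x, y)

// Roughly what the tool could display as the expansion:
val pairsDesugared =
  List(1, 2, 3)
    .withFilter(x => x % 2 == 1)
    .flatMap(x => List("a", "b").map(y => (x, y)))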

My vision would be for it to work somewhat like code folding:
If you click/expand/… on a macro, it displays the code generated by that macro, probably either as a popup or in a different color, so that it’s always clear which code is “real” and which code is generated.

4 Likes

And since not everyone uses a visual IDE, we can also think about how to show these expansions in CLI environments

But this is somewhat orthogonal, as the same issues would appear with the rewriting idea

1 Like

Moving the manual mangling of source files from the compiler to the editor mitigates the problem a bit, but does not solve it. This assumes two things:

  1. The source files are writable at all (not always true today for ~half of my colleagues, who edit code on one machine and compile/run it on another!) and not shared (not true for most cross-built code!)
  2. You can integrate with everywhere Scala code is written

Let’s consider (2). Maybe we work with Virtus to integrate Metals/VSCode, and JetBrains integrates IntelliJ. Then what?

  1. What about Almond/Jupyter notebooks?
  2. Zeppelin Notebooks?
  3. Polynote notebooks?
  4. Databricks notebooks?
  5. What about the REPL? Will it edit the code being submitted without compiling/running it?
  6. What about alternate Repls, like Ammonite?
  7. What about codegen? Let’s say I generate code within an sbt, Mill, or Bazel build task on a CI machine. Who will press the “autofix” button then?
  8. Mdoc snippets?
  9. Vim/Emacs/Sublime?

There’s a long tail of places where Scala code is written and run; the above is just what I came up with off the top of my head in 30 seconds, and I’m sure there are countless others I didn’t think of. All of these places assume code is written by the user and then compiled and run, with a one-directional dataflow. That is how it works for 99% of the other programming languages they support. Most do not expect code to flow “backwards” from the compiler back to the sources.

These problems are solvable; a similar challenge exists with scalafmt/scalafix. But it’s one thing for third-party tools to have incomplete, best-effort support across execution environments; those integrations are just icing on the cake that makes your life easier. Having core language features and workflows be unsupported depending on where you write your code, and requiring special integrations to use a language feature properly at all, is something quite different.

2 Likes

I’ve created a separate thread about this, but I’d like to ask it here as well: how many of the previous uses of annotation macros that modified program state actually needed to modify program state? Is it possible that the situations where code generation is truly indispensable don’t really apply to the listed environments?

If code generation is based on non-Scala information (e.g. generating endpoints from an OpenAPI spec), then yes, you need code generation, but I don’t think you’d actually use that with Jupyter/the REPL/etc.

However, there are places where macro annotations were used in the past whose use cases can potentially be covered by Scala 3 features like programmatic structural types.
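
For instance, a minimal sketch of a programmatic structural type, along the lines of what the Scala 3 documentation shows (the Record and Person names here are just illustrative):

// A record whose members are looked up dynamically at runtime.
class Record(elems: (String, Any)*) extends Selectable:
  private val fields = elems.toMap
  def selectDynamic(name: String): Any = fields(name)

// The "shape" is expressed as a structural refinement in the types,
// without any code generation:
type Person = Record { val name: String; val age: Int }

val person = Record("name" -> "Ada", "age" -> 36).asInstanceOf[Person]
val greeting = s"${person.name} is ${person.age}"  // member access is type-checked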

I’m not advocating that Scala should (or should not!) follow suit, but the approach Kotlin takes is interesting: essentially everything you might want to use macros for is implemented as a compiler plugin. Here’s a Twitter thread from a couple of years ago with some rationale.

I think what @smarter was advocating is a change in viewpoint.

First, I believe we should never let unrestricted macro annotations in their old form into Scala again. @smarter gave @data classes as an example. It looks like a great idea and is surely very convenient in some situations, but it will get a hard no from me. If we admit things like that, we open all the doors to language fragmentation again. And this goes completely against our idea of what Scala should be.

Macro annotations are OK when it comes to codegen for interop (e.g. something like @main annotations). Embedding your Scala program into a host environment without having to write boilerplate code is great, and since none of this is visible at type checking, it will not lead to dialects and fragmentation.

Having macro annotations restrict your program is also OK. Macro annotations could check that your program is pure, that it can be translated to SQL or Datalog, or any other property you like. Enforced language subsets are not dialects.

The idea of @smarter, which I find quite clever, is to re-interpret a macro annotation like @data as a way to restrict your program. It now indicates that the definitions required by the @data class specification are all present. That by itself is useful and should be uncontroversial. You could manually check that all required definitions are there, but with @data the compiler does it for you, and you see at a glance what kind of class this is.

Then as an added convenience the compiler or IDE can also generate any definitions that are missing for you. You can freely replace or edit those definitions; the annotation will simply check that even after editing they are still of the right shape.

If the added code is small and easy to understand I can see this working quite well. Sort of like automatically adding getters and setters in Java IDEs. If the added code is large and convoluted that’s another matter. Then probably you should not do it. But at least we won’t have the situation that large and convoluted code gets added under the surface without this being pushed in the face of the developer.

So the point is, the rewrite aspect is really just an added convenience. It could be achieved in a number of ways or be omitted altogether. The important part is the change in viewpoint: the annotation tells you what content you can expect to see in the class.
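
To make that concrete, here is a small, hypothetical sketch of the viewpoint being described (the @data annotation and the withName/withAge shapes are illustrative, not a specified API):

// Hypothetical: @data only checks that the expected definitions are present.
@data case class Person(name: String, age: Int):
  // These definitions live in the source. The compiler or IDE may have
  // generated them initially, but the user can edit them freely; the
  // annotation merely verifies they still have the expected shape.
  def withName(name: String): Person = copy(name = name)
  def withAge(age: Int): Person = copy(age = age)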

5 Likes

Not sure what happened to my email reply, so I’ll repost:

The problem we’re trying to solve is that the macro should not influence the types. I think something akin to the TypeScript example given can solve that. So basically what we need is:

  1. A language for expressing in the types what the shape is of the code that will be generated

  2. Allow annotation macros to be responsible for providing the implementation of methods etc. expressed in the type.

So for example, there should be a way to express, not in a Turing-complete executable macro language but in the types, whether as part of the definition of the annotation or otherwise:

  1. For the @data example, something like “Annotates a case class C, and for every field f: T there will be a method with${f.capitalize}(p: T): C”
  2. For monocle @Lenses, something like “Annotates a case class C, takes a parameter prefix: String = "", and for every field f: T there will be a field in the companion ${prefix}f: Lens[C, T]”
  3. I would like to be able to write a macro for scalajs-react components, which could be expressed as: “Annotates a trait, class or object containing a case class that must be named Props and a field named component, and generates an apply method with the same parameters as Props, with return type VdomElement”

Then, after macro expansion you don’t need to run the type checker, but you do need to check that there aren’t missing method bodies. (If the current pipeline doesn’t allow missing method bodies that far, you could let them have some kind of special body that will error later if not replaced, I guess.)

It seems unfortunate that this would force the macro implementation and the shape specification to be redundant in some ways. IIUC match types can also be redundant with a corresponding pattern match. Maybe this could be solved, but even if it’s not, it’s still better than anything else IMO.

Also if type-level name mangling is too much, both examples could replace the prefix with an enclosing object. Something like:

case class Person(name: String):
  object `with`:
    def name(name: String) = copy(name = name)

or

import monocle.Lens

case class Person(name: String)
object Person:
  object lenses:
    val name = Lens[Person, String](_.name)(name => _.copy(name = name))

I think “dialects” and “fragmentation” refer to characteristics of the Scala code that people write. Facilities for boilerplate expansion will influence that, whether the boilerplate expansion happens before typechecking or afterward. And I think you’re implying that “dialects” are inherently bad, and are the kind of thing that sullied the good name of Scala 2.x. But, would you consider things like Lombok or Java annotation-driven frameworks to be “dialects” in that sense? I sure would, but they sure didn’t hurt the adoption of Java.

I’d agree that it sounds really elegant to reframe the codegen problem as a “lint failure” followed by a corresponding “code fix”. But however it’s framed, if a “dialect” is to be avoided, then it sounds like the benefit (compression of boilerplate into its essence) is what’s being avoided.

TLDR: I guess I’d argue that “dialect” is the goal, so if that’s incompatible with Scala’s principles then there’s no sense in trying to find some path toward code generation.

1 Like

@nafg I am arguing against the very idea of allowing the creation of language dialects like @data classes. Whether you do it via macro expansion or via a super-powerful meta-type system is secondary. If we allow that, by whatever means, we will get dozens or hundreds of uncoordinated de facto language extensions. That’s the Lisp dream and the Lisp curse.

As an aside, I’m really happy to see all the work that’s gone into cleaning up metaprogramming and making it first-class. But around the edges, I think there’s too much over-correction around things that maybe someone once said confused them about Scala.

This is only my own opinion, for whatever it’s worth – but I don’t think that “dialects” from annotation macros were ever a deal-breaking confusion point for anyone in Scala 2.x. If anyone had a problem with “dialects”, they were of the variety that is still completely possible without any sort of metaprogramming.

Edit: want to make sure to note that my “air quotes” aren’t meant to be sarcastic, and I hope I’m not coming off as disrespectful. I really do appreciate all the work that’s gone into Scala 3, and I totally understand the desire to keep as many unbroken seals on it as possible.

2 Likes

I believe most annotation-driven frameworks would be supported as macro annotations in Scala. Lombok is different since it directly hacks into the javac compiler, changing its AST. So that’s more like an unauthorized dialect of Java. At least it’s not called Java and there is only one of it. Now imagine having dozens of mini Lomboks all masquerading as innocuous annotations, possibly clashing with each other.

It doesn’t sound great, but I also wouldn’t blame Java for it :man_shrugging:

Nor would I argue that Java should have made it harder for those things to exist. If I’m using them – on purpose – and they don’t work very well, my options are to stop using them or fix them. In this rhetorical situation, I can’t imagine saying “Well, nuts to Java!” and switching to Kotlin or something, just because I Icarus’ed myself with all the Lomboks.

I feel like the kind of fragmentation I’ve seen people complain about as unique to Scala is its flexibility, such as () vs. {} and the many ways to write or call a function (which would now be compounded by things like {} vs. indentation and implicit vs. given/using), not that Monocle @Lenses is too magical and weird. Does having @Lenses fragment the language?

I mean I guess I’d rather it be built into the language, but that isn’t the point.

But I don’t have that much exposure, only things like Reddit rants. Can you expand on your concern, Martin?

4 Likes

Either the editor integration is “part of Scala”, or it’s not.

  1. If the editor integration is not “part of Scala”, we do not need to worry about integrating with dozens of different editors, but all we have is a fancy linter. Linters are great, but it’s not a viable replacement for macro annotations at all. We already have plenty of linters, and do not need another one baked into the language. People can write their own linters, and have.

  2. If the editor integration is “part of Scala”, then it is a viable replacement for macro annotations, but we are now committed to properly integrating this in the dozens to hundreds of different places people can write Scala.

We can’t have it both ways. Saying “It’ll be super simple! But don’t worry about the inconvenience, because we’ll have this editor integration! But the editor integration is just a nice to have and not required, so it’s still simple!” does not address the real tradeoffs and challenges involved.

Nobody in the programming community does things this way, for good reason. The closest thing you’ll find in other communities is the boilerplate generators in Java IDEs or Rails, which aren’t exactly shining examples. These are exactly the kinds of things that people fled other languages and came to Scala to avoid!

To go back to a more fundamental question, IMO Scala definitely has a verbosity problem around definitions.

Scala has tons of ways to abstract over expressions: functions, higher-order functions, by-name parameters, inline, macros, etc. We hardly ever see people resorting to code generation for dealing with expressions, the way people sometimes do in Java; Scala’s built-in tooling is sufficient. Perhaps the only exception is for performance reasons, due to the fact that Scala’s abstractions are not zero-overhead.

But there is basically only one way to abstract over definitions: you can extend a trait. That’s great for some use cases, but not sufficient for others. When you see codegen in Scala, it’s basically always for generating definitions. The Dotty repo itself uses codegen; many of my own projects use codegen. “Extend a trait” is for definitions more or less what “higher-order function” is for expressions: both work great when the things you are abstracting over follow predictable patterns, but fail when you need something more flexible. For expressions, you can reach for macros, but for definitions you’re out of luck.
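
A small sketch of that asymmetry (the trait and field names are purely illustrative): extending a trait works when the shape of the definitions is fixed in advance, but nothing in the language lets you say “a withX method for every field”, which is exactly where people fall back to codegen.

// Abstracting over definitions by extending a trait, when the shape is fixed:
trait HasId[A]:
  def id: Long
  def withId(newId: Long): A

// But there is no trait that can express "a withX method for every field",
// so this boilerplate is written by hand or generated externally today:
case class User(id: Long, name: String, email: String) extends HasId[User]:
  def withId(newId: Long): User = copy(id = newId)
  def withName(name: String): User = copy(name = name)
  def withEmail(email: String): User = copy(email = email)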

This is never fatal - you can always fall back to codegen. But the point of adding these language features is so we can do more “in language”, in a standardized way, without falling back to splicing strings in a hundred-and-one ad-hoc ways. The proposal above is unable to even replace codegen in my own projects or those of the folks I collaborate with: nobody likes templating source code, but at least build-time code-gen is deterministic and automated in a way that IDE-level point-and-click code-gen is not.

I don’t know what the answer is, but I’m very sure this proposal is not it. Nobody is coming to Scala because they love managing heaps of boilerplate with IDE-auto-generated getters and setters. It goes against the entire ethos of the Scala community

11 Likes

Thank you for making this so clear! Previously I don’t think I was really conceiving of the feature quite the right way.

However, I think that the issue of language fragmentation is there regardless. In some ways it’s even worse, because it’s language-and-toolset fragmentation. Metals/VSCode Scala looks like this, full of @data annotations that have conveniently been filled out; but people with different tools use non-@data-heavy Scala because without the convenience, there’s no point.

Sometimes you want things to be inconvenient on purpose to encourage a uniformity of design. Yes, you can do the weird thing, but you have to pay the inconvenience-tax.

What you point out is that there’s really no clear dividing line between helpful linting and toolchain fixes, and unbridled tool-driven codegen that pushes everyone effectively into their own silos, even though technically anyone can edit anything (but nobody would, without the right tools).

So I think a serious conversation still needs to be had about where to try to draw the line. “We encourage you to write unrestricted macro annotations, but this is how you do it: frame it as a lint, then write a code generator for your favorite tool that fixes the code when it (obviously always) fails the lint when you write it by hand” is not avoiding unrestricted macro annotations in any meaningful sense, is it?

I would instead think that an avenue to explore that might have more principled boundaries intrinsic to it is whether additional transparency (cf. transparent inline) could cover enough of the remaining important use cases that we could forget about the rest. Basically, the idea would be to have a structured but more flexible way to indicate the content (maybe allowing abstraction over parameter list lengths or some such) without enabling so much abstraction that one can’t even tell what code means any longer due to the proliferation of dialects.

2 Likes

I don’t think I’ve said this previously, but I hate this approach (macros built along the lines of @tailrec) with a burning passion. I find the approach really awkward, and I’m pretty sure I would never, ever use it. I mean, @tailrec confuses the bloody hell out of new Scala programmers – doubling down on that is not a good plan.

(ETA: sharpening the reason why I would never use this – it encourages boilerplate in the code. In the world of enterprises that require double code reviews on every PR, where you usually have to struggle to get reviewers, this is horrifying. It literally makes my life harder than writing things by hand, since it blurs the distinction between “this is real code that you need to read carefully” and “this is a generated file that you can basically ignore in PR review”. If I’m understanding it correctly, it’s an absolute anti-feature from my POV, and I would probably fight to forbid its use in our repos.)

I agree with the folks above – specifically, I find the arguments against classic in-the-compiler-pipeline macros to be really overblown. Yes, there will be some problems. But we have stuff like this all over the place today, and the convenience massively outweighs the occasional downsides.

I really think the team is being excessively clever here, and fighting against what I believe most of the in-the-trenches Scala engineers want. IMO, that’s a mistake.

5 Likes

Over a program’s lifecycle, code is read many more times than it is written. Generated boilerplate code that is visible adds cognitive load to every act of reading, making working with the code much harder.

6 Likes

I had a need for a macro annotation that adds (private) implicit arguments. IIUC, this is not a language dialect, but it’s also something that must occur before typer.
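
For concreteness, a hypothetical sketch of what such an annotation would have to do (the @withImplicitCtx name and the Ctx type are made up; the point is that the extra parameter list has to exist before typer, because call sites are typechecked against it):

// Hypothetical sketch; @withImplicitCtx and Ctx are made-up names.
class Ctx

// What the user would write:
//   @withImplicitCtx
//   def handler(x: Int): Int = x + 1
//
// What the annotation would have to produce before typer, because
// callers' implicit resolution depends on the added parameter list:
def handler(x: Int)(using ctx: Ctx): Int = x + 1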