A scala-lang blog post outlines a proposal for a new macro system that tightly integrates macros and Tasty. Feedback on this would be very welcome.
I see Tasty being called out as a core technology along with use cases for separate compilation. One thing we want for scalable separate compilation is to compile against only the interface of the code rather than all the code. Build tools like bazel, pants and buck do this currently. To do this today, there is a tool that removes all the code attributes from the jar, which, as it happens, scalac will still accept. If you disable code inlining across jars, everything works fine.
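To make that concrete, here is a minimal sketch of what such a stripping tool boils down to, written against ASM’s `SKIP_CODE` flag (this is not the actual tool, and a real one has more to worry about, e.g. the cross-jar inlining mentioned above):

```scala
import java.io.{FileInputStream, FileOutputStream}
import java.util.jar.{JarEntry, JarInputStream, JarOutputStream}
import org.objectweb.asm.{ClassReader, ClassWriter}

object ApiJarStripper {
  // Copy a jar, dropping the Code attribute from every classfile so that only
  // the API (signatures, annotations, pickles) is left for scalac to read.
  def stripCode(inJar: String, outJar: String): Unit = {
    val in  = new JarInputStream(new FileInputStream(inJar))
    val out = new JarOutputStream(new FileOutputStream(outJar))
    try {
      var entry: JarEntry = in.getNextJarEntry
      while (entry != null) {
        val bytes = in.readAllBytes() // bytes of the current entry
        out.putNextEntry(new JarEntry(entry.getName))
        if (entry.getName.endsWith(".class")) {
          val writer = new ClassWriter(0)
          // SKIP_CODE omits method bodies; everything else is copied through.
          new ClassReader(bytes).accept(writer, ClassReader.SKIP_CODE)
          out.write(writer.toByteArray)
        } else {
          out.write(bytes) // resources etc. are copied unchanged
        }
        out.closeEntry()
        entry = in.getNextJarEntry
      }
    } finally { in.close(); out.close() }
  }
}
```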
It would be a real advance if this were a supported feature: the ability to emit an API-only Tasty jar that dotc could compile against, with the contract that the full code would be present at runtime.
This is related to macros, because clearly we cannot omit macro code at compile time, so ideally dotc would error with a nice message (what jar/class/method was required at compile time).
I know there are people who feel that the needs of large-scale build tools should not be considered, but I hope that is not the case: using Scala at companies with large monorepos is an important use case and valuable for the language. It seems quite reasonable to me that dotc should make this easy, rather than every monorepo build tool having to hack around it and often giving substandard support because there is no supported path in the tooling.
This should be doable, but it’d be good if we understood your use case better. What’s wrong with having the body of methods in Tasty if the compiler never looks at them when compiling against a library?
Because the build tool needs to understand that even though the body has changed, it is safe not to reinvoke the compiler.
The goal is for these generic build tools (make, bazel, pants, buck, …) to be able to tell, for example by hashing the binary inputs, whether the compiler needs to be re-run. If we could compile by passing only the API, and not the code, then when only the code changes and the API does not, the build tool can easily see that no input has changed and skip running the entire target.
Does that make sense? The ability to compile against only the API lets the build tool be decoupled from the compiler in a way that avoids invoking the compiler at all when none of the dependency APIs change and the code for the target itself does not change.
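For concreteness, the decision the build tool makes is roughly the following (a sketch in Scala with made-up names; none of this is actual bazel or pants code):

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object RebuildCheck {
  // Digest over the declared inputs: the target's own sources plus the
  // API-only jars of its dependencies (not their full implementation jars).
  def inputsDigest(sources: Seq[Path], apiJars: Seq[Path]): String = {
    val md = MessageDigest.getInstance("SHA-256")
    for (f <- (sources ++ apiJars).sortBy(_.toString)) {
      md.update(f.toString.getBytes("UTF-8"))
      md.update(Files.readAllBytes(f))
    }
    md.digest().map("%02x".format(_)).mkString
  }

  // If a dependency only changed method bodies, its API-only jar stays the
  // same, so the digest is unchanged and the compiler is never invoked.
  def needsRecompile(current: String, previous: Option[String]): Boolean =
    !previous.contains(current)
}
```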
First of all, thank you for this blog post. I can’t understand all of it, but from what I can, it seems the proposed solutions cover blackbox macro use quite well and have great potential. I’m very glad about the possible return of annotation macros, in an even better form than before.
As for whitebox macros, I’m still uneasy. There are many versatile use cases for them today. I’ll need to delve into the dotty typelevel programming proposal to see if my use cases are covered, but how do we solve the problem for the community in general? Should the dotty team be proactive and contact library authors to try and solve their issues together? Should library authors open whitebox issues on the dotty repo? (And all that is without knowing what the hell goes on in private repos that use macros.)
Again, thanks. There is much work ahead, but I’m a bit more optimistic than I was before this post.
Thanks, that’s helpful. So wouldn’t it be simpler to cut out the middleman and have the compiler emit a .apihash file next to each .class file, containing something like a 64-bit hash of the API?
At the least, a hand-wavy description of how some of the commonly-used whitebox use cases would be addressed in the new environment would probably go a long way towards alleviating the general angst about losing the old system. I think most folks have accepted that macro libraries will need rewriting – everyone just wants some confidence that the key functionality will still be possible.
(In principle, I very much like the new design – it’s much more elegant than the old system. But I can’t say I grok yet how it will apply to some of the more complex use cases…)
The hash is not enough for bazel, and probably won’t work for make (and maybe pants, I don’t know). Bazel insists on reading the inputs to see if it should rebuild. If you say the .apihash is the input, then that is the only thing you can compile with.
You don’t have a way to say “if this hash changes, then pass me this jar”; you can only declare your real inputs (buck does allow you to customize the key of the build cache).
Bazel devs at Google are reluctant to change this because it is a non-trivial change to the model, and they are nervous about buggy implementations of this apihash. Since bazel does aggressive distributed caching, correctness is a priority so as not to corrupt the cache.
I am still not quite sure how Tasty fits into this picture. If there is a 1-to-1 correspondence between Tasty constructs and the interface that programmers will actually use to consume and produce it (shown in definitions.scala), then what is the point of imposing Tasty in the first place? Does it not become all but an implementation detail?
It seems to me that most of the trickiness will come from those “semantic operations” added on top of the standard data structures. I imagine we can’t let each compiler implement them using its own infrastructure, as this would result in the very problems of compiler-dependent semantics that we already have. So in effect, if Scalac/nsc wants to support this new macro interface, won’t it have to basically embed the whole Dotty/dotc frontend to implement these operations?
One important point: Tasty is fully internally typed, so there should be no support for “untyped trees”, as opposed to the current macro system. This probably entails a quite different (possibly better) way of developing macros.
It is not listed in the blog post, but I assume one of the semantic operations will be to dynamically re-infer the types of reconstructed program fragments, which also seems to be a non-trivial task.
Should the dotty team be proactive and contact library authors to try and solve their issues together? Should library authors open whitebox issues on the dotty repo? (And all that is without knowing what the hell goes on in private repos that use macros.)
Yes to both. We will reach out to library authors to work with them to solve their issues so that the projects can be integrated in the new community build. The plan is, once we have a macro system working, to gradually port projects, hopefully with the help of the authors. We will start with testing frameworks, because these are usually at the bottom of the dependency graph for most projects, and work our way up. Opening issues in the dotty repo is a good way to start the conversation, but it might be best to hold off on that until we have a first version that works.
As far as I understand it, the problematic case is a whitebox macro or annotation macro that produces new definitions or refines the types of existing ones, where these definitions have to be accessed in the same compile. I.e. it is not feasible that all consumers of these abstractions reside in a downstream project.
@olafurpg did a survey of existing macro-uses, and classified them into categories. I can see two possible categories that would match the description above: typeclass derivation and compile-time type providers.
Typeclass derivation will be supported natively in the language, but we have more discussions and experimentation work to do before we can settle on a concrete scheme. Getting use cases and empirical data in this area would be very valuable.
Compile-time type providers that produce Scala types from external sources like schema descriptions will need to be factored out into an upstream project. I.e. say you want to import a database schema S. Project A would contain the macros that read S and produce case classes that mirror S. Then all code that accesses these case classes would have to be in a different project. I believe that’s actually a saner way to go about things than to mix everything in one project. The advantage is that the generated types in project A can be inspected and verified separately - since the Doc tool info is integrated in Tasty you could even provide ScalaDocs for generated code!
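A minimal sbt layout for this split might look like the following (the project and artifact names are made up, just to show the shape):

```scala
// build.sbt (sketch): the schema-derived types live in their own subproject,
// and all code that uses them depends on that subproject.
lazy val schemaTypes = project
  .in(file("schema-types"))
  .settings(
    // hypothetical type-provider library that reads the schema S at compile time
    libraryDependencies += "com.example" %% "schema-provider" % "0.1.0"
  )

lazy val app = project
  .in(file("app"))
  .dependsOn(schemaTypes) // consumes the generated case classes
```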
I understand. The only thing I’m worried about is that we may not have the full picture of how whitebox macros are actually used. I suggest that a “Call for whitebox macro issues” be sent out to library authors, encouraging them to open issues on the dotty repo. If you see that a specific use case is already handled, then that issue can be closed with a reference to the PR/blog post that handles it. I think it is best to properly document all the macro use cases, so we can know ahead of time which Scala 3 solutions cover which macro use case (and see what we’re missing).
TL;DR, I propose:
- Create a flag for whitebox macro support on the dotty bug tracker.
- Encourage library authors to detail their whitebox macro use cases (maybe suggest a template), now.
- Tickets for use cases that are already covered can be closed with a reference to the plan that covers them.
Call for whitebox macro issues
Related: Whitebox def macros
Yes, but I think the dotty tracker is a better way to handle this, and also to provide (very) detailed tickets.
I’m open to a separate bug tracker. That is OK too.
I added the following section to the post to summarize the relationship between ScalaMeta and Scala 3 macros.
Meta Programming in the Large
The future Scala 3 macro design is intended to replace the existing def macros and the scala.reflect infrastructure. But there is another meta programming system that is quite complementary to it: Scalameta provides high-quality syntactic and semantic analysis and code generation tools which are separate from the Scala compiler. As the name implies, Scalameta is run at the meta level, that is, it takes programs as input and produces syntactic or semantic information or rewritten programs as output. A macro system, by contrast, is integrated in the language and expands programs as they are compiled. There are potential synergies between the two projects. To name but two possibilities:
- Scalameta or projects derived from it such as SemanticDB could obtain type information directly from Tasty, which would make them independent from specific compilers.
- IDEs could use Tasty for single projects but refer to SemanticDB for more complicated multi-project and multi-language builds.
This is a fantastic idea, thanks for the great work. I might be a little late here, but the symbols `'` and `~` seem strange to me.
From my understanding, `~` starts a block that is executed at compile time, whereas `'` inserts a symbol into the code there.
So what about changing them to `meta` and `$` (akin to string interpolation)?
E.g.
    inline def concat[Xs <: HList, Ys <: HList](xs: Xs, ys: Ys): Concat[Xs, Ys] =
      meta {
        case Xs =:= HNil                    => ${ys}
        case Xs =:= HCons[type X, type Xs1] => ${Cons(xs.hd, concat(xs.tl, ys))}
      }
Compile-time type providers that produce Scala types from external sources like schema descriptions will need to be factored out into an upstream project. I.e. say you want to import a database schema S. Project A would contain the macros that read S and produce case classes that mirror S. Then all code that accesses these case classes would have to be in a different project.
That seems like a huge pain. That is, I need to better understand what it looks like in real projects, but it is exactly the kind of pain that Java created with its “one class per file” rule, only here it is at the project level.
Please be very, very careful when introducing that kind of requirement. Builds are already hard enough to manage by themselves; it would be a real barrier (especially for newcomers) if any project wanting to demo a json-schema (or whatever) example also needed to explain how multi-project builds work…
Again, it may not be that important; I don’t really understand yet the cases where it will be needed.
If creating a multi-project build is difficult compared to creating and using a proper JSON schema, I think the fault lies with the build system. Conceptually, creating an upstream project is trivial; build tooling can be altered to make it trivial in practice, too. (Existence proof: it is trivial with `cargo`, Rust’s build tool. It’s not that hard with sbt either, though it does feel too much like an exotic and dangerous thing to do.)
However, a lot of trivial boilerplate is still onerous, so your point stands: requiring separate projects can be burdensome if it is frequently required. (E.g. if reasonable projects would require dozens of separate sub-projects.)
I believe the proposed syntax is derived from LISP’s macro system, which used `'` for “do not evaluate”, which in LISP means “keep this as code”. I think `~` was selected because it is not heavily used, and LISP used `,`, which would definitely be bad in Scala.
I think using `meta` would be reasonable, but `$` would be a problem since it is actually a valid “letter” in many places in Scala (and Java). So I would be very nervous about making `$` act as an operator in any context.
I’m a proponent of making anything which is rarely used relatively verbose (like `asInstanceOf`). So I would actually go for the older Scala meta proposal of `meta` to prevent evaluation and `inline` to re-enable it. Any code that uses these operators a lot should probably look obviously unusual anyway.
Olafur certainly did a great job at analyzing the various use-cases (here’s the mentioned blog with the results), though I think it might be worth re-visiting the subject, as the analysis mostly contained open-source library code, which is usually quite different from “everyday” application code.
For example, I would expect to see a lot more usage of libraries such as circe or monocle, and much less of spire and parboiled, not to mention annotations (which I think are a dead end, and people coming from Java often instinctively avoid them, but that’s another topic).
If that would be interesting for the development group, we’d be happy to help run such a more detailed survey on a larger scale.
Adam
This is kind of a naïve question, but I think it’s worth answering: Why can’t Scala 3 allow whitebox macros by rerunning the type checker after applying the macros?
It would require duplicate work, but it would only be needed when whitebox macros are actually used.