Pre-typer syntactic plugins in Scala 3?

I’d certainly be happy to see improvements in that area. If you download the ENSIME source code from https://ensime.github.io/ and look through the plugin.scala in the scala-3 folder you’ll see some other hacks I had to add in there to deal with the fact that Settings can no longer be “unparsed”. It would be good to recover that Scala 2 feature, as it is a really good mechanism for extracting the compiler parameters for use by any tooling that then invokes the compiler out of band (e.g. in the IDE use case). You should be able to see in that short file exactly how it could be cleaned up to simply .foreach over the list of source files, if one were available, instead of being called for each compilation unit at a later compiler phase.

I just wanted to see if I understood the state of things. First, I’ll name three kinds of plugins:

  1. Read-only: plugins that read code but never change or add it. Something like Java’s FindBugs falls into this category.
  2. Append-only: plugins that generate new code, but never alter existing code.
  3. Read-write: plugins that add new code and change or remove existing code.

At present, full read-write (non-research) plugins are permitted, but only after the typer. This means that you can do some pretty terrible things, like take every pair of single-arg methods in a class and create a new method, named by concatenating the two method names, that composes them. Other code in the same module cannot depend on those synthesized methods, but downstream modules or other projects could. You could even just swap + and - everywhere (in cases where they have the same type signature, of course). So in some limited (but still terrible) sense, these plugins can “create a dialect.”
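To make the example concrete, here is plain hand-written Scala showing the *effect* such a post-typer plugin could have; the class and method names are invented purely for illustration:

```scala
// Suppose a class defines two single-argument methods:
class Nums:
  def inc(x: Int): Int = x + 1
  def double(x: Int): Int = x * 2
  // A read-write plugin could synthesize, after typer, a new method
  // whose name concatenates the pair and whose body composes them:
  def incdouble(x: Int): Int = double(inc(x))

// Code in the *same* module cannot depend on `incdouble` (it does not
// exist at typer time), but downstream modules would resolve it normally.
val composed = Nums().incdouble(3) // double(inc(3)) == 8
```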

The compiler team says that no plugins should run before the typer. I think there is a pretty clear use case for append-only code generation before the typer, with a replacement for macro annotations (in particular, @deriving) probably being the canonical example, as the OP said. @rssh suggested that read-only plugins should be able to run before the typer, pointing out that such plugins can’t create a language dialect. IIUC, this level of power would also be sufficient for ENSIME. @som-snytt also suggested limiting the power of macros.

Is the position of the compiler team that no plugin of any kind, even a read-only one, should ever run before the typer? If it were technically possible, would append-only plugins be palatable enough that the compiler team would allow them? And is there any reason that read-only pre-typer plugins shouldn’t be possible?

The compiler team has no official position on the matter. One can discuss things in the dotty repo in a feature request or issue. But you’ll have to get someone excited about it who will actually push for the changes.

My personal opinion is that read-only plugins are much less of a problem than plugins that modify or augment the tree. But they are also very limited. Maybe a more flexible alternative would be to open up the parser as a separate tool. That would be beneficial on its own: a parser that can be customized in the kinds of trees it generates, maybe coupled with a formatter.

1 Like

Are there any updates on this? Scala already has many features for customizing the language, macros and compiler plugins being some powerful examples. I have never experienced the old problems myself and would welcome a way to create language dialects. Everyone can decide on their own how much complexity they want to add to the language. Currently anything like this is implemented with preprocessors that create a copy of the files, convert the dialect to actual Scala code, and run the compiler afterwards. Making this an official option would allow for better tooling, automatic linting, error generation, etc. This just feels more like Scala: giving you all the possibilities, but also the responsibility. I mean, we also have scala-xml; why should anything like this not be permitted? It would allow adding support for, say, a Scala JSON or Scala TOML syntax, or other languages. To me this sounds like it would improve developer experience in some cases by a lot.

I also want this

I would encourage folks who want an even more expressive metaprogramming story to look at Pre-SIP: Export Macros.

Export Macros can do quite a lot of what I think folks have been asking for. And better, the feature has been implemented in my branch, so if you want to test it out you can. It’s obviously not production grade, but it is enough to get a sense of the potential, I think.

2 Likes

I’m against dialects, but the “code generation by string concatenation” joke has to die finally!

It’s laughable that Scala, one of the most powerful and advanced programming languages under the sun, has literally nothing to offer in that regard, so you’re back to assembling raw strings the way it was done with m4 macros almost 50 years ago.

I can’t believe the thing I’m editing is just a string.

I also don’t get it.

Why are we converting an AST back and forth to some limited string representation constantly?

This should happen only once on input (as we need to type in the code somehow), and from there the editor should work with the rich data structure that an AST is (likely even augmented by some meta-data).

Problem is of course that we don’t have proper AST editors yet.

We have some kind of hybrid, with a “stringly frontend” powered by a backend engine that works with the actual AST. But we don’t even cache the AST in memory, afaik. We constantly convert back and forth from strings. That indeed seems ridiculous.

I’m of course not asking to change that right now. This would need some greater move across the whole field of software development.

But what I’m asking for is some sane and safe method to generate code in Scala.

Having to resort to raw strings for code gen, without any safety, not even against typos, in one of the most powerful languages out there just seems wrong. And yes, people want code gen! But the result should still be maintainable.

I guess there is even some space for research and advancement of the state of the art in this topic. Scala should see this as a chance to become better than other languages on one more axis.

But frankly, at the moment we’re at the technical level of the m4 macro processor, in my opinion.

OK, I’m exaggerating of course: Scala already has very advanced and powerful meta-programming facilities. It’s just that those don’t allow unrestricted code generation. You can’t even abstract over the creation of similarly shaped data structures which differ only in some concrete names. In that case you have to write a string template, iterate through some data to fill in the gaps, and write the code generated this way out to disk. That’s the most awful and error-prone approach possible! Even the C pre-processor had more safety guards built in…
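For the record, the string-template workaround being complained about might look something like this (all names here are hypothetical); note that nothing guards against a typo in a type or identifier name until the generated file is compiled much later:

```scala
// The "awful" workaround: stamping out similarly shaped data structures
// by filling names into a raw string template. A typo in `tpe` surfaces
// only when the written-out file is eventually compiled.
case class Field(name: String, tpe: String)

def renderCaseClass(name: String, fields: List[Field]): String =
  val params = fields.map(f => s"${f.name}: ${f.tpe}").mkString(", ")
  s"final case class $name($params)"

// In a real build these strings would be written to disk and fed back
// into the compiler as separate source files.
val generated = List(
  "UserId"  -> List(Field("value", "Long")),
  "OrderId" -> List(Field("value", "Long"))
).map((n, fs) => renderCaseClass(n, fs))
```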

And it’s not like this is some exotic feature nobody needs. Here are just some random examples of the kinds of workarounds currently employed because of this limitation of the meta-programming features:

1 Like

It’s not a string, it’s a rope. And you’re holding far too much of it, give me some back!

Do you mean like, enough to toss over a rafter?

There is other text for that purpose.

I’m against dialects, but the “code generation by string concatenation” joke has to die finally!

Speaking of a well-executed joke, just look at what dialects did to Switzerland.

Making bureaucracy for the government clerks more complicated to handle?

I thought the central advantage of Scala was to create language dialects or domain-specific languages, external or embedded. Its ability to capture the intent of code but defer its execution has, I think, contributed to the success of Scala 2. Setting aside the purely functional camp like Scalaz/Cats (and we can bundle sbt into it too), effectively all user-land success stories are some form of dialect that enabled something that was previously not possible, or difficult, on the JVM.

  • Twitter’s Future was probably among the first major commercial success stories of Scala 2.x (see https://youtu.be/Jfd7c1Bfl10?t=495 for details). Long before SIP-14 added a watered-down version of Future to scala-library, Twitter implemented Future with local scheduling, root compression, and cancellation.
  • Akka code looks nothing like normal Scala code, but it encoded message-passing actors that can automatically be distributed across machines with safety mechanisms like isolation and backpressure.
  • Spark powers many of the major enterprises for distributed computation, implemented via DataFrame, which literally takes a lambda expression, bundles it up, and ships it across different worker machines.
  • Morgan Stanley’s code base (see https://www.youtube.com/watch?v=BW8S92jP5sE&t=984s for details) also seems to be powered by a dialect of Scala.

    We’ve created this construct, the Node, this is an annotation that extends the Scala language, and we’ve implemented it ourselves using a compiler plugin.

So I wouldn’t say that Scala 2.x succeeded despite the dialects, but because of them. This is not to say that Scala 2.x was without warts, bad rep, and complexities. Sometimes the developer experience of using some of the above is downright confusing and horrible when things fail, because:

  1. parallel computation and concurrency are confusing, especially when you hide them from the users.
  2. they rely on hacks that interfere with other language features in unobvious ways.

When we notice these sharp edges, I think the better thing for Scala to do, rather than shutting down dialects, would be to adopt them into the language, like Pickling and Spores for shipping lambdas, and to make it a friendlier language to implement dialects in. To put it another way: if Spark were shopping around for a host language today, we should make sure they would still choose Scala 3.

5 Likes

I should have been more clear. When I said dialects, I meant things that are not expressible in normal Scala, but could be expressible by changing the parser, or having a pre-typer syntactic plugin, or doing stuff with macro annotations advanced enough to confuse tooling.

The things you mention, Twitter Futures, Akka, Spark, are not dialects in this sense. They demonstrate the great syntactic flexibility that Scala has already. It’s precisely for this reason that I think we don’t want to go beyond what Scala already offers.

2 Likes

I agree, I wouldn’t call such things “dialects” either.

As long as the resulting code is “vanilla Scala” it’s not a dialect.

But some more flexible form of code generation would still be more than nice!

Concatenating strings as seen above is also not tooling friendly, and you lose all the things the language normally provides. So some plan for a more powerful but still safe way of generating code needs to be made, imho.

1 Like

So this would exclude things like kind projector?

1 Like

Btw, is Ammonite’s import $ivy. now considered a “dialect” which needs to get burned? “Vanilla Scala” won’t compile that…

https://ammonite.io/#import$ivy

I hope all the given examples now show clearly that this “no language extension” policy is complete nonsense. Scala lives by its various extensions! (Which most of the time aren’t proper dialects anyway.) If you take this away you have at best Kotlin. Why would anybody use Scala then?

Edit: I just realized that Kotlin actually has quite some dialects. They use compiler extensions extensively, and this is considered “a good thing”. So once more: the issue is again just marketing!

Yes, tooling is an extremely important part of the picture; nobody claimed otherwise. But it should go hand in hand with powerful abilities for language extension on all kinds of axes. So the key here would be to think about language extension mechanisms that are tooling friendly, and actually properly integrated into tooling before prime time.

A good and stable officially supported pre-typer plugin API seems to be the right thing. (The alternative is of course just to hack something. Nothing stops one from manipulating the part of the compiler that currently disallows pre-typer plugins. The JVM is a dynamic runtime; you can do all kinds of hacks, up to runtime bytecode rewriting. Should there be no official way, people will just find workarounds, because people don’t like arbitrary limitations, especially ones that exist only for ideological reasons without any true technical necessity.)

In the end people want power, not limitations. @lihaoyi is just right about that. But limiting power where needed is of course also a valid requirement. I understand this, and that’s why I’ve proposed a kind of relief to this situation in the other thread.

The real problem is arbitrary extensions that aren’t properly integrated, so people need to fall back on some kind of “hack”. Scala should always look to pick those up when the time is ripe. Kind projector is a great example of how this can work out nicely! It never would have happened if things had been outright limited from the get-go.

1 Like

FWIW I would love if this kind of thing were available in mainstream Scala code… When I’m bootstrapping an experiment, writing a build.sbt can be a major distraction…

1 Like

Scala now has “magic comments” for that, which scala-cli will recognize.

I would suggest to try it out. It’s especially useful for some quick experiments.
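A minimal sketch of what that looks like (the version numbers and dependency below are just placeholders, taken from scala-cli’s `//> using` directive syntax):

```scala
//> using scala 3.3.1
//> using dep com.lihaoyi::os-lib:0.9.2

// Run with `scala-cli run hello.scala`; scala-cli reads the directives
// above and fetches the dependency before compiling, so no build.sbt
// is needed for a quick experiment.
@main def hello(): Unit =
  println(os.pwd)
```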

Enjoy! :smiley:

3 Likes

@MateuszKowalewski
Unrestricted generation of source code in a better way than string concatenation would certainly be very good, but is the compiler the right place to do it? Compilers consume source code; they don’t produce it. Producing source code is more of a job for build tools with plugins.

Scala 3 has already an AST-based representation on disk: TASTy GitHub - scalacenter/tasty-query . Maybe instead of tasty-reader we can have a tasty-writer too? That would give us typesafe API to construct AST trees. That AST could then be decompiled using some tool to ugly *.scala file, then reformatted using scalafmt and finally we would have AST and nicely formatted *.scala file. I’m not sure if it makes sense, though. The downside is that compilation whould have to happen in phases, i.e. first you compile code generator in some module, then you run that generator to produce code for other module, then you compile the other module (that’s why you need support from build tools). Worse than Scala 3 macros which AFAIU allows you to run everything in one module (I’m talking about modules in sbt or Maven sense), but Scala 3 macros don’t let you generate code in unrestricted way. So there’s always a tradeoff. On the upside, the pretty printed generated code should be much easier to debug than some crazy macros giving crazy error messages.