This is my first post in this forum so I do apologize in advance if my question is not relevant, too trivial or has already been addressed.
I would like to develop a tool that consumes Scala syntax trees (as the title suggests). More specifically, it would be great if I could consume the trees after each compilation phase.
My research led me to Dotty’s TASTY interchange format, which seemed to be what I was looking for (i.e., a serialization format for the AST). Perhaps it is. However, I was not able to find adequate documentation online to figure out how to extract and consume it.
I also looked at the dotc compiler flags and couldn’t figure out an obvious approach. I noticed the option “-print-tasty”, but I couldn’t verify the expected output. Or perhaps I am missing something?
Of course, one can always print the ASTs after each phase using the Scala printer (i.e., -Yshow-trees etc.). Is this my only option? If it is, then fine.
Ideally, I would consume the ASTs in a more structured format (like JSON).
My strategy at the moment is to extend/modify/hack the AST printing classes of the Scala compiler to print out a more manageable version of the ASTs, but before I do that I would like to make sure that there is no alternative.
I would really appreciate your help/feedback!
Thank you in advance.
Hey @than21, what do you really want to do with the ASTs? Why do you need to consume them after each compilation phase?
Have you looked into Scalameta or writing your own compiler plugin?
Thank you @jvican for your response. These two links look very interesting! I will definitely have a look.
In the meantime, to answer your first question: I basically want to write a new back-end (i.e., a code generator).
In order to do that, it would be great if I could have a reliable and structured input.
Why consume it after each phase? This is not really a requirement, but at this point I would like to experiment with different alternatives (i.e., plug my new back-end in after different phases etc.).
Does that make sense?
Oh, if you want that, I’d strongly recommend doing it in a compiler plugin, as Scala.js and Scala Native do. Have a look at the example I linked before. It does everything you want to do, but you’ll need to learn a little bit about the compiler internals.
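For readers who want a feel for the plugin route before digging into a real one, here is a minimal sketch. It is not the linked example. It assumes Scala 2 with scala-compiler on the classpath, and the names TreeDumpPlugin and "treedump" are made up for illustration. Only the Plugin / PluginComponent / StdPhase API comes from the compiler.

```scala
import scala.tools.nsc.{Global, Phase}
import scala.tools.nsc.plugins.{Plugin, PluginComponent}

// Hypothetical plugin: the class name, plugin name and phase name are illustrative only.
class TreeDumpPlugin(val global: Global) extends Plugin {
  val name: String = "treedump"
  val description: String = "prints the tree of every compilation unit after a chosen phase"
  val components: List[PluginComponent] = List(Component)

  private object Component extends PluginComponent {
    val global: TreeDumpPlugin.this.global.type = TreeDumpPlugin.this.global
    import global._

    // Run right after `typer`; change this to experiment with later phases.
    val runsAfter: List[String] = List("typer")
    val phaseName: String = "treedump"

    def newPhase(prev: Phase): Phase = new StdPhase(prev) {
      // Called once per compilation unit; unit.body is the tree at this point in the pipeline.
      def apply(unit: CompilationUnit): Unit =
        println(s"${unit.source.path}:\n${showRaw(unit.body)}")
    }
  }
}
```

To try something like this, the class would be packaged in a jar together with a scalac-plugin.xml descriptor naming the plugin and its class, and passed to the compiler with -Xplugin. A custom back-end would replace the println with its own code generation.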
Thank you @jvican. Indeed, that seems to do what I want, but I find it a bit hard to understand what it does.
Anyway, I will definitely give it a shot.
In the meantime, if there is anything else that you think might help me (e.g., a talk or a paper), please let me know.
Again, thanks a lot for your help!
Just for completeness: it is possible to use the Scala compiler as a library. The compiler in its standard setup is a class called Global. AFAIK, you can control the stages and get Trees after each stage. There is also the reflection compiler, which uses the application classloader rather than a provided classpath, and the presentation compiler, which only goes through the early stages, but continuously.
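A rough sketch of the library approach, assuming Scala 2 with scala-compiler on the classpath. The object and method names (TreesAfterPhase, rawTreesAfter) and the sample source are invented for illustration. The idea is to stop the pipeline after a chosen phase and read each unit's tree.

```scala
import scala.reflect.internal.util.BatchSourceFile
import scala.tools.nsc.{Global, Settings}

// Hypothetical helper, not part of the compiler API.
object TreesAfterPhase {
  // Compiles `code` up to and including `phase` and returns one raw tree dump per unit.
  def rawTreesAfter(phase: String, code: String): List[String] = {
    val settings = new Settings()
    settings.usejavacp.value = true        // assumption: the JVM classpath contains the Scala library
    settings.stopAfter.value = List(phase) // halt the pipeline after this phase

    val global = new Global(settings)      // reports errors to the console by default
    val run = new global.Run
    run.compileSources(List(new BatchSourceFile("Demo.scala", code)))

    // Each unit's body is the tree as the last executed phase left it.
    run.units.map(unit => global.showRaw(unit.body)).toList
  }

  def main(args: Array[String]): Unit =
    rawTreesAfter("typer", "object Demo { def twice(x: Int) = x * 2 }").foreach(println)
}
```

Calling rawTreesAfter with different phase names ("typer", "erasure", "mixin", ...) is one way to compare what the trees look like at each stage. Note that usejavacp may not pick up the right classpath under every build tool, so that setting may need adjusting.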
I think there are several talks about “compiler hacking”, but I cannot link to them right now. I encourage you to search for them on YouTube. As compiler plugins do not have much documentation, I’m afraid you’ll be a little bit on your own here… but I’m sure you’ll learn a lot along the way. I’ve heard that @retronym’s Scalac survival guide is quite good, though I’ve never had a chance to read it: https://github.com/retronym/scalac-survival-guide.
If you have doubts, feel free to ask a question in the scala/contributors Gitter channel.
I have written such a plugin. Unfortunately it’s closed-source. I intend to prepare an open-source toy version of it as a learning resource for persons such as yourself, but I’m not sure when I’ll get to it, so you shouldn’t wait around.
I can offer this advice, though: looking at what Scala.js does is extremely useful. I consulted https://github.com/scala-js/scala-js/blob/master/compiler/src/main/scala/org/scalajs/core/compiler/GenJSCode.scala many times when writing my plugin.
There isn’t any documentation on what Scala syntax trees look like late in the compilation pipeline. Basically you pattern match, see what tree shapes you get, add cases to cover them, rinse, repeat. Once you’re able to compile all the hand-written test cases you can think of, you’ll want some scaffold that runs some actual codebases through it and reports unmatched trees.
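The "match, see what falls through, add cases" loop could be sketched roughly like this. It is only an illustration: it uses scala-reflect's runtime universe and a quasiquote so that it runs stand-alone, whereas a real plugin would do the same over global's trees late in the pipeline. The object name and the choice of "handled" shapes are made up.

```scala
import scala.collection.mutable
import scala.reflect.runtime.universe._

// Hypothetical scaffold: counts tree node kinds that the imaginary back-end has no case for yet.
object UnmatchedTrees {
  val sample: Tree = q"def pick(x: Int) = if (x > 0) x else -x"

  def unmatched(tree: Tree): Map[String, Int] = {
    val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    new Traverser {
      override def traverse(t: Tree): Unit = {
        t match {
          // Shapes the imaginary back-end already handles.
          case _: DefDef | _: ValDef | _: Apply | _: Select | _: Ident | _: Literal => ()
          // Everything else is recorded so that a case can be added later.
          case other => counts(other.productPrefix) += 1
        }
        super.traverse(t) // keep descending into the children
      }
    }.traverse(tree)
    counts.toMap
  }

  def main(args: Array[String]): Unit =
    unmatched(sample).foreach { case (shape, n) => println(s"$shape: $n") }
}
```

Running a scaffold like this over real codebases gives a to-do list of tree shapes still missing from the back-end.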
scalac -Xprint:mixin is extremely useful for seeing what transformed code looks like. If you’re not yet sure what phase you want to operate in, try scalac -Xprint:all. Note that -Xprint prints as pseudo-Java, not as nodes; if you want to see the syntax tree more directly, try showRaw (very compact output) and nodePrinters.nodeToString (much more verbose).
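To illustrate the difference between source-like printing and node printing, here is a small stand-alone sketch. It assumes scala-reflect on the classpath and uses the runtime universe with a quasiquote, so the tree is untyped and not what a late compiler phase would produce. nodePrinters.nodeToString lives on the compiler's Global, so it is not shown here.

```scala
import scala.reflect.runtime.universe._

// Hypothetical demo object; the sample definition is arbitrary.
object ShowRawDemo {
  val tree: Tree = q"def twice(x: Int) = x * 2"

  def main(args: Array[String]): Unit = {
    println(show(tree))    // source-like rendering, comparable in spirit to -Xprint
    println(showRaw(tree)) // the node structure: DefDef(..., Apply(Select(...), ...))
  }
}
```

Inside a plugin or when driving Global directly, the same showRaw call is available on the compiler instance (global.showRaw).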
Hello, my only question is this: to be able to extract the AST from code, don’t we need the language to have one very special property, namely homoiconicity?
Lisp and its dialects, such as Clojure, have that property: code can be treated as data, and the code itself forms an AST. But Scala is a strictly typed language, so I am not sure how an external tool could consume its syntax trees.
I may be wrong, but I am really interested to know how it would work.
Homoiconicity has nothing to do with the OP’s use case. They want to write a custom compiler back-end, which means they basically live inside the compiler, or at least at the same level of abstraction and data representation as the compiler. That means the Scala code they will be manipulating will only ever be in AST form.
Oh I see! Thanks for the insight.
It’s been a while since I posted this question, and I would like to confirm (mainly for future reference) that I was successful in consuming Scala syntax trees through a custom plugin. I found the instructions here:
to be extremely helpful, as well as the Scalac survival guide.
Thank you all for your advice!
Thanks a lot @SethTisue! Your comment was very helpful, as it clearly confirmed my initial intuition.