The topic of compiler performance comes up time and again.
One idea that’s been going through my mind a lot is the idea of a functional reactive compiler.
What I mean by that is a compiler with a second, interactive server mode that keeps all intermediate results in memory. When the user types a few characters, it updates only the affected parts of the AST (using an incremental parser such as https://github.com/Eliah-Lakhin/papa-carlo or https://github.com/tree-sitter/tree-sitter). If the AST changes in a way that requires the types / method table / … to change as well, those get updated incrementally until the change has propagated through all necessary layers.
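To make the propagation idea concrete, here is a toy sketch (all names are hypothetical, nothing here is the real compiler's API): each layer caches its output, keyed on a fingerprint of its input, and recomputes only when that fingerprint changes, so an edit that leaves some layer's input unchanged stops propagating there.

```scala
// Toy change-propagation sketch: a layer caches its last output, keyed on a
// fingerprint of its input, and recomputes only when the fingerprint changes.
// All names are illustrative; this is not the actual compiler's API.
final class Layer[I, O](compute: I => O, fingerprint: I => Int) {
  private var lastKey: Option[Int] = None
  private var cached: Option[O] = None

  // Returns the output plus a flag saying whether recomputation happened.
  def update(in: I): (O, Boolean) = {
    val key = fingerprint(in)
    if (lastKey.contains(key)) (cached.get, false) // input unchanged: reuse
    else {
      val out = compute(in)
      lastKey = Some(key)
      cached = Some(out)
      (out, true)
    }
  }
}
```

Chaining such layers (parse → type-check → …) gives "early cutoff": if an edit re-parses a file but yields the same input for the typer, the typer never reruns.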
I assume this is quite different from how the compiler works today and would probably require different data structures. My hope, though, is that this could be abstracted behind a final tagless API where each interface has one implementation for the normal bulk compilation mode (F[_] = Id or Future) and one for incremental mode (F[_] = ReactiveVariable).
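As a sketch of what that final tagless split could look like (every name below is made up for illustration; ReactiveVariable here is a minimal toy cell, not an existing library type): one interface, two interpreters.

```scala
// Hypothetical final tagless sketch: one compiler-facing interface, two
// interpreters. None of these names come from the real compiler.
trait ConstantPool[F[_]] {
  def literal(id: String, text: String): F[String]
}

// Batch mode: F[_] = Id, results are plain values computed once.
type Id[A] = A
object BatchPool extends ConstantPool[Id] {
  def literal(id: String, text: String): String = text
}

// Incremental mode: F[_] = ReactiveVariable, a toy cell that notifies
// its dependents when the underlying source text changes.
final class ReactiveVariable[A](private var current: A) {
  private var listeners: List[A => Unit] = Nil
  def get: A = current
  def set(a: A): Unit = { current = a; listeners.foreach(_(a)) }
  def onChange(f: A => Unit): Unit = listeners ::= f
}

object IncrementalPool extends ConstantPool[ReactiveVariable] {
  private val cells =
    scala.collection.mutable.Map.empty[String, ReactiveVariable[String]]
  def literal(id: String, text: String): ReactiveVariable[String] = {
    val cell = cells.getOrElseUpdate(id, new ReactiveVariable(text))
    if (cell.get != text) cell.set(text) // an edit propagates to listeners
    cell
  }
}
```

Downstream phases would subscribe to the cells they depend on, so a keystroke only re-triggers the subscribers of what actually changed.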
Using such an approach, one would hopefully get recompilation times in the milliseconds for many cases; e.g. just changing the content of a String might only change the AST slightly and modify one entry in a constant table.
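The constant-table case could look like this (purely illustrative, not the actual bytecode constant-pool format): if only a literal's text changed, the edit patches one slot in place, and every index already handed out stays valid.

```scala
// Toy constant table: an edit to a string literal patches one slot in place;
// indices handed out earlier remain valid. Illustrative only, not the real
// JVM constant-pool layout.
final class ConstantTable {
  private val entries = scala.collection.mutable.ArrayBuffer.empty[String]
  def add(text: String): Int = { entries += text; entries.size - 1 } // new slot
  def patch(slot: Int, text: String): Unit = entries(slot) = text    // in-place edit
  def apply(slot: Int): String = entries(slot)
}
```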
If this is not possible/feasible, I’d be very interested in where my wishful thinking led me to make assumptions that are too simple.
This mode of operation seems to be the rough goal of what has often been called the “presentation compiler”, which does exist and is primarily consumed by IDEs. I’m not any kind of expert on the implementation of the current presentation compiler, but my understanding is that it is far less “clean” than what you describe. Additionally, my understanding is that it has a history of being very buggy and costly to maintain, which I believe is why Dotty does not have a presentation compiler so far.
If you’re talking about doing this in a principled, correct way: this has been an active topic of discussion and research, and it’s deemed too difficult to formalise incrementality at the language level, let alone implement it efficiently. What we have today in scalac and Bloop (via Zinc) is an approximation of that model, and honestly it works fairly well.
In particular, some of the properties that you mention (in-memory stores) have already been implemented and used in practice.
> interactive server mode where it keeps all intermediate results in memory
Bloop already does this in pipelined compilation: the signatures of modules are stored in memory and downstream compilers reuse them. We can enable this for non-pipelined compiles (the default) too, though usually the best approach is to memory-map the classes directories you compile against at the OS level.
> Using such an approach, one would hopefully get recompilation in milliseconds for many cases,
In the current Bloop release I’m working on, a recompile of one or two files already takes well under 500ms in a medium-sized project; in the previous release it’s slightly slower. The new release includes several IO optimisations that bring the cost of incremental compiles down further. To give you an idea of the current timings, a no-op compile of the whole build of https://github.com/guardian/frontend takes around 450ms. (This release will be out soon; if you have more specific questions, feel free to drop by our Gitter channel https://gitter.im/scalacenter/bloop.)