The topic of compiler performance comes up time and again.
One idea that’s been going through my mind a lot is the idea of a functional reactive compiler.
What I mean by that is a compiler with a second, interactive server mode that keeps all intermediate results in memory. When the user types a few characters, it updates only the affected parts of the AST (using an incremental parser such as https://github.com/Eliah-Lakhin/papa-carlo or https://github.com/tree-sitter/tree-sitter). If the AST changes in a way that requires the types / method table / … to change as well, those get updated incrementally until the change has propagated through all necessary layers.
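To make the propagation idea concrete, here is a toy sketch (all names are hypothetical, nothing here is the real compiler's API): each layer caches its output, keyed on a fingerprint of its input, and recomputes only when that fingerprint changes, so an edit that leaves some layer's input unchanged stops propagating there.

```scala
// Toy change-propagation sketch: a layer caches its last output, keyed on a
// fingerprint of its input, and recomputes only when the fingerprint changes.
// All names are illustrative; this is not the actual compiler's API.
final class Layer[I, O](compute: I => O, fingerprint: I => Int) {
  private var lastKey: Option[Int] = None
  private var cached: Option[O] = None

  // Returns the output plus a flag saying whether recomputation happened.
  def update(in: I): (O, Boolean) = {
    val key = fingerprint(in)
    if (lastKey.contains(key)) (cached.get, false) // input unchanged: reuse
    else {
      val out = compute(in)
      lastKey = Some(key)
      cached = Some(out)
      (out, true)
    }
  }
}
```

Chaining such layers (parse → type-check → …) gives "early cutoff": if an edit re-parses a file but yields the same input for the typer, the typer never reruns.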
I assume this is quite different from how the compiler works today and would probably require different data structures. My hope, though, is that this could be abstracted behind a final tagless API where each interface has one implementation for the normal bulk compilation mode (F[_] = Id or Future) and one for incremental mode (F[_] = ReactiveVariable).
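As a sketch of what that final tagless split could look like (every name below is made up for illustration; ReactiveVariable here is a minimal toy cell, not an existing library type): one interface, two interpreters.

```scala
// Hypothetical final tagless sketch: one compiler-facing interface, two
// interpreters. None of these names come from the real compiler.
trait ConstantPool[F[_]] {
  def literal(id: String, text: String): F[String]
}

// Batch mode: F[_] = Id, results are plain values computed once.
type Id[A] = A
object BatchPool extends ConstantPool[Id] {
  def literal(id: String, text: String): String = text
}

// Incremental mode: F[_] = ReactiveVariable, a toy cell that notifies
// its dependents when the underlying source text changes.
final class ReactiveVariable[A](private var current: A) {
  private var listeners: List[A => Unit] = Nil
  def get: A = current
  def set(a: A): Unit = { current = a; listeners.foreach(_(a)) }
  def onChange(f: A => Unit): Unit = listeners ::= f
}

object IncrementalPool extends ConstantPool[ReactiveVariable] {
  private val cells =
    scala.collection.mutable.Map.empty[String, ReactiveVariable[String]]
  def literal(id: String, text: String): ReactiveVariable[String] = {
    val cell = cells.getOrElseUpdate(id, new ReactiveVariable(text))
    if (cell.get != text) cell.set(text) // an edit propagates to listeners
    cell
  }
}
```

Downstream phases would subscribe to the cells they depend on, so a keystroke only re-triggers the subscribers of what actually changed.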
Using such an approach, one would hopefully get recompilation times in the milliseconds for many cases; e.g. just changing the content of a String might only change the AST slightly and modify one entry in a constant table.
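The constant-table case could look like this (purely illustrative, not the actual bytecode constant-pool format): if only a literal's text changed, the edit patches one slot in place, and every index already handed out stays valid.

```scala
// Toy constant table: an edit to a string literal patches one slot in place;
// indices handed out earlier remain valid. Illustrative only, not the real
// JVM constant-pool layout.
final class ConstantTable {
  private val entries = scala.collection.mutable.ArrayBuffer.empty[String]
  def add(text: String): Int = { entries += text; entries.size - 1 } // new slot
  def patch(slot: Int, text: String): Unit = entries(slot) = text    // in-place edit
  def apply(slot: Int): String = entries(slot)
}
```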
If this is not possible/feasible, I’d be very interested in where my wishful thinking led me to make assumptions that are too simple.
This mode of operation seems to be the rough goal of what has often been called the “presentation compiler”, which does exist and is primarily consumed by IDEs. I’m not any kind of expert on the implementation of the current presentation compiler, but my understanding is that it is far less “clean” than what you describe. Additionally, my understanding is that it has a history of being very buggy and costly to maintain, which I believe is why Dotty does not have a presentation compiler so far.
If you’re talking about doing this in a principled, correct way: this has been an active topic of discussion and research, and it’s deemed too difficult to formalise incrementality at the language level, let alone implement it efficiently. What we have today in scalac and Bloop (via Zinc) is an approximation of that model, and honestly it works fairly well.
In particular, some of the properties that you mention (in-memory stores) have already been implemented and used in practice.
> interactive server mode where it keeps all intermediate results in memory
Bloop already does this in pipelined compilation: the signatures of modules are stored in memory and downstream compilers reuse them. We can enable this for non-pipelined compiles (the default) too, though usually the best approach is to memory-map the classes directories you compile against at the OS level.
> Using such an approach, one would hopefully get recompilation in milliseconds for many cases,
In the current Bloop release I’m working on, a recompile of one or two files already takes well under 500ms in a medium-sized project; in the previous release it’s slightly slower. The new release includes several IO optimisations that bring the cost of incremental compiles down further. To give you an idea of the current timings, a no-op compile of the whole build of https://github.com/guardian/frontend takes around 450ms. (This release will be out soon; if you have more specific questions, feel free to drop by our Gitter channel https://gitter.im/scalacenter/bloop.)