See the following for why we might want a bootstrapping compiler in the first place:
https://contributors.scala-lang.org/t/bootstrapping-of-the-scala-compiler
The idea is the following:
Iteratively ablate the compiler until what remains is small enough to write an interpreter for (in whatever other language).
By numbering each successive version as bootstrap(0), bootstrap(1), … bootstrap(k), interpreter, we can then bootstrap the compiler by running (pseudo-code):
interpreter_binary = bootstrapped_other_language_compiler( interpreter )
# Run the simplest compiler on itself
first_binary = interpreter_binary( bootstrap(k), bootstrap(k) )
# Get progressively more capable binaries
second_binary = first_binary( bootstrap(k-1) )
third_binary = second_binary( bootstrap(k-2) )
...
penultimate_binary = ante_penultimate_binary( bootstrap(1) )
# And one last time, with the source code of scalac x.y.z
scalac_binary = penultimate_binary( bootstrap(0) )
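The chain above can be played through with a toy Python model. This is only a sketch: `Source`, `Binary`, `build`, `bootstrap_chain` and the feature names are hypothetical, and it assumes that each bootstrap(i) still accepts every feature used in the source of bootstrap(i-1), which is what lets each binary compile the next, richer version. It also collapses "interpret bootstrap(k) on its own source" into a single `build` step.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Source:
    """Source code of one compiler version (hypothetical model)."""
    name: str
    uses: FrozenSet[str]     # language features the source is written with
    accepts: FrozenSet[str]  # language features the resulting compiler can compile


@dataclass(frozen=True)
class Binary:
    built_from: str
    accepts: FrozenSet[str]


def build(compiler: Binary, source: Source) -> Binary:
    """Compile `source` with `compiler`; fail if the source uses unsupported features."""
    missing = source.uses - compiler.accepts
    if missing:
        raise ValueError(
            f"{compiler.built_from} cannot build {source.name}: missing {sorted(missing)}"
        )
    return Binary(built_from=source.name, accepts=source.accepts)


def bootstrap_chain(interpreter: Binary, stages: Sequence[Source]) -> Binary:
    """stages[0] is bootstrap(0) (the full compiler), stages[-1] is bootstrap(k)."""
    current = interpreter
    # Walk from the simplest version back up to the full compiler.
    for source in reversed(stages):
        current = build(current, source)
    return current


# Made-up feature sets, from smallest to largest.
CORE = frozenset({"classes", "functions"})
WITH_CASE = CORE | {"case classes"}
WITH_ENUMS = WITH_CASE | {"enums"}
FULL = WITH_ENUMS | {"given/using"}

stages = [
    Source("bootstrap(0)", uses=FULL, accepts=FULL),
    Source("bootstrap(1)", uses=WITH_ENUMS, accepts=FULL),
    Source("bootstrap(2)", uses=WITH_CASE, accepts=WITH_ENUMS),
    Source("bootstrap(3)", uses=CORE, accepts=WITH_CASE),
]
# The interpreter only needs to understand what bootstrap(k) is written in.
interpreter = Binary(built_from="interpreter", accepts=CORE)

scalac = bootstrap_chain(interpreter, stages)
print(scalac.built_from)
```

In this model, skipping a stage (for example going straight from the interpreter to bootstrap(1)) raises a `ValueError`, which is the reason the intermediate versions are kept around.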
Each successive ablation would be done in one of two ways:
Scoverage Loop
In this loop, run the current compiler on its own source code with coverage enabled, remove all dead code, and do it again.
By “current compiler” I don’t mean “scalac x.y.z”, I mean the current state of the compiler we’re modifying.
Once this process reaches a fixed point (there is no more dead code to remove), make a step in the second loop.
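As a rough illustration of why this has to be a loop rather than a single pass, here is a toy Python sketch. It does not call the real Scoverage tool; `Unit`, `covered_units` and `scoverage_loop` are hypothetical names, and "coverage" is modeled very crudely as: a unit runs only if some remaining unit's source uses the feature it handles.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Set, Tuple


class Unit(NamedTuple):
    handles: Optional[str]  # feature this unit compiles; None means always executed
    uses: FrozenSet[str]    # features this unit's own source is written with


def covered_units(units: Dict[str, Unit]) -> Set[str]:
    """Units executed when the compiler compiles its own (remaining) source."""
    used: Set[str] = set()
    for unit in units.values():
        used |= unit.uses
    return {
        name
        for name, unit in units.items()
        if unit.handles is None or unit.handles in used
    }


def scoverage_loop(units: Dict[str, Unit]) -> Tuple[Dict[str, Unit], int]:
    """Remove dead units until nothing changes; return the result and the round count."""
    current = dict(units)
    rounds = 0
    while True:
        dead = set(current) - covered_units(current)
        if not dead:
            return current, rounds  # fixed point: no dead code left
        for name in dead:
            del current[name]
        rounds += 1


# Made-up compiler: only enum_desugar is written with given/using,
# and nothing in the compiler's source uses enums.
compiler = {
    "driver": Unit(None, frozenset({"classes"})),
    "class_codegen": Unit("classes", frozenset({"classes"})),
    "enum_desugar": Unit("enums", frozenset({"classes", "given/using"})),
    "given_resolver": Unit("given/using", frozenset({"classes"})),
}

minimal, rounds = scoverage_loop(compiler)
print(sorted(minimal), rounds)  # ['class_codegen', 'driver'] 2
```

The first round removes `enum_desugar`; only then does `given_resolver` become dead, so a second round is needed before the fixed point is reached.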
Manual Loop
At the fixed point, what remains should be exactly the set of features that the compiler uses to compile itself.
Find the easiest feature to refactor out, and do it.
Some examples of this kind of “feature refactoring”: in the current compiler, replace
- All uses of given/using with explicit parameters.
- All enums with case-classes.
- All case-classes with regular classes.
(Probably only one should be done at once, so that the scoverage loop hopefully makes removing another simpler.)
With this done, the current compiler should have more dead code for the Scoverage loop.
For example, replacing all uses of given/using with explicit parameters means the Scoverage loop will remove all the code that handles compiling given/using, from the parser all the way to code generation!
And if there are only a few features left, and/or none of them can be refactored out, it’s time for the final phase:
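The effect of one manual step can be sketched with a toy Python model. The unit names, the `HANDLES` table and the helper functions are all made up for illustration; the real compiler is of course not organized this neatly.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical compiler units and the language feature each one handles.
HANDLES: Dict[str, str] = {
    "given_parser": "given/using",
    "given_typer": "given/using",
    "given_codegen": "given/using",
    "class_parser": "classes",
    "class_codegen": "classes",
}


def dead_units(source_uses: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Units never exercised when the compiler compiles its own source."""
    used = set().union(*source_uses.values())
    return {unit for unit, feature in HANDLES.items() if feature not in used}


def refactor_out(
    source_uses: Dict[str, FrozenSet[str]], feature: str
) -> Dict[str, FrozenSet[str]]:
    """Model of the manual step: rewrite every unit so it no longer uses `feature`."""
    return {unit: uses - {feature} for unit, uses in source_uses.items()}


# Which features each unit's own source is written with (made-up data).
before = {
    "given_parser": frozenset({"classes"}),
    "given_typer": frozenset({"classes", "given/using"}),
    "given_codegen": frozenset({"classes"}),
    "class_parser": frozenset({"classes"}),
    "class_codegen": frozenset({"classes", "given/using"}),
}
after = refactor_out(before, "given/using")

print(sorted(dead_units(before)))  # []
print(sorted(dead_units(after)))   # ['given_codegen', 'given_parser', 'given_typer']
```

Before the refactoring nothing is dead, because the compiler's own source still uses given/using; afterwards every unit that handles it becomes removable in one go.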
Scala-- interpreter
Let’s call Scala-- the sum of language features necessary for the current compiler to compile itself.
Write an interpreter that covers all the features used in the current compiler in a boot-strappable language, for example C.
And that’s it! (Well, it’s still a lot of work, but at least it’s cut into manageable portions.)
It’s probably a very naive plan, but it’s been bouncing around in my head for a while, and I don’t think I’ll have time to try it soon, so hopefully it will inspire someone to try it!
P.S.: It might be tempting to use coding agents for one of the manual steps. I highly recommend against it: 1. it stops you from learning cool things; 2. it’s way harder to trust, since LLMs have a tendency to make hard-to-spot mistakes; 3. it’s very probable that the current models are not “smart” enough, so they will make a lot of progress at the start and then come to a screeching halt.