Bootstrapping of the Scala compiler

To be honest, this sounds like it would make it much, much harder to make Scala buildable from source.

Scala already isn’t a proper open-source project anyway, as you can’t build it from source. :sob:

But until now there was at least some distant hope of making this possible in the future (by putting a lot of effort into it and creating a bootstrap compiler).

But when this thing here depends on “downloading the internet” just to run it, that’s a catastrophe!

It doesn’t make much difference whether you need to download stuff manually or it happens automatically.

The status quo is already very bad. But more dependencies mean making things even worse.

The current situation, where you need SBT to build Scala while SBT itself needs Scala to be built, is already a full-blown catastrophe, imho. Instead of needing “just” a bootstrap compiler, you would also need a “bootstrap SBT” just to get the build running.

As I understand it, now there would be even more external dependencies directly in the compiler sources, right? That’s not wise…

Please try at least a little bit to get Scala into a state where it is buildable from source, so it can finally become a real open-source project.

Please don’t do anything that would make this goal impossible (or at least much harder than it already is)!

Convenience for users is important, sure.

But the maximum convenience achievable is making software installable from the official repos with a simple apt install! Scala is frankly not even close, and things already got worse with the switch to the SBT build.

(Nowadays the above also applies to M$ and Apple$ boxes, as you almost always run a Linux VM there anyway to do basic dev stuff. VS Code even makes it really easy to run your dev environment in a container, which on M$ and Apple$ means a VM with a usable OS. Want to make Scala easy and trouble-free to run on any box with just a few clicks? Create a dev container. No more “thousands” of installation methods, each with its own bunch of failure modes, especially on systems that don’t have a proper package manager. This could be integrated into Metals and be available with literally one click! If Scala had proper distro builds, creating such a dev container would be a no-brainer: install a base distro, install the Java runtime, maybe additionally LLVM & Node.js, and the Scala packages… A few lines of Docker… Things could be so simple… And BTW: such self-contained, reproducible dev environments are especially handy for classes with students.)

I think Scala CLI is great! (I usually advertise it everywhere. :slight_smile:)

But this and its tail should just never become a hard dependency of the compiler. Otherwise it will become almost impossible to disentangle the build from dependency hell again, so that the compiler could finally be built from source.

Scala CLI will not be a hard dependency of the compiler, it will just be released alongside it. If it ever becomes problematic it should be easy to remove it, but I don’t think this will happen.

As I understand it, now there would be even more external dependencies directly in the compiler sources, right? That’s not wise…

Please try at least a little bit to get Scala into a state where it is buildable from source, so it can finally become a real open-source project.

The compiler does have dependencies and will always have them. I am not familiar with a definition of open source that bars using dependencies. We have one really large dependency, which is Java itself, so what you propose is impossible to achieve.

For sure we will try to have more capabilities available that do not need to download anything so that the basics are possible without any additional downloads, but I don’t believe this is the only thing that matters here. We want to provide users with a really good out of the box experience, which itself I think is most important here.

@tgodzik

It’s not about not having dependencies. You can have them. But you have to be conscious of them and may need to handle them in a special manner. Cyclic dependencies are especially problematic, though.

Java and the Java platform as such are bootstrappable, for example.

What @MateuszKowalewski is asking for is not only achievable, but also very desirable. But the Scala team would need to be aware of the issue. You can read more about it on this website:

https://bootstrappable.org/projects.html

https://bootstrappable.org/projects/jvm-languages.html

This does seem like a useful thing altogether, but it would require limiting Scala CLI’s functionality a lot and also extensive changes in the compiler, so it doesn’t look feasible currently.

Scala CLI is not a compiler dependency. It is not required to build the compiler and never will be. Except for the build tool, which is written in Scala, the compiler has no Scala dependencies. SBT is also not a big problem for bootstrappability, as we can build it with earlier versions of the compiler, which can in turn be built by earlier versions of SBT. Eventually, we arrive at a compiler version that doesn’t require SBT.

We can go further back. The real problem appears around Scala 2.0 in 2006. It was built using the latest version of Scala 1. From what I see on the internet, it was Scala 1.4.0, which was written in Java with some extensions. The source of the compiler for those extensions was never published. I also don’t know if the source for Scala 1.4.0 is publicly available.

So to have full bootstrapping we need to have one of the two:

  • Interpreter/transpiler that is able to process Scala 2.0 source.
  • Sources for both Scala 1.4.0 (those may be published somewhere, I don’t know) and source for the compiler extensions used to build it.

Nothing more is required and nothing more affects the bootstrapping. Definitely, Scala CLI has nothing to do with it.
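The backwards walk described above can be sketched as a small script. This is only an illustration in Python: the function name and the two tables are hypothetical, the node "extended-java-compiler" is a made-up label for the compiler of the Java extensions, and the availability of the Scala 1.4.0 sources is assumed to be true purely for the sake of the example (the post says it is unknown).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "built with" table covering only the oldest links named in the post.
BUILT_WITH: Dict[str, Optional[str]] = {
    "scala-2.0": "scala-1.4.0",
    "scala-1.4.0": "extended-java-compiler",  # Java with some extensions
    "extended-java-compiler": None,
}

# Assumption for illustration: Scala 1.4.0 sources exist (unknown per the post);
# the extension compiler's source is marked missing, as the post states.
SOURCE_AVAILABLE: Dict[str, bool] = {
    "scala-2.0": True,
    "scala-1.4.0": True,
    "extended-java-compiler": False,
}


def trace_bootstrap_chain(
    target: str,
    built_with: Dict[str, Optional[str]],
    source_available: Dict[str, bool],
) -> Tuple[List[str], Optional[str]]:
    """Walk from `target` back through its builders.

    Returns the tools visited and the first tool whose source is missing,
    or None as the second element if every link has source.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = target
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic bootstrap dependency at {current}")
        seen.add(current)
        chain.append(current)
        if not source_available.get(current, False):
            return chain, current  # the chain breaks here
        current = built_with.get(current)
    return chain, None


chain, gap = trace_bootstrap_chain("scala-2.0", BUILT_WITH, SOURCE_AVAILABLE)
print(" <- ".join(chain), "| missing source:", gap)
```

With these assumed tables the walk stops at the extension compiler, which is exactly the gap the two options in the list are meant to close.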

Note that I’ve split this discussion into a new thread, since it’s only tangentially related to Scala-CLI.

Past threads about the bootstrapping issue include:

To clarify: I’m only saying that this is possible, not that it is easy. Using one major release of Scala (2.12, 2.13, 3) to build another one is possible but requires multiple intermediate builds. I don’t remember the exact number, but I recall that at least around 30 builds were necessary to build Scala 3.0.0 using Scala 2.13. I don’t know how it looked between major releases of Scala 2, but it was probably even worse.

Even in Scala 3 world, it is not always straightforward. We strive to build the current version of the compiler with the previous one (so 3.3.0 was built with 3.2.2, 3.2.2 with 3.2.1, etc.). However, it was impossible in one case. 3.0.2 wasn’t able to build 3.1.0. We needed one intermediate snapshot version between them.

… and the paragraph above also contains an oversimplification. Each released version of the compiler is built not directly by the previous version but by an intermediate build that is never published on Maven Central. This intermediate compiler is built from the same sources as the final released version, using the previous compiler. So, 3.3.0 was built by the 3.3.0-[censored], which was built from the 3.3.0 sources by 3.2.2.
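As a rough model of that two-stage scheme (a sketch only; the class, the function, and the labels are hypothetical and say nothing about how the real build is wired up):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompilerBuild:
    sources: str     # version of the source tree that was compiled
    built_by: str    # label of the compiler that produced this build
    published: bool  # whether this build is the released artifact


def release(version: str, previous_release: str) -> Tuple[CompilerBuild, CompilerBuild]:
    """Model one release: previous release -> intermediate build -> released build."""
    # Stage 1: the previous released compiler compiles the new sources.
    intermediate = CompilerBuild(
        sources=version,
        built_by=f"{previous_release} (released)",
        published=False,
    )
    # Stage 2: the intermediate compiler compiles the same sources again.
    final = CompilerBuild(
        sources=version,
        built_by=f"{version} (intermediate)",
        published=True,
    )
    return intermediate, final


intermediate, final = release("3.3.0", "3.2.2")
print(intermediate)
print(final)
```

Both stages compile the same source tree; only the second one is released, so the published compiler has been compiled by a compiler built from its own sources.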

That sounds excessively complicated from the outside. If it’s not wildly off topic, I’d be curious about the history that led to this workflow, and what benefits come out of it.

I don’t think it’s feasible any more to try to go down the whole bootstrap chain until some prehistoric Scala compiler which wasn’t built in Scala. This ship sailed long ago, imho.

The only way I can think of is to “build” a Scala compiler (or interpreter) in a different language, one which is bootstrappable; so likely in Java.

But of course rebuilding the compiler in Java is also not doable.

But one could try to “cheat”:

I’ve tried the obvious approach of just decompiling the compiler. But this does not work, as no Java byte-code decompiler I know of (and I’ve tried all I could find) is able to generate sources that would even compile again. Also, I don’t think the resulting code would be acceptable, as it’s far too obfuscated. The main offenders are pattern matches, which very often decompile to labeled jumps. (The “goto-encoding” of pattern matches is an engineering marvel, really clever, very efficient, but frankly the code can’t be comprehended by mortal beings any more.)

So my current “best idea” would be to create a Scala (or TASTy) to Java compiler. This seems at least somewhat doable, as the compiler-internal AST in late phases, close to byte-code generation, is actually “almost Java”. One would “just” need a kind of “Java pretty printer” for that AST.

As Scala’s and Java’s type systems aren’t compatible, statically typed Java output is not really achievable. But I think it would be OK to take type-erased trees as input. The resulting code wouldn’t be very pretty and would be full of casts wherever generics were used, but this wouldn’t be worse than, for example, compilers / interpreters written in dynamic languages. The code would still be acceptable as human-readable and understandable, I think.
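To make the idea a bit more concrete, here is a toy sketch of such a “pretty printer”, written in Python only to keep it short. The node classes below are a made-up, heavily simplified stand-in and not the compiler’s actual tree API; the point is only that an erased tree with explicit casts maps almost one-to-one onto Java expression syntax.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical, simplified tree nodes (not the real compiler AST).
@dataclass
class Ident:
    name: str


@dataclass
class Select:
    qualifier: "Tree"
    name: str


@dataclass
class Apply:
    fun: "Tree"
    args: List["Tree"]


@dataclass
class Cast:
    expr: "Tree"
    java_type: str


Tree = Union[Ident, Select, Apply, Cast]


def to_java(tree: Tree) -> str:
    """Print an erased expression tree as Java source text."""
    if isinstance(tree, Ident):
        return tree.name
    if isinstance(tree, Select):
        return f"{to_java(tree.qualifier)}.{tree.name}"
    if isinstance(tree, Apply):
        args = ", ".join(to_java(arg) for arg in tree.args)
        return f"{to_java(tree.fun)}({args})"
    if isinstance(tree, Cast):
        # After erasure a generic result is just Object, hence the cast.
        return f"(({tree.java_type}) {to_java(tree.expr)})"
    raise TypeError(f"unsupported node: {tree!r}")


# Scala `xs.head.length` with xs: List[String], after erasure:
example = Apply(
    Select(Cast(Apply(Select(Ident("xs"), "head"), []), "String"), "length"),
    [],
)
print(to_java(example))  # ((String) xs.head()).length()
```

Statements, definitions, and the encodings that have no direct Java counterpart are exactly the parts this sketch leaves out.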

“Only” the things that can’t be mapped directly to Java, and whose expanded encodings aren’t really meant to be read by humans, would need some extra love. The pattern matches mentioned above are one example. (Maybe some of that can be solved by the new Java pattern-matching capabilities, or maybe by outputting code which uses something like Vavr?)

This Scala => Java “transpiler” would be a throw-away artifact. It doesn’t need to be good for anything other than the task at hand. So no docs, no optimizations (which would be counterproductive anyway, as the goal is to generate readable code, not fast code), and just enough features to process the current, but as far as possible stripped-down, Scala sources once.

The result of this “pretty printing to Java” doesn’t need to be perfect! If some manual post-processing of the output is easier than trying to solve a problem in the “pretty printer”, that would be OK, as long as it doesn’t get out of hand. The idea is to “cheat” around writing a Scala compiler in Java from scratch, not to create a real Scala to Java compiler…

So the idea would be to first rip everything out of the compiler sources that isn’t strictly necessary to do the source => byte-code transformation. Especially things like the type checker. And also everything around the compiler as such, like runners, doc processors, whatever. (Though the std. lib is a little bit problematic as it’s quite large and not modular.)

Then compile the hopefully reasonably small remaining sources to Java with the help of the “AST pretty printer” described above.

Yes, no type checker in here. The bootstrap compiler would have only one purpose: create a “seed” binary of the full current compiler. From that, you could use the result to build first itself once again, and then further versions of Scala 3. We would have a proper bootstrap again! (I’m not sure it makes sense at this point to do the dance for Scala 2. I would leave it out, I guess.)

Also, the build of this minimal “rump-compiler” needs to be rewritten in something other than SBT. Trying to bootstrap (or even just transpile) SBT is imho likely more complex than doing this for the compiler itself. SBT has a shitload of dependencies… (And that’s the problem with dependencies. You need to bootstrap all of them first! This usually explodes very quickly, especially in an environment where the usual expectation is that you can just “download the internet” during the build. Java is really a glorious mess when it comes to dependencies and the “modularity” story. Almost all “modern” Java stuff can’t be built from source any more. All the newer big JVM projects are missing from the software repos. That’s for a reason. It has been a super bad trend for around a decade… Nobody cares!) But I think creating a build for a kind of “rump-compiler” even in something like make would be doable. (Or Ant, Maven, or whatever is already in the repos and would fit in here.)

Then you need to do more or less the same for SBT: create a “rump-SBT”, with a build that is not based on SBT, and compile that with the bootstrapped compiler. Use this “rump-SBT” to build SBT. Then you can actually run the regular Scala build. From here one could finally start packaging Scala libs!
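The steps above can be written down as a dependency graph and checked for a valid build order. This is only a sketch in Python using the standard-library `graphlib` module (Python 3.9+); all node names are hypothetical labels for the stages described in this post, and `make` simply stands for whatever non-SBT build tool ends up being used.

```python
from graphlib import CycleError, TopologicalSorter

# Each key needs everything in its value set to exist before it can be built.
STATUS_QUO = {
    "scala-compiler": {"sbt"},
    "sbt": {"scala-compiler"},
}

PROPOSED = {
    "rump-compiler": {"jdk", "make"},         # generated Java sources, non-SBT build
    "seed-compiler": {"rump-compiler"},       # full compiler built by the rump-compiler
    "full-compiler": {"seed-compiler"},       # the seed rebuilds itself once
    "rump-sbt": {"full-compiler"},            # stripped-down SBT, non-SBT build
    "sbt": {"rump-sbt"},
    "regular-scala-build": {"sbt", "full-compiler"},
}


def build_order(deps):
    """Return one valid build order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


print(build_order(STATUS_QUO))  # None: compiler and SBT need each other
print(build_order(PROPOSED))
```

The status quo has no valid order at all, while the proposed chain starts from tools that are already packaged in the repos.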

So that’s my current “best” idea.

It would help tremendously if the Scala compiler, and actually its library, were as modular as possible!

Same for SBT.

This would really help deconstructing this whole thing to just the parts that are strictly needed for source to byte-code translation without further checks (as we may just assume the Scala codebase is correct, as it actually compiles with the “real” compiler with all checks).

The “Scala AST Java pretty printer” thingy would profit from good APIs for code generation (which could be tweaked to output Java syntax). But I think it would be doable even without proper APIs as one could try to hack the current AST pretty printer. The result would be anyway just one-time use throw-away code…

Yeah, I know this sounds “a little bit” crazy. But after looking into this some time ago, and playing around with some ideas I came to the conclusion that this would be still the “simplest” way to do it currently.

All this wouldn’t be needed, of course, if Scala had had a proper open-source story from the beginning, and if people cared about such things in the first place instead of gradually making the situation even worse over time.

You’ll probably need Scala 2.12 to compile sbt. Maybe if you’re lucky there will be a Scala 3 sbt by the time you get all the other stuff working…

Given that until now this is just an idea maybe it’ll be Scala 4 SBT… :sweat_smile:

I was thinking about how to do it and tried the obvious thing half a year ago, but since then nothing has really happened. It looks like a lot of work, and so far I have only very shallow knowledge of compiler internals.

I’m looking into TASTy currently for other reasons, but maybe this turns out helpful also in this case here.

But I would not expect that something happens anytime soon from my side. I’ve posted here mostly to find out whether the idea makes sense in general. The usual “someone is wrong on the internet” smoke test… At least until now nobody came up with reasons why the approach is impossible in practice. So far so good.

Let’s see whether more, or (hopefully!) better, simpler ideas come up…

Just a small reminder why downloading random binaries from the internet is a big no-no:

(This episode from the popular never-ending Rust soap opera)

Please look through the comments and also consider the reactions, and the number of people actually caring about this indeed very severe issue.

Just imagine if the people in charge of IT security, for example at big banks, in the health industry, and the like, actually knew that by using Scala they run arbitrary, unsigned, non-reproducible binaries created by random, foreign, anonymous people in their production environments. (It’s very likely that management does not actually know the details, as just downloading and running arbitrary binaries is usually against legal policies at such entities…)
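For what “reproducible” means in practice: building the same sources twice has to yield bit-identical output, which anyone can then check by comparing digests. A minimal sketch in Python, where the two `*_build` functions are hypothetical stand-ins for real build steps:

```python
import hashlib
import itertools
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(build: Callable[[], bytes], runs: int = 2) -> bool:
    """Run `build` several times and check that all outputs are bit-identical."""
    digests = {sha256_hex(build()) for _ in range(runs)}
    return len(digests) == 1


# Hypothetical stand-ins for real builds.
_build_counter = itertools.count()


def deterministic_build() -> bytes:
    return b"compiled output"


def stamped_build() -> bytes:
    # Embeds a changing value, like a timestamp or build ID would.
    return b"compiled output " + str(next(_build_counter)).encode()


print(is_reproducible(deterministic_build))  # True
print(is_reproducible(stamped_build))        # False
```

A matching digest only shows that two builds agree; it says nothing about whether the compiler used for those builds can itself be built from source.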

How is it done elsewhere? Here’s an example of “doing it right”:

But Scala is still not part of the party. :frowning_face:

BTW, related: Scala 3 macro security

For the archaeologists interested in the reallllllly ancient links wayyyyy early in the bootstrap chain, Matthias Zenger recently put JaCo up on GitHub: GitHub - objecthub/jaco: Extensible Java compiler framework.
