Scala Native Next Steps

We plan to support it eventually, but I don’t know yet when.

I agree that it is interesting and I would really like to be able to work more easily with Scala on bare metal.

That said, I wonder what Scala Native really has to offer vs. the other modern systems languages out there. To understand that, let’s look at two significant things that are missing in Scala Native:

  • Support for multi-threading (i.e. the Java memory model and GC support)
  • More complete JDK API support

In general, better JDK support is needed to support more of the Scala library ecosystem out-of-the-box (i.e. only having to add cross-compilation).

If I had to pick just one single critical platform feature, it would be TLS support. These days, you cannot realistically connect to any other system without it. Implementing this correctly (offering SSLEngine and related APIs) is very complex. I guess an initial effort could try to port Netty’s OpenSSL support, but just the key and certificate management is a huge chunk of work. The problem is that it requires a high initial investment to get it working (I would guess at least 3-6 months of work), and it would also require regular maintenance because of its security implications.

But this is just the tip of the iceberg. Without a proper platform implementation, the “only thing” that Scala Native offers is really “Scala on bare metal”. In the best case, a new ecosystem would form that would provide a new platform as a set of libraries that depend directly on native libraries. It is a lot of work to provide those wrappers in a portable and (memory-)safe way. And that ecosystem would still be 5-10 years behind what Go and Rust offer right now.

I’m not sure what it would take to implement multi-threading support. It’s definitely not trivial. In any case, it would be good to know what the plans are in that regard. Often an application can be fine with a single thread, but at some point there’s often a cliff where your application cannot scale any further without being able to make use of multiple cores. This will only get worse in the future.

Altogether, while I’d like to see the project succeed, I’m a bit skeptical that it can be successful in the current competitive environment. I guess it could get the required leverage if Dotty provides a more-than-incremental push to the language, so that in the longer term it could still be successful (or find its niche), but then we are talking about a 5-10 year time span. Are there plans that go beyond the immediate future, and how will those major points be addressed?

(There’s also the elephant in the room, Graal’s native image support and the Substrate VM, which, despite all its current shortcomings, seems rather complete when you compare it to Scala Native. It’s not quite in the same space, though, since it is rather enterprisey and comes with potential vendor lock-in, licenses, etc.)

In addition to GraalVM native image, there is also a proposal for an OpenJDK “Project Leyden” which, if it goes ahead, would also seem to target a somewhat similar space.

“And that ecosystem would still be 5-10 years behind what Go and Rust offer right now”.

The advantage of Scala over Rust would be that it is easier to program in. The problem with Rust is its borrow checker, which can make programming challenging. I have programmed in Rust and find it comparable to a door-to-door combat scenario, one building at a time, often with the need to backtrack when facing the wrath of the borrow checker and to strategize about a new approach.

Klaus

I don’t know about Project Leyden, but the GraalVM native image only addresses some of the things that Scala Native can do. Basically, the native image gives you a native executable of a standard JVM program. That means you are limited to what the JVM limits you to. You can’t do systems programming or make optimizations based on memory locality in a native image. A quick look at an article on Project Leyden suggests it has the same goal as GraalVM Native Image: both allow for quick startup because there is no VM to spin up. Beyond that, though, it doesn’t look like they offer anything else. That’s far from what Scala Native provides.

Disclaimer: I’ve never used GraalVM, but the compiler options look like it can link against LLVM bitcode and lots of other languages, hence I would have thought that it could be used for systems programming.

Having a page that objectively compares the pros/cons of Scala Native vs GraalVM native image would probably be interesting/helpful for engineers trying to evaluate these technologies.

I didn’t look all that closely, but scanning https://www.graalvm.org/docs/reference-manual/native-image/ doesn’t make me think they are equivalent. Even if you can make calls to C libraries through the linker, that’s not really enough to do systems programming unless they introduce new types for storing pointers. Scala Native has a Ptr[A] type that lets you really interact with memory at a low level. I haven’t seen anything indicating that GraalVM has anything similar. It would be more like making JNI calls on the JVM than actually having low-level access to the machine.

Java has http://openjdk.java.net/projects/panama/ for dealing with native libraries and native memory and http://openjdk.java.net/projects/valhalla for flattening objects’ representation (i.e. reducing pointer chasing). The part of Project Panama that deals with low-level memory access is already incubating as part of JDK 14 (http://openjdk.java.net/projects/jdk/14/). The documentation for JEP 370 (https://openjdk.java.net/jeps/370) is here: https://download.java.net/java/GA/jdk14/docs/api/jdk.incubator.foreign/module-summary.html. IMHO there’s a high chance that this will be supported by GraalVM eventually.

I’m not sure about the system programming part, but there are already GraalVM native-image-compatible libraries like https://github.com/remkop/picocli that facilitate writing AOT-compiled command line tools.

Exciting news! As mentioned above, picocli offers tooling to make it easy to create GraalVM native images for picocli-based command line applications. I would like to learn more to see how picocli can be improved to also make it easy for people to write Scala Native command line applications with it.

Is there a link describing what libraries need to do to be usable from Scala Native? Looking at the reflective instantiation support in the OP, it appears that classes need to be annotated with @EnableReflectiveInstantiation before they can be reflectively instantiated in a Scala Native app.

Picocli uses reflection internally to instantiate subcommands and other components. Some of these may be written in Scala and can be annotated with @EnableReflectiveInstantiation, but some of them (like a number of picocli built-in components) are written in Java and do not have this annotation.

So, I hope there will be another mechanism separate from this annotation to register classes that need to be instantiated reflectively. If these requirements are documented somewhere, library authors can prepare to make it possible for their libraries to be used in Scala Native.

IIUC Scala Native requires all of the source code to be written in Scala. You can’t just put ordinary JARs on classpath and use Scala Native to compile them to LLVM IR and then native executables. I don’t know if there are any workarounds or plans to change that situation.

I’d like to say: do not build things with the aim of overtaking or competing with the alternatives. Scala does not compete with Java or Kotlin (in any regard really). Scala.js does not compete with TypeScript or JavaScript. And Scala-Native should not try to compete with alternatives.
There’s no prize to win by competing, and there’s a lot you could lose by trying.
If you need a reason to choose Scala-Native over the alternatives, it’s simply because it is Scala, and the alternatives aren’t.

Having worked with Rust, I can tell you that it sucks (for a Scala programmer). It’s terribly inexpressive, it’s constantly getting in my way and making me take small performance decisions every step of the way, oh, and it totally gave up on the concept of abstractions and just forces a text processor on me (their macro system, which is used for everything really, since the base language lacks abstraction capabilities).

Scala Native itself doesn’t need built-in support for TLS to be useful in server development; it’s possible to build on top of existing servers like NGINX Unit https://unit.nginx.org/

If we are going to pick any language that Scala Native is “competing” with (and I don’t think such discussions are fruitful anyways, as covered by @rcano), I would rather pick Python instead of Go or Rust. Having worked full-time with Python, I think there are a lot of applications using Python today where Scala would be better suited.

I am super excited about GraalVM native-image and I’m already heavily using it in several projects both in open source (Scalafmt) and at work. However, the developer experience with native-image today is not great: it takes a long time to link a binary resulting in slow edit-and-test feedback loops and there are many native-image configuration flags that tweak the runtime semantics, which requires a decent upfront investment to learn. In comparison, Scala Native binaries link significantly faster and you can test them just like you would test a JVM application giving me better confidence that the binary works as expected at runtime.

I think native-image is revolutionary technology and a huge boon to the Scala/JVM ecosystem. I’m also just as excited about the applications for Scala Native. These are not contradicting views.

Just to pick up on this, I work on a lot of scripts, mostly Python.

Python is great, but inevitably you want to do something in parallel, whether that is making HTTP requests, building some Docker containers, auto-formatting some source files, whatever.

This is the point at which all hell breaks loose. Multi-threading in Python doesn’t work well, and multi-processing in Python is clunky and fragile. multiprocessing.pool breaks Ctrl-C and makes scripts unkillable unless you jump through hoops with Ctrl-Z and kill -9, and the -9 is often required. Many of these scripts are slow (hence the parallelization) and it’s not uncommon that a script will blow up with a NameError or TypeError after several minutes of execution, with the stack trace mangled due to multiprocessing. Deployment is often a pain: even with tools like PEX to try and make somewhat-hermetic executables, someone will inevitably brew install something that messes up the global python install.

Some of these tools end up being ported to Scala. This works, but it certainly is inconvenient to take a large codebase that was started in Python when it was small and short-running, and port it to Scala just because it has become large and slow.

This awful experience is the status quo for a lot of our important scripts and automation. They’re not written in Scala because the JVM startup overhead and resource footprint is too much, but by no means is the Python experience “good”. It’s acceptable, and we’re surviving, but there’s a real opportunity for Scala-Native to come in here with nice Futures-based parallelism, proper multithreading, easy collections transformations, type checking to catch dumb errors.

Scala-Native could provide a real alternative to writing scripts in Python, and perhaps be a lot more ergonomic than the other alternatives, namely writing your command line tools in Go or Rust (which is also increasingly common).
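
To make the "Futures-based parallelism" point a bit more concrete, here is a minimal sketch of what such a script could look like, using only the standard library. The `process` function and the sample inputs are hypothetical stand-ins for the real work (an HTTP request, a container build, formatting a file). On the JVM the global execution context spreads the work over a thread pool; since multi-threading support is listed earlier in this thread as missing from Scala Native, the same code should not be assumed to run in parallel there yet.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelScriptSketch {
  // Hypothetical stand-in for one slow unit of work.
  // Here it only counts whitespace-separated words.
  def process(input: String): Int =
    input.split("\\s+").count(_.nonEmpty)

  // Start one Future per input. Future.traverse returns the results in
  // input order; if any task fails, the combined Future fails as well.
  def processAll(inputs: List[String]): Future[List[Int]] =
    Future.traverse(inputs)(in => Future(process(in)))

  def main(args: Array[String]): Unit = {
    val inputs = List("a b c", "d e", "f")
    // Blocking at the very end is acceptable for a short-lived script.
    val counts = Await.result(processAll(inputs), 30.seconds)
    inputs.zip(counts).foreach { case (in, n) => println(s"$in -> $n words") }
  }
}
```

A type error or a misspelled name in `process` is reported at compile time rather than several minutes into a run, which is the other half of the argument above.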

Thanks for the replies and points of view, @rcano and @olafurpg.

Regarding competition: I’m also not a fan of overly competitive thinking when it comes to languages. From an idealistic viewpoint, I would like to see Scala Native succeed and I could just be happy that there’s funding for a somewhat risky project.

That said, I am also realistic enough to know that resources are limited. By resources I mean the Scala Center’s developer time, but also the public attention and momentum that are required to drive such a project.

Ultimately, the only way (for anything) to survive is to find its (probably temporary) niche. The statement at the top about why Scala Native is relevant, what is missing, and what will be done is too fuzzy to be useful. You could say this is a classic project management problem. If you want such a project to succeed, you will first have to admit there’s an insurmountable amount of work ahead of you, and then you will need to have a very specific idea of how to arrive incrementally at something useful as quickly as possible. The statement gives a reasonable roadmap for the short term but doesn’t say what reasonable short- and mid-term goals are and how to reach them.

My assumption behind this is that Scala Native currently isn’t useful (enough). Here again the question is whether you look at “being useful” as an isolated absolute concept or relative to what’s going on elsewhere. Let’s take the stance that it is useful if, given a systems programming task, you would choose Scala Native over anything else. Let’s look at a few scenarios:

  • A simple short-running command line application (dealing mostly with file IO and nothing else): yes, Scala Native is applicable even now; you can program in Scala and the basic APIs are there.
  • A short-running command line application that is CPU intensive: not so much, since there’s no support for multi-threading to spread the work.
  • A tool that connects to other services over the network: not so much, since protocol implementations are missing.
  • A low-level tool that interfaces with the kernel (basically standard C usage) or C libraries: somewhat applicable and performant, but it inherits all the bad parts from C (memory unsafety, clunky APIs, etc.) while even missing concurrency primitives for the Scala parts. If you compare that with doing the same in Scala on the JVM with JNA as the FFI, you don’t win much. Using JNA is somewhat uncommon, but it works on the JVM if you don’t have the utmost performance or resource requirements. With native-image, you can also solve these with SubstrateVM’s FFI.
  • A bigger, long-running application: plus points for being able to write in Scala, minus points for missing multi-threading and missing connectivity. Scala on the JVM already provides all these features with good to perfect performance characteristics (but with the well-known bloat wrt startup times and memory usage).

Which use cases are missing and which of these use cases are the primary target for Scala Native?

For me personally, the current answer to whether to use Scala Native is “no” (aside from playing around) because there’s nothing useful enough I can do with it. I have a project for quick Linux memory analysis (doing some custom aggregation) that would come closest. That currently runs a few seconds per invocation. But even there, most of the runtime is spent because its functionality (parsing files in /proc) is not optimized to be fast. Despite it being a short-running program, the JVM-imposed runtime cost is there but not relevant enough to try hard to get rid of it.
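
For readers who want a feel for this kind of workload, here is a rough sketch of "parse files in /proc and aggregate". It is not the project mentioned above: the choice of /proc/self/smaps, the regular expression, and the names `SmapsAggregationSketch` and `aggregate` are all assumptions made for illustration. It sticks to Scala collections, file IO, and a regex.

```scala
import scala.util.matching.Regex

object SmapsAggregationSketch {
  // Assumed line format (as found in /proc/<pid>/smaps):
  //   "Rss:                 132 kB"
  private val Field: Regex = """^(\w+):\s+(\d+) kB$""".r

  // Sum the kB values per field name over all lines; lines that do not
  // match the pattern (mapping headers, VmFlags, ...) are ignored.
  def aggregate(lines: Iterator[String]): Map[String, Long] =
    lines.foldLeft(Map.empty[String, Long]) { (acc, line) =>
      line match {
        case Field(name, kb) => acc.updated(name, acc.getOrElse(name, 0L) + kb.toLong)
        case _               => acc
      }
    }

  def main(args: Array[String]): Unit = {
    // Linux only; the default path is an assumption for this sketch.
    val path = args.headOption.getOrElse("/proc/self/smaps")
    val src  = scala.io.Source.fromFile(path)
    try {
      aggregate(src.getLines()).toSeq
        .sortBy { case (_, kb) => -kb }
        .foreach { case (name, kb) => println(f"$name%-16s $kb%10d kB") }
    } finally src.close()
  }
}
```

A program of this shape spends its time in regex matching and short-lived allocations, so it mostly exercises the regex implementation and the garbage collector of whichever platform runs it.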

So, I guess what I’m asking for are more concrete ideas about applicability and then also some data that would support those ideas.

There’s not only servers… And also, just having a C-API doesn’t make it a ready-to-use library for Scala Native.

If you are coming from Python you could also use Scala on the JVM, couldn’t you?

Yeah, native-image build times are really bad (but that’s an ongoing struggle for any compiled language). It would be interesting to see some apples-to-apples comparisons for that particular aspect (but, of course, it’s also somewhat expected that Scala → native translation can be faster than going via bytecode first; the question is whether the downsides are worth it).

After I wrote this I just saw this article about startup time improvements with native-image: “Static Compilation of Java Applications at Alibaba at Scale” by Alina Yurenko (graalvm on Medium). So, I realize that my impression of JVM startup being fast enough for most scripts is rather biased by using a (7+ year old but still) reasonably fast laptop where there are usually enough spare cores available to run the JVM JIT and GC in parallel. Other deployment scenarios where you have to pay (in latency and money) for these extra resources, like FaaS, slimly deployed containers or VMs, or embedded or small devices, might not have that luxury and would benefit a lot more from more efficient startup.

  • a number crunching application that runs on wasm either in the browser or on the desktop

I had a quick try with that project (200 lines of code, uses scala collections, file IO, and regular expressions, no dependencies). Here’s a comparison:

  • AdoptOpenJDK 8, 2.5 seconds (4.7s CPU time)
  • Graal native-image JDK 11 20.1.0, 2.29 seconds (same CPU time)
  • Scala Native 0.4.0-M2: 20-50 seconds depending on settings (many settings fail because of a memory leak)

Build times:

  • scalac + assembly: 5s
  • native-image, on top of above: 30s
  • Scala Native: 13s in debug mode (leading to >50s runtime), 200 - 300s in release-full mode (20s runtime), 30s in release-fast mode (which runs until the machine runs out of memory)

In the best case, Scala Native’s results are due to a problem with the regex implementation. As far as I got with profiling Scala Native code, much time is spent in GC which might be a problem in itself or a consequence of a potential memory leak.

In any case, I find this statement somewhat optimistic…

“IIUC Scala Native requires all of the source code to be written in Scala. You can’t just put ordinary JARs on classpath and use Scala Native to compile them to LLVM IR and then native executables. I don’t know if there are any workarounds or plans to change that situation.”

Not being able to include JARs that are not written in Scala in a Scala Native app is a massive limitation… Bit of a show-stopper, that…

Can’t we decompile a JAR to Scala and then compile to LLVM IR? That is a bit tongue-in-cheek, but surely there are some bytecode patterns that cause problems and others that are fine. Libraries that adhere to certain rules could be included that way.

If there is no way around this limitation, one idea that could offset this (in terms of value proposition) would potentially be cross-compilation, as GraalVM does not have that (and it is not on the road map; the GraalVM team seems to have given up on the idea of cross-compilation).

I’m not an expert on the Scala Native compilation process, but AFAIK Scala Native acts as a compiler plugin for the Scala compiler and collects some vital information long before Java bytecode and metadata are emitted into *.class files. I think bending Scala Native to do decompilation is probably not an easy path.

I’m not sure how cross-compilation is relevant here, but:

  • AFAIU GraalVM native-image can compile bytecode produced by Scala, but that would bypass Scala Native entirely.
  • SubstrateVM from the Graal project has its own garbage collector and object representation. Scala Native also has its own garbage collector and object representation. Thus sharing normal objects between them seems impossible.
  • However, both GraalVM and Scala Native provide an FFI (foreign function interface) which could be used as an interface between native code produced by GraalVM native-image and native code produced by Scala Native. But that would be cumbersome, as you would need to use raw C strings and C structs to pass information between these native code parts. OTOH if you were able to expose a C ABI in picocli then it would be usable not only from Scala Native but also from any other language including C, C++, Rust, C#, Python and so on. OTOH (how many hands do I have?) that would be weird, time-consuming for picocli authors and maybe not worth the effort in the long run?

Overall, the situation seems complicated and I don’t have enough insight into it as I’m only a bystander.

I guess that if it were feasible to recompile a random jar, it would have been done a long time ago for scala.js.
