Scala Native Next Steps

In addition to GraalVM native image, there is also a proposal for an OpenJDK project, “Project Leyden”, which, if it goes ahead, would seem to target a somewhat similar space.

“And that ecosystem would still be 5-10 years behind what Go and Rust offer right now”.

The advantage of Scala over Rust would be that it is easier to program in. The problem with Rust is its borrow checker, which can make programming challenging. I have programmed in Rust and find it comparable to door-to-door combat: you take one building at a time, often have to backtrack when facing the wrath of the borrow checker, and then strategize about a new approach.

Klaus


I don’t know about Project Leyden, but the GraalVM native image only addresses some of the things that Scala Native can do. Basically, the native image gives you a native executable of a standard JVM program. That means you are limited to what the JVM limits you to. You can’t do system programming or make optimizations based on memory locality in a native image. A quick look at an article on Project Leyden suggests it has the same goal as GraalVM Native Image: both allow for quick startup because there is no VM to spin up. Outside of that, though, it doesn’t look like they offer anything else. That’s far from what Scala Native provides.

Disclaimer: I’ve never used GraalVM, but the compiler options look like it can link against LLVM bitcode and lots of other languages, hence I would have thought that it could be used for systems programming.

Having a page that objectively compares the pros and cons of Scala Native vs GraalVM native image would probably be interesting and helpful for engineers trying to evaluate these technologies.

I didn’t look all that closely, but scanning https://www.graalvm.org/docs/reference-manual/native-image/ doesn’t make me think they are equivalent. Even if you can make calls to C libraries through the linker, that’s not really enough to do system programming unless they introduce new types for storing pointers. Scala Native has a Ptr[A] type that lets you really interact with memory at a low level. I haven’t seen anything indicating that GraalVM has anything similar. It would be more like making JNI calls on the JVM than actually having low-level access to the machine.

Java has http://openjdk.java.net/projects/panama/ for dealing with native libraries and native memory, and http://openjdk.java.net/projects/valhalla for flattening objects’ representation (i.e. reducing pointer chasing). The part of Project Panama that deals with low-level memory access is already incubating as part of http://openjdk.java.net/projects/jdk/14/. The documentation for https://openjdk.java.net/jeps/370 is here: https://download.java.net/java/GA/jdk14/docs/api/jdk.incubator.foreign/module-summary.html. IMHO there’s a high chance that this will be supported by GraalVM eventually.

I’m not sure about the system programming part, but there are already GraalVM native-image-compatible libraries like https://github.com/remkop/picocli that facilitate writing AOT-compiled command line tools.


Exciting news! As mentioned above, picocli offers tooling to make it easy to create GraalVM native images for picocli-based command line applications. I would like to learn more to see how picocli can be improved to also make it easy for people to write Scala Native command line applications with it.

Is there a link describing the requirements a library must meet to be usable in Scala Native? Looking at the reflective instantiation support in the OP, it appears that classes need to be annotated with @EnableReflectiveInstantiation before they can be reflectively instantiated in a Scala Native app.

Picocli uses reflection internally to instantiate subcommands and other components. Some of these may be written in Scala and can be annotated with @EnableReflectiveInstantiation, but some of them (like a number of picocli built-in components) are written in Java and do not have this annotation.

So, I hope there will be another mechanism separate from this annotation to register classes that need to be instantiated reflectively. If these requirements are documented somewhere, library authors can prepare to make it possible for their libraries to be used in Scala Native.
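One reflection-free shape such a mechanism could take is an explicit registry, in which each component is registered by name together with a factory function. The sketch below is only an illustration of that idea in plain Scala. It is not a Scala Native or picocli API, and `CommandRegistry`, `Command`, `register`, and `create` are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical sketch: none of these names come from Scala Native or picocli.
object CommandRegistry {
  trait Command {
    def run(args: Seq[String]): String
  }

  // Maps a command name to a plain factory function, so no reflection is involved.
  private val factories = mutable.Map.empty[String, () => Command]

  // Called once per component, for example from the application's start-up code.
  def register(name: String)(factory: () => Command): Unit =
    factories.update(name, factory)

  // Builds a fresh instance, or returns None when nothing was registered under that name.
  def create(name: String): Option[Command] =
    factories.get(name).map(factory => factory())
}
```

A component would be registered with something like `CommandRegistry.register("hello")(() => new HelloCommand)`, where `HelloCommand` is again a made-up class. The trade-off is that every instantiable class has to be listed explicitly, and the map is not thread-safe as written.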


IIUC Scala Native requires all of the source code to be written in Scala. You can’t just put ordinary JARs on the classpath and use Scala Native to compile them to LLVM IR and then native executables. I don’t know if there are any workarounds or plans to change that situation.

I’d like to say: do not build things with the aim of overtaking or competing with other alternatives. Scala does not compete with Java or Kotlin (in any regard really). Scala.js does not compete with TypeScript or JavaScript. And Scala-Native should not try to compete with alternatives.
There’s no prize to win by competing, and there’s a lot you could lose by trying.
If you need a reason to choose Scala-Native over the alternatives, it’s simply that it is Scala, and the alternatives aren’t.

Having worked with Rust, I can tell you that it sucks (for a Scala programmer). It’s terribly inexpressive, it’s constantly getting in my way and making me make small performance decisions every step of the way, oh, and it totally gave up on the concept of abstractions and just forces a text processor on me (their macro system, which is used for everything really, since the base language lacks abstraction capabilities).


Scala Native itself doesn’t need built-in support for TLS to be useful in server development; it’s possible to build on top of existing servers like NGINX Unit https://unit.nginx.org/

If we are going to pick any language where Scala Native is “competing” (and I don’t think such discussions are fruitful anyway, as covered by @rcano), I would rather pick Python instead of Go or Rust. Having worked full-time with Python, I think there are a lot of applications using Python today where Scala would be better suited.

I am super excited about GraalVM native-image and I’m already heavily using it in several projects both in open source (Scalafmt) and at work. However, the developer experience with native-image today is not great: it takes a long time to link a binary resulting in slow edit-and-test feedback loops and there are many native-image configuration flags that tweak the runtime semantics, which requires a decent upfront investment to learn. In comparison, Scala Native binaries link significantly faster and you can test them just like you would test a JVM application giving me better confidence that the binary works as expected at runtime.

I think native-image is revolutionary technology and a huge boon to the Scala/JVM ecosystem. I’m also just as excited about the applications for Scala Native. These are not contradicting views.


Just to pick up on this, I work on a lot of scripts, mostly Python.

Python is great, but inevitably you want to do something in parallel: whether parallel HTTP requests, maybe parallel building some docker containers, parallel auto-formatting some source files, whatever.

This is the point at which all hell breaks loose. Multi-threading in Python doesn’t work well, and multi-processing in Python is clunky and fragile. multiprocessing.pool breaks Ctrl-C and makes scripts unkillable unless you jump through hoops with Ctrl-Z and kill -9, and the -9 is often required. Many of these scripts are slow (hence the parallelization) and it’s not uncommon that a script will blow up with a NameError or TypeError after several minutes of execution, with the stack trace mangled due to multiprocessing. Deployment is often a pain: even with tools like PEX to try and make somewhat-hermetic executables, someone will inevitably brew install something that messes up the global python install.

Some of these tools end up being ported to Scala. This works, but it certainly is inconvenient to take a large codebase that was started in Python when it was small and short-running, and port it to Scala just because it has become large and slow.

This awful experience is the status quo for a lot of our important scripts and automation. They’re not written in Scala because the JVM startup overhead and resource footprint are too much, but by no means is the Python experience “good”. It’s acceptable, and we’re surviving, but there’s a real opportunity for Scala-Native to come in here with nice Futures-based parallelism, proper multithreading, easy collections transformations, and type checking to catch dumb errors.
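For readers who have not seen it, here is a minimal sketch of what Futures-based parallelism combined with collection transformations looks like in standard Scala. `ParallelLengths` and `slowLength` are hypothetical stand-ins for a real slow step such as an HTTP request or a formatter run. The snippet uses only the standard library and behaves as described on the JVM; as noted elsewhere in this thread, Scala Native had no multi-threading support at the time, so the same code would not have run in parallel there.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelLengths {
  // Hypothetical stand-in for a slow step such as an HTTP request or a formatter run.
  def slowLength(item: String): Int = {
    Thread.sleep(50)
    item.length
  }

  // Starts one Future per item, waits for all of them, and pairs each item with its result.
  def lengths(items: Seq[String]): Map[String, Int] = {
    val jobs: Seq[Future[(String, Int)]] =
      items.map(item => Future(item -> slowLength(item)))
    Await.result(Future.sequence(jobs), 30.seconds).toMap
  }
}
```

A typo such as `item.lenght` is rejected at compile time, rather than surfacing as a runtime error several minutes into a run.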

Scala-Native could provide a real alternative to writing scripts in Python, and perhaps be a lot more ergonomic than the other alternatives, namely writing your command line tools in Go or Rust (which is also increasingly common).


Thanks for the replies and points of view, @rcano and @olafurpg.

Regarding competition: I’m also not a fan of overly competitive thinking when it comes to languages. From an idealistic viewpoint, I would like to see Scala Native succeed and I could just be happy that there’s funding for a somewhat risky project.

That said, I am also realistic enough to know that resources are limited. Resources are Scala Center’s developer resources, but also public attention and momentum that are required to drive such a project.

Ultimately, the only way (for anything) to survive is to find its (probably temporary) niche. The statement at the top about why Scala Native is relevant and what is missing and will be done is too fuzzy to be useful. You could say this is a classic project management problem. If you want such a project to succeed you will first have to admit there’s an insurmountable amount of work ahead of you, and then you will need to have a very specific idea how to arrive incrementally at something useful as quickly as possible. The statement gives a reasonable roadmap for the short term but doesn’t say what reasonable short- and mid-term goals are and how to reach them.

My assumption behind this is that Scala Native currently isn’t useful (enough). Here again the question is whether you look at “being useful” as an isolated absolute concept or relative to what’s going on elsewhere. Let’s take the stance, that it is useful if, given a systems programming task, you would choose Scala Native over anything else. Let’s look at a few scenarios:

  • A simple short-running command line application (dealing mostly with file IO and nothing else, no): yes, Scala Native is applicable even now, you can program in Scala, basic APIs are there
  • A short-running command line application that is CPU intensive: not so much, since there’s no support for multi-threading to spread the work
  • A tool that connects to other services over the network: not so much, since protocol implementations are missing
  • A low-level tool that interfaces with the kernel (basically standard C usage) or C libraries: somewhat applicable and performant but inherits all the bad parts from C (memory unsafety, clunky APIs, etc.) while even missing concurrency primitives for the Scala parts. If you compare that with just doing the same using Scala on the JVM using JNA as FFI, you don’t win much. Using JNA is somewhat uncommon but it works on the JVM if you don’t have the utmost performance or resource requirements. With native-image, you can also solve these with SubstrateVM’s FFI.
  • A bigger, long-running application: plus points for being able to write in Scala, minus points for missing multi-threading and missing connectivity. Scala on the JVM already provides all these features with good to perfect performance characteristics (but with the well-known bloat wrt startup times and memory usage)

Which use cases are missing and which of these use cases are the primary target for Scala Native?

For me personally, the current answer to whether to use Scala Native is “no” (aside from playing around) because there’s nothing useful enough I can do with it. I have a project for quick Linux memory analysis (doing some custom aggregation) that would come closest. That currently runs a few seconds per invocation. But even there, most of the runtime is spent because its functionality (parsing files in /proc) is not optimized to be fast. Despite it being a short-running program, the JVM-imposed runtime cost is there but not relevant enough to try hard to get rid of it.
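To give a feel for that kind of workload, here is a small sketch of regex-based parsing plus a simple aggregation. It is not the project’s actual code: the choice of the /proc/meminfo line format, the object name `MemInfo`, and the “used” calculation are all assumptions made for illustration.

```scala
// Hypothetical sketch, assuming input in the /proc/meminfo format
// ("MemTotal:       16384256 kB"); not the code of the project described above.
object MemInfo {
  // Captures the field name and its value in kB; lines without a kB unit are skipped.
  private val Line = """^(\w+):\s+(\d+)\s*kB$""".r

  def parse(text: String): Map[String, Long] =
    text.linesIterator.collect { case Line(key, kb) => key -> kb.toLong }.toMap

  // Example aggregation: memory in use, if both fields are present.
  def usedKb(fields: Map[String, Long]): Option[Long] =
    for {
      total <- fields.get("MemTotal")
      available <- fields.get("MemAvailable")
    } yield total - available
}
```

On a Linux machine the input text could be read with `scala.io.Source.fromFile("/proc/meminfo")`. The sketch takes a string so that the parsing can be exercised without touching the file system.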

So, I guess what I’m asking for are more concrete ideas about applicability and then also some data that would support those ideas.

There’s not only servers… And also, just having a C-API doesn’t make it a ready-to-use library for Scala Native.

If you are coming from Python you could also use Scala on the JVM, couldn’t you?

Yeah, native-image times are really bad (but that’s an ongoing struggle for any compiled language). It would be interesting to see some apples-to-apples comparisons for that particular aspect (but, of course, it’s also somewhat expected that Scala -> native translation can be faster than going via bytecode first; the question is whether the downsides are worth it).


After I wrote this I just saw this article about startup time improvements with native-image: https://medium.com/graalvm/static-compilation-of-java-applications-at-alibaba-at-scale-2944163c92e. So, I realize that my impression of the JVM-startup being fast enough for most scripts is rather biased by using a (7+ year old but still) reasonably fast laptop where there’s usually enough spare cores available to run the JVM JIT and GC in parallel. Other deployment scenarios where you have to pay (in latency and money) for these extra resources like FaaS, slimly deployed containers or VMs, or embedded or small devices might not have that luxury and would benefit a lot more from more efficient startup.

  • a number crunching application that runs on wasm either in the browser or on the desktop

I had a quick try with that project (200 lines of code, uses scala collections, file IO, and regular expressions, no dependencies). Here’s a comparison:

  • AdoptOpenJDK 8, 2.5 seconds (4.7s CPU time)
  • Graal native-image JDK 11 20.1.0, 2.29 seconds (same CPU time)
  • Scala Native 0.4.0-M2: 20-50 seconds depending on settings (many settings fail because of a memory leak)

Build times:

  • scalac + assembly: 5s
  • native-image, on top of above: 30s
  • Scala Native: 13s in debug mode (leading to >50s runtime), 200 - 300s in release-full mode (20s runtime), 30s in release-fast mode (which runs until the machine runs out of memory)

In the best case, Scala Native’s results are due to a problem with the regex implementation. As far as I got with profiling Scala Native code, much time is spent in GC which might be a problem in itself or a consequence of a potential memory leak.

In any case, I find this statement somewhat optimistic…


IIUC Scala Native requires all of the source code to be written in Scala. You can’t just put ordinary JARs on the classpath and use Scala Native to compile them to LLVM IR and then native executables. I don’t know if there are any workarounds or plans to change that situation.

Not being able to include JARs that are not written in Scala in a Scala Native app is a massive limitation… Bit of a show-stopper, that…

Can’t we decompile a JAR to Scala and then compile to LLVM IR? That is a bit tongue-in-cheek, but surely there are some bytecode patterns that cause problems and others that are fine. Libraries that adhere to certain rules could be included that way.

If there is no way around this limitation, one idea that could offset this (in terms of value proposition) would potentially be cross-compilation, as GraalVM does not have that (and it is not on the road map; the GraalVM team seems to have given up on the idea of cross-compilation).

I’m not an expert on the Scala Native compilation process, but AFAIK Scala Native acts as a compiler plugin for the Scala compiler and collects some vital information long before Java bytecode and metadata are emitted into *.class files. I think bending Scala Native to do decompilation is probably not an easy path.

I’m not sure how cross-compilation is relevant here, but:

  • AFAIU GraalVM native-image can compile bytecode produced by Scala, but that would bypass Scala Native entirely.
  • SubstrateVM from the Graal project has its own garbage collector and object representation. Scala Native also has its own garbage collector and object representation. Thus sharing normal objects between them seems impossible.
  • However, both GraalVM and Scala Native provide an FFI (foreign function interface) which could be used as an interface between native code produced by GraalVM native-image and native code produced by Scala Native. But that would be cumbersome, as you would need to use raw C strings and C structs to pass information between these native code parts. OTOH if you were able to expose a C ABI in picocli then it would be usable not only from Scala Native but also from any other language including C, C++, Rust, C#, Python and so on. OTOH (how many hands do I have?) that would be weird, time consuming for picocli authors and maybe not worth the effort in the long run?

Overall, the situation seems complicated and I don’t have enough insight into it as I’m only a bystander.

I guess that if it were feasible to recompile a random JAR, it would have been done a long time ago for Scala.js.


I have very mixed feelings regarding Scala Native, Scala.js and any other attempt to target a platform other than the JVM.

Let me say I am very impressed by these efforts and what they have achieved, and I’m sure the people behind it are very smart and diligent.

If these were just experimental research projects, everything would be fine and there would be no problem.

However, these projects are already creating pressure to make the Scala language and the Scala standard library more “platform agnostic”, and this is where such efforts become a liability.

One of the strongest selling points for Scala is that there is a huge ecosystem of Java libraries that we can easily integrate. Well, if we compile Scala to bytecode and run it on the JVM, that is. Neither Scala Native nor Scala.js allows using Java libraries in general; in fact, they do not even support some of the most popular parts of the Java standard library. I have always used Java libraries heavily, so that is a total showstopper.

The lack of support for Java libraries is due to fundamental obstacles. Scala, Java and all the other JVM languages all share a common set of design principles including automatic garbage collection, customizable classloading, separate compilation, generics through type erasure (but not for arrays!), lack of direct memory access, and reflection. These principles are an adaptation to running on the JVM, and they do not make sense for targeting another platform.

On the JVM, everything is at runtime either a primitive value or an object that has a getClass method, or an array. Much code relies on this, but it is true on no other platform.

For this reason, much that normally works in Scala (i.e. on the JVM) will never work in Scala.js or Scala Native or any other attempt to compile Scala to another platform. Even things that do work on other platforms will often be nothing more than a fragile and leaky abstraction forcing the user to be aware of the underlying implementation.
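To make the erasure and `getClass` points above concrete, here is a small check of what the JVM reports at runtime. The object and value names are made up for illustration, and the commented values describe JVM behaviour only; what Scala.js or Scala Native report for the same expressions is not something this thread states.

```scala
object RuntimeClasses {
  // Generic type arguments are erased: both lists report the same runtime class.
  val listOfInt: String = List(1).getClass.getName
  val listOfString: String = List("a").getClass.getName

  // Arrays are the exception: the element type is part of the runtime class.
  val arrayOfInt: String = Array(1).getClass.getName      // "[I" on the JVM
  val arrayOfString: String = Array("a").getClass.getName // "[Ljava.lang.String;" on the JVM
}
```

Code that branches on such runtime classes is exactly the kind of thing that depends on the JVM’s object model.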

I have been using Scala for seven years and I tried Scala.js, but next time I would rather use JavaScript than Scala.js. Should I ever need native code, I suspect I would rather use C++ than Scala Native.

To Scala.js and Scala Native: keep up the awesome work, but please do not expect Scala to become “platform agnostic”. Scala will only work well on the JVM for the foreseeable future, so that should be the priority when it comes to designing the Scala language or the Scala library.

@curoli:
I disagree. Stating that targeting native environment is somewhat against Scala design principles is like stating that GraalVM’s native-image is against Java design principles, so it doesn’t make sense to go for native-image.

Let me address some of the issues:

  • garbage collection is present on the JVM, in JavaScript, and in some languages typically compiled to native code (e.g. Go and Haskell).
  • classloading customizations aren’t usually done inside applications (i.e. I’ve never seen anyone going for classloading gymnastics in a deployed application). SBT uses some classloading tricks, but SBT works for Scala, Scala.js and Scala Native already.
  • type erasure exists when e.g. translating TypeScript to JavaScript, but TypeScript is taking over the frontend world anyway.
  • specialized arrays exist in JavaScript too - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray
  • direct memory access is lacking not only on the JVM but also in JavaScript and in some languages typically compiled to native code (Go, Haskell).
  • reflection is partially supported for Scala Native, Scala.js and AOT compilation using GraalVM’s native-image. In all cases it needs some upfront configuration, but that’s still probably better than the situation in C++.

There are plenty of Scala libraries compiling with Scala.js already, and it doesn’t seem to me that Scala.js slows down Scala language evolution substantially. Browser-based applications also usually do not need functionality typical for the backend. For example, you don’t use JDBC or server sockets in the frontend, because of the sandboxed environment. Even if some apps that compile under Scala on the JVM can’t be compiled under Scala.js, it doesn’t mean Scala.js is pointless. Microsoft created Blazor WebAssembly, which allows you to run C# on the client side, and there’s hype in the .NET community. But not all code can run under Blazor. Does that make Blazor pointless?
