Scala Native Next Steps

I have very mixed feelings regarding Scala Native, Scala.js and any other attempt to target a platform other than the JVM.

Let me say I am very impressed by these efforts and what they have achieved, and I’m sure the people behind it are very smart and diligent.

If these were just experimental research projects, everything would be fine and there would be no problem.

However, these projects are already creating a pressure to make the Scala language and the Scala standard library more “platform agnostic”, and this is where such efforts become a liability.

One of the strongest selling points for Scala is that there is a huge ecosystem of Java libraries that we can easily integrate. Well, if we compile Scala to bytecode and run it on the JVM, that is. Neither Scala Native nor Scala.js allows using Java libraries in general; in fact, they do not even support some of the most popular parts of the Java standard library. I have always relied heavily on Java libraries, so that is a total showstopper.

The lack of support for Java libraries is due to fundamental obstacles. Scala, Java and all the other JVM languages all share a common set of design principles including automatic garbage collection, customizable classloading, separate compilation, generics through type erasure (but not for arrays!), lack of direct memory access, and reflection. These principles are an adaptation to running on the JVM, and they do not make sense for targeting another platform.

On the JVM, everything is at runtime either a primitive value or an object that has a getClass method, or an array. Much code relies on this, but it is true on no other platform.
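
To make the erasure point concrete, here is a minimal sketch for the JVM. The object and value names are made up for illustration. It compares the runtime class names reported by getClass: two lists with different element types share one runtime class, while two arrays with different element types do not.

```scala
object ErasureDemo {
  // Generic collections lose their element type at runtime (type erasure).
  val ints: List[Int] = List(1, 2, 3)
  val strings: List[String] = List("a", "b")
  val sameListClass: Boolean =
    ints.getClass.getName == strings.getClass.getName

  // Arrays are the exception: the element type is part of the runtime class.
  val intArray: Array[Int] = Array(1, 2, 3)
  val stringArray: Array[String] = Array("a", "b")
  val sameArrayClass: Boolean =
    intArray.getClass.getName == stringArray.getClass.getName
}
```

On the JVM, sameListClass is true and sameArrayClass is false. Code that inspects runtime classes this way is the kind of code that depends on the platform behaving like a JVM.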

For this reason, much that normally works in Scala (i.e. on the JVM) will never work in Scala.js or Scala Native or any other attempt to compile Scala to another platform. Even things that do work on other platforms will often be nothing more than a fragile and leaky abstraction forcing the user to be aware of the underlying implementation.

I have been using Scala for seven years and I tried Scala.js, but the next time I will rather use JavaScript than Scala.js. Should I ever need native code, I suspect I would rather use C++ than Scala Native.

To Scala.js and Scala Native: keep up the awesome work, but please do not expect Scala to become “platform agnostic”. Scala will only work well on the JVM for the foreseeable future, so that should be the priority when it comes to designing the Scala language or the Scala library.

@curoli:
I disagree. Stating that targeting a native environment is somewhat against Scala design principles is like stating that GraalVM’s native-image is against Java design principles, so it doesn’t make sense to go for native-image.

Let me address some of the issues:

  • garbage collection is present on the JVM, in JavaScript, and in some languages typically compiled to native code (e.g. Go and Haskell).
  • classloading customizations aren’t usually done inside applications (i.e. I’ve never seen anyone going for classloading gymnastics in a deployed application). SBT uses some classloading tricks, but SBT works for Scala, Scala.js and Scala Native already.
  • type erasure exists when e.g. translating TypeScript to JavaScript, but TypeScript is taking over the frontend world anyway.
  • specialized arrays exist in JavaScript too: see TypedArray (documented on MDN).
  • lack of direct memory access is present on the JVM, in JavaScript, and also in some languages typically compiled to native code (Go, Haskell).
  • reflection is partially supported for Scala Native, Scala.js and AOT compilation using GraalVM’s native-image. In all cases it needs some upfront configuration, but that’s still probably better than the situation in C++.

There are plenty of Scala libraries compiling using Scala.js already and it doesn’t seem to me that Scala.js slows down Scala language evolution substantially. Browser-based applications also usually do not need functionality typical for the backend. For example, you don’t use JDBC or server sockets in the frontend, because of the sandboxed environment. Even if you can’t compile under Scala.js some apps that compile under Scala JVM, it doesn’t mean Scala.js is pointless. Microsoft created Blazor WebAssembly, which allows you to run C# on the client side, and there’s hype in the .NET community. But not all code can run under Blazor. Does that make Blazor pointless?

I strongly disagree. Scala.js brought to light the general power of the language, and the ways in which the language is not platform-dependent.

Plain and simply: I’ve been working in JavaScript literally since before it had that name. I also built one of the first production UIs in Scala.js. With that context: using JavaScript libraries from Scala.js works far better than doing so from JavaScript itself – it’s easier to use, and results in vastly more maintainable code. It’s the best programming language I’ve found for serious browser application work.

That’s precisely because we have a reasonably clean distinction between the language and the integrations. Scala on the JVM works well with JVM libraries. Scala in the browser works well with JavaScript libraries. Trying to paint this as an either/or is, IMO, an unnecessarily restrictive lens…

@jrudolph, why don’t you see the obvious benefits of having really good Scala Native support?

When you start creating an application, you write it like a script. Scala has the same light syntax as Python, so you create something really fast that already works great, let’s say on the JVM.

Then some part needs to be visualised, and you use Scala.js for that part. You don’t need to learn a new language or a JavaScript framework like Angular or React, and you don’t need to rewrite your code in JavaScript or TypeScript or whatever. You save a lot of time! Using the Scala.js jQuery wrapper makes this super easy.

You then need some super high performance processing of the same data, and you use Scala Native for that part. You don’t need to learn a new programming language like Rust or Go (although maybe you have to look at the first few chapters of the excellent Scala Native book).

Well, there is no multithreading at the moment, but if the work is parallelizable or batchable, you can spin up these parts in separate processes (that is really easy to do) and merge the results in the end. There are plenty of examples where this would be a valuable approach.
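
As a rough sketch of that fan-out idea: the snippet below starts one OS process per command and then merges their outputs. It is written against the plain java.lang.ProcessBuilder API as it exists on the JVM; whether it runs unchanged on a given Scala Native version is an assumption. The names ProcessFanOut and runAll are made up for illustration.

```scala
import scala.io.Source

object ProcessFanOut {
  // Start every command as its own OS process first, so they all run
  // concurrently, then collect each process's standard output in order.
  def runAll(commands: Seq[Seq[String]]): Seq[String] = {
    // toList forces eager evaluation, so all processes are started up front.
    val started = commands.toList.map(cmd => new ProcessBuilder(cmd: _*).start())
    started.map { p =>
      val out = Source.fromInputStream(p.getInputStream).mkString.trim
      p.waitFor() // wait for the child to exit before moving on
      out
    }
  }
}
```

For example, on a Unix-like system `ProcessFanOut.runAll(Seq(Seq("echo", "1"), Seq("echo", "2")))` returns the two outputs, which the parent can then combine. Error output and exit codes are not handled here; a real batch job would need both.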

Being able to use jars in Scala Native is not very relevant, is it? Have you ever looked at all the amazing libraries being created in C or C++? Why would you want to use jars when you can use superfast low-level C APIs? I guess it would also be very easy to integrate with Rust and Go libraries if they have a C API available.

Not to forget the ability for Scala to stand on its own feet in the future, without being dependent on the JVM or needing to pay license money for the Oracle JVM or Dalvik.

I truly believe that Scala.js and the Scala Native platform will be what brings Scala forward after Dotty has been launched.

Couldn’t have been said better! Working with Scala.js in the browser is amazing :)

Hi All,

I don’t know much about this topic, but I have been thinking about it in an abstract way for a while (planning on learning more)!

I was actually thinking about the old Java motto:

Write once RUN everywhere

Which now (in the world of JS, GWT, Java, Scala (JS, JVM, Native) & Kotlin (JS, JVM, Native)) might be how to:

Write once and IMPORT / USE everywhere

Do you all think we could build some sort of standard in the larger community (IETF?, Java, Scala, Kotlin) to facilitate this daydream of:

Write once and IMPORT / USE everywhere

Cheers,

Scott

I disagree. I also use Scala 99% on the JVM, but in the few cases where I actually needed code to run in the browser, Scala.js was such a blessing; it worked quite seamlessly (and that’s a few years back), and I would have failed miserably if I had to use JavaScript.

Again, I would disagree. At least from what I saw of Scala.js, it was working very well. I don’t think that the “Java library ecosystem” is as important in general as you portray. Sure, if I analyse a larger application of mine, at the very bottom, there are probably still some Java libraries, but the huge majority of dependencies is already 100% Scala. I also don’t think you want to write the same kind of apps for Scala.js and SN as the ones for the desktop on the JVM. Even very large projects such as Akka / Akka Stream now work under Scala.js.

To get that also out of the way: I hope I don’t sound dismissive of the original efforts put into Scala Native by Denys and the contributors. Without this huge chunk of work, there wouldn’t even be something to discuss here.

And to phrase something I said before a bit differently: I would very much like to see more ways to write low-level code with Scala (but it doesn’t necessarily have to be Scala Native).

That said … I don’t think the comparison to Scala.js holds any water. The thing with the JVM and the JavaScript ecosystem is that they already provide the platform. These platforms are huge and have seen decades of refinements and improvements. Even if you don’t use many third-party libraries, the basic infrastructure which is there is already huge. Maybe it’s too easy to dismiss the fact that it’s not trivial to write code that can run on so many architectures from exactly the same source code. And this is not just the status quo: those platforms are constantly evolving. All these improvements come to Scala mostly for free.

Scala Native cannot give you this kind of platform. In the best case, you have the (huge) pool of C libraries available, but you only get the same kind of cross-architecture and OS benefits if you only ever rely on cross-platform APIs and recompile and redistribute for every target platform (and sometimes subdivisions of those, like Linux distributions or OS versions).

In all of the platforms, there’s some mismatch between what the platform natively offers (JDK, C APIs, JavaScript APIs) and idiomatic Scala APIs. So, you will usually need some binding/wrapper libraries. This sounds similar for the JVM, JS and native universes, but there are huge differences. The biggest difference is memory safety. This is huge. It is really difficult to write safe bindings that wrap unsafe pointers on a GC’d platform. This is far different from providing just an idiomatic wrapper around a JDK or JS library.

My impression is that there are, at the same time, overblown hopes that Scala Native can be a viable platform for building some kinds of tools. What exactly is often kept in fuzzy terms; the hope stated most frequently is writing “high-performance” code, but that’s perfectly possible already now on the JVM. On the other hand, the maturity of Scala Native is overstated. This is a common marketing strategy and it’s fine for that; sometimes that’s the only way to get funding. What I’d like to see is a kind of reality check which points out the exact niche where Scala Native could thrive right now, then what kind of work is needed to make it work in that niche, and then, more generally, what is needed to fit more applications.

Right now Scala Native is not stating a lot of caveats explicitly which would curb those hopes:

  • It only works for certain target architectures (fewer than those supported by JVMs).
  • Startup can be faster (and more power efficient) on slow machines with only a single core available, but that only matters if the application runs for less than 1 second (or so).
  • Performance in benchmarks seems great, but generally it’s more likely to be slower than running on the JVM.
  • A comparatively small part of Scala libraries can be cross-built because the common platform between JVM and Native is too small. In general, you cannot expect an application to run on JVM and Native mostly unchanged.
  • Multi-threading is only supported in a coarse-grained way by starting new processes and using IPC for synchronization. Latency for spawning new processes is orders of magnitude higher than for starting new OS threads or green threads (like tasks on a JVM thread pool).
  • There’s only a small set of GC implementations that are unlikely to perform better than what the JVM has to offer. If your application is memory intensive, you will have to fall back on unsafe native memory operations.
  • If the base library doesn’t support the tools for your application, you will have to interface through C APIs. These are unsafe and you will have to know how to debug segfaults.
  • The tooling around debugging and profiling is spotty.
  • Scala Native currently only supports Scala 2.11. This is completely irrelevant for current applications as Scala 2.11 and 2.13 frontends and standard libraries are mostly identical. It only matters if you want to use cross-compiled libraries. Supporting a new frontend like Dotty will be a huge effort.
  • Scala Native can only be more memory efficient than the JVM if you do manual memory management. But as long as Scala doesn’t provide language primitives to make that memory management safe (mostly preventing unsafely allocated references from escaping scopes), this will be inherently unsafe. There are few approaches that have managed to find intermediate solutions that are not either unsafe (C), somewhat inefficient (JVM, any GC language), or somewhat cumbersome (C++ smart pointers, Rust borrow checker). You could say that at least Scala Native gives you those tools to use unsafe memory access when you need it for performance, but after all, the JVM does the same using Unsafe and almost no one uses it.
  • One particular performance issue (and arguably the only significant one, aside from startup issues) with the JVM is the difficulty of making use of SIMD (like AVX instructions or CUDA). This is an area where Scala Native could theoretically provide good solutions more easily (because it would be cheaper to run native SIMD computation kernels when you are already running natively, than when you first have to get out of the managed environment, which on the JVM is only possible using JNI or intrinsics, or if you have to rely on the JIT to compile to SIMD primitives).
  • Compilation times are far higher than for Scala on the JVM if you need decent runtime performance

The current niche for Scala Native is really narrow. If anything, I’d like to hear reports of people who have used Scala Native successfully and state why the niche is attractive. Many will have run into one of the above limitations, so it would be good to have high-level descriptions of what the most pressing issues are.

In fact, I have the suspicion that many people would like to use Scala Native but haven’t done so far successfully because of stumbling over just one of the above issues. Paradoxically, to me it seems, hopes are even more inflated because people currently aren’t able to actually run significant applications, so they don’t really experience the huge gap between what’s needed and what’s there.

We’ve currently been using Scala Native for research.

Our goal is to compile very fast database implementations from high-level Scala specifications, using metaprogramming. Traditionally, at the end of the pipeline we’d have generated plain C code, compiled with a C compiler. But it’s a lot of work to translate everything the implementation needs to C.

In this context, Scala Native has been a huge boon to productivity, and has allowed things we would not even have considered before, such as allowing queries to use unrestricted Scala idioms. It all gets compiled down to Scala Native, where the DB implementation parts use low-level code with C-like performance, and the higher-level parts from the user’s queries are integrated within the low-level parts thanks to it all compiling via LLVM.

Can’t say for sure this would scale to a production-ready system, but I don’t see a strong reason why not. Though proper support for multi-threading would probably help a lot.

Stating first that SN is not mature yet, and then asking for proof of where SN has been used successfully, is kind of contradictory, I think. You cannot ask for advanced showcases if the platform is not yet stable.

Also, regarding high performance: there are other factors, such as low latency and predictable performance. That’s why you find almost no mature real-time audio or video platform on the JVM, but many based on C, C++ and Rust. Another factor is memory footprint. Small computers like the Raspberry Pi are getting better now, but on the not-so-old Raspberry Pi 3 with 1 GB of RAM I very quickly ran into trouble running more complex Scala applications. The JVM does eat a lot of memory.

It’s also worth noting that “mature” is a process.

I mean, Scala.js was a mess in the early days: it functioned, but performed so poorly that you couldn’t imagine doing anything serious with it. It took years of hard work for the team to optimize it to be the crisp competitor to native JS that it is now. And it wasn’t practical to use until Haoyi threw his weight behind it, porting his existing tools to it and building new ones to make it usable. I came in as a user just as that was starting to gel, and bet that things would continue to get better – fortunately, that proved absolutely correct.

It’s still relatively early days for Scala Native at this point. I honestly don’t know whether it will turn out to be a serious player in the down-at-the-metal world. But I do know that Scala.js looked pretty implausible in the early days, and has IMO been a real success story, so I’m content to root for the project and hope that it shows itself to be worthwhile.

I agree with all the other comments made by @LPTK, @Sciss, and @jducoeur. When you see how incredibly well Scala.js has evolved, I don’t see why this success should not work out similarly for Scala Native (I think it already has to some extent).

You ask about a niche for Scala Native. For me there is no need for any niche. The major thing for me is that I can write low-level code and still use Scala. I want to be able to manually handle memory using pointers for some part of my application. I want to be able to reuse my models when necessary, also in Scala Native, as I do in Scala.js. Using sun.misc.Unsafe is a poor hack in comparison; I would guess that is why nobody uses it, except for special-purpose software.

I would much rather use Scala Native. Working through the first few chapters of the amazing Scala Native book, I get exactly the same feeling of control that I got when I first used Scala.js, compared to using JavaScript. I learned JavaScript, jQuery and then Angular before I learned Scala. I have done a bit of coding in C. The pointer stuff is much easier to understand and use in Scala Native.

You don’t need to wrap entire C libraries to use Scala Native, only the parts you need, just as we did in the beginning with Scala.js. It is very similar.

Time will show if people will embrace this initiative. I am pretty sure it will :)

Good observation :) For one, there’s always the possibility that I’m just wrong and so I’m asking for substantiated counterarguments. I googled around a bit and found only the tindzk/awesome-scala-native list on GitHub (a compilation of Scala Native resources and libraries) and the Scaladex page. As usual you will find more libraries than end-user applications of any tech, so that’s why I asked (and I hope there’s some part of Scala Native that works, even if it’s not the one that I tried).

Other than that, “maturity” is not clearly defined, which invites people to overestimate the generality of that statement.

Good point. I only recently got started with Rust because of experimenting with real-time audio stuff. For comparison, afaics there are two major factors that affect latency and predictability on the JVM:

  • GC pauses
  • JIT non-determinism and resulting performance

Do you have any information on whether SN’s GC is optimized for low pause times? The general advice for low-latency applications on GC platforms is to be allocation free, so that GCs are avoided altogether. The same must be said for native applications, since native heap allocation is often even more expensive than allocation in a GC-managed heap (but, admittedly, doesn’t trigger excessive pauses or need twice as much memory).

JIT non-determinism means that you can get different performance profiles on different runs. This makes work on performance optimizations harder on the JVM. A general problem with ultimate performance is that you need to work close to the metal for performance-critical work, i.e. you need to understand the transformations the compiler does (or write machine code directly). Scala on the JVM probably has more layers between the high-level language and the executed machine code than Scala Native but it’s not a given that you can predict performance from Scala code on SN for the last bits of performance.

So, I agree, it would be nice to have low-latency and predictable performance but we should treat that as an explicit feature and without explicit work on it, it’s not a given that Scala Native provides it.

Thanks for sharing. Sounds like an interesting project. Have you tried running on the JVM or with native-image?

Regarding other comments, there seem to be two kinds of arguments:

The first one is “Let’s wish that Scala Native will be this and that and it will be (at some point).” - Wishing is completely fine, it’s the basis of many political decisions. But hey, we are engineers, so let’s also try to get into the technical argument, try to get a nuanced view of the trade-offs and costs of features, and let that also drive the decision making ;)

The other one (which might be implicitly meant by what looks like an argument of the first kind) is “Scala Native is compiling to bare metal, the consequence is that it must (or will?) have [feature X].” - let’s try to be explicit here. Many of the things people have wished for (high performance, low-latency, low memory footprint) are actually features that need to be built and validated. So far, not even the Scala Native documentation actually promises those features.

At the moment we’re actually relying on LMDB (Lightning Memory-Mapped Database), a low-level C library, to manage the data efficiently. It was pretty easy to define the bindings and use it seamlessly from our code, thanks to Scala Native.

“What Platform or Ecosystem?” is a very good question to ask. I am also skeptical that bootstrapping Scala Native into its own platform will be feasible.

One (to me) interesting possibility is to target the Python ecosystem. By which I mean Python libraries + the native libraries supported by it. There’s a lot of interesting stuff in that area. Swift is making good use of it, and the way this is done is all publicly accessible as far as I can see.

I am surprised that nobody has mentioned native iOS apps.

You can use Scala.js for that with one of the several frameworks that allow you to write iOS apps in JavaScript (like Capacitor), but there would be some significant benefit (startup times, no JIT compilation, hopefully performance) in being able to run natively compiled code instead. And here you certainly have a very large ecosystem.

I have been working a bit with Python within scientific computing. I really miss Scala’s type system, and Python’s syntax often results in deep nesting of loops and if/else. I think if Scala could wrap those same native libraries we could make really nice, fluent APIs and DSLs.

Thanks for running this small experiment, Johannes. I would be interested in seeing if there are important differences in memory consumption as well. Would you be able to provide such numbers?

(Peak) memory consumption is not really a scalar value with a GC because there’s usually a trade-off between GC (pause) times and peak memory usage. The other problem is that, other than for the JVM, I don’t really know how to reliably get GC stats (like retained size) for native-image and scala-native. Using the very blunt tool of /usr/bin/time -v and reporting “Maximum resident set size”, I see approximately this:

  • java -jar -Xmx250m … : 3s wall clock run time, max RSS: 229MB
  • native-image with -Xmx250: 3s wall clock run time, max RSS: 89MB (though there’s something weird going on, as max RSS goes up, when I decrease -Xmx)
  • env SCALANATIVE_MAX_SIZE=250m ./smaps-reader-out-release-full-lto-thin: 22s wall clock run time, max RSS: 3.2GB (as reported before, with different configurations I could get it to eat so much memory that it killed my desktop environment because of triggering Linux OOM handling)
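
For anyone who wants to collect the same figure programmatically, here is a small sketch that pulls the “Maximum resident set size” value out of captured /usr/bin/time -v output. It assumes the GNU time label format with the value in kilobytes; the names MaxRss and maxRssKb are hypothetical and not part of any tool mentioned in this thread.

```scala
object MaxRss {
  // Label printed by GNU time in verbose mode (/usr/bin/time -v).
  private val Label = "Maximum resident set size (kbytes):"

  // Returns the peak RSS in kilobytes, if the label is present in the output.
  def maxRssKb(timeOutput: String): Option[Long] =
    timeOutput.linesIterator
      .map(_.trim)
      .collectFirst {
        case line if line.startsWith(Label) =>
          line.stripPrefix(Label).trim.toLong
      }
}
```

It uses plain string operations rather than regular expressions, so it does not depend on the regexp implementation of whichever platform it is compiled for.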

This again may or may not just point to regexp being currently broken in scala-native. More rigorous benchmarking would be necessary to give more relevant results.

As a bit of background, for the given timings the smaps reader parses all /proc/<pid>/smaps files, which during testing were about 40MB in size. A more reasonable approach with more recent kernels is to parse smaps_rollup instead, which already aggregates all entries into one per process. With that change the task is much smaller and I get these numbers:

  • java -Xmx25m -jar: 0.8s wall clock (1.66s CPU) time, 65 MB peak RSS
  • native-image with -Xmx25m: 0.32s wall clock (0.3s CPU) time, 13MB peak RSS
  • Scala Native 0.4.0-M2, release-full, thin lto: 0.5s wall clock (0.49s CPU) time, 25MB peak RSS
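
To give an idea of the kind of work being timed, here is a minimal sketch of summing the “Rss:” entries from smaps-formatted text. This is not the actual smaps reader used for the numbers above; SmapsRss and totalRssKb are made-up names, and the only assumption about the input is the usual “Rss: <number> kB” line format.

```scala
object SmapsRss {
  // Sums every "Rss:" entry (reported in kB) in smaps-formatted text.
  // Works for a full smaps file (one entry per mapping) as well as for
  // smaps_rollup (a single aggregated entry).
  def totalRssKb(smaps: String): Long =
    smaps.linesIterator
      .filter(_.startsWith("Rss:"))
      .map(_.stripPrefix("Rss:").trim.stripSuffix("kB").trim.toLong)
      .sum
}
```

The text would typically come from reading /proc/<pid>/smaps or /proc/<pid>/smaps_rollup into a string first; that I/O step is left out here so the parsing stays platform independent.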