Scala Native Next Steps

Exciting news! As mentioned above, picocli offers tooling to make it easy to create GraalVM native images for picocli-based command line applications. I would like to learn more to see how picocli can be improved to also make it easy for people to write Scala Native command line applications with it.

Is there a link about requirements for libraries that facilitate their use in Scala Native? Looking at the reflective instantiation support in the OP, it appears that classes need to be annotated with @EnableReflectiveInstantiation before they can be reflectively instantiated in a Scala Native app.

Picocli uses reflection internally to instantiate subcommands and other components. Some of these may be written in Scala and can be annotated with @EnableReflectiveInstantiation, but some of them (like a number of picocli built-in components) are written in Java and do not have this annotation.

So, I hope there will be another mechanism separate from this annotation to register classes that need to be instantiated reflectively. If these requirements are documented somewhere, library authors can prepare to make it possible for their libraries to be used in Scala Native.

2 Likes

IIUC Scala Native requires all of the source code to be written in Scala. You can’t just put ordinary JARs on the classpath and use Scala Native to compile them to LLVM IR and then to native executables. I don’t know if there are any workarounds or plans to change that situation.

I’d like to say: do not build things with the aim of overtaking or competing with the alternatives. Scala does not compete with Java or Kotlin (in any regard really). Scala.js does not compete with TypeScript or JavaScript. And Scala-Native should not try to compete with alternatives.
There’s no prize to win by competing, and there’s a lot you could lose by trying.
If you need a reason to choose Scala-Native over the alternatives, it’s simply because it is Scala, and the alternatives aren’t.

Having worked with Rust, I can tell you that it sucks (for a Scala programmer). It’s terribly inexpressive, it’s constantly getting in my way and making me make small performance decisions every step of the way, oh, and it totally gave up on the concept of abstractions and just forces a text processor on me (their macro system, which is used for everything really, since the base language lacks abstraction capabilities).

13 Likes

Scala Native itself doesn’t need built-in support for TLS to be useful in server development; it’s possible to build on top of existing servers like NGINX Unit https://unit.nginx.org/

If we are going to pick any language where Scala Native is “competing” (and I don’t think such discussions are fruitful anyways, as covered by @rcano), I would rather pick Python instead of Go or Rust. Having worked full-time with Python, I think there are a lot of applications using Python today where Scala would be better suited.

I am super excited about GraalVM native-image and I’m already heavily using it in several projects both in open source (Scalafmt) and at work. However, the developer experience with native-image today is not great: it takes a long time to link a binary resulting in slow edit-and-test feedback loops and there are many native-image configuration flags that tweak the runtime semantics, which requires a decent upfront investment to learn. In comparison, Scala Native binaries link significantly faster and you can test them just like you would test a JVM application giving me better confidence that the binary works as expected at runtime.

I think native-image is revolutionary technology and a huge boon to the Scala/JVM ecosystem. I’m also just as excited about the applications for Scala Native. These are not contradicting views.

7 Likes

Just to pick up on this, I work on a lot of scripts, mostly Python.

Python is great, but inevitably you want to do something in parallel: whether parallel HTTP requests, maybe parallel building some docker containers, parallel auto-formatting some source files, whatever.

This is the point at which all hell breaks loose. Multi-threading in Python doesn’t work well, and multi-processing in Python is clunky and fragile. multiprocessing.pool breaks Ctrl-C and makes scripts unkillable unless you jump through hoops with Ctrl-Z and kill -9, and the -9 is often required. Many of these scripts are slow (hence the parallelization) and it’s not uncommon that a script will blow up with a NameError or TypeError after several minutes of execution, with the stack trace mangled due to multiprocessing. Deployment is often a pain: even with tools like PEX to try and make somewhat-hermetic executables, someone will inevitably brew install something that messes up the global python install.
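To make the “jumping through hoops” part concrete, the usual workaround looks roughly like the sketch below. This is a minimal illustration, not code from any real script: `run_parallel` and `_ignore_sigint` are hypothetical names, and it assumes a POSIX system where the `fork` start method is available. The workers ignore SIGINT so that only the parent sees Ctrl-C, and the parent terminates the pool on any failure instead of waiting for it.

```python
import multiprocessing
import signal


def _ignore_sigint():
    # Runs once in each worker: leave Ctrl-C handling to the parent process.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def run_parallel(func, items, workers=4, timeout=3600):
    """Map func over items in worker processes; Ctrl-C tears the pool down."""
    # Assumption: POSIX only, since the "fork" start method is requested.
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(workers, initializer=_ignore_sigint)
    try:
        # Waiting with a timeout keeps the parent responsive to signals.
        results = pool.map_async(func, items).get(timeout)
    except BaseException:
        # KeyboardInterrupt, timeout, or worker error: stop workers right away.
        pool.terminate()
        raise
    else:
        pool.close()
        return results
    finally:
        pool.join()
```

With this in place, a call such as `run_parallel(abs, [-1, 2, -3])` returns the mapped results in order, and Ctrl-C in the parent tears the pool down. It does nothing for the mangled stack traces or the deployment problems.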

Some of these tools end up being ported to Scala. This works, but it certainly is inconvenient to take a large codebase that was started in Python when it was small and short-running, and port it to Scala just because it became large and slow.

This awful experience is the status quo for a lot of our important scripts and automation. They’re not written in Scala because the JVM startup overhead and resource footprint are too much, but by no means is the Python experience “good”. It’s acceptable, and we’re surviving, but there’s a real opportunity for Scala-Native to come in here with nice Futures-based parallelism, proper multithreading, easy collections transformations, and type checking to catch dumb errors.

Scala-Native could provide a real alternative to writing scripts in Python, and perhaps be a lot more ergonomic than the other alternative, which is writing your command line tools in Go or Rust (which is also increasingly common).

17 Likes

Thanks for the replies and points of view, @rcano and @olafurpg.

Regarding competition: I’m also not a fan of overly competitive thinking when it comes to languages. From an idealistic viewpoint, I would like to see Scala Native succeed and I could just be happy that there’s funding for a somewhat risky project.

That said, I am also realistic enough to know that resources are limited. Resources are Scala Center’s developer resources, but also public attention and momentum that are required to drive such a project.

Ultimately, the only way (for anything) to survive is to find its (probably temporary) niche. The statement at the top about why Scala Native is relevant and what is missing and will be done is too fuzzy to be useful. You could say this is a classic project management problem. If you want such a project to succeed you will first have to admit there’s an insurmountable amount of work ahead of you, and then you will need to have a very specific idea how to arrive incrementally at something useful as quickly as possible. The statement gives a reasonable roadmap for the short-term but doesn’t say what reasonable short- and mid-term goals are and how to reach them.

My assumption behind this is that Scala Native currently isn’t useful (enough). Here again the question is whether you look at “being useful” as an isolated absolute concept or relative to what’s going on elsewhere. Let’s take the stance that it is useful if, given a systems programming task, you would choose Scala Native over anything else. Let’s look at a few scenarios:

  • A simple short-running command line application (dealing mostly with file IO and nothing else): yes, Scala Native is applicable even now; you can program in Scala and the basic APIs are there
  • A short-running command line application that is CPU intensive: not so much, since there’s no support for multi-threading to spread the work
  • A tool that connects to other services over the network: not so much, since protocol implementations are missing
  • A low-level tool that interfaces with the kernel (basically standard C usage) or C libraries: somewhat applicable and performant but inherits all the bad parts from C (memory unsafety, clunky APIs, etc.) while even missing concurrency primitives for the Scala parts. If you compare that with just doing the same using Scala on the JVM using JNA as FFI, you don’t win much. Using JNA is somewhat uncommon but it works on the JVM if you don’t have the utmost performance or resource requirements. With native-image, you can also solve these with SubstrateVM’s FFI.
  • A bigger, long-running application: plus points for being able to write in Scala, minus points for missing multi-threading and missing connectivity. Scala on the JVM already provides all these features with good to perfect performance characteristics (but with the well-known bloat wrt startup times and memory usage)

Which use cases are missing and which of these use cases are the primary target for Scala Native?

For me personally, the current answer whether to use Scala Native is “no” (aside from playing around) because there’s nothing useful enough I can do with it. I have a project for quick Linux memory analysis (doing some custom aggregation) that would come closest. That currently runs a few seconds per invocation. But even there, most of the runtime is because its functionality (parsing files in /proc) is not optimized to be fast. Despite it being a short-running program, the JVM imposed runtime cost is there but not relevant enough to try hard to get rid of it.

So, I guess what I’m asking for are more concrete ideas about applicability and then also some data that would support those ideas.

There’s not only servers… And also, just having a C-API doesn’t make it a ready-to-use library for Scala Native.

If you are coming from Python you could also use Scala on the JVM, couldn’t you?

Yeah, native-image times are really bad (but that’s an ongoing struggle for any compiled language). It would be interesting to see some apples-to-apples comparisons for that particular aspect (but, of course, it’s also somewhat expected that Scala → native translation can be faster than going via bytecode first; the question is whether the downsides are worth it).

2 Likes

After I wrote this I just saw this article about startup time improvements with native-image: “Static Compilation of Java Applications at Alibaba at Scale” by Alina Yurenko (GraalVM blog on Medium). So, I realize that my impression of the JVM startup being fast enough for most scripts is rather biased by using a (7+ year old but still) reasonably fast laptop where there are usually enough spare cores available to run the JVM JIT and GC in parallel. Other deployment scenarios where you have to pay (in latency and money) for these extra resources like FaaS, slimly deployed containers or VMs, or embedded or small devices might not have that luxury and would benefit a lot more from more efficient startup.

3 Likes

  • a number crunching application that runs on wasm either in the browser or on the desktop

1 Like

I had a quick try with that project (200 lines of code, uses scala collections, file IO, and regular expressions, no dependencies). Here’s a comparison:

  • AdoptOpenJDK 8, 2.5 seconds (4.7s CPU time)
  • Graal native-image JDK 11 20.1.0, 2.29 seconds (same CPU time)
  • Scala Native 0.4.0-M2: 20-50 seconds depending on settings (many settings fail because of a memory leak)

Build times:

  • scalac + assembly: 5s
  • native-image, on top of above: 30s
  • Scala Native: 13s in debug mode (leading to >50s runtime), 200 - 300s in release-full mode (20s runtime), 30s in release-fast mode (which runs until the machine runs out of memory)

In the best case, Scala Native’s results are due to a problem with the regex implementation. As far as I got with profiling Scala Native code, much time is spent in GC which might be a problem in itself or a consequence of a potential memory leak.

In any case, I find this statement somewhat optimistic…

1 Like

IIUC Scala Native requires all of the source code to be written in Scala. You can’t just put ordinary JARs on the classpath and use Scala Native to compile them to LLVM IR and then to native executables. I don’t know if there are any workarounds or plans to change that situation.

Not being able to include JARs that are not written in Scala in a Scala Native app is a massive limitation… :frowning: Bit of a show-stopper that…

Can’t we decompile a JAR to Scala and then compile to LLVM IR? That is a bit tongue-in-cheek, but surely there are some bytecode patterns that cause problems and others that are fine. Libraries that adhere to certain rules could be included that way.

If there is no way around this limitation, one idea that could offset this (in terms of value proposition) would potentially be cross-compilation, as GraalVM does not have that (and it is not on the road map; the GraalVM team seems to have given up on the idea of cross-compilation).

I’m not an expert on the Scala Native compilation process, but AFAIK Scala Native acts as a compiler plugin for the Scala compiler and collects some vital information long before emitting Java bytecode and metadata inside *.class files. I think bending Scala Native to do decompilation is probably not an easy path.

I’m not sure how cross-compilation is relevant here, but:

  • AFAIU GraalVM native-image can compile bytecode produced by Scala, but that would bypass Scala Native entirely.
  • SubstrateVM from the Graal project has its own garbage collector and object representation. Scala Native also has its own garbage collector and object representation. Thus sharing normal objects between them seems impossible.
  • However, both GraalVM and Scala Native provide an FFI (foreign function interface) which could be used as an interface between native code produced by GraalVM native-image and native code produced by Scala Native. But that would be cumbersome as you would need to use raw C strings and C structs to pass information between these native code parts. OTOH if you were able to implement a C ABI in picocli then it would be usable not only from Scala Native but also from any other language including C, C++, Rust, C#, Python and so on. OTOH (how many hands do I have?) that would be weird, time-consuming for picocli authors and maybe not worth the effort in the long run?

Overall, the situation seems complicated and I don’t have enough insight into it as I’m only a bystander.

I guess that if it were feasible to recompile a random jar, it would have been done a long time ago for scala.js.

2 Likes

I have very mixed feelings regarding Scala Native, Scala.js and any other attempt to target a platform other than the JVM.

Let me say I am very impressed by these efforts and what they have achieved, and I’m sure the people behind it are very smart and diligent.

If these were just experimental research projects, everything would be fine and there would be no problem.

However, these projects are already creating a pressure to make the Scala language and the Scala standard library more “platform agnostic”, and this is where such efforts become a liability.

One of the strongest selling points for Scala is that there is a huge ecosystem of Java libraries that we can easily integrate. Well, if we compile Scala to bytecode and run it on the JVM, that is. Neither Scala Native nor Scala.js allows using Java libraries in general; in fact, they do not even support some of the most popular parts of the Java standard library. I have always used Java libraries heavily, so that is a total showstopper.

The lack of support for Java libraries is due to fundamental obstacles. Scala, Java and all the other JVM languages all share a common set of design principles including automatic garbage collection, customizable classloading, separate compilation, generics through type erasure (but not for arrays!), lack of direct memory access, and reflection. These principles are an adaptation to running on the JVM, and they do not make sense for targeting another platform.

On the JVM, everything is at runtime either a primitive value or an object that has a getClass method, or an array. Much code relies on this, but it is true on no other platform.

For this reason, much that normally works in Scala (i.e. on the JVM) will never work in Scala.js or Scala Native or any other attempt to compile Scala to another platform. Even things that do work on other platforms will often be nothing more than a fragile and leaky abstraction forcing the user to be aware of the underlying implementation.

I have been using Scala for seven years and I tried Scala.js, but the next time I will rather use JavaScript than Scala.js. Should I ever need native code, I suspect I would rather use C++ than Scala Native.

To Scala.js and Scala Native: keep up the awesome work, but please do not expect Scala to become “platform agnostic”. Scala will only work well on the JVM for the foreseeable future, so that should be the priority when it comes to designing the Scala language or the Scala library.

1 Like

@curoli:
I disagree. Stating that targeting native environment is somewhat against Scala design principles is like stating that GraalVM’s native-image is against Java design principles, so it doesn’t make sense to go for native-image.

Let me address some of the issues:

  • garbage collection is present on the JVM, in JavaScript, and in some languages typically compiled to native code (e.g. Go and Haskell).
  • classloading customizations aren’t usually done inside applications (i.e. I’ve never seen anyone going for classloading gymnastics in a deployed application). SBT uses some classloading tricks, but SBT works for Scala, Scala.js and Scala Native already.
  • type erasure exists when e.g. translating TypeScript to JavaScript, but TypeScript is taking over the frontend world anyway.
  • specialized arrays exist in JavaScript too (see TypedArray in the MDN JavaScript reference).
  • lack of direct memory access applies to the JVM, to JavaScript, and also to some languages typically compiled to native code (Go, Haskell).
  • reflection is partially supported for Scala Native, Scala.js and AOT compilation using GraalVM’s native-image. In all cases it needs some upfront configuration, but that’s still probably better than the situation in C++.

There are plenty of Scala libraries compiling with Scala.js already, and it doesn’t seem to me that Scala.js slows down Scala language evolution substantially. Browser-based applications also usually do not need functionality typical for the backend. For example, you don’t use JDBC or server sockets in the frontend, because of the sandboxed environment. Even if some apps that compile under Scala JVM can’t be compiled under Scala.js, it doesn’t mean Scala.js is pointless. Microsoft created Blazor WebAssembly, which allows you to run C# on the client side, and there’s hype in the .NET community. But not all code can run under Blazor. Does that make Blazor pointless?

7 Likes

I strongly disagree. Scala.js brought to light the general power of the language, and the ways in which the language is not platform-dependent.

Plain and simply: I’ve been working in JavaScript literally since before it had that name. I also built one of the first production UIs in Scala.js. With that context: using JavaScript libraries from Scala.js works far better than doing so from JavaScript itself – it’s easier to use, and results in vastly more maintainable code. It’s the best programming language I’ve found for serious browser application work.

That’s precisely because we have a reasonably clean distinction between the language and the integrations. Scala on the JVM works well with JVM libraries. Scala in the browser works well with JavaScript libraries. Trying to paint this as an either/or is, IMO, an unnecessarily restrictive lens…

20 Likes

@jrudolph, why don’t you see the obvious benefits of having really good Scala Native support?

When you start creating an application, you do it like a script, so Scala has the same light syntax as Python. You create something really fast that already works great, let’s say on the JVM.

Then some part needs to be visualised, and you use Scala.js for that part. You don’t need to learn a new language, no need to learn a JavaScript framework like Angular or React, and you don’t need to rewrite your code in JavaScript or TypeScript or whatever. You save a lot of time! Using the Scala.js jQuery wrapper makes this super easy.

You then need some super high performance processing of this same data, and you use Scala Native for that part. You don’t need to learn a new programming language like Rust or Go (although maybe you have to look at the first few chapters of the excellent Scala Native book).

Well, there is no multithreading at the moment, but if the work is parallelizable or batchable, you can spin up these parts in separate processes (that is really easy to do) and merge the results together in the end. There are plenty of examples where this would be a valuable approach.
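As a rough, language-neutral illustration of that split-and-merge approach (sketched in Python only to keep it short): the parent splits the input, starts one single-threaded worker process per chunk, and merges the partial results. `WORKER` is a stand-in for a real worker executable, such as a Scala Native binary, and `parallel_sum` is a hypothetical name.

```python
import subprocess
import sys

# Stand-in for a single-threaded worker executable: it reads one integer per
# line from stdin and prints their sum. A real setup would launch a binary.
WORKER = "import sys; print(sum(int(line) for line in sys.stdin))"


def parallel_sum(numbers, parts=4):
    """Sum numbers by fanning chunks out to worker processes and merging."""
    chunks = [numbers[i::parts] for i in range(parts)]
    procs = []
    for chunk in chunks:
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        # Hand the chunk over and close stdin so the worker can finish.
        proc.stdin.write("".join(f"{n}\n" for n in chunk))
        proc.stdin.close()
        procs.append(proc)

    total = 0
    for proc in procs:
        output = proc.stdout.read()
        if proc.wait() != 0:
            raise RuntimeError("worker process failed")
        total += int(output)  # merge step: add up the partial sums
    return total
```

The cost of this design is one process start per chunk plus serialization through pipes, so it only pays off when each chunk represents a meaningful amount of work.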

Being able to use jars in Scala Native is not very relevant, is it? Have you ever looked at all the amazing libraries being created in the C or C++ programming language? Why would you want to use jars, when you can use superfast low level C APIs? I guess it would also be very easy to integrate with Rust and Go libraries if they have a C API available.

Not to forget the ability for Scala to stand on its own feet in the future, without being dependent on the JVM or needing to pay license money for the Oracle JVM or Dalvik.

I truly believe that Scala.js and the Scala Native platform will be what brings Scala forward after Dotty has been launched.

6 Likes

Couldn’t have been said better! Working with Scala.js in the browser is amazing :slight_smile:

4 Likes

Hi All,

I don’t know much about this topic, but I have been thinking about it in an abstract way for a while (planning on learning more)!

I was actually thinking about the old Java motto:

Write once RUN everywhere

Which now (in the world of JS, GWT, Java, Scala (JS, JVM, Native) & Kotlin (JS, JVM, Native)) might be how to:

Write once and IMPORT / USE everywhere

Do you all think we could build some sort of standard in the larger community (IETF?, Java, Scala, Kotlin) to facilitate this daydream of:

Write once and IMPORT / USE everywhere

Cheers,

Scott

I disagree. I also use Scala 99% on the JVM, but in the few cases where I actually needed code to run in the browser, Scala.js was such a blessing; it worked quite seamlessly (and that’s a few years back), and I would have failed miserably if I had to use JavaScript.

Again, I would disagree. At least from what I saw of Scala.js, it was working very well. I don’t think that the “Java library ecosystem” is as important in general as you portray. Sure, if I analyse a larger application of mine, at the very bottom, there are probably still some Java libraries, but the huge majority of dependencies is already 100% Scala. I also don’t think you want to write the same kind of apps for Scala.js and SN as the ones for the desktop on the JVM. Even very large projects such as Akka / Akka Stream now work under Scala.js.

7 Likes

To get that also out of the way: I hope I don’t sound dismissive of the original efforts put into Scala Native by Denys and the contributors. Without this huge chunk of work, there wouldn’t even be something to discuss here.

And to phrase something I said before also a bit differently: I would very much like to see more ways to write low-level code with Scala (but it doesn’t necessarily have to be Scala Native).

That said … I don’t think the comparison to Scala.js can hold any water. The thing with the JVM and the JavaScript ecosystem is that they already provide the platform. These platforms are huge and have seen decades of refinements and improvements. Even if you don’t use many third-party libraries, the basic infrastructure which is there is already huge. Maybe it’s too easy to dismiss the fact that it’s not trivial to write code that can run on so many architectures from exactly the same source code. And this is not just the status quo: those platforms are constantly evolving. All these improvements come to Scala mostly for free.

Scala Native cannot give you this kind of platform. In the best case, you have the (huge) pool of C libraries available but you only get the same kind of cross-architecture and OS benefits if you only ever rely on cross-platform APIs and recompile and redistribute for every target platform (and sometimes subdivisions of that like Linux distributions or OS versions).

In all of the platforms, there’s some mismatch between what the platform natively offers (JDK, C APIs, JavaScript APIs) and idiomatic Scala APIs. So, you will usually need some binding/wrapper libraries. This sounds similar for JVM, JS and native universes but there are huge differences. The biggest difference is memory safety. This is huge. It is really difficult to write safe bindings that wrap unsafe pointers on a GC’d platform. This is far different from providing just an idiomatic wrapper around a JDK or JS library.

My impression is that there are - at the same time - overblown hopes that Scala Native can be a viable platform for building some kinds of tools (but what exactly is often kept in fuzzy terms; the hope stated most frequently is writing “high-performance” code, but that’s perfectly possible already now on the JVM). On the other hand, the maturity of Scala Native is overstated. This is a common marketing strategy and it’s fine for that; sometimes that’s the only way to get funding. What I’d like to see is kind of a reality check which points out the exact niche where Scala Native could thrive right now, and then see what kind of work is needed to make it work in that niche and then, more generally, to fit more applications.

Right now Scala Native is not stating a lot of caveats explicitly which would curb those hopes:

  • It only works for certain target architectures (fewer than those supported by JVMs)
  • Startup can be faster (more power efficient) on slow machines with only a single core available, but this only matters if the application runs for less than 1 second (or so)
  • Performance in benchmarks seems great, but generally it’s more likely to be slower than running on the JVM
  • A comparatively small part of Scala libraries can be cross-built because the common platform between JVM and Native is too small. In general, you cannot expect an application to run on JVM and Native mostly unchanged.
  • Multi-threading is only supported in a coarse-grained way by starting new processes and using IPC for synchronization. Latency for spawning new processes is orders of magnitude higher than starting new OS threads or green threads (like tasks on a JVM thread pool).
  • There’s only a small set of GC implementations that are unlikely to perform better than what the JVM has to offer. If your application is memory intensive, you will have to fall back on unsafe native memory operations.
  • If the base library doesn’t support the tools for your application, you will have to interface through C APIs. These are unsafe and you will have to know how to debug segfaults.
  • The tooling around debugging and profiling is spotty.
  • Scala Native currently only supports Scala 2.11. This is completely irrelevant for current applications as Scala 2.11 and 2.13 frontends and standard libraries are mostly identical. It only matters if you want to use cross-compiled libraries. Supporting a new frontend like Dotty will be a huge effort.
  • Scala Native can only be more memory efficient than the JVM if you do manual memory management. But as long as Scala doesn’t provide language primitives to make that memory management safe (mostly preventing unsafely allocated references from escaping scopes), this will be inherently unsafe. There are few approaches that have managed to find intermediate solutions that are not either unsafe (C), somewhat inefficient (JVM, any GC language), or somewhat cumbersome (C++ smart pointers, Rust borrow checker). You could say, at least Scala Native gives you those tools to use unsafe memory access when you need it for performance, but after all, the JVM does the same using Unsafe but almost no one uses it.
  • One particular performance issue (and arguably the only significant one, aside from startup issues) with the JVM is the difficulty of making use of SIMD (like AVX instructions or CUDA). This is an area where Scala Native could theoretically provide good solutions more easily (because it would be cheaper to run native SIMD computation kernels when you are already running natively, than when you first have to get out of the managed environment, which on the JVM is only possible using JNI or intrinsics, or if you have to rely on the JIT to compile to SIMD primitives).
  • Compilation times are far higher than for Scala on the JVM if you need decent runtime performance.

The current niche for Scala Native is really narrow. If anything, I’d like to hear reports of people who have used Scala Native successfully and state why the niche is attractive. Many will have run into one of the above limitations, so it would be good to have high-level descriptions of what the most pressing issues are.

In fact, I have the suspicion that many people would like to use Scala Native but haven’t done so far successfully because of stumbling over just one of the above issues. Paradoxically, to me it seems, hopes are even more inflated because people currently aren’t able to actually run significant applications, so they don’t really experience the huge gap between what’s needed and what’s there.

4 Likes