Scala Native Next Steps

@kostaskougios Hum, I’m confused by your experience. Scala Native does have a garbage collector. If there really are memory leaks for simple transformations on lists, that would definitely be a bug. I have not heard other reports like that, so if you could provide a reproduction, that would be very helpful.

Oh yes sorry you’re right, just managed to use the scala lib’s List and saw that native uses a GC.

The recommended version is 0.4.0-M2 because it is much better than the 0.3.x series even though it is pre-release.

Super excited to hear this. I’ve gone back and forth on whether to use scala native or jextract for a project I’m working on, and always landed on jextract cause scala native appeared fairly dead.

Anything on WebAssembly support?

No, WebAssembly is not directly in our plans. Wasm today still lacks the primitives required for an efficient implementation of GCs, so it’s not worth putting core resources into it. Experimentation by third parties is welcome, of course.

There is a demo here for Web Assembly if you would like to experiment.

There is a keynote by sjrd, you can check it out on youtube.

thanks, I am aware of this, but I cannot run a project on something that is uncertain to work in the future, so I was hoping there is some sentiment to put wasm support into the mainline SN. but then of course it would be competition to sjs :wink:

You can think of wasm applications which do not require a GC, just like people were using SN for terminal applications with No-GC. I don’t see how this is a prohibitive criterion. By this argument, Rust or Kotlin for wasm wouldn’t have a reason to exist, either?

Yes, but Rust does its own memory management, no?

Rust doesn’t need a GC. But for Scala, no GC only makes sense for short-lived applications. Web pages can be long-lived.

That said, there are use cases for WASM other than the browser, so it does have uses.

People in this thread interested in WASM support, what is your intended use case?

I would be interested in real-time audio applications in the browser, interfacing to AudioWorklet. This could even be dual-native, running as well outside wasm with a native audio API back-end.

I’d like to point out that, so far, there is no evidence that Scala Native compiled to wasm with a custom GC (using slowed down encodings to work around the limitations of wasm) would be any faster than Scala.js using the built-in GC and JIT of the browsers.

Btw, WASM is a target for smart-contract bytecode in the proposed Ethereum 2.0, EOS, Polkadot, Edgeware, and several other blockchain systems. It looks like WASM will become a de facto standard for smart-contract VMs.

I think this is amazing news!! I currently work in bioinformatics, where the ability to interop with C is essential for performance reasons.

I also think lambda functions in the cloud could benefit from this, but I don’t know much about that :blush:.

While a custom tracing GC implemented in the WASM MVP would surely be slow, there is a way to work around that. Scala Native could implement an unmanaged mode (for lack of a better word) in which access to the managed heap is restricted (and temporarily slow when it does happen). That unmanaged mode could be modeled after the JNI of the Java platform, or something simpler could be devised.

First, let me describe my understanding of how OpenJDK works:

  • it has multiple tracing GCs to choose from
  • such GCs can move objects throughout their lifetimes
  • it supports multiple threads so it has to ensure thread safety of its operations
  • to reduce GC overhead in a multithreaded environment, stop-the-world (STW) pauses are employed
  • to achieve an STW pause, all threads must be stopped
  • threads are stopped at so-called safepoints, which are scattered throughout the native code generated by the JIT compiler
  • when all application threads are stopped, the GC is free to move objects around and rewrite references to them without worrying about data races
  • safepoints are efficiently implemented using page fault trapping, i.e. a page fault trap is very expensive when it happens, but since safepoints are rarely triggered, the amortized cost of safepoint checks is very low
  • some threads can be executing unmanaged native code (let’s consider only code invoked through the JNI interface) while e.g. the GC tries to achieve an STW pause
  • it turns out that threads executing native code don’t have to be stopped right away during an STW pause, because unmanaged native code can’t directly access managed Java objects
  • unmanaged native code can access managed Java objects only through the JNI API, which takes care of safepoints (i.e. it waits for the STW pause to end) and the overall safety of such operations
  • the JNI API is relatively slow, but the idea of JNI is to have long-running unmanaged code that makes few or no JNI API calls

Scala Native could have something similar, i.e. a split between managed native code and unmanaged native code. Let’s see what would resemble the situation in OpenJDK if we wanted to implement a JNI-like mechanism in Scala Native:

  • there would be separate managed and unmanaged heaps
  • safepoints and direct managed heap allocation would only be present in managed code
  • unmanaged code would be free of safepoints and direct managed heap allocations
  • unmanaged code that doesn’t access the managed heap would run as fast as if there were no GC present
  • unmanaged code would need some ugly boilerplate (or sophisticated trickery) to access managed objects
  • accessing managed objects from unmanaged code would entail relatively high overhead, but as in JNI, the point of unmanaged code is to mostly avoid accessing the managed heap
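
To make the "ugly boilerplate" point concrete, here is a minimal sketch of what handle-based access might look like, loosely modeled on JNI global references. Everything in it (`ManagedHandles`, `pin`, `read`, `release`) is a hypothetical name invented for illustration; nothing like it exists in Scala Native, and an ordinary map stands in for what a real runtime would treat as a GC root table.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: unmanaged code never holds a direct reference to a
// managed object, only an opaque numeric handle into this table.
object ManagedHandles {
  private val table  = new ConcurrentHashMap[Long, AnyRef]()
  private val nextId = new AtomicLong(1L)

  /** Registers a managed object and returns an opaque handle to it. */
  def pin(obj: AnyRef): Long = {
    val id = nextId.getAndIncrement()
    table.put(id, obj)
    id
  }

  /** Runs `f` on the object behind `handle`. In a real runtime this is
    * where the call would wait for any stop-the-world pause to finish. */
  def read[A](handle: Long)(f: AnyRef => A): A = {
    val obj = table.get(handle)
    if (obj == null) throw new NoSuchElementException(s"stale handle $handle")
    f(obj)
  }

  /** Drops the handle so the object can become garbage again. */
  def release(handle: Long): Unit = {
    table.remove(handle)
    ()
  }
}
```

Every access from unmanaged code goes through `read`, which is exactly the overhead the last bullet refers to: cheap to avoid, expensive to use often.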

The above scheme has one big drawback: almost everything in the Scala library expects a GC, as objects are never directly freed (well, managed platforms prohibit explicitly deleting managed objects anyway). Therefore unmanaged Scala Native code would need a separate standard library, plenty of macros to rewrite e.g. for comprehensions into code that doesn’t generate garbage, etc.

Given the complexity of the above scheme, I propose a much simpler one. First, forget everything above, as it’s not relevant anymore. Second, let’s see what the new idea looks like:

  • there’s still a split between managed code and unmanaged code (e.g. unmanaged code could be marked with an @unmanaged annotation)
  • unmanaged code doesn’t contain any safepoints, memory barriers, pointer healing or any other GC-related awareness (except for the new keyword, which obviously needs to allocate in the managed heap that is under GC control)
  • there’s no penalty (neither in ugliness nor in performance) for accessing managed objects from unmanaged code
  • most important point: the only difference that unmanaged code brings is that it disables garbage collection entirely (on the first transition from managed code to unmanaged code) until the end of that unmanaged code
  • when no unmanaged code is running and there are currently no stack frames related to unmanaged code, the GC is enabled again and can collect any garbage
  • no garbage collection during unmanaged code execution means no need for thread synchronization, so code can run at full speed, but we also risk running out of memory
  • unmanaged code is reentrant, and there is a thread-local count of sequences of unmanaged stack frames; only when that count goes back to zero (for all threads) is the GC enabled again
  • turning garbage collection on and off (globally) is costly, so it shouldn’t be done often; this is similar to JNI, where very short JNI calls have too high an overhead to be profitable at all

Example:

object Main {
  val uCnt = new ThreadLocal[Int] { override def initialValue(): Int = 0 } // unmanagedCount

  def main(args: Array[String]): Unit = {
    // uCnt == 0, GC is turned on
    val greeting = prepareGreeting(args.head)
    // still uCnt == 0

    // uCnt += 1 due to transition from managed code to unmanaged code
    // since uCnt changed from 0 to 1, we must call into the GC,
    // in this case to block it from collecting garbage
    printLn(greeting)
    // uCnt -= 1 due to transition from unmanaged code to managed code
    // since uCnt changed from 1 to 0, we must call into the GC,
    // in this case to inform it that the current thread no longer requires blocking of GC activity

    // uCnt == 0, we're entering this method with GC turned on
    managedA()
  }

  def prepareGreeting(who: String): String =
    s"Hello, $who"

  @unmanaged
  def printLn(line: String): Unit =
    println(line)

  def managedA(): Unit = {
    // incrementing uCnt on transition from managed to unmanaged
    // uCnt: 0 --> 1 : GC must be blocked, we need to call GC to do that
    unmanagedB()
    // decrementing uCnt on transition from unmanaged to managed
    // uCnt: 1 --> 0 : GC was blocked and the current thread doesn't need that blocking anymore
    //  we need to call the GC to let it know that
    //  if all threads have uCnt == 0 then GC can and should be enabled again
  }

  @unmanaged
  def unmanagedB(): Unit = {
    // calling managed code doesn't change uCnt
    // uCnt: 1 --> 1 : GC was blocked and must stay blocked, so no GC call needed
    managedC()
  }

  def managedC(): Unit = {
    // incrementing uCnt on transition from managed to unmanaged
    // uCnt: 1 --> 2 : GC was blocked and must stay blocked, no GC call needed
    unmanagedD()
    // decrementing uCnt on transition from unmanaged to managed
    // uCnt: 2 --> 1 : GC was blocked and must stay blocked, no GC call needed
  }

  @unmanaged
  def unmanagedD(): Unit = {
    // uCnt stays the same as we're calling unmanaged code from unmanaged code
    unmanagedE()
    // when inlining managed code to unmanaged code, the managed one becomes unmanaged
    //   as we know that GC is stopped anyway
    managedF()
  }

  @unmanaged
  def unmanagedE(): Unit = {
    ... // something
  }

  inline def managedF(): Unit = {
    ... // something
  }
}
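
The uCnt bookkeeping in the comments above could be sketched as a small runtime helper. This is only an illustration under stated assumptions: `GcBlocking`, `enter`, `leave`, `unmanaged`, `gcEnabled` and `currentDepth` are hypothetical names, not Scala Native API, and the sketch only tracks whether the GC would be allowed to run; it does not control a real collector. For simplicity it counts every nested `unmanaged` block rather than only managed-to-unmanaged transitions, which re-enables the GC at the same point.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper; none of these names exist in Scala Native.
object GcBlocking {
  // Per-thread nesting depth of unmanaged regions (the post's uCnt).
  private val depth: ThreadLocal[Int] =
    new ThreadLocal[Int] { override def initialValue(): Int = 0 }

  // Number of threads whose depth is currently non-zero.
  private val blockers = new AtomicInteger(0)

  /** The GC may run only while no thread is inside an unmanaged region. */
  def gcEnabled: Boolean = blockers.get() == 0

  def currentDepth: Int = depth.get()

  /** Entering an unmanaged region. */
  def enter(): Unit = {
    val d = depth.get()
    if (d == 0) blockers.incrementAndGet() // 0 --> 1: this thread starts blocking the GC
    depth.set(d + 1)
  }

  /** Leaving an unmanaged region. */
  def leave(): Unit = {
    val d = depth.get() - 1
    depth.set(d)
    if (d == 0) blockers.decrementAndGet() // 1 --> 0: this thread stops blocking the GC
  }

  /** Runs `body` with the GC blocked, restoring the count even on exceptions. */
  def unmanaged[A](body: => A): A = {
    enter()
    try body
    finally leave()
  }
}
```

Only the 0 --> 1 and 1 --> 0 transitions touch the shared counter, which matches the point that deeper nesting needs no call into the GC.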

Tell me if that makes any sense and if I understood the problem correctly.

Update:
Actually, @gcBlocking would be a better annotation name than @unmanaged in that second (simpler) proposal.

I don’t see anything obviously wrong with what you suggest. However, that would be a very significant departure from Scala Native’s core design and philosophy. That is not a direction I am willing to take, but if someone would like to try and research that direction, they can do so.

Where are Scala Native’s core design and philosophy listed? How are they violated by my second, simpler proposal (i.e. @gcBlocking methods)?

Such @gcBlocking methods would be faster (not counting the transitions between enabled and disabled GC) not only in a WASM environment, but also in regular non-sandboxed ones (i.e. native x86, ARM, etc. code). They would provide a speedup regardless of the GC implementation (given that it is a tracing GC, which is almost always the case under JIT compilers).

Is there any plan for Windows support?

It would also be awesome to have a GUI library. There’s an SN port for GTK but it seems to be abandoned.