new-Vector performance on Scala.js

For the benchmark numbers, I generally see the same as you mention, but it there are huge differences between browser. For example, the in the “collection building” benchmark

  • Chrome seems to be really good with the old Vector, and gets up to 3x slowdown for the new one
  • In Safari, the new is only slightly slower than the old Vector, beating chrome by far on the new (but slightly slower than Chrome on the old)
  • Firefox is the slowest of all browsers, and gets up to 40% slower for the new Vector
1 Like

On the balance, does this give us pause about going forward with the new Vector?

I think it probably shouldn’t, because:

  • 90%+ of Scala usage is on the JVM
  • some of the slowdowns on Scala.js are dismayingly large, but some of the speedups on the JVM are large, too
  • in general, on any platform, users who are manipulating large amounts of data and really care about performance may need to make platform-specific and Scala-version-specific decisions about what data structures to use; this is just one example of that.

Regardless, the benchmark numbers will remain valuable as guidance for Scala.js users who care about performance — thanks, David, for that.

The final call on this isn’t mine to make, but we do need to make a call before much longer, as the last big 2.13.2 PR recently landed so we might be ready for release as early as next week.

4 Likes

I suppose it’s not an option to have both implementations in stdblib, and maybe even somehow default JVM to one and JS to the other (but both implementations exist everywhere and can be accessed directly if needed)?

1 Like

I was hoping the results might elicit some insight from @sjrd about what the story is, and/or the PR author would jump in and confirm the JVM-side is (or isn’t) fine.

I’d suggest that before signing off on 2.13.2 we understand whether the slowdown we’re seeing only occurs in JS-land, or whether it occurs on the JVM too.

If it’s a JS-only slowdown then it would make sense to release it and then probably the Scala.JS team will wave their magic wand over time to provide further optimisations. If it’s the case that I’m benchmarking cases that haven’t been considered and it’s a universal slowdown affecting the JVM too, then it might be an idea to work out what the fix will require before we lock ourselves in to binary backward-compatibility.

1 Like

My feeling is it shouldn’t block the rollout. Not only is JVM Scala much bigger, I would also argue:

  • The use cases are also a lot more performance sensitive. Frontend code tends to be more limited by complexity than perf

  • And a lot of browser work that is performance sensitive is not bound on Scala.js execution speed: script download speed, script parsing speed, JS library speed (e.g. React), network round trips (front code tends to be ajaxy), DOM perf, CSS rendering perf (my personal most recent Scala.js perf battle)

  • Browser perf is a mercurial thing. While the JVM behaves the same yesterday as today, browser optimizations change rapidly and perf for the same code can swing around massively over time, nevermind different browsers. IME the standard perf advice is to try to optimize code “semantically”, but not to micro-optimize with the browser optimizer in mind as such wins or losses tend to be fleeting.

4 Likes

This is pure speculation, but a possible reason for the performance regression is the new design with 7 subclasses. Vector is final in 2.13.1, so operations on a vector (e.g., appended) have a single implementation. In the new design, each subclass overrides appended.

flatMap is not overridden in Vector, the implementation uses the new VectorBuilder which was heavily optimized and brings significant perormance benefits compared to the old vector on JVM. This seems to translate to JS too.

More concerning than the performance regression is the TypeError: this.Zd.c[c] is undefined that shows up on one of the benchmarks – @sjrd, what are your thoughts on this?

1 Like

Doesn’t Scala.js already replace some small parts of the std library with custom implementations? But maintaining a custom implementation of something as complex as Vector might not be feasible.

I don’t know enough about the implementations (old and new) of Vector to make any reasonable guess.

In general, significant performance profile differences between JVM and JS can be observed for two main reasons:

  • Preventing Scala.js from inlining and stack allocating. The JVM does this at run-time based on profiles. Scala.js can only do that at link time using static information, but when it can, it’s better at it. Sometimes a method wouldn’t be inline by the JVM, possibly preventing class allocation, but Scala.js would inline it. Making the code profile different could still not inline on the JVM but also would prevent Scala.js from inlining as well. This can result in performance drops in Scala.js that are not observable on the JVM.
  • Type tests on interfaces. They are slow on JS. Every time you do a type test with an interface type to “optimize” some code paths, it makes it worse on JS.

More concerning than the performance regression is the TypeError: this.Zd.c[c] is undefined that shows up on one of the benchmarks – @sjrd, what are your thoughts on this?

Is this only in fullOpt mode? What’s the corresponding error message in fastOpt mode? The message would more intelligible. This looks like an ArrayIndexOutOfBoundsException to me, but the fastOpt code would show that for sure (because they’re checked in fastOpt mode).

I suppose it’s not an option to have both implementations in stdblib, and maybe even somehow default JVM to one and JS to the other (but both implementations exist everywhere and can be accessed directly if needed)?

No, it’s not an option, because I don’t have the bandwidth nor knowledge to maintain an entirely different version of Vector.

1 Like

Huh! :scream: It appears the new Vector intentionally catches ArrayIndexOutOfBoundsExceptions. That is a big big no-no in Scala.js! It won’t work. AIIOBEs are undefined behavior. This is absolutely, utterly terrible.

Please don’t ship that!

Also, this can explain the performance regressions in JS, because try..catches are expensive in JS, even when they don’t actually catch anything.

1 Like

OK, let me tone that down a bit. It only catches ArrayIndexOutOfBoundsExceptions to then throw more IndexOutOfBoundsExceptions and NoSuchElementExceptions. So cases for UBE in Scala.js would throw UBEs in fastOpt (instead of IIOBE) but that can be considered acceptable. It does mean that the contract of Vector is worse in Scala.js: calling .head on an empty vector is UB, now, instead of reliably throwing NoSuchElementException. The error message of the UB is also worse because it mentions internal implementation details of an array index exception, instead of a NoSuchElementException. NoSuchElementExceptions were never considered UB before, so this is a breach of portability contract nevertheless.

It could still explain the performance regressions.

Another possibility is that mega-morphic dispatch of the new Vector is problematic on some JIT’s but less so on others. The old vector implementation specifically avoided generating subclasses for simple cases in order not to run into that problem. Maybe the JVM has gotten better at megamorphic dispatch (Graal certainly falls into that camp), but JS JITs haven’t? I am only guessing here.

1 Like

I submitted the issue https://github.com/scala/bug/issues/11920 as well as two PRs https://github.com/scala/scala/pull/8827 and https://github.com/scala/scala/pull/8828.

2 Likes

artificial intelligence’s out of body experience

Not knowing anything about JS performance tuning, this is my best guess, too. There is no exception handling involved in appended and prepended. Both are as straight-forward as they could be in the new implementation: a single megamorphic call up front (which wasn’t the case in the old implementation) and then some auxiliary calls to small methods which are all monomorphic and inlinable.

What does “undefined behaviour” mean in this case? Your further explanations seem to indicate that it reliably throws an exception but a different kind. Is this correct? Why can’t it throw the right one?

Could we have separate code paths for JVM and Scala.js? (Hey, wasn’t there some kind of preprocessor SIP for this kind of thing? :thinking:)

It looks like performance on JVM and Scala.js is very much at odds here and there is no single solution that will be fast on both. On the JVM an exception handler is essentially free but we save a few percent (in apply performance tests) for not doing the bounds checking twice.

BTW, does Scala.js honor @inline annotations? All JVM benchmarks were done with full inlining (as used by a normal Scala release build) and we rely on inlining by the Scala compiler for small methods for best performance.

That is explained at https://www.scala-js.org/doc/semantics.html#undefined-behaviors. In fastOpt mode (only), it reliably throws a org.scalajs.linker.runtime.UndefinedBehaviorError, whose getCause() is the exception that should have been thrown. This is only for debugging purposes. In fullOpt mode, however, this becomes unchecked for performance reasons, and the optimizer/compiler is allowed whatever it pleases with it (including removing the code, or let it pass without exception, or whatever really).

That increases maintenance burden and decreases testability. Every time we have a different path in the JVM and in JS, that increases the chances that a bug in one goes undetected and that libraries are not portable.

Scala.js honors @inline, even more than Scala/JVM. And that is always, not just using some flags. Our optimizer is better than scalac’s optimizer, because we always have a closed world assumption.

3 Likes

The implementation was now changed to no longer catch ArrayIndexOOBE (#8827 and #8829). It would be really nice to check if that fixes the TypeError and changes the performance (bounds checks might be cheaper than exception handlers on JS, according to @sjrd).

@japgolly could you maybe do the magic again? The current version number is 2.13.2-bin-ca30256.

@lrytz no worries, updated! https://japgolly.github.io/scalajs-benchmark

Also did we find out whether v :+ a / a +: v experience a big performance regression on the JVM too?

2 Likes

@japgolly Could you double-check that the website was indeed updated? It still mentions the commit b4428c8, which was before the recent changes?

@sjrd Try a force-reload. I think Github Pages doesn’t do caching properly. I can see ca30256 when I view it.