JVM impovements to type checking

Hi, this is my first message to the Scala community.

I’ve been working to improve the runtime performance of the JVM instanceof. I don’t know if Scala utilizes instanceof interface checks for runtime checking, so this post may be irrelevant. However, my first PR changes instanceof checks from O(n) to nearly O(1). Apart from anything else, this can dramatically improve the performance of Java pattern matching switch statements.

So, my question: does this affect Scala? Do you use Java interfaces at all, and do you use instanceof checks on them? One limitation that might be relevant is that my changes so far only benefit classes that implement fewer than 64 interfaces.

13 Likes

Hey thanks for posting, this sounds great!

As for instanceof, A lot of scala code uses the pattern matching idiom, e.g. to break down a scala.List into either :: (cons cell) or Nil (empty list), which is based upon instanceof at runtime. Typically these are class types rather than interfaces, but I imagine there may be a lot of people matching on various mixin interfaces.

e.g. here we can do a different algorithm on a sequence collection by inspecting its runtime type.

def handle(col: Seq[Person]) = col match
  case col: LinearSeq[Person] => // do some linear optimised algorithm
  case col: IndexedSeq[Person] => // do some random-access optimised algorithm

Some more details:

  • Scala traits are always compiled to JVM interfaces, so yes, we do use them.
  • While x instanceof Intf may not be used so much, erasure means that (Intf) x (check_cast) is quite common. If check_cast needs the same run-time type test as instanceof (which I assume to be true), then any improvement directly affects Scala.

Yes, checkcast does use the same logic as instanceof. I did hear that sometimes Scala code uses many interfaces.
The old instance check was horrible once you had 10 or more interfaces. The new one is super-fast for 20 or so interfaces, a little bit slower for 40, and performance tails off after that. I’m guessing that a larger number of traits or interfaces is rare.

20 interfaces is pretty much the common case in the standard collections library. :sweat_smile:

1 Like

20 interfaces is fine. If the number gets way above 40, performance is going to suck. If that’s unlikely or at least really unusual, it’ll be fine.

Thanks for sharing this, will this be backported to java 21? IIRC, In Netty, there are many instanceOf ,eg for Http2Frame testing.

I’m a bit late to this thread, but I have a few historical and current items to add.

A few years back, we noticed some performance issues with instanceof checks for interfaces in tight loops in the Scala collection library and worked around by switching to a virtual call.

The original performance bug and analysis was reported by the Spark team in: Major performance bottleneck in scala.collection.mutable.Builder · Issue #9823 · scala/bug · GitHub

We eventually linked this back to: https://bugs.openjdk.org/browse/JDK-8180450 .

I assume this is the bug fix you’re referring to? 8180450: secondary_super_cache does not scale well by theRealAph · Pull Request #18309 · openjdk/jdk · GitHub

I’m currently helping a customer with a Java 25 upgrade and they are have noticed an unexplained slowdown of a workload (~5%) that invokes the Scala Compiler in an concurrent setting. We haven’t reproduced it in small standalone workload yet. The compiler implementation is likely sensitive to any changes in instanceof as many hot methods are implemented with pattern matching that is compiled into chains of instanceof.

I’ll report back here as I learn more. Hopefully we can re-run the workload in question in an JDK build with this patch reverted so as to test if is a factor.

2 Likes

Here are secondary_super (transitively implemented interfaces) list lengths a variety of standard collections.

scala> println(concreteCollections.map(x => (x, getAllInterfaces(x).toList.length)).sortBy(_._2).mkString("\n"))
(interface scala.collection.SeqView,8)
(class scala.collection.SeqView$Map,9)
(class scala.collection.SeqView$Take,9)
(class scala.collection.SeqView$Drop,9)
(interface scala.collection.immutable.Set,11)
(interface scala.collection.immutable.Map,13)
(class scala.collection.immutable.HashSet,17)
(class scala.collection.immutable.LazyList,18)
(class scala.collection.immutable.HashMap,19)
(class scala.collection.mutable.LinkedHashSet,21)
(class scala.collection.immutable.Vector,22)
(class scala.collection.immutable.List,23)
(class scala.collection.immutable.TreeSet,25)
(class scala.collection.mutable.ListBuffer,25)
(class scala.collection.mutable.LinkedHashMap,26)
(class scala.collection.immutable.TreeMap,27)
(class scala.collection.mutable.ArrayBuffer,28)
(class scala.collection.mutable.Queue,29)
(class scala.collection.mutable.Stack,29)
(class scala.collection.mutable.TreeMap,31)

I see the change has a few feature toggles under -XX:+UnlockDiagnosticVMOptions. That will make it easier to test if the workload is sensitive to this change, thanks!

❯ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal | grep Second
     bool ErrorLogSecondaryErrorDetails            = false                                  {diagnostic} {default}
     bool InlineSecondarySupersTest                = true                                {C2 diagnostic} {default}
     bool StressSecondarySupers                    = false                                  {diagnostic} {default}
     bool UseSecondarySupersCache                  = true                                   {diagnostic} {default}
     bool UseSecondarySupersTable                  = true                                   {diagnostic} {default}
     bool VerifySecondarySupers                    = false                                  {diagnostic} {default}

I’ve also analyzed the longest secondary_super lists in the Scala 2 compiler implementation. There are indeed some long ones due to the use of the Cake Pattern. The Global class that ties all the components together has more than 64 transitive interfaces.

Another language feature (self types) results in synthetic checkcast of from Global to an interface among this long list.

Boiled down:

scala> trait T1 { self: Logging => def m = log("foo") }; trait Logging { def log(a: Any) = ()}; class Global extends Logging with T1 /* and many more */;
trait T1
trait Logging
class Global

scala> :javap T1#m
  public default void m();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: checkcast     #12                 // class $line7/$read$$iw$Logging
         4: ldc           #24                 // String foo
         6: invokeinterface #28,  2           // InterfaceMethod $line7/$read$$iw$Logging.log:(Ljava/lang/Object;)V
        11: return

I’m modifying GitHub - RedHatPerf/type-pollution-agent to profile such type checks over long secondary_super lists to try to pin them down to hot methods in the compiler.

As I understand from your comments and code, >64 will fall into:

    // FIXME: We could do something smarter here, maybe a vectorized
    // comparison or a binary search, but is that worth any added
    // complexity?

And 25-64 will pay some increasing extra cost for probing on collisions.

Unfortunately the ByteBuddy transform in the agent is corrupting the compiler so it fails with an access error before it starts real work.

But some code is run which produces the trace:

--------------------------
Excessive Secondary Supers:
--------------------------
1:	scala.tools.nsc.Global
Count:	46351
Types:
	scala.tools.nsc.Global
Traces:
	scala.reflect.internal.Names$TermName.scala$reflect$internal$Names$TermName$$$outer(Names.scala:570)
		class: scala.tools.nsc.Global
		count: 31138
	scala.reflect.internal.StdAttachments$Attachable.$init$(StdAttachments.scala:24)
		class: scala.tools.nsc.Global
		count: 7932
	scala.reflect.internal.Names$TypeName.scala$reflect$internal$Names$TypeName$$$outer(Names.scala:612)
		class: scala.tools.nsc.Global
		count: 6852
	scala.reflect.internal.Names$TermName$.scala$reflect$internal$Names$TermName$$$outer(Names.scala:607)
		class: scala.tools.nsc.Global
		count: 121
...

Update: I’m now fairy certain that Scala 2 compiler is about 5% slower due do this change. This seems to be worked around by -XX:+UnlockDiagnosticVMOptions -XX:-UseSecondarySupersTable.

Here’s a standalone, Java implemented benchmark that shows the essential pattern of code in scalac that results in hot-path checkcast:

Scala 3 implementation uses a different overall design pattern so doesn’t have classes with such long transitive interface lists, and as such is unaffected.

3 Likes