I am not sure if this has been mentioned/talked about elsewhere, but as of now the inlining optimizer is only available for Scala 2.12/2.13. While Dotty/Scala 3 has the new `inline` keyword, this is only for manual optimization, whereas the advantage of the Scala 2.12/2.13 optimizer is that it goes through the entire codebase looking for cases where inlining makes sense (e.g. megamorphic calls) without developers having to manually figure out what needs to be inlined and what doesn't.
Are there any plans to add this functionality to Scala 3 and if so is there a chance that it would be backported to Scala 3.3 LTS?
I’m far from an expert in this area, just curious to understand: in the docs you linked, under “Motivation” there is a link to a 2015 talk video by @lrytz outlining the “problem”, which itself references older blogs going back to 2011. It certainly sounds like a pretty compelling need, based on Lukas’s explanation.
Approximately, the problem is with highly “megamorphic” virtual calls, like `Function1.apply`, that invoke tiny snippets of code in hot loops. If these were inlined, the virtual-call overhead could be eliminated.
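A minimal sketch (my own, not from the linked talk) of the kind of callsite being discussed: `f(xs(i))` below is a virtual `Function1.apply` call in a hot loop, and once it has seen several different lambda classes the JIT can no longer devirtualize and inline it.

```scala
object MegamorphicDemo {
  // Hot loop with a virtual Function1.apply call at its core.
  def mapSum(xs: Array[Int], f: Int => Int): Int = {
    var acc = 0
    var i = 0
    while (i < xs.length) {
      acc += f(xs(i)) // megamorphic once several distinct f classes are seen
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val xs = Array.tabulate(1000)(identity)
    // Three different lambda classes make the apply callsite megamorphic:
    println(mapSum(xs, _ + 1))
    println(mapSum(xs, _ * 2))
    println(mapSum(xs, x => x))
  }
}
```

If the optimizer can inline each lambda body into a copy of the loop, every callsite becomes monomorphic and the dispatch overhead disappears.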
I’m just wanting to confirm whether the need for inline still exists a decade later, or whether improvements in the runtime mean the problem is less acute now?
GraalVM (I mean the JVM with the new JIT implementation written in Java, not Graal Native Image) is a lot better than HotSpot at finding inlining chains that eliminate megamorphic callsites. These improvements are partially available in the Community Edition (CE), and fully available only in the Enterprise Edition (EE).
The 2.12 / 2.13 optimizer implementation is independent of the compiler’s frontend representation (Types / Symbols / ASTs); it works on the bytecode representation. Some metadata that is useful for the optimizer but not representable in the bytecode is stored in classfiles as a separate attribute (`InlineInfoAttribute`). This allows the optimizer to work with classfiles without looking at the “pickle”, the serialized symbol table (whose analogue in Scala 3 is Tasty).
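For context, the 2.13 optimizer is opt-in via `-opt` flags. A hypothetical minimal `build.sbt` fragment enabling it might look like this (the flag spellings below are the ones documented for 2.13; they have been revised across 2.13.x releases, so it is worth checking them against your exact compiler version):

```scala
// build.sbt — sketch, not an authoritative configuration
scalaVersion := "2.13.14"
scalacOptions ++= Seq(
  "-opt:l:inline",            // enable the inliner and related optimizations
  "-opt-inline-from:<sources>" // only inline from classes compiled in this run
)
```

Widening `-opt-inline-from` to library patterns is possible, but then the inlined library code is baked into your classfiles, which matters for binary compatibility.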
This setup would allow adding the optimizer to the Scala 3 compiler relatively easily.
The code could also be used as a basis for implementing a compiler-independent optimizer (linker). I believe this would be a better way forward, because a linker can make a closed-world assumption. This makes many more methods effectively final (methods that don’t have any overrides in the code being optimized), which enables inlining. Unused code and classes can also be removed, which in turn makes more methods effectively final.
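A sketch of why the closed-world assumption matters (my own illustration, with hypothetical class names): a compiler that sees only the file below must keep `_.area` as a virtual call, since downstream code could add another `Shape` subclass. A linker that sees the whole program can prove `Circle` is the only implementation, devirtualize the call, and then inline it.

```scala
// Open-world: Shape.area could be overridden anywhere, so the call is virtual.
// Closed-world: with Circle proven to be the only subclass, area is
// effectively final and can be devirtualized and inlined.
abstract class Shape { def area: Double }
final class Circle(r: Double) extends Shape {
  def area: Double = math.Pi * r * r
}

object LinkerDemo {
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum // monomorphic if Circle is the only impl that survives linking

  def main(args: Array[String]): Unit =
    println(totalArea(List(new Circle(1.0), new Circle(2.0))))
}
```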
For Scala 3, a linker based on Tasty is also a viable option.
I do recall that the new GraalVM JIT is a lot better at detecting megamorphic callsites, but I don’t think it’s wise to rely on EE, because that doesn’t send the best message.
Ideally the Scala 3 compiler should produce the most optimal bytecode for the platforms it supports, which for the JVM I would presume means OpenJDK/GraalVM CE.
Porting the Scala 2 optimizer is on our to-do list. Nothing is scheduled yet; we are still at the stage of looking for someone who could do it.
This whole-world optimizer and linker looked quite promising, but I’ve never heard about it again. Almost like it evaporated into thin air…
I guess something like that could be really interesting in places where performance is king, and where you usually can’t use any abstractions because of that. Such a powerful optimizer would let one write high-level code even in an HPC context. Scala was once¹ targeted explicitly at HPC applications, but Scala could only become a first-class choice in this field if abstractions were free.
Zero-cost abstractions are one of Rust’s strongest selling points. I’ve heard rumors that people are leaving Scala for Rust… So embracing and extending Rust’s current advantages would be a smart move!
To my great enjoyment, I’ve also heard Scala is going to take inspiration from the much-praised error handling in Rust. That’s so exciting; can’t wait for the results. Wise decision! But why not take some more inspiration from Rust’s good parts and its selling points?
I am really looking forward to a Scala with zero-cost FP capabilities! Such a language is long overdue.
It’s time for disruptive innovation. Scala should lead once more on that ground too.
That is welcome news. I recall, when experimenting with Graal EE (EE specifically, and only EE), that it seemed to eliminate much of the performance impact arising from the boxing of primitives. Presumably that runtime’s optimizations are able to “unbox” primitives in many cases and avoid heap allocation.
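A small illustration (mine, not from the thread) of where that boxing comes from: a generic type parameter erases to `Object` on the JVM, so passing an `Int` through it forces a `java.lang.Integer` allocation unless the JIT’s escape analysis (or `@specialized` code paths in the library) removes it.

```scala
object BoxingDemo {
  // Erased signature is (Object, Object) => Object: each Int argument is
  // boxed to java.lang.Integer at the callsite.
  def maxGeneric[T](a: T, b: T)(implicit ord: Ordering[T]): T =
    if (ord.gteq(a, b)) a else b

  // Monomorphic Int signature: no boxing at all, regardless of the JIT.
  def maxInt(a: Int, b: Int): Int =
    if (a >= b) a else b
}
```

The two return the same results; the difference is only whether the JIT has to work (via escape analysis and scalar replacement) to recover the allocation-free behavior that `maxInt` gets for free.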