Classfile Parser doesn't skip / unlink `isScalaRaw` classes

Compiler internals question: for classfiles with a Scala signature, the classfile parser doesn’t actually parse the classfile content but just unpickles the signature (see here). When nested classes are defined, only the top-level class gets a Scala signature; nested symbols are created from there. The classfiles for nested classes get a “Scala” marker attribute instead.

In the classfile parser, `isScala` means the classfile has a Scala signature; `isScalaRaw` means it carries the “Scala” marker attribute instead.

I’m wondering why we don’t skip over isScalaRaw classfiles and unlink their symbols from the scope. Instead, the content of the classfile (parents, fields, methods) is parsed and symbols are created, like for a java-defined classfile.
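
To make the distinction concrete, here is a minimal toy model of the three cases. It is a sketch, not the compiler's code: `ClassfileKind`, `classify` and `shouldParseMembers` are hypothetical names, and the attribute name `"ScalaSig"` for the signature case is an assumption on my part (the marker attribute is `"Scala"`, as described above).

```scala
// Toy model only; none of these names are compiler API.
sealed trait ClassfileKind
case object ScalaPickled extends ClassfileKind // has a Scala signature: unpickle it
case object ScalaRaw extends ClassfileKind     // has the "Scala" marker attribute
case object JavaDefined extends ClassfileKind  // neither: parse the classfile content

object ClassfileKindSketch {
  // Assumption: the signature case is detected via an attribute named "ScalaSig".
  def classify(attributeNames: Set[String]): ClassfileKind =
    if (attributeNames.contains("ScalaSig")) ScalaPickled
    else if (attributeNames.contains("Scala")) ScalaRaw
    else JavaDefined

  // skipRaw = false models the behaviour described above (raw classfiles are
  // parsed like Java ones); skipRaw = true models the proposal to skip them.
  def shouldParseMembers(kind: ClassfileKind, skipRaw: Boolean): Boolean =
    kind match {
      case ScalaPickled => false
      case ScalaRaw     => !skipRaw
      case JavaDefined  => true
    }
}
```

With `skipRaw = false`, a classfile like `B$C.class` ends up with its parents, fields and methods parsed, which is where the second symbol for the same class comes from.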

In 2.11 you can do this:

```
$ cat B.scala
class B { class C }
$ scalac11 B.scala
$ scala11
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).

scala> val bc = new B$C(new B)
bc: B$C = B$C@6504e3b2

scala> val c: B#C = bc
<console>:12: error: type mismatch;
 found   : B$C
 required: B#C
       val c: B#C = bc
                    ^
```

So there are two incompatible symbols for the same class.

In 2.12 an assertion in the backend fails:

```
scala> new B$C(new B)
java.lang.AssertionError: assertion failed: List(LB$C;)
```

I (very) vaguely remember that there’s a connection between this and the 2.11 optimizer (or specialization?), but I haven’t been able to find it so far – maybe it rings a bell for @dragos?

I observed that many specialized classfiles (`Tuple2$mcZZ$sp` etc.) are completed in this way during specialization (see here), so maybe we could save some cycles there.

I also vaguely remember that I tried that, but the old optimizer failed. In the old optimizer we needed to match bytecode methods and classes back to symbols, and sometimes the heuristics weren’t able to undo name mangling, especially for impl classes, and particularly on second-step inlining (inline once from bytecode, say, foreach, then inline a second time: now all you have is bytecode names). I guess the new optimizer doesn’t have this limitation, so I think it’s a worthwhile optimization/sanity check.

When would you unlink them? I wonder whether unlinking them at the point where the top-level class is completed would miss some code that has already used the mangled symbols. Probably not a huge problem: it would mean user code mentioned $ symbols, which are off-limits anyway.
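
The timing concern can be illustrated with a toy scope. This is only a sketch under assumptions: `ToyScope` and `UnlinkTimingSketch` are made-up names and have nothing to do with the compiler's real `Scope` or completion machinery.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a package scope: name -> where the entry came from.
final class ToyScope {
  private val entries = mutable.LinkedHashMap.empty[String, String]
  def enter(name: String, origin: String): Unit = entries(name) = origin
  def lookup(name: String): Option[String] = entries.get(name)
  def unlink(name: String): Unit = { entries.remove(name); () }
}

object UnlinkTimingSketch {
  // Returns what a lookup of the mangled name sees before and after the
  // (simulated) completion of the top-level class B.
  def run(): (Option[String], Option[String]) = {
    val scope = new ToyScope
    scope.enter("B", "classfile B.class")
    scope.enter("B$C", "classfile B$C.class")

    // Code that asks for B$C before B is completed still finds the entry.
    val early = scope.lookup("B$C")

    // Simulated completion of B: C is created from B's signature, and the
    // mangled top-level entry is unlinked at that point.
    scope.unlink("B$C")
    val late = scope.lookup("B$C")

    (early, late)
  }
}
```

Anything that grabbed the entry during the `early` window keeps referring to the mangled symbol, which is the case that unlinking at completion time would miss.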

Regarding specialization, indeed, it’s surprising it would complete those symbols. I think it would be good to see why it completes so much.

Fixed in https://github.com/scala/scala/pull/5952. For reference, the reason why specialized classfiles get completed is explained in this comment: https://github.com/scala/scala/pull/5952#issuecomment-311057607