Scala Signature Layout

#1

Hello Scala community,

We recently discovered that it is currently impossible to shade and relocate Scala libraries. We have tried tools for Gradle and sbt.

We found out that this problem is caused by the ScalaSignature and ScalaLongSignature annotations that the Scala compiler adds to class files. None of the shading libraries we found seem to be aware of them.

Since we are in dire need of a working shading/relocation method, we set out to write a tool that fixes these annotations. We found prior work in https://github.com/hutkev/ScalaShade. However, ScalaShade is written in Java and uses none of the existing Scala tooling such as Scalap.

We are now looking for information and documentation about the exact format of the ScalaSignature annotations. We think we know how to parse and possibly transform some of the signature elements, but we do not know what all of them represent. This makes it hard for us to be confident in our tool's ability to solve the problem in a general and maintainable way.

The only documentation we found so far is scala.reflect.internal.pickling.PickleFormat as well as this document.

Any help in finding information would be highly appreciated.
Our current progress can be tracked here: https://github.com/opencypher/morpheus/pull/913

Thanks in advance!
Best,
Max Kießling for the openCypher Morpheus Team

#2

Hi Max

The documentation is not extensive. PickleFormat is the main one, along with the Pickler / UnPickler and scalap source code, of course. The PDF is a good high-level overview.

The signatures represent the compiler-internal data structures, which are not thoroughly documented either.

Don’t hesitate to ask specific questions, either on this forum or on the https://gitter.im/scala/contributors gitter, where many compiler hackers hang out.

I’d be interested to understand the use case in more detail. Why is it important to have shaded Scala signatures? @adriaanm suggested on a different channel that it might be enough to use a Java shading tool and strip out the Scala signatures.

Lukas

#3

Hej Lukas,

Thanks a lot for your reply.

The high-level reason we want to do the relocation is that we are in the process of bringing a new module into the Apache Spark project. The new module introduces some new Scala dependencies of which the Spark committers are not very fond. They suggested that we build a shaded and relocated fat jar that contains all our dependencies. The discussion can be found here: https://github.com/apache/spark/pull/24490

We already considered the approach you mentioned above, but are not sure what its consequences would be. Some parts of the code that would be included in the shaded jar use reflection a lot, and we also have a couple of package objects. As far as we understand, stripping the signatures would cause issues with both the reflection and the package objects, right?

We are currently having a hard time understanding both the content of the annotation and how the payload of the entries in the table is encoded.
The content seems to be some kind of tree structure.
Is there any pointer towards something that could explain that structure?

Best, Max

#4

Is that Scala reflection (scala.reflect) or Java reflection?

Java reflection is not affected. I don’t have much experience with (Java) shading, so I don’t know how to handle Java reflection when shading a dependency. For package objects, I don’t see any particular issue offhand.

I don’t have any other resources at hand than what we discussed before…

Lukas

#5

Hej Lukas,

We have found a workaround to our problem. In our shaded jar we only relocate the dependencies of the library we want to expose. We discovered that if we make sure never to expose classes from those relocated dependencies in the library's API, the library can be safely used in other projects. This seems to be good enough to solve our problem for now.

However, I still wanted to share our findings, as they might be interesting for others.

Using a patched version of ScalaShade we were able to repair most of the signatures. This version of ScalaShade currently repairs all ExtClassModRef entries, as well as some Constants. For a fully repaired Signature it would at least be necessary to also repair all ExtRef entries and there might be others that we did not catch.
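
For readers who want to see what "entries" means here, the following is a minimal sketch of walking the entry table, based on our reading of the grammar in PickleFormat and of the compiler's PickleBuffer. It assumes the annotation string has already been decoded into raw pickle bytes (that decoding step is not shown), and the class and method names (`PickleTable`, `readEntries`) are made up for this illustration. They are not ScalaShade's or the compiler's API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: walks the entry table of an already decoded pickle.
final class PickleTable {
    // Tag values as listed in PickleFormat.
    static final int TERMname = 1, TYPEname = 2, EXTref = 9, EXTMODCLASSref = 10;

    // One table entry: its tag plus where its payload sits in the byte array.
    static final class Entry {
        final int tag;
        final int offset;
        final int length;
        Entry(int tag, int offset, int length) {
            this.tag = tag;
            this.offset = offset;
            this.length = length;
        }
    }

    private final byte[] bytes;
    private int pos;

    private PickleTable(byte[] bytes) { this.bytes = bytes; }

    // A Nat is big-endian base 128; every byte except the last has its high bit set.
    private int readNat() {
        int value = 0;
        int b;
        do {
            b = bytes[pos++] & 0xff;
            value = (value << 7) | (b & 0x7f);
        } while ((b & 0x80) != 0);
        return value;
    }

    // Layout assumed: majorVersion_Nat minorVersion_Nat nentries_Nat {tag_Byte len_Nat payload}
    static List<Entry> readEntries(byte[] decodedPickle) {
        PickleTable in = new PickleTable(decodedPickle);
        in.readNat(); // major version, ignored in this sketch
        in.readNat(); // minor version, ignored in this sketch
        int count = in.readNat();
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int tag = in.bytes[in.pos++] & 0xff;
            int length = in.readNat();
            entries.add(new Entry(tag, in.pos, length));
            in.pos += length; // skip the payload; a repair tool would inspect it here
        }
        return entries;
    }
}
```

As far as we can tell, an entry such as EXTMODCLASSref does not contain the name itself but a reference (an index into this table) to a name entry, plus an optional owner reference. A repair tool therefore has to rewrite the referenced name entries and rebuild the table, since changing a name changes the entry lengths.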

However, during our experiment we discovered that repairing the Scala signatures alone is not enough to reconstruct a fully functioning class after relocation. In some cases (e.g. for traits) the Scala compiler generates methods with names based on the package and trait name. These methods are not properly relocated. If a project then tries to use or extend a relocated class with such a method, the Scala compiler tries to access the method based on the relocated package name, which fails because the method name is still based on the old package. To fix this, the shading plugin would have to rename all of these generated methods and all of their references.
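
To illustrate the renaming an improved relocator would need, here is a small sketch. It assumes the generated names start with the fully qualified owner name with dots replaced by `$` (for example `org$example$Greeter$$name` for a private member of a trait `org.example.Greeter`). The exact name shapes depend on the compiler version, and `ExpandedNameRelocator` and the example packages are hypothetical names used only for this illustration.

```java
// Illustrative only: rewrites the package prefix embedded in a compiler-generated member name.
final class ExpandedNameRelocator {
    static String relocate(String memberName, String oldPackage, String newPackage) {
        // The package appears in the member name with '.' replaced by '$'.
        // The trailing '$' keeps "org.example" from matching "org$examples$...".
        String oldPrefix = oldPackage.replace('.', '$') + "$";
        String newPrefix = newPackage.replace('.', '$') + "$";
        if (memberName.startsWith(oldPrefix)) {
            return newPrefix + memberName.substring(oldPrefix.length());
        }
        return memberName; // ordinary names are left untouched
    }
}
```

For example, `ExpandedNameRelocator.relocate("org$example$Greeter$$name", "org.example", "shaded.org.example")` yields `shaded$org$example$Greeter$$name`. A real relocator would have to apply the same rewrite to every method declaration and every call site in the bytecode, which is the part existing shade plugins do not do.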

In conclusion, proper Scala shading with relocation appears to be possible but requires considerable effort. ScalaShade seems to be a good start. We think it would be possible to integrate ScalaShade into existing shade plugins, together with an improved relocator that is aware of the methods generated by the Scala compiler.

Best, Max

#6

Usually you shade dependencies when you package an application—not a library—in which case you don’t run into such problems.