SCP-009: Analysis of transitive dependencies and stub errors

jvican · February 20, 2017, 7:24pm

This document summarizes a problem analysis carried out prior to our implementation of the Scala Center proposal “Improve user experience for builds that use only direct dependencies”, submitted by Stu Hood and Eugene Burmako. It explains the problem in general terms, introduces the Scala Center’s current strategy and describes our next steps.

Our current focus

The goal is to help developers reduce their compilation classpaths by removing unnecessary compile-time dependencies. The original proposal introduces several ways to achieve this. At the last Advisory Board meeting, the board approved the third strategy: a compiler flag that requires import statements for all symbols used during compilation (including those not otherwise mentioned in the source).

While finding the minimum set of classpath entries required for a successful compilation is theoretically feasible, the work and time it requires may be overwhelming. Is there an easier approach?

Instead of creating a compiler flag that hints at the minimal direct classpath, we can turn the problem around and find this minimal symbol context iteratively, by emitting warnings for unused classpath entries. Developers first put all the resolved transitive dependencies on the classpath, and the Scala compiler then tells them which entries should be removed. This approach is simpler and less time-consuming: it requires fewer fundamental changes to the current compiler structure, and is hence easier to get right. We have tested this idea in the following prototype.
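To make the idea concrete, here is a minimal Scala sketch of the warning logic, written as a script. It assumes the compiler can report which classes it actually loaded while compiling X; `ClasspathEntry` and `unusedEntries` are hypothetical names chosen for illustration, not part of the prototype.

```scala
// Hypothetical model: a classpath entry is an artifact name plus the fully
// qualified names of the classes it provides.
final case class ClasspathEntry(name: String, classes: Set[String])

// Returns the entries that provided none of the classes the compiler loaded.
def unusedEntries(classpath: List[ClasspathEntry], loaded: Set[String]): List[ClasspathEntry] = {
  // Lookup is ordered: a class is read from the first entry that provides it,
  // so a later entry defining the same class does not count as used.
  val used: Set[ClasspathEntry] =
    loaded.flatMap(cls => classpath.find(_.classes.contains(cls)))
  classpath.filterNot(used.contains)
}

// Start from the full transitive classpath.
val transitive = List(
  ClasspathEntry("y.jar", Set("lib.Y")),
  ClasspathEntry("z.jar", Set("lib.Z")),
  ClasspathEntry("w.jar", Set("other.W"))
)

// Suppose compiling X loaded Y directly and Z through Y's signature.
val loaded = Set("lib.Y", "lib.Z")

for (entry <- unusedEntries(transitive, loaded))
  println(s"warning: classpath entry ${entry.name} is unused")
// prints: warning: classpath entry w.jar is unused
```

Following the iterative workflow, a developer would drop w.jar, recompile and repeat until no warnings remain. The first-match rule mirrors ordinary JVM classpath lookup, where an entry providing an already found class is shadowed.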

The cause of stub errors

Stub errors originate at symbol creation, in an independent subsystem of the compiler: the unpickler. When a Scala class X depends on an already compiled class Y, the Scala compiler creates symbols for Y’s definitions from the pickled Scala signature stored in Y’s class file. For the compiler to identify these definitions successfully, they have to be defined in an artifact on the classpath.

The pickled Scala signature of Y contains all of Y’s own symbol information. For instance, it stores the fully qualified name, the originating tree, the annotations, the symbol type and the owner. However, information about Y’s external dependencies is stored elsewhere. If Y depends on class Z, Y’s pickled signature points to Z’s class file so that the compiler can also fetch Z’s symbol information.
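The following simplified Scala model, written as a script, illustrates this split: Y’s pickle records its own members inline but only names Z, and that name can be resolved only if Z’s artifact is on the classpath. `PickledSignature`, `ExternalRef` and `resolve` are hypothetical stand-ins, not the compiler’s actual pickle format or API.

```scala
// Hypothetical, heavily simplified model of a pickled Scala signature.
// Y's own information is stored inline; a symbol it depends on, such as Z,
// is stored only as a reference by fully qualified name.
final case class ExternalRef(fullName: String)

final case class PickledSignature(
    fullName: String,
    owner: String,
    members: Map[String, ExternalRef] // member name -> type the member refers to
)

// Hypothetical classpath: fully qualified name -> pickle found on it.
type Classpath = Map[String, PickledSignature]

// Following a reference only succeeds if the referenced class is on the classpath.
def resolve(ref: ExternalRef, cp: Classpath): Option[PickledSignature] =
  cp.get(ref.fullName)

val zSig = PickledSignature("lib.Z", "lib", Map.empty)
val ySig = PickledSignature("lib.Y", "lib", Map("make" -> ExternalRef("lib.Z")))

val transitiveCp: Classpath = Map("lib.Y" -> ySig, "lib.Z" -> zSig)
val directCp: Classpath = Map("lib.Y" -> ySig) // Z's artifact left out

val withZ = resolve(ySig.members("make"), transitiveCp)  // Some(zSig)
val withoutZ = resolve(ySig.members("make"), directCp)   // None
```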

But what happens if Z’s class file is not found on the classpath and its symbol cannot be created? Instead of failing eagerly, the compiler creates a lazy stub symbol containing a predefined error message. Only when the compiler inspects Z’s symbol (forcing the lazy evaluation) is an exception thrown. This exception is commonly called a stub error. Unfortunately, stub errors do not report precise information about the error because:

Y is a symbol from a compiled binary, not from a source;
The Scala compiler does not store the creation context of stub symbols. In particular, it doesn’t know (1) why the unpickling of Y required Z’s symbol, nor (2) the originating source position in X that triggered Y’s and Z’s symbol creation.
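A toy Scala model of this behaviour, written as a script and assuming type information is represented by a plain string: creating a symbol for a missing class succeeds and yields a stub, and only reading the stub’s information throws. The names `Sym`, `StubSym` and `load`, as well as the error wording, are invented for illustration and do not reproduce scalac’s internals or messages.

```scala
// Hypothetical model of symbols whose type information is only
// inspected when the compiler asks for it.
final class StubError(msg: String) extends RuntimeException(msg)

sealed trait Sym {
  def fullName: String
  def info: String // stands in for the symbol's type information
}

final case class LoadedSym(fullName: String, info: String) extends Sym

// A stub records only the missing name: nothing about which definition of Y
// referenced it, nor which position in X caused Y to be loaded.
final class StubSym(val fullName: String) extends Sym {
  def info: String =
    throw new StubError(s"Symbol '$fullName' is missing from the classpath.")
}

// Creating a symbol never fails: a missing class yields a stub instead.
def load(fullName: String, classpath: Map[String, String]): Sym =
  classpath.get(fullName) match {
    case Some(info) => LoadedSym(fullName, info)
    case None       => new StubSym(fullName)
  }

val classpath = Map("lib.Y" -> "class Y extends lib.Z")
val y = load("lib.Y", classpath) // loads normally
val z = load("lib.Z", classpath) // no error yet, just a stub

// Only forcing the stub's info throws the (context-free) stub error.
val error: Option[String] =
  try { z.info; None }
  catch { case e: StubError => Some(e.getMessage) }
```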

When do stub errors happen?

Stub errors happen whenever X requires or uses symbols of Y that depend on Z. This applies to the following situations:

Z is part of a type bound (either lower or upper)
Z is the type of a parameter
Z is an implicit class parameter (T: Z)
- This is a corollary of the second case, because implicit parameters are desugared into normal parameters.
Z is a parent of Y (Y extends Z)

In addition, variables whose type is Z and functions whose return type is Z fail with stub errors when they are used in X. Note also that the scoping rules of Y and its definitions determine whether Z’s symbol is needed at all: if X is not allowed to use a method of Y, its use is rejected, the lazy evaluation is never forced, and the misuse is reported before any stub error can arise.
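The Scala sketch below, written as a script, gathers in a single file for brevity the positions where Z ends up in Y’s public signature, including Z as a parent of Y and Z as a return or value type. In the real scenario Z and Y live in separately compiled artifacts and X is compiled with only Y’s artifact on its classpath; this file therefore compiles and runs normally, and shows the shapes that would force Z’s stub rather than the stub error itself. All names are illustrative.

```scala
// Stands in for the artifact that defines Z.
class Z

// Stands in for the artifact that defines Y, compiled with Z available.
// Every commented member below mentions Z in Y's public signature.
class Y extends Z {                          // Z as a parent of Y
  def upper[T <: Z](t: T): T = t             // Z in an upper type bound
  def lower[T >: Z](t: T): T = t             // Z in a lower type bound
  def take(z: Z): Int = 1                    // Z as a parameter type
  def withImplicit(implicit z: Z): Int = 2   // Z as an implicit parameter
  def make: Z = new Z                        // Z as a return type
  val current: Z = new Z                     // Z as the type of a value
}

// In X: expressions like these need Z's symbol, so with Z missing from
// X's classpath they would force the stub and fail to compile.
val made = new Y().make
val bounded = new Y().upper(new Y)
```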

References to Z that do not appear in the public signature of class Y do not produce stub errors. This includes the following scenarios:

The use of Z inside the body of a constructor, variable or method.
Nested class definitions that are not used in X (same for objects).
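By contrast, in this Scala sketch (a script, in one file for brevity) Z appears only in a method body, in a constructor expression and in a nested class that X never uses, so none of X’s uses of Y should need Z’s symbol at compile time. In the scenario described, Z would be absent from X’s compile-time classpath; the class names are illustrative.

```scala
class Z { def value: Int = 42 }

class Y {
  // Z is only used inside the method body; Y's signature records just
  // `def answer: Int`, so calling `answer` from X does not need Z.
  def answer: Int = new Z().value

  // Z is only used while the constructor runs; the field's type is Int.
  val initial: Int = { val z = new Z; z.value + 1 }

  // A nested class that mentions Z: harmless as long as X never uses it.
  class Helper(z: Z) { def twice: Int = z.value * 2 }
}

// X's view of Y: nothing here refers to Z or Helper.
val y = new Y
val answer = y.answer
val initial = y.initial
```

This only concerns the compile-time classpath: running X still requires Z on the runtime classpath, because Y’s bytecode instantiates it.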

It is important to note that if Y is an abstract class or trait that X instantiates with new Y {}, it will always trigger a failure: the Scala compiler needs to replicate Y’s class definition in X in order to create an anonymous subclass.
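A minimal Scala sketch of this case, written as a script in one file for brevity: Y is an abstract class whose signature mentions Z, and X creates an anonymous subclass of it. The names are illustrative; with Z absent from X’s classpath, this is the pattern that, per the observation above, triggers a failure.

```scala
class Z

// Y is abstract and compiled against Z.
abstract class Y {
  def name: String                   // abstract member X must implement
  def describe(z: Z): String = name  // Z appears in a member signature
}

// In X: the anonymous subclass. To check it (for instance, that every
// abstract member of Y is implemented), the compiler looks at the
// signatures of Y's members, including `describe`, which mentions Z.
val y: Y = new Y { def name: String = "anonymous" }
```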

Next steps

Improving developer experience and helping companies compile Scala projects in a distributed fashion is an important use case. Following this analysis, we have decided to change the Scala compiler to:

Improve error reports for stub errors and augment them with precise information that helps developers diagnose the root of the problem;
Provide an import analysis that will tell developers which classpath entries are unnecessary for the successful compilation of artifact X.
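To illustrate what “precise information” could mean, here is a hypothetical Scala sketch (a script) of the context a stub symbol might record when it is created, covering points (1) and (2) from the stub error discussion, and a message built from it. `StubContext`, `stubErrorMessage` and the wording are assumptions made for this example, not the format used in the PR.

```scala
// Hypothetical: the context a stub symbol could record when it is created,
// covering the two pieces of information the current stubs lack.
final case class StubContext(
    missing: String,       // fully qualified name of the missing symbol
    requiredBy: String,    // (1) the definition of Y whose unpickling needed it
    origin: Option[String] // (2) the position in X that triggered loading, if known
)

def stubErrorMessage(ctx: StubContext): String = {
  val where = ctx.origin.fold("")(pos => s" It was needed while type checking $pos.")
  s"Symbol '${ctx.missing}' is missing from the classpath. " +
    s"It is required by '${ctx.requiredBy}'." + where
}

val msg = stubErrorMessage(StubContext("lib.Z", "lib.Y.make", Some("X.scala:3")))
println(msg)
// prints: Symbol 'lib.Z' is missing from the classpath. It is required by 'lib.Y.make'. It was needed while type checking X.scala:3.
```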

We hope these changes will considerably help Scala developers speed up compilation in builds that use only direct dependencies. If you have any comments on our current proposal, please post them in the following Discourse thread.

Implementation

A PR with all the aforementioned changes can be found here.

This work was finished around the beginning of January 2017 and was supervised by Eugene Burmako. Engineers at Twitter have kindly reviewed and tried out our prototype.