Speeding Up Compilation Time with scalac-profiling

Hello everyone,

I recently posted in scala-lang about how we can use scalac-profiling to profile and speed up compilation times of projects using a lot of macros or implicit searches.

Read the blog post here.

If you have any question about the analysis or the plugin, I’m happy to answer them in this thread.

Happy profiling!


This is very good. Quite a long read. I will definitely use it once I clear my schedule a little to do performance tweaks.

1 Like

…once I clear my schedule a little to read it…

Hi! Thank you for the blog post! It’s really interesting and useful. I’m following the instructions, but I can’t understand how to generate a Flamegraph. Whenever I execute the script:

./flamegraph.pl \
--hash --countname="ns" \
--color=scala-compilation \
$MY_PATH_TO_FLAMEGRAPH_DATA > bloop-profile-initial.svg

I get this error message: ERROR: No stack counts found.
I went through the Flamegraph README file, but still, I’m not sure what I’m missing there…

1 Like

Hey :wave: Did you make sure that the flamegraph file is not empty and that you replace $MY_PATH_TO_FLAMEGRAPH_DATA with the absolute path to the flame graph file? Also, did you confirm you are using the flame graph script from the submodule in the scalac-profiling repo?

1 Like

Oh yes, the flamegraph file is empty, but it shouldn’t be, right? :thinking: I’ve run sbt clean compile with these scalacOptions
And I’m using the flame graph script from the submodule in the scalac-profiling repo.

Can you show me the scalac options that have the variables starting with $ replaced by the values? It’s not clear to me if you’re resolving the plugin first and then using the returned path in -Xplugin for example.

Another important thing (and this is sbt specific) is that you may not be adding the scalac options correctly to your build. If you want to profile the compile time of the application, make sure > show $YOUR_PROJECT/compile:scalacOptions contains the options you add, if you’re profiling the tests, use show $YOUR_PROJECT/test:scalacOptions.

As someone who just went through the agony of finding out why it didn’t accept my flamegraph, check the newlines. It doesn’t work with windows newlines.

1 Like

Would you say it is worth diving into profiling if the compiler statistics report that only 40% (roughly*) spent in the typer phase?

* I can’t comfortably say I warmed-up the compiler correctly. Also, I have a multi-module Gradle project, which means the statistics are reported per-module and not aggregatively.

PS. the blog post convinced me to move from Scala 2.11 to 2.12 :slight_smile:

If typer was only 40 I would be on the lookout for other inefficiencies in other parts of the compiler and also make sure I’ve warmed up the compiler correctly. I’ve never seen such a low value for typechecking. Can you share what other phases dominate the compilation time? You should also use parallel bytecode generation assuming genbcode takes a long time.

Well, here’s the statistics for my slowest-compiling module (512 Scala sources + 123 Java sources):

*** Cumulative timers for phases
#total compile time           : 1 spans, ()84316.407ms
  parser                      : 1 spans, ()2126.061ms (2.5%)
  namer                       : 1 spans, ()1063.004ms (1.3%)
  packageobjects              : 1 spans, ()59.888ms (0.1%)
  typer                       : 1 spans, ()35940.382ms (42.6%)
  patmat                      : 1 spans, ()4916.451ms (5.8%)
  superaccessors              : 1 spans, ()641.7ms (0.8%)
  extmethods                  : 1 spans, ()145.072ms (0.2%)
  pickler                     : 1 spans, ()290.674ms (0.3%)
  xsbt-api                    : 1 spans, ()5428.866ms (6.4%)
  xsbt-dependency             : 1 spans, ()2692.049ms (3.2%)
  refchecks                   : 1 spans, ()2765.116ms (3.3%)
  uncurry                     : 1 spans, ()2155.691ms (2.6%)
  fields                      : 1 spans, ()929.506ms (1.1%)
  tailcalls                   : 1 spans, ()420.865ms (0.5%)
  specialize                  : 1 spans, ()1840.268ms (2.2%)
  explicitouter               : 1 spans, ()700.845ms (0.8%)
  erasure                     : 1 spans, ()6234.984ms (7.4%)
  posterasure                 : 1 spans, ()360.898ms (0.4%)
  lambdalift                  : 1 spans, ()716.089ms (0.8%)
  constructors                : 1 spans, ()453.472ms (0.5%)
  flatten                     : 1 spans, ()264.71ms (0.3%)
  mixin                       : 1 spans, ()1969.458ms (2.3%)
  cleanup                     : 1 spans, ()400.979ms (0.5%)
  delambdafy                  : 1 spans, ()676.714ms (0.8%)
  jvm                         : 1 spans, ()10894.368ms (12.9%)
  xsbt-analyzer               : 1 spans, ()95.56ms (0.1%)
63 warnings found
Completed Scala compilation: 1 mins 38.062 secs

The jvm phase is the one where the bytecode is being generated, correct? If so, what do you mean by parallel generation? -Ybackend-parallelism?

Also, do note that Gradle only supports running Scala with the old Zinc. I also suspect that it doesn’t support the server mode and thus not allowing for compiler warm-up. Seems like it’s using the same process when compiling different modules in the same build, but it uses a different process per-build.

The cost of incremental compilation is unusually high in your project. It will be most likely better in Zinc 1 (the overhead of xsbt-analyzer should be close to 1 or 2%, and the cost of xsbt-dependency around 3-4% when you upgrade). The best you can do to decrease your compilation times now is to use Zinc 1 as soon as possible. I recommend you consider creating a gradle integration for Bloop. From there we can do more benchmarks and see what the bottleneck is then and optimize it.

P.S. In the upcoming months I’ll be removing the cost of xsbt-api in Zinc 1 too, so that will give you and other projects a much faster incremental compilation. The api phase is the most expensive.

1 Like

I’ve seen it multiple times, e.g. when compiling Dotty.

For me I also had a rather low value for typecheking and high for jvm. In this case your just creating a lot of extra classes. For me the solution was to move most of my stuff to Circe derivation. There are still macros, but they don’t expand as many other macros and generate less classes. I cut my build time by around 40% if I remember correctly.

By the way, this is a good way to reduce the time spent in the genbcode phase in case the value is high.

Yeah, I think this is due to flame graph or whatever perl version you’re using. IIRC we’re relying on java utilities to correctly use the newline of the underlying operating system. If you think this is our fault, please let me know by filing a ticket in scalacenter/scalac-profiling.

@alodavi Confirm me if this is the real issue and you’re running on Windows. Otherwise I’m happy to help you out with whatever issue you’re hitting on.

Ah, that makes sense. Don’t have perl on my windows machine so I fired up WSL when I couldn’t get any further. A warning would be nice, but I don’t think anything else would be needed if that’s the case.

Yeah, agreed. I think flame graph should emit this warning. My recommendation is that you look into it, it’s worth it to reduce the compile times as much as possible, specifically if you are a big user of circe :smile:.

@jvican thank you for your help! I’m running on Mac, I don’t think that the issue was with Flamegraph since the file I was trying to pass to it was empty… Unfortunately, I can’t investigate the issue further at the moment, but thank you anyway!