OSS and Professional thoughts on migrating to Scala 3

I’ve been in a lot of debates on Scala 3 features, often arguing strongly for compatibility and less churn over novelty and experimentation. Here are some thoughts I’ve put down about the Scala 3 migration, from both my OSS and Professional contexts, that may be useful to see where I’m coming from. Somewhat long, TLDR at the bottom.

OSS Migration

I maintain a suite of OSS libraries and tools:

  • sourcecode
  • geny
  • requests-scala
  • os-lib
  • scalatags
  • upickle
  • pprint
  • utest
  • fansi
  • fastparse
  • ammonite
  • mill

In many ways I am lucky: my own libraries form a self-contained dependency graph, so I have the freedom to upgrade them at my own pace. For the sake of discussion, I will focus on Ammonite, which depends on nearly all of the libraries above.

Ammonite Scala Version Support

When Scala 2.13 came out, I dropped support for Scala 2.11 in all my libraries and tools. This was not without controversy, as a very large proportion of professional use is still on 2.11 (we’ll discuss this later), but it is what it is. One way of contextualizing this is to compare the Scala version releases (Ammonite usually supports each new version within a week of release) with the Scala versions that Ammonite dropped support for:

  • Scala 2.10 came out Jan 2013

  • Scala 2.11 came out May 2014

  • Scala 2.12 came out Nov 2016

  • Scala 2.13 came out Jun 2019

  • Ammonite dropped Scala 2.10 in Dec 2017

  • Ammonite dropped Scala 2.11 in Sep 2019

Overall, Ammonite (and all my other libraries) aimed to support the last ~2-3 major Scala versions. This resulted in supporting each major Scala version for ~5 years, or about 3 years after the next major version comes out.

Using “5 years of support” as our rule of thumb, we can extrapolate to Scala 3; it may look something like this:

  • Scala 3.0 comes out Dec 2020/2021???
  • Ammonite drops Scala 2.12 in Nov 2021???
  • Ammonite drops Scala 2.13 in Nov 2024???

All these dates are just guesses, but it indicates that in a best-case scenario, for my OSS work, I expect to be in the “migrating” phase for the next half-decade before I end up fully on Scala 3. This says nothing about Scala 3: it simply extrapolates how long I’ve supported old Scala versions in the past. In fact, if the slow Scala 3 migration goes as smoothly as the slow upgrade from Scala 2.10 to Scala 2.13, I’d consider that a resounding success.

Cross Building and Cross Sources

All my OSS libraries support multiple Scala versions via cross-building: first using SBT, and then Mill. Most sources are shared between all 2-3 supported Scala major versions, up to 20 supported minor versions, along with Scala-JVM and Scala.js. For sources that differ between the entries in the build matrix, I use version-specific source folders to encapsulate version-specific logic.

Ammonite has the most of these, as it interacts with unstable compiler internals, and currently has the following version-specific source folders:

  • scala-2.12.0_8/
  • scala-2.12.10-2.13.1+/
  • scala-2.12.9+/
  • scala-2.12/
  • scala-2.13.1/
  • scala-2.13/
  • scala-not-2.12.10-2.13.1+/
  • scala-not-2.13.1/

While this seems like a lot, it’s an acceptable price to pay to support the current set of 10 different minor versions from 2.12.1 to 2.13.1, interacting with tons of internal and unstable APIs. Most of these folders have O(10s) of lines of code, so the amount of duplication is minimal.

Maintaining a cross-build with version-specific folders by hand is somewhat tedious, but it’s the best compromise I have found so far. It allows library development and Scala-version upgrades to happen independently, with new library features supporting all versions of Scala, and minimal “forced” upgrades where wanting to use a new version of a library forces a downstream user to also upgrade their Scala version.
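For illustration, the way a build selects version-specific source folders can be sketched in plain Scala. The folder names and the helper below are hypothetical simplifications, not Ammonite’s or Mill’s actual build logic:

```scala
// Hypothetical sketch: compute which source folders apply for a given
// Scala version, in the spirit of the folder names listed above.
object CrossSources {
  // Parse "2.12.10" into (12, 10); assumes a well-formed "2.x.y" version.
  private def minorPatch(v: String): (Int, Int) = {
    val Array(_, minor, patch) = v.split('.').map(_.toInt)
    (minor, patch)
  }

  def foldersFor(scalaVersion: String): Seq[String] = {
    val (minor, patch) = minorPatch(scalaVersion)
    val shared  = Seq("src")           // shared across all versions
    val byMinor = Seq(s"src-2.$minor") // e.g. src-2.12, src-2.13
    // A folder that only applies from 2.12.9 onwards, like "scala-2.12.9+"
    val late212 =
      if (minor == 12 && patch >= 9) Seq("src-2.12.9+") else Seq.empty
    shared ++ byMinor ++ late212
  }
}
```

The build then compiles the shared folder plus whichever version-specific folders match, so common code lives in one place and only the version-sensitive slivers get duplicated.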

Cross-building has worked remarkably well over the past decade, across 2 different axes (ScalaVersion X ScalaPlatform), and I do not see any of the discussed alternatives (version-specific git branches, C-preprocessor directives, etc.) as an improvement over the status quo. Honestly, it works great.

One consequence of cross-building is that the oft-mentioned “auto-migration tool” is of zero value. There is no single point at which I can “cut over” to the new version entirely: rather, there will be a ~5-year cross-build period as old versions of Scala are slowly dropped, until all remaining versions are in the Scala 3.x series.

Compatibility is a continuum, not a binary property, and this shows here as well: the less compatible Scala 3 is with Scala 2, the more code has to be duplicated from the shared src/ folder into version-specific src-2/ or src-3/ folders. This accurately reflects the fact that decreasing compatibility bifurcates the codebase and increases the maintenance burden for the 5-year period until the old versions are discarded.

Binary Compatibility and Macros

Almost all my libraries use macros: whether simple ones like sourcecode.Line to pick up line numbers, deriving case-class serializers using upickle.default.macroRW, doing tree transformations using utest.Tests{...}, or heavy inlining using fastparse.P{...}. While macros are nominally experimental, the reality is that the entire ecosystem depends heavily on them: Scalatest, Circe, SBT, etc. all rely on macros, and libraries like Jackson-Module-Scala do not use macros but use scala.reflect.runtime just as heavily.
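To make concrete what a derivation macro like upickle’s macroRW saves you, here is the kind of typeclass instance you would otherwise write by hand. The Writer trait below is a simplified stand-in for illustration, not upickle’s actual API:

```scala
// A toy serialization typeclass, standing in for upickle's Writer.
trait Writer[T] { def write(t: T): String }

case class Point(x: Int, y: Int)

object Point {
  // The instance you'd write by hand; a derivation macro generates the
  // equivalent from the case class's fields at compile time.
  implicit val writer: Writer[Point] =
    p => s"""{"x":${p.x},"y":${p.y}}"""
}

// A generic entry point that resolves the instance implicitly.
def writeJson[T](t: T)(implicit w: Writer[T]): String = w.write(t)
```

A macroRW-style derivation removes exactly this per-class boilerplate, which is why so many foundational libraries reach for macros.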

These also happen to be the foundational libraries that everyone relies on. Even downstream libraries that do not rely on macros themselves very likely rely on Scalatest and make use of its macro asserts!

Due to macros, the fact that Scala 3 can use libraries in a binary compatible way isn’t all that helpful: the focus should thus be on getting these core libraries cross-built and cross-published for Scala 3 as soon as possible. These core ecosystem libraries are all macro-heavy, and will need to be re-published for Scala 3: only then will the rest of the ecosystem even stand a chance at migrating.

Professional Migration

Our work codebase is perhaps 1MLOC on Scala 2.12, and 1MLOC cross-built between Scala 2.12 and 2.11, with a few stragglers still on Scala 2.10. Essentially, all our backend services are on 2.12, and all our Apache-Spark-related code must support both Scala 2.11 and Scala 2.12, since the current major version of Spark (2.4) is on Scala 2.11 and the next major version (3.0) is on Scala 2.12.

Migrating Services

Migrating our services to new versions of Scala is relatively straightforward: once all our upstream dependencies are upgraded, we can upgrade as well. We do not have concrete plans to move to 2.13, but will likely investigate it later this year and I do not anticipate any real difficulties in upgrading.

The last upgrade, from 2.10 to 2.12 in early-mid 2018, took a few weeks of full-time work for maybe a million lines of code, and went smoothly without any issues. People loved it: it cut our jar sizes in half, and compile times in half as well!

If Scala 3 comes out, and isn’t too breaking, it should not be hard to fully cut over this code to Scala 3 as well.

We do not make use of any fancy Scala language features: we have almost no implicits of our own, and I believe we do not define a single macro. Nevertheless, we do make heavy use of libraries like Scalatest or Jackson-Module-Scala which themselves make heavy use of scala.reflect at compile time and run time, and so we will need to wait for them to be re-published for Scala 3 before we can consider upgrading. This is not a big deal: we did that upgrading to 2.12, will do it to upgrade to 2.13, and can do it again when upgrading to 3.0.

Migrating Spark-Related Code

Migrating our spark-related code is trickier: it is used as a library by our customers, and shares the same JVM and classpath. Thus even if Apache Spark 3.0 is out supporting Scala 2.12, and all our dependencies support Scala 2.12, we still need to support Scala 2.11 (and Spark 2.4) as long as we have customers who demand it.

For our spark-related code, even if Apache Spark 3.0 comes out with Scala 2.12 support later this year, we are likely going to support Spark 2.4 with Scala 2.11 for many years to come. And later, when Scala 3.0 comes out, or Scala 3.1, or Scala 3.2, I fully expect we will be cross-building much of this code against Scala 2.11 and Scala 2.12 for the foreseeable future.

Other Enterprises

As part of our developer tools team, I often compare notes with other organizations using Scala, which are generally similar in size to us or somewhat larger (100-1000 developers). Most of our peers are still in various stages of migrating from Scala 2.11 to Scala 2.12: whether investigating it for the first time, just starting the migration, or already part way through and making good progress.

Two major things stand out when talking to people about migrating past Scala 2.11:

  • Apache Spark’s current major version (Spark 2.4) is still on Scala 2.11. Thus
    any code that interfaces with Spark has to also be on Scala 2.11

  • If you do not have cross-building capabilities in your build tool (i.e. most
    build tools except SBT and Mill), you are unable to have different Scala
    versions in the same build/repository. Thus even non-spark-related code that
    happens to be in the same repository is stuck on Scala 2.11!

Neither of these properties is likely to change quickly: Apache Spark is a large project and needs time to upgrade to new versions, and cross-building in non-SBT tools will take time to appear. Thus even if Scala 3 comes out end of 2020/2021, we might expect many of these enterprises to still be in the middle of their 2.11 -> 2.12 migrations, perhaps finally moving onto Scala 2.13 some time later, and Scala 3 even further down the line.

The overall consequence of this is that if we want to support enterprise users in our open source projects, even supporting the last 2 major versions of Scala is insufficient! I expect many to still be mostly on Scala 2.x past 2025.

TLDR

  • For my OSS work, I expect to be in the “migrating” phase for the next
    half-decade before I end up fully on Scala 3. If the slow Scala 3 migration
    goes as smoothly as the slow upgrade from Scala 2.10 to Scala 2.13, I’d
    consider that a resounding success

  • One consequence of cross-building is that the oft-mentioned
    “auto-migration tool” is of zero value

  • Compatibility is a continuum, not a binary property, and this shows here as well:
    the less compatible Scala 3 is with Scala 2, the more code has to be duplicated
    from the shared src/ folder into version-specific src-2/ or src-3/ folders

  • Due to macros, the fact that Scala 3 can use libraries in a binary compatible
    way isn’t all that helpful: the focus should thus be on getting the core
    ecosystem libraries cross-built and cross-published for Scala 3 as soon as
    possible

  • Migrating our services to new versions of Scala is relatively straightforward:
    once all our upstream dependencies are upgraded, we can upgrade as well

  • For our spark-related code, even if Apache Spark 3.0 comes out with Scala 2.12
    support later this year, we are likely going to support Spark 2.4 with Scala
    2.11 for many years to come

  • If we want to support enterprise users in our open source projects, even
    supporting the last 2 major versions of Scala is insufficient! I expect many
    to still be mostly on Scala 2.x past 2025

Conclusion

Hopefully this gives some background to why I’ve been arguing in favor of compatibility and smooth (if slow) migrations, rather than hoping for a fast “big bang” upgrade with some hypothetical tooling-assistance.

In both my OSS and Professional contexts, not only are “endless” slow and smooth Scala version upgrades the norm, they’re also fine: the Scala language and implementation have improved by leaps and bounds, and everyone in my organization could tell you how much more productive they are on Scala 2.12 than Scala 2.10 due to the improved compile times.

Going forward, what I would hope for is to minimize breaking changes where we don’t need to make them, and where we do need to make them, to make them in a way that’s measured and intentional. If that means some not-fully-baked work-in-progress breaking change might miss Scala 3.0, land in Scala 3.1, have the old thing deprecated in Scala 3.2 and removed in Scala 3.3, then so be it. Scala 3 won’t be the first version to cause some amount of breakage, and I wouldn’t expect it to be the last. And that’s OK: 2.12 broke usage on Java 7, 2.13 broke a lot of collections APIs; such is the price of progress.

From where I sit, there’s nothing special about Scala 3.0 with regard to breaking changes: it is just another upgrade in an endless series of upgrades, one that we hope to be able to upgrade to and cross-build against with minimal pain and suffering.

Thanks for reading, and I hope you found this post interesting!


I hope they listen to you; you have a lot wrapped up in this, and you make very good points from the business-world standpoint.


Would you like to share why git branches didn’t work?

Thanks @lihaoyi for writing this up! Overall, I agree with your points, I just have the following comments:

In the case of scala-collection-compat, we have created a CrossCompat rewrite rule, which auto-migrates the code into a form that cross-compiles. We didn’t have direct feedback on this feature, so I’m not sure how successfully it has been applied, but wouldn’t that have more than “zero value”?

Sure, and this process has already started.

However, I would like to stress that relying on macros is a risk that library authors may not be sufficiently aware of. Clearly, cross-compiling code that defines macros is hard, and library maintainers should always take this into account before deciding whether to rely on macros.


Thanks for taking the time to write this. It seems to me that the software industry as a whole is moving towards a model of steady, progressive, predictable change, as opposed to the “backwards compatibility must be maintained forever” model and the “big bang revolutionary” model.

A complex system that works is always developed from a simple system that works. We can surely go further now and say that a complex system that works always develops from a very simple system that works, through multiple steps that all work. Symbolism matters; the nettle needs to be grasped. So I would suggest that the Epic / Major change distinction should be dropped, and hence that the aim should be to deliver Scala 4 no more than 2 years after Scala 3, and Scala 5 no more than 4 years after the release of Scala 3.


Thanks for reading!

@kjsingh

I don’t think I say that git branches don’t work, rather I think they’re generally an inferior tool to using version-specific source-folders if you expect to keep your library compatible with multiple versions of Scala as it evolves.

We do in fact use git branches for some of our spark-related code at work, and it definitely has a tendency for the branches to diverge, for improvements to only land in master, and for PRs targeting multiple branches to get harder over time as the codebases drift apart.

It’s not impossible - it works - but from my experience with the process I definitely would not choose git-branch cross-building for my own projects. At work we have not chosen git-branch-based cross-building for other projects either; instead we added custom support to Bazel to allow source-folder-based cross-building, and are very happy with that choice.

@julienrf,

Regarding the CrossCompat rule, scala-collection-compat has definitely been great, no doubt. I can’t really speak to the value of the rewrite rule, because the scala-collection-compat shims have been excellent at bridging the gap between 2.12 and 2.13 and making them almost the same. I’ve used scala-collection-compat very heavily, but I haven’t personally used the rewrite rule at all.

Mind you, I’m a big fan of autofixes in general. At work, I have pushed projects that autofix all sorts of things: pull requests are automatically formatted with the relevant *fmt tool regardless of what language they are in (scalafmt, jsonnet fmt, yapf, etc.), generated files are automatically re-generated, and so on. Autofixes are great; I just don’t think they apply to a potential Scala 3 migration in either of my OSS or Professional contexts, and I suspect other people maintaining open source libraries may have similar constraints.

Regarding the process having already started, you’re right that that’s the case. I’m just bringing this up because I’ve constantly heard “can we autofix this?” and “this will be solved by a scalafix” in threads, with much hope (and too little skepticism) unduly focused on such autofixes, which in the end I do not think will be the savior that people seem to hope for. It’s a good technology, but I don’t think it’s a good fit for this particular problem.

You’re right about macros being a risk, but I think that argument is one for years ago. Right now, the current usage of macros is where the community has decided it is worth it, and those of us managing the upgrade have to make do with the current reality.

Furthermore, I think the macros in libraries like Circe (circe-derivation), Scalatest, and SBT are reasonable: I have many of the same macros in my own libraries uPickle, uTest, and Mill. I don’t think any of these libraries are using macros gratuitously.


One of the stated aims of Scala 3 is “to be more opinionated.” I have for a little while been concerned about this, but didn’t feel comfortable speaking up. As @lihaoyi, a very major player in the Scala community, has raised his concerns, I, a very minor player within the community, feel able to raise mine.

If Scala 3 is to be more opinionated, the question that arises for me is: whose opinion? Martin Odersky’s? EPFL’s? Lightbend’s? The SIP Committee’s? Some kind of wide-ranging consensus, or something else? An attitude of “trust me”, or “trust us”, seems to have arisen. Bjarne Stroustrup famously said “There’s languages that people complain about and there’s languages that nobody uses.” So I would argue that it’s actually a testament to the creator of a language if, when he/she comes to create a 2nd/3rd version of that language, there are a significant number of people who care enough to not just trust them to get on with it.

I hope my contributions will be experienced as respectful. However I would argue that even disrespectful criticism is a testament to the achievements of the language creator. It was inevitable that once Scala started to have significant success, it would be seen as a threat. In particular many people were happy with Scala as a Haskellator. Haskell has consistently (whether by accident or design) failed to break out of its ghetto. It was inevitable that some of those people would become angry when they realised the creators of Scala were interested in Scala as a language in its own right. It was inevitable that some of them would become angry if and when Scala threatened to eclipse Haskell.

I would suggest that their criticisms of Scala cannot be taken in good faith. Their criticisms were not intended to be constructive; they were in fact intended to be destructive of Scala’s success. Added to this is the fact that the more successful a language becomes, the more developers will be required to use it not through choice. Some of them will have been quite happy with their previous programming languages, and this will inevitably produce a growing groundswell of dissatisfaction.

So in particular the criticism of language complexity is not one that can simply be taken in good faith. Managing complexity, reducing incidental complexity, is what it’s all about. There are no silver bullets for simplicity as we strive to solve ever more complex problems, and as we struggle to work in ever more complex problem domains and ecosystems. To say that one finds a programming language complicated really tells us nothing more than that you don’t like the language. I mean, who actually ever claims to like the unnecessary complexity of a language? So I would suggest that any mad rushes to “simplify” the language will end in disaster. Sudden lurches into “simplification” will actually lead to the reverse.

We should be very wary of premature simplification. We should be very wary of trying to simplify aspects of the language before we have fully grasped the full complexity of the problem domain, and before we have fully grasped the full range and diversity of the ecosystem that uses that aspect of the language. Above all we need to understand the full capabilities that are required; only then should we focus on accessibility and learning. Learnability, the ability to use the language quickly and efficiently for simple purposes, is of vital importance. The ability to evolve one’s use of the language, to progress in using the language in more sophisticated and powerful ways in well-documented, comprehensible small steps, is a key axis of scalability. However it’s important not to put the cart before the horse.


I believe the general strategy of the professional / commercial industry has settled around adopting changes as gradually as possible.

By far, the thing I am waiting for the most in Scala 3 is reduced compile times, and I have the feeling that this bothers many other professional developers. Anything that makes the migration harder just for the sake of other features / benefits is one step further away from faster compile times.

I think this lesson can be somewhat learned from Python 3, whose migration took forever (and is still going on), perhaps because it introduced too many breaking changes that only provide minor benefits on their own (this is a nice article about it).


Scala 3 is introducing some nice things (Union types, enums, nullability) and removing/fixing some broken stuff, but I’m not entirely convinced by some of the new implicits stuff. I thought that it was meant to be simpler than what we have today (which I basically avoid due to the complexity), but objectively I’m not convinced that it necessarily meets that bar of being significantly simpler.

E.g. one example is extension methods. There are multiple different ways that these can be expressed, but I think that the language would be much cleaner if they were all expressed using the Collective Extensions syntax, and all other methods for defining extension methods were dropped.

In fact, I would even suggest using _ if it doesn’t matter what the extension name is.

E.g.

extension stringOps on (xs: Seq[String]) {
  def longestStrings: Seq[String] = {
    val maxLength = xs.map(_.length).max
    xs.filter(_.length == maxLength)
  }
}

extension listOps on [T](xs: List[T]) {
  def second = xs.tail.head
  def third: T = xs.tail.tail.head
}

extension _ on [T](xs: List[T]) with Ordering[T] {
  def largest(n: Int) = xs.sorted.takeRight(n)
}

My perceived advantages of this syntax over the other methods are:

  1. It clearly identifies what it is (i.e. an extension). It is easy for a newbie to see and google for Scala extension methods.
  2. Methods are defined in exactly the same way as you would find in a regular class.
  3. This is apparently similar to how it is done in other languages, and commonality and familiarity are generally a good thing.

This is a bit off-topic in this discussion. I believe the better place to voice your concerns is either the topic for the current implicits proposal (monitored by the SIP committee, I think), or the topic for the alternative proposal (authored by me).

The specific comment about extension syntax may be off topic (but raised previously), but I think that the underlying principle that it is attempting to illustrate is on topic. Scala 3 was meant to be simpler than Scala 2, but I’m not sure whether it is going to be simpler or just have different (and perhaps more) overall complexity.

I hope that the folks in charge don’t try to rush Scala 3 out of the door, but take the necessary time to take a critical look at all of the features that are being introduced, and harshly evaluate each one to determine whether it is really worth the additional complexity that it inevitably brings. I would suggest: if in doubt, leave it out, and introduce it in Scala 3.1 or 3.2, especially since it is much harder to take something out again after it has been introduced. There is an excellent presentation along this theme by Guy Steele: “Growing a Language”.


Maybe the new rule should be that discussion of Scala 3 implicits syntax is not allowed unless conducted over a beer or other beverage of preference.

That rule could also improve the atmosphere at SIP meetings.

Is there already an XKCD where any mention of Dotty leads inevitably to a personal opinion about syntax?

Also add it to the list of things excluded from dinner conversation: politics, religion, Dotty implicits.

On-topic, I appreciate the patented @lihaoyi tone of thoughtful, fearless optimism. We’ve heard the Python 3 alarm for so long that it’s baked into the Scala 3 sales pitch; I’m not sure I’ve heard “keep calm and migrate on.”


Yeah, I was just trying to prevent yet another syntax bikeshedding discussion about implicits. I couldn’t agree more with your underlying message.


Thanks for the excellent write up!

These are all things that have been discussed and are top-of-mind in the discussion of Scala 3. You definitely nailed the major concerns when it comes to community, stability + maintenance, and then commercial migration.

I think your concern w/ macros and dependencies on macros is a major pain-point for evolving the ecosystem. It means that the binary-compatibility usage mode for libraries will basically require a fence between downstream Scala 2 code and Scala 3 code. This gives an option for folks either at the beginning of the dependency chain or at the end, but really hurts those in the middle (like Ammonite).

As you’re aware, the Scala ecosystem has focused on rebuilding from the core, and most maintainers have gotten great at rebuilding the ecosystem quickly on each release. Binary compatibility basically just gives those far downstream the ability to try out a (limited) Scala 3 with their current tooling (theoretically).

I think ensuring these ‘middle-of-the-graph’ dependencies have a good migration story is critical for the health of the ecosystem (both OSS and commercial), so thank you for calling attention to it!


One thing that might help a lot with migration is if Scala 3 macros were implemented for Scala 2. The reverse is impossible because Scala 2 macros are about exposing low-level implementation details which don’t exist in, and can’t be mapped to, Scala 3. However Scala 3 macros are very high level. In principle I expect it would be possible to implement them on Scala 2.

Of course, the development effort would be significant. I don’t know that anyone has the time, ability, and interest to do this. But I think it’s the only way to get to the seamless transition that Li Haoyi is describing.


I’d be happy if we could just have extension on syntax. But we can’t, since this syntax does not allow defining or implementing abstract extension methods. And that capability is the key to doing typeclasses. Without it, you are back to simulacrum. Have a look at https://dotty.epfl.ch/docs/reference/contextual/typeclasses-new.html and see whether that could be implemented with just extension on.
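For context, here is roughly what an "abstract extension method" looks like when encoded in Scala 2 today: an abstract method on the typeclass trait plus an implicit-class wrapper to give it method syntax. This is a simplified sketch in the spirit of the simulacrum pattern, not its actual API:

```scala
// The typeclass: combine is the abstract "extension method".
trait SemiGroup[T] {
  def combine(x: T, y: T): T
}

// Boilerplate wrapper giving `x.combine(y)` syntax for any T with an
// instance in scope; this wrapper is what the Scala 3 extension-method
// syntax aims to make unnecessary.
implicit class SemiGroupOps[T](x: T) {
  def combine(y: T)(implicit sg: SemiGroup[T]): T = sg.combine(x, y)
}

// An instance for Int, via single-abstract-method syntax.
implicit val intSemiGroup: SemiGroup[Int] = (a, b) => a + b
```

With these in scope, `1.combine(2)` resolves through the wrapper to `intSemiGroup.combine(1, 2)`; writing this wrapper by hand for every typeclass is the boilerplate being discussed.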


I think this hits the point of what @rgwilton and the rest of us are saying regarding simplicity. The new given system is not simple (and definitely isn’t simpler than before), and it will make it harder for people to migrate (or newly adopt Scala).

There are alternatives for making this much simpler, and I would’ve loved it if someone from the SIP committee would take a look at those alternatives and actually consider them.


Let’s take discussions about individual features to the threads where they belong. I should not have responded to @rgwilton here; my mistake.


I believe it would be a huge undertaking to re-implement Scala 3 macros in Scala 2. A solution that’s simpler to implement was proposed by @dwijnand: Allow parallel Scala 3 and Scala 2 implementations of the same macro in the same source file. The effort to write these dual implementations is not worse than writing everything with Scala 3 macros if we take into account that the Scala 2 versions exist already.

That would then allow cross-building libraries without separate version-specific files (though maybe such files are still needed for other reasons). I’d be interested in feedback from library maintainers on whether such a feature would help in cross-building.


The motivation for this idea was to allow libraries to transition to the Scala 3 compiler, while still keeping Scala 2.13 support for their libraries. Kind of like an interim stage in migrating a library from Scala 2 to Scala 3.