Scala 2.12 has been out for more than a year now (it was released back in November 2016).
I would guess that historically Spark has been a big source of new recruits to the Scala world. People who might not necessarily have been interested in discovering a new language started using Spark because it addressed specific problems that they had to deal with and through this they discovered Scala.
So it’s disappointing that Spark, rather than leading the Scala charge, is actively holding up the move to 2.12 for a large element of the community. Spark is a major part of many people’s infrastructure, so the availability, or lack of it, of a Scala 2.12 version of Spark affects their take-up of the current Scala release.
Progress on moving to 2.12 is tracked by SPARK-14220. If you look at it you can see that there are many subtasks and that the majority of them are marked as resolved. This gives the impression of a large amount of progress with little more to do. However, if you’ve been tracking this item you’ll have seen that little or no real progress has been made on it in the last year, i.e. it’s not going anywhere quickly.
Most of the resolved subtasks are trivial (waiting for the availability of 2.12 versions of various libraries and updating to them once available) and many were already marked as done by the time of the final release of 2.12 or shortly afterward.
The final two items (SPARK-14540 [1] and SPARK-14643 [2]) show little sign of active progress. My impression is that Josh Rosen and Sean Owen would very much like to get these out of the way but that no time has actually been prioritized to do so.
As such we’re left with “if you really care about this, try to work on [it yourself]” style comments. Fair enough - it seems not enough people do care about it for it to be prioritized by Hortonworks, Databricks, Cloudera or the other major active contributors to the Spark code base.
I find this surprising and problematic for the Scala community at large. If major projects stay on earlier versions of Scala then this splits the community and makes progress difficult. The lack of progress on SPARK-14220 (it’s essentially in stasis) doesn’t give much hope that it’ll even be resolved by the time Scala 2.13 release candidates start appearing in a few months.
Perhaps the Spark community has decided that Scala 2.11 is good enough and they’re happy to stick with it indefinitely, but I think this attitude has frozen, and will continue to freeze, the progress of Scala in many major in-house projects all over the world.
Lightbend and others have made major efforts to help out on projects that are seen as very important to the health of the overall ecosystem but, for whatever reason, seem to have stalled, SBT being the most obvious case.
SBT is obviously more fundamental to almost everybody’s Scala workflow than Spark, but I’d argue that Spark is big enough that, left as it is, it could act as a major drag on the take-up of new Scala releases.
I don’t think we’re anywhere near a Python 3 moment but it would be a shame if the failure of major frameworks to shift to new versions of Scala led to a split up of the Scala world. Scala has made so much progress on eliminating the headache of libraries being tied to specific versions - all the effort to build popular libraries and pick up issues before new releases means that the most popular libraries are now generally available immediately along with the final release of a new Scala version.
With the release of Java 9 and 10 it seems Scala has been moving in the right direction while Oracle is moving in a direction reminiscent of the early years of Scala, i.e. major libraries breaking on new releases without new versions being lined up to play well with whatever changes have been introduced.
Anyway - as usual, I’ve produced a TL;DR message on something simple. Does the Scala community see it as an issue that a major framework like Spark remains stuck on a previous release of Scala more than a year after 2.12 became available? And if so might Lightbend or others consider contributing effort to resolving this issue?
Perhaps Spark has grown less important; it certainly seems to have lost much of the vim of its earlier years (though perhaps this is fair enough and healthy as something matures). But as part of the Scala specialization at Coursera (Big Data Analysis with Scala and Spark, taught by @heathermiller) it still seems to be pretty central to the whole Scala offering.
Thanks for reading this far
Note: this was commented on before, back in early February, in “Spark and Scala 2.12” [3], when I was too busy with other things to weigh in, but it didn’t seem to result in much follow-up then.
/George
Links that Discourse prevented me from including in clickable form as they exceeded my two links per post limit:
[1] https://issues.apache.org/jira/browse/SPARK-14540
[2] https://issues.apache.org/jira/browse/SPARK-14643
[3] https://contributors.scala-lang.org/t/spark-and-scala-2-12/1576/5