Spark and Scala 2.12


#1

I’ve been busy with work but noticed that Apache Spark still hasn’t moved to Scala 2.12. I think this is a shame. Does anyone know the fundamental reason why — what is it in Scala 2.12 that makes Spark hard to upgrade? Just curious; I’m not sure I’d be able to help, but it seems like months have been spent on it.
For many Java teams, Spark is the gateway to picking up and adopting Scala, so it’s a shame the upgrade seems to be stalling somewhat.


#2

You may find a lot of info here: https://issues.apache.org/jira/browse/SPARK-14220

Seems like there are only a couple of things left.


#3

In short, there are two pending issues:

  • Java 8 lambdas in Scala 2.12 break the Dataset API: more details
    • they need to remove a method, which requires a major version bump (ex: 2.2 to 2.3)
    • this has been known since March 2016
    • this part is solved / trivial
  • Closure cleaning for Scala 2.12: 1% of tests still fail (2 tests) more details
    • a PR fixing 99% landed in November 2017
    • the remaining 1% is a hard problem; they might cut a release anyway

My guess is that it might be available in 2.3. Version 2.2 was released on the 1st of December 2017, so I would say this will happen in 2018 :smiley:.
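To make the first bullet concrete: Scala 2.12 started converting lambdas to Java-style single-abstract-method (SAM) interfaces, which is exactly what collides with Spark’s paired Scala/Java overloads. Here’s a minimal sketch of the mechanism — `MyReduceFn` and `reduceJava` are hypothetical stand-ins for Spark’s `ReduceFunction` and `Dataset.reduce`, not the real API:

```scala
// A Java-style SAM interface, playing the role of Spark's ReduceFunction[T].
trait MyReduceFn[T] { def call(a: T, b: T): T }

object SamDemo {
  // Java-friendly entry point, like Dataset.reduce(ReduceFunction[T]).
  def reduceJava[T](xs: Seq[T], f: MyReduceFn[T]): T =
    xs.reduceLeft((a, b) => f.call(a, b))

  def main(args: Array[String]): Unit = {
    // On Scala 2.12 a plain lambda converts to the SAM type, so this compiles.
    // On 2.11 you would have had to write `new MyReduceFn[Int] { ... }`.
    val sum = reduceJava(Seq(1, 2, 3, 4), (a: Int, b: Int) => a + b)
    println(sum)
  }
}
```

The flip side is the problem from the bullet: once a lambda can match a SAM type, a method that *also* has a Scala overload taking `(T, T) => T` sees the same lambda match both signatures, and the call becomes ambiguous — hence the need to remove an overload and bump the version.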


#4

Thanks for that, looks like they are close.


#5

They may be close, but if you really need it then it’s worth looking into yourself. Those tickets looked identical 6 months ago, when it also looked close.


#6

If this is a pressing issue, I would suggest publishing Spark for 2.12 on your own and using it for development (the dev workflow with 2.12 should be much smoother). Don’t let that 1% of failing tests in the closure cleaner prevent you from using a better Scala :wink:.


#7

Great news! The issue is fixed in Spark 2.4.0! :smiley: :smiley: :smiley:
https://issues.apache.org/jira/browse/SPARK-14220