Spark as a Scala gateway drug and the 2.12 failure

Now that the 2.12 port is done, I was wondering if anyone on the Scala team (or anyone else) could comment on how things might look for 2.13 … I’m curious whether the recently-completed work will make things easier next time around.

The main 2.12 changes impacting Spark were (in detail):

  • our new closure encoding actually made it easier to implement Spark’s “closure cleaner”, but it took some effort to convince ourselves of that (see the details in the doc linked above)
  • SAM types being compatible with function types introduced a source incompatibility; this was resolved in 2.12.0 by improving type inference for overloaded higher-order methods, with further improvements coming in 2.13. One corner case remains with Unit-returning functions (a sketch of the ambiguity follows this list)
  • we need a stable API for REPL embedders such as Spark. Help in coordinating between the various projects would be greatly appreciated.
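
To make the SAM point concrete, here’s a minimal sketch (illustrative names, not Spark code) of the kind of overload a lambda can suddenly satisfy twice once SAM conversion applies:

```scala
object SamOverloadExample {
  // Two overloads a lambda can now match in 2.12: a Scala function type
  // and a Java functional interface (a SAM type).
  def transform(f: Int => Int): Int = f(1)
  def transform(f: java.util.function.IntUnaryOperator): Int = f.applyAsInt(1)

  // Under 2.11 a Scala lambda could only match the first overload; under
  // 2.12 both are applicable. The 2.12.0 inference improvement resolves
  // the tie by preferring the Scala function type, so this still compiles:
  val r: Int = transform((x: Int) => x + 1)
}
```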

For 2.13, I expect we’ll have this stable REPL API, but the impact of the new collections on the Spark code base is somewhat unknown (a sketch of the typical source changes follows below). If anyone would like to try, now is an excellent time, and it would greatly benefit both communities! Sadly, our team at Lightbend will likely not be able to get to this in time for M5.
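
Since the collections are the big unknown above, here are a few representative 2.13 source changes that any large 2.12 code base tends to hit (general examples; whether Spark relies on these exact patterns is an assumption on my part):

```scala
object Collections213Sketch {
  val xs = Seq(1, 2, 3)

  // 2.12 wrote xs.to[Vector]; in 2.13 the companion is passed as a value:
  val v: Vector[Int] = xs.to(Vector)

  // Traversable/TraversableOnce give way to Iterable/IterableOnce:
  def total(it: IterableOnce[Int]): Int = it.iterator.sum
}
```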

3 Likes

@SethTisue can Spark finally make it into the 2.12 community build?

That would be very welcome, but likely a significant effort. I don’t think we will have time to tackle this ourselves in the next 6 months.

can Spark finally make it into the 2.12 community build?

We can discuss that at add Spark · Issue #763 · scala/community-build · GitHub; I’ve already put some thoughts there.

To clarify, the 2.12 port isn’t done; there is no release built for 2.12. From a watcher’s perspective, it appears huge progress has been made and they are again very close, but in case anyone reads this and wonders why, 28 days later, they can’t find binaries, it is because the issue is still open. Once a release is up on Maven, though, it would be great to have an update to this thread so watchers know they can start their 2.12 migration efforts. I know I, for one, will act immediately on its availability.

1 Like

Spark 2.4 will be released with beta Scala 2.12 support.

So does that leave Scala Native as the most significant part of the Scala ecosystem still on 2.11?

Scala Native 2.12 support is scheduled for the next release.

1 Like

@sadhen - can you link to anything stating that Scala 2.12 support is slated to appear in Spark 2.4? There’s no fix version listed for SPARK-14220.

Despite all the initial euphoria, it seems there are still non-trivial issues that need to be addressed before this ticket can be considered resolved - at least according to Sean Owen in his update on 7th August. Sean Owen seems to be the public face at Spark for this ticket, so I’d say he knows what he’s talking about.

Since August 7th there’s been no new update that I’ve seen. I’m really glad of all the heavy lifting (as Sean puts it) that has gone into this ticket so far, but overall the visibility on this issue has been low. Up until the (now apparently premature) comment on 2nd August that the issue had been resolved, there was little sign that anything was actually happening, so it’s hard to know whether people are working hard on the further issues Sean pointed out or whether things have gone quiet. Maybe everything will be resolved soon - or maybe not.

Thanks for the update.

Could you or Sean make those issues actionable? I don’t know whether someone at Lightbend is working to fix them, but unless the issues are given better descriptions of the problems, I have no sense of why they are non-trivial or how the core team can help fix them.

1 Like

We have ongoing threads with the Spark team, mostly on GitHub. I’m not aware of anything being blocked by the Scala team. Since Aug 7, a new Janino release was cut for Spark to unblock one of the tickets. Another ticket has also made progress thanks to Sean (I don’t have the PR handy, but it was about the udf method and type tags).


2 Likes

https://issues.apache.org/jira/browse/SPARK-14220 is resolved now.

Last week, I (Darcy Shen) contributed some time to the migration.

Now, the last failing unit test has been fixed.

This PR (https://github.com/scala/scala/pull/7156) is part of the migration: Spark SQL’s Row uses WrappedArray, so the bug affects the correctness of Row equality (a sketch of why follows).
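
To illustrate why that matters, here’s a minimal sketch of the failure mode (not the actual Spark or scala/scala code, and the precise bug fixed by that PR may differ in detail):

```scala
import scala.collection.mutable.WrappedArray

object RowEqualitySketch {
  // Array-valued columns in a Row are held as WrappedArray, so Row equality
  // ultimately reduces to element-wise comparisons like this one:
  val a: WrappedArray[Int] = Array(1, 2, 3) // via Predef.wrapIntArray
  val b: WrappedArray[Int] = Array(1, 2, 3)

  // If WrappedArray's equals misbehaves for some element type, two rows
  // holding equal array data would wrongly compare unequal:
  assert(a == b, "Row equality relies on this holding")
}
```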

@ghawkins the statement you asked for: http://apache-spark-developers-list.1001551.n3.nabble.com/code-freeze-and-branch-cut-for-Apache-Spark-2-4-tt24365.html#a24839

6 Likes

Is it correct then that Spark users can upgrade to Spark 2.4 once Scala 2.12.7 is released? If so, can anyone comment on the ETA of Scala 2.12.7?

The milestone is set for Sep 14th: 2.12.7 Milestone · GitHub
So if they make that date, the release will follow a few days after.

1 Like

This is great! Thanks much to all the Spark devs!

Is it definite that Scala Native 0.3.9 / 0.4.0 won’t be released without Scala 2.12 support? Maybe I misunderstood, but I thought 2.12 support had been scheduled for earlier releases.

I suggest asking about scala-native on 2.12 at the relevant ticket: https://github.com/scala-native/scala-native/issues/233. Let’s keep this thread as focussed as possible :slight_smile:

In related news, I just promoted 2.12.7 to Maven. More info: 2.12.7 release 🚋

3 Likes

A week late in noticing, but I thought it worth pointing out here that Scala 2.12 support is finally GA. After experimental support was announced last November in Spark 2.4.0, it’s now GA in Spark 2.4.1, as announced in the release notes on March 31.

So it’s great to finally see this. Obviously, everyone involved should be congratulated :tada::slightly_smiling_face:

However, perhaps some post-mortem analysis is still in order to determine why it took almost 29 months to move what, for many, is a major part of the Scala ecosystem from Scala 2.11 to 2.12. The technical reasons are known - but was this fundamentally a language issue, i.e. something about Scala itself, or a narrower Spark-specific one? Perhaps Spark saw other issues as having greater value to its users, and so devoted time to them rather than to enabling its user base to move to Scala 2.12. If so, the question is why they didn’t see it as valuable to let their user base keep up with the overall Scala ecosystem, or why the cost of doing so was seen as prohibitive (in the short term).

Perhaps it’s as simple as one significant technical issue - the closure cleaner. This has been discussed in this thread and in the Lightbend blog entry “How Lightbend Got Apache Spark To Work With Scala 2.12” (the title gives the impression that it was just Lightbend that got things done, but others are credited within the article itself). A sketch of the underlying capture problem follows.
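
For anyone new to the term, here’s a minimal sketch of the capture problem the closure cleaner exists to solve (illustrative names and a simplified picture; the real cleaner rewrites closure bytecode rather than relying on this manual pattern):

```scala
class Driver extends Serializable {
  val hugeState = new Array[Byte](100 * 1024 * 1024) // not needed by tasks
  val factor = 2

  def makeTask(): Int => Int = {
    // A lambda that reads `factor` directly captures `this`, dragging
    // hugeState along when the closure is shipped to executors:
    //   (x: Int) => x * factor
    // The cleaner's job is to null out such unused outer state; copying
    // the field into a local first avoids the capture entirely:
    val f = factor
    (x: Int) => x * f // captures only the Int `f`
  }
}
```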

With the Scala 2.13-RC1 milestone closed and the release in progress, one has to hope there won’t be a similar delay in Spark becoming available for Scala 2.13. It would, of course, be amazing if Spark were eventually added to the community build (as covered in community-build issue #763).

7 Likes