Spark as a Scala gateway drug and the 2.12 failure

That would be very welcome, but likely a significant effort. I don’t think we will have time to tackle this ourselves in the next 6 months.

Can Spark finally make it into the 2.12 community build?

We can discuss at add Spark · Issue #763 · scala/community-build · GitHub; I’ve already put some thoughts there.

To clarify, the 2.12 port isn’t done; there is no release built for 2.12. From a watcher’s perspective it appears huge progress has been made and that they are again very close, but in case anyone reads this and wonders why, 28 days later, they can’t find binaries: it is because the issue is still open. Once a release is up on Maven, though, it would be great to have an update to this thread so watchers know they can start their 2.12 migration efforts. I know I, for one, will act immediately on its availability.


Spark 2.4 will be released with beta Scala 2.12 support.

So does that leave Scala Native as the most significant part of the Scala ecosystem still on 2.11?

Scala Native 2.12 support is scheduled for the next release.


@sadhen, can you link to anything stating that Scala 2.12 support is slated to appear in Spark 2.4? There’s no fix version listed for SPARK-14220.

Despite all the initial euphoria, it seems there are still non-trivial issues that need to be addressed before this ticket can be considered resolved, at least according to Sean Owen in his update on 7th August. Sean Owen seems to be the public face at Spark for this ticket, so I’d say he knows what he’s talking about.

Since August 7th there’s been no new update that I’ve seen. I’m really glad of all the heavy lifting (as Sean puts it) that has gone into this ticket so far, but overall the visibility on this issue has been low. Up until the (now apparently premature) comment on 2nd August that the issue had been resolved, there was little sign that anything was actually happening, so it’s hard to know whether people are working hard on the further issues pointed out by Sean or whether it’s gone quiet. Maybe everything will be resolved soon, or maybe not.

Thanks for the update.

Could you or Sean make those issues actionable? I don’t know if someone at Lightbend is working to fix them, but unless those issues are extended with better descriptions of the problems, I can’t get a sense of why they are non-trivial or how the core team can help fix them.


We have ongoing threads with the Spark team, mostly on GitHub. I’m not aware of anything being blocked by the Scala team. Since Aug 7, a new Janino release was cut for Spark to unblock one of the tickets. Another ticket has also made progress thanks to Sean (I don’t have the PR handy, but it was about the udf method and type tags).



https://issues.apache.org/jira/browse/SPARK-14220 is resolved now.

Last week, I (Darcy Shen) contributed some time to the migration.

Now, the last failing unit test has been fixed.

This PR (https://github.com/scala/scala/pull/7156) is also part of the migration: Spark SQL’s Row uses WrappedArray, so the bug it fixes affected the correctness of Row equality.
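To make the dependency concrete, here’s a minimal sketch of why a WrappedArray bug surfaces as a Row correctness bug. SimpleRow is a hypothetical stand-in for org.apache.spark.sql.Row, not Spark’s actual implementation:

```scala
import scala.collection.mutable.WrappedArray

// Hypothetical stand-in for org.apache.spark.sql.Row: like Row, it keeps its
// values in a Seq, and its equality delegates to equality of that Seq.
case class SimpleRow(values: Seq[Any])

object RowEqualityDemo {
  def main(args: Array[String]): Unit = {
    // Arrays passed through Scala collection APIs often end up as WrappedArray.
    val a = SimpleRow(WrappedArray.make[Int](Array(1, 2, 3)))
    val b = SimpleRow(WrappedArray.make[Int](Array(1, 2, 3)))

    // If WrappedArray.equals or hashCode is broken, this comparison (and
    // therefore Row equality in Spark SQL) silently gives wrong answers.
    println(a == b) // expected: true
  }
}
```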

@ghawkins here is the statement: http://apache-spark-developers-list.1001551.n3.nabble.com/code-freeze-and-branch-cut-for-Apache-Spark-2-4-tt24365.html#a24839


Is it correct then that Spark users can upgrade to Spark 2.4 once Scala 2.12.7 is released? If so, can anyone comment on the ETA of Scala 2.12.7?

The milestone is set for Sep 14th: 2.12.7 Milestone · GitHub
So if they make that date, the release should follow a few days after.


This is great! Thanks much to all the Spark devs!

Is it definite that Scala Native 0.3.9/0.4.0 won’t be released without Scala 2.12 support? Maybe I misunderstood, but I thought 2.12 support had been scheduled for earlier releases.

I suggest asking about scala-native on 2.12 at the relevant ticket: https://github.com/scala-native/scala-native/issues/233. Let’s keep this thread as focussed as possible :slight_smile:

In related news, I just promoted 2.12.7 to Maven. More info: 2.12.7 release 🚋


A week late in noticing, but I thought it might be worth pointing out here that Scala 2.12 support is finally GA. After experimental support was announced last November in Spark 2.4.0, it’s now GA in Spark 2.4.1, as announced in their release notes on March 31.

So it’s great to finally see this. Obviously, everyone involved should be congratulated :tada::slightly_smiling_face:

However, perhaps some post-mortem analysis is still in order to determine why it took almost 29 months to move what, for many, is a major part of the Scala ecosystem from Scala 2.11 onto 2.12. The technical reasons are known, but was this fundamentally a language-specific issue, i.e. something about Scala itself, or some narrower Spark-specific issue? Or was it that Spark saw other issues as having greater value to their users, and so devoted time to those rather than to enabling their user base to shift to Scala 2.12? If that was the case, then the question is why they didn’t see it as valuable to allow their user base to keep up with the overall Scala ecosystem, or why the cost of doing so was seen as prohibitive (in the short term).

Perhaps it’s as simple as one significant technical issue: the closure cleaner. This has been discussed in this thread and in the Lightbend blog entry “How Lightbend Got Apache Spark To Work With Scala 2.12” (the title gives the impression that it was just Lightbend that got things done, but others are credited within the article itself).
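For readers who haven’t run into it: the closure cleaner is the part of Spark that analyzes and rewrites function objects before shipping them to executors, so they don’t drag along an enclosing object they happen to reference. Here’s a minimal sketch of the underlying problem, with hypothetical names (Driver, ship) rather than Spark’s actual API:

```scala
import java.io._

class Driver /* not Serializable */ {
  val offset = 10
  val bigState = new Array[Byte](1 << 20) // must NOT be shipped to executors

  // Referencing the field `offset` makes the closure capture `this`, so naive
  // serialization would try to ship the whole Driver, bigState and all.
  // Spark's ClosureCleaner exists to rewrite closures so that only what is
  // actually used gets serialized; the new lambda encoding in 2.12 required
  // reimplementing that analysis.
  def makeTask(): Int => Int = (x: Int) => x + offset
}

object ClosureProblemDemo {
  // Roughly what a framework does before sending a task over the wire.
  def ship(f: Int => Int): Unit =
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)

  def main(args: Array[String]): Unit = {
    try ship(new Driver().makeTask()) // fails: the captured Driver is not serializable
    catch { case e: NotSerializableException => println(s"fails as expected: $e") }
  }
}
```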

With the Scala 2.13-RC1 milestone closed and the release in progress, one has to hope there won’t be a similar delay in Spark being available for Scala 2.13. It would, of course, be amazing if Spark were eventually added to the community build process (as covered in community-build issue #763).


From looking at this on a surface level (note that I don’t really use Spark that much), I can think of some recommendations for a postmortem/retrospective:

  • It sounds like there is an argument for putting the closure cleaning (or a part of it) inside the Scala stdlib somehow (or maybe the compiler; I’m not sure which abstraction level works best). This may put more effort onto the Scala compiler team, but it turns out that implementing the closure cleaner properly required effort from the Scala compiler team anyway. At least if the closure cleaner were part of the official Scala release it would always be available when Scala is released, and it appears this isn’t as Spark-specific as we think it is (look at Flink, for instance).
  • Alternatively, it seems like it may be a good time to revisit spores (https://docs.scala-lang.org/sips/spores.html), which were deliberately designed to solve this problem. I know there were some technical reasons why they couldn’t be completely finished, but there might be an argument for getting spores over the hurdle so they can actually be used (or maybe even made part of Scala itself). As far as I understand, if you use spores instead of plain Scala closures you don’t really need a closure cleaner at all, since with spores you have to explicitly declare which variables get captured by a closure; see the sketch after this list.
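To illustrate that second point, here’s a sketch of the spores API as described in the SIP (going from the SIP’s examples, not from any particular released version of the library; makeTask is a hypothetical name):

```scala
import scala.spores._ // per the SIP; spores live in a library, not the stdlib

object SporeSketch {
  // A spore's header explicitly lists every captured value; the spore macro
  // rejects any reference from the body to the enclosing scope that is not
  // declared in the header, so no after-the-fact closure cleaning is needed.
  def makeTask(helper: String => Int): Spore[String, Int] = spore {
    val h = helper      // explicit capture
    (s: String) => h(s) // the body may use only `h` and its own parameter
  }
}
```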

As you said, and it’s important to reiterate, there were very real reasons for this delay. It was an incredibly technical problem, so much so that it required Scala compiler engineers to get it over the fence, so I don’t think we should overdramatize what happened.


Will the closure encoding change dramatically after Scala 2.12? Scala 2.12’s biggest change was leveraging and integrating with Java 8’s lambda support, i.e. the closure encoding was rewritten from scratch. In Scala 2.13 the biggest change is another collections redesign.
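For context on what “rewritten from scratch” means here: on 2.11 a function literal compiles to its own anonymous class extending AbstractFunctionN, while on 2.12 it compiles to an invokedynamic call site that the JDK’s LambdaMetafactory materializes at runtime, which is precisely what broke the bytecode-walking closure cleaner. A quick, sketchy way to observe the difference:

```scala
object EncodingDemo {
  def main(args: Array[String]): Unit = {
    val captured = 42
    val f: Int => Int = x => x + captured

    // Scala 2.11 prints a compiler-generated anonymous class, something like
    //   class EncodingDemo$$anonfun$main$1
    // Scala 2.12+ prints a JVM-generated lambda class, something like
    //   class EncodingDemo$$$Lambda$1/1234567890
    println(f.getClass)

    // In both encodings Scala function literals are serializable, which is
    // what Spark relies on to ship tasks to executors.
    println(f.isInstanceOf[java.io.Serializable]) // true
  }
}
```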

IMO we should just wait and see how long it takes to adapt Spark’s closure cleaner to Scala 2.13. If it takes several months, then integrating the closure cleaner with core Scala would be warranted. Otherwise the huge delay in supporting Scala 2.12 could be treated as a rare event.

I don’t think there’s anything that’ll need to be changed in the closure cleaner for 2.13.