Spark as a Scala gateway drug and the 2.12 failure


#42

Now that the 2.12 port is done I was wondering if anyone on the Scala team (or anyone else) could comment on how things might look for 2.13 … I’m curious whether the recently-completed work will make things easier next time around.


#43

The main 2.12 changes impacting Spark were (in detail):

  • our new closure encoding actually made it easier to implement Spark’s “closure cleaner”, but it took some effort to convince ourselves (see details in doc linked above)
  • SAM types being compatible with function types resulted in a source incompatibility – this was resolved in 2.12.0 by improving type inference for overloaded higher-order methods. Further improvements coming in 2.13. One corner case remains with Unit-returning functions.
  • we need a stable API for REPL users, such as Spark. Help greatly appreciated in coordinating between the various projects.
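
To make the second bullet concrete, here is a minimal sketch of an overloaded higher-order method that accepts both a function type and a SAM type. It assumes Scala 2.12 or later. `IntFormatter`, `Overloads`, and `describe` are hypothetical names for illustration, not Spark or standard-library APIs.

```scala
// Hypothetical SAM type: a trait with exactly one abstract method.
trait IntFormatter { def format(x: Int): String }

object Overloads {
  // Two overloads of the same higher-order method, one per encoding.
  def describe(f: Int => String): String = "Function1 overload: " + f(1)
  def describe(f: IntFormatter): String = "SAM overload: " + f.format(1)
}

object SamOverloadDemo {
  def main(args: Array[String]): Unit = {
    // The lambda's parameter type is inferred. Both overloads are applicable
    // in 2.12+, and overloading resolution prefers the Function1 one.
    println(Overloads.describe(x => x.toString))

    // An explicit expected type triggers SAM conversion, so the
    // IntFormatter overload is selected.
    val viaSam: IntFormatter = x => s"<$x>"
    println(Overloads.describe(viaSam))
  }
}
```

Under 2.11 the `viaSam` line would not compile without experimental flags, since lambdas did not convert to SAM types there. The Unit-returning corner case mentioned above is not covered by this sketch.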

For 2.13, I expect we’ll have this stable REPL API, but the collections are somewhat unknown in their impact on the Spark code base. If anyone would like to try – now is an excellent time, and this greatly benefits both communities! Sadly, our team at Lightbend will likely not be able to get to this in time for M5.


#44

@SethTisue can spark finally make it into the 2.12 community build?


#45

That would be very welcome, but likely a significant effort. I don’t think we will have time to tackle this ourselves in the next 6 months.


#46

can spark finally make it into the 2.12 community build?

we can discuss at https://github.com/scala/community-builds/issues/763. I’ve already put some thoughts there


#47

To clarify, the 2.12 port isn’t done; there is no release built for 2.12. From a watcher’s perspective, it appears huge progress has been made, and they are again very close, but in case anyone reads this and wonders why 28 days later they can’t find binaries, it is because the issue is still open. Once a release is up on Maven, though, it would be great to have an update to this thread so watchers know they can start their 2.12 migration efforts. I know I, for one, will immediately act on its availability.


#48

Spark 2.4 will release with beta Scala 2.12 support.


#49

So does that leave Scala-native as the most significant part of the Scala eco-system still on 2.11?


#50

Scala Native 2.12 support is scheduled for the next release.


#51

@sadhen - can you link to anything stating that Scala 2.12 support is billed to appear in Spark 2.4? There’s no fix version listed for SPARK-14220.

Despite all the initial euphoria it seems there are still non-trivial issues that need to be addressed before this ticket can be considered resolved - at least according to Sean Owen in this update on 7th August. Sean Owen seems to be the public face at Spark for this ticket so I’d say he knows what he’s talking about.

Since August 7th there’s been no new update that I’ve seen. I’m really glad of all the heavy lifting (as Sean puts it) that has gone into this ticket so far - but overall the visibility on this issue has been low. Up until the (now apparently premature) comment on 2nd August that the issue had been resolved there was little sign that anything was actually happening - so it’s hard to know if people are working hard on the further issues pointed out by Sean or if it’s gone quiet. Maybe everything will be resolved soon - or maybe not.


#52

Thanks for the update.

Could you or Sean make those issues actionable? I don’t know if someone at Lightbend is working to fix those, but unless those issues are extended with a better description of the problems, I have no sense of why they are non-trivial or how the core team can help fix them.


#53

We have ongoing threads with the Spark team, mostly on GitHub. I’m not aware of anything being blocked by the Scala team. Since Aug 7, a new Janino release was cut for Spark to unblock one of the tickets. Another ticket has also made progress thanks to Sean (I don’t have the PR handy, but it was about the udf method and type tags).

EDIT:


#54

https://issues.apache.org/jira/browse/SPARK-14220 is resolved now.

Last week, I (Darcy Shen) contributed some time to the migration.

Now, the last failing unit test has been fixed.

And this PR (https://github.com/scala/scala/pull/7156) is for the migration. Spark SQL’s Row uses WrappedArray; as a result, the bug affects the correctness of Row equality.

@ghawkins the statement: http://apache-spark-developers-list.1001551.n3.nabble.com/code-freeze-and-branch-cut-for-Apache-Spark-2-4-tt24365.html#a24839


#55

Is it correct then that Spark users can upgrade to Spark 2.4 once Scala 2.12.7 is released? If so, can anyone comment on the ETA of Scala 2.12.7?


#56

The milestone is set for Sep 14th. https://github.com/scala/scala/milestone/73
So if they make that date, the release should follow a few days later.


#57

This is great! Thanks much to all the Spark devs!


#58

Is it definite that Scala-Native 0.3.9 / 0.4.0 won’t be released without Scala 2.12 support? Maybe I misunderstood, but I thought 2.12 support had been scheduled for earlier releases.


#59

I suggest asking about scala-native on 2.12 at the relevant ticket: https://github.com/scala-native/scala-native/issues/233. Let’s keep this thread as focussed as possible :slight_smile:


#60

In related news, I just promoted 2.12.7 to maven. More info: 2.12.7 release 🚋