That would be very welcome, but likely a significant effort. I don't think we will have time to tackle this ourselves in the next 6 months.
Can Spark finally make it into the 2.12 community build?
We can discuss at add Spark · Issue #763 · scala/community-build · GitHub. I've already put some thoughts there.
To clarify, the 2.12 port isn't done; there is no release build for 2.12. From a watcher's perspective it appears huge progress has been made and they are very close, but in case anyone reads this and wonders why, 28 days later, they can't find binaries: it is because the issue is still open. Once a release is up on Maven, though, it would be great to have an update on this thread so watchers know they can start their 2.12 migration efforts. I know I, for one, will act immediately on its availability.
Spark 2.4 will release with beta Scala 2.12 support.
So does that leave Scala Native as the most significant part of the Scala ecosystem still on 2.11?
Scala Native 2.12 support is scheduled for the next release.
@sadhen - can you link to anything stating that Scala 2.12 support is slated to appear in Spark 2.4? There's no fix version listed for SPARK-14220.
Despite all the initial euphoria it seems there are still non-trivial issues that need to be addressed before this ticket can be considered resolved - at least according to Sean Owen in this update on 7th August. Sean Owen seems to be the public face at Spark for this ticket, so I'd say he knows what he's talking about.
Since August 7th there's been no new update that I've seen. I'm really glad of all the heavy lifting (as Sean puts it) that has gone into this ticket so far - but overall the visibility on this issue has been low. Up until the (now apparently premature) comment on 2nd August that the issue had been resolved, there was little sign that anything was actually happening - so it's hard to know if people are working hard on the further issues pointed out by Sean or if it's gone quiet. Maybe everything will be resolved soon - or maybe not.
Thanks for the update.
Could you or Sean make those issues actionable? I don't know if someone at Lightbend is working to fix those, but unless those issues are extended with better descriptions of the problems, I don't have a feel for how those issues are non-trivial and how the core team can help to fix them.
We have ongoing threads with the Spark team, mostly on GitHub. I'm not aware of anything being blocked by the Scala team. Since Aug 7, a new Janino release was cut for Spark to unblock one of the tickets. Another ticket has also made progress thanks to Sean (I don't have the PR handy, but it was about the udf method and type tags).
EDIT:
- this is the issue I mentioned: https://issues.apache.org/jira/browse/SPARK-25044
- all subtasks of the umbrella ticket are either done or (in 1 case) have a pending PR
https://issues.apache.org/jira/browse/SPARK-14220 is resolved now.
Last week, I (Darcy Shen) contributed some time to the migration.
Now, the last failing unit test has been fixed.
This PR (https://github.com/scala/scala/pull/7156) is for the migration. Spark SQL's Row uses WrappedArray; as a result, the bug affects the correctness of Row equality.
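To illustrate why this matters (a minimal sketch, not Spark's actual Row code): plain JVM arrays compare by reference, so Spark SQL wraps them so that equality is element-wise. A regression in the wrapper's equality would therefore silently change the results of comparing Rows with array-valued fields.

```scala
// Sketch of the equality difference Row depends on. The implicit
// Array-to-Seq conversion yields WrappedArray on Scala 2.12
// (ArraySeq on 2.13); both give structural equality.
object WrappedEquality {
  def main(args: Array[String]): Unit = {
    val a = Array(1, 2, 3)
    val b = Array(1, 2, 3)

    // Plain arrays use reference equality: distinct instances are never ==.
    println(a == b) // false

    // Wrapping as a Seq gives element-wise equality, which is what
    // Row relies on when comparing array-valued fields.
    val wa: Seq[Int] = a
    val wb: Seq[Int] = b
    println(wa == wb) // true
  }
}
```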
@ghawkins here is the statement: http://apache-spark-developers-list.1001551.n3.nabble.com/code-freeze-and-branch-cut-for-Apache-Spark-2-4-tt24365.html#a24839
Is it correct then that Spark users can upgrade to Spark 2.4 once Scala 2.12.7 is released? If so, can anyone comment on the ETA of Scala 2.12.7?
The milestone is set for Sep 14th: 2.12.7 Milestone · GitHub
So if they make it to that date, the release will follow a few days after.
This is great! Thanks much to all the Spark devs!
Is it definite that Scala Native 0.3.9 / 0.4.0 won't be released without Scala 2.12 support? Maybe I misunderstood, but I thought 2.12 support had been scheduled for earlier releases.
I suggest asking about Scala Native on 2.12 at the relevant ticket: https://github.com/scala-native/scala-native/issues/233. Let's keep this thread as focused as possible.
A week late in noticing, but I thought it might be worth pointing out here that Scala 2.12 support is finally GA. After experimental support was announced last November in Spark 2.4.0, it's now GA in Spark 2.4.1, as announced in their release notes on March 31.
So it's great to finally see this. Obviously, everyone involved should be congratulated.
However, perhaps some post-mortem analysis is still in order to determine why it took almost 29 months to move what, for many, is a major part of the Scala ecosystem from Scala 2.11 onto 2.12. The technical reasons are known - but was this fundamentally a language-specific issue, i.e. something about Scala itself, or a narrower Spark-specific one? Perhaps Spark saw other issues as having greater value to their users, and so devoted time to them rather than to enabling their user base to shift to Scala 2.12. If that was the case, then the question is why they didn't see it as valuable to let their user base keep up with the overall Scala ecosystem, or why the cost of doing so was seen as prohibitive (in the short term).
Perhaps it's as simple as one significant technical issue - the closure cleaner. This has been discussed in this thread and in the Lightbend blog entry "How Lightbend Got Apache Spark To Work With Scala 2.12" (the title gives the impression that it was just Lightbend that got things done, but others are credited within the article itself).
With the Scala 2.13-RC1 milestone closed and the release in progress, one has to hope there won't be a similar delay in Spark being available for Scala 2.13. It would, of course, be amazing if Spark was eventually added to the community build process (as covered in community-build issue #763).
From looking at this at a surface level (note that I don't really use Spark that much), I can think of some recommendations for a postmortem/retrospective:
- It sounds like there is an argument for putting the closure cleaning (or part of it) inside the Scala stdlib somehow (or maybe the compiler; I'm not sure which abstraction level works best). This may put more effort onto the Scala compiler team, but it turns out that implementing the closure cleaner properly required effort from the Scala compiler team anyway. At least if the closure cleaner is part of the official Scala release, it will always be available when Scala gets released, and it appears this isn't as Spark-specific as we think it is (i.e. look at Flink).
- Alternately, it seems like it may be a good time to revisit Spores (https://docs.scala-lang.org/sips/spores.html), which were deliberately designed to solve this problem. I know there were some technical reasons why they couldn't be completely finished, but there might be some argument for getting Spores over the hurdle so they can actually be used (or maybe even made part of Scala itself)? As far as I understand, if you use Spores instead of Scala's closures, you don't even really need a closure cleaner, since with Spores you have to explicitly define which variables get captured by a closure.
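For anyone unfamiliar with the problem the closure cleaner addresses, here is a minimal sketch (hypothetical class and method names; this is not Spark's implementation): referencing a field from a lambda captures the whole enclosing instance, which is exactly the accidental capture a cleaner tries to null out and an explicit capture list (as with Spores) would prevent.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical example: a class with one small field the closure needs
// and one large field it doesn't.
class Job extends Serializable {
  val factor = 2
  val bigState = new Array[Byte](1 << 20) // 1 MiB the closure never uses

  // `factor` is a field, so this compiles to `this.factor` and the lambda
  // captures the whole Job instance (bigState included) when serialized.
  def heavy: Int => Int = x => x * factor

  // Copying the field into a local first means only the Int is captured;
  // this is the effect a closure cleaner (or a Spores capture list) gives you.
  def light: Int => Int = {
    val f = factor
    x => x * f
  }
}

object CaptureDemo {
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size
  }

  def main(args: Array[String]): Unit = {
    val job = new Job
    // The "dirty" closure drags the megabyte of state along...
    println(serializedSize(job.heavy) > (1 << 20))
    // ...while the "clean" one serializes to a few hundred bytes.
    println(serializedSize(job.light) < 4096)
  }
}
```

Both closures compute the same thing; the only difference is what gets dragged across the wire, which is why Spark has to rewrite (or reject) the first form.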
As you said, and it's important to reiterate, there were very real reasons for this delay. It was an incredibly technical problem, so much so that it required Scala compiler engineers to get it over the fence, so I don't think we should over-dramatize what happened.
Will the closure encoding change dramatically after Scala 2.12? Scala 2.12's biggest change was leveraging and integrating with lambda support in Java 8, i.e. the closure encoding was rewritten from scratch. In Scala 2.13 the biggest change is another collections redesign.
IMO we should just wait and see how long it takes to adapt Spark's closure cleaner to Scala 2.13. If it takes several months, then integrating the closure cleaner with core Scala would be warranted. Otherwise the huge delay in supporting Scala 2.12 could be treated as a rare event.
I don't think there's anything that'll need to be changed in the closure cleaner for 2.13.