I’m curious: are there any estimates of how prevalent metaprogramming quotes will be in future Scala code? There seems to be an implicit judgment here that making that use simpler is more important than supporting the established Scala coding patterns of the fastest-growing pool of new Scala users in the world.
After all, Spark will move to Scala 3 at some point and what then? Notebook users will have no refactoring tools or even compilation checking tools to upgrade the many millions of Scala notebooks that will be around by then. My point is simply that delaying the change doesn’t make things better: it simply allows the problem to grow.
On the one hand, I grant the possibility; OTOH, I worry that this is a bit blithe.
So far, I’ve generally been nodding along with the notion that the symbol-literal syntax is a minor edge case, with a smallish number of users who should be able to use Scalafix to deal with their issue. Any serious Scala code is probably going to want to use Scalafix for the Scala 3 upgrade anyway, and until now I had figured that only relatively serious Scala code was using this syntax.
But this one gives me pause. As far as I know, this is a comparatively large and relatively unsophisticated (in terms of Scala) audience, using tooling other than what most of us here are used to. Can this be handled with Scalafix? In theory, sure. But in practice, I have no idea whether that’s realistic.
I’d like to believe that you’re correct, but I think it remains to be demonstrated. Without a straightforward workflow for the workbook users, suitable for an audience that deeply does not give a damn about Scala 3 and isn’t willing to put more than slight work into it, I’m concerned that this is a potentially major “shoot ourselves in the foot” moment, the sort of “split the community” item that we’re trying fairly hard to avoid…
Sure, but it could be as simple as putting a button on a Jupyter notebook that says “Migrate to Scala 3” and converts all of the Scala source code in the workbook.
My point is that the logical conclusion of such an attitude is that “some things can never be changed because a lot of people use them”, even though the Scala Center is putting a lot of effort into developing tools precisely so that this doesn’t happen.
Perhaps this raises the question, is our goal for everyone to upgrade to Scala 3? Personally I think there is no way that will happen unless we seriously rethink our approach to evolving the language, as suggested in the thread I started a while ago, Move Slowly and Iterate. Maybe the goal is not so ambitious though? Maybe we should be satisfied if most people upgrade and accept that many people never will?
I’d echo the pragmatism of @jducoeur. While it doesn’t pay to freeze the evolution of a language, it certainly doesn’t pay to imagine unrealistic versions of the future, especially when it comes to the behavior of multiple third parties and large pools of unsophisticated users.
With this in mind, here is some context about notebook environments.
There are at least three major notebook environments for Spark that I know of: Jupyter (backend is Python), Zeppelin (backend is Java/Scala), Databricks (closed source, unknown backend but huge usage because Databricks is the company behind Spark). The frontends are all JavaScript.
Notebooks are JSON files and only some of the formats are documented.
A large portion of notebooks is not under SCM so there is no easy way to say “run this tool on all my notebooks”. In some managed services, such as Databricks, the main way to get/update the JSON of a notebook is via a REST API. Not all notebook users have REST API access.
Notebooks can include other notebooks as dependencies, and automated jobs run from notebooks. In short, in many environments there is no atomic way to update many notebooks without causing production failures.
Notebooks execute cells in a REPL session. Code in a cell is compiled when a cell is executed.
Closed source and managed service implementations add proprietary code wrappers around the code in a cell. Also, there are dependencies on closed source JARs.
In short, any tool that attempts to rewrite the JSON will have to operate entirely on the level of Scala syntax (it cannot rely on compilation).
Further, there is no way to verify that a migration of notebook code is successful in an automated manner.
Call me a skeptic, but I don’t see any reasonable way to automate code migration that won’t cause significant production problems and won’t delay some migration error discovery (due to migration tool bugs) until the time a notebook cell is executed.
Therefore, very simply put, this boils down to a judgment call about which community of users we value more and are willing to hurt or aid: the hundreds of thousands of people who’ll be using Spark+Scala notebooks by the time this change lands, or the unknown number of advanced developers working in IDEs who will be using metaprogramming quotes.
There is no right technical answer here; this is about the ethics of breaking changes and how they affect different types of users. On that note, we should acknowledge that notebook users are substantially under-represented in this discussion as the folks typically engaged in SIP discussions are advanced developers working in IDEs.
That sad story about notebooks suggests that ALL current Scala features must be frozen, because removing anything will break notebooks. That includes:
symbol literals
implicits of any form (they are frequently used by everyone, because a lot of libraries use them)
XML literals
forSome
early initializers
with syntax in types (which is going to be replaced by union and intersection types)
auto application
weak conformance
etc
Generally, looking at http://dotty.epfl.ch/docs/index.html we see a lot of changed and removed features. Shall we abandon all that because of Scala notebooks?
I see at least one solution to reduce migration pain: support the old syntax in Scala for some time while reporting deprecation warnings to users, with references to the corresponding changes in the Scala language. For example, if we deprecate symbol literals in Scala 2020 (and notebook software accordingly warns about them straight away) but only remove them in Scala 2025, then users will have 5 years to react.
I think there is also -language:future or something like that, which warns about future deprecations. With that, users who are still on e.g. Scala 2.11 (currently the default in Spark, I think) can be warned about more features that will go away.
Sure, but the way you are stating your arguments basically amounts to stonewalling, i.e. “we are never going to change because of x”.
Right, but tools like Scalafix have nothing to do with IDEs, nor with how advanced developers are. An IDE makes no difference in how Scalafix is applied other than providing a shortcut to run the command, which again isn’t any different from integrating Scalafix into the Scala workbook-based solutions, which would run the migration on the .scala files in the user’s account.
I mean, there are always problems in coding (whether you are using notebooks or IDEs, whether you are an advanced user or not). Having used notebook solutions like Jupyter and worked with people who use them, I’d say migrating notebook code is actually far easier than what the “advanced coders” deal with, because the environment the Scala source files run in is very restricted and constrained. This makes the migration much easier: the set of imports that notebooks run under is basically frozen and trivial. The amount of source code contained in notebooks is irrelevant if the type of code written is, from a language perspective, trivial (which it is). The main complexity in notebook code is the math and formulas, which from a language point of view are almost trivial (and are also not even changing).
I can’t help but see this as making a mountain out of a molehill. Nothing is actually stopping the current notebook environments from integrating something like Scalafix in an easy way that users will understand; it’s just going to be a feature request like any other. Of course, refusing to even try is another problem altogether.
I am not saying that the solution is going to “magically fix everything”, but nothing is stopping us from integrating Scalafix into a notebook solution (bundled with the migration to Scala 3) and seeing if it works on a current set of Scala worksheets. This is the pragmatic solution: actually trying it in the first place! If there are issues, great; we can work as a community to solve them.
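For reference, the core rewrite being discussed is mechanically simple. Below is a deliberately naive, regex-based sketch of it (all names here are mine, for illustration); a real Scalafix rule operates on the parsed syntax tree, which lets it avoid false positives such as char literals like 'a' that plain text substitution cannot distinguish:

```scala
// Naive sketch: rewrite a symbol literal like 'foo into Symbol("foo").
// CAVEAT: a plain regex will also match inside strings and char literals;
// Scalafix avoids such false positives by working on syntax trees.
val symbolLiteral = raw"'([A-Za-z_][A-Za-z0-9_]*)".r

def rewrite(code: String): String =
  symbolLiteral.replaceAllIn(code, m => s"""Symbol("${m.group(1)}")""")

println(rewrite("df.select('foo, 'bar)"))
// df.select(Symbol("foo"), Symbol("bar"))
```

The rewritten form compiles identically on Scala 2 and Scala 3, which is what makes the migration amenable to a one-shot tool.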
Our priority should surely be getting onto Dotty. The big hurdle to that, as I understand it, is finding replacements for macros. To that end I would suggest we need to be ruthless. If removing symbol literals significantly helps, then it should be done as a priority, possibly even putting a deprecation warning into 2.13, if that could be done without delaying 2.13.0’s release date. If it doesn’t significantly aid Dotty in replicating the Scala 2 ecosystem’s functionality, then the issue should be left to 3.1 or later.
I think what I’m suggesting is already the strategy of the Scala core development teams; I’m just suggesting a sharpening of that strategy. I would also suggest that two and a half years is far too long between versions, and that yearly Scala releases are necessary to implement the replacement, deprecation, removal cycle.
Given the time that 2.13 has taken is it realistic to hope to get away with just one more Scala 2 version before Dotty?
@tarsa of the list you provided, only symbol literals and implicits are heavily used in notebooks. I agree that an extended period of deprecation would be useful. It might even prompt notebook projects/vendors to help with migration tools.
Still, I’m slightly baffled by the urgency to deal with this issue of syntax relative to the cost. Fundamentally, looking at the setup of this discussion, four of the key pro arguments were incorrect:
Symbols are used in some existing Scala code, but are not used pervasively.
Wrong, because of Spark
Most DSLs that use symbol literals don’t need interning and could therefore use string literals instead, at the cost of only one additional character per literal.
Probably true in terms of number of DSLs but certainly not true in terms of volume of DSL usage. In Spark’s Scala DSL this doesn’t work.
Although there is migration cost to use Symbol() or sym or string literals instead, it’s not tricky migration, it’s an easy Scalafix rewrite (or, as a quick-and-dirty alternative that would work in many codebases, even search-and-replace).
Wrong, because of notebooks.
Also, Aaron Hawley did an experiment using the Scala community build (described in this comment); he believed the results indicated the change wouldn’t be too disruptive.
Wrong, because this ignored notebook code.
To be unaware of the needs of the fastest-growing community of Scala users and get things so wrong in setting up this SIP should, hopefully, cause some soul searching when it comes to the evolution of the language. After all, while we may not love how Spark uses Scala, it is the best thing that has happened to Scala in terms of popularity recently.
I am not arguing for a stop to language evolution. I’m arguing for deep understanding of the reality of the situation and a thoughtful cost-benefit analysis.
@mdedetrich rather than assume what multiple OSS projects and for-profit vendors might do in the future, doesn’t it make sense to try to get their perspective now?
I’m surprised that authors of Scala-based tools are unaware that Scala evolves in backward-incompatible ways. Scala has done that since forever.
I think that backporting deprecation warnings to Scala 2.11 and Scala 2.12 is the most sensible way to go. The date at which symbol literals are dropped could be decided right before the first release candidate of Scala 2.14, after talking with the Spark community.
Regardless of decision about symbol literals, Scala notebooks users should be notified about migration requirements well in advance. The sooner the better.
Symbols are great for DSLs because they look cleaner than strings ('foo is only one tick of noise versus two quote marks for “foo”), so I would be sad to lose the 'foo syntax.
I never cared much for the Symbol class itself. If 'foo returned the String “foo”, that would work just as well for me.
If there is an alternative syntax that is as clean as 'foo, such as :foo or %foo, I am fine with that too. If not, we will use strings in DSLs instead and accept that it is a little uglier.
I have no interest in Symbol(“foo”) or sym"foo", since they are worse than strings in terms of readability, so I would rather use strings instead if those are the alternatives.
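To make the readability trade-off concrete, here is a toy DSL sketch (the names are made up for illustration, not Spark’s actual API); the 'foo literal itself appears only in a comment, since it is the syntax being removed:

```scala
import scala.language.implicitConversions

// Toy column DSL: select() accepts column references. Accepting both
// Symbol and String lets call sites migrate incrementally.
case class Col(name: String)
implicit def symToCol(s: Symbol): Col = Col(s.name)
implicit def strToCol(s: String): Col = Col(s)
def select(cols: Col*): List[String] = cols.map(_.name).toList

// select('foo, 'bar)                                // the deprecated literal form
val viaSymbol = select(Symbol("foo"), Symbol("bar")) // verbose replacement
val viaString = select("foo", "bar")                 // one quote pair per name
assert(viaSymbol == viaString)
```

Both replacement forms produce identical results; the whole debate is about the extra characters at the call site.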
I agree with @ssimeonov that this is going to break a huge amount of code out there, and will be giving a very negative impression to the users of the biggest drivers of Scala adoption. In addition to Spark’s DSL, Symbols are used a lot in Apache Flink and in Akka’s routing DSL, as I explained in this comment here.
To deprecate a huge part of these projects’ APIs seems very much like “biting the hand that feeds”. We should be bending over backwards to please these crowds, not breaking their code.
Considering that 2.12→2.13 is already going to be quite a breaking change, and there’s a whole flurry of changes scheduled for 3.0, I think that this is a pretty low-priority change. Just my opinion.
Sure, but that’s not the impression that I am getting from your posts; the tone of what you are saying seems to indicate that you don’t want anything to change. There also seem to be cases where you are jumping the gun, for example when you say (in the context of Scalafix)
There is no evidence of this whatsoever. Scalafix can take Scala sources as input and it outputs migrated Scala sources. I don’t see any indication why this would be problematic for notebooks, especially considering how trivial the transformation is.
I don’t have issues with people phrasing their reservations/issues/problems (in fact, this should be encouraged), but there is a difference between saying “this is going to be painful for notebooks, but let’s investigate how we can solve this issue with existing tools” vs “this cannot change because a portion of the Scala demographic is relying on this feature and there is complete refusal to change”.
I guess the biggest thing that is hitting a nerve is the refusal combined with jumping to conclusions and assumptions (i.e. “we can’t fix this issue because people use notebooks”; I fail to see why that should have any bearing).
Also echoing this statement
It’s been very clear, right from the start, that Scala is not Java. It’s not a language that is going to stay static forever and always bend over backwards when it comes to compatibility. That doesn’t give it a free ride to always change things, but at the same time it also means that we can’t just give a blanket “no” when there is good technical impetus behind a change. There is a middle ground, which is where Scala sits.
Right, but this is just one side of the coin: if the fix is trivial (which it is), the amount of code out there is (almost) irrelevant, since it can all be migrated with one button/CLI (or however you want to implement it). The Python 2 to Python 3 migration was painful because there was no tool like Scalafix, and even such a tool would have been too inaccurate to be useful, given Python’s dynamic nature.
That’s fine that it’s possible in theory, but how usable is Scalafix from a notebook? Wouldn’t some kind of export → migrate → re-import tooling have to be implemented? And is there any guarantee anyone is going to? Because if not, then I don’t think the existence of Scalafix really negates the point being made: this will break a huge amount of code and leave a really sour taste in a lot of people’s mouths, at a time when a whole lot of other changes are going through. I don’t see why this can’t happen in 3.1 or beyond; all the reasons mentioned so far seem pretty non-urgent.
Well yes, but if we are going to argue that users are unable to press a couple of buttons (which equates to running Scalafix under the hood, or even just -migration -rewrite as Martin suggested), then I honestly don’t know what to say.
I mean, honestly, I don’t even know why we are discussing this; such functionality should automatically be part of the notebook software. One of the major points of notebooks is that they are meant to make things approachable for the people using them.
Sure, there is nothing wrong with this, but delaying the change is a pretty different statement from saying we shouldn’t do it at all. It also seems a major impetus for this change is to get the new version of macros in, which I would argue is also a very important change, since so many libraries directly or indirectly depend on macros.
I mean, maybe I’m just ignorant but I’ve never heard of such a button in Apache Zeppelin for instance. Does such a button exist now? If not, then is somebody going to build this button? If nobody is planning to build this button, then it isn’t just a matter of clicking a couple buttons.
Tools like Jupyter rely on Scala and its ecosystem to work, so the impetus should be on them to proactively deal with such issues. Sometimes languages have to migrate their features; it doesn’t happen very often, but it does occasionally. It’s also their job to evolve along with the Scala community.
It should be implemented for the same reason IntelliJ supports new features when Scala releases them; this isn’t any different. If a tool like Jupyter decides not to do this, then they are responsible for that. Not a lot is being asked.
I am not sure why this is even being debated. When Scala 2.12 completely rewrote its backend with regard to how lambdas work (because Scala 2.12 uses SAMs from Java 8), which affected Spark more than most, I didn’t see people going around saying that Scala 2.12 should not target Java 8 lambdas and should forget about dropping support for Java 1.6. Instead, Spark contributors accepted it and started making the necessary changes. Yes, it took a huge amount of time, but then again that problem (reliably serializing every possible form of Scala lambda) is hundreds of times more complicated than the one being discussed in this thread.
All I see is people making a massive deal out of something that isn’t one; we have been through this (and much worse) before. It’s really not as big as people make it out to be.
And on another note about the Scala 2.12 lambda change: there were actually much more legitimate reasons, outside of Spark, why that shouldn’t have happened, i.e. Scala 2.12 basically killed any chance of the language working with Android (at least for a while) because, for some obtuse reason, Google decided not to support Java 1.8 on Android. If there is any reason why we should say “no” to something, it should be along those lines, and honestly not this symbol-literal change (which is pocket change in comparison).
Could one possible compromise be to leave the 'symbol syntax in place, but put it behind a flag (that notebook-style applications can enable when they migrate) and make it desugar to java.lang.Strings instead? Then whoever wants to upgrade their notebook-style applications to Scala X.Y, with symbols removed, can simply enable the compiler flag and convert their user-facing function signatures from Symbol to String. Most importantly, this would allow all the user code in hard-to-refactor places to remain unchanged
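A rough sketch of what that compromise would mean for a library (hypothetical names; the flag itself does not exist today): the library changes its user-facing signature from Symbol to String, and since 'id would desugar to the String "id" under the flag, existing call sites in notebooks would keep compiling unchanged:

```scala
// Hypothetical before/after of a library migrating its user-facing signature.
object BeforeApi { def col(s: Symbol): String = "col:" + s.name }
object AfterApi  { def col(s: String): String = "col:" + s }

// Today:          BeforeApi.col('id)
// Under the flag: AfterApi.col('id)  // 'id would desugar to "id"
assert(AfterApi.col("id") == BeforeApi.col(Symbol("id")))
```

The appeal of this design is that only library authors touch their signatures, while the hard-to-refactor user code in notebook cells stays byte-for-byte identical.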