Proposal to deprecate and remove symbol literals

SethTisue · March 20, 2019, 9:34pm

Greetings, Scala users and contributors!

This thread is the SIP (Scala Improvement Process) Committee’s request for comments on a proposal to deprecate and eventually remove symbol literals from the language.

The discussion will remain open until the committee’s April meeting (likely around April 16th).

Summary

We propose to deprecate, then later remove, the single-quote 'foo syntax for constructing instances of scala.Symbol.

Background

For more than a decade, the Scala standard library has included a scala.Symbol class. As the Scaladoc for Symbol states,

This class provides a simple way to get unique objects for equal strings. Since symbols are interned, they can be compared using reference equality.

Symbol’s companion object has an apply method, so a symbol can be created by writing e.g. Symbol("foo").

The language has also long included special syntax for this class. Again, the Scaladoc:

For instance, the Scala term 'mysym will invoke the constructor of the Symbol class in the following way: Symbol("mysym").

A more formal (and only slightly longer) treatment is in section 1.3.7 of the Scala Language Specification.
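As a small illustration of the interning behaviour the Scaladoc describes, here is a minimal sketch. It is not part of the proposal, and the object name InterningDemo is made up for this example. It builds two symbols from equal strings through the companion's apply method and compares them by reference.

```scala
object InterningDemo {
  // Built through the companion's apply method; no literal syntax involved.
  val a: Symbol = Symbol("foo")
  // The name is assembled at runtime, so the result cannot come from a shared literal.
  val b: Symbol = Symbol(List("f", "oo").mkString)

  def main(args: Array[String]): Unit = {
    println(a eq b) // reference comparison of the two symbols
    println(a.name) // the underlying string
  }
}
```

Nothing here depends on the single-quote syntax, which is why the proposal can leave the class alone while removing the literal.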

Proposal details

We propose to leave scala.Symbol in place, but remove the special syntax.

At the time the single-quote syntax was added to Scala, Scala did not yet have string interpolators. One may speculate that if string interpolators had been added first, the single-quote syntax would probably never have existed.

The following changes are already tentatively in place in the latest Scala 2.13 nightlies, for release very soon as part of Scala 2.13.0-RC1:

- deprecate the single quote syntax (PR #7395)
- add a sym string interpolator, so that we can write e.g. sym"foo" instead of 'foo (PR #7495)

The second change (the addition of the sym interpolator) is independent of the first. It would also be possible to simply require users to write Symbol("foo") (which is five characters longer).

Deprecating the syntax in 2.13 paves the way for removing symbol literals entirely from Scala 2.14 and Scala 3.

(A more radical alternative that was also considered, but isn’t part of the current proposal: the scala.Symbol class itself could also be deprecated and removed.)

Discussion

A long discussion on this has already taken place, beginning in December 2017 in this thread started by Martin Odersky:

https://github.com/scala/scala-dev/issues/459

The main points made in favor:

- The concept of symbols is not core to the language.
  - “[Symbols] don’t really have a purpose in Scala”
  - “I am saddened that this feature of Lisp heritage will likely go away, but despite my best efforts I couldn’t find a compelling reason to keep it.”
- Symbols are used in some existing Scala code, but are not used pervasively.
- All syntax has a cost. Keeping the syntax means “necessity for us to keep teaching the concept and for programmers the risk of being puzzled when they see it”
- The syntax clashes (in our minds, if not in the parser) with the 'x' syntax for character literals. “it looks like an unclosed character literal”
- Tooling doesn’t always understand the syntax, especially because of the overlap with the character literal syntax. “I was annoyed more than once by some editor’s syntax highlighting becoming confused”
- Scala 2.13 adds literal types, increasing the spec and implementation footprint of the symbol literal syntax (since it would be strange and inconsistent to have the literals without the corresponding literal types).
- Most DSLs that use symbol literals don’t need interning and could therefore use string literals instead, at the cost of only one additional character per literal.
- Although there is a migration cost to using Symbol() or sym or string literals instead, the migration is not tricky: it is an easy Scalafix rewrite (or, as a quick-and-dirty alternative that would work in many codebases, even search-and-replace).
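To make the "quick-and-dirty search-and-replace" option concrete, here is a rough sketch of what such a rewrite could look like. The object name, the regular expression, and the rewrite helper are all assumptions made for this illustration; this is not the Scalafix rule, and it is not syntax-aware. It only handles plain alphanumeric names (so not '%%%), and it will also rewrite matches that happen to sit inside string literals or comments.

```scala
import scala.util.matching.Regex

object SymbolLiteralRewrite {
  // Hypothetical pattern: a quote not preceded by a word character or another quote,
  // followed by a plain alphanumeric identifier that is not closed by a second quote
  // (so character literals such as 'a' are left alone).
  private val SymbolLiteral: Regex =
    """(?<![\w'])'([A-Za-z_][A-Za-z0-9_]*)\b(?!')""".r

  // Replaces every match with the equivalent Symbol("...") call.
  def rewrite(source: String): String =
    SymbolLiteral.replaceAllIn(
      source,
      m => Regex.quoteReplacement("Symbol(\"" + m.group(1) + "\")")
    )

  def main(args: Array[String]): Unit = {
    println(rewrite("val key = 'foo"))
    println(rewrite("val c = 'a'"))
  }
}
```

A syntax-aware tool such as Scalafix avoids the false positives mentioned above, which is presumably why it is listed first.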

Also, Aaron Hawley did an experiment using the Scala community build (described in this comment); he believed the results indicated the change wouldn’t be too disruptive.

The main points against the proposal were:

- The change will break existing code, for no substantial gain.
- Several very popular libraries use symbol literals. Examples include ScalaTest, Play, and Ammonite.

Conclusion

We are opening this proposal for discussion by the community to get additional perspectives.

SethTisue · March 20, 2019, 9:39pm

I’ve already said in the linked discussion that I’m in favor of this proposal; apologies if I didn’t do the arguments against it justice.

I’m not sure if we should even bother adding the sym interpolator. I’m neither strongly in favor nor strongly against.

drdozer · March 20, 2019, 9:46pm

Thumbs up from me, for what that’s worth. I don’t see the need for the interpolator.

MarkCLewis · March 20, 2019, 10:07pm

While I have to admit that the single quote symbol option is my favorite way of referring to columns in SparkSQL, I can see how this would simplify the language and is inevitably worth doing. I’m just not happy about switching to the $"name" syntax instead of 'name. Lazy developer problems.

eje · March 20, 2019, 10:39pm

I’d advocate not adding sym"foo" syntax - it saves very little over Symbol("foo"), and not adding it is aligned with the goal of language simplification

AMatveev · March 21, 2019, 6:52am

I think that is an optimistic suggestion.
IMHO, a constant that does not require a closing delimiter is simply an easier constant to type.
I have this reflex:

1. type s""
2. move cursor back
3. type constant
4. move cursor forward

The second and fourth steps are a little annoying when they serve no purpose.

I agree with this proposal, but I will miss the simple constants.
Related link:

Better number literals - #18 by odersky

sjrd · March 21, 2019, 9:09am

As a data point, I agree with several others that I don’t see a real point in the sym"..." interpolator. It does not really give any value over Symbol("..."). If someone is bothered by the character count due to heavy use of symbols in their codebase, nothing prevents them from defining such an interpolator/shorter method name locally in their codebase, as we would do for any other locally heavily-used API.
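For illustration, a locally defined interpolator of the kind described above might look roughly like this. It is a sketch only: the object and class names are hypothetical, and it assumes a Scala version in which StringContext itself does not already provide a sym method (if it does, that member is used instead of the local one).

```scala
object LocalSymbolSyntax {
  // Hypothetical project-local helper; the names are illustrative only.
  implicit class SymbolInterpolator(private val sc: StringContext) extends AnyVal {
    // Delegates to the standard s interpolator, then wraps the result in a Symbol.
    def sym(args: Any*): Symbol = Symbol(sc.s(args: _*))
  }
}

object LocalSymbolSyntaxDemo {
  import LocalSymbolSyntax._

  val viaInterpolator: Symbol = sym"foo"
  val viaApply: Symbol = Symbol("foo")

  def main(args: Array[String]): Unit =
    println(viaInterpolator == viaApply)
}
```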

Jasper-M · March 21, 2019, 2:00pm

I have a feeling that almost all APIs that use Symbol do so because of the syntax, not because of the type. So deprecating the syntax (which many people use) but leaving in the type (which nobody cares about) seems a bit backwards to me.

curoli · March 21, 2019, 4:42pm

Yeah, why does Symbol even exist? I understand it is to ensure that the underlying String is interned. But in Java (Scala, too?), String literals are already interned, and for Strings you compute or receive, you can simply call the intern method.
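A minimal sketch of that point on the JVM (the object name is made up for this example): two occurrences of the same string literal refer to the same interned instance, a string constructed at runtime does not, and calling intern on it returns the shared instance.

```scala
object StringInternDemo {
  val literal: String = "foo"
  // new String always allocates a fresh object, so this is not the interned instance.
  val computed: String = new String("foo")
  // intern returns the canonical instance from the JVM's string pool.
  val interned: String = computed.intern()

  def main(args: Array[String]): Unit = {
    println(computed eq literal) // reference comparison before interning
    println(interned eq literal) // reference comparison after interning
  }
}
```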

The syntax is as bad as it gets. If we had to find the most unpleasing and hard-to-read way to write a String literal, it would probably be to have an opening delimiter, but no clear closing delimiter.

Can’t wait to see this gone.

AMatveev · March 21, 2019, 5:55pm

I completely agree with you.
I do not know the main reason for rejecting the syntax.
It may be

I would prefer it if the following were string literals

''key
''198.0

It is just more comfortable to type ''111.1.nn than "111.1".nn

bvenners · March 21, 2019, 6:12pm

Symbol enforces that the String is a valid Scala identifier, which is useful for dynamic invocations. There are a few ScalaTest matchers that use it for that, for example. So the type does serve a purpose and should probably stay. That said, probably we’d switch that to just a plain String, rather than make our own sym"…" kind of String interpolator (or one from the lib). Given it is for testing, if someone uses a non-identifier String, they would find out the next time they run their tests.

eed3si9n · March 21, 2019, 6:35pm

I agree as well.

In general, if we’re not doing something special with the passed in String, it’s a good sign that we don’t need to use String Interpolation.

nafg · March 21, 2019, 6:42pm

IIUC the class does not enforce that, only the syntax. So if the syntax is removed, Symbol will not have any advantage in that regard.

Also, identifiers can have names that aren’t valid identifier names, using backticks.

Perhaps a different way to force a valid name is to concoct something using Dynamic…

SethTisue · March 22, 2019, 11:40am

(seconding @nafg)

The Symbol class doesn’t enforce that. Only the parser enforces it, and only when the single-quote syntax is used:

scala> 'foo-bar
           ^
       error: value - is not a member of Symbol
            ^
       error: not found: value bar

scala> Symbol("foo-bar")
res1: Symbol = 'foo-bar

scala> sym"foo-bar"
res2: Symbol = 'foo-bar

So, this is actually an argument against the proposal that I failed to include in my summary.

SethTisue · March 22, 2019, 11:49am

@bvenners can you expand on whether you think it matters, keeping Naftoli’s remark about backticks in mind…?

re: reflective invocation (assuming that’s what you mean by “dynamic invocation”), I’d suggest keeping in mind that Scala’s rules about what constitutes a legal identifier, even without involving backticks, are much broader than the JVM’s. This is illustrated by the following transcript:

scala> '%%%
res7: Symbol = '%%%

scala> object O { def %%% = 3 }
defined object O

scala> O.getClass.getDeclaredMethods
res8: Array[java.lang.reflect.Method] = Array(public int O$.$percent$percent$percent())

scala> O.getClass.getMethod("%%%")
java.lang.NoSuchMethodException: O$.%%%()
  at java.lang.Class.getMethod(Class.java:1786)
  ... 36 elided

scala> O.getClass.getMethod("$percent$percent$percent")
res12: java.lang.reflect.Method = public int O$.$percent$percent$percent()

Given this, I don’t see what use the single-quote syntax has w/r/t reflection.
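The name mangling in the transcript can also be reproduced programmatically. The sketch below uses scala.reflect.NameTransformer from the standard library to compute the encoded name and then looks the method up reflectively. The enclosing object name is invented for this example, and O mirrors the object from the transcript.

```scala
import scala.reflect.NameTransformer

object NameEncodingDemo {
  object O { def %%% = 3 }

  // The JVM-level name that the compiler emits for the operator-named method.
  val encoded: String = NameTransformer.encode("%%%")

  def main(args: Array[String]): Unit = {
    println(encoded)
    // Reflective lookup has to use the encoded name, not the source-level one.
    println(O.getClass.getMethod(encoded).invoke(O))
  }
}
```

So any code that does reflective invocation from a source-level name has to go through this encoding step regardless of whether the name arrives as a Symbol or as a String.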

eje · March 22, 2019, 4:41pm

If this is true, then (IMO) that is a strong argument for removing the Symbol class entirely.

Ichoran · March 22, 2019, 6:04pm

I am on the side of not only removing the syntax, but also Symbol. As a type it seems pointless.

If compile-time checking of string constants is valuable, akin to what Symbol allowed, then write a macro. The f-interpolator already does this. Whether it’s an interpolator or just a regular function is not a critical detail, I think. Interpolators allow you to save a couple of characters for parens, but adding an extension method gets one of those characters back and might be more ergonomic.

"foo".sym  // Compile error if "foo" is bad in some way

bvenners · March 23, 2019, 6:16am

I didn’t realize the Symbol type itself doesn’t enforce that, but it doesn’t:

scala> Symbol("foo+")
res1: Symbol = 'foo+

So never mind on the need for the type to stay around if we lose the tick mark.

For ScalaTest it isn’t very important. We would change the places where Symbols are taken now to just take strings after this change, and let a test failure tell the user if they accidentally used a non-identifier string. That’s fine for testing, and we could do a Scalafix rule to rewrite most of those automatically given they are likely mostly done with symbol literals.

RichType · March 23, 2019, 6:58pm

I’ve used Symbols quite a bit as I thought it was more efficient than String for repeated values. I’ve found the syntax convenient and useful for enforcing that the text is a single word. I wouldn’t object to its removal if this was part of a coherent plan for the compile time refinement typing (not to be confused with type refinement) of literals and string literals in particular, as well as efficient String use.

So my request would be that the syntax is not removed until we have refinement types and the compile time checking of refinement type literals.

mdedetrich · March 23, 2019, 10:57pm

Correct me if I am wrong, but isn’t performance one of the main reasons symbols even exist? The idea is that when you compare two symbols for equality, i.e. Symbol("something") == Symbol("something"), it is incredibly fast because it’s just a static lookup.

The reason why Symbols even exist in languages like Lisp, Smalltalk and Ruby is that they use Maps as a core construct, and so they need really fast lookup, since there are so many cases where they have static strings as Map keys.

This isn’t an argument against removing the symbol literal syntax; however, there is still a case for having symbol literals because of this performance benefit. For example, in akka-http it’s recommended to use symbol literals when handling things like query params, cookies, etc. (and akka-http does use symbol literals internally, iirc).

It is true that actual literals should theoretically replace this feature, but I am not sure if the performance benefits are realized currently.