Proposal to deprecate and remove symbol literals

Literal string constants in the class data pool are interned by the JVM. It has to be a true literal in the source, though. At least in Java, those end up in the classfile constant pool. I assume scalac puts them in the same place.

The point of the string interpolator, sym, was to offer an olive branch for codebases that use symbol literals. In hindsight, an alternative syntax is not necessary. It doesn’t offer very much benefit over Symbol.

Most importantly, the shorthand syntax would only be introduced in 2.13 and later. It doesn’t offer much help to libraries that need to cross compile to 2.12 and earlier. This is a common requirement, since most of the codebases that used symbol literals were Scala developer libraries (e.g. generic programming, test frameworks, mocking frameworks) that need to publish to multiple Scala versions. Maintaining two forks of your code base just to use sym is not a good story.

Why wouldn’t writing an interpolator for 2.12 and prior work? Cross compiling would just require an extra dependency, I think.
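
To make the question concrete, here is a minimal sketch of what such a library-provided interpolator could look like. The names SymInterpolator and SymHelper are hypothetical, and this is not necessarily how the proposed sym interpolator is implemented; it only assumes the standard StringContext mechanism.

```scala
object SymInterpolator {
  // Hypothetical helper: lets callers write sym"foo" instead of 'foo.
  implicit class SymHelper(private val sc: StringContext) extends AnyVal {
    def sym(args: Any*): Symbol = {
      // This sketch only supports plain literals, with no ${...} arguments.
      require(args.isEmpty, "sym only supports plain literals in this sketch")
      Symbol(sc.parts.mkString)
    }
  }
}

object SymInterpolatorUsage {
  import SymInterpolator._
  val column: Symbol = sym"foo"
}
```

Because it relies only on an implicit class over StringContext, a helper like this could in principle be published as an ordinary dependency for older Scala versions.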

Why not use the same dependency with 2.13?

Besides, what namespace (for importing) would you use?

It shouldn’t be too hard to have an import that works for both, delegating to the version-specific implementation. Am I missing something?

Why not "foo".toSymbol? It is more descriptive, and is still very easy to type with the help of an IDE.
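
As far as I know there is no toSymbol method on String in the standard library, so this would have to be an extension method. A minimal sketch, with the hypothetical names SymbolSyntax and StringSymbolOps:

```scala
object SymbolSyntax {
  // Hypothetical extension: adds toSymbol to String via an implicit value class.
  implicit class StringSymbolOps(private val s: String) extends AnyVal {
    def toSymbol: Symbol = Symbol(s)
  }
}

object SymbolSyntaxUsage {
  import SymbolSyntax._
  val column: Symbol = "foo".toSymbol
}
```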


Is it possible to switch the implementation of Symbol to the JVM’s built-in string interning, e.g.

final class Symbol private(val name: String) extends AnyVal

object Symbol {
  def apply(name: String) = new Symbol(name.intern())
}

I’m a bit surprised. I thought that Symbol was already implemented to simply delegate to String#intern.

No, because the Symbol itself is supposed to be eq to any other Symbol for the same name. It is not enough that the underlying names be eq.
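
A small sketch of the distinction, using a hypothetical Wrapped class as a stand-in for a naive Symbol built on String#intern: the interned names are the same reference, but the wrapper objects are not, whereas scala.Symbol hands back the same instance for the same name.

```scala
object SymbolIdentityDemo {
  // Hypothetical naive wrapper around an interned name.
  final class Wrapped(val name: String)

  val w1 = new Wrapped(new String("test").intern())
  val w2 = new Wrapped(new String("test").intern())

  // The interned names are the same reference...
  val namesAreEq: Boolean = w1.name eq w2.name
  // ...but the two wrapper objects are distinct instances.
  val wrappersAreEq: Boolean = w1 eq w2
  // scala.Symbol caches instances by name, so the symbols themselves are eq.
  val symbolsAreEq: Boolean = Symbol("test") eq Symbol("test")
}
```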

You can actually implement eq for your value class!

final class Symbol private(val name: String) extends AnyVal {
  def eq(that: Symbol) = name eq that.name
}
object Symbol {
  def apply(name: String) = new Symbol(name.intern())
}
val a, b = Symbol("test")
a eq b // true

Note that eq is not supposed to be available on Any, so we shouldn’t be able to call it on the boxed version of the class. Of course we’ll see an inconsistency if we use asInstanceOf to force observation of the boxed version of the class, but that’s expected of a low-level unsafe operation like asInstanceOf IMHO.

def f[A](a: A, b: A) = a.asInstanceOf[AnyRef] eq b.asInstanceOf[AnyRef]
f(a, b) // false

Also, the current implementation of collections does the right thing (though that’s why they’re so slow):

Set(a,b).size // 1

I’d echo the comments of others that 'foo is the most-commonly used way to refer to columns in Apache Spark, which is not only the fastest-growing unified big data & analytics/ML platform but also, by far, the most active Scala OSS project. See the stats. There are millions of difficult-to-refactor assets that use this syntax.

Removing the 'foo syntax would have devastating effects on the rapidly growing community of data engineers and data scientists who use Scala not in IDEs and code editors but in Jupyter, Zeppelin and Databricks notebooks. This group constitutes the largest influx of Scala users in the world.

Notebooks offer none of the refactoring tools of modern IDEs. There is no tool that can check whether the Scala code in a notebook compiles correctly. Notebooks are often not even backed by SCM repositories. Like it or not, this is the way an increasing amount of Scala code will be written in the future because the Spark+Scala community is the fastest growing group of Scala coders in the world.

It would be a huge disservice to the Scala community to force this group to discover problems in millions of notebooks, one at a time, at “runtime” (Scala code in a notebook executes in a REPL; it compiles when a notebook cell is executed).


This is an extreme case of 'bla syntax being used in DSLs to flag something that is really a string as something that is visually like an identifier. This is an abuse, but clearly a very tasty one. I definitely don’t want to watch Spark fizzle out. Presumably Scala 2 code will continue to parse under Dotty in its Scala 2 syntax mode? So I don’t think we fall off a cliff overnight.

Yes, symbol literals are still supported under -language:Scala2. If absolutely needed, we could support them longer with a more specific language import, but that’s not ideal, since they will clash with meta programming quotes. For the moment, we allow 'x meaning quote only under a splice. I.e.
${ ... 'x ... } means x is quoted whereas toplevel 'x means x is a symbol literal in Scala-2 mode and it is an error otherwise. This works OK for now, but at some point we’d like to let 'x mean quoted also on the top-level. As long as we support symbol literals (even under a flag) we should not do that.

Exactly - but my point is that this body of existing notebook scripts is written with Scala 2 syntax and semantics. They will continue to work, because it’s only once you move to Scala 3 syntax that you are affected.

Have you considered `x instead of 'x?


Backtick has a more important, and less disambiguatable, use: to express names that don’t follow the usual rules for identifiers. For instance, it lets you refer to a Java method named yield, and it also allows names with spaces and other characters, which is sometimes useful (at least in the absence of a better solution).


scala.meta.Term.Name can be used as the type of a field name, which can be created by quasiquotes, e.g. q"fieldName" .

Given that Scala Reflect is used internally by Spark, if Spark switches from Scala Reflect to Scala Meta in the future, it would be very natural to switch from symbols to quasiquotes for referencing a field name.

Posting as constructively as possible, I don’t think this is a healthy way to look at the problem, because its ultimate conclusion is essentially that Scala can’t evolve as a language because a certain portion of the community (regardless of how big or small) refuses to adopt changes (for whatever reason).

Although it’s definitely true that Spark + workbook users are a significant part of the Scala programmer community, it’s also not outside the realms of possibility for such users to use tools like https://github.com/scalacenter/scalafix to fix their workbooks (especially considering that if the syntax for Symbol changes, it’s likely to be a very trivial transformation).

Another possibility: rather than just removing the 'x syntax (it seems one motivator for removing the Symbol syntax is to free it up for macros), we could give symbols an alternate syntax.

Heck, we could even copy from the Smalltalk/Ruby book and use :x to denote a Symbol, although I think this may create a lot of edge cases in the Scala parser.

There are other possibilities: one could do #x or %x. I don’t think either of these is being used, and the former looks pretty easy on the eyes.

Seems like Symbol is a perfect candidate for an opaque type then. Interning a String and turning it into an opaque type (which guarantees it will never box) would be enough to recreate the reference equality guarantee.
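
A rough sketch of that idea, assuming Scala 3 opaque type syntax. The names Symbols and Sym are hypothetical, and this is not a proposal for the actual scala.Symbol implementation. The upper bound AnyRef is there so that eq can be called outside the defining object.

```scala
object Symbols {
  // At runtime a Sym is just the interned String, so there is no wrapper to box.
  opaque type Sym <: AnyRef = String

  object Sym {
    def apply(name: String): Sym = name.intern()
  }

  extension (s: Sym) def name: String = s
}

object SymbolsUsage {
  import Symbols.Sym

  val a: Sym = Sym("test")
  val b: Sym = Sym(new String("test"))

  // Both values are the same interned String reference.
  val sameReference: Boolean = a eq b
}
```

Since the representation is the interned String itself, reference equality should also hold when the values pass through generic code, which is where the value class version above shows its inconsistency.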