Proposal to deprecate and remove symbol literals

I do not think so.
https://docs.scala-lang.org/sips/42.type.html

val book =
  ("author" ->> "Benjamin Pierce") ::
  ("title"  ->> "Types and Programming Languages") ::
  ("id"     ->>  262162091) ::
  ("price"  ->>  44.11) ::
  HNil

It is just an example.
In general I would prefer not to use Dynamic in dsl.

Symbols have two advantages:

  1. they are one character shorter than a string
  2. they are integrated with a uniqueness cache

I haven’t seen 2) used anywhere in a meaningful way (i.e. case where eq vs == performance matters). 1) is used, but it definitely is not a game changer. No other popular statically typed language has this weird “unclosed string” and nobody is missing that. Ruby has symbols, but strings in Ruby are mutable, so it’s a different story.

I will be very happy to have

So it will be possible to write:

1.5.nn // instead of "1.5".nn

But I would prefer:

  • a digit 0-9 or escape
  • followed by a sequence of digits or letters,
  • which can also contain one or more '_'s, if followed by a digit or letter,
  • which can also contain one or more '.'s if followed by a digit or escape.

I understand that Scala has very little freedom in choosing an escape character.
But I will miss it …

Slightly improved in 2.13-RC1

In languages like Ruby and the various Lisps, they are always used for performance reasons: Maps are so prevalent there (in many cases they replace objects/classes) that you need a very efficient way to look up by a String, i.e. an interned String. These interned strings are the semantic equivalent of looking up a property in a class or some other data structure in statically typed languages.

Yes, it’s not as widely used in static languages like Scala, but it still has its uses. Akka-http uses it everywhere for things like query parameters, i.e. Strings whose value never changes during the lifetime of the program.

https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.28

Compile-time constant expressions of type String are always “interned” so as to share unique instances, using the method String.intern

Example constant String in Java is: "The integer " + Long.MAX_VALUE + " is mighty big."

So there’s no difference between 'something and "something" with regard to interning. The only difference is between Symbol(nonFixedExpression.toString) and nonFixedExpression.toString: the first will be interned, the second will not.
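A minimal sketch of that distinction, assuming a JVM target. The object and member names (InterningSketch, runtimeString and so on) are made up for illustration. It compares references with eq rather than ==:

```scala
object InterningSketch {
  // Built at runtime, so this is NOT a compile-time constant expression.
  def runtimeString(): String =
    new StringBuilder("some").append("thing").toString

  val literal: String = "something"   // compile-time constant, lives in the constant pool
  val built: String   = runtimeString()

  // A runtime-built String is a fresh object, distinct from the pooled literal.
  val builtIsSameRef: Boolean = built eq literal

  // Explicit interning maps it back to the pooled instance.
  val internedIsSameRef: Boolean = built.intern() eq literal

  // Symbol.apply goes through the uniqueness cache, so equal names share one Symbol.
  val symbolsShareInstance: Boolean = Symbol(built) eq Symbol(literal)

  def main(args: Array[String]): Unit = {
    println(s"built eq literal:          $builtIsSameRef")
    println(s"built.intern() eq literal: $internedIsSameRef")
    println(s"Symbol(built) eq Symbol(literal): $symbolsShareInstance")
  }
}
```

Only the middle and last comparisons yield the shared instance. The first shows why a non-constant String needs either intern() or Symbol(...) to get reference equality.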

Akka-HTTP uses compile-time constant Strings in Symbols, so they are all interned no matter if Symbols are used or not. ScalaTest also uses compile-time constant Strings. Apache Spark examples also show compile-time constant Strings.

To extract extra performance from Symbols you need to have a non-constant expression passed to Symbol.apply and then cache the result. Without caching you’re paying the interning cost every time you invoke Symbol.apply. So in fact you can even lose performance by using Symbols in the wrong way.

Ruby of course can’t have a String constant pool because Ruby’s Strings are mutable. Java has immutable Strings, so its String constant pool works properly.

I would agree with this line of thinking if we were discussing adding symbol literals. We are instead deciding to remove them, and there’s more code using them than we thought. Is the cost of removing them a good investment of our users’ time and good will for the expected benefits? I’m still trying to figure that one out. I don’t see symbols as holding Scala evolution down, nor as a particular hurdle for newcomers (is explaining what they are going to take more than 30 seconds?) and feature interaction is very low.

Note that removing them won’t make the implementation of these libraries harder (something I can live with for a good cause), but it will require all users of those libraries/frameworks to rewrite their code. I am willing to change my mind if I understand those benefits, and probably it’d help if someone from the Spark core team or Akka-http team chimed in. /cc @rxin @jrudolph

I must say I’m quite unhappy about this decision to remove them from 2.13. I’ve used Symbol literals quite a bit and in particular I’ve used them for downstream facing interfaces.

Before Symbol literals are deprecated, I would like to see a proper decision made about whether Symbols are going to remain part of the language long term and, if they are, how and when they should be used. If Symbols are to remain a part of the language, then Symbol literals could well justify their place: we should be encouraging people to use them more, and making literal creation easy seems like an excellent way to encourage their use when there are significant performance gains to be made.

I wonder what realistic code where Symbols give a measurable performance advantage would look like, and how often anyone writes such code. Remember that in the code below the underlying char arrays won’t be compared:

object ClassFromLibraryA {
  val s = "something"
}

object ClassFromLibraryB {
  val s = "something"
}

object Application extends App {
  // here both Strings will be the same reference thanks to Java constant expressions pool
  // and String.equals explicitly checks for reference equality at the beginning
  println(ClassFromLibraryA.s == ClassFromLibraryB.s)
}

You would need something like this to get any gain from Symbols:

object Application {
  val suffix = "aaa"
  val cachedSymbol1 = Symbol("something" + suffix)
  val cachedSymbol2 = Symbol("something" + suffix)

  def main(args: Array[String]): Unit =
    println(cachedSymbol1 == cachedSymbol2)
}

with the caveat that creating cached symbols usually takes much longer than comparing Strings, because Symbols use synchronization underneath. So you must compare equal Symbols much more frequently than you create them.

Note that I had to use the explicit Symbol factory. You won’t be able to get any performance advantage from using only Symbol literals.

Symbols are definitely used in Akka HTTP routing code and are also shown in the documentation and examples. E.g. here: parameters • Akka HTTP. You can already use Strings wherever you can use Symbols.

Funny enough, the documentation syntax highlighter doesn’t know about them…

I cannot speak for the Akka HTTP users, but personally I wouldn’t mind if they were removed / discouraged. Saving that one character per symbol seems somewhat appealing in the overall brevity of the routing DSL, but on the other hand it is far from essential. I guess many people use a mix of String and Symbol literals, so discouraging symbols would make usages more uniform.

In general, I find the approach of using a language flag (per file or per compilation) for experimental / deprecated language / syntax features quite interesting. It gives an incremental path for updating your code, striving for a minimal set of those flags in the code base.

All incompatibilities have a price; we feel it a lot as library writers. To make progress, the trade-offs between introducing compelling features and irritating incompatibilities need to be chosen well. By now, I think Scala 3 might be compelling enough to warrant the migration burden. Of all the changes necessary to migrate code, this one seems only minor.

To provide more context, at least when it comes to Spark: we didn’t use the ' symbol syntax just to save one character vs. a string.

'col in Spark returns a Symbol, which gets implicitly converted into a Column class. Spark provides a programmatic API for specifying the expression tree. Consider this:

@scala.annotation.varargs def select(cols: Column*): DataFrame

So users can do:

df.select('id, 'dept, 'salary + 1000)

There is also an overloaded version of select that accepts only strings, which is used when a user wants to simply select a bunch of columns without additional expressions:

@scala.annotation.varargs def select(col: String, cols: String*): DataFrame

In this one you can’t express salary + 1000, but only “salary”.
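A minimal sketch of how such a Symbol-to-Column conversion can build an expression tree. It is not Spark's actual implementation. The Expr, Ref, Lit, Add and Column types and the List-returning select below are hypothetical stand-ins. Symbol("id") is used instead of the 'id literal so that the sketch does not depend on the syntax under discussion:

```scala
import scala.language.implicitConversions

object ColumnDslSketch {
  // Hypothetical expression tree; NOT Spark's real Column internals.
  sealed trait Expr
  final case class Ref(name: String)            extends Expr
  final case class Lit(value: Int)              extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Hypothetical Column wrapper with a single arithmetic operator.
  final case class Column(expr: Expr) {
    def +(n: Int): Column = Column(Add(expr, Lit(n)))
  }

  // Symbol has no numeric `+` of its own, so the compiler applies this view.
  implicit def symbolToColumn(s: Symbol): Column = Column(Ref(s.name))

  // Stand-in for select(cols: Column*): just returns the collected expressions.
  def select(cols: Column*): List[Expr] = cols.toList.map(_.expr)

  val selected: List[Expr] = select(Symbol("id"), Symbol("salary") + 1000)

  def main(args: Array[String]): Unit = println(selected)
}
```

Here selected holds a column reference for id and an Add node for salary + 1000, which the String-only overload cannot express.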

Symbol is what caused me a lot of confusion and pain when I started using Ammonite: @lihaoyi used it everywhere in the documentation there, and I could not figure out what it was, as it looked like a quote and was not mentioned in the most popular Scala courses/tutorials. So I suggest either deleting it or adding it to tutorials and other places so people will know that it exists and how it behaves.

it isn’t just spark or akka-http that uses them: symbols are appealing for any DSL, i think. we use them in almost all our internal DSLs that are used by us and by our clients.

if the reason to remove 'x is that it’s needed for something else and that something else is important… i understand!

if the reason to remove 'x is simply to make the language smaller… you picked one of the most user (not developer) facing features of the language. i am not convinced that is a great idea.

Yes, Symbol is a much better target (than String) for extensions (used in DSLs) because Symbol has just a few methods, while String has already plenty built-in and also plenty of extension methods in Scala’s standard library. That’s actually a strong argument unlike the previous ones (saving one character per string or performance gains in some really obscure cases nobody cares to show).

Thanks for the explanations! A question: What about a design where the implicit conversion goes from String to a Column class? Would that also work?

Strings already have many operators available on them:

println("abc" + 3) // prints abc3
println("abc" * 3) // prints abcabcabc
println("abc" ++ "xyz") // prints abcxyz

Mixing them with additional extension methods/ implicit conversion/ whatever would create a lot of confusion.

That’s not valid Scala code, though, unless I’m missing something?

That doesn’t seem type-safe at all. How would you write such extension methods?

Yeah… I think it was a bad example …

Hi Martin,

As tarsa said, String already has a lot of methods defined on it, so the implicit conversion to Column won’t work: +, * and the like will all break.
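A small sketch of the problem, using a hypothetical Column class and stringToColumn conversion (neither is Spark's API). Because String already has its own +, the compiler never considers the conversion:

```scala
import scala.language.implicitConversions

object StringConversionSketch {
  // Hypothetical Column that renders its expression as text.
  final case class Column(expr: String) {
    def +(n: Int): Column = Column(s"($expr + $n)")
  }

  // Hypothetical String-to-Column view, analogous to the Symbol one.
  implicit def stringToColumn(s: String): Column = Column(s)

  // String.+ already type-checks, so this is plain concatenation, not a Column.
  val concatenated: String = "salary" + 1000

  // The Column operator is only reached when the conversion is applied explicitly.
  val viaColumn: Column = stringToColumn("salary") + 1000

  def main(args: Array[String]): Unit = {
    println(concatenated)
    println(viaColumn)
  }
}
```

The first value is the String "salary1000". Only the explicit call produces Column("(salary + 1000)"), so a user writing "salary" + 1000 would silently select the wrong thing.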