Escaping Unicode

See also: https://github.com/scala/bug/issues/3220

I would like to propose removing the special treatment of unicode escapes and making them “escapes just as any other escape”.

The two big issues I see with this are

  • This changes the semantics of string literals, and I’m not sure what the correct migration path should be. Morally, """tab\tseparated""" and """tab\u0009separated""" should be handled the same. At the moment they are not: the former contains the two characters “\t”, the latter contains a tab character. Breaking code is bad. The status quo is also bad.

  • The same goes for the raw interpolator.
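To make the \t half of that comparison concrete, here is a minimal sketch. The object and value names are made up for illustration. The \u0009 variant is deliberately left out, since how it is handled is exactly what this proposal would change.

```scala
object EscapeDemo {
  // Regular literal: \t is processed into a single tab character.
  val regular: String = "tab\tseparated"

  // Triple-quoted literal: \t stays as a backslash followed by 't'.
  val tripleQuoted: String = """tab\tseparated"""

  // raw interpolator: likewise leaves \t unprocessed.
  val rawInterpolated: String = raw"tab\tseparated"

  def main(args: Array[String]): Unit = {
    println(regular.length)      // 13: the escape became one character
    println(tripleQuoted.length) // 14: backslash and 't' are two characters
    println(rawInterpolated == tripleQuoted)
  }
}
```

The one-character difference in length is the whole point: the same source text means a tab in one kind of literal and a backslash plus a letter in the other two.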

I would also point out two further restrictions that this change would create. Today you can write a unicode escape in an identifier without quoting it with backticks. With this change you would lose that ability: you either use backticks or insert the literal character. Backticks have a special meaning in case clauses, though, so removing this functionality also removes the ability to use unicode escapes in variable patterns, because putting backticks around such a name turns it into a stable identifier pattern.
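For anyone who has not run into the backtick distinction in patterns, a small sketch follows. The names are hypothetical, and it uses a plain identifier rather than a unicode-escaped one, so it only shows why backticks are not a drop-in replacement.

```scala
object PatternDemo {
  val x = 1

  // Lowercase name without backticks: a variable pattern.
  // It matches anything and binds a fresh x that shadows the outer one.
  def variablePattern(n: Int): String = n match {
    case x => s"bound $x"
  }

  // Backticked name: a stable identifier pattern.
  // It matches only when n equals the outer x.
  def stablePattern(n: Int): String = n match {
    case `x` => "equals outer x"
    case _   => "different"
  }

  def main(args: Array[String]): Unit = {
    println(variablePattern(2)) // bound 2
    println(stablePattern(2))   // different
    println(stablePattern(1))   // equals outer x
  }
}
```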

I personally believe that people using unicode escapes in variable patterns, along with the people who write \u0076\u0061\u006c\u0020\u0078\u0020\u003d\u0020\u0037 instead of val x = 7, go to the special hell, and their code breaking will be - at least in the grand (because eternal) scheme of things - the least of their worries, but for completeness' sake I want to include it for discussion.

Is the general case worth it? I think so. What about breaking string literals and raw interpolations?


Yes please

Unicode escapes are already special-cased for char and string literals. Why not just also special-case for triple-quoted multiline literals and interpolations, and for comments? Then you also get an easy way to insert a quote in an interpolation.

scala> '\u000A'
res0: Char =


scala> "a\tb\u000Ac"
res1: String =
a	b
c

scala> """a\tb\u000Ac"""
res2: String =
a\tb
c

scala> s"\"hello, world\""
<console>:1: error: ';' expected but ',' found.
       s"\"hello, world\""
                ^

scala> s"\u0022hello, world\u0022"    // would work
<console>:1: error: ';' expected but ',' found.
       s"\u0022hello, world\u0022"
                    ^

It leads to a lot of accidental complexity and filling holes with holes.

The unfortunate situation that we don’t have a good way to escape quotes in interpolations, in my opinion, isn’t a good reason to have unicode escapes special cased, and special cased differently for strings, interpolations and comments.

It’s nice that unicode escapes are special cased so that you can do '\u000A', but you can also do '\n', and I don’t think there is a good reason to use the unicode escape instead.

The same goes for "a\tb\u000Ac", just use "a\tb\nc" and just replace """a\tb\u000Ac""" with

"""a\tb
c"""

s"\"hello, world\"" should just work as expected, without having to employ, and special-case, unicode escapes for it. That is https://github.com/scala/bug/issues/6476
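Until that works, quotes can still get into an interpolated string without any unicode escape. A minimal sketch of two workarounds, with made-up names: splice a Char literal, or switch to a triple-quoted interpolation when the quotes sit in the middle of the text.

```scala
object QuoteDemo {
  val who = "world"

  // Splice the quote as a Char literal inside ${...}.
  val viaSplice: String = s"${'"'}hello, $who${'"'}"

  // In a triple-quoted interpolation, interior quotes need no escaping.
  val viaTriple: String = s"""He said "hello, $who" twice"""

  def main(args: Array[String]): Unit = {
    println(viaSplice)
    println(viaTriple)
  }
}
```

Neither is pretty, which is rather the argument for fixing the escape itself instead of leaning on \u0022.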

I can read your desperation in that ticket though. Is that the main point of contention?
