Backticks in string interpolation

kevinwright · February 27, 2018, 9:55pm

I previously raised this as an SIP, but it languished through lack of strong support for the proposal.

The core idea is that two rules hold:

- Anything can be made into a valid identifier by wrapping it in backticks.
- In string interpolation, a single identifier doesn't need to be wrapped in curly braces.

Both of these statements are individually valid, but not when combined.
For the sake of language consistency I seek to rectify this.
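A minimal sketch of the two rules in current Scala, each working on its own, plus the braces still required when they are combined. The object and value names are made up for illustration:

```scala
object TwoRules {
  // Rule 1: backticks turn arbitrary text into a valid identifier.
  val `a..z`: String = "a, b, c, ..., z"

  // Rule 2: a plain alphanumeric identifier needs no braces in an interpolation.
  val letters: String = "abc"
  val plain: String = s"letters = $letters"

  // Combining them: today the backticked name must still be wrapped in braces.
  val braced: String = s"range = ${`a..z`}"

  // s"range = $`a..z`" is rejected by the parser, so it is not shown as code here.
}
```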

My initial motivation for the proposal was the boilerplate code generation that I implemented for shapeless and that was subsequently adopted by cats.

See https://github.com/milessabin/shapeless/blob/master/project/Boilerplate.scala
and https://github.com/typelevel/cats/blob/master/project/Boilerplate.scala

In that code, statements rapidly become unwieldy and hard to read, as in:
def apply(l : ${`A::N`}): Out = l match { case ${`a::n`} => ${`(a..n)`} }

This could easily be rewritten as the far more comprehensible:
def apply(l : $`A::N`): Out = l match { case $`a::n` => $`(a..n)` }
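To make the pain point concrete, here is a hypothetical, heavily simplified stand-in for the kind of template values those generators define. The class name TemplateVals, the applyLine helper and the exact formatting are assumptions for illustration, not code copied from either repository. It produces the braced line quoted above, which is the form that compiles today:

```scala
object BoilerplateSketch {
  // Hypothetical, simplified template values; names and details are illustrative only.
  final class TemplateVals(arity: Int) {
    private val types: Seq[String] = (0 until arity).map(n => ('A' + n).toChar.toString)
    private val vals: Seq[String] = (0 until arity).map(n => ('a' + n).toChar.toString)

    val `A::N`: String = (types :+ "HNil").mkString("::")
    val `a::n`: String = (vals :+ "HNil").mkString("::")
    val `(a..n)`: String = vals.mkString("(", ", ", ")")
  }

  def applyLine(arity: Int): String = {
    val tv = new TemplateVals(arity)
    import tv._
    // Every backticked name currently has to be wrapped in ${...}.
    s"def apply(l : ${`A::N`}): Out = l match { case ${`a::n`} => ${`(a..n)`} }"
  }
}
```

For arity 3 this yields def apply(l : A::B::C::HNil): Out = l match { case a::b::c::HNil => (a, b, c) }. Under the proposal, only the three ${...} wrappers would change.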

Note that the form s"$`something`" is currently INVALID in Scala and produces a syntax error, so introducing this ability in no way presents a breaking change or a modification of any existing valid behaviour.

som-snytt · February 28, 2018, 12:09am

IIRC the PR was

The PR enforcing the spec was

kevinwright · March 2, 2018, 10:10pm

See also the tracking issue: "Backticks in String Interpolation SIP tracking" (Issue #3, scala/slip on GitHub).

With the relevant text:

Rejected due to lack of overall support, and concerns that adding this conformance would actually make the rules less consistent (Martin's argument): there are going to be differences in which identifiers are valid anyway, so his view is that it is better to have just one valid set of identifiers from the outset (alphanumerics) and put anything else inside {}s, as now.

It’s long bothered me that this proposal got conflated with other ideas, such as an $" escape and trailing underscores in interpolated identifier names. These are unrelated concerns!

The proposal is only about observing that the use of backticks to “validate” an identifier is a global property of the language, whereas “alphanum only” is specific to string interpolation in the absence of braces.

As such, the more global principle should take priority, especially given that it only requires enabling a syntax that's currently invalid and therefore not already used in any other form (so there are no backward compatibility concerns).
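A rough model of the proposed change, written as a hypothetical scanner for the text after a $. It is simplified: it treats an interpolated identifier as a letter or underscore followed by letters, digits or underscores, which is close to, but not exactly, what scalac does. The allowBackticks flag switches on the proposed $`...` form; everything else is unchanged:

```scala
object InterpScan {
  // Simplified model of the current rule for $name: an identifier start
  // (letter or underscore) followed by letters, digits or underscores.
  // This is an illustration only, not scalac's actual scanner.
  private def isStart(c: Char): Boolean = c.isLetter || c == '_'
  private def isPart(c: Char): Boolean = c.isLetterOrDigit || c == '_'

  /** Scans the identifier that follows the `$` at index `dollar`.
    * Returns the identifier text and the index just past it, or None if the
    * text after `$` is not accepted. When `allowBackticks` is true, the
    * proposed $`...` form is accepted as well (hypothetical extension).
    */
  def scanId(s: String, dollar: Int, allowBackticks: Boolean): Option[(String, Int)] = {
    val start = dollar + 1
    if (start >= s.length) None
    else if (isStart(s(start))) {
      var stop = start + 1
      while (stop < s.length && isPart(s(stop))) stop += 1
      Some((s.substring(start, stop), stop))
    } else if (allowBackticks && s(start) == '`') {
      // Proposed: everything up to the next backtick is the identifier,
      // as long as it is non-empty and contains no newline.
      val close = s.indexOf('`', start + 1)
      if (close > start + 1 && s.substring(start + 1, close).indexOf('\n') < 0)
        Some((s.substring(start + 1, close), close + 1))
      else None
    } else None
  }
}
```

In this model the only new behaviour is the backtick branch: inputs that are accepted today, such as $abc, scan exactly as before, and operator names such as $* are still rejected.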

jducoeur · March 3, 2018, 4:07pm

Agreed, at least in principle. I’d be curious to understand Martin’s argument about this making the rules less consistent, though: the opposite seems true to me, but I can’t say I understand the language to the same degree he does. If it’s just a conflict between the rules inside interpolators vs. the rules everywhere else, then I’d argue that the latter should win, barring technical reasons otherwise…

som-snytt · March 3, 2018, 7:58pm

I don’t see a test in the old PR for

val * = 42
s"$*"

which would be accepted if any valid identifier were allowed. There's nothing special about backticks in terms of what is a valid identifier. An "op" is at least a "plainid", but a backquoted identifier is outré.

kevinwright · March 3, 2018, 8:39pm

I don’t see how that’s relevant.
I don’t see how anything other than the s"$`identifier`" syntax is relevant.

Current behaviour is:

scala> val * = 42
*: Int = 42

scala> s"$*"
<console>:1: error: invalid string interpolation $*, expected: $$, $identifier or ${expression}
       s"$*"

scala> s"${*}"
res0: String = 42

scala> s"${`*`}"
res1: String = 42

scala> s"$`*`"
<console>:1: error: invalid string interpolation $`, expected: $$, $identifier or ${expression}
       s"$`*`"

The only thing I’m calling for is that the message
invalid string interpolation $`, expected: $$, $identifier or ${expression}
should be accurate for every non-operator form of identifier defined in https://www.scala-lang.org/files/archive/spec/2.11/01-lexical-syntax.html#identifiers

It's a breach of the principle of least surprise for a valid identifier to be rejected when an error message explicitly asks for an identifier, and that's not something we ever want to do unless it avoids ambiguous and potentially even more surprising behaviour. That is why the emphasis on non-operator identifiers matters here: I do agree with Martin that allowing all possible identifiers could be especially confusing in interpolation, e.g. s"$a_+b" where a_+ is defined (though s"$`a_+`b" is beautifully unambiguous).
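The s"$a_+b" case can be illustrated with two simplified scanning rules applied to the text after the $. Both helpers are hypothetical, and the operator character set is reduced to ASCII. The letters/digits/underscore rule reads a_ followed by the literal text +b, while a rule that accepted every identifier form in the spec (where an underscore may be followed by operator characters) would greedily read a_+ followed by b:

```scala
object OpSuffixDemo {
  // Simplified subset of the spec's operator characters (ASCII only).
  private val opChars: Set[Char] = "!#%&*+-/:<=>?@\\^|~".toSet

  private def isPart(c: Char): Boolean = c.isLetterOrDigit || c == '_'

  /** Current-style scan: letters, digits and underscores only. */
  def alnumOnly(s: String): String = s.takeWhile(isPart)

  /** Spec-style scan (simplified): after a trailing underscore,
    * a run of operator characters also belongs to the identifier.
    */
  def withOpSuffix(s: String): String = {
    val alnum = s.takeWhile(isPart)
    if (alnum.endsWith("_")) alnum + s.drop(alnum.length).takeWhile(opChars)
    else alnum
  }
}
```

The same source text would change meaning depending on the rule, which is the ambiguity that the backticked s"$`a_+`b" form avoids.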

After all, some allowances must be made for the fact that identifiers in string interpolation don’t have the advantage of being otherwise delimited… There’s just no good reason why the sacrifice of backticks should be such an allowance.

som-snytt · March 3, 2018, 10:05pm

In that case, I would express more strongly that I don’t understand why special-casing backticks is relevant. They are no more special than any other identifier. So the proposed special case does not embrace general principles, but is just another rule. Moreover, the syntax is arguably ugly, and is no better at disambiguation than the current syntax using braces.

Currently, a macro can parse arbitrary syntax, such as

cats"$$`A::N`"
cats"#`A::N`"

It would be nice if a macro interpolator had an API to control interpretation of

x"$A"
@implicitNotFound(msg"No F[$A]")

(where A is not a value). Currently, the string is parsed by the parser, but that could be deferred.
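A sketch of the escape hatch that exists today, using a plain runtime interpolator rather than a macro. The name hash, the templateVals table and the #`name` convention are all hypothetical. Because # has no meaning to the parser, #`A::N` reaches the interpolator untouched inside the literal parts, and the interpolator can expand it itself:

```scala
import scala.util.matching.Regex

object HashInterp {
  // Hypothetical lookup table standing in for generator template values.
  val templateVals: Map[String, String] = Map(
    "A::N" -> "A::B::C::HNil",
    "a::n" -> "a::b::c::HNil"
  )

  // Matches #`name`, where name is any run of non-backtick, non-newline chars.
  private val BacktickRef: Regex = "#`([^`\n]+)`".r

  // Replaces known #`name` references; unknown ones are left as written.
  private def expand(part: String): String =
    BacktickRef.replaceAllIn(part, m =>
      Regex.quoteReplacement(templateVals.getOrElse(m.group(1), m.matched)))

  // A plain runtime interpolator (not a macro): it sees the literal parts
  // exactly as the parser split them around ordinary $-interpolations.
  implicit class HashInterpolator(private val sc: StringContext) extends AnyVal {
    def hash(args: Any*): String = {
      val sb = new StringBuilder(expand(sc.parts.head))
      args.zip(sc.parts.tail).foreach { case (arg, part) =>
        sb.append(arg).append(expand(part))
      }
      sb.toString
    }
  }
}
```

With import HashInterp._ in scope, hash"def apply(l : #`A::N`): Out" gives def apply(l : A::B::C::HNil): Out. A macro version could do the same expansion at compile time, but that part is not sketched here.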

kevinwright · March 4, 2018, 12:18am

I agree, wholeheartedly, but coming from the other direction.

String interpolation has special cases when it comes to identifiers, but why should backticks be one of those special cases? There's no logical requirement for it.

My argument is not to add a special case, but rather to remove one.

The core motivation is for scenarios where you’d have to use backticks anyway, with identifiers such as `a..z`. In this case braces are shown up as being utterly redundant, and pure ceremony for no gain:

s"${`a..z`}" vs s"$`a..z`"

I don't want something that's "better at disambiguation" than the current syntax; I just don't want to be forced to disambiguate twice with two different syntaxes, even though one of them is recognised throughout the entire rest of the language.

sourcekick · August 30, 2018, 4:49pm

What about dropping this:

“In string interpolation, a single identifier doesn’t need to be wrapped in curly braces”

I am in favor of forcing curly braces in string interpolation, because in a productive workflow it is annoying and noisy when the IDE or a colleague tells you that you may remove unnecessary curly braces. I strongly prefer having only one consistent, working way here: always curly braces. How would that affect this topic?

jducoeur · August 31, 2018, 12:27pm

Disagree, at least mildly – the non-curly-brace case is significantly more common, in my experience, and I find it more readable that way. (Granted, I might change my tune if I wasn’t usually using an IDE with syntax coloring.)

Adowrath · August 31, 2018, 10:22pm

You haven’t configured your IDE correctly then.

You don’t have a complete style guide then.

Seriously, I do not see these as starting points for discussion. I do see the argument for having only one way, requiring braces all the time, but I oppose it. I cringe when I have to use braces, because they feel a bit cumbersome to me.