Pre-SIP: Sharp (string) interpolation

Proposal

I propose a shorthand form of string interpolation that uses the sharp # character.
The syntax will change as follows (feel free to correct if I expressed it wrong):

processedStringLiteral
                 ::=  alphaid ‘"’ {[‘\’] processedStringPart | ‘\\’ | ‘\"’} ‘"’
                   |  alphaid ‘"""’ {[‘"’] [‘"’] char \ (‘"’ | ‘$’) | escape} {‘"’} ‘"""’ 
                   |  alphaid ‘#’ {printableChar} (whiteSpace | nl | (‘,’ ‘ ’)) ;
  • The new grammar (supposedly) allows any consecutive non-whitespace/newline/comma-followed-by-space characters to be part of the interpolation.
  • No arguments are supported for this sharp interpolation (we could allow them, but I don’t think it’s a good move).
  • The mechanism invokes exactly the same StringContext class that regular string interpolation uses.

Here are some examples of what will be possible:

val binVal = b#101001100101
val hexVal = h#0304903FFAA
val bigVal = big#345,463,489,989,893,859,438,943,643
val dateVal = date#22.02.2022
val date2Val = date#22/02/2022
val ipVal = ip#192.168.0.1
val phoneVal = phone#+1-800-555-5555
val phoneList = List(phone#+1-800-555-5555, phone#+1-800-777-7777)
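For comparison, here is a minimal sketch of how a few of these prefixes could be defined today with the regular StringContext mechanism in Scala 3. The return types (Int, Long, LocalDate) and the date pattern are my assumptions, not part of the proposal; the sharp form would presumably call the same methods with the text after # as the single part.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Assumed date layout, matching the date#22.02.2022 example.
val dotDate: DateTimeFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy")

extension (sc: StringContext)
  // No splices are expected here, so the literal text is just the joined parts.
  def b(args: Any*): Int = Integer.parseInt(sc.parts.mkString, 2)
  def h(args: Any*): Long = java.lang.Long.parseLong(sc.parts.mkString, 16)
  def date(args: Any*): LocalDate = LocalDate.parse(sc.parts.mkString, dotDate)

// With today's syntax: b"101001100101" evaluates to 2661,
// and date"22.02.2022" to LocalDate.of(2022, 2, 22).
```

Note that s, f and raw are already members of StringContext, so an extension cannot reuse those names.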

As you can see, the fact that we associate the sharp # token with the word “number” enables us to express values more naturally, matching spoken English.
Remember that this is in term position and not type position, so # cannot be confused with type projection (T#U).

Applying methods on sharp interpolation

The sharp interpolation accepts all characters up to a whitespace, a newline, or a comma followed by a space.
This is to allow flexible separators like those in phone numbers and dates.
Consequently, unlike regular string interpolation, we must add a space or a newline before applying a method:

ip#192.168.1.0.connect //error
ip#192.168.1.0 .connect //OK
ip#192.168.1.0
  .connect //OK
h#1234+h#abcd //error
h#1234 + h#abcd //OK
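The termination rule can be sketched as a small scanner. This is only an illustration of the rule as written in the grammar above; scanSharp is a hypothetical helper, not part of any compiler, and it takes the text that follows the # character.

```scala
// Splits the text after '#' into (literal, remaining source).
// The literal ends at whitespace (including newline) or at a comma followed by a space.
def scanSharp(afterHash: String): (String, String) =
  def endsAt(i: Int): Boolean =
    val c = afterHash(i)
    c.isWhitespace ||
      (c == ',' && i + 1 < afterHash.length && afterHash(i + 1) == ' ')
  val end = afterHash.indices.find(endsAt).getOrElse(afterHash.length)
  afterHash.splitAt(end)
```

Under this rule scanSharp("192.168.1.0.connect") keeps .connect inside the literal, while scanSharp("192.168.1.0 .connect") returns the literal 192.168.1.0 and leaves " .connect" for the parser.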

Related Issues

Discussion

  1. Should we also use ; as end-of-string for the sharp interpolator? Currently the proposal allows a comma with no following space to appear inside the literal as a separator. Is this OK or confusing?
  2. Theoretically we can allow for interpolation arguments. Should we? I think if someone wants arguments then they should just use the regular string interpolation.
  3. I think that the experimental Numeric Literals (FromDigits) language feature in Scala 3 is not good enough, and this proposal does a better job of enabling the user full control over the acceptable way of expressing numeric literals. Should they both exist or should we remove FromDigits?
  4. Should we add interpolators like big#345463489989893859438943643 to the standard library?
  5. Anything else?

Which rule makes it so that ) is not part of the string interpolation?

It’s also pretty unprecedented that the meaning of the program changes if you add or remove a space after the comma.


I want to like this proposal, not least because I’m still convinced FromDigits is broken by design, but there are several aspects that, IMO, deserve to be made much more precise.

First, the set of accepted characters seems very ad hoc. What is the principle behind this choice? As @Jasper-M pointed out, the grammar would clearly include ) but the examples assume ) to be excluded.

The proposal is advertised as “interpolation”, but it doesn’t allow splicing values, which is the defining feature of string interpolation (across languages). There’s something off there.

All the examples share a “numeric” feel. Is this intentional? Should it be reflected in how this feature is actually called/advertised?

The examples mix notation specifications (b, h) and data type specifications (big, date, etc.). What if I want a big number expressed in hexadecimal notation? It seems to me that this proposal should focus on the data type specifications (aka semantics), not notation.


You are both right; I missed that. We can exclude ) and ( in the same way I did with the comma, or exclude them altogether. The real question is how much flexibility we want from this feature, and which choice will be less confusing and less surprising. Should we allow the following?

phone#(917)-555-5555

I could go either way and same with the option of a comma. Open for your input.

I’m not attached to the name. Maybe “Sharp Numerics”? The point is that the underlying implementation mechanism with StringContext must be the same. So if someone needs some splicing arguments then they can utilize the regular syntax for that.

big"1234${externaldigits}5678"
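As a rough sketch of that regular syntax, a big interpolator that accepts splices could be written as below in Scala 3. The BigInt result type and the stripping of comma separators are my assumptions; the sharp form would simply never pass any args.

```scala
extension (sc: StringContext)
  def big(args: Any*): BigInt =
    // Interleave the literal parts with the spliced values; parts has one more element than args.
    val text = sc.parts
      .zipAll(args.map(_.toString), "", "")
      .map((part, arg) => part + arg)
      .mkString
    // Commas are treated as digit-group separators and dropped (assumption).
    BigInt(text.replace(",", ""))
```

With val externaldigits = 99, the expression big"1234${externaldigits}5678" evaluates to BigInt("1234995678").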

This proposal just adds another grammar option for interpolation without splicing. The data type can be flexible because the string interpolation can be transparent inline and give us all the flexibility that we require.
We only need to care about this if we add default interpolators to the standard library (as we have with s and f).
Maybe big should instead be called d or dec, as in decimal number. The data type of the decimal/hex/binary interpolations can be automatically set by the underlying value itself, if we choose to implement it so in the interpolation.


Does this enable user-defined literals with compile-time checking? That could be very useful in the context of DSL definitions.

For example, I recently needed a method parameter that encodes the allowed combinations of some values. This could be done by defining an object for each value and passing those to a function. Something like:

trait Base
object A extends Base
object B extends Base
object C extends Base

def recipient(elms: Seq[Base], title: String) = ...

This makes it possible to write recipient(Seq(A,B,B,C,A,C),"...") but if the list of arguments is long, this quickly becomes confusing with all the commas. More importantly, all checking of valid combinations must be done at runtime (without macros, anyway). I would rather write recipient(cb#ABBCAC,"...") with my own interpolator, where the checking is done at compile time.

Would that be possible with this proposal?

You can already do something very similar with Dynamic and inline

However, it only works if you use something that is a valid method name

(This example checks whole strings, but it could be done char by char)

It’s probably cleaner to use string interpolation and inline, but I don’t know if it’s possible
cb"abbabc"
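A minimal sketch of the interpolator variant in Scala 3, using the Base/A/B/C names from the question. Note the assumption: this version validates at run time only. Moving the check to compile time would need an inline macro implementation, which is not shown here.

```scala
trait Base
object A extends Base
object B extends Base
object C extends Base

// Maps one character of the literal to its element; case is ignored (assumption).
def element(c: Char): Base = c.toUpper match
  case 'A' => A
  case 'B' => B
  case 'C' => C
  case other => throw IllegalArgumentException(s"unknown element: $other")

extension (sc: StringContext)
  // No splices expected: the literal text is the joined parts.
  def cb(args: Any*): Seq[Base] = sc.parts.mkString.toSeq.map(element)
```

With this in scope, cb"ABBCAC" yields Seq(A, B, B, C, A, C), so the call becomes recipient(cb"ABBCAC", "...").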


Thank you for the tip, I know such things are possible. So maybe my example did not make a strong enough case for a user-defined literal. In fact, my question is: does this SIP proposal enable user-defined literals, which would then also be rendered differently by the IDE? This would be a great asset for Scala imho.

All this proposal does is add another grammar option for interpolation (without splices). But to your question, it’s already possible using the regular string interpolation syntax, hence it will be possible with this proposal (but you will need a space after the comma).


Is the proposal actually to just shorten some definitions by one character (context#value vs context"value")? To me it doesn’t seem to be worth it given the gotchas below:


Personally I don’t think it’s a realistic gotcha.
First, you have the IDE coloring which is based on the grammar. You immediately see the potential mistake.
And really, how often do you find yourself following a literal or string interpolation with a .method call, or using an infix method without spaces?

Well, surely there would be many instances of things like: method(arg1, arg2, h#deadbeef, h#cafebabe) where we already have a problem, as , and ) are ambiguous here (are they part of the method invocation or of the string interpolation?). Also, what about e.g. s"My lucky number is: ${h#abcd}"? The } character needs to be handled unambiguously here too. If the primary use case is val someName = context#value then the benefit of one saved character is even smaller (proportionally) than in inline definitions.


Yes you are right. It may look like the exclusions in:

stringFormat     ::=  {printableChar \ (‘"’ | ‘}’ | ‘ ’ | ‘\t’ | ‘\n’)} ;

It’s not. The sharp numerics/interpolation is indeed expected to be used directly as an argument; we just need to cover the grammar properly. The rule is not that complicated, though: we need a group of characters that cannot end the literal, that is, cannot be its final characters when followed by a space. If you look at the existing grammar, this is not that special. There are various rules that we’ve grown accustomed to and that just feel natural.
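One possible reading of that rule, sketched as a scanner: take the maximal run of non-whitespace characters, then give back any trailing characters that belong to an excluded set. Both scanTrimmed and the contents of trailingExcluded are assumptions made for illustration; the thread does not fix the set.

```scala
// Characters that may appear inside a literal but may not end it (assumed set).
val trailingExcluded: Set[Char] = Set(',', ')', '}', ';', '"')

// Splits the text after '#' into (literal, remaining source).
def scanTrimmed(afterHash: String): (String, String) =
  val run = afterHash.takeWhile(c => !c.isWhitespace)
  // Drop excluded characters from the end of the run only; interior ones stay.
  val literal = run.reverse.dropWhile(trailingExcluded).reverse
  (literal, afterHash.drop(literal.length))
```

With this reading, h#deadbeef, and h#cafebabe) end before the comma and the parenthesis, h#abcd} ends before the brace, and phone#(917)-555-5555 stays intact because its parentheses are not at the end.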


To me the problem seems to be that it is “one-size-fits-all”. Sure, it is reasonable to exclude the space from every literal, but not so much the dot. The same is true for other characters. We do allow a dot in a double, the letter x in an integer, and the letter e in both, but not a k or a comma, although the latter would be nice in large numbers instead of the (ugly?) underscores.

So, would it be possible somehow to make the allowed format dependent on the alphaid itself? For example, if a regex can be defined, it is possible to directly check the literal at compile time.

ip4#192.168.1.0.connect //okay
ip4#192.256.1.0.connect //error
ip6#fe80::aede:84ff:fe10:1722.connect //okay
ip6#2a04:8188:281::1700:8c8d:f13::1.connect //error

Anyway, being forced to add a space after each use greatly reduces the appeal of this proposal, I think.
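A sketch of what a per-alphaid format could look like, assuming a hypothetical registry (literalShapes) that maps each prefix to a regex. The literal is the longest prefix the regex accepts, so a trailing .connect is left to the parser. Only ip4 is covered here; the ip6 shape is omitted.

```scala
import scala.util.matching.Regex

// One IPv4 octet, 0 to 255.
val octet = """(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"""

// Hypothetical registry: each alphaid declares the shape of its literal.
val literalShapes: Map[String, Regex] = Map(
  // Four octets; the lookahead rejects a literal that is cut off in the middle of a number.
  "ip4" -> (octet + """(?:\.""" + octet + """){3}(?!\d)""").r
)

// Returns (literal, remaining source), or None if the text does not fit the shape.
def splitLiteral(alphaid: String, afterHash: String): Option[(String, String)] =
  for
    shape   <- literalShapes.get(alphaid)
    literal <- shape.findPrefixOf(afterHash)
  yield (literal, afterHash.drop(literal.length))
```

Here splitLiteral("ip4", "192.168.1.0.connect") gives Some(("192.168.1.0", ".connect")), while the 192.256.1.0 variant gives None and could be reported as an error.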


While this could work, I think syntax highlighting is not going to cope with this


Do you mean it’s not possible, or that people are not going to implement it?


Most syntax highlighting tools are just context-free grammars; they will not typically be able to parse your Scala code, look up the definition of ip4, and find some regex there to apply.


Inline macro implementation of the string interpolation allows you to do that, so yes.
But if the rule is that we allow . as part of the pattern, then we must have a space. If we don’t allow . to be part of the pattern, then we can treat it like a space or a newline and it will behave “as expected”.

It’s possible to add another editor plugin, but I really don’t think it’s required. It’s enough to properly report the error and you can even define where the error is marked in the source file, so there is enough information.

Rethinking about the . dot limitation, I believe there is something we can do, but if the grammar is too complex, then maybe this proposal is not worth it.
The . delimiter is indeed useful mainly for numeric expressions.
But what are those expressions?
As mentioned, a double value can have an e for the exponent, e.g. d#1234.34e-27
e is special. Should it be the only special letter?
What if someone wants to write complex numbers?

c#33.2e5-j15.1 * c#22.1-j1e7

Fractions:

f#2+1/3 * f#2/3 //two and a third times two thirds
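To make the fraction case concrete, here is a sketch using today's interpolation syntax in Scala 3. Frac and its parsing rule are hypothetical, and the prefix is named frac rather than f because f is already a member of StringContext.

```scala
final case class Frac(num: Long, den: Long):
  def *(that: Frac): Frac = Frac.reduced(num * that.num, den * that.den)

object Frac:
  private def gcd(a: Long, b: Long): Long = if b == 0 then a.abs else gcd(b, a % b)

  // Builds a fraction in lowest terms.
  def reduced(n: Long, d: Long): Frac =
    require(d != 0, "zero denominator")
    val g = gcd(n, d)
    Frac(n / g, d / g)

  // Accepts "n/d" or the mixed form "w+n/d" (assumed notation).
  private val Mixed = """(?:(\d+)\+)?(\d+)/(\d+)""".r

  def parse(text: String): Frac = text match
    case Mixed(whole, n, d) =>
      val w = Option(whole).fold(0L)(_.toLong) // the group is null when there is no whole part
      reduced(w * d.toLong + n.toLong, d.toLong)
    case _ => throw IllegalArgumentException(s"not a fraction: $text")

extension (sc: StringContext)
  def frac(args: Any*): Frac = Frac.parse(sc.parts.mkString)
```

With this, frac"2+1/3" * frac"2/3" evaluates to Frac(14, 9). In the sharp form, the + and / inside the literal are exactly why a space is needed before the * operator.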

Can we allow physical units in the pattern?

val velocity = p#12.5m / p#22.3s
val length = p#3s * p#12.4m/s
val g = p#9.81m/s^2

We are trying to make Scala more popular among the science/algorithmic community.
I think it’s worth thinking how this feature can help get us there.
cc: @eje


IMHO: It is not very intuitive
I would prefer:

192.168.1.0.ip.connect
or
192.168.1.0#ip.connect

where ip is an extension method or something similar.

I think for digits it is quite good.
For printableChar it is better to use opening and closing characters.