SIP-XX: Dedented Multiline String Literals

I opened a new Pre-SIP; please take a look if you’re interested!

Please read through the whole proposal before commenting.

Love this, but one nitpick to the SIP – a big motivator seems to be the interaction of interpolation with stripMargin, but the SIP doesn’t explicitly mention that interpolation will be supported by the new literal syntax (like s''')
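For anyone who hasn't hit that interaction, here is a minimal sketch of the kind of problem I take the motivation to be about. The object and value names are made up for illustration. stripMargin runs after interpolation, so margin-like text inside an interpolated value gets stripped as well.

```scala
object StripMarginPitfall:
  // An interpolated value that happens to contain a line of blanks followed by '|'
  val inner: String = "a\n  |b"

  // stripMargin sees the already-interpolated text, so it cannot tell the
  // literal's own margins apart from the "  |" that came from `inner`
  val rendered: String =
    s"""|first
        |$inner
        |last""".stripMargin

  // What the author presumably wanted: `inner` inserted untouched
  val intended: String = "first\n" + inner + "\nlast"
```

Here `rendered` loses the "  |" in front of "b", so it differs from `intended`.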

It does mention that:

Dedented string literals should be able to be used anywhere a normal " or triple """ can be used:

  • Literal types (String & Singleton)
  • String interpolation, with builtin or custom interpolators
  • Inline methods and Macros that require the literal value at compile-time
  • Pattern matching

You’re right, I missed that. Thanks.

I love this! I have had trouble using multiline strings with stripMargin inside @implicitNotFound, only to find out it doesn’t work. It just seems like an intuitive addition. Sad that the """ syntax is already taken, but definitely a worthwhile addition to me.

I like the idea, but I would prefer not having a new syntax. Maybe we can control the way multi-line literals work using language imports? I.e. you get the old behaviour by default for now, but if you import scala.language.newMultilineStrings you get the new behaviour, and at some point in the future we can change the default…

I think this is the right way to do it–a separate syntax from """. The reason is that we can’t compose string interpolators, and there are valid use cases for both the de-indented and indented forms. If it turns out that everyone always uses ''' then we can deprecate """ in time. The biggest downside is that tooling already understands """ and doesn’t understand ''' but, eh, if this is a tricky fix, there’s something wrong with the tool.

As an alternative, we could instead use a . after the string interpolator. This isn’t valid syntax now, so it would be unambiguous. The main reason I don’t like that is because it’s too easy to get the wrong one; s''' does not look much like s""", but s.""" does look a lot like s""".

One issue to figure out–can you add a comment apparently inside the string? And should we have the content indented more deeply as if it’s a block? Or at the same level as where it’s created? I think deeper actually looks nice. It also means that you can write ''' inside your text block because it isn’t interpreted as the end marker unless the indentation is one less.

    val help = '''  // Version printed to stdout
      This program is cursed.
      '''
      ^ yes, we mean to print this!
      See what I mean?
    '''  // end help--matches opening depth

How many of these languages have both """ and ''' string syntaxes, as the SIP intends to create? And how many also differentiate between strings with and without an interpolation prefix?

Plenty of languages have multiple syntaxes for multiline strings, many with a flexible delimiter syntax to avoid escaping within the body.

  • Elixir and Python have both ''' and """, albeit with different semantics than that proposed here

  • Ruby has <<HEREDOC, <<-INDENTED_HEREDOC, and <<~SQUIGGLY_HEREDOC, as well as tagged strings via <<-EXPECTED.chomp syntax

  • C# actually allows the starting and ending delimiters to be arbitrarily long, e.g. a four-quote """" opening needs to be followed by a four-quote """" closing, which would allow you to embed three-quote """ literals into the string. That’s actually pretty nice, is functionally similar to what Ruby does with HEREDOCs, and maybe we should adopt that as well

  • Swift allows """, #""", ##""", etc. with arbitrary numbers of # to be the delimiter

Plenty of languages have interpolation prefixes as well.

Scala isn’t special here. This problem is universal, and plenty of wildly different languages have attempted solutions, many of which look and feel largely the same. These range from flexible-syntax languages like Ruby, to boring enterprise languages like C#, to hybrid languages like Swift.

So the way other languages seem to do this is by making the delimiter syntax flexible:

  • Ruby <<~SQUIGGLY_HEREDOC can take an arbitrary header
  • C# can take an arbitrary-length opening delimiter (e.g. four-quote """") with a corresponding closing delimiter
  • Swift lets you prefix the " with # (e.g. #""") and then scans for a corresponding closing delimiter

We can probably do that too; I added it to the proposal as Extended Delimiters.
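As a rough illustration of the matching idea only (not the proposal's actual lexer rules), here is a sketch in Scala 3 that reads an opening run of n single quotes and looks for the next run of n. The object and method names are hypothetical, and a real lexer would have more cases to handle.

```scala
object ExtendedDelimiters:
  /** Returns the text between an opening run of n >= 3 single quotes and the
    * next run of n single quotes, or None if `src` is not shaped like that.
    * Assumption: the closing delimiter is simply the next run of the same length. */
  def body(src: String): Option[String] =
    val n = src.takeWhile(_ == '\'').length
    if n < 3 then None
    else
      val delimiter = "'" * n
      val close = src.indexOf(delimiter, n)
      if close < 0 then None else Some(src.substring(n, close))
```

With a four-quote delimiter, a three-quote run inside the text is just content, which is the whole point of making the delimiter length flexible.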

Sorry to be in the other corner of the room here, but please, not a third way to define something as trivial as a string. If this is so important, maybe we can allow a single-quoted string to extend over multiple lines and build this in. That way you certainly don’t break any code, and it is intuitive. The triple quotes """ keep their original behaviour. So:

val x = "111
   222
     333"
println(x)

This currently generates an “unclosed string literal” error, but would then give

111
222
333

I don’t know if this causes other language problems though.

A " character followed by a newline is currently a syntax error, so we could simply use that as a delimiter and follow @lihaoyi’s proposal otherwise. However, the downside is that triple-quoted string literals allow special characters like " or \ in the string, while they need to be escaped in ". We could change the rules for string literals starting with " followed by a newline, but that would be inconsistent and confusing; or we could continue to require escaping, which partially defeats the point. Overall I’m not a fan of the idea. If we want to keep an existing syntax, then sticking with """ and fixing it with a language import seems like a better idea to me.
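To show the escaping difference as it stands today, here is a small sketch using only current syntax. The object and value names are just for illustration.

```scala
object QuotesToday:
  // Ordinary literal: every embedded " and every line break must be escaped
  val escaped: String = "{\n  \"one\": 1,\n  \"two\": 2\n}"

  // Triple-quoted literal: embedded " needs no escape, but the margins
  // have to be removed with stripMargin
  val tripleQuoted: String =
    """{
      |  "one": 1,
      |  "two": 2
      |}""".stripMargin
```

Both values hold the same text; the question is which escaping rules a multi-line " literal would inherit.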

I’m pretty strongly in favor of Haoyi’s suggestion.

Yes, we can wish that we’d gotten this right in the first place (IMO Haoyi’s approach is strictly better than what we have now), but we can’t rewrite history, and the amount of settled code using """ is sufficiently enormous that it is probably never going away.

Using a language import is a recipe for disaster IMO: the visual gap between the import and the usage is likely to be tens, often hundreds of lines, so it would promote eternal user confusion about why """ works one way in this file and another way in the other. Worse, there’s no compile-time warning for missing the import, so problems from missing imports will tend to show up in production.

I entirely sympathize that it’s inelegant to have another way to do multi-line strings, but I think it would be foolish for us to refuse to solve a problem on that ground. The suggested ''' approach is visually distinct enough to be able to recognize the difference between the string types, easy to adapt to, and likely to just plain work.

Making " do the right thing is plausible, but as @mberndt points out, there are some subtleties to consider. And it will cause its own permanent confusion about why """ exists in the first place, if we can get to the point where " is simply always better.

So I’d say we simply own it: add ''', admit that """ is an older version that wasn’t quite as good, and leave them both there in parallel for a long time – possibly forever, possibly switching permanently after a deprecation period.

The only tweak I’d suggest is that we also cook up a scalafix recipe to rewrite from the old style to the new one on an opt-in basis, so teams can switch over wholesale. (That would be strictly necessary if we ever wanted to permanently switch.)

I agree that in retrospect we should have defined multi-line literals that respect indentation. But that boat has sailed a long time ago. The question is whether the better usability is enough to warrant two different string literal syntaxes.

So let’s see what is the downside of the current approach. Here’s what we can do now:

println:
  """here are some
    |  lines of random
    |text.
    |""".stripMargin

Here’s the same with the proposed new syntax:

println:
   '''
   here are some
     lines of random
   text.

   '''
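To make the comparison concrete, here is a rough runtime sketch of the dedenting the new syntax would perform at compile time. It assumes the rule is: drop the line break after the opener, drop the closing delimiter's line, and remove the closing delimiter's indentation from every remaining line. That is my reading, not a quote from the proposal, and `dedent` is a hypothetical helper.

```scala
object DedentSketch:
  /** `raw` is the text between the delimiters, including the line break after
    * the opener and the indentation in front of the closer. */
  def dedent(raw: String): String =
    val lines = raw.split("\n", -1).toList
    // lines.head is whatever follows the opener on its own line (expected empty);
    // lines.last is the indentation in front of the closing delimiter
    val indent = lines.last
    val body = lines.drop(1).dropRight(1)
    // Lines shorter than the indentation (e.g. blank lines) are left as they are
    body.map(_.stripPrefix(indent)).mkString("\n")
```

Under that assumption, the example above would produce the same text as the stripMargin version, including the trailing line break.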

The two extra elements are the “|” delimiters and the final .stripMargin call. The stripMargin is admittedly ugly. It would be better if there was a shorter version, like this:

println:
   """here are some
     |  lines of random
     |text.
     |""".sm
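A shorthand like that can already be tried in user code. This is a minimal Scala 3 sketch; `sm` is a hypothetical name and not part of the standard library.

```scala
object ShortStripMargin:
  extension (s: String)
    /** Hypothetical shorthand for stripMargin. */
    def sm: String = s.stripMargin

  val demo: String =
    """here are some
      |  lines of random
      |text.
      |""".sm
```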

One could also roll this into the string interpolators. Say d for simple de-dent, then sd, fd, and so on. That would give:

println:
   d"""here are some
      |  lines of random
      |text.
      |"""
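A user-level version of such an interpolator might look roughly like this in Scala 3. It is only a sketch: `d` is hypothetical, margins are stripped from the literal parts before the arguments are spliced in, and unlike `s` it does no escape processing.

```scala
object DedentInterpolator:
  // Drop "<line break><blanks>|" margins; text not preceded by a line break is untouched
  private def stripMargins(part: String): String =
    part.replaceAll("\n[ \t]*\\|", "\n")

  extension (sc: StringContext)
    /** Hypothetical `d` interpolator: strips margins from the literal parts only,
      * then splices the arguments in unchanged. */
    def d(args: Any*): String =
      val parts = sc.parts.map(stripMargins)
      parts.head + args.zip(parts.tail).map((arg, part) => s"$arg$part").mkString

  // An argument containing margin-like text survives intact
  val inner: String = "a\n  |b"
  val demo: String =
    d"""first
       |$inner
       |last"""
```

Because the stripping happens before interpolation, the "  |" inside `inner` is preserved, which plain `s"""...""".stripMargin` would not do.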

Are these variants usable enough, or do we need a separate syntax for these new string literals? I tend to think there are more important things to improve but am ready to be convinced otherwise.

Haoyi’s proposal discusses some other problem areas, and particularly ways in which string interpolators fail for some major use cases.

Just to note on this paragraph in the SIP:

Single-quoted strings with " cannot currently span multiple lines, and so they could be specified to have these dedenting semantics when used multi-line.

  • This has the disadvantage that a single " isn’t very visually distinct when used for demarcating blocks of text, and separating them vertically from the code before and after

I don’t think it is a big issue (if an issue at all).

However, a more important one is that we couldn’t use " chars in such multiline strings without escaping, e.g. the following wouldn’t work, unfortunately:

val json = "
  {
    "one": 1,
    "two": 2
  }
"

As a practical matter, given the difficulty of handling wrapping and such with stripMargin, I never use it. The interpolator doesn’t help one bit; it’s the awkwardness of the | prepended to each line. Even with multi-line editing common in editors these days, I would much rather use some braces and define the text actually at the left margin.

But this would be lovely:

println: "
  I have two things to say.
    (1) There is one more thing after this.
    (2) There was one thing before this.
  I had two things to say.
"

@odersky please read the proposal and not just the excerpt! It addresses all your questions directly.

Yes, we can! There are a few options. We could define that the string is closed only by:

  • a single " as the only character on a line, so all other " characters are interpreted as quotes in the string
  • a single " directly followed by an EOL character (then you only need to escape this special case, which seldom occurs)
  • a triple quote.

The latter combination would look like this:

val json = "
  {
    "one": 1,
    "two": 2
  }"""

and has the advantage that we do not need to introduce other characters like ''' to define a string. Feels more natural!

@haoyi I have now read the proposal in full and it does indeed discuss the alternatives I raised before.

But I am still a bit pained that we now use another lexical delimiter with '''. It does not feel very nice for a language that strives to be small and orthogonal in its features.

So maybe go back to using single " instead? The argument against is

  • This has the disadvantage that a single " isn’t very visually distinct when used for demarcating blocks of text, and separating them vertically from the code before and after

But I am not sure that’s a very strong point. Any editor or IDE would syntax-highlight strings, so you know their span by their color alone.

About embedded single-line string literals:

A slightly more lenient version would interpret a " as closing a multi-line string literal if

  • there’s only whitespace between the last EOL and it, and
  • there is no other " following anywhere on the same line.

That would allow any embedded single-line string without need for escapes.
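The lenient rule above can be sketched as a line-level check. This is only an illustration in Scala 3 with hypothetical names; a real implementation would live in the scanner and work on characters rather than pre-split lines.

```scala
object ClosingQuoteRule:
  /** True if `line` would close a multi-line literal under the lenient rule:
    * only whitespace before the first ", and no other " after it on that line. */
  def closes(line: String): Boolean =
    val rest = line.dropWhile(_.isWhitespace)
    rest.startsWith("\"") && rest.indexOf('"', 1) < 0

  /** Index of the first closing line among the lines that follow the opening ". */
  def closingLine(lines: Seq[String]): Option[Int] =
    val i = lines.indexWhere(closes)
    if i >= 0 then Some(i) else None
```

With this check, a line such as `    "one": 1,` does not close the literal because a second " follows on the same line, while a lone " on its own line does.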

A single " also has the advantage that one can easily refactor between single line and multi-line literals. The quotes would never need to change.

I feel the single " would tell a better story. Instead of having three string literals we only have one and a half. From now on it would always be single " for string literals, so we are effectively removing a syntactic feature. Triple quotes stay around only for backwards compatibility.