SIP-XX: Dedented Multiline String Literals

Yes! If we can make this work with a single double quote, I am fully supportive!

I’d lean in favor of finding a solution with a single " as well. ' is very much associated with Chars in Scala, not Strings.

At first glance, I didn’t understand why we would want anything but

as that is quite enough for virtually all embedded " (the only exception being a single " in the embedded string, which would need an escape, but that’s completely fine).

However, the alternative

also allows for a comma after the closing ". And that’s quite useful when the string is one of the parameters of a method call, for example:

assertEquals(
  "
  this is the
  expected result
  ",
  computation()
)

That said, the same rule would exclude

foo(
  "
  this is the
  expected result
  ", "
  second string argument
  "
)

I don’t think that’s particularly limiting, as one can insert a newline after the comma to solve the issue.

But then, is there any argument for allowing anything other than a single comma after the closing "? Is there any other reasonable construct where one would want to follow a multi-line string literal on the same line with something other than a comma?

There is also the alternative where we demand an extra indentation level for the inside of the string. In that case, there is an easy solution: any " at the increased indentation level is embedded in the string. Only a leading " at the initial indentation level would close the string:

val foo = "
  one
  "
  two
"

That also removes any limitation on what can come after the closing ".
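To make the value of such a literal concrete, here is a minimal sketch assuming the body lines (everything strictly between the opening and closing ") have already been collected, and that the literal's value is those lines with their common leading whitespace removed. `dedentBody` is a hypothetical helper for illustration, not part of any compiler. Applied to the example above, it shows the embedded " surviving as content:

```scala
// Hypothetical helper: computes the value of a dedented multi-line literal
// from its body lines (the lines strictly between the opening and closing quote).
// Assumption: blank lines are ignored when computing the common indentation,
// and tabs and spaces each count as one column.
def dedentBody(body: List[String]): String = {
  def indentOf(line: String): Int = line.takeWhile(c => c == ' ' || c == '\t').length
  val nonBlank = body.filter(_.trim.nonEmpty)
  val common = if (nonBlank.isEmpty) 0 else nonBlank.map(indentOf).min
  // Strip the common indentation from each non-blank line; blank lines become empty.
  body.map(line => if (line.trim.isEmpty) "" else line.drop(common)).mkString("\n")
}

// Body of the `val foo` example: the " sits at the increased indentation level,
// so under the proposed rule it is content rather than a terminator.
val value = dedentBody(List("  one", "  \"", "  two"))
```

Here `value` is the three-line string `one`, `"`, `two`.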

I agree it’s not great to have yet another string literal syntax. One mitigating factor is that ''' triple-quoted strings are common (Python, Ruby, Dart, Elixir), so we’re not doing something unprecedented here. Many languages have multiple string syntaxes, and I expect that once ''' is rolled out there should be ~zero reason for anyone to use """.stripMargin strings ever again, so they could be deprecated. Scala would then have 2 “current” string syntaxes and 1 deprecated one kept purely for backwards compatibility.

I would personally be OK with using a single-double-quote for multiline strings, though we need to think carefully about the tradeoffs. Apart from the subjective style, multiline single-double-quote strings have one major disadvantage: they won’t support extended delimiters.

We allow Extended Delimiters with more than three ''', to allow the strings to contain arbitrary contents, similar to what is provided in C# and Swift. E.g. if you want the string to contain ''', you can use a four-'''' delimiter to stop the ''' within the body from prematurely closing the literal:

def helper = {
  val x = ''''
  '''
  i am cow
  hear me moo
  '''
  ''''
  x
}

println(helper)
'''
i am cow
hear me moo
'''

Extended delimiters (present in C#, Swift, and similar to Ruby HEREDOC) allow ANY literal string to be embedded in source code by choosing an appropriate delimiter before and after. This basically turbo-charges the “Raw” string nature of our existing triple-quoted strings, which are great but have problems embedding """s within themselves.
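"Choosing an appropriate delimiter" can be made mechanical. A minimal sketch, assuming the rule quoted above (a delimiter is a run of three or more ', and the body may contain any shorter run, as with C# raw string literals): pick one more ' than the longest run in the content. `longestRun` and `delimiterFor` are hypothetical helpers for illustration.

```scala
// Hypothetical helper: length of the longest run of consecutive `ch` in `s`.
def longestRun(s: String, ch: Char): Int = {
  var best = 0
  var current = 0
  for (c <- s) {
    if (c == ch) { current += 1; best = math.max(best, current) }
    else current = 0
  }
  best
}

// Choose a delimiter that cannot occur inside `content`:
// at least three quotes, and strictly longer than any run of ' in the content.
def delimiterFor(content: String): String =
  "'" * math.max(3, longestRun(content, '\'') + 1)

// The body of the four-quote `helper` example quoted above.
val body = "'''\ni am cow\nhear me moo\n'''"
val delim = delimiterFor(body) // four quotes, as in that example
```

Plain text gets the default `'''`; only content that already contains `'''` pushes the delimiter to four quotes or more.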

Requiring the closing " to be alone on the line doesn’t work: apart from it being convenient to put ,s and other things after the closing delimiter, it would also prohibit us from defining strings containing a single double quote on its own line:

val s = "
  "
"

@sjrd’s solution of requiring one more level of indentation within the text block has a similar problem: how would that work in the case above, where we want the multi-line string to contain a single double quote?

It seems that without some sort of extended delimiter, we’d have no choice but to require escaping certain characters in some multiline strings. Not the end of the world, but it does partially defeat the purpose of multiline strings, which is to avoid syntactic noise.

Yes, but as you say, demanding an escape \ here is not the end of the world. We do have escapes for these kinds of situations. I think if they need to be used only rarely, we can leave it at that. Extended delimiters are nice, but they are yet another solution to a problem that can be solved with escapes.

One other possible mitigation is to define that multi-line single-quoted strings use the indentation of the opening quote as the delimiter. We do not yet know the indentation of the closing quote when parsing lines of a multi-line string, so we cannot use that information to terminate the parse, but we do have the indentation of the opening quote.

That could work, but it would mean code such as

def helper = {
  val x = ''''
  '''
  i am cow
  hear me moo
  '''
  ''''
  x
}

Would need to be rewritten as

def helper = {
  val x = "
            "
            i am cow
            hear me moo
            "
          "
  x
}

or

def helper = {
  val x =
    "
      "
      i am cow
      hear me moo
      "
    "
  x
}

Not the end of the world, but certainly annoying.

Indentation is only defined for whitespace, so this example would not work since the " after the = does not have a defined indentation. You’d have to use the second version, with the opening quote always on its own line. Which sinks that idea, IMO.

One issue with demanding an escape here with \" is that, AFAICT, it would mean all \s need to be escaped, not just in this particular scenario. That partially defeats the purpose of multiline strings, which is being able to handle strings containing \s cleanly: if you’re writing regexes or LaTeX in a string, you’d like to use a multiline string to avoid doubling up all the escapes. People would then need to fall back to """ strings, so we wouldn’t be able to deprecate them.
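To illustrate the cost in today's Scala: an ordinary string literal processes backslash escapes, so a regex must double every \, while a """ literal passes backslashes through untouched. If multiline " strings treated \ as an escape character, they would behave like the first form below:

```scala
// Ordinary string literal: each backslash must itself be escaped.
val escaped = "\\d+\\.\\d+"
// Triple-quoted literal: no escape processing, so the regex reads as written.
val tripleQuoted = """\d+\.\d+"""
```

Both values are the same seven-character regex source; only the second one is readable at a glance.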

Would the raw string interpolator not work for these situations?

Good point, maybe it could! It’s not quite raw, since you’d then need to escape $s as $$s, but maybe that’s good enough.
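For reference, this is how the existing `raw` interpolator behaves: backslash sequences are kept literally, but because it is an interpolator, $ starts a splice, so a literal dollar sign has to be written $$:

```scala
// raw leaves backslash sequences unprocessed: "\d" stays two characters.
val digits = raw"\d+"
// Likewise "\n" is a backslash followed by 'n', not a newline.
val notANewline = raw"a\nb"
// A literal $ must still be doubled, since $ introduces an interpolated value.
val price = raw"costs $$5"
```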

It would be exactly as you wrote. In my head (but clearly I did not communicate that well), the defining indentation level is the start of the line where the opening " appears.

For example, in

val bar =
  val foo = "
    "
  "
  foo

the defining indentation level of the multi-line string literal is 2: the 2 spaces at the start of the line val foo = ".
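A minimal sketch of the rule as described, assuming indentation is made of spaces only and that a " closes the literal only when it is the first non-whitespace character of a line indented exactly at the defining level. `findClosingLine` is a hypothetical helper, not how any Scala lexer is actually implemented:

```scala
// Hypothetical lexer helper for the defining-indentation rule.
// `lines` is the source; `openLine` is the index of the line containing the opening ".
// Returns the index of the line holding the closing ", if any.
def findClosingLine(lines: Vector[String], openLine: Int): Option[Int] = {
  def indentOf(line: String): Int = line.takeWhile(_ == ' ').length
  // Defining level: leading spaces of the line where the opening " appears,
  // not the column of the " itself.
  val defining = indentOf(lines(openLine))
  ((openLine + 1) until lines.length).find { i =>
    val line = lines(i)
    // Only a " at exactly the defining level closes; deeper ones are content.
    indentOf(line) == defining && line.drop(defining).startsWith("\"")
  }
}

// The `val bar` example above, one source line per element.
val example = Vector(
  "val bar =",
  "  val foo = \"",
  "    \"",
  "  \"",
  "  foo"
)
val closing = findClosingLine(example, openLine = 1) // the line at index 2 is content
```

With this rule the opening " can stay at the end of `val foo = "`, which avoids both the wasted horizontal space and the extra line discussed earlier.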


That said, an escape is good enough as well. And if you otherwise want the whole string to be raw because you don’t want backslashes to be interpreted, you can always use the raw interpolator and write ${'"'} as a last resort.
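The ${'"'} trick works because the expression inside ${...} is ordinary Scala, so the Char literal '"' can be spliced into an otherwise raw string without any backslash:

```scala
// Backslashes stay literal (raw), while ${'"'} splices in a double quote.
val quoted = raw"path C:\temp, then ${'"'}quoted${'"'}"
```

The resulting string is `path C:\temp, then "quoted"`.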

I think it looks fine for single-line statements like those given, but how would it work in something like this:

foo
  .bar(qux, 
    baz, "

Would the indentation chosen be that of the baz, .bar, or foo tokens?

Yes, you may want to call .toUpperCase, + to append some suffix, or call any other method or extension method on that multi-line string. Those are all very common use cases, even ignoring the .stripMargin that is ubiquitous everywhere today.
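For comparison, this is the kind of chaining that is common today on """ literals. `.stripMargin` is itself just a method call on the closed literal, and further calls or a `+` follow on the same line as the closing delimiter:

```scala
// stripMargin removes leading whitespace up to and including each '|'.
val banner = """|i am cow
                |hear me moo""".stripMargin.toUpperCase

val withSuffix = """|i am cow
                    |hear me moo""".stripMargin + "!"
```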

It would be that of baz. But that specific example would not pass a decent coding style, which should mandate instead:

foo
  .bar(qux,
    baz,
    "

+ is likely going to go to the next line. Point taken for .toUpperCase, though.

I don’t think this rule would work. E.g. I could imagine someone trying to write

def openingParagraph = "
  one dark and stormy 
  night, he said
  "...i am cow 
  hear me moo"
".toJson

This would prematurely close the string at the first leading " (the one before ...i am cow) rather than the last, and it’s hard to define a rule that distinguishes that " from the desired closing ".
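A minimal sketch of the failure, assuming for illustration only a closing rule under which the literal ends at the first line whose first non-whitespace character is ". `naiveClosingLine` is hypothetical and is not claimed to be the exact rule under discussion. It picks the line that opens the dialogue, not the intended closing line:

```scala
// Assumed (hypothetical) closing rule: the literal ends at the first line
// whose first non-whitespace character is a double quote.
def naiveClosingLine(body: Vector[String]): Option[Int] =
  body.indexWhere(_.trim.startsWith("\"")) match {
    case -1 => None
    case i  => Some(i)
  }

// Lines after `def openingParagraph = "` in the example above.
val storyLines = Vector(
  "  one dark and stormy ",
  "  night, he said",
  "  \"...i am cow ",
  "  hear me moo\"",
  "\".toJson"
)

val closesAt = naiveClosingLine(storyLines) // index 2, not the intended index 4
```

An indentation-sensitive rule would treat the line at index 2 differently, which is why the choice of rule matters.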

It does feel to me that making "-delimited multiline strings ergonomic in the common case of having "s in the string requires that we special-case some "s to close the string and other "s not to, which would get quite hairy and unpredictable. These rules seem like they would be hard for developers to keep straight in their heads.

And the alternative of not special-casing "s would mean they all need to be escaped, so multi-line " strings would not be able to replace """.stripMargin strings, since the big value proposition of the latter is avoiding escaping for common characters like ". We shouldn’t forget that the purpose of """ was both to be multi-line and to avoid escaping, and multi-line " strings seem to lack the latter.

The only non-hairy idea I can think of would be to make it purely indentation-based, which would mean relying on the indentation of the opening delimiter to determine the span of the string (i.e. everything indented more than that). I think people are used to working with indentation-based blocks, with YAML and Python, and with Scala 3 making it the default. But that has its own hairiness: either we

  • Use the column offset of the opening ", which may be unnecessarily far to the right (shifting the string rightward and wasting horizontal space), or force the opening " onto its own line, which wastes vertical space
  • Use the column offset of the first non-whitespace character on that line, which is pretty unprecedented in Scala

The alternative of using a variable-length ''' seems a lot clearer: ''' is uncommon enough that in the common case nothing needs escaping, and in the uncommon case you can extend it to '''' or ''''' to ensure it is distinct from any part of the string. There’s also a lot of precedent for both the syntax (Python, Ruby, Dart, Elixir) and the semantics (C#, Swift, Ruby), whereas special rules where this " closes the string but that " does not seem pretty unprecedented.

Note the “single-line”. It does indeed require escaping for embedded multi-line strings, which is already the case now: even with the current triple """ quotes, we need escapes to embed another triple-quoted string. I don’t think that’s too much of a problem.

One reason why I often avoid stripMargin is language injection in IntelliJ IDEA. I use multiline strings to represent JavaScript or GLSL code. Once I use | and stripMargin, syntax highlighting and autocompletion stop working. Any string syntax which preserves the script source as much as possible is welcome.

Given I use it for code, using ``` ``` like in Markdown would be the most natural fit, but ''' ''' is not bad either.

Maybe, but it’s hell to write about code in forums which use markdown for this purpose.

Yes but it’s a matter of degree.

  • Multiline strings never need to embed other triple-quotes, unless you are embedding a very small subset of Scala code. So you almost never need to escape them.

  • Multiline strings will need to embed single-double-quotes all the time: English, HTML, XML, JSON, YAML, TOML, Scala, Javascript, Java, Python, C#, Ruby, Go, …
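A small example of the asymmetry, using JSON from the list above: in an ordinary string every embedded " needs a backslash, whereas in today's """ literal the same fragment can be pasted verbatim:

```scala
// Ordinary string: every embedded double quote needs a backslash.
val escapedJson = "{\"name\": \"cow\", \"sound\": \"moo\"}"
// Triple-quoted string: the JSON is written exactly as it appears elsewhere.
val verbatimJson = """{"name": "cow", "sound": "moo"}"""
```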

I guess the “no other quote on the same line” helps, but it does seem like a pretty strange restriction (though necessary) that people would have trouble wrapping their head around. Where in the programming ecosystem would people have encountered a similar parsing/lexing rule before?
