Multiline string literals: can we get rid of the need for stripMargin?

smarter · April 13, 2020, 8:12pm

A typical usage of a Scala multiline string literal looks like:

val s = """one
          |two
          |three""".stripMargin

I’ve always disliked the need to manually specify the margin using | and then strip them away using stripMargin, and now Java is introducing multiline string literals using basically the same syntax, but with automatic margin stripping: https://openjdk.java.net/jeps/378. So how about we adopt the same algorithm for Scala string literals? It’s a breaking change but I think it’s less likely to break things than to fix existing code where usage of stripMargin was forgotten (I’ve seen this happen multiple times just in the dotty codebase, so I expect this to be a common error).
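For illustration, a minimal sketch of the pitfall described above, using only the standard library. The object and value names are made up for the example.

```scala
object StripMarginDemo {
  // The '|' marks the margin; stripMargin removes it and the blanks before it.
  val stripped: String =
    """one
      |two
      |three""".stripMargin

  // Same literal, but the call was forgotten: blanks and '|' stay in the value.
  val forgotten: String =
    """one
      |two
      |three"""
}
```

Both definitions compile without any warning, which is why the second form is easy to miss in review.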

nafg · April 13, 2020, 8:46pm

Why not do it via a custom interpolator?

smarter · April 13, 2020, 8:53pm

Because you cannot use multiple interpolators at the same time, and because the default should do “the right thing”.
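To make the trade-off concrete, here is a rough sketch of what such a custom interpolator could look like. The name `sm` and the wrapper class are hypothetical, and the implementation is deliberately simplified: it strips the margin after interpolation, so an interpolated value containing a line that starts with `|` would be altered as well.

```scala
object MarginInterpolator {
  // Hypothetical `sm` interpolator: interpolate like `s`, then strip the '|' margin.
  implicit class StripMarginContext(private val sc: StringContext) extends AnyVal {
    def sm(args: Any*): String = sc.s(args: _*).stripMargin
  }

  val name = "world"

  // The interpolator slot is now taken by `sm`, so `f` or a library
  // interpolator cannot be applied to this same literal.
  val greeting: String =
    sm"""hello
        |$name"""
}
```

A literal accepts exactly one interpolator prefix, so every interpolator a project uses would need its own margin-stripping variant.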

lihaoyi · April 13, 2020, 9:57pm

Jsonnet also does automatic margin stripping of multiline strings. I must say it’s very convenient!

soronpo · April 14, 2020, 6:44am

In addition, it would be better for compile-time strings, which cannot benefit from applying .stripMargin, e.g. when using ScalaTest’s assertCompiles.

odersky · April 14, 2020, 8:11am

I think this would be an interesting idea. But I just looked at the JEP and it is … rather frightening in its complexity. Interesting that Java now uses significant indentation, even though it is only for this specific use case. Also, I wonder how they will deal with tabs in incidental whitespace; the JEP does not talk about them at all.

The main problem I see in adopting this is defining what counts as incidental whitespace. In

val s = """one
            two
             three"""

How do we know where to strip? Do we count characters? What if someone uses a proportional font (some people do!)? And what if people use tabs instead of spaces?

The Java proposal sidesteps this by demanding a newline after the initial """, which is not included in the string. So it would have to be

val s = """
        one
        two
        three"""

But if we adopted this we’d be doubly incompatible with the current design because we would have to skip both whitespace to the left and the initial newline. I agree whitespace to the left would be rarely intended but the initial newline could certainly be intentional.
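As a rough illustration of the "strip the common indentation" idea, here is a much simplified sketch. It is not the JEP 378 algorithm: it only counts spaces, ignores tabs and escape sequences, and keeps trailing whitespace. The function name and the convention that `body` starts after the opening delimiter's line break are assumptions made for the example.

```scala
object CommonIndent {
  /** Strips the smallest leading-space count found on any non-blank line.
    * `body` is the text that follows the line break after the opening delimiter.
    * Simplified: only spaces count as indentation; tabs and escapes are not handled.
    */
  def stripCommonIndent(body: String): String = {
    val lines = body.split("\n", -1).toList
    // Blank lines do not take part in computing the margin.
    val indents = lines.filter(_.trim.nonEmpty).map(_.takeWhile(_ == ' ').length)
    val margin = if (indents.isEmpty) 0 else indents.min
    lines.map(_.drop(margin)).mkString("\n")
  }
}
```

For example, `CommonIndent.stripCommonIndent("        one\n        two\n        three")` yields `"one\ntwo\nthree"`. Note that every line has to be inspected before the margin is known.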

odersky · April 14, 2020, 8:26am

Maybe we could sidestep all this by using single quotes?

I.e.

val s = "
        one
        two
        three
        "

The rule would be that a single quote at the end of a line starts a multi-line string literal. The string is terminated by another single quote at the start of a line. Initial and final newlines are not counted. This means that triple-quote literals would no longer be necessary and could be deprecated and phased out.

To determine what is incidental whitespace I would use the indentation of the final quote. Everything to the left of it is incidental. I.e.

print("
     one
     two
     three
   ".replace(" ", "_"))
println("!")

prints

__one
__two
__three!

This gives more control than the JEP’s incidental whitespace algorithm and is dramatically simpler.
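The rule is small enough to sketch as an ordinary function over the raw text between the two quotes. This is only a run-time model of the proposal, with hypothetical names; a compiler would do this while lexing. Lines that do not start with the margin are left untouched here, whereas a real implementation would presumably report them.

```scala
object ClosingQuoteIndent {
  /** `raw` is everything between the two quotes, including the line break after
    * the opening quote and the whitespace in front of the closing quote.
    */
  def strip(raw: String): String = {
    val lines = raw.split("\n", -1).toList
    val margin = lines.last          // whitespace preceding the closing quote
    lines.drop(1).dropRight(1)       // initial and final line breaks are not counted
      .map(_.stripPrefix(margin))    // compared as a string prefix, not as a column count
      .mkString("\n")
  }
}
```

With the example above, `ClosingQuoteIndent.strip("\n     one\n     two\n     three\n   ").replace(" ", "_")` gives `"__one\n__two\n__three"`. Because the margin is compared as a prefix string, no character counting is involved.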

h-vetinari · April 14, 2020, 8:40am

odersky:

To determine what is incidental whitespace I would use the indentation of the final quote. Everything to the left of it is incidental. I.e.
print("
     one
     two
     three
   ".replace(" ", "_"))
println("!")

While that may be smart and simple, it is IMO basically impossible to read (i.e. know from reading what the result will be). Especially in long multi-line strings, one might not even see the end in the IDE, and hence constantly have to scroll for what’s actually significant indentation.

For these reasons, I think the alignment has to be determined by the beginning. And that doesn’t have to be the extra newline from the JEP. It could just be the characters to the right of """ (same output as your example):

print("""  one
           two
           three""".replace(" ", "_"))
println("!")

This would be like before with .stripMargin, but just replacing the |'s with an extra space.

As for tabs vs. spaces: that’s almost always unintentional and should IMO raise a warning (along with a reasonable default behaviour for handling it) or just an error.
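One possible reading of "raise a warning" is a check for lines whose leading whitespace contains both tabs and spaces. The sketch below is only that interpretation, with a hypothetical helper name; it does not flag a string in which some lines use tabs and others use spaces.

```scala
object IndentCheck {
  /** Returns the 1-based numbers of lines whose leading whitespace mixes tabs and spaces. */
  def mixedIndentLines(text: String): List[Int] =
    text.split("\n", -1).toList.zipWithIndex.flatMap { case (line, i) =>
      val indent = line.takeWhile(c => c == ' ' || c == '\t')
      if (indent.exists(_ == ' ') && indent.exists(_ == '\t')) List(i + 1) else Nil
    }
}
```

A compiler or linter could turn a non-empty result into a warning or an error, depending on the chosen policy.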

LPTK · April 14, 2020, 8:54am

The single-quote approach seems like a really simple and elegant idea! And it would work well with existing string interpolators.

However, that feels a little weird and potentially hard to read (as pointed out by @h-vetinari).

Maybe it would help readability if the start and end quotes both indicated the indentation, and if they had to agree with each other:

For instance:

object Test:
  val s =
    "
    one
    two
    three
    "
  // or:
  val s = "
          one
          two
          three
          "

print(
  "
    one
    two
    three
  ".replace(" ", "_"))
println("!")

__one
__two
__three!

rgwilton · April 14, 2020, 8:59am

In case it is of interest, YANG (RFC 7950, section 6.1.3), uses the following rule:

If a double-quoted string contains a line break followed by space or
tab characters that are used to indent the text according to the
layout in the YANG file, this leading whitespace is stripped from the
string, up to and including the column of the starting double quote
character, or to the first non-whitespace character, whichever occurs
first. Any tab character in a succeeding line that must be examined
for stripping is first converted into 8 space characters.
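A sketch of how the quoted rule might be modelled, based only on the wording above. The names and the 0-based `quoteColumn` parameter are assumptions for the example, and this has not been checked against any YANG parser.

```scala
import scala.annotation.tailrec

object YangStrip {
  /** Removes up to `width` leading blanks from `line`, expanding each examined tab to 8 spaces. */
  def stripLine(line: String, width: Int): String = {
    @tailrec
    def loop(rest: String, remaining: Int): String =
      if (remaining <= 0 || rest.isEmpty) rest
      else rest.head match {
        case ' '  => loop(rest.tail, remaining - 1)
        case '\t' => loop((" " * 8) + rest.tail, remaining) // examined tab becomes 8 spaces
        case _    => rest                                   // first non-whitespace character reached
      }
    loop(line, width)
  }

  /** `quoteColumn` is the 0-based column of the opening double quote (hypothetical parameter). */
  def strip(text: String, quoteColumn: Int): String = {
    val lines = text.split("\n", -1).toList
    (lines.head :: lines.tail.map(stripLine(_, quoteColumn + 1))).mkString("\n")
  }
}
```

Unlike the other proposals in this thread, this rule depends on the column of the opening quote, so it needs a fixed convention for the width of a tab.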

odersky · April 14, 2020, 9:13am

Note that the JEP has the same problem: To find out about indentation, you have to inspect all lines of the multi-line string, so you also have to scroll to the end.

Using the start does not work reliably, as I have argued in my previous post. Specifically, it does not work with proportional font characters, or with tabs. It would work reliably if we insisted that the start is on its own line, as @LPTK suggested. But I find that very hard to read.

Some explanation: Using indentation is tricky in the details. I am very happy with our current design, which works reliably with tabs and spaces, and does not rely on monospaced fonts. But it only works if all you are allowed to ask is: “what is the column of the start of this line?”. The whole thing breaks down if you need to count non-space characters. So, in

val s = "
  ...
  "

you are not allowed to ask on what column the leading " is. An indentation width for us is a string of spaces and tabs. So the leading " does not have a meaningful indentation width.
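To illustrate what "a string of spaces and tabs" means in practice, here is a small sketch in which widths are compared by the prefix relation instead of by a number of columns. This follows my understanding of the Scala 3 indentation rules; the helper names are made up.

```scala
object IndentWidth {
  /** The indentation width of a line is its leading run of spaces and tabs, kept as a string. */
  def of(line: String): String = line.takeWhile(c => c == ' ' || c == '\t')

  /** Some(-1), Some(0) or Some(1) when one width is a prefix of the other, None when incomparable. */
  def compare(a: String, b: String): Option[Int] =
    if (a == b) Some(0)
    else if (b.startsWith(a)) Some(-1)
    else if (a.startsWith(b)) Some(1)
    else None
}
```

Under this model, two tabs followed by four spaces is less than two tabs followed by five spaces, but it is incomparable to six tabs. Nothing here assigns a column to a non-whitespace character such as the leading ".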

gabro · April 14, 2020, 9:14am

Adding this to the mix, in case it’s useful as inspiration https://github.com/davegurnell/unindent

h-vetinari · April 14, 2020, 10:35am

I responded to the tabs question already (either warn or raise, as mixing is almost always an artefact). If someone really needs to mix tabs and spaces, they can still fall back to the .stripMargin-construct.

And the question about proportional font characters seems to me to have no bearing on left-stripping spaces (unless someone uses proportional space-like characters, which is just begging for trouble to a degree that it doesn’t deserve support IMO). And again, the fallback to .stripMargin would be available.

Assuming those two objections are overcome, the start could then be used reliably, as I illustrated above. This feature is clearly about developer comfort - the goal should (IMO) be the biggest gain in ease of use for the vast majority of cases, not complete compatibility down to the most obscure edge case.

odersky · April 14, 2020, 10:59am

What if the formatting uses only tabs? And, using a proportional font, how do I align with the " in

  val s = "

martijnhoekstra · April 14, 2020, 11:24am

Hey, that’s pretty close to my hobby horse!

Without looking, I fear this may not be compatible with https://github.com/scala/scala/pull/8830

h-vetinari · April 14, 2020, 11:58am

If someone is truly using non-monospaced fonts in an IDE (I shudder at the thought, but maybe that’s just me…; the same confusion would apply to different indentation levels within their multiline strings), they still have the .stripMargin-construct at their disposal.

PS. In my previous response, I assumed (falsely, it appears) you were talking about characters that didn’t have unit width even in UTF-8. I’m guessing I don’t have enough imagination to consider programming in Times New Roman.

morgen-peschke · April 14, 2020, 3:45pm

Random thought: why not keep the same basic behavior as the current stripMargin?

A different delimiter could be used to indicate it’s a compile-time construct (possibly ```), and the margin character would be determined by the first character in the string:

val runTime = """one
                |two
                | three""".stripMargin

val compileTime = ```|one
                     |two
                     | three```
runTime == compileTime
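The intended result can be modelled at run time with the existing `stripMargin(marginChar)` overload. This is only a sketch of the semantics under that assumption; the ``` delimiter itself is hypothetical and the helper name is made up.

```scala
object FirstCharMargin {
  /** Treats the first character of `raw` as the margin marker and strips it from every line. */
  def strip(raw: String): String =
    if (raw.isEmpty) raw else raw.stripMargin(raw.head)
}
```

For instance, `FirstCharMargin.strip("|one\n   |two\n   | three")` equals `"one\n |two\n | three".stripMargin`, and a different marker such as `>` works the same way.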

scalway · April 14, 2020, 4:39pm

I feel a bit uncomfortable with the "standard string" notation in multi-line form.

It means we would always need to escape the " character, which could be frustrating for all copy-paste operations (e.g. HTML, JS, XML and so on).
It would also confuse IDEs a bit, because every unclosed string potentially contains the rest of the file (for example, I have turned off the option that inserts the paired quote in IDEA).

My proposal is to prefer Java’s way (if I understand it correctly):

//auto stripMargins 
//and skip first line 
//if first line is: `fl.isEmpty || fl.forall(_.isWhitespace)`
val longText = """
  this is 
  very long text
    yea
  """
//this is 
//very long text
//  yea


//keep trailing whitespaces otherwise
val longText = """oh
  this is
  very long text
    yea
  """
//oh
//  this is
//  very long text
//    yea

//produce warning if after `trimPrefix` first line contains something
val longText = """ oh
  this is
  very long text
  """
//warning: there are whitespaces before 'oh' in first line of multi-line string.
//warning: remove them or
//warning: use `someCompilerFlag` to suppress this warning or
//warning: add new line before
//warning: current behaviour: we stripMargins as if the content was in a new line
//oh
//  this is
//  very long text

//stripMargin still works as expected 
//this will keep string as is but stripMargin will strip it properly
val longText = """oh
   |this is
   |very long text
  """.stripMargin

//this will keep string as is but stripMargin will strip it properly
val longText = """|oh
   |this is
   |very long text
  """.stripMargin

//this will remove whitespaces by new design and stripMargin will remove `|` 
val longText = """
   |oh
   |this is
   |very long text
  """.stripMargin
//oh
//this is
//very long text

//don't know how to force compiler to keep indentation and leading newline. 
//here is ugly workaround
val longText = """>
   I really 
   want to have 
   this indentation
  """.drop(1)
//
//   I really 
//   want to have 
//   this indentation

But it changes the current behavior, and this could lead to migration hell :(. I don’t have better ideas.

sake92 · April 15, 2020, 1:02pm

Could we use the position of the closing """ instead of counting the spaces on each line? It would strip as many spaces as there are before the closing """. If that went past any non-space character, it would be a compile error.
I’d love tabs to be a compile-time error as well.

EDIT:
It seems Java 14 requires \s*\n after the opening triple quote, e.g. these don’t work:
""" test """

""" test
"""

This does:

""" 
test"""

I like this because it makes handling of the first line easier: we just ignore it.
I don’t like that it strips trailing whitespace, though. That has its use cases, like Markdown, where a space at the end of a line is significant!
Try it here: https://tryjshell.org/

In summary, these would be nice:

- force blanks + newline after the opening triple quote
- force newline + blanks before the closing triple quote
- use the closing triple quote as the index of how many spaces to remove
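The three points above can be sketched as a checked function, with `Left` standing in for a compile error. The names are hypothetical, `raw` is assumed to be the text between the delimiters, and a tab in the margin is rejected in line with the wish above. Trailing whitespace is kept.

```scala
object ClosingDelimiterStrip {
  /** `raw` is the text between the opening and closing triple quotes. */
  def strip(raw: String): Either[String, String] = {
    val lines = raw.split("\n", -1).toList
    if (lines.length < 2) Left("expected a line break after the opening delimiter")
    else if (lines.head.trim.nonEmpty) Left("only blanks may follow the opening delimiter")
    else if (lines.last.exists(_ != ' ')) Left("only spaces may precede the closing delimiter")
    else {
      val margin = lines.last.length // number of spaces before the closing delimiter
      val body = lines.drop(1).dropRight(1)
      // Stripping must never remove a non-space character (this also rejects tabs).
      body.zipWithIndex.find { case (l, _) => l.take(margin).exists(_ != ' ') } match {
        case Some((_, i)) => Left(s"line ${i + 1} has a non-space character inside the margin")
        case None         => Right(body.map(_.drop(margin)).mkString("\n"))
      }
    }
  }
}
```

For example, `ClosingDelimiterStrip.strip("\n    one\n      two\n  ")` returns `Right("  one\n    two")`, while `ClosingDelimiterStrip.strip("\n one\n  ")` returns a `Left`.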

odersky · April 20, 2020, 3:34pm

That’s exactly my earlier proposal, but with " instead of """, so that we do not need to change the semantics of """.