Multiline string literals: can we get rid of the need for stripMargin?

In case it is of interest, YANG (RFC 7950, section 6.1.3) uses the following rule:

If a double-quoted string contains a line break followed by space or
tab characters that are used to indent the text according to the
layout in the YANG file, this leading whitespace is stripped from the
string, up to and including the column of the starting double quote
character, or to the first non-whitespace character, whichever occurs
first. Any tab character in a succeeding line that must be examined
for stripping is first converted into 8 space characters.
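
For anyone who wants to play with that rule, here is a minimal sketch of how I read it, in Scala. The names `YangStrip`, `stripLine` and `quoteCol` (the 0-based column of the opening double quote) are made up for illustration, and the sketch ignores the RFC's separate handling of trailing whitespace.

```scala
object YangStrip {
  // Strip leading whitespace from one continuation line, up to and including
  // the column of the opening quote, or up to the first non-whitespace
  // character, whichever comes first. A tab that has to be examined counts
  // as 8 spaces; any spaces of that tab beyond the quote column are kept.
  def stripLine(line: String, quoteCol: Int): String = {
    var remaining = quoteCol + 1 // columns still allowed to be stripped
    var i = 0
    val leftover = new StringBuilder
    while (remaining > 0 && i < line.length && (line(i) == ' ' || line(i) == '\t')) {
      if (line(i) == ' ') remaining -= 1
      else {
        val used = math.min(8, remaining)
        leftover.append(" " * (8 - used))
        remaining -= used
      }
      i += 1
    }
    leftover.toString + line.substring(i)
  }

  // The first line is left alone; only lines after a line break are stripped.
  def strip(content: String, quoteCol: Int): String = {
    val lines = content.split("\n", -1)
    (lines.head +: lines.tail.map(stripLine(_, quoteCol))).mkString("\n")
  }
}
```

With the opening quote in column 2, `YangStrip.strip("one\n     two\n   three", 2)` removes at most three columns from each continuation line.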

Note that the JEP has the same problem: To find out about indentation, you have to inspect all lines of the multi-line string, so you also have to scroll to the end.

Using the start does not work reliably, as I have argued in my previous post. Specifically, it does not work with proportional font characters, or with tabs. It would work reliably if we insisted that the start is on its own line, as @LPTK suggested. But I find that very hard to read.

Some explanation: Using indentation is tricky in the details. I am very happy with our current design, which works reliably with tabs and spaces, and does not rely on monospaced fonts. But it only works if all you are allowed to ask is: “what is the column of the start of this line?”. The whole thing breaks down if you need to count non-space characters. So, in

val s = "
  ...
  "

you are not allowed to ask on what column the leading " is. An indentation width for us is a string of spaces and tabs. So the leading " does not have a meaningful indentation width.
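
To make that concrete, here is a small sketch of the idea, assuming (as I understand the Scala 3 indentation rules) that two widths are comparable only when one is a prefix of the other. `IndentWidth`, `of` and `compare` are illustrative names, not compiler API.

```scala
object IndentWidth {
  // The indentation width of a line is its leading run of spaces and tabs,
  // kept as a string rather than converted to a column number.
  def of(line: String): String = line.takeWhile(c => c == ' ' || c == '\t')

  // Some(-1): a is less indented than b, Some(0): equal, Some(1): more indented,
  // None: the two widths mix tabs and spaces in a way that cannot be compared.
  def compare(a: String, b: String): Option[Int] =
    if (a == b) Some(0)
    else if (b.startsWith(a)) Some(-1)
    else if (a.startsWith(b)) Some(1)
    else None
}
```

Note that `IndentWidth.of("val s = \"")` is the empty string: the position of the opening quote simply has no representation in this model, because it sits behind non-whitespace characters.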

Adding this to the mix, in case it’s useful as inspiration https://github.com/davegurnell/unindent


I responded to the tabs question already (either warn or raise, as mixing is almost always an artefact). If someone really needs to mix tabs and spaces, they can still fall back to the .stripMargin-construct.

And the question about proportional font characters seems to me to have no bearing on left-stripping spaces (unless someone uses proportional space-like characters, which is just begging for trouble to a degree that it doesn’t deserve support IMO). And again, the fallback to .stripMargin would be available.

Assuming those two objections are overcome, the start could then be used reliably, as I illustrated above. This feature is clearly about developer comfort - the goal should (IMO) be the biggest gain in ease of use for the vast majority of cases, not complete compatibility down to the most obscure edge case.

What if the formatting uses only tabs? And, using proportional fonts, how do I align with the " in

  val s = "

Hey, that’s pretty close to my hobby horse!

Without looking, I fear this may not be compatible with https://github.com/scala/scala/pull/8830

If someone is truly using non-monospaced fonts in an IDE (I shudder at the thought, but maybe that’s just me…; the same confusion would apply to different indentation levels within their multiline strings), they still have the .stripMargin-construct at their disposal.

PS. In my previous response, I assumed (falsely, it appears) you were talking about characters that didn’t have unit width even in utf-8. I’m guessing I don’t have enough imagination to consider programming in times new roman. :sweat_smile:


Random thought: why not keep the same basic behavior as the current stripMargin?

A different delimiter could be used to indicate it’s a compile-time construct (possibly ```), and the margin character would be determined by the first character in the string:

val runTime = """one
                |two
                | three""".stripMargin

val compileTime = ```|one
                     |two
                     | three```
runTime == compileTime
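
A rough sketch of what the compile-time form could desugar to, using today's `stripMargin(marginChar)`. The object and method names are hypothetical, and `raw` stands for the text between the proposed delimiters.

```scala
object FirstCharMargin {
  // The first character of the literal is taken as the margin character,
  // then the ordinary stripMargin rule is applied with it.
  def strip(raw: String): String =
    if (raw.isEmpty) raw else raw.stripMargin(raw.head)
}
```

One consequence of this reading: a literal that starts with ordinary text, such as `one`, would treat `o` as its margin character, so the rule probably wants a restricted set of allowed margin characters.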

I feel a bit uncomfortable with the "standard string" notation in multi-line form.

  • It means we would always need to escape the " character, which could be frustrating for all copy-paste operations (e.g. HTML, JS, XML and so on).
  • It will also confuse IDEs a bit, because every unclosed string potentially contains the rest of the file (for example, I have turned off the option that inserts the paired quote in IDEA).

My proposal is to prefer Java’s way (if I understand it correctly):

//auto stripMargins 
//and skip first line 
//if first line is: `fl.isEmpty || fl.forall(_.isWhitespace)`
val longText = """
  this is 
  very long text
    yea
  """
//this is 
//very long text
//  yea


//keep trailing whitespaces otherwise
val longText = """oh
  this is
  very long text
    yea
  """
//oh
//  this is
//  very long text
//    yea

//produce warning if after `trimPrefix` first line contains something
val longText = """ oh
  this is
  very long text
  """
//warning: there are whitespaces before 'oh' in first line of multi-line string.
//warning: remove them or
//warning: use `someCompilerFlag` to suppress this warning or
//warning: add new line before
//warning: current behaviour: we stripMargins as if the content was in a new line
//oh
//  this is
//  very long text

//stripMargin still works as expected 
//this will keep string as is but stripMargin will strip it properly
val longText = """oh
   |this is
   |very long text
  """.stripMargin

//this will keep string as is but stripMargin will strip it properly
val longText = """|oh
   |this is
   |very long text
  """.stripMargin

//this will remove whitespaces by new design and stripMargin will remove `|` 
val longText = """
   |oh
   |this is
   |very long text
  """.stripMargin
//oh
//this is
//very long text

//don't know how to force compiler to keep indentation and leading newline. 
//here is ugly workaround
val longText = """>
   I really 
   want to have 
   this indentation
  """.drop(1)
//
//   I really 
//   want to have 
//   this indentation

But it changes current behavior and this could lead to migration hell :(. Don’t have better ideas.
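
Here is a minimal sketch of the first two cases above, in the spirit of Java's text blocks but not a faithful copy of the Java algorithm: it counts only spaces as indentation and keeps trailing whitespace. `AutoStrip` and `autoStrip` are made-up names, and `raw` is the text between the triple quotes.

```scala
object AutoStrip {
  private def indentWidth(line: String): Int = line.takeWhile(_ == ' ').length

  def autoStrip(raw: String): String = {
    val lines = raw.split("\n", -1).toList
    lines match {
      // Only strip when the first line is empty or whitespace-only.
      case first :: rest if first.forall(_.isWhitespace) && rest.nonEmpty =>
        // Blank content lines do not influence the margin, but the line
        // holding the closing delimiter always does.
        val significant = rest.init.filter(_.exists(!_.isWhitespace)) :+ rest.last
        val margin = significant.map(indentWidth).min
        rest.map(_.drop(margin)).mkString("\n")
      // Otherwise the string is kept as is.
      case _ => raw
    }
  }
}
```

The warning case (whitespace before `oh` on the first line) is left out here; it would need a diagnostic channel rather than a plain `String` result.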

Could we use the position of the closing """ instead of counting the spaces on each line? It would strip as many spaces as precede the closing """. If that went past any non-space character, it would be a compile error.
I’d love tabs to be a compile-time error also. :grin:

EDIT:
It seems Java 14 requires \s*\n after the opening triple quote, e.g. these don’t work:
""" test """

""" test
"""

This does:

""" 
test"""

I like this because it makes handling of the first line easier. We just ignore it… :smiley:
I don’t like that it strips trailing whitespace, though. Trailing whitespace has its use cases, like Markdown, where a space at the end of a line is significant!
Try it here: https://tryjshell.org/


In summary, these would be nice:

  • force blanks+newline after opening triplequote
  • force newline+blanks before closing triplequote
  • use closing triplequote as index of how many spaces to remove
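
The three rules in that summary can be sketched as follows. This is only an illustration under my own assumptions: `ClosingDelimiterStrip` is a hypothetical name, `raw` is the text between the delimiters, a `Left` stands in for a compile error, and the line break before the closing delimiter is not part of the result.

```scala
object ClosingDelimiterStrip {
  def strip(raw: String): Either[String, String] = {
    val lines = raw.split("\n", -1).toList
    if (lines.length < 2)
      Left("content must start on a new line after the opening delimiter")
    else if (lines.head.exists(!_.isWhitespace))
      Left("only blanks may follow the opening delimiter")
    else if (lines.last.exists(_ != ' '))
      Left("only spaces may precede the closing delimiter")
    else {
      // The closing delimiter's column says how many spaces to remove.
      val margin = lines.last.length
      val body = lines.tail.init
      // Tabs or content inside the margin are rejected, not silently kept.
      body.zipWithIndex.find { case (l, _) => l.take(margin).exists(_ != ' ') } match {
        case Some((_, i)) =>
          Left(s"content line ${i + 1} starts left of the closing delimiter")
        case None =>
          Right(body.map(_.drop(margin)).mkString("\n"))
      }
    }
  }
}
```

Trailing whitespace is kept untouched here, and a tab anywhere in the margin is an error, which matches the two wishes above.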

That’s exactly my earlier proposal, but with " instead of """, so that we do not need to change the semantics of """.


Please do not forget this use case:

"""
    one
    "two"
    three
  """

I really like this about """ now. Same example with a single " delimiter, where the embedded quotes must be escaped:

"
    one
   \"two\"
    three
  "

``` would allow preserving the current semantics of """ and ", and avoid the need to escape embedded " characters:

```
    one
    "two"
    three
  ```

There are other alternatives for creating a fenced block that could be worth exploring, like the heredoc:

val multiLineString = <<<EOF
    one
    "two"
    three
  EOF

println(<<<EOF
    one
    "two"
    three
  EOF
)

Yes, but the problem is that we cannot change the semantics of """ and we will not introduce a third quote such as ```. So that leaves us with multiline " which is still available. But having to escape quotes at start of lines is not nice, I agree.

One solution to this would be to use the existing stripMargin. I.e. you could write the example like this:

  "
  |one
  |"two"
  |three
  ".stripMargin

That’s just the old stripMargin we have, used in the new context.

The heredoc idea looks interesting and could be changed a bit to reuse the " syntax, i.e.

val x = "EOF
    one
    "two"
    """three"""
EOF

A conservative approach would be to have a new stripping interpolator, used like this:

val x = str"foo
           |bar
           |baz"

And deprecate the old one.
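
Such an interpolator can already be written in user code. Below is a minimal sketch; `StripSyntax` and `StripInterpolator` are hypothetical names, and with today's syntax it has to be used with triple quotes, since a single-quoted literal cannot span lines. It does not process escape sequences.

```scala
object StripSyntax {
  implicit class StripInterpolator(private val sc: StringContext) extends AnyVal {
    def str(args: Any*): String = {
      // Remove "line break, blanks, |" from the literal parts only, so that
      // interpolated values are inserted verbatim even if they contain a '|'.
      def strip(part: String): String = part.replaceAll("\n[ \t]*\\|", "\n")
      val parts = sc.parts.iterator
      val values = args.iterator
      val sb = new StringBuilder
      sb.append(strip(parts.next()))
      while (parts.hasNext) {
        sb.append(String.valueOf(values.next()))
        sb.append(strip(parts.next()))
      }
      sb.toString
    }
  }
}
```

After `import StripSyntax._`, a triple-quoted `str` literal with `|` margins on its continuation lines yields the stripped text without a trailing `.stripMargin`. Stripping only the literal parts is a deliberate difference from calling `.stripMargin` on the interpolated result, which would also touch the inserted values.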

Using a custom interpolator is not good enough as I mentioned before: Multiline string literals: can we get rid of the need for stripMargin?

Hear, hear! or rather Here, here! for heredoc.

REPL lets you specify a margin character, by analogy to detabbing <<- or <<~ style.

scala> :paste <| EOF
// Entering paste mode (EOF to finish)

    |class C { def f = 42 }
EOF

// Exiting paste mode, now interpreting.

defined class C

Probably there would be more demand for this sort of thing, and uniformity across snippets, scripts, and so-called normal code, if Scala were used more in anger for “scaled” development. Right now, as per the other thread, my main snippet doesn’t scale from REPL to @main.

As for Scaladoc, we’d need to standardize the spelling of hereDoc. I’d be inclined to pronounce it like “heretic”, as it is a syntactic tic.

It’s clear heredoc syntax isn’t used in REPL because no one complains about it. I’d expect the first case to detect the desired indent.

scala> :pa <-
// Entering paste mode (ctrl-D to finish)

  val s = """
    hi
    world
  """

// Exiting paste mode, now interpreting.

s: String =
"
hi
world
"

scala> :pa <*
// Entering paste mode (ctrl-D to finish)

val s = """
  *  hi
  *  world
"""

// Exiting paste mode, now interpreting.

s: String =
"
  hi
  world
"

scala>

It’s encouraging that REPL already prints multiline singly-quoted strings.

The current limitation on interpolators excludes val s = <<EOF"my text... in some shoehorning of syntax. Reminder that the result can also take parameters, such as

val s = sm" * strip comment asterisk"("* ")

I haven’t read through all the comments, I just wanted to mention that Groovy has a stripIndent function. I did not look at their algorithm, but maybe it is good enough (I never had problems with it, at least):

I see that you linked to a post that said it wasn’t optimal because you felt the default could do it, but I don’t see that it “is not good enough”. It is a fact of life to deal with deprecations. Adding a new interpolator and deprecating the old one seems “good enough” to me. Adding more syntax is a real cost.