A typical usage of a Scala multiline string literal looks like:
val s = """one
          |two
          |three""".stripMargin
I’ve always disliked having to mark the margin manually with | and then strip it away with stripMargin. Java is now introducing multiline string literals with basically the same syntax, but with automatic margin stripping: https://openjdk.java.net/jeps/378. So how about we adopt the same algorithm for Scala string literals? It’s a breaking change, but I think it’s more likely to fix existing code where stripMargin was forgotten than to break anything (I’ve seen this happen multiple times just in the dotty codebase, so I expect it to be a common error).
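To make the failure mode concrete, here is a small sketch of what a forgotten stripMargin leaves behind. The object name ForgottenStripMargin is made up for this illustration; only stripMargin itself comes from the standard library.

```scala
object ForgottenStripMargin {
  // Intended usage: the indentation and the '|' are removed from each line.
  val stripped: String =
    """one
      |two
      |three""".stripMargin

  // Same literal without the call: the indentation and the '|' stay in the value.
  val forgotten: String =
    """one
      |two
      |three"""
}
```

The second value still compiles and looks fine in the source, which is probably why the mistake is easy to miss.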
I think this would be an interesting idea. But I just looked at the JEP and it is … rather frightening in its complexity. Interesting that Java now uses significant indentation, even though it is only for this specific use case. Also, I wonder how they will deal with tabs in incidental whitespace; the JEP does not talk about them at all.
The main problem I see adopting this is to define what is incidental whitespace. In
val s = """one
two
three""".
How do we know where to strip? Do we count characters? What if someone uses a proportional font (some people do!)? And what if people use tabs instead of spaces?
The Java proposal sidesteps this by demanding a newline after the initial """, which is not included in the string. So it would have to be
val s = """
one
two
three"""
But if we adopted this we’d be doubly incompatible with the current design because we would have to skip both whitespace to the left and the initial newline. I agree whitespace to the left would be rarely intended but the initial newline could certainly be intentional.
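For reference, the core of the Java approach can be approximated in a few lines. This is a simplified sketch, not the JEP's full algorithm: it ignores trailing-whitespace stripping and escape processing, and it counts a tab as a single character, which is an assumption of this sketch. The names MinIndent and strip are hypothetical.

```scala
object MinIndent {
  // Length of the leading run of spaces and tabs on a line.
  private def indentOf(line: String): Int =
    line.takeWhile(c => c == ' ' || c == '\t').length

  /** `raw` is the text between the delimiters, starting with the newline
    * that has to follow the opening delimiter.
    */
  def strip(raw: String): String = {
    require(raw.startsWith("\n"), "content must start on a new line")
    val lines = raw.drop(1).split("\n", -1).toList
    // Non-blank lines always count; the last line counts even when blank,
    // because it is the line that holds the closing delimiter.
    val significant = lines.init.filter(_.trim.nonEmpty) :+ lines.last
    val margin = significant.map(indentOf).min
    lines.map(l => if (l.trim.isEmpty) "" else l.drop(margin)).mkString("\n")
  }
}
```

Note that both the skipped initial newline and the stripped left margin show up here, which is the double incompatibility mentioned above.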
Maybe we could sidestep all this by using a single " as the delimiter?
I.e.
val s = "
one
two
three
"
The rule would be that a single " at the end of a line starts a multi-line string literal. The string is terminated by another single " at the start of a line. Initial and final newlines are not counted. This means that triple-quote literals would no longer be necessary and could be deprecated and phased out.
To determine what is incidental whitespace I would use the indentation of the final quote. Everything to the left of it is incidental. I.e.
print("
  one
  two
  three
".replace(" ", "_"))
println("!")
prints
__one
__two
__three!
This gives more control than the JEP’s incidental whitespace algorithm and is dramatically simpler.
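A rough sketch of that rule as a plain function, assuming the compiler hands over everything between the two quotes as one raw string. ClosingQuoteMargin and strip are hypothetical names, and the sketch does not check whether a content line has non-space characters inside the margin.

```scala
object ClosingQuoteMargin {
  /** `raw` is everything between the two quotes: a leading newline, the
    * content lines, and finally the spaces in front of the closing quote.
    */
  def strip(raw: String): String = {
    val lines = raw.split("\n", -1).toList
    require(lines.length >= 2 && lines.head.isEmpty, "opening quote must end its line")
    require(lines.last.forall(_ == ' '), "closing quote must start its line")
    // The indentation of the closing quote is the incidental whitespace.
    val margin = lines.last.length
    // The empty first line and the closing line are not part of the value.
    lines.drop(1).dropRight(1).map(_.drop(margin)).mkString("\n")
  }
}
```

With the closing quote two columns to the left of the content, every line keeps two leading spaces, as in the example output.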
While that may be smart and simple, it is IMO basically impossible to read (i.e. know from reading what the result will be). Especially in long multi-line strings, one might not even see the end in the IDE, and hence constantly have to scroll for what’s actually significant indentation.
For these reasons, I think the alignment has to be determined by the beginning. And that doesn’t have to be the extra newline from the JEP. It could just be the characters to the right of """ (same output as your example):
print("""  one
           two
           three""".replace(" ", "_"))
println("!")
This would be like before with .stripMargin, but just replacing the |'s with an extra space.
As for tabs vs. spaces: that’s almost always unintentional and should IMO raise a warning (along with a reasonable default behaviour for handling it) or just an error.
In case it is of interest, YANG (RFC 7950, section 6.1.3), uses the following rule:
If a double-quoted string contains a line break followed by space or
tab characters that are used to indent the text according to the
layout in the YANG file, this leading whitespace is stripped from the
string, up to and including the column of the starting double quote
character, or to the first non-whitespace character, whichever occurs
first. Any tab character in a succeeding line that must be examined
for stripping is first converted into 8 space characters.
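As I read the quoted rule, it could be sketched like this. The column of the opening quote has to be passed in, because a plain function cannot see the source layout; YangStrip, strip and quoteCol are names invented for this sketch.

```scala
object YangStrip {
  // Strip at most `n` columns of leading whitespace from `line`.
  // A tab is expanded into 8 spaces only when stripping actually reaches it.
  @annotation.tailrec
  private def stripLine(line: String, n: Int): String =
    if (n <= 0 || line.isEmpty) line
    else line.head match {
      case ' '  => stripLine(line.tail, n - 1)
      case '\t' => stripLine(" " * 8 + line.tail, n)
      case _    => line // first non-whitespace character reached: stop
    }

  /** `quoteCol` is the 0-based column of the opening double quote. */
  def strip(text: String, quoteCol: Int): String = {
    val lines = text.split("\n", -1).toList
    // The first line starts right after the quote and is left alone.
    (lines.head :: lines.tail.map(stripLine(_, quoteCol + 1))).mkString("\n")
  }
}
```

The rule relies on knowing the column of the opening quote, so it inherits the same counting questions raised earlier in the thread.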
Note that the JEP has the same problem: To find out about indentation, you have to inspect all lines of the multi-line string, so you also have to scroll to the end.
Using the start does not work reliably, as I have argued in my previous post. Specifically, it does not work with proportional font characters, or with tabs. It would work reliably if we insisted that the start is on its own line, as @LPTK suggested. But I find that very hard to read.
Some explanation: Using indentation is tricky in the details. I am very happy with our current design, which works reliably with tabs and spaces, and does not rely on monospaced fonts. But it only works if all you are allowed to ask is: “what is the column of the start of this line?”. The whole thing breaks down if you need to count non-space characters. So, in
val s = "
...
"
you are not allowed to ask on what column the leading " is. An indentation width for us is a string of spaces and tabs. So the leading " does not have a meaningful indentation width.
I responded to the tabs question already (either warn or raise, as mixing is almost always an artefact). If someone really needs to mix tabs and spaces, they can still fall back to the .stripMargin-construct.
And the question about proportional font characters seems to me to have no bearing on left-stripping spaces (unless someone uses proportional space-like characters, which is just begging for trouble to a degree that it doesn’t deserve support IMO). And again, the fallback to .stripMargin would be available.
Assuming those two objections are overcome, the start could then be used reliably, as I illustrated above. This feature is clearly about developer comfort - the goal should (IMO) be the biggest gain in ease of use for the vast majority of cases, not complete compatibility down to the most obscure edge case.
If someone is truly using non-monospaced fonts in an IDE (I shudder at the thought, but maybe that’s just me…; the same confusion would apply to different indentation levels within their multiline strings), they still have the .stripMargin-construct at their disposal.
PS. In my previous response, I assumed (falsely, it appears) you were talking about characters that didn’t have unit width even in UTF-8. I’m guessing I don’t have enough imagination to consider programming in Times New Roman.
Random thought: why not keep the same basic behavior as the current stripMargin?
A different delimiter could be used to indicate it’s a compile-time construct (possibly ```), and the margin character would be determined by the first character in the string:
val runTime = """one
                |two
                | three""".stripMargin
val compileTime = ```|one
                     |two
                     | three```
runTime == compileTime
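The "first character picks the margin" part can already be tried out at run time with the existing stripMargin(marginChar) overload. This is only a sketch of the semantics, not of the compile-time construct; AutoMargin is a made-up name.

```scala
object AutoMargin {
  // The first character of the literal is taken as the margin character;
  // stripMargin then removes it, plus any whitespace before it, on every line.
  def strip(s: String): String =
    if (s.isEmpty) s else s.stripMargin(s.head)
}
```

One consequence of this design is that any character works as the margin, for instance '>' instead of '|', without an extra argument.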
I feel a bit uncomfortable with the "standard string" notation in multi-line form.
It means we would always need to escape the " character, which could be frustrating for all copy-paste operations (e.g. HTML, JS, XML and so on).
It will also confuse IDEs a bit, because every unclosed string potentially contains the rest of the file (for example, I have turned off the option that inserts the matching quote in IDEA).
My proposal is to prefer Java’s way (if I understand it correctly):
//auto stripMargins
//and skip first line
//if first line is: `fl.isEmpty || fl.forall(_.isWhitespace)`
val longText = """
this is
very long text
yea
"""
//this is
//very long text
// yea
//keep trailing whitespaces otherwise
val longText = """oh
this is
very long text
yea
"""
//oh
// this is
// very long text
// yea
//produce warning if after `trimPrefix` first line contains something
val longText = """ oh
this is
very long text
"""
//warning: there are whitespaces before 'oh' in first line of multi-line string.
//warning: remove them or
//warning: use `someCompilerFlag` to suppress this warning or
//warning: add new line before
//warning: current behaviour: we stripMargins as if the content was in a new line
//oh
// this is
// very long text
//stripMargin still works as expected
//this will keep string as is but stripMargin will strip it properly
val longText = """oh
|this is
|very long text
""".stripMargin
//this will keep string as is but stripMargin will strip it properly
val longText = """|oh
|this is
|very long text
""".stripMargin
//this will remove whitespaces by new design and stripMargin will remove `|`
val longText = """
|oh
|this is
|very long text
""".stripMargin
//oh
//this is
//very long text
//don't know how to force compiler to keep indentation and leading newline.
//here is ugly workaround
val longText = """>
I really
want to have
this indentation
""".drop(1)
//
// I really
// want to have
// this indentation
But it changes current behavior, and this could lead to migration hell :(. I don’t have better ideas.
Could we use the position of the closing """ instead of counting the spaces on each line? It would strip as many spaces from every line as there are before the closing """. If that cut into any non-space character, it would be a compile error.
I’d love tabs to be a compile-time error as well.
EDIT:
Seems like Java 14 requires \s*\n after the opening triple quote, e.g. these don’t work:
""" test """
""" test
"""
This does:
"""
test"""
I like this because it makes handling of the first line easier. We just ignore it…
I don’t like that it strips trailing whitespace though. That has its use cases, like Markdown, where a space at the end of a line is significant!
Try here https://tryjshell.org/
In summary, these would be nice:
force blanks+newline after opening triplequote
force newline+blanks before closing triplequote
use closing triplequote as index of how many spaces to remove
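Put together, the three rules could look roughly like the following, with a Left standing in for a compile error. This is a sketch under my own assumptions: the function sees the raw text between the triple quotes, "blanks" means spaces only (so a tab in the margin is rejected), and StrictClosingMargin is a hypothetical name.

```scala
object StrictClosingMargin {
  /** `raw` is the text between the opening and closing triple quotes. */
  def strip(raw: String): Either[String, String] = {
    val lines = raw.split("\n", -1).toList
    if (lines.length < 2)
      Left("opening and closing delimiters must be on different lines")
    else if (!lines.head.forall(_ == ' '))
      Left("only blanks may follow the opening delimiter")
    else if (!lines.last.forall(_ == ' '))
      Left("only blanks may precede the closing delimiter")
    else {
      // The closing delimiter's indentation is the number of spaces to remove.
      val margin = lines.last.length
      val body = lines.drop(1).dropRight(1)
      // Anything other than a space inside the margin (a tab included) is an error.
      body.find(l => !l.take(margin).forall(_ == ' ')) match {
        case Some(bad) => Left(s"non-space character inside the margin: '$bad'")
        case None      => Right(body.map(_.drop(margin)).mkString("\n"))
      }
    }
  }
}
```

Trailing whitespace on content lines is kept as is, so the Markdown case mentioned above would survive.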