In case it is of interest, YANG (RFC 7950, section 6.1.3) uses the following rule:
If a double-quoted string contains a line break followed by space or
tab characters that are used to indent the text according to the
layout in the YANG file, this leading whitespace is stripped from the
string, up to and including the column of the starting double quote
character, or to the first non-whitespace character, whichever occurs
first. Any tab character in a succeeding line that must be examined
for stripping is first converted into 8 space characters.
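A minimal Scala sketch of the quoted rule, just to make the column arithmetic concrete. `YangStrip`, `stripLine` and `quoteColumn` (the 0-based column of the opening quote) are names made up here, not part of any YANG tooling, and the treatment of a tab that reaches past the strip column is my reading of the rule rather than something the quoted text spells out.

```scala
object YangStrip {
  // Strip one continuation line. `quoteColumn` is the 0-based column of the
  // opening double quote (an assumed input; a real parser would track it).
  def stripLine(line: String, quoteColumn: Int): String = {
    val toStrip = quoteColumn + 1 // "up to and including" the quote column
    var stripped = 0              // columns removed so far
    var i = 0                     // index of the next character to examine
    val leftover = new StringBuilder
    while (i < line.length && stripped < toStrip &&
           (line(i) == ' ' || line(i) == '\t')) {
      val width = if (line(i) == '\t') 8 else 1 // a tab counts as 8 spaces
      // Assumption: when a tab reaches past the strip column, the excess
      // columns are kept as spaces.
      if (stripped + width > toStrip) leftover ++= " " * (stripped + width - toStrip)
      stripped = math.min(toStrip, stripped + width)
      i += 1
    }
    leftover.toString + line.substring(i)
  }

  // The first line is left alone; only the lines after a line break are stripped.
  def strip(body: String, quoteColumn: Int): String = {
    val lines = body.split("\n", -1)
    (lines.head +: lines.tail.map(stripLine(_, quoteColumn))).mkString("\n")
  }
}
```

With the opening quote in column 3, `YangStrip.strip("one\n    two\n      three", 3)` gives `"one\ntwo\n  three"`: stripping stops either at the quote column or at the first non-whitespace character.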
Note that the JEP has the same problem: To find out about indentation, you have to inspect all lines of the multi-line string, so you also have to scroll to the end.
Using the start does not work reliably, as I have argued in my previous post. Specifically, it does not work with proportional font characters, or with tabs. It would work reliably if we insisted that the start is on its own line, as @LPTK suggested. But I find that very hard to read.
Some explanation: Using indentation is tricky in the details. I am very happy with our current design, which works reliably with tabs and spaces, and does not rely on monospaced fonts. But it only works if all you are allowed to ask is: “what is the column of the start of this line?”. The whole thing breaks down if you need to count non-space characters. So, in
val s = "
...
"
you are not allowed to ask on what column the leading " is. An indentation width for us is a string of spaces and tabs. So the leading " does not have a meaningful indentation width.
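To illustrate what "an indentation width is a string of spaces and tabs" means in practice, here is a rough sketch. The helper names (`IndentWidth`, `of`, `compare`) are hypothetical, and the prefix-based comparison is how I understand the indentation rules, not a quote from the compiler.

```scala
object IndentWidth {
  // The indentation width of a line is its leading run of spaces and tabs,
  // kept as a string rather than converted to a column number.
  def of(line: String): String = line.takeWhile(c => c == ' ' || c == '\t')

  // Two widths are comparable only if one is a prefix of the other.
  // Some(-1): a is less indented, Some(1): a is more indented,
  // None: incomparable (for example a tab versus two spaces).
  def compare(a: String, b: String): Option[Int] =
    if (a == b) Some(0)
    else if (b.startsWith(a)) Some(-1)
    else if (a.startsWith(b)) Some(1)
    else None
}
```

Nothing here ever asks how wide a tab or a non-space character is, which is why a leading `"` in the middle of a line has no width in this scheme.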
I responded to the tabs question already (either warn or raise, as mixing is almost always an artefact). If someone really needs to mix tabs and spaces, they can still fall back to the .stripMargin-construct.
And the question about proportional font characters seems to me to have no bearing on left-stripping spaces (unless someone uses proportional space-like characters, which is just begging for trouble to a degree that it doesn’t deserve support IMO). And again, the fallback to .stripMargin would be available.
Assuming those two objections are overcome, the start could then be used reliably, as I illustrated above. This feature is clearly about developer comfort - the goal should (IMO) be the biggest gain in ease of use for the vast majority of cases, not complete compatibility down to the most obscure edge case.
If someone is truly using non-monospaced fonts in an IDE (I shudder at the thought, but maybe that’s just me…; the same confusion would apply to different indentation levels within their multiline strings), they still have the .stripMargin-construct at their disposal.
PS. In my previous response, I assumed (falsely, it appears) you were talking about characters that didn’t have unit width even in UTF-8. I’m guessing I don’t have enough imagination to consider programming in Times New Roman.
Random thought: why not keep the same basic behavior as the current stripMargin?
A different delimiter could be used to indicate it’s a compile-time construct (possibly ```), and the margin character would be determined by the first character in the string:
val runTime = """one
|two
| three""".stripMargin
val compileTime = ```|one
|two
| three```
runTime == compileTime
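A run-time stand-in for that idea, since the triple-backtick delimiter is not valid Scala today. `FirstCharMargin.strip` is a made-up helper that takes the margin character from the first character of the string and then defers to the existing `stripMargin(marginChar)`.

```scala
object FirstCharMargin {
  // Use the first character as the margin character; an empty string is
  // returned unchanged. The real proposal would do this at compile time.
  def strip(raw: String): String =
    raw.headOption.fold(raw)(margin => raw.stripMargin(margin))
}
```

On `"|one\n  |two\n  | three"` this yields the same result as `"one\n  |two\n  | three".stripMargin`, which is the equality the example above is after.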
I feel a bit uncomfortable with "standard string" notation in multi-line form.
It means we would always need to escape the " character, which could be frustrating for all copy-paste operations (e.g. html, js, xml and so on).
It will also confuse IDEs a bit, because every unclosed string potentially contains the rest of the file (for example, I have turned off the option that inserts the paired quote in IDEA).
My proposal is to prefer Java’s way (if I understand it correctly):
//auto stripMargins
//and skip first line
//if first line is: `fl.isEmpty || fl.forall(_.isWhitespace)`
val longText = """
this is
very long text
yea
"""
//this is
//very long text
// yea
//keep trailing whitespaces otherwise
val longText = """oh
this is
very long text
yea
"""
//oh
// this is
// very long text
// yea
//produce warning if after `trimPrefix` first line contains something
val longText = """ oh
this is
very long text
"""
//warning: there are whitespaces before 'oh' in first line of multi-line string.
//warning: remove them or
//warning: use `someCompilerFlag` to suppress this warning or
//warning: add new line before
//warning: current behaviour: we stripMargins as if the content was in a new line
//oh
// this is
// very long text
//stripMargin still works as expected
//this will keep string as is but stripMargin will strip it properly
val longText = """oh
|this is
|very long text
""".stripMargin
//this will keep string as is but stripMargin will strip it properly
val longText = """|oh
|this is
|very long text
""".stripMargin
//this will remove whitespaces by new design and stripMargin will remove `|`
val longText = """
|oh
|this is
|very long text
""".stripMargin
//oh
//this is
//very long text
//don't know how to force compiler to keep indentation and leading newline.
//here is ugly workaround
val longText = """>
I really
want to have
this indentation
""".drop(1)
//
// I really
// want to have
// this indentation
But it changes current behavior and this could lead to migration hell :(. Don’t have better ideas.
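For what it’s worth, the first two cases above can be approximated at run time with a few lines. This is only a sketch under assumptions: `AutoStrip.strip` is a hypothetical name, only spaces count as indentation (tabs are ignored), whitespace-only lines do not influence the margin, and the warning case is not modelled.

```scala
object AutoStrip {
  private def indentOf(line: String): Int = line.takeWhile(_ == ' ').length

  def strip(raw: String): String = {
    val lines = raw.split("\n", -1).toList
    // Skip the first line when it is empty or whitespace only.
    val body = lines match {
      case first :: rest if first.forall(_.isWhitespace) => rest
      case other                                         => other
    }
    // The margin is the smallest indentation among lines that have content.
    val indents = body.filter(line => line.exists(c => !c.isWhitespace)).map(indentOf)
    val margin  = if (indents.isEmpty) 0 else indents.min
    // Whitespace-only lines may be shorter than the margin, so never drop
    // more than a line's own leading spaces.
    body.map(line => line.drop(math.min(margin, indentOf(line)))).mkString("\n")
  }
}
```

When the first line carries text directly after the opening quotes, that line has indentation 0, so the margin is 0 and nothing is stripped, which is the "keep as is" behaviour of the second example.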
Could we use the position of the closing """ instead of counting the spaces for each line? It would strip as many spaces as precede the closing """. If that went past any non-space character, it would be a compile error.
I’d love tabs to be a compile-time error also.
EDIT:
Seems like Java 14 forces \s*\n after the opening triple quote, e.g. these don’t work:
""" test """
""" test
"""
This does:
"""
test"""
I like this because it makes handling of the first line easier. We just ignore it…
I don’t like that it strips trailing whitespace, though. Trailing whitespace has its use cases, like Markdown, where a space at the end of a line is significant!
Try here https://tryjshell.org/
In summary, these would be nice:
force blanks+newline after opening triplequote
force newline+blanks before closing triplequote
use closing triplequote as index of how many spaces to remove
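Here is roughly how I picture those three rules, written as an ordinary function over the text between the delimiters so it can be tried out today. `ClosingIndent.strip` is a made-up name, the `Left` messages stand in for compile errors, and dropping the final line break from the result is my own assumption.

```scala
object ClosingIndent {
  // `raw` is everything between the opening and the closing triple quote.
  def strip(raw: String): Either[String, String] = {
    val lines = raw.split("\n", -1).toList
    if (lines.exists(_.takeWhile(_.isWhitespace).contains('\t')))
      Left("tabs are not allowed in the indentation")
    else if (lines.length < 2)
      Left("expected a line break after the opening delimiter")
    else if (!lines.head.forall(_ == ' '))
      Left("only blanks may follow the opening delimiter")
    else if (!lines.last.forall(_ == ' '))
      Left("only blanks may precede the closing delimiter")
    else {
      val indent  = lines.last      // the blanks before the closing delimiter
      val content = lines.tail.init // everything between the two delimiter lines
      // A line with content must be indented at least as far as the closing
      // delimiter; whitespace-only lines are let through.
      content.find(l => l.exists(_ != ' ') && !l.startsWith(indent)) match {
        case Some(bad) => Left(s"line is indented less than the closing delimiter: '$bad'")
        case None      => Right(content.map(_.drop(indent.length)).mkString("\n"))
      }
    }
  }
}
```

Trailing whitespace inside the content lines is left untouched, so the Markdown case keeps working.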
Yes, but the problem is that we cannot change the semantics of """ and we will not introduce a third quote such as ```. So that leaves us with multiline " which is still available. But having to escape quotes at start of lines is not nice, I agree.
One solution to this would be to use the existing stripMargin. I.e. you could write the example like this:
"
|one
|"two"
|three
".stripMargin
That’s just the old stripMargin we have, used in the new context.
REPL lets you specify a margin character, by analogy to detabbing <<- or <<~ style.
scala> :paste <| EOF
// Entering paste mode (EOF to finish)
|class C { def f = 42 }
EOF
// Exiting paste mode, now interpreting.
defined class C
Probably there would be more demand for this sort of thing, and uniformity across snippets, scripts, and so-called normal code, if Scala were used more in anger for “scaled” development. Right now, as per the other thread, my main snippet doesn’t scale from REPL to @main.
As for Scaladoc, we’d need to standardize the spelling of hereDoc. I’d be inclined to pronounce it like “heretic”, as it is a syntactic tic.
It’s clear heredoc syntax isn’t used in REPL because no one complains about it. I’d expect the first case to detect the desired indent.
scala> :pa <-
// Entering paste mode (ctrl-D to finish)
val s = """
hi
world
"""
// Exiting paste mode, now interpreting.
s: String =
"
hi
world
"
scala> :pa <*
// Entering paste mode (ctrl-D to finish)
val s = """
* hi
* world
"""
// Exiting paste mode, now interpreting.
s: String =
"
hi
world
"
scala>
It’s encouraging that REPL already prints multiline singly-quoted strings.
The current limitation on interpolators excludes val s = <<EOF"my text... in some shoehorning of syntax. Reminder that the result can also take parameters, such as
I haven’t read through all the comments, just wanted to mention that Groovy has a stripIndent function. I did not have a look at their algorithm but maybe it is good enough (never had problems at least):
I see that you linked to a post that said it wasn’t optimal because you felt the default could do it, but I don’t see that it “is not good enough”. It is a fact of life to deal with deprecations. Adding a new interpolator and deprecating the old one seems “good enough” to me. Adding more syntax is a real cost.