SIP-XX: Dedented Multiline String Literals

Good point! I use this all the time. (Yes, this can be automated to be prepended, but sometimes it’s just easier to add it in the string definition itself)

I gave three options for that! My favorite is

val text = "
  I want this text
  to be indented
  by "   ".
".indentBy("   ")

The thing with the indentation-based text delimiters is that they are extremely reliable, so we don’t have to worry about allowing trailing commands. (The others work technically, but we need to think a little more carefully about whether it’s confusing and which cases work.) And since it’s all const values, one could modify the compiler’s const string handling to create the appropriate const string from the method. Another advantage of indentBy is that it’s a generally useful thing to have. For instance, if I had indentBy, I would use it to create properly indented strings for stripMargin which I would then paste in :laughing:
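For illustration, here is a minimal runtime sketch of what an indentBy helper might look like. The object name IndentSketch and the choice to leave empty lines unprefixed are assumptions made for this example, not part of any proposal; the version discussed above would be evaluated by the compiler on constant strings.

```scala
object IndentSketch {
  // Hypothetical helper: prepend `prefix` to every non-empty line of `text`.
  // Empty lines are left untouched so no trailing whitespace is introduced.
  def indentBy(text: String, prefix: String): String =
    text
      .split("\n", -1) // limit -1 keeps trailing empty segments
      .map(line => if (line.isEmpty) line else prefix + line)
      .mkString("\n")
}
```

Calling `IndentSketch.indentBy(text, "   ")` on an already dedented string would then give every non-empty line a three-space prefix.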

But I agree that the HEREDOC approach is a good choice also.

For what it’s worth, I encoded two de-indentation routines and they both feel very usable to me. They could probably be pushed to compile-time in a macro, but I haven’t bothered yet.

Both dedent to min indentation:

```scala
scala> """
     |     This is an example string.
     |   It's indented, but we'll fix that.
     |     All done!
     | """.dedent()
val res0: String = "  This is an example string.
It's indented, but we'll fix that.
  All done!
"
```

and a flavor with an explicit margin marker:

```scala
scala> val string = """
     |    |
     |      This is my string
     | 
     |    you had better love it
     |    because I do
     |  """.demargin()
val string: String = "  This is my string

you had better love it
because I do"
```

Tweaks to the details (e.g. terminal newline kept vs discarded) are also pretty easy.
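A minimal sketch of the first flavor (dedent to minimum indentation), written as an ordinary runtime function. This is not the routine referred to above; the object name DedentSketch is hypothetical, only spaces are counted as indentation, and the choices to drop the newline right after the opening quotes and keep the terminal newline are assumptions chosen to match the first REPL transcript.

```scala
object DedentSketch {
  // Strip the smallest common leading-space count from all non-blank lines.
  def dedent(raw: String): String = {
    // Drop the newline that directly follows the opening delimiter.
    val body  = raw.stripPrefix("\n")
    val lines = body.split("\n", -1)
    // Indentation is measured on non-blank lines only (spaces, not tabs).
    val indents = lines.filter(_.trim.nonEmpty).map(_.takeWhile(_ == ' ').length)
    val margin  = if (indents.isEmpty) 0 else indents.min
    // Whitespace-only lines become empty; other lines lose `margin` columns.
    lines
      .map(line => if (line.trim.isEmpty) "" else line.drop(margin))
      .mkString("\n")
  }
}
```

Switching between keeping and discarding the terminal newline only requires trimming one trailing "\n" from the result.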

Anyway, I’ve used both a bit, and they’re both really easy to get the hang of. I think, therefore, that the bikeshedding of the details is relatively unimportant. And we can get the job done with triple quotes already. So I think the thing to do is to get any flavor of the proposal in (with " which I think has maximal usability).

I hadn’t considered that, but this SIP could also improve the repl and other debug outputs:

scala> List("Hello", "World").mkString("\n")
val res0: String = "Hello
World"

Could become:

scala> List("Hello", "World").mkString("\n")
val res0: String = '''
  Hello
  World
'''
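A rough sketch of how a printer could produce that form. The name ReplRenderSketch.render is hypothetical, the two-space indent is an assumption, and the sketch ignores values that themselves contain ''' as well as escaping inside single-line strings.

```scala
object ReplRenderSketch {
  // Render single-line values in double quotes and multi-line values
  // as an indented block between ''' delimiters.
  def render(value: String, indent: String = "  "): String =
    if (!value.contains("\n")) "\"" + value + "\""
    else {
      val body = value
        .split("\n", -1)
        .map(line => if (line.isEmpty) line else indent + line)
        .mkString("\n")
      "'''\n" + body + "\n'''"
    }
}
```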

As a tree-sitter-scala maintainer, I’m not thrilled with the idea of adding yet another string literal for relatively narrow, niche use cases. We already have double-quoted string literals and triple-quoted ones. A new multi-line string literal requires changing the C scanner code in https://github.com/tree-sitter/tree-sitter-scala/blob/master/src/scanner.c, which I’m not looking forward to.

If at all possible, I request that we try to solve without expanding the lexical syntax of Scala language. For example, I’m +1 on @Ichoran’s dedent().

    val x = """
    i am cow
    hear me moo
    """.dedent

Alternatives considered section says:

A macro-based .stripMarginMacro could avoid the issue with composition of interpolators mentioned above, but would still suffer from not being literal types (or literal string expressions), and also would not work in pattern matching.

While that might be true for user-land post-typer macros, when you’re the Scala 3 compiler you can substitute the AST pre-typer and/or treat """.dedent as an alternative closing token. That would have the lexical shape of current Scala 3.x code, but can behave exactly how you want the new syntax to behave.

Special note on more-than-three '. Please don’t do the extended thing. Every editor would have to implement some sort of stateful parsing, and we will all get it wrong.

Thanks for your feedback Eugene!

Regarding Tree-Sitter-Scala, I have taken a first pass at implementing the triple-single-quoted strings in the tree-sitter grammar ([WIP] Add support for de-dented `'''` strings in tree-sitter grammar by lihaoyi · Pull Request #477 · tree-sitter/tree-sitter-scala · GitHub). You are right that it took some wrangling of C code to make it work. So far the implementation is a proof-of-concept and is not as strict as it could be - e.g. it doesn’t enforce the starting/ending newline - but it does implement the extended delimiter syntax in the proposal and includes tests that seem to pass.

I must admit that I’m new to both C and Tree-Sitter, so the implementation is somewhat messy and likely not as neat as it could be. But as a start I think it shouldn’t be too difficult to clean it up and get it into a merge-able state. Hopefully implementing this in other parsers will be similarly tractable

I have also implemented proof-of-concept IDE support for the new extensible ''' syntax in IntelliJ IDEA.

I’m just a user, so I don’t know the impact on parsing or on the number of lines in the Scala spec.

First, I like the idea of being able to just copy/paste JSON or Markdown/AsciiDoc, plus all the explanation given in the SIP about why stripMargin is such a nexus of complexity and inconvenience that it generally just isn’t worth it.

I really, really like the possibility of configurable delimiters like '''' or "-- to reach a point where you never need escaping in a multiline string, in a really clearly delimited rectangular area, nicely aligned with the other parts of the source.

I think I prefer the extensible triple single quotes because they bring more consistency between Scala code bases, and we don’t add new, purely accidental divergence with respect to what “Scala code” is.

And after some tests, I hate the multi-line single quote. It keeps the rigidity of previous solutions, leading to new corner cases.
So instead of something that can be explained very simply to everyone (“for multiline, use ''' on its own line. The text indent starts at the first quote column. Add more quotes if your text has a triple single quote in it”), we are back to “and you need to escape when the quote starts the line, but not if the quote is further on” etc.

If the solution still has corner cases, I think it really isn’t worth the added complexity. (I hope I didn’t miss some workaround in the thread that would lead to a working general solution, both with and without meaningful whitespace, but the fact that such complexity is needed is a red flag in itself.)

And it’s so unpleasant to parse long multi-line strings without syntax coloring or in a too-small text box that reflows the text!
Reviewing a PR with that on a smartphone on GitHub will be extremely confusing.

So please, just go with what Li Haoyi wrote in the SIP.

Huh? “To create a multi-line quote, put a quote at the end of the line and open a new block. The un-indented block is your string. Close the block with another quote.”

How is that full of corner cases? You have no quoting at all.

The ''' version has a corner case in that it can’t have an embedded '''. You have to detect that and make sure you open and close with extra quotes.

The " version with auto dedent has a corner case in that you can’t have all your lines start with whitespace.

Otherwise they’re corner-case-free. (The " version without auto-dedent does have corner cases. But the extended delimiter solution doesn’t work as well for that.)

So, if you paste code literals, then if you want all the lines indented then you should prefer a non-auto-dedent solution which basically has to be ''', and otherwise you should prefer " because there is no “pasting string literals corner case”. The extra indentation fixes it.

// This is a broken corner case
val indentedMultiLineStringExample = '''
  val str = '''
    I am indented
    by two spaces
  '''
  println(str)
  '''

// This is also a broken corner case but broken differently
val exampleIndentedMultiLineString = "
  val str = "
    I am indented
    by two spaces
  "
  println(str)
"

The first one botches the string ending, and the second one claims to be quoting code with an indented string but the string isn’t actually indented. The fixes are

val workingA = ''''
  val str = '''
    I am indented
    by two spaces
  '''
  println(str)
  ''''

val workingB = "
  val str = "
    I am indented
    by two spaces
  ".indentBy("  ")
  println(str)
"

You can ask for '''' on its own line to open the string as well, but fundamentally you can’t wrap code with multi-line string literals at the same level.
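The "open and close with extra quotes" rule for the ''' version can be sketched as a small helper that picks a delimiter longer than any run of single quotes in the content. The names DelimiterSketch, longestQuoteRun and delimiterFor are hypothetical, and the rule "one more quote than the longest run, minimum three" is an assumption about how one would choose the delimiter, not something the proposal mandates.

```scala
object DelimiterSketch {
  // Length of the longest run of consecutive single quotes in `content`.
  def longestQuoteRun(content: String): Int = {
    val (best, _) = content.foldLeft((0, 0)) { case ((best, current), ch) =>
      val next = if (ch == '\'') current + 1 else 0
      (math.max(best, next), next)
    }
    best
  }

  // Choose a delimiter that cannot appear inside `content`:
  // at least three quotes, and one more than the longest run found.
  def delimiterFor(content: String): String =
    "'" * math.max(3, longestQuoteRun(content) + 1)
}
```

For the quoted-code example above, the content contains a run of three quotes, so this sketch picks the four-quote delimiter used in workingA.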

I rounded off this round of POCs with an implementation of ''' strings for Scala-VsCode-Syntax. We now have proof-of-concept implementations that support extensible ''' string delimiters in all major Scala IDEs.

From the experience implementing these three syntax highlighters, implementing support for ''' and extensible delimiters was straightforward (it took ~1 afternoon for all three prototypes) but implementing the various indentation-based single-" syntaxes proposed by @odersky or @sjrd in syntax highlighters seems pretty difficult:

  1. IntelliJ uses a CFG-like grammar for code-generation. AFAICT it does not have support for scanning the lines and taking a num_quotes_on_line % 2 == 1 check for termination, nor does it support indentation-based lexing

  2. TreeSitter-Scala’s lexing of strings relies on external scanners, which allow you to customize lexing once a token has started, but AFAICT do not allow you to “lookbehind” to find the indentation of the current line to use for lexing purposes (although it can capture the current column position of the opening quote, if we wish to use that as the indentation to delimit the string)

  3. VSCode relies on Textmate Grammars, which are constrained to using regexes for lexing and again cannot easily do modulo-arithmetic on the number of quotes on a line, nor can they look behind and capture indentation

While arbitrarily sophisticated lexing strategies can be implemented inside the Scala compiler that uses stateful imperative Scala code to perform lexing and parsing, we need to make sure that we do not break downstream tools that are typically reliant on more constrained syntax models. As far as I can tell, the sophisticated lexing strategies proposed to make multi-line single-" strings work without verbose escaping of "s seem incompatible with the lexing frameworks used by most downstream tools

Perhaps someone with more experience with fancier lexing strategies can take a crack at it and prove me wrong, but the proposed single-" syntaxes are pretty unusual and do not have much precedent in programming language syntaxes I could follow: while many languages have indentation-based parsing for combining tokens into nested blocks, I did not find any with indentation-based lexing or anything similar to the num_quotes_on_line % 2 == 1 approach that I could mimic for building up the tokens in the first place. The ''' syntax was much simpler because it could be lexed without any care for indentation or other concerns, with the indentation-processing and stripping and other syntax checks being done in a follow-up phase that does not affect the initial lexing at all
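To illustrate why the ''' form is easy on constrained lexers, here is a small sketch of an indentation-agnostic scan: count the opening quotes, then search for the next run of the same length. It is not taken from any of the prototypes mentioned above; TripleQuoteLexSketch and lexEnd are hypothetical names, and the sketch accepts the first run of at least that many quotes as the closing delimiter.

```scala
object TripleQuoteLexSketch {
  // Given source text and the index of the first opening quote, return the
  // index just past the closing delimiter, or -1 if the literal is not a
  // well-formed ''' string. No indentation or column tracking is needed.
  def lexEnd(src: String, start: Int): Int = {
    val openLen = src.drop(start).takeWhile(_ == '\'').length
    if (openLen < 3) -1
    else {
      val delimiter = "'" * openLen
      val close     = src.indexOf(delimiter, start + openLen)
      if (close < 0) -1 else close + openLen
    }
  }
}
```

Dedenting and validation of the opening and closing lines would then happen in a later pass over the token's text, as described above.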

Just my 2 cents here.

If the overall effort for syntax in Scala 3 is to make it easier for engineers using other languages to start using Scala, shouldn’t we go the way other languages already solve this issue?

Another point: let’s not break the tooling. Let’s go the way it is easier.

So I think I would choose the proposal of @lihaoyi

Hello from the Scala Plugin team.

In principle, the idea of having the indentation-based (“Dedented”) multiline string literals makes sense. Once the final syntax is decided, I would also give it an extended preview phase. A lot of issues with the existing syntax of multiline string literals became clear only after some years of using them.

Speaking of IDE, the change inside IDEs would not be that hard in principle.
However, it will require quite some effort to fix all the functionality that depends on it.
Note, it’s not only about reading/parsing/lexing and highlighting the code.
There are many other things, use cases, IDE features that should be kept in mind.

Here is a non-exhaustive list of things that we should keep in mind and that would need to be adapted:

  • Auto-indent on Enter/Delete/Backspace
  • Insert/delete complementary quote
  • Don’t modify the contents on Format
  • Don’t add .stripMargin
  • Intention: convert between different literal kinds
  • Copy/Paste translation
  • String interpolation
  • Language injection
  • Additional inspections to help users (preferable to lexer errors)
  • Else?

We don’t have a very strong opinion on the particular syntax in the team.

However, the proposed '''...''' seems quite reasonable in practice.
It has the least number of drawbacks and edge cases, it’s relatively easy to support and to comprehend when reading.

At first I was leaning towards the version with a single quote ". But it seems to introduce too many edge cases. In practice the overloading of ' semantics shouldn’t bring much cognitive load (again, compared to alternatives), especially when reading the code. What users will see is:

'''
   SOME CONTENT
''' 

all highlighted in green.
It will be hard to confuse it with a character 'A'.

As for other alternatives, it would be nice if the change was solely syntactic and did NOT require resolution/any compiler phases. So I would avoid solutions with some new special interpolators or macro methods.

Could you expand on what you think the edge cases are? Some of the proposals introduce edge cases simply because they are trying to be more conservative: “use it the way we think looks right”.

If you follow the natural rules for multi-line single-quote literals, the edge cases seem minimal–in fact I can’t think of any edge case for " that isn’t identical to an edge case for '''.

The worst edges are, I think, in embedded multi-line code blocks. But these are the same either way.

For example:

// Before refactor
def dayOfWeek =
  val formatter = java.time.format.DateTimeFormatter.ofPattern("EEEE")
  java.time.LocalDate.now().format(formatter)
def sayHello(name: String): Unit =
  println(s"Hello $name!\n  Happy $dayOfWeek!\nHave a wonderful day!")

// After refactor
def sayHello(name: String)(using Logger): Unit =
  val message = '''
    Hello $name!
      Happy ${
  val formatter = java.time.format.DateTimeFormatter.ofPattern("EEEE")
  java.time.LocalDate.now().format(formatter)
    }!
    Have a wonderful day!
    '''
  println(message)
  log(message)

Is this okay? I have no idea. What if we use multi-line quotes inside? May we, or may we not? If not, how do we quote code? If we do, how do we avoid interpreting $-statements literally rather than as quoted? Or is that not a goal?

This, to me, seems like the hard stuff. If this, in any guise, can be accomplished successfully, then surely we can figure out where the quote starts and ends.

Note that there are already perhaps-surprising limitations in single-quoted text:

[error] unclosed multi-line string literal
[error]     val x = "Hi ${""" x """}"

Another suggestion that we would like to bring into the scope of this SIP: let’s normalise the line endings in the new kind of multiline string literals and always use \n.
(see https://youtrack.jetbrains.com/issue/SCL-19643)
This is probably the only right moment when this change can be done.
Right now the literal type of a multiline string can depend on the OS or git repo checkout settings.
So this code can be valid on some machines and invalid on others:

val x: "a\nb" = """a
b"""
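As a sketch of the normalisation being suggested, the function below maps Windows-style line endings to \n so that the same source text yields the same string regardless of checkout settings. NewlineSketch is a hypothetical name, and also converting a lone \r is an assumption; in the actual proposal this would happen inside the compiler when the literal is read, not at runtime.

```scala
object NewlineSketch {
  // Normalise line endings: "\r\n" first, then any remaining lone "\r".
  def normalize(text: String): String =
    text.replace("\r\n", "\n").replace("\r", "\n")
}
```

With this applied, a file checked out with CRLF endings and one checked out with LF endings produce the same value for the literal above.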

Thanks for the feedback @unkarjedy, that is indeed an issue that I encounter frequently, and this is a great chance to try and resolve it for good. I added a section normalize newlines/Newline Confusion to the proposal

Why does everybody want some hard-to-reason-about auto-margin-stripping?

Maybe there is a simpler way, for example using a """ prefix (with a space after the quotes) to mark the start of each line, like:

val multiLineText = 
  """ This is the first line.
  """ Second line works fine too
  """    Third line deserves 3 extra leading spaces.
  """ We can use even special substrings without escaping, like """!!!
  """ 
  """ Blank lines are OK too.
//  """ This line is commented out! Normal commenting works!
  """ Best part - there is no need to calculate how many leading spaces have to be stripped!
  """ Though block comments like `/*` would not work - they would be part of the `multiLineText` value.
  s""" Sadly, marking a line for interpolation breaks the nice alignment a bit, unless one marks all lines with the `s""" ` prefix :) 

This is very similar to JEP 467: Markdown Documentation Comments.
Similar proposal (by me) for ScalaDocs: Support /// as ScalaDoc indicator same as in https://openjdk.org/jeps/467.

I understand that there are implementation difficulties in retrofitting single quotes for multiline strings, but I think it’s worth it if what we get is a single " syntax to cover all string literal use cases.
Over the years, even triple quotes (""") could be deprecated, continuing the effort to make Scala 3 a smaller language.

Please don’t introduce another ''' syntax :folded_hands:. I’d rather live with the current

"""{
  |  "foo": "bar"
  |}""".stripMargin

than have yet another syntax (which would use a character (') currently related to Char, not String).

About the argument that it’s normal for other languages to define several different ways to create multiline strings, I’d cite the Zen of Nim’s first principle:

Copying bad design is not good design.

While I’m not strictly opposed to the ''' syntax for dedented strings, I cannot help but call out that the choice doesn’t seem exactly “logical”. I mean, for those who have been with Scala for a long time, it might look OK-ish. However, imagine a newcomer who has just been introduced to all the variety of string literals in Scala:

  • '...' – single chars only
  • "..." – single-line strings
  • """...""" – multi-line strings, not dedented (i.e. “raw”)
  • and now '''...''' – multi-line, dedented

I’m pretty sure that for many newcomers the first question will be “Why? What is the link between ' and '''?” And there’s no simple answer unless they dive deep into the Great (or Sad?) History of String Literals in Scala.

This now has a PR implementation in SIP-72: WIP dedented triple-quoted string literals by lihaoyi · Pull Request #24185 · scala/scala3 · GitHub
