Multiline string literals: can we get rid of the need for stripMargin?

That’s fine. I have no strong preferences for syntax here. I just want the functionality.

The problem is: there aren’t many options…

I actually even proposed a very “ugly” syntax first, which uses the trick C# uses. You would then have a whole row of """""""" in some cases. Ugly but functional.

The syntax needs to be short, I think; it needs to make it clear that a String literal starts, and it needs to be unambiguous. The proposed string: is not unambiguous, not even in combination with interpolation (because that’s an infix call to a function taking a closure).

I actually think it would even work with a single "\n, but that would only add more confusion with :.

No matter the concrete syntax, I think it needs to be an “opening-only” symbol anyway, because the whole point is to not need a closing symbol and just use indentation.

But inventing a third, distinct syntactic option to declare string blocks seems quite odd. Martin said the same actually.

As I said, the design space is quite limited anyway. You basically have two options:

Something that resembles “here document” (here-doc symbol followed by custom delimiter, which needs to be repeated at the end), or using indentation as delimiter (with unindent as closing symbol).

Once the variant is agreed on, the only thing left to discuss is which symbol opens the text block. But if you don’t want to add new syntax just for this use case (which would be bad), there aren’t many options either: it’s " or """.

And using the here-document variant (which is more or less also what C# does) ignores the fact that the language has indentation-based blocks, which is inconsistent. The algorithm for indented blocks would still need to be applied to these text blocks to get rid of stripMargin… The ending symbol is then just redundant noise with no technical reason to exist.
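To make “the algorithm for indented blocks” a bit more concrete, here is a minimal sketch of what such a dedent step might look like. It assumes the incidental indentation is the smallest leading-whitespace width among non-blank lines (roughly the idea behind Java text blocks); IndentationDedent and dedent are hypothetical names, not an existing Scala API.

```scala
object IndentationDedent:
  // Hypothetical helper: removes the indentation shared by all non-blank lines.
  // Whitespace-only lines do not count toward the minimum and come out empty.
  def dedent(lines: Seq[String]): String =
    // Leading spaces/tabs; each character counts as one column
    def indentOf(line: String): Int =
      line.takeWhile(c => c == ' ' || c == '\t').length
    val nonBlank = lines.filter(_.trim.nonEmpty)
    val common = if nonBlank.isEmpty then 0 else nonBlank.map(indentOf).min
    lines
      .map(line => if line.trim.isEmpty then "" else line.drop(common))
      .mkString("\n")
```

Mixing tabs and spaces is not handled specially in this sketch; a real design would have to decide how they compare.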

All the examples so far introduce a val, for which one can already use an optional end marker, e.g., end myJson. As a user, I certainly would not want another (required) end marker to finish the string.

I certainly would want an end marker, because in multiline strings where whitespace matters, I need to be able to specify how much whitespace to include at the end.

Using block scope seems very unreliable for that.

How many syntax highlighters do you think will support this?

I’d rather not have another way of defining a string, and I don’t mind the | and .stripMargin: in fact I’d say it helps me see where the margin is, rather than making me count whitespace and hope it ends up where it should.
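For comparison, this is the status quo being defended: the standard StringOps.stripMargin, which removes leading whitespace followed by a | on each line. The object name and the JSON content are only illustrative.

```scala
object StripMarginDemo:
  // Status quo: the | marks where each line's content begins
  val myJson: String =
    """{
      |  "name": "scala",
      |  "version": 3
      |}""".stripMargin
```

The first line has no |, so stripMargin leaves it untouched; every other line keeps only what follows its |.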

The term “gutter” means the extra margin at the bound edge of pages in a book.

We may adapt that definition to any margin adjustment according to a context, such as the margin of an embedded multiline literal.

So if anyone manages to nerdsnipe a contributor into implementing this feature, that would be a guttersnipe.

(Internet says guttersnipe was “Wall St slang for streetcorner broker” but that seems less likely than the general sense. But “snipe” was a “term of opprobrium” since Shakespeare.)

Edit: this feature would have helped with a test failure where “trailing incidental white space” in a “text block” was helpfully deleted by the text editor. Suddenly, it’s a feature one can’t live without. I think someone asked for \s as well.

My proposal. (Yes, I’ve read the whole preceding discussion and am trying to make a conclusion both from my thoughts before it and the discussion itself.)

The main style

  1. Opening and closing with either ``` or a longer sequence of backticks.

    Why:

    • Opening with " doesn’t fulfil one of the goals – it isn’t quotation mark friendly (a user can’t just trivially paste a text inside without bothering about quotation marks, even with raw).
    • Opening with """ isn’t backward compatible (we can’t just redefine """).
    • Longer sequences of backticks allow shorter sequences of backticks to be contained inside the literal without escaping.
  2. Conservatism:

    • On the line containing the opening backtick sequence, there must be no characters after it, even whitespace ones (otherwise – syntax error).
    • On the line containing the closing backtick sequence, there can be only whitespace characters before it (otherwise – syntax error). They are considered to be an incidental indentation.
    • Every other line must either start exactly with incidental indentation or be empty (otherwise – syntax error).

    Why: it’s easier to start with a conservative style and relax it later (if needed) than the other way around. (Just in case: I’m not opposed to making it more relaxed from the beginning; I just described the bare minimum, which may or may not be extended.) It’s essentially a strict subset of JEP-378.
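The conservative rules above can be checked mechanically. Below is a minimal sketch, assuming the lexer has already found the opening fence and split the following source into lines, the last of which holds the closing fence. MainStyleSketch and parse are hypothetical names, and appending "\n" after every content line is just one possible answer to the first open question.

```scala
object MainStyleSketch:
  // Hypothetical checker for the conservative "main style" rules.
  // restOfOpeningLine: whatever follows the opening fence on its own line.
  // lines: the following source lines; the last one holds the closing fence.
  // Each content line gets a trailing "\n" (an assumption, see open question 1).
  def parse(restOfOpeningLine: String, lines: Seq[String], fence: String): Either[String, String] =
    if restOfOpeningLine.nonEmpty then
      Left("nothing, not even whitespace, may follow the opening fence")
    else if lines.isEmpty then
      Left("missing closing fence")
    else
      val closing = lines.last
      // Whitespace before the closing fence is the incidental indentation
      val indent = closing.stripSuffix(fence)
      if !closing.endsWith(fence) || !indent.forall(_.isWhitespace) then
        Left("only whitespace may precede the closing fence")
      else
        val content = lines.init
        // Every other line must be empty or start exactly with the indentation
        content.find(l => l.nonEmpty && !l.startsWith(indent)) match
          case Some(bad) => Left(s"line lacks the incidental indentation: $bad")
          case None      => Right(content.map(l => l.drop(indent.length) + "\n").mkString)
```

Because every violation is reported as an error rather than silently accepted, relaxing any single rule later would not change the meaning of literals that were already valid.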

The alternative (unclosed) style

I like the Mateusz Kowalewski proposal for text blocks. But, in my opinion, it shouldn’t be implemented alone, since it’s too radical. It’s exactly the same case as with code blocks: we have both {+} and :+unindent, not just :+unindent, so the same could be done with incidental-indentation-aware string literals: ```+``` and :+unindent, not just :+unindent.

For the opening sequence, I would choose something like ```: (not ": or string:), just for consistency with the main style. I would also allow longer sequences of backticks (e.g. ````:) with exactly the same meaning (unlike the main style, where the number of opening backticks determines the maximal number of consecutive backticks allowed inside the literal, here it affects nothing).
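For the main style, the fence length matters: a fence of n backticks allows runs of at most n - 1 backticks inside. A small sketch of how a tool (say, a formatter) might pick the shortest valid fence for some content; FenceLength and its methods are hypothetical, and the minimum of three backticks follows the proposal.

```scala
object FenceLength:
  // Length of the longest run of consecutive backticks in the content
  def longestBacktickRun(content: String): Int =
    // foldLeft carries (length of the current run, longest run so far)
    content.foldLeft((0, 0)) { case ((current, best), c) =>
      val next = if c == '`' then current + 1 else 0
      (next, math.max(best, next))
    }._2

  // Shortest fence (never fewer than three backticks) that is longer than every run inside
  def minimalFence(content: String): String =
    "`" * math.max(3, longestBacktickRun(content) + 1)
```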

Open questions (not resolved here)

  1. Should these literals

    val a = ```
    hello
    world
    ```
    
    val b = ```:
        hello
        world
    

    be equal to "hello\nworld" or "hello\nworld\n"?

    I would personally choose "hello\nworld\n" (for me it’s less surprising, more often used in practice and more similar to Java), but "hello\nworld" can be advocated as well (in particular, it is the C# design choice).

  2. Should trailing whitespace characters (on every line) be stripped or not?

    I personally wouldn’t strip (in some contexts, e.g. Markdown, trailing whitespace is significant, and in contexts where it isn’t, keeping it shouldn’t significantly harm), but stripping can be advocated as well (Java strips).

    (The cautious way would be to treat any trailing whitespace as syntax error, unless in incidental indentation or expressed by other means like \u0020/${" "}/etc, but it’s probably too conservative even for the beginning.)

  3. Should line breaks be normalized and, if yes, to which style?

    I personally would normalize to scala.util.Properties.lineSeparator, but, again, it’s debatable. (And if by chance it’s decided not to normalize, then there probably should be compile-time warning or error when they’re non-uniform.)

  4. Should we allow fewer backticks? I.e. we could use `` without clashing with existing syntax and, for the alternative (unclosed) style, even `:.

    I’d personally allow `` (and ``:) but not `:.
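Open questions 1 to 3 are independent of each other, so they can be pictured as separate switches applied after the incidental indentation has been removed. A minimal sketch, assuming line breaks are normalized to LF; OpenQuestionOptions, render and both flags are hypothetical. With the preferences stated above, one would call it with finalNewline = true and stripTrailing = false.

```scala
object OpenQuestionOptions:
  // Hypothetical post-processing of an already dedented literal body.
  def render(body: String, finalNewline: Boolean, stripTrailing: Boolean): String =
    // Question 3: split on CRLF, CR or LF, i.e. normalize every line break to LF
    val lines = body.split("\r\n|\r|\n", -1).toSeq
    // Question 2: optionally drop trailing spaces and tabs on every line
    val cleaned =
      if stripTrailing then lines.map(_.replaceAll("[ \t]+$", "")) else lines
    // Question 1: optionally terminate the last line with a newline as well
    val joined = cleaned.mkString("\n")
    if finalNewline then joined + "\n" else joined
```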

I don’t know how you even managed to write this proposal given how deeply backticks are embedded within the world’s documentation and even AI skills now. For this alone I would not consider it ever, and would choose the status quo over it.

I am not sure it was mentioned here, but the SIP “SIP-XX: Dedented Multiline String Literals” (see post #103 by Sporarum) has already been merged and will be released as part of 3.10.

To be honest, that was the least significant part of the proposal. The ''' variant just didn’t come to mind (I drew on the current discussion, where only ", """ and ``` were mentioned). My proposal can freely be re-read with every backtick replaced by an apostrophe, without any loss of meaning (and should probably be read that way if the apostrophe is preferred).

The more important thing is that, as Mateusz Kubuszok said, a major part of what was described in this proposal is already implemented (partially in the same way, partially differently).

Good to know.

Surprisingly, it’s implemented almost exactly in the way I wanted (except that the line breaks are normalized to just LF [on the other hand, I’ve just realized that compile-time normalization to scala.util.Properties.lineSeparator, which is unknown at compile time, is simply impossible]).
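Since the literal can only carry LF, code that really needs the platform’s separator has to convert at runtime. A minimal sketch using scala.util.Properties.lineSeparator (the value mentioned above); PlatformLineBreaks and toPlatform are hypothetical names.

```scala
import scala.util.Properties

object PlatformLineBreaks:
  // Converts an LF-normalized string to the runtime platform's line separator
  def toPlatform(lfText: String): String =
    lfText.replace("\n", Properties.lineSeparator)
```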

Except that the Mateusz Kowalewski proposal for text blocks (which would then probably start with ''':) is not implemented. But that’s not my personal preference anyway (I just tried to advocate for and summarize everything I found theoretically useful in this discussion).

I like how it makes it look like I’m the one behind that work!
(I’m not, I just posted a small summary I cobbled together)