…or GitHub issues/PR descriptions, readme.md, and other places. This is mentioned in the proposal as a downside, and I think it’s probably a blocker.
One example where the “terminate parsing only when there’s a single " on that line” rule would cause problems is the following:
val lines = "
this is the
expected result
".split("\n")
This is a pretty common pattern, and I’ve written similar code many times. In this case, the " would not be recognized as a multiline string terminator because of the "s in the following "\n".
True. You’d have to write
val lines = "
this is the
expected result
"
.split("\n")
If we want to allow
val lines = "
this is the
expected result
".split("\n")
we could generalize the rule further and say a " terminates a multi-line string if
- it is only preceded on the same line by whitespace, and
- it is followed by an even number of " on the same line (including 0)
That would leave a program with two multi-line strings separated by , or whatever as the only false negative.
Instead of the spec language above one could then explain it like this: you can’t have a multi-line string literal start on the same line where another one ends. That’s easy to remember.
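To make the generalized rule concrete, here is a minimal sketch in current Scala of a per-line check. The object and method names are hypothetical, and the quote counting is deliberately naive: it assumes no escaped \" on the closing line, which a real lexer would have to handle.

```scala
object TerminatorRule {
  /** True if `line` would close a multi-line string under the generalized rule:
    * the first `"` is preceded only by whitespace, and the rest of the line
    * contains an even number of further `"` characters (zero included).
    * Assumption: escaped quotes are not treated specially here.
    */
  def closesMultiline(line: String): Boolean = {
    val trimmed = line.dropWhile(_.isWhitespace)
    trimmed.startsWith("\"") && trimmed.drop(1).count(_ == '"') % 2 == 0
  }
}
```

Under this check, a line reading ".split("\n") closes the string (two further quotes), while a line reading "," does not (one further quote), which is the false negative described above.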
So the following is considered to be a single parameter application
foo(
"
this is
","
not a drill
"
)
And this two parameter application
foo(
"
this is
",
"
not a drill
"
)
Did I understand you correctly?
Yes, that’s what I meant.
Another limitation of a single " that I thought of is that it makes any string you start typing greedy for the parser, so everything is colored like a string until, somewhere further down the code, it hits a new ".
For this reason I think we should mandate that a multi-line single " be followed only by whitespace up to the newline. This means that the start of a multi-line string is " + newline (with optional whitespace between them).
The first line break will not be considered part of the string.
Examples:
print("hello\n")
print("
world
")
print("goodbye")
Printout:
hello
world
goodbye
This will be a parser error, as in Scala today:
print("my
attempt
")
In the proposal, neither the leading nor trailing newline are included in the string. This is in line with implementations in other languages (C#, Swift, Ruby), so your example would print
hello
world
goodbye
Also, the existing proposal already mandates that the opening delimiter is followed by a newline and nothing else
The existing proposal for a single "? Sorry, I missed that.
For the ''' syntax, which I suppose forms the basis for the alternate multiline " syntax.
Why not simply insist that the indented text be deeper?
If you have a trailing " on a line, this opens a multi-line string, stripped to the level of indentation of the first (nonempty) line.
The string terminates at the first instance of " starting at the indentation depth of the line that opened the multi-line string.
This allows all manner of splits, closing and reopening, and so on, in a way that is irregular but still visually very clear:
val addresses = Array(
"
123 Main Street
Boise, Idaho
", "(none)", "
P.O. Box 432
Tinyville, MiddleOf, Nowhere
"
)
Awkward, not the best form, but also still pretty clear what it is supposed to mean (edit: at least now that I’ve fixed my indentation typos…).
The only thing this would prevent is quoting indented paragraphs:
val quote = "
As she pushed off, she knew that
no one would see this sandbar again.
"
If we really wanted to allow that, we could also have the bare-quote-with-nothing-else rule with flat (same-level indent) parsing:
val quote =
"
As she pushed off, she knew that
no one would see this sandbar again.
"
But I think the cleanest and most intuitive basic rule is: open with single quote, close with single quote at the indentation level of the line it started on, the text inside has the indentation removed, and you stop when you hit the quote at the correct indentation level (and it’s an error if it’s indented less).
If you want a fancier parser, you can capture the whole set of lines and dedent the minimum indent level of the block; that would allow the quote above too. But if you want to go line-by-line, I think the no paragraph indents rule is fine.
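The two dedent strategies can be sketched in current Scala roughly as follows. All names are hypothetical, indentation is counted in spaces only, and reporting a less-indented line as an error in the line-by-line variant is my assumption about how that rule would behave.

```scala
object Dedent {
  // Number of leading spaces (assumption: tabs are not considered).
  private def indentOf(line: String): Int = line.takeWhile(_ == ' ').length

  /** "Fancier" variant: capture all lines, then strip the smallest
    * indentation found on any non-blank line.
    */
  def dedentMin(lines: Seq[String]): Seq[String] = {
    val nonBlank = lines.filter(_.trim.nonEmpty)
    val depth = if (nonBlank.isEmpty) 0 else nonBlank.map(indentOf).min
    lines.map(l => if (l.trim.isEmpty) "" else l.drop(depth))
  }

  /** Line-by-line variant: strip the indentation of the first non-blank line.
    * A later non-blank line that is indented less is reported as an error.
    */
  def dedentFirst(lines: Seq[String]): Either[String, Seq[String]] = {
    val depth = lines.find(_.trim.nonEmpty).map(indentOf).getOrElse(0)
    lines.find(l => l.trim.nonEmpty && indentOf(l) < depth) match {
      case Some(bad) => Left(s"line indented less than $depth spaces: $bad")
      case None      => Right(lines.map(l => if (l.trim.isEmpty) "" else l.drop(depth)))
    }
  }
}
```

With a paragraph whose first line is indented deeper than the rest, dedentMin keeps the extra indent on the first line, while dedentFirst rejects the block. That is the "no paragraph indents" trade-off.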
The only thing that is missing then is
val indented =
"
I really do mean
to print this out
indented two spaces
"
But given that this is easily solved by an .indent method, I don’t think it’s a major barrier.
(Note: Java has indent since 12; if we need to support 11 or less, we could add an indentWith(s: String) method that would both allow more flexibility and avoid the name collision with later Java.)
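For illustration, such an indentWith could look roughly like the extension below. This is only a sketch: the wrapper names are made up, and leaving empty lines unprefixed is my own choice, not something settled in this thread.

```scala
object IndentSyntax {
  // Hypothetical extension; `indentWith` is not part of the standard library.
  implicit class IndentOps(private val self: String) extends AnyVal {
    /** Prefix every non-empty line with `prefix`, keeping line breaks as they are. */
    def indentWith(prefix: String): String =
      self
        .split("\n", -1) // limit -1 keeps trailing empty segments
        .map(line => if (line.isEmpty) line else prefix + line)
        .mkString("\n")
  }
}
```

After `import IndentSyntax._`, the indented example above could be written as a flush-left string followed by `.indentWith("  ")`.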
How about using this fact for the benefit of better clarity and disambiguation?
When a multiline string opens with ", it can allow non-whitespace characters XYZ until EOL. Then XYZ must occur right before the closing " in order to close the string. Any occurrence of a single " without a preceding XYZ is not considered a closing one and therefore doesn’t require escaping.
For example:
val json =
"---
{
"one": 1,
"ruler": " --- " // it's ok since `---` not followed by `"`
}
---".toJson
def openingParagraph = "#-#
one dark and stormy
night, he said
"...i am cow
hear me moo"
#-#"
// Any sequence of chars after opening " can be used
// therefore `[expected]` will be effectively removed from the string.
assertEquals(
"[expected]
this is the
expected result
[expected]",
computation()
)
A caveat: the XYZ sequence cannot contain " itself because that would make it a single-line string.
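A rough sketch of how the closing rule could be checked, written in current Scala over a list of source lines. The names are hypothetical, and I am assuming the closing XYZ" has to start its line (after indentation), as it does in the examples above.

```scala
object TaggedString {
  /** Given source lines starting at the opening line ("XYZ), return the string
    * body, i.e. the lines up to the first line that starts with XYZ".
    * Returns None if the tag contains a quote or no closing line is found.
    */
  def body(lines: Seq[String]): Option[String] =
    lines.headOption.map(_.trim).filter(_.startsWith("\"")).flatMap { opener =>
      val tag = opener.drop(1)
      if (tag.contains('"')) None // the caveat: XYZ cannot contain "
      else {
        val closer = tag + "\""
        val rest = lines.tail
        val end = rest.indexWhere(_.trim.startsWith(closer))
        if (end < 0) None else Some(rest.take(end).mkString("\n"))
      }
    }
}
```

In the json example, the inner " --- " never closes the string because no line starts with ---", so the body ends only at the ---".toJson line.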
Would it be too crazy to implement?
I think it could work! It would be similar to the HEREDOC from Ruby. With proper choice of header, such as the "--- ---" you demonstrated, it looks great too. And it entirely removes the need for escaping, since the header/footer strings are customizable. And it doesn’t require a new ''' quote style to be introduced.
Unless XYZ equals the double quote "", in which case it is interpreted as the regular triple-quoted multiline string definition.
I think this proposal is nice, since there is always some combination XYZ that is not part of the string to be defined.
If so, we could still opt for my third option: always close a multiline string started with a single quote with a triple quote (with internal triple quotes to be escaped).
It’s nice, but doesn’t it make things worse by creating new flexibility of headers? Why not just choose "-- --"? And if we do this, how is it really different from the original proposal?
I guess it could become just one string literal " ... " that covers both single-line and multiline strings. And it would allow customizing how each string is closed based on its content, which at that moment is known only to the user. Therefore, if chosen properly, little to no escaping would be necessary.
Just to clarify: additional XYZ are not required for multiline strings:
// https://en.wikipedia.org/wiki/What_a_piece_of_work_is_a_man
val simpleQuote = "
What a piece of worke is a man! how Noble in
Reason? how infinite in faculty, in forme, and mouing
how expresse and admirable in Action, how like an Angel
in apprehension, how like a God?
"
However, they can be added depending on the string content, when necessary:
val scalaMultilineStringsDoc =
"~~~
Scala allows to define multiline strings. Example:
val jsonText = "---
{
"one": 1,
"two": 2,
"three": 3
}
---"
Any character sequence can be used after the opening `"`.
This sequence will be used to identify the closing `"` and differentiate it from any other `"` that may occur within the string.
~~~"
Yes, this would be an unfortunate but necessary exception. I guess that if a better multiline string approach were implemented, the triple-quoted strings could be deprecated and then decommissioned over time.
I really don’t think this is necessary with indentation-aware parsing.
val scalaMultilineStringsDoc = "
Scala allows defining of multiline strings. Example:
val jsonText = "
{
"one": 1,
"two": 2,
"three": 3
}
"
A multiline string begins with a trailing, non-matching quote
at the end of a line. Subsequent lines must be indented
more deeply than the line on which the trailing quote started.
The multiline string is ended by a single quote at the indentation
depth of the line on which the trailing quote was given.
The indentation level of the first nonempty line after the quote is
removed from every line. Thus, the example above is equivalent to
val jsonText =
"""{""" + "\n" +
""" "one": 1,""" + "\n" +
""" "two": 2,""" + "\n" +
""" "three": 3,""" + "\n" +
"""}""" + "\n"
"
If we really really want to be able to specify the indentation depth, we could use <- on the start line to mark where the boundary is, with the start-quote on its own line
val indented =
" <-
This is indented two spaces.
This is flush with the left.
"
val disallowed =
" <-
This is indented -2 spaces! Compiler error!
This is okay (it starts with 2 spaces)
"
Because <- is a reserved symbol, there is very little risk of it accidentally appearing there as an unquoted string. And if you want to include a literal <-, you just do it like normal in the block of text.
This is cleaner than delimiter marks. We already have spaces to act as delimiter marks! The problem with delimiters is that they require you to be aware of the content–if the content changes (e.g. you’re quoting code), then you can mess up your string. Indentation, in contrast, only requires you to indent properly, which you already committed to doing by using an indentation-aware format. “Copy, and hit tab until it’s deeper” is pretty straightforward.
If we are going to do this, we should pick a small number of characters that are okay, and probably limit it to repeats of the same character so we don’t have ASCII-art header/trailers. # is common, but I agree that - and ~ look cool too. I don’t think #-# is a good idea. Anything that looks too fancy suggests that it has meaning, which it doesn’t.
This is a bit reminiscent of Perl’s string quoting operators. These start with q and are followed by a special character which is the string delimiter. q"moo", q'moo', q#moo# etc. all mean the same thing, e.g. q(foo) or q{foo}. There’s an additional twist for parenthesis-like characters: a string literal started with an opening paren is closed with the corresponding closing paren. I rather like that too.
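As a small illustration of that delimiter pairing, here is a sketch in current Scala. The names are hypothetical, and unlike Perl it does not handle nested brackets inside the literal.

```scala
object QuoteLike {
  // Bracket-like openers and their matching closers; any other
  // delimiter character closes itself.
  private val pairs = Map('(' -> ')', '[' -> ']', '{' -> '}', '<' -> '>')

  def closerFor(open: Char): Char = pairs.getOrElse(open, open)

  /** Extract the body of a literal such as q(foo) or q#foo#.
    * Returns None if the literal is malformed. Nesting is not handled.
    */
  def unquote(lit: String): Option[String] =
    if (lit.length >= 3 && lit.head == 'q') {
      val close = closerFor(lit(1)).toString
      val end = lit.indexOf(close, 2)
      // The closer must be the last character of the literal.
      if (end == lit.length - 1) Some(lit.substring(2, end)) else None
    } else None
}
```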