Syntax highlighting inside custom string interpolators

Sporarum · January 16, 2025, 11:06pm

Since there is a lot of talk about a new syntax for collection literals, I thought it might be time to bring up something I have been thinking about a lot lately:

It’s not fun to program inside string interpolators

Let me explain: in the JS world, there is JSX, which allows things like this:

handleOnClick = () => {
  console.log("clicked");
};

button =
    <button id="btn" onClick={this.handleOnClick}>
      Click Here
    </button>

Two things jump out:

Code is injecting things into another language
There is syntax highlighting

Here is a hypothetical equivalent in Scala:

def handleOnClick() =
  console.log("clicked");

val button =
    html"""<button id="btn" onClick=${handleOnClick}>
      Click Here
    </button>"""

We still have the ability to insert things into other contexts, but the syntax highlighting is gone

So let’s do the same !
Small issue:

There’s one JSX, whereas there’s an arbitrary amount of string interpolators, for an arbitrary number of target languages

Here are the obvious (to me) solutions:

magic: Compile and run a syntax highlighter at compile-time
markdown: Use whatever syntax highlighter is present on the host machine

# Example

```scala
def f[T](x: T): T
```
Well it doesn't quite work in markdown inside markdown, but if you remove one layer it does, it's what this forum runs on !

Why am I speaking about this now ?

Well because it would help solving the “we need to define bulk data locally” problem that the new collection syntax aims to solve:

val people: List[Person] = json"""
[
  { "name": "John" },
  { "name": "Paula" },
  { "name": "Rain" }
]
""".asScalaCollection

Of course this is much more ambitious and powerful, but at the same time, in a way it doesn’t touch the language at all: not its syntax, not its semantics, only its tooling

It’s also a good excuse to put some compiler engineers to work on tooling, which seems like it might be a good thing:

It’s also the kind of crazy ideas that get stolen by other languages later
And it won’t be like the ones that didn’t, looking at you XML literals, as it’s not even syntax, only tooling, it doesn’t even have to be specced outside of “Highlighter[magic] is reserved for that purpose with no guarantee”

bishabosha · January 17, 2025, 11:49am

then when it comes to nested splices then you need the triple quotes again - this could make it still unappealing.

see this is what rust solves with its syntactical macros that work on parser tokens:

let value = json!({
    "code": 200,
    "success": true,
    "payload": {
        "features": [
            "serde",
            "json"
        ],
        "homepage": null
    }
});

Sporarum · January 17, 2025, 2:55pm

I’m not sure I understood, could you explain ?

bishabosha · January 17, 2025, 3:29pm

just like this:

def handleOnClick() =
  console.log("clicked");

val button =
    html"""<button id="btn" onClick=${handleOnClick}>
      ${
         val something = ???
         html"""
           <input placeholder="foo" type="text"/>
         """
      }
    </button>"""

MateuszKowalewski · January 17, 2025, 10:36pm

I don’t get how this is Scala related.

Language injection is strictly an IDE feature.

The compiler does not do “syntax highlighting”. Especially not for arbitrary languages that can be found inside string literals.

Also so called “bare words” are a catastrophic idea. See also here.

I mean, I also very much like to have better IDE features. But this here is something that strictly belongs to tooling dev. You need LSP runtimes that can support language injection. A Scala compiler is not such a universal LSP runtime.

Suddenly Scala starts to move into the direction of the worst parts of Perl…

We have right now on the table: The other nonsense proposal about Perl collection literals ([…]), people arguing for a “comma operator” (to express empty and singular tuples) which is just tasteless, now bare words…

It’s sad seeing this.

(I don’t want to dismiss this submission here as bad per se! I think better tooling / IDE support for programming is long overdue. We’re still handling strings instead of structured data. No progress in over 60 years, since the time we switched from binary to strings… But this here is simply not a case for the Scala compiler, even if it’s an important feature in general).

Sporarum · January 18, 2025, 2:57pm

I mean it’s literally about Scala tooling ^^’

This forum is not only about the Scala language spec and compiler
UX is extremely important, and IIRC the SIP committee also looks at these kinds of things
One version of this idea would need compiler changes:

What I meant by that was:

Compiler sees the string-interpolator, computes some related value and sends it to the editor (for example Highlighter(using json))
The editor hotswaps the syntax highlighter at that point (which I’m guessing is possible)

The other option was of course much simpler and more reasonable, but this is not a pre-SIP, simply a discussion, so I think it’s good if we can push boundaries and see what works and what doesn’t

Thank you for the clarification, it is indeed a problem
Let’s just use guillemet (« ») instead /j
But also in those cases I think it would be worthwhile to extract the inner part in a def or val anyways:

def handleOnClick() =
  console.log("clicked");
def inner(param: Any) = 
  val something = param
  html"""
    <input placeholder="foo" type="text"/>
  """
val button =
    html"""<button id="btn" onClick=${handleOnClick}>
      ${inner(???)}
    </button>"""

And we should also be able to do:

val dependent = html"""
<button ...>
  ${ if debug then "" else html"<MyDebug>${debugInfo}</MyDebug>" }
</button>
"""

Sporarum · January 18, 2025, 3:01pm

I don’t see how this relates ?

You would always need at least <someInterpolator>"someText" for this proposal to apply, which is the opposite of a bare word (unless I misunderstood something)

Sporarum · January 18, 2025, 3:26pm

Thinking more about it, I think it should work like this:

// Scala parser, this is a scala file
val dependent = 
  // v search for a syntax highlighter for "html" / ".html" (no compilation needed)
  html"""
    // HTML syntax highlighter
    <button ...>
      // `${` detected, switching back to Scala highlighter
      ${ 
        if debug then "" else
          // v search html syntax highlighter
          html"
            // HTML syntax highlighter
            <MyDebug>
               // `${` detected, switching back to Scala highlighter
               ${debugInfo}
               // `}` detected, going back to previous highlighter (HTML)
            </MyDebug>"
          // end of string, going back to scala highlighter
      }
      // `}` detected, going back to previous highlighter (HTML)

   </button>
  """
// end of string, going back to scala highlighter

Here is what would be given to the different highlighters (more or less):

scala:

val dependent = html"""dummy"""

html:

<button ...>
  dummy
</button>

scala:

if debug then "" else html"dummy"

html:

<MyDebug>dummy</MyDebug>

scala:

debugInfo

Or potentially something like the following so the LSPs know where variables come from:
scala:

val dependent = html"""${if debug then "" else html"${debugInfo}"}"""

html:

<button ...>
  <MyDebug>dummy</MyDebug>
</button>

This of course places limitations on what can be highlighted, the following would not (and should not ?) work:

val part1 = html"<bu"
val part2 = html"tton/>"
val sum = part1 + part2

In my mind this is a good thing, JSX-like libraries should not allow you to break the sub-language’s semantics (and to my knowledge JSX doesn’t)
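
The "what would be given to the different highlighters" split above can be sketched as a small masking pass: every `${...}` splice in the string body is replaced by a placeholder, and the splice contents are collected so they can be handed back to the Scala highlighter. This is only a minimal sketch under stated assumptions. The names `Masked` and `maskSplices` are hypothetical, braces are matched by naive counting (braces inside nested string literals would confuse it), and the `$ident` and `$$` forms are ignored.

```scala
/** Text for the embedded-language highlighter, plus the Scala splices that were cut out. */
final case class Masked(embedded: String, splices: List[String])

/** Replaces each `${...}` in `body` with `placeholder` (hypothetical helper, naive brace counting). */
def maskSplices(body: String, placeholder: String = "dummy"): Masked =
  val out = new StringBuilder
  val splices = List.newBuilder[String]
  var i = 0
  while i < body.length do
    if body.startsWith("${", i) then
      // Walk forward until the brace opened by `${` is closed again.
      var depth = 1
      var j = i + 2
      while j < body.length && depth > 0 do
        body.charAt(j) match
          case '{' => depth += 1
          case '}' => depth -= 1
          case _   => ()
        j += 1
      // If the splice is unterminated, take everything up to the end of the body.
      val end = if depth == 0 then j - 1 else j
      splices += body.substring(i + 2, end).trim
      out.append(placeholder)
      i = j
    else
      out.append(body.charAt(i))
      i += 1
  Masked(out.toString, splices.result())
```

Applying the same function again to the body of a nested `html"..."` found inside a splice would give the next level of the breakdown. A real implementation would work on the parser's tokens rather than on raw characters.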

tarsa · January 18, 2025, 5:37pm

i’m not sure what is the effort and effect in each proposal, but i think we can go with a simple improvement that will have majority of the advantages while requiring only 1% of effort on scala side.

the proposal is:
next to scala.StringContext#s add aliases to that method which do exactly the same, but are markers for IDEs to inject foreign-language highlighting automatically

package scala

case class StringContext(parts: String*) {
  // existing method
  def s(args: Any*): String = macro ???

  // new methods

  /** Alias for s"...". Hint for IDE to inject JavaScript highlighting. */
  def sJs(args: Any*): String = do what def s does

  /** Alias for s"...". Hint for IDE to inject CSV highlighting. */
  def sCsv(args: Any*): String = do what def s does

  /** Alias for s"...". Hint for IDE to inject HTML highlighting. */
  def sHtml(args: Any*): String = do what def s does
}

note that even java (with all their financial and human resources) is not very willing to include even JEP 198: Light-Weight JSON API, which suggests that embedding foreign languages syntaxes into scala won’t be a good idea. otoh, if we offload that to third party libraries, then it won’t be as easily usable as the built-in aliases proposal above.

Sporarum · January 18, 2025, 6:17pm

The point is exactly the opposite !

We need to be able to insert values programmatically, otherwise there is no point
Look at the React example: handleOnClick is compiled in a way that clicking the button calls it, which cannot be done through something like htmlParser(s"""code here""")

These string interpolators should not be present in the standard library:

We cannot have an objective, complete list of every programming language there is
Libraries need access to both parts and args (otherwise there is no point in having a custom interpolator for languages)
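
To illustrate why access to both `parts` and `args` matters, here is a minimal sketch of a library-style interpolator. It assumes a toy `Html` wrapper type and an `escape` helper, both hypothetical and not taken from any real library. The literal parts are trusted as markup, while each argument is handled according to its type, which a plain `s"..."` alias cannot do.

```scala
/** Toy wrapper marking a string as already-valid markup (hypothetical type). */
final case class Html(markup: String)

/** Escapes the characters that are significant in HTML text and attribute values. */
def escape(text: String): String =
  text.flatMap {
    case '<' => "&lt;"
    case '>' => "&gt;"
    case '&' => "&amp;"
    case '"' => "&quot;"
    case c   => c.toString
  }

extension (sc: StringContext)
  /** Sketch of an `html` interpolator: parts are kept verbatim, args are inspected. */
  def html(args: Any*): Html =
    val sb = new StringBuilder
    val parts = sc.parts.iterator
    sb.append(parts.next())
    for arg <- args do
      arg match
        case Html(markup) => sb.append(markup)               // nested markup is inserted as-is
        case other        => sb.append(escape(other.toString)) // everything else is escaped
      sb.append(parts.next())
    Html(sb.toString)
```

With this sketch, `html"<b>$user</b>"` escapes a plain string `user`, while splicing another `Html` value nests it unchanged. A real library would go further, for example by registering a function argument as an event handler.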

Sporarum · January 18, 2025, 6:29pm

To recapitulate, here is the updated proposal (as seen in this comment):

Language/compiler changes:

None whatsoever

LSP changes:

When encountering a string interpolator interpolatorName, look to find another LSP which supports the language interpolatorName (or alternatively which reads .interpolatorName files), if one is found, slice the ${}s out of the string and give them to it

Multi-level splits like

(MyDebug was in some Scala in <button>)

would probably be too complicated and offer little benefit

(That benefit would be allowing syntax highlighters who use different colors for things like unused identifiers to correctly detect these cases)

tarsa · January 18, 2025, 6:43pm

my point is that adding the hinting aliases will give benefit right away, without the trouble of developing or finding a library that does more than just standard string interpolation (also pulling such dependencies can increase severity of the so called dependency hell). it’s not a big problem if not all languages are covered in stdlib aliases, just like it’s not a big problem that .jsx allows to embed html, but not python.

your idea is not conflicting with my idea, i.e. if user wants to just have json highlighting then just use sJson"..." from my proposal and when user wants more then use json"..." from specialized library as in your proposal. now there’s the cost vs benefit ratio question.

also since the interpolators are methods, we could be a little more principled and require annotation (with retention policy https://docs.oracle.com/javase/8/docs/api/java/lang/annotation/RetentionPolicy.html#SOURCE ) on those methods that will explicitly tell that we ask for syntax highlighting from code editor.

Sporarum · January 18, 2025, 7:05pm

Ah I see

But does that happen often ?

If you need to have a big json string in the middle of your code it’s either that it should be in a separate file, or that you will pass it to a library (which could itself define a dummy string interpolator)

tarsa · January 18, 2025, 7:30pm

if i’m searching correctly then e.g. json4s has 0 string interpolators Code search results · GitHub , scala-csv has 0 string interpolators Code search results · GitHub etc.

we use these libraries and define many pieces of inline json and csv data in our tests. the converters between text representation and classes are in production code, but example data in textual representation is in tests.

instead of waiting for third party libraries to add dummy string interpolators, we can add them to stdlib. having them available without any import would make them easier to use.

that was just food for thought. the value of that entire proposal depends on what the code editors would do with the hints. just syntax highlighting is ok, but we can expect more features, like auto-closing elements, code completion, reformatting of the embedded foreign language snippet, alternative data viewers (view csv as table, allow to sort rows, reorder columns, etc) - doing all of that without leaving the currently edited scala file would be cool.

Sporarum · January 18, 2025, 7:30pm

Or even simpler, do something like:

import language.defaultLanguageInterpolators.json

Could we do something like the following with macros ?

given jsonExtensionMethod = generateDefaultLanguageInterpolator("json")

json"test"
// desugars to forwarder named "json" on StringContext to s

tarsa · January 18, 2025, 8:16pm

if the IDE would automatically suggest adding that import after writing json"whatever" then it would be simple enough, probably. interpolators documentation should clearly state that they’re only about hinting the format or language of interpolated snippet to code editor. they should also clearly state that they aren’t safeguarding against malicious input injection (so maybe we should e.g. avoid putting sql into the standard list; just add data formats, not coding languages).

regarding malicious input: slick requires # before $expression to treat it literally and concatenate with rest of sql (see Plain SQL Queries, Slick 3.0.0 documentation):

Splicing Literal Values

While most parameters should be inserted into SQL statements as bind variables, sometimes you need to splice literal values directly into the statement, for example to abstract over table names or to run dynamically generated SQL code. You can use #$ instead of $ in all interpolators for this purpose, as shown in the following piece of code:
val table = "coffees"
sql"select * from #$table where name = $name".as[Coffee].headOption

maybe the default language interpolators should all require that # symbol just to remind users that they’re not safeguarding against malicious input.
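
The difference between a bind variable and a literal splice can be shown with a toy interpolator. This is not Slick's implementation. `SketchQuery` and `sketchSql` are hypothetical names, and the sketch only mimics the `#$` convention quoted above: a `$arg` becomes a `?` placeholder with the value kept aside, while `#$arg` is concatenated into the statement text.

```scala
/** Statement text with `?` placeholders, plus the values to bind (hypothetical type). */
final case class SketchQuery(text: String, binds: Vector[Any])

extension (sc: StringContext)
  /** Toy interpolator: `$x` is bound as a parameter, `#$x` is spliced in literally. */
  def sketchSql(args: Any*): SketchQuery =
    val text = new StringBuilder
    var binds = Vector.empty[Any]
    for (part, i) <- sc.parts.zipWithIndex do
      val hasArg = i < args.length
      // A part ending in `#` marks the following argument as a literal splice.
      val literal = hasArg && part.endsWith("#")
      text.append(if literal then part.dropRight(1) else part)
      if literal then text.append(args(i).toString)
      else if hasArg then
        text.append("?")
        binds = binds :+ args(i)
    SketchQuery(text.toString, binds)
```

A dummy alias that simply forwards to `s` behaves like the `#$` case for every argument, which is the reason for the warning about malicious input above.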

scalway · January 19, 2025, 11:31am

For context, IntelliJ has a language injection feature that works well in Scala. However, it requires preconfiguration for specific prefixes, so it’s not an instant solution.

That said, it provides more than just syntax highlighting (e.g., automatic tag closing, among other features). Emmet also works, although it always generates single-line output for some reason:

The point I’m trying to make is that the compiler doesn’t need to “emit” any special metadata for LSP. We already know that a string interpolator parsing JSON will likely be named json. And if not (e.g., ujson), it could be configured in the IDE to handle it appropriately.

While I think this would be a great feature to implement in IDEs, I’m not convinced it belongs in the compiler. It feels more like a tooling enhancement request.
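
The convention-plus-configuration lookup described in this post could look roughly like the following on the tooling side. It is a sketch only. `injectedLanguage` is a hypothetical function, and the set of known languages and the override map stand in for whatever an editor actually has installed and configured.

```scala
/** Picks the language to inject for an interpolator name (hypothetical helper).
  *
  * @param interpolator the interpolator prefix, e.g. "json" in json"..."
  * @param known        language ids the editor has a highlighter for
  * @param overrides    user configuration for names that do not match a language id
  */
def injectedLanguage(
    interpolator: String,
    known: Set[String],
    overrides: Map[String, String] = Map.empty
): Option[String] =
  overrides
    .get(interpolator)                                           // explicit configuration wins
    .orElse(Option.when(known(interpolator))(interpolator))      // otherwise fall back to the naming convention

val known = Set("json", "html", "sql")
val overrides = Map("ujson" -> "json")
```

With these values, `json` and `ujson` both resolve to the JSON highlighter, while a prefix such as `s` resolves to nothing and the string stays plain.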

tarsa · January 19, 2025, 12:38pm

that’s why i’ve proposed the annotation for string interpolators. combined with earlier proposal, that could be:

tarsa:

package scala

case class StringContext(parts: String*) {
  // existing method
  def s(args: Any*): String = macro ???

  // new methods

  /** Alias for s"...". Hint for IDE to inject JSON highlighting. */
  @syntaxHighlightingHint(language = "JSON")
  def sJson(args: Any*): String = do what def s does

  /** Alias for s"...". Hint for IDE to inject CSV highlighting. */
  @syntaxHighlightingHint(language = "CSV")
  def sCsv(args: Any*): String = do what def s does

  /** Alias for s"...". Hint for IDE to inject YAML highlighting. */
  @syntaxHighlightingHint(language = "YAML")
  def sYaml(args: Any*): String = do what def s does
}

that would reduce guesswork on IDE side and/or effort on its manual configuration. ide would only need to recognize the single annotation. IDEs have lots of cool features that nobody knows about, because it’s not obvious out-of-the-box that they exist, so having something that works automatically will make many more people use it.
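
Outside the standard library, the same idea can be tried today with a user-defined annotation and extension methods. This sketch assumes the annotation name from the post, defined here as an ordinary `StaticAnnotation`, and a hypothetical `hintedInterpolators` object. No editor is known to recognise this annotation. Note that a Scala `StaticAnnotation` stays visible across compilation units, which differs from the source-only retention suggested earlier in the thread.

```scala
import scala.annotation.StaticAnnotation

/** Hypothetical marker: asks tooling to highlight the interpolated string as `language`. */
final class syntaxHighlightingHint(language: String) extends StaticAnnotation

object hintedInterpolators:
  extension (sc: StringContext)
    /** Alias for s"...". Hint for tooling to inject JSON highlighting. */
    @syntaxHighlightingHint(language = "JSON")
    def sJson(args: Any*): String = sc.s(args*)

    /** Alias for s"...". Hint for tooling to inject CSV highlighting. */
    @syntaxHighlightingHint(language = "CSV")
    def sCsv(args: Any*): String = sc.s(args*)

import hintedInterpolators.*

val count = 3
val payload = sJson"""{"count": $count}"""
```

The annotation carries no behaviour. The methods behave exactly like `s`, so the only effect would come from tools that choose to read the hint.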

Sporarum · January 19, 2025, 1:10pm

Thank you for your feedback, I wanted to hear from tooling people !

I no longer do either:

I agree !
Since this relates to the scala ecosystem as a whole, and not only specific tools, I believe this was a good place for it
In particular because it would drastically reduce the need for Pre-SIP: A Syntax for Collection Literals

tgodzik · January 20, 2025, 12:53pm

From my side, here are two places it can be implemented:

GitHub - scala/vscode-scala-syntax: Visual Studio Code extension for syntax highlighting Scala sources

This is the syntax that AFAIK is also used in github, so it would be nice to do.

scala3/presentation-compiler/src/main/dotty/tools/pc/PcSemanticTokensProvider.scala at main · scala/scala3 · GitHub

This is used for semantic highlighting in Metals. It’s easier to implement some heuristics that could parse json/html etc.

How much work each of those places requires is another thing. If we implemented it in the first one, we could potentially reuse that for semantic highlighting (not set it for string interpolation in which case VS Code will revert to the syntax)