SIP: name based XML literals

yangbo · August 17, 2018, 7:02pm

As discussed in "Proposal to remove XML literals from the language", we all agreed that the current symbol-based XML literals have kept the Scala XML implementation stuck on the old-fashioned scala-xml library, an outcome that no participant in that thread wanted. In this post, I propose an alternative to simply removing XML literals.

Background

Name-based for comprehensions have proven successful in Scala language design. A for / yield expression is converted to higher-order function calls to the flatMap, map and withFilter methods, regardless of their type signatures. A for comprehension can be used with either Option or List, even though the List methods take an additional implicit CanBuildFrom parameter. Third-party libraries like Scalaz and Cats also provide Ops that allow monadic data types in for comprehensions.
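As a minimal sketch of that name-based rewriting, the following uses a hypothetical Box type (not part of any library) that only defines map and flatMap. The for expression compiles because the methods exist under the expected names, not because Box implements any particular interface.

```scala
// Hypothetical container used only for illustration.
final case class Box[A](value: A) {
  def map[B](f: A => B): Box[B] = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
}

object ForDemo {
  // Sugared form.
  val sugared: Box[Int] =
    for {
      a <- Box(1)
      b <- Box(2)
    } yield a + b

  // The equivalent method calls the compiler rewrites it to.
  val desugared: Box[Int] =
    Box(1).flatMap(a => Box(2).map(b => a + b))
}
```

No withFilter is needed here because the generators bind plain variables and there is no guard.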

Name-based pattern matching is used by Dotty. It greatly simplifies the implementation compared to Scala 2. In addition, specific symbols in the Scala library (Option, Seq) are decoupled from the Scala compiler.
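A small sketch of a name-based extractor, assuming a compiler that accepts any unapply result with isEmpty and get members (Dotty, and to my knowledge Scala 2.11 onward). NameResult and Trimmed are hypothetical names made up for this example.

```scala
// Hypothetical result type: not an Option, it just has the right member names.
final class NameResult(val get: String) {
  def isEmpty: Boolean = get.isEmpty
}

object Trimmed {
  // The pattern succeeds when the returned value's isEmpty is false.
  def unapply(s: String): NameResult = new NameResult(s.trim)
}

object PatternDemo {
  def describe(s: String): String = s match {
    case Trimmed(t) => s"non-blank: $t"
    case _          => "blank"
  }
}
```

Here PatternDemo.describe("  hi ") takes the first case and binds t to the trimmed string, while a whitespace-only input falls through to the second case.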

Considering the success of the above name-based syntactic sugars, name-based XML literals are an obvious approach to decoupling the scala-xml library from the Scala compiler.

Goals

Keeping source-level backward compatibility with existing symbol-based XML literals in most use cases of scala-xml.
Allowing schema-aware XML literals, i.e. static types that vary according to tag names, similar to the current TypeScript and Binding.scala behavior.
Schema-aware XML literals should be understandable by both the compiler and IDEs (e.g. no white box macros involved).
Existing libraries like ScalaTags should be able to support XML literals by adding a few simple wrapper classes. No macro or metaprogramming knowledge should be required of library authors.
The compiler should expose as few special names as possible, so that the feature does not become intolerably ugly.

Non-goals

Embedding fully-featured standard XML in Scala.
Allowing arbitrary tag names and attribute names (or avoiding reserved words).
Distinguishing lexical differences, e.g. <a b = "c"></a> vs <a b="c"/>.

The proposal

Lexical Syntax

Kept unchanged from Scala 2.12

XML literal translation

The Scala compiler will translate XML literals to Scala ASTs before type checking. The translation rules are:

Self-closing tags without prefixes

<tag-name />

will be translated to

xml.tags.`tag-name`()

Self-closing tags with some prefixes

<prefix-1:tag-name />

will be translated to

xml.tags.`prefix-1`.`tag-name`()

Attributes

<tag-name attribute-1="value"
          attribute-2={ f() }
          prefix-2:attribute-3={"value"} />

will be translated to

xml.tags.`tag-name`(
  xml.attributes.`attribute-1`(xml.text("value")),
  xml.attributes.`attribute-2`(xml.interpolation(f())),
  xml.attributes.`prefix-2`.`attribute-3`(xml.interpolation("value"))
)

CDATA

<![CDATA[ raw ]]> will be translated to xml.text(" raw ") if the -Xxml:coalescing flag is on, or to xml.cdata(" raw ") if the flag is turned off with -Xxml:-coalescing.

Processing instructions

<?xml-stylesheet type="text/xsl" href="sty.xsl"?>

will be translated to

xml.processInstructions.`xml-stylesheet`("type=\"text/xsl\" href=\"sty.xsl\"")

Child nodes

<tag-name attribute-1="value">
  text &amp; &#x68;exadecimal reference &AMP; &#100;ecimal reference
  <child-1/>
  <!-- my comment -->
  { math.random }
  <![CDATA[ raw ]]>
</tag-name>

will be translated to

xml.tags.`tag-name`(
  xml.attributes.`attribute-1`(xml.text("value")),
  xml.text("""
  text """),
  xml.entities.amp,
  xml.text(""" hexadecimal reference """),
  xml.entities.AMP,
  xml.text(""" decimal reference
  """),
  xml.tags.`child-1`(),
  xml.text("""
  """),
  xml.comment(" my comment "),
  xml.text("""
  """),
  xml.interpolation(math.random),
  xml.text("""
  """),
  xml.cdata(" raw "), // or xml.text(" raw ") if `-Xxml:coalescing` flag is set
  xml.text("""
""")
)

Note that hexadecimal references and decimal references will be unescaped and translated to xml.text() automatically, while entity references are translated to fields in xml.entities.

XML library vendors

An XML library vendor should provide a package or object named xml, which contains the following methods or values:

tags
attributes
entities
processInstructions
text
comment
cdata
interpolation

An XML library user can switch between implementations by importing different xml packages or objects. scala.xml is used by default when no explicit import is present.

In a schema-aware XML library like Binding.scala, its tags, attributes, processInstructions and entities methods should return factory objects that contain all the definitions of available tag names and attribute names. An XML library user can provide additional tag names and attribute names in user-defined implicit classes for tags and attributes.
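A minimal sketch of what such a schema-aware vendor object could look like. Only the top-level names tags, attributes, text and interpolation come from the proposal; SchemaAware, Node, Text, Attr, Elem and the a / br / href members are hypothetical and chosen purely for illustration.

```scala
object SchemaAware {
  // Hypothetical vendor object; a real library would define far more names.
  object xml {
    sealed trait Node
    final case class Text(value: String) extends Node
    final case class Attr(name: String, value: Node) extends Node
    final case class Elem(name: String, parts: Seq[Node]) extends Node

    def text(s: String): Node = Text(s)
    def interpolation(s: String): Node = Text(s)

    // Only attribute names defined here are accepted by the type checker.
    object attributes {
      def href(value: Node): Attr = Attr("href", value)
    }

    // Only tag names defined here are accepted by the type checker.
    object tags {
      def a(parts: Node*): Elem = Elem("a", parts)
      def br(): Elem = Elem("br", Nil)
    }
  }
}

object SchemaDemo {
  import SchemaAware.xml

  // <a href="https://example.com">{ "home" }</a> would be rewritten to:
  val link: xml.Elem =
    xml.tags.a(
      xml.attributes.href(xml.text("https://example.com")),
      xml.interpolation("home")
    )
  // xml.tags.blink() would be rejected: `blink` is not a member of tags.
}
```

Because the factories are ordinary members, a presentation compiler can offer completion on xml.tags and xml.attributes without any macro support.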

In a schema-less XML library like scala-xml, its tags, attributes, processInstructions and entities should return builders that extend scala.Dynamic in order to handle tag names and attribute names in selectDynamic or applyDynamic.
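A rough sketch of the schema-less variant, using scala.Dynamic so that any tag or attribute name is accepted. SchemaLess and the Node, Text, Attr and Elem types are hypothetical; this is not how scala-xml is implemented. Prefixed names such as xml.tags.`prefix-1`.`tag-name`() would additionally need a selectDynamic that returns another Dynamic builder, which is omitted here.

```scala
import scala.language.dynamics

object SchemaLess {
  // Hypothetical vendor object for illustration only.
  object xml {
    sealed trait Node
    final case class Text(value: String) extends Node
    final case class Attr(name: String, value: Node) extends Node
    final case class Elem(name: String, parts: Seq[Node]) extends Node

    def text(s: String): Node = Text(s)

    object tags extends Dynamic {
      // xml.tags.`any-name`(parts...) is routed here with the name as a string.
      def applyDynamic(name: String)(parts: Node*): Elem = Elem(name, parts)
    }

    object attributes extends Dynamic {
      def applyDynamic(name: String)(value: Node): Attr = Attr(name, value)
    }
  }
}

object DynamicDemo {
  import SchemaLess.xml

  // <tag-name attribute-1="value"/> after translation:
  val elem: xml.Elem =
    xml.tags.`tag-name`(xml.attributes.`attribute-1`(xml.text("value")))
}
```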

Known issues

Name clash

<toString/> or <foo toString="bar"/> will not compile due to a name clash with Any.toString.

Compilation error is the desired behavior in a schema-aware XML library as long as toString is not a valid name in the schema. Fortunately, unlike JSX, <div class="foo"></div> should compile because class is a valid method name.
A schema-less XML library user should instead explicitly construct new Elem("toString").
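A small sketch of where the clash comes from in a Dynamic-based builder: members that already exist on Any are resolved normally and never reach applyDynamic. ClashDemo, Elem and tags are hypothetical names for this example.

```scala
import scala.language.dynamics

object ClashDemo {
  final case class Elem(name: String)

  object tags extends Dynamic {
    def applyDynamic(name: String)(): Elem = Elem(name)
  }

  // `div` is not a member of tags, so the call is routed through applyDynamic.
  val ok: Elem = tags.div()

  // `toString` is inherited from Any, so this is an ordinary method call
  // that returns a String instead of an Elem.
  val clash: String = tags.toString()

  // val bad: Elem = tags.toString()  // type mismatch: found String, required Elem
}
```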

White space only text

Alternative approach

XML initialization can also be implemented as a special string interpolation, xml"<x/>". The pros and cons of these approaches are listed in the following table:

| | Symbol-based XML literals in Scala 2.12 | Name-based XML literals in this proposal | `xml` string interpolation |
|---|---|---|---|
| XML is parsed by ... | compiler | compiler | library, IDE, and other code browsers including GitHub, Jekyll (if syntax highlighting is wanted) |
| Is third-party schema-less XML library supported? | No, unless using white box macros | Yes | Yes |
| Is third-party schema-aware XML library supported? | No, unless using white box macros | Yes | No, unless using white box macros |
| How to highlight XML syntax? | By regular highlighter grammars | By regular highlighter grammars | By special parsing rule for string content |
| Can presentation compiler perform code completion for schema-aware XML literals? | No | Yes | No |

AMatveev · August 17, 2018, 7:24pm

How will it work in pattern matching?

yangbo · August 17, 2018, 7:38pm

The same translation rules should be applied to XML patterns. The translated XML literals are untyped function calls, which can be considered as patterns as well.

nafg · August 17, 2018, 7:35pm

I use scalajs-react a lot and something like this would be very nice. It would make it possible to copy-paste HTML snippets with little modification, and a lot more familiar to people coming from JSX in javascript.

AMatveev · August 17, 2018, 7:47pm

As far as I know, pattern matching does not allow an expression like
xml.tags.tag-name
so it seems this will not work.

Krever · August 17, 2018, 7:50pm

I wanted to ask this in xml-literals-dropping-thread, but should work here as well: could this be implemented as a compiler plugin?

yangbo · August 17, 2018, 7:51pm

object xml {
  object tags {
    object `tag-name` {
      def unapply(o: Any): Boolean = true
    }
  }
}

"value" match {
  case xml.tags.`tag-name`() =>
}

The above code compiles for me.
https://scalafiddle.io/sf/AEfymJL/0

yangbo · August 17, 2018, 7:58pm

Yes, as long as the current symbol-based XML literal feature is not removed, since a compiler plug-in can convert _root_.scala.xml.xxx symbols back to the name-based ASTs. In fact, Binding.scala performs such a conversion in macro annotations.

No, if the current symbol-based XML literal feature is removed, because the Scala compiler does not allow changing the parser phase.

AMatveev · August 17, 2018, 8:05pm

~~Type is not expression. If xml will be translated to object tree It will not take vendor xml implementation.~~

nafg · August 17, 2018, 8:09pm

Compiler plugins can’t add new parsing rules

Krever · August 17, 2018, 8:35pm

How ridiculous would it be to consider giving them such capability?

ebruchez · August 17, 2018, 8:52pm

In passing, <prefix-1:prefix-2:tag-name /> is not allowed in XML-with-namespaces, which is how most parsers are configured these days. An XML library supporting namespaces would have to report an error when encountering this.

Alternatively, don’t split the name into prefixes at parsing time. Just pass the entire name as is to the XML library, which then can decide how to interpret : characters in names and otherwise validate the element name.

AMatveev · August 17, 2018, 8:53pm

It is definitely more ridiculous than trying to bring white box macros back in Dotty.
I think that has been discussed very thoroughly in those topics.

Jasper-M · August 17, 2018, 9:00pm

Dotty has “research plugins” which allow adapting the parser phase.

yangbo · August 17, 2018, 9:03pm

Good catch. I updated the proposal to remove multiple prefixes.

I left one prefix in the proposal, as it is the current Scala 2.12 parser behavior.

zygfryd · August 17, 2018, 9:26pm

Wouldn’t it be more natural to statically enforce validity if you used separate parameter lists for attributes and nodes?

The library author could define traits like AttributeOfTagFoo and ChildOfTagFoo that tag constructors accept. It would also eliminate the need to programmatically separate those two things in every single tag constructor; they can’t be interleaved in XML anyway.

Blaisorblade · August 17, 2018, 10:14pm

Research plugins are disabled on purpose in compiler releases, and they come with few portability guarantees: having the ecosystem depend on experimental and unmaintainable APIs (cough cough, scala2 macros, cough cough) has shown itself to be a recipe for technical debt. And there it was pretty clear that the API was pretty brittle. That doesn’t prevent forks from enabling such plugins, but hopefully it makes clear that they are unsupported.

Personal opinion: In principle, I’m a fan of extensible languages. In practice, they require well-specified and robust extension points. So it seems much easier to bundle name-based XML support inside the compiler than to document and stabilize all the relevant internal APIs, in particular untyped trees, if somebody is willing to push through the implementation. This might be short-sighted, but I doubt exporting Dotty internals would be very different from Scalac internals: Dotty has a better design, but internal APIs are not designed for ease of use.

Remember I don’t sit on the SIP committee tho and I’m still a Dotty beginner, so take this with a grain of salt.

EDIT: also, “new parsing rules” in a plugin via inheritance doesn’t sound modularly maintainable, unless you’re willing to review the plugin for each change to the parser.

yangbo · August 18, 2018, 4:35am

I like the idea of ChildOfTagFoo.

However, separate parameter lists are problematic in pattern matching. Would you like to see a schema-aware XML library that defines shared ChildOfTagFoo traits for both attributes and nodes?

zygfryd · August 18, 2018, 9:45am

case <foo>{children @ _*}</foo> => matches child nodes and not attributes though, so case class foo(children: Node*)(attributes: Attribute*) extends Node would give us the right unapply for free.
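A rough sketch of the case class shape described above, assuming Scala 2.12 / 2.13 pattern syntax. ParamListDemo, Node, Text, Attribute and Foo are hypothetical stand-ins. It illustrates that the generated extractor only covers the first parameter list, so child nodes can be matched while attributes cannot.

```scala
object ParamListDemo {
  sealed trait Node
  final case class Text(value: String) extends Node
  final case class Attribute(name: String, value: String)

  // Only the first parameter list takes part in the generated extractor.
  final case class Foo(children: Node*)(val attributes: Attribute*) extends Node

  def childCount(n: Node): Int = n match {
    case Foo(children @ _*) => children.size
    case _                  => 0
  }

  val foo: Foo = Foo(Text("a"), Text("b"))(Attribute("id", "x"))
}
```

The attributes remain reachable as a field (foo.attributes) but there is no pattern position for them in this encoding.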

yangbo · August 18, 2018, 9:48am

I don’t think the current version of Scala or Dotty supports pattern matching on multiple parameter lists.