SIP: name based XML literals

yangbo · August 19, 2018, 5:11am

If there are two parameter lists, what exactly should the rule for translating an XML pattern be? The two-parameter-list approach leaves some open questions:

What is the order of the parameter lists when both attributes and nodes are present?
Will there be an empty parameter list when attributes are absent?
Will there be an empty parameter list when nodes are absent?
Will the XML literal creations and XML literal patterns share the same translation rule?
How can existing one-parameter-list libraries (e.g. ScalaTags) be adapted to the new protocol?

yangbo · August 19, 2018, 5:18am

I think it’s better to introduce a new -Xxml:-preserve-whitespace flag that removes whitespace-only text when translating XML literals.

sjrd · August 19, 2018, 6:25am

It’s even better to let the library decide what to do with xml.text(s) calls whose s only contains whitespace.

yangbo · August 19, 2018, 7:40am

Ignoring whitespace in an XML literal initialization is possible. Binding.scala’s @fxml annotation does filter out whitespace-only text.

However, a library is unable to ignore whitespace in XML patterns without special compiler flags.

Consider code like this:

<a> <b/> </a> match {
  case <a> <b/> </a> =>
}

It will be translated to the following code according to this proposal:

xml.tags.a(xml.text(" "), xml.tags.b(), xml.text(" ")) match {
  case xml.tags.a(xml.text(" "), xml.tags.b(), xml.text(" ")) =>
}

If xml.tags.a.apply filters out the whitespace-only text, the number of child nodes becomes 1, so the value can no longer be matched by the pattern written as the same XML literal.
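The mismatch can be reproduced with a minimal sketch. The Node, Text and Elem types and the a, b and text objects below are hypothetical stand-ins for what a library might define; they are not part of the proposal or of any real library.

```scala
object WhitespaceDemo {
  sealed trait Node
  final case class Text(value: String) extends Node
  final case class Elem(name: String, children: List[Node]) extends Node

  // Hypothetical stand-in for xml.tags.a: `apply` drops whitespace-only text,
  // while `unapplySeq` exposes the children that were actually stored.
  object a {
    def apply(children: Node*): Elem =
      Elem("a", children.toList.filter {
        case Text(s) => s.trim.nonEmpty
        case _       => true
      })
    def unapplySeq(e: Elem): Option[Seq[Node]] =
      if (e.name == "a") Some(e.children) else None
  }

  // Hypothetical stand-in for xml.tags.b, without any filtering.
  object b {
    def apply(children: Node*): Elem = Elem("b", children.toList)
    def unapplySeq(e: Elem): Option[Seq[Node]] =
      if (e.name == "b") Some(e.children) else None
  }

  // Hypothetical stand-in for xml.text.
  object text {
    def apply(s: String): Text = Text(s)
    def unapply(t: Text): Option[String] = Some(t.value)
  }

  // Construction keeps one child, but the pattern still expects three.
  val built: Elem = a(text(" "), b(), text(" "))

  val matched: Boolean = built match {
    case a(text(" "), b(), text(" ")) => true
    case _                            => false
  }

  def main(args: Array[String]): Unit =
    println(s"children = ${built.children.size}, matched = $matched")
}
```

In this sketch the only way to make the match succeed is to filter in unapplySeq as well, which the pattern itself cannot request.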

BTW: I find it very elegant if XML patterns can share the same translation rule as XML creation, though I have not used XML patterns in a real-world project at all.

AMatveev · August 20, 2018, 9:31am

I don’t understand the tags.a implementation. If you use Dynamic, it does not compile:

import scala.language.dynamics
import scala.language.reflectiveCalls
object xml {
  object tags extends Dynamic {
    def selectDynamic(name: String): { def unapply(o: Any): Boolean } =
      new {
        def unapply(o: Any): Boolean = true
      }
  }
}

val a = xml.tags.`tag-name`
println(a.unapply("")) // prints true
"value" match {
  case xml.tags.`tag-name`() => println("ok")
}

java.lang.NullPointerException
	at scala.tools.nsc.typechecker.PatternTypers$PatternTyper.typedConstructorPattern(PatternTypers.scala:61)
	at scala.tools.nsc.typechecker.PatternTypers$PatternTyper.typedConstructorPattern$(PatternTypers.scala:70)
	at scala.tools.nsc.typechecker.Typers$Typer.typedConstructorPattern(Typers.scala:111)
	at scala.tools.nsc.typechecker.Typers$Typer.vanillaAdapt$1(Typers.scala:1173)
	at scala.tools.nsc.typechecker.Typers$Typer.adapt(Typers.scala:1231)
	at scala.tools.nsc.typechecker.Typers$Typer.runTyper$1(Typers.scala:5654)
	at scala.tools.nsc.typechecker.Typers$Typer.typedInternal(Typers.scala:5672)
	at scala.tools.nsc.typechecker.Typers$Typer.body$2(Typers.scala:5613)
	at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5618)
	at scala.tools.nsc.typechecker.Typers$Typer.$anonfun$typed1$29(Typers.scala:4706)

yangbo · August 20, 2018, 9:43am

yangbo · August 23, 2019, 12:28am

Recently I have been working on the implementation of this proposal. While implementing it, I made some changes to the original proposal:

Support XML namespaces.
Add an xml.literal call that wraps the entire XML literal.
Rename tags to elements.
Translate constant attribute values to xml.values.xxx instead of xml.text("xxx") in order to restrict attribute values to some predefined enums.
Translate constant XML texts to xml.texts.xxx instead of xml.text("xxx") in order to restrict text content to some predefined enums.

The rest of this post is the modified version of the proposal. Let me know your thoughts.

Name based XML literals

Background

Name-based for comprehensions have proven successful in Scala language design. A for / yield expression is converted to higher-order function calls to the flatMap, map and withFilter methods, whatever their type signatures are. A for comprehension can be used for either Option or List, even though List has an additional implicit CanBuildFrom parameter. Third-party libraries like Scalaz and Cats also provide Ops to allow monadic data types in for comprehensions.
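As a minimal sketch of the name-based rule, the hypothetical Box type below extends no library trait and implements no type class. It works in a for comprehension only because it has methods named map and flatMap.

```scala
object ForDemo {
  // Hypothetical container, used only for illustration.
  final case class Box[A](value: A) {
    def map[B](f: A => B): Box[B] = Box(f(value))
    def flatMap[B](f: A => Box[B]): Box[B] = f(value)
  }

  // The compiler rewrites this to Box(1).flatMap(x => Box(2).map(y => x + y)).
  val sum: Box[Int] = for {
    x <- Box(1)
    y <- Box(2)
  } yield x + y

  def main(args: Array[String]): Unit =
    println(sum) // prints Box(3)
}
```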

Name-based pattern matching was introduced by Dotty. It greatly simplifies the implementation compared to Scala 2. In addition, specific symbols in the Scala library (Option, Seq) are decoupled from the Scala compiler.
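A minimal sketch of a name-based extractor, assuming a compiler that accepts any unapply result with isEmpty and get members. The Trimmed class and the NonBlank extractor are hypothetical names chosen for illustration; no Option is involved.

```scala
object ExtractorDemo {
  // Hypothetical result type: not an Option, it only offers isEmpty and get.
  final class Trimmed(val get: String) {
    def isEmpty: Boolean = get.isEmpty
  }

  // The extractor is found by the name `unapply`, and its result is
  // inspected through the names `isEmpty` and `get`.
  object NonBlank {
    def unapply(s: String): Trimmed = new Trimmed(s.trim)
  }

  def describe(s: String): String = s match {
    case NonBlank(t) => s"text: $t"
    case _           => "blank"
  }

  def main(args: Array[String]): Unit = {
    println(describe("  hi ")) // prints text: hi
    println(describe("   "))   // prints blank
  }
}
```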

Considering the success of the above name-based syntactic sugars, name-based XML literals are an obvious approach to decoupling the scala-xml library from the Scala compiler.

Goals

Keep source-level backward compatibility with existing symbol-based XML literals in most use cases of scala-xml.
Allow schema-aware XML literals, i.e. static types that vary according to tag names, similar to the current TypeScript and Binding.scala behavior.
Schema-aware XML literals should be understandable by both the compiler and IDEs (e.g. no white box macros involved).
Existing libraries like ScalaTags should be able to support XML literals by adding a few simple wrapper classes. No macro or metaprogramming knowledge should be required of library authors.
The compiler should expose as few special names as possible, to avoid becoming intolerably ugly.
It should be possible to implement an API that builds a DOM tree at no more cost than manually written Scala code.

Non-goals

Embedding fully-featured standard XML in Scala.
Allowing arbitrary tag names and attribute names (or avoiding reserved words).
Distinguishing lexical differences, e.g. <a b = "c"></a> vs <a b="c"/>.

The proposal

Lexical Syntax

Kept unchanged from Scala 2.12

XML literal translation

The Scala compiler will translate XML literals to Scala AST before type checking.
The translation rules are:

Self-closing tags without prefixes

<tag-name />

will be translated to

xml.literal(
  xml.elements.`tag-name`()
)

Node list

<tag-name />
<prefix-1:tag-name />

will be translated to

xml.literal(
  xml.elements.`tag-name`(),
  `prefix-1`.elements.`tag-name`()
)

Attributes

<tag-name attribute-1="value"
          attribute-2={ f() }/>

will be translated to

xml.literal(
  xml.elements.`tag-name`(
    xml.attributes.`attribute-1`(xml.values.value),
    xml.attributes.`attribute-2`(xml.interpolation(f()))
  )
)
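To check that the translated attribute example type-checks against ordinary definitions, here is a minimal schema-aware sketch. The Attribute, Attr and Element types, the String-based representation of values, and the function f are all assumptions made for illustration; the proposal does not prescribe any of these signatures.

```scala
object SchemaAwareDemo {
  sealed trait Attribute
  final case class Attr(name: String, value: String) extends Attribute
  final case class Element(name: String, attributes: List[Attribute])

  // Hypothetical vendor object. Only the member names (elements, attributes,
  // values, interpolation, literal) come from the proposal.
  object xml {
    object values {
      val value: String = "value"
    }
    def interpolation(v: Any): String = v.toString
    object attributes {
      def `attribute-1`(v: String): Attribute = Attr("attribute-1", v)
      def `attribute-2`(v: String): Attribute = Attr("attribute-2", v)
    }
    object elements {
      def `tag-name`(attrs: Attribute*): Element =
        Element("tag-name", attrs.toList)
    }
    def literal(e: Element): Element = e
  }

  // Stand-in for the user function called inside the literal.
  def f(): Int = 42

  // The translation of <tag-name attribute-1="value" attribute-2={ f() }/>.
  val result: Element =
    xml.literal(
      xml.elements.`tag-name`(
        xml.attributes.`attribute-1`(xml.values.value),
        xml.attributes.`attribute-2`(xml.interpolation(f()))
      )
    )

  def main(args: Array[String]): Unit = println(result)
}
```

Because every tag and attribute is a plain method here, an unknown name such as xml.elements.`no-such-tag` would be rejected by the type checker, which is the schema-aware behavior the goals describe.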

CDATA

<![CDATA[raw]]> will be translated to xml.literal(xml.texts.raw) if the -Xxml:coalescing flag is on, or to xml.literal(xml.cdata("raw")) if the flag is turned off with -Xxml:-coalescing.

Processing instructions

<?xml-stylesheet type="text/xsl" href="style.xsl"?>

will be translated to

xml.literal(
  xml.processInstructions.`xml-stylesheet`("type=\"text/xsl\" href=\"style.xsl\"")
)

Child nodes

<tag-name attribute-1="value">
  text &amp; &#x68;exadecimal reference &AMP; &#100;ecimal reference
  <child-1/>
  <!-- my comment -->
  { math.random }
  <![CDATA[ raw ]]>
</tag-name>

will be translated to

xml.literal(
  xml.elements.`tag-name`(
    xml.attributes.`attribute-1`(xml.values.value),
    xml.texts.`$u000A  text `,
    xml.entities.amp,
    xml.texts.` hexadecimal reference `,
    xml.entities.AMP,
    xml.texts.` decimal reference$u000A  `,
    xml.elements.`child-1`(),
    xml.texts.`$u000A  `,
    xml.comment(" my comment "),
    xml.texts.`$u000A  `,
    xml.interpolation(math.random),
    xml.texts.`$u000A  `,
    xml.cdata(" raw "), //  or (xml.texts.` raw `), if `-Xxml:coalescing` flag is set
    xml.texts.`$u000A  `
  )
)

Note that hexadecimal and decimal character references are unescaped and translated to xml.texts automatically, while entity references are translated to fields in xml.entities.
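The unescaping of numeric character references can be sketched as follows. This is an illustration of the rule only, not the compiler's implementation; the unescape helper and its regular expression are assumptions. Named entity references such as &amp; are deliberately left alone, since the proposal routes them to xml.entities instead.

```scala
object CharRefDemo {
  // Matches &#xHH; (hexadecimal) and &#DD; (decimal) references.
  private val NumericRef = """&#(x[0-9A-Fa-f]+|[0-9]+);""".r

  // Replaces each numeric reference with the character it denotes.
  def unescape(text: String): String =
    NumericRef.replaceAllIn(text, m => {
      val body = m.group(1)
      val codePoint =
        if (body.startsWith("x")) Integer.parseInt(body.drop(1), 16)
        else Integer.parseInt(body, 10)
      // quoteReplacement guards against `$` and `\` in the replacement text.
      java.util.regex.Matcher.quoteReplacement(
        new String(Character.toChars(codePoint)))
    })

  def main(args: Array[String]): Unit =
    println(unescape("&#x68;exadecimal &amp; &#100;ecimal"))
    // prints hexadecimal &amp; decimal
}
```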

Prefixes without `xmlns` bindings.

<prefix-1:tag-name-1 attribute-1="value-1" prefix-2:attribute-2="value-2">
  <tag-name-2>content</tag-name-2>
  <!-- my comment -->
</prefix-1:tag-name-1>

will be translated to

xml.literal(
  `prefix-1`.elements.`tag-name-1`(
    `prefix-1`.attributes.`attribute-1`(`prefix-1`.values.`value-1`),
    `prefix-2`.attributes.`attribute-2`(`prefix-2`.values.`value-2`),
    `prefix-1`.texts.`$u000A  `,
    xml.elements.`tag-name-2`(
      xml.texts.content
    ),
    `prefix-1`.texts.`$u000A  `,
    `prefix-1`.comment(" my comment "),
    `prefix-1`.texts.`$u000A`
  )
)

Note that an unprefixed attribute is treated as if it had the same prefix as its enclosing element.

`xmlns` bindings.

<prefix-1:tag-name-1 xmlns="http://example.com/0" xmlns:prefix-1="http://example.com/1" xmlns:prefix-2="http://example.com/2" attribute-1="value-1" prefix-2:attribute-2="value-2">
  <tag-name-2>content</tag-name-2>
  <!-- my comment -->
</prefix-1:tag-name-1>

will be translated to

xml.literal(
  xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).elements.`tag-name-1`(
    xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).attributes.`attribute-1`(xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).values.`value-1`),
    xml.prefixes.`prefix-2`(xml.uris.`http://example.com/2`).attributes.`attribute-2`(xml.prefixes.`prefix-2`(xml.uris.`http://example.com/2`).values.`value-2`),
    xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).texts.`$u000A  `,
    xml.noPrefix(xml.uris.`http://example.com/0`).elements.`tag-name-2`(
      xml.noPrefix(xml.uris.`http://example.com/0`).texts.content
    ),
    xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).texts.`$u000A  `,
    xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).comment(" my comment "),
    xml.prefixes.`prefix-1`(xml.uris.`http://example.com/1`).texts.`$u000A`
  )
)

XML library vendors

An XML library vendor should provide a package or object named xml, which contains the following methods or values:

elements
attributes
values
entities
processInstructions
texts
comment
cdata
interpolation
noPrefix
prefixes
uris
literal

All of the above methods except literal should return a builder; literal turns one or more builders into an XML object or an XML node list.

An XML library user can switch between implementations by importing different xml packages or objects. scala.xml is used by default when no explicit import is present.

In a schema-aware XML library like Binding.scala, the elements, attributes, processInstructions and entities methods should return factory objects that contain all the definitions of available tag names and attribute names. An XML library user can provide additional tag names and attribute names in user-defined implicit classes for elements and attributes.
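A minimal sketch of such a user-defined extension, assuming the vendor exposes elements as an object. The Element type, the div factory and the my-widget tag are hypothetical.

```scala
object ExtensionDemo {
  final case class Element(name: String)

  // Hypothetical vendor-provided factory with a fixed set of tags.
  object xml {
    object elements {
      def div(): Element = Element("div")
    }
  }

  // A user-defined implicit class that adds one more tag name
  // without touching the vendor's code.
  implicit class CustomElements(private val e: xml.elements.type) {
    def `my-widget`(): Element = Element("my-widget")
  }

  val builtIn: Element = xml.elements.div()
  val custom: Element = xml.elements.`my-widget`()

  def main(args: Array[String]): Unit = println(List(builtIn, custom))
}
```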

In a schema-less XML library like scala-xml, elements, attributes, processInstructions and entities should return builders that extend scala.Dynamic in order to handle tag names and attribute names in selectDynamic or applyDynamic.
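A minimal sketch of the schema-less style, covering element and attribute construction only (patterns are not attempted). The Attr and Elem types and the applyDynamic signatures are assumptions for illustration and are not scala-xml's actual API.

```scala
import scala.language.dynamics

object SchemaLessDemo {
  final case class Attr(name: String, value: String)
  final case class Elem(name: String, attributes: List[Attr])

  // Hypothetical vendor object: any tag or attribute name is accepted,
  // because the name arrives as a string argument to applyDynamic.
  object xml {
    object attributes extends Dynamic {
      def applyDynamic(name: String)(value: String): Attr = Attr(name, value)
    }
    object elements extends Dynamic {
      def applyDynamic(name: String)(attrs: Attr*): Elem =
        Elem(name, attrs.toList)
    }
  }

  // Rewritten by the compiler to
  // xml.elements.applyDynamic("my-tag")(xml.attributes.applyDynamic("data-id")("7")).
  val e: Elem = xml.elements.`my-tag`(xml.attributes.`data-id`("7"))

  def main(args: Array[String]): Unit = println(e)
}
```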

These XML libraries can be extended with the help of standard XML namespace bindings. A plug-in author can create an implicit class for xml.uris to introduce foreign elements embedded in existing XML literals.

Known issues

Name clash

<toString/> or <foo equals="bar"/> will not compile due to name clashes with Any.toString and Any.equals.

A compilation error is the desired behavior in a schema-aware XML library as long as toString is not a valid name in the schema. Fortunately, unlike JSX, <div class="foo"></div> should compile because class is a valid method name.
A schema-less XML library user should instead explicitly construct new Elem("toString").

White space only text

Adowrath:

Should whitespace-only text be preserved, though? I’m asking this because, if it is preserved, this won’t work:
val a = <a>
  <b/>
</a>
a match { case <a><b/></a> => () }

Alternative approach

XML initialization can be implemented as a special string interpolation, xml"<x/>", which can be implemented in a macro library. The pros and cons of these approaches are listed in the following table:

	symbol-based XML literals in Scala 2.12	name-based XML literals in this proposal	`xml` string interpolation
XML is parsed by …	compiler	compiler	library, IDE, and other code browsers including Github, Jekyll (if syntax highlighting is wanted)
Is third-party schema-less XML library supported?	No, unless using white box macros	Yes	Yes
Is third-party schema-aware XML library supported?	No, unless using white box macros	Yes	No, unless using white box macros
How to highlight XML syntax?	By regular highlighter grammars	By regular highlighter grammars	By special parsing rule for string content
Can presentation compiler perform code completion for schema-aware XML literals?	No	Yes	No
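For comparison, the entry point of the string interpolation alternative can be sketched without macros. The Raw type and the XmlInterpolator class are hypothetical, and this version only concatenates strings at run time. A real library would use a macro to parse the XML at compile time, which is the part the table compares.

```scala
object InterpolatorDemo {
  // Hypothetical result type that just carries the assembled source text.
  final case class Raw(source: String)

  // Adds an `xml` method to StringContext so that xml"..." compiles.
  implicit class XmlInterpolator(private val sc: StringContext) {
    def xml(args: Any*): Raw = Raw(sc.s(args: _*))
  }

  val name = "x"
  val node: Raw = xml"<$name/>"

  def main(args: Array[String]): Unit =
    println(node) // prints Raw(<x/>)
}
```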

julienrf · August 23, 2019, 10:11am

I don’t think we need XML literals. Why not just use Scala methods (like scalatags or scala-dom-types)?

AMatveev · August 23, 2019, 3:28pm

It has been discussed in Proposal to remove XML literals from the language
See: Against the removal summary

yangbo · October 25, 2019, 2:22am

Since there has been an implementation for this proposal, shall we move forward? @sjrd

AMatveev · October 25, 2019, 8:40am

xml.tags.`prefix-1`.`tag-name`(
  xml.attributes.`attribute-1`(xml.text("value")),
  xml.attributes.`attribute-2`(xml.interpolation(f())),
  xml.attributes.`prefix-2`.`attribute-3`(xml.interpolation("value"))
)

IIUC it is the simplest way, which uses Scala dynamics.
But it has a disadvantage: there is no contract between the compiler and library vendors.
Would it be better to have well-defined traits that establish such a contract?

trait Tags {
  def apply(prefix: String, name: String)(atrs: Atr*): Tag
  ....
}

xml.tags("prefix-1", "tag-name")(
  xml.attributes("attribute-1")(xml.text("value")),
  xml.attributes("attribute-2")(xml.interpolation(f())),
  xml.attributes("prefix-2", "attribute-3")(xml.interpolation("value"))
)

It seems simpler for an XML vendor to implement, and at least there will be better documentation.

yangbo · October 25, 2019, 3:26pm

You can define the trait in a library. But keep in mind that the compiler should support type signatures other than your trait.

AMatveev · October 26, 2019, 10:33am

It seems more complicated with dynamics.

Why do you think dynamics are better than static typing in this case?

tarsa · October 26, 2019, 12:03pm

@AMatveev:
Name based doesn’t necessarily mean dynamically typed. For comprehensions in Scala are name based, but usually aren’t dynamically typed. You can implement a Monad typeclass for your type, but you don’t have to. It’s enough if your type has foreach, map, flatMap, etc. methods. They don’t need to be defined in any common trait.

@yangbo:
As for the proposal, I’m generally against :]
scalajs-react has good enough approach for me:

<.html(
  <.head(
    <.title("My website!")
  ),
  <.body(
    <.h1("Hello, world!"),
    <.p("this is my website")
  )
)

There’s also the Play framework template engine called Twirl, which is essentially an XML templating engine in which you can mix Scala code with XML content (as with XML literals). AFAIR IntelliJ Ultimate already has support for Twirl templates. Instead of changing the Scala language you can change the Twirl templating engine. Sample code:

<ul>
@for(p <- products) {
  <li>@p.name ($@p.price)</li>
}
</ul>

@if(items.isEmpty) {
  <h1>Nothing to display</h1>
} else {
  <h1>@items.size items!</h1>
}

@display(product: Product) = {
  @product.name ($@product.price)
}

<ul>
@for(product <- products) {
  @display(product)
}
</ul>

martijnhoekstra · October 26, 2019, 1:45pm

This objection would make sense to me if XML literals weren’t already part of the language.

But seeing that they are, this proposal makes the language more flexible and less dependent on blessed libraries. The alternatives you give don’t change that.

Katrix · October 26, 2019, 2:32pm

I would not say that Twirl is that good of a replacement here. It doesn’t give you fine-grained types IIRC; while it claims to be Scala-like, it also has many surprising behaviors; it requires IntelliJ Ultimate for IDE support; and even when you have that IDE support, it loves to stop working.

tarsa · October 26, 2019, 3:43pm

As I’ve suggested before - if Twirl has issues (like bugs or unintuitive behaviour) then you can improve Twirl without changing the Scala language.

Also there’s nothing stopping JetBrains (authors of IntelliJ) from disabling syntax completion in proposed XML literals in Community edition of IntelliJ IDEA.

yangbo · October 26, 2019, 4:01pm

Since this proposal is implemented before type checking, any IDE based on the presentation compiler should support XML literals automatically.

Metals supports the current macro-based implementation at the moment.

tarsa · October 26, 2019, 4:49pm

The IntelliJ Scala plugin has its own Scala language parser. JetBrains plan to take a hybrid approach, i.e. use some LSP functionality to enrich assistance, but they don’t plan to give up on their parser. That custom parser will be the foundation of the IntelliJ Scala plugin for the foreseeable future.

sjrd · November 14, 2019, 5:18pm

The last SIP meeting ended up being private-only, but we did discuss this SIP. We will discuss again in public on November 27th, but here is nevertheless the gist of the feedback from the SIP committee.

Overall, there seems to be agreement that the name-based approach can be good. However, there is still a lot of reluctance in supporting first-class XML literals in the language. Instead, we would like to see this implemented as a library using a string interpolator macro. Since pattern matching does not need to be supported, it should be possible to do this both in Scala 2 with scala-reflect macros and in Scala 3 with inline, quotes and splices.

Therefore, the feedback of the committee is that the author (or anyone else) is encouraged to try and implement the proposal as a macro library. @yangbo Given your track record as a macro implementer, it also seems to us that you should be more than qualified to do so.
