SIP: Curried varargs

yangbo · August 10, 2019, 11:51pm

History

Date	Version
Aug 11th 2019	Initial Draft
Aug 12th 2019	Translating sequence arguments to `applyNextSeq` calls

Introduction

The repeated parameters syntax is widely used in Scala libraries to create collection initializers, string interpolations, and DSLs. Unfortunately, repeated parameters are type unsafe as it erase all arguments to their common supertype, inefficient as it creates a temporary Seq that is difficult to be eliminated by optimizer. In practice, all sophisticated string interpolation libraries, including string formatting and quasiquotes in standard library, scalameta and my fastring library, are written in macros in order to avoid runtime overhead of repeated parameters.

We propose curried varargs to improve both the type safety and the performance. Given a function call f(a, b, c), when f is a subtype of Curried, the function call should be rewritten to f.applyBegin.applyNext(a).applyNext(b).applyNext(c).applyEnd.

Motivating Examples

Examples

Recently I was working on the implementation of Pre SIP: name based XML literals. During implementing that proposal, I found that the proposal is inefficiency due to repeated parameters, and it could be improved dramatically with the help of curried functions.

For example, according to the proposal the XML literal <div title="my-title">line1<br/>line2</div> will result the following code:

xml.tags.div(
  xml.attributes.title(xml.text("my-title")),
  xml.text("line1"),
  xml.tags.br(),
  xml.text("line2")
)

With the help of this curried varargs proposal and @inline, we are able to implement an API to build a DOM tree with no additional overhead over manually written Scala code.

import org.scalajs.dom.document
import org.scalajs.dom.raw._

object xml {
  type Attribute[-A <: Element] = A => Unit
  @inline def text(data: String) = data
  object attributes {
    @inline def title(value: String): Attribute[Element] = _.setAttribute("title", value)
  }
  object tags {
    class Builder[+E <: Element](private val element: E) extends AnyVal with Curried {
      @inline def applyBegin = this
      @inline def applyNext(text: String) = {
        element.appendChild(document.createTextNode(text))
        this
      }
      @inline def applyNext(node: Node) = {
        element.appendChild(node)
        this
      }
      @inline def applyNext[A <: Attribute[E]](attribute: A) = {
        attribute(element)
        this
      }
      @inline def applyEnd = element
    }
    @inline def div = new Builder(document.createElement("div"))
    @inline def br = new Builder(document.createElement("br"))
  }
}

Since xml.tags.div returns a Builder, which is a subtype of Curried, calls on xml.tags.div will be translated to the curried form, as shown below:

xml.tags.div
  .applyBegin
  .applyNext(xml.attributes.title(xml.text("my-title")))
  .applyNext(xml.text("line1"))
  .applyNext(xml.tags.br.applyBegin.applyEnd)
  .applyNext(xml.text("line2"))
  .applyEnd

When the above code is compiled in Scala.js, the builders should be eliminated entirely as a zero cost abstraction layer, and the output JavaScript is tiny as shown below:

var $$this = $m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createElement("div");
$$this.setAttribute("title", "my-title");
$$this.appendChild($m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createTextNode("line1"));
var $$this$1 = $m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createElement("br");
$$this.appendChild($$this$1);
$$this.appendChild($m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createTextNode("line2"));

Comparison Examples

The Builder API can be also implemented in repeated parameters:

import org.scalajs.dom.document
import org.scalajs.dom.raw._

object xml {
  type Attribute[-A <: Element] = A => Unit
  @inline def text(data: String) = data
  object attributes {
    @inline def title(value: String): Attribute[Element] = _.setAttribute("title", value)
  }
  object tags {
    class Builder[+E <: Element](private val element: E) extends AnyVal {
      @inline def apply(attributesAndChildren: Any*) = {
        attributesAndChildren.foreach {
          case text: String =>
            element.appendChild(document.createTextNode(text))
          case node: Node =>
            element.appendChild(node)
          case attribute: Attribute[E] =>
            attribute(element)
        }
        element
      }
    }
    @inline def div = new Builder(document.createElement("div"))
    @inline def br = new Builder(document.createElement("br"))
  }
}

However, the Scala compiler is unable to optimize repeated parameters, as a result, the output JavaScript from Scala.js would look like the below code.

var $$this$1 = $m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createElement("div");
var this$3 = $m_LScalaFiddle$xml$attributes$();
var jsx$1 = new $c_sjsr_AnonFunction1().init___sjs_js_Function1((function($this, value) {
  return (function(x$1$2) {
    x$1$2.setAttribute("title", value)
  })
})(this$3, "my-title"));
var $$this = $m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createElement("br");
var array = [jsx$1, "line1", $$this, "line2"];
var i = 0;
var len = $uI(array.length);
while ((i < len)) {
  var index = i;
  var arg1 = array[index];
  if ($is_T(arg1)) {
    var x2 = $as_T(arg1);
    $$this$1.appendChild($m_Lorg_scalajs_dom_package$().document__Lorg_scalajs_dom_raw_HTMLDocument().createTextNode(x2))
  } else if ($uZ((arg1 instanceof $g.Node))) {
    $$this$1.appendChild(arg1)
  } else if ($is_F1(arg1)) {
    var x4 = $as_F1(arg1);
    x4.apply__O__O($$this$1)
  } else {
    throw new $c_s_MatchError().init___O(arg1)
  };
  i = ((1 + i) | 0)
};

Despite of the type safety issue due to the usage of Any, the above code are inefficient:

Unnecessary temporary object for the xml.attributes.title(xml.text("my-title")).
Unnecessary temporary Seq to hold repeated parameters.
Unnecessary runtime type check for each argument.

The similar issues can be found in many other usage of repeated parameters. For example, Scala string interpolation is inefficient due to its internal vararg function call, unless implementing it in a macro; Scala collection initializers (e.g. List(1, 2, 3)) create unnecessary temporary Seq before creating the desired collection.

Design

This proposal introduces a new type Curried defined as following:

trait Curried extends Any

When a function call f(p1, p2, p3, ... pn) is being type checked, the compiler will firstly look for apply method on f. If an applicable apply method is not found and f is a subtype of Curried, the compiler will convert the function call to curried form f.applyBegin.applyNext(p1).applyNext(p2).applyNext(p3) ... .applyNext(pn).applyEnd, and continue type checking the translated call.

Expanding sequence argument

Optionally, some arguments to a Curried call may be a sequence argument marked as _*. Those are arguments should be translated to applyNextSeq calls instead of applyNext. For example, f(p1, s1: _*, p2) will be translated to the following code.

f.applyBegin
  .applyNext(p1)
  .applyNextSeq(s1)
  .applyNext(p2)
.applyEnd

Unlike traditional repeated parameters, which restrict the sequence argument at the last position, sequence arguments in a curried call are allowed at any position.

Builder type shifting

The type of partially applied function might be changed during applying each argument. Given the following type signature:

class ListBuilder[A] {
  def applyNext[B >: A](b: B): ListBuilder[B] = ???
  def applyNextSeq[B >: A](seqB: Seq[B]): ListBuilder[B] = ???
  def applyEnd: List[A] = ???
}
object List extends Curried {
  def applyBegin[A]: ListBuilder[A] = ???
}

List(42, "a") should be translated to List.applyBegin.applyNext(42).applyNext("a").applyEnd. Then, the typer will infer type parameters as List.applyBegin[Nothing].applyNext[Int](42).applyNext[Any]("a").applyEnd, therefore the final return type of applyEnd will be List[Any].

Explicit type parameters

When a Curried is invoked with some type arguments, those type arguments will be moved to the applyBegin method. Therefore, List[Int](1 to 3: _*) should be translated to List.applyBegin[Int].applyNextSeq(1 to 3).applyEnd.

Implicit parameters

A more common form of curried function call would be like f(a)(b)(c). We prefer the explicit named method calls to applyNext instead of the common form, in order to support implicit parameters in applyNext. Therefore, each explicit parameter might come with an implicit parameter list, resolving the infamous multiple type parameter lists issue.

Multiple curried vararg parameter lists

When a Curried is invoked with multiple parameter lists, for example:

f(a, b, c)(d, e)

Then the first parameter list should be translated to a curried call:

f.applyBegin
  .applyNext(a)
  .applyNext(b)
  .applyNext(c)
.applyEnd(d, e)

(d, e) is translated to the curried form only if applyEnd returns a Curried.

Overloaded curried calls

Curried varargs enables overloaded functions for each parameter. Parameters will not be erased to their common supertype.

Implementation

This proposal can be implemented either in the Scala compiler or in a whitebox macro. Curried.scala is an implementation of the proposal in a whitebox macro.

Alternatives

Repeated parameters

Repeated parameters are packed into a Seq, which is then passed to the callee.

Pros

Interoperable with Java

Cons

Always boxing value class parameters
Unable to inline function parameters
Unable to inline call-by-name parameters
Unable to perform implicit conversion for each parameter
Unable to infer context bound for each parameter
Erasing all parameters to their common super type

Reference

morgen-peschke · August 11, 2019, 4:20am

I really like this idea.

How do you propose something like this would desugar?

foo(seq: _*)

Maybe a synthetic tail recursive helper calling applyPartially over each element until the sequence is consumed? That would free the author from implementing this manually, which is a nice bonus, or should it be like unapplySeq and require the author to explicitly handle this case?

yangbo · August 11, 2019, 5:26am

Thank you for the feedback. I did not thought about _* before. Now I added an Expanding sequence argument section to address the use case that you mentioned.

LPTK · August 11, 2019, 7:19am

I remember finding out, after some profiling, that my program was spending a huge amount of time just creating arrays wrapped in Seq and tearing them down immediately just because I used List(x) instead of x :: Nil (and the like). Now I’m instinctively afraid of varargs and always use :: even though it’s uglier.

I like this proposal because it could be used to make List.apply not stupidly inefficient, compared to using :: and Nil.

tarsa · August 11, 2019, 9:06am

One solution that already works is just encoding curried varargs explicitly as curried varargs:

object FastList {
  def main(args: Array[String]): Unit = {
    println(FastListBuilder.done())           // List()
    println(FastListBuilder(1).done())        // List(1)
    println(FastListBuilder(1)(2).done())     // List(1, 2)
    println(FastListBuilder(1)(2)(3).done())  // List(1, 2, 3)
  }
}

class FastListBuilder[A] private {
  private val listBuffer = scala.collection.mutable.ListBuffer[Any]()

  def apply[B >: A](nextElem: B): FastListBuilder[B] = {
    listBuffer += nextElem
    this.asInstanceOf[FastListBuilder[B]]
  }

  def done(): List[A] = listBuffer.toList.asInstanceOf[List[A]]
}

object FastListBuilder {
  def done(): List[Nothing] =
    new FastListBuilder().done()

  def apply[A](firstElem: A): FastListBuilder[A] =
    new FastListBuilder()(firstElem)
}

A relatively big drawback of this encoding is somewhat ugly client code. Instead of FastListBuilder(1, 2, 3, 4, 5) we need to have e.g. FastListBuilder(1)(2)(3)(4)(5).done() Such syntax is ugly enough that probably nobody uses it for the most common use case which is building collections.

Another drawback is exposed mutability. Each consecutive apply mutates the collection builder and we lose referential transparency.

yangbo · August 11, 2019, 3:25pm

I would not consider “ugly” as a drawback. Anyway, you could use some pretty symbols for explicit currying, like List < 1 :+ 2 :+ 3 >. The drawbacks I see are:

Explicit currying is not source level compatible with existing Scala code.
Currying in Scala is more verbose than currying in Haskell or OCaml in term of syntax.
Explicit curried collection initializers loss the symmetry with pattern matching.

szeiger · August 12, 2019, 9:52am

Do you have any performance numbers for this design? In theory it should require fewer allocations but that may not matter if Hotspot can elide the allocations of short-lived wrapper objects entirely.

The proposed translation of : _* is problematic. Currently this is the most efficient way of calling a varargs method. Simply pass an existing Seq and it becomes a straight-forward method call with no extra allocations. With this proposal it would become the least efficient way: It always requires recreating the sequence, plus the external foldLeft call makes it impossible for the curried varargs method to optimize based on the existing size or array-based internals of the sequence.

It would be nice if the proposal also addressed pattern matching. Can/should this be done by currying or does the new scheme keep the existing varargs pattern matching, thus making it less uniform.

morgen-peschke · August 12, 2019, 3:11pm

At alternate way of translating :_* could be to translate it to a method with a default implementation,
so it could be overridden if there’s a more efficient way of handling it known to the programer.

For example:

f(a, b, c:_*)

Would be translated into this:

f.applyBegin
  .applyNext(a)
  .applyNext(b)
  .applyNextSeq(c)
  .applyEnd

The default implementation of applyNextSeq could be something along these lines:

def applyNextSeq[A](as: Seq[A]): ListBuilder[A] = as.foldLeft(this)(_.applyNext(_))

But, if the programmer knows a better way of doing this, it could be overridden (using the ListBuilder example from above):

class ListBuilder[A] {
  def applyNext[B >: A](b: B): ListBuilder[B] = ???
  def applyNextSeq[B >: A](b: Seq[B]): ListBuilder[B] = ???
  def applyEnd: List[A] = ???
}
object List extends Curried {
  def applyBegin[A]: ListBuilder[A] = ???
}

I could be missing something, but this is basically a simple desugaring, and I think that changing pattern matching to be isomorphic would require deeper changes to the compiler.

yangbo · August 12, 2019, 10:48pm

I think the correspondence between pattern matching and initializers in Scala is meant to be achieved at the grammar level. I did not see the symmetry at the bytecode level between unapply, unapplySeq and apply.

This proposal does not affect the correspondence at the grammar level. In fact, the original intention of this proposal is to keep the same translation rule for both pattern matching and initializers in name based XML literals when optimizing the performance of that proposal. Explicit currying cannot achieve the goal.

yangbo · August 12, 2019, 10:56pm

I like your idea, and I have updated the proposal for applyNextSeq. applyNextSeq also resolves the type inference problem in Scala 2 due to foldLeft.

yangbo · August 12, 2019, 11:15pm

There are some discussion at https://github.com/scala/scala/pull/6093, indicating how traditional varargs hurt the performance. Note that this proposal enables zero cost abstractions, eliminating all wrappers at compile time, with the help of @inline annotations and value classes, achieving similar performance to macro generated code.

morgen-peschke · August 13, 2019, 4:48pm

One thing that really interests me about this proposal is that, combined with shapeless’s HList, it could provide a way to cleanly abstract over arity by using applyNext to build the HList, before doing something with it in the applyEnd

@milessabin could probably come up with something better, but this looks workable:

class ArbitraryArityFunction[Args <: HList, RetVal](f: Args => RetVal) extends Curried {
  def applyBegin: HListBuilder[HNil] = new HListBuilder[HNil](HList())
}
object ArbitraryArityFunction {
  class HListBuilder[In <: HList, Args <: HList, RetVal](val in: In, f: Args => RetVal) {
    def applyNext[S](suffix: S)(implicit P: hlists.Prepend[In, S]) : HListBuilder[P.Out] = 
      new HListBuilder[P.Out](in :+ suffix, f)

    def applyEnd(implicit ev: In =:= Args): RetVal = f(in)
  } 
}

milessabin · August 14, 2019, 9:08am

@morgen-peschke shapeless has several gadgets which are probably relevant to your example. It has type classes which can convert between arbitrary arity functions and single argument functions which take an HList argument. shapeless’s ProductArgs and RecordArgs traits might also be relevant.

I think we can most likely do better in Dotty.

Jasper-M · August 14, 2019, 9:37am

Though if dotty removes all autotupling you lose a tool for arity abstraction.

yangbo · August 24, 2019, 6:30pm

It been two weeks since the initial post. I have created a pull request for this proposal https://github.com/scala/docs.scala-lang/pull/1487 . @sjrd Shall we move forward?

yangbo · August 26, 2019, 8:24am

Dotty’s name based pattern matching extract values by _1 to _N methods. If we instead use head and tail to extract values, then pattern matching becomes “curried” automatically. The change will not break unapply methods that return Option[TupleN], because in Dotty a Tuple is just an HList. This head/tail based pattern matching also unifies unapply and unapplySeq, as it is compatible with Seq and enables rest pattern for tuples as well.

The only drawback is that it does not perform well for collections that has a slow tail method (e.g. most of mutable collections). Even if an unapply author returns a view of a mutable collection, it still creates more temporary objects than the current Dotty implementation.

ghik · August 28, 2019, 8:09am

I really like this proposal.

Apart from applyNext and applyNextSeq, I would also like to see applyNextNamed which would be used in case of named arguments.

class StringMapBuilder[V] {
  def applyNextNamed[V0 >: V](key: String, value: V0): StringMapBuilder[V0] = ???
  def applyEnd: Map[String,V] = ???
}
object StringMapBuilder extends Curried {
  def applyBegin[V]: StringMapBuilder[V]
}

makeMap(a = 1, b = 2)
would translate to
makeMap.applyBegin.applyNextNamed("a", 1).applyNextNamed("b", 2).applyEnd

Also, I think it would be good to bring this work under some common denominator with scala.Dynamic

yangbo · August 28, 2019, 8:54am

If applyNextNamed is added to this proposal, I would suggest to deprecate all methods except selectDynamic in Dynamic, making Curried and Dynamic be orthogonal.

ghik · August 28, 2019, 9:30am

Technically they are not methods which can be deprecated, because there’s no fixed signature. Also, updateDynamic would also have to stay. But we could possibly get rid of applyDynamic and applyDynamicNamed.

yangbo · August 28, 2019, 5:18pm

Currently, a.f = v is translated to a.updateDynamic("f")(v) if a is a Dynamic, and a.f(b, c) = v is converted to a.selectDynamic.update(a, b, c). If a.f.update is a Curried, it should be further translated to a.f.update.applyBegin.applyNext(a).applyNext(b).applyNext(v).applyEnd.

The rule is not ideal for two reason:

Inconsistency between assignment with or without arguments.
As a library author, you cannot distinguish indices from the value to assign. Suppose you have an N-dimensional array of int where N is determined at runtime, when applyNext is invoked with an Int, it does not know whether it is an index of a value.

I wonder if we could separate indices from the value, like, translating a(b, c) = v to a.update(b, c)(v).