Whitebox def macros

olafurpg · October 9, 2017, 10:14am

This post is a follow-up to "Roadmap towards non-experimental macros | The Scala Programming Language", and is meant to start a discussion on whether whitebox def macros should be included in an upcoming SIP on macros. Please read the blog post for context.

Whitebox macros are similar to blackbox def macros with the distinction that the result type of whitebox def macros can be refined at each call-site. The ability to refine the result types opens up many applications including

fundep materialization, used by shapeless Generic
extractor macros, used by quasiquotes in scala.reflect, scala.meta, and scala.macros
anonymous type providers

To give an example of how blackbox and whitebox macros differ, imagine that we wish to implement a macro to convert case classes into tuples.

import scala.macros._
object CaseClass {
  def toTuple[T](e: T): Product = macro { ??? }
  case class User(name: String, age: Int)
  // if blackbox: expected (String, Int), got Product
  // if whitebox: OK
  val user: (String, Int) = CaseClass.toTuple(User("Jane", 30))
}

As you can see from this example, whitebox macros are more powerful than blackbox def macros.
A whitebox macro that declares its result type as Any can have its result type refined to any precise type in the Scala typing lattice. This powerful capability opens up questions. For example, do implicit whitebox def macros always need to be expanded in order to be disqualified as a candidate during implicit search?
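For comparison, here is a minimal macro-free sketch of how the precise result type in the toTuple example can be recovered when an implicit instance is written by hand. The names ToTuple, Aux and userToTuple are hypothetical and not part of any library. As far as I understand it, fundep materialization is about letting a whitebox implicit macro generate such instances, so that the Out type is determined by T without the boilerplate.

```scala
object ToTupleDemo {
  case class User(name: String, age: Int)

  // Type class whose `Out` member is functionally determined by `T`.
  trait ToTuple[T] {
    type Out
    def apply(t: T): Out
  }

  object ToTuple {
    type Aux[T, O] = ToTuple[T] { type Out = O }

    // Hand-written instance (hypothetical); a whitebox implicit macro
    // could generate one of these per case class instead.
    implicit val userToTuple: Aux[User, (String, Int)] =
      new ToTuple[User] {
        type Out = (String, Int)
        def apply(u: User): (String, Int) = (u.name, u.age)
      }
  }

  // Dependent method type: the result type comes from the resolved instance.
  def toTuple[T](t: T)(implicit tt: ToTuple[T]): tt.Out = tt(t)

  // Type-checks with the precise tuple type, without any macro.
  val user: (String, Int) = toTuple(User("Jane", 30))
}
```

The cost of this approach is one instance per case class, which is exactly the repetition that macro-based materialization removes.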

Quoting Eugene Burmako from SIP-29 on inline/meta, which contains a detailed analysis on “Losing whiteboxity”:

The main motivation for getting rid of whitebox expansion is simplification -
both of the macro expansion pipeline and the typechecker. Currently, they
are inseparably intertwined, complicating both compiler evolution and tool
support.

Note, however, that the portable design of macros v3 (presented in http://scala-lang.org/blog/2017/10/09/scalamacros.html) should in theory make it possible to infer the correct result types for whitebox macros in IDEs such as IntelliJ.

Quoting the minutes from the Scala Center Advisory Board:

Dotty, he [Martin Odersky] says, wants to be a “capable language” rather than
a “language toolbox”. So it matters whether whitebox macros are being used to do
“Scala-like” things, or to turn Scala into something else. So “we will have
to look at each one” of the ways whitebox macros are being used.

Adriaan Moors, the Scala compiler team lead at Lightbend, agreed with Martin and mentioned a current collaboration with Miles Sabin to improve scalac so that Shapeless and other libraries can rely less on macros and other nonstandard techniques.

What do you think: should whitebox def macros be included in the macros v3 SIP? In particular, please try to answer the following questions:

towards what end do you use whitebox def macros?
why are whitebox def macros important for you and your users?
can you use alternative metaprogramming techniques such as code generation scripts or compiler plugins to achieve the same functionality? How would that refactoring impact your whitebox macro?

LPTK · October 9, 2017, 1:11pm

Thanks a lot to Ólafur, Eugene and the Scala Center in general for setting up such a thorough and transparent process.

Here are my personal thoughts, as an extensive user of macros:

Quasiquotes

The first use-case for whitebox macros that comes to mind is of course quasiquotes, because we often want what is quoted to influence the typing of the resulting expression. This is invaluable when one wants to design type-safe quasiquote-based interfaces. For example, see the Contextual library. Haskell has similar capabilities thanks to Template Haskell.

Type-Safe Metaprogramming

This extends the point above, but it goes much further.

We have been working on Squid, an experimental type-safe metaprogramming framework that makes use of quasiquotes as its primary code manipulation tool. Squid quasiquotes are statically-typed and hygienic. For example { import Math.pow; code"pow(0.5,3)" } has type Code[Double] and is equivalent to code"_root_.Math.pow(0.5,3)".

(You can read more about Squid Code quasiquotes in our upcoming Scala Symposium paper: Type-Safe, Hygienic, and Reusable Quasiquotes.)

The main reasons for using whitebox quasiquote macros here are:

to enable pattern matching: we have an alternative code{pow(0.5,3)} syntax that could be a blackbox, but it doesn’t work in patterns (while the quasiquoted form works); making patterns more flexible might be a way to solve this particular point;
to enable type-parametric matching: one can write things like pgrm.rewrite{ case code"Some[$t]($x).get" => x }. This works thanks to some type trickery, namely it generates a local module t that has a type member t.Typ, and types the pattern code using that type, extracting an x variable of type Code[t.Typ]. This is somewhat similar to the type providers pattern. The rewrite call itself is also a macro that, among other things, makes sure that rewritings are type-preserving.
to enable extending Scala’s type system: we have an alternative ir quotation mechanism that is contextual in the sense that quoted term types have an additional context parameter. This (contravariant) type parameter expresses the term’s context dependencies/requirements. The term val q = ir"(?x:Int).toDouble" introduces a free variable x and thus has type IR[Double,{val x:Int}], where the second type argument expresses the context requirement. (IR stands for Intermediate Representation.) The expression code"(x:Int) => $q + 1" has type IR[Int => Double,{}] because the free variable x in q was captured (this is determined statically). That term can then safely be run (using its .run method, which requires implicit evidence that the context is empty, C =:= {}). Thus we “piggyback” on Scala’s type checker in a modular way to provide our own user-friendly safety checking that would be very hard to express using vanilla Scala.
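To make the context-requirement idea concrete, here is a toy model in plain Scala. It is not Squid's implementation: the IR class below is a hypothetical stand-in that wraps a thunk instead of a quoted tree, and it uses AnyRef <:< C as the emptiness evidence (an assumption made so that the contravariant parameter type-checks) rather than the C =:= {} mentioned above.

```scala
object ContextDemo {
  // Toy stand-in for a quoted term: result type T, context requirement C.
  // C is contravariant, as in the description above.
  final class IR[+T, -C](compute: () => T) {
    // Callable only when the empty context (AnyRef, i.e. `{}`) satisfies C.
    def run(implicit ev: AnyRef <:< C): T = compute()
  }

  // A term with no free variables: its requirement is the empty context.
  val closed: IR[Int, AnyRef] = new IR[Int, AnyRef](() => 42)

  // A term that still requires an `x: Int` from its context.
  val open: IR[Int, AnyRef { val x: Int }] =
    new IR[Int, AnyRef { val x: Int }](() => 0)

  val result: Int = closed.run
  // open.run  // rejected: no evidence that AnyRef <:< AnyRef { val x: Int }
}
```

What the toy cannot do is compute the context type from the quoted code itself; that is the part that needs a whitebox quasiquote macro.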

As you have guessed, this relies on invoking the compiler from within the quasiquote macro. I understand that this is technically tricky and makes type-checking “inseparably intertwined” with macro expansion, but on the other hand that’s also an enormous advantage. If it’s possible to sanitize the interface between macros and type-checkers, that would give Scala a very unique capability that puts it in a league of its own in terms of expressivity –– basically, the capability to have an extensible type system.

Could Squid’s quasiquotes be made a compiler plugin? Probably, though I’m not knowledgeable enough to answer with certainty, and I suspect it would be very hard to integrate these changes right into the different versions of Scala’s type checker.

As an aside, in Squid we also came up with the “object algebra interface” way to make language constructs expressed in the quasiquotes independent from the actual intermediate representation of code used. This seems similar to the way the new macros are intended to work –– the main difference being that we support only expressions (not class/method definitions).

The Dynamic trait

I think the usage of the Dynamic trait becomes extremely limited (from a type-safe programming point of view) if we don’t have a way to refine the types of the generated code based on the strings that are passed to its methods selectDynamic & co. (doing so is apparently even known as the “poor man’s type system”).

If that is possible to do in a sane way, I could not recommend going with that possibility enough!
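As a small illustration of the limitation, here is a sketch of a Dynamic class without macros (Record is a hypothetical name). Every selectDynamic call has the same declared result type, so the caller has to cast, and nothing checks the field name at compile time.

```scala
import scala.language.dynamics

object DynamicDemo {
  final class Record(fields: Map[String, Any]) extends Dynamic {
    // Without a whitebox macro the result type is fixed for every name.
    def selectDynamic(name: String): Any = fields(name)
  }

  val rec = new Record(Map("name" -> "Jane", "age" -> 30))

  // The compiler rewrites rec.name to rec.selectDynamic("name").
  val name: Any = rec.name
  // val n: String = rec.name  // does not compile: found Any, required String

  // The caller has to cast, and a misspelled field only fails at runtime.
  val age: Int = rec.age.asInstanceOf[Int]
}
```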

olafurpg · October 10, 2017, 8:26am

Thank you for your detailed response @LPTK

In Squid, do you rely on fundep materialization (see Implicit Macros | Macros | Scala Documentation)? There may be a design space between blackbox and whitebox def macros that supports refined result types but not fundep materialization.

I suspect it would be very hard to integrate these changes right into the different versions of Scala’s type checker.

I suspect so too, we face the same challenges designing a macro system that works reliably across different compilers

The Dynamic trait

That is a good observation. I am not sure how common this technique is. I have contacted the author of scalikejdbc to share how they use selectDynamic with whitebox def macros.

Also, not sure how Rethink Structural Types · Issue #1886 · lampepfl/dotty · GitHub may impact this.

the capability to have an extensible type system.

Note that this may not necessarily be a desirable capability. Some whitebox def macros are so powerful they can be used to turn Scala into another language!

retronym · October 10, 2017, 8:51am

I’d like a way for whitebox macro authors to be able (although not necessarily obliged) to separate the part of the macro that computes the return type from the part that computes the expanded term. Let’s call the first part “signature macros”.

For implicit macros, this would lend itself to more efficient typechecking. Even for non-implicit macros, IDEs could be more efficient if they could just run the “signature macro”.

I think that this separation also will help to shine a light on whether the full Scala language is the right language for signature macros, or if a more restrictive language could express a broad set of use cases of whitebox macros.

I suppose the contract would be that if the signature macro returned a type and no errors, the corresponding term expansion macro would be required to succeed and to conform to the computed return type.

Obviously a naive implementation of the signature macro is to just run the term macro and typecheck it, as per the status quo. I think we should aim higher than that, though!

heathermiller · October 10, 2017, 9:04am

Ryan Culpepper suggested essentially the same thing as what you call “signature macros” two weeks ago! …Really glad to hear this suggestion; it means that at least a subset of us are thinking along the same lines.

cc/ @olafurpg

LPTK · October 10, 2017, 9:13am

Not currently. We had a prototype system that perhaps did something like that (not sure): it was a system for statically generating evidence that structural types did not contain certain names or were disjoint in terms of field names. For example, you could write def foo[A,B](implicit dis: A <> B) meaning that A and B are structural types that share no field names. You could then call foo[{val x:Int},{val y:Double}] but not foo[{val x:Int},{val x:Double}]. When extending an abstract context C as in C{val x:Int}, the contextual quasiquote macro would look for evidence that C <> {def x} to ensure soundness in the face of name clashes.
However, instead of porting that old prototype to the current system, we’re probably going to move to a more modular solution, which shouldn’t need any implicit macros.

There is one particularly nasty thing that a Squid implicit macro currently does: it looks inside the current scope to see if it can find some type representation evidence. This allows us to use an extracted type t implicitly as in case ir"Some[$t]($x)" => ... implicitly[t.Typ] ... instead of having to write case ir"Some[$t]($x)" => implicit val t_ = t; ... implicitly[t.Typ] .... I understand this is probably asking macros for too much, and I think we could do without it (though it may degrade the user experience a little).

About Dynamic, one of the things I’ve used it for was to automatically redirect method calls to some wrapped object (cf. composition vs inheritance style).

Yeah, it’s a judgement call. IMHO Scala is already a language that lets you define a myriad different sub-languages thanks to its flexible syntax and expressive type system. I think that’s one thing many people like about the language (cf., for example, the vast ecosystem of SQL/data analytic libraries that define their own custom syntaxes and semantics).

LPTK · October 10, 2017, 9:28am

Sounds like the most natural way to do it would be to just have type macros. Then whitebox macros are just blackbox macros with a return type that is a macro invocation.

def myWhitebox[A](a: A, str: String): MyReturn[A, str.type] = macro ...
type MyReturn[A, S <: String with Singleton] = macro ...

It’s a nice separation of concerns. But I’m afraid there are a lot of whitebox macros in the wild where both code generation and type refinement are very much intertwined, because they’re semantically inseparable. In the case of Squid, what I’d do is to parametrize the current macro to either just compute a type or do the full code generation; but that would mean a lot of computation would be duplicated (I would have to parse, transform, typecheck and analyse the quasiquote string in both type signature and code-gen macro invocations), and batch compile times would be strictly worse.

ethan.atkins · October 11, 2017, 6:22pm

To add onto what @LPTK wrote, I’d speculate that there are very few whitebox macros for which the signature macro could be easily separated from the term macro without a lot of code duplication and/or redundant work. An alternative approach may be to conflate the signature macro and the term macro. The macro expansion could return a tuple of (List[c.Type], c.Tree) where the list of types must contain exactly as many types as there are method type arguments. For example, suppose that I want to implement the CaseClass.toTuple[T] method from above.

object CaseClass {
  def toTuple[C, T](cls: C): T = macro CaseClassMacros.impl[C, T]
}
class CaseClassMacros(val c: Context) {
  import c.universe._
  def impl[C: c.WeakTypeTag, T](cls: c.Expr[C]): (List[c.Type], c.Tree) = {
    ...
    val tree = q"""..."""
    val tType: c.Type = ???
    val resultTypes = List(weakTypeOf[C], tType)
    (resultTypes, tree)
  }
}

The typechecking of the returned tree could be deferred until after the compiler has verified that result types are valid. There would be no need to re-expand the macro using the result types since the type T is a functional dependency of C.

While this is less conceptually elegant than having independent signature and term macros, I think that it would be more practical for macro authors.

odersky · October 13, 2017, 1:09pm

When thinking about macros I have found it useful to consider two dimensions:

First dimension: What is the expressive power of the macro language?

Inlining only
Purely functional, interpreted subset of the language with heavy sandboxing
Full power of the underlying (compiled) language

Second dimension. When should this power be available?

Only in a specialized version of the language
In every build
After every editor/IDE keystroke

Scala with whitebox macros is currently at the extreme point (3, 3) of the matrix. This is IMO a very problematic point to be on. Having the full power of the underlying language at your disposal means your editor can (1) crash, (2) become unresponsive, or (3) pose a security risk, just because some part of your program is accessing a bad macro in a library. That’s not hypothetical. I still remember the very helpful(?) Play schema validation macro that caused all IDEs to freeze.

Scala with blackbox macros is at (3, 2). This is slightly better as only building but not editing is affected by bad macros and you can do a better job of isolating and diagnosing problems. But it still would make desirable tools such as a compile server highly problematic because of security concerns.

If we take other languages as comparisons they tend to be more conservative. Template Haskell lets you do lots of stuff, but it is its own language. I believe that was a smart decision of the Haskell designers. Meta OCaml is blackbox only and does not have any sort of inspection, so it’s essentially compile-time staging and nothing else.

So, if Scala continued to have whitebox macros it would indeed be far more powerful than any other language. Is that good or bad? Depends on where you come from and what you want to do, for sure. But I will be firmly in the “it would be very bad” camp. In the future, I want to concentrate on making Scala a better language, with better tooling, as opposed to a more powerful toolbox in which people can write their own language. There’s nothing wrong with toolboxes, but it’s not a primary goal of Scala as I see it.

Given this dilemma, maybe there’s no single solution that satisfies all concerns. That was the original motivation of the inline/meta proposal in SIP 29: Have only inlining available as a standard part of the language. Inlining does a core part of macro expansion (arguably, the hardest part to implement correctly). Then build on that using meta blocks that are enabled by a special compiler mode or a compiler plugin. If we have only blackbox macros the plugin can be a standard one which simply runs after typer. With whitebox macros the “plugin” would in fact have to replace the typer, which is much more problematic. I believe it would in effect mean we define a separate language, similar to Template Haskell. That’s possible, but I believe we need then to be upfront about this.

odersky · October 13, 2017, 1:32pm

One thing to add to my previous comment: Some form of type macros (or, as @retronym calls them, signature macros) might be a good replacement for unfettered whitebox macros. Dotty’s inline essentially does two things:

beta reduction of inline function application
simplification of if-then-else with statically known conditions

In the type language, we already have beta-reduction. If

type F[X] = G[X]

then F[String] is known to be the same as G[String]. If we add some form of conditional, we might already have enough to express what we want, and we would stay in the same envelope of expressive power.
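The beta-reduction of type aliases can be checked directly with an equality witness. In this sketch G is assumed to be List purely so the example is self-contained; the post leaves G abstract.

```scala
object AliasDemo {
  type G[X] = List[X] // assumption: any concrete G would do
  type F[X] = G[X]

  // Compiles only because F[String] reduces to the same type as G[String].
  val same: F[String] =:= G[String] = implicitly[F[String] =:= G[String]]

  // Values flow between the two spellings without any conversion.
  val xs: F[String] = List("a", "b")
  val ys: G[String] = xs
}
```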

LPTK · October 14, 2017, 5:05pm

To get into the same ballpark in terms of expressiveness, I think you’ll also need some form of recursion purely at the type level, which is not currently possible:

type Fix[A[_]] = A[Fix[A]]

illegal cyclic reference: alias [A <: [_$2] => Any] => A[Fix[A]] of type Fix refers back to the type itself

Wouldn’t supporting this potentially break the type system pretty badly?
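For context, the usual workaround today is to tie the knot with a class rather than an alias, since a class is nominal and is not expanded by the compiler. The sketch below uses hypothetical names (ListF, NilF, ConsF) and gives a recursive type only. It does not provide type-level computation by recursion, so it does not answer the question above.

```scala
object FixDemo {
  // Accepted, unlike the alias: a class is not unfolded during type checking.
  final case class Fix[F[_]](unfix: F[Fix[F]])

  // Pattern functor for a list of Ints (hypothetical example type).
  sealed trait ListF[+A]
  case object NilF extends ListF[Nothing]
  final case class ConsF[A](head: Int, tail: A) extends ListF[A]

  // The two-element list 1 :: 2 :: Nil, built through the fixed point.
  val xs: Fix[ListF] =
    Fix[ListF](ConsF(1, Fix[ListF](ConsF(2, Fix[ListF](NilF)))))

  // Value-level recursion over the recursive type.
  def length(l: Fix[ListF]): Int = l.unfix match {
    case NilF        => 0
    case ConsF(_, t) => 1 + length(t)
  }
}
```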

A minor nitpick:

Actually, MetaOCaml is not related to macros. It’s essentially for generating and compiling code at runtime (traditional multi-stage programming) –– though it’s true that the approach was ported to compile-time with systems such as MacroML, or more recently modular macros.

odersky · October 14, 2017, 7:39pm

@LPTK Yes, we’d have to add some form of recursion to type definitions, with the usual complications to ensure termination.

You are right about Meta OCaml. I meant OCaml Macros: https://oliviernicole.github.io/about_macros.html

odersky · October 15, 2017, 4:54pm

For implicit macros, this would lend itself to more efficient typechecking.

Indeed. But, furthermore we have by now decided that every implicit def needs to come with a declared return type. This restriction is necessary to avoid puzzling implicit failures due to cyclic references. So, it seems whatever is decided for whitebox macros, implicit definitions in the future cannot be whitebox macros.

mrumkovskis · October 16, 2017, 5:19pm

We use whitebox macros to compile db queries and return query result as typed rows, i.e. db query string also serves as a class definition.
For example:
scala> tresql"emp[ename = 'CLARK'] {ename, hiredate}".map(row => row.ename + " hired " + row.hiredate) foreach println
select ename, hiredate from emp where ename = 'CLARK'
CLARK hired 1981-06-09

Is there a way to achieve this without whitebox macros?

nightscape · October 16, 2017, 8:50pm

We use whitebox macros to do symbolic computation (using a Java library called Symja) at compile-time.
As we have no idea what the final function/formula is going to look like, we cannot define a fixed return type.
I’d also be interested if there’s a way to do this without whitebox macros.

mdedetrich · October 18, 2017, 1:27pm

I would also ask the guys at Quill http://getquill.io/. I think they use whitebox macros quite heavily, and it would be a shame if Quill could not work with Dotty for this reason, since it’s doing an excellent job at solving the problem it’s solving (strong statically typed SQL that is also performant). I will ping the guys so they can provide their opinions/feedback.

soronpo · October 18, 2017, 1:34pm

I think Scala is very DSL friendly and I praise it a lot in that sense, since it is my primary usage for the language (to create a custom DSL). To throw that away would be a shame, IMHO.

soronpo · October 18, 2017, 1:40pm

We heavily use whitebox macros in the singleton-ops library for the same reason. It may be possible to create a language feature that supports this type of thing, but currently macros are all we have.

fwbrasil · October 19, 2017, 10:13pm

Quill uses a whitebox macro to encode type-level information about the original AST of a quotation. This mechanism allows Quill to generate queries at compile time, providing quick feedback to the user about the final SQL query and almost zero runtime overhead. It also opens the path for more advanced features like compile-time query probing. Example:

When testDB.run is called, the macro only knows that a term q is being used. The macro system doesn’t provide a way to allow inspection of the original AST of q. To work around this limitation, Quill encodes the original AST information as a type annotation of the type refinement generated by the quote macro.

To exemplify, this quotation:

val q = quote(1)

is expanded to:

val q = new Quoted[Int] {
    @QuotedAst(Constant(1))
    def ast = Constant(1)
}

When q is used within another quotation, Quill obtains the QuotedAst annotation from the term type and is able to expand the original AST locally.

Note that this approach has an important limitation. If the user uses type widening:

val q: Quoted[Int] = quote(1)

the type refinement information is lost and Quill has to fall back to runtime query generation using the ast method.

I’d say that this usage of whitebox macros is a workaround and could be better handled by the inline keyword initially proposed with the new macros system. Regardless of type widening, the user could declare quotations as inline values:

inline val q: Quoted[Int] = quoted(1)

and wherever this value is used, the tree quoted(1) is expanded locally, giving access to the original AST.

Is inline still being considered? I’ve heard different answers from different people about this feature.

seratch · October 29, 2017, 12:28pm

ScalikeJDBC uses whitebox def macros to validate names of selectDynamic calls under a particular set of conditions. The set of names allowed by the validator macros corresponds to the primary constructor argument names of the class specified as the type parameter of the SQLSyntaxSupport trait.

Here is a quite simple example:

import scalikejdbc._

// id, name are possible dynamic names
case class Account(id: Long, name: Option[String]) 

object Accounts extends SQLSyntaxSupport[Account] {}

val a = Accounts.syntax("a")

val accounts: Seq[Account] = {
  withSQL {
    select(
      a.result.id, // a.result.selectDynamic call validated by whitebox def macros
      a.result.name
    ).from(Accounts as a)
     .where(a.name.like("Bob%")) // a.selectDynamic call validated
  }.map { r => 
    Account(
      id = r.get[Long](a.resultName.id), // a.resultName.selectDynamic call validated
      name = r.get[Option[String]](a.resultName.name)
    )
  }.list.apply()
}

If we can achieve the same goal without using Dynamic in the future, that should be much better.