Whitebox def macros

Sounds like the most natural way to do it would be to just have type macros. Then whitebox macros are just blackbox macros with a return type that is a macro invocation.

def myWhitebox[A](a: A, str: String): MyReturn[A, str.type] = macro ...
type MyReturn[A, S <: String with Singleton] = macro ...

It’s a nice separation of concerns. But I’m afraid there are a lot of whitebox macros in the wild where both code generation and type refinement are very much intertwined, because they’re semantically inseparable. In the case of Squid, what I’d do is to parametrize the current macro to either just compute a type or do the full code generation; but that would mean a lot of computation would be duplicated (I would have to parse, transform, typecheck and analyse the quasiquote string in both type signature and code-gen macro invocations), and batch compile times would be strictly worse.


To add onto what @LPTK wrote, I’d speculate that there are very few whitebox macros for which the signature macro could be easily separated from the term macro without a lot of code duplication and/or redundant work. An alternative approach may be to conflate the signature macro and the term macro. The macro expansion could return a tuple of (List[c.Type], c.Tree) where the list of types must contain exactly as many types as there are method type arguments. For example, suppose that I want to implement the CaseClass.toTuple[T] method from above.

object CaseClass {
  def toTuple[C, T](cls: C): T = macro CaseClassMacros.impl[C, T]
}
class CaseClassMacros(val c: Context) {
  import c.universe._
  def impl[C: c.WeakTypeTag, T](cls: c.Expr[C]): (List[c.Type], c.Tree) = {
    ...
    val tree = q"""..."""
    val tType: c.Type = ???
    val resultTypes = List(weakTypeOf[C], tType)
    (resultTypes, tree)
  }
}

The typechecking of the returned tree could be deferred until after the compiler has verified that result types are valid. There would be no need to re-expand the macro using the result types since the type T is a functional dependency of C.

While this is less conceptually elegant than having independent signature and term macros, I think that it would be more practical for macro authors.


When thinking about macros I have found it useful to consider two dimensions:

First dimension: What is the expressive power of the macro language?

  1. Inlining only
  2. Purely functional, interpreted subset of the language with heavy sandboxing
  3. Full power of the underlying (compiled) language

Second dimension: When should this power be available?

  1. Only in a specialized version of the language
  2. In every build
  3. After every editor/IDE keystroke

Scala with whitebox macros is currently at the extreme point (3, 3) of the matrix. This is IMO a very problematic point to be at. Having the full power of the underlying language at your disposal means your editor can (1) crash, (2) become unresponsive, or (3) pose a security risk, just because some part of your program is accessing a bad macro in a library. That’s not hypothetical. I still remember the very helpful(?) Play schema validation macro that caused all IDEs to freeze.

Scala with blackbox macros is at (3, 2). This is slightly better as only building but not editing is affected by bad macros and you can do a better job of isolating and diagnosing problems. But it still would make desirable tools such as a compile server highly problematic because of security concerns.

If we take other languages as comparisons they tend to be more conservative. Template Haskell lets you do lots of stuff, but it is its own language. I believe that was a smart decision of the Haskell designers. Meta OCaml is blackbox only and does not have any sort of inspection, so it’s essentially compile-time staging and nothing else.

So, if Scala continued to have whitebox macros it would indeed be far more powerful than any other language. Is that good or bad? Depends on where you come from and what you want to do, for sure. But I will be firmly in the “it would be very bad” camp. In the future, I want to concentrate on making Scala a better language, with better tooling, as opposed to a more powerful toolbox in which people can write their own language. There’s nothing wrong with toolboxes, but it’s not a primary goal of Scala as I see it.

Given this dilemma, maybe there’s no single solution that satisfies all concerns. That was the original motivation of the inline/meta proposal in SIP 29: Have only inlining available as a standard part of the language. Inlining does a core part of macro expansion (arguably, the hardest part to implement correctly). Then build on that using meta blocks that are enabled by a special compiler mode or a compiler plugin. If we have only blackbox macros, the plugin can be a standard one which simply runs after typer. With whitebox macros the “plugin” would in fact have to replace the typer, which is much more problematic. I believe it would in effect mean we define a separate language, similar to Template Haskell. That’s possible, but I believe we then need to be upfront about it.


One thing to add to my previous comment: Some form of type macros (or, as @retronym calls them, signature macros) might be a good replacement for unfettered whitebox macros. Dotty’s inline essentially does two things:

  • beta reduction of inline function application
  • simplification of if-then-else with statically known conditions

In the type language, we already have beta-reduction. If

type F[X] = G[X]

then F[String] is known to be the same as G[String]. If we add some form of conditional, we might already have enough to express what we want, and we would stay in the same envelope of expressive power.
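
As a small illustration of the beta-reduction point, here is a minimal sketch. It assumes, purely for the example, that G is List (that choice and the object name are not from the discussion). It compiles only if the compiler treats F[String] and G[String] as the same type:

```scala
object AliasReduction {
  // Assumption for illustration only: G is simply List.
  type G[X] = List[X]
  type F[X] = G[X]

  // Typechecks only because the alias application F[String] reduces to G[String].
  val same: F[String] =:= G[String] = implicitly[F[String] =:= G[String]]

  // Values flow freely between the two spellings of the type.
  val xs: F[String] = List("a", "b")
  val ys: G[String] = xs
}
```

Nothing here needs a macro; the reduction is part of ordinary type alias handling.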

To get into the same ballpark in terms of expressiveness, I think you’ll also need some form of recursion purely at the type level, which is not currently possible:

type Fix[A[_]] = A[Fix[A]]

illegal cyclic reference: alias [A <: [_$2] => Any] => A[Fix[A]] of type Fix refers back to the type itself

Wouldn’t supporting this potentially break the type system pretty badly?

A minor nitpick:

Actually, MetaOCaml is not related to macros. It’s essentially for generating and compiling code at runtime (traditional multi-stage programming), though it’s true that the approach was ported to compile-time with systems such as MacroML, or more recently modular macros.

@LPTK Yes, we’d have to add some form of recursion to type definitions, with the usual complications to ensure termination.
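
For readers coming to this later: Scala 3 match types offer both a type-level conditional and recursion through the alias itself. The sketch below assumes a Scala 3 compiler; LeafElem and the object name are illustrative, and whether this covers the whitebox use cases discussed in this thread is left open.

```scala
// Scala 3 only: a match type is a conditional at the type level,
// and its right-hand sides may refer back to the alias being defined.
object MatchTypeSketch {
  // Illustrative alias: peel nested containers down to their leaf element type.
  type LeafElem[X] = X match {
    case String      => Char
    case Array[t]    => LeafElem[t]
    case Iterable[t] => LeafElem[t]
    case AnyVal      => X
  }

  // Each definition typechecks only if the alias reduces as annotated.
  val c: LeafElem[String] = 'x'              // conditional: String case
  val n: LeafElem[List[Array[Int]]] = 42     // recursion: List, then Array, then Int
  val ev = summon[LeafElem[List[String]] =:= Char]
}
```

Termination is not guaranteed by construction; a non-terminating reduction surfaces as a compile-time error rather than being ruled out up front.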

You are right about Meta OCaml. I meant OCaml Macros: https://oliviernicole.github.io/about_macros.html

For implicit macros, this would lend itself to more efficient typechecking.

Indeed. Furthermore, we have by now decided that every implicit def needs to come with a declared return type. This restriction is necessary to avoid puzzling implicit failures due to cyclic references. So it seems that, whatever is decided for whitebox macros, implicit definitions in the future cannot be whitebox macros.

We use whitebox macros to compile db queries and return query result as typed rows, i.e. db query string also serves as a class definition.
For example:
scala> tresql"emp[ename = 'CLARK'] {ename, hiredate}".map(row => row.ename + " hired " + row.hiredate) foreach println
select ename, hiredate from emp where ename = 'CLARK'
CLARK hired 1981-06-09

Is there a way to achieve this without whitebox macros?

We use whitebox macros to do symbolic computation (using a Java library called Symja) at compile-time.
As we have no idea what the final function/formula is going to look like, we cannot define a fixed return type.
I’d also be interested if there’s a way to do this without whitebox macros.

I would also ask the guys at Quill http://getquill.io/. I think they use whitebox macros quite heavily, and it would be a shame if Quill could not work with Dotty for this reason, since it’s doing an excellent job at solving the problem it’s solving (strong statically typed SQL that is also performant). I will ping the guys so they can provide their opinions/feedback.

I think Scala is very DSL friendly and I praise it a lot in that sense, since that is my primary usage for the language (to create a custom DSL). To throw that away would be a shame, IMHO.


We heavily use whitebox macros in the singleton-ops library for the same reason. It may be possible to create a language feature that supports this type of thing, but currently macros are all we’ve got.

Quill uses a whitebox macro to encode type-level information about the original AST of a quotation. This mechanism allows Quill to generate queries at compile time, providing quick feedback to the user about the final SQL query and almost zero runtime overhead. It also opens the path for more advanced features like compile-time query probing. Example:

When testDB.run is called, the macro only knows that a term q is being used. The macro system doesn’t provide a way to inspect the original AST of q. To work around this limitation, Quill encodes the original AST information as a type annotation of the type refinement generated by the quote macro.

To exemplify, this quotation:

val q = quote(1)

is expanded to:

val q = new Quoted[Int] {
    @QuotedAst(Constant(1))
    def ast = Constant(1)
}

When q is used within another quotation, Quill obtains the QuotedAst annotation from the term type and is able to expand the original AST locally.

Note that this approach has an important limitation. If the user uses type widening:

val q: Quoted[Int] = quote(1)

the type refinement information is lost and Quill has to fall back to runtime query generation using the ast method.
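
The effect of widening can be sketched without any macros. This is only an analogy: StaticAst is a hypothetical marker trait standing in for the static information, whereas Quill's actual encoding uses an annotation on the refinement, as described above.

```scala
object WideningSketch {
  trait Quoted[T] { def ast: String }

  // Hypothetical marker standing in for the statically visible AST information.
  trait StaticAst { def staticAst: String }

  // Precise type: the extra static information is visible to later code.
  val precise: Quoted[Int] with StaticAst = new Quoted[Int] with StaticAst {
    def ast: String = "Constant(1)"
    def staticAst: String = "Constant(1)"
  }

  // Widened type: the same object, but only the runtime `ast` method is visible.
  val widened: Quoted[Int] = precise

  val fromStatic: String = precise.staticAst
  val fromRuntime: String = widened.ast
  // widened.staticAst would not compile: the type no longer mentions StaticAst.
}
```

The object is unchanged by the ascription; only what the compiler (and hence a macro inspecting the term's type) can see about it differs.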

I’d say that this usage of whitebox macros is a workaround and could be better handled by the inline keyword initially proposed with the new macros system. Regardless of type widening, the user could declare quotations as inline values:

inline val q: Quoted[Int] = quote(1)

and wherever this value is used, the tree quote(1) is expanded locally, giving access to the original AST.

Is inline still being considered? I’ve heard different answers from different people about this feature.


ScalikeJDBC uses whitebox def macros to validate the names of selectDynamic calls under a particular set of conditions. The set of names allowed by the validator macros corresponds to the primary constructor parameter names of the class given as the type parameter of the SQLSyntaxSupport trait.

Here is a quite simple example:

import scalikejdbc._

// id, name are possible dynamic names
case class Account(id: Long, name: Option[String]) 

object Accounts extends SQLSyntaxSupport[Account] {}

val a = Accounts.syntax("a")

val accounts: Seq[Account] = {
  withSQL {
    select(
      a.result.id, // a.result.selectDynamic call validated by whitebox def macros
      a.result.name
    ).from(Accounts as a)
     .where(a.name.like("Bob%")) // a.selectDynamic call validated
  }.map { r => 
    Account(
      id = r.get[Long](a.resultName.id), // a.resultName.selectDynamic call validated
      name = r.get[Option[String]](a.resultName.name)
    )
  }.list.apply()
}

If we can achieve the same goal without using Dynamic in the future, that should be much better.

If I understand correctly, what Quill needs to do specifically is to pass data between macro callsites, data created from the AST of one callsite, and used to create the AST of another callsite.

If you split out whitebox macroing into “type-level computation” and “AST computation”, you won’t be able to generate the necessary data Quill needs during the type-level computation because it depends on the exact AST captured by the macro.

What Quill needs is a way for the AST computations to communicate. This is currently hacked together by shoving data onto the types and unpickling it later, but it could plausibly be done with a dedicated mechanism. In that case the bodies of each blackbox macro callsite would have some “side channel” to pass information to each other, but would be guaranteed not to affect the rest of the typer.

Perhaps inlining of ASTs is one such side channel, or perhaps instead of each AST node having a Type, it would have a tuple of (Type, SideChannelData), where SideChannelData can be seen and acted upon by macro callsites, but is guaranteed to be ignored by the main typechecker. Hence it could be used to customize codegen, or perform additional validations (in other macros), but it could never, for example, make a typecheck that would otherwise fail pass because of this data.

An AST computation -> AST computation “side channel” data flow may seem ad-hoc, but nevertheless avoids all the problems that people don’t like about whitebox macros: tooling support, separation of typechecking & codegen, etc… Tools that ignore the SideChannelData would nevertheless be able to typecheck everything successfully; perhaps only missing out on additional errors that macros may generate when using this side channel data for validation.
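
To make the idea concrete, here is a toy model, not a compiler design. Every name in it (Tpe, Typed, conforms, generateSql, the "ast" key) is hypothetical. The point is only that the conformance check never reads the side channel, while a downstream macro-like function may.

```scala
object SideChannelSketch {
  // Toy type representation; real compiler types are far richer.
  sealed trait Tpe
  case object IntTpe extends Tpe
  case object StringTpe extends Tpe

  // Hypothetical payload that one macro callsite leaves for another.
  final case class SideChannelData(entries: Map[String, String])

  // Each typed tree node carries (Type, SideChannelData).
  final case class Typed(tpe: Tpe, side: SideChannelData)

  // The "main typechecker" consults only the type component.
  def conforms(actual: Typed, expected: Tpe): Boolean =
    actual.tpe == expected

  // A downstream "macro" may read the side channel, e.g. to customize codegen,
  // and falls back to a runtime path when the data is absent.
  def generateSql(q: Typed): String =
    q.side.entries.getOrElse("ast", "<runtime generation>")
}
```

A tool that drops the side channel entirely still gets the same answer from conforms; it only loses the specialised output of generateSql.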

If we want to expose this side channel data in an IDE, the IDE can be taught how to recognize it, while IDEs which do not recognize it can ignore it and still build a complete understanding of the “rest” of the code.

Notably, I remember the Parboiled guys wanted to do similar things to optimize parsers across multiple parser rule(...) calls. IIRC they were exploring a type-refinement-based mechanism similar to what Quill uses (for some reason I cannot dig up the references right now) as well as build-time code generation (https://github.com/alexander-myltsev/sbt-parboiled2-boost).
They, too, “just” need AST computation -> AST computation data flow: they want their parsers to be able to optimize based on other parsers they call. The “rest of the world” can typecheck without knowing the details inside each parser, just like how it can typecheck without knowing the details inside each Quill query.


Emma uses whitebox macros in a similar way to Quill.

storm-enroute coroutines uses whitebox macros; although I’m not entirely sure if they couldn’t do with blackbox as well. I asked on the Gitter channel, because they break analysis in IntelliJ, but got no answer.

Oh, no. Many of my libraries will be gone then.


I agree with removing whitebox macros in 2.12.4, as long as compiler plugins have the ability to hack the typer.

I actually found that AnalyzerPlugin can serve as a replacement for whitebox macros. Please keep AnalyzerPlugin if whitebox macros go away.

So I talked to Flavio and it’s pretty clear that we don’t need side-channel data to implement Quill. The current capabilities with inline should be enough. What would be really useful for us is a quasiquoting mechanism.
