Whitebox def macros

I’d like a way for whitebox macro authors to be able (although not necessarily obliged) to separate the part of the macro that computes the return type from the part that computes the expanded term. Let’s call the first part “signature macros”.

For implicit macros, this would lend itself to more efficient typechecking. Even for non-implicit macros, an IDE could be more efficient if it could just run the “signature macro”.

I think this separation would also help shed light on whether the full Scala language is the right language for signature macros, or whether a more restrictive language could express a broad set of the use cases of whitebox macros.

I suppose the contract would be that if the signature macro returned a type and no errors, the corresponding term expansion macro would be required to succeed and to conform to the computed return type.
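The sketch below models the proposed contract in plain Scala, with no compiler APIs involved. Every name in it (`Tpe`, `Term`, `WhiteboxMacro`, `run`) is hypothetical and only stands in for compiler types and trees. The driver runs the cheap signature step first. It then rejects any expansion whose type does not conform to what the signature promised.

```scala
// Hypothetical model of the signature/term split; not a real compiler or macro API.
object SignatureContract {
  // Stand-ins for compiler types, with a deliberately trivial subtyping check.
  sealed trait Tpe
  case object IntT extends Tpe
  case object AnyT extends Tpe
  def conforms(actual: Tpe, expected: Tpe): Boolean =
    actual == expected || expected == AnyT

  // Stand-in for an expanded tree that carries its own type.
  final case class Term(code: String, tpe: Tpe)

  trait WhiteboxMacro[In] {
    // "Signature macro": only computes the result type (or an error).
    def signature(in: In): Either[String, Tpe]
    // "Term macro": produces the full expansion.
    def expand(in: In): Term
  }

  // Driver enforcing the contract: once the signature succeeds, the expansion
  // must conform to it; otherwise the macro itself is buggy.
  def run[In](m: WhiteboxMacro[In], in: In): Either[String, Term] =
    m.signature(in).flatMap { sig =>
      val term = m.expand(in)
      if (conforms(term.tpe, sig)) Right(term)
      else Left(s"contract violated: expansion has type ${term.tpe}, signature promised $sig")
    }
}
```

In this model, a tool that only needs types (an IDE, or implicit search) would call `signature` alone. A batch compiler would call `run`.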

Obviously a naive implementation of the signature macro is to just run the term macro and typecheck it, as per the status quo. I think we should aim higher than that, though!

5 Likes

Ryan Culpepper suggested essentially the same thing you call “signature macros” just two weeks ago! … Really glad to hear this suggestion; it means that at least some of us are thinking along the same lines 🙂

cc/ @olafurpg

1 Like

Not currently. We had a prototype system that perhaps did something like that (not sure). It statically generated evidence that structural types did not contain certain names, or were disjoint in terms of field names. For example, you could write def foo[A,B](implicit dis: A <> B), meaning that A and B are structural types that share no field names. You could then call foo[{val x:Int},{val y:Double}] but not foo[{val x:Int},{val x:Double}]. When extending an abstract context C as in C{val x:Int}, the contextual quasiquote macro would look for evidence that C <> {def x}, to ensure soundness in the face of name clashes.
However, instead of porting that old prototype to the current system, we’re probably going to move to a more modular solution, which shouldn’t need any implicit macros.

There is one particularly nasty thing that a Squid implicit macro currently does: it looks inside the current scope to see if it can find some type representation evidence. This lets us use an extracted type t implicitly, as in case ir"Some[$t]($x)" => ... implicitly[t.Typ] ..., instead of having to write case ir"Some[$t]($x)" => implicit val t_ = t; ... implicitly[t.Typ] .... I understand this is probably asking too much of macros, and I think we could do without it (though it may degrade the user experience a little).

About Dynamic, one of the things I’ve used it for was to automatically redirect method calls to some wrapped object (cf. composition vs inheritance style).
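For readers unfamiliar with `Dynamic`, the compiler rewrites two kinds of member access on such a value:

- `w.foo` becomes `w.selectDynamic("foo")`.
- `w.foo(args)` becomes `w.applyDynamic("foo")(args)`.

The sketch below is a hypothetical, purely runtime forwarder backed by maps. It is not the macro-based forwarding described above. Note that all of its members are typed `Any`, which is exactly the static information a whitebox macro would be used to recover.

```scala
import scala.language.dynamics

// Hypothetical runtime forwarder: member names are resolved in maps at runtime.
final class Forwarder(
    fields: Map[String, Any],
    methods: Map[String, Seq[Any] => Any]
) extends Dynamic {
  // w.name  ==>  w.selectDynamic("name")
  def selectDynamic(name: String): Any =
    fields.getOrElse(name, throw new NoSuchElementException(s"no field $name"))

  // w.name(a, b)  ==>  w.applyDynamic("name")(a, b)
  def applyDynamic(name: String)(args: Any*): Any =
    methods.get(name) match {
      case Some(f) => f(args)
      case None    => throw new NoSuchElementException(s"no method $name")
    }
}
```

A misspelled member here fails only at runtime. Validating or typing member names at compile time is what whitebox macros on `Dynamic` add.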

Yeah, it’s a judgement call. IMHO Scala is already a language that lets you define a myriad of different sub-languages, thanks to its flexible syntax and expressive type system. I think that’s one thing many people like about the language (cf., for example, the vast ecosystem of SQL/data analytics libraries that define their own custom syntaxes and semantics).

Sounds like the most natural way to do it would be to just have type macros. Then whitebox macros are just blackbox macros with a return type that is a macro invocation.

def myWhitebox[A](a: A, str: String): MyReturn[A, str.type] = macro ...
type MyReturn[A, S <: String with Singleton] = macro ...

It’s a nice separation of concerns. But I’m afraid there are a lot of whitebox macros in the wild where both code generation and type refinement are very much intertwined, because they’re semantically inseparable. In the case of Squid, what I’d do is to parametrize the current macro to either just compute a type or do the full code generation; but that would mean a lot of computation would be duplicated (I would have to parse, transform, typecheck and analyse the quasiquote string in both type signature and code-gen macro invocations), and batch compile times would be strictly worse.

1 Like

To add onto what @LPTK wrote, I’d speculate that there are very few whitebox macros for which the signature macro could be easily separated from the term macro without a lot of code duplication and/or redundant work. An alternative approach may be to conflate the signature macro and the term macro. The macro expansion could return a tuple of (List[c.Type], c.Tree) where the list of types must contain exactly as many types as there are method type arguments. For example, suppose that I want to implement the CaseClass.toTuple[T] method from above.

object CaseClass {
  def toTuple[C, T](cls: C): T = macro CaseClassMacros.impl[C, T]
}
class CaseClassMacros(val c: Context) {
  import c.universe._
  def impl[C: c.WeakTypeTag, T](cls: c.Expr[C]): (List[c.Type], c.Tree) = {
    ...
    val tree = q"""..."""
    val tType: c.Type = ???
    val resultTypes = List(weakTypeOf[C], tType)
    (resultTypes, tree)
  }
}

The typechecking of the returned tree could be deferred until after the compiler has verified that result types are valid. There would be no need to re-expand the macro using the result types since the type T is a functional dependency of C.
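Without macros, the same functional dependency can be spelled out by hand. An implicit whose type member fixes `T` once `C` is known does the job; this is the pattern shapeless uses for `Generic`.

The sketch below only illustrates the dependency. Its names (`ToTuple`, `Point`, `ManualToTuple`) are hypothetical. The point of the proposal is that a macro would compute such an instance and its result type for every case class, instead of each one being written manually.

```scala
// Hypothetical hand-written instance of the C => T dependency.
trait ToTuple[C] {
  type Out
  def apply(c: C): Out
}

object ToTuple {
  // Aux exposes the dependent type member as a type parameter.
  type Aux[C, O] = ToTuple[C] { type Out = O }
}

final case class Point(x: Int, y: Double)

object Point {
  // What a derivation macro would generate for Point, written out manually.
  implicit val pointToTuple: ToTuple.Aux[Point, (Int, Double)] =
    new ToTuple[Point] {
      type Out = (Int, Double)
      def apply(p: Point): (Int, Double) = (p.x, p.y)
    }
}

object ManualToTuple {
  // The result type tt.Out is fixed by the implicit found for C.
  def toTuple[C](c: C)(implicit tt: ToTuple[C]): tt.Out = tt(c)
}
```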

While this is less conceptually elegant than having independent signature and term macros, I think that it would be more practical for macro authors.

1 Like

When thinking about macros I have found it useful to consider two dimensions:

First dimension: What is the expressive power of the macro language?

  1. Inlining only
  2. Purely functional, interpreted subset of the language with heavy sandboxing
  3. Full power of the underlying (compiled) language

Second dimension: When should this power be available?

  1. Only in a specialized version of the language
  2. In every build
  3. After every editor/IDE keystroke

Scala with whitebox macros is currently at the extreme point (3, 3) of the matrix. This is, IMO, a very problematic point to be on. Having the full power of the underlying language at your disposal means your editor can (1) crash, (2) become unresponsive, or (3) pose a security risk, just because some part of your program is accessing a bad macro in a library. That’s not hypothetical. I still remember the very helpful(?) Play schema validation macro that caused all IDEs to freeze.

Scala with blackbox macros is at (3, 2). This is slightly better as only building but not editing is affected by bad macros and you can do a better job of isolating and diagnosing problems. But it still would make desirable tools such as a compile server highly problematic because of security concerns.

If we take other languages as comparisons they tend to be more conservative. Template Haskell lets you do lots of stuff, but it is its own language. I believe that was a smart decision of the Haskell designers. Meta OCaml is blackbox only and does not have any sort of inspection, so it’s essentially compile-time staging and nothing else.

So, if Scala continued to have whitebox macros, it would indeed be far more powerful than any other language. Is that good or bad? That depends on where you come from and what you want to do, for sure. But I am firmly in the “it would be very bad” camp. In the future, I want to concentrate on making Scala a better language with better tooling, as opposed to a more powerful toolbox in which people can write their own language. There’s nothing wrong with toolboxes, but they are not a primary goal of Scala as I see it.

Given this dilemma, maybe there’s no single solution that satisfies all concerns. That was the original motivation of the inline/meta proposal in SIP 29: have only inlining available as a standard part of the language. Inlining does a core part of macro expansion (arguably the hardest part to implement correctly). Then build on that using meta blocks that are enabled by a special compiler mode or a compiler plugin. If we have only blackbox macros, the plugin can be a standard one that simply runs after typer. With whitebox macros, the “plugin” would in fact have to replace the typer, which is much more problematic. I believe it would in effect mean we define a separate language, similar to Template Haskell. That’s possible, but I believe we then need to be upfront about it.

1 Like

One thing to add to my previous comment: Some form of type macros (or, as @retronym calls them, signature macros) might be a good replacement for unfettered whitebox macros. Dotty’s inline essentially does two things:

  • beta reduction of inline function application
  • simplification of if-then-else with statically known conditions
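Both operations show up in a small Scala 3 sketch; Scala 3's `inline` is what Dotty's inline became. The function below is hypothetical. At each call site:

- The `inline` parameter `n` is bound to a constant.
- The recursive call is unfolded, which is the beta reduction.
- `inline if` must reduce to a single branch, because its condition is statically known.

So `power(x, 3)` becomes roughly `x * (x * (x * 1.0))`.

```scala
// Scala 3 only. Hypothetical example: inlining = beta reduction + static if.
object InlinePower {
  inline def power(x: Double, inline n: Int): Double =
    // `inline if` must be reduced at compile time; n is a constant at each call site.
    inline if (n == 0) 1.0
    else x * power(x, n - 1)
}
```

Calling `power` with a non-constant `n` is a compile error, because the `inline if` cannot be reduced.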

In the type language, we already have beta-reduction. If

type F[X] = G[X]

then F[String] is known to be the same as G[String]. If we add some form of conditional, we might already have enough to express what we want, and we would stay within the same envelope of expressive power.
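For comparison, Scala 3 eventually added such a conditional in the form of match types. The sketch below is Scala 3 only, and the names `G`, `F` and `Elem` are illustrative. It shows alias beta-reduction alongside a type-level conditional that the compiler reduces once the scrutinee type is known.

```scala
// Scala 3 only (match types).
object TypeLevelConditional {
  type G[X] = List[X]
  type F[X] = G[X]

  // Beta-reduction: F[String] and List[String] are the same type.
  val sameType: F[String] =:= List[String] = summon[F[String] =:= List[String]]

  // Type-level conditional: pick the element type based on the shape of X.
  type Elem[X] = X match {
    case String   => Char
    case Array[t] => t
    case List[t]  => t
  }

  val c: Elem[String] = 'a'   // Elem[String] reduces to Char
  val i: Elem[List[Int]] = 42 // Elem[List[Int]] reduces to Int
}
```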

To get into the same ballpark in terms of expressiveness, I think you’ll also need some form of recursion purely at the type level, which is not currently possible:

type Fix[A[_]] = A[Fix[A]]

illegal cyclic reference: alias [A <: [_$2] => Any] => A[Fix[A]] of type Fix refers back to the type itself

Wouldn’t supporting this potentially break the type system pretty badly?

A minor nitpick:

Actually, MetaOCaml is not related to macros. It’s essentially for generating and compiling code at runtime (traditional multi-stage programming), though it’s true that the approach was ported to compile time by systems such as MacroML or, more recently, modular macros.

@LPTK Yes, we’d have to add some form of recursion to type definitions, with the usual complications to ensure termination.
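As one data point, Scala 3's match types later allowed a restricted form of recursion in type definitions. A directly self-referential alias like `Fix` above is still not allowed. In the sketch below (Scala 3 only; `LeafElem` is an illustrative name), the type unwraps nested containers one level per step and stops at a leaf. Termination is not proved up front; instead, the compiler reports a divergent reduction as a compile error.

```scala
// Scala 3 only: a recursive match type.
object RecursiveTypeLevel {
  type LeafElem[X] = X match {
    case String   => Char
    case Array[t] => LeafElem[t]
    case List[t]  => LeafElem[t]
    case AnyVal   => X
  }

  val c: LeafElem[List[Array[String]]] = 'z' // List -> Array -> String -> Char
  val n: LeafElem[List[List[Int]]] = 7       // List -> List -> Int
}
```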

You are right about Meta OCaml. I meant OCaml Macros: https://oliviernicole.github.io/about_macros.html

For implicit macros, this would lend itself to more efficient typechecking.

Indeed. But, furthermore we have by now decided that every implicit def needs to come with a declared return type. This restriction is necessary to avoid puzzling implicit failures due to cyclic references. So, it seems whatever is decided for whitebox macros, implicit definitions in the future cannot be whitebox macros.

We use whitebox macros to compile DB queries and return query results as typed rows, i.e. the DB query string also serves as a class definition.
For example:
scala> tresql"emp[ename = 'CLARK'] {ename, hiredate}".map(row => row.ename + " hired " + row.hiredate) foreach println
select ename, hiredate from emp where ename = 'CLARK'
CLARK hired 1981-06-09

Is there a way to achieve this without whitebox macros?

We use whitebox macros to do symbolic computation (using a Java library called Symja) at compile-time.
As we have no idea what the final function/formula is going to look like, we cannot define a fixed return type.
I’d also be interested if there’s a way to do this without whitebox macros.

I would also ask the guys at Quill (http://getquill.io/). I think they use whitebox macros quite heavily, and it would be a shame if Quill could not work with Dotty for this reason. It does an excellent job of solving its problem: strongly, statically typed SQL that is also performant. I will ping them so they can provide their opinions and feedback.

I think Scala is very DSL-friendly, and I praise it a lot in that sense, since that is my primary use of the language (creating a custom DSL). To throw that away would be a shame, IMHO.

1 Like

We heavily use whitebox macros in the singleton-ops library for the same reason. It may be possible to create a language feature that supports this kind of thing, but currently macros are all we’ve got.
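As a point of comparison for type-level literal arithmetic, Scala 3's standard library later shipped a small, fixed set of such operations in `scala.compiletime.ops`. The sketch below is Scala 3 only, with illustrative names. It is not the singleton-ops API, and it does not cover everything a macro-based library can do.

```scala
// Scala 3 only: literal-type arithmetic from the standard library.
import scala.compiletime.ops.int.*

object LiteralArithmetic {
  val five: 2 + 3 = 5     // the type 2 + 3 reduces to the literal type 5
  val twelve: 3 * 4 = 12  // the type 3 * 4 reduces to the literal type 12
  // val wrong: 2 + 3 = 6 // would not compile: 6 does not conform to 5
}
```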

Quill uses a whitebox macro to encode type-level information about the original AST of a quotation. This mechanism allows Quill to generate queries at compile time, providing quick feedback to the user about the final SQL query and almost zero runtime overhead. It also opens the path for more advanced features like compile-time query probing. Example:

When testDB.run is called, the macro only knows that a term q is being used. The macro system doesn’t provide a way to inspect the original AST of q. To work around this limitation, Quill encodes the original AST information as a type annotation on the type refinement generated by the quote macro.

To exemplify, this quotation:

val q = quote(1)

is expanded to:

val q = new Quoted[Int] {
    @QuotedAst(Constant(1))
    def ast = Constant(1)
}

When q is used within another quotation, Quill obtains the QuotedAst annotation from the term type and is able to expand the original AST locally.

Note that this approach has an important limitation. If the user uses type widening:

val q: Quoted[Int] = quote(1)

the type refinement information is lost and Quill has to fall back to runtime query generation using the ast method.

I’d say that this usage of whitebox macros is a workaround, and it could be better handled by the inline keyword initially proposed with the new macro system. Regardless of type widening, the user could declare quotations as inline values:

inline val q: Quoted[Int] = quote(1)

and wherever this value is used, the tree quote(1) is expanded locally, giving access to the original AST.

Is inline still being considered? I’ve heard different answers from different people about this feature.

7 Likes

ScalikeJDBC uses whitebox def macros to validate the names in selectDynamic calls under a particular set of conditions. The set of names accepted by the validator macros corresponds to the primary constructor parameter names of the class given as the type parameter of the SQLSyntaxSupport trait.

Here is a quite simple example:

import scalikejdbc._

// id, name are possible dynamic names
case class Account(id: Long, name: Option[String]) 

object Accounts extends SQLSyntaxSupport[Account] {}

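val a = Accounts.syntax("a")

// .list.apply() needs an implicit DBSession in scope; AutoSession is ScalikeJDBC's default
implicit val session: DBSession = AutoSession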

val accounts: Seq[Account] = {
  withSQL {
    select(
      a.result.id, // a.result.selectDynamic call validated by whitebox def macros
      a.result.name
    ).from(Accounts as a)
     .where(a.name.like("Bob%")) // a.selectDynamic call validated
  }.map { r => 
    Account(
      id = r.get[Long](a.resultName.id), // a.resultName.selectDynamic call validated
      name = r.get[Option[String]](a.resultName.name)
    )
  }.list.apply()
}

If we can achieve the same goal without using Dynamic in the future, that should be much better.

If I understand correctly, what Quill specifically needs is to pass data between macro callsites: data created from the AST of one callsite and used to create the AST of another.

If you split whitebox macros into “type-level computation” and “AST computation”, you won’t be able to generate the data Quill needs during the type-level computation, because that data depends on the exact AST captured by the macro.

What Quill needs is a way for the AST computations to communicate. Today this is hacked together by shoving data onto the types and unpickling it later, but it could plausibly be done with a dedicated mechanism. In that case, each blackbox macro callsite would have some “side channel” to pass information to the others, but it would be guaranteed not to affect the rest of the typer.

Perhaps inlining of ASTs is one such side channel. Or perhaps, instead of each AST node having a Type, it would have a tuple of (Type, SideChannelData). SideChannelData could be seen and acted upon by macro callsites, but would be guaranteed to be ignored by the main typechecker. Hence it could be used to customize codegen or to perform additional validations (in other macros). But it could never, e.g., make a typecheck that would otherwise fail succeed because of this data.

An AST computation -> AST computation “side channel” data flow may seem ad-hoc, but nevertheless avoids all the problems that people don’t like about whitebox macros: tooling support, separation of typechecking & codegen, etc… Tools that ignore the SideChannelData would nevertheless be able to typecheck everything successfully; perhaps only missing out on additional errors that macros may generate when using this side channel data for validation.
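A toy model may make the proposed guarantee easier to see. All names in the sketch below are hypothetical; `Typed` stands in for an AST node's `(Type, SideChannelData)` pair. The typechecker reads only the type. A macro at another callsite reads the side-channel payload to choose between compile-time and runtime query generation. Dropping the payload changes the generated code, but never the outcome of typechecking.

```scala
// Hypothetical model: every node carries a type plus opaque side-channel data.
object SideChannel {
  final case class Typed(tpe: String, side: Map[String, String] = Map.empty)

  // The main typechecker only ever looks at `tpe`.
  def typecheck(expected: String, node: Typed): Either[String, Unit] =
    if (node.tpe == expected) Right(())
    else Left(s"type mismatch: found ${node.tpe}, expected $expected")

  // A macro at another callsite: it typechecks as usual, then uses the
  // side channel only to pick a code-generation strategy.
  def expandRun(node: Typed): Either[String, String] =
    typecheck("Quoted[Int]", node).map { _ =>
      node.side.get("ast") match {
        case Some(ast) => s"compile-time query from $ast"
        case None      => "runtime query generation"
      }
    }
}
```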

If we want to expose this side channel data in an IDE, IDEs can be taught how to recognize it. IDEs that do not recognize it can ignore it and still build a complete understanding of the “rest” of the code.

Notably, I remember the Parboiled guys wanted to do similar things to optimize parsers across multiple parser rule(...) calls. IIRC they were exploring a type-refinement-based mechanism similar to what Quill uses (for some reason I cannot dig up the references right now), as well as build-time code generation (https://github.com/alexander-myltsev/sbt-parboiled2-boost).
They, too, “just” need AST computation -> AST computation data flow: they want their parsers to be able to optimize based on other parsers they call. The “rest of the world” can typecheck without knowing the details inside each parser, just as it can typecheck without knowing the details inside each Quill query.

1 Like

Emma uses whitebox macros in a similar way to Quill.

storm-enroute coroutines uses whitebox macros, although I’m not entirely sure they couldn’t make do with blackbox macros as well. I asked on the Gitter channel, because these macros break analysis in IntelliJ, but got no answer.