New syntax: applicative desugaring in for comprehensions (with PR)

Hello everyone,

Just to make things clear: this is not April fools :slight_smile:

Inspired by the comonadic comprehensions proposal and the recent success of Comma McTraily, I was thinking about how we could express applicative comprehensions in Scala. This could be useful not only for people using the typelevel style (I suppose, I’m not one of them) but also for mere mortals like me and people to whom the word “applicative” says absolutely nothing. And I think I may have found a simple and fully compatible way to do this.

Basic proposal

So here’s what I propose: let’s reuse the with keyword to express that some computations in a for comprehension are “independent”, meaning that none of them depends on the result of another and they can therefore be executed “together” or “in parallel”:

for {
  a <- aa with
  b <- bb with
  c <- cc
  d <- dd
} yield a + b + c + d

The desugaring of this syntax merges independent computations using a product operation. The example above would be equivalent to:

for {
  ((a, b), c) <- aa product bb product cc
  d <- dd
} yield a + b + c + d

The product operation merges two independent computations into a single computation whose result is a pair, i.e. (F[A], F[B]) => F[(A, B)]. It is yet to be decided whether product is the best name - I chose it because that is what Cats calls it.
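For concreteness, here is a minimal, self-contained sketch of what the final expansion would look like, using Option as a stand-in effect type and a hypothetical ProductOps extension method (in the actual proposal the expansion happens in the compiler, and product is whatever method the target type provides):

// Hypothetical extension providing (F[A], F[B]) => F[(A, B)], here for Option
implicit class ProductOps[A](fa: Option[A]) {
  def product[B](fb: Option[B]): Option[(A, B)] =
    fa.flatMap(a => fb.map(b => (a, b)))
}

val aa, bb, cc, dd = Option(1)

// What the opening example would ultimately expand to
val result =
  ((aa product bb) product cc).flatMap { case ((a, b), c) =>
    dd.map(d => a + b + c + d)
  }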

Syntax details

Currently, the specification defines for comprehension syntax as follows:

Expr1          ::=  `for' (`(' Enumerators `)' | `{' Enumerators `}')
                       {nl} [`yield'] Expr
Enumerators    ::=  Generator {semi Generator}
Generator      ::=  Pattern1 `<-' Expr {[semi] Guard | semi Pattern1 `=' Expr}
Guard          ::=  `if' PostfixExpr

My proposal would only change the definition of Generator to:

Generator      ::=  Pattern1 `<-' Expr {`with' Pattern1 `<-' Expr} {[semi] Guard | semi Pattern1 `=' Expr}

As we can see, the proposed change is minimal and - more importantly - fully backwards compatible. The only potential doubt I had concerns the overloaded meaning of the with keyword (which already has two meanings). There are some corner cases in which it may be necessary to explicitly disambiguate between them, e.g.

for {
  // we must use parens to disambiguate two meanings of `with` keyword
  a <- (new A with B) with
  b <- bb
} ...

Use cases

  • Parallel Futures in for comprehensions
    Currently, in order to use two or more concurrently executed Futures in a for comprehension, we must first assign them to intermediate variables:
val af = Future { ... }
val bf = Future { ... }
val cf = Future { ... }
val sumf = for {
  a <- af
  b <- bf
  c <- cf
} yield a + b + c

However, if we defined product for Futures simply as zip (a minimal sketch of such a definition follows after this list), we could instead write:

val sumf = for {
  a <- Future { ... } with
  b <- Future { ... } with
  c <- Future { ... }
} yield a + b + c

I believe this would be a real benefit for most Scala programmers - you don’t need to know anything about applicatives to find this syntax useful.

  • Batching of operations
    When some computations are independent, they can often be batched into a single larger operation. For example, multiple independent network operations could be sent as a single network message.
    As an example, here’s a hypothetical API for Redis transactions that could be possible to implement with the new syntax:
for {
  _ <- multi() with
  a <- get("a") with
  set <- smembers("someSet") with
  _ <- incr("b") with
  _ <- exec()
  _ <- doSomethingMore(a, set)
} yield ...

A Redis transaction consists of multiple Redis commands invoked atomically, as a whole. Therefore they cannot use each other’s results, and it’s natural to send them all at once over the network. The new syntax allows us to express this type-safely and at the same time have the result of every command in the transaction assigned to a separate identifier, so that we can access them without unnecessary ceremony.
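To make the Futures use case above concrete, here is a minimal sketch (not part of the actual PR) of how product could be provided for Future by delegating to zip. With such a definition in scope, the proposed desugaring evaluates all the Future expressions up front, so they run concurrently, and zip merely joins their results (the bodies below are placeholders):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical extension: merge two independent futures into one yielding both results
implicit class FutureProductOps[A](fa: Future[A]) {
  def product[B](fb: Future[B]): Future[(A, B)] = fa zip fb
}

// Roughly what the `with`-based comprehension above would expand to
val sumf =
  (Future(1) product Future(2) product Future(3)).map {
    case ((a, b), c) => a + b + c
  }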

Considerations

  • As already mentioned, the overloading of the with keyword could be confusing. Originally I considered using a comma, but then realized that it may be error-prone and unreadable, because the commas may be hard to spot.
  • Introducing product as a magic method has some downsides. Collections already define product with a completely different meaning. Although applicative composition doesn’t seem very useful with collections, the name conflict may still cause very confusing compilation errors. So maybe we should choose a different name, or even a completely different desugaring.
  • Maybe we should allow the with keyword to also be used after guards and assignments. Currently it’s not possible, in order to keep things as simple as possible.

Implementation

Finally, the time has come for some hard arguments :slight_smile:

I believe I have already implemented the proposed change. The POC pull request is here: https://github.com/scala/scala/pull/5819

I’m quite happy about it because the changes turned out to be simple. Unless I’m missing something, the scala-reflect module also remains fully backwards compatible. Quasiquotes understand the new syntax, too. The most complex part to implement was the un-desugaring of for comprehensions used by reification.

—

I’m really eager to hear your feedback about this. I’m also aware that if this change were to make it into the language, it would require a SIP and all the procedure around it.

Cheers,
Roman

3 Likes

I would love some syntax to help with this common pattern. Overall I like the proposal, but it is so similar to the current syntax that it doesn’t obviously convey that these things are happening in parallel, and that we will not proceed until they have all finished. Personally I would prefer something possibly a bit uglier, but more obvious, like:

for {
  (_, a, set, _, _) <- multi() zip
                       get("a") zip
                       smembers("someSet") zip
                       incr("b") zip
                       exec()
  _ <- doSomethingMore(a, set)
} yield ...

It looks to me like what you really want is better integration of HLists. If we could define a generic product (i.e. one working over tuples of arbitrary length), we could write

for {
   (a, b, c) <- product(aa, bb, cc)
   d <- dd
} yield a + b + c + d

Since in dotty tuples are HLists (at least once Olivier’s PR is in), this will become possible. So maybe we don’t need additional syntax here, after all.

8 Likes

If I understand correctly, better integration of HLists would let us have a single tuple instead of multiple nested pairs. That’s an improvement but it’s something already possible with current Scala (with more boilerplate) and it’s not really the problem I wanted to solve.

What annoys me specifically is that I am currently forced to put all these identifiers into a single tuple. In the abstract examples given in this thread, all the identifiers are very short, but in real code they would be more descriptive and longer. At that point, having to put them all into a single tuple becomes very clunky.

Can you back up and give a good explanation of what problem you’re trying to solve, and then can we have a robust discussion about the various ways it could be solved and weigh their pros and cons?

I may have made my intent unclear by mixing applicative (product) and monadic (flatMap) desugarings in a single for comprehension.

Pure applicative comprehensions

Let’s back up to something more basic and abstract: I’d like to have a for comprehension syntax for applicative functors, where flatMap is not available. In other words, I want a feature analogous to Haskell’s ApplicativeDo.

For example, instead of writing

(lengthyExpressionA product lengthyExpressionB product lengthyExpressionC) map {
  case ((meaningfulNameA, meaningfulNameB), meaningfulNameC) => 
    meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC
}

or (assuming HList integration mentioned by Martin)

product(lengthyExpressionA, lengthyExpressionB, lengthyExpressionC) map {
  case (meaningfulNameA, meaningfulNameB, meaningfulNameC) => 
    meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC
}

or (using single-step for comprehension)

for {
  (meaningfulNameA, meaningfulNameB, meaningfulNameC) <- 
    product(lengthyExpressionA, lengthyExpressionB, lengthyExpressionC)
} yield meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC

or (using cats/scalaz syntax for applicatives)

(lengthyExpressionA |@| lengthyExpressionB |@| lengthyExpressionC) map {
  (meaningfulNameA, meaningfulNameB, meaningfulNameC) => 
    meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC
}

I’d prefer to write:

for {
  meaningfulNameA <- lengthyExpressionA with
  meaningfulNameB <- lengthyExpressionB with
  meaningfulNameC <- lengthyExpressionC
} yield meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC

which looks more natural and readable to me because:

  • there is a clear sequence of computation-to-result assignments (id <- expression)
  • every line contains a “step” readable by itself - it doesn’t get syntactically mixed with other steps
  • it’s clear and unambiguous to the reader which identifier represents the result of which step
  • there is no need for any tuple construction-deconstruction ceremony
  • the explicit with keyword, in my opinion, states clearly enough that the steps are executed “together” and may be parallel

Hybrid applicative-monadic comprehensions

Additionally, a sequence of computations joined using the with keyword may seamlessly become part of some longer for comprehension (which may contain flatMaps), and the result of every applicative step stays available to further computations in that comprehension. This is what I wanted to show with my earlier Futures and Redis transaction examples.

Summary

To summarize why I think this syntax could be worth adding to the language:

  • Sequences that include both monadic and applicative computations can now be expressed using a single for comprehension.
  • Adding the with keyword between steps in a for comprehension is both an explicit and a non-intrusive way of stating that the steps are independent and may be parallel. Oh, these async steps in my for comprehension could be parallel… But I’d like to keep them in a nice sequence as they are now… No problem, let’s just put some withs between them!
  • As I have already shown in my original message, the changes to the formal language syntax are minimal - it’s just a small and, in my opinion, natural extension of an existing construct.
  • As I hope to have shown with the pull request, the implementation is also simple and adds minimal complexity to the compiler.
  • The change is completely backwards compatible
  • I don’t see any bad interactions of this feature with other language features

I am not very enthusiastic about this change because

  1. It turns with into a postfix operator if you are thinking line-by-line (and that is what you’re touting as an advantage of using for)
  2. The collision with existing usage is unfortunate
  3. Mixtures of applicative and monadic for seem ripe for confusion.
  4. The desugaring is new, which may make it apply weirdly and surprisingly to things that one never intended to be applicative functors.

I agree that keeping the names local is helpful. I’m not sure it’s worth it, though. Something like

for {
  { meaningfulNameA <- lengthyExpressionA } |@|
  { meaningfulNameB <- lengthyExpressionB } |@|
  { meaningfulNameC <- lengthyExpressionC }
} yield meaningfulNameB*meaningfulNameB - 4*meaningfulNameA*meaningfulNameC

seems a lot more like what one would want (and one would want it to work in general, e.g. with zip also).

2 Likes

I don’t love the postfix usage, I think I might even prefer

for {
  meaningfulNameA <- lengthyExpressionA
  with meaningfulNameB <- lengthyExpressionB
  with meaningfulNameC <- lengthyExpressionC
} yield meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC

But overall I would want a different word for “execute concurrently”. meaningfulNameA is not getting the result of lengthyExpressionA with lengthyExpressionB. Can you think of another keyword to express the concept, rather than overloading with?

We often use parentheses around operations to express “do these things first”, but I understand that doesn’t meet your goals in this case, because any names created and assigned inside the parentheses would not be referenceable outside of them. Otherwise, maybe something like this would work:

for {
  Concurrently(
    meaningfulNameA <- lengthyExpressionA
    meaningfulNameB <- lengthyExpressionB
    meaningfulNameC <- lengthyExpressionC
  )
} yield meaningfulNameB * meaningfulNameB - 4 * meaningfulNameA * meaningfulNameC

If I am understanding correctly, is the biggest issue that there is currently no way to say “do all of these assignments concurrently and keep their names in scope”, as opposed to tupling them all and then naming them during unapply?

I agree, both your versions are syntactically better. The first one seems as simple to implement as my original proposal. The second one, with an explicitly demarcated applicative block, is the best, but I can’t see how it can be done without introducing new keywords.
My usage of with is not ideal but it has the advantage of being fully backwards compatible.
And yes, the primary goal is to avoid tuple construction-deconstruction while keeping names in scope.

1 Like

@Ichoran
Agree about 1) and 2) - I didn’t want to introduce new keywords.
3) Would that argument still apply if applicative sections had their own explicitly demarcated block, like in @ShaneDelmore’s example?
4) Not sure what you mean here. There shouldn’t be any accidents unless someone actually uses this syntax, which means they actually want to treat these things as applicatives, right?

You’re saying that the benefit may not be worth the cost. Do you think it would be worth it to introduce a new keyword and have something similar to the example below?

for {
  a <- aa
  apfor {
    b <- bb(a)
    c <- cc
    d <- dd
  }
  e <- ee(a, d)
} yield c + e

For what it’s worth, I prefer this syntax over the with syntax being proposed. I think the monadic/applicative

(a, b, c) <- product(aa(), bb(), cc())
d <- a + b + c

Has a nice symmetry, both syntactically and semantically, with the non-monadic/non-applicative

val (a, b, c) = (aa(), bb(), cc())
val d = a + b + c

Whereas

a <- aa() with
b <- bb() with
c <- cc()
d <- a + b + c

Doesn’t have any real analog in non-monadic/applicative land. After all, there isn’t any

val a = aa() with
val b = bb() with
val c = cc()
val d = a + b + c

syntax, although the same argument that the applicative-for syntax is an improvement over a product/zip tuple would also support some sort of applicative val = syntax.

That’s not to say I think you must have symmetry between these two things, but I like the symmetry, enough so to prefer the (a, b, c) <- product(aa(), bb(), cc()) syntax over the proposed with syntax.

8 Likes

For those that don’t like overloading with: an alternative not mentioned was encoding the difference in the expansion operator, e.g. <- is monadic expansion, <+ is applicative expansion, rather than using a keyword.

for {
  a <+ Future { aa() }
  b <+ Future { bb() }
  c <+ Future { cc() }
} yield (a * b) - (4 * a * c) + 42

which could desugar to the various proposals above. This avoids overloading a keyword (but eats up <+ instead).

It seems we are in violent agreement.

2 Likes

No violence, just agreement =) I just wanted to put down my reasoning since I haven’t seen it said before.

I understand the objections to overloading the with keyword. As I said earlier, my intention was to keep the changes minimal and compatible. We can think of many different syntaxes which introduce new keywords or symbols - I didn’t want to do that because I felt it would cause even more objections.

However, the current status quo which is zipping computations and deconstructing tuples is very hard for me to accept.

Agreed – I find the zip approach quite annoying in practice – but I’m afraid I’m with Martin on this one. Granted, it requires waiting for Dotty, but I think his product suggestion is clearer than the with approach and even cleaner in terms of not needing to introduce new stuff. (Given that the HList changes are coming anyway.) So I’d counsel patience and say we do this right in Scala 3.

(Although, all that said, it does increase even further the pressure to get Dotty stabilized and start concretely moving towards Scala 3…)

In general, I think that new syntax is unlikely to get accepted into new versions of Scala (I emphasize that it’s unlikely, not impossible). I agree with the previous comments that overloading with would be undesirable. In my opinion, Scala syntax is complicated and we should strive hard to avoid semantic overloads.

I don’t see it mentioned here, but have you tried implementing this feature as a compiler plugin? I don’t know how feasible it is, but it sounds like a good way to move forward and experiment with this feature. You could even introduce new syntax that does not rely on the with subtlety.

If you get a SIP done soon, we may be able to discuss it this month in our SIP meeting. But honestly, to spare you work, I would suggest that you first find ways to improve this proposal or address the community feedback. This will ease review on the SIP Committee substantially :slight_smile:.

Lastly, thanks a lot for bringing this together and providing an implementation! I strongly encourage people to provide implementations along with their proposals – like it or not, they make a difference for the Committee and the community feedback.

1 Like

To be clear: I’m not trying to push this change anymore. The objections to the usage of with are valid, and I can’t think of any other acceptable syntax which wouldn’t introduce new keywords or symbols. There were plans to introduce apfor as a follow-up to comonadic comprehensions, so I’m keeping my hopes up for that.

The syntax proposed by Martin is already possible, just with more underlying boilerplate. However, it still requires tupling and untupling, which is the problem I wanted to solve.

I’m not interested in doing this as a compiler plugin, because a syntactic change like this also needs to be implemented in all the language tools and IDEs. If it isn’t included in the language itself, I don’t see much value in it. Besides, hijacking the parser in a Scala compiler plugin is possible but very dirty. This is not the kind of feature that is suitable for a compiler plugin.

FWIW both scalaz and cats provide equivalent syntax for this. Your

(a, b, c) <- product(aa(), bb(), cc())

can be written as

(a, b, c) <- (aa() |@| bb() |@| cc()).tupled

This relies on the Apply typeclass. Is your intent that product be a macro so it’s syntax-based? Or actual language syntax?
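For reference, here is a self-contained version of that fragment, assuming the cats 0.9-era |@| syntax and using Option as a stand-in effect type:

import cats.implicits._

val aa, bb, cc = Option(1)

// tupled turns the |@| builder into F[(A, B, C)], which a for comprehension
// can then destructure without any extra machinery
val summed = for {
  (a, b, c) <- (aa |@| bb |@| cc).tupled
} yield a + b + c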

Here’s another approach. IIUC it works better with inline/meta macros since, with current Scala macros, for comprehensions are already desugared by the time the macro receives them. It should still be possible, but it would be more difficult.

Basically, you have an annotation macro that lets you write

@applicative
for(a <- aa; b <- bb) yield a + b

and it will desugar it to

(aa |@| bb).map { (a, b) => a + b }

Or it could be a regular macro, so

applicative(for(a <- aa; b <- bb) yield a + b)

If it were operating at the for-comprehension level, all it would have to do is check that none of the left-hand sides of <- are used until the yield clause, and then the rewrite would be straightforward.

If you’re dealing with the desugared form, then essentially you have to be able to rewrite code of the form

aa.flatMap(a => bb.flatMap(b => cc.map(c => a + b + c)))

So you have a nested chain of flatMaps that ends with a map. You ensure that none of the flatMap function parameters are used until the final map, then you grab the receivers of the flatMaps and the map, join them with |@|, and put the body of the map function into the applicative apply/map function.
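A sketch of that rewrite, again assuming the cats 0.9-era |@| syntax and Option as a stand-in effect type (aa, bb, cc are just placeholders):

import cats.implicits._

val aa, bb, cc = Option(1)

// Desugared monadic form the macro would receive
val monadic = aa.flatMap(a => bb.flatMap(b => cc.map(c => a + b + c)))

// Form it could be rewritten into, once the macro has checked that
// a and b are not referenced before the final map
val applicative = (aa |@| bb |@| cc).map((a, b, c) => a + b + c)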

Of course there might be more complicated scenarios, but at least for an MVP you can just throw an error if you get unexpected code.

2 Likes