Sequencing in for-comprehensions

Sequencing in for-comprehensions

It is often possible and desirable to make multiple extractions without doing
flatMap steps. This can be achieved by making Cartesian products repeatedly or
sequencing lists of objects. Alternative way for avoiding flatMaps is to devise
special auto-generated syntax like for Validation in scalaz. In general, however
sequencing can work as a general abstraction for such purposes.

Sequencing objects instead of flatMapping them can solve performance or
usability problems in many cases:

  • when used with collections, sequence on nested collections allows us to
    compute resulting collection size up-front instead of dynamic resizing during
    flatMapping
  • when used with parsers it can allow for fusing many steps together
  • when doing validations flatMapping doesn’t really make sense if the validation
    fails early - doing sequencing allows to run all steps and combine the errors
  • most importantly, sequencing will allow to run many Futures in parallel
    without resorting to any ugly or deficient workarounds

Consider such example:

for {
  a <- aFut()
  c <- cFutF(a)
  d <- dFutF(a)
  b <- bFut(c, d)
} yield {
  a + b
}

It is immediately seen that values c and d can be computed in parallel,
without a flatMap step between them. We can opt for zipping, but this doesn’t
scale well for bigger counts of independent futures. Therefore we use sequencing
and end up with this:

for {
  a <- aFut()
  Seq(c, d) = Future.sequence(Seq(cFutF(a), dFutF(a)))
  b <- bFut(c, d)
} yield {
  a + b
}

Above syntax is a bit cluttered. With more independent futures stuffed into
sequencing operation it would be easy to make mistakes and it is hard to follow
easily what comes from what. We can improve the readability a bit at a cost of
verbosity:

for {
  a <- aFut()
  cFut = cFutF(a)
  dFut = dFutF(a)
  Seq(c, d) = Future.sequence(Seq(cFut, dFut))
  b <- bFut(c, d)
} yield {
  a + b
}

The downside of the above solution, besided the obvious boilerplate is that we
need to create new identifiers cFut and dFut which pollute the current
scope.

A better solution would be to introduce a special syntax where consecutive <|
(left triangle) steps are fused with a <- (light left arrow) step that comes
after them. Fusing here means replacing steps (more than one) with a
Future.sequence construct described before.

The final version with enhanced for-comprehensions would then look like this:

for {
  a <- aFut()
  c <| cFutF(a)
  d <- dFutF(a)
  b <- bFut(c, d)
} yield {
  a + b
}

Performance considerations

Because sequence call contains all fused steps at once, optimizations are both
more feasible and more efficient than when using cartesian products. For
example:

  • CanBuildFrom passed to sequence can produce arrays with fixed size and not
    care about resizing
  • one call to sequence means less virtual calls and lambda objects creation
    than with repeated cartesian products
  • sequence methods can do some special-casing for small number of parameters
    (usually only a few steps would be fused)

Relation between sequencing and flatMapping

Sequencing should provide the same results as flatMapping. In other words:
replacing <- steps with <| steps (and vice versa) should not change the
meaning of a program. This means that an IDE could automatically suggest to a
developer that he can replace <- steps with <| steps (where possible) for
increased performance.

Where sequence lies?

This is possibly a silly question, but worth answering. sequence is of course
in companion object. Currently it is available for Future class. Scalaz
provides sequence through implicit conversions. I have verified that for
scalazs implementation of sequence on list of lists the law of replacing
<- steps with <| steps would hold.

What can be done without language change?

For the Future class contained in the standard Scala library the situation is
somewhat interesting. Sequencing a list of futures (or alternatively - zipping
them repeatedly) produce a future that will complete after all sequences futures
complete. Therefore instead of deconstructing the list of values in a for
comprehension we can build an evidence that we will use to extract the computed
value from futures directly. Full working example of that technique is there:

The syntax that is possible currently is:

for {
  a <- Future(5)
  b = Future(8)
  c = Future(a + 6)
  ev <- Completer(b)(c)
  d <- Future(ev(c) - ev(b))
} yield {
  a + ev(b) + ev(c) * d
}

For me this type of workaround is much more readable (please vote if you agree)
than tuple deconstruction like:

for {
  a <- Future(5)
  (b, c) <- Future.product(Future(8), Future(a + 6))
  d <- Future(c - b)
} yield {
  a + b + c * d
}

With the proposed language change we would get rid of the Completer and
evidence and get a clean code:

for {
  a <- Future(5)
  b <| Future(8)
  c <- Future(a + 6)
  d <- Future(c - b)
} yield {
  a + b + c * d
}

Related proposals

Similar proposal is here:

The main difference is however that consecutive Cartesian products don’t offer
the same performance optimization opportunities as sequencing.

Update:
Just to clarify what’s wrong (in my opinion) with list destructuring. It is much more likely to overlook mistakes with such notation:

(width, height) <- Future.product(computeHeight(), computeWidth())

Than with this notation:

width <| computeHeight()
height <| computeWidth()

When we want to make a notation widespread then it should bring more benefits than problems.

Hmm, seems I’ve done a fatal mistake and Future.sequence won’t work (because it won’t preserve concrete types of elements, it will only compute a LUB). Future.product with HLists is needed :slight_smile: But still, the notation could work when we have support for HLists in language.

You say…

That’s pretty vague. but then you clarify with:

I’m not sure why “It’s easy to overlook mistakes”, but either way, that doesn’t mean “Cartesian products don’t offer the same performance optimization opportunities as sequencing”. That’s a non sequitur.

Also, the original proposal wasn’t based on using product, but on using a different syntax for binding—a <- aa with instead of a <| aa. Hence, it appears you’re suggesting a different concrete syntax for the same underlying idea. Maybe that’s a better syntax (there was lots of debate over with), but there have been “subproposals” for quite a few alternative concrete syntaxes, so any advantage is not obvious.

I’d suggest

  1. to review more closely that proposal;
  2. keep in mind that it’s extremely hard to make compelling arguments about concrete syntax. PL researchers typically say “syntax doesn’t matter”. In fact, the truth is “we have next to no clue on a science of concrete syntax”;
  3. if you have something compelling to add, please contribute that thread.

I might be missing something—but as-is the proposal doesn’t look convincing.

These two sentences were not related.

Actually the argument about performance was comparing different ways to achieve Cartesian products of more than two arguments. The approach from related proposal is to make products using chained method invocations:

a.product(b).product(c).product(d)

but then I realized that without HLists it’s not possible to do products of arbitrary number of arguments in single step while preserving types. That means that proposal boils down to just a syntactic sugar over a proposed syntax for Dotty:

(a, b, c, d) <- Future.product(aFut, bFut, cFut, dFut)

Doing a product using chained method calls means that until we get to the last product in chain then the previous ones can’t skip computations that lead to the final product. For example if we are doing products on three arrays:

val x = Array(1, 2, 3)
val y = Array(4, 5, 6)
val z = Array(7, 8, 9)

Then by doing a product of all of them at once we can allocate a final array of size 3 * 3 * 3 instead of first allocating a temporary array of size 3 * 3 and then final array of size 3 * 3 * 3 that would be the result of chained product calls.

After realizing that

(a, b, c, d) <- Future.product(aFut, bFut, cFut, dFut)

covers the improvements I’ve thought about aside from syntactic sugar I think that this thread will only confuse people. I’ll try to remove it.

Update:
How to remove a thread?