Sequencing in for-comprehensions
It is often possible and desirable to make multiple extractions without doing
flatMap steps. This can be achieved by making Cartesian products repeatedly or
sequencing lists of objects. Alternative way for avoiding flatMaps is to devise
special auto-generated syntax like for Validation in scalaz. In general, however
sequencing can work as a general abstraction for such purposes.
Sequencing objects instead of flatMapping them can solve performance or
usability problems in many cases:
- when used with collections, sequence on nested collections allows us to
compute resulting collection size up-front instead of dynamic resizing during
flatMapping - when used with parsers it can allow for fusing many steps together
- when doing validations flatMapping doesn’t really make sense if the validation
fails early - doing sequencing allows to run all steps and combine the errors - most importantly, sequencing will allow to run many Futures in parallel
without resorting to any ugly or deficient workarounds
Consider such example:
for {
a <- aFut()
c <- cFutF(a)
d <- dFutF(a)
b <- bFut(c, d)
} yield {
a + b
}
It is immediately seen that values c
and d
can be computed in parallel,
without a flatMap
step between them. We can opt for zipping, but this doesn’t
scale well for bigger counts of independent futures. Therefore we use sequencing
and end up with this:
for {
a <- aFut()
Seq(c, d) = Future.sequence(Seq(cFutF(a), dFutF(a)))
b <- bFut(c, d)
} yield {
a + b
}
Above syntax is a bit cluttered. With more independent futures stuffed into
sequencing operation it would be easy to make mistakes and it is hard to follow
easily what comes from what. We can improve the readability a bit at a cost of
verbosity:
for {
a <- aFut()
cFut = cFutF(a)
dFut = dFutF(a)
Seq(c, d) = Future.sequence(Seq(cFut, dFut))
b <- bFut(c, d)
} yield {
a + b
}
The downside of the above solution, besided the obvious boilerplate is that we
need to create new identifiers cFut
and dFut
which pollute the current
scope.
A better solution would be to introduce a special syntax where consecutive <|
(left triangle) steps are fused with a <-
(light left arrow) step that comes
after them. Fusing here means replacing steps (more than one) with a
Future.sequence
construct described before.
The final version with enhanced for-comprehensions
would then look like this:
for {
a <- aFut()
c <| cFutF(a)
d <- dFutF(a)
b <- bFut(c, d)
} yield {
a + b
}
Performance considerations
Because sequence
call contains all fused steps at once, optimizations are both
more feasible and more efficient than when using cartesian products. For
example:
- CanBuildFrom passed to
sequence
can produce arrays with fixed size and not
care about resizing - one call to
sequence
means less virtual calls and lambda objects creation
than with repeated cartesian products sequence
methods can do some special-casing for small number of parameters
(usually only a few steps would be fused)
Relation between sequencing and flatMapping
Sequencing should provide the same results as flatMapping. In other words:
replacing <-
steps with <|
steps (and vice versa) should not change the
meaning of a program. This means that an IDE could automatically suggest to a
developer that he can replace <-
steps with <|
steps (where possible) for
increased performance.
Where sequence
lies?
This is possibly a silly question, but worth answering. sequence
is of course
in companion object. Currently it is available for Future
class. Scalaz
provides sequence
through implicit conversions. I have verified that for
scalaz
s implementation of sequence
on list of lists the law of replacing
<-
steps with <|
steps would hold.
What can be done without language change?
For the Future
class contained in the standard Scala library the situation is
somewhat interesting. Sequencing a list of futures (or alternatively - zipping
them repeatedly) produce a future that will complete after all sequences futures
complete. Therefore instead of deconstructing the list of values in a for
comprehension we can build an evidence that we will use to extract the computed
value from futures directly. Full working example of that technique is there:
The syntax that is possible currently is:
for {
a <- Future(5)
b = Future(8)
c = Future(a + 6)
ev <- Completer(b)(c)
d <- Future(ev(c) - ev(b))
} yield {
a + ev(b) + ev(c) * d
}
For me this type of workaround is much more readable (please vote if you agree)
than tuple deconstruction like:
for {
a <- Future(5)
(b, c) <- Future.product(Future(8), Future(a + 6))
d <- Future(c - b)
} yield {
a + b + c * d
}
With the proposed language change we would get rid of the Completer
and
evidence and get a clean code:
for {
a <- Future(5)
b <| Future(8)
c <- Future(a + 6)
d <- Future(c - b)
} yield {
a + b + c * d
}
Related proposals
Similar proposal is here:
The main difference is however that consecutive Cartesian products don’t offer
the same performance optimization opportunities as sequencing.
Update:
Just to clarify what’s wrong (in my opinion) with list destructuring. It is much more likely to overlook mistakes with such notation:
(width, height) <- Future.product(computeHeight(), computeWidth())
Than with this notation:
width <| computeHeight()
height <| computeWidth()
When we want to make a notation widespread then it should bring more benefits than problems.