Updates to scala.concurrent.Future wrt breaking changes

mdedetrich · November 1, 2017, 12:28pm

@fwbrasil and @viktorklang have done excellent work writing a document that details numerous performance improvements to the current implementation of scala.concurrent.Future (https://docs.google.com/document/d/1f3oBH-Nh_BZtd6zJxtW41TX3frnlyamC7dG7QmVIDU4). As noted in the document, some changes won’t be considered because they break backwards compatibility.

There are also more aggressive changes that could be considered to improve performance even further, e.g. changing map to not require an ExecutionContext (improving cache locality by avoiding the hop through an ExecutionContext) and adding a mapWithExecutor that keeps the current signature def map[S](f: T => S)(implicit executor: ExecutionContext): Future[S]. This change would be even more breaking, since it can break source compatibility, but I suspect that in 95%+ of cases people do not need a custom ExecutionContext specifically for the map operation (i.e. people typically just use the ExecutionContext of the Future they are mapping over).
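A minimal sketch of the proposed split, written as extension methods so it compiles against today’s Future. The names `MapWithoutEc`, `FutureMapOps`, `mapNoEc` and `CallingThreadEC` are hypothetical, and running `f` on whichever thread completes the Future is one assumed way of dropping the ExecutionContext, not a settled design; `mapWithExecutor` simply forwards to the current `map`.

```scala
import scala.concurrent.{ExecutionContext, Future}

object MapWithoutEc {
  // Hypothetical executor that runs each task immediately on the calling thread:
  // the thread that completes the Future, or the caller if it is already completed.
  object CallingThreadEC extends ExecutionContext {
    def execute(runnable: Runnable): Unit = runnable.run()
    def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
  }

  implicit final class FutureMapOps[T](private val fut: Future[T]) extends AnyVal {
    // Proposed shape of map: no ExecutionContext parameter, no hop through a pool.
    def mapNoEc[S](f: T => S): Future[S] = fut.map(f)(CallingThreadEC)

    // Today's behaviour under a new name: f runs on the supplied executor.
    def mapWithExecutor[S](f: T => S)(implicit executor: ExecutionContext): Future[S] =
      fut.map(f)(executor)
  }
}

object MapWithoutEcUsage {
  import MapWithoutEc._

  // No implicit ExecutionContext is needed for the mapNoEc call.
  val length: Future[Int] = Future.successful("hello").mapNoEc(_.length)
}
```

The trade-off is visible in the sketch: with `mapNoEc`, an expensive `f` runs on (and blocks) whatever thread completes the Future, or the caller’s thread if the Future is already completed.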

Is there a plan to eventually address these breaking changes in Future in some future version of Scala? Future is by far the most used concurrency primitive in Scala, and improving its performance as much as possible (while still staying true to the design of Future) should be explored. Since Scala 2.13 is already planning breaking changes to the collections design, maybe we could bundle such changes to Future into the same release?

Krever · November 1, 2017, 12:44pm

How does this play with Monix or Cats/Scalaz IO? As a user, it’s not ideal to have to choose between 4 options delivering mostly the same features. Scala’s futures are great compared to Java’s, but if they have valid competition, it may be good to select the best one and replace the original implementation.

danarmak · November 1, 2017, 12:49pm

This optimization applies to all the methods of Future that are implemented in terms of onComplete, so you’d have lots of method pairs, which is not elegant. And if, in future versions, you come up with more optimizations that change source compatibility or semantics, you might end up with a combinatorial explosion of methods.

mdedetrich · November 1, 2017, 12:52pm

I don’t like the situation with 4-5 different concurrency/async/IO primitives, but Future is different from all the others in 2 major aspects:

It is strict (which follows Scala’s design of being strict by default). This means that Futures start running as soon as they are defined, unless you put them in a thunk (i.e. a def). This comes with all of the benefits and disadvantages of being strict.
It allows you to supply a custom execution context, which is very handy when it comes to resource allocation/separation. It also gives you the ability to do things like “execute this computation on the UI thread”, where the “UI thread” is a custom ExecutionContext (I actually did this when working on a Swing application). It’s also useful for things like dedicating a thread to a specific task so you get better real-time behaviour when your application is under load. Note that some Task implementations also allow this, but not all of them (and arguably it’s harder to do when your Task type is lazy).
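As a sketch of the dedicated-thread use case, the snippet below builds an ExecutionContext backed by a single daemon thread named `ui-thread` (a stand-in for a real UI thread; the object, thread name and helper are hypothetical) and hops onto it only for the final step.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

object DedicatedThread {
  // One daemon thread with a fixed name, standing in for a UI thread or a
  // thread reserved for latency-sensitive work.
  private val factory = new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "ui-thread")
      t.setDaemon(true)
      t
    }
  }

  val uiThread: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor(factory))

  // Does the heavy work on the global pool, then runs `show` on the dedicated thread.
  def computeThenShow[A, B](work: => A)(show: A => B): Future[B] =
    Future(work)(ExecutionContext.global).map(show)(uiThread)
}
```

Because `show` is mapped with `uiThread` explicitly, it always runs on that thread, regardless of which pool thread finished `work`.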

The Cats/Scalaz/Monix IO types, on the other hand, are forced to be lazy (Future can be lazy, but it’s not guaranteed by the type system). This means that you have to run a Task after it’s been defined. It also means they are more or less referentially transparent (although this does break in rare circumstances).
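The strict-versus-lazy distinction in a few lines, assuming the global ExecutionContext: a `val` Future starts as soon as it is defined, while a `def` only describes the computation and starts a fresh one on every call. `StrictVsLazy` and the `started` counter are illustrative names only.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}

object StrictVsLazy {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Counts how many times a Future body has actually started.
  val started = new AtomicInteger(0)

  // Strict: the body is submitted to `ec` as soon as this val is initialized.
  val eager: Future[Int] = Future { started.incrementAndGet(); 1 }

  // Lazy by convention only: nothing runs until `deferred` is called,
  // and every call starts a brand-new computation.
  def deferred: Future[Int] = Future { started.incrementAndGet(); 2 }
}
```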

In my opinion there are legitimate tradeoffs between Future and its lazy variants (Scalaz Task, Monix Task, Fs2 Task), but I think the tradeoffs between the lazy variants themselves are less clear.

mdedetrich · November 1, 2017, 12:54pm

onComplete is only meant to be used for logging/diagnostic purposes; it shouldn’t be used as part of business logic. For that you have map/flatMap/foreach and other related functions.

danarmak · November 1, 2017, 1:10pm

What I meant is that flatMap and the others are implemented using onComplete, which is the only truly abstract method of Future.

map isn’t special, it’s just used a lot. If you change the semantics of map and call the original method mapWithExecutor, you should by the same reasoning also change flatMap, foreach, collect, recover and many other methods and add a method called xxxWithExecutor containing the original behavior for each one. Making every method into a pair can’t end well.
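To see why the pairs would multiply, here is a sketch of `map` and `recover` written with nothing but `onComplete` and a `Promise`. It is illustrative only (the standard library’s own implementations differ in detail, and `ViaOnComplete` is a hypothetical name), but it shows that each operator registers its own callback and therefore needs its own ExecutionContext.

```scala
import scala.concurrent.{ExecutionContext, Future, Promise}

object ViaOnComplete {
  // map expressed with only onComplete and a Promise.
  def map[T, S](fut: Future[T])(f: T => S)(implicit ec: ExecutionContext): Future[S] = {
    val p = Promise[S]()
    // Try.map turns a non-fatal exception thrown by f into a Failure.
    fut.onComplete(result => p.complete(result.map(f)))
    p.future
  }

  // recover follows the same pattern and needs an ExecutionContext for the same reason.
  def recover[T](fut: Future[T])(pf: PartialFunction[Throwable, T])(implicit ec: ExecutionContext): Future[T] = {
    val p = Promise[T]()
    fut.onComplete(result => p.complete(result.recover(pf)))
    p.future
  }
}
```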

mdedetrich · November 1, 2017, 1:16pm

Ah yes, I see what you mean (this is what I meant by aggressive change btw)

It’s true that map isn’t special, but it is used a lot. In fact, because of this, frameworks like akka-http had to implement a FastFuture variant of Future (the main difference being that its map function doesn’t use an existing EC).

I do wonder if it’s easy to separate it out cleanly; map is by far what’s used the most (i.e. it’s called in all for comprehensions).

danarmak · November 1, 2017, 1:18pm

I think for comprehensions call flatMap and filter. And at least some of the other methods are also called often, e.g. recover/recoverWith. It doesn’t make sense to optimize some and not others, if the optimization breaks source compatibility anyway.
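For reference, a sketch of how the compiler rewrites a small for comprehension over Futures (the helper names are made up): every generator except the last becomes `flatMap`, the last becomes `map` when there is a `yield`, and an `if` guard becomes `withFilter`, which Future implements via `filter`. Without a `yield`, the last step is `foreach` instead.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ForDesugaring {
  implicit val ec: ExecutionContext = ExecutionContext.global

  def a: Future[Int] = Future.successful(2)
  def b: Future[Int] = Future.successful(20)

  // As written by the user.
  def sugared: Future[Int] =
    for {
      x <- a
      if x > 0
      y <- b
    } yield x + y

  // Roughly what the compiler produces: withFilter for the guard,
  // flatMap for the first generator, map for the last one.
  def desugared: Future[Int] =
    a.withFilter(x => x > 0).flatMap(x => b.map(y => x + y))
}
```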

mdedetrich · November 1, 2017, 1:22pm

True, although it makes more sense for flatMap to take an ExecutionContext, because this is where an ExecutionContext (by default) makes sense. If you want to combine the results of 2 Future values, this is where the main use case for ExecutionContext comes in (since these Future values are typically behind HTTP/database calls).

I think map (and maybe recover) are the most common. The point is that map is almost always just transforming the value inside a Future, where using an ExecutionContext is redundant.

Point taken; the idea is that we should use the defaults that make sense.

alexandru · November 1, 2017, 1:32pm

FYI, that’s not exactly true — since Scala 2.12 we have another abstract method — transformWith, which is basically flatMap with error handling.
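A sketch of the point about transformWith: given only `transformWith`, both `flatMap` and `recoverWith` fall out in a few lines. These are hypothetical standalone functions for illustration, not the standard library’s code.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object ViaTransformWith {
  // flatMap: continue with f on success, propagate the failure otherwise.
  def flatMap[T, S](fut: Future[T])(f: T => Future[S])(implicit ec: ExecutionContext): Future[S] =
    fut.transformWith[S] {
      case Success(value) => f(value)
      case Failure(error) => Future.failed(error)
    }

  // recoverWith: switch to a fallback Future only for failures pf handles.
  def recoverWith[T](fut: Future[T])(pf: PartialFunction[Throwable, Future[T]])(implicit ec: ExecutionContext): Future[T] =
    fut.transformWith[T] {
      case Failure(error) if pf.isDefinedAt(error) => pf(error)
      case _ => fut
    }
}
```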

Also, Future should not be considered an abstract data type that can be inherited (even though I like it the way it is, because it allows for alternative implementations). If you override Future, you’re going to break flatMap’s default implementation, which discards intermediate references in long chains and needs to use internals to do so. So unless you really know what you’re doing, by overriding Future you’re probably going to end up introducing memory leaks in those flatMap (tail-recursive) chains.
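The kind of chain being described is a recursive loop expressed through flatMap, like the hypothetical `countdown` below. According to the post, the default flatMap avoids holding on to every intermediate step of such a chain, which a naive Future subclass could lose; the snippet only shows the pattern and does not itself demonstrate the memory behaviour.

```scala
import scala.concurrent.{ExecutionContext, Future}

object LongChains {
  // A loop written as a chain of flatMaps: each step schedules the next one,
  // and the outer Future completes only when the innermost one does.
  def countdown(n: Int)(implicit ec: ExecutionContext): Future[Int] =
    if (n == 0) Future.successful(0)
    else Future(n - 1).flatMap(m => countdown(m))
}
```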

alexandru · November 1, 2017, 1:36pm

Scala’s Future is awesome when used in combination with Monix’s Task, the two being complementary.

A Task works like a lazy Future, being a Future generator if you will.
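To illustrate the “Future generator” idea (this is not Monix’s Task API; `TaskSketch`, `Task` and `delay` are hypothetical names), a lazy task can be modelled as a function from an ExecutionContext to a fresh Future: nothing runs until `run` is called, and every call starts the computation again.

```scala
import scala.concurrent.{ExecutionContext, Future}

object TaskSketch {
  // A description of an asynchronous computation: each call to `run`
  // starts it anew and returns a new Future.
  final case class Task[A](run: ExecutionContext => Future[A]) {
    def map[B](f: A => B): Task[B] =
      Task[B](ec => run(ec).map(f)(ec))

    def flatMap[B](f: A => Task[B]): Task[B] =
      Task[B](ec => run(ec).flatMap(a => f(a).run(ec))(ec))
  }

  object Task {
    // Suspends `body`; it is evaluated only when the Task is run.
    def delay[A](body: => A): Task[A] = Task[A](ec => Future(body)(ec))
  }
}
```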

See my presentation from Scala Days 2017 — https://www.youtube.com/watch?v=bZO-c-yREJ4

danarmak · November 1, 2017, 1:36pm

What about detecting if the EC passed is in fact the same one the original Future is running on (assuming it was constructed by Future.apply or equivalent), and doing the optimization only then?

mdedetrich · November 1, 2017, 3:03pm

Theoretically speaking, the JVM should optimize this automatically (since implicit parameters get desugared to plain method parameters); however, this being the JVM, it’s not always apparent when inlining happens and when it doesn’t. Things like method bodies being too large can suddenly trip the JVM into not optimizing something.

Alternatively, whole-program optimization (i.e. the Dotty deep linker, plus maybe also the new optimizer in 2.12.x) could theoretically optimize this away, but I am not really sure that’s the case.

There are also 2 separate issues here: one is optimization and the other is intent. Having an ExecutionContext as a method parameter implies that you need it (and use it) for the computation, but often your computation is just changing the value inside the Future and you don’t really use the ExecutionContext. In this case, requiring the ExecutionContext doesn’t accurately reflect the intent of the typical usage of map.

In any case, if optimization can elide the ExecutionContext in this situation, then it shouldn’t really be an issue; however, I am doubtful this is going to be reliable on platforms like the JVM.

mdedetrich · November 1, 2017, 3:09pm

Yeah, I am up for making Future final/abstract as detailed in the document, because:

It does improve performance significantly in a lot of cases
I haven’t seen anyone extending Future apart from addressing the performance problems which we are trying to solve in the first place.

There are, however, other issues with this, at least if we make Future final: e.g. Monix has CancelableFuture, which extends Future. Maybe it makes sense to put CancelableFuture into scala.concurrent (at least now that we know we have a stable, concrete implementation of it), but I suspect such a change may not be popular (especially considering that people want to move stuff out of the Scala stdlib).

alexandru · November 1, 2017, 3:30pm

Monix extends it for CancelableFuture which is a valid use-case.

I’m not necessarily for having a CancelableFuture in the stdlib, I’m quite happy to have it in Monix.

However I don’t see people moving away from the stdlib, nor should they. Scala is a hybrid language and it needs a Future for imperative code. And I quite like having that Future in the stdlib.

mdedetrich · November 1, 2017, 3:40pm

Yeah, my point is: if we decide to make Future final for performance reasons (which is a valid reason), then we need to investigate all of the current reasons for extending Future in the Scala ecosystem.

So far I see 2 legitimate cases:

Stuff like akka-http FastFuture (deliberately implemented for performance reasons, i.e. it has a map which doesn’t require an ExecutionContext). Problems like this should be solved in the first place with performance improvements in Future.
Stuff like CancelableFuture in Monix. On this note, one of the reasons why Twitter Future still exists is that the Scala Future can’t be cancelled (I think other reasons are also performance related).

If Future is ever made final, we basically need to make sure that we don’t kill current legitimate cases for extending Future, which means that we may need to add stuff like CancelableFuture into scala.concurrent.Future.

danarmak · November 1, 2017, 3:49pm

I once needed to extend Future. I wanted to represent a set of events using Future values, and one of them was an event that would never happen, i.e. the future would never complete. This Future value became a resource leak from all the continuations linked to it, because there was generic code that operated on any event passed to it. So I extended Future to override onComplete to do nothing.

In a sense, a Future.never is the dual of the always-already-completed Future.unit.
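The standard library provides such a value as `Future.never`, next to `Future.unit`. A small sketch contrasting the two (the `NeverDemo` helper is hypothetical): awaiting `Future.never` with a timeout always times out, while `Future.unit` is already completed.

```scala
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.duration._

object NeverDemo {
  // Never completes: callbacks registered on it are never run.
  val never: Future[Int] = Future.never

  // Already completed with (): the dual mentioned above.
  val unit: Future[Unit] = Future.unit

  // True if `f` is still incomplete after waiting `millis` milliseconds.
  def timesOut(f: Future[_], millis: Long): Boolean =
    try { Await.ready(f, millis.millis); false }
    catch { case _: TimeoutException => true }
}
```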

mdedetrich · November 1, 2017, 3:56pm

CancelableFuture from Monix already has this: monix-execution/shared/src/main/scala/monix/execution/CancelableFuture.scala in the monix/monix GitHub repository, at commit ec266e1a167cdf956e692725a2b2016e79a71141.

alexandru · November 1, 2017, 7:04pm

Yet another reason is building already completed values, also from Monix:

```scala
sealed trait Ack extends Future[Ack] { /* ... */ }

case object Continue extends Ack { /* ... */ }

case object Stop extends Ack { /* ... */ }
```

The nice thing about this setup is being able to return a straight Continue when a Future[Ack] is expected.
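To make the pattern concrete outside Monix, here is a self-contained sketch of an Ack-like type whose values are their own already-completed Futures. It is not Monix’s implementation: `AckSketch` and `onNext` are hypothetical, and every asynchronous operation simply delegates to `Future.successful(this)` so that the standard library’s own `transform`/`transformWith` logic is reused rather than reimplemented.

```scala
import scala.concurrent.{CanAwait, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.{Success, Try}

object AckSketch {
  sealed abstract class Ack extends Future[Ack] {
    // Each Ack is a Future that is already completed with itself.
    private def completed: Future[Ack] = Future.successful(this)

    def isCompleted: Boolean = true
    def value: Option[Try[Ack]] = Some(Success(this))

    // Delegate the remaining abstract members to a standard completed Future.
    def onComplete[U](f: Try[Ack] => U)(implicit executor: ExecutionContext): Unit =
      completed.onComplete(f)
    def transform[S](f: Try[Ack] => Try[S])(implicit executor: ExecutionContext): Future[S] =
      completed.transform(f)
    def transformWith[S](f: Try[Ack] => Future[S])(implicit executor: ExecutionContext): Future[S] =
      completed.transformWith(f)

    // Awaiting is trivial: the value is always available.
    def ready(atMost: Duration)(implicit permit: CanAwait): this.type = this
    def result(atMost: Duration)(implicit permit: CanAwait): Ack = this
  }

  case object Continue extends Ack
  case object Stop extends Ack

  // A plain Continue or Stop can be returned where a Future[Ack] is expected.
  def onNext(elem: Int): Future[Ack] = if (elem >= 0) Continue else Stop
}
```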

Btw, I am not convinced that making Future a final class will improve performance.

viktorklang · November 1, 2017, 8:25pm

Hi everyone,

Thanks for raising this conversation, Matthew.

Fortunately/disappointingly (depending on how you want to view it) I’ve already implemented most, if not all, of the viable optimizations here, which also include JMH benches: https://github.com/viktorklang/scala-futures/tree/wip-optimizations-√

It’s still a work in progress, however I am rather confident that there’ll be some nice, non-breaking, performance improvements coming out of this.

Sorry for the long story below, it’s only relevant if you enjoy some Future/Promise backstory/rationale:

I, personally, have realized that it is very important, before I suggest what I consider to be improvements, to understand what something is trying to achieve, as I can make something like Future blazing fast if I am willing to compromise on resource-safety, fairness, determinism, memory footprint, extensibility, compatibility, etc.

What Future/Promise has achieved—from my experience using it, and my interactions with users online and offline—is ubiquity. From what I can tell, it is used by practically every Scala developer out there, which is rather cool, but it also means that it must change only in very responsible ways.

For the casual reader of this thread, you may not know why the following things are as they are, so I thought I’d take the time to outline, from memory, what is intended to be achieved by the following decisions:

ExecutionContext: requiring the code that wants to compute something to specify where it runs leads to: determinism (no longer racing between completer and invoker), resource-safety (added logic cannot poison the pools which produce the values), fairness (which is up to the ExecutionContext implementation to deliver), extensibility (it’s easy to integrate with most execution engines / thread pools), and compatibility (it has very few methods, so it is easy to keep compatible).

Future/Promise: By having a separation between read-capability and write-capability, it is much easier to reason about what code wants to be able to do, and what code does.

Absence of cancellation: This was a very conscious decision. If Future can be cancelled, it is no longer read-only, which means that any reader can mess up the other readers’ reads if their Future is shared. This leads to tons of defensive copying and, worse, it is no longer clear from the code what will happen, or which defensive copies are actually needed.
Also, semantically, a Future is a placeholder for a value that might not yet exist, and as such, it doesn’t really make sense to make it cancellable—a Task is something which could be cancelled, or perhaps something like a SubmittedTask, anyway, I digress.
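A minimal sketch of the read/write split described above, using a hypothetical `Producer` class: the producer keeps the Promise (write capability) private and hands out only its Future (read capability), so readers can observe the value but have no way to complete it, or, under a cancellable design, to cancel it for everyone else.

```scala
import scala.concurrent.{Future, Promise}

object ReadWriteSplit {
  final class Producer {
    // Write capability: stays private to the producer.
    private val promise = Promise[String]()

    // Read capability: safe to share with any number of readers.
    val result: Future[String] = promise.future

    // A Promise can be completed only once; a second success call would throw.
    def finish(value: String): Unit = promise.success(value)
  }
}
```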

I guess what I’m trying to say is, I think Future will be possible to improve, performance wise, in some cases by quite a lot, and in some cases perhaps rather modestly. All of this without breaking source compatibility. (And I’d be extremely cautious to introduce user-breaking changes, just because it is so widely used.)

Also, I’d think a Task-like abstraction/construct in the stdlib would be a good thing, to provide for that nice bridge between a lazy and a strict construct.

Cheers,
√