PRE-SIP: Suspended functions and continuations

I agree this is useful, which is why I view Thread (or VirtualThread) as the successor to Future, since it gives you—among other things, such as interruption and stack traces, which Future does not give you—the similarly useful method Thread.isAlive.


I agree, that’s more or less what I’ve been trying to say with the whole “futures don’t need to be async anymore”. With Loom, a Future can basically be implemented using blocking threads that block on each other, and Future[T] is more or less isomorphic to (() => T, java.lang.Thread) with some helper methods to make them more ergonomic. The ergonomics do have some value, and removing the async constraint could allow us to make Future much more ergonomic than it is today.
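
As a rough illustration of that isomorphism, here is a minimal sketch of a Future-like value that is nothing more than a thunk plus a thread. The name ThreadFuture and its methods are hypothetical and not taken from any library. It uses an ordinary platform thread so that it runs on any JDK; on a Loom-enabled JDK the same shape would presumably use a virtual thread instead.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: a "future" that is just a thunk running on its own thread.
final class ThreadFuture[T](body: () => T) {
  @volatile private var result: Option[Try[T]] = None

  // Assumption: a plain platform thread; with Loom this could be a virtual thread.
  private val thread = new Thread(() => {
    result = Some(try Success(body()) catch { case e: Throwable => Failure(e) })
  })
  thread.start()

  // The thread's liveness doubles as the completion flag.
  def isCompleted: Boolean = !thread.isAlive

  // Interruption comes for free from the underlying thread.
  def cancel(): Unit = thread.interrupt()

  // Block the caller until the body has finished, then return or rethrow.
  def join(): T = { thread.join(); result.get.get }

  // "Chaining" is another thread that blocks on this one.
  def map[U](f: T => U): ThreadFuture[U] = new ThreadFuture(() => f(join()))
}
```

For example, `new ThreadFuture(() => 21).map(_ * 2).join()` blocks the caller until both threads have finished and then yields 42.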


Isn’t every Cats, Monix, ZIO, or Finagle system also async?


Futures and async play a small role in Scala today. “Most” Scala systems are too small to worry about it. At my employer, one of the largest Scala shops around, only a small number of systems need to be async.

Can anyone else with evidence that this is or is not the case please share it?

What about e.g. the prominent role played by Futures in the Play framework? Won’t there be a considerable number of Web applications and micro-services out there using Play?


1.2.1. Threaded versus evented web application servers

Roughly speaking, there are two categories of programming models in which web servers can be placed. In the threaded model, large numbers of threads take care of handling the incoming requests. In an evented model, a small number of request-processing threads communicate with each other through message passing. Reactive web application servers adopt the evented model.

Threaded servers

A threaded server, such as Apache Tomcat, can be imagined as a train station with multiple platforms. The station chief (acceptor thread) decides which trains (HTTP requests) go on which platform (request processing threads). There can be as many trains at the same time as there are platforms… As implied by the name, threaded web servers rely on using many threads as well as on queuing.

Evented servers

To explain how evented servers work, let’s take the example of a waiter in a restaurant. A waiter can take orders from several customers and pass them on to multiple chefs in the kitchen. The waiter will divide their time between the different tasks at hand and not spend too much time on a single task. They don’t need to deal with the whole order at once: first come the drinks, then the entrees, later the main course, and finally dessert and an espresso. As a result, a waiter can effectively and efficiently serve many tables at once. As I write this book, Play is built on top of Netty. When building an application with Play, developers implement the behavior of the chefs that cook up the response, rather than the behavior of the waiters, which is already provided by Play.

Futures are at the foundation of asynchronous and reactive programming in Scala: they allow you to manipulate the results of a computation that hasn’t happened yet and to effectively deal with the failure of such computations, enabling more efficient use of computational resources. You’ll encounter futures in many of the libraries you’ll work with in Play and other tools.

In this first part of the chapter, we’ll make use of Play’s WS library (which we used a bit in chapter 2) to make calls to a few websites related to the Play Framework. The WS library is asynchronous and returns future results, which is just what we need to get our hands dirty.

Futures should primarily be used when there’s a blocking operation happening. Blocking operations are mainly I/O bound, such as network calls or disk access.

5.1.2. Futures in Play

Play follows the event-driven web-server architecture, so its default configuration is optimized to work with a small number of threads. This means that to get good performance from a Play application, you need to write asynchronous controller actions. Alternatively, if you really can’t write the application by adhering to asynchronous programming principles, you’ll need to adjust Play’s configuration to another paradigm.

Falling back to a threaded model

If you’re in a situation where much of your code is synchronous, and you can’t do much about it or don’t have the resources to do so, the easiest solution might be to give up and fall back to a model with many threads. Although this is likely not the most appealing of solutions because of the performance loss incurred by context switching, this approach may come in handy for existing projects that haven’t been built with asynchronous behavior in mind. In practice, configuring your application for this approach can provide it with the necessary performance boost while giving the team time to change to another approach.


I don’t want to get into an argument about “how big async is”, because fundamentally this is about the proposal(s) above:

  1. This proposal, as is, does not really demonstrate its value, given that it uses Loom to implement things that can already be done with Loom without compiler support.

  2. This proposal, like many others before it (scala-async, monadless, scala-continuations, …), does not have a clear story on how it would handle user-defined higher-order functions. Given that these are orders of magnitude more common in Scala than in any other popular language, that is a blocker for widespread usability and adoption in the Scala ecosystem.

  3. Loom, which is landing in JDK19, gives us the best of both worlds: we can get the performance and concurrency benefits of async code, while writing “direct” style code, in a manner that (a) works without issue with HOFs (b) works across JVM languages (c) adds zero new primitives to the language and a minimal set of new primitives to the standard library.
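
To make the higher-order-function point concrete, here is a small sketch contrasting the two styles. The functions fetchBlocking and fetchAsync are hypothetical stand-ins for a network call. The direct-style version composes with an ordinary map, while the Future version needs a Future-aware combinator such as Future.traverse to get back a plain list.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object HofDemo {
  // Hypothetical blocking lookup; the sleep stands in for network latency.
  def fetchBlocking(id: Int): String = { Thread.sleep(10); s"user-$id" }

  // Hypothetical async variant of the same lookup.
  def fetchAsync(id: Int): Future[String] = Future(fetchBlocking(id))

  private val ids = List(1, 2, 3)

  // Direct style: any higher-order function works unchanged.
  def direct: List[String] = ids.map(fetchBlocking)

  // Future style: a plain map would give List[Future[String]], so a
  // Future-aware combinator is needed to sequence the results.
  def viaFutures: List[String] =
    Await.result(Future.traverse(ids)(fetchAsync), 5.seconds)
}
```

Both `HofDemo.direct` and `HofDemo.viaFutures` produce the same list. The difference is that a user-defined higher-order function only works with the second style if someone writes a Future-aware version of it.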

Given that, a lot of this discussion about “how much of the Scala ecosystem is async” is kind of moot: regardless of how big the problem is, in the end the proposal at hand doesn’t fix the problem, and there is a JVM-level fix coming soon that handles it in a more universally compatible way than any approach we’ve seen before.


Thank you for the thought-provoking discussion. I mulled over it a bit more. In the end it comes down to this for me: You have two functions f and g from () to String. One can suspend (say for an external event), the other cannot. Do you want to distinguish the two functions’ types?

Here’s a reason why you would want that: Your main thread can call f directly. But calling g means that your main thread might suspend for a long time, in which case it cannot deal with other events. That’s a bad idea obviously. So, we want to know whether a function can suspend (or otherwise take a long time to run) or not. As Erik Meijer says: Delay is an effect.

Now, should this info be part of the function’s type, or just be commented somewhere? There’s a tradeoff that depends on what the cost of keeping track of this info is. If we can lower the notational overhead, we can shift the tradeoff towards tracking more things in types. That’s what our project is about.


It wasn’t me bringing up the size of async as an argument against this proposal :slight_smile: It’s only this that itched me, your technical arguments are well explained and relevant.


I think this is the part that is going to be wrong. It’s not wrong today on JDK8, but it will be wrong tomorrow on JDK19.

That statement is correct today: threads are expensive (both memory footprint and cpu for context switching), and so you have a finite number of them. So a thread doing nothing and blocked is bad.

Tomorrow, with Loom, threads are no longer expensive (in either memory footprint or CPU), and so a thread doing nothing is OK.

I can understand where you are coming from. Lightbend spent a decade with a huge budget pushing its “everyone everywhere async all the time” philosophy, well past the point of reason. And it has been somewhat true for ages, because JVM threads have always been OS threads, and OS threads are expensive. Maybe not for most small-scale users, but if you are Twitter with 1,000,000 concurrent RPCs, then you can’t spawn 1,000,000 OS threads to service them, but you can spawn 1,000,000 async Futures.

With Loom in JDK 19, JVM threads are no longer expensive. Suddenly the whole reason “blocking is bad” no longer exists. You can, in fact, spawn 1,000,000 JVM virtual threads.

This is unusual. It goes against everything that we’ve known about JDK threading for two decades. So I can understand people’s reticence about it.

However, we have seen this play out in other ecosystems: in Go, in Erlang, there is no need to avoid blocking calls. Threads are cheap, so if some block you can just spawn more. Sure, in some ultra-high-concurrency use cases (e.g. perhaps Finagle’s original usage in Twitter-scale systems), the difference in cost between a Future and a Thread will be significant. But for the vast majority of JVM users, threads are no longer expensive.

If we decide that we think Loom is not going to deliver as promised, or we think JDK8-18 or Scala.js are important enough to have this feature dedicated just for them, then fine. But if we “skate to where the puck is”, on the JVM the future is JDK19 and above, and that future includes Loom with cheap threads, where blocking a thread doing nothing is no longer a concern.


But threads are still very expensive in terms of program understanding, and that matters much more than the runtime cost. A single-threaded program is much simpler to understand and show correct than a multi-threaded program. Threads are only free if you go to a full purely functional dataflow model, but then you can have neither resources nor other effects, so this is an option only for a small set of programs.

Believe me, that’s not where I am coming from :stuck_out_tongue_winking_eye:

I remember a long time ago the Oz language proposed that we should replace sequential programming with lots of lightweight threads. From what I could see, it did not work very well.


Async code has just about all the same costs as threads for program understanding. Perhaps even more, because almost all async code on the JVM is running on top of a multithreaded thread pool!

But even single-threaded async code is hard to read. You have all sorts of combinators that you don’t normally see. Your stack traces no longer make sense when things crash. Sure, there are fewer pre-emption points, but the chances are you will be paying less attention to pre-emption and mutexes (it’s single-threaded after all!) and will still end up with the same bugs that occur from multithreading.

I won’t say that threaded code is easy, but I will say that async on top of threaded code is strictly worse, and even single-threaded async code (e.g., in JavaScript) is often hell to read.


Note that everything we like in Scala for concurrency still applies to Loom threads without async. Actors can be thread-backed. Futures can be thread-backed. Streams, FRP (like Scala.Rx), etc. can all work on top of threads.

We should not confuse the programming model with the underlying concurrency primitive. Using Loom threads doesn’t mean we’re throwing away all our nice concurrency libraries in favor of raw threads and locks.


I don’t have all the theoretical background, so sorry if what I say is dumb.
I speak from the long-term (14 years) maintenance of a FLOSS application (not a lib) in server automation and compliance. The Scala part mainly does reaction to reports (“that probe sent that”), generation of hundreds of thousands of files in long-running processes, and API/UI interaction, so it is a nice mix of low- and high-latency work with different throughput needs.

The Scala code started as a better Java with nice XML support, then got more functional, with an important focus on error management (having the compiler help us, poor developers, track the non-nominal path for us). Async was not a big need, because we don’t have huge parallelism on inputs.
So we never needed Future. But we were early adopters of Monix, because it was solving problems like: “you can safely interrupt that thing that can be long”; “you can react to things happening on that daemon thread”.
Then we were (extremely) early adopters of ZIO, because it has everything Monix was providing us, plus a very important one: “You can manage errors in all imaginable cases, and the compiler will let you know if you don’t.”
Now, we have ~4y of feedback with ZIO, and I can say that we are not at all interested about sync versus async.
We are interested in:

  • I want the compiler to be able to track ALL errors in the type system. Then I can choose which errors are important for my model and make them actionable (the ones outside the system are by definition not modeled, and the only sane action is to kill the app, since we don’t know what happened; I discussed that at Devoxx FR: DevoxxFR 2021 - Systematic error management in application - Speaker Deck).
  • Can that operation take a long time and block? If so, we want to be able to safely time it out, correctly handling all resources used by the operation, and let the upper app level know of the failure if the app flow needs it.
  • I never ever want to face an app-wide deadlock because some thread pool is exhausted. This is a very fundamental point. That means that I really don’t want to have to know whether an underlying lib is perhaps calling a blocking DNS query forever, and I want the runtime to manage that (yes, InetAddress comes to mind). But it can just as well be a very long operation that, from the point of view of the system, is blocking: if all 4 threads of the pool (because the machine has 4 cores) are spent on a very slow write operation, everything else is blocked. Note that ZIO 1 was not good at that (though it WAS the state of the art), forcing me to wrap every third-party Java app in a “might be blocking, who knows” construct, and ZIO 2 is now much simpler, since it’s the runtime that migrates work to the correct thread pool so that things perform OK and the app is not blocked.
  • All of that is extremely operational, and in the general case you really don’t want to look at it apart from very specific performance reasons. BUT when I need performance (because I know things about the app’s constraints that the runtime does not), I want a very precise handle on all the internal operational things, down to OS threads and cache affinity.

So I care about: modeling which errors can arise and managing them automatically and compositionally according to my model; knowing what can take a long time and being able to interrupt it with safe handling of resources; never having an accidental app-wide deadlock because of runtime details (thread pool exhaustion or whatever runtime thing is involved); and still being able to finely manage performance if I need to.

I never want to think in async. It’s complicated. Concurrency is even more complicated. I want to think in independent flows of data being transformed, or queues of events needing to be processed, and independent flows of execution steps that can have errors I didn’t remember, and have the compiler force me and help me to deal with them according to my app’s needs.

(and none of that is very theoretical, sorry for that).


This is a false distinction with abundant counterexamples.

When you call Lock#lock() or InputStream#read() on JDK 8, this function can block indefinitely. Not just take a long time to run, but block forever, without ever returning control to the caller, and without ever allowing your application to “deal with other events”.

Keep in mind the “type signatures” of these methods are () => Unit and () => Int, respectively. More generally, any function A => B can block for an infinite amount of time. So comparing the types A => B versus A => Future[B] does not give you any indication at all of how “long” these computations will be running, nor if they will interfere with your application’s ability to “deal with other events”.

The sole and only question is: which shape of function can be more efficient under different JDK versions? Because Future embeds callback capabilities, A => Future[B] can be made more efficient than A => B on pre-Loom JDKs. So, while we cannot say anything about when (or if) the functions A => B and A => Future[B] will return / resume, we can say that pre-Loom, the second shape of function has the potential to be more efficient–that is, to consume fewer scarce resources.

Now, post-Loom, this changes entirely! We still have the invariant that A => B and A => Future[B] do not give us any information on when these functions will return, if ever. But under Loom, we can now say that A => B always has the potential to be more efficient than A => Future[B].

Let’s imagine that tomorrow, the JDK deletes physical threads. Now, every time you do new Thread().start(), this is a virtual thread. All threads in all thread pools (such as ExecutionContext) are just virtual threads. Let’s further say the last few holdouts of true OS-level thread-blocking (file IO, synchronized, JNI, etc.) are done away with, and virtual threads never block OS threads.

In this hypothetical world, code looks EXACTLY the same as it does today, pre-Loom, and semantically, behaves exactly the same as well (with one exception being that Loom can garbage collect threads, which would increase the number of valid programs that can be written, but not change the semantics of existing programs). The only difference is that it’s more efficient. And you can delete all the reactive callbacks and radically simplify things. In this world, what does it mean to “track sync/async”?? This question cannot be answered because the question does not make any sense!

Async/sync is NOT about how long a method will take to “return” or “resume” (depending on the model), it is ONLY about how efficient a computation is. I understand that it is hard for all of us to wrap our minds around this fact, having dealt with non-first class async programming for so long, but this is the reality, and trivial examples like InputStream#read or Lock#lock or even URL#hashCode should act as the proverbial nail in the coffin to the idea that Future[String] somehow communicates something interesting about “long-running” computations.

In summary, Future conveys no necessary information about delays on return / resumption. While I fully support lowering notational overhead for keeping track of aspects of our program at compile-time (and there are many aspects that I think could help developers here, depending on how low you can get this overhead), I think keeping track of async versus sync in particular is a red herring, and our unholy (and also, unhealthy) obsession with it derives only from our collective lack of experience with green threading models.


To the extent that Future[A] conveys accidental information about delays on resumption, it’s only because programmers have pushed more long-running computations into async because they want their programs to be efficient. That is to say, perhaps on average, a Future[A] will take longer to complete than () => A, but that’s only due to programmer convention, and does not derive from any fundamental difference between sync versus async computation.


Yes, but is that what it should be? (in an ideal world).

I think there’s a misunderstanding. I am not advocating for Future or async. I am just asking the question whether we want to track “possibility to suspend” in some way or not.


Note: I am not comparing fibers with (monadic) async. I am asking whether suspending should be tracked in the types, independently of whether the implementation is traditional threads, coroutines, or async. For async the answer is obviously yes, since the Task or Future type already implies the possibility to suspend. For the other two it’s a design choice.


I think the question is: What does “possibility to suspend” even mean? Even vanilla OS threads can suspend, that’s what they’re for! As JDG said, something with a return type of Future[T] tells you nothing about a method: it could do anything before returning Future.successful(t)!
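
A minimal sketch of that last point, using a made-up method name (looksAsync): the signature promises a Future, yet all the waiting happens on the caller’s thread before the already-completed Future is returned. The sleep stands in for any blocking call.

```scala
import scala.concurrent.Future

object FutureTellsNothing {
  // Hypothetical method: returns a Future, but does all of its work
  // synchronously on the calling thread before returning.
  def looksAsync(millis: Long): Future[String] = {
    Thread.sleep(millis) // stands in for blocking I/O, a lock, a DNS lookup, ...
    Future.successful("done")
  }
}
```

A caller of `FutureTellsNothing.looksAsync(50)` is held up for the full sleep, and the returned Future is already completed by the time the caller sees it.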

I feel your question needs to be properly defined before it can have a good discussion. “Tracking side effects”, “tracking capabilities”, “avoid wasting threads”, etc. are well-defined and well-known problems. I have never heard of “possibility to suspend”, and have no idea what it’s meant to accomplish.


I think suspension is helpful as a type. When I have a function Suspend ?=> A, I can enable new extension syntax for operators that I only want to make available when I’m in a suspended context. For example, sleep.

If we don’t have that marker, all the suspension or async-related APIs are available in all places and can easily lead to more complex, or less performant programs than they should.

When expressed as a context function or capability, it does not interfere with composition, and I can still pass it to map.

either {
  eitherValue.bind // bind is only available under `Control[L] ?=> A`
}

structured {
  futureValue.join // only available inside structured concurrency
}

I think operators related to suspension should not be available in the environment scope without being coupled to a type.
If these computations are not carrying a type, the behavior is entirely hidden, and a user does not know if a function performs suspension work by looking at the signature. You may end up using suspension in toString, hashCode or others where this might not be a good idea.
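
Here is a minimal Scala 3 sketch of what such a marker could look like. The names Suspend, sleep and suspended are hypothetical and are not the API of this proposal or of any library. It only illustrates that an operator can be gated on a capability while still composing with an ordinary map.

```scala
object SuspendSketch {
  // Hypothetical capability marker; only `suspended` can create one.
  final class Suspend private[SuspendSketch] ()

  // Only callable where a Suspend capability is in scope.
  def sleep(millis: Long)(using Suspend): Unit = Thread.sleep(millis)

  // Hypothetical entry point that provides the capability to its body.
  def suspended[A](body: Suspend ?=> A): A = body(using new Suspend())

  def demo(): List[Int] =
    suspended {
      // The capability flows into the lambda passed to the ordinary `map`.
      List(1, 2, 3).map { n => sleep(1); n * 2 }
    }

  // def bad(): Unit = sleep(1) // would not compile: no given Suspend in scope
}
```

Calling `SuspendSketch.demo()` works because the block is typed as `Suspend ?=> List[Int]`, whereas a `toString` or `hashCode` implementation calling `sleep` directly would be rejected by the compiler.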

This design also leads to good organization of components. Android users with Jetpack Compose, for example, have a clear distinction between UI components and the suspend handlers that can perform work and transform them. This concern is not just about UI; it applies to many other use cases.
There are places in programming where interleaving suspend/IO code with pure code is not ideal.

A feature like context functions + capabilities may cover these concerns.

Kotlin already tracks this and prevents a lot of beginner mistakes.


Possibility to suspend is a property of a function. It means that the function can wait for an external event before it returns. We could also recommend that functions that might take a long time to run are labelled this way, but that’s much harder to enforce.


Given this definition, what value do you hope developers will get out of this?

You have given two possible definitions for “suspend”. In a pre-Loom world, with expensive threads, both of these definitions are very useful because they let us reason about performance and concurrency characteristics:

  • “marking functions that wait for external events” is useful from a “don’t waste threads” perspective: we can use combinators to compose those functions without wasting a thread waiting for the external event.

  • “marking functions that may take a long time” is useful from a “fairness” perspective, to allow user-land schedulers to treat them differently and prevent short-running tasks from getting blocked by long-running tasks, since we have a limited thread pool and tasks have to queue up to get onto it.
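
The pre-Loom situation described by these two bullets can be reproduced with a deliberately tiny pool. This is only a sketch using standard java.util.concurrent classes; the object and method names are made up. A task waiting for an external event occupies the only worker, so a short task queued behind it cannot finish until the event arrives.

```scala
import java.util.concurrent.{Callable, CountDownLatch, Executors, TimeUnit}

object PoolStarvation {
  // Returns (was the short task done while the worker was still blocked?,
  //          the short task's eventual result).
  def demo(): (Boolean, String) = {
    val pool = Executors.newFixedThreadPool(1) // deliberately tiny pool
    val started = new CountDownLatch(1)
    val externalEvent = new CountDownLatch(1)
    try {
      // Long task: holds the only worker while waiting for the "external event".
      pool.submit(new Runnable {
        def run(): Unit = { started.countDown(); externalEvent.await() }
      })
      // Short task: has to queue behind the blocked worker.
      val short = pool.submit(new Callable[String] {
        def call(): String = "short"
      })
      started.await()
      val doneEarly = short.isDone // the only worker is blocked, so not done
      externalEvent.countDown()    // the event arrives; the worker is freed
      (doneEarly, short.get(5, TimeUnit.SECONDS))
    } finally pool.shutdown()
  }
}
```

With one cheap thread per task, the short task would not have to queue at all, which is the situation the post-Loom bullets go on to describe.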

In a post-Loom world, with cheap threads, both those reasons go away:

  • Wasting threads is fine because threads are cheap, so having a thread blocked on waiting for an external event is fine

  • Fairness is no longer a concern, because threads are cheap, so we no longer need limited threadpools. There doesn’t need to be any event queue that tasks have to shuffle through before they get run: they can all run at the same time, sharing the smaller number of CPUs, and pre-empting each other so no task can block the others. (Some lightweight threading systems still allow long-running tasks to hog things, but Loom claims to be implementing Forced Pre-emption which should prevent that)

Are there other reasons you can think of that would make reasoning about these definitions useful to a developer?


A reason may be preventing beginner mistakes. Mixing suspension in pure code is not always desirable. This is just one of the many cases where you don’t want to have access to suspension unless your functions are already marked as suspended: