PRE-SIP: Suspended functions and continuations

rssh · July 11, 2022, 4:44pm

Would it work if, instead of a suspend keyword, we just had Suspend ?=> A?

Does that mean that a function f: suspended A => B will have type A => (Suspend ?=> B)?

If yes, suppose we have List[A] and a method map[B](f: A => B): List[B].
Then List.map(x => f(x)) will have type List[Suspend ?=> Int]. Is that what we want?

val result = List(1, 2, 3).map(f)
if (result.head == 1) then
   // will we never get here? or will we?
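For concreteness, the type rssh describes can already be written with Scala 3 context functions. In this sketch Suspend is a hypothetical marker trait standing in for the proposed capability; nothing actually suspends, and the explicit type argument on map sidesteps the inference question rather than answering what the proposal would infer for map(f).

```scala
// Hypothetical capability marker standing in for the proposed `Suspend`.
trait Suspend

// A "suspendable" A => B, encoded as A => (Suspend ?=> B).
def f(x: Int): Suspend ?=> Int = x * 10

// With B fixed explicitly, each list element is a computation that is
// still waiting for a Suspend capability; none of them has run yet.
val result: List[Suspend ?=> Int] = List(1, 2, 3).map[Suspend ?=> Int](x => f(x))

// Supplying a capability explicitly turns each element into a plain Int.
val suspendCap: Suspend = new Suspend {}
val values: List[Int] = result.map(g => g(using suspendCap))
```

Until a capability is supplied, result.head is a function value, not an Int, which is exactly the point rssh's if-expression is probing.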

odersky · July 11, 2022, 5:11pm

I think we need to distinguish two questions:

1. Should the sync/async distinction be reflected in types?
2. How does async get implemented?

I believe the answer to the first question is “yes, but not in the traditional way”. The traditional way leads to the colored function problem: we need to duplicate large bodies of code in sync and async versions. Or, alternatively, we have to sprinkle a lot of type variables around just to accommodate the possibility that everything can be async. I believe we can do much better by switching the viewpoint from effects to capabilities. An async computation would then be a computation that references the Suspend capability. It turns out that this change in viewpoint gives a much better solution to the effect polymorphism problem.
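A rough sketch of the capability viewpoint, using a plain Scala 3 using parameter as a stand-in for the capability. Suspend and fetch are hypothetical names. The sketch only shows that an unmodified List.map can accept a lambda that uses the capability, so no separate "async map" has to be written. Plain Scala 3 does not record in the lambda's type that it captured the capability; tracking that is the kind of thing a dedicated type system for suspendability would have to add.

```scala
// Hypothetical capability: a given Suspend in scope means "this code may suspend".
trait Suspend

// Hypothetical operation that requires the capability.
def fetch(id: Int)(using Suspend): String = s"item-$id"

// The standard List.map is used as-is: the lambda simply captures the
// Suspend capability from the enclosing scope, so there is no sync/async
// duplicate of map.
def fetchAll(ids: List[Int])(using Suspend): List[String] =
  ids.map(id => fetch(id))

// Callers without a Suspend in scope cannot call fetchAll; here one is passed explicitly.
val fetched: List[String] = fetchAll(List(1, 2))(using new Suspend {})
```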

The second question, how async is implemented, is largely orthogonal to the first. It could be by mapping to a monad, which is the most common solution for Scala today. Or by mapping to state machines, provided we can extend that to higher-order functions. Or by building on a runtime with support for coroutines and continuations, which (hopefully) Loom will provide. Once we buy into capabilities, we’ll have another interesting option: duplicate every function that can take a suspendable argument into sync and async versions. This may look like it reintroduces function coloring, but it doesn’t really, since the duplication happens in generated code rather than in source. Also, the capability type system tells us what we need to duplicate, so it’s hopefully not pervasive.

We are just starting a large-scale (7 persons over 5 years) project to research these things. The aim is to find a unified approach to resources and effects that could become a high-level analogue to what Rust is for lower-level systems programming. The sync/async problem is one of the fundamental problems we study. We will work on a concrete type system for suspendability, and will look into different implementation strategies and compare them.

Here is a slide deck that describes the project. Btw, I am still looking to fill some roles in this project. Here’s a job ad for a postdoctoral scientist: https://recruiting.epfl.ch/Vacancies/2452/Description/2. Another ad for a research engineer will follow.

jdegoes · July 11, 2022, 5:56pm

I strongly disagree with this position.

The async/sync distinction disappears in a language with green threads. Literally the only reason we have the async/sync distinction is because the JVM (unlike Go, etc.) chose not to give us green threads, but rather, to give us operating system level threads, which was a mistake given the highly concurrent nature of modern applications (perhaps foreseeable, perhaps not).

In a perfect world, Thread.sleep() (etc.) would always be efficient. It is only when Thread is an OS-level thread that it becomes impossible to make it so, at least, due to the way OS threads are implemented and handled. Loom brings us that perfect world, more or less, and with time it will be more and more true.

Tracking the sync/async distinction is not remotely relevant to software development in any language with green threads (see also: Go, Haskell, Erlang, etc.), and is a bit like using the type system to track whether MOVAPS is being used to copy floating-point values (that’s an implementation detail that should be dealt with by your language, e.g. compiler + runtime).

We are just starting a large scale (7 persons over 5 years) project to research these things.

Given this research project, I hope it’s crystal clear that now is not the time to import legacy Kotlin syntax + semantics into the Scala 3 language, since any attempt to do so would be both ill-advised in light of Loom, and premature in light of Scala 3 research topics.

odersky · July 11, 2022, 7:50pm

It’s a fair question: what do we want to track in types? I agree it’s a tradeoff that has to be analyzed carefully, and that different people might make different choices.

In the particular case of suspension/delays, it seems to me that the main reason you want to track it is that a computation might hang on to some other critical resource that you also want to track. In that case it makes a difference whether the resource is only blocked for a short time, or indefinitely, until the suspended computation resumes. You might argue that even non-suspending long-running computations have the same problem. True, but I remember that Microsoft at some point decreed that any computation running longer than 40 ms (or thereabouts) had to be wrapped in a future.

lihaoyi · July 11, 2022, 9:04pm

I mostly agree with JDG here. Lots of things are worth tracking in types, but I don’t think the sync/async distinction is one of them.

Let’s go back to the fundamentals: why don’t people just spawn threads? One thread per Future? One thread per Actor? A lot of early Scala libraries did this. Why didn’t it work?

The basic issue with this approach is one of performance and resource footprint: threads are expensive, context switching is expensive, so you typically won’t want to have more than O(1,000) threads in an application that may have O(100,000,000) objects. Thus people invented async to try and multiplex workflows onto a small number of threads, to reduce the performance and resource footprint concerns.

But in the end, the problem with threads is not semantics, but performance! All the libraries and frameworks around async are trying to make “async code” look exactly like “direct style” threaded code, because semantically that’s what people want. People invented “async backpressure” because they can no longer rely on blocking to slow down upstream producers. Everyone wants the semantics of threads, but without their cost issues.

Consider another data point: threads are avoided on the JVM in high-performance IO-bound use cases in favor of async, but they’re not avoided in languages like Python, where they are the preferred mechanism for IO-bound operations. Why? Because everything else in Python is so expensive that threads are comparatively cheap! Again, this emphasises that avoiding threads is a performance/cost/footprint issue, and not an issue of programming semantics.

Loom, by and large, fixes the cost of threads, and makes them cheap. You can now spawn O(10,000,000) threads without issue. Suddenly the whole reason for async goes away! Futures are still useful as a programming model, as are Actors, but they no longer need to be async. They can have a thread each, they can block sometimes, no problem at all. No need for async.
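A small sketch of the "one thread per task, blocking is fine" style lihaoyi describes, assuming JDK 21+ (virtual threads were a preview feature in JDK 19/20). The thread counts above are his figures; this sketch uses 10,000 tasks to stay quick. Each task calls the ordinary blocking Thread.sleep, which parks the virtual thread rather than tying up an OS thread.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.util.Using

// Runs n tasks, one virtual thread each; every task blocks in plain Thread.sleep.
// No Future combinators, callbacks, or async transform are involved.
def runBlockingTasks(n: Int): Int = {
  val completed = new AtomicInteger(0)
  // ExecutorService is AutoCloseable since JDK 19; close() waits for all tasks.
  Using.resource(Executors.newVirtualThreadPerTaskExecutor()) { executor =>
    for (_ <- 1 to n) {
      executor.execute { () =>
        Thread.sleep(10) // direct-style blocking call
        completed.incrementAndGet()
        ()
      }
    }
  }
  completed.get()
}

val finished: Int = runBlockingTasks(10000)
```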

Thus, given the JVM has Loom, I wouldn’t think it’s worth it to integrate a compiler-level async functionality at this point. Async was important in the past, in some high-performance use cases. That use case has largely been satisfied by Loom, in a much better way. Even without Loom, I would argue that it doesn’t quite meet the threshold of building into the language, and anyway all the implementations/proposals presented so far (including this one!) have so many limitations around HOFs as to be mostly unadoptable. I say this as someone who tried in earnest to adopt Scala-Continuations back in 2012.

Scala.js is cool - I am literally the world’s biggest proponent of Scala.js - but I don’t think such a major investment just for Scala.js is worth it. Furthermore, if we’re willing to invest 7 people for 5 years of effort in Scala.js, there are a million other things the project could benefit from more than an async CPS transformation.

jdegoes · July 11, 2022, 9:12pm

In the particular case of suspension / delays. it seems to me that the main reason you want to track it is that a computation might hang on to some other critical resource that you also want to track. In that case it makes a difference whether the resource is only blocked for a short time, or indefinitely

There is a total isomorphism between async and sync computation. In particular, for Future.never, the synchronous equivalent is while(true){}. Everything that you want or need with synchronous computation, you want and need with asynchronous computation, and vice versa.

Literally the only distinction between sync and async is efficiency (which is an implementation detail, pushed to the language level or lower, as in modern languages like Go or Kotlin), and even that is not set in stone (perhaps one could imagine an OS + CPU such that sync, i.e. OS threads, would be faster than async, i.e. virtual threads in the language runtime).

Tracking complexity, efficiency, and performance, be that O(n) or time bounds, might indeed be interesting, such that developers would have a better handle on the runtime behavior of their code, but I would view that as theoretically orthogonal to sync versus async (and practically, nearly so, especially post Loom). Whether or not such tracking of complexity, efficiency, or performance should be done in the type system or at the level of the runtime, and exposed to users via tooling, is perhaps a subject more suited to research than commercial programming.

I will suggest, however, that (a) the overhead of any sort of static complexity, efficiency, or performance tracking would have to be extremely small to justify its pervasive use in software (because the benefit is marginal, though not zero), and (b) the obsession of JVM programmers with the async/sync distinction is not shared by developers who use languages with green threads (i.e. it’s a historical artifact that will disappear in a post-Loom world).

odersky · July 11, 2022, 9:14pm

Yes, but does that mean that Future[String] should no longer exist as a separate type from () => String? That’s the crux of the matter, as I see it. One can answer either way, but I think it’s a reasonable position to say that one wants to see in the type whether a computation can suspend or not.

jdegoes · July 11, 2022, 9:29pm

This is the whole point: there is no difference (aside from efficiency) between “long async suspension” and “slow synchronous code”, or “short async suspension” and “fast synchronous code”.

If you call InputStream#read, it may not return for 1 minute (or even longer!). All async does is take that same 1 minute computation and make it vastly more efficient.

In reality, Future[String] is not guaranteed to take more or less time than () => String. If you are using Future properly, then in a pre-Loom world, Future[String] will be more efficient than () => String. Post-Loom, however, () => String is more efficient than Future[String], even if async is involved.

Post-Loom, the successor to Future is actually java.lang.Thread (specifically, VirtualThread). That’s because () => String does not give you a “handle” on the running computation (like Future), whereas VirtualThread does give you such a handle, and over time Loom will evolve more capabilities to interact with virtual threads (currently you can interrupt them, get their stack trace, and so forth, which is actually more than you can do with Future!).
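To make the "handle" point concrete, here is a sketch using the java.lang.Thread API for virtual threads, assuming JDK 21+. The thread name "worker" and the sleep-until-interrupted body are illustrative choices. The Thread object lets you check liveness, inspect the stack, cancel via interruption, and wait for completion.

```scala
import java.util.concurrent.atomic.AtomicBoolean

val wasInterrupted = new AtomicBoolean(false)

// A virtual thread that blocks "forever" until it is interrupted.
val worker: Thread = Thread.ofVirtual().name("worker").start { () =>
  try Thread.sleep(Long.MaxValue)
  catch { case _: InterruptedException => wasInterrupted.set(true) }
}

val aliveBefore: Boolean = worker.isAlive // still running?
val traceDepth: Int = worker.getStackTrace.length // where is it? (timing-dependent)
worker.interrupt() // cancel it
worker.join() // wait for it to finish
val aliveAfter: Boolean = worker.isAlive
```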

lihaoyi · July 11, 2022, 9:29pm

What you describe is already false today. Futures have nothing to do with suspending, and not all Futures can be suspended. And normal threaded code can be suspended already; that’s the whole point of threads and pre-emptive multitasking! They’re just a bit expensive.

Will Future-based async programming be much less useful with Loom? I say yes, in which case Future[String] could conceivably be largely replaced by String or () => String. Maybe not 100%, but mostly. Even blocking systems have callbacks sometimes.

Futures and async play a small role in Scala today. “Most” Scala systems are too small to worry about it. At my employer, one of the largest Scala shops around, only a small number of systems need to be async. In my OSS projects, almost nothing is async. In the Scala compiler and the broader tooling ecosystem, there is no async. With Loom, we can expect async’s importance to diminish further. That leaves Scala.js as a use case.

If async were so important in the Scala ecosystem, we would have seen Lightbend having much more adoption and importance than they do today, since “async everything” is their thing. But they aren’t all that widely adopted, despite a massive marketing and evangelism effort on their part, which is the market telling us that async isn’t really necessary for the vast majority of developers and systems. You can see that truth in how much async Scala code you yourself write in Dotty and surrounding systems.

Overall, not a strong case for baking async into the language or compiler.

jdegoes · July 11, 2022, 9:36pm

I would also add to this that post-Loom, any callback API can be trivially converted into an ordinary API by using, for example, java.util.concurrent.CompletableFuture, whose get method, when called on a virtual thread, blocks only that virtual thread rather than an OS thread.

e.g.:

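import java.util.concurrent.CompletableFuture

// Some existing callback-based API
def callbackAPI[A](cb: A => Unit): Unit = ???

// Direct-style wrapper: on a virtual thread, get parks only that virtual thread
def loomAPI[A](): A = {
  val cf = new CompletableFuture[A]()
  callbackAPI[A](a => cf.complete(a))
  cf.get()
}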

Essentially, Loom allows you to instantly convert callback-based APIs into “synchronous-looking” code, without using any infectious wrapper types such as Future.

I honestly think that post-Loom, there is no reason for scala.concurrent.Future to exist.
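A runnable version of the same CompletableFuture pattern, with a concrete stand-in for the callback API. fetchGreetingAsync and fetchGreeting are hypothetical names; the sketch assumes JDK 21+ and runs the direct-style wrapper on a virtual thread so that the blocking get only parks that virtual thread.

```scala
import java.util.concurrent.{Callable, CompletableFuture, Executors}

// Hypothetical callback-based API: delivers its result later, on another thread.
def fetchGreetingAsync(name: String)(cb: String => Unit): Unit = {
  Thread.ofVirtual().start(() => cb(s"hello, $name"))
  ()
}

// Direct-style wrapper: complete the CompletableFuture from the callback,
// then block on it.
def fetchGreeting(name: String): String = {
  val cf = new CompletableFuture[String]()
  fetchGreetingAsync(name) { s => cf.complete(s); () }
  cf.get()
}

// Call the wrapper from a virtual thread; close() waits for the task.
val executor = Executors.newVirtualThreadPerTaskExecutor()
val greeting: String =
  try executor.submit(new Callable[String] { def call(): String = fetchGreeting("loom") }).get()
  finally executor.close()
```

The caller of fetchGreeting sees an ordinary String-returning method; no Future type appears in its signature.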

lihaoyi · July 11, 2022, 9:43pm

One thing that Future[T] lets you do that normal T doesn’t is reason about completion: you can ask a Future[T] if it is done computing or not, and make decisions based on that fact. That can be useful sometimes.

Not something people do often, but it is a “unique” capability of Future, and so there will continue being reason for it to exist. But Futures won’t be nearly as common as they are today, and even today they are uncommon at best.

And in a post-Loom world, even Future doesn’t need to be async, and you can spawn a (virtual) thread per Future and block on things just fine. So even if Future remains, the need for an “async transform” goes away.

jdegoes · July 11, 2022, 9:59pm

I agree this is useful, which is why I view Thread (or VirtualThread) as the successor to Future, since it gives you—among other things, such as interruption and stack traces, which Future does not give you—the similarly useful method Thread.isAlive.

lihaoyi · July 12, 2022, 3:52am

I agree, that’s more or less what I’ve been trying to say with the whole “futures don’t need to be async anymore”. With Loom, a Future can basically be implemented using threads that block on each other, and Future[T] is more or less isomorphic to (() => T, java.lang.Thread) with some helper methods to make them more ergonomic. The ergonomics do have some value, and removing the async constraint could allow us to make Future much more ergonomic than it is today.
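A minimal sketch of the "(() => T, java.lang.Thread) plus helpers" idea, assuming JDK 21+. ThreadFuture is a hypothetical name, not an existing library class. Each future runs its body on its own virtual thread; isDone plays the role of Future.isCompleted / Thread.isAlive, and get simply blocks, so one future can wait on another without callbacks or an async transform.

```scala
import scala.util.control.NonFatal

// Hypothetical blocking "future": a computation paired with the virtual thread running it.
final class ThreadFuture[T](body: () => T) {
  // Declared before `thread` so it is initialized before the thread starts.
  @volatile private var outcome: Option[Either[Throwable, T]] = None

  private val thread: Thread = Thread.ofVirtual().start { () =>
    outcome = Some(
      try Right(body())
      catch { case NonFatal(e) => Left(e) }
    )
  }

  // Completion check, analogous to Future.isCompleted.
  def isDone: Boolean = !thread.isAlive

  // Blocks the calling thread (cheap if it is virtual) until the body finishes.
  def get(): T = {
    thread.join()
    outcome match {
      case Some(Right(v)) => v
      case Some(Left(e))  => throw e
      case None           => throw new IllegalStateException("body ended without a result")
    }
  }
}

val a = new ThreadFuture(() => 20)
val b = new ThreadFuture(() => a.get() + 22) // one future just blocks on another
val answer: Int = b.get()
```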

lrytz · July 12, 2022, 7:11am

Isn’t every Cats, Monix, ZIO, or Finagle system also async?

philipschwarz · July 12, 2022, 7:18am

Futures and async play a small-sized role in Scala today. “Most” Scala systems are too small to worry about it. At my employer, one of the largest Scala shops around, only a small number of systems need to be Async.

Can anyone with evidence that this is or is not the case please share it?

What about e.g. the prominent role played by Futures in the Play framework? Won’t there be a considerable number of Web applications and micro-services out there using Play?

From rwa:

1.2.1. Threaded versus evented web application servers

Roughly speaking, there are two categories of programming models in which web servers can be placed. In the threaded model, large numbers of threads take care of handling the incoming requests. In an evented model, a small number of request-processing threads communicate with each other through message passing. Reactive web application servers adopt the evented model.

Threaded servers

A threaded server, such as Apache Tomcat, can be imagined as a train station with multiple platforms.[5] The station chief (acceptor thread) decides which trains (HTTP requests) go on which platform (request processing threads). There can be as many trains at the same time as there are platforms… As implied by the name, threaded web servers rely on using many threads as well as on queuing.

Evented servers

To explain how evented servers work, let’s take the example of a waiter in a restaurant. A waiter can take orders from several customers and pass them on to multiple chefs in the kitchen. The waiter will divide their time between the different tasks at hand and not spend too much time on a single task. They don’t need to deal with the whole order at once: first come the drinks, then the entrees, later the main course, and finally dessert and an espresso. As a result, a waiter can effectively and efficiently serve many tables at once. As I write this book, Play is built on top of Netty. When building an application with Play, developers implement the behavior of the chefs that cook up the response, rather than the behavior of the waiters, which is already provided by Play.

Futures are at the foundation of asynchronous and reactive programming in Scala: they allow you to manipulate the results of a computation that hasn’t happened yet and to effectively deal with the failure of such computations, enabling more efficient use of computational resources. You’ll encounter futures in many of the libraries you’ll work with in Play and other tools.

In this first part of the chapter, we’ll make use of Play’s WS library (which we used a bit in chapter 2) to make calls to a few websites related to the Play Framework. The WS library is asynchronous and returns future results, which is just what we need to get our hands dirty.

Futures should primarily be used when there’s a blocking operation happening. Blocking operations are mainly I/O bound, such as network calls or disk access.

5.1.2. Futures in Play

Play follows the event-driven web-server architecture, so its default configuration is optimized to work with a small number of threads. This means that to get good performance from a Play application, you need to write asynchronous controller actions. Alternatively, if you really can’t write the application by adhering to asynchronous programming principles, you’ll need to adjust Play’s configuration to another paradigm.

Falling back to a threaded model

If you’re in a situation where much of your code is synchronous, and you can’t do much about it or don’t have the resources to do so, the easiest solution might be to give up and fall back to a model with many threads. Although this is likely not the most appealing of solutions because of the performance loss incurred by context switching, this approach may come in handy for existing projects that haven’t been built with asynchronous behavior in mind. In practice, configuring your application for this approach can provide it with the necessary performance boost while giving the team time to change to another approach.

lihaoyi · July 12, 2022, 8:10am

I don’t want to get into an argument about “how big async is”, because fundamentally this is about the proposal(s) above:

1. This proposal, as is, does not really demonstrate its value, given that it uses Loom to implement things that can already be done with Loom without compiler support.
2. This proposal, like many others before it (scala-async, monadless, scala-continuations, …), does not have a clear story for how it would handle user-defined higher-order functions. Given that these are orders of magnitude more common in Scala than in any other popular language, that is a blocker for widespread usability and adoption in the Scala ecosystem.
3. Loom, which is landing in JDK 19, gives us the best of both worlds: we can get the performance and concurrency benefits of async code while writing “direct” style code, in a manner that (a) works without issue with HOFs, (b) works across JVM languages, and (c) adds zero new primitives to the language and a minimal set of new primitives to the standard library.

Given that, a lot of this discussion about “how much of the Scala ecosystem is async” is kind of moot: regardless of how big the problem is, in the end the proposal at hand doesn’t fix the problem, and there is a JVM-level fix coming soon that handles it in a more universally compatible way than any approach we’ve seen before.

odersky · July 12, 2022, 8:18am

Thank you for the thought provoking discussion. I mulled over it a bit more. In the end it comes down to this for me: You have two functions f and g from () to String. One can suspend (say for an external event), the other cannot. Do you want to distinguish the two function’s types?

Here’s a reason why you would want that: your main thread can call f directly. But calling g means that your main thread might suspend for a long time, in which case it cannot deal with other events. That’s a bad idea, obviously. So we want to know whether a function can suspend (or otherwise take a long time to run) or not. As Erik Meijer says: Delay is an effect.

Now, should this info be part of the function’s type, or just be commented somewhere? There’s a tradeoff that depends on what the cost of keeping track of this info is. If we can lower the notational overhead, we can shift the tradeoff towards tracking more things in types. That’s what our project is about.
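Odersky's f/g scenario can be approximated with a plain Scala 3 using parameter standing in for the capability. This is a sketch under assumptions: Suspend is hypothetical, nothing here actually suspends, and the real proposal may look quite different. It only shows how g's type, unlike f's, records the requirement, so code that is given no capability cannot call g.

```scala
// Hypothetical capability meaning "may suspend"; not an existing library type.
trait Suspend

// f cannot suspend: callable from anywhere.
def f(): String = "computed immediately"

// g may suspend (say, waiting for an external event), so it demands the capability.
def g()(using Suspend): String = "computed after waiting"

// Code that must stay responsive (e.g. the main/event thread) has no Suspend
// in scope, so a call to g() in this body would be rejected at compile time.
def onMainThread(): String = f()

// A worker that is allowed to suspend is handed the capability.
def onWorker(): String = {
  given Suspend = new Suspend {}
  f() + " / " + g()
}
```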

lrytz · July 12, 2022, 8:27am

It wasn’t me bringing up the size of async as an argument against this proposal. It was only this point that bothered me; your technical arguments are well explained and relevant.

lihaoyi · July 12, 2022, 8:29am

I think this is the part that is going to be wrong. It’s not wrong today on JDK8, but it will be wrong tomorrow on JDK19.

That statement is correct today: threads are expensive (both memory footprint and cpu for context switching), and so you have a finite number of them. So a thread doing nothing and blocked is bad.

Tomorrow, with Loom, threads are no longer expensive (in either memory footprint or CPU), and so a thread doing nothing is OK.

I can understand where you are coming from. Lightbend spent a decade with a huge budget pushing its “everyone everywhere async all the time” philosophy, well past the point of reason. And it has been somewhat true for ages, because JVM threads have always been OS threads, and OS threads are expensive. Maybe not for most small-scale users, but if you are Twitter with 1,000,000 concurrent RPCs, then you can’t spawn 1,000,000 OS threads to service them, but you can spawn 1,000,000 async Futures.

With Loom in JDK 19, threads are no longer expensive. Suddenly the whole reason that “blocking is bad” no longer holds. You can, in fact, spawn 1,000,000 JVM virtual threads.

This is unusual. It goes against everything that we’ve known about JDK threading for two decades. So I can understand people’s reticence about it.

However, we have seen this play out in other ecosystems: in Go and in Erlang, there is no need to avoid blocking calls. Threads are cheap, so if some block you can just spawn more. Sure, in some ultra-high-concurrency use cases (e.g. perhaps Finagle’s original usage in Twitter-scale systems), the difference in cost between a Future and a Thread will be significant. But for the vast majority of JVM users, threads are no longer expensive.

If we decide that we think Loom is not going to deliver as promised, or we think JDK 8-18 or Scala.js are important enough to have this feature dedicated just for them, then fine. But if we “skate to where the puck is”, on the JVM the future is JDK 19 and above, and that future includes Loom with cheap threads, where blocking a thread doing nothing is no longer a concern.

odersky · July 12, 2022, 8:45am

But threads are still very expensive in terms of program understanding, and that matters much more than the runtime cost. A single-threaded program is much simpler to understand and show correct than a multi-threaded program. Threads are only free if you go to a full purely functional dataflow model, but then you can have neither resources nor other effects, so this is an option only for a small set of programs.

Believe me, that’s not where I am coming from.

I remember a long time ago the Oz language proposed that we should replace sequential programming with lots of lightweight threads. From what I could see, it did not work very well.