PRE-SIP: Suspended functions and continuations

Async code has just about all the same costs as threads for program understanding. Perhaps even more, because almost all async code on the JVM is running on top of a multithreaded thread pool!

But even single-threaded async code is hard to read. You have all sorts of combinators that you don’t normally see. Your stack traces no longer make sense when things crash. Sure, there are fewer pre-emption points, but chances are you will be paying less attention to pre-emption and mutexes (it’s single-threaded, after all!) and will still end up with the same bugs that occur from multithreading.

I won’t say that threaded code is easy, but I will say that async on top of threaded code is strictly worse, and even single-threaded async code (e.g., in JavaScript) is often hell to read.

2 Likes

Note that everything we like in Scala for concurrency still applies to Loom threads without async. Actors can be thread-backed. Futures can be thread-backed. Streams, FRP (like Scala.Rx), etc. can all work on top of threads.

We should not confuse the programming model with the underlying concurrency primitive. Using Loom threads doesn’t mean we’re throwing away all our nice concurrency libraries in favor of raw threads and locks.

7 Likes

I don’t have all the theoretical background and all, so sorry if what I say is dumb.
I speak from the long-term (14y) maintenance of a FLOSS application (not a lib) in server automation and compliance. The Scala part is mainly doing reaction to reports (“that probe sent that”), generation of hundreds of thousands of files in long-running processes, and API/UI interaction - so a nice mix of low- & high-latency work, with different throughput needs.

The Scala code started as a better Java with nice XML support, then got more functional, with an important point on error management (having the compiler help us, poor developers, track the non-nominal paths for us). Async was not a big need, because we don’t have huge parallelism on inputs.
So we never needed Future. But we were early adopters of Monix, because it was solving problems like: “you can safely interrupt that thing that can be long”; “you can react to things happening on that daemon thread”.
Then we were (extremely) early adopters of ZIO, because it had everything Monix was providing us, plus a very important one: “You can manage errors in all imaginable cases, and the compiler will let you know if you don’t.”
Now, we have ~4y of feedback with ZIO, and I can say that we are not at all interested in sync versus async.
We are interested in:

  • I want to have the compiler be able to track ALL errors in the type system. Then I can choose what errors are important for my model and make them actionable (the ones outside of the system are by definition not modeled, and the only sane action is to kill the app, since we don’t know what happened - I discussed that at scala.io and devoxx fr: DevoxxFR 2021 - Systematic error management in application - Speaker Deck).
  • Can that operation take a long time and block? We want to be able to safely time it out, correctly handling all resources used by the operation, and let upper app levels know of the failure if the app flow needs it;
  • I never ever want to face an app-wide deadlock because some threadpool is exhausted. This is a very fundamental point. It means that I really don’t want to have to know whether an underlying lib is calling a blocking DNS query forever; I want the runtime to manage that (yes, InetAddress comes to mind). But it can totally be a very long operation that from the point of view of the system is blocking: if all 4 threads of the pool (because the machine has 4 cores) are spent on a very slow writing operation, everything else is blocked. Not that ZIO 1 wasn’t good on that (it WAS the state of the art), but it forced me to wrap every third-party Java call in a “might be blocking, who knows” construct; ZIO 2 is now much simpler, since it’s the runtime that will migrate work to the correct threadpool so that things perform OK and the app is not blocked.
  • All that is extremely operational, and in the general case you really don’t want to look at it, apart from very specific performance reasons. BUT when I need performance (because I know things about the app constraints the runtime does not), I want to have a very precise handle on all the internal operational things, down to OS threads and cache affinity.

So I care about modeling what errors can arise and automatically and compositionally managing them according to my model; what can take a long time, and being able to interrupt it with safe handling of resources; never having an accidental app-wide deadlock because of runtime details (threadpool exhaustion or whatever runtime thing is involved); and still being able to finely manage performance if I need to.

I never want to think in async. It’s complicated. Concurrency is even more complicated. I want to think in independent flows of data being transformed, or queues of events needing to be processed, and independent flows of execution steps, that can have errors I didn’t recall, and have the compiler force me and help me to deal with them according to my app’s needs.

(and none of that is very theoretical, sorry for that).

2 Likes

This is a false distinction with abundant counterexamples.

When you call Lock#lock() or InputStream#read() on JDK 8, this function can block indefinitely. Not just take a long time to run, but block forever, without ever returning control to the caller, and without ever allowing your application to “deal with other events”.

Keep in mind the “type signatures” of these methods are () => Unit. More generally, any function A => B can block for an infinite amount of time. So comparing the types A => B versus A => Future[B] does not give you any indication at all of how “long” these computations will be running, nor if they will interfere with your application’s ability to “deal with other events”.
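
This point can be sketched in code (the stream and helpers below are illustrative, not from the post): both shapes wait exactly as long as the underlying read does, and neither signature says anything about it.

```scala
// Sketch: two "shapes" of read. The types differ, but on an empty socket
// stream both could wait indefinitely; neither type reveals that.
import java.io.{ByteArrayInputStream, InputStream}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// A => B: may block the calling thread forever if no data ever arrives.
def readOne(in: InputStream): Int = in.read()

// A => Future[B]: the exact same indefinite wait, merely moved onto
// whichever thread the ExecutionContext supplies.
def readOneAsync(in: InputStream): Future[Int] = Future(readOne(in))

// With data available, both return promptly; with none, both wait.
val sync  = readOne(new ByteArrayInputStream(Array[Byte](7)))
val async = Await.result(readOneAsync(new ByteArrayInputStream(Array[Byte](9))), 2.seconds)
```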

The sole and only question is: which shape of function can be more efficient under different JDK versions? Because Future embeds callback capabilities, A => Future[B] can be made more efficient than A => B on pre-Loom JDKs. So, while we cannot say anything about when (or if) the functions A => B and A => Future[B] will return / resume, we can say that pre-Loom, the second shape of function has the potential to be more efficient–that is, to consume fewer scarce resources.

Now, post-Loom, this changes entirely! We still have the invariant that A => B and A => Future[B] do not give us any information on when these functions will return, if ever. But under Loom, we can now say that A => B always has the potential to be more efficient than A => Future[B].

Let’s imagine that tomorrow, the JDK deletes physical threads. Now, every time you do new Thread().start(), this is a virtual thread. All threads in all thread pools (such as ExecutionContext) are just virtual threads. Let’s further say the last few holdouts of true OS-level thread-blocking (file IO, synchronized, JNI, etc.) are done away with, and virtual threads never block OS threads.

In this hypothetical world, code looks EXACTLY the same as it does today, pre-Loom, and semantically, behaves exactly the same as well (with one exception being that Loom can garbage collect threads, which would increase the number of valid programs that can be written, but not change the semantics of existing programs). The only difference is that it’s more efficient. And you can delete all the reactive callbacks and radically simplify things. In this world, what does it mean to “track sync/async”?? This question cannot be answered because the question does not make any sense!
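
A sketch of what that world already looks like with Loom (requires JDK 21; the task itself is a made-up example): plain blocking code, no combinators, with each task on its own cheap virtual thread.

```scala
// Each submitted task runs on its own virtual thread (JDK 21+).
// Thread.sleep parks only the virtual thread, not a carrier OS thread,
// so 100 "blocked" tasks cost almost nothing.
import java.util.concurrent.{Callable, Executors}
import scala.util.Using

val total: Int =
  Using.resource(Executors.newVirtualThreadPerTaskExecutor()) { exec =>
    val tasks = (1 to 100).map { i =>
      exec.submit(new Callable[Int] {
        def call(): Int = { Thread.sleep(5); i } // blocks, cheaply
      })
    }
    tasks.map(_.get()).sum
  }
```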

Async/sync is NOT about how long a method will take to “return” or “resume” (depending on the model), it is ONLY about how efficient a computation is. I understand that it is hard for all of us to wrap our minds around this fact, having dealt with non-first class async programming for so long, but this is the reality, and trivial examples like InputStream#read or Lock#lock or even URL#hashCode should act as the proverbial nail in the coffin to the idea that Future[String] somehow communicates something interesting about “long-running” computations.

In summary, Future conveys no necessary information about delays on return / resumption. While I fully support lowering notational overhead for keeping track of aspects of our program at compile-time (and there are many aspects that I think could help developers here, depending on how low you can get this overhead), I think keeping track of async versus sync in particular is a red herring, and our unholy (and also, unhealthy) obsession with it derives only from our collective lack of experience with green threading models.

Notes

To the extent that Future[A] conveys accidental information about delays on resumption, it’s only because programmers have pushed more long-running computations into async because they want their programs to be efficient. That is to say, perhaps on average, a Future[A] will take longer to complete than () => A, but that’s only due to programmer convention, and does not derive from any fundamental difference between sync versus async computation.

2 Likes

Yes, but is that what it should be? (in an ideal world).

I think there’s a misunderstanding. I am not advocating for Future or async. I am just asking the question whether we want to track “possibility to suspend” in some way or not.

6 Likes

Note: I am not comparing fibers with (monadic) async. I am asking whether suspending should be tracked in the types, independently of whether the implementation is traditional threads, coroutines, or async. For async the answer is obviously yes, since the Task or Future type already implies the possibility to suspend. For the other two it’s a design choice.

3 Likes

I think the question is: What does “possibility to suspend” even mean? Even vanilla OS threads can suspend, that’s what they’re for! As JDG said, something with a return type of Future[T] tells you nothing about a method: it could do anything before returning Future.successful(t)!

I feel your question needs to be properly defined before it can have a good discussion. “tracking side effects”, “tracking capabilities”, “avoid wasting threads”, etc. are well defined and well known problems. I have never heard of “possibility to suspend”, and have no idea what it’s meant to accomplish

5 Likes

I think suspension is helpful as a type. When I have a function Suspend ?=> A, I can enable new extension syntax for operators that I only want to make available when I’m in a suspended context. For example, sleep.

If we don’t have that marker, all the suspension or async-related APIs are available in all places and can easily lead to more complex, or less performant programs than they should.

When expressed as a context function or capability, it does not interfere with composition, and I can still pass it to map.

either {
  eitherValue.bind // bind is only available under `Control[L] ?=> A`
}
structured {
  futureValue.join // only available inside structured concurrency
}

I think operators related to suspension should not be available in the environment scope without being coupled to a type.
If these computations are not carrying a type, the behavior is entirely hidden, and a user does not know if a function performs suspension work by looking at the signature. You may end up using suspension in toString, hashCode or others where this might not be a good idea.

This design also leads to good organization of components. Android users with Jetpack Compose, for example, have a clear distinction between UI components and the suspend handlers that can perform work and transform them. This concern is not just UI; it applies to many other use cases.
There are places in programming where interleaving suspend/IO code with pure code is not ideal.

A feature like context functions + capabilities may cover these concerns.
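
As a sketch of the idea (the `Suspend` trait and `suspendable` scope below are invented for illustration, not a real API): suspension-aware operations demand a capability, so they only compile inside a scope that provides it.

```scala
// Hypothetical capability: operations that may suspend require a given
// Suspend, so they are only callable inside a suspendable scope.
trait Suspend

def sleep(millis: Long)(using Suspend): Unit =
  Thread.sleep(millis) // stand-in for a real suspending operation

// `suspendable` introduces the capability for its body.
def suspendable[A](body: Suspend ?=> A): A =
  body(using new Suspend {})

val result = suspendable {
  sleep(1) // compiles: a Suspend capability is in scope
  "done"
}
// Outside `suspendable`, `sleep(1)` would not compile: no given Suspend.
```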

Kotlin already tracks this and prevents a lot of beginner mistakes.

2 Likes

Possibility to suspend is a property of a function. It means that the function can wait for an external event before it returns. We could also recommend that functions that might take a long time to run are labelled this way, but that’s much harder to enforce.

1 Like

Given this definition, what value do you hope developers will get out of this?

You have given two possible definitions for “suspend”. In a pre-Loom world, with expensive threads, both of these definitions are very useful because they let us reason about performance and concurrency characteristics:

  • “marking functions that wait for external events” is useful from a “don’t waste threads” perspective: we can use combinators to compose those functions without wasting a thread waiting for the external event.

  • “marking functions that may take a long time” is useful from a “fairness” perspective, to allow user-land schedulers to treat them differently and prevent short-running tasks from getting blocked by long-running tasks, since we have a limited thread pool and tasks have to queue up to get onto it.

In a post-Loom world, with cheap threads, both those reasons go away:

  • Wasting threads is fine because threads are cheap, so having a thread blocked on waiting for an external event is fine

  • Fairness is no longer a concern, because threads are cheap, so we no longer need limited threadpools. There doesn’t need to be any event queue that tasks have to shuffle through before they get run: they can all run at the same time, sharing the smaller number of CPUs, and pre-empting each other so no task can block the others. (Some lightweight threading systems still allow long-running tasks to hog things, but Loom claims to be implementing Forced Pre-emption which should prevent that)

Are there other reasons you can think of that would make reasoning about these definitions useful to a developer?

2 Likes

A reason may be preventing beginner mistakes. Mixing suspension into pure code is not always desirable. This is just one of the many cases where you don’t want to have access to suspension unless your functions are already marked as suspended.

I don’t think that’s very useful, because performance issues (which is what this is) are all about context, which the compiler does not have. For example:

  • If .toString is called once a day, even if longRunningAsyncAndIO() takes 10 seconds making 50 network round trips around the earth it’s probably fine

  • If .toString is called once a millisecond (which should be enough time for O(1,000,000) operations on a modern CPU) then taking anything more than 0.1ms is unacceptable

I’ve hit this myself, where even scala.collection.Map#toString takes forever, not because its implementation is bad or suspends, but because it’s a large map.

In general, perf-related things are difficult for the compiler to reason about. Pre-Loom, “Don’t block threads” and “Run long-running tasks on a separate thread pool” are reasonable heuristics. They don’t always work - probably 90% of Scala code is small-scale enough that blocking threads is fine - but they’re reasonable as rules of thumb.

However, post-Loom, those specific heuristics no longer apply. Anyway, as heuristics, these things always have a false-positive and false-negative rate, and IMO they’re more suitable to be lint-warnings (with the ability to @nowarn override it) rather than a major language feature.

4 Likes

Are there other reasons you can think of that would make reasoning about these definitions useful to a developer?

I think I explained it all here: PRE-SIP: Suspended functions and continuations - #43 by odersky. Just putting every action in its own thread means you end up in a dataflow architecture (at best). That’s fine, if there are no other resources to manage and no other effects to perform. Dataflow is very similar to lazy evaluation in that respect (both have their origins in FP).

In fact, it would be good to study Go concurrency and what experience people have with it. I know there are sophisticated dynamic race condition detectors for Go, which seems to point to a problem people have in this model.

6 Likes

There’s a non sequitur going from “dataflow isn’t everything” or “single-threaded programs are simpler than multithreaded” to “tracking suspension is a valuable thing”. Maybe you have something in mind, but you haven’t presented a reason to make the jump between those statements.

I agree with both your initial statements, and don’t see where that implies there’s value in tracking suspension in function signatures. As far as I can tell, they’re totally unrelated

As I’ve said above, Loom threads vs. async Futures is orthogonal to your programming model: Actors, Futures (dataflow), Streams, “direct” code, etc. You can have either underlying implementation with any of the programming models. There’s nothing in virtual threads limiting us to dataflow concurrency only.

The only difference with Loom threads is you can block. You don’t have to, in which case you can continue using all the existing Future/Actor/Stream/etc. APIs entirely unchanged. The ability for Loom threads to block then allows you to simplify things, but it doesn’t force you to make any changes to the API of these async libraries that you don’t want.

3 Likes

What do you mean, possibility to suspend? If you mean a “thread” of computation “waits” for some other “thread” of computation to do something, then this has nothing to do with sync vs async:

  1. Synchronous code suspends whenever it “blocks” (InputStream#read, URL#hashCode, etc). This is OS-level suspension, as the thread is put to sleep and woken up again when there is something to do.
  2. Asynchronous code suspends whenever a result is not immediately available (“semantic blocking”). This is runtime-level suspension, as the green thread (fiber) is put to sleep and woken up again when there is something to do. These suspensions are more efficient than OS-level suspensions.
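
Both flavors can be shown side by side with the standard library (`parasitic` runs the callback on the completing thread, which is what makes the second half deterministic here):

```scala
// 1. OS-level suspension: take() would park the whole OS thread if the
//    queue were empty; an item is present, so it returns immediately.
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.{ExecutionContext, Promise}

val queue = new LinkedBlockingQueue[Int]()
queue.put(1)
val osLevel = queue.take()

// 2. Runtime-level ("semantic") suspension: no OS thread is parked; the
//    continuation is registered as a callback and resumed on completion.
val p = Promise[Int]()
val resumed = p.future.map(_ + 1)(ExecutionContext.parasitic)
p.success(41) // completing the promise runs the pending callback

val runtimeLevel = resumed.value.get.get // already completed: 42
```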

If you wanted to track “possibly suspends” in the type system, it would have nothing to do with sync vs async, and everything to do with whether the thread / fiber / virtual thread / green thread is capable of entering a waiting state.

I personally do not think such tracking is useful: how is it actionable? How will it cause you to write different code? Most hypothetical use cases are better solved by timeouts (see @fanf’s post above).

Moreover, even if it were marginally actionable, i.e. you can contrive some example where this information causes you to write different code, then you have to ask the question: is the signal going to be visible amongst all the noise?

Here is a list of things that may “possibly suspend” (at the OS-level, AKA “synchronous”, or virtual thread level, AKA “asynchronous”):

  • Putting an item into a queue or taking an item from a queue
  • Using synchronized around a method (e.g. Collection#add for some collections)
  • Reading from sockets or files
  • Computing the hash code of a URL
  • etc.

Due to the concurrent-safe design of many libraries and vast portions of the JDK, suspension points are innumerable, and occur at places you might not expect.
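
For instance, even a humble JDK collection call is a potential suspension point, since Hashtable’s methods are synchronized:

```scala
// Hashtable#put acquires the table's monitor; under contention the
// calling thread is suspended until the lock is free. The method
// signature gives no hint of this.
import java.util.Hashtable

val table = new Hashtable[String, Int]()
table.put("a", 1) // a potential wait, invisible in the types
val looked = table.get("a")
```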

Even if tracking “suspendability” were actionable (and I am not convinced that it is), how would you be able to find the needle in the proverbial haystack?

In summary, and in my opinion, sync vs async tracking is not useful, and even “suspendability tracking” appears not actionable, but even if it were actionable, it’s not clear that it would be practically so due to the huge number of suspension points in concurrent-safe code.

4 Likes

I agree with both your initial statements, and don’t see where that implies there’s value in tracking suspension in function signatures. As far as I can tell, they’re totally unrelated

I think I covered that as well already:

Here’s a reason why you would want that: Your main thread can call f directly. But calling g means that your main thread might suspend for a long time, in which case it cannot deal with other events. That’s a bad idea obviously. So, we want to know whether a function can suspend (or otherwise take a long time to run) or not. As Erik Meijer says: delay is an effect.

In other words: If you don’t go all in with one thread per action you will compose actions sequentially. And then it matters whether an action can suspend or not since suspension is transitive.
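
A tiny sketch of that sequential-composition point (names invented): once actions are composed on one thread, a single suspending action delays everything queued behind it.

```scala
// A single-threaded event loop: handler g "suspends" (here, sleeps),
// and every later event waits behind it. Delay composes transitively.
import scala.collection.mutable

val events  = mutable.Queue.empty[() => Unit]
val handled = mutable.ListBuffer.empty[String]

events.enqueue(() => handled += "fast-1")
events.enqueue(() => { Thread.sleep(20); handled += "slow-g" }) // suspends
events.enqueue(() => handled += "fast-2") // cannot run until g returns

while (events.nonEmpty) events.dequeue()()
```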

EDIT: I really should stop now. I did not want to hijack the thread with that question.

You covered that, and I responded. The only reason “main thread cannot deal with other events” is a problem, is due to limited thread pools, causing either starvation or fairness issues. The limited thread pools are going away with Loom, along with these issues, so that “main thread cannot deal with events” is no longer a problem.

If you don’t believe any of these steps in the reasoning that’s fine: maybe we think Loom won’t land, maybe we think supporting non-Loom environments is important, maybe we think that there are other reasons apart from those given that tracking suspension is important. But you haven’t said which step you don’t believe, so at this point we’re both just talking in circles quoting ourselves :stuck_out_tongue:

The whole thing boils down to: why is it such an obviously bad idea that the main thread cannot deal with events? We know why that’s true pre-Loom. I have given the reasons above. It has been true for 20 years. But post-Loom, the problem space has changed, and our reasoning should change along with it.

3 Likes

I think it’s important we discuss the value of tracking suspension. In my view, we lose safety if we don’t: we achieve direct style, but we are unable to track which operations may suspend, and can run into unfortunate cases like the toString screenshot above. From the original proposal:

An example of such safety in practice

No, not at all. Even with an unlimited thread pool, a main thread that waits when calling function g, and therefore cannot handle another event that is supposed to be treated later, causes starvation. You will say, sure, but that’s bad design. The events should be handled in different threads. To which I respond, yes, but how do you know that it’s bad design? You need to know whether a call can suspend (or otherwise take a long time to complete) or not. And then we get to the tradeoff whether that should be in the types or treated informally.

9 Likes

So here are my 2 cents on this topic

Firstly, this discussion where we use the terminology of sync vs async is missing the bigger picture: fundamentally we are debating concurrency and parallelism. To make things clear:

  • Concurrency: An abstraction that lets you represent the notion of performing multiple computations at the same time
  • Parallelism: A hardware/physical feature (i.e. cores/threads) that lets you actually perform multiple computations at once

Various programming languages, for both historical and design reasons, have confused/mixed these concepts, which underlies what is being discussed here. For example, Java (up until Loom) has used threads as its core concurrency model, which is great for CPU-bound tasks since they bind directly to OS threads, but is terrible for IO-bound tasks where your bottleneck is not your CPU being busy computing some task but rather waiting for a response on the network/filesystem.

In my opinion, the ideal case if you were to design a programming language from scratch is to have an abstraction for concurrency (not parallelism!) as your core abstraction and then have various interpreters/implementations to handle the different cases (i.e. a ForkJoinPool for your IO-bound tasks in your webserver/database/filesystem and raw Threads for your CPU-bound algorithms). Even with their problems, this is what Scala and Rust do, although with different results: both Scala (with Future/IO/Task etc) and Rust (with async) let you represent your logic in purely abstract terms while giving you fine-grained control over how to run those concurrent computations (i.e. executors/execution contexts). The big difference here is that because Scala is based on a VM (JVM) with seamless interop with a high-level language (Java) that doesn’t have the necessary abstractions (pre-Loom), there are performance overheads, whereas Rust will transform async code at compile time into a state machine that runs on your specified executor (this is the fastest way you can solve this problem).
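
A sketch of that separation with plain Future (pool sizes, names, and the fake workloads are illustrative): one abstraction, two interpreters.

```scala
// One concurrency abstraction (Future), two interpreters: a bounded
// pool for CPU-bound work, an elastic pool for threads parked on IO.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

val cpuPool = ExecutionContext.fromExecutorService(
  Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors()))
val ioPool = ExecutionContext.fromExecutorService(
  Executors.newCachedThreadPool()) // elastic: fine for blocked threads

def fetch(url: String): Future[String] =
  Future { s"response from $url" /* stand-in for a blocking read */ }(ioPool)

def crunch(data: String): Future[Int] =
  Future { data.length * 2 }(cpuPool) // stand-in for CPU-bound work

val n = Await.result(fetch("https://example.invalid").flatMap(crunch)(cpuPool), 5.seconds)
cpuPool.shutdown(); ioPool.shutdown()
```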

Commentators who keep repeating that “green threads will solve everything”, using examples of languages like Go or Elixir, are conveniently ignoring that for CPU-bound concurrent tasks you cannot rely on green threads alone. Green threads may be cheap, but for CPU-bound concurrent tasks their extra cost is not cheap anymore. In more detail:

  • Erlang: Erlang actually has significant problems with long-running CPU-bound code; anyone with experience using it in non-trivial circumstances will have come across this. In practice, Erlang solves this problem largely the same way Python does, i.e. these CPU-bound functions are abstracted away behind an FFI which is often backed by C (for example, a lot of number-based operations don’t use BEAM’s green threads, otherwise it would be pitifully slow). A quick googling of “erlang CPU bound” demonstrates this.
  • Go: It’s been well documented that if you have pure CPU-bound tasks and you use goroutines as the concurrency primitive, it is notably slower than the C/C++/Java/Rust equivalent using raw threads.
  • Haskell: Haskell has virtual threads but actually distinguishes between IO-bound tasks (IO monad) and synchronous tasks (not bound in the IO monad). Due to this, GHC (and the fact that GHC compiles to native code and not a JVM) can take advantage of the distinction. Also, people doing CPU-bound concurrency in Haskell use STM, Control.Concurrent.Thread (which are standard threads) or other techniques.

In practice, this is why languages like Go and Elixir have found their niche in areas where IO-bound concurrency is much more important (webservers/network/etc) and not CPU-bound problems, where the JVM is still quite heavily used.

So really, what does this mean? Well, in essence it means that if you want to have a language (and a runtime) that handles both CPU- and IO-bound tasks performantly, you are going to have the colored function problem, or alternatively different abstractions for dealing with CPU-bound computations (or both). You need some way of marking filesystem/network/IO tasks to be multiplexed onto a small number of threads, and other CPU-bound tasks (or cheap CPU tasks, such as boxing of data structures) to NOT be needlessly multiplexed across multiple threads, and ideally all be run on the same thread(s) with minimal context communication between those threads. If you only have green threads, then your language will be great for webservers but terrible for CPU-bound number-crunching algorithms, and vice versa. If you want more evidence about why only having green threads is a terrible idea: this is what Rust initially tried to do, and it realised it experienced all the problems I just described (Rust, being a new native language, had the liberty of experimenting in great detail with different solutions, and it ended up scrapping green threads, which was its initial attempt).

Ironically, the problem with Scala is that it has the basis for good abstractions to solve this problem in a principled manner, but because of language stability reasons and the JVM, its execution has been suboptimal (in contrast to Rust). For example, with Future it would be ideal if the Future.map operation would, by default, run on ExecutionContext.parasitic, since almost all of the time the .map operations are cheap CPU tasks that are best executed quickly on the same thread (this is the same reason why, when you read the akka/akka-stream codebase, pretty much every Future.map operation is explicitly run on ExecutionContext.parasitic). For blocking calls there is the blocking function, but alas, since Future is in the Scala stdlib, it’s too late to change this default. The bigger thing is that even though Scala has the necessary tools to solve this, because of its history (JVM and then multi-platform) there are issues in its execution/implementation.
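
The two mechanisms mentioned can be sketched with the standard library (the sleeping task is a stand-in for a real blocking call):

```scala
// `blocking` hints the global pool that this thread is about to park,
// so it may compensate with extra threads; `parasitic` runs a cheap
// map step on the completing thread, avoiding a pool submission.
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val f = Future {
  blocking { Thread.sleep(10); 21 } // simulated blocking call
}
val doubled = f.map(_ * 2)(ExecutionContext.parasitic) // cheap CPU step

val answer = Await.result(doubled, 2.seconds)
```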

So, more to the topic of this SIP regarding suspended functions/continuations: I think this effort is going to be largely wasted, and there are many people in Kotlin actually arguing that adopting coroutines was a bit of a mistake. As mentioned earlier, there are very clear issues with HKTs/HOFs when it comes to suspended functions, and unlike in Kotlin/Java/Go these are extremely common in Scala. Also, from personal experience using coroutines in Kotlin: while they are great for trivial linear code that you want to execute asynchronously, they are actually more difficult to reason about when dealing with more complex interruption/concurrency cases. I think it would be far wiser to spend the effort on coming up with and/or improving the current abstractions that we use, rather than jumping to coroutines, which work better in languages that are designed to be simpler (e.g. Go with goroutines); those languages are not Scala.

Also, on another note, exceptions are not a replacement for value-based propagation (i.e. Option/Either/Validation etc). I know that a lot of effort has been put into Scala 3 to make exception-based error handling more principled and ergonomic, but no matter what is done, there is a whole set of problems where you don’t want to use exceptions whatsoever, and that’s because while throwing exceptions is free, catching them is extremely expensive. This is why, for example, you would never use exceptions for form validation on a webserver; these kinds of problems will always use value-based error handling. I think it would be fair to say that even in modern idiomatic Scala code, exceptions only represent cases where your program enters a state from which it isn’t reasonable to continue, and if you do end up catching exceptions it’s typically only to log/print the exception+stacktrace and continue crashing/recovering; otherwise you use Option/Either or some other validation type.
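
The form-validation case, sketched with Either (field rules invented for illustration): errors are plain values, and composition short-circuits without any exception machinery.

```scala
// Value-based validation: each rule returns Either, and the for-
// comprehension stops at the first Left. Nothing is thrown or caught.
final case class Form(name: String, age: Int)

def validName(s: String): Either[String, String] =
  if (s.nonEmpty) Right(s) else Left("name must not be empty")

def validAge(n: Int): Either[String, Int] =
  if (n >= 0 && n < 150) Right(n) else Left(s"implausible age: $n")

def validate(name: String, age: Int): Either[String, Form] =
  for {
    n <- validName(name)
    a <- validAge(age)
  } yield Form(n, a)
```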

Why am I bringing this up? I am getting the impression that people here are arguing that we need to move away from monads such as Option/Either to try/catch/throw because they are “complex to understand” and “don’t scale well when composed with other effects” (such as Future/IO/Task etc). While it’s true that this is a difficult problem, having to work with monads (or, more specifically, value-based error handling) is something we will always have to deal with in Scala, and brushing this under the rug because it’s “annoying” and “we don’t like dealing with it” isn’t doing anyone any favors.

In summary, adding continuations is solving the wrong problem (in the context of Scala), and there have been attempts at this before which have failed, e.g. https://github.com/scala/scala-async - and while I am sure this proposal would be more ergonomic than scala-async, that’s still not enough of a reason for people to pick it up.

18 Likes