PRE-SIP: Suspended functions and continuations

Ok, I think I understand where you are coming from.

“Moving slow operations off a key thread to avoid starvation of that specific thread” is a valid concern; that’s a common issue in UI programming and Game programming, where heavy operations in the core event loop cause issues. I feel this all the time when I try to rename a file in IntelliJ and it starts indexing the world on the main thread, locking up the IDE.

But as JDG has said, this has nothing to do with suspending or not: a heavy CPU-bound computation can also block key threads problematically. As I’ve mentioned before, scala.Map#toString has personally caused me problems in the past, not because the code was unsuitable, but because the data shape was unusual, with a much larger Map than most people would expect to work with.

In the end, starvation of individual threads due to long-running operations is a performance issue, and performance issues are notoriously hard to analyze statically. Even for human programmers, the general advice is “always profile first, don’t try to statically analyze the code”. Suspension, networking, IO, etc. are all red herrings here, because heavy CPU computation causes all the same issues. And the resolution is the same: profile it, identify the hotspots, and optimize them in place or move the long-running tasks (whether CPU- or IO-bound) to a separate (virtual) thread.

Given how difficult performance issues are to analyze statically, I think expecting the compiler to perform accurate analysis here with only static information is not a promising approach. The compiler doesn’t know:

  • The shape/size of the input data
  • How many cores the code will have available to parallelize things
  • Whether the filesystem is spinning-rust, SSDs, or in-memory tmpfs
  • Whether your JDBC query is going to an in-process SQLite, or to a postgres database 10,000 miles away
  • The “use case” of the code, which can make some 1ms computations unacceptably slow (e.g. in a 144fps Game loop) while making some 10s computations perfectly fine (e.g. in a once-a-day batch job)

All of these are things that can, and do, make or break whether a key thread is blocked for an acceptable amount of time or not. For example, when IntelliJ’s indexing blocks my IDE UI thread and doesn’t block other people’s, it’s because of the shape/size of the input it’s handling, not because I’m running an inferior implementation of IntelliJ compared to others.

That’s not to say inaccurate analysis is not useful. The current rule of thumb of not blocking threads is kinda-sorta useful, even if there are false positives and false negatives (a community leader once told me that instead of using a blocking AWS SDK call in a Play controller in my O(0.0001) qps college toy project, I should re-implement the entire AWS SDK in an async fashion…). But this sort of inaccurate analysis seems like something that belongs in a linter (with appropriate configurability to properly suit individual codebases, and @nowarn escape hatches to override the linter when it’s wrong) rather than built deeply into the language spec and compiler.
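For instance, a minimal sketch of what that escape hatch could look like (@nowarn itself is real Scala; the lint rule and the SDK call below are hypothetical):

import scala.annotation.nowarn

// Hypothetical stand-in for a blocking AWS SDK call.
def blockingDescribeRegions(): List[String] = List("us-east-1")

// A hypothetical "blocking call" lint could be silenced where the human knows
// better, e.g. in a ~0 qps toy project where blocking is perfectly fine.
@nowarn("msg=blocking call") // illustrative message filter, not a real lint rule
def listRegions(): List[String] = blockingDescribeRegions()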

9 Likes

In an ideal world one would track this case, but the reason why (at least personally) I don’t advocate for it is that in most real-world situations it isn’t really possible: it typically happens when you have large input data, something only the programmer knows about (@lihaoyi’s example of calling .toString on a large map, or a case I had recently of calculating diffs on data structures that are 30mb+ in memory). For such tasks you still ideally want the ability to designate how they will run; for example, you may want to execute them on a pinned separate thread so that at least you don’t stall the current thread. BTW, these cases happen all the time in gaming and, to a lesser extent, in UIs: you have heavy CPU-bound computations that in a lot of cases only the programmer knows about.

This reminds me of when I built a Scala-based GUI using Swing some time ago. I was using Scala’s standard Future as my IO type, and I had an ExecutionContext that represented the UI thread, which gave me fine-grained control: rendering happened only on the UI thread, while other heavy CPU-bound/async tasks ran separately.
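A minimal sketch of that setup (the names and stand-in functions are illustrative):

import java.util.concurrent.Executor
import scala.concurrent.{ExecutionContext, Future}

// An ExecutionContext backed by the Swing event-dispatch thread: heavy work runs
// on the global pool, and only the rendering callback hops onto the UI thread.
val uiThread: ExecutionContext = ExecutionContext.fromExecutor(
  (command: Runnable) => javax.swing.SwingUtilities.invokeLater(command)
)

def expensiveComputation(): String = "done"                 // stand-in for heavy CPU work
def render(text: String): Unit = println(s"render: $text") // stand-in for a Swing update

Future(expensiveComputation())(ExecutionContext.global)
  .foreach(render)(uiThread) // the callback is scheduled on the EDT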

This still doesn’t detract from the fact that marking computations which we know should be run “asynchronously” (i.e. IO/network/filesystem) is useful.

Also strongly agreed: while there is a lot of legitimate debate about how to tackle this problem, and it’s not an easy one to solve, coroutines are a solution that doesn’t play well with Scala’s design and how it’s idiomatically used.

4 Likes

@jdegoes @lihaoyi I think the main reason why people are more concerned about suspensions than long-running computations is that a long-running computation is usually identified during testing, but waiting for an external event can have dramatically different outcomes depending on test vs. production environment, system load, state of the network, user behavior, etc. It’s not an absolute, for sure. But if we take the Microsoft guideline as an example, then even long-running computations that are identified as such could be treated as if they were suspending.

2 Likes

I think the Microsoft guidelines are reasonable, but it is worth noting that they don’t have lightweight threads on the CLR. Those sorts of guidelines are reasonable on the pre-Loom JVM as well.

Basically, Loom removes the applicability of these thread-starvation guidelines to “high-concurrency multi-threaded systems”, which includes most backend systems, web services, API servers, etc. For those, we can just auto-scale the thread pool if some threads get blocked, with Loom letting us do that cheaply and efficiently.
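For example, on a Loom-enabled JDK (21+), that auto-scaling pool is essentially a one-liner (a sketch):

import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// Each task gets its own virtual thread; a blocked task merely parks its virtual
// thread, so throughput scales without hand-tuning a bounded pool.
val loomEc: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newVirtualThreadPerTaskExecutor())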

These guidelines continue to apply to “high-concurrency single-threaded systems”: UI threads, Game loops, code running within certain kinds of Actors, and environments like Scala.js. These are scenarios where “throwing more threads at it” is not possible, and work needs to be moved off-thread manually.

The current guidelines in the Scala community are overwhelmingly targeted at the multithreaded use case, for people developing backend systems and servers. There is not much UI development or Game dev happening in Scala, and Scala.js remains niche. That leaves code running in single-threaded Actors, but only those for which latency and responsiveness are important.

IMO this isn’t a sufficiently broad use case to make a heavy investment in, but that’s a subjective judgement. I think we’ve reached mutual understanding and there isn’t any more to discuss :slight_smile:

5 Likes

Then I would suggest the conversation move from tracking async suspensions (which is inconsistent or even incoherent, as demonstrated above), to tracking indefinite-length computations, for which tracking both sync and async suspensions could be regarded as a poor man’s proxy.

If the goal is to track indefinite-length computations, then I would regard that as a prime topic for future research and, personally, I do not see how it relates to Loom, Kotlin coroutines, etc.

Incidentally, I share @lihaoyi’s opinion that the value of tracking indefinite-length computations is virtually gone in a post-Loom world.

Indeed, it is already gone for those using functional effect systems like ZIO.

If I am writing the following code:

for {
  bytes      <- drainStream(request)
  transcoded <- doTranscoding(bytes)
  _          <- uploadToS3(transcoded)
} yield Response.OK

Then for every statement, there exist two possibilities:

  1. I need the result of this statement in order to continue to the next statement.
  2. I do not need the result of this statement in order to continue to the next statement.

Note that this is a question of data flow, and fully resolvable statically.

If I care about latency (game and UI apps are excellent examples, but even in a microservice or API backend, latency matters a lot), then in any case where (2) holds (that is, where the result of some computation is not needed in order to continue to the next statement), I will execute such a qualifying statement in the background.

Using ZIO, I would transform the code to the following:

def doProcessing(bytes: Array[Byte]) = 
  for {    
    transcoded <- doTranscoding(bytes)
    _          <- uploadToS3(transcoded)
  } yield ()

for {
  bytes <- drainStream(request)
  _     <- doProcessing(bytes).forkDaemon
} yield Response.OK

In this refactoring, I am respecting sequential ordering in cases where subsequent statements depend on prior statements. In other cases, I am shifting work to new background fibers, which are so cheap you should always use them.

In a post-Loom world, this is the new paradigm for low latency: if you need the result from a previous computation, then you must perform the work sequentially after that statement. But if you do not need the result from a previous computation, then you may, and often should, perform that work in the background.
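The same refactoring can be sketched in plain Scala on a Loom-enabled JDK (21+); the request/response operations here are the hypothetical ones from the example above:

def handle(request: Request): Response = {
  val bytes = drainStream(request) // result needed below: stay sequential
  Thread.startVirtualThread { () =>
    // result not needed for the response: push it to a background virtual thread
    val transcoded = doTranscoding(bytes)
    uploadToS3(transcoded)
  }
  Response.OK
}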

Here’s the kicker: Tracking “suspendability” is neither a necessary nor sufficient condition for performing work in the background. For example, draining the stream will be done on a thread (or virtual thread) that suspends (either synchronously or asynchronously). Yet, we need the result of draining the stream in order to proceed to the transcoding step. So the fact that draining the stream may suspend is irrelevant to our sequential logic. Yet, to return the OK response, we do not need to wait for the transcoding or uploading to complete, so we push that computation into the background on a new virtual thread.

ZIO (and of course Loom) make it so cheap to do background processing that the new paradigm is: if you need to do something sequentially, then you do it sequentially; if you don’t need the result to make further progress, then you push it onto a virtual thread.

At no point do we need to understand or care about whether an OS thread is synchronously suspending, or whether a virtual thread is asynchronously suspending. That is not a relevant consideration, and even if you argue there is a poor man’s proxy there (a heuristic), I can show innumerable examples that demonstrate how weak that proxy is (e.g. Collection#add doing synchronous suspend pre-Loom; URL#hashCode doing asynchronous suspend post-Loom; etc.).
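To make the URL example concrete (a well-known JDK gotcha):

// java.net.URL's equals/hashCode may resolve the hostname over the network,
// so even this innocuous-looking line can suspend (synchronously pre-Loom,
// asynchronously on a virtual thread post-Loom).
val urls = Set(new java.net.URL("https://example.com/a")) // hashCode may trigger DNS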

In summary:

  1. Sync suspension and async suspension must be treated together, never separately; any tracking proposal must consider them equivalent in every possible way (they fuse into the same concept under pure green threading models, such as the one Loom is almost giving us).
  2. In the new paradigm of cheap virtual threads, as argued by @lihaoyi, if we don’t need some result in order to make progress, then in any low-latency application (not just games and UI) we will push that work into the background. This is a data-flow question that has a statically analyzable answer, but it has nothing to do with tracking sync + async suspension in the type system.

Precision and clarity of thought are extremely important to get to the heart of the matter, which I think we have gotten to after much discussion.

4 Likes

I would like this thread to focus on the proposal of having support for continuations in the language instead of discussing the impact of Loom on functional effects. So, I moved a post to a new topic: Impact of Loom on “functional effects”. Please continue that discussion there!

This is something that I think isn’t quite so trivial. Simple static dataflow analysis makes it easy to statically move stuff onto cheap virtual threads, except:

  1. Side effects are still prevalent throughout Scala. Not as common as in other languages, but enough that parallelizing automatically is risky business. Even when I’ve parallelized stuff manually, I regularly get bitten by side effects I didn’t notice.

  2. Granularity & overhead: virtual threads are cheap, but not free. IIRC they’re something like ~1kb each, vs. ~1kb per Future and ~1mb for OS threads. That means you can’t go around parallelizing every 1 + 1 in your program without drowning in overhead; you have to pick some granularity of work below which you won’t parallelize (see the sketch below).
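As a toy sketch of that granularity trade-off (the cutoff value is a per-workload guess, not a recommendation):

import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Fork only when a chunk is large enough to amortize scheduling overhead;
// below the cutoff, plain sequential summation is cheaper than any task machinery.
def parSum(xs: Vector[Int], cutoff: Int = 10000): Future[Long] =
  if (xs.length <= cutoff) Future(xs.foldLeft(0L)(_ + _))
  else {
    val (l, r) = xs.splitAt(xs.length / 2)
    parSum(l, cutoff).zipWith(parSum(r, cutoff))(_ + _)
  }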

It’s easy to parallelize things in Scala thanks to its FP-ish nature, and Loom makes it easier by making threads cheaper, but I don’t think we’ve reached the point where we can feasibly automate it yet. At some point, you have to make subjective judgements about what to send to background threads and what to run on your current thread.

Deciding what to parallelize is a performance issue, and one of the challenges of performance issues is that a decision that works perfectly in one environment with a given set of input data can totally fall apart in a different environment on a different set of inputs.

6 Likes

Most users of functional effect systems are not relying on them primarily for async programming. Rather, they are relying on them for concurrency (compositional timeouts, races, parallelism), resource-safety in the presence of concurrency, typed errors, context, and fiber-local state.
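For instance, a sketch in ZIO 2 of the compositional concurrency being referred to (the two fetch effects are hypothetical):

import zio._

// Race two lookups and bound the winner with a timeout; the loser is interrupted
// and its resources are released, which is the concurrency/resource-safety point above.
def fastest(primary: Task[String], replica: Task[String]): Task[String] =
  primary.race(replica).timeoutFail(new Exception("too slow"))(5.seconds)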

For me, as an author, this was the point of proposing continuations: to encode functional effect systems and use them in a direct style, for more use cases than asynchronous programming:

We would like to write Scala programs in a direct style while maintaining the safety of the indirect monadic style.

We’d like not to have to wait until Loom-capable JDK/JVM usage reaches the majority of deployed production systems. Currently, only 12% of Java users are on JDK 15 (Source). It could be a very long wait for Loom to land. Perhaps it will be backported, but that is unknown.

How we may encode functional effects and program with them in a direct style, given the current absence of Loom on most deployed JVM systems and on Scala.js/Native, is the purpose of this pre-proposal.

5 Likes

I agree I oversimplified the issue to focus on data flow. You do not always want to push every statement (whose result is not necessary for making further progress) onto a background thread.

That said, keep in mind the heuristic would not push 1 + 1 to the background, because that produces an Int that is required for subsequent computation. Rather, it’s generally Unit-returning methods or side-effecting processes (uploading a file, doing CPU processing, etc.) whose results are unnecessary.

(In modern loggers, even log(str: String): Unit is effectively pushed onto a background thread via queuing. As a heuristic, Unit would be a vastly better criterion than “does async/sync suspension”, because you usually do need the result of a suspending computation to make further progress.)
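A tiny sketch of that logger pattern (illustrative, not any particular library):

import java.util.concurrent.LinkedBlockingQueue

// log(...) returns Unit immediately; a daemon thread drains the queue,
// so callers never wait on the underlying I/O.
object AsyncLogger {
  private val queue = new LinkedBlockingQueue[String]()
  private val writer = new Thread(() => {
    while (true) System.err.println(queue.take()) // stand-in for real file/network I/O
  })
  writer.setDaemon(true)
  writer.start()

  def log(msg: String): Unit = queue.put(msg)
}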

For sure. Maybe this is also a good point to think about what tooling could bring to this problem: it’s one strategy to force a programmer to add type annotations to manually propagate statically-known information about runtime behavior (suspendability, big-O, little-o, exception types, etc.), and another, quite different strategy to expose runtime behavior via tooling.

Tooling has the advantage that it’s less work for the developer: you do not need to engage in the boilerplate of propagating statically-known information by manually typing ASCII characters (which is probably the most common complaint about statically-typed exceptions in Java).

Imagine a tool that looks at your code and says: “uploadToS3() returns Unit, which is not used by subsequent statements, and on average takes 24 seconds to complete. Do you want to execute this computation on a virtual thread? [Yes] [No]”

2 Likes

For modern Linux at least, the stack sizes for OS threads are much larger: on most distros it’s 8 MB (run ulimit -s to find out how big it is on your installation). It’s also the same on macOS (M1).

But yes, this point is completely correct. If you are doing mathematical/scientific computations (which is actually a big demographic for the JVM, something a lot of people seem to have forgotten), then, as you said, you do not want to parallelize every single 1 + 1 in your program even though you theoretically can because it’s side-effect free. I suspect even the runtime-based analysis being alluded to would incur significant overhead (a non-concern in IO-bound apps).

3 Likes

I appreciate, empathize with, and support this goal of making it easier to leverage functional effect systems in more places. However, I do not support this pre-SIP proposal at all.

Every proposal has costs and benefits, but a major problem with this proposal is that the benefits (direct, imperative async on pre-Loom JDKs) decrease rapidly as more and more companies adopt Loom (since Loom already solves the async problem in the correct way, by giving us virtual threads), while the costs are high and ongoing (development, maintenance, education) and have permanent implications: dictating the syntax and semantics of Scala 3 for a whole generation of Scala programmers, and creating challenges for library and even application code bases that would have to straddle the quite significant suspend/Loom divide.

With each passing year, the benefits of the proposal would decrease further, approaching 0, while the costs would still remain high and ongoing, with inescapable permanent implications.

Even if you ignore this basic dynamic (which, in a cost-constrained environment, strongly suggests the proposal be rejected), we are stuck with the fact that the proposal has possibly fatal drawbacks:

  • Introducing two-colored functions into Scala, which is highly undesirable
  • Failing to provide clear semantics and typing around higher-order functions
  • Failing to provide a new notion of polymorphism (“suspend-polymorphism”), which is necessary to write generic code that can suspend or not depending on the functions it is passed
  • Failing to provide a type and value for suspendable lambdas, which are necessary to preserve the parity between methods and functions introduced in Scala 3
  • Failing to have a non-Loom-based runtime implementation
  • Not actually adding any new capabilities atop Loom, since everything in the proposal can be done much more simply in straight up Scala 3 or Java on Loom

I would support the proposal more if it were simply adding continuations to Scala 3 (not with Loom, but through generating resumable bytecode for methods and functions), without any changes to the syntax or semantics of Scala. That means you could leverage the benefits (on Scala Native, Scala.js) without having to pay some of the costs, and without having to change the Scala 3 language at all.

But in its current form, I think it’s a bad idea for Scala 3. In my opinion, even Kotlin needs to re-examine suspend, or it runs the risk of becoming legacy compared to Java-on-Loom, which does not have the severe limitations or weirdness of Kotlin coroutines.

10 Likes

Some of these ideas of automatic parallelization of FP code went in the research literature under the name of implicitly parallel functional programming, or implicit parallelism: “implicit” in the sense of not requiring special syntax from the programmer. Here is a reference on the subject: https://archiv.ub.uni-marburg.de/diss/z2008/0547/pdf/djb.pdf

Now, there may be some domains within Scala, such as the planning of distributed data-parallel processes, or build systems, in which implicit parallelism would be useful. However, that may not be the case for the Scala base language. Scala has a rather conventional operational semantics, in that it is strict and sequential by default. This semantics is key for reasoning about side effects performed during program evaluation. Adding implicit parallelism would mess up that mental model. Note that this is a matter of language design, essentially separate from how efficient a specific platform has become at supporting parallelism.

Nevertheless, this is an interesting and deep topic that indeed deserves a long discussion on its own.

2 Likes

For my part, I feel that the work here and the suspend keyword would be minimally intrusive while adding capabilities not available today.

There are significant obstacles to its use, and challenges that would have to be worked through, but it improves the situation.

Loom is not a reason to avoid adding a useful feature to the language in the way people use it. The JVM community will need to support Java 11/17 for the better part of the next decade (extended support runs until September 2029); Loom does not lessen the value of doing something like this.

3 Likes

Even hardcoded-into-the-runtime implementations like Loom or Go’s lightweight threading hit issues with FFI calls into C code or OS-level blocking syscalls.

This could be a big issue with Scala Native as we explicitly advertise “calling native code with ease”, paraphrasing.

4 Likes

Yes, but it is unavoidable. It applies as much to the compiler/language-driven approach as it does to the runtime-driven approach that Loom takes.

The general issue is that you need to “async transform” code for this to work. Compiler support can transform code being compiled, but has problems with upstream libraries compiled earlier. Runtime support (Loom, Go, Erlang) can transform all code inside the runtime, but has problems with code running outside the runtime (e.g. C libraries or OS syscalls).

If you want an async transform that works with everything, including C libraries and OS syscalls, you need to make changes to the OS, since that’s the thing that runs C code and syscalls.

This can be done, but is maybe outside the scope of this discussion.

1 Like

I deleted my original message about this, because I had hoped that someone with more familiarity about it would bring this up (I know next to nothing about it).

But, I think it needs to be pointed out that there is already some support for state-machine-based async, which was merged into the compiler in 2020.

So there’s at least a precedent of supporting a similar thing in the compiler – though not with new language constructs (other than what can be introduced by a macro). Or at least, there was, in Scala 2.x.

I also think arguments involving Loom are unproductive. It’s no more useful to argue “this is unnecessary, because Loom (an implementation detail) will make threads cheap” than to argue “this is necessary, because the current implementation makes threads expensive”. Both arguments are orthogonal to the question of whether the language should have a primitive for a continuation. Heck, an argument for the proposal, invoking the looming Loom, is that because Loom will probably eventually expose a continuation primitive, Scala should introduce one preemptively.

To be clear, I’m not arguing either way. I just think the arguments against rely too much on Loom, and I am curious how the compiler’s built-in support for scala-async (which I wasn’t a fan of, FWIW, but there it is) factors in here. Could the proposal (by backing off on keywords, and using types or annotations) be macro-implemented in a similar way, to start off with? That would demonstrate its benefit in vivo (though maybe with slightly clunkier syntax).

5 Likes

What would the debugging experience look like with this new approach to coloring your functions?

That’s an issue all async implementations seem to have.

People in the Kotlin world for example are talking about that issue, and it seems still unsolved.

https://issuecloser.com/blog/kotlin-coroutines-stack-trace-issue

The issue could potentially be solved on the JVM, to some extent, by fully embracing Loom. But what about the other Scala environments?

Also, when fully relying on Loom, why would we need this in the first place? (Anyway, Loom seems quite far from some production runtimes, given that some people are still on Java 8…)

This proposal also seems not to solve one of the other common problems: you don’t see at the call site which functions block! So you could accidentally block inside a suspended function without noticing, which usually has quite bad consequences for the whole async runtime.

(Loom solves this by making more or less everything transparently non-blocking and async. But Loom is far away, and only available on the JVM.)

The other thing I’ve noticed in this proposal (a point which I really don’t like, to be honest): the function signatures become meaningless to some extent, as you lose referential transparency even on seemingly trivial functions. It returns a String according to its type? Well, it could cause a network-wide deadlock on your cluster while doing that… Welcome back to imperative programming hell!
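A minimal sketch of that worry (the URL is hypothetical):

// By its signature this is a harmless String accessor; nothing in the type reveals
// that the body performs blocking network I/O (or, under the proposal, a hidden suspension).
def userName(id: String): String =
  scala.io.Source.fromURL(s"https://users.internal/$id").mkString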

Not only does debugging become harder, you don’t even know which functions could potentially cause havoc, as they all look harmless judging only by their signatures. Without ad-hoc external linting features bolted on (like the one shown in the IntelliJ screenshots), this becomes a minefield.

Besides that: I don’t see the simpler (and imho more down-to-earth) approach mentioned in the alternatives. Instead of full-blown continuations, one could directly implement one of the patterns often built on top of continuations, namely coroutines.

Continuations by themselves are one of the most “heavyweight” features ever invented in programming languages (from the mental-effort-to-grok-them standpoint), and almost no language exposes them to end users. That’s for a reason, imho. Continuations are “just too powerful”.

Coroutines, on the other hand, would likely be more in the spirit of a “least power” approach here.

There have even been some experiments in Scala regarding coroutines already, which looked quite interesting imho, even though the project as such seems to have been dead for a long time.

(Also have a look at the linked website. The docs are interesting.)

A coroutine implementation would still seem to suffer from the debuggability problem, even if it looks more “simple” in the end without those proposed CPS transforms (which would require dedicated debugger support at least).


I’m in the camp of people who think there should be a way to liberate programming from the monadic style, but all those “painting your functions” approaches aren’t the answer either, in my opinion.

For now and on the JVM Loom seems to be “the answer”. (Direct code, no visible colors)

Still, that’s not the final answer, as only proper effect- and resource-safety will improve things significantly!

I, for my part, would surely enjoy a modern, performant, and safe systems language that could finally enable building an innovative resource- and capability-safe operating system for a distributed, networked world!

I hope Scala (Native) will become this language some day with its planned resource and effect tracking…

(Only some built-in verification capabilities, maybe in the form of a “sub-language” like Cogent¹, would be missing. But now I’m daydreaming and should stop spamming this forum for sure. :grinning:)


¹ Cogent — Cogent 3.0.1 documentation

4 Likes

I’m inclined to think Loom won’t solve the UI-thread coding issue.
For instance, suppose I want to launch a Swing application, and I want the app to display a progress bar as it loads.

A naive implementation could look like this:

PluginManager.install(this, true);
splash.setProgress(30);
log.debug("Setup tree");
JMeterTreeModel treeModel = new JMeterTreeModel();
JMeterTreeListener treeLis = new JMeterTreeListener(treeModel);
final ActionRouter instance = ActionRouter.getInstance();
splash.setProgress(40);
log.debug("populate command map");
instance.populateCommandMap();
splash.setProgress(60);
treeLis.setActionHandler(instance);
log.debug("init instance");
splash.setProgress(70);
GuiPackage.initInstance(treeLis, treeModel);
splash.setProgress(80);
log.debug("constructing main frame");
MainFrame main = new MainFrame(treeModel, treeLis);
splash.setProgress(100);
ComponentUtil.centerComponentInWindow(main, 80);
main.setLocationRelativeTo(splash);
main.setVisible(true);
main.toFront();

Unfortunately, it has two issues:

  1. Swing APIs should be called only from the AWT thread, so the startup method must be called from the AWT thread
  2. If the sequence executes on the AWT thread, then Swing has no chance to respond to setProgress calls. In other words, the UI is not updated, and the progress bar does not really move (that was the exact issue in JMeter, by the way)

I do not think Loom solves this case, since I can’t execute the same sequence on a random virtual thread (see 1.)

What is needed here is something that splits the method (e.g. after each setProgress call), so it “releases the UI thread” and schedules the continuation shortly afterwards.

I agree coloring functions looks sad; however, Kotlin coroutines enable writing the method as the very same linear sequence, yet they can re-schedule continuations onto the UI thread right after each setProgress.

Here’s the implementation: Use kotlinx-coroutines for UI launcher by vlsi · Pull Request #712 · apache/jmeter · GitHub

suspend fun startGuiInternal(testFile: String?) {
    setupLaF()
    val splash = SplashScreen()
    suspend fun setProgress(progress: Int) {
        splash.setProgress(progress)
        // Allow UI updates
        yield()
    }
    splash.showScreen()
    setProgress(10)
    JMeterUtils.applyHiDPIOnFonts()
    setProgress(20)
    log.debug("Configure PluginManager")
    setProgress(30)
    log.debug("Setup tree")
    val treeModel = JMeterTreeModel()
    val treeLis = JMeterTreeListener(treeModel)
    val instance = ActionRouter.getInstance()
    setProgress(40)
    // this is a non-UI CPU-intensive task, so we can schedule it off the UI thread
    withContext(Dispatchers.Default) {
        log.debug("populate command map")
        instance.populateCommandMap()
    }
    setProgress(60)
    treeLis.setActionHandler(instance)
    log.debug("init instance")
    setProgress(70)
    GuiPackage.initInstance(treeLis, treeModel)
    setProgress(80)
    // ... (remainder of the startup sequence elided in this excerpt)
}

The code looks sequential and understandable, and the compiler splits the execution into chunks so the UI can be updated in-between.

5 Likes

The shown code would be also a nice example for resource and capability tracking.

Just imagine the progress bar is a resource and updating it would require the appropriate capability.

Not only could you still write that code in direct style, you actually couldn’t use the progress bar wrong!

I’m really looking forward to these new capabilities in Scala. :smiley:

1 Like

What about code like:

VirtualThread.run(() => {
  something
  something
  something
  runOnGuiThreadAndWait {
    showGui()
    setProgress()
  }
  something
  something
  runOnGuiThreadAndWait {
    setProgress()
  }
  something
  something
  runOnGuiThreadAndWait {
    setProgress()
  }
  something
  something
})

This shifts all the heavy lifting outside of the GUI thread, so the GUI retains full responsiveness.

OTOH, if you yield only after setProgress(), then running the heavy somethings on the GUI thread will freeze it for some time.
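The runOnGuiThreadAndWait helper above is hypothetical; for Swing it could be sketched as:

// Blocks the calling (cheap, virtual) thread until the EDT has executed the body.
// Must not be called from the EDT itself, or invokeAndWait will throw.
def runOnGuiThreadAndWait[A](body: => A): A = {
  var result: Option[A] = None
  javax.swing.SwingUtilities.invokeAndWait(() => result = Some(body))
  result.get
}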

1 Like