PRE-SIP: Suspended functions and continuations

I deleted my original message about this, because I had hoped that someone more familiar with it would bring it up (I know next to nothing about it).

But I think it needs to be pointed out that there is already some support for state-machine-based async, which was merged into the compiler in 2020.

So there’s at least a precedent of supporting a similar thing in the compiler – though not with new language constructs (other than what can be introduced by a macro). Or at least, there was, in Scala 2.x.

I also think arguments involving Loom are unproductive. It’s not any more useful to argue “this is unnecessary, because Loom (an implementation detail) will make threads cheap” than it is to argue “this is necessary, because the current implementation makes threads expensive”. Both arguments are orthogonal to the question of whether the language should have a primitive for a continuation. Heck, an argument for the proposal, invoking the looming Loom, is that because Loom will probably eventually expose a continuation primitive, Scala should introduce one preemptively.

To be clear, I’m not arguing either way. I just think the arguments against are relying too much on Loom, and I am curious about how the compiler’s built-in support for scala-async (which I wasn’t a fan of, FWIW, but there it is) factors in here. Could the proposal (by backing off on keywords and using types or annotations) be macro-implemented in a similar way, to start off with, in order to demonstrate its benefit in vivo (though maybe with slightly clunkier syntax)?

5 Likes

What would the debugging experience look like with this new approach to coloring your functions?

That’s an issue all async implementations seem to have.

People in the Kotlin world, for example, are talking about that issue, and it still seems unsolved.

https://issuecloser.com/blog/kotlin-coroutines-stack-trace-issue

The issue could potentially be solved on the JVM to some extent by fully embracing Loom, but what about the other Scala environments?

Also, when fully relying on Loom, why would we need this in the first place? (Then again, Loom seems quite far from some production runtimes, given that some people are still on Java 8…)

This proposal also does not seem to solve one of the other common problems: you can’t see at the call site which functions block! So you could accidentally block inside a suspended function without noticing. This usually has quite bad consequences for the whole async runtime.

(Loom solves this by making more or less everything transparently non-blocking and async. But Loom is far away, and only available on the JVM.)

The other thing I’ve noticed in this proposal (a point I really don’t like, to be honest): the function signatures become meaningless to some extent, as you lose referential transparency, even on seemingly trivial functions. It returns a String according to its type? Well, it could cause a network-wide deadlock on your cluster while doing that… Welcome back to imperative programming hell!
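
A contrived illustration of the concern (the names are hypothetical, sketching the proposed direct suspending style):

// Judging only by its signature, this looks like a harmless accessor…
def userName(id: Long): String =
  fetchUserOverNetwork(id).name // …yet it may suspend, hit the network, and
                                // deadlock; none of that shows up in the type.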

Not only does debugging become harder, you don’t even know which functions could potentially cause havoc, as they all look harmless judging only by their signatures. Without ad hoc external linting features bolted on (like the one shown in the IntelliJ screenshots) this becomes a minefield.

Besides that: I don’t see the simpler (and imho more down-to-earth) approach mentioned in the alternatives. Instead of full-blown continuations, one could directly implement one of the patterns that are often built on top of continuations, namely coroutines.

Continuations by themselves are one of the most “heavyweight” features ever invented in programming languages (from the mental-effort-to-grok-them standpoint), and almost no language exposes them to end users. That’s for a reason, imho. Continuations are “just too powerful”.

Coroutines, on the other hand, would likely be more in the spirit of a “least power” approach here.

There have even already been some experiments with coroutines in Scala — which looked quite interesting imho, even though the project as such seems to have been dead for a long time.

(Also have a look at the linked website. The docs are interesting.)

A coroutine implementation still seems to suffer from the debuggability problem, though… even if it looks “simpler” in the end, without those proposed CPS transforms (which require dedicated debugger support, at the least).


I’m in the camp of people who think there should be a way to liberate programming from the monadic style, but all those “painting your functions” approaches aren’t the answer either, in my opinion.

For now, and on the JVM, Loom seems to be “the answer”. (Direct code, no visible colors.)

Still, that’s not the final answer, as only proper effect and resource safety will improve things significantly!

For my part, I would certainly enjoy a modern, performant, and safe systems language that could finally enable building an innovative resource- and capability-safe operating system for a distributed, networked world!

I hope Scala (Native) will become this language some day, with its planned resource and effect tracking…

(Only some built-in verification capabilities, maybe in the form of a “sub-language” like Cogent¹, would be missing. But now I’m daydreaming and should stop spamming this forum for sure. :grinning:)


¹ Cogent — Cogent 3.0.1 documentation

4 Likes

I’m inclined to think that Loom won’t solve the UI-thread coding issue.
For instance, suppose I want to launch a Swing application, and I want the app to display a progress bar as it loads.

A naive implementation could look like this:

PluginManager.install(this, true);
splash.setProgress(30);
log.debug("Setup tree");
JMeterTreeModel treeModel = new JMeterTreeModel();
JMeterTreeListener treeLis = new JMeterTreeListener(treeModel);
final ActionRouter instance = ActionRouter.getInstance();
splash.setProgress(40);
log.debug("populate command map");
instance.populateCommandMap();
splash.setProgress(60);
treeLis.setActionHandler(instance);
log.debug("init instance");
splash.setProgress(70);
GuiPackage.initInstance(treeLis, treeModel);
splash.setProgress(80);
log.debug("constructing main frame");
MainFrame main = new MainFrame(treeModel, treeLis);
splash.setProgress(100);
ComponentUtil.centerComponentInWindow(main, 80);
main.setLocationRelativeTo(splash);
main.setVisible(true);
main.toFront();

Unfortunately, it has two issues:

  1. Swing APIs should be called only from the AWT thread, so the startup method must be called from the AWT thread
  2. If the sequence executes on the AWT thread, then Swing has no chance to respond to setProgress calls. In other words, the UI is not updated, and the progress bar does not actually move (that was the exact issue in JMeter, by the way)

I do not think Loom solves this case, since I can’t execute the same sequence on a random virtual thread (see 1).

What is needed here is something that would split the method (e.g. after each setProgress call), so it “releases the UI thread”, and schedules the continuation shortly afterwards.

I agree coloring functions looks sad; however, Kotlin coroutines enable writing the method as the very same linear sequence, yet re-schedule continuations on the UI thread right after each setProgress.

Here’s the implementation: Use kotlinx-coroutines for UI launcher by vlsi · Pull Request #712 · apache/jmeter · GitHub

suspend fun startGuiInternal(testFile: String?) {
    setupLaF()
    val splash = SplashScreen()
    suspend fun setProgress(progress: Int) {
        splash.setProgress(progress)
        // Allow UI updates
        yield()
    }
    splash.showScreen()
    setProgress(10)
    JMeterUtils.applyHiDPIOnFonts()
    setProgress(20)
    log.debug("Configure PluginManager")
    setProgress(30)
    log.debug("Setup tree")
    val treeModel = JMeterTreeModel()
    val treeLis = JMeterTreeListener(treeModel)
    val instance = ActionRouter.getInstance()
    setProgress(40)
    // this is a non-UI CPU-intensive task, so we can schedule it off the UI thread
    withContext(Dispatchers.Default) {
        log.debug("populate command map")
        instance.populateCommandMap()
    }
    setProgress(60)
    treeLis.setActionHandler(instance)
    log.debug("init instance")
    setProgress(70)
    GuiPackage.initInstance(treeLis, treeModel)
    setProgress(80)
    // ... (rest of the startup sequence elided in the quote)
}

The code looks sequential and understandable, and the compiler splits the execution into chunks so the UI can be updated in-between.

5 Likes

The code shown would also be a nice example for resource and capability tracking.

Just imagine the progress bar is a resource and updating it would require the appropriate capability.

Not only could you still write that code in direct style, you actually couldn’t use the progress bar incorrectly!

I’m really looking forward to these new capabilities in Scala. :smiley:

1 Like

What about code like:

Thread.startVirtualThread(() => {
  something
  something
  something
  runOnGuiThreadAndWait {
    showGui()
    setProgress()
  }
  something
  something
  runOnGuiThreadAndWait {
    setProgress()
  }
  something
  something
  runOnGuiThreadAndWait {
    setProgress()
  }
  something
  something
})

This shifts all the heavy lifting outside of the GUI thread, so the GUI retains full responsiveness.

OTOH, if you yield only after setProgress(), then running the heavy somethings on the GUI thread will freeze it for some time.

1 Like

What about code like:

It does not work since I need to reuse values from some of the calls.

For instance, one of the steps creates treeLis = new JMeterTreeListener(treeModel) and I need to reuse that variable later.

runOnGuiThreadAndWait {
    SplashScreen splash = new SplashScreen();
}
runOnGuiThreadAndWait {
    JMeterTreeListener treeLis = new JMeterTreeListener(treeModel);
    ActionRouter instance = ActionRouter.getInstance();
    splash.setProgress(40); // how do I access "splash" variable here?
}
...
runOnGuiThreadAndWait {
    treeLis.setActionHandler(instance); // <-- how do I access "instance" variable here?
    log.debug("init instance");
    splash.setProgress(70); // how do I access "splash" variable here?
}
runOnGuiThreadAndWait {
    GuiPackage.initInstance(treeLis, treeModel); // how do I access treeLis and treeModel variables?
    splash.setProgress(80); // how do I access "splash" variable here?
}

I do not see a reasonable way to access variables across runOnGuiThreadAndWait calls. Well, of course, I could wrap every variable in an AtomicReference<...>; however, it would look awful.

An alternative option is to nest the callbacks (so the variables are visible in the nested blocks); however, it would look even more awful, as the sketch below shows.
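
For illustration, roughly what that nesting would look like (runOnGuiThreadAndWait and runOnBackground are hypothetical helpers):

// Every hop between threads adds one nesting level, just so that
// `splash` (and friends) stay in scope:
runOnGuiThreadAndWait {
  val splash = new SplashScreen()
  runOnBackground { // hypothetical: hop back off the UI thread
    instance.populateCommandMap()
    runOnGuiThreadAndWait {
      splash.setProgress(60) // visible only because we nested
      // ... and so on, one level deeper per hop
    }
  }
}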


OTOH, if you yield only after setProgress(), then running the heavy somethings on the GUI thread will freeze it for some time.

In my Kotlin example, withContext(Dispatchers.Default) moves the heavy something off the UI thread. It does not break the code flow, and it can be added incrementally.

Just in case you wonder: the code was like that for ages, and I am not saying it is the best way to write UI code. I just ran into a case where Loom does not seem to help, since Loom always stays on a single thread.

runOnGuiThreadAndWait can be made to return a value from the provided function, so the code could look like:

val splash = runOnGuiThreadAndWait { new SplashScreen() }
val (treeLis, instance) = runOnGuiThreadAndWait {
    val treeLis = new JMeterTreeListener(treeModel)
    val instance = ActionRouter.getInstance()
    splash.setProgress(40) // ok
    (treeLis, instance)
}
// or alternatively
val treeLis = runOnGuiThreadAndWait { new JMeterTreeListener(treeModel) }
val instance = runOnGuiThreadAndWait { ActionRouter.getInstance() }
runOnGuiThreadAndWait { splash.setProgress(40) }
...
runOnGuiThreadAndWait {
    treeLis.setActionHandler(instance) // ok
    log.debug("init instance")
    splash.setProgress(70) // ok
}
runOnGuiThreadAndWait {
    GuiPackage.initInstance(treeLis, treeModel) // ok
    splash.setProgress(80) // ok
}

Still not as pretty as that Kotlin coroutine, but at least with the code above the computations are done outside of the GUI thread by default, so there’s hopefully less chance of mistakenly running a CPU-intensive operation on the GUI thread and freezing it.
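
In case it helps, here is a minimal sketch of how such a value-returning runOnGuiThreadAndWait could be implemented (a hypothetical helper, assuming Swing; exception propagation omitted for brevity):

import javax.swing.SwingUtilities

// Run `body` on the Swing event dispatch thread, block the calling
// (e.g. virtual) thread until it completes, and return the result.
def runOnGuiThreadAndWait[A](body: => A): A =
  if (SwingUtilities.isEventDispatchThread) body
  else {
    var result: Option[A] = None
    SwingUtilities.invokeAndWait(() => result = Some(body))
    result.get
  }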

1 Like

That is indeed better than AtomicReferences; however, it is still a bunch of runOnGuiThreadAndWait calls.

I wonder if placing an explicit runOnGuiThreadAndWait on each line counts as code coloring :thinking:

so there’s hopefully less chance of mistakenly running a CPU-intensive operation on the GUI thread and freezing it.

The flip side is an NPE in the UI from calling UI code from a non-UI thread.
Initially, JMeter started all the code from the main thread; then someone observed a crash (or something like that), then the devs wrapped the whole startup within a single runOnGuiThreadAndWait, then they realized the progress bar did not update, and they split the startup into two functions.

I just tried to slap a coroutine over that code, and it worked flawlessly :smiley:

It’s still useful to signpost functions that perform remote procedure calls. If I have a list of user IDs and want their information, I might use a user data provider and call one of its methods, which eventually makes a remote procedure call. If I do this in a loop (mapping over the list and calling the provider for each user ID in the lambda), my application will call the remote service sequentially, one ID at a time.

Futures at least warn me at the type level that I probably shouldn’t do that.
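
For illustration, a minimal sketch of the difference (UserProvider, UserInfo, and both fetch methods are hypothetical):

import scala.concurrent.{ExecutionContext, Future}

case class UserInfo(id: Long, name: String)

trait UserProvider {
  def fetchUser(id: Long): UserInfo              // the RPC is invisible here
  def fetchUserAsync(id: Long): Future[UserInfo] // the type signposts the RPC
}

// One hidden remote round-trip per element, strictly one at a time:
def fetchAllSequentially(ids: List[Long], p: UserProvider): List[UserInfo] =
  ids.map(p.fetchUser)

// The Future-typed variant makes the cost visible and invites concurrency:
def fetchAllConcurrently(ids: List[Long], p: UserProvider)(
    implicit ec: ExecutionContext): Future[List[UserInfo]] =
  Future.traverse(ids)(p.fetchUserAsync)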

On the other hand, networks are faster now, so the cost of remote calls has become less important for applications without tight time budgets.

1 Like

As a side note, I think https://yona-lang.org/ could be a cool source of inspiration when it comes to concurrency and parallelism. I mean mostly the automatic parallelism facilities:

yona-lang execution model

let
    keys_file = File::open "tests/Keys.txt" {:read}
    values_file = File::open "tests/Values.txt" {:read}

    keys = File::read_lines keys_file
    values = File::read_lines values_file

    () = File::close keys_file
    () = File::close values_file
in
    Seq::zip keys values |> Dict::from_seq

In this example, both files are read concurrently, without having to write any additional boilerplate.

How does Yona do this? A couple of things: first, check the difference between the do and let expressions on the syntax page. Both are used to evaluate multiple steps of a computation; however, do ensures that the steps take place in the same sequence as they are defined, while let tries to parallelize non-blocking tasks.

I don’t think Yona would be helpful here, as it’s dynamic.

If you do everything at runtime anyway, things become comparatively simple.

It’s interesting that people here are talking about the UI-thread coding issue, but this in my view is already a solved problem in Scala, as mentioned earlier, i.e. you can make your own ExecutionContext that points to the Swing event dispatch thread:

import java.util.concurrent.Executor
import javax.swing.SwingUtilities
import scala.concurrent.ExecutionContext

object SwingExecutionContext {
  // Runs every submitted task on the Swing event dispatch thread
  val executionContext: ExecutionContext = ExecutionContext.fromExecutor(new Executor {
    def execute(command: Runnable): Unit = SwingUtilities.invokeLater(command)
  })
}

And then, using standard Future composition, you can just do this:

for {
  _ <- something
  _ <- something
  _ <- something
  _ <- Future.successful(()).flatMap { _ =>
    for {
      _ <- showGui()
      _ <- setProgress()
    } yield ()
  }(SwingExecutionContext.executionContext)
} yield ()

The Future.successful is a bit weird, but it’s written that way because you are not passing any data from another Future to render in the UI thread; something like this is more realistic:

for {
  _ <- something
  _ <- something
  _ <- something
  _ <- getCurrentProgress().flatMap { progress =>
    for {
      _ <- showGui()
      _ <- setProgress(progress + 10)
    } yield ()
  }(SwingExecutionContext.executionContext)
} yield ()

Other IO types like Task/IO can also do this. The example shows how you can invoke computations outside of the main UI/event-dispatch thread (with Future this is typically a ForkJoinPool; in the above example it would use whatever implicit ExecutionContext you have in scope), and then explicitly provide your own ExecutionContext for just the parts that must run on the UI thread, since you don’t want to overload it.

3 Likes

Automatic batching for parallel execution is done by static analysis; at least that is what is written in the Yona execution-model docs:

Another important concept here is that the order of aliases defined in the let expression does matter. Yona doesn’t just randomly re-arrange them based on dependencies alone. If it did, it could for example close those files before they are ever read. That would be incorrect. Yona just uses static analysis of this expression to determine which aliases can be “batched” and actually batches the execution if they provide underlying Promises. Then the whole expression is transformed into something like this:

not sure how it works, though.

Scala 3 could probably use capture checking to determine subsequences of async computations within which the computations can run in parallel, e.g. if computations capture disjoint capabilities then they can run in parallel.

note: I haven’t spent a lot of time checking whether my idea would work properly, but if the yona-lang authors made something that works, then maybe Scala can do a similar thing too.
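
To make the idea concrete, here is a rough sketch with plain Futures standing in for what a capability-aware compiler might one day derive automatically (the readLines helper is mine; the file paths are borrowed from the Yona example):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.Using

def readLines(path: String): List[String] =
  Using.resource(scala.io.Source.fromFile(path))(_.getLines().toList)

// The two reads touch disjoint resources (different files), so they can
// safely run in parallel; today we must say so explicitly.
val keysF   = Future(readLines("tests/Keys.txt"))
val valuesF = Future(readLines("tests/Values.txt"))

val dict: Map[String, String] =
  Await.result(keysF.zip(valuesF), Duration.Inf) match {
    case (keys, values) => keys.zip(values).toMap
  }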

1 Like

There is also HVM, which does automatic parallelisation of purely functional programs (https://github.com/Kindelia/HVM); you may find it interesting.

To do something like this in Scala would require a way of tracking purity, i.e. the compiler would need to know that functions are pure/side-effect free, which currently isn’t possible (you can use types to designate that computations are pure, but the compiler just sees it as a type and nothing more).

Finally, you would need to see whether it’s possible for the JVM to show the same performance characteristics that HVM achieves via clang.

2 Likes

Well, it’s interesting, but limiting automatic parallelism to only purely functional code reduces its applicability too much.

Adding a soft keyword to solve a problem that will no longer exist soon after Loom is released is maybe not a good idea.

1 Like

I think I would frame the question differently; at least for me, “suspension” sounds quite abstract and hard to map onto the real world.

Maybe I’m oversimplifying, but isn’t the crux of the problem answering the question whether we want side-effecting and “normal” methods to have the same signature? (I tried asking the same on twitter some time ago, but without conclusive answers :wink: ).

If the signatures should be different, then the second step would be considering specific solutions. Coloring using IO, coloring using suspend, or coloring using capabilities - I think these are the propositions on the table.

If, however, the method signature shouldn’t tell us whether the method is side-effecting, and if we are targeting a Loom runtime - well, then neither suspensions nor capabilities are going to be useful.

Not that I know the “right” answers; however, I am leaning towards having side-effecting and “normal” methods distinguished by the type system. The reason is simple, and I think quite well known in the literature as the RPC fallacy. People did attempt to make a remote call look as if it were a local call, and as far as I know, they all failed. To the point that it’s now pretty well established that it’s a bad idea. To give more context, take a look at Jonas Bonér’s presentation. (Maybe Lightbend wasn’t so wrong about async after all :wink: But then, maybe they are talking about “async in the large”, not “async in the small”?)

It’s all about failure modes - the ways in which a remote call can fail are vastly different from the ways a local call can. And by the way - all file-reading operations are network calls as well. Shouldn’t we be tracking this in our type system? The side-effecting operations mentioned here would probably be more or less what we know today as “blocking” operations, but as blocking is no longer an issue with Loom, we need to make our focus more precise (and that’s a good thing!).

These side-effecting capabilities can be more or less fine-grained, but a direct consequence of a capability being required (or not) in a function is how the function might fail, which errors need to be handled, and how. And this has direct runtime consequences.
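
As a sketch of what such tracking might look like in Scala 3 (the capability traits below are entirely hypothetical):

// Hypothetical capability markers: a function’s signature now tells the
// caller which classes of failure it must be prepared for.
trait Network
trait FileSystem

def fetchQuota(userId: Long)(using Network): Int = ???
def readConfig(path: String)(using FileSystem): String = ???

// Requires both: reads a local file and then makes a network call.
def syncConfig(path: String)(using FileSystem, Network): Unit = ???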

In yet other words: I would first establish what kinds of properties we want to track through the type system in Scala. Reading through the (very interesting) proposal and discussion, I think there are quite divergent opinions. But only once a goal is set (not necessarily to unanimous applause) as to how far the type system should go can we consider the (secondary to safety) syntactical approaches: using either the wrapper, suspended, or direct style.

8 Likes

The interesting point to make here is that, by coincidence, in almost all cases async tasks also happen to be side-effecting; that is, pretty much every IO/file/network operation also happens to be a side effect. What this means in practice is that even if you deliberately avoid the colored-function problem for async tasks (i.e. you make no distinction between asynchronous and synchronous computations), if you still care about strongly typing your side effects then you pretty much end up re-creating the red-blue color problem anyway.

This actually describes the history and design of Haskell. That is, even if Haskell didn’t have virtual/green threads and solved the IO-bound vs CPU-bound computation problem in a different way, it would still have the IO type, because that’s how Haskell solves the “representing side effects in a purely functional language” problem.

I would argue that this is why making a big deal out of the red-blue/colored-function problem is a bit beside the point: if you accept the proposition that a significant portion of Scala programmers track side effects via types, then you end up marking your computations as asynchronous anyway, by accident. Which brings us to the final point: if there is significant (in practice usually almost complete) overlap between marking async functions and marking side-effecting functions, doesn’t it make sense to take advantage of this, since we would be solving two “problems” at once?

3 Likes

This is something commonly heard, but it doesn’t pass the sniff test.

  • The aws CLI command is called the same way as local CLI commands
  • The boto3 Python library is called the same way as local Python libraries
  • requests.get calls in Python look the same as any other method call

Yes, treating RPCs the same as normal methods can fail: in high-performance or high-concurrency scenarios where the thread overhead is unacceptable, or in high-reliability scenarios where the novel failure modes become significant.

But to say “they all failed” is absurd. There are more people happily using Python’s requests alone than there are in the entire Scala community. Probably the majority of the world is treating RPCs like normal method calls, and it generally works reasonably well.

Sure, sometimes treating RPCs as normal methods has caveats and overheads, and sometimes it falls apart, but that’s not unique to RPCs: every abstraction has caveats, overheads, and scenarios where it fails. That doesn’t mean abstractions are failures in general; it just means that specialized use cases sometimes call for specialized tools or techniques.

5 Likes

I think you might be comparing apples and oranges here.

CLI commands have a single (rather coarse-grained) path for handling failures (die with some error code), so the difference between aws and less failing is much less relevant than the difference in failure modes between a pure function and a database query.

The Python libraries are also not really equivalent comparisons, for a similar reason: idiomatic error handling in Python is to just throw an exception, so two Python functions which both have a return type and may or may not throw exceptions (but you’d better assume they do) aren’t a great analog for how failure modes are handled in idiomatic Scala.

It would make sense that, if async computations can be made so performant that the difference between a pure function and a side-effecting network call becomes invisible on the JVM, this would be a boon to Java applications; in this context, Loom replacing Future in Java applications makes a lot of sense.

However: having recently had to answer the question “how many ways can critical method X fail?” in a part of a codebase that (while written in Scala) used the exception-first style, I can say with certainty that moving to this sort of style would be a mistake.

4 Likes