PRE-SIP: Suspended functions and continuations

What do you mean, possibility to suspend? If you mean a “thread” of computation “waits” for some other “thread” of computation to do something, then this has nothing to do with sync vs async:

  1. Synchronous code suspends whenever it “blocks” (InputStream#read, URL#hashCode, etc). This is OS-level suspension, as the thread is put to sleep and woken up again when there is something to do.
  2. Asynchronous code suspends whenever a result is not immediately available (“semantic blocking”). This is runtime-level suspension, as the green thread (fiber) is put to sleep and woken up again when there is something to do. These suspensions are more efficient than OS-level suspensions.

If you wanted to track “possibly suspends” in the type system, it would have nothing to do with sync vs async, and everything to do with whether the thread / fiber / virtual thread / green thread is capable of entering a waiting state.

I personally do not think such tracking is useful: how is it actionable? How will it cause you to write different code? Most hypothetical use cases are better solved by timeouts (see @fanf’s post above).

Moreover, even if it were marginally actionable, i.e. you can contrive some example where this information causes you to write different code, you then have to ask the question: is the signal going to be visible amongst all the noise?

Here is a list of things that may “possibly suspend” (at the OS level, AKA “synchronous”, or at the virtual thread level, AKA “asynchronous”):

  • Putting an item into a queue or taking an item from a queue
  • Using synchronized around a method (e.g. Collection#add for some collections)
  • Reading from sockets or files
  • Computing the hash code of a URL
  • etc.

Due to the concurrent-safe design of many libraries and vast portions of the JDK, suspension points are innumerable, and occur at places you might not expect.
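
To make that concrete, here is a minimal sketch (plain JDK calls, nothing library-specific) of two innocuous-looking operations from the list above that can suspend the calling thread:

import java.net.URL
import java.util.concurrent.LinkedBlockingQueue

val queue = new LinkedBlockingQueue[Int](1) // capacity 1
queue.put(1)    // succeeds immediately
// queue.put(2) // would suspend this thread until space frees up

// java.net.URL#equals and #hashCode resolve host names, so this innocent
// comparison may suspend the thread for the duration of a DNS lookup
val sameHost = new URL("https://example.com/") == new URL("https://example.org/")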

Even if tracking “suspendability” were actionable (and I am not convinced that it is), how would you be able to find the needle in the proverbial haystack?

In summary, and in my opinion, sync vs async tracking is not useful, and even “suspendability tracking” appears not actionable, but even if it were actionable, it’s not clear that it would be practically so due to the huge number of suspension points in concurrent-safe code.

4 Likes

I agree with both your initial statements, and don’t see where that implies there’s value in tracking suspension in function signatures. As far as I can tell, they’re totally unrelated.

I think I covered that as well already:

Here’s a reason why you would want that: your main thread can call f directly. But calling g means that your main thread might suspend for a long time, in which case it cannot deal with other events. That’s obviously a bad idea. So, we want to know whether a function can suspend (or otherwise take a long time to run) or not. As Erik Meijer says: delay is an effect.

In other words: if you don’t go all-in with one thread per action, you will compose actions sequentially. And then it matters whether an action can suspend or not, since suspension is transitive.

EDIT: I really should stop now. I did not want to hijack the thread with that question.

You covered that, and I responded. The only reason “main thread cannot deal with other events” is a problem is limited thread pools, which cause either starvation or fairness issues. The limited thread pools are going away with Loom, along with these issues, so “main thread cannot deal with events” is no longer a problem.

If you don’t believe any of these steps in the reasoning, that’s fine: maybe you think Loom won’t land, maybe you think supporting non-Loom environments is important, maybe you think there are other reasons, apart from those given, that tracking suspension is important. But you haven’t said which step you don’t believe, so at this point we’re both just talking in circles quoting ourselves :stuck_out_tongue:

The whole thing boils down to: why is it such an obviously bad idea that the main thread cannot deal with events? We know why that’s true pre-Loom. I have given the reasons above. It has been true for 20 years. But post-Loom, the problem space has changed, and our reasoning should change along with it.

3 Likes

I think it’s important we discuss the value of tracking suspension. In my view, we lose safety if we don’t: we achieve direct style, but we are unable to track which operations may suspend, and we can run into unfortunate cases like the toString screenshot above. From the original proposal:

An example of such safety in practice

No, not at all. Even with an unlimited thread pool, a main thread that waits when calling function g, and that therefore cannot handle another event that is supposed to be treated later, causes starvation. You will say, sure, but that’s bad design. The events should be handled in different threads. To which I respond: yes, but how do you know that it’s bad design? You need to know whether a call can suspend (or otherwise take a long time to complete) or not. And then we get to the tradeoff of whether that should be in the types or treated informally.
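
To illustrate the concern with a minimal sketch (hypothetical handlers f and g, and a hand-rolled queue standing in for the main thread's event loop):

import java.util.concurrent.LinkedBlockingQueue

val events = new LinkedBlockingQueue[() => Unit]()

def f(): Int = 1 + 1                      // returns immediately
def g(): Int = { Thread.sleep(60000); 2 } // suspends for a minute

events.put(() => g()) // queued first...
events.put(() => f()) // ...so f waits a full minute, even though it is cheap

// the single "main" thread drains the queue sequentially
while (!events.isEmpty) events.take().apply()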

9 Likes

So here are my 2 cents on this topic

Firstly, this discussion, framed in the terminology of sync vs async, is missing the bigger picture: fundamentally we are debating concurrency and parallelism. To make things clear:

  • Concurrency: An abstraction that lets you represent the notion of performing multiple computations at the same time
  • Parallelism: A hardware/physical feature (i.e. cores/threads) that lets you actually perform multiple computations at once

Various programming languages, for both historical and design reasons, have confused/mixed these concepts, which underlies what is being discussed here. For example, Java (up until Loom) has used threads as its core concurrency model, which is great for CPU bound tasks since they bind directly to OS threads, but terrible for IO bound tasks, where your bottleneck is not your CPU being busy computing something but rather waiting for a response from the network/filesystem.

In my opinion, the ideal design, if you were to create a programming language from scratch, is to have an abstraction for concurrency (not parallelism!) as your core abstraction and then have various interpreters/implementations to handle the different cases (i.e. a ForkJoinPool for the IO bound tasks in your webserver/database/filesystem and raw Threads for your CPU bound algorithms). Even with their problems, this is what Scala and Rust do, although with different results: both Scala (with Future/IO/Task etc) and Rust (with async) let you represent your logic in purely abstract terms while giving you fine grained control over how to run those concurrent computations (i.e. executors/execution contexts). The big difference is that because Scala is based on a VM (the JVM) with seamless interop with a high level language (Java) that doesn’t have the necessary abstractions (pre Loom), there are performance overheads, whereas Rust transforms async code at compile time into a state machine that runs on your specified executor (this is the fastest way you can solve this problem).
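
As a concrete sketch of that fine grained control in Scala (the pool names and task bodies below are illustrative, not from any particular codebase):

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// a small fixed pool for CPU bound work, an elastic pool for blocking IO
val cpuPool = ExecutionContext.fromExecutor(
  Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors()))
val ioPool = ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

def crunch(): Long  = (1L to 10000000L).sum                          // CPU bound
def slurp(): String = scala.io.Source.fromFile("data.txt").mkString  // blocking IO

// the same Future abstraction, with explicit control over where each runs
val cpuWork = Future(crunch())(cpuPool)
val ioWork  = Future(slurp())(ioPool)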

Commentators parroting “green threads will solve everything” and pointing at languages like Go or Elixir are conveniently ignoring that for CPU bound concurrent tasks you cannot rely on green threads alone. Green threads may be cheap, but for a CPU bound concurrent task their extra cost is not cheap anymore. In more detail:

  • Erlang: Erlang actually has significant problems with long running CPU bound code; anyone with experience using it in a non-trivial setting will have come across this. In practice, Erlang solves this problem largely the same way Python does: the CPU bound functions are abstracted away behind an FFI that is often backed by C (for example, a lot of number based operations don’t use BEAM’s green threads, otherwise they would be pitifully slow). A quick Google of “erlang CPU bound” demonstrates this.
  • Go: It’s been well documented that if you have pure CPU bound tasks and you use goroutines as the concurrency primitive, it is notably slower than the C/C++/Java/Rust equivalent using raw threads.
  • Haskell: Haskell has virtual threads but actually distinguishes between IO bound tasks (in the IO monad) and synchronous tasks (not in the IO monad). Due to this, GHC (which also compiles to native code rather than a VM) can take advantage of the distinction. Also, people doing CPU bound concurrency in Haskell use STM, Control.Concurrent.Thread (which are standard threads), or other techniques.

In practice, this is why languages like Go and Elixir have found their niche in areas where IO bound concurrency is much more important (webservers/networking/etc) and not in CPU bound problems, where the JVM is still quite heavily used.

So really, what does this mean? In essence, it means that if you want a language (and a runtime) that handles both CPU and IO bound tasks performantly, you are going to have the colored function problem, or alternatively different abstractions for dealing with CPU bound computations (or both). You need some way of marking filesystem/network/IO tasks to be multiplexed onto a small number of threads, while CPU bound work (or cheap CPU tasks such as boxing of datastructures) is NOT needlessly multiplexed across threads and ideally runs on the same thread(s) with minimal communication between threads. If you only have green threads, then your language will be great for webservers but terrible for CPU bound number crunching algorithms, and vice versa. If you want more evidence that having only green threads is a terrible idea: this is what Rust initially tried, and it ran into all the problems I just described (being a new native language, Rust had the liberty of experimenting in great detail with different solutions, and it ended up scrapping green threads, its initial approach).

Ironically, the problem with Scala is that it has the basis for good abstractions to solve this problem in a principled manner, but because of language stability reasons and the JVM, its execution has been subpar (in contrast to Rust). For example, with Future it would be ideal if the Future.map operation would, by default, run on ExecutionContext.parasitic, since almost all of the time the .map operations are cheap CPU tasks that are best executed quickly on the same thread (this is the same reason why, when you read the akka/akka-stream codebase, pretty much every future.map operation is explicitly run on ExecutionContext.parasitic). For blocking calls there is the blocking function, but alas, since Future is in the Scala stdlib, it’s too late to change this default. The bigger point is that even though Scala has the necessary tools to solve this, because of its history (JVM and then multi-platform) there are issues in its execution/implementation.
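
For example, the parasitic pattern looks like this (a sketch; fetchUser is a hypothetical async call, and ExecutionContext.parasitic exists since Scala 2.13):

import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global

def fetchUser(id: Long): Future[String] = Future("alice") // stand-in async call

// the transformation is CPU-trivial, so run it on the completing thread via
// the parasitic context instead of paying for a hop to another pool
val upper: Future[String] =
  fetchUser(42L).map(_.toUpperCase)(ExecutionContext.parasitic)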

So, more to the topic of this SIP regarding suspended functions/continuations: I think this effort is going to be largely wasted, and there are many people in Kotlin actually arguing that adopting coroutines was a bit of a mistake. As mentioned earlier, there are very clear issues with HKTs/HOFs when it comes to suspended functions, and unlike in Kotlin/Java/Go, these are extremely common in Scala. Also, from personal experience using coroutines in Kotlin: while they are great for trivial linear code that you want to execute asynchronously, they are actually more difficult to reason about in more complex interruption/concurrency cases. I think it would be far wiser to spend the effort on coming up with new abstractions and/or improving the current ones we use, rather than jumping to coroutines, which work better in languages designed to be simpler (i.e. Go); those languages are not Scala.

Also, on another note, exceptions are not a replacement for value based propagation (i.e. Option/Either/Validation etc). I know that a lot of effort has been put into Scala 3 to make exception based error handling more principled and ergonomic, but no matter what is done, there is a whole set of problems where you don’t want to use exceptions whatsoever, because while throwing an exception is cheap, constructing one (filling in its stack trace) is extremely expensive. This is why, for example, you would never use exceptions for form validation on a webserver; these kinds of problems will always use value based error handling. I think it would be fair to say that even in modern idiomatic Scala code, exceptions only represent cases where your program enters a state from which it isn’t reasonable to continue, and if you do end up catching exceptions it’s typically only to log/print the exception+stacktrace and then crash or recover; otherwise you use Option/Either or some other validation type.

Why am I bringing this up? I am getting the impression that people here are arguing that we need to move away from monads such as Option/Either to try/catch/throw because they are “complex to understand” and “don’t scale well when composed with other effects” (such as Future/IO/Task etc). While it’s true that this is a difficult problem, having to work with monads (or, more specifically, value based error handling) is something we will always have to deal with in Scala, and brushing it under the rug because it’s “annoying” and “we don’t like dealing with it” isn’t doing anyone any favors.

In summary, adding continuations is solving the wrong problem (in the context of Scala), and there have been attempts at this before which have failed, e.g. https://github.com/scala/scala-async. While I am sure this proposal would be more ergonomic than scala-async, that is still not enough of a reason for people to pick it up.

18 Likes

That sounds… very confusing and easy to stuff up. Would you be able to provide a couple of examples showing how this might work in practice?

1 Like

ZIO 2 “solves” that problem by having a runtime that manages where the function is actually run for you.
JDG being present, he will be able to tell more about that, but I was the one who brought attention to that complicated point in ZIO 1. So, basically, there are three kinds of threads:
1/ an interpreter thread that is special and needs extreme care about how it handles blocking: it needs to be built by specialists and is part of the runtime capabilities themselves, like GC is a given of the JVM. That interpreter manages where code lives between two main kinds (and corresponding threadpools):
2/ the kind for “computations that run quickly, can take full advantage of a CPU, and don’t wait”. Note that you can have I/O things here, as long as they don’t block (for example, modern network interfaces are totally able to saturate a CPU, even if they are not pure computation);
3/ things that block.
“Blocks” is defined by some notion of progress whose details I don’t know. But the manager is able to observe, and even learn for future occurrences, which functions block, and migrate them from the CPU-bound threadpool to the I/O-blocking one.

As a user, I can help the manager by tagging a function as blocking, to skip the try/observe-progress/migrate step. And I’m sure that in the near future I will be able to tag functions for core affinity and the like (“this function is pure computation and I really want to take advantage of CPU caches”).
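
In ZIO 2 that tagging looks roughly like this (a minimal sketch, assuming a ZIO 2 dependency):

import zio._

// attemptBlocking declares up front that the effect blocks, so the runtime
// runs it on the blocking pool and skips the observe-and-migrate machinery
def readAll(path: String): Task[String] =
  ZIO.attemptBlocking(scala.io.Source.fromFile(path).mkString)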

There is inefficiency along the way (migration from one threadpool to another is costly, among other things). But these are the kind of inefficiencies similar to garbage collection: they allow me, the user, to never have to think “might that implementation of a third party lib block in a rare scenario?”, and just, by default, get good usage of available resources.

3 Likes

Unless I’m mistaken, this would still incur overhead: you have a runtime trying to figure out which tasks are CPU bound and which are IO bound, whereas explicitly defining which is which will perform better, since you don’t pay that cost.

I would say that comparing this to a GC w.r.t. inefficiency isn’t really an apt comparison. GC isn’t necessarily slower (in fact, in some cases it can be faster); the bigger problem with GC is that it doesn’t give you control over memory.

Most importantly, the JVM/Java/Scala is currently being used in contexts where this inefficiency is not acceptable, so while it’s fine to have a library/framework that makes things easier at the cost of performance, enforcing this at the language/VM level isn’t going to fly.

Do you remember that exactly the same thing was said about GCs when they were introduced? Exactly the same. And as you say, they now sometimes allow better performance (at the cost of a bigger footprint). You still use Rust where you need tight control over memory.

This is a very general and broad assumption. I would say that having Scala solve the “I’m not sure what is blocking, or for how long; optimize that for me” use case would be very compelling. Especially since, as soon as the runtime manages that, you can apply the same kind of innovation we got in GC land, and perhaps end up with better predictive / optimistic thread usage than what you could do by hand. In my case, the kind of app we are using Scala for benefits hugely from that.

But also, your last sentence is very important: ZIO demonstrates that it does not need to be done at the language level; it can be done with a library. Oh, there are certainly things that could be enhanced with shared knowledge between the language or the JVM (and actually, ZIO will benefit from Loom optimizations for better performance). But it can fly as a library.

1 Like

We know that it is not possible to say, in general, whether an arbitrary Scala function will “take a long time to complete”, because the language is too powerful. So what we are really talking about right now is “suspension tracking”.

Now, as for suspension, in this Discourse thread, all I have been hearing are arguments why async suspension should be “tracked in the type system”. Indeed, the PRE-SIP suggests only this.

No one is talking about sync suspension. In fact, it seems everyone is satisfied to track async suspension, and also completely ignore sync suspension.

Let’s take the following purely synchronous function, which would not be tracked under any proposal in this Discourse thread:

public byte[] readNext(int n);

A function calling this method will absolutely suspend nearly every time it is called with a sufficiently large n, because the runtime cannot guarantee to read the next n bytes within any definite amount of time. This is an operating-system-level synchronous suspension, mediated by the runtime, whereby the OS thread will be placed into a waiting state, to be awakened when a sufficient number of bytes has been read from the socket.

The key point: here we have a purely synchronous method that is suspending potentially forever.
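
For instance, calling the JDK socket API from plain Scala (a sketch):

import java.net.Socket

val socket = new Socket("example.com", 80)
val in     = socket.getInputStream
val buf    = new Array[Byte](4096)

// a purely synchronous call: the OS thread is parked, potentially forever,
// until the peer sends bytes; no type anywhere records this suspension
val n = in.read(buf)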

Yet, no one here is arguing that we should track this suspension in the type system. Instead, everyone seems keen on tracking “async suspension”.

If folks are happy only tracking async suspension, which every argument on this thread would seem to indicate is correct, then Loom reveals a glaring contradiction:

  1. Pre Loom, no one wants to track readNext in the type system (literally no one has suggested it!), despite the fact that it will almost always synchronously suspend.
  2. Post Loom, neither the signature nor the semantics of readNext will change, nor will its execution time change in any way.
  3. Post Loom, readNext magically becomes 100% async, and its synchronous suspension will change to async suspension.
  4. Suddenly, now everyone wants to track it, even though nothing changed?!?

This is the definition of an incoherent position.

IF there is value in tracking async suspension (which no one has demonstrated, principally by coming up with an example where a developer would write different code knowing an async suspension exists), then by virtue of the isomorphism between sync and async computation, there must exist exactly the same value in tracking sync suspension. Which no one is arguing for (!).

To be consistent, one must either say that compile-time data on both sync and async suspension is not actionable (which is my position, because I believe timeouts are generally the solution to any such contrived use cases); or that compile-time data on both sync and async suspension is actionable. Forcefully stated, there is no coherent position which gives special status to async suspension.

If some future version of Scala can track suspension, it must necessarily track sync and async suspension, treating them equally (because they are equal in every way that matters). Which means, yes, Collection#add and numerous other functions will have to be “tracked” in the type system as being able to suspend a (physical|virtual) thread. That’s a stupendously noisy future for a hypothesis that no one has yet convincingly demonstrated.

What I am strongly convinced of is that adding two-colored functions, which work poorly with higher-order functions and polymorphism, by importing a legacy Kotlin design made irrelevant by Loom’s ideal solution, is absolutely unnecessary and actively detrimental to Scala 3.

9 Likes

Ok I think I understand where you are coming from

“Moving slow operations off a key thread to avoid starvation of that specific thread” is a valid concern; it’s a common issue in UI programming and game programming, where heavy operations in the core event loop cause issues. I feel this all the time when I try to rename a file in IntelliJ, and it starts indexing the world on the main thread and locks up the IDE.

But as JDG has said, this has nothing to do with suspending or not: a heavy CPU-bound computation can also cause the key threads to be blocked problematically. As I’ve mentioned before, scala.Map#toString has personally caused me problems in the past, not because the code is unsuitable, but because the data shape was unusual with a much larger Map than most people would expect to work with.

In the end, starvation of individual threads due to long-running operations is a performance issue, and performance issues are notoriously hard to analyze statically. Even for human programmers, the general advice is “always profile first, don’t try to statically analyze the code”. Suspension, networking, IO, etc. are all red herrings here, because heavy CPU computation causes all the same issues. And the resolution is the same: profile it, identify the hotspots, and optimize them in place or move the long-running tasks (whether CPU or IO bound) to a separate (virtual) thread.

Given how difficult performance issues are to statically analyse, I think expecting the compiler to perform accurate analysis here with only static information is not a promising approach. The compiler doesn’t know:

  • The shape/size of the input data
  • How many cores the code will have available to parallelize things
  • Whether the filesystem is spinning-rust, SSDs, or in-memory tmpfs
  • Whether your JDBC query is going to an in-process SQLite, or to a postgres database 10,000 miles away
  • The “use case” of the code which could make some 1ms computations unacceptably slow (e.g. in a 144fps Game loop) while other 10s computations perfectly fine (e.g. in a once-a-day batch job)

All of these are things that can, and do, make-or-break whether a key thread is blocked for an acceptable amount of time or not. e.g. when IntelliJ’s indexing blocks my IDE UI thread and doesn’t block other people’s, it’s because of the shape/size of the input it’s handling, not because I’m running an inferior implementation of IntelliJ compared to others.

That’s not to say inaccurate analysis is not useful. The current rules of thumb about not blocking threads are kinda-sorta useful, even if there are false positives and false negatives (a community leader once told me that instead of using a blocking AWS SDK call in a Play controller in my O(0.0001) qps college toy project, I should re-implement the entire AWS SDK in an async fashion…). But this sort of inaccurate analysis seems like something that belongs in a linter (with appropriate configurability to properly suit individual codebases, and @nowarn escape hatches to override the linter when it’s wrong) rather than built deeply into the language spec and compiler.

9 Likes

In an ideal world one would track this case, but the reason why (at least personally) I don’t advocate for it is that in most real world situations it isn’t really possible: it typically arises where you have large input data, something only the programmer knows about (@lihaoyi’s example of calling .toString on a large map, or a case I had recently of computing diffs on data structures that are 30mb-plus in memory). For such tasks you still ideally want the ability to designate how they will run; for example, you may want to execute them on a pinned separate thread so at least you don’t stall the current thread. BTW, these cases happen all the time in gaming and, to a lesser extent, UIs: you have heavy CPU bound computations that in a lot of cases only the programmer knows about. This reminds me of when I built a Scala based GUI using Swing some time ago. I was using Scala’s standard Future as my IO type, and I had an ExecutionContext that represented the UI thread, which gave me fine grained control: rendering happened only on the UI thread, while other heavy CPU bound/async tasks ran separately, as in the sketch below.
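
Roughly, that setup looks like this (a sketch with hypothetical names, not the original code):

import javax.swing.{JLabel, SwingUtilities}
import scala.concurrent.{ExecutionContext, Future}

// an ExecutionContext that schedules continuations onto the AWT event thread
val uiThread: ExecutionContext = new ExecutionContext {
  def execute(r: Runnable): Unit = SwingUtilities.invokeLater(r)
  def reportFailure(t: Throwable): Unit = t.printStackTrace()
}

val label = new JLabel()
def expensiveDiff(): String = "..." // stand-in for the 30mb diff computation

// heavy work runs on the global pool; only the rendering touches the UI thread
Future(expensiveDiff())(ExecutionContext.global)
  .foreach(result => label.setText(result))(uiThread)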

This still, however, doesn’t detract from the fact that marking computations which we know should be run “asynchronously” (i.e. IO/network/filesystem) is useful.

Also strongly agreed: while there is a lot of legitimate debate about how to tackle this problem, and it’s not an easy one to solve, coroutines are a solution that doesn’t play well with Scala’s design and how it’s idiomatically used.

4 Likes

@jdegoes @lihaoyi I think the main reason why people are more concerned about suspensions than long-running computations is that a long-running computation is usually identified during testing but waiting for an external event can have dramatically different outcomes depending on test vs production environment, system load, state of the network, user behavior, etc. It’s not an absolute, for sure. But if we take the Microsoft guideline as an example, then even long running computations that are identified as such could be treated as if they are suspending.

2 Likes

I think the Microsoft guidelines are reasonable, but it is worth noting that they don’t have lightweight threads on the CLR. That sort of guideline is reasonable on the pre-Loom JVM as well.

Basically, Loom removes the applicability of these guidelines for avoiding thread starvation in “high-concurrency multi-threaded systems”, which includes most backend systems, web services, API servers, etc. For those, we can just auto-scale the thread pool if some threads get blocked, with Loom letting us do that cheaply and efficiently.

These guidelines continue to apply to “high-concurrency single-threaded systems”: UI threads, Game loops, code running within certain kinds of Actors, and environments like Scala.js. These are scenarios where “throwing more threads at it” is not possible, and work needs to be moved off-thread manually.

The current guidelines in the Scala community are overwhelmingly targeted at the multithreaded use case, for people developing backend systems and servers. There is not that much UI development or game dev happening in Scala, and Scala.js remains niche. That leaves code running in single-threaded Actors, but only those for which latency and responsiveness are important.

IMO this isn’t a sufficiently broad use case to make a heavy investment in, but that’s a subjective judgement. I think we’ve reached mutual understanding and there isn’t any more to discuss :slight_smile:

5 Likes

Then I would suggest the conversation move from tracking async suspensions (which is inconsistent or even incoherent, as demonstrated above), to tracking indefinite-length computations, for which tracking both sync and async suspensions could be regarded as a poor man’s proxy.

If the goal is to track indefinite-length computations, then I would regard that as prime topic for future research, and personally, do not see how that relates to Loom, Kotlin coroutines, etc.

Incidentally, I share @lihaoyi’s opinion that the value of tracking indefinite-length computations is virtually gone in a post-Loom world.

Indeed, it is already gone for those using functional effect systems like ZIO.

If I am writing the following code:

for {
  bytes      <- drainStream(request)
  transcoded <- doTranscoding(bytes)
  _          <- uploadToS3(transcoded)
} yield Response.OK

Then for every statement, there exist two possibilities:

  1. I need the result of this statement in order to continue to the next statement.
  2. I do not need the result of this statement in order to continue to the next statement.

Note that this is a question of data flow, and fully resolvable statically.

If I care about latency (game and UI apps are excellent examples, but even in a microservice or API backend, latency matters a lot), then in any case where (2) holds (that is, in any case where the result of some computation is not needed in order to continue to the next statement), I will execute such a qualifying statement in the background.

Using ZIO, I would transform the code to the following:

def doProcessing(bytes: Array[Byte]) = 
  for {    
    transcoded <- doTranscoding(bytes)
    _          <- uploadToS3(transcoded)
  } yield ()

for {
  bytes <- drainStream(request)
  _     <- doProcessing(bytes).forkDaemon
} yield Response.OK

In this refactoring, I am respecting sequential ordering in cases where subsequent statements depend on prior statements. In other cases, I am shifting work to new background fibers, which are so cheap you should always use them.

In a post Loom world, this is the new paradigm for low-latency: if you need the result from a previous computation, then you must perform it sequentially after that statement. But if you do not need the result from a previous computation, then you may, and often should, perform that work in the background.

Here’s the kicker: Tracking “suspendability” is neither a necessary nor sufficient condition for performing work in the background. For example, draining the stream will be done on a thread (or virtual thread) that suspends (either synchronously or asynchronously). Yet, we need the result of draining the stream in order to proceed to the transcoding step. So the fact that draining the stream may suspend is irrelevant to our sequential logic. Yet, to return the OK response, we do not need to wait for the transcoding or uploading to complete, so we push that computation into the background on a new virtual thread.

ZIO (and of course Loom) make it so cheap to do background processing that the new paradigm is: if you need to do something sequentially, then you do it sequentially; if you don’t need the result to make further progress, then you push it onto a virtual thread.

At no point do we need to understand or care about whether an OS thread is synchronously suspending, or whether a virtual thread is asynchronously suspending. That is not a relevant consideration, and even if you argue there is a poor man’s proxy there (a heuristic), I can show innumerable examples that demonstrate how weak that proxy is (e.g. Collection#add doing synchronous suspend pre-Loom; URL#hashCode doing asynchronous suspend post-Loom; etc.).

In summary:

  1. Sync suspension and async suspension must be treated together, never separately; any tracking proposal must consider them equivalent in every possible way (they fuse into the same concept under pure green threading models, such as the one Loom is almost giving us).
  2. In the new paradigm of cheap virtual threads, as argued by @lihaoyi, if we don’t need some result in order to make progress, then in any low latency application (not just games and UI), we will push that result into the background. This is a data flow question that has a statically analyzable answer, but it has nothing to do with tracking sync + async suspension in the type system.

Precision and clarity of thought is extremely important to get to the heart of the matter, which I think we have gotten to after much discussion.

4 Likes

I would like this thread to focus on the proposal of having support for continuations in the language instead of discussing the impact of Loom on functional effects. So, I moved a post to a new topic: Impact of Loom on “functional effects”. Please continue that discussion there!

This is something that I think isn’t quite so trivial. Simple static dataflow analysis makes it easy to statically move stuff onto cheap virtual threads, except:

  1. Side effects are still prevalent throughout Scala. Not as common as in other languages, but enough that parallelizing automatically is risky business. Even when I’ve parallelized stuff manually, I regularly get bitten by side effects I didn’t notice

  2. Granularity & Overhead: virtual threads are cheap, but not free. IIRC they’re something like ~1kb each, vs. ~1kb per Future and ~1mb for OS threads. That means you can’t go around parallelizing every 1 + 1 in your program without drowning in overhead, and you have to pick some granularity of work below which you won’t parallelize

It’s easy to parallelize things in Scala thanks to its FP-ish nature, and Loom makes it easier by making threads cheaper, but I don’t think we’ve reached the point where we can feasibly automate it yet. At some point, you have to make subjective judgements about what to send to background threads and what to run on your current thread.

Deciding what to parallelize is a performance issue, and one of the challenges of performance issues is that a decision that works perfectly in one environment with a given set of inputs can totally fall apart in a different environment on a different set of inputs.

6 Likes

Most users of functional effect systems are not relying on them primarily for async programming. Rather, they are relying on them for concurrency (compositional timeouts, races, parallelism), resource-safety in the presence of concurrency, typed errors, context, fiber-local state.

For me, as an author, this was the point of proposing continuations. To encode functional effect systems and use them in a direct style, for more use-cases than asynchronous programming:

We would like to write scala programs in a direct style while maintaining the safety of the indirect monadic style.

We’d like to not have to wait until Loom-capable JDK/JVM usage reaches the majority of deployed production systems. Currently, only 12% of Java users are on JDK 15 (Source). It could be a very long wait for Loom to land. Perhaps it will be backported, but that is unknown.

How we may encode functional effects and program them with a direct style, given the current absence of Loom on most deployed JVM systems and on scala-js/native, is the purpose of this pre-proposal.

5 Likes

I agree I oversimplified the issue to focus on data flow. You do not always want to push every statement (whose result is not necessary for making further progress) onto a background thread.

That said, keep in mind the heuristic would not push 1 + 1 to the background, because that produces an Int that is required for subsequent computation. Rather, it’s generally Unit-returning methods or side-effecting processes (uploading a file, doing CPU processing, etc.) whose results are unnecessary.

(In modern loggers, even log(str: String): Unit is effectively pushed onto a background thread via queuing. As a heuristic, “returns Unit” would be a vastly better criterion than “does async / sync suspension”, because you usually need the result of a non-Unit computation to make further progress.)
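
A minimal sketch of that pattern (a hypothetical AsyncLog, not any particular logging library):

import java.util.concurrent.LinkedBlockingQueue

object AsyncLog {
  private val queue = new LinkedBlockingQueue[String]()

  // a daemon thread drains the queue in the background
  private val writer = new Thread(() => while (true) println(queue.take()))
  writer.setDaemon(true)
  writer.start()

  // returns Unit immediately; the caller never waits on the actual write
  def log(str: String): Unit = queue.offer(str)
}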

For sure. Maybe this is also a good point to think about what tooling could bring to this problem: it’s one strategy to force a programmer to add type annotations to manually propagate statically-known information about runtime behavior (suspendability, big-oh, little-oh, exception types, etc.); and another, quite different strategy to expose runtime behavior via tooling.

Tooling has the advantage that it’s less work for a developer: you do not need to engage in the boilerplate of propagating statically-known information by manually typing ASCII characters (which is probably the most common complaint about statically-typed exceptions in Java).

Imagine a tool that looks at your code and says: “uploadToS3() returns Unit, which is not used by subsequent statements, and on average takes 24 seconds to complete. Do you want to execute this computation on a virtual thread? [Yes] [No]”

2 Likes