Impact of Loom on “functional effects”

In essence, this argument reduces to: "Let's make it difficult to use loops, so that a user has to think about sequential versus concurrent execution."

I sympathize with the goal of encouraging developers to think about concurrency opportunities, but I think the solution is not to delete loops or to make them painful, but rather, to make it easy to specify that a loop should be parallel if desired by an end-user. There are both language-level solutions to this problem, as well as library- and framework-level solutions (e.g. ZIO.foreachPar). Indeed, future JEPs will have their own new solutions to this problem.
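
As a minimal sketch of the library-level option mentioned above (ZIO 2 naming assumed; fetchUser is a hypothetical remote call), sequential and parallel loops differ only in the operator chosen:

import zio._

def fetchUser(id: Int): Task[String] =
  ZIO.attempt(s"user-$id") // stand-in for a remote call

val ids = List(1, 2, 3)

val sequential = ZIO.foreach(ids)(fetchUser)    // one at a time, in order
val parallel   = ZIO.foreachPar(ids)(fetchUser) // all at once, result order preserved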

You do not want to parallelize every remote computation, even when you can (e.g. events.foreach(insertTimestamped(_))), and you do not want to sequentialize every local computation (e.g. chunk.map(decodeJson(_))).

Perhaps developers should be thinking more about opportunities for concurrency, but that is not a reason to complicate the ordinary for loop with wrapper types whose net result is to delete the simple for loop and force something like Traverse#traverse[F[_]] on users.

In summary:

  1. The fact that some looping operations can benefit from concurrency is not a reason to track async or even remote operations in the type system, or to delete classic loops (or higher-order functions like map/foreach), or to make it more difficult to use ordinary loops.
  2. Languages, libraries, and frameworks all have their own diverse and opinionated solutions to encourage programmers to benefit from concurrent code, and innovation will continue to produce new higher-level abstractions that can more easily hint, suggest, or enforce concurrency where useful.
2 Likes

Can you please provide us with your definition of “sync” and “async”? I suspect they are nonstandard, and that this is why people have been talking past each other in the past discussions.

In everything you say, it seems one could replace “async” by “using green threads” and “sync” by “using OS threads”. I hope I am wrong and I misread you, because this is not at all the accepted definition.

I found the analogy on this Microsoft documentation page quite nice:

If you have experience with cooking, you’d execute those instructions asynchronously. You’d start warming the pan for eggs, then start the bacon. You’d put the bread in the toaster, then start the eggs. At each step of the process, you’d start a task, then turn your attention to tasks that are ready for your attention.

I think asynchronous as most people understand it is just a style of programming where you launch tasks and continue doing things immediately, without waiting. There are several ways of implementing that style: callbacks are one; async/await is another. Both can use a thread pool or physical threads behind the scenes – using the latter does not make a program written in asynchronous style “sync” programming. So asynchronous programming has nothing to do with whether green threads or OS threads are used by the implementation.

I will quote JEP 425:

While [modern async designs] remove the limitation on throughput imposed by the scarcity of OS threads, it comes at a high price: It requires what is known as an asynchronous programming style, employing a separate set of I/O methods that do not wait for I/O operations to complete but rather, later on, signal their completion to a callback. Without a dedicated thread, developers must break down their request-handling logic into small stages, typically written as lambda expressions, and then compose them into a sequential pipeline with an API (see CompletableFuture, for example, or so-called “reactive” frameworks). They thus forsake the language’s basic sequential composition operators, such as loops and try/catch blocks.

The fundamental distinction between synchronous and asynchronous programming is that synchronous code returns its result directly:

def getUser(id: UserId): User

On the other hand, asynchronous code returns its result indirectly, by invoking a callback (don’t call me, I’ll call you):

def getUser(id: UserId, callback: User => Unit): Unit

In asynchronous programming, this could be termed an asynchronous return, to draw attention to its semantic equivalence with synchronous return.

Now, await/async or even for comprehensions slightly muddy the waters, because when you use such higher-level machinery that is built on callbacks, you do not see the callbacks, either at all, or very clearly. That is not an accident, it is by design: we wish to minimize the visible introduction of callbacks, so we can avoid callback hell.

But that await/async or even a for comprehension can hide callbacks does not change the fact that these systems are asynchronous precisely because they are implemented using callbacks.

So, in summary, sync programming is a style of programming whereby we invoke (synchronous) functions that synchronously return their values to us, and async programming is a style of programming where we invoke (asynchronous) functions that asynchronously return their values to us through the mechanism of callbacks, whether that is visible or hidden.

The reason the industry adopted asynchronous programming, despite the fact that it is a more difficult style to program in (even with added layers), is scalability. Quoting JEP 425 again:

Some developers wishing to utilize hardware to its fullest have given up the thread-per-request style in favor of a thread-sharing style. Instead of handling a request on one thread from start to finish, request-handling code returns its thread to a pool when it waits for an I/O operation to complete so that the thread can service other requests.

We program with callbacks because we must, not because it is our preferred programming model. Our preferred programming model is synchronous programming.

Now, over time, async machinery has evolved to replicate sync machinery:

  1. Just like synchronous code can “block”, so too can asynchronous code “block”. In an asynchronous context, “blocking” means that there exists a potentially infinite delay between the registration of a callback and the invocation of the callback, where such delay is caused by the result not yet being available from some external system (e.g. a response to a request, a chunk of data, etc.).
  2. Just like synchronous code “suspends” when it blocks, so also asynchronous code “suspends” when it blocks.
  3. Most async systems have replicated exceptions, including exception handling, and in some cases, even finally.
  4. Just like asynchronous code may never resume (which corresponds to the callback never being invoked), synchronous code may never resume (e.g. while(true){}).
  5. Etc.

At this point, for every useful or necessary feature in synchronous programming, there exists an analogue in asynchronous programming, whose implementation may be quite different, because it is based on callbacks, but whose semantics are identical, at least up to the limitations of callback-based programming (e.g. async stack traces are notoriously difficult because they require runtime support).
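
To make the asynchronous notion of “blocking” concrete, here is a small sketch using the standard library’s Promise: registering the callback returns immediately, and the “block” is the unbounded delay before someone completes the promise.

import scala.concurrent.{ExecutionContext, Promise}

val result = Promise[Int]()

// Registration returns immediately; the callback runs whenever (if ever)
// someone calls result.success(...) -- that gap is the asynchronous "block".
result.future.onComplete(r => println(r))(ExecutionContext.global)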

Now, Loom has arisen precisely because we would like to have our cake and eat it too:

  1. We wish to achieve the scalability of systems built on callbacks.
  2. We wish to program synchronously, in a direct style, without callbacks.

Loom gives us this magical combination through virtual threads, which have the same computation model as ordinary threads, but without the high cost. Effectively, this lets us create large numbers of threads, which “block” all the time, but this “blocking” is implemented inside the JVM, and therefore, it does not have to block operating system threads, which enables a small number of operating system threads to execute the work of large numbers of virtual threads.
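
A minimal sketch of what that looks like, assuming a Loom-enabled JDK (the virtual-thread APIs from JEP 425):

import java.util.concurrent.Executors

val executor = Executors.newVirtualThreadPerTaskExecutor()
(1 to 100000).foreach { _ =>
  executor.submit(new Runnable {
    def run(): Unit = Thread.sleep(1000) // parks the virtual thread, freeing its carrier OS thread
  })
}
executor.shutdown() // the sleeping tasks keep running; no OS thread is pinned while they sleep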

Async reimplemented JVM threading in user-land code, to achieve high scalability, but Loom just makes JVM threading highly scalable.

Every async system implements something like a “fiber” (green thread implemented in user-land), even if it doesn’t expose a first-class value that represents the running computation.

However, let me be clear: when I say, Loom makes everything async, I am being imprecise: while Loom threads have an async (callback-like) implementation inside the Java runtime, to anyone building on the JVM, Loom code is synchronous code, written in a purely direct style.

To be precise, I would say Loom makes threads and blocking cheap, by giving us green threads. I take the shorthand form because the idea that “async is scalable, sync is not” is etched into everyone’s brains, due to decades of wrestling with JVM physical threads, and I want to emphasize that Loom gives us all of that scalability by baking into the JVM everything that we were doing manually with callbacks.

I understand that is a common perception, but it is imprecise: technically speaking, asynchronous programming is about callbacks, not concurrency, and it is possible to launch tasks and continue doing things immediately, without waiting, in both asynchronous and synchronous styles.

For example, on pre-Loom JVM, I can write code like:

def fork[A](a: => A): Unit = new Thread() { override def run(): Unit = a }.start()

fork(uploadFileToS3(file))
fork(...)
fork(...)

This is purely synchronous code and will block operating system level threads (limiting scalability), but it is also code that is launching tasks and continuing to do things immediately, without waiting.

Concurrency is the technically correct word to use when describing the interleaving of multiple independent strands of sequential computation, and concurrency is possible both in synchronous code, as well as asynchronous code.

Future gives us an async data type, all of whose operations are implemented using callbacks, and which immediately submits the execution of user-defined code to a thread pool, thereby initiating concurrent execution of that code.

That’s a design choice of Future, and not a necessary design choice in the landscape of async systems. Indeed, the functional effect systems like ZIO or CE or Monix make a different choice: to separate async operations from concurrent operations.
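
A rough sketch of that separation (ZIO 2 naming assumed; fetchUserName is hypothetical): a Future starts running as soon as it is constructed, whereas a ZIO value is only a description, and concurrency is a separate, explicit step.

import scala.concurrent.{ExecutionContext, Future}
import zio._

def fetchUserName(): String = "alice" // stand-in for some effectful call

implicit val ec: ExecutionContext = ExecutionContext.global
val alreadyRunning = Future(fetchUserName()) // submitted to the pool immediately

val described = ZIO.attempt(fetchUserName()) // nothing runs yet
val forked    = described.fork               // concurrency only happens when explicitly requested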

2 Likes

This is why I mentioned earlier that we should be using more precise CS terminology, such as concurrency and parallelism.

This is incorrect: Future does not work with callbacks (at least if we are talking about composition/working with Futures). Callbacks are anonymous functions (aka thunks) that are passed directly into function calls. The example below, taken from http://callbackhell.com/ (the JavaScript site that coined the term “callback hell”), shows what that looks like:

fs.readdir(source, function (err, files) {
  if (err) {
    console.log('Error finding files: ' + err)
  } else {
    files.forEach(function (filename, fileIndex) {
      console.log(filename)
      gm(source + filename).size(function (err, values) {
        if (err) {
          console.log('Error identifying file size: ' + err)
        } else {
          console.log(filename + ' : ' + values)
          aspect = (values.width / values.height)
          widths.forEach(function (width, widthIndex) {
            height = Math.round(width / aspect)
            console.log('resizing ' + filename + 'to ' + height + 'x' + height)
            this.resize(width, height).write(dest + 'w' + width + '_' + filename, function(err) {
              if (err) console.log('Error writing file: ' + err)
            })
          }.bind(this))
        }
      })
    })
  }
})

As you can see with the fs.readdir/files.forEach/widths.forEach functions, they accept another function as the last argument, which is your callback. Futures do not take anonymous functions in their arguments; it’s completely different.

What Future/Promise allows you to do is convert from callback style to monadic map/flatMap (or, in JS land, .then() style), which, along with syntax sugar/yield/generators, gets rid of the nesting. Even the Scala.js documentation has a section on how to use Futures to get rid of callbacks (see the Future section in From ES6 to Scala: Advanced - Scala.js)
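
A small sketch of that conversion: a hypothetical callback-based readFileAsync is wrapped into a Future via Promise, so that callers can compose with map/flatMap instead of nesting callbacks.

import scala.concurrent.{Future, Promise}

// Hypothetical callback-style API, as in the JS example above.
def readFileAsync(path: String)(onDone: Either[Throwable, String] => Unit): Unit = ???

def readFile(path: String): Future[String] = {
  val p = Promise[String]()
  readFileAsync(path) {
    case Right(contents) => p.success(contents)
    case Left(error)     => p.failure(error)
  }
  p.future // from here on, callers use map/flatMap/for instead of nested callbacks
}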

I can’t even respond to this, so I won’t.

1 Like

Multiple people are already confused by the redefinition of terminology, so this is not exactly helpful.

1 Like

Agreed on most points, especially preserving classic loops. Not entirely convinced about blocking in .map; I think the end effects might be surprising, so I would still see value in being explicit about sequencing, or not. But I’ll happily try another approach.

Which is exactly the same as Future, except you don’t need to define fork:

Future(uploadFileToS3(file))
Future(...)
Future(...)

I agree with your definition of async/sync vs. concurrency–that seems reasonable to me.

But the problem with your example is that this doesn’t capture the reason why async is used for concurrency. This is concurrency, but without any possibility of using the results. There are a few limited use-cases where that’s adequate. Usually it is not.

The key operation of useful concurrency is not fork. It is join with a return value.

What Project Loom will allow, as I understand it, is simplifying chaining, and in that way it could sort of replace Future, though I haven’t seen you give a clear example of it. Suppose we have a value a and operations f: A => B and g: B => C. When we use Future we have two choices:

Future(f(a)).map(g)

and

Future(g(f(a)))

The former adds annoying boilerplate; the latter is simple sequential style. But the latter is a bad use of Future if both f and g take a lot of time and you’re already using your full allotment of OS threads, because g(f(a)) will starve other computations for longer.

With Project Loom, you can do g(f(a)) and not worry.

But what’s really simple sequential style is this:

val b = f(a)
val c = g(b)
val d = h(a)
val e = i(c, d)

And that is not something that Project Loom will help with very much. The Future analog would be something like

val b = Future(f(a))
val c = b.map(g)
val d = Future(h(a))
val e = Future(i(Await.result(c, Duration.Inf), Await.result(d, Duration.Inf)))

and it’s that last bit–a join, essentially–that is all the hassle, and which Project Loom won’t really de-hassle in any important way as far as I can tell. And it is the necessity of handling this case that makes it very important to have types tell you what is what, because you may forget to wrap it in a Future otherwise (assuming that i is fast, and there’s other work you can go on to do sequentially).

(You could also write val e = d.map(v => i(Await.result(c, Duration.Inf), v)).)

4 Likes

It is a hassle with Futures. And part of the reason it is a hassle is that Await.result is intentionally made exceedingly verbose, to discourage usage. And we want to discourage usage, because with expensive threads, Await.result is a very expensive operation, and we want to encourage cheaper operations like map/zip/sequence/etc. where possible.

With Loom, threads are no longer expensive, Await.result is no longer expensive, we no longer need to encourage map/zip/sequence, and Await.result doesn’t need to be verbose anymore:

val b = Future(f(a))
val c = b.map(g)
val d = Future(h(a))
val e = Future(i(c(), d()))

val e = d.map(v => i(c(), v))
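
For illustration, the c()/d() syntax above could be expressed today with a small (hypothetical) extension, which only becomes reasonable once blocking a thread is cheap:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

object FutureSyntax {
  // A sketch only: a "cheap join" that is fine on virtual threads,
  // but would pin a scarce OS thread on a pre-Loom JVM.
  implicit class FutureJoin[T](private val self: Future[T]) extends AnyVal {
    def apply(): T = Await.result(self, Duration.Inf)
  }
}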

There is in fact precedent for such an API. The original API for Scala Actors and Futures is exactly this!

This only changed in Akka and Finagle futures (and thus the Scala std lib) due to the requirement to conserve threads, perhaps with a bias towards serving very-high-performance/very-high-concurrency/very-high-scalability environments that Lightbend and Twitter are targeting. (Even today, most systems are small enough to never hit these perf/concurrency/scalability bottlenecks)

With Loom, the API of Futures would look very different from today, as would the usage style and best practices. It would look much like the original Scala futures API: forks and joins would be cheap, both syntactically and in resource footprint. There would still be value in a Future type, but it would look much more like Philipp Haller’s original API, and without the combinator/for-comprehension soup we have to deal with today

3 Likes

Well, this is kind of rephrasing the problem: the reason why Future is a hassle has little to do specifically with Await.result and much more to do with the coloured function problem. map/zip/sequence is not really a hassle; what is a hassle is that if you are, for example, using a blocking web framework and you decide to use Futures, then you need to use Await.result because your web framework doesn’t accept Future (or some other asynchronous type) as a final value in your routes. If you are using a fully asynchronous framework, outside of tests you would typically never use Await.result apart from possibly at the edge of your program (i.e. in your main). If you are using a proper asynchronous web framework like http4s or akka-http you don’t ever use Await.result or its equivalent.

In any case, if you use purely functional types such as IO/Task or ZIO you still have to use map/traverse/sequence (in the case of ZIO there are aliases with more familiar names, such as foreach, but in terms of category theory that is still traverse).

Note that in my response I assume we are making a distinction between hassle and familiarity. Scala’s “monadic” style with map/flatMap + for comprehension is definitely less familiar (although generators in ES6 have made the concept more mainstream), but the hassle part comes from the coloured function problem, not Await.result specifically.

I wouldn’t downplay the “very-high-performance/very-high-concurrency/very-high-scalability environments”; it is the major reason why node.js is even a thing (aside from JS familiarity). node.js’s whole shtick, and the reason it even exists, is solving that exact problem.

1 Like

Okay, this is partly a good point–the verbosity goes away.

But the conceptual complexity of keeping track of joins doesn’t go away. join is the only place where the callback-nature of async necessarily manifests itself.

Chaining can potentially run into the problem if you use map style or forget to wrap, versus wrapping the entire computation. But join always has the problem.

2 Likes

I agree (the noisy syntax is something of a distraction but calls attention to what’s happening), but my point is equivalent to saying that when you have joins, there is no simple way to escape the colored function problem. Loom isn’t it until it’s so cheap that everything can be transparently async/concurrent, and Haskell has shown us that even with one of the most amazing compiler-level optimizations around, you can’t even make everything lazy (which is the cheapest type of non-vacuous async I know of) without a substantial performance penalty.

(Everything transparently async/concurrent = each operation starts a new thread that awaits the results of everything needed for that operation.)

2 Likes

But having a cheap Future[T]#apply(): T does solve the colored function problem. The problem with colors is not just that you have two colors, but that you have two colors and you cannot call functions of one color from the other, and thus the former color goes viral.

In the status quo Scala, the problem is you cannot call async methods from non-async methods (given we are avoiding Await.result), and so Future[T] goes viral and your whole app ends up in Future. It’s the virality that’s the headache, not the mere presence of two different types. It’s the fact that if I have a big non-Future codebase and I want to call one thing deep inside that returns a Future[T], I have to rewrite/refactor the whole call stack to wrap it in Future

With Loom and cheap joins, you have Future.apply(t: => T): Future[T] to convert one way and Future[T]#apply(): T to convert the other way. You have two colors, but you no longer have the “one colored function cannot call the other color” problem. Thus the main problem with two-colored functions goes away. Yes, you have two colors, but they are trivially convertible and interoperable, and you can use whichever of Future[T] or T is a best fit for each piece of code, without the annoying virality

This also goes into something that JDG has brought up repeatedly. Future may not disappear, but a post-Loom Future would look very different from pre-Loom Futures: cheap threads, cheap joins (both perf-wise and syntactically), less (though not quite zero) need for combinators, no more virality, etc… The need for a “type that represents in-progress computations” would not disappear, but the need for the current implementation and interface of scala.concurrent.Future may diminish substantially (with remaining usage inside highly-concurrent single-threaded environments: Scala.js, game-loops, UI-loops, etc)

4 Likes

Well, this could be, but JDG was also saying that keeping track of it in types wasn’t necessary. I don’t think that’s true at all.

Anyway, you can’t escape the colored function logic. Easy wrappers help. Maybe they help enough–I don’t know. I think Loom used the way you envision it is just going to push the colors deeper into the stack of functions and make more color switching necessary, which will be a win for performance, but not for comprehensibility because the border between the two will be larger.

Right now, we really only need functions to be colored when there’s a good chance they’ll block or otherwise do something that has a substantial delay.

We certainly don’t want the conversion to be implicit or you’ll end up with performance nightmares like

val a = 5     // Blue, trivially
val b = f(a)  // Transparently red Future[Int] => Future[Int]
val c = xs.fold(b)(_ + _)

So that’s right out.

You also have the problem of

val a = Future(longRunning())
val b = Future(alsoLongRunning())
val c = a().foo
val d = b().bar
val e = blueFunctionThatRunsALongTimeBeforeRequiring(c, d)

It’ll work, but it’ll be slow compared to

val e = redFunctionThatRunsALongTimeBeforeRequiring(a, b)

And this doesn’t even get into the issue with side effects: When blue, you are guaranteed the order of side-effects. When red, they are arbitrary unless you explicitly sequence them…and then you have to be careful not to accidentally sequence them. (C.f. standard issue with for and Future.)

(The Loom equivalent would be

val a = redProducingFn().value()
val b = redProducingFn2().value()

Oops, just lost all our potential for parallelism there…)

All else being equal, I would love for threads to be cheaper (with ~zero overhead). But unless they are literally free, the performance difference matters. And even if they are free, the difference between being concurrent and sequential matters, so it’s not just the direct-vs-callback issue of sync/async that is an issue.

So unlike JDG’s (and maybe your) attitude that Project Loom solves the colored function problem, I just think it kicks the can down the road a little bit. Maybe down the road is a better place to be–probably. Maybe we need to get further down the road yet by having something shorter than Future(x) to redify things. (Maybe ... x.) But I can’t see with any confidence at all that having strong assistance from the type system is not a good idea, or that the current way is adequate (though maybe ... x would be). So Odersky’s explorations into that may turn out to be very valuable (or not–but that’s how research goes; you can only promise to learn things, not that what you’ve learned will turn out a particular way e.g. “given capabilities are the right abstraction for simplifying the issues with concurrency”).

3 Likes

The thing is, even in the status quo, all of your “problematic” examples can happen with Futures as well.

for {
  c <- Future(longRunning())
  d <- Future(alsoLongRunning())
  e = blueFunctionThatRunsALongTimeBeforeRequiring(c.foo, d.bar)
} yield e

People accidentally sequence things in Futures all the time. Accidentally linearizing things in flatMaps or multi-step for comprehensions, when they could have done parts in parallel, is very common. It even happens to the literally world-leading experts in the topic in the Scala standard library (https://github.com/scala/scala/pull/9655). “Tracking things in types” helps not at all
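
For reference, the usual fix (reusing the hypothetical names above) requires no type-level help either: start the Futures before the for comprehension so they run concurrently.

val cF = Future(longRunning())
val dF = Future(alsoLongRunning())
for {
  c <- cF
  d <- dF
} yield blueFunctionThatRunsALongTimeBeforeRequiring(c.foo, d.bar)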

All the failure modes you bring up are real. But Futures, even the viral tracked-in-types Futures we have today, solve exactly none of those failure modes. They simply let you make the same mistakes, with a slightly different syntax, and with useless stack traces when things crash. JDG is right in saying that async combinator based Future code, and “direct style” blocking/threaded Future code, are isomorphic. They’re isomorphic even down to the common bugs and pitfalls!

I mean, I can’t prove that research that doesn’t exist yet won’t yield something unexpected and amazing. But so far all the benefits you seem to be attributing to “tracking asynchronicity using types”, which Futures should be giving us today, simply don’t exist: the exact same problems exist when working with Futures. Whether or not we track things in types, sequencing and parallelism are just as tricky.

5 Likes

Even with IO you can make mistakes; for instance, instead of:

retry(fetchFromUrl(...).flatMap(result => parseResult(result)))

you can do:

fetchFromUrl(...).flatMap(result => retry(parseResult(result)))

so the type system won’t enforce semantic correctness anyway.

My point was that to have retryable computations, control the parallelism level, defer computation, etc. with Future you need to do double wrapping, e.g. () => Future[Something], and that looks like overkill most of the time, so people do single wrapping, i.e. plain Future[Something] (which already wraps a lambda inside).

Also, composition of Future-returning methods is painful compared to both imperative style and IO monads. Only with Futures do you need to provide an execution context for combinators (.map, .flatMap, .onComplete, etc. all require an execution context). If you couple that with the double-wrapping requirement for retryable computations with Futures, then you end up with particularly annoying and not very readable composition boilerplate.
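
A sketch of that double wrapping (retry and the names here are hypothetical): because a Future is already running by the time you hold one, a retryable computation has to be passed as a thunk, and every combinator needs an ExecutionContext in scope.

import scala.concurrent.{ExecutionContext, Future}

def retry[A](attempts: Int)(thunk: () => Future[A])(implicit ec: ExecutionContext): Future[A] =
  thunk().recoverWith {
    case _ if attempts > 1 => retry(attempts - 1)(thunk)
  }

// usage: the work must be re-wrapped in () => ... so it can be re-run
// retry(3)(() => Future(fetchFromUrl(...)))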

Hmm, I somehow doubt that IO monads would provide noticeable advantages compared to post-Loom imperative style. Retries, timeouts, parallelism control, sleeps, automatic resource management, stacktraces, error propagation, etc are simple in the direct synchronous style that threads (virtual or not) offer. The direct synchronous code style is directly compatible with control structures, i.e. ifs, match sequences, loops, recursion and any combination of such control structures (including any amount of nesting).

Let’s say that with imperative style I can do:

DB.transaction {
  val entity = loadEntity()
  if (someCondition(entity)) {
    saveAnotherEntity(deriveFrom(entity))
  }
  saveEntity(transform(entity))
}

but with monads I need to do e.g.:

for {
  entity <- loadEntity()
  _ <- if (someCondition(entity)) {
      saveAnotherEntity(deriveFrom(entity))
    } else { 
      // boilerplate already in such trivial case
      // and it gets worse the more complicated the case is
      // of course this particular case can be simplified
      // using specialized combinator, but how many of them should we have?
      IO.unit
    }
  _ <- saveEntity(transform(entity))
} yield ()

…but spawning a VirtualThread is akin to something like IOFiber.spawn and not to io.flatMap(...). With IO monads you pay for .flatMap at every sequential step, while with VirtualThreads the direct synchronous imperative style is as cheap as it was since forever.

But why should we be forced only when dealing with collections? If we have 5 different actions that can be run at the same time then we don’t end up with List[IO[5 types here]]. I don’t see any advantage here if we want simple sequential code. Sequential code is the default anyway and post-Loom it looks simple. Post-Loom you pay the price (in terms of inflated source code) for parallelism only where you explicitly use it.

It’s probably better to use less ambiguous, but longer, terms (e.g. blocking a platform thread vs blocking a virtual thread) than short but rather ambiguous words (what do people understand by async?).

Is the HTTP request-response cycle sync or async (irrespective of implementation details of server and client, as we’re focusing on the protocol only)? It depends on interpretation. I’d say that messaging (like Kafka or JMS) is fully async, as you don’t wait for a response at all, but with an HTTP request you generally (synchronously) wait for the response, no matter whether you use a callback or block a platform thread (that’s not visible using e.g. a network sniffer).

Or in terms of actors and their operations: actorRef ! message (fire and forget) is async as we’re not waiting for answer at all, but actorRef ? message (ask pattern) is sync as we stop progressing (at some point) until the second actor replies.

Wrapping in Future is done because the function (that is being run in async style) potentially takes a long time. Spawning a VirtualThread takes a fraction of a microsecond (IIRC) and you don’t spawn a new VirtualThread per every step, but only once per sequential process. Therefore there should be much fewer virtual threads spawned than monadic compositions invoked.

The problem with function coloring is that you can’t opt out of the viral color (e.g. in Scala.js Await.result(...) doesn’t work at all) or doing so is heavily penalized (as in general with Futures). To elaborate on the second point: if you do Await.result(...) inside a Future, then the execution context has to (temporarily) spawn more platform threads (which is very expensive, orders of magnitude more expensive than spawning a VirtualThread). Otherwise you get reduced parallelism or even deadlock (which I think wouldn’t be hard to get with nontrivial apps).
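
That penalty shows up directly in the standard library API: to block inside a Future without starving the pool you are supposed to mark the region with scala.concurrent.blocking, which asks the default global ExecutionContext to compensate by spinning up an extra platform thread for the duration.

import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration.Duration

implicit val ec: ExecutionContext = ExecutionContext.global

val inner = Future(42)
val outer = Future {
  // Without `blocking`, this Await could eat one of the few pool threads;
  // with it, the global pool compensates by creating a (costly) extra thread.
  blocking(Await.result(inner, Duration.Inf)) + 1
}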

Blocking virtual threads (e.g. invoking otherThread.join()) is fully OK, as every separate sequential process has its own VirtualThread. You don’t reduce parallelism if you block a virtual thread. In fact, virtual threads are designed to be blocked every time it’s convenient to do so. The scheduler then temporarily detaches the virtual thread from the underlying platform thread and mounts some other virtual thread that’s ready to execute. Therefore there’s no kicking the can down the road.

What Loom doesn’t do is automagically parallelise your code. Loom provides cheap virtual threads (so blocking them is also cheap and therefore encouraged), but you (or some library you’re using) have to explicitly create them every time. Creating a virtual thread is very cheap, comparable to a context switch, so you’re in the same performance ballpark as other efficient ways to parallelize your code. There’s (at least) one scenario where virtual threads are not necessarily the best solution, i.e. CPU-intensive tasks that run well on e.g. a ForkJoinPool tuned for LIFO task processing. But even then the performance loss is not dramatic, and actually no monad (be it Future, ZIO or whatever) is tuned for such scenarios.

This is the advantage of tracking the distinction between sync and async in types: VirtualThreads are cheap, but they aren’t free, so while the cost of VirtualThreads is negligible in IO-bound apps, in other cases it’s a problem.

If you, for example, track synchronous types (as well as blocking vs non-blocking) and also have a way of tracking side effects, you can perform optimizations such as fusing an entire block of sync computations to execute one after another on a single thread (just as if you had written it in classical imperative style). The advantage here is that if you mix IO-bound with CPU-bound work you get the best of both worlds: IO-bound computations jump around different threads as they are multiplexed onto Fibers/ForkJoinPools etc., and CPU-bound computations can be fused onto a single physical thread, or a few physical threads, with minimal context switching.
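
A sketch of that kind of type-level distinction using cats-effect 3 naming (the helper functions are hypothetical): steps declared as blocking get shifted to a dedicated pool, while plain delays can stay fused on the compute pool.

import cats.effect.IO

def readLineBlocking(): String = scala.io.StdIn.readLine() // hypothetical blocking step
def parse(raw: String): Int    = raw.trim.toInt            // cheap, CPU-only step

val program: IO[Int] =
  for {
    line <- IO.blocking(readLineBlocking()) // runs on the blocking pool
    n    <- IO(parse(line))                 // stays on the compute pool, minimal context switching
  } yield n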

There is also what I believe to be somewhat of an elephant in the room, which no one has really provided an answer for: unless Loom is doing some fancy black magic, the implementation of Loom is going to be slower than it could be, because it has the goal of preserving an entire stack trace, which does have a significant overhead (this overhead will be lower than what Cats Effect/ZIO/Monix IO have to deal with, because it’s done directly in the VM, but it’s still there).

It may be possible to avoid this cost in a Loom world if Loom is really smart and can optimize it away with

class Ex extends Throwable(..., writableStackTrace = false)

but in this case you will have to use Loom in combination with a better effect type anyway when working with error-based values (at worst Future, or the other purely functional lazy ones), value-based error tracking being absolutely terrible in Java for ergonomic reasons (no tuples/validation/result type). I mean, it’s understandable why Loom did this, since typical Java/JVM code is either directly or indirectly over-reliant on stack propagation, but it also means that Loom is leaving performance on the table.

Honestly, the only language that is doing this well and in a principled manner is Rust. It’s taking Rust longer to do it (mainly because of how hard it is to provide zero-cost abstractions for these high-level concepts), but similar to Scala, Rust has a lazy IO type that gets transformed into a state machine, and you can provide the Executor which the state machine uses. To put things into perspective, this implementation is fast enough that it’s appropriate for the Linux kernel (it’s using an Executor that has been tweaked to use Linux’s KThread).

It’s not just single-threaded environments: any multithreaded application that is not overwhelmingly IO-bound (but where you still want to optimize your IO-bound tasks) is going to have the same issue, which is something that people commenting on this thread are conveniently ignoring. While it’s true that if you have a purely IO-bound, typical web server you may not even care, these kinds of arguments come out of ignorance of heavy real-world use cases of the JVM, whose users mostly aren’t even commenting here.

I mean, there is a reason why the Scala IO types do so much ceremony with sync/async: it’s because we want our implementations of IO to run as fast as possible, so that as users of libraries we get the best of both worlds (nice abstractions at good performance). Whether that is using ExecutionContext.parasitic in Future/Akka world on cheap .map operations to avoid thrashing the ForkJoinPool, or in the purely functional types to optimize the loop that the interpreters run on.
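
Concretely (Scala 2.13 standard library), that optimization looks like running a cheap transformation on the thread that completed the Future instead of bouncing it through the pool:

import scala.concurrent.{ExecutionContext, Future}

implicit val ec: ExecutionContext = ExecutionContext.global

val count: Future[String] = Future("42")

// The mapping function is trivial, so run it "parasitically" on whichever
// thread completes `count`, rather than scheduling a new task on the pool.
val doubled: Future[Int] = count.map(_.toInt * 2)(ExecutionContext.parasitic)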

1 Like

Stack trace is preserved regardless of whether you use raw platform threads or virtual threads (and you can’t escape threads, you can’t execute in the void). I’m not up to date with Loom internals, but probably right now there’s some inefficiency because virtual threads’ stacks are migrated between the managed heap and the unmanaged stack by so-called freezing and thawing. But that’s an implementation detail that can be fixed in the long run, and the cost of managing the stack trace will be virtually the same for raw platform threads and virtual threads.

In other words (disregarding current implementation details of Loom, which is still at an early stage), you have the same cost of managing the stack trace under virtual threads as under raw platform threads. Throwing exceptions under virtual threads has roughly the same cost as throwing exceptions under raw platform threads. So if you generate millions of error values per second then you probably should use an Either monad or something like it instead of throwing so many exceptions. Future and Try wrap full exceptions in case of errors, so they also aren’t lowering the overhead.

I think that nobody said that post-Loom we must use exceptions only and forget about custom lightweight error types and that’s the elephant in the room that you don’t see :slight_smile:

1 Like

But that comes with its own set of problems. Once you allow going from Future[T] to T in direct style, without requiring some capability, you have thrown away the only means we had to distinguish a local call from an RPC across the internet. It really all boils down to the “ability to suspend”.

4 Likes

Yeah, I don’t see how it’s possible to completely eliminate the cost of preserving stack traces in VirtualThreads; this is what I am suspicious about, because no other language has managed to do it (you can do some optimizations, i.e. the stack trace within a function is preserved, but that is quite different from an exception being thrown in a completely different location).

If you want to fully preserve the stack trace you are essentially implementing tracing, and there IS a cost to that in terms of both memory and CPU. Even if you do something silly like increase the amount of memory a VirtualThread has to something like 8 megs (i.e. the default stack size of a physical thread on Linux/OSX), you will then get into issues of cache locality.

This also doesn’t make sense because actual physical threads have a large stack size which is inherited from the OS; that’s where the JVM stores the stack. The whole point of VirtualThreads/Fibers/green threads is that you DON’T want to allocate 8 megabytes, because that is a massive amount of memory. Typically speaking, green threads/fibers are roughly 1kb in size (this of course depends on language/runtime), and yes, you can use some of that space to store stack information, but it’s not going to go very far.

No, what I am saying is that, at least for Java/the JVM, the ecosystem and the way that “idiomatic” code is written either directly or indirectly relies on preservation of the stack (mainly because error handling/mentality in JVM/Java is centered around exceptions), and this shoehorned Loom into needing to preserve stack traces in VirtualThreads (at least if they want to fulfil the “zero code changes required” mantra).

2 Likes