PRE-SIP: Suspended functions and continuations

I think you might be comparing apples and oranges here.

CLI commands have a single (rather coarse-grained) path to handling failures (die with some error code), so the difference between aw and less failing is much less relevant than the difference in failure modes between a pure function and a database query.

The Python libraries are also not really equivalent comparisons, for a similar reason: idiomatic error handling in Python is to just throw an exception, so two Python functions which both have a return type and may-or-may-not throw exceptions (but you’d better assume they do) aren’t a great analog for how failure modes are handled in idiomatic Scala.

It would make sense that, if async computations can be made so performant that the difference between a pure function and a side-effecting network call becomes invisible on the JVM, this would be a boon to Java applications, and in this context, Loom replacing Future in Java applications makes a lot of sense.

However: having recently had to try to answer the question, “how many ways can critical method X fail”, in a part of a codebase that (while written in Scala) used the Exception-first style, I can say with certainty that moving to this sort of style would be a mistake.

4 Likes

Note that despite the sensationalist claims of its author, this thing is not viable at all. It fails catastrophically on certain program shapes. The author says a type system could rule out these program shapes, but no type system that does something like that has been demonstrated so far AFAIK.

2 Likes

I don’t think so. This is a non-goal for many in the Scala community, including I believe Odersky (citation needed). Capture tracking’s notion of purity does not correspond to functional programming’s notion of purity. Indeed, neither concept embeds the other completely, so while they overlap in some cases, they are genuinely different concepts.

There is probably no future Scala version (from EPFL) that tracks what functional programmers mean by ‘purity’. Scala is a hybrid language with a user base beyond pure functional developers, and an implicit goal to capture Python-like markets, which entails an embrace of procedural programming.

Java already makes them identical via various RPC frameworks. The main problem is inefficiency. Loom allows you to make them look identical while still retaining efficiency.

There are extremely compelling reasons to do so (a rough sketch follows the list below):

  • Handle RPC errors with try/catch/finally (the value of this CANNOT be overstated)
  • Abstract over both local and remote implementations
  • Write resource-safe code using ordinary language mechanisms (try-with-resources, try/finally, etc.)
  • Single-colored functions
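
To make the first two bullets concrete, here is a rough direct-style sketch (all the names here are made up, and httpGet stands in for whatever blocking HTTP client you would actually use):

trait UserService {
  def findName(id: Long): String
}

class LocalUserService(db: Map[Long, String]) extends UserService {
  def findName(id: Long): String = db(id)
}

class RemoteUserService(httpGet: String => String) extends UserService {
  // httpGet stands in for a blocking HTTP client call; it may throw IOException and friends
  def findName(id: Long): String = httpGet(s"/users/$id/name")
}

def greet(service: UserService, id: Long): String =
  try s"Hello, ${service.findName(id)}"                      // local or remote, same call site
  catch { case _: java.io.IOException => "Hello, stranger" }  // RPC failure handled with plain try/catch

The call site neither knows nor cares which implementation it got, and RPC failures are handled with the same mechanism as any other exception.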

Perhaps in a new programming language designed for cloud-native computation, one would have some differences (to be proposed) between local and remote computations.

But for ordinary programming languages designed prior to the advent of cloud-native systems, the pros of having a uniform computation model vastly outweigh the cons (indeed, the uniformity is a primary driver of adoption for functional effect systems!).

Moreover, the drawbacks have been overstated. There are two main drawbacks to RPC-calls-as-ordinary-function-calls:

  1. Failure with new error types. RPC calls may fail in new ways that application code may not anticipate or necessarily know how to deal with. I think this is largely solvable, without new language constructs, by better design of RPCs.
  2. More seriously, timeout and retry behavior. RPC calls are flakier than local calls and subject to significantly longer delays. However, these concerns have robust solutions that work across both local and remote procedure calls: retry strategies and timeout policies. Retry strategies properly apply to recoverable errors and are useful in both local and remote contexts; timeout policies, too, are useful in both. Frameworks (or, to take a more extreme point of view, libraries and even programming languages) should take special care to separate recoverable from non-recoverable errors and to provide compositional ways of applying both retry and timeout policies.

Currently Loom does not provide a lot of machinery to help with (2). However, it provides a solid foundation for library authors to develop their own approaches to solving these challenges, based on underlying language primitives that are proven and familiar to developers.
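
For example, a library author could start from something as small as the following (illustrative names and signatures only; newVirtualThreadPerTaskExecutor assumes a Loom-enabled JDK, and fetchUser is a made-up call that could be either local or remote):

import java.util.concurrent.{Callable, Executors, TimeUnit}
import scala.concurrent.duration._

// Retry recoverable failures; works the same whether `thunk` is a local call or an RPC.
def retry[A](times: Int)(isRecoverable: Throwable => Boolean)(thunk: => A): A =
  try thunk
  catch { case e if times > 0 && isRecoverable(e) => retry(times - 1)(isRecoverable)(thunk) }

// Bound how long we wait for the computation, again regardless of where it runs.
def withTimeout[A](limit: FiniteDuration)(thunk: => A): A = {
  val exec = Executors.newVirtualThreadPerTaskExecutor()
  try exec.submit(new Callable[A] { def call(): A = thunk }).get(limit.toMillis, TimeUnit.MILLISECONDS)
  finally exec.shutdownNow()
}

// withTimeout(2.seconds) { retry(3)(_.isInstanceOf[java.io.IOException]) { fetchUser(42) } }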

More precisely, today we have “async blocking”, which happens when a fiber / virtual thread suspends, waiting for external re-activation, and we also have “sync blocking”, which happens when a physical thread suspends, waiting for external re-activation. What Loom is doing is upgrading almost all “sync blocking” to “async blocking”. Semantically, they’re all blocking, it’s just a question of efficiency: async blocking is vastly more efficient than sync blocking, so it’s merely a sort of optimization applied retroactively to the masses of synchronous code that have already been written.
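
To illustrate (a minimal sketch; it assumes a Loom-enabled JDK, i.e. 19+ with preview or 21+, and blockingCall is just a stand-in):

def blockingCall(): Unit =
  Thread.sleep(1000) // stands in for any blocking operation: IO, a lock, an RPC, ...

val platform = new Thread(() => blockingCall())               // "sync blocking": parks an expensive OS thread
platform.start(); platform.join()

val virtual = Thread.startVirtualThread(() => blockingCall()) // "async blocking": only the cheap virtual thread parks
virtual.join()

Same code, same semantics; the second version simply lets the carrier OS thread go do other work while the call is blocked.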

I do not think that question will ever have agreement, which maybe argues Scala should be more opinionated so as to select for a user base compatible with its goals. But it is clear that no official answers will be forthcoming until the capability-based research program is closer to completion (ETA: 5 years). And until then it is extremely risky to modify the language, especially in ways that import already-obsolete Kotlin designs into the much more modern Scala 3 programming language.

3 Likes

I don’t know about the general use case, but ZIO solved that issue pretty well. It creates a very insightful error trace, with what code would have been executed next (in the context of the app, not the internal fiber-management weaving). Very actionable - debugging is (almost) as simple as in mono-threaded code.
And if I followed things correctly, in ZIO it’s even cheap (in terms of runtime performance - almost free, even).

3 Likes

Yes, although one of these two problems, on one of the platforms (JVM/Loom), is set to disappear. Hence my proposition to shift the focus of the problem to something that isn’t platform-dependent :slight_smile:

I didn’t say I want to track purity :wink: That’s probably too much. Writing to mutable state? Probably not. Performing a network call? Probably yes. Maybe tracking non-local computation would be a good, precise term?

(in fact you propose the same in the next section, as I now see)

Java already makes them identical via various RPC frameworks.

Not always - you often get different checked exceptions, which is a way of “marking” a method as side-effecting. Where we have IO[...], Java often has throws IOException - both influence the signatures. But again, given history, we might be looking for better solutions than checked exceptions (I think in general in Scala we are looking for better solutions to various problems :slight_smile: ).

Do we want the compiler to point out that we might not be handling all the error cases that we should (which could lead to applying e.g. a retry/timeout strategy)? I think in a typed language the answer might be “yes”.

I do not think that question will ever have agreement, which maybe argues Scala should be more opinionated so as to select for a user base compatible with its goals.

There definitely won’t be agreement, but luckily we have EPFL and Martin, who sets the direction in which Scala should be headed (with input from the community of course, but ultimately somebody has to make some choices from time to time).

1 Like

I think they failed because ultimately you do need to tackle the fact that an RPC call fails differently from a local call. Now this might be done with discipline (in Python) or with the help of a compiler (in Scala) - that’s a dynamic vs. static typing discussion, people have different preferences, and that’s completely fine.

But there are no magic solutions which make RPC calls behave just like local calls. You need different code when doing an RPC than when doing a local call. (Note that this code might be far away from the invocation site, somewhere in an error handler, but it still needs to be there.)

3 Likes

That is what we call “effects”: interaction between an automaton and its environment. I highly recommend Oleg Kiselyov’s talk titled “Having an Effect” (see the Hacker News discussion).

You seem to want an effect system.

Thanks, I’ll take a look.

I might indeed be looking for what’s known as an “effect system” in the literature; however, I have the feeling that outside of academia, “effect tracking” is an overloaded term with many possible meanings (covering mutable state, async, remote computations etc.). So maybe a more precise one would suit our communication better.

There’s something that still isn’t clear for me from the discussion. Does Loom somehow solve the classic N+1 problem? I.e. let’s say I have a function that does a Google search: def google(str: String): List[URL]. Now I try doing this:

val list: List[String] = ???
list.map(str => google(str))

How does Loom ensure this is done efficiently, i.e. by spawning one thread per element of list?

If we tracked in types that google can perform a costly block, we would be able to use that information to, perhaps, forbid the above piece of code. Perhaps there should be a variant of map which always spawns a thread per element and allows blocking operations.
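
For example, nothing stops a library from offering such a variant. A rough sketch (mapPar is a made-up name; it assumes a Loom-enabled JDK and the google function above):

import java.util.concurrent.{Callable, Executors}

def mapPar[A, B](as: List[A])(f: A => B): List[B] = {
  val exec = Executors.newVirtualThreadPerTaskExecutor()
  try {
    val tasks = as.map(a => exec.submit(new Callable[B] { def call(): B = f(a) })) // one virtual thread per element
    tasks.map(_.get())                                                             // results stay in order
  } finally exec.shutdown()
}

// mapPar(list)(google)  // N concurrent searches instead of N sequential ones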

Regardless of what the exact solution is, tracking in types that google can block seems better than a situation where it’s easy to run into performance problems when using it. Sure, similar problems would occur with computation-intensive functions as well, but I feel they arise much more easily once we start doing async programming and a single function call can suddenly take 100ms, or whatever the local Google round-trip time is.

1 Like

You are conflating concurrency with asynchronicity.

Loom does not change the semantics of your code: in particular, it does not automatically insert any concurrent operations, nor does such a thing make sense in general (see above academic references on auto-parallelization, which is fraught with known issues).

Loom merely takes your synchronous code (that is, code formerly using physical threads and operations like IO or locks that “sync block” those threads) and makes it fully asynchronous (using virtual threads and “async blocking”, which is more efficient than “sync blocking”).

As such, maybe in your code base you have some code like list.map(str => google(str)), where each invocation of google blocks a physical thread. Under Loom, the code has the same meaning and will produce the same result, only google can now be fully asynchronous (which does not imply it runs concurrently with the thread executing the List#map - it does NOT), which means you get the same behavior as before, but it runs more efficiently.

Loom is all about efficiency, not concurrency, per se: taking the same programs and making them work better. As a consequence, you can now do “async operations” (i.e. “efficient operations”) anywhere without having wrapper types like Future, including in List#map.

4 Likes

Ok, so you want to track “local” computation versus “remote” computation. First off, that would not be related to async versus sync tracking: both sync and async can do remote computation, the only difference is efficiency.

Second, in the era of cloud-native applications, the cloud itself has become a sort of standard library: every other call is to some microservice or GraphQL or REST API. Our applications are the glue that hold together operations implemented in the cloud. So tracking “remote” computation may be increasingly and incredibly noisy, as we enter a future in which nearly all calls might be “remote”.

Third, in my opinion it is very important not to be obsessed with “tracking” things for the sake of academic novelty (which is good for obtaining grant money but bad for commercial software). Tracking information using types involves considerable effort for developers, who have to type more characters and wrestle with more mistakes (see also: uninferrable exception lists in Java). You can, like Odersky is trying to do, reduce the cost of tracking (preferably NOT by inserting more magic that is fraught with edge cases and interacts in unexpected ways with other language features, such as “auto-adaptation” in context functions), but fundamentally you must still acknowledge that it has a cost.

For tracking to pay for itself, you have to demonstrate that the information is (a) actionable, and (b) so frequently actionable that the costs of universal tracking are outweighed by the proven benefits.

I have not even heard a hand-wavy argument for remote vs local being actionable: what would a developer do differently, knowing that “doX()” is a remote call rather than a local call? And what would the developer do differently, knowing that “doX()” is a local call rather than a remote one? Not abstractly, but what concrete code would a developer write knowing such a difference?

I have argued above that the steps a developer would and should take for flaky computations always involve retries, and the steps a developer would and should take for long-running computations always involve timeouts. Although remote computations are more likely to be flaky and long-running, that is only a correlation, and many local computations can be both flaky and long-running. So the mere presence or absence of a “remote bit” is likely to be insufficient information to be actionable.

If I am wrong, then it should be possible to provide some evidence that:

  1. Developers know to do and actually do something radically different based on the “remote bit”, such that it significantly affects correctness or performance or some other metric that matters to the business.
  2. Developers do this so often that it overwhelms the significant drawbacks to infecting every type signature across the entire code base with a “remote bit” (or at least, infecting either all remote code, or all local code, with such a bit, if you can infer its negation by its absence).

Ultimately, my stance is that “effect tracking” is a distraction and a waste of resources, hence my blog post, Effect Tracking Is Commercially Worthless.

That dynamic could change in a future in which tracking things is cost-free or super-low-cost and completely automatic (fully type-inferred), but until when and if that point arrives, I will always be asking proponents of effect tracking to demonstrate (a) actionability of information, and (b) pervasiveness of need, such that benefits clearly outweigh costs. To my knowledge, no one has demonstrated this in the case of remote vs local, and it cannot be demonstrated at all in the case of sync vs async.

4 Likes

Correct. That’s the whole point. I thought we were past the sync/async distinction :wink:

Also agreed. So if we want local and remote invocations to have different signatures, then because of cloud-native development the cost has to be minimised. I think that’s the point of @odersky 's research project.

Well, I would say that you have demonstrated that two paragraphs below: the actions to take are retries and timeouts, and the frequency is there because of cloud-native development.

One point where I would disagree is the claim that local computations need recovery logic like the above to a similar degree as remote ones do. I don’t think it’s only a correlation. Every remote invocation can be flaky, long-running, or throw errors randomly; only some local ones have these characteristics.

Now, I don’t have hard empirical evidence that the “remote bit” actually matters. Only anecdotal :wink: But on the other hand, is there evidence that a consistent and principled approach to errors originating from remote calls doesn’t influence the bug ratio? Especially since these bugs tend to manifest themselves in production, not in the calm and idealised test environment.

Finally, aren’t we talking here about error handling - something that is very close to the heart of every ZIO programmer? The whole point of effect tracking, or remote-call tracking, or whatever we call it, is to properly handle the error scenarios. Java implements this by requiring methods to add throws IOException, which is often circumvented by programmers. ZIO moves the error channel to a type parameter, for composability. I don’t think it’s at all unreasonable to look for other, maybe more general solutions, where errors are just one specialisation of the “effect” a computation might have.
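
To make that concrete, a signature-level illustration (the functions and error types are made up; only the ZIO shape is real):

import zio._
import java.io.IOException
import java.net.URL

def google(q: String): ZIO[Any, IOException, List[URL]] = ???  // the failure mode is in the type...
def rank(urls: List[URL]): ZIO[Any, Nothing, List[URL]] = ???  // ...and so is "cannot fail"

// Composition preserves the error channel: the result is still ZIO[Any, IOException, List[URL]]
def search(q: String): ZIO[Any, IOException, List[URL]] =
  google(q).flatMap(rank)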

4 Likes

I would be happy if that were true but given other posts on this thread, including, indeed, the nature of the pre-SIP itself, it seems unlikely. :grinning_face_with_smiling_eyes:

Indeed, Odersky himself stated:

“The sync/async problem is one of the fundamental problems we study [in our 7 persons over 5 year project].” (emphasis added)

From my experience, I would say that developers failing to apply retry or timeout logic is not a significant source of lost business revenue, partially because libraries and frameworks are designed to handle this, or to coax users into doing the correct thing (e.g. Http.get requiring a timeout parameter).

It happens sometimes, and it has measurable costs, but the overall amount of revenue lost due to failure to apply retry or timeout logic pales in comparison to the revenue lost dealing with unexpected null values, transforming data from A to B without mistakes, or possibly even retrying the wrong thing (e.g. an NPE) because of the lack of a two-channel error model.

Even for resource handling, the main issue in modern web apps is memory leaks; the occurrence of lost file handles or connections in a database pool is made rare by libraries and frameworks (or try-with-resources in Java).
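
(For reference, the direct-style Scala analogue of try-with-resources, scala.util.Using, is about as short as it gets; the file name here is made up:)

import scala.util.Using
import java.io.FileInputStream

// The stream is closed whether the body returns normally or throws.
val firstByte: Int = Using.resource(new FileInputStream("data.bin")) { in =>
  in.read()
}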

For things which are not a significant problem in commercial software development, it is all the more important to ensure the costs are minimized, and to ensure that new features aimed at addressing these “problems” produce a clear benefit of sufficient magnitude to overwhelm those minimized costs.

I agree that only some local computations have these characteristics, but not that all remote ones do. For example, if your application is running with EBS or EFS storage, then despite all disk-related operations being remote, they are extremely unlikely to be flaky or long-running.

This raises another important point: that sometimes operations that your application may expect to be local, are in fact remote. Which means that any attempt to track “local” versus “remote” is at best an educated guess. Indeed, a repository interface may suggest the database is remote, while a particular implementation may be using H2 embedded.

To me, this is feeling like researching how many angels can dance on the head of a pin.

Meanwhile, while we discuss whether to embed a remote versus local bit in the type system (in a TBD comonadic effect system that no one is asking for, despite, of course, some academic value), modern cloud-native, industry-focused languages like Ballerina make it trivial to produce and consume cloud services and leverage user-defined data structures in cloud protocols, innovating on real problems that consume massive amounts of developer time.

Which of these focus areas stands to benefit industry the most?

(Actually, we’re not even really discussing local versus remote, because most people contributing to this thread seem to believe the async versus sync distinction is important to track in the type system.)

In my view, ZIO’s error handling works because (a) it is based on values, which allow even polymorphic abstraction over duplication, (b) it is fully inferred, meaning no additional developer work is required to benefit from it (“zero” cost), and (c) it leverages the type system to cleanly separate recoverable errors from non-recoverable errors, with an ability to dynamically shift errors between channels (which is critical in a cloud-native environment, where only some errors should be retried). Java failed on all three counts, which is, I believe, why checked exceptions are regarded mostly as a mistake (CanThrow fails on two counts, and its potential successor will probably fail on those same two).

I would be happy to see another error model that takes this same direction with fewer costs and / or greater benefits, and if that happens to be part of a capability-based (comonadic) effect system geared toward solving problems rather than tracking bits of debatable value, then I would appreciate that, as well. But keep in mind the burden of proof is on those making the claim that such a system would be superior to what exists today, and that it warrants investment and support from the broader Scala community.

4 Likes

One question that I think we have not answered: What about the error case? If Futures are replaced by VirtualThread with a result (or any of the other options discussed here), how are errors handled?

  • Returning a sum of result and error? But then we are back to monadic instead of direct style. And you could argue that if you are working with monads, you might as well work with one that makes it explicit that things can suspend.
  • Or throw an exception? But then these need to be tracked somehow
3 Likes

I sketched a Future-lite successor in the other thread:

import java.util.concurrent.CompletableFuture
import scala.util.Try

class Future[A](f: CompletableFuture[Try[A]], vt: Thread) {
  // Thread.startVirtualThread returns a plain java.lang.Thread (there is no public VirtualThread type)
  def virtualThread: Thread = vt

  def result: A = f.get.get // blocks (cheaply, on a virtual thread) and rethrows on failure

  // etc.
}

object Future {
  def apply[A](code: => A): Future[A] = {
    val f = new CompletableFuture[Try[A]]()
    val vt = Thread.startVirtualThread { () =>
      f.complete(Try(code)) // `code` is by-name; Try captures any thrown exception
    }
    new Future(f, vt)
  }
}

Future#result gives you access to the result, and if the computation failed, it rethrows the exception.

How to model that exceptional case–whether with Java checked exceptions, Try values, Either values, ZIO values, CanThrow capabilities, CanThrow’s comonadic capability type-based successor, or something else entirely–is independent from the sync/async question, and also independent from Loom.

In other words, however we would want to describe any exceptional result from any method, we should use that same process to describe the possible error that may result from calling Future#result.
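
For instance, the same sketch could expose the failure as a value rather than (or in addition to) rethrowing it; hypothetical additions to the class above (same imports as before):

class Future[A](f: CompletableFuture[Try[A]], vt: Thread) {
  // ... everything from the sketch above, plus hypothetical value-based accessors:
  def resultTry: Try[A] = f.get                           // the stored Try; nothing is thrown
  def resultEither: Either[Throwable, A] = f.get.toEither // or as an Either, if you prefer
}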

1 Like

On a somewhat related note, there is another relevant point: stack traces + exceptions. The nice thing about Future is that it is decoupled from stack traces; by design you are not meant to expect the stack to be consistent (or even be there at all). As is evident to anyone who has used Future with the standard ExecutionContext (such as ForkJoinPool), the stack traces are meaningless, because computations can jump between threads at whim (that’s the whole point of multiplexing computations onto real threads). You can see this even in the Future API: you have methods like Future.failed to designate a failed Future with a Throwable, but it is just passed around and propagated as a value. You can still recover from exceptions thrown inside a Future, but as stated before that is expensive; critically, you don’t have to throw (i.e. you can just use Future.failed).

This is one area where I am a bit skeptical of Loom. I haven’t looked at Loom in great detail, but if virtual threads are meant to preserve stack traces, and there is a lot of code out there that assumes stack traces are consistent and properly propagated, then unless I am missing something this will have a performance penalty (versus not caring about the stack at all). This performance penalty is already visible right now, either with Future and custom ExecutionContexts that preserve the stack, or with other IO types that propagate the stack in the interpreter. I do believe Loom’s solution to this problem won’t have the same overhead, but as said previously I don’t see how it can be “cost free”.

Ultimately, though, this is one of the biggest benefits of value-based error handling over throwing and catching exceptions: if you throw and expect to catch exceptions it’s expensive, and Scala’s IO/async types forced programmers not to rely on the stack for basic error handling (which is a good thing). This holds if my previous point about Loom is correct (i.e. Loom is forced to propagate the stack in order to remain compatible with existing code that relies on try/catch and stack preservation to function). I also haven’t seen any way for Loom to handle stack propagation granularly, so that you avoid the performance penalty when you don’t rely on exceptions.

For this reason alone (and others), despite what people claim, Loom is not going to kill Scala’s Future, even in the hypothetical where everyone runs JVM 19+ (whatever version ships with Loom) and Scala.js / Scala Native are ignored.

This is not a nice thing. In fact, constructing exceptions in Future-based code incurs the cost of building out a stack trace without the benefit, because the stack traces so constructed are useless: they only reflect the call stack from the last “bounce” inside the execution context to the current operation.

Async stack traces in Loom are the same as sync stack traces, and have the same overhead. You pay this overhead only when (1) your code actually fails, and (2) your exception type actually generates a stack trace (not all exception types are wired to do so, see also NoStackTrace).
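
For example, a minimal sketch of such an exception type (the name is made up):

import scala.util.control.NoStackTrace

// Thrown and caught like any exception, but fillInStackTrace is a no-op,
// so constructing it costs roughly as much as a small allocation.
final case class RequestTimeout(url: String)
  extends Exception(s"timed out: $url") with NoStackTrace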

I am not a fan of throwing exceptions (versus using typed values), but this is not a reason to prefer values, because if you are using exception types that do not generate stack traces, then it will be faster than value-based error propagation tends to be (Either, Try, etc.).

Stack tracing is absolutely and positively not a reason that Future will survive in a post-Loom world. Future has no benefits whatsoever with respect to stack tracing compared to Loom.

Future will survive only because people don’t want to rip apart legacy code bases, not because Future conveys any value with respect to stack traces (or anything else, really, since the marginal utility of other benefits is better obtained using more modern data types, like a typed-VirtualThread, for example).

5 Likes

I agree, re-throwing exceptions is one of the options. But then, should these exceptions be tracked in the type system or not? Previously, the main criticism of exceptions was that they were untracked. We now have a way to track them with experimental.saferExceptions, made watertight with capture checking. But that means the successor Future should have the error type as a type parameter rather than fixed to Exception, because otherwise information about thrown exceptions is not propagated across futures.
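
A hypothetical sketch of what that shape could look like, using the experimental saferExceptions encoding (names made up, and the feature itself is subject to change):

import language.experimental.saferExceptions

trait TypedFuture[E <: Exception, +A] {
  def result: A throws E // callers need a CanThrow[E] capability in scope, so the error type propagates
}

// e.g. def google(q: String): TypedFuture[java.io.IOException, List[java.net.URL]]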

1 Like

Well, if you care that much about the cost of a stack trace you can use scala.util.control.NoStackTrace for exactly this problem; you will just be passing (almost) a bare reference around.

But this is all beside the point, because if you care so much about the cost of stack traces then you shouldn’t even be using Future.failed (or throwing/catching), which goes back to the point of using value-based error handling.

How does this work exactly? The reason why stack traces are “free” with normal threads is that the stack is part of the OS thread, and since you are already paying the cost of a heavy OS thread, passing the stack along doesn’t cost anything extra.

On the other hand, the whole point of green thread / fiber implementations is that they typically do not carry a full “stack” and have a very small footprint (e.g. ~1 KB for Erlang), so while catching/throwing can be made free, preserving the stack trace, especially across large non-local calls, is another story. Of course you can just pass the incrementally growing stack along in your virtual thread, but that has a performance penalty, and you also run into problems with the cache locality of threads.

Well, it’s another reason on a bucket list of reasons. But regarding the rest of your point: there is no fundamental reason why value-based error propagation is less performant than using try/catch without stack propagation, because in the end it all amounts to the same thing, i.e. a control flow mechanism. On the JVM, value-based error handling can be slower, but that’s because the JVM isn’t that optimized for it; you can look at Go instead, which has optimized its runtime for value-based error propagation. (Note that my response also assumes we are comparing apples to apples, i.e. if you are inspecting error values then you also need to compare that to catching exceptions in order to use the value of the exception being caught.)

More concerningly, though, if you care that much about Loom and the JVM: typical JVM/Java code does preserve and propagate the stack. scala.util.control.NoStackTrace is a Scala-specific feature, and I don’t even remember seeing Java programs create their own version of scala.util.control.NoStackTrace to mitigate the cost of stack propagation; in such cases they use values/null if they care about performance that much.

I think you misunderstood my point, the benefit of Future is precisely that it forced programmers to NOT care about the stack at all and also to NOT use it as the primary error handling mechanism.

This reminds me of the exact same argument that people were using to justify that sun.misc.Unsafe has no reason to exist. In the worst case scenario, even in the context of a library creator/maintainer, such abstractions are necessary, and it has nothing to do with legacy. Whether people like it or not, Future is not going anywhere, for reasons other than legacy.

1 Like