Note that despite the sensationalist claims of its author, this thing is not viable at all. It fails catastrophically on certain program shapes. The author says a type system could rule out these program shapes, but no type system that does something like that has been demonstrated so far AFAIK.
I don't think so. This is a non-goal for many in the Scala community, including, I believe, Odersky (citation needed). Capture tracking's notion of purity does not correspond to functional programming's notion of purity. Indeed, neither concept completely embeds the other, so while they overlap in some cases, they are genuinely different concepts.
There is probably no future Scala version (from EPFL) that tracks what functional programmers mean by "purity". Scala is a hybrid language with a user base beyond pure functional developers, and an implicit goal of capturing Python-like markets, which entails an embrace of procedural programming.
Java already makes them identical via various RPC frameworks. The main problem is inefficiency. Loom allows you to make them look identical while still retaining efficiency.
There are extremely compelling reasons to do so (a concrete sketch follows the list):
- Handle RPC errors with try/catch/finally (the value of this CANNOT be overstated)
- Abstract over both local and remote implementations
- Write resource-safe code using ordinary language mechanisms (try-with-resources, try/finally, etc.)
- Single-colored functions
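As a concrete illustration of the first two points, here is a minimal, hypothetical sketch (the names `UserClient`, `fetchUser`, and `RpcException` are invented, not any real API) of what handling RPC errors with ordinary try/catch/finally looks like when remote calls are plain method calls:

```scala
case class User(id: String, name: String)
class RpcException(msg: String) extends Exception(msg)

trait UserClient:                                   // one interface for both worlds
  def fetchUser(id: String): User
class LocalUserClient extends UserClient:           // in-process implementation
  def fetchUser(id: String): User = User(id, "cached")
class RemoteUserClient extends UserClient:          // network implementation
  def fetchUser(id: String): User = throw new RpcException("503 from user-service")

def showUser(client: UserClient, id: String): String =
  try client.fetchUser(id).name                     // same call site, local or remote
  catch case e: RpcException => s"unavailable: ${e.getMessage}"
  finally println("request finished")               // ordinary finally, no effect wrapper
```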
Perhaps in a new programming language designed for cloud-native computation, one would have some differences (to be proposed) between local and remote computations.
But for ordinary programming languages designed prior to the advent of cloud-native systems, the pros of having a uniform computation model vastly outweigh the cons (indeed, the uniformity is a primary driver of adoption for functional effect systems!).
Moreover, the drawbacks have been overstated. There are two main drawbacks to RPC-calls-as-ordinary-function-calls:
- Failure with new error types. RPC calls may fail in new ways that application code may not anticipate or necessarily know how to deal with. I think this is largely solvable without new language constructs, through better design of RPCs.
- More seriously, timeout and retry behavior. RPC calls are flakier than local calls and subject to significantly longer delays. However, these have robust solutions that work across both local procedure calls and remote procedure calls: retry strategies and timeout policies. Retry strategies properly apply to recoverable errors and are useful in local and remote contexts; timeout policies, too, are useful in both. Frameworks (or, to take a more extreme view, libraries and even programming languages) should take special care to separate recoverable from non-recoverable errors and provide compositional ways of applying both retry and timeout policies (see the sketch after this list).
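To make that concrete, here is a minimal sketch (not any existing library's API; `retry`, `withTimeout`, and `fetchQuote` are invented names, and the virtual-thread executor assumes JDK 21+) of compositional retry and timeout policies that wrap local and remote calls alike:

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}
import scala.concurrent.duration.*

// Retry only errors the caller has declared recoverable.
def retry[A](times: Int)(isRecoverable: Throwable => Boolean)(thunk: => A): A =
  try thunk
  catch case e if times > 0 && isRecoverable(e) => retry(times - 1)(isRecoverable)(thunk)

// Run the thunk on a virtual thread and give up after the limit.
def withTimeout[A](limit: FiniteDuration)(thunk: => A): A =
  val executor = Executors.newVirtualThreadPerTaskExecutor()
  val task: Callable[A] = () => thunk
  try executor.submit(task).get(limit.toMillis, TimeUnit.MILLISECONDS)
  finally executor.shutdownNow()                 // interrupt the task if it overran

// The same policies wrap a flaky local computation or a remote call, e.g.:
//   retry(3)(_.isInstanceOf[java.io.IOException]) { withTimeout(2.seconds)(fetchQuote("AAPL")) }
```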
Currently, Loom does not provide a lot of machinery to help with the second problem (timeouts and retries). However, it provides a solid foundation for library authors to develop their own approaches to solving these challenges, based on underlying language primitives that are proven and familiar to developers.
More precisely, today we have "async blocking", which happens when a fiber / virtual thread suspends, waiting for external re-activation, and we also have "sync blocking", which happens when a physical thread suspends, waiting for external re-activation. What Loom is doing is upgrading almost all "sync blocking" to "async blocking". Semantically, they're all blocking; it's just a question of efficiency: async blocking is vastly more efficient than sync blocking, so it's merely a sort of optimization applied retroactively to the masses of synchronous code that have already been written.
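A tiny illustration of the distinction (assuming JDK 21+, where `Thread.ofPlatform` and `Thread.ofVirtual` are the standard thread builders):

```scala
// The same blocking call on two kinds of thread: semantically identical, only efficiency differs.
def fetch(): String =
  Thread.sleep(1000)                                 // "blocks" in both cases
  "done"

Thread.ofPlatform().start(() => println(fetch()))    // sync blocking: parks an OS thread
Thread.ofVirtual().start(() => println(fetch()))     // async blocking: parks only the virtual thread
```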
I do not think that question will ever have agreement, which maybe argues Scala should be more opinionated so as to select for a user base compatible with its goals. But it is clear that no official answers will be forthcoming until the capability-based research program is closer to completion (ETA: 5 years). And until then it is extremely risky to modify the language, especially in ways that import already-obsolete Kotlin designs into the much more modern Scala 3 programming language.
I don't know about the general use case, but ZIO solved that issue pretty well. It creates a very insightful error trace, with what code would have been executed next (in the context of the app, not the internal fiber-management weaving). Very actionable; debugging is (almost) as simple as in single-threaded code.
And if I followed things correctly, in ZIO it's even cheap (in runtime performance; almost free, even).
Yes, although one of these two problems, on one of the platforms (JVM/Loom), is set to disappear. Hence my proposition to shift the focus of the problem onto something that isn't platform-dependent.
I didn't say I want to track purity; that's probably too much. Writing to mutable state? Probably not. Performing a network call? Probably yes. Maybe "tracking non-local computation" would be a good, precise term?
(in fact you propose the same in the next section, as I now see)
Java already makes them identical via various RPC frameworks.
Not always: you often get different checked exceptions, which is a way of "marking" a method as side-effecting. Where we have `IO[_]`, Java often has `throws IOException`; both influence the signatures. But again, given history, we might be looking for better solutions than checked exceptions (I think in Scala we are generally looking for better solutions to various problems).
Do we want the compiler to point out that we might not be handling all the error cases that we should (which could lead to applying e.g. a retry/timeout strategy)? I think in a typed language the answer might be "yes".
I do not think that question will ever have agreement, which maybe argues Scala should be more opinionated so as to select for a user base compatible with its goals.
There definitely won't be agreement, but luckily we have EPFL and Martin, who picks the direction in which Scala should be headed (with input from the community, of course, but ultimately somebody has to make some choices from time to time).
I think they failed, as ultimately you do need to tackle the fact that an RPC call fails differently from a local call. Now this might be done with discipline (in Python) or with the help of a compiler (in Scala); that's a dynamic-vs-static-typing discussion, people have different preferences, and that's completely fine.
But there are no magic solutions which make RPC calls behave just like local calls. You need different code when doing an RPC than when doing a local call. (Note that this code might be far away from the invocation site, somewhere in an error handler, but it still needs to be there.)
That is what we call "effects": interaction between an automaton and its environment. I highly recommend Oleg Kiselyov's talk titled "Having an Effect" [0].
You seem to want an effect system.
Thanks, I'll take a look.
I might indeed be looking for what's known as an "effect system" in the literature; however, I have the feeling that outside of academia, "effect tracking" is an overloaded term with many possible meanings (covering mutable state, async, remote computations, etc.). So maybe a more precise one would suit our communication better.
There's something that still isn't clear to me from the discussion. Does Loom somehow solve the classic N+1 problem? I.e., let's say I have a function that does a Google search: `def google(str: String): List[URL]`. Now I try doing this:
`val list: List[String] = ???`
`list.map(str => google(str))`
How does Loom ensure this is done efficiently, i.e. by spawning one thread per element of `list`?
If we tracked in types that `google` can perform a costly block, we would be able to use that information to, perhaps, forbid the above piece of code. Perhaps there should be a variant of `map` which always spawns a thread per element and allows blocking operations.
Regardless of what the exact solution is, tracking in types that `google` can block seems better than the situation where it's easy to run into performance problems when using it. Sure, similar problems would occur with computation-intensive functions as well, but I feel they occur much more easily once we start doing async programming and a single function call can suddenly take 100 ms, or whatever the local Google roundtrip time is.
You are conflating concurrency with asynchronicity.
Loom does not change the semantics of your code: in particular, it does not automatically insert any concurrent operations, nor does such a thing make sense in general (see above academic references on auto-parallelization, which is fraught with known issues).
Loom merely takes your synchronous code (that is, code formerly using physical threads and operations like IO or locks that "sync block" those threads) and makes it fully asynchronous (using virtual threads and "async blocking", which is more efficient than "sync blocking").
As such, maybe in your code base you have some code like `list.map(str => google(str))`, where each invocation of `google` blocks a physical thread. Under Loom, the code has the same meaning and will produce the same result, only `google` can now be fully asynchronous (which does not imply it is concurrent with respect to the thread executing the `List#map`, because it is NOT concurrent), which means you get the same behavior as before, but it runs more efficiently.
Loom is all about efficiency, not concurrency per se: taking the same programs and making them work better. As a consequence, you can now do "async operations" (i.e. "efficient operations") anywhere without wrapper types like `Future`, including in `List#map`.
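If you do want one virtual thread per element, you have to introduce that concurrency explicitly; Loom will not insert it for you. A minimal sketch (assuming JDK 21+ and the hypothetical `google` function from above):

```scala
import java.net.URL
import java.util.concurrent.{Callable, Executors}

def google(str: String): List[URL] = ???             // hypothetical blocking search call

def googleAll(queries: List[String]): List[List[URL]] =
  // One virtual thread per query: the concurrency is explicit, not inserted by Loom.
  val executor = Executors.newVirtualThreadPerTaskExecutor()
  try
    val futures = queries.map { q =>
      val task: Callable[List[URL]] = () => google(q)
      executor.submit(task)
    }
    futures.map(_.get())                             // wait for each result
  finally executor.shutdown()
```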
Ok, so you want to track "local" computation versus "remote" computation. First off, that would not be related to async-versus-sync tracking: both sync and async can do remote computation; the only difference is efficiency.
Second, in the era of cloud-native applications, the cloud itself has become a sort of standard library: every other call is to some microservice or GraphQL or REST API. Our applications are the glue that holds together operations implemented in the cloud. So tracking "remote" computation may be increasingly and incredibly noisy, as we enter a future in which nearly all calls might be "remote".
Third, and in my opinion, it is very important not to be obsessed with "tracking" things for the sake of academic novelty (which is good for obtaining grant money but bad for commercial software). Tracking information using types involves considerable effort for developers, who have to type more characters and wrestle with more mistakes (see also: uninferrable exception lists in Java). You can, like Odersky is trying to do, reduce the cost of tracking (preferably NOT by inserting more magic, fraught with edge cases, that interacts in unexpected ways with other language features, such as "auto-adaptation" in context functions), but fundamentally you must still acknowledge that it has a cost.
To pay for itself, you have to demonstrate that the information is (a) actionable, and (b) so frequently actionable that the costs of universal tracking are outweighed by the proven benefits.
I have not even heard a hand-wavy argument for remote-vs-local being actionable: what would a developer do differently, knowing that `doX()` is a remote call versus a local call? What would the developer do differently, knowing that `doX()` is a local call versus a remote call? Not abstractly, but what concrete code would a developer write knowing such a difference?
I have argued above that the steps a developer would and should take in response to flaky computations always involve retries, and the steps a developer would and should take in response to long-running computations always involve timeouts. Although remote computations are more likely to be flaky and long-running, it is only a correlation, and many local computations can be both flaky and long-running. So the mere presence or absence of a "remote bit" is likely to be insufficient information to be actionable.
If I am wrong, then it should be possible to provide some evidence that:
- Developers know to do, and actually do, something radically different based on the "remote bit", such that it significantly affects correctness or performance or some other metric that matters to the business.
- Developers do this so often that it overwhelms the significant drawbacks of infecting every type signature across the entire code base with a "remote bit" (or at least infecting either all remote code, or all local code, with such a bit, if you can infer its negation from its absence).
Ultimately, my stance is that "effect tracking" is a distraction and a waste of resources, hence my blog post, Effect Tracking Is Commercially Worthless.
That dynamic could change in a future in which tracking things is cost-free or super-low-cost and completely automatic (fully type-inferred), but until when and if that point arrives, I will always be asking proponents of effect tracking to demonstrate (a) actionability of the information, and (b) pervasiveness of need, such that benefits clearly outweigh costs. To my knowledge, no one has demonstrated this in the case of remote vs. local, and it cannot be demonstrated at all in the case of sync vs. async.
Correct. That's the whole point. I thought we were past the sync/async distinction.
Also agreed. So if we want local and remote invocations to have different signatures, then, because of cloud-native, the cost has to be minimised. I think that's the point of @odersky's research project.
Well, I would say that you have demonstrated that two paragraphs below: the actions to take are retries and timeouts, and the frequency is there because of cloud-native.
One point where I would disagree is that local computations need recovery logic like the above to a similar degree as remote ones do. I don't think it's only correlation. Every remote invocation can be flaky / long-running / throw errors randomly. But only some local ones have these characteristics.
Now, I don't have hard empirical evidence that the "remote bit" actually matters, only anecdotal. But on the other hand, is there evidence that a consistent and principled approach to errors originating from remote calls doesn't influence the bug ratio? Especially since these bugs tend to manifest themselves in production, not in the calm and idealised test environment.
Finally, aren't we talking here about error handling, something that is very close to the heart of every ZIO programmer? The whole point of effect tracking, or remote-call tracking, or however we call it, is to properly handle the error scenarios. Java implements this by requiring methods to add `throws IOException`, which is often circumvented by programmers. ZIO moves the error channel to a type parameter, for composability. I don't think it's at all unreasonable to look for other, maybe more general solutions, where errors are just one specialisation of the "effect" a computation might have.
I would be happy if that were true but given other posts on this thread, including, indeed, the nature of the pre-SIP itself, it seems unlikely.
Indeed, Odersky himself stated:
"The sync/async problem is one of the fundamental problems we study [in our 7-person, 5-year project]." (emphasis added)
From my experience, I would say that developers failing to apply retry or timeout logic is not a significant source of lost business revenue, partially because libraries and frameworks are designed to handle it or nudge users into doing the correct thing (e.g. `Http.get` requiring a `timeout` parameter).
It happens sometimes, and it has measurable costs, but the overall amount of revenue lost due to failure to apply retry or timeout logic pales in comparison to the revenue lost dealing with unexpected `null` values, transforming data from A to B without mistakes, or possibly even retrying the wrong thing (e.g. an NPE) because of the lack of a two-channel error model.
Even for resource handling, the main issue in modern web apps is memory leaks; the occurrence of lost file handles or connections in a database pool is made rare by libraries and frameworks (or `try-with-resources` in Java).
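(For reference, the Scala analogue of Java's try-with-resources is `scala.util.Using`; a minimal example, reading a file that may live on local disk or on network-attached storage:)

```scala
import scala.io.Source
import scala.util.{Try, Using}

// The source is closed whether the body succeeds or throws.
val firstLine: Try[String] =
  Using(Source.fromFile("/etc/hostname"))(_.getLines().next())
```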
For things which are not a significant problem in commercial software development, it is all the more important to ensure the costs are minimized, and to ensure that new features aimed at addressing these "problems" produce clear benefits of a magnitude sufficient to overwhelm those minimized costs.
I agree that only some local computations have these characteristics, but not that all remote ones do. For example, if your application is running with EBS or EFS storage, then despite all disk-related operations being remote, they are extremely unlikely to be flaky or long-running.
This raises another important point: sometimes operations that your application may expect to be local are in fact remote. Which means that any attempt to track "local" versus "remote" is at best an educated guess. Indeed, a repository interface may suggest the database is remote, while a particular implementation may be using embedded H2.
To me, this is feeling like researching how many angels can dance on the head of a pin.
Meanwhile, while we discuss whether to embed a remote versus local bit in the type system (in a TBD comonadic effect system that no one is asking for, despite, of course, some academic value), modern cloud-native, industry-focused languages like Ballerina make it trivial to produce and consume cloud services and leverage user-defined data structures in cloud protocols, innovating on real problems that consume massive amounts of developer time.
Which of these focus areas stands to benefit industry the most?
(Actually, we're not even really discussing local versus remote, because most people contributing to this thread seem to believe the async-versus-sync distinction is important to track in the type system.)
In my view, ZIO's error handling works because (a) it is based on values, which allow even polymorphic abstraction over duplication, (b) it is fully inferred, meaning no additional developer work is required to benefit from it ("zero" cost), and (c) it leverages the type system to cleanly separate recoverable errors from non-recoverable errors, with an ability to dynamically shift errors between channels (which is critical in a cloud-native environment, where only some errors should be retried). Java failed on all three accounts, which is, I believe, why checked exceptions are regarded mostly as a mistake (`CanThrow` fails on two accounts, and its potential successor will probably fail on those same two accounts).
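For concreteness, a small example of the separation I mean, using the standard ZIO 2.x API (`readConfig` is a made-up function):

```scala
import zio.*
import java.io.IOException

// The typed error channel (IOException) holds recoverable errors; defects live outside it.
def readConfig(path: String): ZIO[Any, IOException, String] = ???

val program: ZIO[Any, Nothing, String] =
  readConfig("/etc/app.conf")
    .retry(Schedule.recurs(3))            // retry only the typed, recoverable errors
    .catchAll(_ => ZIO.succeed("{}"))     // afterwards the error channel is Nothing
// readConfig(...).orDie would instead shift IOException into the non-recoverable channel.
```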
I would be happy to see another error model that takes this same direction with fewer costs and/or greater benefits, and if that happens to be part of a capability-based (comonadic) effect system geared toward solving problems rather than tracking bits of debatable value, then I would appreciate that as well. But keep in mind that the burden of proof is on those claiming that such a system would be superior to what exists today and that it warrants investment and support from the broader Scala community.
One question that I think we have not answered: What about the error case? If Futures are replaced by VirtualThread with a result (or any of the other options discussed here), how are errors handled?
- Returning a sum of result and error? But then we are back to monadic instead of direct style. And you could argue that if you are working with monads, you might as well work with one that makes it explicit that things can suspend.
- Or throw an exception? But then these need to be tracked somehow
I sketched a `Future`-lite successor in the other thread:
```scala
import java.util.concurrent.CompletableFuture
import scala.util.Try

class Future[A](f: CompletableFuture[Try[A]], vt: Thread /* a virtual thread */) {
  def virtualThread = vt
  def result: A = f.get.get   // blocks the calling (virtual) thread; rethrows any failure
  // etc.
}

object Future {
  def apply[A](code: => A): Future[A] = {
    val f = new CompletableFuture[Try[A]]()
    val vt = Thread.startVirtualThread { () =>
      f.complete(Try(code))   // evaluate the by-name body and record success or failure
    }
    new Future(f, vt)
  }
}
```
`Future#result` gives you access to the result, and if that is an exception, then it would throw.
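For example, with the sketch above (the failing body is just an illustration):

```scala
val ok = Future { "42".toInt }
println(ok.result)                        // 42

val boom = Future { "oops".toInt }
try println(boom.result)                  // rethrows the NumberFormatException recorded in the Try
catch case e: NumberFormatException => println(s"failed: $e")
```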
How to model that exceptional case (whether with Java checked exceptions, `Try` values, `Either` values, `ZIO` values, `CanThrow` capabilities, `CanThrow`'s comonadic capability-type-based successor, or something else entirely) is independent of the sync/async question, and also independent of Loom.
In other words, however we would want to describe an exceptional result from any method, we should use that same process to describe the possible error that may result from calling `Future#result`.
On a somewhat related note, there is another plausible point, which is stack traces + exceptions. The nice thing about `Future` is that it is decoupled from stack traces; that is, by design you are not meant to expect the stack to be consistent (or even be there at all). As is evident to anyone that has used `Future` along with the standard `ExecutionContext` (such as `ForkJoinPool`), the stack traces are meaningless because the computations can jump between different threads at whim (that's the whole point of multiplexing computations onto real threads). You can see this even in the `Future` API, i.e. you have methods like `Future.failed` to designate a failed `Future` with a `Throwable`, but it is just passed around and propagated as a value. You can still recover from exceptions thrown in `Future`, but as stated before it's expensive; critically, you don't have to throw (i.e. you can just use `Future.failed`).
This is one area where I am a bit skeptical of Loom. I haven't looked at Loom in great detail, but if `VirtualThread` is meant to preserve stack traces, and there is a lot of code out there that assumes stack traces are consistent and properly propagated, then unless I am missing something this will have a performance penalty (versus not caring about the stack at all). This performance penalty is already visible right now, either in the case of `Future` with custom `ExecutionContext`s that preserve the stack, or in other IO types that propagate the stack in the interpreter. I do believe that Loom's solution to this problem is not going to have the same overhead, but as said previously, I don't see how it can be "cost free".
Ultimately, though, this is one of the best benefits of doing value-based error handling rather than throwing and catching exceptions: if you throw and expect to catch exceptions, it's expensive, and Scala's IO/async types forced programmers to not rely on the stack for basic error handling (which is a good thing). If my previous point about Loom is correct (i.e. Loom is forced to propagate the stack in order to remain compatible with existing code that relies on try/catch plus stack preservation to function), then that cost applies to everyone. I also haven't seen any ability for Loom to handle stack propagation granularly, so that you don't pay the performance penalty if you don't rely on exceptions.
For this reason alone (and others), despite what people claim, Loom is not going to kill Scala's `Future`, even in the hypothetical where everyone runs JVM 19+ (or whatever version ships Loom) and Scala.js / Scala Native are ignored.
This is not a nice thing. In fact, constructing exceptions in `Future`-based code incurs the cost of building a stack trace without the benefit, because the stack traces so constructed are useless and only reflect the call stack from the last "bounce" inside the execution context to the current operation.
Async stack traces in Loom are the same as sync stack traces, and have the same overhead. You pay this overhead only when (1) your code actually fails, and (2) your exception type actually generates a stack trace (not all exception types are wired to generate stack traces; see also `NoStackTrace`).
I am not a fan of throwing exceptions (versus using typed values), but this is not a reason to prefer values, because if you are using exception types that do not generate stack traces, then throwing tends to be faster than value-based error propagation (`Either`, `Try`, etc.).
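For reference, a minimal sketch of such an exception type, using the standard `scala.util.control.NoStackTrace` mixin (`UserNotFound` is an invented name):

```scala
import scala.util.control.NoStackTrace

// Throwing this costs roughly an allocation: fillInStackTrace is suppressed,
// so no stack walk happens when the exception is constructed.
final case class UserNotFound(id: String) extends Exception(s"no user $id") with NoStackTrace

def lookup(id: String): String =
  if id == "42" then "Zaphod" else throw UserNotFound(id)
```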
Stack tracing is absolutely and positively not a reason that `Future` will survive in a post-Loom world. `Future` has no benefits whatsoever with respect to stack tracing compared to Loom.
`Future` will survive only because people don't want to rip apart legacy code bases, not because `Future` conveys any value with respect to stack traces (or anything else, really, since the marginal utility of its other benefits is better obtained using more modern data types, like a typed VirtualThread, for example).
I agree, re-throwing exceptions is one of the options. But then, should these exceptions be tracked in the type system or not? Previously, the main criticism of exceptions was that they were untracked. We now have a way to track them with `experimental.saferExceptions`, made watertight with capture checking. But that means a successor Future should have the error type as a type parameter rather than fixed to `Exception`, because otherwise info about thrown exceptions is not propagated across futures.
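A rough sketch of what such a successor could look like, keeping the Future-lite shape from earlier but adding an error type parameter (purely illustrative; here the typed error travels as an `Either` rather than via `saferExceptions`, and defect handling is omitted):

```scala
import java.util.concurrent.CompletableFuture

// Hypothetical Future-lite with a typed error channel E, so error information
// propagates across futures instead of collapsing to Exception.
class TypedFuture[E <: Exception, A](f: CompletableFuture[Either[E, A]], vt: Thread):
  def virtualThread: Thread = vt
  def result: A = f.get.fold(e => throw e, identity)   // rethrow the typed error
  def either: Either[E, A] = f.get                     // or inspect it as a value

object TypedFuture:
  def apply[E <: Exception, A](code: => Either[E, A]): TypedFuture[E, A] =
    val f  = new CompletableFuture[Either[E, A]]()
    val vt = Thread.startVirtualThread(() => f.complete(code))
    new TypedFuture(f, vt)
```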
Well, if you care that much about the cost of the stack trace, you can use `scala.util.control.NoStackTrace` for exactly this problem; you will just be passing (almost) a bare reference around.
But this is all beside the point, because if you care so much about the cost of stack traces, then you shouldn't even be using `Future.failed` (or throwing/catching), which goes back to the point of using value-based error handling.
How does this work exactly? The reason why stack traces are "free" with normal threads is that the stack is part of the OS thread, and since you are already paying the cost of a heavy OS thread, passing the stack along doesn't cost anything.
On the other hand, the whole point of green-thread / fiber implementations is that they typically do not carry a full "stack" and have a very small footprint (e.g. around 1 KB for Erlang), so while catching/throwing can be made free, preserving the stack trace, especially across large non-local calls, is another story. Of course you can just pass the incrementally growing stack along in your virtual thread, but that has a performance penalty, and you also run into problems due to the cache locality of threads.
Well, it's another reason on a bucket list of reasons. But regarding the rest of your point: there is no fundamental reason why error propagation with values is less performant than using try/catch without stack propagation, because in the end it all amounts to the same thing, i.e. a control-flow mechanism. On the JVM, value-based error handling can be slower, but that's because the JVM isn't that optimized for it; you can look at Go instead, which has optimized its runtime for value-based error propagation. (Note that my response also assumes we are comparing apples to apples, i.e. if you are referencing error values then you also need to compare that to catching exceptions in order to use the value of the exception being caught.)
More concerningly, though, if you care that much about Loom and the JVM: typical JVM/Java code does preserve and propagate the stack. `scala.util.control.NoStackTrace` is a Scala-specific feature, and I don't even remember seeing Java programs create their own version of it to mitigate the cost of stack propagation; in such cases they use values/null if they care about performance that much.
I think you misunderstood my point: the benefit of `Future` is precisely that it forced programmers NOT to care about the stack at all, and also NOT to use it as the primary error-handling mechanism.
This reminds me of the exact same argument that people were using to claim `sun.misc.Unsafe` had no reason to exist. In the worst-case scenario, even in the context of a library creator/maintainer, such abstractions are necessary, and it has nothing to do with legacy. Whether people like it or not, `Future` is not going anywhere, for reasons aside from legacy.
In order for you to see how my statements are correct, I would have to explain `Future`, the cost of stack-trace generation, the connection between `Future` and stack traces, the cost of throwing exceptions, the cost of catching exceptions, and the cost of value-based error propagation (both in theory and as practiced in `Try`, `Either`, `ZIO`, etc.), and possibly more besides.
I have no interest in explaining these things here, but I will repeat myself: exceptions, or error handling in general, are NOT a reason to use `Future`, not even slightly (if anything, the reverse), and Loom's impact on exception handling is only a net positive.