PRE-SIP: Suspended functions and continuations

There is also HVM which does automatic parallelisation of purely functional programs https://github.com/Kindelia/HVM which you may find interesting.

To do something like this in Scala it would require a way for tracking purity, i.e. the compiler needs to know that functions are pure/side effect free which currently isn’t possible (you can use types at runtime to designate that computations are pure however the compiler just sees it as a type and nothing more).

Finally you would need to see if its possible for the JVM to show the same performance characteristics that HVM via clang does.

2 Likes

Well, it’s interesting, but limiting automatic parallelism to only purely functional code reduces its applicability too much.

Add a soft keyword to solve a problem that will not be valid soon after Loom been released maybe not a good idea.

1 Like

I think I would frame the question differently; at least for me “suspension” sounds quite abstract and hard to reflect in the real world.

Maybe I’m oversimplifying, but isn’t the crux of the problem answering the question whether we want side-effecting and “normal” methods to have the same signature? (I tried asking the same on twitter some time ago, but without conclusive answers :wink: ).

If the signatures should be different - then the second step would be considering specific solutions. Coloring using IO, coloring using suspend or coloring using capabilities - I think these are the propositions on the table.

If however the method signature shouldn’t tell us whether the method is side effecting, and if we are targeting a Loom runtime - well then neither suspensions, nor capabilities are going to be useful.

Not that I know the “right” answers, however I am leaning towards having side-effecting and “normal” methods distinguished by the type system. The reason is simple, and I think quite well-known in the literature as RPC fallacy. People did attempt to make a remote call look as if it was a local call, and as far as I know, they all failed. To the point that it’s now pretty well established that it’s a bad idea. To give more context, take a look at Jonas Boner’s presentation. (Maybe Lightbend wasn’t so wrong about async after all :wink: But then, maybe the are talking about “async in the large” not “async in the small”?)

It’s all about failure modes - the ways in which a remote call can fail are vastly different from the ways a local call can. And by the way - all file-reading operations are network calls as well. Shouldn’t we be tracking this in our type system? The side-effecting operations mentioned here would probably be more or less what today we know as “blocking” operations, but as blocking is no longer an issue with Loom, we need to make our focus more precise (and that’s a good thing!).

These side-effecting capabilities can be more or less fine grained, but a direct consequence of a capability being required or not in a function, is how the function might fail, which errors and how be handled and how. And this brings direct runtime consequences.

In yet another words: I would first established what kind of properties we want to track through the type system in Scala. Reading through the (very interesting) proposal and discussion, I think there are quite diverging opinions. But only once a goal is set (not necessarily to unanimous applause), as to how far the type system should go, we can consider the (secondary to safety) syntactical approaches: using either wrapper, suspended or direct style.

8 Likes

The interesting point to make here is that out of co-incidence in almost all cases async tasks also happen to be side effecting, that is pretty much every IO/file/network operation also happens to be a side effect. What this means in practice is that even if you deliberately avoid the color function problems for async tasks (i.e. you make no distinction between asynchronous and non synchronous computations), if you still care about strongly typing your side effects then you pretty much end up re-creating red-blue color problem anyways.

This actually describes the history and design of Haskell. That is, even if Haskell didn’t have virtual/green threads and solved the IO vs CPU bounded computations in a different way, it would still have the IO because thats how Haskell is able to solve the “representing side effects in a purely functional language” problem.

I would argue that this is the reason why making a big deal out of the red-blue/color function problem is a bit benign because if you accept the preposition that a significant portion of the Scala programmers track side effects via types then you end up, by accident, marking your computations as asynchronous anyways. Which brings us to final point, if there is significant (usually in practice pretty much complete) overlap between marking async functions and marking side effecting functions, doesn’t it make sense to take advantage of this since we are solving 2 “problems” at once?

2 Likes

This is something commonly heard, but it doesn’t pass the sniff test.

  • The aws CLI command is called the same way as local CLI commands
  • The boto3 Python library is called the same way as local Python libraries
  • requests.get calls in Python looks the same as any other method calls

Yes, treating RPCs the same as normal methods can fail. In high performance or high concurrency scenarios, where the thread overhead is unacceptable. Or in high-reiability scenarios, where the novel failure modes become significant.

But to say “they all failed” is absurd. There are more people happily using Python’s requests alone than there are in the entire Scala community. Probably the majority of the world is treating RPCs like normal method calls, and it generally works reasonably well.

Sure sometimes treating RPCs as normal methods has caveats and overheads, and sometimes it falls apart, but that’s not unique to RPCs: every abstraction has caveats and overheads, and scenarios they fail. But that doesn’t mean they’re failures in general, it just means that specialized use cases sometimes call for specialized tools or techniques.

5 Likes

I think you might be comparing apples and oranges here.

CLI commands have a single (rather coarse-grained) path to handling failures (die with some error code), so the difference between aw and less failing is much less relevant than the difference in failure modes between a pure function and a database query.

The python libraries are also not really equivalent comparisons for a similar the same reason: idiomatic error handling in Python is to just throw an exception, so two Python functions which both have a return type and may-or-may-not throw exceptions (but you’d better assume they do) aren’t a great analog for how failure modes are handled in idiomatic Scala.

It would make sense that, if async computations can be made so performant that the difference between a pure function and side-effecting network call can be made invisible on the JVM, that this would be a boon to Java applications, and in this context, Loom replacing Future in Java applications makes a lot of sense.

However: having recently had to try to answer the question, “how many ways can critical method X fail”, in a part of a codebase that (while written in Scala) used the Exception-first style, I can say with certainty that moving to this sort of style would be a mistake.

3 Likes

Note that despite the sensationalist claims of its author, this thing is not viable at all. It fails catastrophically on certain program shapes. The author says a type system could rule out these program shapes, but no type system that does something like that has been demonstrated so far AFAIK.

1 Like

I don’t think so. This is a non-goal for many in the Scala community, including I believe Odersky (citation needed). Capture tracking’s notion of purity does not correspond to functional programming’s notion of purity. Indeed, neither concept embeds the other completely, so while they overlap in some cases, they are genuinely different concepts.

There is probably no future Scala version (from EPFL) that tracks what functional programmers mean by ‘purity’. Scala is a hybrid language with a user base beyond pure functional developers, and an implicit goal to capture Python-like markets, which entails an embrace of procedural programming.

Java already makes them identical via various RPC frameworks. The main problem is inefficiency. Loom allows you to make them look identical while still retaining efficiency.

There are extremely compelling reasons to do so:

  • Handle RPC errors with try/catch/finally (the value of this CANNOT be overstated)
  • Abstract over both local and remote implementations
  • Write resource-safe code using ordinary language mechanisms (try-with-resources, try/finally, etc.)
  • Single-colored functions

Perhaps in a new programming language designed for cloud-native computation, one would have some differences (to be proposed) between local and remote computations.

But for ordinary programming languages designed prior to the advent of cloud-native systems, the pros of having a uniform computation model vastly outweigh the benefits (indeed, the uniformity is a primary driver of adoption for functional effect systems!).

Moreover, the drawbacks have been overstated. There are two main drawbacks to RCP-calls-as-ordinary-function-calls:

  1. Failure with new error types. RPC calls may fail in new ways that application code may not anticipate or necessarily know how to deal with. I think such is largely solvable without new language constructs by better design of RPCs.
  2. More seriously, timeout and retry behavior. RPC calls are flakier than local calls and subject to significantly longer delays. However, these have robust solutions that work across both local procedure calls and remote procedure calls: retry strategies and timeout policies. Retry strategies properly apply to recoverable errors and are useful in local and remote contexts; timeout policies too, are useful in both local and remote contexts. Frameworks (or, to take a more extreme point of view, libraries and even programming languages) should take special care to separate recoverable and non-recoverable errors and provide compositional ways of applying both retry and timeout policies.

Currently Loom does not provide a lot of machinery to help with (2). However, it provides a solid foundation for library authors to develop their own approaches to solving these challenges, based on underlying language primitives that are proven and familiar to developers.

More precisely, today we have “async blocking”, which happens when a fiber / virtual thread suspends, waiting external re-activation, and we also have “sync blocking”, which happens when a physical thread suspends, waiting external re-activation. What Loom is doing is upgrading almost all “sync blocking” to “async blocking”. Semantically, they’re all blocking, it’s just a question of efficiency: async blocking is vastly more efficient than sync blocking, so it’s merely a sort of optimization applied retroactively to the masses of synchronous code that have already been written.

I do not think that question will ever have agreement, which maybe argues Scala should be more opinionated so as to select for a user base compatible with its goals. But it is clear that no official answers will be forthcoming until capability-based research program is closer to completion (ETA: 5 years). And until then it is extremely risky to modify the language, especially in ways that import already-obsolete Kotlin designs into the much more modern Scala 3 programming language.

2 Likes

I don’t know for the general use case, but ZIO solved that issue pretty well. It creates a very insightful error trace, with what code would have been executed next (in the context of the app, not the internal fiber management weaving). Very actionnable, debug is (almost) as simple as in mono-threaded code.
And if I followed things correctly, in ZIO it’s even cheap (in runtime perf - almost free, even)

2 Likes

Yes, although one of these two problems, on one of the platforms (JVM/Loom) is set to disappear. Hence my proposition to shift the focus of the problem on something that isn’t platform-dependent :slight_smile:

I didn’t say I want to track purity :wink: That’s probably too much. Writing to mutable state? Probably not. Performing a network call? Probably yes. Maybe tracking non-local computation would be a good, precise term?

(in fact you propose the same in the next section, as I now see)

Java already makes them identical via various RPC frameworks.

Not always - you often get different checked exceptions, which is a way of “marking” a method as side-effecting. Where we have IO[], java often has throws IOException - both influence the signatures. But again given history, we might be looking for better solutions than checked exceptions (I think in general in Scala we are looking for better solutions to various problems :slight_smile: ).

Do we want the compiler to point out that we might not be handling all the error cases that we should (which could lead to applying e.g. a retry/timeout strategy)? I think in typed a language the answer might be “yes”.

I do not think that question will ever have agreement, which maybe argues Scala should be more opinionated so as to select for a user base compatible with its goals.

There definitely won’t be agreement, but luckily we have EPFL and Martin who picks the direction as the where Scala should be headed (with input from the community of course, but ultimately somebody has to make some choices from time to time).

1 Like

I think they failed as ultimately you do need to tackle the fact that an RPC call fails differently from a local call. Now this might be done with discipline (in Python) or with the help of a compiler (in Scala) - that’s a dynamic vs static typed discussion, people have different preferences, and that’s completely fine.

But there are no magic solutions which make RPC calls behave just as local calls. You need different code when doing an RPC, than when doing a local call. (note that this code might be far away from the invocation site, somehere in an error handler, but it still needs to be there).

2 Likes

That is what we call “effects”. Interaction between an automaton and its environment: I highly recommend Oleg Kiselyov's talk titled "Having an Effect"[0] in which he... | Hacker News

You seem to want an effect system.

Thanks, I’ll take a look.

I might be indeed looking for what’s known as an “effect system” in literature, however I have the feeling that outside of academia, the term “effect tracking” is an overloaded term, with many possible meanings (covering mutable state, async, remote computations etc.). So maybe a more precise one would suit our communication better.

There’s something that still isn’t clear for me from the discussion. Does Loom somehow solve the classic N+1 problem? I.e. let’s say I have a function that does a Google search: def google(str: String): List[URL]. Now I try doing this:

val list : List[String] = ???
list.map(str => google(str))

How does Loom ensure this is done efficiently, i.e. by spawning one thread per element of list?

If we tracked in types that google can perform a costly block, we would be able to use that information to, perhaps, forbid the above piece of code. Perhaps there should be a variant of map which always spawns a thread per element and allows blocking operations.

Regardless of what the exact solution is, tracking in types that google can block seems better than the situation where it’s easy to have into performance problems when using it. Sure, similar problems would occur with computation-intensive function as well, but I feel like they occur much more easily once we start doing async programming and a single function call could suddenly take 100ms, or whatever is the local Google roundtrip time.

1 Like

You are conflating concurrency with asynchronicity.

Loom does not change the semantics of your code: in particular, it does not automatically insert any concurrent operations, nor does such a thing make sense in general (see above academic references on auto-parallelization, which is fraught with known issues).

Loom merely takes your synchronous code (that is, code formerly using physical threads and operations like IO or locks that “sync block” those threads) and makes it fully asynchronous (using virtual threads and “async blocking”, which is more efficient than “sync blocking”).

As such, maybe in your code base, you have some code like list.map(str => google(str)), where each invocation to google blocks a physical thread. Under Loom, the code has the same meaning and will produce the same result, only google can now be fully asynchronous (which does not imply it is concurrent with respect to the thread executing the List#map, because it is NOT concurrent), which means you get the same behavior before but it runs more efficiently.

Loom is all about efficiency, not concurrency, per se: taking the same programs and making them work better. As a consequence, you can now do “async operations” (i.e. “efficient operations”) anywhere without having wrapper types like Future, including in List#map.

2 Likes

Ok, so you want to track “local” computation versus “remote” computation. First off, that would not be related to async versus sync tracking: both sync and async can do remote computation, the only difference is efficiency.

Second, in the era of cloud-native applications, the cloud itself has become a sort of standard library: every other call is to some microservice or GraphQL or REST API. Our applications are the glue that hold together operations implemented in the cloud. So tracking “remote” computation may be increasingly and incredibly noisy, as we enter a future in which nearly all calls might be “remote”.

Third, and in my opinion, it is very important to not be obsessed with “tracking” things for the sake of academic novelty (which is good for obtaining grant money but bad for commercial software). Tracking information using types involves considerable effort for developers, who have to type more characters and wrestle with more mistakes (see also: uninferrable exception lists in Java). You can, like Odersky is trying to do, reduce the cost of tracking–preferrably NOT via inserting more magic fraught with edge cases that works in unexpected ways with other language features, such as “auto-adaptation” in context functions–but fundamentally, you must still acknowledge it has a cost.

To pay for itself, you have to demonstrate that the information is (a) actionable, and (b) so frequently actionable that the costs of universal tracking are outweighted by the proven benefits.

I have not even heard a hand-wavvy argument on remote vs local being actionable: what would a developer do differently, knowing that “doX()” is a remote call versus a local call? What would the developer do differently, knowing that “doX()” is a local call versus a remote call? Not abstractly, but what concrete code would a developer write knowing such a difference?

I have argued above that the steps a developer would and should take to flaky computations always involves retries, and the steps a developer would and should take to long-running computations always involves timeouts. Although remote computations are more likely to be flaky and long-running, it is only a correlation, and many local computations can be both flaky and long-running. So the mere presence or absense of a “remote bit” is likely to be insufficient information to be actionable.

If I am wrong, then it should be possible to provide some evidence that:

  1. Devleopers know to do and actually do something radically different based on the “remote bit”, such that it significantly affects correctness or performance or some other metric that matters to the business.
  2. Developers do this so often that it overwhelms the significant drawbacks to infecting every type signature across the entire code base with a “remote bit” (or at least, infecting either all remote code, or all local code, with such a bit, if you can infer its negation by its absence).

Ultimately, my stance is that “effect tracking” is a distraction and a waste of resources, hence my blog post, Effect Tracking Is Commercially Worthless.

That dynamic could change in a future in which tracking things is cost-free or super-low-cost and completely automatic (fully type-inferred), but until when and if that point arrives, I will always be asking proponents of effect tracking to demonstrate (a) actionability of information, and (2) pervasiveness of need, such that benefits clearly outweigh costs. To my knowledge, no one has demonstrated this in the case of remote vs local, and it cannot be demonstrated at all in the case of sync vs async.

2 Likes

Correct. That’s the whole point. I thought we’re past the sync/async distinction :wink:

Also agreed. So if we want local and remote invocations to have a different signature, because of cloud-native the cost has to be minimised. I think that’s the point of @odersky 's research project.

Well I would say that you have demonstrated that two paragraphs below: the actions to take are retries and timeouts, the frequency is there because of cloud native.

One point where I would disagree is that local computations need recovery logic as above to a similar degree as remote do. I don’t think it’s only correlation. Every remote invocation can be flakey / long-running / throw errors randomly. But only some local ones have these characteristics.

Now, I don’t have hard empirical evidence that the “remote bit” actually matters. Only anecdotal :wink: But on the other hand, is there evidence that a consistent and principled approach to errors originating from remote calls doesn’t influence the bug ratio? Especially that these bugs tend to manifest themselves in production, not in the calm and idealised test environment.

Finally, aren’t we talking here about error handling - something that is very close to the heart of every ZIO programmer? The whole point of effect tracking, or remote-call tracking, or however we call it, is to properly handle the error scenarios. Java implements this by requiring methods to add throws IOException, which is often circumvented by programmers. ZIO moves the error channel to a type parameter, for composability. I don’t think it’s at all unreasonable to look for other, maybe more general solutions, where errors are just one specialisation of the “effect” a computation might have.

3 Likes

I would be happy if that were true but given other posts on this thread, including, indeed, the nature of the pre-SIP itself, it seems unlikely. :grinning_face_with_smiling_eyes:

Indeed, Odersky himself stated:

“The sync/async problem is one of the fundamental problems we study [in our 7 persons over 5 year project].” (emphasis added)

From my experience, I would say that developers failing to apply retry or timeout logic is not a significant source of lost business revenue, partially because libraries and frameworks are designed to handle or carot users into doing the correct thing (e.g. Http.get requiring a timeout parameter).

It happens sometimes, and it has measurable costs, but the overall amount of revenue lost due to failure to apply retry or timeout logic pales in comparison to the revenue lost dealing with unexpected null values, transformating data from A to B without mistakes, or possibly even retrying the wrong thing (e.g. NPE) because of the lack of a two-channel error model.

Even for resource handling, the main issue in modern web apps is memory leaks; the occurrence of lost file handles or connections in a database pool is made rare by libraries and frameworks (or try-with-resources in Java).

For things which are not a significant problem in commerical software development, it is all the more important to ensure the costs are minimized; and to ensure that new features aimed at addressing these “problems” produce clear benefit in magnitude sufficient as to overwhelm those minimized costs.

I agree that only some local computations have these characteristics, but not that all remote ones do. For example, if your application is running with EBS or EFS storage, then despite all disk-related operations being remote, it is extremely unlikley to be flaky or long-running.

This raises another important point: that sometimes operations that your application may expect to be local, are in fact remote. Which means that any attempt to track “local” versus “remote” is at best an educated guess. Indeed, a repository interface may suggest the database is remote, while a particular implementation may be using H2 embedded.

To me, this is feeling like researching how many angels can dance on the head of a pin.

Meanwhile, while we discuss whether to embed a remote versus local bit in the type system (in a TBD comonadic effect system that no one is asking for, despite, of course, some academic value), modern cloud-native, industry-focused languages like Ballerina make it trivial to produce and consume cloud services and leverage user-defined data structures in cloud protocols, innovating on real problems that consume massive amounts of developer time.

Which of these focus areas stands to benefit industry the most?

(Actually, we’re not even really discussing local versus remote, because most people contributing to this thread seem to believe the async versus sync distinction is important to track in the type system.)

In my view, ZIO’s error handling works because (a) it is based on values, which allow even polymorphic abstraction over duplication (b) it is fully inferred, meaning no additional developer work is required to benefit from it (“zero” cost), and (c) it leverages the type system to cleanly separate recoverable errors from non-recoverable errors, with an ability to dynamically shift errors between channels (which is critical in a cloud-native environment, where only some errors should be retried). Java failed on all three accounts, which is, I believe, why checked exceptions are regarded mostly as a mistake (CanThrow fails on two accounts, and its potential successor will probably fail on those same two accounts).

I would be happy to see another error model that takes this same direction with fewer costs and / or greater benefits, and if that happens to be part of a capability-based (comonadic) effect system geared toward solving problems rather than tracking bits of debatable value, then I would appreciate that, as well. But keep in mind the burden of proof is on those making the claim that such a system would be superior to what exists today, and that it warrants investment and support from the broader Scala community.

2 Likes