PRE-SIP: Comprehensive Function Applications

diesalbla · September 1, 2022, 11:16am

The following is a proposal for a small Scala language improvement.

The purpose of this thread is to discuss the proposal, gather support or objections from users of the Scala language, get an overview (from compiler engineers) of its feasibility.

Comprehensive function application (syntax candy)

We propose to extend Scala with comprehensive function applications, a syntax to apply normal functions (or class constructors) to expressions with a wrapped type, so as to avoid using for-yield comprehensions. By wrapper we mean any data-type W[A], such as Option, List, or Future, with one type parameter and for which the map and flatMap operations are defined.

Example

As an example, we want to apply dwarves to a sequence of expressions as arguments.

def dwarves(fili: Int, kili: Char, ori: String, nori: Boolean): Double

In a plain context each argument evaluates to a plain value and the function returns a plain result. Thus, we can put those expressions inside the list of arguments, with or without parameter names, or we can extract them as local val.

// auxiliary declaration
def fetchOri(): String = "In a hole"

// Direct application
dwarves((18 / 3) * 7, 'g', fetchOri(), false)

// named parameters
dwarves(
  fili = (18 / 3) * 7,
  kili = 'g',
  ori = fetchOri(),
  nori = false
)

// Local `val` declarations.
val f = (18 / 3) * 7
val k = 'g'
val o = fetchOri()
val n = false
dwarves(fili = f, kili = k, ori = o, nori = n)

However, if any expression has a wrapped type, such as Option[Int], we cannot apply dwarves directly to it. We would need to remove the wrapping (which for some types like Future is not advised), or pop the wrapping up to the caller of the function. This is often done using the flatMap and map operations, or alternatively a for-yield syntactic sugar;:

def fetchOri(): Option[String] = Some("In a hole")

// map-flatMap 
val fili = (18 / 3) * 7
Some('g').flatMap { kili =>
  fetchOri().map { ori =>
    val nori = false
    dwarves(fili, kili, ori, nori)
  }
}

// for-yield 
val f = (18 / 3) * 7
for {
  k <- Some('g')
  o <- fetchOri()
} yield dwarves(f, k, o, false)

But the first way adds some boilerplate with map, flatMap, nesting, and the continuation block. The for comprehension hides some of that, but it still has the for-yield keywords, and forces us to write all <- bindings before the function call the local val declarations above. What we cannot do is to directly pass those expressions with a wrapped type inside the list of arguments of the function, as we can in a plain context. We propose such a syntax, which we call of comprehensive applications, which resemble named parameters but use <- bindings for wrapped arguments.

// comprehensive application
dwarves(fili = (18 / 3) * 7, kili <- Some('g'), ori <- fetchOri(), nori = false)

Proposal

Syntax: In section 6.6 Function Applications of the Scala Language specification, modify the ArgumentExprs category by adding a new case, that of ArgExprs, where an ArgExprs is a a comma-separated list of ArgExpr, each ArgExpr being either a) an assigned argument assignment id = Expr, or a bound argument id <- Expr.

ArgumentExprs ::= '(' [ArgExprs] ')'
ArgExprs  ::= ArgExpr {',' ArgExprs}
ArgExpr ::= Expr | ArgBind
ArgBind ::= id '<-' Expr
         |  id  = Expr

For checking symbols, for a construction of ArgExprs the compiler should check that all the identifiers at the left hand side of either <- or = are parameters of the function, without repetitions, and all non-defaulted parameters appear.

Typing if a function application has at least one bound argument, then all expressions on the right hand side of a bound argument should be able to be unified to a same wrapper type. This wrapper type being the lowest upper bound of the wrapper types of all bound arguments (e.g. Some and None to Option). The type inferred for the application would then be the return type of that function, lifted to the wrapper type (e.g. Option[Double]).

Semantics: To reduce asymmetry between direct an indirect style, a function application with bound arguments should have a similar meaning as current functiona applications do, as specified in Section 6.6.1 of the specificaation. To this end, a function application that contains at least one binding argument should be equivalent to a block of Scala constructed as follows:

any assigned argument before the first bound argument is extracted, either as a local val for by-value parameters or as a lazy val for by-name parameters. The order of these expressions should be the order of the arguments in the application, not the order of the parameters in the function.
any bound argument p <- e before the last bound argument is rewritten into a e.flatMap { p => <body> } block. Any assigned argument following it should be extracted within the body, and the next bound argument should be the last element in the block.
the last bound argument p <- e should be transformed into an e.map { p => <body> } block, where the body contains the rest of the arguments (extracted) and the function call.

Wrapped expressions should be un-wrapped (by map or flatMap) in the same order as they appear in the call, which may differ from the order of those parameters in the function definition. A bound argument for a by-name parameter would still perform the flatMap on that expression, even if the value it yields may not be needed, whereas assigned arguments would not be evaluated.

Following this translation, the “comprehenisve application” code above is translated to the “map-flatMap” code above.

Advantages: supporting comprehensive applications has some advantages for language improvement:

It follows on the principle of Scala as a functional language, in which function application should be the preferred way to compose expressions.
Leaner syntax: for simple cases it makes a for-yield comprehension, or a map-flatMap call, unnecessary.
Symmetry: it closes a gap between direct and indirect style, putting these styles just a <- operator apart.
Non-intrusive: It requires no new reserved words or operators, since it reuses the <- operator.
Compatible: It can be used in the same contexts and data types that a for comprehension can.

Limitations: the semantics for bound arguments is aligned with that of plain arguments, but only for by-value parameters. However, this alignment is not as clear for by-name parameters. Consider the following example:

def sloth(fili: Int, kili: => Char, ori: => String, nori: Boolean): Double = 
  if (nori) foo(fili, ori) else bar(fili, kili)

def lazyOri(): String = { println("Hi"), "In a hole" }

// plain call 
sloth(fili = (18 / 3) * 7, kili = 'g'       , ori  =      lazyOri() , nori = false)

// comprehensive call 
sloth(fili = (18 / 3) * 7, kili <- Some('g'), ori <- Some(lazyOri()), nori = false)

// for comprehension:
for {
  kili <- Some('g')
  ori  <- Some(lazyOri())
} yield sloth((18 / 3) * 7, kili, ori, false)

In the plain call the side effect (printing "Hi") is never run, since the expression is a by-name parameter that is never used. In the comprehensive call, however, it is run within a second flatMap, since that is closer to for comprehensions than to plain calls. Threading laziness into the function would mean altering the function code itself, which is a very different proposition.

Possible extensions, excluded from proposal

Our proposal is only focused on using named bound arguments p <- e within some function calls. In our experience, this would avoid some frequent uses of for comprehensions, and would be an unambiguous extension to the syntax. Nevertheless, the same idea could be applied to other constructs.

We could handle positional bound arguments by using underscores instead of parameter names, which would extend our proposal to applications of anonymous functions.

val mine: (Int, Char) => Double

mine(42, 'g') : Double
mine(_ <- Option(42), 'g') : Option[Double]

Likewise, we could support binary operations by writing such anonymous bindings inside parentheses.

           42   + 13  : Int
(_ <- Some(42)) + 13  : Option[Int]

// translated to 
Some(42).map(_ + 13)

In particular, we could add an ad-hoc translation for short-cut boolean operators, to avoid a second map or flatMap if an operand is not needed.

primQ: Option[Boolean]
quinQ: Option[Boolean]
(_ <- primQ) || (_ <- quinQ)

// normal, strict-bindings , translation 
primQ.flatMap { q => quinQ.map { a || b} }

// short-cut translation
primQ.flatMap { q => if (q) Some(true) else quinQ }

We could use a similar syntax inside the conditions of an if expressions (always in parentheses for Scala 3), with the net effect of lifting the results of both branches.

if (cond) "fili" else "kili"  : String

(if (_ <- condF()) "fili" else "kili" ): Option[String]

Ultimately, we could also add binding local declarations val id <- expr, inside blocks (like function bodies). This would turn the rest of the block, from the first binding to the end, into a silent for comprehension.

// for comprehensions 
val techpoint: Option[Int] = 
  for {
    x <- foo() 
    y <- bar()
  } yield x + y + 1

// Binding `val` declarations
val techpoint: Option[Int] = {
  val x <- foo()
  val y <- bar()
  x + y + 1
}

However, it is not clear how much ambiguity this would cause when combined with the rest of the language syntax. Furthermore, the overuse of underscores could not help to make the Scala language clearer. Perhaps it could be provided with caution.

Similar or related work

Haskell has its idiomatic way of dealing with wrapped values. All functions are currified: they all take one parameter and are applied using the binary $ operator (or whitespace). In a similar way, the Applicative type class defines two binary operations, <$> and <*>, that mimic to a degree the same application syntax.

(<$>) :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b

(+)  $       1  $       2  -- 3 
(+) <$> Just 1 <*> Just 2  -- Just 3

// Our example in Haskell
dwarves :: Int -> Char -> String -> Boolean -> Double
fetchOri :: Maybe String

dwarves <$> pure ((18 / 3) * 7) <*> Some('g') <*> fetchOri <*> pure false

The Idris language, which is close to Haskell, provides a feature of Idiom brackets for lifting function applications to wrapped types. This syntax is inspired by the article Applicative programming with effects. With this syntax, a function application over wrapped types is written between [| and |] delimiters. To the best of our understanding, our example would be written as:

[| dwarves ((18 / 3) * 7) 'g' fetchOri false |]

In cats, the Apply type-class defines a set of [mapN functions(implemented, implemented with sbt-boilerplate, with which one can write a tuple of wrapped values and apply a function to the results they yield.

( Some('g'), fetchOri()).map2((18 / 3) * 7, _, _, false)

This sometimes sidesteps the for-comprehensions, but it is not totally convenient. First, with the mapN combinator one writes the arguments first and the function afterwards, a reversed style that looks strange. Second, because a function is passed in eta form, once a function has too many parameters it can be hard to track the relation between arguments and parameters. It should also be noted that cats provides other combinators such as tupled and parMapN, for other purposes.

better-monadic-for is a compiler plugin for Scala 2 that fixes some known snags and puzzler of for comprehensions, and improves the desugaring of the basic for to allow some redundant operations. It does not provide a syntax for function applications. This improvements were submitted for consideration into the language in Pre-SIP: improve for-comprehensions functionality.

Monadless implements as an alternative to for comprehensions with two Scala macros. The lift macro denotes a block of code inside of which wrapped types are composed. Within it, the unlift function marks extracting the value from an expression of a wrapped type. To the best of our understanding, our example would be written with monadless as follows:

lift {
  dwarves( 18 / 3 * 7, unlift(Some('g')), unlift(fetchOri()), false )
}

This code needs no parameter names, the calls to unlift are not more noisy than the <- operator, and it just adds one nesting level. Nevertheless, monadless has a wider scope than our proposal, since it supports more complex expressions and statements. Although such a change would have some benefits that make it worth exploring, it could also seems a bit more difficult than this proposal.

jducoeur · September 1, 2022, 12:30pm

The view from the peanut gallery:

The boilerplate reduction here is quite neat – this does seem to be pretty much minimal syntax. IMO the underscore variant is probably necessary: I don’t often think about parameter names in real-world coding, and it would be awfully inconvenient to have to always look them up.

I’m a little concerned from a code-understanding POV. Thinking about this from the viewpoint of a newer Scala programmer (or even someone reading quickly), I suspect that the sheer lack of ceremony is going to produce some moments of serious confusion about why (in your first example), dwarves() is returning an Option[Double] instead of Double. I suspect folks would get used to it, but I’m not sure how big a hiccup that would be.

It would be interesting to see some worked examples of this using async monads, which are really the interesting use case – Option is a nice simple case to talk about, but in practice probably 90% of my usage would be Future / IO / ZIO. (My intuition is that it would work pretty well, but it wants some thought.)

Overall, though, it’s very intriguing. It still feels very much like Scala, just a bit more concise.

Sporarum · September 1, 2022, 1:19pm

I am not very convinced this is a good idea.

(CFA = Comprehensive Function Application)

It feels like an elegant extension of the language, but for me it breaks a pillar of function application:
“Function application returns a value of the type described in the function’s signature”

(You could argue partial application already does that, and I agree, but it is used sparingly, whereas CFA would get used everywhere)

Note that this problem composes:

def id[T](x: T): T = x
val g = Some('g')
val same = id(x <- g)
dwarves(fili = (18 / 3) * 7, kili <- same, ori = lazyOri(), nori = false)

To know the return type of dwarves, you need to consult the signature of g !
And there is a very real risk people start coding like the above

For me these are deal breakers, but there might be a small change that makes them go away, maybe Option dwarf(kili <- Some('g'), ...) (the expected type), maybe for dwarf(kili <- Some('g'), ...), or maybe even for[Option] ...

There is also the question of, is the for we have that bad ?
It has the merrit of being relatively concise, and even it already causes some issues with people learning the language, issues CFA would only exacerbate

However, I do want to congratulate you on this proposal, it is very well thought through, with lots of examples, and very well explained!

Small correction:

I think this should be “lowest upper bound” instead

Baccata · September 1, 2022, 2:05pm

I personally think that the existence of userland solution that are close enough to the desired result (and happen to have a larger scope) warrant the inclusion of bespoke syntax to the language.

dotty-cps-async is an alternative to monadless for instance, and makes it pretty simple to customise the lift/unlift keywords to something that would be more palatable. This is actually what cats-effect-cps does for the Scala 3 support.

In ortherwords, you could have something like :

lift(dwarves( 18 / 3 * 7, Some('g').unlift, fetchOri().unlift, false))

Additionally, I think that the hypothetic syntax you’re proposing would have a lot more value in the context of Applicative composition than Monadic composition, and there is strictly nothing built-in the Scala syntax or stdlib that even acknowledge the concept of Applicative.

VladKopanev · September 1, 2022, 2:18pm

Alternative suggestion, what if we implement monadic values inlining in for-comprehension, like in Idris, e.g.:

for { 
 user <- createUser(User(name, !randomUUID))
} yield user

Where randomUUID is of type IO[UUID].
Or something similar, it shouldn’t necessarily be a bang operator for inlining.

This way, it would be evident from the code that we are working in the monadic context and that expressions prepended with the inline operator would transform into another flatMap without changing the return type of the function the call of which we are inlining expression into.

gabro · September 1, 2022, 2:52pm

I agree this is a major drawback, especially when reading code in a non-interactive environment.

I think the proposal is very well put together, but this aspect (i.e. the fact that the return type changes based on the arguments beyond what the signature says) should be at least included in the “limitations” section.

jpablo · September 1, 2022, 6:11pm

To add another data point, Lean 4 supports a variant of this:

def dwarves (fili: Int) (kili: Char) (ori: String) (nori: Boolean): Option String := sorry

def fetchOri: Option String := 
  some "In a hole"

def fetchKili: Option Char :=
  some 'g'

def fetchFili: Option Int := 
  some ((18 / 3) * 7)

def a: Option String := do
  dwarves (<- fetchFili) (<- fetchKili) (ori := <- fetchOri) (nori := false)

def get1 := some 6
def get2 := some 9

def b: Option Int := do (<- get1).mod 3 + (<- get2).mod 3

example: b = some 0 := rfl

The catch is that is just syntactic sugar for regular do notation application, so there’s no “unwrapping” of anything.

In my experience it works pretty well, with the caveat that it’s a bit strange when calling a “method” immediately; IMO instead of

(<- fetchOri).isEmpty

it would be much more natural to write

fetchOri -> isEmpty

or

fetchOri .! isEmpty

or something similar.

AMatveev · September 1, 2022, 6:23pm

Here is another variation of the idea

yield dwarves(f, for Some('g'), for fetchOri(), false)
// or 
for {
  o <- fetchOri()
} yield dwarves(f, for Some('g'), o, false)

diesalbla · September 3, 2022, 10:21am

For Scala, we could dual-case the left-arrow <- as a unary prefixed operator, to avoid using underscores. In Scala, we cannot use the right-arrow operator -> because that is used as tuple constructor "Hi" -> 42 : (String, Int). For the case in which the yield value is the receiver of a method call, there is fetchOri.map(_.isEmpty) which is not so much of a problem.

diesalbla · September 3, 2022, 10:39am

That is not broken. A function application returns a value of that type; and a CFA (a different construct) returns a value of the wrapper type around the function return type.

An optional type parameter in the for keyword could be helpful, as a separate proposal. However, since for comprehensions are desugared before typing, that type parameter would have to become a type annotation for the whole for. As to the types being difficult to track, this is not different from the rest of the language. One can write plain function applications, with polymorphic code, for which types are hard to trace, which is why it sometimes helps to introduce type annotations.

In a small way, yes, because it distracts from the essence of what is meant, which is a function and its arguments. The for comprehension serve a purpose, but CFAs can serve a specific sub-case better.
Going back to the comparison between plain and wrapped types: if in plain style every function application had to take only val or lazy val identifiers as arguments, that would not be a comfortable way to write. If it is not comfortable for a plain context, then it cannot be for the wrapped one.

Thanks. For some context, this was brought from writing non-trivial circe decoders or scalacheck generators, which often fit the pattern of the CFA.

diesalbla · September 3, 2022, 11:16am

VladKopanev:

Alternative suggestion, what if we implement monadic values inlining in for-comprehension, like in Idris, e.g.:
for { 
 user <- createUser(User(name, !randomUUID))
} yield user
Where randomUUID is of type IO[UUID]. Or something similar, it shouldn’t necessarily be a bang operator for inlining.

IIUC, that would turn the for comprehension into a syntactic context inside of which every call to the “expand” operator would be replaced with a binding at the for comprehension level of the expression with an auxiliary variable. E.g. that would desugar to:

for {
 x<- randomUUID
 user <- createUser(User(name, x))
} yield user

A general solution to this would just mean running through each point in a for comprehension, and analysing the expression on the right of the <- to extract each “inlined” wrapped type as a flatMap, before the expression. It may need to be inside a for-comprehension, which sets the scope at which that binding is extracted. It could very well be implementing as part of the desugaring of the for comprehension itself.

This is a very interesting idea, and it seems closer to the deep-nested-code analysis and transformation that the macros of monadless or dotty-cps-async. From my first view, it has a lot of merit as a separate proposal.

Sporarum · September 13, 2022, 3:10pm

Even with polymorphic code, you usually know the rough shape of what you’ll get:

def values[A <: Container](c: Container): List[A]
values(???) // List[something]

With CFA, potentially:

values(c <- ???) // something[List[something]]

The cases where you don’t know what to expect with polymorphic methods is the exception, usually due to bad interface design, whereas with CFA, it feel almost inevitable ?

That being said, I agree that it’s relatively straightforward to fix it with some well placed expected types, so I don’t consider the above a necessary deal breaker, my main point is the following:

I understood that, but you even named it comprehensive function application, and the syntax differs only slightly, whereas the change to the return type is very significant !

For me syntax is the crux of the issue, something like
for dwarves(fili = (18 / 3) * 7, kili <- same, ori = lazyOri(), nori = false)
Is acceptable, as there is a clear sign something extra is going on, while
dwarves(fili = (18 / 3) * 7, kili <- same, ori = lazyOri(), nori = false)
Is not

(I am not saying for is a good choice here, but there needs to be something)