Pre-SIP: Shorthand syntax for lambdas with named arguments

pshirshov · October 13, 2017, 10:51am

Let’s consider these statements:

Seq(1, 2, 3, 4).map(x => x + x * 2) // (1)
Seq(1, 2, 3, 4).reduce((x, y) => x*x + y*y) // (2)

Scala is highly expressive, but for case (1) some other languages provide a terser syntax, usually with a predefined name such as it:

Seq(1, 2, 3, 4).map(it + it * 2) // equivalent to (1)

At the same time, in Scala we may use shorthand syntax with positional arguments:

Seq(4, 5, 6, 7).map(_ * 2) // (3)
Seq(4, 5, 6, 7).reduce(_ - _) // (4)

Situations where we need to refer to a lambda argument more than once, and therefore can’t use _, are very common. It would be great to have a similar shorthand syntax in Scala.
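
For reference, here is a minimal sketch of how the existing placeholder syntax expands in current Scala, which is why a repeated argument cannot be written with _. The object and value names are made up for illustration.

```scala
object PlaceholderDemo {
  val xs = Seq(4, 5, 6, 7)

  // One underscore, one parameter: expands to x => x * 2.
  val doubled: Seq[Int] = xs.map(_ * 2)

  // Two underscores, two parameters: expands to (x, y) => x - y.
  val difference: Int = xs.reduce(_ - _)

  // Using the same argument twice needs a named parameter today.
  // xs.map(_ + _ * 2) would not compile here, because each underscore
  // introduces a new parameter and map expects a one-argument function.
  val tripled: Seq[Int] = xs.map(x => x + x * 2)
}
```

Each underscore stands for a fresh parameter, in order of appearance, so there is no way to mention the first parameter a second time.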

It’s possible to introduce new syntax in the style of tuple value accessors:

Seq(4, 5, 6, 7).map(_1 + _1 * 2) // equivalent to (1) 
Seq(4, 5, 6, 7).reduce(_1 * _1 + _2 * _2) // equivalent to (2)

Such a change should be relatively easy to implement and should be reasonably safe for existing codebases.

In the unlikely case of a name conflict (someone has defined val _1 = 1 or imported tuple members), the compiler may report an error (preferred) or shadow the existing name and emit a warning. It’s also possible to keep full backward compatibility by following a simple rule: when a lambda has an arrow, apply the current logic; when a lambda has no arrow, shadow the enclosing context with the lambda’s named args.
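
To make the conflict concrete, here is a small sketch of the two situations mentioned above as they compile in current Scala, where _1 and _2 are ordinary identifiers. The object names are hypothetical.

```scala
object DefinesUnderscoreOne {
  // `_1` is a legal user-defined name today.
  val _1 = 100

  // Inside this lambda, `_1` refers to the value above, not to a parameter.
  val shifted: Seq[Int] = Seq(4, 5, 6, 7).map(x => x + _1)
}

object ImportsTupleMembers {
  val pair = (10, 20)

  // Importing tuple members brings `_1` and `_2` into scope.
  import pair.{_1, _2}

  val sum: Int = _1 + _2
}
```

Under the proposal, expressions like these would need either an error, a warning, or a rule deciding which meaning of _1 wins.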

If people are happy with this idea, I can come back with a patch.

sjrd · October 13, 2017, 11:52am

I’m not convinced. The cases where one parameter is reused several times in the lambda body, and where _-like syntax works at all (i.e., they are all at the top-level of the lambda text) are very rare. I don’t think adding more stuff to the language to “better support” these rare use cases (by making them shorter) is worth it. Especially since the potential for conflicts does exist, which means this very minor improvement is compensated by the risk of breaking existing code.

We should not break existing code. And if we do break existing code, it had better be outweighed by significant benefits.

szeiger · October 13, 2017, 11:54am

This looks like a lot of complexity for very little gain. In fact, I would always prefer to write

Seq(1, 2, 3, 4).map(x => x + x * 2) // (1)
Seq(1, 2, 3, 4).reduce((x, y) => x*x + y*y) // (2)

over

Seq(4, 5, 6, 7).map(_1 + _1 * 2) // equivalent to (1) 
Seq(4, 5, 6, 7).reduce(_1 * _1 + _2 * _2) // equivalent to (2)

It’s not obvious to me that the shorthand would be an improvement.

The implementation would also be trickier than you expect:

by following a simple rule: when a lambda has an arrow, apply the current logic; when a lambda has no arrow, shadow the enclosing context with the lambda’s named args

There is no such thing as a lambda without an arrow. The arrow is what defines a lambda. All you have is an expression which is syntactically indistinguishable from other expressions. You would have to make the distinction by type rather than by syntax.

Not too bad in obvious cases:

val f: (Int => Int) = _1 + 1

I suppose you’d want this to be an error:

val _1 = 42
val f: (Int => Int) = _1 + 1 // error: _1 ambiguous

What happens when we type the expression without an expected type? It has to be an error:

val f = _1 + 1 // error: _1 not found

But now this is valid again and I find it very worrisome that the lack of an expected type does not only change the result type (which happens in other cases, too) but also the scope in which we have to typecheck the expression:

val _1 = 42
val f = _1 + 1 // 43, this is perfectly acceptable
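
As a point of comparison, this sketch shows how current Scala treats the same expression: _1 is always resolved lexically, regardless of the expected type. The object and value names are illustrative only.

```scala
object LexicalScopeToday {
  val _1 = 42

  // No expected type: `_1` is looked up lexically, so this is a plain Int.
  val f = _1 + 1

  // With a function type expected, the same expression is a type mismatch
  // today (an Int where Int => Int is required), so it stays commented out:
  // val g: Int => Int = _1 + 1

  // The explicit lambda is required, and `_1` still names the outer value.
  val g: Int => Int = x => x + _1
}
```

The expected type changes whether the expression compiles, but never which _1 it refers to. The proposal would change that second part.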

What about this:

val f: (Int => Int) = 42
val x = 42
val f2: (Int => Int) = x

I think according to the rules you propose both would have to effectively expand to _ => 42. This means we can silently lift any expr of type T to any function-like type that returns a T.

Let’s up the ante again:

val _1 = 42
val f: (Int => Int) = _1

An error I suppose. What about:

val _1: (Int => Int) = _ + 1
val f: (Int => Int) = _1

Is this also an error or a valid assignment?

How about these two?

def f = {
  val _1: (Int => Int) = _1 + 1
  ()
}

class F {
  val _1: (Int => Int) = _1 + 1
}

I’m sure there are other interesting corner cases to discover if you probe a bit deeper.

curoli · October 13, 2017, 2:39pm

That’s a very interesting proposal, but:

The “=>” notation is already fairly short and nicely readable. It doesn’t really need a shorter alternative except where you have lots of very simple expressions, and very simple expressions very rarely contain the same argument more than once. In those rare cases where a simple expression does contain the same argument multiple times, you don’t gain much either: for example, (_1*_1) is barely shorter than (x => x*x), and the latter is more readable.

Also, the existing notation allows you to add types, such as ((x: Int) => x*x), and I don’t see how you would add these to your proposal.
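
For context, a short sketch of where type annotations can go in the syntax that exists today, both on a named parameter and on the _ placeholder. The names are made up for illustration.

```scala
object TypedLambdas {
  // Named parameter with an explicit type annotation.
  val square = (x: Int) => x * x

  // The existing placeholder syntax also accepts a type ascription.
  val double = (_: Int) * 2

  // Each ascribed underscore is still a separate parameter.
  val add = (_: Int) + (_: Int)
}
```

A numbered shorthand would presumably need its own answer for where such an ascription goes when the same parameter appears more than once.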

lihaoyi · October 13, 2017, 2:49pm

Notably, I proposed something similar to this (along with another similar feature that seems to have appeared in Dotty under the name “Implicit Function Types”) here: https://github.com/dsl-paradise/dsl-paradise. Not quite the same, but similar: the feature described in dsl-paradise can be used to implement what you’re describing here.

The readme in that repo elaborates on some potential use cases for such a feature. Of course, there will be issues with shadowing and ambiguity in name resolution: that is the same as with each and every other way we have of defining names. In theory we’d deal with it the same way as if we did new Foo{println(_1)} where _1 is both a member of class Foo and also defined in the enclosing scope of the new call.

Whether this specific feature is worth the additional complexity is debatable, but I do think there are sufficient cases where it provides value that it’s worth discussing.

Ichoran · October 13, 2017, 6:47pm

I agree that it’s not worth the change. It saves a maximum of three characters in the single-arg case:

xs.map(_1 * _1)
xs.map(x => x * x)
               ^^^

And a maximum of five-plus-2*(number-of-args) in the multi-arg case:

xs.map(_1 * _2 + _1)
xs.map{ (x, y) => x * y + x }
                    ^^^^^^^^^

If this had been included originally I guess it could have been useful, but mangling the meaning of existing code is not worth it for such slight shortenings. In many multi-arg cases, short descriptive names result in code with superior readability, even if there are a few more characters.

Using @ instead of _ would be better because there are then no collisions with existing usage. I suppose there may be some merit to that.

xs.map(@1 * @2 + @1)

but @ is not very elegant.

pshirshov · October 13, 2017, 9:38pm

And a maximum of five-plus-2*(number-of-args) in the multi-arg case:

Yes, though having to name these parameters can be very annoying sometimes. And where it is possible, using positional arguments feels like a huge relief (yup-yup, very opinionated). So, why not? The change is relatively easy and safe.

but mangling the meaning of existing code is not worth it for such slight shortenings

I can make it safe for existing code (technically, it’s easy to distinguish between x(_1*_1) and x(x => x*x)). The only corner case is by-name passing, but why not just disable this expansion rule for that case?
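
A rough sketch of the by-name corner case in current Scala: a by-name argument is already an arrowless expression that the callee evaluates later, so it looks just like the proposed shorthand at the call site. The method and object names here are hypothetical.

```scala
object ByNameCornerCase {
  // By-name parameter: the argument expression is passed unevaluated
  // and re-evaluated on every use of `body`.
  def twice(body: => Int): Int = body + body

  // Function parameter: the argument must be a lambda.
  def applyTo3(f: Int => Int): Int = f(3)

  var evaluations = 0

  // No arrow at the call site, yet the block runs twice inside `twice`.
  val byName: Int = twice { evaluations += 1; 10 }

  // With an arrow, the argument is unambiguously a function.
  val byFunction: Int = applyTo3(x => x * x)
}
```

Telling these two call shapes apart requires knowing the parameter's type, not just the argument's syntax.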

The alternatives are a compiler flag, a different syntax (like your proposal), or shadowing.

So, why not?

Ichoran · October 13, 2017, 10:15pm

Different people have different opinions. For instance, I am never annoyed by single-letter variable names. I can encode something in those names that helps me remember what the arguments are, for example.

Because it adds complexity and irregularity (if _1 is going to be kept non-ambiguous) to the language specification for the benefit of a style for which positive opinion may not be widely shared.

nafg · October 15, 2017, 1:18pm

I don’t like the idea of a word being either an identifier, or a keyword, depending on context. (See discussion about fixing _root_, which currently violates this.)

_ is a keyword. It always is syntax, directly invoking some language grammar.
_1 is an identifier. It is always a user-defined name of something.

To make _1 alternate between these two (and as @szeiger points out, in tricky ways) would make code much harder to read.

Remember: Code is read more than written, and code is meant to be read by people more than by computers.