Making `for` simpler and more regular

Ichoran · August 15, 2018, 1:57am

I don’t normally hear people complaining about for (maybe because it’s awesome), but it has a huge number of irregularities compared to “normal” Scala, and despite having used it for about a decade now, I don’t always immediately recall every detail of how to use it effectively.

Since Scala is getting a bit of a facelift with Dotty anyway, now seems like an ideal time to consider how to freshen up for to be a little less weird.

Which aspects of the behavior of for, if any, do you find particularly incongruous or difficult to get working?

For me, the top two are for() vs for{}, and the sudden ability to create vals through simple assignment. But there are a bunch of things.

I have a mock proposal for a more uniform for as a potential starting point for discussion.

for() allows only a single statement. Use for{} for multiple statements.
Assignment uses normal syntax; both vals and vars are fine, and are desugared exactly as you would expect (i.e. placed in the body of the method).
~~Postfix if goes away. Use destructuring for conditionals instead.~~ Postfix if and incomplete destructuring both go away. Use case instead (with if as needed, as in match).
yield has a type parameter that you can use to specify the element type.
foreach version goes away.

I’m pretty sure this is not complete enough, but it is much simpler to understand than the existing for. Edit: missed “not” before! Very important not!

Examples:

for (a <- List(1, 2)) yield 3*a

for {
  a <- List(1, 2)
  val b = 3*a
} yield a*b

// Gives a compiler error instead of Vector((), (), (), ())
for (c <- "fish") yield[Char] (if (c < 'c') c)
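For comparison, something close to the yield[Char] check can be approximated in current Scala by ascribing the yielded expression. This is only a sketch of today's behavior, not part of the proposal; the object and value names are made up for illustration.

```scala
object YieldAscription {
  // Today: the one-armed `if` compiles, and every element ends up as ()
  // because no character of "fish" is below 'c'.
  val units = for (c <- "fish") yield (if (c < 'c') c)

  // Ascribing the yielded expression pins the element type at compile time.
  val upper = for (c <- "fish") yield (c.toUpper: Char)

  // With the same ascription, the one-armed `if` should be rejected by the compiler:
  // val bad = for (c <- "fish") yield ((if (c < 'c') c): Char)
}
```

The ascription has to be repeated at the yield site, which is arguably noisier than a type parameter on yield would be.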

New suggestion of using case:

for {
  case Left((x, "salmon")) <- myEither
} yield x + x

mpilquist · August 15, 2018, 2:16am

ctongfei · August 15, 2018, 4:24am

The foreach version is very important, especially for beginners who are familiar with Java/C#/etc. The yield clause (monadic comprehension) is not so intuitive for beginners. I would argue that we keep the foreach version.

martijnhoekstra · August 15, 2018, 8:18am

I’m not entirely sure that’s true.

In the cases that mostly matter in that context, the only observable difference between the foreach version and the map version is that one returns Unit and the other one returns a result that you could throw away.

That’s not the case for lazy structures (LazyList), but having to force that manually before it does its side-effects doesn’t seem too bad to me.

for (a <- List(1, 2, 3)) yield println(a) is a word longer than for (a <- List(1, 2, 3)) println(a), but it’s not difficult to explain how to do that to the hypothetical Java/C# newcomer. It’s just “weird syntax that does nothing” (that they may later discover does something after all).
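A small sketch of the difference, assuming Scala 2.13 or later (for LazyList); the object and value names are hypothetical. It records side effects in a buffer to show when they run.

```scala
import scala.collection.mutable.ListBuffer

object ForeachVsMap {
  val seen = ListBuffer.empty[Int]

  // No yield: translated to foreach, the result is Unit.
  val unitResult: Unit = for (a <- List(1, 2, 3)) seen += a
  val afterForeach = seen.toList

  // With yield on a strict List: translated to map, side effects run immediately.
  seen.clear()
  val mapped = for (a <- List(1, 2, 3)) yield { seen += a; a * 2 }
  val afterMap = seen.toList

  // With yield on a LazyList: nothing runs until the result is forced.
  seen.clear()
  val lazyMapped = for (a <- LazyList(1, 2, 3)) yield { seen += a; a * 2 }
  val beforeForce = seen.toList
  val forced = lazyMapped.toList
  val afterForce = seen.toList
}
```

On strict collections the only visible difference is the discarded result; on the lazy one the side effects wait for the explicit force.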

Also, it paves the way for a future where not having the yield means something else. But that doesn’t have to be a motivating case for this proposal.

sjrd · August 15, 2018, 8:49am

There are things I don’t like with for, but I have significantly different opinions on a few things you mention:

For me the fact that destructuring in for is filtering is the big problem of for. Like, the biggest issue I have with it. I want destructuring to behave the same as destructuring in a val Pattern = ... statement, i.e., throw a MatchError if it does not match. I don’t want it to filter. The destructuring-filters thing is also a significant issue for the performance of many “idiomatic” for comprehensions.
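To make the contrast concrete, here is a hand-written sketch. The first part is roughly what Scala 2 generates for for ((a, 1) <- pairs) yield a (simplified: the real translation repeats the full pattern in map); the second part is the strict, val-like behavior. All names are invented for illustration.

```scala
import scala.util.Try

object PatternFiltering {
  val pairs = List((1, 1), (2, 2), (3, 1))

  // Filtering behavior: non-matching elements are silently dropped,
  // at the cost of an extra withFilter pass.
  val filtered = pairs
    .withFilter { case (_, 1) => true; case _ => false }
    .map { case (a, _) => a }

  // Strict behavior: a failed match throws MatchError instead of filtering.
  def strict(p: (Int, Int)): Int = (p: @unchecked) match { case (a, 1) => a }
  val strictResult = Try(pairs.map(strict))
}
```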

I use that one very often, and I’d be annoyed at having to call foreach instead. Also, I can’t just replace it with a for..yield (dropping the result) because that’s clearly less performant. So the “nice” thing to write would be slower than the sometimes awkward call to foreach.

AMatveev · August 15, 2018, 8:56am

I think it is undoubtedly

Other points will not make my life easier, but they can make writing code harder.

Scala’s for is beautiful for writing high-level business logic.
But, for me, it is a bit of a toothache to use a recursive function or while to emulate Java’s for when I need very high performance.

It is the reason I love Java for writing an ORM or the core of an application server.

I think it would be useful if there were a ‘jfor’ (a simple Java-style ‘for’).
But it is not very important; usually we use Java for such tasks.

jducoeur · August 15, 2018, 12:40pm

Mixed feelings, but I’m sympathetic to this one. I’m still cleaning up inappropriate paren-style fors in Querki’s codebase, from back before I understood this properly.

Huh – not something I’ve ever found myself wanting. Do you have actual use cases for vars in for assignment? It feels like a recipe for massively confusing newbies (who often have a mental hurdle about how map() can be considered immutable), and I’m not sure when it would be desirable.

I don’t strictly object to using standard syntax for val, and certainly wouldn’t mind it being an option (now I’m wondering whether it would be useful to be able to spell out implicit val in the middle of a comprehension), but really, this isn’t one of the aspects of for that tends to bug me.

I’m with @sjrd on this one – destructuring was a clever idea that turns out to be a horrible wart. I don’t think I’ve ever had a case where it was actually what I wanted, and too many where destructuring caused a mysterious bug by filtering when I didn’t expect it. I’d love to see destructuring filters just go away.

In principle I like it. In practice, I’m wondering if it would ever get used enough to pull its weight.

Personally, I totally agree – forgetting to say yield is certainly my number-one cause of for bugs, and that still hits me from time to time. (Especially in Scalatags, for some reason.) But I do have some sympathy for @sjrd’s counter-argument.

I wonder if it would be possible to instead add a compiler warning if you forget the yield on a non-Unit result? It would sometimes be inconvenient, requiring you to add an explicit Unit ascription, but I’m not sure how often…

Ichoran · August 15, 2018, 12:57pm

On second thought, you’re right. I’ve almost always been bitten by it rather than having it help me, and simply making it the only way to do things is probably not enough to rescue it.

shawjef3 · August 15, 2018, 3:19pm

I would like to add one more thing that I’ve occasionally wanted. Let’s say you have some for expression and you want to have a counter, do some logging, or otherwise perform some side effect. Currently the syntax for this is ugly. You can’t just state what you want to happen as you would outside of a for expression. You must assign the result of the side effect to an empty value.

var xCounter = 0

val sums =
  for {
    x <- List(1,2,3)
    () = xCounter += 1
    y <- List(4,5,6)
  } yield x + y

I would prefer

var xCounter = 0

val sums =
  for {
    x <- List(1,2,3)
    xCounter += 1
    y <- List(4,5,6)
  } yield x + y

Or maybe even some special way to break out of the for syntax and do something procedural. I’d love to see more ideas.
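For reference, a commonly seen variant of the current workaround binds the side effect to a wildcard instead of the unit pattern. This is a sketch of existing syntax, not a new proposal; the wrapper object is only there to make it self-contained.

```scala
object SideEffectInFor {
  var xCounter = 0

  val sums =
    for {
      x <- List(1, 2, 3)
      _ = { xCounter += 1 } // runs once per x, before the inner generator
      y <- List(4, 5, 6)
    } yield x + y
}
```

It behaves like the () = form: the counter ends at 3, once per element of the outer list.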

for() allows only a single statement. Use for{} for multiple statements.

I disagree with this because using curly braces for a single expression for makes it easier to add additional expressions. It’s like allowing trailing commas. If anything, I’d get rid of the parens, but we should keep them for backwards compatibility.

Assignment uses normal syntax; both vals and vars are fine, and are desugared exactly as you would expect (i.e. placed in the body of the method).

Do you mean allow var and val inside of for?

  for {
    x <- List(1,2,3)
    var yCounter = 0
    y <- List(4,5,6)
    () = yCounter += 1
  } yield x + y

My understanding is that val used to be required to assign a value, but then that syntax was deprecated. I’d want to know why before agreeing with this.

~~Postfix if goes away. Use destructuring for conditionals instead.~~ Postfix if and incomplete destructuring both go away. Use case instead (with if as needed, as in match).

Is postfix if what you see in

for {
  x <- List(1,2,3)
  if x % 2 == 0
} yield x

? If so I disagree with removing it, especially for backwards compatibility. And I don’t want to have to write case _ if ... just to get a boolean filter. But I like the idea of case and I’d like to see some examples.

yield has a type parameter that you can use to specify the element type.

This is interesting and I’d like to consider it some more. For example, when you have a for expression over some data structure that isn’t generic, would having a type parameter on yield be a syntax error?

foreach version goes away.

yield gives map, so I’m guessing we’d end up with M[Unit]s everywhere we previously intended to avoid having anything to garbage collect.

Ichoran · August 15, 2018, 3:38pm

To clarify, I meant you could use for {} for anything, but for() would only be allowed for single expressions. This brings it in line with normal expressions (where {} denotes a code block that can contain multiple expressions).

The problem with postfix if is that it’s another wrinkle to the language. If you just say, “case statements work in for-comprehensions”, then you already know how to use them because they exist in match statements and in function definitions.

It also has unintuitive evaluation order. You can write things like

for {
  x <- xs
  y <- f(x) if x > 3
} yield y

and be mystified by why f(x) gets called on an invalid value. With the case syntax, it’s clear that you’re doing f(x) first:

for {
  x <- xs
  case y if x > 3 <- f(x)
} yield y

Of course you can learn when the if happens, but it’s an extra irregular thing to learn.
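The evaluation order can be checked with a small sketch in current Scala. The names GuardOrder, f, and calls are hypothetical; f records each argument it receives so the two placements of the guard can be compared.

```scala
import scala.collection.mutable.ListBuffer

object GuardOrder {
  val calls = ListBuffer.empty[Int]
  def f(x: Int): List[Int] = { calls += x; List(x * 10) }

  val xs = List(1, 5)

  // Guard written after the generator: f(x) runs for every x, then y is filtered.
  val late = for {
    x <- xs
    y <- f(x) if x > 3
  } yield y
  val callsLate = calls.toList

  // Guard placed before the generator: f(x) only runs for x > 3.
  calls.clear()
  val early = for {
    x <- xs
    if x > 3
    y <- f(x)
  } yield y
  val callsEarly = calls.toList
}
```

Both versions produce the same result; only the calls to f differ.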

Ichoran · August 15, 2018, 3:52pm

Well, I’m sympathetic to this concern, and I use it a lot too, but I don’t think it’s different from the procedure syntax concern. They’re almost exactly parallel: a novel syntax in the language providing a convenience for side-effecting code that is a source of mistakes especially among less-expert users.

Maybe the compiler could recognize when you only produced Unit and threw away the result, and could insert foreach instead of map and friends in that case?

Would that be enough? (If yes, wouldn’t that also be enough for procedure syntax?)

That would fix the problem of mistaken usage, but it wouldn’t simplify the language.

nafg · August 15, 2018, 4:46pm

A (weak) argument in favor of multiple components in parentheses is it’s reminiscent of the C for. Although, that’s probably more of a counterargument than an argument in favor…

I guess if we restrict it then single-line multiple-generator for comprehensions would use curly braces and semicolons, which might take some adjusting. It might be worth it to gain another universal syntax rule.

nafg · August 15, 2018, 4:53pm

As was said, this was the case long ago (at least vals) and was deprecated. The way I see assignments currently they’re kind of like generators but not in the monad, meaning to say that the for comprehension is a sequence of generators-and-fixed-values assigned to identifiers. In that sense var makes no sense and val is redundant.

However it would be interesting if we change the paradigm, and instead look at for comprehensions more like a de-nested flatMap chain, plus interleaved code which currently only can be assignments. In that perspective, other statements could potentially be allowed, including var and reassignment, and using the val keyword would make more sense. In other words think about allowing arbitrary code between generators.
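As a rough illustration of the "de-nested flatMap chain" view, the sketch below writes the same computation both ways. The hand-written chain is morally equivalent rather than the exact compiler output (the real translation of an assignment involves intermediate tuples); names are invented.

```scala
object FlatMapChain {
  // Generators plus one assignment, written as a for comprehension.
  val viaFor = for {
    x <- List(1, 2)
    y <- List(10, 20)
    z = x + y
  } yield z

  // The same thing as nested calls, with the assignment as ordinary
  // code between the "generators".
  val viaCalls = List(1, 2).flatMap { x =>
    List(10, 20).map { y =>
      val z = x + y
      z
    }
  }
}
```

In the nested form it is easy to see where arbitrary statements, including var and reassignment, could be slotted in.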

nafg · August 15, 2018, 5:04pm

Since when is it a “postfix if”? My understanding is that for comprehensions are a sequence of generators, assignments, and guards. That is, the if is a separate component of the for comprehension, not attached to a generator. However syntactically it can be written on the same line. That is confusing, and rectifying that should be easy to agree on and implement. Guards in for comprehensions should require a semicolon or newline like any other component.

nafg · August 15, 2018, 5:15pm

Regarding patterns in for comprehensions:

There were proposals in the past about this. I think one was that it should only filter if you use the case keyword.
I agree that usually you just want to destructure, similar to val patterns. However if the pattern is refutable and the match fails, I’m not sure I want a runtime MatchError. Although I’m not sure if unintentional filtering is better. I don’t want a MatchError on val patterns either. What would be great is if the compiler could tell me it’s refutable. One approach would be that if you use a refutable val pattern or for-comprehension pattern you get a compiler warning. Of course this is subject to being able to reliably determine refutability; I’m not sure how well we score on that. Also this would be one of those places where having a good way to control warnings would be helpful.

I’m not that convinced we can or should completely remove guards or refutable (filtering) patterns, though. But better safety (e.g. requiring case) sounds like a win-win. And I guess turning standalone guards into case guards and simply prefixing them with case _ isn’t that bad.

jducoeur · August 15, 2018, 5:16pm

Agreed with @nafg about the usage of if. The notion of “postfix if” took me aback, because I’ve never used it that way, and it never occurred to me that you could use it that way.

So I agree: the postfix syntax per se probably can and should be eliminated – if should always be on a separate line. Allowing it to exist postfix on the previous line is just plain confusing and misleading…

odersky · August 15, 2018, 5:18pm

Guards in for comprehensions should require a semicolon or newline like any other component.

Like vals, that would also take us back to the future. Guards required a semicolon initially, but that requirement got dropped because there was no need for it.

quiray · August 15, 2018, 6:08pm

I really like Scala’s for comprehensions, but as others noted, they feel a bit off from other parts of the language.
val sounds good to me; I would also add parentheses to guards to mimic regular if expressions.
Always-yield, in my opinion, fits the “prefer FP” motto. For the foreach meaning, either let the compiler deduce that the return value is not used, or even add some special mark (specify Unit as the result type, or use foreach or similar instead of for) to signal it is meant only for side effects.
Personally, I would get rid of the for() variant entirely (I don’t even remember ever using it since the time I was learning) and just have one for form: for{}.
A better way of performing side effects would be welcome; the current approach feels very hacky. It would be nice if it could be directly in for{...}, but I am not sure it doesn’t lead to ambiguities. Maybe use a block (and interpret it as a side effect if delimited, so no application unless { is on the same line as a function name or delimited with ;)? E.g.:

for {
  a <- x
  { println(a) }
  b <- y
} yield a+b

Adowrath · August 15, 2018, 6:36pm

Then disallowing postfix if and only allowing single-line if is what I’d go for, to be honest.

I would answer a complete no to the second part of that. It’d very likely need quite a few () at the end of some procedures but not others. Would you think that more regular than a : Unit = everywhere?

I can’t agree with this more. It’s especially aggravating that even in (x, y) <- List((1, 2), (2, 3)), withFilter is introduced even though that cannot possibly fail. Letting e.g. IntelliJ transform that code into normal map/flatMap calls because I need something more powerful is a nightmare, as there’s repeated tupling and untupling involved then. But it can’t not do that, because the specification says it has to be so.

MarkCLewis · August 15, 2018, 8:36pm

As an educator who works with beginners a lot, I feel that the difference between the yield version (expression) and non-yield version (statement) is significant. Of course, if there is really only one version (expression), then the true simplification is to get rid of the yield. This does require that the compiler be able to optimize down to a call to foreach when the result isn’t used for anything.

I have to say that there are other parts of this proposal that I dislike. While it might simplify the overall syntax to force {} if there is more than one element in the loop, that forces introducing the {} syntax, which I can currently avoid. For the novice it is important to distinguish simplification of the whole vs. simplification of what the beginner must be taught.