Proposal for LazyFoldLeft

adamgfraser · April 28, 2019, 9:37pm

I was playing around and came up with something that I thought might be of broader interest. Folds are great because we can use them to represent many operations on collections by using the appropriate parameters. However, there is an issue that for many operations like exists we want early termination which requires a lazy fold, but for stack safety we need a tail recursive left fold.

There have been efforts to get around this like using Eval in Cats or lazyFoldRight and foldLeftSome in Scala Collections Contrib. However, I think these can either involve significant performance overhead or are less intuitive for users to define operations using them.

The code below shows a lazy left fold. It accepts a binary function of the form (B, => A) => A. If the by-name parameter is evaluated it continues the fold normally. If the by-name parameter is not evaluated (e.g. because an operation like || does not need to evaluate its second argument) the fold terminates immediately and returns the accumulator.

I think this allows implementing a wide variety of specific folds, both supporting early termination and maintaining stack safety. It also makes it very easy to write a wide variety of folds in terms of the same function. There is some overhead in checking if the by-name parameter is evaluated but at least my preliminary testing indicates it is relatively small.

What do people think? Is this something that is worth contributing?

import scala.annotation.tailrec

object LazyFoldLeft extends App {

  @tailrec
  def lazyFoldLeft[A, B](as: Iterable[A])(z: B)(f: (B, => A) => B): B =
    if (as.isEmpty) z
    else {
      var headEvaluated = false
      lazy val head = {
        val res = as.head
        headEvaluated = true
        res
      }
      val acc = f(z, head)
      if (headEvaluated) lazyFoldLeft(as.tail)(acc)(f) else z
    }

  def sum(as: Iterable[Int]): Int =
    lazyFoldLeft(as)(0)(_ + _)

  def exists[A](as: Iterable[A])(f: A => Boolean): Boolean =
    lazyFoldLeft(as)(false)((b, a) => b || f(a))


  val bigList = List.range(1, 50000)
  val nats = Stream.from(0)

  assert(sum(bigList) == 1249975000)
  assert(exists(nats)(_ > 100000))
  
  println("woot!")
}

morgen-peschke · April 28, 2019, 10:43pm

That’s certainly clever.

Is this proposal for a new method, or a replacement for the current foldLeft implementation?

virusdave · April 28, 2019, 11:15pm

I would strongly oppose using “has the by-name param been evaluated” as a signal to stop evaluation. Ugh. Talk about unexpected behavior!

Why not explicitly encode the desire to continue some other way in f, such as making the return type Option[B] with None meaning “stop” ?

adamgfraser · April 28, 2019, 11:15pm

Thanks! I think it would have to be a new method for a couple of reasons.

First, replacing the current implementation would create significant compatibility issues. For example, (B, A) => B and (B, => A) => A are treated as different types by the compiler, so functions that were valid parameters for the current implementation would not compile in some situations, as shown below.

Second, I think there is some performance cost for checking if the by-name parameter is evaluated that you wouldn’t want to have to pay in some situations when you don’t need to, especially for such a foundational function that is often used in tight loops.

So I would propose a new lazyFoldLeft method. That way current code is not impacted and users can choose between the benefits of the lazy version outlined above and the incremental performance of the strict version. I think the lazy version might ultimately be more interesting. You can think of the lazy version as strictly more powerful and the strict version as a performance optimization in certain situations. But seems like that is something that should evolve organically over time.

The main downside would just be polluting the namespace with one more method but I think this one is pretty interesting.

  def or(x: Boolean, y: => Boolean): Boolean = x || y

  def or1(x: Boolean, y: Boolean): Boolean = x || y

  val as = List(1, 2, 3)

  assert(lazyFoldLeft(as.map(_ % 2 == 0))(false)(or))

  // Does not compile
  // assert(lazyFoldLeft(as.map(_ % 2 == 0))(false)(or1))

  // Works
  assert(lazyFoldLeft(as.map(_ % 2 == 0))(false)(or1(_, _)))

morgen-peschke · April 28, 2019, 11:31pm

Using a tagged type could also work. The primary downside to using Option would be the overhead of wrapping each list element and unwrapping each result.

I don’t believe we have tagged types in the std lib, so that’s probably a non-starter. Maybe union types in Dotty, with a Stop case object as the alternative return value?

adamgfraser · April 28, 2019, 11:39pm

Yes that is definitely the alternative. That is what foldLeftSome does in Scala Collection Contrib for example. I guess I would have a couple of responses.

First, I think you communicate in the name of the method and in the documentation that the lazy nature of it is a feature so it isn’t unexpected at all, just like you have lists and lazy lists.

Second, I think it is nice to not have to couch every fold in terms of Option, and there is a performance cost of doing so. Instead of that I would probably just write a specialized fold using pattern matching whereas with this I think I would be more inclined to implement the more specialized functions in terms of this.

Third, I think the short circuiting behavior actually makes sense. In the exists example once you have found one example that satisfies the predicate you’re done. You shouldn’t have to specify that yourself through Some and None, it is inherent in the short circuiting nature of ||, whereas with sum you have to evaluate the entire list because + always needs to consider both elements.

This is the behavior in Haskell and some of the function programming oriented Scala libraries have lazyFoldRight implementations that similarly terminate the computations when you don’t need to look at any more elements. This is just the same concept but without complications like having to use a monad to ensure stack safety.

virusdave · April 29, 2019, 5:15am

A couple of responses have mentioned performance cost of Option wrapping the result of applying f.

I don’t know how significant that would be, especially when compared to the example implementation from @adamgfraser that instead wraps access of f's parameter itself in a new lazy val (which is a function that itself needs to be allocated and has overhead).

I would imagine they are of similar cost, but comparing benchmarks would be interesting i suppose.

My primary objections were the non-obviousness of the result (potentially mitigated by naming and documentation), and the fact that evaluating the parameter has a nonlocal side effect, which is generally something one wants to avoid if you want a sane codebase.

jimka2001 · April 29, 2019, 7:09am

As a relative new-comer to Scala, I found this problem of fold difficult. In order to short-circuit the computation
the programmer is required to go through great lengths for what mathematically is quite simple.
For any ring (or semi-ring) having a 1 element and also a 0, the 0 element annihilates the * operation. I’ve

implemented this in my own code several times with either an exception, or by using the Either type and pattern

matching the return value. Both of which make the code look overly complicated.

@,adamgfraser , could you please elaborate on your idea in the case of annihilating operations. It’s not

explicitly clear how to extend your exists idea based on the fact that || doesn’t always evaluate its

right-hand side, to cover a problem such as finding the multiplicative product of a sequence of numbers

or matrices, and aborting computation when the accumulated result becomes zero.

julienrf · April 29, 2019, 8:52am

Thanks @adamgfraser for sharing your idea!

Your lazyFoldLeft proposal is original because the lazy (by-name) parameter is the collection element as opposed to the folded tail, as we usually see. I think it would be interesting to publish it so that people can play with it! I’m happy to review and merge a PR on scala-collection-contrib, or you can publish it on your own.

adamgfraser · April 29, 2019, 10:17am

For multiplication, * always evaluates its right hand side so you couldn’t get the short circuiting behavior with that operator directly, but you could get the behavior you want quite easily like so:

  def times(x: Int, y: => Int): Int =
    if (x == 0) x else x * y

  def product(as: Iterable[Int]): Int =
    lazyFoldLeft(as)(1)(times)

  val nats = Stream.from(0)
  assert(product(nats) == 0)

You could generalize this to matrix multiplication in the same way.

adamgfraser · April 29, 2019, 10:18am

Thanks! I will submit a PR on scala-collection-contrib.