Concerns about deprecation of IterableOnce member methods in 2.13

informarte · August 8, 2019, 7:21am

Scala 2.12 (and earlier version) feature TraversableOnce. Since both Iterator and Iterable implement TraversableOnce, it can be used to avoid object allocations: For example, consider the function

  def foo(bar: TraversableOnce[Bar]) = if (bar.isEmpty) 0 else 1

This function can be applied to either an iterator or some container and, in either case, there will no object allocation in practice, which I always considered an outstanding feature of the Scala collections.

In Scala 2.13, TraversableOnce is deprecated and synonymous to IterableOnce. When I compile foo with the 2.13.0 compiler, I obtain the following warning:

Warning: method isEmpty in class IterableOnceExtensionMethods is deprecated (since 2.13.0): Use .iterator.isEmpty instead

So this warning suggests to change the 2.13 version of foo to:

  def foo(bar: IterableOnce[Bar]) = if (bar.iterator.isEmpty) 0 else 1

Now, when bar is an iterator, there will be an additional call to iterator but no additional object allocation. (Let’s hope some optimizer gets rid of this call sooner or later.) However, when bar is some container, then an iterator object will be created. Considering the fact that typical container implementations provide an efficient isEmpty implementation, the iterator creation is an unnecessary and huge overhead.

I am the author of the optimization tool Yuck and performance is key. In its guts, the tool relies heavily on TraversableOnce and the flexibility it provides. Hence I am very concerned about this deprecation.

Looking at the documentation of IterableOnce, the goal of this deprecation seems to be:

The goal is to provide a minimal interface without any sequential operations. This allows
third-party extension like Scala parallel collections to integrate at the level of IterableOnce
without inheriting unwanted implementations.

What’s the problem with the default implementations? I mean, when you have a better implementation, you just override in your subclass.

Jasper-M · August 8, 2019, 10:29am

FWIW, I think this should give you more or less the same functionality:

import scala.collection.IterableOnceOps

type IteratorOrIterable[A] = IterableOnceOps[A, IterableOnce, IterableOnce[A]]

informarte · August 8, 2019, 8:11pm

I tried IteratorOrIterable; it works but it feels more like a workaround than a solution:

IterableOnceOps seems to be an internal helper trait. Will it stay around for some while? Is it safe to rely on it in application code?
IterableOnceOps provides the toIterator method, but it is deprecated. To access the iterator method (of IterableOnce), I defined this implicit function:

    implicit def asIterableOnce[A](it: IteratorOrIterable[A]) = it.asInstanceOf[IterableOnce[A]]

Every IterableOnceOps is an IterableOnce and vice versa but they look and feel totally different.
Many operations of IteratorOrIterable (like filter, map, take, drop, …) are not closed; they return IterableOnce instead of IteratorOrIterable.

martijnhoekstra · August 9, 2019, 9:12am

It looks like that in many places you’re using iterators in order not to evaluate more than needed. Have you considered using views for this in 2.13? I have the impression that you can get pretty far by just globally replacing TraversableOnce with Iterable, and toIterator with view

EDIT: I made a quick PR with roughly this change, could you check if that degrades performance or not?

julienrf · August 9, 2019, 4:47pm

Hello, may I ask why you need to have an API that supports both Iterator and Iterable? Why isn’t Iterable alone sufficient?

informarte · August 11, 2019, 5:44am

@martijnhoekstra, I studied your PR, thanks a lot for it!

I ran performance tests overnight (120 problems) (on OpenJDK 11 and with full optimization and inlining) and created a figure to compare the speedup of different versions vs. the latest results from my dev branch with Scala 2.12.8:

comparison-1

IteratorOnce is just the results of replacing Traversable(Once) with Iterable(Once).
IteratorOrIterable implements the proposal by @Jasper-M.
concoction is IteratorOnce plus this piece of code:

    type IteratorOrIterable[A] = scala.collection.IterableOnceOps[A, IterableOnce, IterableOnce[A]]

    implicit def asIterableOnce[A](it: IteratorOrIterable[A]): IterableOnce[A] =
        it.asInstanceOf[IterableOnce[A]]
    implicit def asIteratorOrIterable[A](it: IterableOnce[A]): IteratorOrIterable[A] = it match {
        case it: IteratorOrIterable[A] => it
        case _ => it.iterator
    }

martijnhoekstra-213 is your PR.

The good news is: Expect for some outliers (which require separate investigation), all 2.13 based versions were a bit faster than the reference. The winner is IteratorOrIterable. Views are a bit slower overall and there are more outliers with speedups lower than 1.

informarte · August 11, 2019, 5:49pm

@julienrf, in my project the results of many interfaces are used only once. Regarding the implementations of such interfaces, let’s distinguish two cases:

The result is not readily available and has to be computed. In this case returning an iterator is the most efficient solution.
The result is readily available as class member. In this case it is cheaper to return the container itself than to allocate an iterator; the caller can still ask for an iterator if necessary, but often the result is processed by foreach and hence no iterator is required.

For example, consider the effects of a move in local search; they are looked at only once and hence the Move interface defines:

def effects: TraversableOnce[AnyMoveEffect]

There are different types of moves. ChangeAnyValues is created from a container of effects and its effects method just returns this container. BulkMove, however, collects effects in a mutable hash map and its effects method returns a values iterator.

As pointed out by @martijnhoekstra, views are an alternative but they are more expensive than TraversableOnce. Views are based on iterators and hence creating and using a view requires two allocations. In general, in a pipeline like

source.view.filter(f).map(g)

with n operations, 2 * (n + 1) allocations are required. Replacing the views by iterators reduces the number of allocations to n + 1.

Concluding, as both iterators and containers implement TraversableOnce, it’s the perfect type for my needs. Views are an option but more expensive.

martijnhoekstra · August 11, 2019, 7:22pm

Views indeed allocate, which represents the cost of being able to iterate more than once lazily over an eager collection.

Whether its worth it depends on the use case. Your code works ostentatively, so in the end, everything works out. Still, an Ordering[TraverableOnce] was a bit jarring: if the underlying collection cant be traversed again, then you can’t find the ordering and also use the traversable.

In didn’t study the code in enough detail to check in which cased it is used, and in which cases it isn’t, I just went through and made the change that locally looked reasonable without taking the global structure of the program in to account. That does to me imply that the program can probably be refined a bit so that you know whether a value must be lazy (because you may want to skip some operations), and can be iterated once, or can be iterated again.

When that is clearer, I also suspect it will be easier to identify some more places where there is room for more performance. If it’s expensive to produce, and might not need to be iterated fully, and will not be (partially) iterated again: Iterator, if it will be partially iterated many times: View, if it will be iterated fully more than once, and will not need to change: ArraySeq. If it will be iterated fully, and may be changed, Vector.

Regardless of anything else, and whether those guesses are correct, it’s great to see these kinds of applications, written in scala, available as free software.

julienrf · August 12, 2019, 11:16am

Thanks for the explanation. Unfortunately, there is (currently?) no equivalent to Traversable or TraversableOnce yet. We had a (short) discussion about it when we began the work on the 2.13 collections but we never really concluded anything: https://github.com/scala/collection-strawman/issues/15. You could introduce such an abstraction, but this won’t be as efficient as TraversableOnce used to be:

trait TraversableOnceOps[A, CC[_], C] {
  def foreach[U](f: A => U): Unit
  def filter(p: A => Boolean): C
  def map[B](f: A => B): CC[B]
  // etc.
}

object TraversableOnceOps {
  def ofIterable[A, C <: Iterable[A], CC[X] <: Iterable[X]](it: IterableOps[A, CC, C]): TraversableOnceOps[A, CC, C] = new TraversableOnceOps[A, CC, C] {
    def foreach[U](f: A => U): Unit = it.foreach(f)
    def filter(p: A => Boolean): C = it.filter(p)
    def map[B](f: A => B): CC[B] = it.map(f)
  }

  def ofIterator[A](it: Iterator[A]): TraversableOnceOps[A, Iterator, Iterator[A]] = new TraversableOnceOps[A, Iterator, Iterator[A]] {
    def foreach[U](f: A => U): Unit = it.foreach(f)
    def filter(p: A => Boolean): C = it.filter(p)
    def map[B](f: A => B): CC[B] = it.map(f)
  }
}

That being said, I agree that it would be simpler if it was possible to just use IterableOnceOps instead, like @Jasper-M suggested. Could you please report the limitations you found at https://github.com/scala/bug?

Also, to make it a little bit easier to use, we might want to introduce an IsIterableOnceOps abstraction, like we already have with IsIterable, IsSeq, etc. (see https://www.scala-lang.org/api/current/scala/collection/generic/IsIterable.html). Would that help you?

julienrf · August 12, 2019, 4:38pm

This can be “fixed“ with a more sophisticated definition of IteratorOrIterable, as shown here (and see here for a version where you can call the iterator method). If the cost of carrying these CC[X] <: SomeIterableOnceOps[X, CC] type bounds is too high, we could create an IsIterableOnceOps, like we have IsIterable.

julienrf · August 12, 2019, 4:47pm

I’ve submitted scala/scala#8330 for that.

informarte · August 13, 2019, 7:14am

Since the migration to Scala 2.13 entails so many code changes, I want to finish it quickly and without performance degradation such that I can continue with other user stories.

Unfortunately, I cannot ignore the deprecation of IterableOnce for the time being because calling methods on IterarableOnce now comes with unexpected overhead. So, inspired by scala.collection.Map, I decided for a mixed approach where I provide getters for iterables and iterators, for example:

    def effects: Iterable[AnyMoveEffect]
    def effectsIterator: Iterator[AnyMoveEffect]

This approach entails a little bit of code duplication but overall the code readability improved.

Right now I am focusing on the outliers I mentioned earlier. One finding is: The min operation on arrays is terribly slow. I replaced it with a classic while loop and went from a big slowdown (in comparison to 2.12) to a big speedup. I conclude that min was already slow in 2.12 and is even slower in 2.13. I will create an issue for this performance pitfall.

julienrf · August 13, 2019, 7:21am

That would be great, thanks!

Jasper-M · August 13, 2019, 8:17am

Note that, while still open for optimization, this PR might already bring some performance improvements in this area.

sideeffffect · December 24, 2019, 3:48pm

That PR can’t be merged, because of binary incompatibility. Is there something else that can be done about this issue?