Design for mutating iterators

NthPortal · October 7, 2020, 12:11am

What should be the API of iterators that mutate their underlying collections? For what collections should they be defined? Hopefully some future version of the standard library can include such iterators.

For mutating iterators over a Buffer, there are a whole slew of methods that would be potentially useful:

insertNext(elem: A): Unit
- inserts a value into the collection such that it is the next element for the iterator
insertAllNext(elems: IterableOnce[A]): Unit
- potentially could return the number of elements inserted
insertSkipping(elem: A): Unit
- inserts a value into the collection such that it precedes the iterator’s next element
insertAllSkipping(elems: IterableOnce[A]): Unit
prepend(elem: A): Unit
- prepends a value to the collection (the iterator is always past it)
prependAll(elems: IterableOnce[A]): Unit
append(elem: A): Unit
- appends a value to the collection (the iterator is never past it)
appendAll(elems: IterableOnce[A]): Unit
remove(): Unit
- removes the value last returned by next() from the collection
- cannot be called after insertSkipping(...) or remove() without first calling next() again
replace(elem: A): Unit
- replaces the value last returned by next() with the given value
- equivalent to an update at the appropriate index (also a potential name)

For other Seq types, the only mutating method (without knowing a more precise type) is update, so most of the above methods would not be applicable (it’s not even clear if remove() would be applicable; perhaps only replace is applicable).

For Set and Map types (and any other Shrinkable types), probably only remove() is applicable. Potentially there could be methods to add elements, but whether or not they’d be iterated over would be non-deterministic, and it might also be very difficult to implement for hash table-backed collections that need to grow in the middle of iteration.

lrytz · October 7, 2020, 6:50am

What do you think is a good way of putting “not applicable” into the design? I don’t think we want to imitate Java’s design of Iterator.remove where the doc says

optional operation […] default implementation throws an UnsupportedOperationException

Ichoran · October 7, 2020, 7:57am

Most of these operations are highly inefficient for most mutable sequence types. Furthermore, it’s hard enough to keep hasNext and next straight. Having more operations that must be applied in order but with no indication of what you’ve done is not, I think, a good practice.

So I don’t think such iterators are, on balance, a good thing to have at all.

Having tailored mutating accessors for particular collection types is fine. But these operations just don’t work well with the full diversity of mutable collection types, and just invite a slew of O(n^2) and order-of-operation errors, even if they are very nice on ListBuffer for certain desirable operations.

SethTisue · October 7, 2020, 3:02pm

re: Lukas’s question, I assumed that if this happens at all, it would be by having a subclass of Iterator (IteratorWithRemove or MutatingIterator or what have you), not by altering Iterator itself.

re: Rex’s question, perhaps only certain specific collection types should offer this kind of iterator — that seems plausible to me.

NthPortal · October 7, 2020, 7:22pm

that was my thought—separate traits for the different types of mutating iterator. perhaps RemovingIterator, UpdatingIterator, BufferIterator, etc.

Ichoran · October 7, 2020, 8:01pm

@SethTisue - That’s what I was thinking at first, until I started thinking which collections would support which operations.

Every mutable Seq can support update. Things that are not a Seq in general cannot easily support it; sets, maps, priority queues, and the like, would naturally move the element at once, which would then risk double-traversal.

But only maps, sets, and lists can even support remove. And insert is even worse–only ListBuffer can do that (ListMap also if you allow things like insert-doesn’t-insert-because-it’s-already-there).

The only collections that can support both update and delete are also only ListBuffer and ListMap.

So I’m just not convinced that the in-place-mutating-iterator operations are general enough or widespread enough to be worth making the standard library larger. There would be value in making ListBuffer return a MutatingListIterator or somesuch that had those handy methods.

What is the use case for anything more general, that doesn’t already have a good alternative, and isn’t an O(n^2) trap for the unwary?

NthPortal · October 7, 2020, 9:11pm

What about ArrayBuffer, ArrayDeque, and the several other Buffer subtypes?

Your descriptions of what operations Seq, Set and Map would support are exactly what I state in my proposal, so the objection is not super clear to me. I think their mutating iterators would have only basic functionality (mirroring their own methods), while the mutating iterators for Buffers would contain lots of functionality, much like their own methods.

Katrix · October 7, 2020, 9:30pm

I think the problem being pointed to is that insert for example would move all items down one. If the iterator does some insert for all the elements, you suddenly end up with O(n^2).

The question is more which collections would be able to support such operations where the operations aren’t O(n) per operation (or even worse).

Ichoran · October 8, 2020, 2:42am

The objection is that there isn’t a general mutating iterator capability. Although there are a few specific things that could be useful, I’m not sure the generalization pays for itself. A mutating iterator for ListBuffer would be handy. Everything else is pretty meh. (Katrix explained better than I did what the concern is for many of the operations.)

NthPortal · October 8, 2020, 4:17am

The thing is, all of the operations I listed for an iterator for Buffers mirror methods on Buffer itself. You’re right that for an iterator over ArrayBuffer, insertNext(A) has O(n) performance (as would remove()), but so does ArrayBuffer for its own insert(Int, A) (and remove(Int)). It happens to be that a mutating iterator for ListBuffer could get drastic performance improvements for certain operations over calling them directly on the ListBuffer, but I don’t think that precludes the value of having those operations exposed for other types of Buffer. If it does, why do those operations exist on Buffer in the first place?

Also, I think it’s worth noting that insertAll on ArrayBuffer (and other similar Buffer implementations) is O(n - idx), not O((n - idx) * m) or O(n²). You can shift the elements all the way at once rather than by one index for each element being inserted. On top of this, there may be system-/hardware-level optimisations that make it even cheaper than O(n - idx).

Ichoran · October 8, 2020, 5:01am

insert etc. exist as a limited-use option for when bulk operations like flatMap are slower. But you should never (nor would you likely be tempted to) write something like

def replace[A](b: ArrayBuffer[A], f: A => Option[Seq[A]]) {
  var i = 0
  while (i < b.length) {
    f(b(i)) match {
      case Some(xs) =>
        b.patchInPlace(i, xs, 1)
        i += xs.length
      case _ => i += 1
  }
}

That would be awful for general use. And you wouldn’t want to, because you’d write it like so instead:

def replace[A](b: ArrayBuffer[A], f: A => Option[Seq[A]]) {
  b.flatMapInPlace(a => f(a).orElse(a :: Nil)
}

So much easier! Also, so much faster in the worst / typical case.

But with the mutating iterator, you have the following temptation:

def replace[A](b: ArrayBuffer[A], f: A => Option[Seq[A]]) {
  val i = b.iterator
  while (i.hasNext)
    f(i.next).foreach{ xs =>
      i.remove()
      i.insertAllSkipping(xs)
    }
}

This is pretty simple code compared to the first option, and it does a better job of disguising just how wasteful all the slides are, and that it can’t actually be optimized.

Enabling this is, I think, has downsides that considerably outweigh the upsides.

Downsides:

Standard Library API is yet bigger
Encourages (and helps hide) a performance mistake
Encourages an error-prone construct (iterator with internal mutability) over a safer one

Upsides:

Easier to express a natural operation with possible early termination

Now, for ListBuffer only, I think the iterator approach adds real value, and can allow more easily expressed algorithms with improved performance, so a special mutating ListBufferIterator would be great.

But for almost anything else, the standard style for pretty much everywhere that has a standard style would be (at least after the first few cases of people being burned) “never use mutating iterators”.

And I just don’t think we should introduce a new feature whose first accomplishment would be going on a “never use this” list.

We could provide iterators that do less, but those would do so little that it’s hard to justify making them at all: extra API surface area for only a smidgen of benefit.

NthPortal · October 8, 2020, 6:03am

Fundamentally, the important operations are actually insert(All)Next, because the iterator then traverses the inserted value(s). You can’t do that with flatMapInPlace (and insert(All)Next essentially allows you to implement a recursive flatMapInPlace). insert(All)Skipping is just a convenience to avoid having to write val count = insert(All)Next(...); var i = 0; while (i < count) { next(); i += 1 } (and maybe it’s optimisable too, as a bonus).

In essence, an iterator exposes lower-level operations than the <op>InPlace family, and allows a wider range of operations. Could you use an index and a loop over an IndexedBuffer? Sure. I don’t personally find that a compelling argument.

As for the other operations:

remove() is O(n - idx) on IndexedBuffer
append(...) is O(1) on IndexedBuffer
appendAll(...) is O(m) on IndexedBuffer
prepend(...) is O(n) on ArrayBuffer but O(1) on ArrayDeque
prependAll(...) is O(n + m) on ArrayBuffer but O(m) on ArrayDeque

The performance characteristics vary between collections, but most are efficient for some collections. As always, you should choose your collection based on the performance characteristics you need.

As for the performance of remove() and insert(All)[Next|Skipping](...), there are potential ways to amortize the cost of shifting elements down or avoid doing so repeatedly, for a marginal cost to element lookup, by allowing a single gap of arbitrary size in the elements. Coincidentally, that change also solves the issue of inserting an ArrayBuffer (or ArrayDeque) into itself failing miserably.

However, I don’t personally see that as a required precursor to having a mutating iterator; I think that the merits of being able to operate on an iterator without tracking an index and being able to operate consistently regardless of what type of Buffer is being used stand on their own.

Ichoran · October 8, 2020, 7:12am

Well, I guess we have very different perspectives here. I don’t see how further discussion would make much of a difference. Others will have to weigh in, I guess, to broaden the perspective. (I don’t think we have time to conduct a usability study to see if my concerns are more pertinent than your expected advantages.)

In particular, what you tout as an advantage:

I think that the merits of being able to operate on an iterator without tracking an index and being able to operate consistently regardless of what type of Buffer is being used stand on their own.

is exactly the same thing that I think is a critical flaw! I think that capability makes the feature dangerous and unusable, for the same reason that people generally don’t use Seq except with the very most limited set of operations: because performance can be so unexpectedly bad that it’s not just slow but basically broken.

Incidentally, i don’t think any of the prepend/append operations belong on an iterator. It’s fundamentally a mixing of concerns prompted by the difficulty of implementing an iterator that works while the underlying collection is being altered.

And I don’t think the recursive ability of insertNext is good. I think it’s an easy way to generate an infinite loop by accident. Just forget to type Skipping, and your program gives you an OOM for free. (Or hangs, if you did a remove first.) Even if you mean to have insert, if you fail to compute the generative grammar correctly in your head, then…boom!

But mostly I just think there’s a major style difference here.

If I were to implement something akin to this, it would be a lazy editor via a loan pattern, where you’d get cursor-like capability to run around and make changes, and you could optimize things along the way, and the changes would only be guaranteed to be visible when the method ended. It’s a considerably bigger project, but would avoid most of the problems I’m worried about here.

Anyway, I think we need feedback from other people to get a sense of how/whether to proceed, given our rather substantially different perspectives.

martijnhoekstra · October 8, 2020, 8:21am

I do see some consensus emerging that it might be a nice idea to have the following capabilities in the following cases

replace current element for all buffers
insert one/many before/after for iterators over ListBuffer

Additionally, maybe remove the current element for Map and Set?

that gives something like

trait RemoveCurrent {
  def remove(): Unit
}
trait ReplaceCurrent[A] {
  def replace(a: A): Unit
}

trait BufferIterator[A] extends Iterator[A] with ReplaceCurrent[A]

trait MapIterator[K, V] extends Iterator[(K, V)] with RemoveCurrent

trait SetIterator[A] extends Iterator[A] with RemoveCurrent

class ListBufferIterator[A] extends BufferIterator[A] with RemoveCurrent {
  def remove(): Unit = ???
  def replace(a: A): Unit = ???
  def insertSkipping(a: A): Unit = ???
  def insertNext(a: A): Unit = ???
}

Is that already something that would be useful as a minimal implementation?

LPTK · October 8, 2020, 4:01pm

@Ichoran there is an argument to be made that methods with potentially-poor performance should not be exposed so readily, or should be somehow marked. But the fact is that Scala already exposes them all over the place. For instance, the +: and :+ operations/extractors available on all sequences, which make it look like you can implement recursive algorithms using them, but in fact they’d often have terrible performance.

I think that’s because the philosophy there is “don’t treat the user as an idiot”. Scala privileges power over hand-holding. If you fall into a performance trap, just review your code, or profile it, making sure you don’t have the wrong complexity. Any serious software engineer should know to do that, and no amount of API safeguarding will allow clueless programmers not to fall into complexity traps.

All that being said, the mutable iterators seem like they could be really handy, even those like insertAll and insertAllNext on buffers! It’s totally reasonable to use them if you only plan to insert your elements only once, for example — and the iterator interface lets you stop early to avoid wasting time, unlike flatMap (which also allocates more). Another reasonable use case is when you know the collection is very small so you know it won’t matter, which happens extremely often in practice.

By the way, are there currently ways of doing things like “insert this element if it is/is not already in the map” without requiring two map queries, in the new Scala collections API or in the scala-collection-contrib library? I always found support for this kind of things really lacking. Maybe a mutable iterator abstraction for maps could help here.

martijnhoekstra · October 8, 2020, 4:39pm

Yes, updateWith takes care of those kinds of remappings

updateWith(key) {
  case None => Some(value)
  case x => x
}

Ichoran · October 8, 2020, 7:56pm

@LPTK - The issue I’m raising is not whether or not one is treating the user like an idiot; the issue is the cognitive burden of not “acting like an idiot”. If we had better communication of computational complexity of operations, then the balance might change. I’m not sure.

As it is, I’m suspicious of adding operations that look efficient but randomly might not be. Especially since most of the time you wouldn’t even use mutable collections unless you were after improved performance. When you’re in a hole, stop digging, as the saying goes.

But if it would help people and I’m just a weird outlier, that’s fine–I’m a weird outlier in a variety of other ways, so it’s not too surprising. It would be good to think about it carefully, though.

NthPortal · October 8, 2020, 8:20pm

A couple use cases:

traversing a directory tree (hopefully you don’t have infinitely many files or an infinite hardlink loop); you could easily limit the depth of recursion by having each element have a depth associated with it (also useful for eventually rendering the directory tree)
this behaviour in a previous version of scopt that I had to send a PR for (Make OptionParser#optionsForRender safer [scopt3] by NthPortal · Pull Request #288 · scopt/scopt · GitHub) after making ListBuffer’s iterator fail-fast; it is essentially the same as walking a directory tree, except walking a list of commands and subcommands (and we’re guaranteed not to have infinitely many subcommands)

That sounds interesting and worth looking into. I’m worried about the cost/complexity of trying to store “speculatively executed” changes, however.

Ichoran · October 8, 2020, 9:07pm

Symlink loops will also kill it, if you follow symlinks. Not a great way to actually do it in general. You need a visited-path set, at which point you may as well just have a buffer of results and a stack of work.

If you just want to walk a tree (DAG), you may as well just use a stack. It’s just like using an iterator but less error-prone.

def traverse(work: List[Node]): Unit = work match {
  case x :: rest =>
    foo(x)
    traverse(children(x) ::: rest)
  case _ =>
}
def traverseAll(): Unit = traverse(root :: Nil)

If you want to flatten a tree, though, it’s a reasonable approach.

rssh · October 9, 2020, 5:55am

Can we expose O(*). properties into some typeclass (?) and then require one to be deduced when compiling.

something like:

  MutableIterator[C,A,C[A]] {
     ...  insertNext(x:IterableOnce[A])(given Complexity[C, Insert, O1])
  }