Design for mutating iterators

Ichoran · October 9, 2020, 6:18am

@rssh - In practice I think this would be too unwieldy. Complexity is important but it’s not so important that every signature needs to add 30ish characters (not to mention that it wouldn’t be verified by the compiler in any way).

I was thinking more about scaladocs or an IDE tooltip that would indicate when a method had subclasses that differed in complexity by N or more. Maybe an -Xlint:+performance or somesuch. So if you use apply on a Seq, you get a little orange squiggly with a pop-up that says, “apply on Seq may be much slower for some subclasses than others”.

And it says that because of some annotation about expected complexity, I guess. I haven’t really thought through every step to see where it might be impractical.

Maybe you just write in the BufferIterator docs something like, "Note that insertNext on ListBuffer takes constant time, while insertNext on ArrayBuffer takes time proportional to the number of elements remaining. Thus, use of this method is inadvisable when performance may be a concern. In such cases, require a narrower type that guarantees the performance characteristics desired, e.g. ListBufferIterator instead of BufferIterator.

And go back and do this in a bunch of other places, like tail on Seq and so on.

It’s a huge effort. Unfortunately, learning this all was also a huge effort, and now it’s all stuck in my head, which is fine for me but not very useful for anyone else.

rssh · October 9, 2020, 7:16am

Also in this direction: in C++ stdlib exists class iterators_traits, which provide information (and types) which are used during generation of specialised algorithms.https://en.cppreference.com/w/cpp/iterator/iterator_traits, https://en.cppreference.com/w/cpp/iterator/iterator_tags .

In Scala something similar can be ‘enabler’ for an extension that provides a set of specialized methods.

And yes, this is huge – it’s needed that somebody will devote some time to experimenting and not sure that our initial thought is not trivial in comparison with the working model of dependencies between properties mapped to the possibility of algorithms.

Upd: better link in C++ reference: https://en.cppreference.com/w/cpp/iterator

LPTK · October 9, 2020, 8:43am

But again this is already the case all over the place. For instance, you might expect an iterator to be a fast and constant-space abstraction, but in fact it may (and often does, in practice) hide lots of time and space complexity behind each call to next and hasNext. For instance, unzip on iterators may allocate an entire new collection depending on how you consume the results.

I don’t understand people who keep bringing up this excuse to justify half-assing the mutable collections API. If you’re going to expose and support an API, make it as good as possible, period. Yes, people do have many use cases where these collections are needed, otherwise they wouldn’t even be there in the first place. Sometimes it’s the only way to even have acceptable performance (let alone good performance) depending on the use case – think about an algorithm like Dijkstra, or caching. I find this attitude of “you are not going to use it anyway” annoying and presumptuous.

Not sure how to interpret this, but I don’t think it’s a productive thing to say here.

Ichoran · October 9, 2020, 7:42pm

Yes, but are you arguing that this is good? The “stop digging” saying is a compact reference, apparently not sufficiently universal, to the argument that having a large quantity of something is not a good argument for more if that thing is something you don’t really want.

??? I’m arguing that this is the half-assed version if made too general. Did you just not read my whole sentence?

The right implementation, in my view, exposes an API that is a good balance of the various concerns: utility, performance, complexity.

The proposed operations achieve a good balance for ListBuffer.

They aren’t very good, I don’t think, for ArrayStack. The utility there would come from bidirectionality and having the index available at all times; the stuff for ListBuffer would be more likely to get you into trouble than help you out. If you want to use the ListBuffer version, just to(ListBuffer) and work with that. If you’re doing more than a couple operations, you’ll probably have better performance this way, even if you have to convert back.

For ArrayBuffer, ArrayStack, and friends, you might also want:

hasPrev
prev
index
remaining
swap (switches just-seen item with one before)
swapWith(index: Int) (swap is swapTo(index - 1))
index_=(newIndex: Int) (relocates iterator to this position)
valueAt(relativeIndex: Int)
setAt(relativeIndex: Int, value: A)

and so on. You can implement all these for ListBuffer too, at the cost of O(n) performance for things that seem O(1)ish, like swap and prev.

I don’t think this is a good design for ListBuffer, so I don’t think they should be there.

LPTK · October 24, 2020, 7:38pm

Sorry, I forgot to answer.

I am not arguing that surprising performance is good per se. I am arguing that reducing the usefulness of APIs because of the fear that some operations might have surprising performance to some programmers does not sound like it’s in line with Scala’s existing philosophy. I think one should strive to follow a single design philosophy when designing a standard library.

Historic half-assing of mutable collections dates back to the previous collection design, and was finally addressed in 2.13, by adding many methods on mutable collections that were sorely missing. Some of these new methods are exactly like you describe, such as dropInPlace on Buffer, which would have surprising performance when used repeatedly if you naively thought all buffers could perform that operation in constant time. These methods were probably missing in the first place because of the “you wouldn’t even use mutable collections unless you were after improved performance” line of reasoning.

That’s a big “if”. As I mentioned, it makes perfect sense to use these operations on ArrayBuffer when you only want to do so once. It also makes perfect sense if the buffer is small (which is the vast majority of buffers used in everyday code). It’s actually going to be much faster repeatedly shifting a small array that fits in your cache than pointer-chasing a ListBuffer! In case you haven’t seen it, have a look at this talk excerpt: https://www.youtube.com/watch?v=YQs6IC-vgmo

You rightly mention that mutable collections are for performance-conscious people. But any sane performance-conscious people would know to look up the complexity of their collection operations when writing algorithms. It doesn’t make any sense to look into low-level mutable implementations if you’re not even clear on the asymptotic complexity of your code.

Ichoran · October 24, 2020, 9:54pm

Okay, that’s a good point. I hadn’t quite framed it that way to myself, but you’re right. The philosophy is to place correctness first, functionality second, and performance third (but let you know what it is in the docs, so you can choose appropriately). This should be no different.

I withdraw the objection!

Instead, I advocate for useful operations that aren’t specific to ListBuffer. For instance, the requirement that remove can only work on the most recent thing you’ve picked up is very ListBuffer-friendly, but the constraint makes no sense for ArrayBuffer.

SethTisue · October 27, 2020, 3:34pm

As background for this discussion, I have a pet example of a mutable-collection task that’s easy to solve with remove() and hard otherwise.

Suppose every time there is"foo" in a ListBuffer we want to remove it and the item after it.

With the Java-style API with remove() you can write (modulo ambiguity in the problem statement about what happens if there are consecutive "foo"s):

val it = xs.iterator()
while (it.hasNext) {
  if (it.next() == "foo") { it.remove(); it.next(); it.remove(); }
}

This is the sort of code CS 101 students can easily write. Whereas with the Scala-style API, solving it efficiently (especially in the absence of efficient indexed access) is a brainteaser. You can do it with filterInPlace by tracking state as you go in a boolean var, but it’s a lot more difficult to write (and difficult to read) then the Java-style version.

NthPortal · October 27, 2020, 10:15pm

I had a relevant discussion with @sjrd (I believe) a couple months ago about iterator design in general, but I’ve no idea where or how to find it. This is also a potential time to revisit the core design and investigate a design based around advance():Boolean and current:A rather than hasNext:Boolean and next():A. There’s no particular reason the mutating iterators need the same design as the regular ones.

Ichoran · October 28, 2020, 12:12am

Indeed, remove makes a lot more sense with advance/current iterators than hasNext/next iterators. It also allows mutation in a convenient way: i.current = x.

One important decision is whether it is permissible for current to be empty despite the position being in the middle of a non-empty collection.

I do think this is worth exploring as an superior design for mutable iterators. They can still have a hasNext and next, implemented in terms of advance and current (or vice versa, which is approximately the case now for BufferedIterator).

Ichoran · October 28, 2020, 12:14am

@Seth - Good example!

Now suppose every time there is a "foo" in a WhateverBuffer, we want to remove it and the item before it. Now what?

What if we want to remove the item, and the one ahead, and the one behind?

val i = xs.mutableIterator
while (i.hasNext()) {
  if (i.next == "foo") {
    i.remove()
    i.remove()
    i.next; i.remove()
  }
}

It seems like we would want to support this kind of thing too, no?

NthPortal · October 28, 2020, 12:31am

I think it really depends if you consider the cursor to be on an element or between elements. If it’s on an element, it feels more natural to me if it can be empty (having it automatically advance (or go backwards) feels weird to me). But if it’s between elements, the method ought to be called something different (previous:A?).

Removing the item before is tough, as it requires you to be able to iterate backwards (kind of like java.util.ListIterator)

Ichoran · October 28, 2020, 1:14am

Well, it’s not very hard if we assume an API like

val i = xs.mutableIterator
while (i.advance) {
  if (i.current == foo) {
    i.remove
    i.retreat; i.remove
    i.advance; i.remove
  }
}

Implementing retreat for ListBuffer is a bit annoying, but the API is easy to use.

NthPortal · October 28, 2020, 1:17am

that is precisely the API I was imagining. to the T (modulo some ()). and it seems to me that’s where we consider current to be empty after calling remove()?

Ichoran · October 28, 2020, 1:18am

Yes, I was envisioning empty-current-after-remove here.

It’s more like a cursor than an iterator.

I’ve written something like this before for some specialized use that I no longer remember. I remember it fondly, though. It was very useful.

NthPortal · October 28, 2020, 1:21am

I’ve been thinking about this all afternoon/evening, and thinking the type should be called Cursor (or some variation thereof). Perhaps with a method to convert it into an Iterator, if needed.

NthPortal · October 28, 2020, 7:42am

Idea: put different methods in different types (as mentioned earlier), and have less-performant-but-still-useful methods behind a method that returns a more refined type with more operations.

I’ll try to use an example to clarify that (names not in any way final).

trait SlowOps[A] { self: Cursor[_] =>
  def withSlowOps: A
}

class ListBuffer[A] {
  def cursor: InsertCursor[A]
         with RemoveCursor[A]
         with SlowOps[InsertCursor[A]
                 with RemoveCursor[A]
                 with ReverseCursor[A]] = ???
}

val buf = ListBuffer(1, 2, 3)

buf.cursor.retreat              // does not compile
buf.cursor.withSlowOps.retreat  // compiles

they’re there if you need them, but you have to explicitly acknowledge that their implementation may not be performant.

bonus simple implementation of SlowOps, which is essentially a compile-time-checked downcast:

trait SelfSlowOps[A] extends SlowOps[A] { self: Cursor[_] with A =>
  def withSlowOps: A = self
}

this way you only need a single implementation, and type returned by cursor just doesn’t capture the full type of the implementation

bonus 2: get rid of that tedious type duplication:

@annotation.showAsInfix
type WithSlow[A <: Cursor[_], B] = A with SlowOps[A with B]

class ListBuffer[A] {
  def cursor: InsertCursor[A] with RemoveCursor[A] WithSlow ReverseCursor[A] = ???
}

LPTK · October 28, 2020, 11:22am

I think this has been mentioned before, but it’d also be super useful to have “buffered” versions of the mutable iterator operations.

val xs = ArrayBuffer(1,2,3,4)
val it = xs.mutableIterator.buffered
while (it.advance) if (it.current % 2 == 0) it.remove
println(xs) // ArrayBuffer(1,2,3,4)
it.apply()
println(xs) // ArrayBuffer(1,3)

As an asymptotically-better alternative to:

val xs = ArrayBuffer(1,2,3,4)
val it = xs.mutableIterator
while (it.advance) if (it.current % 2 == 0) it.linearOps.remove
println(xs) // ArrayBuffer(1,3)

(I renamed @NthPortal’s withSlowOps to linearOps.)

NthPortal · October 28, 2020, 10:43pm

such an implementation is quite complicated, and requires solving a lot of problems:

what happens if you remove and element, advance a few, remove another element, advance a few more, etc.? how is that kept track of?
what if you retreat instead of advancing?
what if you go back and forth? how do you know which elements to skip after removal?
if you insert elements, how do you have the cursor on them when going back and forth?

in short, I think having a transactional design is immensely more complicated, not something you can “just do” (it’s a tremendous amount of work for each collection’s implementation), and not worth the complexity

Ichoran · October 29, 2020, 4:10am

The SlowOps idea is decent, but if we’re targeting Scala 3 (and even if we’re targeting Scala 2), I would be more inclined to use implicit evidence, where the fast versions get evidence by default, and for the slow versions the evidence has to be imported.

The downside of this scheme is that it would take a lot more thought upfront because everything that might possibly have a slow implementation would have to be decorated with an implicit / given. But the upside is that the usage is non-awkward; when you want it to just work, you don’t have to litter your code with extra method calls pointing out tediously that yes, indeed, we still continue not to care that this is O(n).

This also would help keep the type signatures under control Even InsertCursor[A] with RemoveCursor[A] WithSlow ReverseCursor[A] is quite the mouthful, albeit much better than the original.

But I haven’t really had the time to mock up a proposal for it, so I’m not highly confident that this is the way to go. It’s more of a hunch at this point.

NthPortal · October 29, 2020, 7:21am

you can just do val cursor = buf.cursor.withSlowOps once at the beginning then