Circular-buffers are very simple data structures that uses a start and end pointers on a resizable-array which provides strictly better performance than vanilla resizable arrays. Specifically, without sacrificing space and implementation complexity, clear(), prepend(), trimStart() and trimEnd() and removes from start or end become constant-time and all other operations stay the same complexity.
However, accessing or updating an element requires one more addition and one modulo* operation, on top of the bound checks. In fact, youâre using floorMod which seems slowerâ and since its argument seems always positive you should be able to use % (which isnât fast, though).
To use this to replace ArrayBuffer (instead of adding to it) one would have to ensure the overhead is negligible, even when the memory access cost is minimal (e.g. when the entire array is in L1 cache). Thatâs far from obviously true.
However, the expert on these matters is @Ichoran, so let me defer to him.
I think thatâs an interesting idea! The timing is good as a group of us are currently working on a strawman design for new collections. You might add an issue (or, even better, a PR) here:
How about adding a new CircularBuffer collection alongside ArrayBuffer?
The other question is whether to make it the default mutable Seq. That would probably be less of an issue since if youâre relying on characteristics of ArrayBuffer you should be explicitly using that class anyway.
Yeah, that sounds like the right tack to me. As @Blaisorblade points out, the characteristics arenât identical, and folks have established expectations of ArrayBuffer, so replacing the implementation with an entirely different data structure seems a bit iffy to me â not least, because a circular buffer isnât what I expect an implementation of âarray bufferâ to be.
But adding it as a new data structure seems totally uncontroversial, and potentially quite useful. And making it the default mutable Seq is an intriguing idea, although I canât say Iâve got a handle on all the ramifications thereâŚ
Although my very naive benchmarks did not show any regressions (obviously the prepend and trims are constant time now) from mutable.ArrayBuffer, I agree with you. floorMod was introduced in Java 8 so we can get rid of it especially as the second argument is positive here. Also, if we always allocate arrays whose sizes are powers of 2, its trivial to calculate mod: x mod y = x & (y -1) when y is power of 2.
Itâs critical (and less obvious) that also the first argument is non-negative â -1 % 10 = -1, unlike with floorMod.
Sounds excellentâand that works even for negative x. Power-of-2 sizes are sufficient to ensure the asymptotic complexity. Thereâs no point in non-power-of-2 sizes, as long as you always double the size when resizing.
In fact, if you want to trade off time for space (in constant factors, not asymptotically), you can multiply the array size by any number > 1 â 2 is just the standard choice, but any factor works, while adding a constant to the size doesnât. Picking less than 2 reduces the maximum amount of empty slots but makes resizes more frequent. IIRC, the Cormen book has the details. I donât remember many resizable arrays offering this amount of control so Iâd leave it out and stick to power-of-2 sizes.
Last nitpick: for better constant factors, you want to use Javaâs arraycopy inside resize, since that does a single boundary check for the entire operation.
If this is to become a bit more generic it could be considered dimensions like in an array. As new values are added the dimension rolls around. But since more usages will be 1D, 2D, 3D maybe you can have a 1D, 2D, 3D optimised special case.
Forgive me if Iâm wrong, but the data structure described is not a Circular Buffer, because Circular Buffers are not growable.
A circular buffer, circular queue, cyclic buffer or ring buffer is a fixed size queue that on overflow will throw an exception, or drop either the head or the entire buffer. Source: https://en.wikipedia.org/wiki/Circular_buffer - and making it growable does not necessarily make it more useful, because the fixed size (and consequently its behaviour on overflow) is the whole point.
I do have simple implementations for such circular buffers for reference:
Good point. Actually, in general âbufferâ tends to refer to fixed-length data structures, so ArrayBuffer and Buffer in Scala maybe arenât the best names either. In JavaScript for example ArrayBuffer is a fixed-length raw data buffer. But it may be too late to change the terminology in Scala.
ArrayDeque might be a good name for the âresizing circular bufferâ structure. Deques do normally resize, so itâs a deque that also behaves like an array. In Java ArrayDeque uses a circular buffer internally, though it does not provide random access to elements.
I guess we could also call it ArraySeqDeque, since itâs capable of behaving like a Seq and deque, and itâs implemented as an array, but I prefer ArrayDeque.
I think a growable, mutable circular buffer, as a general purpose mutable data structure, is a wonderful idea.
Whether we call it a âcircular bufferâ or something else doesnât really matter to me. Both growable and non-growable circular buffers would be valuable, but non-growable circular buffers are a more specific use case where-as growable circular buffers can replace all sorts of things:
mutable.Buffer
mutable.Queue
mutable.Stack
After all, growable circular buffers are basically Deques, and provide really fast adding/removing at the start/end. This is in addition to providing similar performance characteristics to a normal growable-array buffer for normal growable-array operations (indexed-lookup, iteration, indexed-replacement, âŚ), , without much implementation complexity.
For what itâs worth, I have wanted this enough that I have implemented it twice already:
I think a growable circular buffer would be a wonderful general-purpose mutable data structure, and would serve well as a âdefaultâ mutable data structure.
I agree with everything you said, I felt the need for it myself, I just think we shouldnât use the name Circular Buffer. Name it something else please. ArrayDeque for example is probably fine and more specific, as many users know what a Deque is and theyâll immediately see itâs backed by an Array, so cool, it means it has good traversal / indexing performance, etc.
ArrayDeque (or something like that) sounds indeed like an improvementâthat name is also used by C++'s STL (http://www.cplusplus.com/reference/deque/deque/). I was also confused I hadnât seen this beforeâI just wasnât recognizing a deque because of the name (and because I last saw them ages ago). And calling this a ring buffer confused me too â ring buffers are what you need, for instance, when you receive video streaming.
FWIW, Wikipedia also mentions growing circular buffers here.
Another alternative to âgrowable circular buffersâ are hashed array trees. They are also very simple - basically instead of doubling allocation and copying when full, we allocate a fixed block each time we reach capacity. They are asymptotically as fast as âgrowable circular buffersâ but only allocates O(sqrt(n)) extra space instead of O(n/2) extra space. In practice random-access might be slightly slower because of locality.
They compare more to normal ArrayBufferâhead insertions arenât O(1) but O(n). Iâm not sure they should be in the coreâbut a goal of the new collection libraries, IIRC, is to simplify creating new Scala collections.
I agree with you re: the name, CircularBuffer, I would usually associate it with the âDisruptorâ and other implementations. Where Iâve seen it used (algos) it has traditionally been a fixed size/lockless data structure.
The only thing is binary compatibility against Scala 2.12.0 - does adding an extra class matter? If yes, can we introduce this as a private class and just change mutable.{Queue, Stack, Buffer} to use this instead?