Experiments with Arrays

So now that we have ArraySeqs in 2.13, I wanted to experiment with whether Arrays, or to be precise Array-based collections, could be used as the default collection type.

The classic cons List has extremely poor random access compared to Arrays, and it often creates numerous unnecessary reference objects, consuming large amounts of memory and stressing the garbage collector. The cons List was chosen at a time when computation was expensive. On modern computers, with their high clock rates and high IPC, computation is cheap; it is cache that is precious. Modern computers are extremely efficient at copying arrays of data and at performing simple mapping computations over them. In addition, Lists are not even type safe when used as accumulators. The Scala Vector seems to have surprisingly poor overall performance characteristics. So this, in brief, is the motivation for experimenting with moving to Arrays as the default.

So in a lot of cases you can use ArraySeqs and the methods provided by the standard library. However, there are some cases where you need to write your own loops. Using ArrayBuffers as accumulators, then converting them to ArraySeqs before returning, seems the obvious strategy. This actually has the advantage that if you forget to convert your ArrayBuffer you get a type error, whereas if you forget to reverse your List accumulator you get no compiler error or warning. The mutability is not exposed to the method user.
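
For example, a minimal sketch of that pattern (collectEvens is just an illustrative name):

import scala.collection.immutable.ArraySeq
import scala.collection.mutable.ArrayBuffer

def collectEvens(xs: ArraySeq[Int]): ArraySeq[Int] = {
  val buf = ArrayBuffer.empty[Int]   // mutable accumulator, never leaves this method
  for (x <- xs) if (x % 2 == 0) buf += x
  ArraySeq.from(buf)                 // forgetting this line is a compile-time type error
}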

The problem is iterating through a sequence in a functional way. If you have a long sequence, then dropping the head or the beginning of an ArraySeq is extremely inefficient. I tried using ArrayDeques, but that seems to lead to mutability leaking out, rather than being firmly encapsulated within the method boundary. So it seems to me that an Array-with-offset class is required:

/** Array with Offset: an immutable view of an Array from a given offset. */
class ArrOff[A](val arrRef: Array[A], val offset: Int)
{
  def length: Int = arrRef.length - offset
  def isEmpty: Boolean = length <= 0
  def head: A = arrRef(offset)
  def tail: ArrOff[A] = new ArrOff(arrRef, offset + 1)
  def drop(n: Int): ArrOff[A] = new ArrOff(arrRef, offset + n)
  // etc.
}
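
A small usage sketch, just to show the intended head/tail deconstruction without copying the underlying array:

def sumInts(xs: ArrOff[Int]): Int =
  if (xs.isEmpty) 0 else xs.head + sumInts(xs.tail)

sumInts(new ArrOff(Array(1, 2, 3, 4), 0)) // 10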

Thanks for your feedback!

Indeed, I recommend using foldLeft instead of using head/tail extraction. There are also some alternatives such as scala/scala-collection-contrib#18.
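
For instance, a count written with foldLeft (just a sketch, with made-up names) needs no head/tail extraction at all:

import scala.collection.immutable.ArraySeq

def countMatches[A](xs: ArraySeq[A])(p: A => Boolean): Int =
  xs.foldLeft(0)((acc, a) => if (p(a)) acc + 1 else acc)

countMatches(ArraySeq(1, 2, 3, 4))(_ % 2 == 0) // 2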

For the record, IndexedSeqView does provide an O(1) implementation of drop, but not tail (unless I’m wrong). So:

import scala.collection.IndexedSeqView
import scala.collection.immutable.ArraySeq

def findFirst[A](v: IndexedSeqView[A])(p: A => Boolean): Option[A] =
  if (v.isEmpty) None else {
    val a = v.head
    if (p(a)) Some(a)
    else findFirst(v.drop(1) /* this is O(1) */)(p)
  }

findFirst(ArraySeq.tabulate(100)(identity).view)(_ % 2 == 0)

But I agree that this is not as ergonomic as using head/tail extraction on List.

So, I think your suggestion makes sense. I’m not sure it should extend Seq, although that would be useful to support the following style:

def findFirst[A](as: ArraySeq[A])(p: A => Boolean): Option[A] = findFirstLinearly(as.linearly)(p)

private def findFirstLinearly[A](as: ArraySeq.Linearly[A])(p: A => Boolean): Option[A] =
  as match {
    case Nil => None
    case a +: as => if (p(a)) Some(a) else findFirstLinearly(as)(p)
  }

(where ArraySeq.Linearly is the ArrOff you proposed)

The problem is that lists are often used inductively (constructing and deconstructing them head/tail-wise). For that specific use case, List is actually extremely efficient on the JVM, and I’d be interested to see any data structure that beats it on that ground. Also, it’s often not possible to replace these use cases with internal iteration (i.e., fold and unfold-style functions, which don’t give you enough control on the iteration — even with the lazyFold variants).

Your ArrOff data structure: (1) doesn’t support efficient inductive construction; and (2) for deconstruction, you’re going to create one ArrOff wrapper per extracted element, so each traversal is actually going to create as much garbage as the equivalent list representation itself (whereas the list doesn’t create any additional garbage on successive traversals). Moreover, you’re going to access each element through one more pointer indirection than with a List.

The only pro of an Array-based collection is that the data itself will require less space, and will have better locality, which will improve things like fold and foreach. So it might be good for big, long-lived collections that you can traverse using internal iteration, but that’s not the typical situation in which Lists are normally used.

So all in all, I don’t think a collection like ArrOff is a viable replacement for List.


In this case, I think the general problem with immutable List is that it is in fact a Stack. When used as an immutable stack it has the desired performance profile, but for general Seq-iness it just isn’t sufficiently optimal.

Those are my 2c on the matter ^^


In “Scala for the Impatient”, there is the rule “If in doubt, use Vector”.

I’m only proposing ArrOff for deconstruction. Of course it shouldn’t be used as a builder. ArrayBuffer is the best builder collection unless you know the size of the finished collection at the start, in which case you can just use an Array and wrap it as an ArraySeq when you are finished. There are no doubt specialised situations where you want to use an ArrayDeque as a builder, but I haven’t found one yet in my own code.
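
A sketch of that known-size case (doubled is just an illustrative name):

import scala.collection.immutable.ArraySeq

def doubled(xs: ArraySeq[Int]): ArraySeq[Int] = {
  val out = new Array[Int](xs.length)                  // size known up front
  var i = 0
  while (i < xs.length) { out(i) = xs(i) * 2; i += 1 }
  ArraySeq.unsafeWrapArray(out)                        // wrap without copying; out never escapes
}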

Lists’ overall performance characteristics are just poor. Vector’s are worse. A List or Vector of fundamental values like Int and Double is horrendous. It seems to me the current Scala collection ecosystem has the worst of all worlds: as well as using two poorly performing collections, there is considerable impedance caused by converting between collection types. Ultimately you have to have a default collection type, and when you optimise away from the default type you have to overcome the conversion cost as well. Often a method writer may have no idea of the collection sizes the method’s consumers will use.

I already use Arrays for storing deep value types. Efficient iterative deconstruction of Arrays is, in my view, a problem that must be solved, with changes to the language and even the platform if necessary. Using ArrayDeques has been suggested, but while they might have a place in optimised, encapsulated code, in my view mutable collections should have no place in the interfaces of public methods.

Now for high efficiency you can just pass the offset as a separate parameter, and you can do this internally and in private methods in performance-critical code as a late optimisation. But this does complicate code, and using the ArrOff seems to me the best compromise. I can’t see any reason for not having an implicit conversion from an ArraySeq to an ArrOff with offset = 0.
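
Sketches of both options (findAt and arraySeqToArrOff are names of my own, and the unsafeArray cast is an assumption about how the conversion could avoid copying):

// Offset passed explicitly, as a late optimisation in private code:
def findAt[A](arr: Array[A], i: Int)(p: A => Boolean): Option[A] =
  if (i >= arr.length) None
  else if (p(arr(i))) Some(arr(i))
  else findAt(arr, i + 1)(p)

// An implicit conversion giving every ArraySeq a zero-offset ArrOff view:
import scala.collection.immutable.ArraySeq
import scala.language.implicitConversions

implicit def arraySeqToArrOff[A](as: ArraySeq[A]): ArrOff[A] =
  new ArrOff(as.unsafeArray.asInstanceOf[Array[A]], 0)  // assumes the element types line up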

Is there not a reasonable hope that the JVM will eliminate these additional objects at run time? Is there not a faint hope that even a JavaScript engine might do the same? On Native they can simply be structs, so no objects are created there.

And all this is only a stopgap until we get to Valhalla and WebAssembly. Really, this is not rocket science. It’s the same principle as the C++ vector that has been around since the eighties. Perhaps Oracle can be shamed into making Valhalla a top priority.

Tiark Rompf once did an experiment to replace every occurrence of List in the compiler by Vector (List is the standard sequence type used in the compiler). The result was a slowdown of about 10% if I remember correctly. So, List definitely has its uses.


My personal opinion for quite some time has been to relabel List as Stack, because it is those use cases (push = prepend, peek = head/headOption, pop = tail) at which it shines.
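
In code, that stack role looks like this (a trivial sketch):

val s0 = List.empty[Int]
val s1 = 1 :: s0                          // push
val top = s1.headOption                   // peek
val s2 = if (s1.isEmpty) s1 else s1.tail  // pop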

Vector unfortunately has (or at least has had) issues with small sizes (memory overhead and indirection).

Caveat: List is the default implementation of Seq so you may end up using it when you don’t expect or want to. While we couldn’t possibly remove List we should consider changing the default. We already got some candidates for default Seq implementations during the development of 2.13 but more extensive performance testing is required.


IIUC (please correct me if I’m wrong, I want to know!) linked lists also have performance issues with indirection, especially in systems under intense memory pressure and/or with larger element sizes. Nothing prevents the tail pointer from pointing off to whatever wild place in DRAM the tail happens to have been allocated, so you’ll take very frequent L1/L2 misses when traversing lists, and pay hundreds of cycles for each fill from DRAM. I’d be surprised if any of the prefetch algorithms on current-day CPUs can help at all.

I vaguely recall hearing a rumour from Charles Nutter many years ago that HotSpot would detect and optimize linked-list access patterns, but I’ve tried and failed to track down any evidence for it a few times over the years.

If you allocate a list on one thread in one go, all the items will be close in memory: each thread allocates in its own chunk sequentially, there is no search for free space, and allocation is a pointer bump and capacity check in the thread-local allocation buffer (TLAB). Most Lists have their elements allocated on one thread, within a short time interval, so the locality is generally pretty good.

GC walks references, and often ends up moving elements in relation to the traversal order. Not all GCs do this the same way, but it’s quite uncommon for any GC to undo the locality of the original list: G1 and Shenandoah will compact regions and remove gaps, but not change the order AFAIK (it’s faster that way anyway).


Aha, thank you!

Based on Li Haoyi’s benchmarks I would say that the traversal performance of List is almost as good as that of Array (except for really simple traversals which the JIT can simply optimize away in the case of Array, apparently). Of course that’s in an artificial benchmark and not a real application, and when it comes to memory usage List is a lot worse. There is no collection which is good at everything. There’s Vector, which is mediocre at everything…


There are definitely some low-hanging fruit, or at least promising leads, for how Vector could use less memory and have improved performance:

  • Replace with the Relaxed-Radix-Balanced Vector structure
  • Use arrays of size < 32, when possible, for all levels of the tree (currently, on the right-hand side of the Vector, every Array will have size 32, but those could be trimmed to the exact size needed, saving up to depth * 31 * pointer_size memory on each appended/appendedAll op)
  • Use primitive arrays for primitive vectors, à la immutable.ArraySeq (see the small illustration after this list)
  • Hand-roll some operations like map, foreach, rather than going through iterator.operation(..).to(Vector)
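
To illustrate the third point (my own example): immutable.ArraySeq already stores primitives unboxed, which is what a specialised Vector could also do:

import scala.collection.immutable.ArraySeq

val a = ArraySeq(1, 2, 3) // an ArraySeq.ofInt, backed by an unboxed Array[Int]
val v = Vector(1, 2, 3)   // stores boxed java.lang.Integer references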

@szeiger Yes, I agree that List as a default Seq doesn’t make much sense, if any at all.

As @viktorklang says, the whole point of List is to be an immutable stack with fast cons/head-tail operations, and almost no one uses +:, the cons/head-tail of Seq — and rightly so, as it can behave extremely poorly if the wrong Seq is used.

I think we should never have had +: and :+ on Seq without any complexity guarantees. But it would probably be possible to guarantee amortized constant-time complexity for these operations by using wrappers when necessary (typically, List would produce a wrapper for :+ but not for +:). Then these operations on Seq could actually become useful.
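
A rough sketch (not a concrete proposal) of the kind of wrapper meant here, where appends go onto a reversed suffix so :+ stays O(1):

final class AppendWrapped[A](prefix: List[A], revSuffix: List[A]) {
  def :+(a: A): AppendWrapped[A] = new AppendWrapped(prefix, a :: revSuffix)  // O(1) append
  def +:(a: A): AppendWrapped[A] = new AppendWrapped(a :: prefix, revSuffix)  // O(1) prepend
  def toList: List[A] = prefix ::: revSuffix.reverse                          // re-ordering paid once here
}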

The great majority of default Seq creation will be with zero or a small number of elements. The implementation is a non-issue. Lists are nice for tail recursion; they are as good as anything. The fixed-size variants are an option.

If the designer or developer knows what is important, they should be deliberately selecting the implementation. They can use SeqT.toX to achieve this without using a class name.

There was a study once on Java libraries that concluded that the average size of collections was 4. In that light, I am not concerned about operations that have bad Big-O complexity on some types.


That’s a strange argument. If an application has 1000 three-element collections and one 1000-element collection, the average collection size is about 4 ((1000 × 3 + 1000) / 1001 ≈ 4), but having the wrong complexity for the big collection will likely ruin the application’s overall performance.
The average collection size is not what matters complexity-wise. If it were, why isn’t the default mutable.Map implemented as mutable.ListMap? It’s probably faster than mutable.HashMap for sizes around 4.

@tyohDeveloper I agree that List has many appropriate use cases; I’m just not sure that being the default Seq is one of them.


I’m not convinced that average collection size is the right measure. I can imagine an app that has one collection of a million elements and two million collections of three elements each, and spends most of its time accessing the large collection.

Many collections are small, so it’s justified to have implementations that work well only for small collections. You don’t need to throw the baby out with the bathwater by banning, say, :+ on lists. It could be quite appropriate to use it as long as the list is expected to be short.
