Experiments with Arrays

#4

In this case, I think the general problem with immutable List is that it is in fact a Stack. When used as an immutable stack it has the desired performance profile, but for general Seq-iness it just isn’t sufficiently optimal.

Those are my 2c on the matter ^^

2 Likes
#5

In “Scala for the Impatient”, there is the rule “If in doubt, use Vector”.

#6

I’m only proposing ArrOff for deconstruction. Of course it shouldn’t be used as a builder. ArrayBuffer is the best builder collection, unless you know the size of the finished collection at the start, in which case you can just use an Array and wrap it as an ArraySeq when you are finished. There are no doubt specialised situations where you want to use an ArrayDeque as a builder, but I haven’t found one yet in my own code.
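
A minimal sketch of those two building patterns, assuming Scala 2.13’s `immutable.ArraySeq` (the method names here are purely illustrative, not from the thread):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.collection.immutable.ArraySeq

// Unknown final size: grow an ArrayBuffer, convert once at the end.
// (Illustrative example; assumes start >= 1.)
def collatzSteps(start: Int): ArraySeq[Int] = {
  val buf = ArrayBuffer.empty[Int]
  var n = start
  while (n != 1) { buf += n; n = if (n % 2 == 0) n / 2 else 3 * n + 1 }
  buf += 1
  ArraySeq.from(buf)                    // one final copy
}

// Known final size: fill a raw Array, then wrap it without copying.
def squares(n: Int): ArraySeq[Int] = {
  val arr = new Array[Int](n)
  var i = 0
  while (i < n) { arr(i) = i * i; i += 1 }
  ArraySeq.unsafeWrapArray(arr)         // zero-copy wrap; don't mutate arr afterwards
}
```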

List’s overall performance characteristics are just poor. Vector’s are worse. List and Vector of fundamental values like Int and Double are horrendous. It seems to me the current Scala collection ecosystem has the worst of all worlds, because as well as using two poorly performing collections, there is considerable impedance caused by converting between collection types. Ultimately you have to have a default collection type, and when you optimise away from that default you have to overcome the conversion cost as well. Often a method’s author has no idea what collection sizes its consumers will deal with.

I already use Arrays for storing deep value types. Efficient iterative deconstruction of Arrays is, in my view, a problem that must be solved, with changes to the language and even the platform if necessary. Using ArrayDeques has been suggested, but while they might have a place in optimised encapsulated code, in my view mutable collections should have no place in the interfaces of public methods.

Now, for high efficiency you can just pass the offset as a separate parameter, and you can do this internally and in private methods in performance-critical code as a late optimisation. But this does complicate code, and using ArrOff seems to me the best compromise. I can’t see any reason not to have an implicit conversion from an ArraySeq to an ArrOff with offset = 0.
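
ArrOff is only a proposal in this thread, so the following is a hypothetical sketch of one plausible shape for it, not an existing API:

```scala
// Hypothetical sketch: an array plus an offset, deconstructed without copying.
final class ArrOff[A](arr: Array[A], val offset: Int) {
  def isEmpty: Boolean = offset >= arr.length
  def head: A = arr(offset)
  def tail: ArrOff[A] = new ArrOff(arr, offset + 1) // no copy, just a bumped offset
}

object ArrOff {
  import scala.language.implicitConversions
  // The proposal above converts from an ArraySeq with offset = 0; to keep this
  // sketch self-contained, this version wraps a plain Array instead.
  implicit def fromArray[A](arr: Array[A]): ArrOff[A] = new ArrOff(arr, 0)
}

// Example use: tail-recursive deconstruction with no per-step array copies.
@annotation.tailrec
def sum(xs: ArrOff[Int], acc: Int = 0): Int =
  if (xs.isEmpty) acc else sum(xs.tail, acc + xs.head)
```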

Is there not a reasonable hope that the JVM will erase these additional objects at run time? Is there not at least a faint hope that the JavaScript engine might do the same? On Native they can simply be structs, so no objects are created there.

And all this is only a stopgap until we get to Valhalla and WebAssembly. Really, this is not rocket science. It’s the same principle as the C++ vector that has been around for decades. Perhaps Oracle can be shamed into making Valhalla a top priority.

#7

Tiark Rompf once did an experiment to replace every occurrence of List in the compiler by Vector (List is the standard sequence type used in the compiler). The result was a slowdown of about 10% if I remember correctly. So, List definitely has its uses.

1 Like
#8

My personal opinion for quite some time has been that List should be relabelled Stack, because it is those use cases (push = prepend, peek = head/headOption, pop = tail) at which it shines.
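
A small illustration of that stack-shaped usage on plain List (a sketch, nothing more):

```scala
// List used as an immutable stack: all three operations are O(1).
val empty: List[Int] = Nil
val pushed = 3 :: 2 :: 1 :: empty   // push = prepend (cons)
val peeked = pushed.headOption      // peek = Some(3)
val popped = pushed.tail            // pop  = List(2, 1)
```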

Vector unfortunately has (or at least has had) issues with small sizes (memory overhead and indirection).

#9

Caveat: List is the default implementation of Seq, so you may end up using it when you don’t expect or want to. While we couldn’t possibly remove List, we should consider changing the default. We already got some candidates for default Seq implementations during the development of 2.13, but more extensive performance testing is required.
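
For illustration, that default is easy to observe (assuming Scala 2.13):

```scala
// The default Seq implementation is List, whether or not you asked for one.
val xs: Seq[Int] = Seq(1, 2, 3)
xs.isInstanceOf[List[_]]   // true: Seq.apply builds a List under the hood
```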

1 Like
#10

IIUC (please correct me if I’m wrong, I want to know!) linked lists also have performance issues with indirection, especially in systems under intense memory pressure, and/or larger element sizes. Nothing prevents the tail pointer from pointing off to whatever wild place in DRAM the tail happens to have been allocated, so you’ll take very frequent L1/L2 misses when traversing lists, and pay hundreds of cycles for the fill from DRAM. I’d be surprised if any of the prefetch algorithms on current-day CPUs can help at all.

I vaguely recall hearing a rumour from Charles Nutter many years ago that HotSpot would detect and optimize linked-list access patterns, but I’ve tried and failed to track down any evidence for it a few times over the years.

#11

If you allocate a list on one thread in one go, all the items will be close together in memory: each thread allocates sequentially in its own chunk, the thread-local allocation buffer (TLAB), so there is no search for free space and allocation is just a pointer bump plus a capacity check. Most Lists have their elements allocated on one thread, within a short time interval. The locality is generally pretty good.

GC walks references, and often ends up moving elements in relation to the traversal order. Not all GCs do this the same way, but it’s quite uncommon for any GC to undo the locality of the original list: G1 and Shenandoah will compact regions and remove gaps, but not change the order AFAIK (it’s faster that way anyway).

2 Likes
#12

Aha, thank you!

#13

Based on Li Haoyi’s benchmarks I would say that the traversal performance of List is almost as good as that of Array (except for really simple traversals which the JIT can simply optimize away in the case of Array, apparently). Of course that’s in an artificial benchmark and not a real application, and when it comes to memory usage List is a lot worse. There is no collection which is good at everything. There’s Vector, which is mediocre at everything…
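
For concreteness, the kind of traversal being compared looks roughly like this (an illustrative sketch, not the benchmark code from those posts):

```scala
// Traversal that touches every element: List does surprisingly well here...
def sumList(xs: List[Int]): Int = {
  var acc = 0
  var cur = xs
  while (cur.nonEmpty) { acc += cur.head; cur = cur.tail }
  acc
}

// ...while the equivalent Array loop is the case the JIT can largely optimise away.
def sumArray(xs: Array[Int]): Int = {
  var acc = 0
  var i = 0
  while (i < xs.length) { acc += xs(i); i += 1 }
  acc
}
```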

3 Likes
#14

There are definitely some low-hanging fruit, or at least promising leads, for how Vector could use less memory and have improved performance:

  • Replace the current structure with a Relaxed-Radix-Balanced (RRB) Vector
  • Use arrays of size < 32, when possible, for all levels of the tree (currently, on the right-hand side of the Vector, every Array has size 32, but those could be trimmed to the exact size needed, saving up to depth * 31 * pointer_size memory on each appended/appendedAll op)
  • Use primitive arrays for primitive vectors, à la immutable.ArraySeq (see the sketch after this list)
  • Hand-roll some operations like map and foreach, rather than going through iterator.operation(..).to(Vector)
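
As a rough sketch of the primitive-array point above (assuming Scala 2.13; purely illustrative):

```scala
import scala.collection.immutable.ArraySeq

// Vector[Int] boxes its elements (the tree nodes are Array[AnyRef])...
val boxed: Vector[Int] = Vector.tabulate(1000)(identity)

// ...whereas immutable.ArraySeq keeps an unboxed Array[Int] when a ClassTag is available.
val unboxed: ArraySeq[Int] = ArraySeq.tabulate(1000)(identity)
unboxed.unsafeArray.getClass   // class [I : a primitive int array, no boxing
```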
2 Likes
#15

@szeiger Yes, I agree that List as a default Seq doesn’t make much sense, if any at all.

As @viktorklang says, the whole point of List is to be an immutable stack with fast cons/head-tail operations, and almost no one uses +:, the cons/head-tail of Seq — and rightly so, as it can behave extremely poorly if the wrong Seq is used.

I think we should never have had +: and :+ on Seq without any complexity guarantees. But it would probably be possible to guarantee amortized constant-time complexity for these operations, by using wrappers when necessary (typically, List would produce a wrapper for :+ but not for +:). Then, these operations on Seq could actually become useful.
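
To make the asymmetry concrete, a small sketch (the point is the complexity class, not exact timings):

```scala
// On List, prepend (+:) is O(1) but append (:+) rebuilds the whole list,
// so a loop of appends is quadratic overall. The wrapper idea above would
// keep both amortised O(1) behind the Seq interface.
val n = 10000

var pre: List[Int] = Nil
for (i <- 0 until n) pre = i +: pre   // n * O(1)

var app: List[Int] = Nil
for (i <- 0 until n) app = app :+ i   // n * O(n): quadratic overall
```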

#16

The great majority of default Seq creation will be with zero or a small number of elements. The implementation is a non-issue. Lists are nice for tail recursion. They are as good as anything. The fixed-number-of-elements variants are an option.

If the designer or developer knows what is important, they should be deliberately selecting the implementation. They can use SeqT.toX to achieve this without using a class name.

#17

There was a study once on Java libraries that concluded that the average size of collections was 4. In that light, I am not concerned about operations that have bad Big-O complexity on some types.

3 Likes
#18

That’s a strange argument. If an application has 1000 three-element collections, and one 1000-element collection, the average collection size is 4, but having the wrong complexity for the big collection will likely ruin the application’s overall performance.
The average collection size is not what matters complexity-wise. If it were, why isn’t the default mutable.Map implemented as mutable.ListMap? It’s probably faster than mutable.HashMap for sizes around 4.

@tyohDeveloper I agree that List has many appropriate use cases; I’m just not sure that being the default Seq is one of them.

#19

I’m not convinced that average collection size is the right measure. I can imagine an app that has one collection of a million elements and two million collections of three elements each, and spends most of its time accessing the large collection.

#20

Many collections are small, so it’s justified to have implementations that work well only for small collections. You don’t need to throw the baby out with the bathwater by banning, say, :+ on lists. It could be quite appropriate to use it as long as the list is expected to be short.

2 Likes
#21

This is true, but even with small-ish collections, inefficient operations can have a surprisingly big impact on performance. E.g. a common mistake I see beginners make is recursing on the tail of an Array, which turns a linear traversal into a quadratic one; even for small-ish values of n that can blow up your execution time.
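
A sketch of that pitfall and one way around it (illustrative only):

```scala
// Quadratic: Array.tail copies the remaining n-1 elements on every step,
// so summing this way does roughly n^2/2 element copies.
def sumQuadratic(xs: Array[Int]): Int =
  if (xs.isEmpty) 0 else xs.head + sumQuadratic(xs.tail)

// Linear: recurse on an index instead of rebuilding the array each step.
def sumLinear(xs: Array[Int], i: Int = 0, acc: Int = 0): Int =
  if (i >= xs.length) acc else sumLinear(xs, i + 1, acc + xs(i))
```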

3 Likes
#22

The designer/engineer and programmer need to understand their domain and use the best programming types for the problem**. There are issues with this,*** but providers can use it to split the difference if the consumer of the library isn’t principled. Declare the parameter as an Iterator rather than a Seq (or worse, List, …); see the sketch at the end of this post. The advantage is that the consumer can create an iterator if they know the collection is big, or just pass in the collection if it isn’t. It still requires some design introspection, but that can happen later in the implementation. As a Seq is not guaranteed to end, the consumer needs to understand that detail to ensure the call/dispatch will return. Knowing the big-O and size of the collection is just a finer distinction.

** The only collection types are Seq, Map, and Set (or the multi-x versions of Map and Set). Other than that, it’s all implementation. We wouldn’t be having this conversation if this weren’t true.

*** Creating an iterator (stateful) isn’t non-FP any more than for-comprehensions (over sequences) are. The program either uses implicit state (the stack; tail for tail recursion) or explicit state (an iterator). In both cases, the state must be local exclusively to the library and allow no side effects. A single next within the loop is referentially transparent.

The types Set and Map are not ordered by definition (in mathematics). The contract of the library is “order is not relevant”. The contract of any of these types is “halting is not assumed”.
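
A minimal sketch of the Iterator-parameter idea from earlier in this post (names are illustrative):

```scala
// Accepting an Iterator keeps the method agnostic about the caller's collection.
def firstMatching(xs: Iterator[Int], p: Int => Boolean): Option[Int] =
  xs.find(p)

// Small collection: just pass its iterator in.
firstMatching(List(1, 2, 3).iterator, _ > 1)

// Large or lazily produced data: hand over an iterator without materialising a Seq.
firstMatching(Iterator.from(0), _ % 1013 == 0)
```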

#23

I would add that the performance of List instantiation got worse in Scala 2.13.0: