Experiments with Arrays

smarter · July 13, 2019, 1:43pm

This is true, but even with small-ish collections, inefficient operations can have a surprisingly big impact on performance, e.g. a common mistake I see beginners make is recursing on the tail of an Array, which means going from linear to quadratic time, which even for small-ish values of n can blow up your execution time.

tyohDeveloper · July 13, 2019, 5:17pm

The designer/engineer and programmer need to understand their domain and use the best programming types for the problem**. There are issues with this,*** but providers can use it to split the difference if they consumer of the library isn’t principled. Declare the parameter as an Iterator rather than a Seq (or worse, List, …). The advantage is the consumer can create an iterator if they know the collection is big or just pass in the collection if it isn’t. It still requires some design introspection, but that can happen latter in the implementation. As Seq are not guaranteed to end, the consumer needs to understand that detail to assure the call/dispatch will return. Knowing the O and size of the collection is just a finer distinction.

** The only collection types are Seq, Map, Set, (or the multi-x versions of map & set). Other than that, its all implementation. We wouldn’t be having this conversation if this wasn’t true.

*** Creating an iterator (stateful) isn’t non-fp any more than are for-comprehensions (for sequences). The program either uses implicit state (the stack, tail for tail recursion) or explicit state (iterator). In both cases, the state must be local exclusively to the library and allow no side effect. A single next within the loop is referentially transparent.

The types Set and Map are not ordered by definition (in mathematics). The contract of the library is “order is not relevant. The contract of any of the types is “halting is not is not assumed.”

plokhotnyuk · July 17, 2019, 7:34pm

I would add that performance of List instantiation is getting worse in Scala 2.13.0:

tyohDeveloper · July 30, 2019, 7:10pm

I don’t have any real idea why. The language people had to pick one, and they picked List. For the typical array in non-numeric codes, it doesn’t really matter. All small collections are close enough. If I am writing numeric codes, I know what I need to optimize and won’t use the generic constructors. For things like the integer tensors should be using tulles, bitsets, or something like that.

It would take more work, but a similar fixed structure could optimize most codes based on the # of elements in the constructor (e.g. Tuple[2] for Seq). But the advantages wouldn’t be much for most codes anyway.

A general principle is premature optimization is the root of all evil. Unless you absolutely know there is a problem, try not to care. If you know there is a structure that will make a huge difference, choose that.