We are considering adding new operations to the standard collections in 2.13. But, obviously, we don’t want to bloat the collections with niche operations. We need to identify which operations are important and useful enough to the community to justify inclusion.
Note: you should consider the meaning of the operation rather than details like its name, but feel free to comment about such details on the corresponding github issues (linked from the gist).
Note: if you think of some important operation that should also be included, please open an issue in the github repository.
I find myself wanting a function from a Seq[(K, A)] to a Map[K, Set[A]] or
Map[K, Seq[A]]. Note that this is like groupBy, but doesn’t retain the key
in the value collections.
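Assuming the proposed groupMap lands with the signature discussed in the linked issues, a sketch of that operation (the example values here are mine):

```scala
// groupBy keeps the whole pair in the value collections; groupMap lets you
// project the key away while grouping.
val pairs = Seq(("a", 1), ("a", 2), ("b", 3))

// Seq[(K, A)] => Map[K, Seq[A]]
val bySeq: Map[String, Seq[Int]] = pairs.groupMap(_._1)(_._2)
// Map("a" -> Seq(1, 2), "b" -> Seq(3))

// Seq[(K, A)] => Map[K, Set[A]], by converting the grouped values afterwards
val bySet: Map[String, Set[Int]] =
  pairs.groupMap(_._1)(_._2).map { case (k, vs) => k -> vs.toSet }
```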
Besides the already mentioned methods, I often find myself missing takeUntil (when I want one element less than takeWhile gives me) and intercalate.
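Since intercalate isn’t in the standard library, here is a minimal sketch of the intended semantics (the helper name and its placement as a free function are hypothetical):

```scala
// Hypothetical intercalate: insert a separator between every two adjacent
// elements, without adding one at either end.
def intercalate[A](xs: Seq[A], sep: A): Seq[A] =
  if (xs.isEmpty) xs
  else xs.flatMap(x => Seq(sep, x)).tail  // drop the leading separator

intercalate(Seq(1, 2, 3), 0)  // Seq(1, 0, 2, 0, 3)
```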
I frequently find myself needing Seq[Option[A]] => Seq[A], getting rid of all the Nones. Sure, you can use flatten, but that’s terrible for readability. In JS land’s Lodash it’s called compact.
Seq(Some(1), None, Some(5)).compact == Seq(1, 5)
Also, Seq[T] => (T => K) => Map[K, T], which is called keyBy in Lodash.
@jatcwang - compact is called flatten. We already have the functionality, many people are used to the existing name, and this isn’t going to change. I am not sure why you think flatten is terrible for readability: it takes a nested, and thus “non-flat”, representation and makes it flat, whereas compact could also mean “don’t retain any buffer for further modifications”. In any case, the name is established and will stay that way.
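For the record, the existing method already does exactly what the example above asks for:

```scala
// flatten on a Seq of Options drops every None and unwraps each Some,
// which is the requested "compact" behavior.
Seq(Some(1), None, Some(5)).flatten  // Seq(1, 5)
```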
keyBy is a possibly dangerous simplification of groupBy because collisions in key are resolved in an arbitrary way. The proposed groupMapReduce provides this functionality in a principled way (i.e. makes you specify what you want to do in case of collisions).
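A sketch of the difference, assuming groupMapReduce lands with the signature from the proposal (the example data is mine):

```scala
val words = Seq("apple", "avocado", "banana")

// keyBy-style: build the Map via toMap; on a key collision the later element
// silently wins -- the arbitrary resolution mentioned above.
val keyed: Map[Char, String] = words.map(w => w.head -> w).toMap
// Map('a' -> "avocado", 'b' -> "banana")

// groupMapReduce forces you to state the collision policy; here: keep the first.
val principled: Map[Char, String] =
  words.groupMapReduce(_.head)(w => w)((first, _) => first)
// Map('a' -> "apple", 'b' -> "banana")
```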
@ShaneDelmore - I’m not sure why you would want one element less than takeWhile gives you. The natural meaning of takeUntil (in analogy with 0 until 5) is that takeUntil(p) takes everything before, but not including, the first element where p returns true; and this is exactly what you get from takeWhile(x => !p(x)). It is takeTo, where that element is included, that is difficult to achieve with the current methods.
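Concretely (takeTo is a hypothetical helper here, sketched via span):

```scala
val xs = Seq(1, 2, 3, 4, 5)
val p: Int => Boolean = _ >= 3

// "takeUntil": everything strictly before the first element satisfying p.
val until = xs.takeWhile(x => !p(x))  // Seq(1, 2)

// Hypothetical takeTo: the same, but including that element.
def takeTo[A](xs: Seq[A])(p: A => Boolean): Seq[A] = {
  val (before, rest) = xs.span(x => !p(x))
  before ++ rest.take(1)
}
takeTo(xs)(p)  // Seq(1, 2, 3)
```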
I think we should take inspiration from what Spark added, because that’s both clearly successful and very well known (the same API is also in all other Spark frontends, which means it’s the de facto standard for data science now).
Examples are reduceByKey and operations that treat a list of pairs as a map.
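For instance, Spark’s reduceByKey could be expressed on plain collections with the proposed groupMapReduce (a sketch, assuming that method’s proposed signature; the word-count data is mine):

```scala
// Word count, Spark-style, on an in-memory Seq of pairs:
// group by key, project to the counts, and reduce collisions by summing.
val counts: Map[String, Int] =
  Seq(("a", 1), ("b", 1), ("a", 1)).groupMapReduce(_._1)(_._2)(_ + _)
// Map("a" -> 2, "b" -> 1)
```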
Overall, however, we have to be very careful here. It’s easy to make a case for each individual addition. But if we add them all we will end up with a mess, and it’s easy to make a case against that, too. The problem is how to balance one against the other.
One possible solution that we should explore further is to put new (and also some old?) operations in decorators as opposed to the core classes. That should be doable for any method that does not need an override in a subclass, and I would hope that’s the majority. That way we could experiment with different “bundles” that cover different functionalities. We’d also make the core library lighter, which is very important in some cases.
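A minimal sketch of the decorator approach (the object, class, and method names are hypothetical, not a concrete proposal): operations live in an implicit extension class that is imported explicitly, rather than on the core types.

```scala
object SeqDecorators {
  // An opt-in "bundle" of extra operations; importing it enriches Seq
  // without touching the core class hierarchy.
  implicit class ExtraSeqOps[A](xs: Seq[A]) {
    // Example extra operation: count how often each element occurs.
    def occurrences: Map[A, Int] =
      xs.groupBy(identity).map { case (k, vs) => k -> vs.size }
  }
}

import SeqDecorators._
Seq("a", "b", "a").occurrences  // Map("a" -> 2, "b" -> 1)
```

Methods like this need no overriding in subclasses, so they can live outside the hierarchy and be versioned independently of the core library.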
I like @odersky’s suggestion of, at least for now, putting many of the suggestions in decorators in a separate library. That way they have room to evolve a bit without needing standard-library-level stability.