Which operations should be included in the new collections?

Hi all,

We are considering adding new operations to the standard collections in 2.13. But, obviously, we don’t want to bloat the collections with niche operations. We have to find which operations are important and useful enough to the community.

To this end, I’d like to gather your opinions on the inclusion of a dozen new operations, which are described in the following gist: https://gist.github.com/julienrf/79cc1937530a3b9e26de930794a16042

Please give me your opinion by filling in the following poll: https://goo.gl/forms/TWcrH81MoRppvXwN2

Note: you should consider the meaning of each operation rather than details like its name, but feel free to comment on such details in the corresponding GitHub issues (linked from the gist).
Note: if you think of some important operation that should also be included, please open an issue in the GitHub repository.

7 Likes

Please consider the operations defined on scalaz.Foldable; I find selectSplit in particular to be in demand.
Otherwise, ‘lazy folds’ are certainly welcome.

2 Likes

I think Christopher Vogt is spot on with his extensions:

  • groupWith
  • distinctWith
  • containsDuplicatesBy
  • containsDuplicatesWith
  • foldWhile
  • reduceWhile

5 Likes

A related meta-ticket with some candidates: https://github.com/scala/bug/issues/8958

Also, I’m a distinctBy fan: https://github.com/scala/collection-strawman/issues/175#issuecomment-318902270, https://github.com/scala/scala/pull/3850
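For readers unfamiliar with the idea, here is a minimal sketch of what a distinctBy could look like. The object name, the signature, and the "first occurrence wins" behaviour are assumptions made for illustration; they are not taken from the linked issue or pull request.

```scala
object DistinctBySketch {
  // Hypothetical helper: keeps the first element seen for each key,
  // preserving the original order of the input.
  def distinctBy[A, K](xs: Seq[A])(key: A => K): Seq[A] = {
    val seen = scala.collection.mutable.HashSet.empty[K]
    // HashSet.add returns true only when the key was not present before.
    xs.iterator.filter(x => seen.add(key(x))).toVector
  }
}
```

For example, `DistinctBySketch.distinctBy(Seq("a", "bb", "c", "dd"))(_.length)` keeps one string per length.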

4 Likes

I find myself wanting a function on a Seq[(K, A)] to Map[K, Set[A]] or
Map[K, Seq[A]]. Note that this is like groupBy, but doesn’t retain the key
in the value collection.
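With the current collections this can be written with groupBy followed by a map over the groups. The sketch below is only an illustration; the names groupValues and groupValuesToSet are made up here.

```scala
object GroupValuesSketch {
  // Hypothetical helper: groups by the first component and keeps only
  // the second component in each group (duplicates are retained).
  def groupValues[K, A](pairs: Seq[(K, A)]): Map[K, Seq[A]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Same idea, but each group is collapsed into a Set.
  def groupValuesToSet[K, A](pairs: Seq[(K, A)]): Map[K, Set[A]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }
}
```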

4 Likes

It looks like we already have something close to Map[K, Set[A]]: it’s MultiMap, but it’s mutable. Bummer.

Multimaps can also be ordered or unordered (the Scala mutable MultiMap is unordered).

2 Likes

Besides the already mentioned methods I often find myself missing takeUntil (when I want one element less than I can achieve with takeWhile) and intercalate.

2 Likes

distinctBy would be great

4 Likes

I frequently find myself needing Seq[Option[A]] => Seq[A], getting rid of all the Nones. Sure, you can use flatten, but that’s terrible for readability. In JS land’s Lodash it’s called compact.

Seq(Some(1), None, Some(5)).compact == Seq(1, 5)

Also, Seq[T] => (T => K) => Map[K, T] which is called keyBy in Lodash.

Seq(1, 2, 3).keyBy(_.toString) == Map(
  "1" -> 1,
  "2" -> 2,
  "3" -> 3
)

I also have quite a few uses of Seq[(K, A)] => Map[K, Seq[A]] (note that we do not get rid of duplicate As in the same group, like @shawjef3’s usage).

This operation is supported in the current collections through the flatten method:

assert(Seq(Some(1), None, Some(5)).flatten == Seq(1,5))

@jatcwang - compact is called flatten. We already have the functionality, and many people are used to the existing name; this isn’t going to change. I am not sure why you think flatten is terrible for readability (since it does take a nested and thus “non-flat” representation and makes it flat, whereas compact could also mean “don’t retain any buffer for further modifications”), but anyway, it’s established and thus will stay that way.

keyBy is a possibly dangerous simplification of groupBy because collisions in key are resolved in an arbitrary way. The proposed groupMapReduce provides this functionality in a principled way (i.e. makes you specify what you want to do in case of collisions).
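To make the collision point concrete, here is a small sketch. Both functions are hand-rolled stand-ins written for this illustration: keyBy is a hypothetical helper built on toMap (which, for a Seq, keeps the last element per key), and groupMapReduce is only an approximation of the proposed operation, whose final signature may differ.

```scala
object KeyBySketch {
  // Hypothetical keyBy: on a key collision, toMap keeps the last element
  // and silently drops the earlier ones.
  def keyBy[A, K](xs: Seq[A])(key: A => K): Map[K, A] =
    xs.map(x => key(x) -> x).toMap

  // Hand-rolled stand-in for the proposed groupMapReduce: the caller must
  // say how values that share a key are combined.
  def groupMapReduce[A, K, B](xs: Seq[A])(key: A => K)(f: A => B)(reduce: (B, B) => B): Map[K, B] =
    xs.foldLeft(Map.empty[K, B]) { (acc, x) =>
      val k = key(x)
      val b = f(x)
      // First value for this key is stored as-is; later ones are reduced in.
      acc.updated(k, acc.get(k).fold(b)(prev => reduce(prev, b)))
    }
}
```

With the input `Seq("apple", "avocado", "banana")` keyed by first letter, keyBy loses "apple", while groupMapReduce with `_.length` and `_ + _` accounts for every element.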

@ShaneDelmore - I’m not sure why you would want one less than takeWhile. The natural meaning of takeUntil (in analogy with 0 until 5) is that takeUntil(p) takes everything before but not including that element where p returns true; and this is exactly what you get from takeWhile(!p). It is takeTo, where the last element is included, that is difficult to achieve with the current methods.
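A minimal sketch of the takeTo semantics described above, assuming a standalone helper (the object and parameter names are invented for the example). It shows why the operation is awkward today: it needs span plus one extra element.

```scala
object TakeToSketch {
  // Hypothetical takeTo: everything before the first element satisfying p,
  // plus that element itself. If no element satisfies p, the whole
  // sequence is returned.
  def takeTo[A](xs: Seq[A])(p: A => Boolean): Seq[A] = {
    val (before, rest) = xs.span(x => !p(x))
    before ++ rest.take(1)
  }
}
```

By contrast, the "until" reading needs nothing new: `xs.takeWhile(x => !p(x))` already stops just before the matching element.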

I think takeTo and friends would be helpful.

3 Likes

I think we should take inspiration from what Spark added, because that’s both clearly successful and very well known (the same API is also in all other Spark frontends, which means it’s the de facto standard for data science now).

Examples are reduceByKey and operations that treat a list of pairs as a map.
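As a rough illustration of what a reduceByKey on an ordinary sequence of pairs might look like, here is a sketch built from existing operations. It is not Spark's implementation, and the signature is an assumption.

```scala
object ReduceByKeySketch {
  // Hypothetical reduceByKey for a local Seq of pairs: values sharing a key
  // are combined with op. Every group produced by groupBy is non-empty,
  // so calling reduce on it is safe.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(op: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(op) }
}
```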

Overall, however, we have to be very careful here. It’s easy to make a case for each individual addition. But if we add them all we will end up with a mess, and it’s easy to make a case against that, too 🙂. The problem is how to balance one against the other.

One possible solution that we should explore further is to put new (and also some old?) operations in decorators as opposed to the core classes. That should be doable for any method that does not need an override in a subclass, and I would hope that’s the majority. That way we could experiment with different “bundles” that cover different functionalities. We’d also make the core library lighter, which is very important in some cases.
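One way to picture the decorator approach is an implicit class that lives outside the core collection classes and is only visible after an import. The sketch below is an assumption about how such a bundle could be packaged; the object name and the guessed semantics of containsDuplicatesBy (true when two elements map to the same key) are illustrative only.

```scala
object DecoratorSketch {
  // Hypothetical decorator: adds an operation to every Iterable without
  // modifying Iterable itself. Extending AnyVal avoids allocating a wrapper
  // in most cases.
  implicit class IterableDecorator[A](private val xs: Iterable[A]) extends AnyVal {
    // Assumed semantics: true if at least two elements share a key.
    def containsDuplicatesBy[K](key: A => K): Boolean = {
      val seen = scala.collection.mutable.HashSet.empty[K]
      // add returns false when the key is already present; exists stops
      // at the first such duplicate.
      xs.exists(x => !seen.add(key(x)))
    }
  }
}
```

After `import DecoratorSketch._`, a call such as `List("a", "bb", "c").containsDuplicatesBy(_.length)` resolves through the implicit class, and code that does not import the bundle is unaffected.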

5 Likes

How about unfold() for Iterable/Seq? I’ve always found a need for it, and I think it would be a great addition to either.

3 Likes

I’m not familiar with unfold - what does it do?

@NthPortal unfold takes a value and a generator to create a collection. https://github.com/tpolecat/examples/blob/master/src/main/scala/eg/Unfold.scala is the first example I found.

Ah, thanks. I was thinking it was on the instance (rather than the factory), which had me confused.

Is it sufficiently different from iterate to be included though?
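To help compare the two, here is a hand-written sketch of an unfold on the factory side. The signature is an assumption based on the description above, not the one from the linked example. The main differences from iterate are that the step function decides when to stop, and that the emitted element type can differ from the state type.

```scala
object UnfoldSketch {
  // Hypothetical unfold: starting from init, step either returns the next
  // element together with the next state, or None to stop.
  def unfold[S, A](init: S)(step: S => Option[(A, S)]): List[A] = {
    @annotation.tailrec
    def loop(state: S, acc: List[A]): List[A] = step(state) match {
      case Some((a, next)) => loop(next, a :: acc)
      case None            => acc.reverse
    }
    loop(init, Nil)
  }
}
```

For instance, the decimal digits of a number (least significant first) fall out of `UnfoldSketch.unfold(1234)(n => if (n == 0) None else Some((n % 10, n / 10)))`, whereas with iterate the stopping condition would have to be supplied separately, for example via takeWhile.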

1 Like

I like @odersky’s suggestion of, at least for now, putting many of the suggestions in decorators in a separate library. That way they have room to evolve a bit without needing standard-library-level stability.

Another possible addition to think about: https://github.com/lihaoyi/geny/blob/master/geny/shared/src/main/scala/geny/Generator.scala (that’s all the src/main/scala code in geny)

Readme: https://github.com/lihaoyi/geny/blob/master/readme.md

.intersperse on Seq (or whatever you want to call it). Take one from this seq, then one from that seq, then one from this seq…
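A sketch of the alternating behaviour described above. The name interleave and the decision to append whatever is left of the longer sequence are assumptions; the post leaves both open.

```scala
object InterleaveSketch {
  // Hypothetical interleave: one element from xs, then one from ys, and so on.
  // Assumption: leftovers from the longer sequence are appended at the end.
  def interleave[A](xs: Seq[A], ys: Seq[A]): Seq[A] = {
    val paired = xs.zip(ys).flatMap { case (x, y) => Seq(x, y) }
    val n = math.min(xs.length, ys.length)
    paired ++ xs.drop(n) ++ ys.drop(n)
  }
}
```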

1 Like