Which operations should be included in the new collections?

julienrf · September 13, 2017, 5:04pm

Hi all,

We are considering adding new operations to the standard collections in 2.13. But, obviously, we don’t want to bloat the collections with niche operations. We have to find which operations are important and useful enough to the community.

To this end, I’d like to gather your opinion about the inclusion of a dozen new operations that are described in the following gist: https://gist.github.com/julienrf/79cc1937530a3b9e26de930794a16042

Please give me your opinion by filling the following poll: https://goo.gl/forms/TWcrH81MoRppvXwN2

Note: you should consider the meaning of the operation rather than details like its name, but feel free to comment about such details on the corresponding github issues (linked from the gist).
Note: if you think of some important operation that should also be included, please open an issue in the github repository.

OlegYch · September 13, 2017, 5:20pm

Please consider operations defined on scalaz.Foldable; I find selectSplit in particular to be in demand.
Otherwise, ‘lazy folds’ are certainly welcome.
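For readers who haven't used it: as I understand scalaz's `selectSplit`, it keeps the maximal runs of consecutive elements that satisfy a predicate and discards everything else. Below is a hypothetical standalone version on `Seq`, not scalaz's code; scalaz returns non-empty lists, while this sketch uses plain `List`s to stay dependency-free.

```scala
import scala.annotation.tailrec

object SelectSplitSketch {
  // Hypothetical standalone selectSplit: collect maximal runs of
  // consecutive elements satisfying `p`, discarding the rest.
  def selectSplit[A](xs: Seq[A])(p: A => Boolean): List[List[A]] = {
    @tailrec
    def loop(rest: Seq[A], acc: List[List[A]]): List[List[A]] = {
      val start = rest.dropWhile(a => !p(a)) // skip elements that fail `p`
      if (start.isEmpty) acc.reverse
      else {
        val (group, remaining) = start.span(p) // take the next run
        loop(remaining, group.toList :: acc)
      }
    }
    loop(xs, Nil)
  }
}
```

For example, `selectSplit(Seq(1, 2, -1, 3, -2, -3, 4))(_ > 0)` yields `List(List(1, 2), List(3), List(4))`.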

MasseGuillaume · September 13, 2017, 5:23pm

I think Christopher Vogt is spot on with his extensions:

github.com/cvogt/scala-extensions/blob/master/src/main/scala/collection.scala#L1

package org.cvogt.scala.collection
import scala.collection._
import scala.collection.generic.CanBuildFrom
import scala.annotation.tailrec
import scala.collection.mutable.Builder

object `package` {
  implicit class SeqLikeExtensions[A, Repr](val coll: SeqLike[A, Repr]) extends AnyVal {
    /** type-safe contains check */
    def containsTyped(t: A) = coll.contains(t)
    // … (rest of the file omitted)
  }
}

groupWith
distinctWith
containsDuplicatesBy
containsDuplicatesWith
foldWhile
reduceWhile
…

SethTisue · September 13, 2017, 6:32pm

A related meta-ticket with some candidates: https://github.com/scala/bug/issues/8958

Also, I’m a distinctBy fan: https://github.com/scala/collection-strawman/issues/175#issuecomment-318902270, https://github.com/scala/scala/pull/3850

shawjef3 · September 13, 2017, 8:49pm

I find myself wanting a function from Seq[(K, A)] to Map[K, Set[A]] or
Map[K, Seq[A]]. Note that this is like groupBy, but it doesn’t retain the key
in the value collection.
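A minimal sketch of both variants using only existing operations (`groupBy` followed by dropping the key from each pair); the helper names are hypothetical. The `Seq` variant keeps duplicate values in traversal order, the `Set` variant collapses them.

```scala
object GroupValuesSketch {
  // Group pairs by key, keeping only the values (the key is not repeated).
  def groupValues[K, A](pairs: Seq[(K, A)]): Map[K, Seq[A]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Set variant: duplicate values within a group collapse.
  def groupValueSets[K, A](pairs: Seq[(K, A)]): Map[K, Set[A]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).toSet }
}
```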

MasseGuillaume · September 13, 2017, 9:21pm

The closest thing we have to Map[K, Set[A]] is MultiMap, and it’s mutable, bummer.

Multimaps can also be either ordered or unordered (Scala’s mutable MultiMap is unordered).
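For comparison, this is roughly how the existing mutable `MultiMap` trait is used (API as of Scala 2.12; a minimal sketch with a hypothetical helper name). Each key maps to a `mutable.Set`, so duplicate values collapse.

```scala
import scala.collection.mutable

object MultiMapSketch {
  // Build a mutable multimap: each key maps to a mutable.Set of values.
  def toMultiMap[K, A](pairs: Seq[(K, A)]): mutable.MultiMap[K, A] = {
    val mm = new mutable.HashMap[K, mutable.Set[A]] with mutable.MultiMap[K, A]
    pairs.foreach { case (k, a) => mm.addBinding(k, a) } // adds `a` to the key's set
    mm
  }
}
```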

ShaneDelmore · September 14, 2017, 2:11am

Besides the methods already mentioned, I often find myself missing takeUntil (for when I want one element less than I can achieve with takeWhile) and intercalate.
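`intercalate` is presumably meant in the Haskell sense (`Data.List.intercalate`): place a separator sequence between neighbouring sequences and concatenate everything. A hypothetical standalone sketch:

```scala
object IntercalateSketch {
  // Hypothetical intercalate: join `xss`, inserting `sep` between neighbours.
  def intercalate[A](sep: Seq[A], xss: Seq[Seq[A]]): Seq[A] =
    xss.reduceOption((left, right) => left ++ sep ++ right).getOrElse(Seq.empty[A])
}
```

For example, `intercalate(Seq(0), Seq(Seq(1, 2), Seq(3), Seq(4, 5)))` gives `Seq(1, 2, 0, 3, 0, 4, 5)`.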

1gnition · September 14, 2017, 8:38am

distinctBy would be great

jatcwang · September 14, 2017, 11:26am

I frequently find myself needing Seq[Option[A]] => Seq[A], getting rid of all the Nones. Sure, you can use flatten, but that’s terrible for readability. In JS land, Lodash calls it compact.

Seq(Some(1), None, Some(5)).compact == Seq(1, 5)

Also, Seq[T] => (T => K) => Map[K, T], which Lodash calls keyBy.

Seq(1, 2, 3).keyBy(_.toString) == Map(
  "1" -> 1,
  "2" -> 2,
  "3" -> 3
)
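A minimal sketch of `keyBy` (hypothetical helper, name borrowed from Lodash) built from existing operations. Note the collision behaviour: `toMap` keeps the last pair for a given key, so earlier elements with the same key are silently dropped.

```scala
object KeyBySketch {
  // Hypothetical keyBy: index elements by a derived key.
  // On collisions, toMap keeps the last element seen for that key.
  def keyBy[T, K](xs: Seq[T])(f: T => K): Map[K, T] =
    xs.map(x => f(x) -> x).toMap
}
```

With `keyBy(Seq(1, 2, 3))(_ % 2)` the result is `Map(1 -> 3, 0 -> 2)`: the element `1` is lost.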

I also have quite a few uses of Seq[(K, A)] => Map[K, Seq[A]] (note that we do not remove duplicate As within a group, as @shawjef3’s Set variant would).

joshlemer · September 14, 2017, 2:50pm

This operation is already supported in the current collections through the flatten method:

assert(Seq(Some(1), None, Some(5)).flatten == Seq(1,5))

Ichoran · September 14, 2017, 3:53pm

@jatcwang - compact is called flatten. We already have the functionality, and many people are used to the existing name; this isn’t going to change. I am not sure why you think flatten is terrible for readability (since it does take a nested and thus “non-flat” representation and makes it flat, whereas compact could also mean “don’t retain any buffer for further modifications”), but anyway, it’s established and thus will stay that way.

keyBy is a possibly dangerous simplification of groupBy because key collisions are resolved in an arbitrary way. The proposed groupMapReduce provides this functionality in a principled way (i.e. it makes you specify what you want to do in case of collisions).
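To make the contrast concrete, here is a hypothetical standalone function in the spirit of the proposed groupMapReduce (the signature is an assumption, not the proposal's). The caller must say how colliding values combine, so "keep the first" or "keep the last" becomes an explicit choice.

```scala
object GroupMapReduceSketch {
  // Hypothetical helper: group by `key`, map each element with `f`,
  // and combine values sharing a key with the explicit `reduce`.
  def groupMapReduce[A, K, B](xs: Iterable[A])(key: A => K)(f: A => B)(reduce: (B, B) => B): Map[K, B] =
    xs.foldLeft(Map.empty[K, B]) { (acc, a) =>
      val k = key(a)
      val b = f(a)
      acc.updated(k, acc.get(k).fold(b)(prev => reduce(prev, b)))
    }
}
```

Word counting is `groupMapReduce(words)(w => w)(_ => 1)(_ + _)`; a collision-safe keyBy that keeps the first element is `groupMapReduce(xs)(key)(x => x)((first, _) => first)`.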

Ichoran · September 14, 2017, 3:55pm

@ShaneDelmore - I’m not sure why you would want one less than takeWhile. The natural meaning of takeUntil (by analogy with 0 until 5) is that takeUntil(p) takes everything before, but not including, the first element where p returns true; and this is exactly what you get from takeWhile(!p). It is takeTo, where that last element is included, that is difficult to achieve with the current methods.

I think takeTo and friends would be helpful.
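A minimal sketch of `takeTo` as described (hypothetical standalone helper): take everything before the first element satisfying `p`, plus that element itself, using `span`.

```scala
object TakeToSketch {
  // Hypothetical takeTo: like takeWhile(a => !p(a)), but also keeps
  // the first element satisfying `p`, if there is one.
  def takeTo[A](xs: Seq[A])(p: A => Boolean): Seq[A] = {
    val (before, rest) = xs.span(a => !p(a))
    before ++ rest.take(1)
  }
}
```

For `Seq(1, 2, 3, 4, 5)` and `_ >= 3`, `takeWhile(x => !(x >= 3))` gives `Seq(1, 2)` while `takeTo` gives `Seq(1, 2, 3)`.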

odersky · September 14, 2017, 4:21pm

I think we should take inspiration from what Spark added, because that’s both clearly successful and very well known (the same API is also in all other Spark frontends, which means it’s the de facto standard for data science now).

Examples are reduceByKey and operations that treat a list of pairs as a map.
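Spark's `reduceByKey` operates on a distributed collection of pairs; a hypothetical local analogue for a plain `Seq[(K, V)]` could look like this (a sketch, not Spark's implementation).

```scala
object ReduceByKeySketch {
  // Hypothetical local reduceByKey: combine all values sharing a key with `op`.
  // Every group produced by groupBy is non-empty, so `reduce` is safe here.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(op: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(op) }
}
```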

Overall, however, we have to be very careful here. It’s easy to make a case for each individual addition. But if we add them all we will end up with a mess, and it’s easy to make a case against that, too. The problem is how to balance one against the other.

One possible solution that we should explore further is to put new (and also some old?) operations in decorators as opposed to the core classes. That should be doable for any method that does not need an override in a subclass, and I would hope that’s the majority. That way we could experiment with different “bundles” that cover different functionalities. We’d also make the core library lighter, which is very important in some cases.
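A rough sketch of what the decorator approach could look like in Scala 2, assuming an implicit value class that lives outside the core collection types. The object name `CollectionDecorators` and the choice of `distinctBy` (requested elsewhere in this thread) are illustrative only. Users opt in with an import, and `Seq` itself stays unchanged.

```scala
import scala.collection.mutable

object CollectionDecorators {
  // Hypothetical decorator: adds distinctBy to any Seq once imported.
  implicit class SeqDecorator[A](private val xs: Seq[A]) extends AnyVal {
    // Keep the first element for each key; `seen.add` returns false for repeats.
    // Assumes a strict Seq such as List or Vector (filter runs eagerly).
    def distinctBy[K](f: A => K): Seq[A] = {
      val seen = mutable.HashSet.empty[K]
      xs.filter(a => seen.add(f(a)))
    }
  }
}
```

With `import CollectionDecorators._` in scope, `Seq(1, 2, 3, 4).distinctBy(_ % 2)` gives `Seq(1, 2)`.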

weihsiu · September 14, 2017, 4:38pm

How about unfold() for Iterable/Seq? I’ve always found a need for it, and I think it would be a great addition to either.

NthPortal · September 14, 2017, 4:58pm

I’m not familiar with unfold - what does it do?

shawjef3 · September 14, 2017, 5:19pm

@NthPortal unfold takes a value and a generator to create a collection. https://github.com/tpolecat/examples/blob/master/src/main/scala/eg/Unfold.scala is the first example I found.
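A hypothetical standalone `unfold` in the sense described: start from a seed state; the generator either emits an element plus the next state (`Some`) or stops (`None`). The signature is chosen for illustration.

```scala
import scala.annotation.tailrec

object UnfoldSketch {
  // Hypothetical unfold: repeatedly apply `f` to the current state,
  // collecting emitted elements until `f` returns None.
  def unfold[S, A](init: S)(f: S => Option[(A, S)]): List[A] = {
    @tailrec
    def loop(state: S, acc: List[A]): List[A] = f(state) match {
      case Some((a, next)) => loop(next, a :: acc)
      case None            => acc.reverse
    }
    loop(init, Nil)
  }
}
```

Unlike `List.iterate(start, len)(f)`, which needs the length up front, this version stops whenever the generator returns `None`; e.g. `unfold(1)(n => if (n > 5) None else Some((n * n, n + 1)))` yields the squares 1 to 25.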

NthPortal · September 14, 2017, 6:31pm

Ah, thanks. I was thinking it was on the instance (rather than the factory), which had me confused.

Is it sufficiently different from iterate to be included though?

nafg · September 14, 2017, 6:42pm

I like @odersky’s suggestion of, at least for now, putting many of the suggestions in decorators in a separate library. That way they have room to evolve a bit without needing standard-library-level stability.

nafg · September 14, 2017, 6:45pm

Another possible addition to think about: https://github.com/lihaoyi/geny/blob/master/geny/shared/src/main/scala/geny/Generator.scala (that’s all the src/main/scala code in geny)

Readme: https://github.com/lihaoyi/geny/blob/master/readme.md

Holothuroid · September 14, 2017, 6:47pm

.intersperse on Seq (or whatever you want to call it). Take one from this seq, then one from that seq, then one from this seq…
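The behaviour described (alternating elements from two sequences) is often called interleaving. A hypothetical sketch follows; appending the longer sequence's leftover elements at the end is one design choice among several (truncating to the shorter one is another).

```scala
object InterleaveSketch {
  // Hypothetical interleave: alternate elements of xs and ys,
  // then append whatever remains of the longer sequence.
  def interleave[A](xs: Seq[A], ys: Seq[A]): Seq[A] = {
    val alternated = xs.zip(ys).flatMap { case (x, y) => Seq(x, y) }
    alternated ++ xs.drop(ys.length) ++ ys.drop(xs.length)
  }
}
```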