Which operations should be included in the new collections?

Ichoran · September 14, 2017, 7:14pm

That would most naturally be called interleave, and can be done already by Seq(xs, ys).tranpose.flatten or (xs zip ys).flatMap{ case (x, y) => Seq(x, y) } among others.

The best methods to include are ones that are either extremely common use cases or extremely difficult to emulate using a few other higher-order methods. For instance, distinctBy really can’t be accomplished any other convenient way.

jozic · September 15, 2017, 3:13am

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

jozic · September 15, 2017, 3:23am

i also have another version of distincBy which allows to select which duplicate to preserves

github.com

jozic/scalax-collection/blob/master/src/main/scala/com/daodecode/scalax/collection/extensions/package.scala#L44


  * Example:
  * {{{
  *   scala> List(1 -> "one", 1 -> "ten", 2 -> "two", 2 -> "twenty").
  *        | distinctBy(_._1, takeFirst = _._2.length > _._2.length)
  *   res1: List[(Int, String)] = List((1,ten), (2,twenty))
  * }}}
  *
  * @return  A new $coll which contains selected occurrence of every element of this $coll.
  * @since 0.1.1
  */
def distinctBy[B](f: A => B, takeFirst: (A, A) => Boolean)(implicit cbf: CanBuildFrom[Repr, A, Repr]): Repr = {
  val seen = mutable.LinkedHashMap.empty[B, A]
  for (x <- seqLike) {
    val fx = f(x)
    seen.get(fx) match {
      case Some(a) => seen += fx -> (if (takeFirst(a, x)) a else x)
      case _ => seen += fx -> x
    }
  }
  (cbf(seqLike.repr) ++= seen.values).result()
}

it doesn’t look like rest of standard collection though

jozic · September 15, 2017, 3:55am

I also found keyBy useful, but named it toMapWithKey

toMapWithKey

scala> val m = List("1", "2", "1").toMapWithKey(_.toInt)
m: scala.collection.immutable.Map[Int,String] = Map(1 -> 1, 2 -> 2)

+1 for Seq[(K, A)] => Map[K, Seq[A]], I use it quite often

toCompleteMap

scala> val cm = List(1 -> "1", 2 -> "2", 1 -> "11").toCompleteMap
cm: scala.collection.immutable.Map[Int,List[String]] = Map(2 -> List(2), 1 -> List(1, 11))

also

mapToMap can be seen as more efficient replacement for map().toMap combination

scala> val m = List("1" -> "one", "2" -> "two").mapToMap { case (i, s) => i.toInt -> s }
m: scala.collection.immutable.Map[Int,String] = Map(1 -> one, 2 -> two)

withFrequency

scala> val fm = List("a", "b", "c", "a", "b", "d").withFrequency
fm: scala.collection.immutable.Map[String,Int] = Map(b -> 2, d -> 1, a -> 2, c -> 1)

withFrequencyBy

scala> val fm = List("ab", "bc", "cd", "ab", "bc", "de").withFrequencyBy(_.head)
fm: scala.collection.immutable.Map[Char,Int] = Map(b -> 2, d -> 1, a -> 2, c -> 1)

mdedetrich · September 15, 2017, 8:42am

A lot of grips have already been mentioned here, however my personal annoyance is not having a flatten or compact that works on a map. For example, if I have a map of optional fields (lets say that represent query string parameters to a Http call) I may have something like this

val queryParams: Map[String, Option[String]] = Map(
   "first_name" -> Option(firstName)
   "last_name" -> Option(lastName)
   "married" -> married.map(_.toString)
)

The main point is we have a Map[String, Option[String]] and we want to flatten this to a Map[String, String]. Other collections (such as having a Seq[Option[String]] support this with the flatten method, but there is no such method for a Map[String, Option[String]] so I end up having to do stuff like this

val queryParams: Map[String, Option[String]] = Map(
   "first_name" -> Option(firstName)
   "last_name" -> Option(lastName)
   "married" -> married.map(_.toString)
).collect {
  case (k, Some(v)) => (k,v)
}

All of the time. At least the above would be a small quality of life change (i.e. support for flatten on Map[K, Option[V]]

jatcwang · September 15, 2017, 9:48am

When I see flatten, I expect it to be called on Seq[Seq[A]] or Option[Option[A]]. It’s the same reason why collectionOfStuff.map(_.map(someOp)) is terrible for readibility – it’s overly generalized IMO. I do feel like compact is too overly specialized to be in the standard library so I’m happy to define it as my own extension method.

groupMapReduce sounds pretty awesome!

jducoeur · September 15, 2017, 11:13am

I actually use it that way pretty rarely – if I have a straightforward nested collection like that, I’m usually using a for comprehension on it. I almost always use flatten on a Seq[Option[A]] – that’s probably 90% of the uses I encounter…

hochgi · September 15, 2017, 2:12pm

here’s a few I already found my self in need of and implemented on our util lib:

partition3:

github.com

thomsonreuters/CM-Well/blob/master/server/cmwell-util/src/main/scala/cmwell/util/collections/package.scala#L48-L65


def partition3[T, Coll[_]]
  (xs: Coll[T])
  (f: T => Int)
  (implicit ev: Coll[T] <:< TraversableLike[T,Coll[T]], cbf: CanBuildFrom[Coll[T], T, Coll[T]]): (Coll[T],Coll[T],Coll[T]) = {


  val b0 = cbf(xs)
  val b1 = cbf(xs)
  val b2 = cbf(xs)


  for(x <- xs) f(x) match {
    case 0 => b0 += x
    case 1 => b1 += x
    case 2 => b2 += x
    case n => throw new IllegalStateException(s"f($x) returned $n which is not in range of 0..2 and was not expected")
  }


  (b0.result(), b1.result(), b2.result())
}

partitionWith:

github.com

thomsonreuters/CM-Well/blob/master/server/cmwell-util/src/main/scala/cmwell/util/collections/package.scala#L92-L108


def partitionWith[A, B, T, Coll[_]]
(xs: Coll[T])
(f: T => Either[A,B])
(implicit ev: Coll[T] <:< TraversableLike[T,Coll[T]],
 cbf1: CanBuildFrom[Coll[T], A, Coll[A]],
 cbf2: CanBuildFrom[Coll[T], B, Coll[B]]): (Coll[A],Coll[B]) = {


  val b1 = cbf1(xs)
  val b2 = cbf2(xs)


  for(x <- xs) f(x) match {
    case Left(a) => b1 += a
    case Right(b) => b2 += b
  }


  b1.result() -> b2.result()
}

distinctBy:

github.com

thomsonreuters/CM-Well/blob/master/server/cmwell-util/src/main/scala/cmwell/util/collections/package.scala#L119-L136


def distinctBy[T, R, Coll[_]]
(xs: Coll[T])
(f: T => R)
(implicit ev: Coll[T] <:< TraversableLike[T, Coll[T]],
 cbf: CanBuildFrom[Coll[T], T, Coll[T]]): Coll[T] = {


  val builder = cbf(xs)
  builder.sizeHint(xs.size)
  val seen = MSet.empty[R]


  xs.foreach { elem =>
    if (!seen(f(elem))) {
      builder += elem
      seen.add(f(elem))
    }
  }
  builder.result()
}

unfold:

github.com

thomsonreuters/CM-Well/blob/master/server/cmwell-util/src/main/scala/cmwell/util/collections/package.scala#L266-L423


/**
 * Unfolder can be used to explicitly unfold to some collection type.
 * e.g:
 *
 * {{{
 * scala> import cmwell.util.collections._
 * import cmwell.util.collections._
 *
 * scala> Unfolder.unfold(Array.newBuilder[Int])(0 → 1){
 *      |   case (a,b) if a > 1000 ⇒ None
 *      |   case (a,b) ⇒ Some((b→(a+b))→a)
 *      | }
 * res0: Array[Int] = Array(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987)
 * }}}
 *
 * note this will also work for user defined collections,
 * if you'll explicitly supply a builder for it.
 */
object Unfolder {
  def unfold[CC[_], E, S](b: mutable.Builder[E, CC[E]])(z: S)(f: S => Option[(S, E)]): CC[E] = {

This file has been truncated. show original

sequence & traverse with Option & Try

github.com

thomsonreuters/CM-Well/blob/master/server/cmwell-util/src/main/scala/cmwell/util/collections/package.scala#L425-L459


implicit class TryOps(tryObj: scala.util.Try.type) {
  def sequence[A, M[X] <: TraversableOnce[X]](in: M[Try[A]])(implicit bf: CanBuildFrom[M[Try[A]], A, M[A]]): Try[M[A]] = {
    in.foldLeft(Success(bf(in)): Try[mutable.Builder[A,M[A]]]) {
      (tr, ta) => tr.flatMap { b =>
        ta.map(b.+=)
      }
    }.map(_.result())
  }


  def traverse[A, B, M[X] <: TraversableOnce[X]](in: M[A])(f: A => Try[B])(implicit bf: CanBuildFrom[M[A], B, M[B]]): Try[M[B]] = {
    in.foldLeft(Success(bf(in)): Try[mutable.Builder[B,M[B]]]) {
      (tr, a) => tr.flatMap { b =>
        f(a).map(b.+= )
      }
    }.map(_.result())
  }
}


implicit class OptionOps(optObj: scala.Option.type) {
  def sequence[A, M[X] <: TraversableOnce[X]](in: M[Option[A]])(implicit bf: CanBuildFrom[M[Option[A]], A, M[A]]): Option[M[A]] = {

This file has been truncated. show original

also remember a disscussion on groupWhile, but can’t find it now… Maybe also(instead?) generalize it to foldLeftWhile (and equivalent foldRightWhile)

scottcarey · September 15, 2017, 7:33pm

Regarding distinctBy and other things that create Map[A, Seq[B]] or Map[A, Set[B]].

These are the wrong collections to use. You want a Multimap. Look at Guava’s: https://github.com/google/guava/wiki/NewCollectionTypesExplained In particular, grok why Multimap, Multiset, and BiMap exist. Map[A, Set[B]] is very awkward. what if you want to remove or add a value? If you remove all the values for a key, should the result have an empty Set in there or should the key be removed?

Really, we need either:

some scala API wrappers to make using those Guava things easier and Scala idiomatic (unfortunately, no immutable variants exist though)

OR

actual multimaps / multisets

True multimaps / multisets are very useful and cut down on code boilerplate a lot more than a single method to take Seq[(A, B)] =>Map[A, Seq[B]] would. I think such a method would eventually replaced by Seq[(A, B)].toSeqMultimap and should not be included in the stdlib.

Ichoran · September 15, 2017, 7:58pm

Yeah, we definitely need multimaps. Not so sure about multisets; there are two flavors, counted sets and isomorphisms to multimaps. 2.13 would be a good time to introduce an implementation. (The cake-based version with mutable collections didn’t work out too well in the old collections.)

tarsa · September 15, 2017, 11:47pm

What’s often missing for me (most needed things first):

Option.fold with variance that would make it close to Option.map().getOrElse(). Currently it isn’t the case because Option.fold has two parameter lists and the first one strictly determines the result type. Thus I can do intOpt.map(x => Right(x + 5)).getOrElse(Left(“x”)) and it will correctly result in Either type, but trying to do intOpt.fold(Left(“x”))(x => Right(x + 5)) results in compilation error. I have to put types explicitly somewhere to make fold work, which is worse than using map + getOrElse combo.
sequence (in companion objects) on many types, eg Try, Either, (maybe List), Option, etc An ugly workaround is to use partition + downcasting elements on the resulting collections, but that’s quite a lot of code that could be done by a simple sequence invocation.
unfold - eg Iterable.unfold and Iterator.unfold would probably suffice (to get lists, vectors etc one could call toList, toVector, etc). iterate is not a replacement because iterate requires the size of resulting collection upfront and unfold specifically frees me from that requirement.
making Option an collection directly (well, that could be controversial), because sometimes the implicit conversion to iterable doesn’t cut it

yangbo · September 16, 2017, 2:11pm

@julienrf Regardless included operations, I hope IterableOps will extend a new MonadOps trait.

trait MonadFactory[+CC[_]] {
  def pure[A](a: A): CC[A]
}

/**
  * @tparam Root The root type of a type family.
  *         For example:
  *         - `Root` is `Iterable` when `CC` is a collection.
  *         - `Root` is `Option` when `CC` is `Option`.
  */
trait MonadOps[+A, +CC[_], Root[_]] {
  def map[B](f: A => B): CC[B] = {
    flatMap { a =>
      monadFactory.pure(f(a)).toRoot
    }
  }

  // General version of IterableOps.flatMap
  def flatMap[B](f: A => Root[B]): CC[B]

  // General version of IterableOps.toIterable
  def toRoot: Root[A]

  def monadFactory: MonadFactory[CC]
}

Ideally, this MonadOps will not only be the super type of collections, but also super types of Option, Either, Try, TailRec and Future. So that Scala standard library users can create general monadic code or general monad transformers that support all those types.

This encoding of monad has been discussed at https://github.com/typelevel/general/issues/81

marcinzh · September 16, 2017, 9:19pm

class Map[A, B] {
    def filterKeys(f : A => Boolean) = filter(kv => f(kv._1))
    def filterValues(f : B => Boolean) = filter(kv => f(kv._2))
}

class TraversableOnce[A] {
    def toMapSafe[B, C](implicit ev : A =:= (B, C)) = { ... }
}

^^^ Same as this.toMap, but throws exception if keys weren’t unique during construction of map. Accidential overwrites of values due to key collisions = opportunity for nondeterministic bugs. True story.

class Set[A] {
    def isDisjointWith[B >: A](that : Set[B]) = forall(x => !that.contains(x))
    def CACHE_LOOKUP(a : A) : Option[A] = { ... }
}

^^^ I don’t have a good proposal for the name of CACHE_LOOKUP, but it would work follows:
if (this.contains(a)), then it would return Some(the element contained in the set, that is found equal to a), not Some(a).
else None
Without such method, I’m sometimes forced to use Map[A, A] for memoization, where Set[A] would be enough. Requires intimate knowledge of the representation to implement efficiently, I believe.

class Seq[A] {
    def areElementsDistinct = distinct.size == size
    def areElementsSorted(implicit o : Ordering[A]) = 
        (size <= 1) || init.zip(tail).forall { case (a, b) => o.compare(a, b) <= 0 }
}

Sciss · September 17, 2017, 9:59am

In general, I would avoid adding ops that are useful but less frequently needed because they could easily be added as extension methods.

Here are my own extensions which I found useful: https://github.com/Sciss/KollFlitz

especially:

List(13, 5, 8, 21, 3, 8).counted == Map(3 -> 1, 5 -> 1, 8 -> 2, 13 -> 1, 21 -> 1)
List("a1", "b1", "a2", "b3", "c2", "c3").toMultiMap(_.head)(_.drop(1).toInt) == Map(b -> Vector(1, 3), a -> Vector(1, 2), c -> Vector(2, 3))
List(13, 5, 8, 21, 3, 8).groupWith(_ > _).toVector == Vector(List(13, 5), List(8), List(21, 3), List(8))
List(13, 5, 8, 21, 3, 8).mapPairs(_ - _) == List(8, -3, -13, 18, -5)
foreachPairs

shawjef3 · September 21, 2017, 12:36pm

If something is useful, put it in the collections. Don’t hide it. Hiding functionality behind an import blocks discoverability.

yangbo · September 22, 2017, 8:53am

Monad is not a good name for those types that has different CC from this.type.

It is actually a “right” Kan extension. See, it looks like scalaz.Ran.

yangbo · September 22, 2017, 9:00am

IterableOps can map to CC, so it is a functor in category theory.

However, since IterableOps can be a different type from CC, it’s not an endofunctor.

We can do better than Haskell here. All Haskell Functors are endofunctors, as a result, it is impossible to create a XxxOps as a Functor in Haskell. I guess this restriction is due to lack of multi-parameter type class in early Haskell version.

szeiger · September 22, 2017, 11:51am

A potential problem with MonadOps is that it could only handle the Iterable case. “Sorted Iterable” requires extra implicits and for Map you’re defining the monad only at the Iterable level with a (K,V) element type, rather than the desirable monad with element type V.

yangbo · September 22, 2017, 3:58pm

It is an Ops, which means that it allows implicit views from other types. Fortunately implicit views may have implicit parameters.

julienrf · September 28, 2017, 7:49am

Hi, thank you all for your comments and participation!

I couldn’t find an easy way to share the nice charts that Google forms builds from the poll responses, but the raw data is available here: https://docs.google.com/spreadsheets/d/1hCAcD2ioPI2zCyav2nIt3a4T8I0tJ7_5k2ScVqd9kj4/edit?usp=sharing

In general the people who have replied to the poll are rather open to introduce new operations on the collections. The operations that you need most seem to be zipWith and groupMap.

You also mention several other features that were missing from the poll. I noticed that traverse/sequence was frequently mentioned, as well as the Multimap collection type. I’ve created a couple issues in the repository according to your feedback.