Which operations should be included in the new collections?

That would most naturally be called interleave, and can be done already by Seq(xs, ys).tranpose.flatten or (xs zip ys).flatMap{ case (x, y) => Seq(x, y) } among others.

The best methods to include are ones that are either extremely common use cases or extremely difficult to emulate using a few other higher-order methods. For instance, distinctBy really can’t be accomplished any other convenient way.

2 Likes

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

i also have another version of distincBy which allows to select which duplicate to preserves

it doesn’t look like rest of standard collection though

1 Like

I also found keyBy useful, but named it toMapWithKey

toMapWithKey

scala> val m = List("1", "2", "1").toMapWithKey(_.toInt)
m: scala.collection.immutable.Map[Int,String] = Map(1 -> 1, 2 -> 2)

+1 for Seq[(K, A)] => Map[K, Seq[A]], I use it quite often

toCompleteMap

scala> val cm = List(1 -> "1", 2 -> "2", 1 -> "11").toCompleteMap
cm: scala.collection.immutable.Map[Int,List[String]] = Map(2 -> List(2), 1 -> List(1, 11))

also

mapToMap can be seen as more efficient replacement for map().toMap combination

scala> val m = List("1" -> "one", "2" -> "two").mapToMap { case (i, s) => i.toInt -> s }
m: scala.collection.immutable.Map[Int,String] = Map(1 -> one, 2 -> two)

withFrequency

scala> val fm = List("a", "b", "c", "a", "b", "d").withFrequency
fm: scala.collection.immutable.Map[String,Int] = Map(b -> 2, d -> 1, a -> 2, c -> 1)

withFrequencyBy

scala> val fm = List("ab", "bc", "cd", "ab", "bc", "de").withFrequencyBy(_.head)
fm: scala.collection.immutable.Map[Char,Int] = Map(b -> 2, d -> 1, a -> 2, c -> 1)
2 Likes

A lot of grips have already been mentioned here, however my personal annoyance is not having a flatten or compact that works on a map. For example, if I have a map of optional fields (lets say that represent query string parameters to a Http call) I may have something like this

val queryParams: Map[String, Option[String]] = Map(
   "first_name" -> Option(firstName)
   "last_name" -> Option(lastName)
   "married" -> married.map(_.toString)
)

The main point is we have a Map[String, Option[String]] and we want to flatten this to a Map[String, String]. Other collections (such as having a Seq[Option[String]] support this with the flatten method, but there is no such method for a Map[String, Option[String]] so I end up having to do stuff like this

val queryParams: Map[String, Option[String]] = Map(
   "first_name" -> Option(firstName)
   "last_name" -> Option(lastName)
   "married" -> married.map(_.toString)
).collect {
  case (k, Some(v)) => (k,v)
}

All of the time. At least the above would be a small quality of life change (i.e. support for flatten on Map[K, Option[V]]

1 Like

When I see flatten, I expect it to be called on Seq[Seq[A]] or Option[Option[A]]. It’s the same reason why collectionOfStuff.map(_.map(someOp)) is terrible for readibility – it’s overly generalized IMO. I do feel like compact is too overly specialized to be in the standard library so I’m happy to define it as my own extension method.

groupMapReduce sounds pretty awesome!

I actually use it that way pretty rarely – if I have a straightforward nested collection like that, I’m usually using a for comprehension on it. I almost always use flatten on a Seq[Option[A]] – that’s probably 90% of the uses I encounter…

here’s a few I already found my self in need of and implemented on our util lib:

partition3:

partitionWith:

distinctBy:

unfold:

sequence & traverse with Option & Try

also remember a disscussion on groupWhile, but can’t find it now… Maybe also(instead?) generalize it to foldLeftWhile (and equivalent foldRightWhile)

4 Likes

Regarding distinctBy and other things that create Map[A, Seq[B]] or Map[A, Set[B]].

These are the wrong collections to use. You want a Multimap. Look at Guava’s: https://github.com/google/guava/wiki/NewCollectionTypesExplained In particular, grok why Multimap, Multiset, and BiMap exist. Map[A, Set[B]] is very awkward. what if you want to remove or add a value? If you remove all the values for a key, should the result have an empty Set in there or should the key be removed?

Really, we need either:

  • some scala API wrappers to make using those Guava things easier and Scala idiomatic (unfortunately, no immutable variants exist though)

OR

  • actual multimaps / multisets

True multimaps / multisets are very useful and cut down on code boilerplate a lot more than a single method to take Seq[(A, B)] =>Map[A, Seq[B]] would. I think such a method would eventually replaced by Seq[(A, B)].toSeqMultimap and should not be included in the stdlib.

7 Likes

Yeah, we definitely need multimaps. Not so sure about multisets; there are two flavors, counted sets and isomorphisms to multimaps. 2.13 would be a good time to introduce an implementation. (The cake-based version with mutable collections didn’t work out too well in the old collections.)

1 Like

What’s often missing for me (most needed things first):

  • Option.fold with variance that would make it close to Option.map().getOrElse(). Currently it isn’t the case because Option.fold has two parameter lists and the first one strictly determines the result type. Thus I can do intOpt.map(x => Right(x + 5)).getOrElse(Left(“x”)) and it will correctly result in Either type, but trying to do intOpt.fold(Left(“x”))(x => Right(x + 5)) results in compilation error. I have to put types explicitly somewhere to make fold work, which is worse than using map + getOrElse combo.
  • sequence (in companion objects) on many types, eg Try, Either, (maybe List), Option, etc An ugly workaround is to use partition + downcasting elements on the resulting collections, but that’s quite a lot of code that could be done by a simple sequence invocation.
  • unfold - eg Iterable.unfold and Iterator.unfold would probably suffice (to get lists, vectors etc one could call toList, toVector, etc). iterate is not a replacement because iterate requires the size of resulting collection upfront and unfold specifically frees me from that requirement.
  • making Option an collection directly (well, that could be controversial), because sometimes the implicit conversion to iterable doesn’t cut it
2 Likes

@julienrf Regardless included operations, I hope IterableOps will extend a new MonadOps trait.

trait MonadFactory[+CC[_]] {
  def pure[A](a: A): CC[A]
}

/**
  * @tparam Root The root type of a type family.
  *         For example:
  *         - `Root` is `Iterable` when `CC` is a collection.
  *         - `Root` is `Option` when `CC` is `Option`.
  */
trait MonadOps[+A, +CC[_], Root[_]] {
  def map[B](f: A => B): CC[B] = {
    flatMap { a =>
      monadFactory.pure(f(a)).toRoot
    }
  }

  // General version of IterableOps.flatMap
  def flatMap[B](f: A => Root[B]): CC[B]

  // General version of IterableOps.toIterable
  def toRoot: Root[A]

  def monadFactory: MonadFactory[CC]
}

Ideally, this MonadOps will not only be the super type of collections, but also super types of Option, Either, Try, TailRec and Future. So that Scala standard library users can create general monadic code or general monad transformers that support all those types.

This encoding of monad has been discussed at https://github.com/typelevel/general/issues/81

2 Likes
class Map[A, B] {
    def filterKeys(f : A => Boolean) = filter(kv => f(kv._1))
    def filterValues(f : B => Boolean) = filter(kv => f(kv._2))
}

class TraversableOnce[A] {
    def toMapSafe[B, C](implicit ev : A =:= (B, C)) = { ... }
}

^^^ Same as this.toMap, but throws exception if keys weren’t unique during construction of map. Accidential overwrites of values due to key collisions = opportunity for nondeterministic bugs. True story.


class Set[A] {
    def isDisjointWith[B >: A](that : Set[B]) = forall(x => !that.contains(x))
    def CACHE_LOOKUP(a : A) : Option[A] = { ... }
}

^^^ I don’t have a good proposal for the name of CACHE_LOOKUP, but it would work follows:
if (this.contains(a)), then it would return Some(the element contained in the set, that is found equal to a), not Some(a).
else None
Without such method, I’m sometimes forced to use Map[A, A] for memoization, where Set[A] would be enough. Requires intimate knowledge of the representation to implement efficiently, I believe.


class Seq[A] {
    def areElementsDistinct = distinct.size == size
    def areElementsSorted(implicit o : Ordering[A]) = 
        (size <= 1) || init.zip(tail).forall { case (a, b) => o.compare(a, b) <= 0 }
}

In general, I would avoid adding ops that are useful but less frequently needed because they could easily be added as extension methods.

Here are my own extensions which I found useful: https://github.com/Sciss/KollFlitz

especially:

List(13, 5, 8, 21, 3, 8).counted == Map(3 -> 1, 5 -> 1, 8 -> 2, 13 -> 1, 21 -> 1)
List("a1", "b1", "a2", "b3", "c2", "c3").toMultiMap(_.head)(_.drop(1).toInt) == Map(b -> Vector(1, 3), a -> Vector(1, 2), c -> Vector(2, 3))
List(13, 5, 8, 21, 3, 8).groupWith(_ > _).toVector == Vector(List(13, 5), List(8), List(21, 3), List(8))
List(13, 5, 8, 21, 3, 8).mapPairs(_ - _) == List(8, -3, -13, 18, -5)
foreachPairs

If something is useful, put it in the collections. Don’t hide it. Hiding functionality behind an import blocks discoverability.

Monad is not a good name for those types that has different CC from this.type.

It is actually a “right” Kan extension. See, it looks like scalaz.Ran.

IterableOps can map to CC, so it is a functor in category theory.

However, since IterableOps can be a different type from CC, it’s not an endofunctor.

We can do better than Haskell here. All Haskell Functors are endofunctors, as a result, it is impossible to create a XxxOps as a Functor in Haskell. I guess this restriction is due to lack of multi-parameter type class in early Haskell version.

A potential problem with MonadOps is that it could only handle the Iterable case. “Sorted Iterable” requires extra implicits and for Map you’re defining the monad only at the Iterable level with a (K,V) element type, rather than the desirable monad with element type V.

It is an Ops, which means that it allows implicit views from other types. Fortunately implicit views may have implicit parameters.

Hi, thank you all for your comments and participation!

I couldn’t find an easy way to share the nice charts that Google forms builds from the poll responses, but the raw data is available here: https://docs.google.com/spreadsheets/d/1hCAcD2ioPI2zCyav2nIt3a4T8I0tJ7_5k2ScVqd9kj4/edit?usp=sharing

In general the people who have replied to the poll are rather open to introduce new operations on the collections. The operations that you need most seem to be zipWith and groupMap.

You also mention several other features that were missing from the poll. I noticed that traverse/sequence was frequently mentioned, as well as the Multimap collection type. I’ve created a couple issues in the repository according to your feedback.

1 Like