3 std lib additions: convert(map &+ filter) = -O(n/2) ? and other collection methods

Hi All,

Here are three SIP proposals that I came across while doing a deep dive on Scala over the past few weeks (I am looking for remote work as my company Adligo contact [email protected] if your interested :slight_smile: )

I have given the proposals the temporary ids for sake of discussion;

Adligo .SIP?. A , Adligo .SIP?. B , Adligo .SIP?. C

Adligo .SIP?. A ) This is a simple idea to combine the ‘map’ and 'filter ’ methods into a single method ‘convert’ in order to reduce the number of traversals from two to one in certain scenarios as follows;

map[A] takes ( A => B) and returns either a Iterator[B] or collection[B]

filter[B] takes ( A => Boolean) and returns either a Iterator[A] or collection[A]

I am suggesting an additional method;

convert[B] takes (A => (Boolean, B)) and returns either a Iterator[B] or collection[B]

Note the function passed to convert returns a boolean (to filter or not) in the left tuple slot, and the converted/mapped value in the right tuple slot.

There is a discussion on the topic here, with some solutions that I feel could be greatly improved on with a convert implementation;

(https://stackoverflow.com/questions/32234132/how-to-combine-filter-and-map-in-scala

I have created a rough proof of concept here;

(https://github.com/adligo/scala/blob/2.13.x/src/library/scala/collection/immutable/List.scala
(https://github.com/adligo/scala/blob/2.13.x/test/junit/scala/collection/immutable/Fruit.scala
(https://github.com/adligo/scala/blob/2.13.x/test/junit/scala/collection/immutable/ListTest.scala

I think there are a large number of places where this ‘convert’ idea could be used i.e.;

groupConvert

groupConvertReduce

etc.

I think this will reduce the time complexity of the programs by roughly O(n/2) in many cases for large collections!

Adligo .SIP?. B ) l would like Java convenience methods ’ containsAll ', ’ removeAll ', ’ retainAll ’ added to the Scala collections and additional convenience methods for Maps ‘containsAllKeys’, ‘removeAllKeys’, ‘retainAllKeys’. This also has a external discussion going;

(https://stackoverflow.com/questions/28705781/with-scalas-set-is-there-a-method-analog-to-the-containsall-method-in-javas-s

Adligo .SIP?. C ) Scala has a cool method 'combinations( tupleSize : Int)’ , I would like a method ‘combinations()’ added that calculates all the combinations. This is mostly for programming competitions :slight_smile:

That’s it, again I’m looking for remote work ( [email protected] ). If I find work I can dedicate some time to implementing much of this, if others also think it’s a good idea. I don’t see it changing that much for Scala3, but I also don’t know much at all about Scala3 :)_

Merry XMas,

Scott

Is there any particular reason the accepted answer from that SO question (use collect) doesn’t work for you?

3 Likes

Well yes, I misread the {} for () since I’m coming from the Java background and I still don’t know what all of the syntax rules are. I actually thought collect was broken, since perhaps I haven’t taken enough of the Scala coursera classes. I now see that it works, and I have gone down yet another rabbit hole.

As mentioned above. For your first proposal, there is already collect. Also, any time you have a long chain of transformations on a single collection, you can take advantage of laziness (e.g. iterators, views, streams) to apply all the transformations in just one iteration.

For your second proposal, you can implement all of those (and many more) using already defined methods. And if you prefer, you may define them as extension methods. Nevertheless, if you still consider that they are common enough to be part of the STD lib, IMHO, it would be better to just open a Pull Request with the implementation.

Finally, your third proposal can be easily implemented using the unfold method on the Iterator companion. And again, IMHO, this is uncommon enough to be part of the STD lib.

2 Likes

Also how would you do a collect to reduce the filter traversal in conjunction with a groupMap or groupMapReduce ? I do think there are places where the idea of Convert would help. For example you could do a filter().groupMap() or a groupMap().filter() but groupConvert() would reduce the second method and therefore the 2nd traversal.

PR = Patch Request?

I added a implementation of groupConvert to better illustrate my idea;

  1. https://github.com/adligo/scala/blob/2.13.x/src/library/scala/collection/Iterable.scala
  2. https://github.com/adligo/scala/blob/2.13.x/test/junit/scala/collection/immutable/ListTest.scala

I have changed the title from “SIP” to “std lib additions”, since this is more accurate. SIPs are only for changing the language itself.

1 Like

Nice implementation! I would change the signature to the following, for the sake of consistency with the existing collect operation:

def groupCollect[K, B](key: PartialFunction[A, K])(f: PartialFunction[A, B]): Map[K, CC[B]]

However, I’m not sure this is general enough to be in the scala-library. If there a high demand you could submit a PR to the scala-collection-contrib project, to add it as a decorator. In the meantime, you can implement such a decorator in a separate library :slight_smile:

3 Likes

What I find I want to do is to apply a mapping to entire groups, for example when denormalizing joined database records, something like:

def mapGroup[K, B](key: A => K)(f: List[A] => B): Map[K, B]

It’s not uncommon to want to check if the key exists, so in this case collectGroup (and potentially groupCollect) would make sense, however I’m not seeing a scenario where the mapping also needs to be a partial function.

def collectGroup[K, B](key: PartialFunction[A, K])(f: List[A] => B): Map[K, B]

First thanks to everyone for reading my thoughts and providing great feedback!

I have renamed the Adligo.SIP* temporary ids to Adligo.Scala.LIB* since these no longer pertain to SIPs.

I have replaced groupConvert and groupConvertReduce with six methods mostly for completeness, this would also allow for easy refactoring. For example if you started with groupMap, and then needed to change the Map operation to a Collect operation for some reason.

groupCollect

groupCollectReduce

refineCollect

refineCollectReduce

refineMap

refineMapReduce

I messing around with PartialFunctions…

Also refine in the above method names;

refine = filter and group in a single traversal

PR = Pull Request

WRT the main proposal, I haven’t seen much use case for these sort of operations. The closest are probably .map(f).toMap (mostly used to derive keys and/or values) and .groupBy(f).map(g) (used to group into a map, then dedup the values).

We’ve got an internal library with helpers for these, but (as you can probably see from the signatures, there isn’t much filtering going on):

final class StdMapFactories[F[_], E](val fe: F[E]) extends AnyVal {
 def asMapWithKey[K](keyFunction: E => K)(implicit F: Foldable[F]): Map[K, E]
 def asMapWithValue[V](valueFunction: E => V)(implicit F: Foldable[F]): Map[E, V]
 def asMap[K, V](keyFunction: E => K, valueFunction: E => V)(implicit F: Foldable[F])
 def groupUsing[M[_]: Applicative][K, V](kf: E => K, vf: E => V)(implicit F: Foldable[F], S: Semigroup[M[V]]): Map[K, M[V]]
}
1 Like

julienrf & jcracknell

I have updated my implementation to use the PartialFunctions thanks for the suggestion!

morgen-peschke and others;

I agree that these six suggested methods (Adligo.Scala.LIB?.A) are fringe use cases, however IMHO they should be included with the standard API for a few reasons including;

  1. refineMap, refineMapReduce provide a simple way to refactor a groupMap and groupMapReduce by adding a filter into the group section. I think refineMap, refineMapReduce would also be the most common of the six methods.

  2. The other four methods are totally fringe cases, but I added them mostly for completeness. If for some reason someone wanted to update some code that was using any of refineMap, refineMapReduce, groupMap or groupMapReduce with a PartialFunction they would have the option available in the Scala API.

For Adligo.Scala.LIB?.A I have simply done the full Matrix of Functions and PartialFunctions for groupMap and groupMapReduce, adding some names to prevent type erasure.