A rant about collection.Factory

I complained about it when it was released, but maybe now, that 3 is out, everyone is less absent minded. The way that IterableFactory is implicitly convertible to a Factory is really nice, syntax-wise. However it robs a collection from the ability to provide an optimised conversion to certain types. For example, consider a Ranking: a Set where order of elements matter, or a Seq with unique elements. Due to sharing traits of both of the latter collections, while being neither, (i.e. fast contains, positional access), overriden toSet and toSeq can return a very light wrapper in O(1). No such luck with to though, as without really ugly reflection, I can’t recognize by a Factory instance or class from what target collection type it came.

I think we have a IndexedSeq, List as well as HashSet for the occasions when we want a specific implementation, for example for performance reasons. But Seq is as abstract as it gets, unlike in other functional languages, exactly so we can have various implementations.

Why was it thought necessary to remove ReusableCBF, which could be used to do various tricks like this? To be clear, I do not speak about CBF in general, as it was clearly a design choice which I understand even if it makes me a sad panda. If only the implicit conversion target Factory was always the same instance, or if Factory exposed an Option[IterableFactory[_]], anything, we could see more such optimisations. Was it a conscious decision to prevent this sort of thing, or it just sort of happened? This would be an extremely minor, both source and binary compatible change to the library.

This is especially useful with various non-generic collections, especially domain-specific collections: if I have a dedicated collection type, then I typically want that runtime type to be preserved when the collection is used. However, methods like toSeq are useful as a sort of explicit breakOut from pre-2.13: they allow, at O(1), to share the internal structure o the dedicated collection, but circumventing any additional eatures and switching to a standard collection semantics (this of course works better for immutable implementations). It is also one of the reasons I am not super happy about deprecating overriding of various ‘toXxx’ methods, as it makes third-party implementations much worse integrated than they could.

2 Likes

Hey @noresttherein,

One problem with toXxx methods (such as toList, toSet, etc.) is that they are redundant with the more general to operation. The to operation is more general in the sense that it works with many more collection types, including user-defined collection types.

That being said, I agree with your analysis that the current design makes some optimizations impossible to implement.
Maybe the to operation should take a BuildFrom instead of a Factory… I am not sure whether such a transition could be smoothly achieved, though.

Could you please elaborate on your suggestions that would be both source and binary compatible? Do you mind giving a concrete example?

Sure.
Currently we have:

implicit def toFactory[A, CC[_]](factory: IterableFactory[CC]): Factory[A, CC[A]] = new ToFactory[A, CC](factory)

  @SerialVersionUID(3L)
  private[this] class ToFactory[A, CC[_]](factory: IterableFactory[CC]) extends Factory[A, CC[A]] with Serializable {
    def fromSpecific(it: IterableOnce[A]): CC[A] = factory.from[A](it)
    def newBuilder: Builder[A, CC[A]] = factory.newBuilder[A]
  }

The simplest change which would make me happy would be:

implicit def toFactory[A, CC[_]](factory: IterableFactory[CC]): Factory[A, CC[A]] = ReusableFactory.asInstanceOf[Factory[A, CC]]
  private[this] val ReusableFactory = new ToFactory[Any, CC](factory)

  @SerialVersionUID(3L)
  private[this] class ToFactory[A, CC[_]](factory: IterableFactory[CC]) 
extends Factory[A, CC[A]] with Serializable {
    def fromSpecific(it: IterableOnce[A]): CC[A] = factory.from[A](it)
    def newBuilder: Builder[A, CC[A]] = factory.newBuilder[A]
  }

Then I could write:

override def to[C](factory :Factory[A, C]) :C =
   if (factory == Set.toFactory[Any]) SpecializedSet(this)
   else super.to(factory)

Of course, I am not depending on the above equality for correctness, only optimisations, so even if at some point a decision was made to depart from this solution, no code would break, just execute (much) moe slowly.

2 Likes

This looks very reasonable to me.

1 Like