Standard Library: Now open for improvements and suggestions!

But that already happened with the collections rewrite for 2.13. Even though Traversable is a more general concept (where control of progression is with the collection), it turned out that there were no collections for which an Iterator (where control of progression is inverted and placed with the caller) was impractical. So it was dropped.

And View was reformulated so that it acts almost identically to () => Iterator. When you need to do something once, you use an Iterator. If you need to use it multiple times, but you don’t want to build a copy and use that, you use a View. It can be more optimized than () => xs.iterator.<iterator operations>, but isn’t guaranteed to be. Is it worth the overhead, or should we tear it out? I’m not sure it’s used enough to have been worth building, but since it’s there, may as well leave it.
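To make the comparison concrete, here is a minimal sketch of a View standing in for a () => Iterator. The names xs, pipeline and lazyView are made up for illustration; only the standard .iterator and .view methods are assumed.

```scala
val xs = Vector(1, 2, 3, 4)

// A function that hands out a fresh, single-use Iterator on every call.
val pipeline: () => Iterator[Int] = () => xs.iterator.map(_ * 2)

// A View describes the same lazy pipeline, but can be traversed repeatedly.
val lazyView = xs.view.map(_ * 2)

val sumViaIterator = pipeline().sum // one traversal
val maxViaIterator = pipeline().max // needs a second, fresh iterator

val sumViaView = lazyView.sum // one traversal of the view
val maxViaView = lazyView.max // another traversal; no intermediate copy was built
```

Both forms recompute the mapping on every traversal; the View just packages the "give me a fresh iterator" step behind the usual collection methods.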

That’s CollectionType.newBuilder[EltType]. You terminate it with .result().

If it’s awkward to string along the builder, you can also create an iterator and use to(CollectionType). If you want to defer the decision, use the generic Buffer[EltType] and to(TargetCollectionType).
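For anyone who hasn't used these, a small sketch of the three routes mentioned above (the method names viaBuilder, viaIterator and viaBuffer are just labels for this example):

```scala
import scala.collection.mutable

// Builder route: add elements, then finish with result().
def viaBuilder: Vector[Int] =
  val b = Vector.newBuilder[Int]
  b += 1
  b ++= List(2, 3)
  b.result()

// Iterator route: produce the elements, then choose the target with to(...).
def viaIterator: List[Int] =
  Iterator.range(1, 4).to(List)

// Deferred decision: accumulate in a generic Buffer and convert at the end.
def viaBuffer: Set[Int] =
  val buf = mutable.Buffer.empty[Int]
  buf += 1
  buf ++= Seq(2, 3)
  buf.to(Set)
```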

I really don’t see how it could be dramatically easier. Maybe we need to emphasize it more. But it’s already pretty awesome.

Did you know you can parallelize operations without bringing in the parallel collections dependency, if you’re on the JVM? Use scala.jdk.StreamConverters and your collections can become Java Streams with .asJavaParStream, and then toScala(TargetCollection) gets you back. Or you can fragment the work yourself by using collection.stepper and the methods on that.
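A rough sketch of the Java Streams route, assuming a collection with an efficiently splittable stepper (Vector here) and Int elements, which means the stream is a java.util.stream.IntStream. The value names are illustrative only.

```scala
import scala.jdk.StreamConverters.*

val numbers = Vector.range(1, 1001)

// Reduce in parallel on the Java stream.
val sumOfSquares: Long =
  numbers.asJavaParStream.mapToLong(x => x.toLong * x).sum()

// Transform in parallel, then come back to a Scala collection.
val doubled: Vector[Int] =
  numbers.asJavaParStream.map(x => x * 2).toScala(Vector)
```

Whether this actually beats the sequential version depends on the workload; for something as cheap as doubling an Int it probably won't.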

The collections are already really awesome. They are big, true; the biggest problem is getting lost in them. The second biggest problem is that they’re nominally extensible, but there are a lot of tricks to extending them, so it’s easier to just give up. The third biggest problem is that if you want something that works generically over all collections, you need some pretty advanced concepts, but it actually is only a few lines of code. The fourth biggest problem is that there aren’t many high-level methods for mutable collections. They’re just not straightforward (which maybe isn’t unreasonable, given that this is employing some very heavy lifting to make everything work like magic). All these problems are intentional tradeoffs, though. If it had been done differently, something else would suffer, sometimes a lot. Small collections = re-invent the wheel a lot. Inextensible = your stuff can never “just work”. Easy application = dedicated compiler magic that you can never have for your library and/or collections that lose types left and right (like Java). Less mutable support = concession to immutable-as-typical and already-too-big.

So we can discuss, but I think the existing hierarchy has an awful lot going for it.

You don’t need capture checking / capabilities for that. Any old context parameter will do.

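class C():
  // Callable only where a C.Token is available as a context parameter.
  def +=(i: Int)(using C.Token): Unit = println(i)
object C:
  // Outside C, Token is opaque: user code cannot treat () as a Token.
  opaque type Token = Unit
  // The one place a Token is supplied, and only for the duration of f.
  inline def build(inline f: Token ?=> C): C =
    f(using ((): Token))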

This pattern allows access to methods on one class only in the context of a particular method call, which you can arrange however is convenient (e.g. to unlock builder capabilities).
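Here is how that might look from the calling side. This is a sketch under assumptions: Acc is a hypothetical accumulator that actually stores its elements, snapshot is a made-up accessor, and build is written without inline to keep the example simple.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical accumulator: += is only callable while a Token is in scope.
class Acc:
  private val items = ArrayBuffer.empty[Int]
  def +=(i: Int)(using Acc.Token): Unit =
    items.append(i)
    ()
  def snapshot: List[Int] = items.toList

object Acc:
  opaque type Token = Unit
  // Non-inline variant of build, to keep the sketch simple.
  def build(f: Token ?=> Acc): Acc = f(using ((): Token))

val acc = Acc()

// Inside build, the compiler supplies the Token without anyone naming it.
val filled = Acc.build {
  acc += 1
  acc += 2
  acc
}

// acc += 3   // does not compile here: no given Acc.Token in scope
```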

But in the case of builders, having an explicit snapshot() method would probably do the trick too. It’s not that hard to sequence things in the order you mean; the trick with builders is knowing whether result() means that you’re done, or that you’re going to keep going, and for that to be safe. But you can just declare your intent, and it’s all good.

Capture checking would prevent smuggling the Token out by grabbing it in a closure. Granted that’s not very likely, but it can happen.

Yes, so in addition to ListBuilder explicitly allowing the pattern, I would also need List.newBuilder to guarantee that it returns a ListBuilder (or use ListBuilder directly; I’ve gone back and forth on that but I like the idea of using <Type>.newBuilder for all the collections).

But the entire point of the feature is to help you keep your additions straight.

Why would you try to circumvent a feature which is there to help you avoid mistakes, unless there was a very good reason that meant that the class was actually not useful for you otherwise?

Given that Scala private often is actually JVM public, that JVM private is a setAccessible(true) away from being seen, and that Scala immutable is JVM mutable most of the time, we’re absolutely beset by ways to intentionally do the thing that makes things tricky and dangerous.

The advantage of the permission token method is that you never see the token by default. It just looks like:

val b = Buffer.create[Int]
val c = b.build:  // Or Buffer.build--can work however you want
  b += 2
  b += 3

So you really have to want to grab it. The scoping solution makes grabbing the functionally active class more tempting.

Just scoping is fine, too, most of the time. But a permission token adds an extra level of safety because as implicit context you don’t name it.

Simple proposal:

An overload of zipAll(other) with just one parameter, which throws if the two collections are not the same length.

I noticed that all of the zip() usages in my code actually zip same-length sequences / iterators, so it would be nice to be explicit about it to catch bugs.
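As a rough idea of the behaviour, here is a sketch written as a free-standing function rather than the proposed overload. The name zipStrict and the choice of IllegalArgumentException are assumptions for this example, not anything in the standard library.

```scala
// Hypothetical strict zip: pairs elements lazily and throws as soon as it
// discovers that one input ended before the other.
def zipStrict[A, B](xs: IterableOnce[A], ys: IterableOnce[B]): Iterator[(A, B)] =
  val left = xs.iterator
  val right = ys.iterator
  new Iterator[(A, B)]:
    def hasNext: Boolean =
      val more = left.hasNext
      if more != right.hasNext then
        throw IllegalArgumentException("zipStrict: inputs differ in length")
      more
    def next(): (A, B) = (left.next(), right.next())

val pairs = zipStrict(List(1, 2), List("a", "b")).toList

// zipStrict(List(1), List(1, 2)).toList   // throws IllegalArgumentException
```

Because the check is lazy, the mismatch is only reported once the shorter input runs out, which keeps it usable with iterators of unknown size.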

For example, the standard library needs non-empty collection types so that groupBy’s type can be properly expressed (the values of the returned Map are always non-empty).

+1 for non-empty collection types

I would add that Scala is a very good language at expressing domain invariants, and in my experience non-empty collections are a very common invariant, so it would be consistent with Scala’s design goals to have such types in the standard library. The fact that a lot of Scala ecosystems are coming up with their own NonEmptyList implementation is to me a notable sign that it is lacking from the standard library.

In an ideal world we would have non-empty versions of every collection type, but at least one non-empty type per family (Seq, Set, Map) would actually help me a lot in my day-to-day work.

There is the cons type :: in the standard library, but its methods are inherited from List, so (1 :: Nil).map(_ + 1) returns a List and the non-empty property is lost by the type system.
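A small illustration of that loss (value names are just for the example):

```scala
// Constructing the cons cell directly keeps the static type ::[Int].
val cell: ::[Int] = new ::(1, Nil)

// head cannot fail here, since the type rules out the empty list.
val first: Int = cell.head

// map is inherited from List, so the result is typed as a plain List[Int].
val mapped: List[Int] = cell.map(_ + 1)

// val stillNonEmpty: ::[Int] = cell.map(_ + 1)   // does not compile: found List[Int]
```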

Why not use cats.data.NonEmptyList? Because it is not standard-library idiomatic and tends to push you deeper into the cats ecosystem than you actually need to go. Here are a few examples:

  • cats.data.NonEmptyList doesn’t have a foreach(f: x => Unit): Unit method
  • conversions to NonEmptySet or NonEmptyMap require cats.kernel.Order instances because they are sorted implementations (unlike the base standard library Set and Map types)
  • and so on…

I think writing “plumbing” code to convert from/to standard library types whenever I need it pollutes the domain logic expressed in the code. I also want utility types that solve my problem without requiring me to adapt a significant part of the codebase.

org.scalactic.anyvals.NonEmptyList/Set/Map are closer to what I need and integrate better with standard library code thanks to implicit conversions, but Scalactic is not really popular, so third-party libraries never provide typeclasses for these types.

I don’t have much experience with zio.prelude. It also provides implicit conversions to integrate with standard library code, but it is not free of zio-isms, and it feels heavy to bring in the zio and zio-streams dependencies just for a NonEmptyList type when you are not using ZIO effects.

On a different subject, built-in traverse utilities for standard library types like Either and Option would also be helpful (like Future.sequence).
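To show the kind of utility I mean, here is a minimal sketch. The names traverseEither and traverseOption are hypothetical and do not exist in the standard library; a real version would presumably be generic in the collection type rather than always returning a List.

```scala
// Hypothetical helper: apply f to each element, stopping at the first Left.
def traverseEither[A, E, B](xs: Iterable[A])(f: A => Either[E, B]): Either[E, List[B]] =
  val out = List.newBuilder[B]
  val it = xs.iterator
  while it.hasNext do
    f(it.next()) match
      case Right(b) => out += b
      case Left(e)  => return Left(e) // stop at the first failure
  Right(out.result())

// Hypothetical helper: the Option version, expressed via the Either one.
def traverseOption[A, B](xs: Iterable[A])(f: A => Option[B]): Option[List[B]] =
  traverseEither[A, Unit, B](xs)(a => f(a).toRight(())).toOption

val parsed = traverseOption(List("1", "2"))(_.toIntOption) // Some(List(1, 2))
val failed = traverseOption(List("1", "x"))(_.toIntOption) // None
```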

EDIT: Forgot to add that while implicit conversions may help, it would be way more convenient to have NonEmptySeq/Set/Map types be sub-types of Seq/Set/Map.

Huge +1 for all that!

I’m going to offer a counterpoint here, even though I sometimes also prefer the simplicity of iterators:

intermediary views are hard to pass around.

I don’t see what’s hard about passing views around (it seems no harder than passing an iterator).

and they are inefficient if used multiple times.

That is true, but iterators can have undefined behavior if used multiple times. So I think the performance issues are still worth it.

I would argue that one should always use views unless the underlying collection cannot be kept in memory.