Standard Library: Now open for improvements and suggestions!

But that already happened with the collections rewrite for 2.13. Even though Traversable is a more general concept (where control of progression is with the collection), it turned out that there were no collections for which an Iterator (where control of progression is inverted and placed with the caller) was impractical. So Traversable was dropped.
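To make the inversion of control concrete, here is a small sketch (the function names are illustrative, not library API). With foreach, the collection drives and the caller only supplies a callback. With an Iterator, the caller drives, which is what makes something like stepping through two sources in lockstep straightforward.

```scala
// Internal iteration (the Traversable style): the collection calls us back.
def sumInternal(xs: Iterable[Int]): Int =
  var total = 0
  xs.foreach(x => total += x)
  total

// External iteration (the Iterator style): the caller decides when to advance,
// so it can alternate between two sources.
def interleave[A](xs: Iterable[A], ys: Iterable[A]): List[A] =
  val i = xs.iterator
  val j = ys.iterator
  val out = List.newBuilder[A]
  while i.hasNext || j.hasNext do
    if i.hasNext then out += i.next()
    if j.hasNext then out += j.next()
  out.result()
```

Writing interleave with only a foreach callback would require buffering one side first, which is the kind of case that made Iterator the more practical common base.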

And View was reformulated so that it acts almost identically to () => Iterator. When you need to do something once, you use an Iterator. If you need to use it multiple times, but you don’t want to build a copy and use that, you use a View. It can be more optimized than () => xs.iterator.<iterator operations>, but isn’t guaranteed to be. Is it worth the overhead, or should we tear it out? I’m not sure it’s used enough to have been worth building, but since it’s there, we may as well leave it.
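A small sketch of the "almost identical to () => Iterator" point, assuming Scala 2.13+ collections. A view, like a function that produces a fresh iterator, gives the same answer on every traversal, and, as the counter shows, it recomputes rather than caching. Reusing one Iterator after a traversal such as sum is not supported, which is why the factory version builds a fresh iterator each time.

```scala
// A fresh iterator per traversal: repeatable, but recomputed each time.
def viaFactory(xs: Vector[Int]): (Int, Int) =
  val mkIt: () => Iterator[Int] = () => xs.iterator.map(_ * 2)
  (mkIt().sum, mkIt().sum)

// A view behaves the same way: each traversal re-runs the mapping.
def viaView(xs: Vector[Int]): (Int, Int, Int) =
  var calls = 0
  val v = xs.view.map { x => calls += 1; x * 2 }
  val first = v.sum
  val second = v.sum
  (first, second, calls) // calls counts how often the mapping function ran
```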

That’s CollectionType.newBuilder[EltType]. You terminate it with .result().
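For example, with Vector as the collection type (buildVector is just an illustrative name):

```scala
// Grow a collection incrementally with a builder, then call result() once.
def buildVector(extra: Seq[Int]): Vector[Int] =
  val b = Vector.newBuilder[Int]
  b += 0        // add one element
  b ++= extra   // add many elements
  b.result()    // produce the finished Vector
```

After result(), a builder should not be reused unless it is first cleared with clear().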

If it’s awkward to string along the builder, you can also create an iterator and use to(CollectionType). If you want to defer the decision, use the generic Buffer[EltType] and to(TargetCollectionType).
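Both routes might look like this (function names are illustrative). The first builds straight from an iterator; the second accumulates into a mutable Buffer and converts only once the target type is known.

```scala
import scala.collection.mutable

// Build straight from an iterator, choosing the target collection at the end.
def squares(n: Int): List[Int] =
  Iterator.range(0, n).map(i => i * i).to(List)

// Accumulate into a general-purpose Buffer and convert later.
def collectThenConvert(): (Vector[String], Set[String]) =
  val buf = mutable.Buffer.empty[String]
  buf += "a"
  buf ++= Seq("b", "a")
  (buf.to(Vector), buf.to(Set)) // the same data, two different targets
```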

I really don’t see how it could be dramatically easier. Maybe we need to emphasize it more. But it’s already pretty awesome.

Did you know you can parallelize operations without bringing in the parallel collections dependency, if you’re on the JVM? Use scala.jdk.StreamConverters and your collections can become Java Streams with .asJavaParStream, and then toScala(TargetCollection) gets you back. Or you can fragment the work yourself by using collection.stepper and the methods on it.
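A minimal sketch, assuming Scala 2.13+ on the JVM. As far as I know, asJavaParStream is only offered for collections whose steppers support efficient splitting (Vector and ArraySeq do, List does not), and for Int elements it yields a java.util.stream.IntStream rather than a boxed Stream. The function name is illustrative.

```scala
import scala.jdk.StreamConverters._

// Square every element on a Java parallel stream, then collect back into a Vector.
def parallelSquares(xs: Vector[Int]): Vector[Int] =
  xs.asJavaParStream   // an IntStream for Int elements, marked parallel
    .map(x => x * x)   // Scala lambda converted to java.util.function.IntUnaryOperator
    .toScala(Vector)   // extension from StreamConverters
```

The work runs on the common fork-join pool, so this only pays off when the per-element work is substantial.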

The collections are already really awesome. They are big, true; the biggest problem is getting lost in them. The second biggest problem is that they’re nominally extensible, but there are a lot of tricks to extending them, so it’s easier to just give up. The third biggest problem is that if you want something that works generically over all collections, you need some pretty advanced concepts, even though the result is only a few lines of code. The fourth biggest problem is that there aren’t many high-level methods for mutable collections. They’re just not straightforward (which maybe isn’t unreasonable, given that this is employing some very heavy lifting to make everything work like magic).

All these problems are intentional tradeoffs, though. If it had been done differently, something else would suffer, sometimes a lot:

  • Small collections = reinvent the wheel a lot.
  • Inextensible = your stuff can never “just work”.
  • Easy application = dedicated compiler magic that you can never have for your library, and/or collections that lose types left and right (like Java).
  • Less mutable support = a concession to immutable-as-typical and already-too-big.
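As an illustration of "advanced concepts, but only a few lines", here is one way to write an operation that hands back the same collection type it was given, using scala.collection.BuildFrom (Scala 2.13+). everyOther is a hypothetical helper; the pattern follows the one in the official guide on adding custom collection operations.

```scala
import scala.collection.BuildFrom

// Keep the elements at even indices, returning the caller's collection type
// (List in, List out; Vector in, Vector out).
def everyOther[A, CC[X] <: Iterable[X]](xs: CC[A])(using bf: BuildFrom[CC[A], A, CC[A]]): CC[A] =
  bf.fromSpecific(xs)(
    xs.iterator.zipWithIndex.collect { case (a, i) if i % 2 == 0 => a }
  )
```

The type parameters and the BuildFrom evidence are the "advanced concepts"; the body itself is a one-line iterator pipeline.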

So we can discuss, but I think the existing hierarchy has an awful lot going for it.

You don’t need capture checking / capabilities for that. Any old context parameter will do.

class C():
  def +=(i: Int)(using C.Token): Unit = println(i)
object C:
  opaque type Token = Unit
  inline def build(inline f: Token ?=> C): C =
    f(using ((): Token))

This pattern allows access to methods on one class only in the context of a particular method call, which you can arrange however is convenient (e.g. to unlock builder capabilities).
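A self-contained variant of the same idea, without inline, to show the call site. Gate, Counter, withToken and demo are all hypothetical names. The point is that increment() only compiles inside a withToken block, because that is the only place a Gate.Token is in implicit scope.

```scala
object Gate:
  // Outside this object Token is abstract, so nobody else can construct one.
  opaque type Token = Unit
  def withToken[A](body: Token ?=> A): A = body(using ())

class Counter:
  private var n = 0
  // Callable only where a Gate.Token is available as a context parameter.
  def increment()(using Gate.Token): Unit = n += 1
  def value: Int = n

def demo(): Int =
  val c = Counter()
  Gate.withToken {
    c.increment()
    c.increment()
  }
  // Calling c.increment() here would not compile: no Gate.Token in scope.
  c.value
```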

But in the case of builders, having an explicit snapshot() method would probably do the trick too. It’s not that hard to sequence things in the order you mean; the trick with builders is knowing whether result() means that you’re done, or that you’re going to keep going, and for that to be safe. But you can just declare your intent, and it’s all good.
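A sketch of a builder that makes the caller declare intent, assuming hypothetical snapshot() and result() semantics as described above: snapshot() means "give me what I have so far, I'm continuing", and result() means "I'm done", after which further additions are rejected. SnapshotBuilder is not a standard library type.

```scala
import scala.collection.mutable.ArrayBuffer

final class SnapshotBuilder[A]:
  private val buf = ArrayBuffer.empty[A]
  private var finished = false

  def +=(a: A): this.type =
    require(!finished, "builder already finished")
    buf += a
    this

  // Copy of the contents so far; the builder stays usable.
  def snapshot(): Vector[A] = buf.toVector

  // Final contents; the builder refuses further additions afterwards.
  def result(): Vector[A] =
    finished = true
    buf.toVector
```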

Capture checking would prevent smuggling the Token out by grabbing it in a closure. Granted that’s not very likely, but it can happen.

Yes, so in addition to ListBuilder explicitly allowing the pattern, I would also need List.newBuilder to guarantee that it returns a ListBuilder (or use ListBuilder directly; I’ve gone back and forth on that but I like the idea of using <Type>.newBuilder for all the collections).

But the entire point of the feature is to help you keep your additions straight.

Why would you try to circumvent a feature which is there to help you avoid mistakes, unless there was a very good reason that meant that the class was actually not useful for you otherwise?

Given that Scala private often is actually JVM public, and given that JVM private is a setAccessible(true) away from being seen, and Scala immutable is JVM mutable most of the time, we’re absolutely beset by ways to intentionally do the thing that makes things tricky and dangerous.

The advantage of the permission token method is that you never see the token by default. It just looks like:

val b = Buffer.create[Int]
val c = b.build:  // Or Buffer.build--can work however you want
  b += 2
  b += 3

So you really have to want to grab it. The scoping solution makes grabbing the functionally active class more tempting.

Just scoping is fine, too, most of the time. But a permission token adds an extra level of safety because as implicit context you don’t name it.

Simple proposal:

An overload of zipAll(other) with just one parameter, which throws if the two collections are not the same length.

I noticed that all of the zip() usages in my code actually zip same-length sequences / iterators; it would be nice to be explicit about it to catch bugs.
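A sketch of the proposed behaviour as an extension method. It is named zipStrict here (a hypothetical name) to avoid clashing with the existing three-argument zipAll; unlike zip, it fails loudly on a length mismatch instead of silently truncating to the shorter side.

```scala
extension [A](xs: Iterable[A])
  def zipStrict[B](ys: Iterable[B]): List[(A, B)] =
    val left = xs.iterator
    val right = ys.iterator
    val out = List.newBuilder[(A, B)]
    while left.hasNext && right.hasNext do
      out += ((left.next(), right.next()))
    // Anything left over on either side means the lengths differed.
    if left.hasNext || right.hasNext then
      throw IllegalArgumentException("zipStrict: collections have different lengths")
    out.result()
```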

For example, the standard library needs non-empty collection types so that groupBy’s type can be properly expressed (the values of the returned Map are always non-empty).

+1 for non-empty collection types

I would add that Scala is very good at expressing domain invariants, and in my experience non-empty collections are a very common invariant, so it would be consistent with Scala’s design goals to have such types in the standard library. The fact that so many Scala ecosystems come up with their own NonEmptyList implementation is, to me, a notable sign that it is missing from the standard library.

In an ideal world we would have non-empty versions of every collection type, but even one non-empty type per family (Seq, Set, Map) would help me a lot in my day-to-day work.

There is the cons type :: in the standard library, but its methods are inherited from List, so (1 :: Nil).map(_ + 1) returns a List and the non-empty property is lost by the type system.
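A minimal sketch of what keeping the property in the type looks like. This NonEmptyList is a hypothetical stand-in, not the cats or Scalactic type: map returns a NonEmptyList rather than a List, and because groupBy never produces an empty group, each group can safely be retyped as non-empty.

```scala
final case class NonEmptyList[+A](head: A, tail: List[A]):
  // map preserves non-emptiness in the result type.
  def map[B](f: A => B): NonEmptyList[B] = NonEmptyList(f(head), tail.map(f))
  def toList: List[A] = head :: tail

// Each group from groupBy has at least one element, so g.head is safe here.
def groupByNel[A, K](xs: List[A])(key: A => K): Map[K, NonEmptyList[A]] =
  xs.groupBy(key).map { case (k, g) => k -> NonEmptyList(g.head, g.tail) }
```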

Why not use cats.data.NonEmptyList? Because it is not standard-library idiomatic and tends to push you deeper into the cats ecosystem than you actually need. A few examples:

  • cats.data.NonEmptyList doesn’t have a foreach(f: A => Unit): Unit method
  • conversions to NonEmptySet or NonEmptyMap require cats.kernel.Order instances because they are sorted implementations (unlike the base standard library Set and Map types)
  • and so on…

I think writing “plumbing” code to convert from/to standard library types whenever I need it pollutes the domain logic expressed in the code. I also want utility types that solve my problem without requiring me to adapt a significant part of the codebase.

org.scalactic.anyvals.NonEmptyList/Set/Map are closer to what I need and integrate better with standard library code thanks to implicit conversions, but Scalactic is not very popular, so third-party libraries never provide typeclass instances for these types.

I don’t have much experience with zio.prelude; it also provides implicit conversions to integrate with standard library code, but it is not free of ZIO-isms, and it feels heavy to bring in the zio and zio-streams dependencies just for a NonEmptyList type when you are not using ZIO effects.

On a different subject, built-in traverse utilities for standard library types like Either and Option would also be helpful (like Future.sequence).
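For reference, here is roughly what such helpers could look like (traverseOption and traverseEither are hypothetical names, not standard library methods). Each one short-circuits on the first None or Left.

```scala
// All-or-nothing: Some of all results, or None as soon as f fails once.
def traverseOption[A, B](xs: Iterable[A])(f: A => Option[B]): Option[List[B]] =
  val out = List.newBuilder[B]
  val it = xs.iterator
  while it.hasNext do
    f(it.next()) match
      case Some(b) => out += b
      case None    => return None
  Some(out.result())

// Same shape for Either: the first Left encountered is returned unchanged.
def traverseEither[E, A, B](xs: Iterable[A])(f: A => Either[E, B]): Either[E, List[B]] =
  val out = List.newBuilder[B]
  val it = xs.iterator
  while it.hasNext do
    f(it.next()) match
      case Right(b) => out += b
      case Left(e)  => return Left(e)
  Right(out.result())
```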

EDIT: I forgot to add that while implicit conversions may help, it would be far more convenient for the NonEmptySeq/Set/Map types to be subtypes of Seq/Set/Map.

Huge +1 for all that :clap:

I’m going to offer a counterpoint here, even though I sometimes also prefer the simplicity of iterators:

intermediary views are hard to pass around.

I don’t get what’s hard to pass around in views (they seem no harder to pass around than an iterator).

and they are inefficient if used multiple times.

That is true, but iterators can have undefined behavior if used multiple times. So I think the performance cost is still worth it.

I would argue that one should always use views unless the underlying collection cannot be kept in memory.

Actually, you can’t statefully map a collection without materializing it with fold. statefulMap / mapAccumulate would allow mapping with a state over any collection.
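A sketch of what such a method might look like, written as an extension. The name mapAccumulate and its signature (initial state, then a function from state and element to the next state and an output) are assumptions for illustration, not an existing standard library API. It returns the final state together with the mapped values.

```scala
extension [A](xs: Iterable[A])
  def mapAccumulate[S, B](s0: S)(f: (S, A) => (S, B)): (S, List[B]) =
    var state = s0
    val out = List.newBuilder[B]
    for a <- xs do
      val (next, b) = f(state, a) // thread the state through each step
      state = next
      out += b
    (state, out.result())
```

For example, a running total: List(1, 2, 3).mapAccumulate(0)((acc, x) => (acc + x, acc + x)) gives (6, List(1, 3, 6)).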

The operation is tricky to get right if it might be done in parallel unless you do it the way that, say, Akka/Pekko does. But in that case, the parallelizing library is the one that should implement it. If done sequentially, you can update a variable that is captured by the closure for the map. So I’m not entirely sure this is a great candidate, especially since scan already handles some of the use-cases:

var s = s0
xs.map{ x => s = foo(x, s); bar(x, s) }

xs.iterator
  .scanLeft((s0, Option.empty[B])) { (ans, x) =>  // B: the result type of bar
    val s = foo(x, ans._1)
    (s, Some(bar(x, s)))
  }
  .flatMap(_._2)

Both are imperfect, but we have to weigh just how imperfect these are against the cost of making the already very-large footprint of collections methods even larger.

In the case of something like groupMapReduce, the value is more substantial because creating a superfluous known-but-not-typed-nonempty collection is a pretty big divergence from intent. Here it’s less obvious.

Why just map, anyway? Why not statefulFilter, statefulCollect, statefulFlatMap, statefulPartition, and so on? Those things are actually all useful! But at some point we have to decide where we’re going to stop providing specific methods and switch to more general types of composition.

I’m not really arguing that this isn’t a good addition. I just want to see the argument about how much weight this pulls and why the boundary should be with this in, but the other obvious variants out (or why they should be in).

Sorry to interrupt the complicated things, but I was reminded of one thing on my wish list: Seq.get(idx: Int): Option[T], with the same behaviour as .lift but named like Map.get.

I guess you can put up with one allocation (Some), but two (.lift) is too much?

Edit: joke

It’s really just about the name.
list.lift(5) just … does not read like reading an element out of the list. Lift always requires these mental gymnastics of “right, I can treat this sequence like a partial function and lift it to a total function which coincidentally behaves like a ‘read element with bounds check’ method.”
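A sketch of the wish as an extension method, under the hypothetical name get from the post. It simply delegates to lift, so the only difference is the name at the call site.

```scala
extension [A](xs: collection.Seq[A])
  // Bounds-checked read: Some(element) if idx is valid, None otherwise.
  def get(idx: Int): Option[A] = xs.lift(idx)
```

With this in scope, list.get(5) reads as an element lookup, in the same way as Map.get.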

Possibly a tapNone for Option:

trait Option[+T] {
  ...
  def tapNone[U](op: => U): Option[T]
}

Or .tapEmpty on Iterable?

That saves writing .tap(optFoo => if optFoo.isEmpty then ...) or .tap(_.fold(...)(_ => ())).
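As a sketch, tapNone can already be written as an extension (this is a hypothetical user-side definition, not a standard library method). The side effect runs only when the Option is empty, and the Option is returned unchanged either way.

```scala
extension [T](opt: Option[T])
  def tapNone(op: => Unit): Option[T] =
    if opt.isEmpty then op // side effect only for None
    opt
```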

Side note: why are the tap/foreach closures generic rather than returning Unit? Just to avoid warnings about discarded non-Unit values?

The discussion is at the PR, but be forewarned: there is not as much bikeshedding there as there was over what to name tap and where it should reside.

After trying every possible combination and being annoyed and burned in various different ways, I have settled on this as the optimum for me (with value discard warnings always on):

extension [A](a: A)
  /** Apply a function to this value and return the result. */
  inline infix def pipe[B](inline f: A => B): B = f(a)

  /** Apply a side-effecting function to this value; return the original value */
  inline def tap(inline f: A => Unit): A = { f(a); a }

  /** Apply a side-effecting function to this value; discard the value. */
  inline def effect(inline f: A => Unit): Unit = f(a)

  /** Discard the value in an observable way. Use as __ Unit */
  inline infix def __(nothing: Unit.type): Unit = inline compiletime.erasedValue[A] match
    case _: Unit => compiletime.error("Cannot discard a value that is already Unit")
    case _       => ()

Because __ is alphanumeric (lowest precedence) and binding is left-to-right, you can always append a __ Unit to something you want to discard.

I find it extremely clear and easy–the discards are all explicitly documented, and I don’t need to try to remember when a discard is assumed (which might mean that I’m not actually using the function correctly).

Sadly, it flouts

@scala.annotation.compileTimeOnly("`Unit` companion object is not allowed in source; instead, use `()` for the unit value")

The annotation didn’t work at all in Scala 3 with the Scala 2 library until my PR from 2023 was recently merged, just in time for the transition to the Scala 3 library.

I might try an arrow pointing to a hole in the ground, something like x --> __.

It’s inline, so Unit never materializes. So it is compile-time only. And if it is erased as well (once that’s non-experimental), that really, really ought to be okay.

But if not, I can always create Void instead for my own use. Goodness knows I already have a lot of things for my own use. And other people who have value discard warnings on can do…whatever else.

Anyway, it is super-useful for me. After trying about five different things, this is finally one that I find works really well. (Whether it’s Unit or Void isn’t terribly important. A prominent short keyword is very helpful, though; I had it as __ () for a while, and that was far harder to pick out by eye.)

I should have said compileTimeOnly was just a hack to keep users from mistaking Unit for (), though frankly I did not foresee its dematerialization in DSLs, which apparently stands for Dematerialized Scala Larks.

Also I wasn’t sufficiently laudatory of your invention. I like the way double underscore looks like an ellipsis; and as a reactionary underscorer, I boost any effort to use more of them.

For a brief moment, I used them to align case arrows, case __________ =>. But that made patterns look like Mad Libs. (Mad Libs is not political commentary, but a word fill-in game.)

I will try your syntax first, and if I found a consultancy, I will call it The Double Underscore Unit.

Back on topic, I hope the standard library becomes more inventive; the discussion about a symbolic |> that I revisited earlier today was unfortunately constrained.
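For context, a symbolic pipe is a one-line extension in user code. The sketch below is a hypothetical definition, not a standard library method (the standard library offers the alphanumeric pipe in scala.util.chaining).

```scala
extension [A](a: A)
  // x |> f is f(x), read left to right.
  def |>[B](f: A => B): B = f(a)
```

Because |> starts with |, it binds more loosely than == and arithmetic operators, so a chain like 3 |> (_ + 1) |> (_ * 2) needs parentheses around it before being compared with anything.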