Standard Library: Now open for improvements and suggestions!

Hello everyone, on behalf of the Scala Core team, we would like to announce that as of Scala 3.8.0, the standard library is fully open to new improvements (including collections and other core data types that have been frozen since 2.13.0).

This can also include additions to the other modules under the org.scala-lang Maven namespace (or even new ones).

We now have funds from the Sovereign Tech Agency to develop, review and integrate changes.

We will soon announce a new lightweight process to get API changes approved. We want a process that is constructive, fast-moving, and with community participation.

Until the process is fully announced, we would like to use this forum post to gather suggestions for how Scala’s standard library should evolve, and to identify common problems.

As always, the scala/scala3 repo is open for pull requests.

Background: we have previously included community suggestions or other ideas for extending the library in repositories such as:


There was also some previous discussion in StdLib Extensions · Issue #22006 · scala/scala3 · GitHub

Any thoughts about using Type Classes as the relation between collection types in Scala 3? I’ve been writing an alternative standard library as a side project, and I implemented a version of TreeSet with a toset typeclass, and it seems to work pretty well so far.

I’d really love the addition of a .groupBy variant that assumes unique keys, just for the convenience and type safety of not having to write .map((k, vs) => (k, vs.head)).

Also, groupByHead?

Example implementations for List as of today:

extension [A](items: List[A]) {
  def groupByOne[B](mkKey: A => B): Map[B, A] = items.groupBy(mkKey).view.mapValues(_.head).toMap
}

extension [T <: Tuple](items: List[T]) {
  def groupByHead: Map[Tuple.Head[T], List[Tuple.Tail[T]]] = items.groupBy(_.head).view.mapValues(_.map(_.tail)).toMap
  //and maybe a groupByHeadOne or something similar
}


groupByHead: is this for something like grouping database rows by their primary key and then dropping it?

I found the following is missing:

  • groupByOrdered[P](f: T => P) that preserves value ordering Iterable[T] -> List[(P, List[T])]
  • groupBy that can accept different compare and hash functions than the default ones.
  • invert function Map[K,V] -> Map[V, Set[K]]
  • invert function ListMap[K,V] -> ListMap[V, ListSet[K]]
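
To make two of the items above concrete, here is a minimal sketch of an order-preserving groupBy and a Map inversion, written as extension methods. The names groupByOrdered and invert come from the list above; the implementations are only one possible approach (an insertion-ordered mutable map as an intermediate), not a proposed library design.

```scala
import scala.collection.mutable

extension [T](items: Iterable[T])
  // Groups by key while keeping keys in first-seen order
  // and values in their original order within each group.
  def groupByOrdered[P](f: T => P): List[(P, List[T])] =
    val groups = mutable.LinkedHashMap.empty[P, mutable.ListBuffer[T]]
    for item <- items do
      groups.getOrElseUpdate(f(item), mutable.ListBuffer.empty[T]) += item
    groups.iterator.map((key, values) => (key, values.toList)).toList

extension [K, V](m: Map[K, V])
  // Maps each value to the set of keys that pointed at it.
  def invert: Map[V, Set[K]] =
    m.groupMap(_._2)(_._1).view.mapValues(_.toSet).toMap

val grouped  = List("b1", "a1", "b2", "a2").groupByOrdered(_.head)
val inverted = Map(1 -> "x", 2 -> "x", 3 -> "y").invert
```

The ListMap/ListSet variant and a groupBy with custom comparison and hash functions are not covered by this sketch.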

Is there something like .freqs or .frequencies equivalent to: .groupMapReduce(identity)(_ => 1)(_ + _) which I see often requested, or is it maybe not necessary?

Edit: I just realized I missed a great opportunity for a pun: “… which I see frequently requested, …”
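
For reference, the frequencies helper asked about above could be sketched as a thin extension over groupMapReduce. The method name frequencies is just the one floated in this post, not an existing standard library member.

```scala
extension [A](items: Iterable[A])
  // Counts how many times each distinct element occurs.
  def frequencies: Map[A, Int] =
    items.groupMapReduce(identity)(_ => 1)(_ + _)

val letterCounts = "mississippi".toList.frequencies
```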


+1 on groupByHead and frequencies. I’m pretty sure those are 90% of my uses of groupMapReduce.

groupByHead: is this for something like grouping database rows by their primary key and then dropping it?

Yup, this is how I usually use it (although I technically don’t drop the key).
I wouldn’t say “database rows”, because in those cases you can usually move the computation to the DB, but think something like processing CSVs.

For example, say you have a CSV of users, that you parse to get User(id: String, name: String, age: Int)

It’s common to have a def parseCsv[T](file: String): List[T], but in this case you want a Map[String, User] to efficiently fetch users by ID.
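
A small sketch of that use case, reusing the groupByOne extension posted earlier in the thread. The users list is hand-written sample data standing in for the result of parseCsv, which is not implemented here.

```scala
final case class User(id: String, name: String, age: Int)

extension [A](items: List[A])
  // Same definition as posted earlier: keeps one element per key.
  def groupByOne[B](mkKey: A => B): Map[B, A] =
    items.groupBy(mkKey).view.mapValues(_.head).toMap

// Sample data, assumed to be what a CSV parser would return.
val users = List(User("u1", "Ada", 36), User("u2", "Grace", 45))

// Index by ID for efficient lookup.
val usersById: Map[String, User] = users.groupByOne(_.id)
```

Note that this version silently keeps the first element when keys collide, so the "unique keys" assumption is not checked.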


There are two previous topics which I think should be scanned for things we have already discussed about the possible Scala 3 specific stdlib improvements:


I view use of ChainingSyntax as an antipattern specifically because it’s not inlined. It non-obviously makes some operations dramatically slower. If it’s inlined then it would actually be a plus rather than a trap! This should be a high priority; if not, I think ChainingSyntax should just be removed. You never need it, the amount it helps is modest, and the potential for unexpected performance hits is high.

(I use tap and pipe all the time in Scala 3, but my own versions which use the inline definition, not the ChainingSyntax ones.)
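
For readers wondering what an inline definition might look like, here is a minimal sketch using Scala 3 inline extension methods. This is not the poster's actual code, and the object name InlineChaining is made up for illustration. The idea is that the function argument is an inline parameter, so the compiler should be able to expand the call in place instead of allocating a wrapper and a closure.

```scala
object InlineChaining:
  extension [A](a: A)
    // Runs a side effect on the value, then returns the value unchanged.
    inline def tap[U](inline f: A => U): A =
      f(a)
      a
    // Applies f to the value and returns the result.
    inline def pipe[B](inline f: A => B): B = f(a)

import InlineChaining.*

val seen   = scala.collection.mutable.ListBuffer.empty[Int]
val result = 21.pipe(_ * 2).tap(seen += _)
```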

Regarding infix, I think it is a much lower priority unless we have a more principled way to deal with non-Scala libraries. A lot of Java libraries have methods that very naturally work as infix, but you get a million warnings, so the only practical method to use them as infix is to turn warnings off. But if the solution to infix is to turn warnings off, it works just fine on the standard library too.

So, yes, let’s do it. But I wouldn’t worry about it much; the infix restriction is still a painful experience for people who write anything infix, and the workarounds work for everyone (e.g. using backticks).


see chore: deprecate `scala.util.ChainingOps` by hamzaremmal · Pull Request #24725 · scala/scala3 · GitHub

Unfortunately, I have not been able to use Scala for a while. The last time I did, I missed mapping functions on Tuple: something like (1,…).map1(_.toString), which would allow changing the type of one element without having to repeat all the others. There is already map, which applies to all elements. IMO that would be a nice addition. I am not sure whether this should be available for named tuples as well; I guess there one would also need to be able to define the new name.


I hope we could apply the same pattern to named tuples.

Would map1 only map the first element?

If not, this should be solved instead by adding polymorphic eta-expansion:

val f: [T] => T => String = _.toString // Should work, but currently doesn't

What about an Ordering instance for simple enums such as described in this thread? Derived Ordering for simple enum? - #8 by philipschwarz
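
As a point of comparison, this is roughly what has to be written by hand today for a simple enum: an Ordering based on declaration order via ordinal. The enum Priority is an invented example, not taken from the linked thread.

```scala
enum Priority:
  case Low, Medium, High

object Priority:
  // Orders cases by their declaration position.
  given Ordering[Priority] = Ordering.by[Priority, Int](_.ordinal)

val sortedPriorities = List(Priority.High, Priority.Low, Priority.Medium).sorted
```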


Just to reiterate, there can be more changes than the odd collection operation here or there:

  • new collection types?
  • error handling/validation
  • “we rely on java api too much for X”
  • should we standardise some patterns from the ecosystem (or is too much left to the ecosystem)?
  • what should we add to other “core” scala modules (not necessarily scala-library)?

Oh one thing I would like is a generalization of Either (aka tagged union), maybe something like:

val foo: TaggedUnion[(A: Int, B: String, C: Int)] = ???

foo match
case TaggedUnion.A(x) => // x: Int
case TaggedUnion.B(x) => // x: String
case TaggedUnion.C(x) => // x: Int

I don’t know what the syntax should be but it should:

  1. Be constructed with as little boilerplate as possible (hence the named union in the example)
  2. Support pattern matching
  3. Support conditionals, a kind of .nonEmpty but for each tag
  4. Support safe access, a kind of .getOption but for each tag
  5. Probably not be a monad
    1. For example no “.map is the same as .left.map”
  6. Maybe be interoperable with other type instances: x: TaggedUnion[(A: Int)] is also a valid TaggedUnion[(A: Int, B: String)]
  7. Preferably the order of the tags should not matter: TaggedUnion[(A: Int, B: Int)] =:= TaggedUnion[(B: Int, A: Int)]
  8. Does not have to be user-constructible, if we need to have some special case for it in the compiler, it’s fine for me

Another way to achieve this is to create tag types like @Ichoran has done (IIRC) so that you have Tag["A", Int] | Tag["B", Int]
(This seems cleaner as a foundation, but a bit wordy in user programs, so maybe we can add an alias and/or desugaring)


Another useful thing would be a list with a length known at compile time,
for example Vector[Int, 4].

And automatic conversion between tuple literals (and only literals!) and collections:

val xs: List[Int] = (1, 2, 3)
val ys: Vector[Int, 4] = (1, 2, 3, 4)
val zs: List[Int] = myTuple // error: got Tuple[...], but expected List[Int]

My tagged types are almost isomorphic to arity-1 named tuples. So this would be (a: Int) | (b: Int), which doesn’t resolve favorably save with sneaky inline compile-time dispatching.

My use case is the opposite: to make sure disjoint Ints are not confused with each other. For instance, if you have a start index and a length, then def slice(i0: Int \ "start", n: Int \ "length") would prevent errors like slice(5, 10) intending that you get elements 5, 6, 7, 8, and 9. You’d have to write slice(5 \ "start", 10 \ "length") at which point it’s blindingly obvious that you’re using the API wrong.
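
A rough sketch of how such labelled values could be modelled with an opaque type, so that a bare Int is rejected where a labelled one is expected. This is an illustration only and not the poster's library; the object name Labels, the unlabel accessor, and the sliceOf helper are all invented for the example.

```scala
object Labels:
  // Outside this object, `A \ L` is neither a subtype nor a supertype of A.
  opaque type \[A, L <: String & Singleton] = A

  extension [A](a: A)
    // Attaches a label; L is inferred as the string literal's singleton type.
    def \[L <: String & Singleton](label: L): A \ L = a

  extension [A, L <: String & Singleton](labeled: A \ L)
    // Recovers the underlying value.
    def unlabel: A = labeled

import Labels.*

def sliceOf[T](xs: Vector[T])(i0: Int \ "start", n: Int \ "length"): Vector[T] =
  xs.slice(i0.unlabel, i0.unlabel + n.unlabel)

val digits = Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val middle = sliceOf(digits)(5 \ "start", 3 \ "length")
// sliceOf(digits)(5, 3) is rejected: Int does not conform to Int \ "start".
```

Because the alias is opaque, the label exists only at compile time and costs nothing at runtime.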

The key difference is that the tagged types are neither subtypes nor supertypes of the type that is being tagged. With named tuples, you can slice((start = 5), (length = 5)) but you can also just slice(Tuple(5), Tuple(10)), which loses the names.

Anyway, if you could force named parameters to be used, this would cover probably 70% of use cases. And even so, I only use this in cases where it is very, very important that I don’t accidentally switch same-typed values.

Sum types that need to be distinguished at runtime have to store extra information.

Tagged unions are a cool idea; they’re just a different one! I don’t have a great way to do that at the library level off the top of my head. The thing you can’t express cleanly is that an N-arity union is a supertype of an (N-M)-arity union with the same names but M alternatives missing. That seems very natural and desirable, but I think you would need explicit compiler support for it.

The NamedTuple mostly-library-level solutions are fragile and don’t always give great error messages even for named tuples, where the subset identity stuff isn’t pushed on very hard.


Yes, only the first. Likewise, mapN would create a copy with all values unchanged except for the nth element, which is transformed with the given function.
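
A minimal sketch of what map1 could look like for plain tuples, assuming it is written as an extension on non-empty tuples. The generic mapN and the named-tuple variant are not covered here.

```scala
extension [H, T <: Tuple](t: H *: T)
  // Transforms only the first element; the rest of the tuple is reused as-is.
  def map1[B](f: H => B): B *: T = t match
    case head *: rest => f(head) *: rest

val mapped = (1, "a", true).map1(_.toString)
```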