Hello everyone, on behalf of the Scala Core team, we would like to announce that as of Scala 3.8.0, the standard library is fully open to new improvements (including collections and other core data types that have been frozen since 2.13.0).
This can also include additions to the other modules under the org.scala-lang Maven namespace (or even new ones).
We now have funds from the Sovereign Tech Agency to develop, review and integrate changes.
We will soon announce a new lightweight process to get API changes approved. We want a process that is constructive, fast-moving, and with community participation.
Until the process is fully announced, we would like to use this forum post to seek suggestions for where Scala’s standard library should evolve, and to identify common problems.
As always the scala/scala3 repo is open for pull requests.
Background: we have previously included community suggestions or other ideas for extending the library in repositories such as:
Any thoughts about using Type Classes as the relation between collection types in Scala 3? I’ve been writing an alternative standard library as a side project, and I implemented a version of TreeSet with a toset typeclass, and it seems to work pretty well so far.
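As a rough illustration of the idea (the names `ToSet` and `distinctCount` are hypothetical, not the actual side project's API), a typeclass could relate collection types without a shared inheritance hierarchy:

```scala
// Hypothetical sketch: relating collection types via a typeclass
// instead of the inheritance-based design of the 2.13 collections.
trait ToSet[C[_]]:
  def toSet[A](c: C[A]): Set[A]

object ToSet:
  given ToSet[List] with
    def toSet[A](c: List[A]): Set[A] = c.toSet
  given ToSet[Vector] with
    def toSet[A](c: Vector[A]): Set[A] = c.toSet

import ToSet.given

// Generic code is written against the typeclass, not a common supertype.
def distinctCount[C[_], A](c: C[A])(using ts: ToSet[C]): Int =
  ts.toSet(c).size
```

One design consequence: new collection types can join the hierarchy retroactively by supplying an instance, without editing the collection itself.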
I’d really love the addition of a .groupBy variant that assumes unique keys, just for the convenience and type safety of not having to .map((k, vs) => (k, vs.head)).
Also, groupByHead?
Example implementations for List as of today:
extension [A](items: List[A]) {
  def groupByOne[B](mkKey: A => B): Map[B, A] =
    items.groupBy(mkKey).view.mapValues(_.head).toMap
}
// Note: the bound must be NonEmptyTuple, not Tuple, for .head and .tail to exist
extension [T <: NonEmptyTuple](items: List[T]) {
  def groupByHead: Map[Tuple.Head[T], List[Tuple.Tail[T]]] =
    items.groupBy(_.head).view.mapValues(_.map(_.tail)).toMap
  // and maybe a groupByHeadOne or something similar
}
Is there something like .freqs or .frequencies equivalent to: .groupMapReduce(identity)(_ => 1)(_ + _) which I see often requested, or is it maybe not necessary?
Edit: I just realized I missed a great opportunity for a pun: “… which I see frequently requested, …”
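For concreteness, a minimal sketch of what such an extension could look like (the name `frequencies` is just the one floated above):

```scala
// Hypothetical frequencies extension, defined exactly as the
// groupMapReduce one-liner from the post above.
extension [A](items: Seq[A])
  def frequencies: Map[A, Int] =
    items.groupMapReduce(identity)(_ => 1)(_ + _)
```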
+1 on groupByHead and frequencies. I’m pretty sure those are 90% of my uses of groupMapReduce.
groupByHead: is this for something like grouping database rows by the primary key and then dropping it?
Yup, this is how I usually use it (although I technically don’t drop the key).
I wouldn’t say “database rows”, because in those cases you can usually move the computation to the DB, but think something like processing CSVs.
For example, say you have a CSV of users, that you parse to get User(id: String, name: String, age: Int)
It’s common to have a def parseCsv[T](file: String): List[T], but in this case you want a Map[String, User] to efficiently fetch users by ID.
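Putting that together, a self-contained sketch (here `groupByOne` is inlined from the earlier post, and the `users` list stands in for the output of a hypothetical `parseCsv[User]`):

```scala
case class User(id: String, name: String, age: Int)

// groupByOne as proposed earlier in the thread
extension [A](items: List[A])
  def groupByOne[B](mkKey: A => B): Map[B, A] =
    items.groupBy(mkKey).view.mapValues(_.head).toMap

// stand-in for the result of parseCsv[User](file)
val users = List(User("1", "Ann", 30), User("2", "Bob", 41))

// efficient lookup by ID, assuming IDs are unique
val byId: Map[String, User] = users.groupByOne(_.id)
```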
There are two previous topics which I think should be scanned for things we have already discussed regarding possible Scala 3-specific stdlib improvements:
I view use of ChainingSyntax as an antipattern specifically because it’s not inlined. It non-obviously makes some operations dramatically slower. If it’s inlined then it would actually be a plus rather than a trap! This should be a high priority; if not, I think ChainingSyntax should just be removed. You never need it, the amount it helps is modest, and the potential for unexpected performance hits is high.
(I use tap and pipe all the time in Scala 3, but my own versions which use the inline definition, not the ChainingSyntax ones.)
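For readers unfamiliar with the distinction: a sketch of such inline variants (the names `tapInline`/`pipeInline` are made up here; the commenter's own versions may differ). Because the lambda argument is an `inline` parameter, it is beta-reduced at the call site instead of allocating a closure, which is the performance trap the non-inline `scala.util.chaining` versions can hit:

```scala
// Sketch of inline tap/pipe; names are hypothetical.
extension [A](a: A)
  // run a side effect, return the value unchanged
  inline def tapInline(inline f: A => Unit): A = { f(a); a }
  // apply a function in pipeline order
  inline def pipeInline[B](inline f: A => B): B = f(a)
```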
Regarding infix, I think it is a much lower priority unless we have a more principled way to deal with non-Scala libraries. A lot of Java libraries have methods that very naturally work as infix, but you get a million warnings, so the only practical method to use them as infix is to turn warnings off. But if the solution to infix is to turn warnings off, it works just fine on the standard library too.
So, yes, let’s do it. But I wouldn’t worry about it much; the infix restriction is still a painful experience for people who write anything infix, and the workarounds work for everyone (e.g. using backticks).
Unfortunately, I could not use Scala for a while. Last time, I missed mapping functions on Tuple: something like (1, …).map1(_.toString), allowing one to change the type of a single element without having to repeat all the other elements. There is already map, which applies to all elements. IMO that would be a nice addition. I’m not sure whether this should be available for named tuples as well; I guess there one would also need to be able to define the new name.
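To make the shape of the proposal concrete, here is an arity-2 sketch (a real version covering all arities would presumably need match types or a macro; `map1`/`map2` here are illustrative only):

```scala
// Arity-2 sketch of the proposed mapN operations:
// transform one element, keep the rest and their types unchanged.
extension [A, B](t: (A, B))
  def map1[C](f: A => C): (C, B) = (f(t._1), t._2)
  def map2[C](f: B => C): (A, C) = (t._1, f(t._2))
```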
Oh one thing I would like is a generalization of Either (aka tagged union), maybe something like:
val foo: TaggedUnion[(A: Int, B: String, C: Int)] = ???
foo match
case TaggedUnion.A(x) => // x: Int
case TaggedUnion.B(x) => // x: String
case TaggedUnion.C(x) => // x: Int
I don’t know what the syntax should be but it should:
Be constructed with as little boilerplate as possible (hence the named union in the example)
Support pattern matching
Support conditionals, a kind of .nonEmpty but for each tag
Support safe access, a kind of .getOption but for each tag
Probably not be a monad
For example no “.map is the same as .left.map”
Maybe be interoperable with wider instances: an x: TaggedUnion[(A: Int)] is also a valid TaggedUnion[(A: Int, B: String)]
Preferably the order of the tags should not matter: TaggedUnion[(A: Int, B: Int)] =:= TaggedUnion[(B: Int, A: Int)]
Does not have to be user-constructible, if we need to have some special case for it in the compiler, it’s fine for me
Another way to achieve this is to create tag types like @Ichoran has done (IIRC) so that you have Tag["A", Int] | Tag["B", Int]
(This seems cleaner as a foundation, but a bit wordy in user programs, so maybe we can add an alias and/or desugaring)
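A rough sketch of such a tag-based encoding (this is my guess at the shape, not @Ichoran's actual implementation; `Tag` and `show` are illustrative names). The Singleton bound makes the compiler keep the literal name in the type, while the stored name field allows runtime discrimination:

```scala
// Hypothetical encoding: a value tagged with a type-level name,
// so unions like Tag["A", Int] | Tag["B", Int] can be told apart at runtime.
final case class Tag[N <: String & Singleton, A](name: N, value: A)

val a = Tag("A", 1) // inferred as Tag["A", Int]
val b = Tag("B", 2) // inferred as Tag["B", Int]

// A function accepting any Int-valued tag, regardless of its name
def show(t: Tag[?, Int]): String = s"${t.name}: ${t.value}"

// Both members of the union conform to Tag[?, Int]
val u: Tag["A", Int] | Tag["B", Int] = a
```

As the wordiness concern suggests, every construction site repeats the name both as a type and as a value; an alias or desugaring would have to eliminate that duplication.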
My tagged types are almost isomorphic to arity-1 named tuples. So this would be (a: Int) | (b: Int), which doesn’t resolve favorably save with sneaky inline compile-time dispatching.
My use case is the opposite: to make sure disjoint Ints are not confused with each other. For instance, if you have a start index and a length, then def slice(i0: Int \ "start", n: Int \ "length") would prevent errors like slice(5, 10) intending that you get elements 5, 6, 7, 8, and 9. You’d have to write slice(5 \ "start", 10 \ "length") at which point it’s blindingly obvious that you’re using the API wrong.
The key difference is that the tagged types are neither subtypes nor supertypes of the type that is being tagged. With named tuples, you can slice((start = 5), (length = 5)) but you can also just slice(Tuple(5), Tuple(10)), which loses the names.
Anyway, if you could force named parameters to be used, this would cover probably 70% of use cases. And even so, I only use this in cases where it is very, very important that I don’t accidentally switch same-typed values.
Sum types that need to be distinguished at runtime have to store extra information.
Tagged unions are a cool idea; they’re just a different one! I don’t have a great way to do that at the library level off the top of my head. The thing you can’t express cleanly is that an N-arity union is a supertype of an (N-M)-arity union with the same names but M alternatives missing. That seems very natural and desirable, but I think you would need explicit compiler support for it.
The NamedTuple mostly-library-level solutions are fragile and don’t always give great error messages even for named tuples, where the subset identity stuff isn’t pushed on very hard.
Only the first, yes. Likewise, mapN would create a copy with all values unchanged except for the nth element, whose value is transformed with the given function.