On this forum, each project supported through this investment has its own dedicated thread.
This thread covers the work package "Maintenance of the Standard Library/Core Library Modules and APIs". We will use it to share the project overview, a roadmap with key milestones, ongoing progress updates, and opportunities to engage, so that we can hear ideas from the community and encourage contributions.
The goal of this project is to carry out overdue maintenance work on the standard library and to propose new changes now that the five-year feature freeze during the transition to Scala 3 has ended. The scope covers all library modules under the Scala organisation.
Current Team
Jamie Thompson (Lead), Nguyen Pham, Solal Pirelli, Hamza Remmal
What does it mean for the community?
Now that the Scala 2.13 library sources have been copied into Scala 3's repository, as of Scala 3.8.0 the standard library is fully open to improvements (including the collections and other core data types that have been frozen since 2.13.0). This means we are free to integrate Scala 3 features such as explicit nulls into the existing library, add new APIs, and improve the performance of existing ones, all while maintaining backward binary compatibility.
Background: we have previously included community suggestions or proposed ideas for extending the library in repositories such as:
These previous efforts to integrate community contributions have been passive. With this project, we now have dedicated funds to put in place guidelines and a lightweight process for systematically gathering and reviewing community-led suggestions for the standard library, overseen by experts.
Exploration areas
This list is non-exhaustive and serves as examples:
Improvements to existing core APIs: collections, error handling, etc.
Integration of existing APIs with Scala 3 features: derivation, inline, nullability, etc.
Improving the onboarding ramp for “basic” Scala programming: scala/toolkit
For completely new ideas (e.g. a new Result type), we propose incubating them in lampepfl/steps.
Communication
The purpose of this thread is to report on actions taken, to communicate the new process for community-led contributions, and to seek feedback from the community:
Any thoughts about using type classes as the relation between collection types in Scala 3? I’ve been writing an alternative standard library as a side project, and I implemented a version of TreeSet with a toset type class; it seems to work pretty well so far.
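For illustration only, here is one hypothetical shape such a type class could take. The poster's actual code is not shown, and `ToSet`, `toSetOf` and the instance names are invented for this sketch. The idea is that the type class ties each target set type to the evidence it needs, so `TreeSet` is only available when an `Ordering` is in scope:

```scala
import scala.collection.immutable.TreeSet

// Hypothetical type class (not a stdlib API): builds a set type S from elements of type A.
trait ToSet[S[_], A]:
  def fromIterable(xs: Iterable[A]): S[A]

object ToSet:
  // Hash-based Set: no extra evidence is required.
  given hashSet[A]: ToSet[Set, A] with
    def fromIterable(xs: Iterable[A]): Set[A] = xs.toSet

  // TreeSet: only available when an Ordering[A] is in scope.
  given treeSet[A: Ordering]: ToSet[TreeSet, A] with
    def fromIterable(xs: Iterable[A]): TreeSet[A] = TreeSet.from(xs)

extension [A](xs: Iterable[A])
  // Hypothetical helper: the caller picks the target set type, e.g. xs.toSetOf[TreeSet].
  def toSetOf[S[_]](using ts: ToSet[S, A]): S[A] = ts.fromIterable(xs)
```

With this, `List(3, 1, 2).toSetOf[TreeSet]` produces a sorted set, while asking for a `TreeSet` of elements that have no `Ordering` fails at compile time instead of at runtime.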
I’d really love the addition of a .groupBy variant that assumes unique keys, just for the convenience and type safety of not having to write .map((k, vs) => (k, vs.head)).
Also, groupByHead?
Example implementations for List as of today:
extension [A](items: List[A]) {
  def groupByOne[B](mkKey: A => B): Map[B, A] = items.groupBy(mkKey).view.mapValues(_.head).toMap
}
extension [T <: Tuple](items: List[T]) {
  def groupByHead: Map[Tuple.Head[T], List[Tuple.Tail[T]]] = items.groupBy(_.head).view.mapValues(_.map(_.tail)).toMap
  // and maybe a groupByHeadOne or something similar
}
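For comparison, here is a sketch of how both helpers might be written on top of the existing 2.13 `groupMapReduce` and `groupMap`, which avoids the intermediate `view.mapValues` step. To keep the types simple, this `groupByHead` only covers pairs. The names are the ones proposed above, not existing stdlib methods:

```scala
extension [A](items: List[A])
  // Keeps the first element seen for each key; no per-key List is built.
  def groupByOne[K](mkKey: A => K): Map[K, A] =
    items.groupMapReduce(mkKey)(a => a)((first, _) => first)

extension [K, V](items: List[(K, V)])
  // Pair-only variant of groupByHead: group by the first component, keep the second.
  def groupByHead: Map[K, List[V]] =
    items.groupMap(_._1)(_._2)
```

For pairs, `groupByHead` is already expressible as `groupMap(_._1)(_._2)`. The generic-tuple version above is where a dedicated method would add the most.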
Is there something like .freqs or .frequencies, equivalent to .groupMapReduce(identity)(_ => 1)(_ + _), which I see often requested? Or is it not necessary?
Edit: I just realized I missed a great opportunity for a pun: “… which I see frequently requested, …”
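A minimal sketch of that helper as a user-defined extension, assuming the name `frequencies` (there is no such method in the standard library today). It just wraps the `groupMapReduce` idiom quoted above:

```scala
extension [A](items: Iterable[A])
  // Hypothetical helper: counts the occurrences of each distinct element.
  def frequencies: Map[A, Int] =
    items.groupMapReduce(identity)(_ => 1)(_ + _)
```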
+1 on groupByHead and frequencies. I’m pretty sure those are 90% of my uses of groupMapReduce.
groupByHead: is this for something like grouping database rows by primary key and dropping the key?
Yup, this is how I usually use it (although I technically don’t drop the key).
I wouldn’t say “database rows”, because in those cases you can usually move the computation to the DB, but think something like processing CSVs.
For example, say you have a CSV of users, that you parse to get User(id: String, name: String, age: Int)
It’s common to have a def parseCsv[T](file: String): List[T], but in this case you want a Map[String, User] to efficiently fetch users by ID.
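With today's API, that map can be built in a couple of ways. The sketch below uses the `User` shape from the post, with a hard-coded list standing in for the result of the unspecified `parseCsv`. It also shows one reason a dedicated method could help: the two common idioms silently disagree on which row wins when an ID is duplicated.

```scala
final case class User(id: String, name: String, age: Int)

// Stand-in for the output of a parseCsv[User](...) call (hypothetical data).
val users: List[User] = List(
  User("u1", "Ada", 36),
  User("u2", "Grace", 45),
  User("u1", "Ada (duplicate row)", 36)
)

// `.toMap` keeps the LAST row for each id...
val lastById: Map[String, User] = users.map(u => u.id -> u).toMap

// ...while groupMapReduce can be told to keep the FIRST row for each id.
val firstById: Map[String, User] =
  users.groupMapReduce(_.id)(u => u)((first, _) => first)
```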
There are two previous topics that I think should be scanned for Scala 3-specific stdlib improvements we have already discussed:
I view use of ChainingSyntax as an antipattern specifically because it’s not inlined. It non-obviously makes some operations dramatically slower. If it were inlined, it would actually be a plus rather than a trap! This should be a high priority; if it isn't, I think ChainingSyntax should just be removed. You never need it, the amount it helps is modest, and the potential for unexpected performance hits is high.
(I use tap and pipe all the time in Scala 3, but my own versions which use the inline definition, not the ChainingSyntax ones.)
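The poster's own definitions aren't shown. A minimal sketch of what inline versions might look like, assuming Scala 3 `inline` extension methods, is:

```scala
// Sketch of inlined chaining helpers (an assumed design, not the poster's code).
// `self` is an ordinary parameter, so the receiver is evaluated exactly once;
// only the function argument is `inline`, so a lambda literal is inlined away.
extension [A](self: A)
  inline def tap(inline f: A => Unit): A =
    f(self)
    self

  inline def pipe[B](inline f: A => B): B = f(self)
```

The receiver is deliberately not `inline`. Marking it `inline` as well would duplicate the receiver expression inside `tap`, evaluating any side effects twice.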
Regarding infix, I think it is a much lower priority unless we have a more principled way to deal with non-Scala libraries. A lot of Java libraries have methods that very naturally work as infix, but you get a million warnings, so the only practical method to use them as infix is to turn warnings off. But if the solution to infix is to turn warnings off, it works just fine on the standard library too.
So, yes, let’s do it. But I wouldn’t worry about it much; the infix restriction is still a painful experience for people who write anything infix, and the workarounds work for everyone (e.g. using backticks).
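To make the two workarounds concrete, here is a small sketch. It shows a Java method (`java.math.BigInteger.add`, chosen only as an example) used infix via backticks, and a Scala method that opts in with the `infix` modifier. Whether the un-backticked Java call warns depends on the Scala version and `-source` level, so that form is not shown.

```scala
import java.math.BigInteger

val two = BigInteger.valueOf(2)
val three = BigInteger.valueOf(3)

// Java methods cannot be marked `infix`; backticks opt in explicitly at the call site.
val five = two `add` three

// Scala-defined methods can opt in once, at the definition site.
case class Meters(value: Double):
  infix def plus(other: Meters): Meters = Meters(value + other.value)

val total = Meters(1.5) plus Meters(2.0)
```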
Unfortunately, I haven't been able to use Scala for a while, but the last time I did, I missed mapping functions on Tuple. Something like (1,…).map1(_.toString), which would let you change the type of one element without repeating all the others. There is already map, which applies to all elements. IMO that would be a nice addition. I'm not sure whether this should be available for named tuples as well; I guess there one would need to be able to define the new name too.
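A minimal sketch of `map1` for plain (unnamed) tuples, assuming it is written as a user-level extension on Scala 3's generic `*:` tuple type. Named tuples, and the question of renaming, are not covered:

```scala
// Hypothetical extension, not a stdlib method: transform only the first element
// of a non-empty tuple, keeping the types of the remaining elements intact.
extension [A, T <: Tuple](t: A *: T)
  def map1[B](f: A => B): B *: T = t match
    case a *: rest => f(a) *: rest
```

Because the result type is `B *: T`, `(1, "x", true).map1(_.toString)` is statically typed as `(String, String, Boolean)`.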
Oh, one thing I would like is a generalization of Either (a.k.a. a tagged union), maybe something like:
val foo: TaggedUnion[(A: Int, B: String, C: Int)] = ???
foo match
  case TaggedUnion.A(x) => // x: Int
  case TaggedUnion.B(x) => // x: String
  case TaggedUnion.C(x) => // x: Int
I don’t know what the syntax should be but it should:
Be constructed with as little boilerplate as possible (hence the named union in the example)
Support pattern matching
Support conditionals, a kind of .nonEmpty but for each tag
Support safe access, a kind of .getOption but for each tag
Probably not be a monad
For example no “.map is the same as .left.map”
Maybe be interoperable with other instantiations: x: TaggedUnion[(A: Int)] is also a valid TaggedUnion[(A: Int, B: String)]
Preferably the order of the tags should not matter: TaggedUnion[(A: Int, B: Int)] =:= TaggedUnion[(B: Int, A: Int)]
Not necessarily be user-constructible; if we need a special case for it in the compiler, that’s fine by me
Another way to achieve this is to create tag types like @Ichoran has done (IIRC) so that you have Tag["A", Int] | Tag["B", Int]
(This seems cleaner as a foundation, but a bit wordy in user programs, so maybe we can add an alias and/or desugaring)
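For comparison, this is roughly what has to be written by hand today for the three-tag TaggedUnion example above, using an ordinary Scala 3 `enum`. `Foo`, `isA` and `getA` are illustrative names, and only tag `A` gets helpers, to keep it short. It covers pattern matching, per-tag conditionals and safe access. It does not cover boilerplate-free construction, subset compatibility, or order independence.

```scala
// Hand-written stand-in for TaggedUnion[(A: Int, B: String, C: Int)].
enum Foo:
  case A(x: Int)
  case B(x: String)
  case C(x: Int)

  // Per-tag conditional, in the spirit of `.nonEmpty` for one tag.
  def isA: Boolean = this match
    case Foo.A(_) => true
    case _        => false

  // Per-tag safe access, in the spirit of `.getOption` for one tag.
  def getA: Option[Int] = this match
    case Foo.A(x) => Some(x)
    case _        => None
```

Every additional tag needs its own pair of helpers, which is the boilerplate the proposal is trying to remove.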
My tagged types are almost isomorphic to arity-1 named tuples. So this would be (a: Int) | (b: Int), which doesn’t resolve favorably save with sneaky inline compile-time dispatching.
My use case is the opposite: to make sure disjoint Ints are not confused with each other. For instance, if you have a start index and a length, then def slice(i0: Int \ "start", n: Int \ "length") would prevent errors like slice(5, 10) intending that you get elements 5, 6, 7, 8, and 9. You’d have to write slice(5 \ "start", 10 \ "length") at which point it’s blindingly obvious that you’re using the API wrong.
The key difference is that the tagged types are neither subtypes nor supertypes of the type that is being tagged. With named tuples, you can slice((start = 5), (length = 5)) but you can also just slice(Tuple(5), Tuple(10)), which loses the names.
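A minimal sketch of this kind of tag using a Scala 3 opaque type. The `\` spelling follows the post, but the encoding (the `Tagged` object and the `value` accessor) is an assumption, not the poster's implementation. Because the opaque type hides its equality with `A` outside `Tagged`, a call like `slice(xs, 5, 10)` does not compile, and neither does passing a length where a start is expected.

```scala
object Tagged:
  // Hypothetical encoding: a value of type A tagged with a string literal K.
  // Outside this object, `Int \ "start"` is neither a subtype nor a supertype of Int.
  opaque type \[A, K <: String] = A

  extension [A](a: A)
    // Attach a tag: `5 \ "start"` has type `Int \ "start"`.
    def \[K <: String & Singleton](tag: K): A \ K = a

  extension [A, K <: String](t: A \ K)
    // Recover the underlying value.
    def value: A = t

import Tagged.*

// Start index and length cannot be swapped or passed as bare Ints.
def slice(xs: List[Int], i0: Int \ "start", n: Int \ "length"): List[Int] =
  xs.slice(i0.value, i0.value + n.value)
```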
Anyway, if you could force named parameters to be used, this would cover probably 70% of use cases. And even so, I only use this in cases where it is very, very important that I don’t accidentally switch same-typed values.
Sum types that need to be distinguished at runtime have to store extra information.
Tagged unions are a cool idea; they’re just a different one! I don’t have a great way to do that at the library level off the top of my head. The thing you can’t express cleanly is that an N-arity union is a supertype of an (N-M)-arity union with the same names but M alternatives missing. That seems very natural and desirable, but I think you would need explicit compiler support for it.
The NamedTuple mostly-library-level solutions are fragile and don’t always give great error messages even for named tuples, where the subset identity stuff isn’t pushed on very hard.
Yes, only the first. Likewise, mapN would create a copy with all values unchanged except the nth element, which is transformed with the given function.
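As a hypothetical extension-based sketch, the same pattern reaches the second position by matching one more `*:` cell. A single `mapN` indexed by a type-level `N` would need match types and is not attempted here:

```scala
// Hypothetical extension, not a stdlib method: transform only the second element,
// leaving every other element (and its type) unchanged.
extension [A, B, T <: Tuple](t: A *: B *: T)
  def map2[C](f: B => C): A *: C *: T = t match
    case a *: b *: rest => a *: f(b) *: rest
```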