Standard Library: Now open for improvements and suggestions!

I like that. Given its “good at everything, excellent at nothing” performance, Vector looks like a reasonable default. If nothing else, it would save my students from tripping themselves up by using :+ on a Seq.
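To make the trap concrete, here is a quick illustrative sketch (my own example, not benchmarked numbers):

val asList: Seq[Int] = List(1, 2, 3)
val slow = (1 to 100000).foldLeft(asList)(_ :+ _)   // :+ on a List copies O(n) per call: quadratic overall

val asVector: Seq[Int] = Vector(1, 2, 3)
val fast = (1 to 100000).foldLeft(asVector)(_ :+ _) // effectively O(1) per append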

2 Likes

That’d be great. @lihaoyi’s library has an interesting design for that. Personally, and after much struggle, I finally settled on two type classes, Input and Output, and functions such as def readAll[A, C[_], I: Input](...) and def writeAll[A, O: Output](...). These now handle all my cases, as I was getting tired of my inputs sometimes being a File, a URL, a Path, a filename, a resource input stream, etc.
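For the curious, a minimal sketch of the shape of that design (the names and instances here are illustrative, not my exact code):

import java.io.{File, InputStream}
import java.net.URL
import java.nio.file.{Files, Path, Paths}

// Illustrative only: a type class for "things you can read from".
trait Input[-I]:
  def open(i: I): InputStream

object Input:
  given Input[File] = f => Files.newInputStream(f.toPath)
  given Input[Path] = p => Files.newInputStream(p)
  given Input[URL] = url => url.openStream()
  given Input[String] = name => Files.newInputStream(Paths.get(name)) // a filename

// readAll then accepts a File, a Path, a URL, or a filename uniformly.
def readAll[I: Input](i: I): Array[Byte] =
  val in = summon[Input[I]].open(i)
  try in.readAllBytes()
  finally in.close()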

Vectors are indeed fairly reasonable for moderately sized collections. For small collections (say 0-4 elements) they seem to be worse than Lists. Tiark Rompf ran a trial some years back where all lists in the Scala compiler were replaced by vectors. The compiler ran 10% slower. Maybe that can be mitigated by a vector design that special-cases small collections. But it’s hard to do that without introducing megamorphic calls, which produce a performance hit themselves.

2 Likes

The ability to limit collection size within a context is our most wanted feature. We have created a sandbox which is very difficult to crash. But if a programmer makes an error which leads to unbounded collection growth, it is just a nightmare: it will crash the server (the JVM does not throw OutOfMemoryError, it just hangs). VM snapshots are very difficult to analyze because of the very long chains of objects. Sometimes the JVM hangs such that it is not even possible to take a snapshot with standard tools.
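For illustration only, a hypothetical sketch of the kind of guard we would like (nothing like this exists in the stdlib today):

// Hypothetical: fail fast with a diagnosable error instead of silently exhausting the heap.
final class BoundedBuffer[A](maxSize: Int):
  private val underlying = scala.collection.mutable.ArrayBuffer.empty[A]
  def addOne(a: A): this.type =
    if underlying.size >= maxSize then
      throw new IllegalStateException(s"collection exceeded the limit of $maxSize elements")
    underlying += a
    this
  def result(): Vector[A] = underlying.toVector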

If anyone is interested in adding something to the standard library, we now have a process in place:

Essentially, it’s more streamlined than a SIP. It requires first starting a conversation about the intended change on the contributors forum to gather feedback; the decision is then taken during the Scala Core meeting. The Scala Core Coordinator (in this case, me) is responsible for having it presented in a timely manner (probably one month at most).

I will get this process documented in more places.

5 Likes

fold already lets you “map” a value, so could you be more specific?

True, but at a fairly high overhead in both code and runtime[1]. The basic signature is:

class List[A] {
  def mapAccumulate[S, B](z: S)(f: (S, A) => (S, B)): (S, List[B]) = ...
}

Note this exists in Cats and in Haskell; it’s a pretty standard function.

I find this useful in compiler-like operations, needing to maintain some state while transforming a sequence of operations. And in compiler-y use cases, the specialization for performance (and reduced business logic overhead) is useful. In my opinion, this is one of those “obviously useful once you see it” sort of combinators that could benefit the standard library.

[1]: At best, you construct an immutable collection element by element, which is obviously much slower than using a mutable Builder like map does. At worst (e.g. Lists), you construct the result backwards and then need to reverse it.
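For reference, a naive sketch (not the Cats implementation) that shows exactly the construct-backwards-then-reverse cost described in the footnote:

def mapAccumulate[A, S, B](xs: List[A])(z: S)(f: (S, A) => (S, B)): (S, List[B]) =
  val (finalState, reversed) = xs.foldLeft((z, List.empty[B])) { (acc, a) =>
    val (s, rev) = acc
    val (s2, b) = f(s, a)
    (s2, b :: rev) // built backwards...
  }
  (finalState, reversed.reverse) // ...then reversed

// Example: running sums, yielding (6, List(1, 3, 6)).
mapAccumulate(List(1, 2, 3))(0)((acc, x) => (acc + x, acc + x))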

2 Likes

+1 for frequencies and frequenciesBy(f) = groupMapReduce(f)(const(1))(_+_)

and I throw in

def zipWith[B, C](ys: List[B], f: (A, B) => C): List[C]

and

def uncons: Option[(A, List[A])]
def unconsWith[B](f: (A, List[A]) => B): B
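For reference, most of these fall out of the existing API; a quick sketch (illustrations, not proposed implementations):

extension [A](xs: Seq[A])
  // Counts occurrences of each element.
  def frequencies: Map[A, Int] = xs.groupMapReduce(identity)(_ => 1)(_ + _)
  def frequenciesBy[K](f: A => K): Map[K, Int] = xs.groupMapReduce(f)(_ => 1)(_ + _)
  // zipWith is likewise a one-liner over the existing API.
  def zipWith[B, C](ys: Seq[B])(f: (A, B) => C): Seq[C] = xs.lazyZip(ys).map(f)
  // Safe head/tail split; the map body only runs when xs is non-empty.
  def uncons: Option[(A, Seq[A])] = xs.headOption.map(_ -> xs.tail)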

🙏

2 Likes

This might be a long shot, but what if the Tuple1 to Tuple22 implementations could be records on the JVM? Client code may benefit from extra constant folding by the JIT, because record fields are trusted to be final.
Additionally, the .equals and .hashCode implementations might similarly be computed lazily through invokedynamic/ObjectMethods#bootstrap. I suppose a similar thing could be done for .toString, but it would require a bootstrap method in the Scala standard library, since the default toString implementations for Java records and Scala tuples are not equivalent.
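To make the toString mismatch concrete (the record here is a hypothetical Java analogue of a pair):

// Scala tuple rendering:
(1, "a").toString // "(1,a)"
// A Java `record Pair(int _1, String _2)` rendered via ObjectMethods#bootstrap
// would print "Pair[_1=1, _2=a]", hence the need for a Scala-specific
// bootstrap method to keep the existing tuple format.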

How about atomic wrappers like AtomicInt? We can even make match types useful:

type ToAtomic[T] <: Atomic[T] = T match
    case Int => AtomicInt
    case Long => AtomicLong
    ...
    case _ => AtomicRef[T]

inline def Atomic[T](value: T): ToAtomic[T] = inline compiletime.erasedValue[T] match
    case _: Int => new AtomicInt(value)
    case _: Long => new AtomicLong(value)
    ...
    case _ => AtomicRef(value)

They could be opaque wrappers for Java Atomics or new implementations (compare with Atomic — Monix).
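For example, the opaque-wrapper flavour could look something like this (an illustrative sketch, not Monix’s design):

import java.util.concurrent.atomic.AtomicInteger

// Sketch: a zero-overhead wrapper; transparent only inside the companion.
opaque type AtomicInt = AtomicInteger

object AtomicInt:
  def apply(value: Int): AtomicInt = new AtomicInteger(value)

  extension (a: AtomicInt)
    def get: Int = a.get()
    def set(value: Int): Unit = a.set(value)
    def incrementAndGet(): Int = a.incrementAndGet()
    def compareAndSet(expected: Int, next: Int): Boolean = a.compareAndSet(expected, next)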

1 Like

The default collection type should be immutable Arrays. The JVM may provide them in the not-too-distant future. Maybe the JavaScript runtime will provide them at some point, but if not they need to be compile-time wrapped. Yes, from a functional-education perspective Lists can be helpful, but I’m pretty sure that in any situation where Lists beat Arrays, Scala-style Vectors would be much better. I see no place where Lists are the best collection type.

Bjarne Stroustrup has been pushing for Arrays, the C++ Vector being essentially a compile-time-wrapped Array, for decades. And the evolution of computers over the last two decades has increased the performance advantages of Arrays further, not diminished them.
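Worth noting: Scala 3 already ships a compile-time-wrapped immutable array, IArray (an opaque type over Array). A JVM-level guarantee would go further, but the surface API exists today:

val xs: IArray[Int] = IArray(1, 2, 3)
val ys = xs.map(_ * 2) // usual collection ops, backed by a plain array
// xs(0) = 5           // does not compile: IArray exposes no update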

I just remembered a big crazy ask that I had a while ago (probably too much work, but just in case), but I’m not sure how this would work.

Namely, it would be nice if Scala programs could be written “without the stdlib”. This is probably not feasible, but having a minimal stdlib (similar to Rust’s no_std/core library) might be possible.
So maybe it would be possible to split the stdlib into two, with a tiny core stdlib that would depend on a tiny fraction of java.lang (e.g. if possible, without any Process or Thread stuff).

While this might not be of much use on the JVM, it could potentially allow embedded development using Scala Native and small JS libraries in Scala.js.

7 Likes

It might be really hard to decide what’s in vs what’s out in the tiny core library. It’s very likely a horses for courses problem.

You might be able to do this already with an old-school Java obfuscator. Back in the day we’d use them to strip unused code from applets. Java Code Obfuscation Tools: A Comprehensive Guide — javaspring.net

I think Scala Native and Scala.js might be doing something similar already, but I don’t know if the tool can be separated to run on the JVM.

Perhaps something less crazy than that could be: could we continue shrinking the standard lib?

There’s a lot of stuff that has been ripped out of scala-library over the years and put into separate jars: scala-xml, scala-actors, scala-parser-combinators, scala-parallel-collections, etc. I think everyone agrees in hindsight this was a good move. However, there is still a bunch of legacy code living in there that probably deserves to be modularized out, e.g.:

  • scala.sys.process: this was copy-pasted over from SBT 15 years ago with zero code review, and now there are better open-source alternatives such as os-lib
  • scala.ref: Java’s WeakReferences are fine; does anyone actually need this?
  • scala.io: also largely superseded by os-lib
  • scala.collection.concurrent: I think most would agree this sees much less usage than the rest of scala.collection
  • scala.bean
  • scala.concurrent: as much as I like the Future and ExecutionContext APIs myself, the APIs that exist in scala.concurrent are generally half-baked, missing a lot of necessary quality-of-life features, and it seems that nobody is interested in improving them since they haven’t been touched in 14 years now. As alternatives like Gears or Ox mature, we can consider modularizing scala.concurrent and referring people to one of the better-maintained alternatives

Shrinking the standard library would help move us towards a “tiny core library” world, but do so in an incremental fashion, following in the footsteps others have taken in the past. And it seems entirely doable. Anyone who wants a more batteries-included experience, such as beginners trying to write small scripts or applications, should probably be using the scala-toolkit or com-lihaoyi libraries anyway, which include much higher-quality code than the dead batteries still living in scala-library.

Regarding backwards compatibility, anyone who wants to continue using these things could pull them in and use them without any code changes, so it shouldn’t be a blocker

12 Likes

++ for embedded Scala (but can it be viable while depending on a GC? Perhaps for those who really need to control memory usage, SafeZone might be required in this case.)

2 Likes

I believe it should be viable, potentially with some adjustments (e.g. using only a single-threaded GC like Immix and disabling heap growth).

Jakub was able to have Scala Native running on a Playdate (with 16 MB of RAM), so at least some of the more powerful embedded systems should work.

The official MicroPython microcontroller board has only 192 KiB of RAM. I suspect Scala Native would need quite a lot of optimization to go this low, but MicroPython does have a GC, so that part shouldn’t be a blocker.

On that note, I do think the main problem for systems with 192 KiB of RAM might be the stdlib. It’s probably fine if you just use arrays and strings, but if you use something like List you’ll ask for a banana and get a gorilla.

2 Likes

(Sorry, this was meant as a reply to Li Haoyi, but I can’t edit that now.)

There’s a lot of stuff that has been ripped out of scala-library over the years and put into separate jars: scala-xml, scala-actors, scala-parser-combinators, scala-parallel-collections, etc. I think everyone agrees in hindsight this was a good move. However, there is still a bunch of legacy code living in there that probably deserves to be modularized out

I suspect that there’s a caveat here: bootstrapping the compiler. I believe the compiler cannot depend on any external lib, to avoid circularity issues when something needs to break.

So I think (at least parts of?) scala.io need to live in the compiler repo.

Not sure about the other modules, though, but moving those to separate jars sounds nice.

I think Scala Native and Scala.js might be doing something similar already.

They do have some reachability analysis to remove unused code, but this is slightly tangential to having a small core stdlib.

The advantage of that is that you can have an ecosystem of libraries that only depend on the core lib, kind of like a “seal of quality” (not that this implies quality).

So, if you are developing a system that cannot depend on the full stdlib, you have a subset of libraries that you can safely pick.

I agree with some of these removals, but I think we should aim at making the standard library as useful as possible for scripting.

Scripting is, I think, vital for Scala’s future:

  • The minimal to non-existent boilerplate makes it a very good tool to onboard new people
  • It can help dispel the notion that Scala is hard (to learn, to use, etc.)
  • It can help people who already know Scala to use it for less ambitious things

And to me a big hindrance to scripting is “Wait, what’s that library I need for opening files? Oh, there are multiple? And what’s the magic syntax (scala-cli directive) to make it available?”
Compare this to Python’s import os.
And the story is the same for making HTTP requests.
I do agree that the Scala Toolkit makes this better, but it still feels like too much friction.

For me the standard library should give you at least the tools to interact with your environment, and right now that’s terminal input, output, and parameters; file input and output; and internet input and output.
And if we realize that supporting that well requires a good streaming abstraction, then we should include that in the standard library as well.

If the solution is just putting one or more of these libraries straight into the standard library, so be it.

P.S.: The other thing the standard library should do is serve as, well, a standard: any time there are multiple libraries that do something very basic, we should have a synthesis of all of them in the standard library, so that downstream libraries don’t have to convert between multiple different versions of the same thing.

5 Likes

I am not sure if it’s achievable in the stdlib, but what would you say to ToExpr/FromExpr instances for simple enums like:

  import scala.quoted.*

  enum Level:
    case trace, debug, info, warn, error

  object Level:

    given ToExpr[Level]:
      def apply(x: Level)(using Quotes): Expr[Level] =
        x match
          case Level.trace => '{ Level.trace }
          case Level.debug => '{ Level.debug }
          case Level.info => '{ Level.info }
          case Level.warn => '{ Level.warn }
          case Level.error => '{ Level.error }

    given FromExpr[Level]:
      def unapply(x: Expr[Level])(using Quotes): Option[Level] =
        x match
          case '{ Level.trace } => Some(Level.trace)
          case '{ Level.debug } => Some(Level.debug)
          case '{ Level.info } => Some(Level.info)
          case '{ Level.warn } => Some(Level.warn)
          case '{ Level.error } => Some(Level.error)
          case _ => None

IMO it’s a really common pattern.

(it could be generalised to case classes iff ToExpr/FromExpr instances are available for the field types)
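With those instances in scope, a macro can then move Level values across the quote boundary; a sketch (the method name is made up for illustration):

  // Sketch: inside some macro implementation, using the givens above.
  def describeLevel(level: Expr[Level])(using Quotes): Expr[String] =
    level.value match // uses FromExpr[Level]
      case Some(l) => Expr(s"statically known level: $l") // uses ToExpr[String]
      case None    => '{ "dynamic level: " + ${level}.toString }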

3 Likes

Isn’t //> using toolkit default sufficient for this? It would be nice if it were a touch shorter, but I don’t see any realistic path to making it shorter.
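For reference, in a scala-cli script it’s just the one directive up top; the call below is only an illustration of what the Toolkit then provides (os-lib here):

//> using toolkit default

// File I/O via os-lib, bundled in the Toolkit:
val readme = os.read(os.pwd / "README.md")
println(readme.linesIterator.size)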

(I really doubt pulling any of the toolkit libraries into the standard library is going to fly…)

2 Likes