Explicit Nulls: Option-like extensions for working with nullable T | Null

Motivation

Scala 3’s powerful additions enable new (potentially better) ways to think about existing patterns. One such pattern is the handling of potentially missing values, for which the most principled approach has historically been the Option monad, which certainly has its merits:

  • Option forces the programmer to consider and deal with the potentially missing case
  • it provides a rich and natural API to do so

Nevertheless, with Scala 3’s union types and explicit nulls, the union T | Null emerges as a principled alternative for this pattern. While it keeps Option’s hygiene of explicitly handling the absence of values, it offers clear advantages:

  • better runtime efficiency as values are unboxed
  • clearly improved user experience

The second point perhaps deserves some explanation. Our APIs quite often have to provide methods with lots of optional parameters, for which the default is to not pass any value. Removing the syntactic overhead of wrapping every value that you actually want to pass in Some(_), essentially going from something like this:


generate(
    model = "mymodel",
    prompt = Some("Nullable types are"),
    suffix = Some("alternative to Options"),
    options = Some(Options(seed = Some(12), temperature = Some(0))),
    think = Some(true),
)

to this

generate(
    model = "mymodel",
    prompt = "Nullable types are",
    suffix = "alternative to Options",
    options = Options(seed = 12, temperature = 0),
    think = true
)

doesn’t only make code much easier to write and to read, but also better matches my expectation as a user of simply passing values as they are when they’re present.

But despite its benefits, T | Null has one clear drawback compared to Option: it doesn’t enjoy the same rich and fluent API.

Proposal

The proposal is quite simple (both to express and to implement). It’s NOT about bringing any changes or additions to the language or its syntax. Rather, it’s to continue what has been started in Working with Null and provide additional extensions to make working with nullable values easier and more fluent. More concretely, for example:

extension [T](x: T | Null)
  def map[U](f: T => U): U | Null = if x == null then null else f(x)
  def fold[U](ifEmpty: => U)(f: T => U): U = if x == null then ifEmpty else f(x)
  ...

The general idea then would be to provide pretty much a mirror of the Option API and all its methods.
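As a rough sketch of what a few more members of such a mirror could look like (an illustration only, not the proposed standard-library code): the User class and findUser are hypothetical, and .nn from Predef is used so that the bodies do not depend on flow typing.

```scala
//> using option -Yexplicit-nulls

// Hypothetical domain type, used only to exercise the extensions.
final case class User(name: String)

extension [T](x: T | Null)
  // Convert to a regular Option at the boundary with Option-based code.
  def toOption: Option[T] = if x == null then None else Some(x.nn)
  // Fall back to a default when the value is absent.
  def getOrElse[U >: T](default: => U): U = if x == null then default else x.nn
  // Chain a computation that may itself produce no value.
  def flatMap[U](f: T => U | Null): U | Null = if x == null then null else f(x.nn)

// Hypothetical lookup returning a nullable result.
def findUser(id: Int): User | Null = if id == 1 then User("Ada") else null

@main def nullableExtensionsDemo(): Unit =
  println(findUser(1).toOption)                 // Some(User(Ada))
  println(findUser(2).getOrElse(User("guest"))) // User(guest)
  val name: String | Null =
    findUser(1).flatMap(u => if u.name.isEmpty then null else u.name)
  println(name)                                 // Ada
```

The receiver here is a concrete class on purpose, so that none of the method names can clash with members the type already has.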

While this could arguably be implemented in one’s own code (which I have personally done) or provided as a user library, the former means carrying the extensions into every project you work on, and the non-standard nature of the latter makes it hard to offer users an API that relies on nullable types.

Closing remarks

A potential argument against encouraging the use of nullable types as T | Null is the inability to model nested options. However, arguably in more than 90% of cases you don’t need such a capability. And if you do, you can still use Option.

All in all, every approach to handling missing values comes with trade-offs, and this is certainly the case for both T | Null and Option. But despite their limitations, nullable types combined with explicit nulls have certainly become a principled and viable alternative with clear advantages in many cases. This proposal would then enable taking advantage of their benefits with the powerful and familiar API users know and love from Option.

I don’t think we can reuse the method names:

val x: List[Int] | Null = ???

x.map // Is that the nullable map, or the list map?

In practice it would be the list one, since extension methods cannot override regular methods
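The general precedence rule can be sketched with a plain class (Box and its methods are made up for illustration). This only shows member-versus-extension resolution; it does not settle how a union such as List[Int] | Null behaves, which depends on whether the union exposes List’s members at all.

```scala
// Hypothetical class with a regular member named describe.
final class Box(val value: Int):
  def describe: String = s"member($value)"

extension (b: Box)
  // Same name as the member: never chosen for b.describe
  // (recent compilers may warn about exactly that).
  def describe: String = s"extension(${b.value})"
  // No member with this name exists, so the extension is used.
  def doubled: Int = b.value * 2

@main def memberWinsDemo(): Unit =
  val box = Box(21)
  println(box.describe)  // member(21)
  println(describe(box)) // extension(21), only reachable when called explicitly
  println(box.doubled)   // 42
```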

We would therefore need something like mapNullable or mapN

And if we have to do that, I would rather have something like typescript:

val x: List[Int] | Null = ???
x?.sum() ?? 0
// vs
x.mapNullable(_.sum).getOrElseNullable(0)
// or
x.mapN(_.sum).getOrElseN(0)

I am firmly opposed.

You finish by arguing that nested options are not needed in more than 90% of cases.

Let’s say your figure of 90% is accurate. Your suggestion will make 90% of code a bit faster, and 10% of code plain wrong. That’s not a good tradeoff. If we make T | Null too convenient (more convenient than Option[T]), people will reach for it by default without realizing the dangers it poses.

Nested options are in fact much more common than one thinks. Not because we actively write Option[Option[T]] anywhere, but because we write Option[A] somewhere, and then somewhere else we instantiate A to Option[B]. It is crucial that we be able to tell apart None from Some(None) in these situations.

For example, let’s say you implement any sort of container, like a Map[K, V]. Inside, you have holes, and to represent holes or not-found data you use V | Null, for efficiency. But then, a user of your container instantiates it to Map[String, String | Null]. Now you can’t tell whether null is your hole or your user’s null! That is disastrously wrong. Yet nowhere did anyone write String | Null | Null; it appeared indirectly through instantiation.
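A minimal sketch of that failure mode, using two hypothetical wrappers around an immutable Map (NullableLookup and OptionLookup are invented names, not an existing API):

```scala
// Hypothetical container that reports "no entry" as null.
final class NullableLookup[K, V](entries: Map[K, V]):
  def get(key: K): V | Null = entries.getOrElse[V | Null](key, null)

// The same container, reporting "no entry" as None.
final class OptionLookup[K, V](entries: Map[K, V]):
  def get(key: K): Option[V] = entries.get(key)

// A user instantiates V to a nullable type.
val data: Map[String, String | Null] = Map(("stored-null", null))

@main def holeVersusNullDemo(): Unit =
  val viaNull = NullableLookup(data)
  val viaOption = OptionLookup(data)
  // Both lookups yield null: the stored null and the missing key are indistinguishable.
  println(viaNull.get("stored-null") == viaNull.get("missing")) // true
  // With Option the two cases stay apart: Some(null) vs None.
  println(viaOption.get("stored-null"))                         // Some(null)
  println(viaOption.get("missing"))                             // None
```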


Writing T | Null in the first place is always wrong when T is not disjoint from Null. When top efficiency matters, and you have a concrete type SomeType that you know to be disjoint from Null, then it is acceptable to use SomeType | Null. Anywhere else, and in particular anywhere T is a generic type, writing T | Null is already wrong.

Contrast with Option[T], which is safe from any of these mistakes. Option[T] remains the correct default for an overwhelmingly large majority of cases. Therefore, it must remain the most convenient way to do things, so that users do not shoot themselves in the foot.


This is situational. Most options in the wild are not generic. I just ran some rough analysis on the Scala codebases I happen to have checked out on my machine, finding roughly 3,000 generic options vs. 19,000 non-generic options:

@ os.list(os.pwd)
    .collect { case p if os.exists(p / ".git") => os.call(("git", "ls-files"), cwd = p).out.lines().map(p -> _) }
    .flatten
    .map { case (b, r) => b / os.SubPath(r) }
    .filter(_.ext == "scala")
    .iterator
    .flatMap(p => try os.read.lines(p) catch { case _: Throwable => Nil })
    .flatMap(l => """Option\[(.+?)\]""".r.findAllMatchIn(l).map(_.group(1)))
    .toSeq
    .partition(_.length > 2) match { case (long, short) => (long.length, short.length) }
res23: (Int, Int) = (19264, 3164)

The heuristic in this analysis is that Option[$s] with s.length <= 2 is likely to be a T or +T, since by convention almost all generic options use single-character names. You can probably get a more precise count from the community build with a compiler plugin, but I’m sure you’ll get similar results.
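To make the heuristic concrete, here is a small sketch that applies the same regex and length cut-off to individual lines; the helper names and sample lines are invented for illustration, and the last two values simply restate the reported counts as a ratio and a share.

```scala
// Same pattern as in the one-liner above: capture whatever sits inside Option[...].
val optionArg = """Option\[(.+?)\]""".r

def optionArgs(line: String): List[String] =
  optionArg.findAllMatchIn(line).map(_.group(1)).toList

// Length cut-off used by the analysis: one or two characters counts as "generic".
def looksGeneric(arg: String): Boolean = arg.length <= 2

val sample = optionArgs("def f(a: Option[String], b: Option[T], c: Option[+A])")
// sample == List("String", "T", "+A"); only "String" is classified as non-generic.

// The lazy match stops at the first ']', so nested options are captured imprecisely.
val nested = optionArgs("val n: Option[Option[Int]] = None")
// nested == List("Option[Int")

// Counts reported by the analysis above.
val nonGenericCount = 19264
val genericCount = 3164
val ratio = nonGenericCount.toDouble / genericCount                        // about 6.09
val genericShare = genericCount.toDouble / (nonGenericCount + genericCount) // about 0.141
```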

Most Options out in the wild are not generic, by a 6:1 ratio. Using Option when T is not generic is verbose, wasteful, and overly flexible. It violates the Principle of Least Power, as Option is more flexible than | Null and should only be used when that flexibility is necessary. The sooner we can replace all these Option[ConcreteType] with ConcreteType | Null, the better.

Sure, people implementing generic Map[K, V]s will want to use Option rather than | Null, but how many people do we think are implementing Scala collection libraries out there? I think the number is probably single-digit. Even if you include all the other non-collection use cases of generic Option[T], it still ends up being ~14% of Options out in the wild according to my rough analysis (3,164 of 22,428): a significant but small minority.

Here are the top few Option type parameters in my dataset. Plenty of generics, but String and Int are the largest by a huge margin:

@ os.list(os.pwd)
    .collect { case p if os.exists(p / ".git") => os.call(("git", "ls-files"), cwd = p).out.lines().map(p -> _) }
    .flatten
    .map { case (b, r) => b / os.SubPath(r) }
    .filter(_.ext == "scala")
    .iterator
    .flatMap(p => try os.read.lines(p) catch { case _: Throwable => Nil })
    .flatMap(l => """Option\[(.+?)\]""".r.findAllMatchIn(l).map(_.group(1)))
    .toSeq
    .groupBy(identity).view.mapValues(_.size).toSeq
    .sortBy(-_._2)
res25: Seq[(String, Int)] = List(
  ("String", 3884),
  ("Int", 1639),
  ("T", 971),
  ("A", 823),
  ("Boolean", 420),
  ("PsiElement", 309),
  ("ScType", 294),
  ("V", 266),
  ("B", 248),
  ("Tree", 233),
  ("Path", 222),
  ("ScExpression", 192),
  ("Symbol", 183),
  ("AbstractFile", 126),
  ("Long", 123),
  ("Type", 123),
  ("K", 121),
  ("os.Path", 118),
  ("Any", 118),
  ("Term", 110),
  ("(Int, Int)", 103),
  ("ScTypeElement", 94),
  ("dotty.tools.dotc.semanticdb.Scope", 90),
  ("X", 88),
  ("_", 88),
  ("File", 83),
  ("Simplification", 83),
  ("Throwable", 80),

The problem with using ConcreteType | Null for most use cases, on the grounds that it happens to be fine with concrete, non-nullable types, is that we cannot effectively teach users when not to use it. They will get it wrong.

Sure, there’s “only” a 1:6 ratio for generic Options. They are the most important ones, though: the ones users see every day, e.g., when they call someList.headOption or someMap.get(key). These must stay Options. Why (asking myself as a confused user) am I encouraged to use T | Null, when the library functions I use every day don’t?

You can make that argument both ways. I argue that T | Null is the most flexible, since it allows you to shoot yourself in the foot, whereas Option[T] doesn’t. Option has your back; it’s not hostile the moment you stray from the common case.

I would also argue that T | Null violates another important principle: the principle of generalizability. If I have some correct code for a function foo whose signature and body use ConcreteType, but which doesn’t actually rely on any method of ConcreteType, I can fearlessly generalize it to T instead of ConcreteType, as long as it uses Options. I cannot correctly do that if it uses ConcreteType | Null.
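A small sketch of that generalization step, with function names invented for the example: the Option-based version keeps its meaning when String becomes T, while the nullable version silently loses the distinction between "found a null element" and "found nothing".

```scala
// Concrete starting point and its Option-based generalization.
def firstMatchString(xs: List[String], p: String => Boolean): Option[String] = xs.find(p)
def firstMatch[T](xs: List[T], p: T => Boolean): Option[T] = xs.find(p)

// The same generalization applied to a nullable result type.
def firstMatchOrNull[T](xs: List[T], p: T => Boolean): T | Null =
  xs.find(p) match
    case Some(x) => x
    case None    => null

val cells: List[Option[Int]] = List(Some(1), None, Some(3))
val rows: List[String | Null] = List("a", null)

@main def generalizeDemo(): Unit =
  // Option stays unambiguous even when T is itself an Option.
  println(firstMatch[Option[Int]](cells, _.isEmpty))       // Some(None)
  println(firstMatch[Option[Int]](cells, _.contains(7)))   // None
  // With T = String | Null both outcomes collapse to null.
  println(firstMatchOrNull[String | Null](rows, _ == null)) // null (an element matched)
  println(firstMatchOrNull[String | Null](rows, _ == "z"))  // null (nothing matched)
```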

The principle of generalizability goes hand in hand with the principle of least power. It is because I can generalize later that it is good practice to use the least power now. If I need more power later, I can generalize. If I cannot easily generalize, I have more incentive to use the greatest power from the start, which is definitely not what we want to encourage.


To take a counter-stance, I don’t disagree that Options have usability shortcomings. I don’t think the correct fix for that is to use more T | Nulls, though. I would be much more supportive of a proposal that makes Options more convenient than they are now.


Isn’t this something a compiler warning or lint can help guard against? It seems like it should be pretty straightforward to say “hey, that T | Null is dangerous because T could itself be null, consider using an Option”. Since it’s a warning, it’s fine if it’s not 100% precise: false negatives (i.e. missing bugs) are ok as long as it catches enough bugs to be useful, and false positives (i.e. spurious warnings) can be shut up with @nowarn as long as they’re not too frequent.

Then if someone generalizes a ConcreteType | Null into a T | Null, the compiler will appropriately yell at them until they either change it to Option[T], or use @nowarn if they decide they know better and their T will not be null.


You’re right that in “normal” Scala, the list map would be picked. But with the compiler option -Yexplicit-nulls, the nullable map is correctly used. Runnable example:

//> using option -Yexplicit-nulls

extension [T](x: T | Null)
    def map[U](f: T => U): U | Null = if x == null then null else f(x)

@main def main(): Unit =
    val x: List[Int] | Null = List(1, 2, 3)
    println(x.map(_.head)) // 1
    // Removing "-Yexplicit-nulls" would lead to an error: "value head is not a member of Int"

You might be able to pull off something with a warning. It’s not that easy, though. Warnings that depend on what types one writes are finicky. Types move around and don’t have definite positions.

I think that distracts from the point, though. It remains that there is one thing you use for concrete cases, and another for generic cases. That’s not a good story by any means. It’s not the Scala way. Adding warnings to tame the consequences of losing that orthogonality is not the right direction, when the alternative is not to lose that orthogonality in the first place.


My bad, very nice that it works!

With that in mind, here is an alternate proposal that addresses one of the original issues:

def generate(model: String, prompt: String?) =
  // model has type String here
  // prompt has type Option[String] here
  ???

generate("mymodel") // inside: prompt = None
generate("mymodel", prompt = "Nullable type are") // inside: prompt = Some("Nullable type are")

// And maybe

val optionalParam = Option.when(usePrompt)("Only present when usePrompt is true")
generate("mymodel", prompt = ?optionalParam)

There might be issues with the above syntax; this is just a quick sketch.

In particular, the above avoids the issue of (A | Null) | Null:

def foo[T](optional: T?) =
  // optional will have type Option[T] here
  // => T can be Option[A] or A | Null without issue
  ???

foo(Some(1)) // T =:= Option[Int], in body: optional: Option[Option[Int]]
foo(?Some(1)) // T =:= Int, in body: optional: Option[Int]

This might even be possible through SIP-XX - Unpack Case Classes into Parameter Lists and Argument Lists, as it has similar ideas, but I’m not yet familiar with the particulars of the proposal.


IMO that would make perfect sense and make explicit nulls more usable. But I suggest two changes. Most importantly, the extension should not be for T but for T <: AnyRef | AnyVal, in the spirit of explicit nulls. Secondly, the methods should only be available if one has activated explicit nulls (though this could be optional).
I suspect this will also make all mentioned concerns obsolete. If you need nested optionality then you can still wrap T with Option.

While I can see the appeal of something like this, I am nevertheless opposed. The problem is that A | Null is a supertype of A, and hence all methods available on A | Null are also available on A. This leads to all kinds of problems. When I trigger code completion in my editor, I’m now always going to get a boatload of completions pertaining to nullable values even for values that aren’t nullable. And when I change a function’s parameter type from Option[A] to A, the compiler isn’t going to tell me about all the Option-related method calls in the function body that I can now get rid of; instead, the code will silently perform a bunch of pointless null checks.

If we need an unboxed Option type, it should be a proper type in the standard library implemented as an opaque type alias without a subtyping relationship to the original type. In fact, library implementations of such types already exist, see the Maybe type in Kyo or scala-unboxed-option by @sjrd. So if you need an unboxed Option type with lots of bells and whistles, consider using one of those. And hopefully some day we can get a standard library update to replace the standard Option type with something similar to that.
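To illustrate only the "no subtyping relationship" part, here is a deliberately naive sketch of an opaque alias. It is not how Kyo’s Maybe or scala-unboxed-option are implemented; all names are assumptions, and unlike those libraries this version does not nest.

```scala
object maybe:
  // Unboxed at runtime: a Maybe[A] is either the value itself or null.
  opaque type Maybe[+A] = A | Null

  object Maybe:
    def some[A](a: A): Maybe[A] = a
    val none: Maybe[Nothing] = null

  extension [A](m: Maybe[A])
    def isEmpty: Boolean = m == null
    def fold[B](ifEmpty: => B)(f: A => B): B =
      if m == null then ifEmpty else f(m.asInstanceOf[A])
    def map[B](f: A => B): Maybe[B] =
      if m == null then Maybe.none else Maybe.some(f(m.asInstanceOf[A]))

import maybe.*

@main def maybeDemo(): Unit =
  val present: Maybe[Int] = Maybe.some(41)
  println(present.map(_ + 1).fold("empty")(_.toString)) // 42
  println(Maybe.none.fold("empty")(_.toString))         // empty
  // Outside the defining object the alias is opaque, so neither line compiles:
  // val n: Maybe[Int] = 41
  // 41.fold("empty")(_.toString)
```

Because Int is not a subtype of Maybe[Int] outside the defining object, the extensions are not offered on plain values, and changing a parameter from Maybe[A] to A turns every leftover fold or map call into a compile error.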

I would also like to point out that nesting is not the real issue here. That is a solved problem, as scala-unboxed-option has shown: those do nest properly.


That’s an annoyance I encounter in Kotlin (Kotlin is kind of explicit-null by default and has T?, which is the same as T | Null).
But that’s only because Kotlin doesn’t support lower bounds. In Scala we can use them, so the extension should be for A where T <: AnyRef | AnyVal, A >: Null <: T | Null
The methods will therefore never show up for non-nullable types.

I’m not quite sure what you mean there, but I can’t make it work.

$ scala-cli -S 3.7.2 -Yexplicit-nulls
Welcome to Scala 3.7.2 (21.0.8, Java OpenJDK 64-Bit Server VM).
Type in expressions for evaluation. Or try :help.

scala> extension[T <: AnyRef | AnyVal, A >: Null <: T | Null](a: A)
     |   def exists(f: A => Boolean) = a != null && f(a)
     | 
def exists
  [T <: AnyRef | AnyVal, A >: Null <: T | Null](a: A)(f: A => Boolean): Boolean

scala> 42.exists(_ == 42)
val res0: Boolean = true

Unfortunately the Scala compiler is still smart enough to figure out that A must be Int | Null to make this work, so it’s still possible to call exists on a non-nullable Int even with those bounds. Could you give a complete example of what you mean?

Looks more like a compiler bug to me, since Null <: Int shouldn’t be valid if explicit nulls are enabled.

Ah my bad.. I see what you mean. The compiler infers Int | Null and of course Null is a subtype of it. Need to try to find something in front of a computer.

I don’t think it’s a bug. It just calls exists[Int, Int | Null](42)(_ == 42), and that is allowed because 42 is of type Int, which is a subtype of A, i.e. Int | Null. If it’s possible to make this work at all, then only because the compiler’s type inference isn’t smart enough, so it would necessarily be brittle.

No, it’s not a compiler bug; I realised that as well. It was a bug in my reasoning, if you like. I thought the compiler would not widen Int to Int | Null due to the lower-bound constraint, but the compiler doesn’t need to take it into account for Int, only for A. So it basically has no effect at all.


I feel compelled to score against my own side, but I don’t want technical difficulties to stand in the way. It is possible to make it work: the extension should be defined in the companion object of Null. This way, it will only be considered when Null appears in the type in the first place.

This is what we had to do for js.UndefOr[+A], which is an alias of A | Unit. In Scala.js, extensions for it are injected in the companion object of Unit. (Unit is the type of the undefined value of JS.)

I wouldn’t recommend using js.UndefOr as a precedent, though. This was a concession to source compatibility wrt. Scala 2, in which js.UndefOr[A] was a distinct type, which was not a supertype of A. It didn’t have adequate pattern matching to back it up, so an API was the lesser of several possible evils. If we had to do it again in Scala 3, I doubt we would have given it that API.

This would make perfect sense IMHO. It’s a simple extension of the varargs syntax that already exists.