SIP public review: explicit Nulls

sjrd · December 20, 2019, 3:06pm

Hi Scala Community!

This thread is the SIP Committee’s request for comments on a proposal to make reference types non-nullable by default (e.g., val s: String = null would not be valid anymore). Nullable types would be expressed very naturally with a union type (e.g., val s: String | Null = null). You can find all the details in the documentation page linked at the end of this post.

In the current version of Dotty, explicit nulls are enabled only with the flag -Yexplicit-nulls.

Motivation

The motivation is to considerably reduce the cases in which NullPointerExceptions can happen in a Scala codebase. It also helps the writing of performance-sensitive code that uses nulls on purpose, by providing help from the type checker to deal with those nulls.

Summary

Currently, in Scala, all reference types (types of a class or interface that extend AnyRef) admit two kinds of values:

instances of the class or interface, which is good, and
null, which is bad.

When trying to call any method on null, a run-time NullPointerException is thrown. For example, the following program typechecks, but fails at run-time:

val x: String = null
println(x.substring(1))

With explicit nulls, a value of a reference type (such as String) does not admit null as a valid value anymore:

val x: String = null // compile error: String expected but Null found

We can use a union type to explicitly mark a nullable type. In that case, trying to call a method on a nullable type will result in a type error:

val x: String | Null = null // ok
println(x.substring(1)) // compile error: substring is not a member of String | Null

The type checker forces us to deal with the null case, for example using pattern matching:

x match {
  case x: String => println(x.substring(1))
  case null      => println("it was null")
}

Flow typing

Because pattern matches can be burdensome for manipulating nullable types, we introduce a limited form of flow typing that is able to track the effective nullability of vals:

if (x != null) {
  // x: String in this branch
  println(x.substring(1)) // ok
} else
  println("it was null")

Java interop

Since Java libraries have nullable reference types, we must interpret any reference type in a Java library as nullable. However, doing so naively by adding | Null everywhere forces way too many spurious tests for nulls when using APIs that do not in fact manipulate nulls. This can make the code significantly less readable.

The solution is to use a magical UncheckedNull instead when translating types from Java. A String | UncheckedNull still cannot be assigned to a String, but it is possible to call methods of String on it:

val javaStr = "hello".substring(3)
val s: String = javaStr // compile error: String expected but String | UncheckedNull found
println(javaStr.length) // ok, prints 2

This choice is obviously a design trade-off between usability and soundness.

More details

There are many more details in the documentation page:
https://dotty.epfl.ch/docs/reference/other-new-features/explicit-nulls.html
Make sure to read the details before commenting here.

Discussion

Opening this Proposal for discussion by the community to get a wider perspective.

jducoeur · December 20, 2019, 3:42pm

IMO, this is some of the lowest-hanging fruit in Dotty. It helps with a pain point that almost any Scala project encounters if it works with Java libraries.

My only real concern is the behavior of UncheckedNull. Personally, I’d be happier if the behavior was more Option-like – if it is null, skip the chained operations, rather than throwing an NPE.

But this is probably a minor issue provided we can treat UncheckedNull monadically in a for comprehension via extensions. That is, I’m okay with this provided I could optionally write:

val result: String | UncheckedNull = for {
  v <- someJavaMethod()
  trimmed <- v.trim()
  sub <- trimmed.substring(2)
  lower <- sub.toLowerCase()
} yield lower

That’s much wordier, but still an improvement over the Scala 2 situation. (And could be shortened with a specialized pipe operator, I think.) I haven’t played enough yet with the new unions to know whether this is a straightforward thing to do at the library level or not.

But even without that, this is a clear improvement to the language, so yay…

bergmark · December 20, 2019, 3:50pm

Great feature. Really looking forward to this.

I have a couple of questions:

The linked dotty page says this is an opt-in feature, is that part of this proposal or would the feature always be enabled?
I find it a bit confusing that javaStr.length may cause an NPE while javaStr: String would be a compilation error. I personally wouldn’t mind having to do javaStr.nn.length (or something safer) for the sake of consistency. Has there been previous discussion/investigation on why the proposed solution hits the sweet spot? Any chance for a compiler flag to treat UncheckedNull like Null?

bergmark · December 20, 2019, 4:03pm

I’d be happier if you had to explicitly pick the one you want. Skipping the chained operations silently may also cause bugs.

oscar · December 20, 2019, 4:21pm

If you want to use A | UncheckedNull in a for comprehension, I think all you need to do is implement map and flatMap as extension methods, which could be done in a library.

I think the issue arises when A has its own map/flatMap methods: then the extension won’t be triggered.

Alternatively, you could use an opaque type to wrap A | UncheckedNull and add methods to that.

So, I think you can reasonably ergonomically do this in library code without language changes (although having the Nullable opaque type in the standard library would be nice).
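A minimal Scala 3 sketch of the library-level approach described above. `NullableOps`, `Nullable` and every method below are hypothetical names, not part of the proposal or the standard library, and the sketch uses plain `| Null` throughout rather than `UncheckedNull`. Each step short-circuits on null, which gives the Option-like chaining jducoeur asked for.

```scala
object NullableOps:
  // Hypothetical wrapper: a value that is either an A or null, with no boxing.
  opaque type Nullable[+A] = A | Null

  object Nullable:
    def apply[A](a: A | Null): Nullable[A] = a

  extension [A](n: Nullable[A])
    def isNull: Boolean = n == null

    // Both map and flatMap stop at the first null, so a for comprehension
    // skips all remaining steps once any step yields null.
    def map[B](f: A => B): Nullable[B] =
      if n == null then null else f(n.asInstanceOf[A])

    def flatMap[B](f: A => Nullable[B]): Nullable[B] =
      if n == null then null else f(n.asInstanceOf[A])

    def getOrElse[B >: A](default: => B): B =
      if n == null then default else n.asInstanceOf[A]
```

After `import NullableOps.*`, a chain such as `for v <- Nullable[String](s); t <- Nullable[String](v.trim) yield t` compiles through these extensions. Note that nesting collapses: `Nullable[Nullable[A]]` is the same type as `Nullable[A]`, so this wrapper cannot tell "absent" apart from "present but null".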

curoli · December 20, 2019, 4:20pm

My main concern would be how robust the flow typing is under refactoring. For example, would this still work:

if (x != null) {
  // x: String in this branch
  println(x.substring(1)) // ok
} else
  println("it was null")

if I do a little trivial refactor into:

val xIsNotNull = x != null

if (xIsNotNull) {
  // x: String in this branch
  println(x.substring(1)) // ok
} else
  println("it was null")

Or maybe into:

def isNotNull(x: String | Null): Boolean = x != null

if (isNotNull(x)) {
  // x: String in this branch
  println(x.substring(1)) // ok
} else
  println("it was null")
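One refactoring-proof pattern, sketched under the assumption that the compiler narrows only on a direct `x != null` test of a stable value: instead of extracting a `Boolean` predicate, extract a helper that performs the test and consumes the narrowed value itself. `NullChecks.ifNotNull` is a hypothetical name, not part of the proposal.

```scala
object NullChecks:
  // Hypothetical helper: the null test and the use of the non-null value live
  // together, so callers never rely on the compiler tracking a Boolean.
  def ifNotNull[A <: AnyRef, B](x: A | Null)(onValue: A => B)(onNull: => B): B =
    if x == null then onNull
    else onValue(x.asInstanceOf[A]) // safe: x was just checked against null
```

A call such as `NullChecks.ifNotNull(x)(_.length)(-1)` keeps working however the surrounding code is rearranged, because no Boolean ever has to carry nullability information.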

julienrf · December 20, 2019, 4:35pm

The places where I need explicit null checking are mostly when inter-operating with Java code. However, because of UncheckedNull this proposal does not help. Would it be possible to make UncheckedNull opt-out?
There is no mention of non-parametric patterns like combining Null with type parameters, as in type Opt[A] = A | Null (this pattern is unsound because Opt[Opt[A]] is indistinguishable from Opt[A]). Should we add some guards to prevent them from being used?
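The collapse described above can be observed in a small Scala 3 sketch (the `OptCollapse` object and both lookup functions are illustrative, not taken from the proposal): with `type Opt[A] = A | Null`, a missing key and a key explicitly mapped to `null` produce the same result, whereas nested `Option`s keep the two cases apart.

```scala
object OptCollapse:
  // Nullability expressed through a type alias over a type parameter.
  type Opt[A] = A | Null

  // Opt[Opt[String]] is just String | Null, so "key missing" and
  // "key present with a null value" both come back as null.
  def lookup(m: Map[String, Opt[String]], key: String): Opt[Opt[String]] =
    m.getOrElse(key, null)

  // Option nests properly: None means "missing", Some(None) means "present but empty".
  def lookupOption(m: Map[String, Option[String]], key: String): Option[Option[String]] =
    m.get(key)
```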

morgen-peschke · December 20, 2019, 5:17pm

I would really like this. Even dropping a bunch of .nn calls at the boundary is a really small price to pay for easy-to-diagnose NPEs.

Would this proposal take into account the more popular variants of not-null annotations, so that the parts of the Java ecosystem we can infer will never return null are typed accordingly?

joshlemer · December 20, 2019, 6:48pm

I think we should pay close attention to what has been done with java interop in the Kotlin world, where values returned from Java are referred to as “platform types”. Here is where you can read more about that https://kotlinlang.org/docs/reference/java-interop.html#null-safety-and-platform-types

Basically, the approach that they’ve taken is that when Kotlin calls into Java code, the nullability of the returned value is not 100% defined; it exists in a limbo state which they call a “platform type”. A platform type is not mentionable in the language itself, but it is written in documentation and in auto-complete with a ! suffix (String!, List<String!>, List<String!>!, etc.). The actual nullability of the type, and of its type parameters, can be specified by the user via type ascriptions.

So consider this java code:

import java.util.List;
import java.util.Map;

public class Foo {
    public static Map<String, List<String>> getMap() {
        return null;
    }
}

Just from looking at the type signature, it’s not clear which of the following is the correct null-aware type. Here are the possibilities (in Kotlin notation, where ? marks a nullable type):

Map<String, List<String>>
Map<String, List<String>>?
Map<String, List<String>?>
Map<String, List<String>?>?

Map<String, List<String?>>
Map<String, List<String?>>?
Map<String, List<String?>?>
Map<String, List<String?>?>?

Map<String?, List<String>>
Map<String?, List<String>>?
Map<String?, List<String>?>
Map<String?, List<String>?>?

Map<String?, List<String?>>
Map<String?, List<String?>>?
Map<String?, List<String?>?>
Map<String?, List<String?>?>?

So the inferred type is actually Map<String!, List<String!>!>!, which the user can refine at the call site. So in Kotlin I have a choice: each of the following compiles.

val m: Map<String, List<String>> = Foo.getMap()
val m: Map<String, List<String>>? = Foo.getMap()
val m: Map<String, List<String>?> = Foo.getMap()
val m: Map<String, List<String>?>? = Foo.getMap()

val m: Map<String, List<String?>> = Foo.getMap()
val m: Map<String, List<String?>>? = Foo.getMap()
val m: Map<String, List<String?>?> = Foo.getMap()
val m: Map<String, List<String?>?>? = Foo.getMap()

val m: Map<String?, List<String>> = Foo.getMap()
val m: Map<String?, List<String>>? = Foo.getMap()
val m: Map<String?, List<String>?> = Foo.getMap()
val m: Map<String?, List<String>?>? = Foo.getMap()

val m: Map<String?, List<String?>> = Foo.getMap()
val m: Map<String?, List<String?>>? = Foo.getMap()
val m: Map<String?, List<String?>?> = Foo.getMap()
val m: Map<String?, List<String?>?>? = Foo.getMap()

I can go with the completely safe option:

val m: Map<String?, List<String?>?>? = Foo.getMap()

or the completely unsafe option, and assert the non-nullability for all options.

val m: Map<String, List<String>> = Foo.getMap()

or anything in between. If I have asserted that something is guaranteed not null when in fact it is null, then at some point downstream I may receive a null pointer exception, but only at the point where I try to get a non-null value out of a null value. For example, consider this Java code:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Foo {
    public static Map<String, List<String>> getMap() {
        Map<String, List<String>> map = new HashMap<>();
        map.put("nonNullValue", Arrays.asList("a","b","c"));
        map.put("nullValue", null);
        return map;
    }
}

and this calling Kotlin code:

fun main(): Unit {
    val m: MutableMap<String, List<String>> = Foo.getMap()

    val x0: List<String> = m.get("nonNullValue")!!
    println("x0 = $x0")
    val x1: String = m.get("nonNullValue")!!.get(0)
    println("x1 = $x1")

    try {
        val x3: List<String> = m.get("nullValue")!!
    } catch (e: Exception) {
        println("e = $e")
    }
}

which outputs this

x0 = [a, b, c]
x1 = a
e = kotlin.KotlinNullPointerException

This happens because Kotlin will insert null checks at runtime whenever you have chosen to assert that a platform type is non-null.
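For comparison, a rough Scala 3 analogue of the Kotlin example, assuming the caller checks nullability at the boundary by hand. The Java `getMap` is reproduced in Scala so the sketch is self-contained, and `assertNonNull` is a hypothetical helper playing the role of Kotlin's `!!` (in the proposal, `.nn` fills that role).

```scala
import java.util.{Arrays, HashMap, List as JList, Map as JMap}

object PlatformBoundary:
  // Scala stand-in for the Java Foo.getMap() above: one entry maps to null.
  def getMap(): JMap[String, JList[String]] =
    val map = new HashMap[String, JList[String]]()
    map.put("nonNullValue", Arrays.asList("a", "b", "c"))
    map.put("nullValue", null)
    map

  // Hypothetical stand-in for Kotlin's `!!`: fail right at the boundary,
  // with a message, instead of letting the null travel further.
  def assertNonNull[A](x: A | Null, what: String): A =
    if x == null then throw new NullPointerException(s"$what was null")
    else x.asInstanceOf[A]
```

The design point is the same as Kotlin's: the failure happens where the non-null assumption is made, not somewhere deep inside later code.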

LPTK · December 20, 2019, 7:11pm

This is not “unsound”, as in: it does not cause unsoundness in the type system. It’s just bad design.

ryanstull · December 20, 2019, 8:07pm

I agree that most of the time I encounter null is dealing with Java libraries, and I too would like some way to make UncheckedNull opt-out.

I think there are situations where safety is the priority and other situations where ease of use is the priority, so a flag to switch between those two would be very nice.

lrytz · December 20, 2019, 8:22pm

One minor concern (but still I’d like to mention it) is that the feature could promote the use of null, as it is better for performance while we don’t have unboxed Option. If Null starts being used more, there can be difficulties in interop while the feature is opt-in; assuming non-nullness for Scala code compiled without the flag is not sound. I agree this is the right choice though.

As others mentioned, the main use case is probably Java / JS interop. It would be great if there was a way to tell the compiler which methods in a Java API can return null, and which can’t. For String this is kind of crucial, as it’s an integral part of the Scala standard library (s.substring should have a non-null String type). For Scala.js this can be done naturally in the facade declarations. It would be nice to have a way to do that for Java dependencies. Or is the use of @NotNull widespread enough? The JDK is a counter-example.
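Without compiler support, one way to get per-method nullability is a hand-written facade that states each Java method's nullability once. The sketch below is an assumption about how such a facade could look (`JavaFacade` and its methods are hypothetical); the facts it encodes come from the JDK Javadoc: `System.getProperty` returns null for an unset key, and `String.substring` never returns null.

```scala
object JavaFacade:
  // Documented to return null when the key is not set: keep that in the type.
  def property(key: String): String | Null = System.getProperty(key)

  // Option-returning variant for callers who prefer it.
  def propertyOption(key: String): Option[String] =
    property(key) match
      case null      => None
      case s: String => Some(s)

  // Documented never to return null: check once here, so callers get a plain String.
  def substring(s: String, from: Int): String =
    val r = s.substring(from)
    if r == null then throw new NullPointerException("substring returned null")
    else r
```

The cost is writing and maintaining the facade by hand, which is what an annotation-driven or compiler-supported mechanism would avoid.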

RichType · December 20, 2019, 8:27pm

So my original motivation for a specialised Opt type was A* path-finding, where performance is important, and hence I didn’t want the double boxing of the standard library’s Some[Int]. But I’ve since created a generalised scheme that uses nulls for unboxed optional reference types, so I’m a bit concerned that this proposal will break my code.

// Opt, OptBuild and ife come from the author's own library and are not shown here.
class OptRef[A >: Null <: AnyRef](val ref: A) extends AnyVal with Opt[A]
{ def fold[B](vNone: => B, fSome: A => B): B = ife(ref == null, vNone, fSome(ref))
  override def toString: String =
    fold("NoRef", v => "Some(" + v.toString + ")")
  def empty: Boolean = ref == null
  override def map[B](f: A => B)(implicit ev: OptBuild[B]): Opt[B] =
    ife(empty, ev.none, ev(f(ref)))
  override def flatMap[B](f: A => Opt[B])(implicit ev: OptBuild[B]): Opt[B] =
    ife(empty, ev.none, f(ref))
}

implicit def refImplicit[B >: Null <: AnyRef] = new OptBuild[B]
{ override type OptT = OptRef[B]
  override def apply(b: B): OptRef[B] = new OptRef(b)
  override def none: Opt[B] = new OptRef[B](null)
}
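Under explicit nulls, `Null` would no longer be a subtype of `AnyRef`, so no type could satisfy both bounds in `A >: Null <: AnyRef`. One possible adaptation, sketched here as an assumption rather than anything the proposal prescribes, keeps `A` non-nullable and moves `| Null` onto the field. This is a simplified, self-contained version without the `Opt`/`OptBuild` hierarchy.

```scala
// Hypothetical port: A stays non-nullable, the wrapped field carries `| Null`,
// and the class is still an unboxed value class.
final class OptRef[A <: AnyRef](val ref: A | Null) extends AnyVal:
  def isEmpty: Boolean = ref == null

  def fold[B](ifNone: => B, ifSome: A => B): B =
    if ref == null then ifNone else ifSome(ref.asInstanceOf[A])

  def map[B <: AnyRef](f: A => B): OptRef[B] =
    if ref == null then OptRef.none[B] else new OptRef[B](f(ref.asInstanceOf[A]))

  override def toString: String = fold("NoRef", v => s"Some($v)")

object OptRef:
  def none[A <: AnyRef]: OptRef[A] = new OptRef[A](null)
  def some[A <: AnyRef](a: A): OptRef[A] = new OptRef(a)
```

Because the element type no longer has to admit `null`, `OptRef[String]` keeps meaning "maybe a non-null String" whether or not explicit nulls are enabled.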

lihaoyi · December 20, 2019, 9:54pm

I like the idea in general, but I feel like there is far too little detail in this proposal considering how foundational a change is being proposed. Here are my immediate questions after reading through:

The behavior of UncheckedNull seems pretty underspecified. Can I pass an UncheckedNull type to a method expecting a non-UncheckedNull parameter? Will that throw an exception immediately, or in the guts of the method when the parameter is finally used, or only if the parameter is used in a way that would normally throw an NPE?

(My personal preference would be to allow seamless assignment from UncheckedNull types to non-UncheckedNull types, but automatically insert a check to eagerly throw an NPE. That would be a strict improvement over the current state, where the NPE is thrown somewhere randomly deep in the guts of the method being called, far away from the place a contract is being violated, but also not cause any additional breakage in user code)
What are the edge cases that we should be aware of? The linked document tells us all the things that work great. What are the things that don’t work well, and trade-offs that we were forced to make in the design of this language feature?
If things break, what is the migration plan? Is it mostly a mechanical fix, or is it more involved than that?
How much breakage do we expect “in the wild”? Is this a “probably won’t break much” change, “will only break your Java interop” change, or a “will break a significant amount of pure Scala code” change?
How much is .nn expected to be used? Hardly ever? Or should I expect thousands of .nn calls for code that works with Java libraries?
Is there any way to “bulk” .nn a large block of code to avoid these calls, or, as @lrytz mentions, write a facade to declare the nullability of an un-annotated Java API in one place so we can safely use it throughout our code, via T or T | Null, not UncheckedNull? The Ruby, Python and TypeScript folks have a similar setup where they can provide facade types separately from the code in question; it’s not perfect, but it works OK given the constraints (which are similar to ours)
Scala.js also has an undefined magic value, which has similar usage patterns to null in Java. Can that be captured in the same type system feature somehow?
On that note, would all our Scala.js facades need to be updated, since they currently do not specify Nullability anywhere in their signature? (They do usually specify undefined-ness though, with js.UndefOr[T])
Apart from unsound initialization causing NPEs, are there any other cases where unsoundness could slip in and cause NPEs in weird places? Or is the type system sufficiently buttoned-up that this should never happen?
The changes to == and != say you cannot compare known-null with known-not-null values, but what about (x: T) == (y: T | Null)? Is that allowed?
Flow typing looks great where it works. Where does it not work? What are the limits of the flow-typing algorithm that we should be aware of? Can it be aware of the java.util.Objects.{requireNonNull, isNull, nonNull} methods that are common in the Java ecosystem, or the x.getClass idiom for eagerly throwing NPEs?
Could I write my own method that interacts with flow-typing e.g. to validate non-null-ness of its parameters?
The statement “We are able to detect the nullability of some local mutable variables” seems like it deserves a rigorous listing of which cases can be inferred and which can’t. In the cases where we cannot infer nullability, what then? Is it just a matter of annotating types, or are there cases where more invasive changes are necessary?
The document mentions Java generics have an un-nullable type param Box[T], but with a .get method returning T | Null. But what about Java-implemented containers which we know never return null unless we put one in? Those would require Box[T].get return T, and Box[T | Null].get return T | Null. Does that work? That possibility seems to be dismissed by the linked page, but it seems like a reasonable use case that should be supported.
Can the list of @NonNull annotations be passed in at compile time, e.g. if I want to define my own @NonNull annotation for my own Java code without needing to pull in a third party dependency?
Apart from using @NonNull annotations for inference, can we also use @Nullable to automatically infer T | Null rather than T | UncheckedNull? e.g. Google FindBugs, Spring, IntelliJ-Platform, and others have such annotations, and together they get a decent amount of usage in the community
If I have a Java interface with a method void foo(String s), do I need to override it from Scala as def foo(s: String | Null): Unit or def foo(s: String): Unit? Or would either be acceptable?
How about between Scala methods: could I override def foo(s: String): Unit with def foo(s: String | Null): Unit, or vice versa, given method parameters are typically invariant?
Could a Java concrete class implement a Scala interface method def foo(s: String): Unit, given the Java equivalent void foo(String s) is actually equivalent to def foo(s: String | Null): Unit?

In general, given the significance of the proposed change (this isn’t just a bit of syntactic sugar!) I’d hope for at least the following sections in such a proposal:

A blog-post-style introduction to the concept, to read and build intuition
A somewhat-rigorous reference specification for what works and what doesn’t, that we can look up to resolve ambiguity
Limitations and constraints, where things stop working and where the “happy path” ends, and what happens then
Alternatives that were considered and tradeoffs that were made to come up with the design, and the reasoning behind the decisions that brought us to the design we have now instead of something else
A migration plan: how much breakage to expect, how breakages can be resolved, how we expect the migration process to play out

The linked document has bits and pieces of each section here and there, but isn’t really laid out in a way that is easy to consume, and is pretty spartan overall. That makes it very hard for someone on the outside to come in and give a proper design review.

I know it’s a lot to ask, but this is a huge change to the very fundamentals of the Scala programming language, so hopefully it’s not too much to ask!

sjrd · December 21, 2019, 1:29pm

We have paid close attention to what Kotlin did, and UncheckedNull + the rules for where to insert it are basically the adaptation of platform types to a system where nullability is represented with a union type.

sjrd · December 21, 2019, 1:49pm

@lihaoyi: Answering some of the easier questions for now. I don’t have enough time right now to answer the deeper questions.

It is specified by those two rules:

It is a type alias for Null
It receives magical treatment for selections (and selections only).

For anything that is not a selection, T | UncheckedNull behaves exactly as if it were T | Null.

Except where overriding is involved, I believe most things would be a .nn away in case something breaks.

This proposal does not provide anything for undefined (= Unit) at the moment. However I have already made sure that it is generalizable to dealing with Unit/undefined in the future (this basically only means adapting the flow typing to also care about () besides null).

Scala.js facades for things that are actually nullable will need to be updated with the appropriate | Nulls, yes. Note that js.UndefOr[T] is exactly the same thing as (an opaque type alias of) T | Unit.

To the best of my knowledge, NPEs can only happen when calling .nn or when selecting a member of a T | UncheckedNull.

Yes, that is allowed, obviously, since the intersection of those two types is non-empty.

Currently it only works exactly in the cases that are mentioned in the proposal document. Anything else will not influence flow typing.

No, that is not possible.

If the analysis is not smart enough for your code pattern, at worst you might need to add some .nn calls. In some cases you might be able to teach it the right thing by assigning the var to a temporary val before working with the val. Flow typing has more power with vals than with vars.
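A small sketch of the workaround just described, assuming flow typing narrows a local `val` but not a mutable field (the `Holder` class is purely illustrative): copy the `var` into a `val`, then test the `val`.

```scala
class Holder:
  // A mutable, nullable field: it could be reassigned between a check and a use,
  // which is why flow typing is more conservative with vars.
  var current: String | Null = null

  def lengthOrZero(): Int =
    val c = current // stable snapshot of the var
    if c != null then c.length else 0
```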

There is currently no provision for that use case. How would you propose we adapt the rules to make this work?

Currently not, but I agree with you that those things should be possible.

You need to override it with def foo(s: String | Null): Unit or def foo(s: String | UncheckedNull): Unit. This directly follows from the rules on how Java interfaces are interpreted, and from the fact that method parameters are invariant. def foo(s: String): Unit is a different, incompatible type.

Parameters are not typically invariant. They are always invariant. So you must override def foo(s: String): Unit with def foo(s: String): Unit. And you must override def foo(s: String | Null): Unit with def foo(s: String | Null): Unit (or def foo(s: String | UncheckedNull): Unit, but I would not recommend that).

Java will of course only know about the Java-erased types of your Scala interface. So your Scala method def foo(s: String): Unit will appear to Java as void foo(String s) (with a nullable string because it’s Java). This is unfortunate but no different than when more complex types that exist in Scala but not in Java are erased to simpler Java types when Java sees them.
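Following the overriding rule above, an override of a Java method `void foo(String s)` must keep the nullable parameter type. A Java interface cannot be compiled inside a Scala snippet, so this sketch models it with a hypothetical Scala trait (`JavaLikeListener`) that declares the parameter the way the proposal would read it from Java, as `String | Null`.

```scala
// Hypothetical stand-in for a Java interface with `void foo(String s)`,
// written with the parameter type the proposal would give it.
trait JavaLikeListener:
  def foo(s: String | Null): Unit

// The override repeats `String | Null` exactly (parameters are invariant),
// so the body is responsible for handling the null case.
final class RecordingListener extends JavaLikeListener:
  var last: String = ""
  def foo(s: String | Null): Unit =
    last = if s == null then "<null>" else s
```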

lihaoyi · December 21, 2019, 1:55pm

Thanks for your responses Sebastien!

I don’t have an answer for this, unfortunately. Union types are novel to me, as are flow types, and UncheckedNulls, and union types with flow types with UncheckedNulls over Java interop with generics is far beyond my capabilities. I won’t be able to give a concrete solution, but hopefully by highlighting a potential problem someone can help come up with a way of addressing it.

lihaoyi · December 21, 2019, 1:57pm

Is how this proposal relates to Kotlin platform types written up somewhere? This sort of comparative analysis would help greatly in contextualizing this proposal in the broader space of possible solutions. Especially the differences, e.g. why they did things one way and we are proposing to do things another way, are typically very useful in illustrating how we ended up with the proposal in front of us.

Such a write-up would not be of much use to those who did the comparison and already have the knowledge in their heads, but it would be of immense value to an outsider with zero background trying to quickly pick up enough context to give a useful design review.

AndreVanDelft · December 21, 2019, 2:31pm

Good proposal.

There were discussions 7 years ago about this subject; they may contain some considerations that have not yet been brought up here; anyway there is a nice link to the early history of nulls:

hrhino · December 21, 2019, 4:34pm

As @AndreVanDelft’s links point out, Array would probably require special treatment, because new Array[T](10) should only compile if T >: Null or T is primitive, which is a condition I don’t think can be expressed in user-written code (which thankfully Array isn’t).

Are we planning to add this restriction, or will that just be considered unsound in the same way as uninitialized fields? I’d hope that the weirdness in making Array's constructors irregular wouldn’t be too off-putting.
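The hole described above is easy to observe: on the JVM, a freshly allocated array of a reference type is filled with `null`, whatever its static element type says. A minimal sketch (`ArrayDefaults` is illustrative; the ascription to `Any` keeps the null comparison legal with or without explicit nulls):

```scala
object ArrayDefaults:
  // Every slot of a new Array[String] starts out as null at run time,
  // even though the static element type is String.
  def fresh(n: Int): Array[String] = new Array[String](n)

  // Counts slots that still hold the JVM default value.
  def uninitialized(arr: Array[String]): Int = arr.count(s => (s: Any) == null)
```

With explicit nulls enabled, `fresh(3)(0)` would be typed `String` yet hold `null`, which is the same kind of gap as uninitialized fields.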