[WIP] Scala with Explicit Nulls

@olhotak and I have been prototyping in Dotty what it would take to make reference types non-nullable by default (e.g. val x: String = null would no longer typecheck).

I just wrote a doc describing our design, and any feedback here would be greatly appreciated.

Doc: https://gist.github.com/abeln/9f79774bac111d99b3ae2cb9016a33e6
PR: https://github.com/lampepfl/dotty/pull/5747

29 Likes

Just wanted to give a shout-out to you guys. This is super valuable and I hope it will get merged! Thanks

4 Likes

Me too, can’t say anything about the compiler internals, but the effort seems great. I hope this can be reviewed and be considered!

1 Like

Hello,

Thanks a lot for your efforts. This is a really hard problem, and I’m glad it’s been tackled to that depth. When I first saw the idea of the T | JavaNull, I said it was the first design I’d seen that had a chance not to crash and burn; the last time I said something like that, @odersky replied: “[Coming from you], I take that as strong endorsement.”

Obviously, I have quite a number of comments :smile:

Equality (ref in doc)

Because of the unsoundness, we need to allow comparisons of the form x == null or x != null even when x has a non-nullable reference type (but not a value type). This is so we have an “escape hatch” for when we know x is nullable even when the type says it shouldn’t be.

I disagree with that reasoning, for two reasons:

  • a normal program never tests whether a val x = something is in fact null to work around initialization problems. This is a false problem and it shouldn’t be tackled.
  • there is already an escape hatch if someone really really wants to do that: (x: T | Null) == null
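To make the initialization scenario concrete, here is a minimal sketch in Scala 3 syntax. The names Greeter and Hello are hypothetical and not taken from the design doc. It assumes the usual JVM initialization order, where the superclass body runs before the subclass field is assigned, so the supposedly non-null val is observed as null through the widened type:

```scala
// Hypothetical example: observing a not-yet-initialized val through the
// explicit escape hatch (x: T | Null) == null.
abstract class Greeter {
  val greeting: String
  // Runs while Greeter is being constructed, before Hello assigns `greeting`.
  val nullDuringInit: Boolean = (greeting: String | Null) == null
}

final class Hello extends Greeter {
  val greeting: String = "hello"
}
```

After construction, new Hello().greeting is "hello", but nullDuringInit records that the field was still null when the superclass initializer read it.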

That said, you can’t prevent anyone from ever doing x == null regardless of the type of x, because == is defined on Any, so this whole thing is quite moot anyway. It’s possible to warn when trying to compare a non-nullable type with null, though, like scalac already does for primitive types:

Welcome to Scala 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60).
Type in expressions for evaluation. Or try :help.

scala> val x: Int = 5
x: Int = 5

scala> x == null
<console>:13: warning: comparing values of types Int and Null using `==' will always yield false
       x == null
         ^
res0: Boolean = false

Similarly, I think the section on Reference equality is driven by the same false problem. Here it’s even more annoying because you have to add an entirely new, magical top type to the type system. I strongly suggest removing all of that. I can use x == null instead of x eq null anyway, if I want to do that! And yes, == null is always as efficient as eq null, so performance is not a good excuse either.

Working with Null (ref in doc)

x.nn, as a method name, does not follow established conventions for methods in the Scala library: alphanumeric method names should be full words, not initials (eq and ne are unfortunate precedents, but let’s not exacerbate the issue). This is one of the cases where I would recommend a symbolic operator, such as x.!!. Otherwise, spell out x.ensureNotNull or something like that.

In addition, for both the above method, and the implicit conversions that you mention below, I strongly suggest that they be hidden behind an import, such as

import scala.NullInterop._

I could live with x.!! always in scope, but please don’t put unsound implicit conversions in scope for every program, every time.
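As a rough sketch of what "hidden behind an import" could look like, assuming Scala 3 extension methods: the object below is a local, hypothetical NullInterop (not the scala.NullInterop suggested above, which does not exist), and ensureNotNull is the spelled-out name proposed as an alternative to .nn.

```scala
// Hypothetical opt-in module: nothing here is in scope unless imported.
object NullInterop {
  extension [T](x: T | Null) {
    // Strips the nullability, failing fast if the value really is null.
    def ensureNotNull: T =
      if (x == null) throw new NullPointerException("value was null")
      else x.asInstanceOf[T]
  }
}

import NullInterop.*

// The java.version system property is always set on a running JVM.
def javaVersionLength: Int = {
  val fromJava: String | Null = System.getProperty("java.version")
  fromJava.ensureNotNull.length
}
```

Code that never imports NullInterop does not see the method at all, which is the point of the suggestion.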

Nullification function (ref in doc)

I am confused why nf(A | B) and nf(A & B) need to be defined. IIRC, nf is only applied to types loaded from Java source files and Java-generated class files, and those would never contain | or & types. Can you elaborate on an example where this is relevant?

JavaNull (ref in doc)

I agree with @smarter’s comment on the PR that it is problematic that we cannot write down a JavaNull type in the source code. I fail to see any advantage to that restriction. Can we simply lift it?

Flow-sensitive inference (ref in doc)

Although I sympathize with the issue that this feature addresses, I can’t help but have a bad feeling about it. There is no precedent in Scala for anything like that, and I am wary of adding it as part of such an important change. I believe pattern matching should be enough to safely deconstruct nullable types.

Besides, if we really want this, I find it very odd that it only supports null. I would expect it to support other idioms, such as

if (x.isInstanceOf[Foo]) {
  val y: Foo = x // typechecks because I know `x` is a `Foo`
}

At the very least, I believe that this flow-sensitive inference should be presented as an orthogonal proposal. Fundamentally, it has nothing to do with non-nullable types.
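For comparison, the narrowing that Scala already offers for type tests goes through pattern matching rather than control flow. A small sketch, with describe as a hypothetical helper:

```scala
// Pattern matching binds a new name at the narrower type,
// so no flow-sensitive reasoning about `x` itself is needed.
def describe(x: Any): String = x match {
  case s: String => s"string of length ${s.length}"
  case _         => "something else"
}
```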

That’s all for today :wink: Thanks again for all your hard work. I hope we can get this all the way to the finish line.

13 Likes

No idea for union, but intersections types can appear in type parameter bounds in Java.

1 Like

Thank you for the detailed comments and suggestions.

Many of our decisions were guided by the point made several months ago that smooth migration of existing code is extremely important. On some decisions, perhaps we have prioritized migration a bit too much.

On equality, in existing Scala, null supports eq/ne because Null <: AnyRef, but our proposal makes this no longer true. Thus we wanted to support eq/ne for backwards compatibility. But since, as you point out, testing for null can still be done with ==/!=, I agree that we can remove eq/ne from Null and remove the RefEq trait that we introduced. I don’t feel strongly either way about a warning for ==/!=; we can add one if desired.

I think we agree with your suggestion to put the implicits behind an import. Our point was merely that we do want to have these implicits available somewhere, but they do not have to be globally in scope everywhere.

On nf(A | B) and nf(A & B), I will let @abeln comment, but I think these are an artifact of an idea that we eventually rejected, which was to treat .class files compiled with existing versions of Scala (without the explicit null support) the same way as .class files coming from Java. That is, we would have applied nf() to all types in such .class files. This would have been more sound but would have made migration more difficult, and especially would make it difficult to maintain codebases that need to be compiled with different Scala versions. We decided instead to keep the types in existing Scala .class files as they are, so for example, if a method has a return type String, we take it on faith that it will not return null. This is unsound for now but makes migration easier, and we will have soundness eventually once all code (and all libraries…) have migrated.

Not being able to write JavaNull explicitly came out of an earlier discussion with @odersky. JavaNull is a bit of an unpredictable hack, and we wanted to discourage its spread throughout a codebase away from the Java interop boundary. That said, since it is just an alias for an annotated Null which itself can be written, I don’t feel strongly about preventing JavaNull from being written. We can discourage it in documentation rather than banning it outright.

Flow-sensitive inference is the one area where I think I disagree with you. Yes, technically it is an orthogonal issue, and yes, it could be generalized to support things other than null. I also share your nervousness about there being no precedent for any similar flow-sensitive inference in current Scala. However, looking at existing code, we concluded that the if(x != null) idiom is so widespread that we cannot have a smooth migration without some support for flow-sensitive inference. We originally wanted to delay implementing it, but found it necessary even just to get many of the tests in the Dotty test suite to pass: many of them use x as non-null after a null check. Porting the standard library and getting Dotty to bootstrap will also require this: both contain many instances of if(x != null). Yes, I agree that it would be more consistent with the rest of the language if people just wrote something like:

x match {
  case null => ...
  case nn: T => ...
}

instead of if(x == null) ... else .... But the point is that existing code uses if for this in many places, not match.
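The two styles side by side, as a sketch in Scala 3 syntax with hypothetical helper names. The match version relies only on pattern matching; the if version is the existing idiom and, under explicit nulls, is the one that depends on flow-sensitive inference to treat x as a String inside the branch:

```scala
// Style consistent with the rest of the language: deconstruct with match.
def lengthViaMatch(x: String | Null): Int = x match {
  case null      => 0
  case s: String => s.length
}

// Style found throughout existing code: null check followed by direct use.
def lengthViaIf(x: String | Null): Int =
  if (x != null) x.length else 0
```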

A more general version of flow-sensitive typing will take a long time to reach consensus on, and might never be accepted to the language. We don’t want to block explicit nulls waiting for a general flow-sensitive typing feature that may never come.

That said, I do also feel uneasy about opening this can of worms without precedent, and thus I would welcome any alternative suggestions that we may not have thought of to ease migration of existing uses of if(x != null).

5 Likes

Does the flow control mechanism take possibly concurrent mutation into account?

var x: String | Null = _

if (x != null) {
  //because I know this code will never run in a concurrent environment
  //I can assume x is not null here, but the compiler doesn't know about that
}

seems like a common idiom

1 Like

Great to see this progressing!

On flow-sensitive type inference: I agree it’s needed for the case of nulls. But I also think the most natural way would be to couch this in terms of type-tests. I.e. treat x == null as x.isInstanceOf[Null] and introduce a flow-sensitive type system for type tests. I don’t see how this would be harder than the original problem and the generalization could be useful.

3 Likes

The flow-sensitive inference is done only on stable paths (so only vals, not vars).
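One way to live with the stable-path restriction, sketched here with a hypothetical Holder class: copy the var into a local val first, then null-check the val. The check and the use then refer to the same immutable snapshot, regardless of what other code does to the var in between.

```scala
final class Holder {
  private var current: String | Null = null

  def set(s: String | Null): Unit = { current = s }

  def lengthOrZero: Int = {
    // A local val is a stable path; `current` itself is not.
    val snapshot = current
    if (snapshot != null) snapshot.length else 0
  }
}
```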

3 Likes

What about compatibility with Option?

Ideally, something like type Option[T] = T|Null should be possible, to have instantaneous codebase adaptation and to avoid questions like “should I use Option or T|Null now?”. As far as I understand, to achieve that, null should behave as a monad (just as None), e.g. null.map(f: A => B) == null, and T|Null should have corresponding methods (just as Some[T]), e.g. (x: T|Null).map(f: T => K) == if (x == null) null else f(x).

If I’m not missing anything (some corner case for monad laws?) and introducing dummy methods on null is acceptable (not sure if that can be implemented technically), that would be a perfect drop-in replacement for Option.

Edit:
With the assumptions in my post, for x: Int | Null and
def foo(x: Int): Int | Null = null,
def bar(x: Int | Null): Int = if (x == null) 0 else 42 + x,
x.map(foo).map(bar) != x.map(foo andThen bar). So such a replacement is impossible. That raises further questions:

  1. When should one use Option[T], and when T | Null?
  2. What would chaining such as option.map(foo).map(bar).flatMap(baz) look like with T | Null? One possible answer here is to use ?-like syntax. Personally, that seems worse than Option to me, as it doesn’t have the same chaining flexibility, and it promotes the use of nulls (by providing special support for them).
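The counterexample from the edit can be run as follows. This is a sketch in Scala 3 syntax: mapN is a hypothetical null-propagating map, written for Int only to keep type inference out of the picture, and foo and bar are function values rather than methods so that andThen applies directly.

```scala
// Null-propagating map: null stays null, anything else goes through f.
def mapN(x: Int | Null)(f: Int => Int | Null): Int | Null = x match {
  case null   => null
  case i: Int => f(i)
}

val foo: Int => Int | Null = _ => null
val bar: (Int | Null) => Int = {
  case null   => 0
  case i: Int => 42 + i
}

val x: Int | Null = 1

// Mapping in two steps: foo yields null, and the second map skips bar.
val stepwise: Int | Null = mapN(mapN(x)(foo))(bar)

// Mapping the composition: bar does see the null produced by foo.
val fused: Int | Null = mapN(x)(foo andThen bar)
```

Here stepwise is null while fused is 0, so map(foo).map(bar) and map(foo andThen bar) disagree.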

Unfortunately, this is not that simple. You should not use T | Null if T is a universally quantified type, because T could be instantiated to Option[U], so that Option[Option[U]] would expand to U | Null | Null, which simplifies to U | Null, meaning that Option[Option[U]] and Option[U] would be indistinguishable! (This has bad consequences for parametric code.)
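A sketch of how this bites in parametric code, assuming Scala 3 syntax; lookupOption and lookupNullable are hypothetical helpers. With Option, "key absent" and "key present but mapped to null" remain distinct; with V | Null the two cases collapse into one value:

```scala
def lookupOption[K, V](m: Map[K, V], k: K): Option[V] = m.get(k)

// Same lookup, but signalling absence with null instead of None.
def lookupNullable[K, V](m: Map[K, V], k: K): V | Null = m.get(k) match {
  case Some(v) => v
  case None    => null
}

// V is itself nullable here, so V | Null is String | Null | Null.
val table: Map[String, String | Null] = Map("present" -> null)
```

lookupOption(table, "present") is Some(null) and lookupOption(table, "absent") is None, whereas lookupNullable returns null for both keys.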

4 Likes

Conflating Null unions and options is a bad idea, and I believe @sjrd had a longish post somewhere about that. That being said, I think having .toOption on a null union (probably via an extension method) would be quite reasonable.
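A minimal sketch of such an extension method, assuming Scala 3 extension syntax. Nothing like this is part of the proposal as described in this thread; the definition and the userHome helper are purely illustrative.

```scala
extension [T](x: T | Null) {
  // null becomes None, everything else is wrapped in Some.
  def toOption: Option[T] =
    if (x == null) None else Some(x.asInstanceOf[T])
}

// Typical use at the Java interop boundary.
def userHome: Option[String] = System.getProperty("user.home").toOption
```

This keeps Option as the type that flows through Scala code while T | Null stays at the boundary.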

8 Likes

The answer is, you should never use T|Null unless outside interoperability forces you to, or you need to do non-premature micro-optimization. Option should remain Option. It was never meant to be a safe replacement for null, but a safe construct that avoids the need for null.

2 Likes

That can be written:

(for {
   ret  <- Option(someJavaMethod())
   tmp  <- Option(ret.trim())
   tmp2 <- Option(tmp.substring(2))
 } yield tmp2.toLowerCase()
).get

But @olhotak’s point remains about migration problems.

I also think it would be useful to generalize the flow-sensitive typing, or at least leave the door open to future generalization; that is, implement it generally (which does not seem much harder, as pointed out by @odersky), even if at first it is only enabled for null checks.

I think you’re talking about this longish post: SIP Suggestion: Add ?: and ?. syntactic sugar for more convenient Option[T] usage :smile:

3 Likes

Thanks for the detailed and thoughtful reply!

Equality

I had Dotty (as opposed to scalac) in mind when I wrote this. Because of multiverse equality, some equality comparisons are disallowed in Dotty

scala> 1 == "hello"
1 |1 == "hello"
  |^^^^^^^^^^^^
  |Values of types Int and String cannot be compared with == or !=

The current rule for null says: “allow equality comparisons with null if the value compared isn’t an AnyVal” (dotty/compiler/src/dotty/tools/dotc/typer/Implicits.scala at main · lampepfl/dotty · GitHub). What I wanted to communicate was that the rule should remain unchanged, even though reference types are now non-nullable.

I agree with both your points, and I quite like (x: T | Null) == null: it makes it explicit that something’s gone off with the supposedly non-nullable value x.

That said, even if unsound initialization isn’t a good-enough reason for allowing equality comparisons with null, backwards compatibility might be. I searched for places in the Dotty community build where there are equality comparisons involving null, and eq null and ne null seem to be quite common:

  • == null: 0
  • != null: 0
  • eq null: 582
  • ne null: 469

Full list of occurrences here: Equality comparisons involving null in Dotty community build · GitHub

Searching all public repos in Github shows many hits as well: Code search results · GitHub

So there seem to be two options here:

  1. Allow both ==/!= and eq/ne on null (both as an argument and receiver). This is backwards compatible, but seems to require the introduction of the magic RefEq trait (magic because it’s erased to Object).
  2. Allow only ==/!=. This has the advantage that we avoid RefEq, but now we need a rewrite rule that converts all the occurrences of eq/ne null above to ==/!=, respectively.

I don’t have a strong opinion either way.

Working with Null

I agree with both of your suggestions:

  1. rename .nn to .!!, which is consistent with Kotlin (https://kotlinlang.org/docs/reference/null-safety.html#the--operator). The only question would be whether !! is already in use by a popular library.
  2. put the array implicit conversions behind an import. This is already the case, it just wasn’t mentioned in the doc: https://github.com/abeln/dotty/blob/explicit-null/library/src-bootstrapped/scala/NonNull.scala

Nullification Function

Like @smarter said, nf(A & B) is needed because it’s used in Java generics. I can’t quite reproduce the example that uses it, but it gets added here dotty/compiler/src/dotty/tools/dotc/core/classfile/ClassfileParser.scala at main · lampepfl/dotty · GitHub

nf(A | B) is not really needed and can be removed.

JavaNull

I agree with @smarter and your suggestion that users should be able to write down JavaNull, so we’ll lift the restriction.

Flow-sensitive Type Inference

As per the usage stats above, the

if (x ne null) {
// do something with x, access its fields, etc
}

pattern seems quite common. See for example

(these are just arbitrary examples off github search)

Unfortunately, the type inference isn’t able to handle some of the usages: for example, if they involve a non-stable path:

I don’t have a good sense for what percentage of the usages the type inference can handle (and hence how much value we get from it), but from what I’ve seen so far I lean towards saying we do need it. Can you think of a different way to migrate/rewrite that usage pattern?

I like that this generalizes well. Will prototype it in the current PR.

1 Like

One section I’d particularly like to get feedback on is the binary compatibility one: https://gist.github.com/abeln/9f79774bac111d99b3ae2cb9016a33e6#binary-compatibility

To restate our approach here: when loading Scala code compiled with a pre-explicit-null compiler, we leave the types unchanged. That is, we don’t apply the nf function above to Scala types (only Java types).

This has the nice property that you can update your code with minimal changes to the explicit-nulls world, before your dependencies have updated.

Notice that the “unit of update” is whatever sources are in your build. In particular, it’s not possible to have part of a project with explicit nulls and the other part with implicit nulls. It’s also not possible to decide that some dependencies will be imported in “strict” mode (explicit null) while others won’t. I think to the user this makes for a conceptually-simpler model of what types mean, but there’s less granularity/control over the feature.

Do people have any concerns/ideas around binary compatibility?

It’s used by scala.sys.process, which is bundled with the standard library

@ import scala.sys.process._
import scala.sys.process._

@ "ls".!!
res4: String = """LICENSE
build.sc
out
readme.md
requests
"""

Along with the following other operators:

!   !!  !!< !<  ### #&& #<  #>  #>> #|  #|| %   %%

I’d be in favor of deprecating scala.sys.process and/or moving it into a separate optional module. The code is awful, the API is crazy, and it has had approximately 0 progress made since it was merged into scala/scala 9 years ago un-reviewed.

8 Likes

The first step would be to add non-symbolic aliases for all these operators. I think a PR doing that would be accepted (though maybe not with RC1 so close now): https://github.com/scala/bug/issues/11133