Make Null a subclass of AnyVal under -Yexplicit-nulls

Hello,

Currently, under the normal type system, Null is a magical type. It’s not really a class (despite how it’s presented in the docs). It has special rules that make it a subtype of every reference class type. This is illustrated by the following class diagram:

Under -Yexplicit-nulls, Null is less magical. In the core type system, it is almost a regular class (with a single instance null), which extends Matchable (and hence Any). There are no special subtyping rules in the core type system for Null anymore. (There are new type checking rules for flow typing, but that’s a different story.) This is illustrated by the following diagram:
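A minimal sketch of what this looks like in code, assuming Scala 3 with -Yexplicit-nulls (the snippet is written so that it should also compile without the flag). Under the flag, a reference type such as String no longer admits null; a nullable value has to say so with a union, and a null check narrows it back (the flow typing mentioned above). The helper name lengthOrZero is hypothetical.

```scala
// Assumes Scala 3; intended to type-check both with and without -Yexplicit-nulls.
// Under -Yexplicit-nulls, `val s: String = null` would be rejected, because
// Null is no longer a subtype of String.
val maybeName: String | Null = null // nullability is spelled out as a union

// Hypothetical helper showing flow typing.
def lengthOrZero(s: String | Null): Int =
  if s != null then s.length // inside this branch, `s` is narrowed to String
  else 0
```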

Problem: Null is now sitting front and center in the Scala class hierarchy. It’s painfully obvious that you have to learn about it. Given its position, clearly it is a very important type that everyone should know about! That’s not the impression we want to give. In a world with -Yexplicit-nulls, you should be able to learn about Scala without learning what Null and null even are (until you need interop scenarios and/or expert-level performance tweaks).

Proposed solution: Move Null under AnyVal. Make it no more special than Int or Unit. The move is illustrated here:


Now Null is not front and center in the diagram that every beginner should learn. It’s possible to teach a lot of Scala without ever mentioning it.

Null has all the properties of a value class: it has == but not eq; the (only) instance can be obtained from a literal syntax (namely, the keyword null), and that instance is indistinguishable from other instances obtained that way (unlike "foo" and "foo", which could be distinct instances that are not eq). Therefore, it makes perfect sense to “hide” it under AnyVal. If you have trouble wrapping your head around this, observe that null: Null and (): Unit have a lot in common.
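To make the null/() analogy concrete, here is a small sketch assuming Scala 3. It only compares the values that literal syntax produces; the variable names are illustrative.

```scala
// `null` and `()` behave alike: each is the sole instance of its type,
// obtained from literal syntax, and every occurrence is the same value.
val n1: Null = null
val n2: Null = null
val u1: Unit = ()
val u2: Unit = ()

// Strings differ: two equal strings may still be distinct instances.
val s1: String = "foo"
val s2: String = new String("foo") // forces a fresh instance

val nullsEqual = n1 == n2
val unitsEqual = u1 == u2
val stringsEqual = s1 == s2     // structural equality holds
val stringsIdentical = s1 eq s2 // reference identity does not
```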


Historical note: at the beginning of the explicit-nulls experiment, Null was still a direct subclass of AnyRef. This was not great. It is not rare to use AnyRef as an upper bound, as in T <: AnyRef. That gives you the ability to perform eq on Ts, for example. With Null under AnyRef, you could not rule out that T would in fact be nullable. That meant it wasn’t safe to manipulate a T | Null in performance-sensitive code, for example, and there was no way to define T to make it safe. With Null separate from AnyRef, that becomes safe.
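A rough sketch of the kind of code this enables, assuming Scala 3. The names containsRef and Slot are hypothetical; Slot uses T | Null in place of Option[T] to avoid allocating a wrapper, which is only unambiguous because T <: AnyRef can no longer be Null itself.

```scala
// Hypothetical helpers; intended to compile with or without -Yexplicit-nulls.

// The `T <: AnyRef` bound is what makes `eq` (reference identity) available on T.
def containsRef[T <: AnyRef](xs: List[T], x: T): Boolean =
  xs.exists(_ eq x)

// A lazily filled cache slot. With Null outside AnyRef, a T value is never null,
// so `null` in the field unambiguously means "not computed yet".
final class Slot[T <: AnyRef]:
  private var cached: T | Null = null

  def getOrElseUpdate(compute: => T): T =
    val c = cached
    if c != null then c.nn // `.nn` narrows T | Null to T
    else
      val v = compute
      cached = v
      v
```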

Why is that not an issue with Null <: AnyVal? Because unlike AnyRef, AnyVal is useless as an upper bound. It does not give you any power that Any does not give you already. Moreover, the performance argument of manipulating a T | Null when T is not an AnyRef does not hold, since it may require boxing. In practice, it is very unlikely that there exists a meaningful T <: AnyVal anywhere.
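As a small illustration of the "useless as an upper bound" point, assuming Scala 3: the two hypothetical helpers below have exactly the same capabilities, since only the members of Any are available on t in either body. In both, the type parameter erases to Object, so an Int argument arrives boxed.

```scala
// Hypothetical helpers. The AnyVal bound adds no operations beyond those of Any
// (==, equals, hashCode, toString, isInstanceOf, ...), unlike an AnyRef bound,
// which adds `eq`. Both type parameters erase to Object, so primitives are boxed.
def describeVal[T <: AnyVal](t: T): String = s"$t (hash ${t.hashCode})"
def describeAny[T](t: T): String = s"$t (hash ${t.hashCode})"
```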


I submitted a PR for this change here:

WDYT? Any objection?


Does null.asInstanceOf keep its magic, or should we have a compiletime.defaultOf?

I could probably answer this if I checked the PR, but is this also true for the non-explicit-nulls case?

This is, I think, where most of the friction is likely to arise. The advantage of merely pulling Null out so it’s no longer the bottom type of the AnyRef hierarchy is that it doesn’t impact the AnyVal hierarchy. This makes it much easier to go back and forth.

Now, people don’t write very much code that looks like def foo[Av <: AnyVal](av: Av) ... So maybe it’s largely irrelevant in practice; as you say, you can’t really do anything with it that you can’t do with Any. All it’s good for is ruling out AnyRef (as a type, but not as an underlying representation: you might have a value class, and if it’s not inlined, the AnyVal will be boxed anyway!).

But having a situation where null jumps back and forth between passing as an AnyVal and not adds an extra level of difficulty that you don’t have with explicit-nulls right now. So that would suggest that non-explicit-nulls mode should also have null as an AnyVal… except that, being a change of default, it is a more impactful change than flipping it over.

So I’m a little bit uneasy. Both ways seem to have the potential to grate, albeit in an area where few people tread, and so there won’t be much grating overall.

It seems to me that the slightly cleaner theoretical treatment is that Null <: AnyVal always, and the difference between -Yexplicit-nulls and not is whether or not it is also a subtype of all AnyRef types. But maybe that diamond is undesirable: unlike Nothing, which is a bottom type only because it has no instances, Null is inhabited uniquely by null, and thus would live as an instance in both hierarchies at once. I don’t know whether that would break in-practice assumptions made in the compiler.

Yes, it will keep its magic. This is more a property of how asInstanceOf works on the term null, than of how the Null type appears in the hierarchy.
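For reference, a short sketch of that existing behavior on the JVM, assuming Scala 3: casting null to a primitive type yields the zero value of that type, including through a generic helper, because unboxing a null reference produces the default. The name zeroOf is hypothetical and not a library method.

```scala
// `null.asInstanceOf[T]` yields the JVM default of T's representation:
// 0 / false / 0.0 for primitives, a null reference otherwise.
// `zeroOf` is a hypothetical helper, not an existing library method.
def zeroOf[T]: T = null.asInstanceOf[T]

val zeroInt: Int = null.asInstanceOf[Int]
val zeroBool: Boolean = null.asInstanceOf[Boolean]
val zeroDouble: Double = zeroOf[Double] // unboxing a null reference gives 0.0
```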

No. We’re not touching the hierarchy under non-explicit-nulls. That would touch the standard language, which would require a SIP on its own. That does not seem desirable.

Possibly. But is that really different from the fact that it already jumps “back and forth” between passing as an AnyRef and not?

We would probably run into a bunch of places that make that assumption, for sure. Nothing fundamental that could not be addressed, though, I believe. But as I mentioned above, I don’t think it’s desirable to alter the type hierarchy in the current, stable language.

The intent for the future is to replace the distinction between an explicit-nulls type system and a non-explicit-nulls type system with a single type system but with relaxed rules for type checking that can be enabled or disabled locally with a language import. More specifically, the idea is that -Yexplicit-nulls and unsafeNulls would both be enabled by default. (The current default is that both are disabled.)

The intent is that existing non-explicit-nulls code would compile in this new default configuration (I think we’re very close to achieving that). A benefit is that unsafeNulls can be enabled/disabled locally, unlike -Yexplicit-nulls, which fundamentally needs to be on or off globally for a whole compiler run. Another benefit is that we would no longer have two distinct type hierarchies.
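A minimal sketch of the local relaxation, assuming Scala 3 and the existing language import scala.language.unsafeNulls. System.getProperty is a Java method, so under -Yexplicit-nulls its result is typed String | Null; the object names are illustrative.

```scala
// Assumes Scala 3. Under -Yexplicit-nulls, Java method results are nullable,
// so System.getProperty returns String | Null.
object Strict:
  // Without unsafeNulls, the nullable result must be typed (or checked) as such.
  def javaVersion: String | Null = System.getProperty("java.version")

object Relaxed:
  import scala.language.unsafeNulls
  // With unsafeNulls in scope, String | Null is accepted where String is expected.
  def javaVersion: String = System.getProperty("java.version")
```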


A downside of moving Null under AnyVal is that it would make it impossible to distinguish between AnyVal and AnyVal | Null, since we can’t write a type difference in Scala’s type system. Conversely, it was the desire to distinguish between AnyRef and AnyRef | Null that motivated moving Null out of AnyRef.

How important is that downside? That depends on what AnyVal is used for and whether those use cases require excluding Null.

What is AnyVal used for in practice for which Any is not sufficient? One might say it’s to ensure something is of some primitive type, but when is that useful when you don’t know which primitive type it is? In many cases, AnyVal needs to be boxed, defeating the guarantee that it’s a primitive type.

It would be useful for this discussion to brainstorm a list of use cases for AnyVal and then examine each case for whether it needs to exclude Null.


I anticipated this in the original post. Is there anything in particular that you disagree with, regarding my analysis that AnyVal is not useful as an upper bound?

No, I don’t have any specific disagreement. I was just asking the questions to invite others to present their use cases, if they have some compelling ones.

I wonder if this could somehow also help with interop with JEP 401, since Java’s Value Classes can be nullable.

So, with this change, maybe Java’s value classes could be interpreted as a subtype of AnyVal?

I don’t think that will have an impact for that either way. String is nullable in Java, and we can interop with it despite Null not being a subtype of AnyRef anymore. Such is the power of a union type; it can be used with anything.

I think that in the kind of code where you reach for | Null to avoid the overhead of Option, you would probably cast the value to AnyRef anyway in such a generic case.

But looking to the future with Valhalla, the superclass of a value class is still Object, and it’s the non-null marker on a type that matters. Does that suggest nullness is a third category again?

As others already mentioned I think it’s important to track what the JVM does here.

I agree that null: Null can be seen as value type.

But it’s still a very special value type!

On the JVM level the distinction between, for example, Long and Long | Null is crucial (which also makes the distinction between AnyVal and AnyVal | Null crucial). The former needs only one word of storage (assuming a 64-bit VM / CPU) and can reside in a single physical register, while the latter needs at least two words and will be torn apart on the hardware level, with all the implications: an array of Long | Null is, in the ideal case, twice as large as an array of Long; there are atomicity issues when a value spans more than one register; and so on.

The end game on the JVM is to make value classes non-nullable (where possible), as that’s the only way to make them completely primitive like:

[BTW, it’s really funny to see that Java will gain the implicit keyword for implicit constructors. The stuff around implicit constructors is actually another part of the problem space: value types need a zero value, as null can’t take that place any more once you exclude null, as planned.]

Scala would likely shoot itself in the foot if we couldn’t distinguish, for example, Java’s Long and Long? from the future Long! (and the same for all the future value classes to come, like FP16! or Complex!).

So I think Null needs to stay a very special thing in the type system. It’s technically a value, but it’s definitely not a value you want as a possible inhabitant of any value type; in fact, you don’t want it there for any reasonable use case of value types!

The current state of Scala’s AnyVal is actually a massive PITA imho: It’s one of the most dangerous footguns in (current) Scala as in Scala you can’t ever be sure an Int is actually an efficient int or a fat and slow Integer. That’s a broken promise, leaky abstraction, and violating least surprise. Current custom AnyVal implementations make that issue even worse…

Instead, we should try to take AnyVal to where the JVM wants to arrive with the final form of value classes. This change would make that transition hard (if not impossible).

But to be honest, I don’t even really understand the motivation behind this idea.

The problem statement is basically only:

Null is now sitting front and center in the Scala class hierarchy. It’s painfully obvious that you have to learn about it. Given its position, clearly it is a very important type that everyone should know about! That’s not the impression we want to give.

Put bluntly: some diagram is not good for teaching.

Sure, it’s ugly. Null sticks out like a sore thumb.

But it’s technically at the right spot: Null is special, and Null is not a subtype of AnyVal, as that would have, as mentioned, some quite disastrous long-term implications.

My proposed solution would be really simple instead: just draw this diagram differently! Make the “Null box” grayed out, give it a dotted outline, and add a legend stating that it exists mostly for interop and can be ignored in most regular Scala code. (There are likely even more parts of the docs that could use the same visual treatment to mark similar cases.)

It’s not special. Rewrite your last (long) sentence and replace Null by Unit. It’s all still true!

That’s because none of that is a property of Null. It’s a property of union types, where the parts are logically distinct (we must be able to tell them apart at run-time) but their underlying bit patterns are not. In fact, it’s even more fundamental than that: it’s a basic property of information theory! You can’t fit 2^64 + 1 different values in 64 bits. It doesn’t matter whether that “+ 1” is null, (), or SENTINEL.
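The counting argument, written out (the same arithmetic holds with Unit or any other single extra value in place of Null):

```latex
\begin{aligned}
\#\,\texttt{Long} &= 2^{64} \\
\#\,(\texttt{Long} \mid \texttt{Null}) = \#\,(\texttt{Long} \mid \texttt{Unit}) &= 2^{64} + 1 \\
\lceil \log_2 (2^{64} + 1) \rceil &= 65 > 64
\end{aligned}
```

So any representation of such a union needs at least 65 bits of information, regardless of which value plays the role of the "+ 1".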

The proposed change would not alter our ability to deal with Java non-nullable types in the slightest. We’re not proposing to make Null a subtype of every subclass of AnyVal (obviously); only of AnyVal itself.

Java still does not intend to include an equivalent of AnyVal. So our AnyVal will still erase to Object, no matter what, and Object is not a value class. Quoting from JEP 401: Value Classes and Objects (Preview):

Every value class belongs to a class hierarchy with java.lang.Object at its root, just like every identity class. There is no java.lang.Value superclass of all value classes.

We will still be able to distinguish long from jl.Long? and jl.Long!. In Scala, they are respectively scala.Long, jl.Long | Null and jl.Long. Those are completely orthogonal to whether Null <: AnyVal.
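A small sketch of those three types side by side, assuming Scala 3 (today’s Java boxed Long, not a future null-restricted one). The helper unboxOr is hypothetical; the .nn call is there because Java method results are typed as nullable under -Yexplicit-nulls.

```scala
// Assumes Scala 3; `unboxOr` is a hypothetical helper.
// scala.Long erases to the primitive `long`: no null possible.
val prim: Long = 42L
// java.lang.Long is a boxed reference; under -Yexplicit-nulls it excludes null.
// Java method results are nullable under the flag, hence `.nn`.
val boxed: java.lang.Long = java.lang.Long.valueOf(42L).nn
// java.lang.Long | Null is both boxed and nullable.
val missing: java.lang.Long | Null = null

def unboxOr(x: java.lang.Long | Null, default: Long): Long =
  if x != null then x.nn.longValue else default
```

None of these three types mentions AnyVal, which is why the question of where Null sits relative to AnyVal does not affect them.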


No, really, Value classes and null-restricted types from Java are completely orthogonal to this proposal. Making Null <: AnyVal will neither make it harder nor easier in the future. In fact it won’t even change anything wrt. those future Java things.


null is a reference (it works like a reference and, e.g., takes as much space as any other reference), so making it an instance of scala.AnyVal feels weird to me. Saying null is a value is like saying any other reference is a value, which is true, since references are values and they are passed by value, but that doesn’t clear anything up w.r.t. explicit nullness.

A crazy thought: what about making scala.Null a type without any parent, not even scala.Any?

Can you give concrete examples showing what will change under -Yexplicit-nulls with your additional change to the hierarchy?


Long takes as much space as any other reference, so I don’t see how that contributes to defining what a reference is.

Besides that, you haven’t provided any supporting argument in defense of “null is a reference”. Comparatively, I believe I have provided several supporting arguments in favor of “Null is a value type” in my original post and other replies. Perhaps you would like to refute specific arguments I made, or bring specific arguments in favor of “null is a reference”?

That’s not possible. In the Scala type system, by definition scala.Any is the super type of all “proper types”, i.e., the types of terms. null is a term, so its type Null must be a subtype of scala.Any.

What you’re asking for actually exists in the status quo. It’s AnyVal | AnyRef. Today, that is the type of anything but null. With the change I propose, there will no longer be a way to express that (just like there is no way to say “anything but ()” or “anything but Boolean” in the current system). I argue in the original post that this is not actually desirable to express (as opposed to expressing “a non-null AnyRef”, which has concrete use cases).
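A minimal sketch of that status-quo type, assuming Scala 3 with -Yexplicit-nulls. The helper nonNullString is hypothetical; the comments describe which calls type-check today and what the proposal would change.

```scala
// Assumes Scala 3 with -Yexplicit-nulls. AnyVal | AnyRef covers every value
// except null, so nonNullString(null) is rejected under the flag today
// (without the flag, Null <: AnyRef and that call would compile).
// Under the proposed Null <: AnyVal, the same union would admit null again.
def nonNullString(x: AnyVal | AnyRef): String = x.toString
```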

Here as well, if you have a specific, existing use case where an upper bound of AnyVal is useful, please share it.

The following would compile:

val x: AnyVal = null

whereas it does not compile today. That’s pretty much the only thing that will change. And I believe no one has ever written non-test code with an x: AnyVal, so I believe it won’t affect non-contrived code.

Well, there is one property that I don’t think anyone likely uses, but we should be thoughtful about changing it.

Right now, if you write

def foo[A <: AnyVal](a: A): Foo[A] = ???

you don’t need to worry that a might be null. In particular, everything is boxed. You can always use the Any methods on it, like .toString, without fear. Primitives and value classes always hold an actual, non-null value; in that regard it’s like AnyRef under -Yexplicit-nulls.

But with the change, this signature, if anyone were ever to use it, suddenly becomes dangerous. Now you can call foo(null), and an innocent .toString will crash. Note that this is already the case with Any, and mostly we don’t call .toString directly but use methods that filter out null.
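A sketch of the hazard, assuming Scala 3. All names are hypothetical. render(null) does not compile today (with or without -Yexplicit-nulls), because Null is not a subtype of AnyVal; under the proposal it would compile and throw exactly like renderAny(null) does now.

```scala
// Hypothetical helpers illustrating the concern.
def render[A <: AnyVal](a: A): String = a.toString // render(null) rejected today
def renderAny[A](a: A): String = a.toString        // renderAny(null) compiles and throws

// Null-tolerant alternative: string interpolation renders null as "null".
def renderSafely[A](a: A): String = s"$a"

def throwsNpe(body: => Any): Boolean =
  try { body; false }
  catch { case _: NullPointerException => true }
```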

So, anyway, while I think in practice this is probably okay, changing AnyVal from a known-exception-free to potentially-exception-throwing object when used with AnyRef methods that are promoted onto Any is something to think carefully about.

It makes me vaguely uncomfortable, even though a lot of things have to go wrong in order to hit it: first, someone needs to have written [A <: AnyVal]; second, they have to (mistakenly?) pass null to it; third, they need to be switching to -Yexplicit-nulls; and only then would they be bitten by the formerly safe pattern that now isn’t.

(Note that if you have a value class that wraps AnyRef or a subtype, and you pass in null, the value class boxes it at least in every case I’ve thought to check. So you don’t have the same hole. Of course, this just illustrates the problem with value classes: they are supposed to not box stuff, but in practice they quite often do. Maybe there’s some way to avoid the boxing and reveal that null doesn’t have .toString.)

Anyway, if the proposal doesn’t already have it, the problem could be fixed by having a canonically boxed null value and logic to catch null whenever it might leak in so that singleton would be used instead of bare null (even though nominally a bare null fits in AnyVal as a null not as a boxed type). But it also suggests that maybe AnyVal isn’t the most natural home for Null.
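A rough sketch of the "canonically boxed null" idea as described, assuming Scala 3. Both names are hypothetical and nothing like this exists in the standard library; the reply further down argues that such a substitution would break interop, because callers must be able to pass a raw null.

```scala
// Hypothetical sentinel standing in for a boxed null: before a bare null flows
// into an AnyVal-bounded position, it would be swapped for this object, whose
// methods are safe to call.
object BoxedNull:
  override def toString: String = "null"

// Hypothetical conversion applied wherever a bare null might leak in.
def boxNull(a: Any): Any = if a == null then BoxedNull else a
```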


All these potential issues already exist with Any, and Any is used as an upper bound basically infinitely more often than AnyVal. Again, unless there’s a concrete use case for an AnyVal upper bound (or, even less likely, direct type of a term), it does not really matter, does it?

We can’t do that. It would destroy the only use cases for Null in the first place, namely a) interop with Java/JavaScript and b) expert performance-sensitive code.

Meaning that you’d like to be able to write stuff like String | (Int | Null) and ensure that the null is unboxed (despite the Int being boxed)?

Because apart from that, you could just have a rule that boxes only if the type is provably a subtype of AnyVal but not provably a subtype of Null. So Any, and therefore unbounded generics, would let null in bare, but a method with [A <: AnyVal] taking an A could still call .toString safely.

But in the safety-enhanced case, (Int | Null) boxes null (presumptively), so String | (Int | Null) ought to have a boxed Null value, which like any sentinel may be a bit slower than bare null.

And furthermore, one could argue that if it happened this way, in practice (just not in the type system) Null has special treatment w.r.t. AnyVal vs Any, which kind of makes the type system declaration that it is a subtype of AnyVal not fully honest.

All of the following expressions must produce the same result, because ascription to a supertype is observably a no-op (not so when implicit conversions are involved):

  • null: Any
  • (null: AnyVal): Any
  • (null: (Int | Null)): Any
  • (null: (String | Null)): Any.

That same result must be an unboxed null for interop reasons. If it’s not, you have no way to pass a raw null to a Java/JavaScript method that accepts any value (Object in Java).

Moreover, to preserve the Liskov Substitution Principle, null.toString(), (null: AnyVal).toString() and (null: Any).toString() must all do the same thing. Since (null: Any) must be unboxed, (null: Any).toString() must throw, and therefore so must (null: AnyVal).toString().
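The ascriptions above that type-check today can be checked directly; a sketch assuming Scala 3. (null: AnyVal) is left out because it only compiles under the proposal.

```scala
// Assumes Scala 3; intended to compile with or without -Yexplicit-nulls.
val views: List[Any] = List(
  null: Any,
  (null: (Int | Null)): Any,
  (null: (String | Null)): Any
)

// Every view is the same raw null reference...
val allRawNull = views.forall(v => v == null)

// ...so calling toString through Any throws, whatever the original ascription.
val allThrow = views.forall { v =>
  try { v.toString; false }
  catch { case _: NullPointerException => true }
}
```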

There’s not really any room for “wishes” here. It’s all dictated by fundamental principles of our type system.
