Deprecate assignment to underscore

Defaults are a little bit more complicated than many posts here imply.
They are intertwined with the uninitialized state.

Note that assignment to underscore results in the same value as an uninitialized val.

Critically, this is what is inside a fresh array

println(new Array[Int](1)(0))
println(new Array[String](1)(0))

Those two lines are analogous to printing a val of those types that is uninitialized, or a var assigned to underscore. Any plan to allow defaults that differ from uninitialized values is going to be complicated by arrays.
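A minimal sketch of that equivalence, assuming Scala 2.13 (where `= _` is accepted on fields). `Holder` and `ArrayDefaults` are hypothetical names used only for illustration:

```scala
// Hypothetical holder whose fields are left at the JVM's zeroed state.
class Holder {
  var i: Int = _      // stays 0
  var s: String = _   // stays null
}

object ArrayDefaults {
  // Compare the first slot of a fresh array with an underscore-assigned field.
  def intsMatch: Boolean = {
    val ints = new Array[Int](1)
    ints(0) == new Holder().i
  }

  def stringsMatch: Boolean = {
    val strings = new Array[String](1)
    strings(0) == new Holder().s   // == is null-safe in Scala
  }
}
```

Both `ArrayDefaults.intsMatch` and `ArrayDefaults.stringsMatch` should evaluate to true, since fresh array slots and untouched fields both hold the zeroed value.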

Valhalla decided (last I checked; I have a month of mailing list to catch up on, though) that value types will not have custom defaults, for a variety of reasons. The default value will be the same as the uninitialized value (in line with 'codes like a class, works like an int'), meaning numeric members will be 0 and references null. A class can modify its accessors to interpret that differently.

Perhaps a Date class would be encoded as a byte for the day, a byte for the month, and an int for the year. But if the default is January 1, 1970, the accessors will have to add 1, 1, and 1970 respectively.
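As a rough sketch of that accessor trick, written as an ordinary Scala 2.13 class rather than a real Valhalla value type; `EpochDate` and its field names are hypothetical:

```scala
// Hypothetical sketch: the raw fields store offsets, so the all-zero
// bit pattern handed back by the allocator decodes to January 1, 1970.
final class EpochDate {
  private var rawDay: Byte = _    // 0 means day 1
  private var rawMonth: Byte = _  // 0 means month 1
  private var rawYear: Int = _    // 0 means year 1970

  // The accessors reinterpret the zeroed state as the desired default.
  def day: Int = rawDay + 1
  def month: Int = rawMonth + 1
  def year: Int = rawYear + 1970
}
```

A freshly allocated `EpochDate` then reports day 1, month 1, year 1970 without any write to its fields.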

Any other choice would make the array allocation path quite a bit slower, and class allocation as well, as there is a desire for the uninitialized state and the default to be the same. Getting all 0's back from the allocator is cheap, as it clears them in bulk. Writing arbitrary user-defined bit patterns to fresh arrays or objects (or bits on the stack) is not going to be fast.

There was still some debate on the topic, at least as of a few weeks ago. But the relationship between user-defined defaults, uninitialized values, and array allocation is the key thing to take note of.

The Valhalla discussion around default values from a couple of months ago culminated in this:

http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-May/004228.html

That applies to language design in general and not just the VM. In particular, points 2 and 3 should resonate strongly with anyone who has dealt with messy Scala object initialization issues and had to play 'lazy val whack-a-mole' to uncover initialization order issues.

Default values that depend on other default values, where the default is not the same as the uninitialized value, can lead down a path through bugs of unusual size (BOUS's) to a pit of despair.
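To illustrate the kind of Scala initialization-order trap being alluded to (this is an example of mine, not something taken from the Valhalla thread), here is a small sketch. `Defaults` and `Config` are hypothetical names:

```scala
trait Defaults {
  val base: Int                  // abstract; supplied by the subclass
  val derived: Int = base + 1    // evaluated while base is still uninitialized
}

class Config extends Defaults {
  val base: Int = 41             // assigned only after the trait body has run
}
```

Here `new Config().derived` is 1 rather than 42, because the trait initializer reads `base` while it still holds the zeroed value. Making `derived` a lazy val is the usual workaround, which is where the whack-a-mole begins.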

I would keep the assignment to underscore unless we had even simpler syntax for 'uninitialized var'. Honestly, the only time I think it is needed is when declaring a var. Maybe we can allow var x: Int instead of saying it's abstract, to make it the same as if it was assigned _ and a default value. That does cause some inconsistencies with val, however. I'm not sure we need the case of re-assigning the default later, or at least that is significantly rarer, and null.asInstanceOf[A] is good enough (though that might be slower).

Just to make this as clear as possible:

class A[T] { var x: T = _ }

is equivalent to

class A[T] { var x: T }

which is equivalent to

class A[T] { var x: T = null }

No, there are differences.

class A[T] { var x: T }

declares an abstract field. So it doesn't compile because A needs to be abstract.

class A[T] { var x: T = null }

doesn't compile either, because null has type Null, which is not a subtype of T. You could rewrite it as

class A[T] { var x: T = null.asInstanceOf[T] }

which would then almost be equivalent to var x: T = _, except that there is a reassignment to null after the call to the super constructor, whereas there is no such reassignment in var x: T = _. This is observable in contrived examples:

abstract class Base {
  init()
  def init(): Unit
}

class A[T](someT: T) extends Base {
  var x: T = _
  val y: T = x

  def init(): Unit = x = someT
}

val a = new A[String]("foo")
println(a.y)

This displays foo. But if you use

var x: T = null.asInstanceOf[T]

it will display null.

Maybe it should be explained in https://docs.scala-lang.org/tour/classes.html.
I think that we need a separate bold section for it :slight_smile:

I remember that when I started working with Scala, Google told me to write something like

class A[T >: Null](someT: T)  {
  var x: T = null
  val y: T = x

  def init(): Unit = x = someT
}

In the end, I lost about two days finding and fixing that mistake in my final class hierarchy.
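A sketch of how that can go wrong, assuming Scala 2.13 and assuming the class sits under a superclass that calls init() from its constructor, like the Base class earlier in the thread. `WithNull` and `WithUnderscore` are hypothetical names:

```scala
abstract class Base {
  init()                      // runs before the subclass field initializers
  def init(): Unit
}

class WithNull[T >: Null](someT: T) extends Base {
  var x: T = null             // explicit write: overwrites what init() stored
  val y: T = x
  def init(): Unit = x = someT
}

class WithUnderscore[T](someT: T) extends Base {
  var x: T = _                // no write emitted: the value from init() survives
  val y: T = x
  def init(): Unit = x = someT
}
```

With these definitions, `new WithNull[String]("foo").y` is null while `new WithUnderscore[String]("foo").y` is "foo", which matches the behaviour described for null.asInstanceOf[T] above.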

Yes, I have understood all these differences, but …, …, … :slight_smile:

Somewhat.

What I'm proposing is that one could make var x: Int be interpreted as the existing var x: Int = _ for concrete classes and local variables. This is a bit odd, since the former currently means that it is abstract, and it would be inconsistent with other places where there is no = and no RHS (except for procedure syntax). It also might not parse well and might cause ambiguities that require inserting a semicolon ( var x: Int; ), which I'm sure will be despised if true.

Assignment to underscore is identical to 'uninitialized' in this case, and would have the same bytecode: none!

var x: Int = null.asInstanceOf[Int] does not produce the same bytecode: there is both an unboxing of null to Int and a write of the default value to the variable, and this will have performance and bytecode size consequences. Some of those consequences would be significant: look at the standard library and collections today, and how frequently a var + loop is used for performance reasons. If all of these gain useless bytecode and ceremony to use a private temporary var, performance will suffer.
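For anyone who wants to compare the two forms, a minimal pair, assuming Scala 2.13. `ImplicitDefault` and `ExplicitDefault` are hypothetical names:

```scala
// Field left at the JVM default; no initializing write is expected.
class ImplicitDefault {
  var x: Int = _
}

// Field explicitly initialized from an unboxed null, which yields 0.
class ExplicitDefault {
  var x: Int = null.asInstanceOf[Int]
}
```

Both fields read as 0, so the difference is not observable from the value alone. Compiling the two classes and inspecting the constructors with `javap -c` is one way to check what each form actually emits on a given compiler version.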

For cases where the type is known and 0 or null can be assigned, the JVM / JIT can usually optimize away the extra assignment, but this is not always the case (JS?) and the extra bytecode does matter.

My proposal goes against the intent of the original proposal, which has some examples where the default (null) can lead to bugs such as var perhaps: Option[String] = _. To me those things look like tasks for a linter. I think Scala needs the ability to have an uninitialized var for performance reasons.
If the primary desire is to change up syntax though, to remove one more use of _, then my proposal here can do that, although only by introducing a different quirk for var.
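A small sketch of the Option pitfall mentioned above, assuming Scala 2.13. `Settings` and `OptionPitfall` are hypothetical names:

```scala
import scala.util.Try

class Settings {
  var perhaps: Option[String] = _   // null, not None
}

object OptionPitfall {
  // True: the field holds a null reference rather than an Option value.
  def isNull: Boolean = new Settings().perhaps == null

  // Calling any Option method on it throws a NullPointerException,
  // which Try captures as a Failure.
  def callFails: Boolean = Try(new Settings().perhaps.isEmpty).isFailure
}
```

Both methods return true, which is the sort of surprise a linter rule could flag without removing the feature.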

Your proposal no longer allows declaring a concrete uninitialized var in an abstract class.
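To make the objection concrete, here is how the two declarations coexist today in an abstract class, assuming Scala 2.13. `Shape` and `Circle` are hypothetical names:

```scala
abstract class Shape {
  var cached: Double = _   // concrete: a real field, starting at 0.0
  var name: String         // abstract: subclasses must provide it
}

class Circle extends Shape {
  var name: String = "circle"
}
```

If var x: T without an initializer were reinterpreted as uninitialized, the declaration of `name` could no longer be told apart from that of `cached` inside an abstract class.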