Pre-SIP: Unboxed wrapper types

What happens if you use the type C | U (or C ∨ U in shapeless)? Double | Logarithm seems like it could be a reasonable type for a representation of a floating point number, but you wouldn’t be able to distinguish them at runtime.

I don’t think that representation works here because we’d like to hide the underlying methods and operations by default. By contrast, the Double with Tag encoding specifically ensures that you can still treat a tagged value as a raw Double (and makes it easy to lose the tag too). For example:

scala> trait Tag
defined trait Tag

scala> type Tagged = Double with Tag
defined type alias Tagged

scala> val x: Tagged = 123.456.asInstanceOf[Tagged]
x: Tagged = 123.456

scala> math.log(x)
res0: Double = 4.815884817283264

scala> x + 1.0
res1: Double = 124.456

In some situations this might be the desired behavior, but in many cases (for example, my Logarithm example) we definitely don’t want to treat that value as a normal Double, and we definitely don’t want unexposed methods (like -) to treat the value as a normal Double.
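For contrast, here is a minimal sketch of the abstract-type ("module") encoding, in which Double's own methods are hidden by default. The names LogarithmSketch, LogarithmModule, toDouble and times are hypothetical and chosen only for illustration; the sketch is meant to compile on Scala 2.13 or Scala 3.

```scala
object LogarithmSketch {
  // Signature: the representation of Logarithm is hidden from callers.
  trait LogarithmModule {
    type Logarithm
    def apply(d: Double): Logarithm                    // wrap: stores log(d)
    def toDouble(l: Logarithm): Double                 // unwrap: exp of the stored value
    def times(a: Logarithm, b: Logarithm): Logarithm   // multiply in log space
  }

  // Structure: here, and only here, Logarithm = Double.
  val Logarithm: LogarithmModule = new LogarithmModule {
    type Logarithm = Double
    def apply(d: Double): Logarithm = math.log(d)
    def toDouble(l: Logarithm): Double = math.exp(l)
    def times(a: Logarithm, b: Logarithm): Logarithm = a + b
  }

  val x: Logarithm.Logarithm = Logarithm(2.0)
  val y: Logarithm.Logarithm = Logarithm.times(x, Logarithm(4.0))

  // x + 1.0      // does not compile: Double's + is not visible on Logarithm
  // math.log(x)  // does not compile: a Logarithm is not a Double
}
```

Unlike the Double with Tag encoding, nothing here lets a caller drop back to raw Double arithmetic by accident; the only way out is the explicit toDouble.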

The rules for compile errors and warnings already differ between pattern matching and is/asInstanceOf. Pattern matching, as a “common” feature, is stricter about certain things. is/asInstanceOf are more lenient, as they are intended to be more low-level. Since there is precedent for that, it would make sense for pattern matching to be stricter about matching a Logarithm when the expression being matched is a Double. In fact, I suspect that without doing anything special in the implementation of newtypes, we would get that for free.

I agree. And as mentioned by @sjrd, Scala can already warn against type patterns that won’t have a proper runtime check. For example try writing List[Any]() match { case ls: List[String] => }.

<console>:12: warning: non-variable type argument String in type pattern List[String] (the underlying of List[String]) is unchecked since it is eliminated by erasure
       List[Any]() match { case ls: List[String] => }

A similar thing happens for the usual newtype encoding (with an abstract type T as mentioned previously). For example, in 42 match { case _: Label => }.

<console>:15: error: scrutinee is incompatible with pattern type;
 found   : LabelImpl.Label
 required: Int
       42 match { case _: Label => }

I also agree that the behavior of asInstanceOf being potentially surprising is not a real problem here. It’s already surprising for a number of reasons, but it is really a low-level building block anyway. It is not supposed to be used much (I think it is often considered bad style) except when implementing internal low-level mechanisms, where the author will likely be an expert and know about the precise semantics.

Also, the article on which this proposal is based is written around the premise that this encoding is better for writing parametric code. Using isInstanceOf and asInstanceOf is more or less the opposite of writing good parametric code, so the user shouldn’t be surprised that using those features together will not work out well.

@non I’m glad to see this being considered with the concerns of type-safe programmers at the forefront, rather than the concerns of stringly-typed programming.

I think an “associated function” mechanism, like the ‘newtype instance methods’ included in the current draft, is a great feature to include. But for true “zero-cost” newtyping, I think a more powerful one is called for.

For the sake of argument, let’s assume that given

sealed abstract class =:=[A, B]
final case class Refl[A]() extends (A =:= A)
sealed abstract class <:<[-A, +B]
final case class SRefl[A]() extends (A <:< A)

case Refl() reveals the A = B equality in its consequent body, and case SRefl() reveals the A <: B relation in its consequent body. (We must currently simulate this with subst.)
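As a minimal sketch of what simulating this with subst can look like, the following defines a hypothetical witness type Is[A, B] (not the standard library's =:=) whose only operation rewrites A to B under an arbitrary type constructor. The names LeibnizSketch, Is, coerceList and coerce are illustrative assumptions; the sketch is meant to compile on Scala 2.13 or Scala 3.

```scala
object LeibnizSketch {
  // A value of Is[A, B] proves A = B by being able to rewrite
  // A to B inside any type constructor F.
  sealed abstract class Is[A, B] {
    def subst[F[_]](fa: F[A]): F[B]
  }

  object Is {
    // The only way to build a witness: A is equal to itself.
    def refl[A]: Is[A, A] = new Is[A, A] {
      def subst[F[_]](fa: F[A]): F[A] = fa
    }
  }

  // Lift the equality through List without touching the elements.
  def coerceList[A, B](ev: Is[A, B], xs: List[A]): List[B] =
    ev.subst[List](xs)

  // Lift the equality through the identity type constructor.
  def coerce[A, B](ev: Is[A, B], a: A): B = {
    type Id[X] = X
    ev.subst[Id](a)
  }
}
```

Because subst returns its argument unchanged, coerceList hands back the very same list instance, which is the constant-time behavior the GADT-style match would give directly.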

Now consider a Set module in the spirit of @chrisokasaki’s Typelevel post.

trait SetModule {
  type T
  type E

  def wrap(underlying: Set[E]): T
  def unwrap(t: T): Set[E]
  def union2(l: T, r: T): T

  def unionN(xs: Set[T]): T
  def subsets(t: T): Set[T]
  def reveal: (T =:= Set[E])
}

object SetModule {
  val Module: SetModule = ... // elided
}

This provides a 100% zero-cost arrangement, assuming sufficient inlining of the functions in the module. That’s because the type equality T = Set[E] is visible to both the functions defined directly within the module, and any “extension” functions you might want to add, by means of

SetModule.Module.reveal match {
  case Refl() =>
    // here T = Set[E] is visible, so I can use normal Set[E]
    // signatures with Ts, and vice versa
}

By contrast, the style of single-receiver method definition is only well-suited to defining functions like wrap, unwrap, and to a slightly lesser extent union2—the “other” argument must be unwrapped and the result rewrapped, so it is not quite as neat as the “opaque type alias”-style definition.

By contrast, consider unionN. For the module, you can simply supply any existing function that conforms to Set[Set[E]] => Set[E]. The instance-method style doesn’t have it so easy: assuming you can sensibly choose one of the Ts to act as receiver (assuming there is one to choose at all), the tail Set[T] must still be manually unwrapped, in linear time, even in the body of a T method, in order to use an existing definition of unionN.

The same goes for return values, such as the case of subsets. No linear wrapping step is needed for the module version, but one is needed for an instance-method-style approach.
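To make the unionN and subsets points concrete, here is a minimal sketch of one possible module instance. The instance IntSets and the choice E = Int are hypothetical, and reveal and union2 are left out to keep it short; inside the instance T = Set[Int], so existing Set functions are reused with no wrapping step. It is meant to compile on Scala 2.13 or Scala 3.

```scala
object SetModuleSketch {
  // Cut-down version of the signature: T is abstract to callers.
  trait SetModule {
    type T
    type E
    def wrap(underlying: Set[E]): T
    def unwrap(t: T): Set[E]
    def unionN(xs: Set[T]): T
    def subsets(t: T): Set[T]
  }

  // Inside this instance the equality T = Set[Int] is visible.
  val IntSets: SetModule { type E = Int } = new SetModule {
    type T = Set[Int]
    type E = Int
    def wrap(underlying: Set[Int]): T = underlying
    def unwrap(t: T): Set[Int] = t
    // An existing Set[Set[Int]] => Set[Int] function, used as-is.
    def unionN(xs: Set[T]): T = xs.flatten
    // The result is already a Set[T]; no per-element rewrapping.
    def subsets(t: T): Set[T] = t.subsets().toSet
  }

  val a: IntSets.T = IntSets.wrap(Set(1, 2))
  val b: IntSets.T = IntSets.wrap(Set(2, 3))
  val merged: Set[Int] = IntSets.unwrap(IntSets.unionN(Set(a, b)))
}
```

Outside the instance, a and b are only known to be IntSets.T, so callers have to go through wrap and unwrap.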

And that assumes mapping is even possible. With a full type equality visible, even an invariant, unmappable F[Set[E]] simply becomes an F[T] as requested—but only as requested.

reveal extends that equality to the public module interface; for many newtype use cases, it is probably “so open the abstraction falls out”, and a weaker property would probably be desirable, like T <:< Set[E]. This too might be revealed globally as an upper bound (i.e. newts’s @translucent), or constrained to pattern-matching blocks that explicitly ask for the relationship via case SRefl(), depending on how visible you want the relationship.

newts gives you the two choices (1) totally opaque or (2) globally visible supertype; Flow opaque type aliases offer the same two choices. GADTs/subst give us more possibilities via reveal, but I don’t know what the best way to provide those options would be.

In summary, I would like to see these aspects considered, which are not well served by an instance-method-like approach.

  1. The definition of functions alongside the type definition that do not fit the “single receiver” pattern; the closed set; and
  2. Whether there should be something like reveal to define further functions not alongside the type definition, yet knowing something about the representation, and how fine-grained the choices should be; the open set.

You might say these are well-served by the already-supported “module definition style” (notwithstanding #10283 and the like), and that anyone who wants these features should use these existing—if quite obscure—Scala features, but it would be really nice to not need to reach for them often in the future.

Hello everyone,

Erik and I have been changing the proposal based on feedback from the Committee. Our final proposal is now live in the official SIP list and can be read here. We hope that you like this new proposal.

We’re discussing this proposal in today’s SIP meeting.

I watched the discussion of SIP-ZZ Opaque Types this morning EDT; great stuff. Thanks for the email-friendly update too, @heathermiller.

The current draft is really nice and looks useful as well; thanks @non, @jvican, et al. Extending 3.5.1 Equivalence in the “companion knows” sense is a neat way to provide the code-that-knows block in a truly zero-cost way.

Some thoughts below on a few points that came up in the discussion, namely

  1. implementation via implicits (scalac) vs GADTs (dotty)
  2. companions restricted to opaque aliases
  3. implicit copying
  4. optional bounds
  5. multiple opaque types
  6. motivation: boxed Objects, unboxed functions

Implementation via implicits or GADTs

Under the stated equivalence rule, functions like these can be defined in e.g. opaquetypes.Logarithm.

import collection.mutable.{Set => MSet}
def wrapSet(s: MSet[Double]): MSet[Logarithm] =
  s

def mdl(m: Map[Double, Logarithm]): Map[Logarithm, Double] = m

// lest you still try mapping
def wrapFoo(foo: Foo[Double]): Foo[Logarithm] = foo
// where Foo is a user-defined trait

// just to make it really impossible
def wrapSomeMap[M[K, V] <: Map[K, V]](
    m: M[Double, Logarithm]): M[Logarithm, Double] =
  m

Some definition like this might be worth including in the example. This is great because it implies you can provide a =:= or <:< (or any equivalent subst carrier) to expose as much of the equality as you want.

As such, I don’t think an implicit conversion story will work, because a pair of Double => Logarithm, Logarithm => Double conversions won’t lift into the type constructors above.

I think there’s a happier story here in the area of GADTs as @odersky mentioned with respect to Dotty. This should work because the type equality being locally visible is just how the handwritten module implementations of “High Cost…” handle all these, without resorting to implicits.

This is also a happier story because when you use the “translucent” style (i.e. upper-bounded with same type you intend to set the opaque type equal to), all the extra asInstanceOfs go away, because the erasure of the “methods that know”, and indeed all methods that use the translucent type, match the unwrapped usage exactly. (The fully-opaque style has casts similar to what you get for generics; see “Erasure” below.) The big value-add of the Scala feature here is separating the erasure choice from the visible upper bound.

As an aside, it would be nice if GADT-style extraction of the equality/conformance from =:= and <:< worked in Scala; if this happened as a side-effect of making the equality work for opaque type companions, I wouldn’t look the gift horse… :slight_smile:

Companions restricted to opaque aliases

This came up a couple times; I think the “only new types get companions” rationale works well here as a way to explain the distinction to users.

There are extra reasons that companions for proper aliases would be confusing. Take this opaque type:

opaque type Moo[Bar >: LB <: UB, Baz] = Baz
  // where LB, UB are bound type names

The story for implicit search related to this opaque type is pretty simple, because it’s just like the one for classes. Look at Moo, Bar, and Baz companions, walking up in the ordinary way.

But Moo and Bar will not be searched for a proper alias, as implied by the SIP under “Type companions”. After all, Moo[ClassLoader, Baz] = Baz, no matter where Baz occurs, wherever Moo is visible.

It seems to me that a user who wants their type alias to behave “equivalent-but-not-quite” should really be reaching for an opaque or translucent type under this SIP, or any abstract-type mechanism, anyway.

Implicit copying

You can define an Ordering[Logarithm] in Logarithm's companion as follows.

implicit def ordInstance: Ordering[Logarithm] =
  Ordering.Double

Of course, you’d rather just use implicits to find the instance, so you rewrite to

implicit def ordInstance: Ordering[Logarithm] =
  implicitly[Ordering[Double]]

Of course, ordInstance: Ordering[Double] in this context, so you diverge. (This is one of the rare things that’s easier to get right with subst than with a visible type equality.)

“Wrong” return type inferred

Suppose you define

def add(l: Logarithm, r: Logarithm) = l + r

The most natural inferred return type is Double, which is fine for proper aliases (where callers see Logarithm=Double anyway), but is probably wrong for the intent of the programmer here. (It’s pretty similar to the problem you get if you try to write GADT code without declaring an expected type for the output of your match.) I don’t have anything better than suggesting that inferred return types in opaque types’ companions ought to be warned about, but even that might have too many false negatives.

This problem doesn’t appear in the “High Cost” style because you’re always forced to declare all your full method signatures in a context where the opaque type doesn’t equal the expansion, i.e. the ML-style “signature” trait.
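The inferred-return-type pitfall can be sketched with the opaque type syntax that later shipped in Scala 3. That is an assumption relative to this draft: in Scala 3 the alias is transparent throughout the enclosing object rather than only in the companion. The names LogSketch, addLeaky and toDouble are hypothetical.

```scala
object LogSketch {
  opaque type Logarithm = Double

  object Logarithm {
    def apply(d: Double): Logarithm = math.log(d)
    def toDouble(l: Logarithm): Double = math.exp(l)

    // No declared result type: it is inferred from `l + r`, which is
    // plain Double arithmetic here, so callers see a Double.
    def addLeaky(l: Logarithm, r: Logarithm) = l + r

    // Declaring the result type keeps the abstraction intact for callers.
    def add(l: Logarithm, r: Logarithm): Logarithm = l + r
  }
}
```

Both methods run the same code; only the signature that callers see differs, which is why a warning on undeclared result types is the most that seems possible here.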

Optional bounds

I think that some people in the call thought this was addressed somehow, but I wanted some clarity on how you might add public bounds.

For reference, this is the “translucent” flavor of @alexknvl’s newts, and is also an option with Flow’s opaque type aliases. It’s satisfied in “High Cost” style by putting <: and/or >: on the publicly-visible abstract type member declaration. (I haven’t seen another system try to offer >: as well, but several use cases come to mind.)

This might just be a syntactic difficulty; it’s obvious how to do it in “High Cost” style once you understand the signature/structure separation, but putting both = and <: on an opaque type would be…weird, to say the least.

Flow supports this like so

opaque type Foo: VisibleUpperBound = PrivateUnderlyingType

Multiple opaque types

While discussing type companions, one possibility came up: using the containing block as “code-that-knows”, and relying on the parent path of the opaque type for finding associated implicits (but see scala/bug#10283, courtesy of @Atry; thanks for tackling it, @TomasMikula). This is intriguing for a few reasons:

  1. it makes “code that knows multiple opaque types” straightforward, which it is in “High Cost” style but is not possible without GADT-=:= in the current SIP;
  2. “code-that-knows” is alongside the opaque type, similar to “High Cost” style
  3. type companions may be eliminated, supposing #10283 is fixed.

So you write, say

object mod {
  opaque type Foo = String
  opaque type Bar = ClassLoader

  def fb(m: Map[Foo, Bar]): Map[String, ClassLoader] =
    // m: Map[String, ClassLoader] here,
    // because both type equalities are
    // visible
    m
}

One big question would be “what happens when you subclass a class with an opaque type [member]?”

Having multiple opaque types is also supported in Flow, which gives modules a similar structure.

opaque type Foo = string
opaque type Bar = "classloader"

function fb(m: ((f: Foo) => Bar)[]): ((f: string) => "classloader")[] {
    return m;
}

Motivation: boxed Objects, unboxed functions

The example focuses on Double, but it might be worth mentioning that AnyVal subclasses have the described boxing behavior even when they wrap non-primitive types (i.e. those that wouldn’t box at generic boundaries).

Since motivation came up in the meeting, avoiding boxing for performance was emphasized, but the convenience of the type equality is also a big deal, I think. That is, you can recycle functions, typeclass instances, et al without adding any mapping layer.

Erasure

Aside from equivalence, 3.7 Type Erasure should also be extended by the SIP.

  • The erasure of an opaque type is the erasure of its right-hand side.

This makes it clear where it stands erasure-wise between alias types and abstract types (bullets 1 and 2 respectively).

Thanks @S11001001 for the insightful comments.

For the record, I’d like to publicly thank you for your blog post. You did a really good job at describing the problems with value classes, and your analysis motivated me to kick start the proposal with Erik. I’ll add this acknowledgement to our current proposal.

I didn’t want to add lengthy examples, to keep the code snippets easy to read, but I’ll add wrapFoo and mdl. They illustrate what we meant by:

Note that the rationale for this encoding is to allow users to convert between the opaque type and the underlying type in constant time. Users have to be able to tag complex type structures without having to reallocate, iterate over or inspect them.

If the type equivalence between the dealiased type and the opaque type definition is described by the user, the implicit conversions will not be synthesized, as explained in the proposal. @non has also mentioned the possibility of user-defined =:= et al instances in his last commit here.

I will explore this idea, will talk to the Scala team about it, and see if this can fit in the implementation without being part of the proposal. The proposal is already ambitious as it is :wink:.

Indeed, it may be a problem to typecheck wrapSomeMap only with those private synthetic implicit conversions in scope. I’ll have a closer look and try to find a way to make it typecheck.

Interesting observation, but I think this will not be a problem in the Scalac implementation. When triggering the implicit search for Ordering[Logarithm] inside the opaque type companion, typer doesn’t know yet that Logarithm =:= Double, so it will look for an instance of that implicit, it will fail, and it will try to apply the implicit def conversion from Logarithm => Double. The result of this last search will be Ordering[Double], taken from scala.Predef. In Dotty, this could be a problem if the first step of the implicit search sees Ordering[Double] =:= Ordering[Logarithm].

Can’t this happen in other cases as well, specifically when relying on + being provided by an implicit?

In my opinion, if someone writes a public definition without a type, they’re looking for trouble. I strongly discourage it. The case you point out cannot be addressed in a principled way, or at least I don’t see how we could.

What I would propose is that we have enabled-by-default warnings for users who define public methods in opaque type companions without an explicit return type. I believe this warning could be given a bigger scope, too, and warn about these cases all over your program.

The idea is that those users that want to specify upper and lower bounds are forced to define a type member in a trait:

trait T {
  type OT <: Any
}

and then implement it:

object T extends T {
  opaque type OT = String
}

just as you would with type aliases.
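The same signature/implementation split can be sketched today with a plain abstract type member standing in for the opaque type. Uid, UidModule and BoundsSketch are hypothetical names; the public upper bound String gives the “translucent” flavor discussed above. The sketch is meant to compile on Scala 2.13 or Scala 3.

```scala
object BoundsSketch {
  // Signature: callers learn only that Uid <: String.
  trait UidModule {
    type Uid <: String
    def apply(s: String): Uid
  }

  // Implementation: the equality Uid = String is known only here,
  // because the value is exposed at the trait's type.
  val Uid: UidModule = new UidModule {
    type Uid = String
    def apply(s: String): Uid = s
  }

  val u: Uid.Uid = Uid("u-42")
  val n: Int = u.length          // allowed: the upper bound is public

  // val bad: Uid.Uid = "raw"    // does not compile: a String is not a Uid
}
```

Replacing the bound with type Uid <: Any (or no bound at all) gives the fully opaque flavor with the same structure.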

Opaque types need to be defined inside an entity after all, so the overhead of adding this type member in a trait is minimal. Would this cover all the scenarios you’d like to use upper and lower bounds on?

Interesting, this is the first time I’ve heard about Flow.

We can certainly consider doing so, but I’m not sold on its utility. One of the things I like the most about opaque types is that they have the same syntax (semantic-wise) as type aliases, and don’t require explicit type ascriptions. If we add this, we’re creating a new mental model of opaque types, and users need to learn it. The fewer rules, the better.

As I explained in the meeting, this has several problems:

  • APIs of different opaque types get mixed, hampering readability of the code.
  • Users cannot define a method tag for two different opaque types that have the same underlying type. The same happens with implicits.
  • Use sites of these opaque types do not know where these methods are defined. It’s way clearer to see Logarithm.tag than tag somewhere in your program.

I don’t like the idea of defining multiple opaque types in the same prefix. I’m personally in favor of opaque type companions, and I think companions are a natural way of thinking about Scala code. Its addition does not add overhead to the language; instead, it creates a more consistent language that converges towards common and widespread language features.

As @adriaanm mentioned in the meeting, a non-negligible part of Scala developers, especially beginners, already think that an object with the same name as a type alias is a companion.

I haven’t given this too much thought, but it will inherit it. If you want to override it, you also can. This is consistent with the behaviour of type aliases.

Yes, and this needs to be made clearer in the proposal. @xeno-by and @dragos pointed it out in an email before the meeting. The golden rule of opaque types is: the runtime will box/unbox whenever the underlying type needs to. Hence, they do not add extra boxing.

Despite boxing for AnyVal instances, note that primitive boxing is cheaper than what AnyVal does, and therefore faster.

For example, let’s take the Logarithm example from the proposal and inspect its bytecode. In the value class example, the compiler triggers the instantiation (via new) of every logarithm in the following expression: val xs = List(Logarithm(12345.0), Logarithm(67890.0)).map(_ + x). This is not the same bytecode as for opaque types, which use scala.Predef.double2Double to convert scala.Double to java.lang.Double, and whose implementation is just a cast (d: scala.Double).asInstanceOf[java.lang.Double]. This cast is cheaper for the runtime than the new instantiation because:

  • It is intrinsified and it’s a fundamental mechanism of the JVM.
  • It doesn’t have to go through the initializers of the value class and the extended classes (traits).
  • When you instantiate a new object, you waste a lot of memory for object headers, fields, metadata, etc. I haven’t checked yet, but my guess is that java.lang.Object is optimized to avoid all this waste, therefore being easier on memory consumption.

Opaque types have more non-obvious advantages over value classes, if we follow the reasoning of the golden rule for opaque types. If we compile val xs = List(Logarithm(12345.0), Logarithm(67890.0)).map(_ + x) with an Array instead of a List, we have zero boxing/unboxing because arrays are specialized.

Opaque types do not solve the problem of boxing/unboxing (this is a problem of the runtime), but they are a mechanism for adding wrapper types while avoiding any extra overhead that would not be incurred had the underlying type been used.

Thanks, I’ll add this! I mentioned this in the meeting, but I forgot to make it explicit.

Instead of adding a keyword opaque, an alternative could be to combine type bounds with type definition:

// An opaque type:
type A <: Any = Double
// A translucent type:
type A <: Double = Double

This perfectly mirrors the way one can define terms of a certain type but annotate the definition with a wider type, as in val a: AnyVal = 1.0.

Yet another possibility, inspired by constructors with restricted visibility this time:

type A <: Double private >: Double
// in principle (but not practice?), A >: Double <: Double is equivalent to A = Double

The advantage now would be that one could control the visibility in a fine-grained way using traditional Scala mechanisms: private only visible in the companion, protected visible by subclasses, private[foo.bar] for a package, etc.

I just opened a new conversation to add a parametric top type, which is relevant for the discussion here. I have already discussed this with @jvican and hashed out with him some of the details of how that would relate to value classes.

The gist is, I was quite in favor of the proposal yesterday, but now think we have found something better.

[bikeshedding] Another alternative syntax that doesn’t require a new keyword:

package object opaquetypes {
  type Logarithm = private Double
}

I feel like new type Logarithm doesn’t fully correspond to Haskell’s newtype, since newtype in Haskell is not opaque.

Using the private keyword would also be somewhat consistent with private inheritance (as in C++) if Scala ever gets it:

class Foo extends private Utils

How about this?

private[packagename] trait InternalUtils extends Utils

class Foo extends InternalUtils

Please, let’s not discuss syntax until we’re done with the semantics. I’d like to keep this discussion focused and technical, for now.

One of the benefits (from a programming perspective) of using AnyVal for wrapper types is that the code is very compact. For example, suppose one is wrapping a user ID:

case class UID(value: String) extends AnyVal

That single line succinctly gives you an apply(String) method to wrap values, and a value method to unwrap them. The equivalent code for opaque types is substantially more verbose.

opaque type UID = String

object UID {
  def apply(s: String): UID = s

  implicit class Ops(val self: UID) extends AnyVal {
    def value: String = self
  }
}

If creating several wrapper types (e.g. for half a dozen types in a user record), opaque types add significant source code burden.

Is this use case sufficiently important to warrant its own syntax/syntax extension (for example, opaque case type Foo = Bar)? (not trying to start a discussion about possible syntax; just want to discuss the possibility of having some syntax)

(EDIT: I had misunderstood the visibility rules that value classes are allowed to use, so these examples will work with value classes, modulo some potential boxing in some situations. See ghik’s reply.)

One interesting point here is that the design of value classes is such that you are required to have unconditional public wrappers and unwrappers. (This is because the internal implementation’s extension$ methods require third parties to be able to wrap/unwrap the types.)

By contrast, the proposal here gives the user control of when (or if) wrapping and unwrapping is possible. Consider cases where we only want to allow certain values to be wrapped:

opaque type PositiveLong = Long

object PositiveLong {
  def apply(n: Long): Option[PositiveLong] =
    if (n > 0L) Some(n) else None

  implicit class Ops(val self: PositiveLong) extends AnyVal {
    def asLong: Long = self
  }
}

Relatedly, we might choose to use Int to encode an enumeration or flags, but want to ensure users can only use a small selection of actual values (to prevent users from wrapping arbitrary values):

opaque type Mode = Int

object Mode {
  val NoAccess: Mode = 0
  val Read: Mode = 1
  val Write: Mode = 2
  val ReadWrite: Mode = 3

  implicit class Ops(val self: Mode) extends AnyVal {
    def isReadable: Boolean = (self & 1) == 1
    def isWritable: Boolean = (self & 2) == 2
    def |(that: Mode): Mode = self | that
    def &(that: Mode): Mode = self & that
  }
}
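For a sense of how the Mode example reads at the call site, here is a hedged sketch using the syntax that later shipped in Scala 3, with extension methods standing in for the implicit Ops class. Both of those are assumptions relative to this draft, and ModeSketch is a hypothetical name.

```scala
object ModeSketch {
  opaque type Mode = Int

  object Mode {
    val NoAccess: Mode = 0
    val Read: Mode = 1
    val Write: Mode = 2
    val ReadWrite: Mode = 3
  }

  extension (self: Mode) {
    def isReadable: Boolean = (self & 1) == 1
    def isWritable: Boolean = (self & 2) == 2
    // Inside ModeSketch, Mode = Int is visible, so this `|` is Int's
    // own operator rather than a recursive call to the extension.
    def |(that: Mode): Mode = self | that
  }
}

object ModeUsage {
  import ModeSketch.*

  val rw: Mode = Mode.Read | Mode.Write   // uses the extension method
  val readable: Boolean = rw.isReadable

  // val forged: Mode = 7   // does not compile here: an Int is not a Mode
}
```

Since no apply is exposed, the four predefined values and their combinations are the only Modes a caller can obtain.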

Finally, we might not want users to be able to unconditionally extract back to the underlying values. In this case, we can restrict access to code in the db package:

package object db {
  opaque type UserId = Long

  object UserId {
    def apply(n: Long): UserId = n
    private[db] def unwrap(u: UserId): Long = u
  }

  def lookupUser(db: DB, u: UserId): Option[User] = ...
}

These are all interesting use cases that are not possible (except by convention) with value classes. I agree that the enrichment is a little bit cumbersome (which is why the first version of our proposal included it) but on balance I think the added flexibility and power of opaque types is worth a bit of verbosity for enrichment. In the future, if we improve the story with value classes and extension methods, opaque types will be able to reap the benefits.

What do you mean? I thought value classes can have private constructor and wrapped member, e.g.

class Mode private(private val raw: Int) extends AnyVal
object Mode {
  val NoAccess = new Mode(0)
  val Read = new Mode(1)
  val Write = new Mode(2)
  val ReadWrite = new Mode(3)

  def apply(raw: Int): Option[Mode] =
    if(raw >= 0 && raw <= 3) Some(new Mode(raw)) else None
}

Note that in your example, there is no way to make an unboxed Mode. To return Option[Mode] you must box, no?

I could have also done this:

def apply(raw: Int): Mode = {
  require(raw >= 0 && raw <= 3)
  new Mode(raw)
}

which incurs no boxing, but is less typesafe.

I am totally in favour of the opaque type proposal and I fully understand its superiority to value classes in terms of performance. I simply didn’t understand @non’s argument about “unconditional public wrappers” in value classes.

I am very much behind opaque types, and I don’t actually think they add too much verbosity in most situations. However, for basic wrappers, they do.

Let me motivate my concern with the following example:

case class BrittleUser(id: Long, firstName: String, lastName: String, email: String)

case class User(id: User.Id, firstName: User.FirstName, lastName: User.LastName, email: User.Email)

object User {
  case class Id(value: Long) extends AnyVal
  case class FirstName(value: String) extends AnyVal
  case class LastName(value: String) extends AnyVal
  case class Email(value: String) extends AnyVal
}

By using a few short wrapper types, you get type safety, preventing you from getting the order of the fields wrong. However, to accomplish the same with opaque types is… a little bit ridiculous.

I like the idea of opaque types, and I think they add flexibility at extremely low cost for types which are more than a simple wrapper (such as your Mode type). However, in some situations, they lose to AnyVals in source maintainability even though they win in performance.

1 Like