Dotty-style Union Types in Scala

Are there any plans to start supporting union types, similarly to how it’s done in Dotty (e.g., Int | Long | Float | Double)? If yes, is there any time estimate?

If not, what would be the best way to implement the following functionality in Scala 2.12?

Suppose we are interacting with a native C++ library that supports both Float32 (i.e., Float) and Float64 (i.e., Double) data types. We want to define a DataType trait in our library, subclassed by DataType.Float32 and DataType.Float64, which implement functionality such as casting. In that trait we want to have an abstract type T, which DataType.Float32 will override with Float and DataType.Float64 will override with Double. In that trait we also define a couple of methods:

def cast[V <: Float | Double](value: V): T
def setElementInBuffer(buffer: ByteBuffer, index: Int, value: T): Unit
def getElementFromBuffer(buffer: ByteBuffer, index: Int): T

T here can only be either Float or Double, but the compiler does not know that. Let’s say we read from one buffer of some data type, cast the elements, and write them into a buffer of the other data type. Ideally we want cast to be specialized for the primitives. For the type V above, the constraint can be enforced by something like Miles Sabin’s solution.
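For reference, a minimal Scala 2 sketch of that encoding (the "unboxed union" built from function types via De Morgan duality; the type and method names below are my own, not from any library):

```scala
object UnionDemo {
  // Miles Sabin's unboxed-union encoding: membership in (A or B) is expressed
  // as a subtyping relation between continuation types
  type Not[A]    = A => Nothing
  type NotNot[A] = Not[Not[A]]
  type Or[A, B]  = Not[Not[A] with Not[B]]
  // evidence usable as a context bound: Union[A, B]#Check[X] holds iff X is A or B
  type Union[A, B] = { type Check[X] = NotNot[X] <:< Or[A, B] }

  // compiles only for V = Float or V = Double; note that the value is still
  // boxed through the erased generic parameter unless combined with @specialized
  def toDouble[V: Union[Float, Double]#Check](value: V): Double = value match {
    case f: Float  => f.toDouble
    case d: Double => d
    // toDouble("boom") would be rejected at compile time
  }
}
```

The constraint is enforced purely at compile time; no wrapper object exists in the type, though boxing through the generic parameter remains the efficiency problem discussed below.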

However, currently there is no way to make the compiler aware that T can only be Float or Double. We could use a witness-style inheritance-based pattern, but that would force boxing of the primitive values and thus be inefficient. Defining type T <: Float | Double (like in Dotty) would be ideal. Is there a way to mimic that behavior currently in Scala, without boxing the primitives?

Note that parameterizing the DataType trait with T and adding a context bound there does not work. The reason is this: suppose a Tensor class has a data type, but holds a reference to a C++ tensor with an integer representing its data type. Then the way we obtain the data type is by calling a JNI function that returns that integer, which we then convert to a DataType[_]. This makes the compiler unable to verify that one tensor’s data type returns elements that can be fed into another tensor’s buffer (after casting to the appropriate type).

I hope this application description is sufficient for understanding the problem, but I can provide more details if necessary.

Thank you!


I am actually of a mind to submit as SIP a standardization of Scala.js’ pseudo union type. This would allow it to work on other platforms, as well as receive better treatment by the compiler on some aspects (e.g., in pattern matching), without having to support them in the typechecker/type system per se. I haven’t gotten around to doing so yet, though.

I wasn’t aware of that implementation. It’s pretty cool. However, for my use case there is a problem. I realize I need an “exclusive or” of types and not an “or”, after all. This means that “Int | Double <: Int | Float | Double” should evaluate to “false”. I’m not sure if the Scala.js union type can be adapted to achieve that functionality. Do you have any idea whether this is possible? My impression is that there would need to be evidence that “not A <: B”, aside from just “A <: B”, and I’m not sure what the base case for that should be in your code.

Then, I would need to be able to provide implicit evidence for each type in the XOR separately (e.g., “IntCastHelper” and “DoubleCastHelper”). My impression is that I need an implicit function providing evidence for a type “T” that pattern matches on the type and provides a different object for each type in the XOR. This pattern matching needs to be exhaustive over the types defined in the XOR.
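The per-type evidence idea might be sketched as a plain type class with one implicit instance per member of the XOR. The names IntCastHelper and DoubleCastHelper follow the post above, but the signatures are my guess:

```scala
object CastHelpers {
  trait CastHelper[T] { def fromDouble(d: Double): T }

  // one implicit instance per type allowed in the XOR
  implicit object IntCastHelper extends CastHelper[Int] {
    def fromDouble(d: Double): Int = d.toInt
  }
  implicit object DoubleCastHelper extends CastHelper[Double] {
    def fromDouble(d: Double): Double = d
  }

  // resolved at compile time; cast[String](...) would fail to find evidence
  def cast[T](d: Double)(implicit h: CastHelper[T]): T = h.fromDouble(d)
}
```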

Does my description make sense?

Thanks!

Why not use scala.Either?

Because it’s a completely different thing? In particular, it doesn’t have the following properties of union types:

  • A | A =:= A (while Either[A,A] can be ugh… either Left[A] or Right[A])
  • more generally, A | B =:= A when B <: A
  • A <: A | B (you can pass one of the component types where union is expected)
  • erasure of A | B = least upper bound of A and B (Either has runtime overhead due to wrapping)

So the key thing is the lack of wrapping in union types as compared to Either, both on type level and in runtime.
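To illustrate the wrapping point, here is a small sketch: with Either, every value must be injected with Left or Right at each call site, whereas a union type would accept the bare value.

```scala
object EitherDemo {
  // Either forces a wrapper allocation and an explicit tag at every use site
  def describe(v: Either[Int, String]): String = v match {
    case Left(i)  => s"int: $i"
    case Right(s) => s"string: $s"
  }
  // with a union type Int | String, describe(42) and describe("hi")
  // would work directly, with no Left/Right wrappers
}
```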

Hello,

I think rather than union types, we need special super types for
numerical types.

It is a big pity that numeric types Double, Float, Int, Long, Short,
Char, Byte have so much functionality in common (e.g. toDouble) and yet
there is no super-type to cover that common functionality.

So please, let’s have scala.Number.

And maybe also scala.FloatingPoint and scala.Integer. Perhaps even
scala.Number32 and scala.Number64.

I’m not excited about union types A|B. Much added complexity for little
gain. You’d either have to treat them like the common super-type or check
the type. Why not just use the common super-type instead?

 Best, Oliver

I don’t know about you, but I definitely prefer Int | String instead of Any or CaseClassOne | CaseClassTwo instead of Product with Serializable.

Yes, you’ll have to pattern match union types. But that’s like bread and butter in functional programming. We’re already doing that with ADTs (sealed hierarchies). Union type can be used as a simple, ad-hoc, no-overhead alternative to a sealed hierarchy or typeclass.
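For comparison, here is a sketch of the sealed hierarchy one has to declare today instead of writing Int | String (all names are made up):

```scala
object AdtDemo {
  // the ad-hoc ADT that a union type would replace
  sealed trait IntOrString
  final case class AnInt(i: Int)      extends IntOrString
  final case class AString(s: String) extends IntOrString

  // exhaustiveness is still checked by the compiler,
  // but every Int must be boxed into an AnInt wrapper first
  def describe(v: IntOrString): String = v match {
    case AnInt(i)   => s"int: $i"
    case AString(s) => s"string: $s"
  }
}
```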

Type classes are the right approach for dealing with this. Check out Spire, if you haven’t already…


Hello,

With Any or Product with Serializable, I know where to look to find out
what methods are available. If I see CaseClassOne|CaseClassTwo, I don’t
know.

If I see Int|String, I would know it is really Any, and I would think
"Why the hell would some one do that?".

 Best, Oliver

To access a common method, a structural type would do as well. See http://cscarioni.blogspot.ch/2013/02/duck-typing-in-scala-kind-of-structural.html
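A structural type along those lines might look like this (a sketch with made-up classes; note the reflective call, whose cost comes up again later in this thread):

```scala
object StructuralDemo {
  import scala.language.reflectiveCalls

  class Meters(val value: Double) { def toDouble: Double = value }
  class Feet(val value: Double)   { def toDouble: Double = value * 0.3048 }

  // accepts any value that has a toDouble method;
  // the call is dispatched via reflection at runtime
  def asDouble(x: { def toDouble: Double }): Double = x.toDouble
}
```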

Any or Product with Serializable tells me absolutely nothing about methods being available.
If I see CaseClassOne | CaseClassTwo then I know that I have to pattern match against these two and look into API of these two. And I’m also safe against someone giving me CaseClassThree as would be possible when the type was Any.

Hello,

Ok, that is true. But what is the use case for CaseClassOne|CaseClassTwo?
And what is the use case for Int|String?

 Best, Oliver

I would like to clarify something with respect to my original question because I feel that this conversation might not be exactly on point. My focus is on finding a way to achieve the desired type-safety while being highly efficient at the same time.

My current solution does involve a sealed trait hierarchy of value class wrappers, but that is inefficient (because of constant boxing/unboxing when accessing elements of underlying native tensors) and introduces a lot of boilerplate (because a value class wrapping an integer does not “behave” like an integer – I have to implement a trait including all of the arithmetic and comparison operations, among other things). @jducoeur mentioned Spire and I have indeed checked it out. It seems to be using a similar approach to what I am doing right now, but that has the two problems I described. Please correct me if I am wrong.
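For concreteness, the value-class wrapper pattern being described looks roughly like this (a sketch with hypothetical names):

```scala
object ValueClassDemo {
  // a universal trait, so that value classes can extend it
  sealed trait Element extends Any
  final case class Float32Element(value: Float)  extends AnyVal with Element
  final case class Float64Element(value: Double) extends AnyVal with Element

  // as soon as a wrapper is passed around as an Element,
  // the underlying primitive is boxed anyway
  def asDouble(e: Element): Double = e match {
    case Float32Element(f) => f.toDouble
    case Float64Element(d) => d
  }
}
```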

What would ideally be desired is a way to let the compiler know that type T is a numeric primitive, for example. Then, if a class implements a function cast that takes a value of type Float, another function cast that takes a value of type Double, and so on, the compiler should be able to check whether cast can take a value of type T (i.e., that all specialized cast implementations exist). In this case, no boxing/unboxing would be necessary and the code could be highly efficient. This is more relevant to the approach that @curoli proposes. If those number super-types were part of the Scala library, the compiler might be able to treat them in a specialized manner.

Hello,

Spire is pure magic. It’s super-cool what it can do. It would be nice if
there was a simpler way that non-wizards can understand.

 Best, Oliver

Yes, and @eaplatanios doesn’t want those properties. He wants an exclusive or. Either is that, union types are not.

I think @jducoeur is right. I’ll elaborate on why below, but the TL;DR is “use Spire and follow https://github.com/non/spire/blob/master/GUIDE.md”. Below I answer this thread and explain why that seems to be the correct answer, modulo the fact that some requirements seem overly restrictive.

@eaplatanios clearly wants no wrapping, so Either is not OK. I’m not sure you can satisfy all the given requirements. But what he actually wants is Float | Double, so there might be specialized solutions for that. In particular, you want to use Spire and specialization as described on the Spire guide.

You asked about having cast[V <: Float | Double](value: V): T. If you don’t want to box values, your generated bytecode will need to use two separate methods (overloaded or not):

def cast(value: Float): T
def cast(value: Double): T

You can also try to generate that via specialization:

def cast[V @specialized(Float, Double)](value: V): T

The advantage of specialization is that it produces the two overloads automatically, and then you can write callers without duplicating them. That is, you can just write

def castUser[V @specialized(Float, Double)](value: V) = ... cast(value) ...

instead of having two copies, one for each overload of cast:

def castUser(value: Float) = ... cast(value) ...
def castUser(value: Double) = ... cast(value) ...

Beware: specialization can have some issues, though I’m no expert on them. However, the Spire guide recommends specialization, and you only care about two types, Float and Double, so you’ll run into fewer problems.
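Putting the two pieces together, a specialized DataType might be sketched as follows (hypothetical names; plain overloads on the input side, @specialized on the element type so implementations avoid boxing the result):

```scala
object SpecializedDemo {
  // scalac emits Float and Double variants of T alongside the generic one
  trait DataType[@specialized(Float, Double) T] {
    def cast(value: Float): T
    def cast(value: Double): T
  }

  object Float32 extends DataType[Float] {
    def cast(value: Float): Float  = value
    def cast(value: Double): Float = value.toFloat
  }

  object Float64 extends DataType[Double] {
    def cast(value: Float): Double  = value.toDouble
    def cast(value: Double): Double = value
  }
}
```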

On the JVM, the only alternative (in principle) would be to support a single type (say, Double) and widen Float to Double where needed. That avoids boxing. Something like this was the basis of http://scala-miniboxing.org/, though I suspect it avoided floating-point widening by using Float.floatToRawIntBits and friends. However, I understand that miniboxing is not stable enough for production use.
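The floatToRawIntBits trick mentioned above amounts to a lossless bit-level encoding, roughly like this (a sketch, not miniboxing’s actual code):

```scala
object BitsDemo {
  // a Float's 32 bits fit losslessly into a Long, so a single Long-based
  // storage representation can carry Float values without floating-point widening
  def encode(f: Float): Long = java.lang.Float.floatToRawIntBits(f).toLong
  def decode(l: Long): Float = java.lang.Float.intBitsToFloat(l.toInt)
}
```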

Using type parameters and specialization avoids union types, but let me answer anyway…

That’s confusing to me; in particular, the name “exclusive or” is. That thing already is an exclusive or: a value can’t be both an Int and a Double. And if a value is an Int | Double, it is a special case of Int | Float | Double. A caller that handles the latter can also handle the former; its Float branch will not be triggered by such a value, but it can still be triggered by users producing Float. If you know you have no such users, you shouldn’t need to write Int | Float | Double anywhere (I know I’m oversimplifying, but not by much). The pattern matching will still be exhaustive; at worst, it has more cases than strictly needed.

Calls to structural types are compiled to use reflection, and hence are much too slow. Some time ago somebody had a solution based on macros, but I don’t know whether it still works or how robust it is:
https://meta.plasm.us/posts/2013/07/12/vampire-methods-for-structural-types/

Yes, I should have mentioned the performance impact.