Making union types even more useful

bjornregnell · March 7, 2021, 5:24pm

I’m starting this thread to discuss pros and cons of potential enhancements to the ergonomics of union types when used as a modelling tool.

When working more and more with Scala 3 I have come to appreciate union types more and more, as they allow me to model things in a lightweight way, without heavy inheritance, when appropriate. But I have came across some use cases where the limitations in type inference are hindering me to write concise beautiful models.

One thing I would really like is that the compiler would allow me to use shared methods among union members, without having to resort to bulky matches, so that this would not be an error:

scala> class A(val x: Int)
// defined class A

scala> class B(val x: Int)
// defined class B

scala> def f(ab: A | B) = ab.x  
1 |def f(ab: A | B) = ab.x  
  |                 ^^^^
  |                 value x is not a member of A | B

I found out through @smarter on gitter (thanks!) that this once was taken out based on a discussion way back in 2016 True union types by odersky · Pull Request #1550 · lampepfl/dotty · GitHub with arguments that inheritance would be better here, but I think there are arguments for allowing such structural similarity to shine through, as constructing a union may often be a deliberate modelling decision.

Are there others who thinks common members of unions would be useful? Or am I missing some severe downsides/show-stoppers?

RubenVerg · March 7, 2021, 5:42pm

This is similar to the way Typescript does it: provide all properties common between all options. Actually in TS:

class A { declare i: number }
class B { declare i: string }
function ab(x: A | B) { return x.i /* number | string */ }

I’m not sure if Scala’s type system can support that tho

RubenVerg

bjornregnell · March 7, 2021, 5:56pm

Yes that is a nice way to reduce code duplication and boilerplate. Without having deep insights into the type system, I think all the info should be there for the compiler and it seems as the compiler actually did allow this years ago…

CiaraOBrien · March 8, 2021, 4:51am

Working on also very little knowledge, I would think it would probably be slightly janky in several places if it came back, but shouldn’t be to the point of unsoundness afaik

LPTK · March 8, 2021, 11:11am

The real problem is that this would require structural types, and that structural types are usually an anti-pattern because they use a slow reflection-based implementation on the JVM.

With the reflectiveSelectable option, this currently works and could very well be inferred (making the annotation unnecessary), but that would be a bad idea as it would lead to surprisingly-inefficient code:

scala> def f(ab: A | B) = (ab: {val x: Int}).x
def f(ab: A | B): Int

bjornregnell · March 8, 2021, 11:17am

Aha. Thanks. So the compiler does not know statically that both A and B has x?

LPTK · March 8, 2021, 11:33am

The compiler does of course know this statically, since there are no runtime checks in the code. But the JVM has no instruction for “select a field named x in any class with such a field”, so compiling this code requires some form of dynamicity. There are other ways of compiling such code, but with different tradeoffs.

smarter · March 8, 2021, 11:39am

No, the way this was implemented before https://github.com/lampepfl/dotty/pull/1550 was that the compiler expanded ab.foo into ab match { case ab: A => ab.foo; case ab: B => ab.foo }.

bjornregnell · March 8, 2021, 11:39am

I agree that hidden runtime costs are bad. But in the simple example above, where A and B are as is (or even if they were final or something else to perhaps help the compiler further), could the compiler not statically (without any runtime checks) just see that x is available in both, call it, and infer the return type to Int, which is the type of x? (Forgive me if I’m missing something here and thanks for taking the time to explain.)

bjornregnell · March 8, 2021, 11:40am

Aha! Yes exactly. What I’m after is not having to write that clunky match myself

smarter · March 8, 2021, 11:43am

I think it would be helpful for the discussion if you could come up with practical usecases where you’d want to do that, then we can compare it to alternatives.

bjornregnell · March 8, 2021, 11:45am

Thanks. I’ll try to do that, but I need to minimize my code first…

LPTK · March 8, 2021, 11:50am

Interesting. But this is not a general solution, and don’t think it would cover all TypeScript-style usages.
For instance, something like {class A(val x: Int); class B(val x: Int); Set(new A, new B)}.head.x would not work.

smarter · March 8, 2021, 1:12pm

Sure but that’s a different problem, when you have local classes like this, the type of the block ends up being widened to Set[_ <: AnyRef] so there’s no selection on a union anymore.

bjornregnell · March 8, 2021, 1:41pm

Well, it is also kind of “janky” if working with unions is less ergonomic than it could be…

mbloms · March 8, 2021, 3:44pm

I think it’s important to understand that x in A and x in B are completely unrelated unless x is also a member of some common super type. The fact that they share name and type signature is irrelevant as far as the JVM is concerned. The bytecode for calling (o: A).x and (o: B).x is different and it doesn’t help that x is final either. So the only way to do this at the bytecode level is with some kind of runtime check.

Maybe there are cases where this would be useful, but the best way to do this is always going to be putting x in a common parent of A and B. (though that’s not always possible)

Personally I think it would be a bit confusing if the return type of x in A and B was tightened and some code stopped compiling. Let’s say x had type X. Let’s also say that Y and Z are subtypes of X and that I change the type of x in A to Y and x in B to Z. What is the type of (ab: A | B).x now? It’s a bit weird if this change, which should be compatible, breaks code. But if the type is Y|Z or X, then for any two methods with the same name and signature the return type of calling them as a union will be Any. This could potentially lead to some errors not being caught.

Maybe it’s not that big of a problem, but there are edge cases to consider here in addition to the performance.

bjornregnell · March 8, 2021, 4:24pm

With this post I’m trying to give some use cases, in context, of where unions could be more ergonomic. (Apologies for a rather long post…)

To set the use cases in context of real code I will use the domain of music and try to extract “minimal” parts of a library I’m working on while learning about the shiny new Scala 3 goodies.

First I wanted to try to use unions for working with empty values (in the context of music, I have used it for silence, and non-available notes etc. coming later), esp. when the heavier monadic and parameterized Option feels a bit bulky, and also without resorting to null and Null, byt simply:

sealed abstract class Empty  
object Empty extends Empty:
  override def toString = "Empty"

… and then by extension of T | Empty providing some useful goodies and also a shim to Option, like so:

extension [T](x: T | Empty)
  def isEmpty: Boolean = x.isInstanceOf[Empty]
  
  def nonEmpty: Boolean = !isEmpty
  
  def get: T = x match
    case Empty => throw new NoSuchElementException("Empty.get") 
    case _ => x.asInstanceOf[T]
  
  def getOrElse(default: T): T = x match
    case Empty => default 
    case _ => x.asInstanceOf[T]

  def toOption: Option[T] = x match
    case Empty => None
    case _ => Option(x.asInstanceOf[T])
end extension

Enhancement 1: more precise matching

A first observation is the use of asInstanceOf[T], that would have been nice to avoid. I think (in my possibly naive assumption) that if case Empty has not fired then the compiler should be able to see that the rest what is left of T | Empty is T, but the compiler stops at Matchable:

scala> extension [T](x: T | Empty)
     |   def toOption: Option[T] = x match
     |     case Empty => None
     |     case t: T => Option(t)
     | 
4 |    case t: T => Option(t)
  |            ^
  |            pattern selector should be an instance of Matchable,
  |            but it has unmatchable type T | Empty instead

Enhancement 2: more precise inference

A second observation is that the compiler can infer something else than matchable if there are only two members of my union, but if more it doesn’t:

scala> val ie: Int | Empty = 42
val ie: Int | Empty = 42

scala> ie.get 
val res3: Int = 42  // This is a nice and  precise type :)

scala> val ise: Int | String | Empty = 42
val ise: Int | String | Empty = 42

scala> ise.get
val res4: Matchable = 42   // Would have been nice if it was inferred as Int | String

I have reported this here: Make type inference preserve user-written union types with more than two members to infer a more precise type than Matchable · Issue #11449 · lampepfl/dotty · GitHub

Now to the use case of union instead of using a hierarchy in a musical context.

First some basic integer boxings before we go on, including the notions of Pitch (modeled as a positive integer not more than 127, just as in the MIDI standard) and the notion of PitchClass that represents harmonic equivalence over octaves using modulus 12, and finally the representation of relative distance between pitches, called an Interval:

final case class Pitch private (toInt: Int) 
object Pitch:
  def apply(i: Int): Pitch = 
    new Pitch(if i < 0 then 0 else if i > 127 then 127 else i)

final case class PitchClass private (toInt: Int) 
object PitchClass:
  def apply(i: Int): PitchClass = new PitchClass(math.floorMod(i, 12))

final case class Interval private (toInt: Int) 
object Interval:
  def apply(i: Int): Interval = 
    new Interval(if i < -127 then -127 else if i > 127 then 127 else i)

With these definitions we can now model a sequence of relative intervals (Steps) and a musical Scale and a Chord, that later will be used to form a union.

case class Steps(toVector: Vector[Interval])

case class Scale(root: PitchClass, steps: Steps):
  def pitchClasses: Set[PitchClass] =
    var sum = 0
    val xs = collection.mutable.ListBuffer(Interval(0))
    for (step <- steps.toVector) do  
      sum += step.toInt
      xs.append(Interval(sum)) 
    xs.dropRight(1).map(iv => PitchClass(root.toInt + iv.toInt)).toSet  

case class Chord(root: PitchClass, pitchSet: Set[Pitch]):
  def pitchClasses: Set[PitchClass] = pitchSet.map(p => PitchClass(p.toInt))

case class Tuning(strings: Vector[Pitch]):
  def apply(string: Int, pos: Int): Pitch = Pitch(strings(string).toInt + pos)

Enhancement 3: automatic matching on common members

For some fretted string instruments such as a guitar, we can model a fret board called Fret with strings that have a Tuning. The purpose of Fret is to (in terminal with ASCII graphics) show a visualization of a fret board that highlights selected pitches, based on a musical context, including a scale or a chord or just a set of pitches, given a certain tuning.

Here a union Filter is used to determine which pitches are supposed to be included in a fret board visualization (look at the type aliases in object Fret):

case class Fret(tuning: Tuning, selected: Fret.Filter = Empty):
  val pitchesOnString: Fret.Matrix = selected match
    case Empty => Fret.pitchMatrix(tuning)
    case pcs: Set[PitchClass] => Fret.filter(Fret.pitchMatrix(tuning), pcs)
    case s: Scale => Fret.filter(Fret.pitchMatrix(tuning), s.pitchClasses)
    case c: Chord => Fret.filter(Fret.pitchMatrix(tuning), c.pitchClasses)

object Fret:
  val MaxNbrOfFrets = 18
  type Matrix = Vector[Vector[Pitch | Empty]]
  type Filter = Empty | Set[PitchClass] | Scale | Chord

  def pitchMatrix(t: Tuning): Matrix = 
    Vector.tabulate(t.strings.size)(s => Vector.tabulate(MaxNbrOfFrets)(f => t(s,f))) 
  
  def filter(m: Matrix, pcs: Set[PitchClass]): Matrix = m.map { string => 
    string.map(pitchOrEmpty => 
      if pitchOrEmpty.nonEmpty && pcs.contains(PitchClass(pitchOrEmpty.get.toInt)) 
      then pitchOrEmpty else Empty 
    )
  }

So what is the point of this example? The point is that Scale and Chord and Set[PitchClass] are not, from a domain perspective, related in a way that seems to motivate a base type and a hierarchy. Of course we could have constructed one, but a union seems better IMHO.

What is the problem with this? If you look at the match used when constructing pitchesOnString in case class Fret you see code duplication. Could we get rid of it? At least not currently this way:

scala> 
case class Fret(tuning: Tuning, selected: Fret.Filter = Empty):
  val pitchesOnString: Fret.Matrix = selected match
    case Empty => Fret.pitchMatrix(tuning)
    case pcs: Set[PitchClass] => Fret.filter(Fret.pitchMatrix(tuning), pcs)
    case x: (Scale | Chord) => Fret.filter(Fret.pitchMatrix(tuning), x.pitchClasses)

5 |    case x: (Scale | Chord) => Fret.filter(Fret.pitchMatrix(tuning), x.pitchClasses)
  |                                                                     ^^^^^^^^^^^^^^
  |                       value pitchClasses is not a member of Scale | Chord

And that is the motivation for the initial example in the first post: it would be great if the above would “just work”

In summary:

There are domain modelling cases where a union fits better than a hierarchy, but the ergonomics could be improved to make the modelling with unions more pleasant.
When matching, the set of union members possible in a case could be trimmed as cases are subsequently proven not to match (see enhancement 1 above)
Type inference of unions with more than 2 members could be more precise (see enhancement 2 above)
Structural members could be found to reduce code duplication and avoid clunky matches (see the initial code snippet in the first post in this thread, and enhancement 3 above).

Hope this helps in further discussions of union use cases. (I’m also very open to scrutiny of my modelling choices above )

mbloms · March 8, 2021, 5:45pm

I don’t really see the conflict. There may very well be good reasons to use union types instead of some base trait, but that doesn’t mean there can’t be a base trait. If you don’t want the compiler to infer the base trait with pitchClasses, you can use transparent traits to accomplish that.
https://dotty.epfl.ch/docs/reference/other-new-features/transparent-traits.html

bjornregnell:

extension [T](x: T | Empty)
  def isEmpty: Boolean = x.isInstanceOf[Empty]
  
  def nonEmpty: Boolean = !isEmpty
  
  def get: T = x match
    case Empty => (throw new NoSuchElementException("Empty.get")).asInstanceOf[T] 
    case _ => x.asInstanceOf[T]
  
  def getOrElse(default: T): T = x match
    case Empty => default 
    case _ => x.asInstanceOf[T]

  def toOption: Option[T] = x match
    case Empty => None
    case _ => Option(x.asInstanceOf[T])
end extension

Could you do something like this? (I haven’t tested it):

extension [T <: Matchable](x: T | Empty)
  def isEmpty: Boolean = x match
    case Empty => true
    case _ => false
  
  def nonEmpty: Boolean = !isEmpty
  
  def get: T = x match
    // no type coercion needed, Nothing is a subtype of anything.
    case Empty => throw new NoSuchElementException("Empty.get")
    case x: T => x
  
  def getOrElse(default: T): T = x match
    case Empty => default 
    case x: T => x

  def toOption: Option[T] = x match
    case Empty => None
    case x: T => Some(x)
end extension

My humble opinion is that your Empty use case would almost definitely be better served either by using Option or Null. There’s no real reason a Empty class is better than Null in a union type. Null is only dangerous without Explicit Nulls

https://dotty.epfl.ch/docs/reference/other-new-features/explicit-nulls.html

smarter · March 8, 2021, 5:48pm

But it would only “just work” because two unrelated classes happen to have a method with the same name. And even then, this isn’t very user-friendly: when I see x.pitchClasses, I’d like to read its documentation to know what it means, but if it’s not defined in a base type that means I need to read the documentation of that method in every member of the union, then mentally compute the common part of each documentation comment to understand what this code may do. It would be much cleaner to define a base trait for both Scale and Chord where pitchClasses is explained once, and so we should encourage users to do that instead of giving them a way to avoid doing that.

bjornregnell · March 8, 2021, 5:52pm

Thanks for proposals and input. Testing your refined types gives:

scala> extension [T <: Matchable](x: T | Empty)                                                                                
     |   def isEmpty: Boolean = x match
     |     case Empty => true
     |     case _ => false
     |   
     |   def nonEmpty: Boolean = !isEmpty
     |   
     |   def get: T = x match
     |     // no type coercion needed, Nothing is a subtype of anything.
     |     case Empty => throw new NoSuchElementException("Empty.get")
     |     case x: T => x
     |   
     |   def getOrElse(default: T): T = x match
     |     case Empty => default 
     |     case x => x
     | 
     |   def toOption: Option[T] = x match
     |     case Empty => None
     |     case x => Some(x)
     | end extension
15 |    case x => x
   |              ^
   |      Found:    (x : T | Empty)
   |      Required: T
   |
   |      where:    T is a type in method getOrElse with bounds <: Matchable
19 |    case x => Some(x)
   |                   ^
   |       Found:    (x : T | Empty)
   |       Required: T
   |
   |       where:    T is a type in method toOption with bounds <: Matchable

Well, I have learned to stay away from nulls, but maybe its different with explicit nulls and I should try that. Somehow that simple, non-generc object Empty seems like the simplest thing, and thus appealing to me.