Union type widening

JamesGallicchio · August 28, 2017, 5:18am

I recently found out about an odd behavior in dotty when using union types. Currently, the compiler widens union types to a common supertype when inferencing the types of vals, defs, and type parameters.

I saw in another topic discussion about singleton union types being disallowed because this would open issues with “normal” type handling in the compiler- but I don’t see how in the more general case union types can’t be treated like any other type. If all unions are essentially always widened to a supertype, that renders them unusable for most of their desired behavior.

What is the reasoning behind this design, and are there plans to rectify their usability?

NthPortal · August 28, 2017, 9:19am

Do you mean union types?

felixmulder · August 28, 2017, 9:21am

The TL;DR is ultimately that type inference is hard - it should be pragmatic enough that you nearly always get the type you wanted, but strict enough that you don’t loose precision. These goals have to be met, while the compiler has to be as fast as possible (faster than scalac anyway!). There are trade-offs between all of this and I, personally, really like the way Dotty infers types right now.

Both compound types and union types incur a penalty when being inferred. My hunch from reading your post is that you are referring to discussions regarding union types a.k.a “or-types”: e.g: Int | String. Where intersection (or compound) types are A & B which is equivalent (almost) to Scala 2’s A with B.

So let’s talk about union types For instance, if you do:

val x = if (cond) 1 else "doge"

In Scala 2, you’ll get the inferred type Any which is the least upper bound between the two types since (in pseudo code):

1 <: Int <: AnyVal
"doge" <: String <: AnyRef
(AnyRef | AnyVal) <: Any

Since Dotty has union types, we have to decide what to do in the if-expression above:

Do we infer a union i.e. the least upper bound between the two (Any)?
Do we simply infer the union (Int | String)?

We could have gone even further with singleton types:

Do we infer the type 1 | "doge"?
1 | String?
Int | "dog"?

As you can imagine, all this comes with a lot of complexity, subtype checking becomes more expensive, the deeper the type is the longer it takes. For types which contain dis/con-junctions we have to potentially check twice as many things. It doesn’t help if they are aliased, or derived from a type member or term paths.

Another thing that is quite annoying is that most of the time, you really don’t want the most specific type, you want the compiler to automagically widen things - but not too much. Imagine you did:

case class InvariantList[A](xs: A*)

trait A
trait B
class C extends A with B
class D extends A with B
val xs = InvariantList(new B{}, new C) // xs: InvariantList[C | D]

def foo(xs: InvariantList[A & B]) = ???

foo(xs) // error: expect InvariantList[A & B], got InvariantList[B | C]

Now you see that we got an inferred type which was more precise than we wanted to, and since our list is invariant, now it doesn’t fit in the type

Sadness… but, don’t worry, Dotty does the right thing in the above example and widens the type of xs to InvariantList[A & B]

So what do we do currently with regards to unions and intersections and singletons of the twain? Well, through several issues and user reports, we have adapted the type inference to take a more pragmatic approach. I’m not sure the discussion you’ve read are 100% up to date, we do infer union types where they are expected, but we don’t infer them all the time. We currently don’t support unions with singleton type members, this is for technical reasons, we hope to have them eventually. When it comes to automatically inferring unions, we have some kinks to work out, for instance:

def foo(x: String | Int) = println(x)

val x = if (cond) 1 else "doge"

foo(x) // error: got Any expected `Int | String`

But, when the compiler can figure out beforehand that the expected type is a union, there is no problem:

def foo(x: String | Int) = println(x)

foo { if (cond) 1 else "doge" }

It’s safe to say that unions and intersections are a useful concept - and as @cvogt said about unions, the place where they are most useful is most likely as an anonymous ADT in the case where you have two disjoint types that should be the argument to the same method:

def closeResource(cls: Closeable | MyFileReader): Try[Unit] = Try {
  closeable match {
    case cls: Closeable =>
      cls.close()
    case cls: MyFileReader =>
      cls.doSomeCleanupThingy()
      cls.closeFileReader()
  }
}

At the call site, you’d probably never want to infer a union, you’d always have a single part of the disjunction e.g. MyFileReader or some subclass of Closeable, if you need to keep both, you’ll have to annotate the types. So from this use case, we’re doing what makes sense.

Currently, Dotty nearly always gets what you want and with less boilerplate than in Scala 2 to boot. I used the Dotty compiler when rewriting the Dotty REPL, and I must say - when I converted the project to Scala 2 (we had to for sbt compatibility), it felt a bit painful to lose the new features and the type inference in Scala 2 felt somewhat clumsy in comparison. Obviously, being on the compiler team, I’m extremely biased, but that’s my take on it anyway

Hope this answers your questions, if you have any more wonderings regarding type inference and our type system in Dotty, @odersky and @smarter are your top experts.

Cheers,
Felix

curoli · August 28, 2017, 10:14am

Hello,

Just wondering, if it so happens that there is a method Party.launch and
ICBM.launch, can I call the launch method on a val of type Party | ICBM in
Dotty?

In code:

class Party { def launch: Unit = println(“Good time!”) }
class ICBM { def launch: Unit = println(“End of the world!”) }
val launchable = if(cond) new Party else new ICBM
launchable.launch // Is this legal in Dotty?

 Best, Oliver

NthPortal · August 28, 2017, 10:42am

In order for it to even hypothetically work, I believe you need to have

val launchable: Party | ICBM

Regardless, it does not seem to work (tested on Scastie), though I don’t know why not.

curoli · August 28, 2017, 11:21am

I’m glad my example doesn’t work, since that would be a big loss of type
safety.

Are you saying that for

val launchable = if(cond) new Party else new ICBM

launchable gets inferred as AnyRef rather than Party | ICBM? I’d say that’s
a good thing - if you want unions, you should have to say so.

LPTK · August 28, 2017, 11:25am

AFAICT, it’s because launch in Party and launch in ICBM are two unrelated methods (despite their identical signature), so in order to call them with the same code you’d need a structural type (yet another direction in the type inference conundrum).
Notice that writing (launchable:{def launch: Unit}).launch gives a different error (Dotty requires structural types to extend Selectable to be used), and (launchable:Selectable{def launch: Unit}).launch works, as long as both Party and ICBM extend Selectable.

LPTK · August 28, 2017, 11:49am

I hope that once support for unions of literal types is there (if it ever is), we can use them as lightweight enums, so that the following works:

type MyEnum = "On" | "Off" | "Unknown"
def parse(str: String): Option[MyEnum] = 
  str match { s: MyEnum => Some(s) case _ => None } // compiles isInstanceOf correctly
def use(state: MyEnum) = x match { case "On" => } // warns about non-exhaustive pattern matching
use(state = "On")
use(state = "Up") // raises a type error here

It would be even nicer (and more efficient?) to allow symbol literal singleton type unions, as in:

type MyEnum = 'On | 'Off | 'Unknown

NthPortal · August 28, 2017, 12:21pm

Yup (though for some reason, dotc says Object ¯\_(ツ)_/¯).

JamesGallicchio · August 28, 2017, 1:47pm

Aha, yes! Thank you. Edited post to reflect that.

rcano · August 28, 2017, 2:11pm

It’s clear that you are happy about the current situation, to me it’s looks awful :(, full of arbitrary rules that are sure to puzzle everyone, from rookie to expert, for ever and ever.

As you can imagine, all this comes with a lot of complexity, subtype checking becomes more expensive, the deeper the type is the longer it takes. For types which contain dis/con-junctions we have to potentially check twice as many things. It doesn’t help if they are aliased, or derived from a type member or term paths.

I also don’t understand who you are helping with these rules, the compiler? the user? one particular use case?

Another thing that is quite annoying is that most of the time, you really don’t want the most specific type, you want the compiler to automagically widen things - but not too much.

Sounds like hell, sure to help when you dont need dis/con-junctions, and sure to not help when you need them.

Now you see that we got an inferred type which was more precise than we wanted to, and since our list is invariant, now it doesn’t fit in the type

It is invariant, you got what you asked for. Otherwise, you are mangling the type inferencer for shoveling covariance into invariance.

Well, through several issues and user reports, we have adapted the type inference to take a more pragmatic approach

Sounds like you require machine learning to understand the current rules.

It’s safe to say that unions and intersections are a useful concept

Help me understand, with the current implementation, in comparison with scala 2, what exactly are you improving? just that closeResource method? because I can’t see any other use case, and that use case never felt like an issue to begin with.

I guess I’m really against un-teachable rules. I feel with this you are sure to annoy every scala expert that wants the most out of the type system, and you’ll never get people from languages like java or kotlin to make the jump because of how impossible to explain things like this are.

When I ask the compiler to do things in my stead, such as infer a type, I don’t want it to lose information, if the compiler cant track types, what chances do I have? Humans suck at state tracking. If anything, losing static information should be a deliberate action on my part.

Sorry for the rant. Cheers.

JamesGallicchio · August 28, 2017, 2:37pm

Is the speed of the compiler really hurt that much by treating union types as subtypes of the common superclass?

val test = if (cond) "hi" else Seq(1, 2, 3) // String | Seq

def func[T](iter: TraversableOnce[T]) = ???

func(test) // Should compile fine since TraversableOnce is a supertype of String | Seq common supertype

It just makes sense to me to treat String <: Seq | String <: (their common supertype). In fact, in this way intersection and union types go hand in hand- if the common supertype is some weird group of types, union types enable you to describe it exactly. As you said this can be a big slowdown to the compiler, but by how much? It seems more like something that should be enabled by a compiler flag, rather than excluded completely.

I’m not sure exactly how I would want the compiler to handle singleton union types- in many cases you want the types, rather than the singletons, but as @LPTK pointed out singleton unions could be useful for enumlike constructs. It would make the most sense to just assume one over the other always (and force a user to ascribe the type if they want the other), but we’d have to look at how many use cases each has.

In my mind, for consistency it would make the most sense to always assume types over singletons- that way, you avoid assuming types such as "hi" | Int.

NthPortal · August 28, 2017, 7:04pm

The following comment (as well as the rest of his reply) might be helpful.

JamesGallicchio · August 28, 2017, 9:41pm

I actually had read that, but I still don’t understand why that’s an issue- yeah, it would grow linearly. Intersection and union types can get unwieldy rather quickly, if used improperly. That’s not something that the compiler should mandate against but rather a good programmer should avoid. Scala has a lot of constructs that could be used poorly, but the compiler doesn’t actively work to prevent programmers from using them.

On top of that, linear growth (in terms of computing time) is really not that terrible for small computations. It’s not as if every type in a project is going to be a union type; they’re useful in a select few (somewhat important) cases.

A lot of the info in that link is about singleton types and the issues they present (which is why it was opted to widen singleton types), but that doesn’t explain why union types in general are being widened so quickly.

The compiler already (obviously) can compute the least upper bound for a union type, because it already makes use of this. I just don’t see why that process isn’t delayed until something necessitates the widening of the type.

I think what would be most useful is an example of where union type widening is what is desired, because the example presented doesn’t really make sense. I think why I’m confused is that I see no reason why A | B shouldn’t just be treated as a subtype of the least upper bound (outside of performance issues), rather than being actively replaced with this bound by the compiler.

felixmulder · August 29, 2017, 8:36am

It’s clear that you are happy about the current situation, to me it’s looks awful :(, full of arbitrary rules that are sure to puzzle everyone, from rookie to expert, for ever and ever.

I’m indeed very happy with type inferencing in Dotty - it’s not awful, it’s not unclear. In fact it’s the opposite: the type inference in Dotty gets out of your way.

Simple things like defining foldLeft no longer need the lambda expression to be in a separate parameter list. No need to do case when mapping over tuples. You can refer to previous arguments in the same parameter list. If anything, the inference in Dotty is less surprising, and more sure not to puzzle users. It requires less annotations and less work to make the compiler understand your code.

I’m sorry if the content of my post wasn’t enough to persuade you that the current status of the type inferencing in Dotty is quite good - and also getting better rapidly. Try it out, I’m sure you’ll find some corner cases where the inferencing fails, but from using it as my main compiler for a medium-sized project - I’m really happy with it.

It is invariant, you got what you asked for. Otherwise, you are mangling the type inferencer for shoveling covariance into invariance.

It is invariant indeed, but if you look at xs you’re probably expecting the type to be a list of the LUB (InvariantList[A & B]), not the union of all list members (imagine it being 5 times as long with common LUB A & B) - as such, if the type hadn’t been widened here, you would’ve gotten a type that is too precise for foo.

Sounds like you require machine learning to understand the current rules.

There are principled and clear rules to how we infer types, but there will always be corner cases and scenarios where you, the user, don’t get the expected type. In these cases, we need to take a step back and look at why you’re not getting the type you assumed, if it is indeed a valid case, most likely it is a bug which goes against the rules - and so, we fix it.

I’m sorry that I can’t refer you to a spec for type-inferencing, hopefully that’s something we can get around to producing once we’re nearer stable releases.

Hope this reassures you that we’re not out to confuse and annoy users “for ever and ever”. This is a project were we welcome user feedback and value your opinions. We don’t just try to meet everyone with positivity and openness, we also try to understand where you’re coming from.

Shoot me an email ([email protected]) if you have more questions or worries, I’d be happy to arrange for us to have a Skype or Hangouts call - I’d rather talk face to face, and make sure we understand each other than to spend time communicating through a medium where I feel like our understanding of each other is only superficial.

Cheers,
Felix

rcano · August 29, 2017, 5:38pm

I lacked context in my post I guess, I was talking only about union types plus type inference. The fact that you no longer need to have the lambda expression in a separate parameter list for things like folds sounds great, but you’ll probably still do it because of syntax sugar when using it.

Try it out, I’m sure you’ll find some corner cases where the inferencing fails

My sad story is that every time I want to try it out, to test an interaction I’ve been craving for with scala 2, I get an unpleasant surprise like the one with union types. I guess I’m just unlucky.

It is invariant indeed, but if you look at xs you’re probably expecting the type to be a list of the LUB

Nope, I was expecting the full thing. If I wanted to discard types information, I would have annotated my statement, just like I do today with scala 2 when I do something.fold(None: Option[Blah]) because I don’t want to use None.type.

There are principled and clear rules to how we infer types

Well, when I try to read about them, I get greeted with http://dotty.epfl.ch/docs/reference/changed/type-inference.html, so I have to take your word here. My point here was, that even if you have clear rules, if you have a list of ~10 items and ~5 exceptions that compose those rules (just throwing numbers), then clear is not a term I would choose, I would go for well-intentioned but complex.

but there will always be corner cases and scenarios where you, the user, don’t get the expected type. In these cases, we need to take a step back and look at why you’re not getting the type you assumed, if it is indeed a valid case, most likely it is a bug which goes against the rules - and so, we fix it.

Honestly this scares me a bit. It reminds of Perl, where the people that write the compiler have the maxim of trying to produce the result the user would expect.
Trying to guess what a person that doesn’t really know what he wants (because he maybe is exploring stuff, or is just not intimate with scala’s type system (maybe even precisely because of how many corners and nooks it has)) doesn’t sound appealing.

Hope this reassures you that we’re not out to confuse and annoy users “for ever and ever”.

I’m sure you are not, I’m expressing what I feel will be the result, and I’m as biased as you so my opinion is as worth as any other feeling, but I hope I manage to articulate it good enough so that even if it is discarded, it is so for good reasons.

Shoot me an email ([email protected]) if you have more questions or worries, I’d be happy to arrange for us to have a Skype or Hangouts cal

I appreciate the offer, though I don’t think it would be a good use of your time trying to convince random people of the internet . Also, despite me using scala for about 8 years, I’m by no means qualified to discuss with you. I wanted to raise my concerns and so I did, that’s the least I could do as a long time scala user that is thankful to the language and the people that made it happen. After doing so, I’ll trust your vision and hope for the best .

Cheers.

smarter · August 29, 2017, 7:38pm

Until we get around to documenting things better, here’s a talk I gave last year that goes over some of the differences: https://www.youtube.com/watch?v=YIQjfCKDR5A . For a real world example, see the commit where Felix made the REPL compile with Scala 2 (it was only compiled with Dotty before): https://github.com/lampepfl/dotty/pull/2991/commits/bc48498f93110717b2b9eaab25dc475ba0ab70aa

Blaisorblade · August 30, 2017, 2:20am

First: while I don’t watch videos, I’ve found slides and pointers to code. Union types aren’t directly mentioned in either place.

http://guillaume.martres.me/talks/typelevel-summit-oslo/

Actual inference code (link from slides, updated):
dotty/compiler/src/dotty/tools/dotc/typer/Inferencing.scala at main · lampepfl/dotty · GitHub or (right now) dotty/compiler/src/dotty/tools/dotc/typer/Inferencing.scala at 360121431b3da6e3d5c35c0946e4af0d7c8eee61 · lampepfl/dotty · GitHub (364 lines now).

It’s good that you have principled and clear rules.

But users forming a mental model by doing guesswork will, in general, NOT come to guess the same rules. People have WILDLY different expectations. Written rules help forming a mental model. People learn in different ways, and some prefer rules to trial and error.

Viceversa, it’s hard for compiler writers to form and maintain consistent rules without writing them down.

You did say you’re biased. A likely source of bias is that you are very familiar with the correct mental model.

Satisfaction depends on how you want to use type inference. I see two approaches:

One can write and compile code, check inferred types, and add annotations if the inferred type is not the expected one. I easily believe that doing this with Dotty will work better than with Scalac. I feel this is the approach you’re using. But having to check results can easily interrupt programming flow (where I use flow in the technical sense). Also, some fancy programs might just be impossible to write, or they might require sheer luck.
Anecdotally: I started getting in touch with compiler hackers by failing to tame Scalac and filling JIRA with my violated expectations; for some things I found workarounds (and they’re fixed in Dotty), other things are still open (and hard!) problems.
Alternatively, one can learn the rules and rely on them, adding annotations when I know type inference won’t help, or changing the design so that type inference will work. Some people enjoy working this way.
And for some things, I know no such rules for Scalac.

Conversely, I know singleton types are never inferred, so I annotate them as needed when writing code (at least in certain patterns), without needing to think too much about it.

Having to add annotations in response to errors would take (me) longer, possibly get me out of flow, and be overall more annoying. I can’t say how much actual time is lost there, and I don’t think the same estimate works for different people.

I feel a reason some are so vocally annoyed at Scalac is that it’s often hard to predict what it’ll do, even for experts. (Guess: People with this complaint are likely to prefer GHC).
I know that’s one reason why in my research group we use Scala a lot but still never taught it to beginner students, but only use it in more advanced courses. (Also, I know nothing like the How to Design Programs curriculum for Scala).

Dotty is one the road to being more predictable. I wish it goes on, and I hope with my next job I’ll have a chance to contribute more seriously.

LPTK · August 30, 2017, 9:24am

On the one hand, it’s true that value/variable bindings have always acted like boundaries to Scala’s local type inference, so it’s not completely unexpected or unnatural that some widening (of literal types and unions) would happen on those boundaries too. It’s also consistent with how the type of values is widened (e.g., how x in val a = new A; val x = a has type A, not a.type).

On the other hand, one of the pain points of Scala’s type inference, in my experience, is how it sometimes infer useless upper bounds because of a small mistake somewhere in a program.
For example, when writing:

val l0 = List("a","b","c")
val l1 = List('a, 'b, 'c)
val lsum = l0 ++ l1

Where the last line should actually be:

val lsum = l0 ++ l1.map(_.name)

Trying to use lsum as in:

lsum.reduce(_ + _.length)

Gives us the unhelpful "value length is not a member of java.io.Serializable".

In simple examples like this, it’s easy to find the problem. But if you have a big pipelined transformation, especially if expressed in one single expression (admittedly, not the best style), this can be really annoying to find.

Instead of that error, having "value length is not a member of String | Symbol" would probably be more helpful, as it tells me exactly what went wrong.

AleksanderBG · October 25, 2017, 7:22am

There’s one thing which could be mentioned here - it’s actually possible to make Dotty infer union/intersection types for variables with this one weird trick:

scala> val a = if (true) 0 else ""
val a: Any = 0
scala> val a @ _ = if (true) 0 else ""
val a: (Int | String) @unchecked = 0

While this looks a bit hacky, it seems like this behaviour can be reasonably relied on. Destructuring assignment should behave exactly the same as pattern matching, and in pattern matching we don’t want to widen union types, so that we can assign precise types to bound variables and so that we can emit precise exhaustivity warnings.

I don’t know if this is a solid argument for things staying as they are, but at the very least this makes it possible to not explicitly annotate variables with union/intersection types everywhere we want one.