Synthesize constructor for opaque types

LPTK · December 4, 2020, 10:22pm

Yeah, unfortunately opaque types are a fundamentally leaky abstraction due to Scala’s pattern matching semantics, which can discover new type info at runtime, including type info which was hidden behind an opaque type.

This means Dotty’s new “immutable” array type IArray is kind of broken, as anyone can trivially make it mutable by just using pattern matching on it.
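A minimal sketch of that leak, assuming a Scala 3 compiler with default settings. `IArrayLeak` and `overwriteFirst` are hypothetical names used only for illustration. Widening to `Any` first means the match does not depend on any scrutinee restriction, and the typed pattern then recovers the mutable `Array[Int]` behind the opaque `IArray[Int]`:

```scala
object IArrayLeak {
  // IArray is an opaque type over Array, so at runtime the value is a plain int[].
  // Widening to Any and using a typed pattern recovers that mutable array.
  def overwriteFirst(xs: IArray[Int], value: Int): Unit =
    (xs: Any) match {
      case arr: Array[Int] => arr(0) = value // writes into the "immutable" array
      case _               => ()
    }
}
```

After `IArrayLeak.overwriteFirst(xs, 42)`, reading `xs(0)` yields `42`, even though `xs` was only ever exposed as an `IArray`.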

This issue is discussed in more detail here: https://github.com/lampepfl/dotty/issues/7314

mbloms · December 4, 2020, 10:34pm

Yes. @LPTK Slightly off topic, but still related to this discussion: could you explain what Martin means here:

Why can’t pattern matching on Any give warnings? I don’t really see why you must be able to pattern match on it at all.

LPTK · December 4, 2020, 11:00pm

I think he means that this semantics is an integral part of Scala as it’s designed right now, and that many people probably rely on it, so we can’t just outlaw it like that. I personally agree that if we were to design a new language from scratch, this should be disabled by default.

But as you said, something like the Parametric Top proposal could alleviate the problem with opaque types leaking – I had not thought about it in this context. If a parametric Top type is ever introduced, it should definitely be made the default upper bound of opaque types.

mbloms · December 4, 2020, 11:17pm

Haha, well, making it illegal in Scala 3 would maybe be a bit extreme, but I really see no reason why you wouldn’t be able to warn about it. Especially extractor patterns, which I used to think were safe and only sugar for calls to an unapply method.

alexandru · December 4, 2020, 11:41pm

If we go down that route, asInstanceOf is a big hole in the type system that invalidates any proof the compiler tries to make.

val x: String = 1.asInstanceOf[String] // compiles, but throws ClassCastException at runtime

From my point of view, if opaque types are a leaky abstraction, then so is the entire Scala language.

And I understand that pattern matching is common, something people might want to do on any type. But isInstanceOf checks are in fact the leaky abstraction, because of type erasure, which on top of the JVM is a way of life:

List(1, 2, 3) match {
  case _: List[String] => ??? // yep! this matches at runtime, the type argument is erased
}

As that joke goes:

Doctor, it hurts when I do this
Then don’t do it

mbloms · December 5, 2020, 12:24am

There is a massive difference here: most badly typed patterns give unchecked warnings, and even when they don’t, someone who tries to use an Integer as a String at least gets a ClassCastException. That isn’t the case with opaque types, since they have the same erasure as their underlying type.

What’s more, type tests that were previously always sound are now leaky: without resorting to the closed-world assumption, we can’t know that someone won’t come along and define an opaque type alias we don’t know about.
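A minimal sketch of such a leaky type test, with a hypothetical `Meters` opaque type (the names `units`, `Meters`, `value` and `asMeters` are illustrative). The test is meant to recognise `Meters` values, but since `Meters` erases to `Double`, any boxed `Double` passes. Recent compilers should emit an unchecked warning on the `case m: Meters` line:

```scala
object units {
  opaque type Meters = Double
  object Meters {
    def apply(d: Double): Meters = d
  }
  extension (m: Meters) def value: Double = m
}

import units.*

// Meant to accept only values built with Meters(...), but the runtime check
// is just "is this a java.lang.Double?", so any Double is accepted.
def asMeters(x: Any): Option[Meters] = x match {
  case m: Meters => Some(m) // unchecked type test
  case _         => None
}
```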

For this reason, one should never think of opaques as the same as Haskell’s newtype. If you can’t afford someone successfully casting your opaque type to its underlying representation by accident, then you should use something else.
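A sketch of the accidental direction, with a hypothetical `Password` opaque type and a generic helper `findString` that knows nothing about it. Because `Password` erases to `String`, the helper’s `String` case happily matches it:

```scala
object secrets {
  opaque type Password = String
  object Password {
    def apply(s: String): Password = s
  }
}

import secrets.*

// Generic code with no knowledge of Password: its String case still
// matches a Password, because both are java.lang.String at runtime.
def findString(xs: List[Any]): Option[String] =
  xs.collectFirst { case s: String => s }
```

Calling `findString(List(42, Password("hunter2")))` returns `Some("hunter2")`, exposing the wrapped value as a plain `String`.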

But yes, Scala’s pattern matching is a leaky abstraction. I hope TypeTest will improve things, but phasing out unsound pattern matching from the language is not a small project.

LPTK · December 5, 2020, 9:42am

To add to what @mbloms said:

That’s exactly what asInstanceOf is as of today. It completely violates the soundness of any and all type system features that go beyond the JVM’s own very limited runtime type system.

Path-dependent types, refinements, singleton types, intersections, etc. are all unsound in the presence of asInstanceOf. This is understood and does not nullify the usefulness of the type system. You have similar unsound escape hatches in most practical languages, including OCaml, Haskell, and even Idris.
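A small sketch of this kind of unsoundness, using generics, whose type arguments the JVM does not track at runtime. `CastDemo`, `castList` and `firstLength` are illustrative names. The cast itself succeeds because only the `List` class is checked; the failure surfaces later, far from the cast:

```scala
object CastDemo {
  // Succeeds at runtime: the String type argument is erased,
  // so only "is this a List?" is actually checked.
  def castList(ints: List[Int]): List[String] = ints.asInstanceOf[List[String]]

  // Fails here instead, when an element is finally used as a String
  // (java.lang.Integer cannot be cast to java.lang.String).
  def firstLength(strs: List[String]): Int = strs.head.length
}
```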

On the other hand, well-typed and warning-free pattern matching is not supposed to be an unsound escape hatch.

Not an unfounded point of view!

Whenever we talk about pattern matching in Scala, we’re implicitly also talking about isInstanceOf, which is supposed to correspond precisely with the type-filtering part of pattern matching semantics (something that fails to compile or warns with one should also fail to compile or warn using the other). With parametric Top, isInstanceOf would simply not be allowed on values of Top type.
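To make the correspondence concrete, here is a minimal sketch (the names `TypeFilter`, `viaMatch` and `viaInstanceOf` are illustrative) of the same type filter written once as a typed pattern and once with `isInstanceOf`/`asInstanceOf`. Both reject `null`, since a typed pattern never matches `null` and `null.isInstanceOf[String]` is `false`:

```scala
object TypeFilter {
  // Type filtering via a typed pattern...
  def viaMatch(x: Any): Option[Int] = x match {
    case s: String => Some(s.length)
    case _         => None
  }

  // ...and the same filter spelled out with isInstanceOf/asInstanceOf.
  def viaInstanceOf(x: Any): Option[Int] =
    if (x.isInstanceOf[String]) Some(x.asInstanceOf[String].length) else None
}
```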

odersky · December 5, 2020, 10:30am

I think the right way to go about leaky abstraction problems is to look into a parametric top type. Once we have Top, opaque types should have it as their default bound. This will rule out pattern matching, equality, hashCode, and toString on opaque types.

Here are some things to sort out:

This is a rather complex change, so it will probably have to come after 3.0. But since it does not affect binary compatibility, it could come soon after. However, there will then be a window of vulnerability where code can use universal methods on opaque types. This code will break when we change to parametric Top. Is that OK?
Logically, the same change should be done for abstract types and type parameters. I.e. type X would be type X <: Top, and [X] would be [X <: Top]. But this would probably break too much code. So we might have to live with the existing default Any and demand explicit opt-in for the Top bound.

LPTK · December 5, 2020, 10:47am

I’m really looking forward to this! Would it make sense to have a transition period where using non-parametric methods on unannotated abstract types raises a warning? This way, people will have time to move to <: Any bounds or proper type classes before the warning becomes an error.
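As a rough illustration of the “proper type classes” route, here is a sketch where generic code renders values through a hypothetical `Show` type class instead of calling the universal `toString` on an unconstrained `A`. `Show` and `render` are assumptions for this example, not a standard library API:

```scala
// Hypothetical Show type class, standing in for the universal toString
trait Show[A] {
  def show(a: A): String
}

object Show {
  // Found through the implicit scope of Show[Int] (the companion object)
  given Show[Int] with {
    def show(a: Int): String = s"Int($a)"
  }
}

// Uses only the evidence passed in, never a universal method on A
def render[A](xs: List[A])(using s: Show[A]): String =
  xs.map(s.show).mkString("[", ", ", "]")
```

With this shape, `render(List(1, 2))` produces `"[Int(1), Int(2)]"`, and `render` would keep working if `A` were later bounded by a parametric Top.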

odersky · December 5, 2020, 10:58am

Would it make sense to have a transition period where using non-parametric methods on unannotated abstract types raises a warning?

Yes, maybe. We could make the default upper bound type an alias of Any and rig the type-checker to issue a warning if an Any method is called on this one. But we’d have to evaluate how annoying this would be in practice.

mbloms · December 5, 2020, 3:04pm

I think there are many use cases that are hard to represent using the current type hierarchy, and changing it would be a very good opportunity to try and address as much of that as possible, while staying as compatible as possible with current code. It seems to me to be quite a big feat so it would probably be smart not to rush it.

That said, while having the full power of newtype in opaque types would be extremely useful, especially when it comes to IArray and similar use cases, they are already very useful despite their weaknesses.

For example, many Haskell use cases export the newtype’s constructor/destructor. This is used all the time to safely define multiple type class instances for the same underlying type. A notable example of this is Monoid:
https://hackage.haskell.org/package/base-4.14.0.0/docs/Data-Monoid.html

This use case fits perfectly with opaque types, and isn’t at all diminished by morally dubious pattern matching:

mbloms:

import scala.annotation.targetName

// Monoid type class assumed by this example (not defined in the original post)
trait Monoid[A] {
  extension (x: A) @targetName("mappend") def <> (y: A): A
  def mempty: A
}

object newtypes {
  opaque type Sum[A] = A
  object Sum {
    def apply[T](x: T): Sum[T] = x
    def unapply[T](w: Sum[T]): Some[T] = Some(w)
  }

  opaque type Prod[A] = A
  object Prod {
    def apply[T](x: T): Prod[T] = x
    def unapply[T](w: Prod[T]): Some[T] = Some(w)
  }

  // Stored as the natural log of the wrapped value
  opaque type Logarithm = Double
  object Logarithm {
    def apply(d: Double): Logarithm = math.log(d)
    def unapply(l: Logarithm): Some[Double] = Some(math.exp(l))
  }

  given prodMonoid: Monoid[Prod[Double]] with {
    extension (x: Prod[Double]) @targetName("mappend") def <> (y: Prod[Double]): Prod[Double] = x * y
    def mempty: Prod[Double] = 1.0
  }

  given sumMonoid: Monoid[Sum[Double]] with {
    extension (x: Sum[Double]) @targetName("mappend") def <> (y: Sum[Double]): Sum[Double] = x + y
    def mempty: Sum[Double] = 0.0
  }

  // Multiplying logarithms adds their stored values, so the Sum instance is reused
  given logProdMonoid: Monoid[Prod[Logarithm]] = sumMonoid
}
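A self-contained, simplified variant, showing how a use site picks the instance from the wrapper rather than from the shared `Double` representation. It assumes a hypothetical `Monoid` with plain `combine`/`empty` methods instead of the `<>` extension, and the object name `wrapped` and helpers `unwrap`/`combineAll` are illustrative:

```scala
// Simplified stand-in for a Monoid type class
trait Monoid[A] {
  def combine(x: A, y: A): A
  def empty: A
}

object wrapped {
  opaque type Sum[A] = A
  object Sum {
    def apply[T](x: T): Sum[T] = x
    def unwrap[T](s: Sum[T]): T = s
  }

  opaque type Prod[A] = A
  object Prod {
    def apply[T](x: T): Prod[T] = x
    def unwrap[T](p: Prod[T]): T = p
  }

  // Two instances for the same runtime type (Double), kept apart by the wrappers
  given Monoid[Sum[Double]] with {
    def combine(x: Sum[Double], y: Sum[Double]): Sum[Double] = x + y
    def empty: Sum[Double] = 0.0
  }

  given Monoid[Prod[Double]] with {
    def combine(x: Prod[Double], y: Prod[Double]): Prod[Double] = x * y
    def empty: Prod[Double] = 1.0
  }
}

import wrapped.{given, *}

def combineAll[A](xs: List[A])(using m: Monoid[A]): A =
  xs.foldLeft(m.empty)(m.combine)
```

Summing 2.0, 3.0 and 4.0 through `Sum` gives 9.0, while the same numbers through `Prod` give 24.0. Outside `wrapped`, `Sum[Double]` and `Prod[Double]` are unrelated types, so the two givens never conflict.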

I think the problem of code downcasting from Any breaking in the future could be solved quite cleanly:

Even with the addition of Top as a supertype of Any, opaque types with an upper bound that is a subtype of Any will still have the problems described above. One solution could be to disallow downcasts to opaque types right away. That way, opaque types can still be converted into the underlying type via Any, but not the other way around. Then, when the opaque type is changed to have Top as its upper bound, it will no longer be possible to cast it to Any.
If in the future someone wants morally dubious type conversions to be possible, they can use Any as the upper bound explicitly.

Not allowing downcasting to an opaque type also fits quite nicely with the fact that usually if I define an

opaque type MySpecialBox <: Box = Box

I probably don’t want someone to downcast from any Box to MySpecialBox. If on the other hand I have:

opaque type Boxish <: AnyRef = Box

I really see no way to prevent someone from turning something Boxish into a Box. A Top type doesn’t help at all. The only way to prevent that would be by restricting typed patterns quite radically, which would be an interesting move, but probably a bit too extreme. Actually @odersky, do you have an idea of how hard it would be to require a TypeTest instance for all pattern matching?
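A minimal sketch of that situation, with hypothetical names `handles`, `Box`, `Boxish` and `peek`. Because the class `Box` is public and the bound `<: AnyRef` is exposed, a client can recover the `Box` with an ordinary typed pattern. Since `AnyRef` is a subtype of `Matchable` in Scala 3, a Matchable restriction would not flag it either:

```scala
object handles {
  class Box(val contents: String)

  // The bound <: AnyRef is visible to clients; the representation Box is not
  opaque type Boxish <: AnyRef = Box
  object Boxish {
    def apply(s: String): Boxish = new Box(s)
  }
}

import handles.*

// Any client that can see the Box class can recover it with a typed pattern
def peek(b: Boxish): Option[String] = b match {
  case box: Box => Some(box.contents)
  case _        => None
}
```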

TL;DR:

Opaques are great, they will be even better with Top
Considering adding any kind of sugar like case opaque type should wait until we have Top
Downcasting to opaque types should be illegal (unless using the-method-which-should-not-be-named)
Let users worry about the other direction until we have Top

odersky · December 5, 2020, 6:18pm

I agree that pattern matching against an opaque type pattern should give at least an unchecked warning: https://github.com/lampepfl/dotty/pull/10664

mbloms · December 6, 2020, 12:25am

Would it be possible to make it illegal? People who want to do a type coercion could always use asInstanceOf. Maybe it wouldn’t be so bad to set some precedent for illegal type tests? People defining opaque types could always define TypeTest instances if they want type tests to be possible.
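A sketch of what such an opt-in could look like with Scala 3’s `scala.reflect.TypeTest`, assuming a hypothetical `Email` opaque type whose (deliberately naive) invariant is just “contains an @”. The names `emails`, `Email` and `asEmail` are illustrative. With the given in scope, `case e: Email` is routed through the `TypeTest` instead of an unchecked `isInstanceOf[String]`:

```scala
import scala.reflect.TypeTest

object emails {
  opaque type Email = String
  object Email {
    // Hypothetical smart constructor enforcing the naive invariant
    def apply(s: String): Option[Email] = if (s.contains("@")) Some(s) else None
  }

  // Only strings satisfying the invariant pass a `case e: Email` test
  given TypeTest[Any, Email] with {
    def unapply(x: Any): Option[x.type & Email] = x match {
      case s: String if s.contains("@") => Some(x.asInstanceOf[x.type & Email])
      case _                            => None
    }
  }
}

import emails.{given, *}

def asEmail(x: Any): Option[Email] = x match {
  case e: Email => Some(e) // uses the TypeTest given above
  case _        => None
}
```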

smarter · December 6, 2020, 12:39am

Ignoring unchecked warnings is already unsafe, so I don’t think going further for one special case is useful. Perhaps there’s a debate to be had on whether unchecked warnings should be errors by default, but that’s straying off topic.

japgolly · December 6, 2020, 1:50am

Yeah, that’s fascinating. I’ve had plenty of experiences where avoiding boxing made a significant perf improvement (very rigorously validated by JMH), and I’ve also had experiences where I did a bunch of work that was undoubtedly a theoretical improvement, only to see the results either not change or sometimes even get worse. Part of me thinks domain vs. generic is a heuristic for determining whether boxing will make a difference, but I wouldn’t put much faith in that, even myself. The JVM still perplexes me to this day. Anyway…

I did not close any PR or issue, or shut down any discussion, just offered my opinion. Others are free to disagree.

100%, and I want to emphasise that I haven’t seen you personally close a PR or shut down a discussion. As far as I can see, no one at all has shut down any discussion on this topic yet. To clarify, the reason I mentioned it at all is that it’s a pattern I’ve seen a few times: a discussion concludes but, rightly or wrongly, a majority (or maybe just a very large group) are still unconvinced, and the documentation and/or the community fail to sway new users who come across the same common use case. In those situations, PRs and discussions do end up getting shut down, because in the eyes of the maintainers the ship has sailed, the debate has been had, and there’s no value in repeating it. To me this issue is (was?) starting to look like it would go down that path, which is why I want to highlight it now: even if we don’t modify the implementation, we can beef up the documentation to address those common use cases transparently and as well as we can, even if a significant number of people disagree that opaque types are relevant to them.

mbloms · December 6, 2020, 3:32pm

Fair enough. I think it’s safe to say that I’ve already derailed this topic to the limit.

Before I get a grip on myself and stop spamming I just want to say thanks to @odersky and @smarter for taking the time to address this despite everything on your plate with the Scala 3 launch! As a newcomer on this forum, that feels great!

odersky · December 7, 2020, 12:53pm

Some new info gained from experiments is here: https://github.com/lampepfl/dotty/issues/10662#issuecomment-739852480

FelixHargreaves · December 14, 2020, 6:57am

This is very interesting. Am I correct in assuming this deals with edge cases where the newtype-like opaque type usage is unsafe? (I’m missing the tl;dr)

mbloms · December 14, 2020, 9:40am

Yes, that’s right. I’ll try to summarize:

EDIT: To clarify, the solution makes it safe to emulate Haskell’s newtype using opaque type, but only if no upper type bound is exposed, as in Haskell, which has no subtype polymorphism at all, only parametric polymorphism.

Summary

Problem

The semantics of opaque type is not the same as newtype in Haskell; opaque type is actually much more general. The main reason opaque type doesn’t do encapsulation the way newtype does is that, since Scala allows you to pattern match on literally anything, you can expose the underlying type representation (intentionally or by mistake) using pattern matching.

Quoted example (minimized)

mbloms:

This is legal in Scala:

(any: Any) match {
  case Name(str) => Some(str)
  case _ => None
}

This is not legal in Haskell:

newtype Name = Name String
anyToName :: forall a. a -> Maybe String
anyToName x = case x of
    Name str -> Just str
    _ -> Nothing

For this reason, users following the pattern of defining a constructor/unapply method will experience some gotcha moments when opaque types don’t behave the way case classes would.*
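A minimal sketch of the gotcha, reusing the `Name` example with two hypothetical wrappers, `caseClassVersion.Name` and `opaqueVersion.Name` (`viaCaseClass` and `viaOpaque` are illustrative helpers). The case class extractor only accepts real `Name` instances, while the opaque one accepts any `String`; the compiler should flag the opaque extractor’s type test as unchecked:

```scala
object caseClassVersion {
  final case class Name(value: String)
}

object opaqueVersion {
  opaque type Name = String
  object Name {
    def apply(str: String): Name = str
    def unapply(n: Name): Some[String] = Some(n)
  }
}

// The case class extractor only matches values that really are Names...
def viaCaseClass(any: Any): Option[String] = any match {
  case caseClassVersion.Name(str) => Some(str)
  case _                          => None
}

// ...while the opaque extractor first runs an unchecked type test,
// which at runtime is just isInstanceOf[String], so any String matches.
def viaOpaque(any: Any): Option[String] = any match {
  case opaqueVersion.Name(str) => Some(str)
  case _                       => None
}
```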

Solution

This is solved in two steps:

Disallow opaque types in typed patterns (like the one above).
Restrict Any so that only subtypes of a new trait Matchable can be used as the scrutinee in pattern matching.

(1) prevents conversions like String -> Name by warning on patterns like:

(str: String) match {case n: Name => n}

Because Name can’t appear in a typed pattern in the case clause anymore.

(2) prevents conversions like Name -> String by warning on patterns like:

(n: Name) match {case str: String => str}

Because Name isn’t a subtype of Matchable, it can’t be pattern matched on at all.
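A small sketch of generic code written against the `Matchable` bound; `stringsOnly` is a hypothetical helper. The bound states up front that the function inspects runtime types, and a call with an unbounded opaque type such as `n.Name` would be rejected, since `n.Name` is not a subtype of `Matchable`. As noted below, the Matchable warnings for unbounded scrutinees are not turned on by default in 3.0:

```scala
// Generic code that wants to pattern match asks for Matchable explicitly
def stringsOnly[T <: Matchable](xs: List[T]): List[String] =
  xs.collect { case s: String => s }
```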

Demo: (minimized)

scala> object n :
     |   opaque type Name = String
     |   object Name :
     |     def apply(str: String): Name = str
     |     def unapply(n: Name): Some[String] = Some(n)
// defined object n

scala> import n._

scala> "hi there" match {case n: Name => n}
1 |"hi there" match {case n: Name => n}
  |                       ^^^^^^^
  |                     the type test for n.Name cannot be checked at runtime
val res0: String & n.Name = hi there

scala> Name("Franz Kafka") match {case str: String => str}
1 |Name("Franz Kafka") match {case str: String => str}
  |                                     ^^^^^^
  |                      pattern selector should be an instance of Matchable,
  |                      but it has unmatchable type n.Name instead
val res1: n.Name & String = Franz Kafka

See this for more:

Note that this isn’t in master yet, and warnings won’t be turned on in 3.0.

*This is also why case classes should still be encouraged and preferred when their semantics is needed! Some kind of sugar like case opaque type could maybe be added in the future when the semantics of opaque type are better understood in practice and the feature has matured more. Preferably these unsafe patterns should be compiler errors rather than just warnings before such sugar is added.

smarter · December 14, 2020, 1:10pm

Matchable isn’t enough to make all usages of opaque types safe, as I mentioned in “Add `Matchable` trait” (#3 by smarter): users are free to define a visible upper bound for their opaque type which is itself a subtype of Matchable.