Proposal for Multi-level Enums

Dotty has added a really fabulous way of declaring Algebraic Data Types (ADTs) in the form of the new enum syntax

e.g.

enum MyEnum {
  case A
  case B(i: Int, j: Int)
}

However these ADT are limited in that they can only be completely flat, while in general it is actually quite useful oftentimes to nest ADTs, and the new enum syntax seems like a great opportunity to provide support for those! I would propose the following syntax

enum MyEnum {
  case A
  case B(i: Int)
  case enum C {
    case D(s: String)
    case E
    case enum F {
      case G
    }
    case enum H {
      case I, J, K
    }
  }
}

But then the question arises: What type should be inferred for nested enum values, such as MyEnum.A, MyEnum.B.apply, and MyEnum.C.D.apply(s: String)?

I think a simple approach would be: Always infer the most specific enclosing enum. So each of the following expressions should have these types inferred

MyEnum.A // MyEnum
MyEnum.B(1) // MyEnum
MyEnum.C.D("") // MyEnum.C
MyEnum.C.E // MyEnum.C
MyEnum.C.F.G // MyEnum.C.F
MyEnum.C.H.I // MyEnum.C.H
MyEnum.C.H.J // MyEnum.C.H
MyEnum.C.H.K // MyEnum.C.H

Below I will include a collection of nice usecases of this feature, to demonstrate that it is not just for giggles:

  1. Paul Philips’ SizeInfo talked about here
enum SizeInfo {
  case Bounded(bound: Int)
  case enum Atomic {
    case Infinite
    case Precise(n: Int)
  }
}
  1. A Hierarchy of Reals/Rationals/Integers/Naturals
enum Real(computeDigits: Int => BigDecimal) {
  case enum Rational(numerator: Int, denominator: Int) 
      extends Real(_ => BigDecimal(numerator) / denominator) {
    case enum Integer(n: Int) extends Rational(n, 1) {
      case Natural(n: Int) extends Integer(n)
      case Negative(n: Int) extends Integer(n)
    }
    case Fraction(numerator: Int, denominator: Int) extends Rational(numerator, denominator)
  }
  case Irrational(computeDigits: Int => BigDecimal) extends Real(computeDigits)
}
  1. An generalization of Either to include “Inclusive OR”
enum AndOr[+A, +B] {
  case Both[+A, +B](left: A, right: B) extends AndOr[A, B]
  case enum Either[+A, +B] extends AndOr[A, B] {
    case Left[+A, +B](value: A) extends Either[A, B]
    case Right[+A, +B](value: B) extends Either[A, B]
  }
}
  1. Grouping JSON values into primitives and non primitives
enum JsValue {
  case Obj(fields: Map[String, JsValue])
  case Arr(elems: ArraySeq[JsValue])
  case enum Primitive {
    case Str(str: String)
    case Num(bigDecimal: BigDecimal)
    case JsNull
    case enum Bool(boolean: Boolean) { 
      case True extends Bool(true), 
      case False extends Bool(false) }
  }
}
35 Likes

This looks really great to me. Subclasses of ADTs can be very useful. Would be great to have a nice syntax.

6 Likes

I used this kind of nesting fairly often when I have complicated error ADTs. A ReadError can be DecodeError, a ParseError or an IOError. A DecodeError can be a TypeError or a NotFoundError. Etc…

3 Likes

I second this proposal. Multi-level enums can be really useful for encoding more precise invariants into ADTs, without having to resort to nesting (which adds clutter and extra allocations).

5 Likes

Good idea. This should work for inference since nested parameterized enum nodes aren’t case classes like their siblings. We want to avoid this type inference problem.

That said, is the current implementation of enums necessarily flat? Since the top-level enum is sealed rather than final, couldn’t we could already do something like the following?

enum JsValue { // desugars to a sealed abstract class
  case Obj(fields: Map[String, JsValue])
  case Arr(elems: ArraySeq[JsValue])
}
enum Primitive extends JsValue {
  case Str(str: String)
  case Num(bigDecimal: BigDecimal)
  case JsNull
}
enum Bool(boolean: Boolean) extends Primitive { 
  case True extends Bool(true), 
  case False extends Bool(false)
}

(Granted, the nested version is an improvement that’s easier to read and keeps all the types contained under the top level enum, and enumTag might not be built to handle this gracefully. If we did this, enum should probably also imply final)

1 Like

I like this and think it would be a great addition. A lot my own own ADTs are naturally hierarchical, exactly like the examples given, such as this one from my Metascala JVM

sealed trait Type
object Type{
  sealed trait Ref extends Type
  object Ref{
    case class Cls extends Ref
    case class Arr extends Ref
  }
  sealed trait Prim extends Type
  object Prim{
    case object Z extends Prim
    case object B extends Prim
    case object C extends Prim
    case object S extends Prim
    case object I extends Prim
    case object J extends Prim
    case object F extends Prim
    case object D extends Prim
  }
}

This lets things have meaningful names like Type.Prim.I, while avoiding polluting the global (or even Type.*) namespace with dozens of prefixed TypePrimI names.

10 Likes

I tried to refrain myself from doing a “me too” post… But really, not doing what is proposed here would be huge missed in uniformity… I would really love to have that.

5 Likes

I’d honestly be surprised if nesting has never been considered – it just seems so natural and intuitive!

1 Like

Oh but it has on at least one occasion: https://gitter.im/lampepfl/dotty?at=5cd583d2f251e60ffa5a990f

1 Like

@smarter @nicolasstucki @odersky any thoughts? The proposal seems to be universally well received

3 Likes

My take is that we should ship what we have now in 3.0, and we can revisit this in a later version. We can extend flat enums to multilevel enums in a backward compatible way in the future, so we should give ourselves more time to weigh this, and above all to really feel the limits of flat enums, before committing to more complexity.

8 Likes

The problem with that strategy is that instead of the (preferred) “old way” and “new way” to encode ADTs you now have the “old way OR the only way to do hierarchical ADTs” and “the new, limited way”.

Is there no time to implement and test out hierarchical ADTs?

Are there no hierarchical ADTs in dotc?

8 Likes

We need to accept that languages should continually evolve, and figure out how to solve problems that brings[1], rather than keep fooling ourselves that we’re finally in possession of the Final, Complete, and Perfect design. Every bit of software that ships and we start using changes our perspective on things in ways that we can’t predict.

Sorry for being melodramatic, or if I’m over-generalizing, but you get the point…

[1] One example of a problem is “we can’t wait so long.” The solution is not to evolve in fewer iterations, but to figure out how to make each iteration much shorter.

5 Likes

I agree wholeheartedly with that point. There are hierarchical ADTs in my code base, and that basically means I cannot (consistently) rely on enum.

That being said, we can’t port every single fancy behaviour clever devs used to encode through sealed trait hierarchies in enum. Sometimes, it’s fine to say “you can do that, but by hand”. I just don’t think hierarchical ADTs are the right cutoff, but I use them pretty often, so I might be biased.

Also, last time I checked-a few weeks ago- hierarchical ADTs compiled, but just didn’t behave as you’d expect. If we’re going to forbid them, let them at least be a compilation failure.

2 Likes

Should this be opened as an issue in https://github.com/lampepfl/dotty/issues? (Or https://github.com/lampepfl/dotty-feature-requests - are multi-level enums a feature request or a bug report? :boxing_glove:)

2 Likes

Multi-level enums could be useful but could be added later, if necessary.

In the meantime, the feature can be emulated rather directly, without the need to abandon enums:

e.g.:

enum MyEnum {
  case A
  case B(i: Int)
  case C(x: X)
}

final case class X(c: C)

enum C {
  case D(s: String)
  case E
}

Indeed, this would be the idiomatic encoding of this ADT in Haskell and other functional programming languages, which support products of sums, but not “sums of sums”, since those flatten into sums (it is only an artifact of Scala’s encoding of sums using subtyping that such a multi-level sum is even possible).

Why use an intermediate X case class? You can have:

package foo

enum MyEnum {
  case A
  case B(i: Int)
  case C(c: foo.C)
}

enum C {
  case D(s: String)
  case E
}

Though this version still allocate a useless wrapper at runtime (the MyEnum.C case). I don’t see a reason to add a wrapper when you could just have a nested enum. But I agree that nested enums can always be added later, and so are not on the critical path.

You could just as well say that it’s an artifact of Haskell’s lack of subtyping that these are not possible in Haskell.

7 Likes

Oh, and while we’re talking about Haskell, it’s funny to read GHC’s source and find things like:

-- INVARIANT: The cast is never refl.
-- INVARIANT: The Type is not a CastTy (use TransCo instead)

…which could be expressed by having a more refined, multi-level ADT. I guess they don’t use wrappers there because of the added performance overhead and boilerplate these would incur.

1 Like

Agreed. I think they’d be helpful, and it’s worth a feature request, but let’s not make the best be the enemy of the good – if multi-level isn’t in scope for 3.0, we should think about how to do it in a later release.

3 Likes

Are enums final, not sealed ? What’s stopping me from writing

enum MyEnum {
  case A
  case B(i: Int)
}

enum C extends MyEnum {
  case D(s: String)
  case E
}

in the same file?

Edit: there’s a specific prohibiting error message…

class C in package  extends enum MyEnum, but extending enums is prohibited.

But it’s possible to employ a pattern hiding the enum type for a horrible workaround:

sealed trait MyEnum
final lazy val MyEnum = MyEnum0

private enum MyEnum0 extends MyEnum {
  case A
  case B(i: Int)
}

enum C extends MyEnum {
  case D(s: String)
  case E
}