Proposal for Enumerations in Scala

Hey Josh, I don’t think it absolutely must be added immediately, no, but probably some work should be done to ensure they can indeed be added later (i.e. we aren’t backing into a corner syntactically or semantically that would prevent their later addition).

4 Likes

It would probably be interesting to see how many class hierarchies in the ecosystem can and how many cannot be converted to enums. (I just have this feeling that multi-level hierarchies are common, and the best proof that enums can be extended to multi-level enums is by implementing it. :smile:)

4 Likes

My (short) experience so far is that using the enum syntax to define ADTs does not work very well. As soon as you add methods to the enum it feels really weird to have them defined at the same level as enum constructors:

enum Json {
  def decode[A](using decoder: Decoder[A]): Option[A] = ...
  case Bool(value: Boolean)
  case Array(items: Seq[Json])
  ...
}

In this example, the method decode is a member of the Json type, whereas the constructors Bool and Array are members of the Json value (its companion object). This means that if you have a value json of type Json on hand, you can call json.decode but not json.Bool. Conversely, you can write Json.Bool but not Json.decode.
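To make the distinction concrete, here is a minimal sketch in Scala 3 syntax. The isBool method and the Arr case name (used instead of Array to avoid shadowing scala.Array) are made up for illustration and are not part of the proposal.

```scala
enum Json {
  // Instance method: a member of the Json *type*, so it is called on a value.
  def isBool: Boolean = this match {
    case Json.Bool(_) => true
    case _            => false
  }
  // Cases: members of the Json companion object, so they are reached via Json.
  case Bool(value: Boolean)
  case Arr(items: Seq[Json])
}

val json: Json = Json.Bool(true) // constructor reached through the companion
val viaInstance = json.isBool    // method reached through the instance

// json.Bool(false) // does not compile: Bool is not a member of the instance
// Json.isBool      // does not compile: isBool is not a member of the companion
```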

In addition to this slightly confusing situation, I found the enum syntax too limited for ADTs. Not only are multi-level enums unsupported, but enum constructors also can’t define methods:

enum Foo {
  case Bar {
    def something = () // NOT SUPPORTED
  }
}

Last but not least, the values and valueOf methods make no sense on ADTs, as was previously said in this thread.

For these reasons, I found that it makes little sense to start defining an ADT with the enum syntax: I start with a sealed trait directly, so that I don’t have to convert my code to this style once I hit one of the aforementioned limitations.

But should we work on addressing these limitations (as suggested in the multi-level enums proposal), or should we restrict the scope of enums to effectively enumerated values? I lean towards the second option.

About nested enums: keep in mind that in the SIP proposal, constructing an enum value yields the type of the enum, not the type of its constructor (as opposed to the way case classes currently work). I.e., constructing Some("foo") would have type Option[String], not Some[String], if Option were defined as an enum. One of the motivations for having multi-level ADTs is to be able to distinguish one particular subtype of the top-level type. So I am not sure that being able to use the enum syntax to define such ADTs would be enough.
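A small sketch of that widening, as I understand the behaviour in released Scala 3. The enum is named Opt here (an illustrative name) so that it does not clash with scala.Option.

```scala
enum Opt[+T] {
  case Some(x: T)
  case None
}

// Going through the companion's apply widens the inferred type to Opt[String].
val widened = Opt.Some("foo")

// Using `new` keeps the precise case type Opt.Some[String],
// so the field `x` is accessible.
val precise = new Opt.Some("foo")
val payload: String = precise.x

// widened.x // does not compile: x is not a member of Opt[String]
```

So the subtype still exists and can be named, but code that builds values through the usual constructor syntax only sees the top-level type unless it asks for more.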

4 Likes

I believe values and valueOf do make sense if ADTs also define simple cases. They are necessary for (de-)serializing such values, for instance. Without values and valueOf, one would have to generate a complete object with its own JVM class for each simple value in an ADT. By contrast, the aim of the current design is to avoid generating lots of code for simple enum values, no matter whether these values are part of a simple enumeration or a general ADT.
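As a rough illustration of the (de-)serialization use case, here is a sketch with an enum that has only simple cases, where values and valueOf are certain to be available. The Color enum and the serialize/deserialize helpers are hypothetical names, not part of the proposal.

```scala
import scala.util.Try

enum Color {
  case Red, Green, Blue
}

// Serialize a simple case by its name.
def serialize(c: Color): String = c.toString

// valueOf throws on unknown names, so wrap the lookup in Try.
def deserialize(s: String): Option[Color] = Try(Color.valueOf(s)).toOption

// values lists every simple case in declaration order.
val allColors: List[Color] = Color.values.toList
```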

About adding methods to enum cases: I believe that should not be supported. The idea of an ADT is that it’s data! If one wants a more OO approach where methods go with subclasses, then indeed one should use a sealed trait with subclasses.

And the idea of Scala is to combine OO with FP, so an FP-only construct that removes access to the OO toolset is not a good fit for that purpose.

1 Like

I had a similar feeling when trying to convert one of my ADTs (that has methods) to an inner-class-like syntax. I posted about it the other day on the multi-level enum thread. I could perhaps imagine a syntax like the following, which has nothing to do with enums:

sealed trait Json {
  def decode[A](using decoder: Decoder[A]): Option[A] = ...
  sealed {
    case class Bool(value: Boolean) { ... }
    case class Array(items: Seq[Json])
    case object Null
  }
}

We can play around with the keywords, but I think that two things are essential here:

  1. Using case class and case object. This would (a) make it clearer that one is actually a class and the other is an object, and (b) allow for other class-def modifiers – final, private, etc – to be integrated seamlessly.
  2. Nest / indent the nested ADTs under a certain keyword (not necessarily sealed).

I suspect that this would lead to developers using data-only ADT syntax with extension methods (like here) instead of OOP classes. Not to say that this is a bad thing, but just a consideration to be aware of.
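For reference, the data-only style with extension methods might look roughly like this in Scala 3. The leafCount method and the Arr case name are invented for the example.

```scala
// The ADT carries data only.
enum Json {
  case Bool(value: Boolean)
  case Arr(items: Seq[Json])
  case Null
}

// Behaviour lives outside the type, as an extension method.
extension (json: Json) {
  // Counts the Bool leaves reachable from this value.
  def leafCount: Int = json match {
    case Json.Bool(_)    => 1
    case Json.Arr(items) => items.map(_.leafCount).sum
    case Json.Null       => 0
  }
}
```

The call site reads the same as with an instance method, for example Json.Arr(Seq(Json.Bool(true))).leafCount.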

Are there any performance considerations for enums? E.g. I would like it if a match statement on enum values collapsed down to a tableswitch or lookupswitch (e.g. as per the @switch annotation).

Would this already work with the current Scala 3 Enum definition, and is this in scope as a consideration of the definition?
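I can't say whether a direct match on enum values is compiled to a switch. One workaround that can be checked with @switch today is to match on the ordinal, at the cost of magic numbers. This is only a sketch; Direction and degrees are hypothetical names.

```scala
import scala.annotation.switch

enum Direction {
  case North, East, South, West
}

// Matching on the Int ordinal lets @switch verify that a
// tableswitch/lookupswitch is emitted for this match.
def degrees(d: Direction): Int = (d.ordinal: @switch) match {
  case 0 => 0
  case 1 => 90
  case 2 => 180
  case 3 => 270
  case _ => throw new IllegalArgumentException(s"unexpected ordinal for $d")
}
```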

I don’t think you need to tell me that :wink: But the motivation for enums was that sometimes all we want is data, and then we should not have to jump through all the hoops of the more general class hierarchy syntax.

2 Likes

I’m not sure why though. With the current proposal you can turn those extension methods into instance methods by just copy pasting them into the enum (and s/response/this/). So I’m not sure why someone would go with the extension methods instead.

I was referring to the situation where enum does not allow instance methods. If they are allowed, then by all means there is no need for extension methods, but then the code looks a bit messy (as @julienrf pointed out).

I think with enum, we’re conflating too many things here.

  1. Enumerations – declaring some finite, flat, plain data and assigning a natural number to each piece, like enums in Java, C, protobuf, … This is where methods such as .value, .values and .valuesToEntriesMap make sense. Basically this should replicate what enumeratum does. It would be great if all of these inherited from java.lang.Enum automatically (I’m not sure it’s possible, though). We should use the keyword enum for this:

enum Fruit(sugarContent: Double) {
  case Apple(0.5)
  case Orange(1.25)
}
Fruit.Apple.value == 0
Fruit.Orange.sugarContent == 1.25
Fruit.valuesToEntriesMap(0).sugarContent == 0.5
Fruit.values == List(Fruit.Apple, Fruit.Orange)
Fruit.Apple: Fruit
// Fruit could also implement Eq and Ord out of the box? (based on `.value`)

  2. Tagged unions – ADTs, as we know them from Haskell/F#/ML/… These can be recursive, but they are still data, so “nesting” doesn’t make much sense here, nor do .values or java.lang.Enum. We should use the keyword union for this:

union Expr {
  case Zero
  case Val(v: Int)
  case Sum(l: Expr, r: Expr)
}
Expr.Val(42): Expr // the fact that `Val` is implemented with a `class` `Val` is a detail that should be _hidden_

  3. Any complicated sealed hierarchies, including nested ones. We already have everything we need for them in Scala (sealed, trait, class), so there is no need for any other special keywords.
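For comparison, the Fruit example from the first item can be approximated with the enum syntax as it exists in Scala 3, as far as I can tell. Here ordinal plays the role of .value and fromOrdinal stands in for the lookup. The byOrdinal map is a hand-written substitute for valuesToEntriesMap, which I don't believe exists under that name in the standard library.

```scala
enum Fruit(val sugarContent: Double) {
  case Apple  extends Fruit(0.5)
  case Orange extends Fruit(1.25)
}

// Hand-rolled stand-in for a "valuesToEntriesMap"-style lookup.
val byOrdinal: Map[Int, Fruit] = Fruit.values.map(f => f.ordinal -> f).toMap

val appleOrdinal: Int   = Fruit.Apple.ordinal               // position in declaration order
val firstSugar: Double  = Fruit.fromOrdinal(0).sugarContent // lookup by ordinal
val allFruit: List[Fruit] = Fruit.values.toList             // values is an Array
```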

I like how Scala 3 tries to codify common idioms (those would be enum and union): simple things should be easy :tada: On the other hand, I don’t see a reason to complicate the (much rarer) complex things, like nested hierarchies. Those are already possible with Scala 2 tools, like sealed, trait, class.

@odersky would you agree that it’s worthwhile separating the union concept from enum?

1 Like

I support your idea of separation between the features (and also realize now my earlier mistake of not understanding that ADTs are union types), but I’m not convinced that such a “small” scenario merits a new syntax.

As you said, we already have sealed, class, trait, object; why then do we need a custom union syntax? Are those really all that different?

union Expr {
  case Zero
  case Val(v: Int)
  case Sum(l: Expr, r: Expr)
}

sealed trait Expr
object Expr {
  object Zero extends Expr
  case class Val(v: Int) extends Expr
  case class Sum(l: Expr, r: Expr) extends Expr
}

If anything, the more nesting such a hierarchy has, the more boilerplate is required, and the greater the need for a simpler and more concise syntax.

I’m not sure this would really work, because you would need to derive proper (de-)serializers for the case classes anyway. I’m not even sure it would be simple to design a serialization process that would pick up the valueOf method for “simple cases” but would construct proper class instances for the other cases of the same ADT.

1 Like

Not at all! I am a strong proponent of keeping the two together. As far as I know, every language that supports ADTs also supports enums as a special case of ADTs. An enum is simply an ADT where all cases are simple. The philosophy of the Scala language is to be a unifier, instead of an amalgamation of many different features. I have come to realise that if you ask committees or the general public, the vote always goes towards more differentiated features, which in the end invariably leads to feature creep. So I take it upon myself to strongly resist this tendency :wink:

One possible design is to stay pure and simply not have any enums at all, since they are not strictly necessary. That’s what Scala 2 did, and we could continue with it. On the other hand, I have the impression that the reduction of boilerplate is worth it. But then it should be one concept, not two or three different ones.

6 Likes

I don’t know about pure-data ADTs, but “java” enums are very much missed in Scala. There is the enumeratum library that somewhat provides their utility, but it seems to rely on macros so I’m not sure it’ll be ported to Scala 3.

It seems to me we have two concepts that are similar at their core, but differ heavily in their usage and in the syntactic sugar they need: enums need values / valueOf; ADTs need defs on nested types and multi-level hierarchies.

Trying to combine two different syntax sugars into one, just because they share a conceptual core, is not a good decision imho.

2 Likes

That may actually be a good thing. When I define ADTs I always end up stuffing them with methods because it’s the easy thing to do, but then I dislike the result, because it is no longer easy to see the structure of the ADT, with all the method pollution.

I think the better approach (though a bit cumbersome) is to outsource the methods into external traits, which actually also works with the enum syntax:

enum Json {
  case Bool(value: Boolean)    extends Json with BoolImpl
  case Array(items: Seq[Json]) extends Json with ArrayImpl
  def foo: Int
}
private trait ArrayImpl { self: Json.Array =>
  def foo = items.size
  def bar = foo // this method is defined only for Array
}
private trait BoolImpl { self: Json.Bool =>
  def foo = if (value) 1 else 0
}

@main def m = {
  val j = new Json.Array(Seq(Json.Bool(true)))
  assert(j.foo == j.bar)
}

Though again, it’s a little too much boilerplate, especially since it forces specifying the full extends clauses of the ADT cases.

2 Likes

Or maybe just using the good old syntax?

sealed trait Json {
  def foo: Int
}

object Json {
  case class Bool(value: Boolean) extends Json {
    def foo = if (value) 1 else 0
  }
  case class Array(items: Seq[Json]) extends Json {
    def foo = items.size
    def bar = foo
  }
}

Seems a lot cleaner to me.

Clean/pollution is in the eye of the beholder, it seems. :smile:

1 Like

Perhaps, but then is it worth spending time on a new syntax-sugar feature that looks pretty much like before and does not reduce boilerplate?

But I think it must be reiterated. An enum construct with severe limitations and a very low ceiling on what it can do:

  1. can’t have subhierarchies
  2. can’t declare methods for branches (without a workaround)
  3. can’t inherit new traits in branches
  4. can’t declare implicits for branches
  5. all while types of the branches are reachable through pattern matching and must be for GADTs to work – making the argument that .apply widens irrelevant, meaning that all the above concerns are very relevant as a programmer will observe the subtypes of branches daily

is just un-Scala! It’s a construct that does not scale with usage and does not help contain complexity; instead it gives up at a certain point of complexity and forces a retreat to a lower-level construct. This really goes against the principle of scaling with the codebase, and against how well the other language constructs scale. You could argue that case class is also too limited and doesn’t scale, but I don’t think enums will have the success of case classes. Nearly all the sealed hierarchies in my libraries are multi-level, so it’s not worth using a different syntax for the minority of them that are simplistic.
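To illustrate the point about branch types being observable through pattern matching, here is a small GADT-style sketch in Scala 3 enum syntax. The Expr cases and eval are illustrative only. Each branch of the match relies on the precise type of the matched case to refine A.

```scala
enum Expr[A] {
  case IntLit(n: Int)                  extends Expr[Int]
  case BoolLit(b: Boolean)             extends Expr[Boolean]
  case Add(l: Expr[Int], r: Expr[Int]) extends Expr[Int]
  case Not(e: Expr[Boolean])           extends Expr[Boolean]
}

// In each branch the compiler learns what A is from the case that matched,
// so returning an Int or a Boolean type-checks against the result type A.
def eval[A](e: Expr[A]): A = e match {
  case Expr.IntLit(n)  => n
  case Expr.BoolLit(b) => b
  case Expr.Add(l, r)  => eval(l) + eval(r)
  case Expr.Not(x)     => !eval(x)
}
```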

EDIT: LPTK’s post clarifies some of the capabilities of enums, but still, making workarounds for new features before they’re even out is too much. Telling newcomers “just make a private trait if you want to add methods to an enum branch”, and having to remember that yourself, is hardly practical.

2 Likes