Proposal for Enumerations in Scala

sideeffffect · February 10, 2020, 10:34am

I think with enum, we’re conflating too many things here.

Enumerations – declaring some finite, flat, plain data and assigning a natural number to each piece, like enums in Java, C, protobuf, … This is where .value, .values, .valuesToEntriesMap, etc. methods make sense. Basically, this should replicate what enumeratum does. It would be great if all of these inherited from java.lang.Enum automatically (I’m not sure it’s possible, though). We should use the keyword enum for this:

enum Fruit(sugarContent: Double) {
  case Apple(0.5)
  case Orange(1.25)
}
Fruit.Apple.value == 0
Fruit.Orange.sugarContent == 1.25
Fruit.valuesToEntriesMap(0).sugarContent == 0.5
Fruit.values == List(Fruit.Apple, Fruit.Orange)
Fruit.Apple: Fruit
// Fruit could also implement Eq and Ord out of the box? (based on `.value`)
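For comparison, here is a minimal sketch of how close the `enum` syntax that eventually shipped in Scala 3 gets to this wish list, assuming a Scala 3 compiler. There, `ordinal` plays the role of `.value`, `values` lists the cases, and `fromOrdinal` / `valueOf` look a case up by number or name. The `fruitsByOrdinal` map is a hypothetical stand-in for the proposed `valuesToEntriesMap`, not a generated method.

```scala
// Scala 3: cases that pass constructor arguments need an explicit `extends` clause
enum Fruit(val sugarContent: Double) {
  case Apple  extends Fruit(0.5)
  case Orange extends Fruit(1.25)
}

// Hypothetical equivalent of the proposed `valuesToEntriesMap`
val fruitsByOrdinal: Map[Int, Fruit] =
  Fruit.values.toList.map(f => f.ordinal -> f).toMap
```

Note that `Fruit.values` returns an `Array`, so comparing it with a `List` needs a `.toList`. Extending `java.lang.Enum` is possible in Scala 3 too, but it is opt-in (e.g. `enum Color extends java.lang.Enum[Color]`) rather than automatic.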

Tagged unions – ADTs, as we know them from Haskell/F#/ML/… They can be recursive, but they are still data, so “nesting” doesn’t make much sense here, nor do .values or java.lang.Enum. We should use the keyword union for this.

union Expr {
  case Zero
  case Val(v: Int)
  case Sum(l: Expr, r: Expr)
}
Expr.Val(42): Expr // the fact that `Val` is implemented as a `class` `Val` is a detail that should be _hidden_
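The proposed `union` can be approximated with Scala 3's `enum` as it was eventually released (a sketch assuming a Scala 3 compiler; `eval` is an illustrative helper, not part of the proposal). `Expr.Val(42)` is already typed as `Expr` there, because an enum case constructor application is widened to the enum type unless a more specific type is expected.

```scala
enum Expr {
  case Zero
  case Val(v: Int)
  case Sum(l: Expr, r: Expr)
}

// Illustrative interpreter: consumers of the ADT use pattern matching
def eval(e: Expr): Int = e match {
  case Expr.Zero      => 0
  case Expr.Val(v)    => v
  case Expr.Sum(l, r) => eval(l) + eval(r)
}

// The case constructor's result is widened to Expr, hiding the `Val` class
val answer: Expr = Expr.Val(42)
```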

Any complicated sealed hierarchies, including nested ones. We already have everything we need for them in Scala (sealed, trait, class); there is no need for any other special keywords.
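As an illustration of the kind of nested hierarchy meant here, a sketch written with nothing but `sealed`, `trait` and `case class` (the `Shape` / `Polygon` names are hypothetical). The intermediate sealed `Polygon` level is the part a flat list of cases does not express, yet exhaustiveness checking works at both levels.

```scala
sealed trait Shape
// An intermediate level of the hierarchy, itself sealed
sealed trait Polygon extends Shape

object Shape {
  final case class Circle(radius: Double) extends Shape
  final case class Square(side: Double) extends Polygon
  final case class Triangle(a: Double, b: Double, c: Double) extends Polygon
}

// Exhaustive over the sub-hierarchy only
def corners(p: Polygon): Int = p match {
  case Shape.Square(_)         => 4
  case Shape.Triangle(_, _, _) => 3
}

// Exhaustive over the whole hierarchy, delegating to the sub-hierarchy
def describe(s: Shape): String = s match {
  case Shape.Circle(r) => s"circle of radius $r"
  case p: Polygon      => s"polygon with ${corners(p)} corners"
}
```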

I like how Scala 3 tries to codify common idioms (those would be enum and union): simple things should be easy. On the other hand, I don’t see a reason to complicate the (much rarer) complex things, like nested hierarchies. Those are already possible with Scala 2 tools, like sealed, trait, class.

@odersky would you agree that it’s worthwhile separating the union concept from enum?

eyalroth · February 10, 2020, 10:50am

I support your idea of separation between the features (and also realize now my earlier mistake of not understanding that ADTs are union types), but I’m not convinced that such a “small” scenario merits a new syntax.

As you said, we already have sealed, class, trait, object; why then do we need a custom union syntax? Are those really all that different?

union Expr {
  case Zero
  case Val(v: Int)
  case Sum(l: Expr, r: Expr)
}

sealed trait Expr
object Expr {
  case object Zero extends Expr
  case class Val(v: Int) extends Expr
  case class Sum(l: Expr, r: Expr) extends Expr
}

If anything, the more nesting such a hierarchy has, the more boilerplate is required, and the greater the need for a simpler and more concise syntax.

julienrf · February 10, 2020, 10:53am

I’m not sure this would really work, because you would need to derive proper (de-)serializers for the case classes anyway. I’m not even sure it would be simple to design a serialization process that would pick up the valueOf method for “simple cases” but construct proper class instances for the other cases of the same ADT.

odersky · February 10, 2020, 10:55am

Not at all! I am a strong proponent of keeping the two together. As far as I know, every language that supports ADTs also supports enums as a special case of ADTs. An enum is simply an ADT where all cases are simple. The philosophy of the Scala language is to be a unifier, instead of an amalgamation of many different features. I have come to realise that if you ask committees or the general public, the vote always goes towards more differentiated features, which in the end invariably leads to feature creep. So I take it upon myself to strongly resist this tendency.

One possible design is to stay pure and simply not have any enums at all, since they are not strictly necessary. That’s what Scala 2 did, and we could continue with it. On the other hand, I have the impression that the reduction of boilerplate is worth it. But then it should be one concept, not two or three different ones.

eyalroth · February 10, 2020, 11:04am

I don’t know about pure-data ADTs, but “Java” enums are very much missed in Scala. There is the enumeratum library that somewhat provides their utility, but it seems to rely on macros, so I’m not sure it’ll be ported to Scala 3.

It seems to me we have two concepts that are similar at their core but differ heavily in their usage and in the syntax sugar they need: enums need values / valueOf; ADTs need defs on nested types and multi-level hierarchies.

Trying to combine two different syntax sugars into one, just because they share a conceptual core, is not a good decision imho.

LPTK · February 10, 2020, 11:53am

That may actually be a good thing. When I define ADTs I always end up stuffing them with methods because it’s the easy thing to do, but then I dislike the result, because with all the method pollution it is no longer easy to see the structure of the ADT.

I think the better approach (though a bit cumbersome) is to outsource the methods into external traits, which actually also works with the enum syntax:

enum Json {
  case Bool(value: Boolean)    extends Json with BoolImpl
  case Array(items: Seq[Json]) extends Json with ArrayImpl
  def foo: Int
}
private trait ArrayImpl { self: Json.Array =>
  def foo = items.size
  def bar = foo // this method is defined only for Array
}
private trait BoolImpl { self: Json.Bool =>
  def foo = if (value) 1 else 0
}

@main def m = {
  val j = new Json.Array(Seq(Json.Bool(true)))
  assert(j.foo == j.bar)
}

Though again, it’s a little too much boilerplate, especially since it forces specifying the full extends clauses of the ADT cases.

eyalroth · February 10, 2020, 2:32pm

LPTK:

I think the better approach (though a bit cumbersome) is to outsource the methods into external traits, which actually also works with the enum syntax:


Or maybe just using the good old syntax?

sealed trait Json {
  def foo: Int
}

object Json {
  case class Bool(value: Boolean) extends Json {
    def foo = if (value) 1 else 0
  }
  case class Array(items: Seq[Json]) extends Json {
    def foo = items.size
    def bar = foo
  }
}

Seems a lot cleaner to me.

dwijnand · February 10, 2020, 2:37pm

Clean/pollution is in the eye of the beholder, it seems.

eyalroth · February 10, 2020, 2:44pm

Perhaps, but then is it worth spending time on a new syntax-sugar feature that looks pretty much like before and does not reduce boilerplate?

kai · February 10, 2020, 2:44pm

But I think it must be reiterated. An enum construct with severe limitations and a very low ceiling on what it can do:

can’t have subhierarchies
~~can’t declare methods for branches~~ (without a workaround)
~~can’t inherit new traits in branches~~
~~can’t declare implicits for branches~~
all while the types of the branches are reachable through pattern matching (and must be, for GADTs to work), which makes the argument that .apply widens irrelevant: all the above concerns remain very relevant, as a programmer will observe the subtypes of branches daily

Such a construct is just un-Scala! It does not scale with usage and does not help contain complexity; it gives up at a certain level of complexity and forces a retreat to a low-level construct. This really goes against the principle of scaling with the codebase, and against how well the other language constructs scale. You could argue that case class is also too limited and doesn’t scale, but I don’t think enums will have the success of case classes: nearly all the sealed hierarchies in my libraries are multi-level, so it’s not worth using a different syntax for the minority of them that are simple.

EDIT: LPTK’s post clarifies some of the capabilities of enums, but still, needing workarounds for new features before they’re even out is too much. Telling newcomers “just make a private trait if you want to add methods to an enum branch”, and having to remember that yourself, is hardly practical.
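To make the GADT point concrete, a minimal sketch assuming a Scala 3 compiler (the `Term` enum and `evalTerm` are illustrative). Matching on a case such as `IntLit` is what lets the compiler refine the type parameter `A` to `Int`, so the case types are visible to users of the enum even though `apply` widens.

```scala
enum Term[A] {
  case IntLit(i: Int)                  extends Term[Int]
  case BoolLit(b: Boolean)             extends Term[Boolean]
  case Add(l: Term[Int], r: Term[Int]) extends Term[Int]
  case If[T](cond: Term[Boolean], thenp: Term[T], elsep: Term[T]) extends Term[T]
}

// Each branch refines A: e.g. in the IntLit branch, A is known to be Int
def evalTerm[A](t: Term[A]): A = t match {
  case Term.IntLit(i)     => i
  case Term.BoolLit(b)    => b
  case Term.Add(l, r)     => evalTerm(l) + evalTerm(r)
  case Term.If(c, th, el) => if (evalTerm(c)) evalTerm(th) else evalTerm(el)
}
```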

kai · February 10, 2020, 2:53pm

Nice workaround, but it’s a workaround for a feature that isn’t even released yet! I’d rather have a new release of the language make the ‘book of hacks & workarounds’ thinner and make widely used capabilities available in a straightforward manner, not add even more weirdness to the language.

LPTK · February 10, 2020, 5:51pm

Wait until he has at least 500 lines of methods in each case. Then we’ll see if he still thinks it looks a lot cleaner.

eyalroth · February 10, 2020, 6:06pm

Separation of concerns - Wikipedia

bmeesters · February 10, 2020, 7:09pm

For me personally the cleanest way to deal with this is pattern matching and not the typical inheritance polymorphism:

enum Json {
  case Bool(value: Boolean)
  case Array(items: Seq[Json])

  // one method for the whole enum, dispatching on the case
  def foo: Int = this match {
    case Bool(value)  => if (value) 1 else 0
    case Array(items) => items.size
  }
}

I find this quite pleasant, and it saves quite a few keystrokes compared to the status quo. IIRC this is also the reason methods were dropped from the cases: pattern matching is quite natural in combination with ADTs. Whether this is really easier to read remains subjective, of course (IMO it is).

Most importantly (for me), constructors will return the type of the ADT and not the specialized branch, which prevents quirks and also saves the boilerplate of smart constructors:

// instead of having type Some[Int]
def some[A](value: A): Option[A] = Some(value)

morgen-peschke · February 10, 2020, 7:22pm

I’ve used this a couple times in my “try out Dotty” project, and it works really, really well for the simple case of an enumerable set of values.

For creating an ADT or GADT, my guess is this will quickly become a “cute trick” and be relegated to obscurity, similar to how you can define a class such that every instance is an extractor, but almost nobody actually does (Regex notwithstanding, as most people seem to treat that as compiler magic, even though it’s not).
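For readers unfamiliar with the trick mentioned here: any object with an `unapply` method can be used as an extractor in patterns, so a class that defines `unapply` makes every one of its instances an extractor, in the same spirit as `scala.util.matching.Regex` instances. A minimal sketch (the `PrefixExtractor` class and `Http` value are hypothetical):

```scala
// Every instance of this class is an extractor, because the class defines `unapply`
final class PrefixExtractor(prefix: String) {
  def unapply(s: String): Option[String] =
    if (s.startsWith(prefix)) Some(s.drop(prefix.length)) else None
}

// A stable, capitalized val, so it can be used in pattern position
val Http = new PrefixExtractor("http://")

def stripScheme(url: String): String = url match {
  case Http(rest) => rest
  case other      => other
}
```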

odersky · February 10, 2020, 7:59pm

That’s interesting, since so far everybody else I talked to is strongly in favor of dropping this feature! So it would be good to see arguments why you think it’s important to have.

morgen-peschke · February 10, 2020, 8:29pm

AFAIK the canonical (or at least what seems to be the most common) example is what’s inferred if you do something like this:

(_: List[Foo]).foldLeft(Foo.Empty)(_ combine _)

Most times you’d want this to return Foo, but the compiler complains that combine returns a Foo and it expects a Foo.Empty.type.

Granted, if you ever actually need something to return Foo.Empty.type, then getting back down there from Foo is a royal pain, and upcasting from Foo.Empty.type is pretty easy (even if it does require some annoying boilerplate).

I’m in favor of returning the more precise type, as it’s easier to work around when the default is wrong, but I can certainly empathize with the annoyance of having to manually create a smart constructor. A better solution for me would be to return the precise type and synthesize smart constructors and an .upcast method to make it easier to get to the type the compiler needs.
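A sketch of the inference problem and two common workarounds, using a hypothetical `Foo` ADT written as a plain sealed hierarchy with a `case object` (not an enum). The `upcast` method is a hand-written version of the helper suggested above, not something the compiler synthesizes.

```scala
sealed trait Foo {
  def combine(that: Foo): Foo
  // Hand-written widening helper, along the lines of the suggested `.upcast`
  def upcast: Foo = this
}

object Foo {
  case object Empty extends Foo {
    def combine(that: Foo): Foo = that
  }
  final case class Count(n: Int) extends Foo {
    def combine(that: Foo): Foo = that match {
      case Empty    => this
      case Count(m) => Count(n + m)
    }
  }
}

// In Scala 2, `xs.foldLeft(Foo.Empty)(_ combine _)` infers B = Foo.Empty.type
// and is rejected, because `combine` returns a Foo.
def totalViaUpcast(xs: List[Foo]): Foo =
  xs.foldLeft(Foo.Empty.upcast)(_ combine _)

// Alternative: state the accumulator type explicitly
def totalExplicit(xs: List[Foo]): Foo =
  xs.foldLeft[Foo](Foo.Empty)(_ combine _)
```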

odersky · February 10, 2020, 8:34pm

That particular type inference problem might be solvable: when faced with the problem of instantiating a type variable X with constraint C <: X, where C is a case of enum E, we could in some situations instantiate X to E instead of to C. Similar automatic widenings happen for singleton types and union types already.

LPTK · February 10, 2020, 9:37pm

The problem at hand was to add methods to individual cases, not methods on the whole enum. This cannot be done with pattern matching. I also added a whole-enum method foo in my example just to illustrate that you can also do it.

Ichoran · February 11, 2020, 4:59am

It comes up all over the place with type inference! I much prefer the constructors to not reveal the branch of the ADT, if we’re going to have separate syntax for enums.

For instance var a = Some(x); while (p) { a = Option(foo) } doesn’t work in Scala 2.

So I’m in @bmeesters’s camp on this one: constructors should return the type of the ADT!

(Except that if the branches have their own types, you do need a way to get a value of that type. If you have both an autogenerated apply method and a constructor, the constructor can be exact and the apply can return the ADT, for instance.)
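Scala 3 enums as eventually released work roughly this way: applying a case constructor yields the enum type unless a more specific type is expected, while `new` still yields the exact case type. A minimal sketch with a hypothetical `Maybe` enum, mirroring the `var` example above (`latest` is illustrative only):

```scala
enum Maybe[+A] {
  case Present(value: A)
  case Absent
}

// Mirrors `var a = Some(x); while (p) { a = Option(foo) }`: start from x,
// let each update replace the value, and let a negative update clear it
def latest(x: Int, updates: List[Int]): Maybe[Int] = {
  var a = Maybe.Present(x) // inferred as Maybe[Int], not Maybe.Present[Int]
  for (u <- updates) a = if (u >= 0) Maybe.Present(u) else Maybe.Absent
  a
}

// `new` bypasses the widening and keeps the precise case type, so `.value` is accessible
val exact: Maybe.Present[Int] = new Maybe.Present(42)
```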