Proposal for Enumerations in Scala

Hi Scala Community!

This thread is the SIP Committee’s request for comments on a proposal to introduce enumerations in the language: a mechanism to create ADTs, GADTs and Java-style enumerations with the same syntax. You can find all the details here.

Summary

Allow the definition of an enumeration of cases with the following syntax:

enum Color {
  case Red, Green, Blue
}

and a more advanced example for how scala.Option[T] could be encoded:

enum Option[+T] {
  case Some(x: T)
  case None
}

Today, enumerations are usually encoded via one of two mechanisms:

  1. Simply making use of Java enumerations (rarely).
  2. Defining sealed trait and case class hierarchies.

The proposed syntax has the advantage of allowing full compatibility with Java enumerations, as well as allowing a more concise syntax for advanced usages that would normally require an entire hierarchy of types.
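For comparison, here is a minimal sketch of how the two examples above tend to look with the sealed-hierarchy mechanism. The names Opt, Sm and Nn are hypothetical stand-ins chosen to avoid clashing with scala.Option, and the hand-written values list is an assumption about what a typical encoding carries:

```scala
// Sealed-hierarchy encoding of the Color example.
sealed trait Color
object Color {
  case object Red   extends Color
  case object Green extends Color
  case object Blue  extends Color

  // The list of values has to be written and maintained by hand.
  val values: List[Color] = List(Red, Green, Blue)
}

// Sealed-hierarchy encoding of the Option example (hypothetical names).
sealed trait Opt[+T]
object Opt {
  final case class Sm[+T](x: T) extends Opt[T]
  case object Nn extends Opt[Nothing]
}
```

Each case needs its own extends clause, which is the boilerplate the enum syntax is meant to remove.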

Hard Decisions
A lot of the hard implementation decisions are captured in this GitHub issue. The TL;DR is the following:

  • How should toString() behave? (See issue)
  • .apply on the companion object for an enum gives the enum type, not the precise type.
    e.g. you would return an Option[T] not a Some[T].
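The .apply point can be sketched as follows. This assumes the behaviour of released Scala 3, where a case constructor application is widened to the enum type unless a more precise type is expected; at the time of this thread it was still an open decision. Opt, Sm and Nn are hypothetical names used to avoid clashing with scala.Option:

```scala
enum Opt[+T] {
  case Sm(x: T)
  case Nn
}

// Going through the companion yields the enum type, not the case type.
val widened: Opt[Int] = Opt.Sm(1)

// Using `new` keeps the precise case type, so the field `x` is accessible.
val precise: Opt.Sm[Int] = new Opt.Sm(1)
val payload: Int = precise.x
```

With the widened value, reaching x requires a pattern match, since Opt[Int] itself has no such member.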

There are more things listed in the GitHub issue, but those are either implementation cleanups or require altering the specification. These include:

  • The expected order of values returned by .values
  • Disallow extending java.lang.Enum outside of an enum definition.
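As an illustration of the .values question, here is a small sketch assuming the behaviour of released Scala 3, where the singleton cases are returned in declaration order and ordinals follow that order. The outcome was not settled when this post was written:

```scala
enum Color {
  case Red, Green, Blue
}

// Names and ordinals of the cases, in the order .values returns them.
val names: List[String] = Color.values.map(_.toString).toList
val ordinals: List[Int] = Color.values.map(_.ordinal).toList

// Lookup by case name.
val green: Color = Color.valueOf("Green")
```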

Opening this proposal for further discussion, specifically looking for opinions and use cases on these hard decisions.

7 Likes

I didn’t see an issue for the question about the .apply method in the linked PR, is there additional discussion context for this question?

Is there any discussion on why java.lang.Enum isn’t extended by default, but requires opting in?

2 Likes

While I love the idea of making enumerations (aka sum types) less boilerplate-heavy (death to boolean blindness!), I’m concerned that, if I understand correctly, they can only be one level deep, meaning you have to switch back to a manual class hierarchy for anything that nests. That’s especially concerning when maintaining backwards compatibility, as I expect there to be lots of additional desugaring bits to recreate manually (like the scala.runtime.AbstractFunction1 parent in case class companion objects when not defined).

As an example (and a little blast from the past for Josh), sbt’s Reference hierarchy can at best be approximated with:

enum Reference {
  // BuildReference
  case BuildRef(build: URI) // with ResolvedReference
  case ThisBuild

  // ProjectReference
  case ProjectRef(build: URI, project: String) // with ResolvedReference
  case LocalProject(project: String)
  case RootProject(build: URI)
  case LocalRootProject
  case ThisProject
}
5 Likes

I thought I saw a suggestion once to allow something like nested enums. But can’t find it anywhere.

enum Color {
  enum Pretty {
    case Pink, Purple, Periwinkle
  }
  enum Dull {
    case Red, Green, Blue
  }
}

The above seems to compile if you add a dummy case in the top level enum, but the nested enums just disappear.

Something to a similar effect but with some extra boilerplate seems to work.

sealed trait Pretty { self: Color => }
sealed trait Dull { self: Color => }
enum Color {
  case Pink extends Color with Pretty
  case Red extends Color with Dull
}

val c: Color & Pretty = Color.Pink
3 Likes

Thanks, that does seem to work, but it’s unfortunately quite repetitive.

import java.net.URI

trait BuildReference    { self: Reference => }
trait ProjectReference  { self: Reference => }
trait ResolvedReference { self: Reference => }

enum Reference {
  case BuildRef(build: URI) extends Reference with BuildReference with ResolvedReference
  case ThisBuild            extends Reference with BuildReference

  case ProjectRef(build: URI, project: String) extends Reference with ProjectReference with ResolvedReference
  case LocalProject(project: String)           extends Reference with ProjectReference
  case RootProject(build: URI)                 extends Reference with ProjectReference
  case LocalRootProject                        extends Reference with ProjectReference
  case ThisProject                             extends Reference with ProjectReference
}

It would be nice to be able to use nesting to drop some of the repetition.

4 Likes

I suspect that a general solution for multi-level enums could be pretty tricky. Especially if you want to support things like BuildReference with ResolvedReference in the Reference example.

I found them :sweat_smile:

scala> enum Color {
     |   enum Pretty {
     |     case Pink, Purple, Periwinkle
     |   }
     |   enum Dull {
     |     case Red, Green, Blue
     |   }
     |   case Dummy
     | }
// defined class Color

scala> Color.Dummy.Pretty.Pink
val res1: Color.Dummy.Pretty = Pink

Currently nested enums are placed in the enum cases, and no subtype relation exists between outer and inner enum declarations.

3 Likes

What bothers me quite a bit with this proposal is the conflation of two features into one construct:

  1. Enumerated types.
  2. Concise syntax for sealed types hierarchies – for instance, ADTs.

This is not the first case in the language where this happens – see implicits for example – and I believe it’s a source for confusion and an obstruction of designing syntax that is better suited / tailored to the specific needs of each feature.

Enumerated types are constant and unique values that have unique identifiers; sealed types do not exhibit any of these characteristics.

For example, what will these return?

Option.valueOf("Some")
Option.values()

They surely cannot return an instantiated object for Some.

I suspect that the confusion between the two stems from the pattern in which enums are encoded nowadays in Scala 2 (sealed trait + objects). However, it’s possible to encode them differently using opaques (which are in fact possible in Scala 2):

import scala.collection.mutable

object Colors {
  opaque type Color = Int

  private case class Data(name: String, rgb: Int)

  // Registry mapping each ordinal to its name and RGB value.
  private val colorToData = mutable.Map.empty[Color, Data]

  // Registers a new value and returns its ordinal as the opaque Color.
  private def register(ordinal: Int, name: String, rgb: Int): Color = {
    colorToData.put(ordinal, Data(name, rgb))
    ordinal
  }

  object Color {
    def valueOf(name: String): Color = colorToData.find(_._2.name == name).get._1
    def values(): Array[Color] = colorToData.keys.toArray

    val Red = register(0, "Red", 0xFF0000)
    val Green = register(1, "Green", 0x00FF00)
    val Blue = register(2, "Blue", 0x0000FF)
  }

  extension (color: Color) {
    def ordinal: Int = color
    def name: String = colorToData(color).name
    def rgb: Int = colorToData(color).rgb
  }
}

This is not very useful as it requires a lot of boilerplate, but this demonstrates how enums are not about sealed types, but rather about a constant set of identifiable values.

3 Likes

Once again, I have to disagree. An enum with constant and unique values is simply a special case of an ADT. The fact that we think that they’re different things stems from Java.

// here we have only constant values, isomorphic to a Java enum
sealed trait Foo
case object Bar extends Foo
case object Baz extends Foo

// Now we have an ADT
sealed trait Foo
case object Bar extends Foo
case object Baz extends Foo
case class Qux(a: Int) extends Foo

A different axis of conflation comes from the Java enums, which provide

  1. a way to define an enumeration of constant values
  2. tools for reflecting over enumerations of constant values

I think you correctly identified that these reflection tools only make sense for the limited Java kind of enums. So I think it’s a mistake to try to fit them into Scala enums. I would suggest to only emit those methods for the enums which are compatible with Java enums (and remove scala.Enum). Option.valueOf or Option.values don’t make sense.

5 Likes

My definition for enums is not the “Java definition”, but the general definition for them in most other common languages – some preceding Java – and apparently in type theory as well, where they are seen as a special case of tagged unions. See Enumerated type on Wikipedia.

The abstraction where enums are a special case of ADTs is uncommon and unconventional as far as I can tell. The only similarity they share is that the compiler can detect when a pattern match does not cover all of the possible sub-cases.
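The exhaustiveness check mentioned here can be sketched with the proposed syntax. Shape and its cases are hypothetical names introduced only for this illustration:

```scala
enum Shape {
  case Circle(radius: Double)
  case Square(side: Double)
  case Empty
}

// All three cases are handled. Removing any branch makes the compiler
// warn that the match may not be exhaustive.
def area(s: Shape): Double = s match {
  case Shape.Circle(r) => math.Pi * r * r
  case Shape.Square(a) => a * a
  case Shape.Empty     => 0.0
}
```

The same check applies to a sealed trait hierarchy and to an enumeration made only of constant values.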

These methods are among the major characteristics and most useful features of enums; without them they would not be enums.

1 Like

I said Java because that’s kind of like Scala’s older, slower brother. And because Scala wants to be as compatible with Java as possible. But obviously enums are not unique to Java.

That enum is not the correct term for the generalization that Scala tries to implement here might be a very accurate observation though. But I leave that discussion to people who care more about the name than me.

If you scroll down in your Wikipedia page you’ll find a paragraph about some programming languages like SML—which is like Scala’s older brother from another mother (than Java’s)—that basically says the same as me about how enums relate to ADTs.

They are sometimes useful for enums (like Java enum), but not for all ADTs (like Scala enum). That’s why I suggested to only emit them when an enum conforms to a Java enum.

4 Likes

I didn’t see any, but an argument could be made that you can’t add a parameterized case without breaking binary compatibility, even if you never meant to consider your type a finite enumeration.

1 Like

I believe Scala suffers from various such generalizations that produce a relatively nuanced low level syntax that is more complex – as in, requires a higher level of understanding and “brain computation”.

This is a common characteristic of lower-level languages, where the rationale behind this complexity is often better performance, which I believe is not the case here.

You’ll also note that:

  1. These languages are much much less common than the others – Java, Python, C, JavaScript, etc.

  2. These languages do not have enums, but they have a simulation of enums (“can be used to implement an enumerated type”). As I’ve demonstrated earlier, enums can also be simulated with opaques in Scala; that does not mean they are a special case of opaques, but is merely an implementation detail.

I’m hardly a Haskell fanatic but if we’re going to call Haskell a lower-level language, I’m not sure what to say next.

ADTs are a higher level of abstraction than enumerations, not lower. In your simulation of enums with opaques, you built an enum from lower-level tools. You took opaques, case classes, mutable hashmaps and built an enum. If you take an ADT, you have to take something away to build an enum.

And a more modern, booming language like Rust has enums like the ones being proposed in this thread.

2 Likes

I was referring to actual low-level languages like assembly, and was hinting at features other than enums.

They are not a higher level of abstraction, but a more generic one. The spectrum of high-low level in the context of programming languages is about the simplicity of defining business models, which is achieved by creating ideas that are closer to the actual representation of real-world problems and usage patterns, and omitting ideas that are closer to the underlying implementation.

Conflating (conventional) enums with ADTs is a step down on that spectrum, as ADTs are an implementation detail that is irrelevant to one who desires to use enums (again, conventional definition) – a fixed set of tagged values.

This is identical to how defs have their own syntax and semantics despite being implemented with classes and being able to be seen as a special case of vals. Not having defs will produce a more generic language, but at the same time lower level and harder to read and reason about in the context of its usage.

Still unpopular, hence unconventional. I’m not familiar with Rust but according to its documentation on enums, it seems that the concept has a completely different meaning than their conventional definition.

Hello, sorry for stepping into this thread, but @eyalroth, you talk a lot about “conventional” things and rely on definitions of words that I have never heard before, for example your distinction between higher-level and more generic.

I’ve been in software development for 20 years now (not in academic or ...), and clearly our conventions aren’t the same, so be careful not to conflate your personal experience with the general view of the world.

Then, you also conflate popularity of languages with convention of usage, which is a bold step, especially since you also dismissed the evolution of popularity and the evolution of convention as irrelevant.

That being said, that does not mean that your remarks should not be taken into account: they show at least that not everyone shares the same idea of what is obvious, so 1/ the goal of enum should be clearly specified, and 2/ we need to be careful with the “enum” keyword and ask whether it’s the best one for the concept described here.

For 1/, I think the issue description (https://github.com/lampepfl/dotty/issues/1970) does a very good job at stating the goals and limits. On the other hand, the dotty doc (https://dotty.epfl.ch/docs/reference/enums/enums.html) doesn’t mention the GADT model at all (AFAIK), which makes @eyalroth’s interpretation legitimate. So: perhaps the doc should also clearly state the goals of the feature.

For 2/, I can only share my personal view: enum seems to be the correct word, there’s precedent with Rust and others for using that word for concepts other than what Java does, and confusion can be avoided with adequate documentation.

Hope it helps,

7 Likes

Only a limited subset of Scala enums can extend java.lang.Enum. We want the java-compatibility to be an explicit choice, and then you also get errors/warnings enforcing compliance.

TL;DR; Scala Enums can do more than Java Enums, so not every Scala enum can be a Java enum.
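The opt-in described above can be sketched like this, assuming the syntax of released Scala 3. Only singleton cases are used, since not every Scala enum can be a Java enum:

```scala
// Opting in to Java compatibility by extending java.lang.Enum explicitly.
enum Color extends java.lang.Enum[Color] {
  case Red, Green, Blue
}

// Methods inherited from java.lang.Enum are now available.
val comparison: Int = Color.Red.compareTo(Color.Green)
val blueName: String = Color.Blue.name()
val isJavaEnum: Boolean = (Color.Blue: Any).isInstanceOf[java.lang.Enum[?]]
```

Without the extends clause the same enum still compiles, but its values are not java.lang.Enum instances.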

6 Likes

Regarding 2/, I missed some links. You want the discussion of ADTs/GADTs here: https://dotty.epfl.ch/docs/reference/enums/adts.html

I may have been a software developer for far less time than that (~10 years), but I was part of a very big and old organization (IDF) in which this notion of high-low level exists. That is where I got it from, and it might be the source of this concept in a big part of the Israeli tech industry.

Regardless of our individual experiences, there is the definition on Wikipedia – which in general attempts to define concepts in the most conventional way – which states:

In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer.

And later on explains how relative this concept is, and how over time languages that were considered high-level may become low-level (such as C).

If we consider enums, ADTs seem like an implementation detail; hence, a lower level representation of that concept.

What is convention if not an agreement shared by most people, which is what popularity indicates (which group is the biggest)?

I did not dismiss the evolution of popularity in any way nor the evolution of conventions. The fact that Rust may seem to be booming now doesn’t mean it will become popular, and anyone claiming to know that is delusional, as social sciences are quite inaccurate and undeveloped (yes this is my personal opinion, not common sense).

My personal view is that we need different syntax / keywords for each feature. In fact, I don’t see much value in having enum-like syntax for ADTs unless it supports nesting as well (as proposed by @dwijnand), as it seems to me that without it not much boilerplate is removed. I do think that nested enums (proposed by @Jasper-M) could be a neat addition.

That’s good to know. Under what conditions can’t a Scala enum extend java.lang.Enum? It would be good to understand a bit more about the trade-off: what you can no longer do when you extend java.lang.Enum, and what you miss out on if you don’t.

1 Like