Finalising Enumerations for Scala 3

bishabosha · August 21, 2020, 9:48am

(The Scala Org aims to release Scala 3 by the end of fall 2020. We are about 15 employees (some of whom work part-time), spread across 4 organisations (+ active community members), focusing on finalising 52 essential projects in 6 months. As of today, project leads will publish the road-maps under the category “Scala 3 release projects” to share with you what is to be expected and hopefully get your advice & contributions as well. All the projects’ road-maps come after extensive feedback gathering, rounds of discussion, and involvement of major stakeholders. We now need the community to help push this effort over the line. Your collaboration is highly appreciated, thank you in advance!)

The Scala Center team is working on improvements to the enumerations feature before the release of Scala 3, with these main goals in mind:

To improve compatibility of enumerations with the Java serialisation framework, and with Java’s enum APIs when an enumeration extends java.lang.Enum.
To provide consistent APIs even as enum definitions grow beyond simple enumerations.
To improve intuition behind the desugaring of enum definitions.

Milestone 1: Improve support for Java APIs

As it stands in the 0.26.0-RC1 release, if a singleton enum value does not extend java.lang.Enum then it will not remain singleton if it is deserialized using the Java serialization framework. This can be remedied by implementing readResolve to replace the serialized object by one of the known constants.
The java.lang.Enum API depends heavily on runtime checking of the superclasses of enum values, so it is unsafe for a class hierarchy to extend java.lang.Enum more than two levels deep. When not using migration mode in the compiler, we will only allow java.lang.Enum to be extended by traits or by the enum syntax, as is done in Java.
Enumerations extending java.lang.Enum will be restricted to being defined in a static object, because the Java Language Specification forbids enums that are inner classes.
The values array method on the companion object of an enumeration will be sorted by ordinal.
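To make the readResolve remedy mentioned above concrete, here is a minimal hand-written sketch. It is not the code the compiler generates: Direction, byLabel and roundTrip are hypothetical names used only for illustration, and the class stands in for a singleton enum value.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an enum with two singleton values.
final class Direction private (val label: String) extends Serializable:
  // Java serialization calls this after reading the object; returning the
  // known constant replaces the freshly deserialized copy.
  private def readResolve(): AnyRef = Direction.byLabel(label)
  override def toString: String = label

object Direction:
  val North: Direction = new Direction("North")
  val South: Direction = new Direction("South")

  def byLabel(label: String): Direction = label match
    case "North" => North
    case "South" => South
    case other   => throw new NoSuchElementException(s"no Direction named $other")

// Serializes a value to a byte array and reads it back.
def roundTrip(value: AnyRef): AnyRef =
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(value)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  try in.readObject() finally in.close()
```

With readResolve in place, roundTrip(Direction.North) should be the very same instance as Direction.North (comparable with eq) rather than a second copy.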

Milestone 2: custom toString for enum

As it stands in the 0.26.0-RC1 release, a singleton enum value always overrides toString to be equivalent to the identifier of the enum case. This prevents customisation of toString by the user. We will provide a mechanism for the user to override toString and preserve the accuracy of APIs such as valueOf which looks up values by the string of their identifier.
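As a rough illustration of the intended outcome, assuming the behaviour of released Scala 3 compilers (Level is a made-up enum), a user-defined toString and name-based lookup can coexist:

```scala
enum Level:
  case Low, High
  // User-defined rendering; it replaces the default, which is the case name.
  override def toString: String = s"Level#$ordinal"

// valueOf still looks values up by the identifier of the case,
// not by the customised toString.
val shown: String = Level.High.toString
val found: Level = Level.valueOf("High")
```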

Milestone 3: Simplification of APIs

We will remove scala.runtime.EnumValues in favour of implementing caching and lookup logic directly in the companion object of an enum.
We will likely remove the scala.Enum trait while still providing dynamic lookup of the name and ordinal of each enum case. This will make it simpler to extend java.lang.Enum which guarantees your enumeration has only singleton cases.
We will standardise the representation of Java style enums in the TASTy format so that it is independent of the compiler.

Milestone 4: Improve Intuition for Enum Desugaring

The companion of an enumeration will define public lookup methods (such as values or valueOf) exactly when the enumeration only defines singleton values. It does not matter if the enumeration carries type parameters. If an enum case with a constructor is added, these lookup methods will no longer be generated.
Enum cases that do not explicitly provide type parameters will copy the variances of type parameters of the enum that defines them. If an invariant type parameter is required by the constructor, the user will be warned to explicitly provide type parameters for the enum case.
We will experiment with improving type inference for enumerations, allowing the apply method of enum case companions to give the most precise type.
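The first two points can be sketched as follows. Maybe and Axis are hypothetical enums, and the comments describe the behaviour I would expect from a released Scala 3 compiler rather than anything guaranteed by this post:

```scala
// A covariant enum with a class case: no `values` or `valueOf` is generated,
// because Just has a constructor.
enum Maybe[+A]:
  case Just(value: A) // implicitly gets its own covariant type parameter
  case Empty          // no type arguments given, so it extends Maybe[Nothing]

val noInt: Maybe[Int] = Maybe.Empty
val oneInt: Maybe[Int] = Maybe.Just(1)

// A parameterised enum with only singleton cases: lookup methods are generated.
// T is invariant, so each case states its type argument explicitly.
enum Axis[T]:
  case X extends Axis[Int]
  case Y extends Axis[String]

val axisCount: Int = Axis.values.length
val axisOrdinals: List[Int] = Axis.values.map(_.ordinal).toList
```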

Acknowledgements

Thank you to Guillaume Raffin for contributing the checks that enforce enum syntax for enumerations extending java.lang.Enum.

Major Changes to the Original SIP

A new change was prompted by the need for an appropriate API to retrieve the identifier of an enum value as a String, where the implementation is restricted to the compiler. In general, it is not a good choice to add a name method, such as in java.lang.Enum, to all enums because that is likely to collide with a name field of a class enum case. To avoid that likely collision, the method name enumLabel was chosen as an instance method for an enum class. As a public API, enumLabel may be adding extra burden when we already have the similar method productPrefix. enumLabel was thought to be more friendly to beginners as a teaching tool, however the need to avoid collisions generates a similarly cryptic name. An alternative is to generate a public method on the companion such as def labelOf(value: E): String where E is an enum type.

As with any of our announcements, we invite your feedback and discussion, as these changes will have to be reviewed by the SIP committee.

kai · August 27, 2020, 11:50am

Any plans for improving nested / hierarchical enumerations? AFAIK currently enums cannot be nested, so for enum hierarchies one must fall back to sealed traits. This has been discussed on this forum in the enum proposal topic, and was ostensibly punted to be solved somehow in Scala 3.1; however, the issue does not figure in the roadmap above.

sjrd · August 27, 2020, 11:52am

The roadmap above only shows what is planned by the release of 3.0. So yes, anything planned for 3.1 would be absent from it.

bishabosha · September 7, 2020, 12:33pm

Hello, I would like to update this forum post to state that in the release of Dotty 0.27.0-RC1, we released all of Milestone 2 and all but one point of Milestone 1, and the nightly 0.28.0-bin-20200902-95a6b44-NIGHTLY completes Milestone 1 by restricting enums extending java.lang.Enum to static scopes. We are currently working on Milestones 3 and 4.

bishabosha · November 4, 2020, 2:29pm

Hello, with the milestone Scala version 3.0.0-M1 now on Maven Central, we would like to say that enumerations are now in their final release state for Scala 3.0.0, and that Milestones 1, 2, 3, and 4 have all been completed.

Compared to the original post, the following feature changes have been made:

Drop the proposed def enumLabel: String method: It was decided the use case it provided was not necessary, and users can instead access the declared label of an enum case with productPrefix, which can still be overridden by the user.
Instead of removing scala.Enum as the common base trait of all enum classes, it has been relocated to scala.reflect.Enum, hinting that it may be used to reflect on the ordinal of a generic enum class. This common super trait will not be inferred in a least upper bound of two unrelated enums. Additionally, when migrating from Scala 2, compilation units that relied on the default import of java.lang.Enum will no longer clash.
We provide a new method on the companion object of an enum class E - def fromOrdinal(ordinal: Int): E. This method is always generated and is intended for deserialisation. It takes any Int; if it matches the ordinal of a singleton enum value, the method returns that value, otherwise it throws an exception.
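A small sketch of how these pieces fit together, using a made-up Suit enum. The exact exception type thrown by fromOrdinal is not stated above, so the example only checks that the call fails:

```scala
import scala.util.Try

enum Suit:
  case Clubs, Diamonds, Hearts, Spades

// Every singleton value should round-trip through its ordinal.
val roundTripsOk: Boolean =
  Suit.values.forall(s => Suit.fromOrdinal(s.ordinal) == s)

// An Int that is not the ordinal of any case makes fromOrdinal throw.
val outOfRange: Try[Suit] = Try(Suit.fromOrdinal(42))

// The declared label of a case is available through productPrefix.
val label: String = Suit.Hearts.productPrefix
```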

Other updates relating to enums:

Enums are now supported in Scala.js
The representation of classes extending java.lang.Enum in TASTy will be frozen at a later point.
Missing member errors for values and valueOf will explain to the user that their enum declares non-singleton values.

sideeffffect · November 6, 2020, 11:30pm

May I ask why fromOrdinal throws instead of returning an Option or Either?
If throwing is explicitly desired, there could be fromOrdinalOrThrow(ordinal: Int): E, I guess.
I’m asking, because I feel that the default should work without throwing exceptions.

som-snytt · November 7, 2020, 2:01am

This method is always generated and is intended for deserialisation.

If you start with an enum, round-tripping through its ordinal works trivially and efficiently, except in the face of severe breakage.

I heard similar explanations for the API of old-style Enumeration.

Coding a bounds check is not hard, so simplifying the API is also worthwhile.

morgen-peschke · November 7, 2020, 8:44am

That seems somewhat optimistic

Would implementing the base fromOrdinal in terms of a union type be sufficiently performant to allow using it as a base for derived versions?

Then it would be possible to provide alternate behaviors without duplicating the core logic, like a version which throws for performance critical bits of code (fromOrdinalOrThrow?), and a version which returns an Option for everywhere else (fromOrdinalOpt?).

odersky · November 7, 2020, 10:43am

I am firmly in the camp that fromOrdinal should throw. Let me explain why:

Consider what fromOrdinal should be from a modelling perspective: It is the inverse of ordinal, which means it’s a function from an interval to the enum values themselves. So the correct modelling is that fromOrdinal should require that its argument is in that interval. That requirement is enforced by a run-time check.

Runtime checks are less good than static types, but still perfectly acceptable in my book. The alternative would be to make fromOrdinal a total function, having it return an Option type. But that’s much worse from a modelling perspective. We lose the inverse property. We make fromOrdinal take a weaker type than it should and have it return a weaker result. And by returning an option we just pass the problem on to the callers. What should they do if the result is None? If that happens, the argument was an illegal value that should have been caught before passing it to fromOrdinal. But now we have to cure it by massaging the result. Probably at this point most applications would throw an exception anyway.

There could still be an argument for making fromOrdinal a total function over Ints. Basically, it would be a “hedge your bets” play. Some applications might want to deal with None values, and those that do not can easily force the option with a get, or in some other way. So, what is lost in returning the “safer” type? Well, first, there’s a runtime penalty that can be significant. Second, you make application code flabby with all these tests that they have to perform on the return values. I believe the second counter argument is the stronger one.

If an application follows a different philosophy and wants an Option they can always catch the exception and convert it to Option. From a runtime perspective this is advantageous if the exception is indeed thrown only rarely. Which I would say is typically the case here.

So, I would return an Option only if it is comparatively hard to test whether the argument to fromOrdinal is legal. But that’s not the case. You need a single integer comparison to ensure the property.
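For readers who want to see what such a caller-side check might look like, here is one possible sketch. Weekday and fromOrdinalChecked are hypothetical names, and the lower-bound test is only there because the input is an arbitrary Int:

```scala
enum Weekday:
  case Mon, Tue, Wed, Thu, Fri

// Validate the argument first, then call the throwing fromOrdinal
// knowing that it will succeed.
def fromOrdinalChecked(n: Int): Option[Weekday] =
  if n >= 0 && n < Weekday.values.length then Some(Weekday.fromOrdinal(n))
  else None
```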

Generally, I have seen several code bases that became much harder to work with by over-use of Option. It’s not a silver bullet and it often does more harm than it helps.

To summarize:

The most important part is always the domain modelling and the algebraic laws it uncovers.
We should employ the type system where possible to verify properties, but runtime checks are a reasonable fallback where using types is impractical.
We should be careful not to compromise on model strength in order to get to static typing.

odersky · November 7, 2020, 10:54am

To add to my previous argument: It’s interesting to follow the development of type systems that can check requirements such as the one for fromOrdinal and thereby keep the right model. E.g. certain forms of dependent types such as refinement types. That’s probably the future, but it will take time to unfold. Until that time, I would not compromise on modeling strength to get to static typing.

sideeffffect · November 7, 2020, 3:56pm

Fair enough, I see there are proponents of a throwing fromOrdinal, and they have good reasons for that.

Would you find it acceptable to also have something like fromOrdinalOption, which would return an Option or Either (we would have to elaborate on the name if returning Either)?
That would appease the crowd that prefers (at least in some situations, if not always) to have an Option/Either instead of a thrown Exception, without compromising on the domain modelling with fromOrdinal that you talk about.

odersky · November 7, 2020, 6:01pm

No. This is compiler-generated code. We want to generate the absolute minimum. Also it’s already easy to get an Option if one is desired:

Try(fromOrdinal(x)).toOption

If that’s still too annoying, one could also define an optionally wrapper.

optionally(fromOrdinal(x))

looks just as nice as fromOrdinalOption(x).
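The optionally wrapper is not a standard library method; one way it might be defined is sketched below. The Coin enum is made up for the example, and catching only non-fatal exceptions is my own assumption about what such a helper should do:

```scala
import scala.util.control.NonFatal

enum Coin:
  case Heads, Tails

// Hypothetical helper: evaluates `body` and turns a non-fatal exception into None.
def optionally[A](body: => A): Option[A] =
  try Some(body)
  catch { case NonFatal(_) => None }

val hit: Option[Coin] = optionally(Coin.fromOrdinal(1))
val miss: Option[Coin] = optionally(Coin.fromOrdinal(7))
```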

morgen-peschke · November 7, 2020, 8:09pm

It’s not so much that Try(fromOrdinal(x)).toOption is annoying to write, it’s that if you’re in a context where there’s a good chance that you’re going to take the sad path frequently, you end up having to basically re-implement the core logic because throwing and catching exceptions is expensive - depending on the benchmark, up to 50% slower than using Either, which should be comparable to Option in this case.

String#toInt is a good example of why this is an unpleasant API to work with, and is why String#toIntOption was added.

rcano · November 7, 2020, 8:44pm

Enumeratum is the de-facto standard enum implementation in Scala. Look at what people use most often, fromOrdinal or the version that returns an Option. I think every other argument pales in comparison.

odersky · November 7, 2020, 9:07pm

String#toInt is a good example of why this is an unpleasant API to work with, and is why String#toIntOption was added.

But the optional case is there because it is very hard to test for legal inputs of .toInt. So it makes sense according to my classification. The case of fromOrdinal is exactly the opposite: it’s very simple and cheap to test for legal arguments, so that’s what you should do.

Enumeratum is the de-facto standard enum implementation in Scala. Look at what people use most often, fromOrdinal or the version that returns an Option. I think every other argument pales in comparison.

That kind of illustrates my point that Option is over-used, and we should all be more discerning about when it makes sense. I stand by my argument that it’s the wrong thing to do here.

nafg · November 8, 2020, 1:06am

It’s true that ordinal is MyEnum => Int and fromOrdinal is Int => MyEnum, but they don’t form an isomorphism, they form a prism. Every MyEnum has an Int but not every Int has a MyEnum. So it’s true the arrows go in opposite directions but they aren’t mirror images of each other.

One of the main selling points of Scala is that runtime errors are rare. I don’t see a good reason to weaken that point.

If you want to think of it as a strict interval type then why not do that statically? Scala 3’s type system is powerful enough. Make it statically require a parameter in range. This could be done with a type member, a union of singleton types, an opaque type, or otherwise.
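As a rough sketch of the opaque type variant (Shade and ShadeOrdinal are hypothetical, and nothing like this is generated by the compiler), the range check moves to the single place where an arbitrary Int enters:

```scala
enum Shade:
  case Light, Medium, Dark

object ShadeOrdinal:
  // Outside this object, T is not known to be an Int.
  opaque type T = Int

  // Total: every Shade has a valid ordinal.
  def of(s: Shade): T = s.ordinal

  // The only way in from an arbitrary Int; the range check happens once, here.
  def parse(n: Int): Option[T] =
    if n >= 0 && n < Shade.values.length then Some(n) else None

  extension (o: T)
    // A T is in range by construction, so this call is not expected to throw.
    def toShade: Shade = Shade.fromOrdinal(o)
    def value: Int = o
```

With this shape, ShadeOrdinal.of(s).toShade needs no Option, while an untrusted Int has to go through parse first.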

Enforcement as a runtime check is not relevant here. It’s not that much better than returning null, which will also result in a runtime exception. It’s true that null is worse because the error will occur elsewhere, but making the runtime error local is only slightly better. Scala programmers like to know at compile time that everything lines up correctly.

The inverse property is a fiction. You can’t safely go back and forth between MyEnum-land and Int-land.

It is true that given a specific result of ordinal it’s reasonable to expect to go back to MyEnum without extra ceremony. But Int is not a safe transport mechanism for that. If you want to have that operation you need a narrower type.

The same thing they would do if they had to write the boilerplate of checking beforehand with an if/else, if the input was invalid.

Either way you’re passing the problem to the caller. The only question is whether that fact is documented by the types and enforced by them.

Define “should.” Personally if I have a value that may or may not be legal for getting a result, I want to see if it’s legal for getting a result in one shot. I mean how is this different than every single other Option-returning function? None always means the input had no result.

That’s very far from certain. Besides, going from an Option to an exception is much easier than going from a possible exception to an Option, and much faster. You can do just .get, or to customize the exception, .getOrElse { throw ... }. The other way around you have to import Try, and worry about the performance impact of stack trace collection.

I have no idea what flabby means, but you’re just trading a test on the input value for a test on the return value, except that one is easy to forget and one is safe and robust.

What comparison is that? n >= 0 && n < MyEnum.values.length? Should we also go back to for(int i = 0; i < thing.length; i++)?

If that’s what I have to do I’d never use fromOrdinal. I’d do MyEnum.values.find(_.ordinal == n).

I don’t know what “model strength” means, but I think pretending you can go from Int => MyEnum is covering up not uncovering the “algebraic laws.” The fact is (ordinal, fromOrdinal) form a Prism, not an Iso.

odersky · November 8, 2020, 9:50am

No, ordinal is MyEnum => [0..N) where N is the number of cases. I thought that was clear from my post. That’s the fundamental misunderstanding from which all other differences in opinion follow.

nafg · November 8, 2020, 4:10pm

The bottom line is that the static types are not capturing the runtime types enough to prevent a forgetful programmer from making a mistake. I don’t like APIs that require me to read and remember their individual idiosyncrasies in order to use them correctly. I like APIs that are so safe my code practically writes itself, which is the case for most popular libraries designed for Scala.

morgen-peschke · November 8, 2020, 6:50pm

Interestingly, it looks like ordinal cannot be overridden, and attempts to do so don’t produce warnings. So there isn’t a way to safely remove deprecated values, as it would shift the ordinal values of the remaining entries.

This means that ordinal/fromOrdinal is unsuitable for serialization to persistent storage (e.g. a database), so this discussion is probably moot.

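import scala.util.Try

// The constructor parameter named `ordinal` does not override the
// compiler-generated `ordinal`; the generated one still counts cases from 0.
enum Foo(ordinal: Int) {
  case One extends Foo(1)
  // case Two extends Foo(2)
  case Three extends Foo(3)
  case Four extends Foo(4)
}

@main
def test(): Unit = {
  Foo.values.foreach { f =>
    println(s"$f -> ${f.ordinal} --> ${Try(Foo.fromOrdinal(f.ordinal)).toOption}")
  }
}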

Prints:

One -> 0 --> Some(One)
Three -> 1 --> Some(Three)
Four -> 2 --> Some(Four)


som-snytt · November 8, 2020, 7:05pm

IIRC java.lang.Enum uses name for serialization, for this reason. (Not that I ever do that.) The Javadoc for ordinal offers further discipline.
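For persistent storage, a name-based encoding along these lines is one option. Status, encode and decode are hypothetical names, and the sketch assumes the enum does not override productPrefix:

```scala
import scala.util.Try

enum Status:
  case Active, Suspended, Closed

// Store the declared label of the case rather than its position.
def encode(s: Status): String = s.productPrefix

// valueOf throws for an unknown label, so wrap it for stored data that
// may refer to a case that has since been removed.
def decode(label: String): Option[Status] =
  Try(Status.valueOf(label)).toOption
```

Removing or reordering cases then changes the ordinals but not the stored labels of the cases that remain.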

If you try to override val ordinal, you’ll see (misleading or confusing) errors. There is an old ticket requesting better support when a class parameter shadows a superclass member. There is a lint for mutable vars in that position.