Pre-SIP: Disallow restricted `$` compiler identifiers

nicolasstucki · February 16, 2024, 10:28am

Summary

The $ character is reserved for compiler-synthesized identifiers. User programs should not define identifiers which contain $ characters.

Unfortunately, the use of such identifiers in user code can lead to compiler crashes, runtime crashes, strange compilation errors, or at worst an unexpected change in behavior.

Only a handful of users know about this restriction. Many libraries have used it in the past without knowing the dangers it poses.

This Pre-SIP proposes a definitive solution to this problem.

Motivation

Ensure that users do not unknowingly define identifiers that have undefined behavior.
Remove ad hoc (incomplete) patches from the compiler.

Proposed solution

Disallow user-defined identifiers containing $ and give an escape hatch for the few cases where they are needed.

High-level overview

Given that libraries have accidentally used identifiers with $ we need to have a migration path. We should first emit a migration warning when such an identifier is defined. Then in some following version we can disallow them by emitting an error.

At the same time we need to introduce a way to disable these warnings/errors. As there are a few cases where we can define them soundly (e.g. in the Scala 2 standard library), we should make this a -Y compiler flag (-YallowRestrictedCompilerIdentifiers).

Specification

When parsing the name of a definition, the parser should check the identifiers for $ and emit the warning/error unless -YallowRestrictedCompilerIdentifiers is enabled.
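The check described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual dotty parser code: `Diagnostic`, `checkDefinitionName`, and the `errorPhase` parameter are names assumed here, with `allowRestricted` standing in for -YallowRestrictedCompilerIdentifiers and `errorPhase` for the later version in which the warning becomes an error.

```scala
// Hypothetical sketch of the proposed check; not the real compiler implementation.
enum Diagnostic:
  case Warning(msg: String)
  case Error(msg: String)

/** Checks the name of a definition for `$`.
  *
  * @param allowRestricted models -YallowRestrictedCompilerIdentifiers being enabled
  * @param errorPhase      models the later version where the warning becomes an error
  */
def checkDefinitionName(
    name: String,
    allowRestricted: Boolean,
    errorPhase: Boolean
): Option[Diagnostic] =
  if allowRestricted || !name.contains('$') then None
  else
    // `$$` inside an s-interpolated string produces a literal `$`.
    val msg =
      s"identifier `$name` contains `$$`, which is reserved for compiler-synthesized identifiers"
    Some(if errorPhase then Diagnostic.Error(msg) else Diagnostic.Warning(msg))
```

Under these assumptions, `checkDefinitionName("a$b", allowRestricted = false, errorPhase = false)` yields a `Warning`, while the same call with `allowRestricted = true` yields `None`.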

Compatibility

This will only affect source compatibility for projects that misused identifiers with $. It will not affect binary compatibility.

Alternatives

In dotty/issues/#18234/comment it was proposed to add this as a linting option (also see dotty/pull/#18563). This would provide the necessary warning, but it does not address the source of the problem. Users who don’t know about this restriction will also not know about the linting option, so it will only help experienced users who already know about the problem.

Related work

Issue describing this problem: Enforce spec on identifiers containing `$`s · Issue #18234 · lampepfl/dotty · GitHub

A small and incomplete list of issues that this has caused. It is quite difficult to search for these issues on GitHub or Google due to the $ character.

odersky · February 16, 2024, 10:36am

The Scala rules about $ are exactly the same as the Java rules. And as far as I know, it’s not a problem in Java. So why should we be more paranoid than the Java developers?

sjrd · February 16, 2024, 11:02am

Two reasons IMO:

Java developers don’t use symbols in their identifiers as a rule; in Scala we do this with other symbols, and $ does not seem fundamentally different than +
We have a lot more sources of compiler-generated $s in Scala than in Java; in Java there are basically only inner classes; in Scala we have symbols, module classes, anonymous identifiers (such as params), perhaps others.
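Two of these sources can be observed from ordinary code. The following is a small illustrative sketch (the `Registry` object and the value names are made up for the example): symbolic names are encoded with `$`-prefixed words, and the class backing an `object` gets a trailing `$`.

```scala
import scala.reflect.NameTransformer

// A top-level object is compiled to a JVM "module class" whose name ends in `$`.
object Registry

// Symbolic operators are encoded with `$`-prefixed names on the JVM.
val encodedPlus: String = NameTransformer.encode("+")  // "$plus"
val encodedCons: String = NameTransformer.encode("::") // "$colon$colon"

// The runtime class name of the object carries the trailing `$`.
val moduleClassName: String = Registry.getClass.getName
```

A user-defined `class Registry$` or a method named `$plus` would therefore share a name with something the compiler generates.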

soronpo · February 16, 2024, 11:08am

I’m OK restricting $ from identifiers, as long as we can still apply them with @targetName.

bishabosha · February 16, 2024, 11:09am

Is Spark one of those acceptable escape hatches?

e.g.

$"columnName"
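For context, an interpolator like this works because a method literally named `$` is made available on `StringContext`. Below is a minimal self-contained sketch in Scala 3 syntax; `ColumnName` is a stand-in type defined here for illustration, not Spark's actual class, and Spark's own definition may differ in detail.

```scala
// Stand-in for a column type; hypothetical, for illustration only.
final case class ColumnName(name: String)

// A user-defined method named `$` on StringContext, so that $"..." is
// treated as a string interpolator call: StringContext("...").$()
extension (sc: StringContext)
  def $(args: Any*): ColumnName = ColumnName(sc.s(args*))

val col: ColumnName = $"columnName"
```

Under the proposal, the definition of `$` is what would trigger the warning or error; it is the definition site that is checked, not each `$"..."` use.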

devlaam · February 16, 2024, 12:14pm

Without being hindered by any knowledge of the compiler internals, would it not be possible to replace the compiler's use of $ with some other symbol way up in the Unicode table, or a non-printable one like 0x11, or even a much less used one like 0xB6?

nicolasstucki · February 16, 2024, 12:34pm

They would have to use the escape hatch. We will continue to make a best effort to keep this working properly. As far as I know, the compiler does not generate methods named with a single $; therefore, we should not have any conflicts now. We would need to keep ensuring that the compiler does not add such a definition.

It is a bit more problematic with class $, class $$, and object $ where we have ambiguities when classloading.

odersky · February 16, 2024, 1:09pm

I am in principle against driving the behavior of what the compiler accepts by a -Y flag. We are just now leading parallel discussions about whether we can get rid of those flags.

I maintain my belief that this does not need fixing and any fix would likely make things worse. We have a lot of more urgent things to do than fiddle with this.

soronpo · February 16, 2024, 2:02pm

Maybe the best approach would be to generate a warning (which can be suppressed with -Wconf).
That would do the job of notifying the user that this is a bad thing to do, and would enable the user to acknowledge it and move on.

nicolasstucki · February 16, 2024, 2:58pm

The escape hatch does not need to be a -Y flag. It could be a normal flag, a language import, or something else such as the use of @targetName.