Questioning the necessity of `:` in object/class/enum/trait declarations

Apologies in advance if I’m missing something obvious and simple that refutes this post, but I’ve been sitting on this for a while and have asked a few other users without getting a good answer.

I teach a beginner computer science course using Scala.

  1. We use scala3 fewer-braces whitespace notation.
  2. Students are instructed to put spaces before type ascriptions, both to enable columnar vertical alignment across lines and to more closely resemble the type-theory literature covered in the course, e.g. we write `val x : Int` instead of `val x: Int`.
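To illustrate the convention in item 2, here is a small sketch (the names `Aligned`, `n`, `label`, and `ratio` are hypothetical): with a space before each ascription colon, the colons and types of consecutive definitions can be lined up in columns.

```scala
// Hypothetical example of the course convention: a space before every
// type-ascription colon, padded so colons and types line up vertically.
object Aligned:
  val n     : Int    = 3
  val label : String = "apples"
  val ratio : Double = 0.5
```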

Together, these two conventions have revealed a stumbling point in the syntax for beginners: students repeatedly treat scope-opening colons as if they were the same kind of thing as type ascriptions, writing

//       ↓ extra space, incorrect
class Foo :
  val x : Int
//     ↑ extra space, correct

and

//    ↓ extra space, incorrect
xs.map : x =>
  

It’s not simply a muscle-memory issue: students tell me that they don’t understand why they are supposed to put a space before the colon in one scenario but not in the other.

Students also struggle to understand why `:` is not required for `extension`, `match`, `for`, etc. They frequently assume they do need to write `:` in these scenarios:

//                 ↓ incorrect
extension (x : Int):
  def ...
//     ↓ incorrect
x match:
  case ...
//            ↓ incorrect
for x <- xs do:
  ...
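For contrast, here is a minimal sketch of how these three constructs are written in Scala 3 today (the names `CorrectForms`, `double`, `isEven`, `describe`, and `total` are hypothetical): `extension`, `match`, and `for ... do` each open an indented body with no colon at all.

```scala
// Hypothetical example: none of these three markers takes a trailing colon.
object CorrectForms:
  // A collective extension: several defs indented under `extension (...)`.
  extension (x : Int)
    def double : Int = x * 2
    def isEven : Boolean = x % 2 == 0

  // `match` followed directly by indented `case` clauses.
  def describe(x : Int) : String =
    x match
      case 0 => "zero"
      case n if n.isEven => "even"
      case _ => "odd"

  // `for ... do` followed directly by an indented body.
  def total(xs : List[Int]) : Int =
    var sum = 0
    for x <- xs do
      sum += x.double
    sum
```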

The fact that `:` is overloaded with multiple, dissimilar meanings makes the language harder to learn and makes it feel less coherent in design. (Had Scala aimed to be a whitespace-sensitive language from the very beginning, I imagine `:` would not be used in all of these ways; it feels very retroactively duct-taped on.)

I understand that there is probably no fixing the use of `:` in lambdas at this point, but this has gotten me thinking: what purpose does `:` even serve in class/object/enum/trait declarations? The compiler already requires a newline and indentation afterwards; it’s not as if users can write single-line declarations like `class Foo: def f = 0`, which would actually make `:` a useful delimiter. The token seems to be entirely redundant, except, I suppose, in the case of self-types? Are there any other cases?
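For reference, this is roughly what the self-type case looks like with today's colon syntax: the self-type clause is simply the first line of the indented body. It's a sketch with hypothetical names (`SelfTypes`, `Logger`, `Service`, `Impl`), and it doesn't by itself show that dropping the colon would be ambiguous here.

```scala
// Hypothetical example of a self-type inside a colon-opened template body.
object SelfTypes:
  trait Logger:
    def log(msg : String) : String = s"[log] $msg"

  trait Service:
    self : Logger =>   // self-type clause: first line after the colon
    def run() : String = log("running")

  // Mixing in Logger satisfies Service's self-type.
  object Impl extends Service, Logger
```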

I would like to propose making `:` optional in these declarations, with newline+indent as a sufficient delimiter, enabling the following:

trait Foo
  val x : Int
trait Bar extends Foo
  val x = 0
class Baz(
  val x : Int
)
  def f = 0
enum Color
  case Red, Green, Blue

It is, in fact, what most of my students already write most of the time, until I annoyingly remind them that they forgot the colon. That is evidence that newline+indent alone is already enough for learners (and I personally find that the colon contributes only ugly visual noise).
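For comparison, here is a sketch of the same declarations as they have to be written today, with the colons (wrapped in a hypothetical `TodaySyntax` object so the names don't clash with anything else):

```scala
// The proposal's declarations in currently legal Scala 3 syntax.
object TodaySyntax:
  trait Foo:
    val x : Int
  trait Bar extends Foo:
    val x = 0
  class Baz(
    val x : Int
  ):
    def f = 0
  enum Color:
    case Red, Green, Blue
```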

I don’t believe this change would break pre-existing codebases, as `:` would only become optional, not illegal. This is very analogous to how, before 3.8, a newline was required after the lambda arrow when passing a lambda to a higher-order function with `:`; now it is optional and single-line lambdas are allowed.

But I may be missing something. If anyone can think of cases where this change would be problematic for the compiler or keeping pre-existing code bases legal, please comment below.


(I would also like to pre-empt one objection: even if one dislikes the Scala 3 whitespace notation and wishes we would avoid all of this with braces, we should agree that, since we have already decided to seriously support it, we should at least make the experience feel more coherent for those who want to use it.)


That was one of the earlier syntax iterations and the : was added in for visual clarity.


Despite not really agreeing with the reasoning for the change, I really do like this suggestion. Having to add a colon when editing a construct with no definitions into one with some is an annoyance, one that reminds me of having to add opening and closing braces in a pre-fewer-braces world. This would also square with the multi-definition `extension` shortcut, which needs no colon even though at first glance it looks like a scope with an identity.

I don’t know whether this would make some constructs ambiguous.


Ah, that’s interesting (and a bit frustrating) to find out… If one thinks that `:` adds superior visual clarity in these declarations (I don’t), presumably one should think the same about scenarios like `match` and `extension`.

Perhaps this concern turned out to be a bit overblown in practice, and it is something we could revisit now? Especially since we now know that the colon actually creates problems in practice for learners of the language.


This would be an incompatible change. This…

object Foo
  val bar = 42

…is legal code today and declares bar in the same scope as Foo, whereas this…

object Foo:
  val bar = 42

…declares bar inside of the Foo object.

I’m nevertheless in favour of this change, but we would first have to make the first example illegal for a few releases before we can assign a new meaning to it. I think that’s a good thing, there’s really no excuse for writing it that way.


It would. Not that they’re necessarily ideal practice anyway, but

class C(i: Int)
  (j: Int)
    - 3 |> log

If we have pipe |> defined at least on integers, and we have a j in scope that can at least be coerced into an Int, and we have a log function that can take an integer, then there are two valid ways to parse this:

// Takes 1 arg, logs j-3 when instantiated
class C(i: Int):
  (j: Int)
    - 3 |> log

// Takes 2 args, logs -3 when instantiated
class C(i: Int)
  (j: Int):
    -3 |> log

This is admittedly a bit of a corner case, but at the very least it illustrates that if you have multiple parameter lists and wrap them onto different lines, the parser needs to keep parsing, not just go “Oh, newline, let’s pretend this is a colon!” So it’s a complicated parse.
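To make the ambiguity concrete, here is a sketch in which both readings compile today, assuming hypothetical definitions of `|>`, `log`, `logged`, and `j` (none of them come from a library). The two classes differ only in where the colon sits. The distinction rests on Scala 3's leading-infix-operator rule: `- 3` (operator, then a space) at the start of a line continues the previous expression, while `-3` is a negative literal.

```scala
import scala.collection.mutable.ListBuffer

object TwoReadings:
  // Hypothetical pipe operator on Int, so `a |> f` means `f(a)`.
  extension (n : Int)
    def |>[B](f : Int => B) : B = f(n)

  // Hypothetical logger that records what it was given.
  val logged = ListBuffer.empty[Int]
  def log(n : Int) : Unit = logged += n

  val j = 10 // a `j` in scope for the first reading

  // Reading 1: one parameter list; the body evaluates ((j : Int) - 3) |> log.
  class Takes1(i : Int):
    (j : Int)
      - 3 |> log

  // Reading 2: two parameter lists; the body evaluates (-3) |> log.
  class Takes2(i : Int)
    (j : Int):
      -3 |> log
```

With `j = 10`, constructing `Takes1(0)` should record 7 in `logged`, while `Takes2(0)(0)` should record -3.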

If we want to restore uniformity, I think we instead need to add colons everywhere they are invalid now.

if p(x) then:
  foo(x)
else:
  bar(x)

I don’t think any of those are ambiguous. Colons there should be considered bad practice, I think, but at least they would parse.

(Bad practice because it is not great for the poor humans who have more work to do to figure out whether : means block opening or a type ascription.)


Well, then we also need colons to be accepted here 😉

if p(x):
  then foo(x)
  else bar(x)

This is why I never found the “control structures should take colons” argument particularly convincing. These structures don’t open the scope of something that has an identity, and they have very relaxed syntaxes that accept newlines in multiple positions.

But there are no blocks in your example.

15
  * 5
  + 3

looks, from the indentation, like a block. But it isn’t.
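A minimal sketch of that point (the names `LeadingInfix` and `n` are hypothetical): because each indented line starts with an operator followed by a space, Scala 3 treats it as a leading infix operator that continues the expression, so `n` is simply `15 * 5 + 3`, not a block.

```scala
object LeadingInfix:
  // Looks like an indented block, but `* 5` and `+ 3` are leading infix
  // operators continuing the expression that starts with `15`.
  val n =
    15
      * 5
      + 3
```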

Both of these took me so long to process mentally that I think it would be beneficial for them to become illegal in future releases. I imagine the number of people in the world intentionally writing code like that, or like @mberndt’s example, is very, very small.

For the example of consecutive parameter lists, perhaps we could require that if parameter lists are separated across lines then they must be exactly vertically aligned, e.g.

class Foo(x : Int)
         (j : Int)
    -3 |> log
class Foo
    (x : Int)
    (j : Int)
        -3 |> log
        def fn = ???
class Foo
(x : Int)
(j : Int)
    -3 |> log
    def fn = ???
class Foo (x : Int)(j : Int)
    -3 |> log
    def fn = ???
class Foo
    (x : Int)
    (j : Int)
extends Bar
    -3 |> log
    def fn = ???
class Foo
    (x : Int)
    (j : Int)
    extends Bar
        -3 |> log
        def fn = ???
class Foo
    (x : Int)
    (j : Int) extends Bar
        -3 |> log
        def fn = ???
class Foo(x : Int)
         (j : Int)
extends Bar
    -3 |> log
    def fn = ???
class Foo(x : Int)
         (j : Int) extends Bar
    -3 |> log
    def fn = ???

etc. would be the only legal variations. I’m not sure about allowing all of those variations in where to put `extends`, but of course we would still have `:` as an option for those scenarios. We could keep the general case nice and reserve `:` as a last resort?

The thing is that although it looks weird here, when you have serious, complicated parameter blocks, especially ones that repeat, it’s actually quite natural to have them in that form.

Rather than blessing k out of n possible formattings as “okay, that looks fine”, I’d take this as an indication that the feature is questionable as conceived.

In fact, I would say that this indicates that for code like this, braceless notation is also questionable.

class C(i: Int) {
  (j: Int)
    - 3 |> log
}

class C(i: Int)
  (j: Int) {
    - 3 |> log
}

Pretty clear both ways, compared to the others at least.

But that doesn’t mean we should allow even less clear options, and it also doesn’t mean we should get into the business of deciding which precise formats are permissible and which are not. Having to memorize a bunch of permitted formats is really annoying; people should be able to reason about what is going to work.

For instance, we ought not to make any rules that require aligning whitespace with non-whitespace, because that forces us to take a particular stance on tabs. So

class C(i: Int)
       (j: Int)

is out. (Block indentation is fine, because whatever you use, it all has to be the same.)


To be clear, do you have a stance on this proposal? I know that you’ve said you personally use curly braces for classes, but do you still support us trying to find a way to make : optional for the general case?

I would be fine with the concession of requiring : in the case of multi line parameter lists (just as I could be fine with the concession of requiring {} in such scenarios), if I could omit them in the vast majority of cases.

Could you give an example? I’m failing to think of a reason why someone would want the first parameter list to be vertically out of alignment with all of the following ones, other than perhaps that readability isn’t important enough to them to justify manual indentation that isn’t always a neat 2/4/8 spaces?

When I personally write classes with multiple parameter lists, I do have them on their own lines, but I either do

class C(
    x : Int, y : Int
)(
    z : Int, w : Int
) extends B

or

class C(i: Int)
       (j: Int)

I would never choose to write

class C(i: Int)
  (j: Int)

as that is very jarring visually.

Of course I do generally respect the idea of not enforcing good style and letting users have the liberty to write less readable code (since it is their code, not mine), but given that the requirement of : was apparently motivated primarily as a redundant token to enforce a subjective style judgment about visual clarity, we’re already in that arena…

And something else worth saying that I’ve observed: even after I explain to learners that `:` simply means multiple dissimilar things in the language and that they can use `{}` instead to avoid confusion, they usually immediately recoil and still choose `:` (these are students with no prior experience in C-family languages, where `{}` is common; the students who do come from those languages are, of course, delighted to see that braces are supported).

Apparently, for beginners, the visual taste of whitespace is enough to outweigh the confusion : has in their head over a more coherent but uglier {} notation.

Well, I originally thought it might be a good idea; I’m certainly in favor of regularity, and the sometimes-colon-sometimes-not situation is irregular. Then @aepurniet raised the question of ambiguity. After finding an ambiguity, I now think it probably isn’t a good idea.

Except, as the example shows, this is actually not the case: it’s needed to disambiguate.

Sometimes it doesn’t occur to one that it’s a problem.

import java.awt.image.BufferedImage

class PairedBufferedImage(bufferSize: Int)
  (first: BufferedImage,  initializeFirst: Boolean  = false)
  (second: BufferedImage, initializeSecond: Boolean = false):
    def merge(mergeFn: (Int, Int) => Int): BufferedImage = ???

It’s not the prettiest, but moving bufferSize to its own line, or indenting everything as deep as bufferSize, only makes it uglier.

Anyway, maybe the conflicts are esoteric enough so it could work. I don’t know. I’m uneasy about introducing ambiguity, though, especially since I think the feature is already in danger of making things less clear rather than more.


Well, it could be true both that it’s needed to disambiguate these cases and that disambiguating them was not on anyone’s mind at the time of the decision. But maybe it was; I don’t want to make too many assumptions…

I just mean that if we were trying to prioritize visual clarity and coherence in the language overall, it’s very questionable whether : was the right choice for that in hindsight.

Lean 4 uses `where` as its token:

structure Foo where
  x : Int

I would have been happy if something like this had been chosen over `:`; it would also have complemented the `then` and `do` markers.

Maybe in that case the way the feature should work is that it’s greedy. So

class C(i: Int)
  (j: Int)
  - 3 |> log

is a syntax error, because anything that could possibly be part of the class constructor’s parameter lists (here `(j: Int)`) is included in them, which leaves `- 3 |> log` at the same depth rather than indented further; and

class C(i: Int)
  (j: Int)
    - 3 |> log

means

class C(i: Int)
  (j: Int) {
    - 3 |> log
}

because - can’t be part of a constructor parameter list, so (j: Int) is the end of the class constructor parameters, and thus - 3 ... is the start of the indented block.

So even if people do write it this way, it’s resolved syntactically, not semantically (i.e. whether or not j is in scope is irrelevant).

Oh, well, I was arguing for .. or some other novel token precisely to avoid :, but although the technical arguments for .. or somesuch were stronger, aesthetics won the day (not entirely unreasonably) and : was favored.

`where` would not be my preference, because in Rust it’s used to add constraints to generic type parameters (and in Lean 4, I think that is pretty much what it’s doing too? Adding type constraints?). I could go for a soft keyword like `has` or something similar.


Everyone, I think that by focusing on the “optional colons” proposal, we are missing something more important here. This is clearly an inconsistency, maybe not a big one, but one that somehow got into the syntax. What’s the reason for this? No testing with real people actually coding in Scala? Or has the syntax grown so complex that little details like this get forgotten? Both? I’m worried that, with more new syntax features coming in, this hole in the process may lead to more serious problems in the future.


Is this rule necessary? I’ve worked in a lot of Scala codebases over the past 12 years and I’ve never seen it, so if you could get away with dropping the space here, that would make half the problem go away.
