Proposal to add top-level definitions (and replace package objects)

Wouldn’t top-level statements run when the object representing the top-level defs gets imported, like how object initialization works at the moment?

Define “when imported”. It’s actually only when one of the val, var or defs (not other stuff) of the top-level of that file is accessed (not imported). That’s extremely surprising.
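To make the distinction concrete, here is a minimal sketch that uses an ordinary object as a stand-in for whatever wrapper the compiler would generate per file. The names `AccessLog`, `TopLevelDefs` and `ImportVsAccess` are made up for illustration; the actual encoding in the proposal may differ.

```scala
import scala.collection.mutable.ArrayBuffer

// Records when initialization happens, so the effect can be observed.
object AccessLog { val events = ArrayBuffer.empty[String] }

// Hypothetical stand-in for the object wrapping one file's top-level definitions.
object TopLevelDefs {
  AccessLog.events += "initialized" // runs as part of the object's initializer
  val answer: Int = 42
  def double(n: Int): Int = n * 2
}

object ImportVsAccess {
  import TopLevelDefs._ // importing alone does not initialize TopLevelDefs

  // Returns the log contents before and after the first member access.
  def run(): (List[String], List[String]) = {
    val before = AccessLog.events.toList
    double(21) // first access to a member: this is what triggers initialization
    val after = AccessLog.events.toList
    (before, after)
  }
}
```

On its first call, `ImportVsAccess.run()` should return an empty list followed by a list containing only `"initialized"`: the import is inert, and the initializer runs on the first access to any member, not necessarily the one with the side effect.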

I believe @lihaoyi’s proposal based on wrapping *.sc is technically sound.

That said, I also think it’s heading in the wrong direction as far as language design goes. Yes, the syntax of top-level statements is accepted by different tools, but each such tool gives different semantics to them. Sometimes they inject special imports; sometimes they run stuff in a different way (e.g., worksheets associate results to individual statements; sbt builds interpret top-level terms as expressions and use the result of each such expression; Ammonite gives entirely custom semantics to special kinds of imports; etc.)

That would also give different top-level grammar goals based on an external factor, i.e., the file extension. ECMAScript went that route with Scripts and Modules, and the ecosystem still doesn’t know how to deal with that (see Node.js’ proposal to support ES modules, for example). I don’t think this is the way to go.


To relieve the existing tools from the non-standard syntax aspect, we could allow top-level statements in the syntactic grammar (perhaps going as far as typing them) but then reject them in a later phase of the standard compiler. Tools that want to do some magic with top-level statements can then hijack them after the regular parser (and typechecker), rather than each doing their own stuff.

But baking a main method concept with top-level statements in the standard compiler is not going to end any better, I believe. Many existing tools using top-level statements wouldn’t even be happy with that standard treatment.

I think this is a reasonable thing that everyone would agree on for now, in the context of top-level definitions: we preserve the current property of *.scala files that all top level declarations only take effect when referenced, and there are no file-level side effects that can kick in at unpredictable times.

Simplifying main methods, or converging on a *.sc syntax, is an orthogonal issue and can be discussed separately without getting in the way of top-level declarations.

IMHO, allowing println("Hi Mum") in a .scala file at top-level is a horror show waiting to happen. Now, you raise the issue of the semantics of the scripty scala form. There seem to be two issues here that I think are orthogonal:

  • default environment
  • evaluation semantics

The default environment issue I think can be addressed by either having specific extensions (e.g. .sbt, .am) or bundling them up into a standardised environment import e.g. import ammonite.environment, much like the language feature flags are. The mechanic through which this works is plumbing, and largely boring (e.g. it could be importing a scala.language.ScriptEnvironment instance with a well-named macro that injects the environment).

As for the evaluation semantics, there appear to be two of them. The first one runs the statements just as you would in main. The other collects a dictionary (or bag) of values associated with names, and then makes this available to some down-stream process. In the case of interactive worksheets, this dictionary is used to decorate the IDE with the evaluated values. In ammonite or the scala repl, it gives you interactive values to play with as you continue to type. In SBT, this dictionary becomes the parameterised build commands/environment. But fundamentally it’s the same deal - you evaluate each statement, and record a memoisation of the result against any identifier, minting a new one if the statement is anonymous. The main semantics then reduces to the special case where you decline to do anything with that dictionary, and run it purely for the side-effects.
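A rough sketch of that "dictionary" model, just to pin down what I mean. `Stmt` and `ScriptEval` are hypothetical names, and a statement is modelled as an optional identifier plus a thunk; none of this is how worksheets, Ammonite or sbt are actually implemented.

```scala
import scala.collection.mutable

// Hypothetical model: a statement is an optional name plus a thunk that evaluates it.
final case class Stmt(name: Option[String], eval: () => Any)

object ScriptEval {
  // Evaluates statements in order and records each result under its identifier,
  // minting res0, res1, ... for anonymous statements (as the REPL does).
  def run(stmts: Seq[Stmt]): mutable.LinkedHashMap[String, Any] = {
    val results = mutable.LinkedHashMap.empty[String, Any]
    var fresh = 0
    for (s <- stmts) {
      val key = s.name.getOrElse {
        val minted = s"res$fresh"
        fresh += 1
        minted
      }
      results(key) = s.eval()
    }
    results
  }
}
```

A worksheet or REPL would consume the returned map; the main-style semantics is the case where the caller throws the map away and keeps only the side effects. The sketch does not guard against a user-chosen name colliding with a minted one.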

I wrote up a separate post:

Perhaps we can assume that we will prohibit top-level side-effecting statements/vals/vars in this thread on top-level definitions, and move discussion on the main-method-entrypoint stuff to that post.

For me, val sideEffect = println("Hi Mum") is the same. To allow one, but not the other is very weird, IMO. So I think top level should either be only lazy val and def or we should remove any other restraints and allow statements.

I don’t think they’re the same. For a top-level statement

println("Hi Mum")

the only reasonable naive expectation is that it is executed “when the program starts”, which is unrealistic to implement.

For

val sideEffect = println("Hi Mum")

it is easier to convince oneself that the side effects will only be executed once sideEffect or one of its siblings is first accessed, the same way the constructor of an object is only executed once that object is accessed for the first time.

I share this concern. I think top-level val definitions might introduce too much confusion.

Are top-level definitions allowed in the “empty package”?

val sideEffect = println("Hi Mum")

I’d need convincing that this was sane. I’ve had experience in the past with Java libraries that load in side-effecting values (e.g. from files) into static variables, and it results in incredibly brittle behaviour as it’s unclear what programs will or won’t trigger a resource to be loaded, and therefore which may or may not result in the side effect raising exceptions during class loading. I don’t feel that restricting top-level vals to being lazy is that onerous, and it does force the person writing the top level value to pause and think if they are doing something sane.
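Here is a small sketch of the difference I have in mind, with an ordinary object standing in for the top-level scope. `LoadLog`, `Resources` and `load` are made-up names, and `load` only pretends to read a resource.

```scala
import scala.collection.mutable.ArrayBuffer

// Records which resources were loaded, and in what order.
object LoadLog { val events = ArrayBuffer.empty[String] }

object Resources {
  // Hypothetical loader: stands in for reading a file or similar side effect.
  private def load(name: String): String = {
    LoadLog.events += s"load $name"
    s"<$name>"
  }

  val eagerConfig: String = load("eager")    // runs when Resources is first touched
  lazy val lazyConfig: String = load("lazy") // runs only when lazyConfig itself is read
  def version: String = "1.0"                // unrelated member
}
```

Calling `Resources.version` is enough to trigger the eager load, even though the caller never asked for any configuration. The lazy one is loaded only when `Resources.lazyConfig` is first read, and only once.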

What we have then:

  • vars are bad design and we should avoid them anyway. DON’T ALLOW
  • vals have a problem with side effects, and it is hard to say when they are initialized. DON’T ALLOW
  • lazy val are safe OK
  • def are safe OK

Do we bother about top-level val and var only because we want to drop package objects with this proposal? Is there any other reason not to allow only lazy vals and defs at the top level?

But who are the siblings? They used to be all vals, vars and defs in the same namespace. In this proposal suddenly the semantics change depending on which file contains which definitions. The only way to understand why side effects happen when they do is by knowing how they are compiled into an object per file. In other words, top-level eager side effects leak an implementation detail.
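A sketch of the leak, assuming the one-object-per-file encoding. `FileA` and `FileB` are hypothetical stand-ins for the objects synthesised for two files, a.scala and b.scala, in the same package; the real synthetic names would differ.

```scala
import scala.collection.mutable.ArrayBuffer

object InitLog { val events = ArrayBuffer.empty[String] }

// Stand-in for the object synthesised for a hypothetical a.scala
object FileA {
  val registered: Boolean = { InitLog.events += "a.scala side effect"; true }
  def helperA: Int = 1
}

// Stand-in for the object synthesised for a hypothetical b.scala, same package
object FileB {
  def helperB: Int = 2
}
```

To the user, `helperA` and `helperB` look like members of the same package, yet only `helperA` triggers the side effect. Moving `helperB` into a.scala would silently change when the effect runs.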

There’s a proposal for new implicit resolution rules that offers better ways to prioritize than by location of the original definition: #6071 (comment).

Making initialization order more unintuitive than it is now will only make things worse. Right now Scala has inherited the class initialization order from Java (hey, unexpected nulls in vals), but it has also added an unintuitive initialization order for nested objects. An inner object can be initialized without initializing its outer object, like here:

object Outer {
  println("outer")
  
  object Inner {
    println("Inner")
    
    val x = 5
  }
}

println(Outer.Inner.x)

Above code prints only this:

Inner
5

Adding extra rules for the initialization order of top-level definitions will only make things more confusing as a whole, especially when hunting for bugs (and beginners introduce a lot of bugs, partly because they write low-quality code).

Will top-level implicit objects be allowed under this proposal? It would be nice if we could use a trait’s or class’s companion object as the implicit evidence, rather than just as a container for the implicit evidence object.

Yes. You can put implicit/implied declarations at the top level. It works great for things that add syntax to types defined elsewhere.
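As an illustration of "adding syntax to a type defined elsewhere", here is a sketch that compiles today. `Meters`, `MetersSyntax` and `MetersOps` are hypothetical names; the wrapper object is what current Scala requires, and under the proposal its contents could presumably sit directly at the top level of a file.

```scala
// A type defined elsewhere (hypothetical library type).
final case class Meters(value: Double)

// Today the extension has to live inside some object or package object.
object MetersSyntax {
  implicit class MetersOps(private val self: Meters) {
    // Adds a `+` operator to Meters without touching its definition.
    def +(other: Meters): Meters = Meters(self.value + other.value)
  }
}
```

With `import MetersSyntax._` in scope, `Meters(1.0) + Meters(2.5)` evaluates to `Meters(3.5)`.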

Or using the underscore for a name, thus removing the risk of conflicts and the need for the package object syntax. The underlying name might be synthesised by the compiler to reduce conflict risks.

package myLib

object _ {
  ...
}

This would reflect the fact that you’re somehow doing import myLib._

Still, I’m bugged by the rules about how the enclosing file should be named.

One more meaning of _. Please don’t.

I think that export (see export dotty docs) covers my proposal quite well, so I’m dropping it.
