Proposal to add top-level definitions (and replace package objects)

I agree that if we are going to allow strict top-level definitions, like var and regular val with arbitrary initializers, then we may as well allow arbitrary code at the top level.

If allowing arbitrary top-level code to run on program startup is not part of this proposal, then you would have to require top-level vals to be lazy val, and disallow top-level var.

Allowing arbitrary statements as top-level definitions and supporting top-level source files as programs looks very attractive.

I believe there’s no big issue with allowing statements as top-level definitions. The problem of having to explain when side effects happen (i.e. when someone references a definition in the same file) is already present for side-effecting value definitions.

We’d need one more tweak. A top-level object implicitly defined by src.scala is named src$package. But we surely want to run it using scala src, not scala src$package. This could be achieved by tweaking the scala runner script.
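
For concreteness, a rough sketch of that wrapping (the exact encoding would be up to the compiler):

// src.scala, as written by the user:
def greet(name: String): String = s"Hello, $name"
val answer: Int = 42

// Roughly the implicitly generated wrapper object:
object `src$package` {
  def greet(name: String): String = s"Hello, $name"
  val answer: Int = 42
}

The runner tweak would then map the argument src back to the mangled name src$package when looking up the entry point.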

2 Likes

I believe allowing top-level statements would set the wrong expectation: that those statements are executed at the beginning of the program, or somehow “automatically”. But that won’t be the case; they will only be executed once we touch a val, var or def defined in that file. That would be very hard to explain, and even then the normal expectations would not be met (or, put otherwise: “how do we teach this?”).

For the side effects in the rhs of val and var definitions, I am not so worried. It’s relatively easy to convince people that those will only be executed when the definition or one of its siblings is accessed. Also, I think the problem of naive expectations doesn’t arise as easily with those, because we don’t usually put side effects in the rhs of non-local vals and vars.

1 Like

@sjrd you raise a good point. Unlike Python (or Ammonite), which triggers top-level code any time the module is imported, Scala would only trigger it when a top-level val/var/def is referenced, but not when a top-level class/object/type is referenced. That is surprising.
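
To make the surprise concrete, consider a hypothetical utils.scala under the proposed scheme (file and names made up):

// utils.scala, with top-level statements allowed:
println("initializing!")          // top-level statement
val magic = 42                    // top-level val
def helper(x: Int): Int = x + 1   // top-level def
class Widget                      // top-level class

// In another file:
new Widget()                      // would NOT print "initializing!"
helper(magic)                     // first access to a top-level val/def WOULD print it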

Presumably this surprisingness is already present in package objects, but those are uncommon and used much less than we expect top-level definitions to be.

There is also the question of how, given that we want to use this top-level code as program entrypoints, the various Scala runners should specify which top-level code to run. These top-level code blocks basically become main methods, and will need to be specifiable in scala, SBT, Mill, and so on.

Perhaps we could consider a slightly more limited scope:

  • Top-level statements can only be used in *.sc files; these are picked up by the Scala compiler similarly to *.scala files

  • *.sc files automatically generate a Java-compatible main method, with the name of the class being the name of the file, e.g. Foo.sc generates a class Foo with a main method (perhaps mangled in some way to avoid collisions?); see the sketch after this list

  • We ban top-level var and vals within *.scala files, as @nafg suggested. It’s not the end of the world to label the vals with lazy to get more predictable initialization semantics, and top-level mutable state is rare enough that the boilerplate of stuffing it in an object is no big deal.
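
As a rough sketch of the second bullet (the exact shape of the generated class, and any mangling, being open questions):

// Foo.sc, as written by the user:
println("starting up")
val result = 1 + 2
println(s"result = $result")

// Roughly the generated Java-compatible entrypoint:
object Foo {
  def main(args: Array[String]): Unit = {
    println("starting up")
    val result = 1 + 2
    println(s"result = $result")
  }
}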

This would have the following consequences:

  • Standalone *.sc files become code that people can run via scala (this is already possible), or via alternate runners like amm (to the extent that they are compatible, which they mostly are)

  • *.sc files can also serve as entrypoints to larger applications, with the benefit that the entrypoint of a large codebase can trivially be seen from the filesystem, without needing to dig through individual files to hunt for def main methods (or extends App, …). Essentially, you could start off with a standalone script and, as it grows, seamlessly incorporate it into a multi-file project with a proper build tool by adding *.scala files.

  • *.scala “library” files maintain their current “statelessness”: you cannot accidentally trigger a top-level side effect when dealing with a *.scala file, only by calling its defined functions, instantiating its classes, or referencing its (lazy) objects or lazy vals. This also follows best practice in other languages that allow top-level code, which generally discourage top-level side-effecting code in imported “library” files and reserve top-level code for the application entrypoint

Essentially, we would take the convenient “just run code” part of scripting languages, while enforcing the “avoid top level code in imported library files” best practice that already exists, and avoiding any confusion about exactly when top-level code evaluates when non-entrypoint *.scala files are used.

The “seamlessly go from one-file script to multi-file project with build tool” flow would be a nice experience for people used to Python’s “just import helper code” style of growing out their initial scripts. SBT would already support it (since it allows Scala files in the project root), and Mill and even Ammonite’s script runner could be similarly tweaked to conform to such a "*.sc is entrypoint, *.scala is library" convention, with the limitations described above.

In this world, we wouldn’t consolidate on a single Scala syntax, but at least we could get everyone to converge towards the same two *.sc/*.scala file extensions with their associated semantics.

This is the best I can come up with so far, unless we can find some way of harmonizing the behavior of top-level code in imported files with that of other languages (i.e. it runs the first time anything in the file is used) to avoid the confusion Sébastien brought up.

9 Likes

Yes, I planned to implement exactly this: we can then unify the worksheet mode, the REPL, and top-level definition files. Furthermore, we can add other features that only make sense in scripts to .sc files, like import-from-ivy.
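
For reference, Ammonite’s existing import-from-ivy syntax in .sc files looks roughly like this (library and version chosen only as an example):

import $ivy.`com.lihaoyi::scalatags:0.8.2`
import scalatags.Text.all._

println(div(p("hello")).render)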

5 Likes

I really like where this is heading. My first reaction to having a separate file extension was fear that it might cause confusion among the novice programmers I’m working with, but upon further reflection, I think that it would be less confusing because it provides a clear delineation between the two different types of files that act very differently.

One of the limitations of the current Scala scripting model is that you can’t easily mix scripts with normal Scala code, so a script has to be completely self-contained. This approach would break down that barrier and allow a smoother transition from scripting to writing full applications in Scala.

2 Likes

I’ve long felt that Scala lacks a differentiation between an immutable value and a compile-time constant or literal. So it would be desirable to have top-level constants or literals. It would also be desirable to have top-level literals for compound, deep value types:

lit pi: Double = 3.14159265358979
lit specialPoint: Vec2 = Vec2(2.435, -0.985)

I’m not sure what you’re referring to. If you write final val it is considered a compile time constant, IIUC. But what differentiation are you looking for?

Currently implicit priority can be defined by inheritance (e.g. https://github.com/scala/scala/blob/v2.12.8/src/library/scala/math/Ordering.scala#L145)

How do we include multiple implicit methods with different priorities in a package without the help of inheritance through a package object?
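
For reference, the status-quo pattern being referred to looks roughly like this minimal sketch (names hypothetical):

// A simple typeclass for the example:
trait Show[A] { def show(a: A): String }

// Lower-priority instances live in a parent trait...
trait LowPriorityShow {
  implicit def showAny[A]: Show[A] = (a: A) => a.toString   // generic fallback
}

// ...which the package object extends; implicits defined in the package object
// itself are more specific than inherited ones, so showInt beats showAny for Int.
package object mypkg extends LowPriorityShow {
  implicit val showInt: Show[Int] = (i: Int) => s"Int($i)"
}

Without a package object to extend LowPriorityShow, it is unclear where such a parent trait would be mixed in, which is the question being asked.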

There is a compile-time constant already:

Welcome to Scala 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202-ea).
Type in expressions for evaluation. Or try :help.

scala> final val pi = 3.14159265358979
pi: Double(3.14159265358979) = 3.14159265358979

Note that the type of pi is the literal type Double(3.14159265358979), rather than plain Double.

So “final val” is a literal for JVM value types? Presumably including Int, Boolean and String? Do you need to add the “final” keyword to the val for final classes and singleton objects?

final val r: Int = scala.util.Random.nextInt() // Presumably this is not a literal.

final val s: Int = 2 + 2 // Is this a literal?

You have to drop the type annotation : Int for it to be inferred as a literal type.
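
For instance, in a 2.12 REPL (illustrative transcript, in the same style as above):

scala> final val s = 2 + 2
s: Int(4) = 4

scala> final val t: Int = 2 + 2
t: Int = 4

The constant-folded rhs 2 + 2 keeps the literal type Int(4) only when the : Int annotation is omitted.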

So I think that’s a wonderful idea. I would love to have .scala files represent the library and .sc files represent the executables. This would give a really clean segue from mashing code together to extracting reusable components, and it would, I think, be fairly easy to teach to students and non-expert professional users, e.g. data scientists and bioinformaticians. We don’t need to be beholden to Java. There’s no god-given requirement for us to be wedded to static main methods.

Is this sort of thing – wrapping a file based on its extension – a job for the compiler? Maybe it’s a job for the build tool and/or a compiler plugin. After all, at a high level this is analogous to Twirl – if a file has a certain extension, it’s equivalent to a Scala source file under a certain transformation.

Wouldn’t top-level statements run when the object representing the top-level defs gets imported, like how object initialization works at the moment?

Define “when imported”. It’s actually only when one of the vals, vars or defs (not other stuff) at the top level of that file is accessed (not imported). That’s extremely surprising.

I believe @lihaoyi’s proposal based on wrapping *.sc is technically sound.

That said, I also think it’s heading in the wrong direction as far as language design goes. Yes, the syntax of top-level statements is accepted by different tools, but each such tool gives different semantics to them. Sometimes they inject special imports; sometimes they run stuff in a different way (e.g., worksheets associate results to individual statements; sbt builds interpret top-level terms as expressions and use the result of each such expression; Ammonite gives entirely custom semantics to special kinds of imports; etc.)

That would also give different top-level grammar goals based on an external factor, i.e., the file extension. ECMAScript went that route with Scripts and Modules, and the ecosystem still doesn’t know how to deal with that (see Node.js’ proposal to support ES modules, for example). I don’t think this is the way to go.


To relieve the existing tools from the non-standard syntax aspect, we could allow top-level statements in the syntactic grammar (perhaps going as far as typing them), but then reject them in a later phase of the standard compiler. Tools that want to do some magic with top-level statements can then hijack them after the regular parser (and typechecker), rather than each doing their own stuff.

But baking a main method concept with top-level statements in the standard compiler is not going to end any better, I believe. Many existing tools using top-level statements wouldn’t even be happy with that standard treatment.

2 Likes

I think this is a reasonable thing that everyone would agree on for now, in the context of top-level definitions: we preserve the current property of *.scala files that all top level declarations only take effect when referenced, and there are no file-level side effects that can kick in at unpredictable times.

Simplifying main methods, or converging on a *.sc syntax, is an orthogonal issue and can be discussed separately without getting in the way of top-level declarations.

2 Likes

IMHO, allowing println("Hi Mum") in a .scala file at top-level is a horror show waiting to happen. Now, you raise the issue of the semantics of the scripty scala form. There seem to be two issues here that I think are orthogonal:

  • default environment
  • evaluation semantics

The default environment issue I think can be addressed either by having specific extensions (e.g. .sbt, .am) or by bundling them up into a standardised environment import, e.g. import ammonite.environment, much like the language feature flags are. The mechanic through which this works is plumbing, and largely boring (e.g. it could be importing a scala.language.ScriptEnvironment instance with a well-named macro that injects the environment).

As for the evaluation semantics, there appear to be two of them. The first one runs the statements just as you would in main. The other collects a dictionary (or bag) of values associated with names, and then makes this available to some downstream process. In the case of interactive worksheets, this dictionary is used to decorate the IDE with the evaluated values. In Ammonite or the Scala REPL, it gives you interactive values to play with as you continue to type. In SBT, this dictionary becomes the parameterised build commands/environment. But fundamentally it’s the same deal: you evaluate each statement, and record a memoisation of the result against any identifier, minting a new one if the statement is anonymous. The main semantics then reduces to the special case where you decline to do anything with that dictionary, and run it purely for the side effects.
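
A toy model of that second semantics (all names hypothetical): evaluate each statement in order, memoise named results, mint REPL-style identifiers for anonymous ones, and let the main semantics simply discard the dictionary.

// A statement is either named ("val x = rhs") or anonymous ("rhs").
final case class Binding(name: String, value: Any)

def evaluate(statements: List[(Option[String], () => Any)]): List[Binding] = {
  var fresh = -1
  statements.map { case (name, rhs) =>
    val id = name.getOrElse { fresh += 1; s"res$fresh" } // mint res0, res1, ...
    Binding(id, rhs())                                   // memoise the result
  }
}

// Worksheets decorate the IDE with the dictionary, the REPL exposes it, SBT
// reads build settings from it; "main" semantics just drops it on the floor.
def runAsMain(statements: List[(Option[String], () => Any)]): Unit = {
  evaluate(statements); ()
}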