Proposal to add top-level definitions (and replace package objects)

lihaoyi · March 25, 2019, 11:07am

@sjrd you raise a good point. Unlike Python (or Ammonite), which runs top-level code as soon as the module is imported, Scala would only run it when a top-level val/var/def is referenced, and not when a top-level class/object/type is referenced. That is surprising.

Presumably the same surprise already exists with package objects, but those are uncommon and used much less than we expect top-level definitions to be.
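To make the surprise concrete, here is a minimal sketch using today's package objects as a stand-in for the proposed top-level definitions. The names `demo`, `InitLog`, `Helper`, `answer` and `Main` are made up for illustration, and the comparison assumes top-level vals/defs would be initialized the same way package object members are.

```scala
package demo {
  import scala.collection.mutable.ListBuffer

  // Hypothetical helper: records side effects so their timing can be observed.
  object InitLog {
    val events: ListBuffer[String] = ListBuffer.empty
  }

  // An ordinary class in the package; it is not a member of the package object.
  class Helper {
    def greet: String = "hello"
  }
}

package object demo {
  // Side effect in the package object body: runs when the package object
  // itself is first initialized, not when the package is first used.
  demo.InitLog.events += "package object initialized"

  // The explicit type annotation keeps this from being treated as an
  // inlinable constant, so reading it really goes through the package object.
  val answer: Int = 42
}

object Main {
  def main(args: Array[String]): Unit = {
    val helper = new demo.Helper
    println(helper.greet)
    // Using a class from the package has not touched the package object yet.
    println(s"after using Helper: ${demo.InitLog.events.toList}")

    // Referencing a val that lives in the package object initializes it.
    println(s"answer = ${demo.answer}")
    println(s"after using answer: ${demo.InitLog.events.toList}")
  }
}
```

Running `Main` should show an empty log after `Helper` is used, and the side effect appearing only once `answer` is read. A Python user would probably expect it to fire as soon as anything from `demo` is touched.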

There is also the question of, given we want to use this top-level code as program entrypoints, how do we change the various scala runners to specify which top level code to run? These top-level code blocks basically become main methods, and will need to be specifiable in scala, SBT, Mill, and so on.

Perhaps we could consider a slightly more limited scope:

- Top-level statements can only be used in *.sc files; these are picked up by the Scala compiler just like *.scala files.
- *.sc files automatically generate a Java-compatible main method, with the class named after the file, e.g. Foo.sc generates a class Foo with a main method (perhaps mangled in some way to avoid collisions?).
- We ban top-level vars and vals within *.scala files, as @nafg suggested. It’s not the end of the world to label the vals as lazy to get more predictable initialization semantics, and top-level mutable state is rare enough that the boilerplate of stuffing it in an object is no big deal.
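As a rough sketch of how these three rules might look in practice (this assumes a compiler that accepts top-level definitions as proposed; `greeting`, `Stats` and the hand-written `Foo` object are hypothetical, and the real encoding of a *.sc file could well differ):

```scala
// Library side: what would live in a *.scala file.
// A strict top-level `val` or `var` here would be rejected under the proposal.

// Allowed: a lazy val, evaluated the first time it is read.
lazy val greeting: String = {
  Stats.evaluations += 1 // observable side effect, runs at most once
  "hello"
}

// Mutable state is kept inside an object rather than in a top-level var.
object Stats {
  var evaluations: Int = 0
}

// Entrypoint side: a hand-written stand-in for what Foo.sc might generate.
// Foo.sc itself would contain only the two statements in the body of main.
object Foo {
  def main(args: Array[String]): Unit = {
    println(s"$greeting, world")
    println(s"$greeting again; evaluated ${Stats.evaluations} time(s)")
  }
}
```

The point of the sketch is that nothing in the library part runs until `Foo.main` actually reads `greeting`, so the only place side effects start from is the entrypoint.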

This would have the following consequences:

- Standalone *.sc files become code that people can run via scala (this is already possible), or via alternate runners like amm (to the extent that they are compatible, which they mostly are).
- *.sc files can also serve as entrypoints to larger applications, with the benefit that the entrypoint of a large codebase can trivially be seen from the filesystem without needing to dig through individual files to hunt for def main methods (or extends App, …). Essentially, you could start off with a standalone script and, as it grows, seamlessly incorporate it into a multi-file project with a proper build tool by adding *.scala files.
- *.scala “library” files maintain their current “statelessness”: you cannot accidentally trigger a top-level side effect just by using a *.scala file; side effects only happen when you call its functions, instantiate its classes, or reference its (lazy) objects or lazy vals. This also follows the best practice in other languages that allow top-level code, which generally discourage top-level side-effecting code in imported “library” files and only use top-level code in the application entrypoint.

Essentially, we would take the convenient “just run code” part of scripting languages, while enforcing the “avoid top level code in imported library files” best practice that already exists, and avoiding any confusion about exactly when top-level code evaluates when non-entrypoint *.scala files are used.

The “seamlessly go from one-file script to multi-file project with build tool” path would be a nice experience for people used to Python’s “just import helper code” style of growing out their initial scripts. SBT would already support it (since it allows Scala files in the project root), and Mill and even Ammonite’s script runner could be similarly tweaked to conform to such a "*.sc is entrypoint, *.scala is library" convention with the limitations described above.

In this world, we wouldn’t consolidate to a single Scala syntax, but at least we can get everyone to converge towards the same two *.sc/*.scala file extensions with their associated semantics.

This is the best I can come up with so far, unless we can find some way of harmonizing the behavior of top-level code in imported files with that of other languages (i.e. it runs the first time anything at all in the file is used) to avoid the confusion Sebastien brought up.