Pre-SIP: program Foo = { println("Hello world!") }

sjrd · March 25, 2019, 1:30pm

Hello everyone,

The discussion of the proposal to add top-level definitions revived the question of top-level statements, i.e., how to provide an easy syntax for writing a program.

This is a pre-SIP for a solution that I came up with, with input from @odersky, @AleksanderBG, @nicolasstucki, @densh and @OlivierBlanvillain.

Problem statement

We want an easy and simple way to define a main program, i.e., an entry point for the execution of an application. Without any special support from the language or standard library, what we need to write is an object with a main method:

package foo

object Bar {
  def main(args: Array[String]): Unit = {
    val who = "world"
    println(s"Hello $who!")
  }
}

The above has two main issues:

- It contains several concepts that are irrelevant to a beginner who just wants to print something on the screen, so it is awkward to teach;
- It contains a lot of boilerplate, which annoys even experienced developers who write many such entry points.

Previous approaches

Scala has had two different approaches to solving this problem, both library-based.

`scala.Application`

A developer could write a main object as follows:

package foo

object Bar extends Application {
  val who = "world"
  println(s"Hello $who!")
}

where the trait Application was simply defined as

trait Application {
  def main(args: Array[String]): Unit = ()
}

The main method was mixed into Bar and did nothing. However, calling it would force the constructor of Bar to execute, which would run the program.
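A minimal, runnable sketch of that mechanism, using a hypothetical `LegacyApplication` trait in place of the old `scala.Application` (the names `LegacyApplication` and `Hello` are illustrative): `main` itself does nothing, but the first access to `Hello` runs its constructor, and therefore the whole program, exactly once.

```scala
// Hypothetical stand-in for the old scala.Application trait.
trait LegacyApplication {
  // Does nothing: its only job is to give the object a JVM entry point.
  def main(args: Array[String]): Unit = ()
}

object Hello extends LegacyApplication {
  // The whole "program" is the object's constructor, which runs
  // as part of the object's initialization on first access.
  val who = "world"
  println(s"Hello $who!")
}
```

Calling `Hello.main(Array.empty)` twice prints `Hello world!` only once, because the constructor runs only on the first access to `Hello`.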

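That approach had a very severe issue: the entire application would run within the constructor of the object, i.e., inside a static initializer, and therefore while holding the initialization lock of the class. This causes deadlocks as soon as the program starts other threads that access members of the main object and then waits for them: those threads block on the initialization lock, which is only released once the constructor, and thus the whole program, has finished.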

Another, smaller problem is that the command-line args were never accessible.

`scala.App`

To remedy the above two issues, another trait was introduced. At use site, it looks exactly the same:

package foo

object Bar extends App {
  val who = "world"
  println(s"Hello $who!")
}

However, its implementation is very different:

import scala.collection.mutable.ListBuffer

trait App extends DelayedInit {
  /** The command line arguments passed to the application's `main` method. */
  protected final def args: Array[String] = _args

  private[this] var _args: Array[String] = _
  private[this] val initCode = new ListBuffer[() => Unit]

  // Called by the compiler with each constructor body of a subclass.
  override def delayedInit(body: => Unit): Unit =
    initCode += (() => body)

  final def main(args: Array[String]): Unit = {
    this._args = args
    for (proc <- initCode)
      proc()
  }
}

It relies on the DelayedInit mechanism, which was invented specifically for App, to move the body of the constructor of Bar into the initCode lambdas. The main method can then store the args and run the former constructor bodies.
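The DelayedInit transformation can be imitated by hand to show what it achieved. In this sketch (the names `ManualApp` and `Greeter` are hypothetical, and no compiler support is involved), the object passes its would-be constructor body to `delayedInit` explicitly; the trait only records it as a closure, and `main` stores the arguments before replaying the recorded bodies.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical imitation of App: DelayedInit made the compiler route the
// constructor body into delayedInit automatically; here it is done explicitly.
trait ManualApp {
  private var _args: Array[String] = Array.empty
  protected final def args: Array[String] = _args

  private val initCode = new ListBuffer[() => Unit]

  // Record the body instead of running it during construction.
  protected def delayedInit(body: => Unit): Unit =
    initCode += (() => body)

  final def main(args: Array[String]): Unit = {
    _args = args
    for (proc <- initCode) proc() // run the recorded bodies, now with args set
  }
}

object Greeter extends ManualApp {
  delayedInit {
    // `this.args` is the inherited App-style accessor.
    val who = if (this.args.nonEmpty) this.args(0) else "world"
    println(s"Hello $who!")
  }
}
```

`Greeter.main(Array("Scala"))` prints `Hello Scala!`: the body sees the arguments because it only runs after `main` has stored them.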

However, DelayedInit has several unfixable issues, which led to its deprecation in 2.11.0. Support for App has nevertheless been preserved until a better solution comes along, but so far none has.

Proposed solution

I propose the following solution to the main problem. We introduce a new soft keyword program, which is only a keyword when directly enclosed by a package block (remember that package “statements” open an implicit package block). Its usage looks like:

package foo

program Bar = {
  val who = "world"
  println(s"Hello $who!")
}

Intuitively, program Bar above introduces a main entry point whose fully qualified name is foo.Bar. The right-hand side after the = sign is the definition of the program and is executed as a main method. Once compiled, it can be invoked with

$ scala -cp . foo.Bar

for example.

The right-hand side has its own scope, which is a local scope like that of a method. Within that scope, the identifier args of type Array[String] is visible and refers to the command-line arguments.

Formally, program is defined as a straightforward, purely syntactic desugaring. The general form

program X = <expr>

is rewritten as

object X {
  def main(args: _root_.scala.Array[_root_.scala.String]): _root_.scala.Unit = <expr>
}

This has the following consequences:

- program X introduces a term X
- The identifier args is visible in <expr>
- The identifier main is also visible in <expr> (it refers to the main method itself); this is not, strictly speaking, desirable, but allowing this “leak” greatly simplifies the specification and the implementation
- Local definitions inside <expr>, such as who in the example above, are kept local to the synthesized main method. They are never visible or importable outside of the program declaration.
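Applying the rewrite by hand to a small, hypothetical `Greet` program shows these consequences in ordinary Scala: `args` is simply the parameter of the synthesized `main`, `who` is a local of that method, and nothing else becomes a member of `Greet`.

```scala
// Hand-written result of desugaring the hypothetical
//   program Greet = {
//     val who = if (args.nonEmpty) args(0) else "world"
//     println(s"Hello $who!")
//   }
object Greet {
  def main(args: _root_.scala.Array[_root_.scala.String]): _root_.scala.Unit = {
    // `args` is the main parameter; `who` is local to main.
    val who = if (args.nonEmpty) args(0) else "world"
    println(s"Hello $who!")
  }
}
```

Here `Greet.main(Array("Scala"))` prints `Hello Scala!`, whereas `Greet.who` does not compile, since `who` never becomes a member of the object.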

Alternatives

The “magical” introduction of args can be seen as problematic. An alternative would be to use _ as the parameter name, which would make it invisible instead. This is a bit less magical, but then there is no way to access the command-line arguments if we want to, limiting the usefulness of program. Besides, we leak main anyway, so it does not seem a stretch to “leak” args.

An alternative syntax does not include the = sign:

package foo

program Bar {
  val who = "world"
  println(s"Hello $who!")
}

Given the shape of existing constructs in Scala, this alternative suggests that definitions inside the program would somehow be importable from the outside, such as import Bar.who. That is, however, not possible. The = sign wards off this misreading by clearly marking the body as something similar to the body of a def or the rhs of a val (and in fact it is the body of a def). It also removes the need for the {} altogether if the body is a single expression/statement, so the following is legal:

program Bar = println("Hello world!")

Backward compatibility

Since program only takes on a special meaning when directly inside a package declaration, this proposal does not break any existing code: currently, nothing in a top-level position can legally start with the token program.

Implementation effort

Trivial, as it is literally a syntactic rewriting.

MarkCLewis · March 25, 2019, 1:53pm

In the list of previous approaches, you leave out the Scala scripting option, which is what we currently use in CS1. In that environment, the original program is println("Hello World!") and you run it by simply typing scala HelloWorld.scala. There is no boilerplate at all. It is just as simple as in a standard scripting language like Python.

For the novice, I prefer the other proposal that has been floated, where files ending with .sc are basically script files that contain top-level code and .scala files remain unchanged. My hope would be that in a .sc file students could simply enter println("Hello World!") and then run it with scala HelloWorld.sc. It would be easy to have the scala program handle .sc files specially and run them as scripts, but also allow scalac to compile them to a main method so that they can serve as entry points for applications.

I acknowledge that there could be other factors that play a role here, but I would rather not introduce a new keyword. Even though this proposal is fairly simple, it still has overhead compared to the current scripting option.

smarter · March 25, 2019, 1:54pm

Nice proposal! Too bad it’s mostly about syntax which means we’ll debate it endlessly :).

I don’t find the argument for the = sign convincing: the same argument could be applied to trait Bar, for example, and it just seems much more natural to me to write program Bar { ... } without an equal sign.

While we’re bikeshedding, I would write main Bar { ... } instead of program Bar { ... }, thus making it clear what this actually does.

sjrd · March 25, 2019, 1:59pm

The problem is that the scripting-based proposal does not, actually, lend itself to evolving into regular .scala files. There are endless issues with scalac somehow compiling top-level statements to a main method. You’ll end up either with the same issues as Application (deadlocks due to the initialization lock) or with those of App (weird semantics of when statements and initializations are executed), not to mention the naming problem (how to actually get the scala -cp . foo.Bar incantation right). The other thread has several details, which I’d rather not duplicate here.

LPTK · March 25, 2019, 2:04pm

Why introduce an entirely new language construct for such a specific use case?
Especially when what you want really behaves like a method.

I would propose to use a type-directed approach instead:

def myProgram: Application = { println(args.toList) }

Where we have:

@scala.annotation.implicitNotFound(
  "This definition can only be used within an Application context")
case class ApplicationContext(args: Array[String])
type Application = given ApplicationContext => Unit
def args: Array[String] = given (c: ApplicationContext) => c.args

The compiler would generate an object with a main method for all top-level methods that have type Application.

Then we should reap most of the benefits (including friendliness to new users), without any new language features. And this way, we can also easily call into such entry points from another entry point, or even from any place you want, really:

scala> myProgram given ApplicationContext(Array("hello"))
List(hello)
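For comparison, the same idea can be sketched with the context-function syntax of released Scala 3 (`?=>`, `using` and `given`), which replaced the `given ... =>` syntax used above; it compiles as Scala 3 only. This is a sketch under assumptions: the names come from the post, the enclosing `TypeDirected` object only keeps the example self-contained, and `myProgramMain` is a hypothetical hand-written stand-in for the entry point the compiler would generate.

```scala
// Scala 3 only.
object TypeDirected {
  @scala.annotation.implicitNotFound(
    "This definition can only be used within an Application context")
  final case class ApplicationContext(args: Array[String])

  // An "application" is a computation that needs an ApplicationContext.
  type Application = ApplicationContext ?=> Unit

  // Only callable where an ApplicationContext is in scope.
  def args(using c: ApplicationContext): Array[String] = c.args

  def myProgram: Application = println(args.toList)

  // Hypothetical stand-in for the entry point the compiler would generate.
  object myProgramMain {
    def main(argv: Array[String]): Unit = {
      given ApplicationContext = ApplicationContext(argv)
      myProgram // the context function is applied to the given above
    }
  }
}
```

With this encoding, `TypeDirected.myProgramMain.main(Array("hello"))` prints `List(hello)`, mirroring the REPL session above.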

sjrd · March 25, 2019, 2:05pm

trait defines a type. program introduces a term, like object (not surprising, since it is an object). For terms I expect to be able to select their members from outside. For types I do not.

program has precedent in Pascal. I am, after all, Niklaus Wirth’s academic grandson.

sjrd · March 25, 2019, 2:08pm

Because, once again, of the name at the JVM level, which ends up being the one used in the scala -cp . foo.Bar incantation.

Also, it forces the introduction of args in Predef, which is undesirable. (Yes, it’s protected by the implicit context, but still.)

LPTK · March 25, 2019, 2:09pm

What’s wrong with scala -cp . foo.myProgram?

The generated class could be named after the method.

sjrd · March 25, 2019, 2:13pm

It can’t, actually, because then you have serious issues with the actual scope: you have both the def and the generated class in the top-level scope. This would introduce many complications.

LPTK · March 25, 2019, 2:22pm

As far as I understand the proposal for top-level definitions, we’d end up with a wrapper object src$package that contains the myProgram definition. Nothing prevents the compiler from generating a top-level class named myProgram, whose main method delegates to src$package.myProgram. Or am I missing something?

Ichoran · March 25, 2019, 2:25pm

I think the proposal is backwards. The proposal should start like this:

We want to be able to run the following as a .scala file:

val x = 42
println(s"The meaning of life is $x")

and then reason from there to what is necessary to make that happen. If there is some compromise regarding how efficiently external code can access x, so be it. If you have to create a synthetic class and a synthetic object and write forwarders between the two, some of which are wrapped in synchronized blocks and/or atomicWhatever, fine. If you need to have all the vals actually be private vars with forwarders and separately handle initialization and detect and forbid circular references, that’s fine too.

But I think we should try really hard with a proposal like this to make the above work and only back out once it’s fairly exhaustively been shown to be impossible or to have tragic consequences.

DavidGregory084 · March 25, 2019, 2:40pm

I think that that depends on whether this proposal really is about running top level statements in .scala files, or about getting the Scala compiler to emit valid Java main methods.

@sjrd, wouldn’t the following solve the problem of the magic introduction of args?

program Bar = { args =>
  val who = "world"
  println(s"Hello $who!")
}

i.e., the program’s <expr> must have type Array[String] => Unit.
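Under that variant, one plausible desugaring (an assumption, since the post only states the required type) type-checks the body as an `Array[String] => Unit` and applies it to the real arguments inside `main`, so no identifier is introduced implicitly; `argv` and `body` are hypothetical names.

```scala
// Hypothetical desugaring of
//   program Bar = { args =>
//     val who = if (args.nonEmpty) args(0) else "world"
//     println(s"Hello $who!")
//   }
object Bar {
  def main(argv: Array[String]): Unit = {
    // The user's <expr>, type-checked as a function of the arguments.
    val body: Array[String] => Unit = { args =>
      val who = if (args.nonEmpty) args(0) else "world"
      println(s"Hello $who!")
    }
    body(argv)
  }
}
```

The lambda parameter is named by the user, so the same program could just as well write `{ _ => ... }` when it does not need the arguments.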

Ichoran · March 25, 2019, 2:56pm

My point is that if it’s not about this, it should be.

It’s not that hard to

object Whatever {
  def main(args: Array[String]): Unit = {
    ...
  }
}

which, if you’re willing to learn stuff and type for a few seconds, gives you a totally valid main method. Learning different stuff and typing for not quite as many seconds is rather different from not worrying about it at all.

If we can’t get to the zero-syntax just-works case, I’m not sure it’s worth the effort. There isn’t that much magic stuff to write.

DavidGregory084 · March 25, 2019, 3:05pm

Yeah, I am dubious about the value of adding more keywords too; there is already a lot of syntactic churn planned in Scala 3.0.

joshlemer · March 25, 2019, 3:35pm

I think that just allowing a user to declare a main method at top level should be plenty user-friendly:

val x = 3

def main(args: Array[String]) = { println("hey..") }

and should be familiar to anyone coming from C/C++/Kotlin/Rust/Go. Even in Python, isn’t it usually the case that people place statements inside

if __name__ == '__main__':
  print("...")

ryanstull · March 25, 2019, 3:47pm

I agree with @Ichoran’s point that if we can’t get to the zero-syntax case, it’s probably not worth doing.

What if, instead of adding a new construct to the language to support this, we changed the scala command to support scripts?

So you could have a script file like Test.scala

val a = 123
println(a)

Then you run scala Test.scala and scala detects whether or not this is an object. If it isn’t, it can simply wrap the entire contents of the file in

object <Filename> {
   def main(args: Array[String]): Unit = {
      ...
   }
}

and then executes that.

So really, since the scala command already supports running compiled classes with a main method, as well as source code of an object with a main method, all that is needed is to detect whether the input file fits neither of those cases, and if so wrap its contents in an object's main method and run that.
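A rough sketch of that wrapping step as a plain source-to-source transformation. Everything here is hypothetical: `ScriptWrapper`, its three functions, and especially `looksLikeMainObject`, which is only a naive placeholder for the detection step the post describes, not how any existing tool decides.

```scala
object ScriptWrapper {
  // Naive placeholder for "is this already an object with a main method?".
  def looksLikeMainObject(source: String): Boolean =
    source.linesIterator.exists(_.trim.startsWith("object ")) &&
      source.contains("def main(")

  // Wrap the script body in `object <name> { def main(...) = { ... } }`.
  def wrap(objectName: String, source: String): String = {
    val header = Iterator(
      s"object $objectName {",
      "  def main(args: Array[String]): Unit = {"
    )
    val body = source.linesIterator.map(line => "    " + line)
    val footer = Iterator("  }", "}")
    (header ++ body ++ footer).mkString("\n")
  }

  // Return the source unchanged if it already looks runnable, else wrap it.
  def prepare(objectName: String, source: String): String =
    if (looksLikeMainObject(source)) source else wrap(objectName, source)
}
```

For the two-line Test.scala script above, `ScriptWrapper.prepare("Test", source)` produces an `object Test` whose `main` body is the script, matching the template in the post.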

lihaoyi · March 25, 2019, 3:48pm

To push this a bit further, the proposed program syntax is barely any fewer tokens than object Foo extends App or object Foo extends Application. Even if DelayedInit is gone, we can still implement this entirely in userland with zero language changes:

class App2(x: => Unit){
  def main(args: Array[String]): Unit = x
}
object Boo extends App2({
  println("Hello World")
})

I don’t think saving 3 tokens is worth the cost of adding a whole new keyword and declaration type to the language.

Furthermore, such a program declaration would be 100% novel, and would have none of the de facto standardness that *.sc entrypoint files already have in scala/amm/Scastie/ScalaFiddle/IntelliJ-Worksheets/etc., not to mention in other languages like Python/Ruby/JS/etc. It would therefore further bifurcate the language and make it more idiosyncratic, rather than consolidating it with people’s existing expectations and conventions.

The idea of a file containing top-level statements as the entrypoint of your program has been re-invented over and over in the Scala ecosystem, despite having zero support from the compiler. People clearly want such a feature, even with all the constraints and limitations the existing implementations have.

sjrd · March 25, 2019, 3:49pm

I think your counter-proposal is the one that is backwards, TBH. We have:

An idea of a proposal based on top-level statements, that has a list of known issues, which we have to surmount somehow:
- What’s the name of the object?
- What about the static initialization lock?
- What about hiding implementation details of the main program block from the rest of the package?
- etc.
versus a very simple, syntactical-only rewriting which takes the existing known correct thing to do, which is an object with a def main, and just introduces syntactic sugar for it, with an obvious user-level meaning.

I believe the burden is on supporters of the top-level statements-based approach to give a careful explanation of why they think it can work and address all the issues we’ve had for 15 years with Application and App.

sjrd · March 25, 2019, 3:52pm

@odersky made a compelling argument in-person in favor of the =-less syntax: the syntax with = strongly suggests that I can write

program Bar = println("hello")

val one = Bar
val two = Bar

and that I would somehow end up with one and two both being the result of the println (here, ()), and also that the println would be executed twice.
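The concern can be made concrete with a small sketch (the names `Comparison` and `barAsDef` are hypothetical): if the `=` form is read like a `def`, every reference re-runs the body, whereas the desugared object is just a value, and referring to it runs nothing.

```scala
object Comparison {
  // How `program Bar = println("hello")` could be misread: a method
  // whose body runs on every reference.
  def barAsDef: Unit = println("hello")

  // What the proposal actually produces: an object; only main runs the body.
  object Bar {
    def main(args: Array[String]): Unit = println("hello")
  }
}
```

Evaluating `Comparison.barAsDef` twice prints `hello` twice and yields `()` each time, which is exactly the reading described above; `Comparison.Bar` is the same object on every reference and prints nothing until its `main` is called.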

The {}-based syntax does not have those issues.

sjrd · March 25, 2019, 3:55pm

Please see the thread on top-level definitions. It already contains many explanations of why that is not possible, notably regarding the name of the generated class.

In general, anyone who is counter-proposing this proposal with anything based on top-level definitions or top-level statements, please read the other thread, and double-check that your idea hasn’t already been proposed and judged problematic on that side.