Proposal: Main methods (`@main`)

SethTisue · April 17, 2020, 9:58pm

Hello Scala community!

This thread is the SIP Committee’s request for comments on a proposal to change how main methods work.

Summary of the proposal

The proposal adds a @main annotation, which is handled specially by the compiler. @main may annotate a top-level method or a method in a top-level or otherwise statically accessible object.

The proposal also adds new facilities for argument handling. In Scala 2, main methods always took an Array[String] parameter, which is what the JVM platform itself expects. In the new proposal, a main method can take multiple parameters of varying types, which are converted from strings by a new FromString typeclass.
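
As a rough illustration of the paragraph above, a main method with typed parameters might look like the following sketch. The names `greet` and `greeting` are made up for this example; the conversion of the command-line strings to `Int` and `String` is the part the proposed FromString typeclass is meant to handle.

```scala
// Hypothetical helper, so the same logic can also be called from ordinary code.
def greeting(age: Int, name: String, others: String*): String =
  val everyone = (name +: others).mkString(", ")
  s"Happy birthday at age $age to $everyone"

// Under the proposal, the compiler would synthesize a JVM entry point for `greet`
// that converts the raw command-line strings to the declared parameter types.
@main def greet(age: Int, name: String, others: String*): Unit =
  println(greeting(age, name, others*))
```

Invoked with the arguments `23 Ann Bob`, the synthesized entry point would be expected to call `greet(23, "Ann", "Bob")`.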

The proposal also aims to phase out Scala 2’s App trait eventually, in part because App relied on the special DelayedInit trait, a language feature that will very likely be dropped from Scala 3.

For more details of the proposal, see the documentation page on the Dotty site:

Main Methods

For discussion

Is the proposal clear and detailed enough?
Should this proposal be accepted by the committee for Scala 3?
Should the proposal be modified before acceptance?

Time frame

This topic will remain open for at least one month, to allow sufficient time to gather feedback.

Ground rules

As with all SIP threads, please try to keep the thread on-topic and of reasonable length. If a sub-discussion is becoming extensive, it may be best to move it to a separate topic, and then summarize that discussion here with a link, for those interested.

SethTisue · April 17, 2020, 10:04pm

A few questions I hope someone can address:

What about Scala.js and Scala Native?
What do other languages do? (Kotlin, Ceylon, others?)

schrepfler · April 17, 2020, 10:24pm

How will libraries like cats-effect which provide IOApp need to adapt?
Is interaction with stdin and stdout to be delegated to the standard classes?

morgen-peschke · April 18, 2020, 1:34am

The proposed argument handling looks like it’ll be more restrictive than helpful.

Perhaps I’m overly fond of experimenting with scripts, but I can’t recall a script that I’ve written in Scala that would be workable with just positional arguments (usually anything that simple is just hacked out in Bash).

I do like the idea of having a typeclass wrapping Seq[String] => A specifically for CLI argument handling, because it’ll be easy to integrate the various libraries that currently exist as backends for that system.
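
To make the idea concrete, here is a minimal sketch of such a typeclass. Everything in it (`CliParser`, `Config`, `parseArgs`, the `--verbose` flag) is hypothetical and not part of the proposal; it only shows the shape of a `Seq[String] => A` wrapper that a library could implement as a backend.

```scala
// Hypothetical typeclass: turns the raw argument list into a value of type A,
// or into an error message.
trait CliParser[A]:
  def parse(args: Seq[String]): Either[String, A]

final case class Config(verbose: Boolean, files: List[String])

// A hand-written instance; an argument-parsing library could supply this instead.
given CliParser[Config] with
  def parse(args: Seq[String]): Either[String, Config] =
    val (flags, files) = args.toList.partition(_.startsWith("--"))
    flags.filterNot(_ == "--verbose") match
      case Nil     => Right(Config(flags.contains("--verbose"), files))
      case unknown => Left(s"Unknown option(s): ${unknown.mkString(", ")}")

// An entry point would only need to summon the instance for its argument type.
def parseArgs[A](args: Seq[String])(using p: CliParser[A]): Either[String, A] =
  p.parse(args)
```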

lihaoyi · April 18, 2020, 5:36am

As the person who was the inspiration for this proposal, I think it has promise, but as presented I don’t think it passes the bar for getting hard-coded into the language. With more work, it could be great, but it definitely needs more work.

Extensibility

All the problems basically boil down to extensibility:

People have expressed concerns about how argument parsing works
People have expressed concerns about positional vs named parameters
I will chip in with a few more requirements: what about --help text generation and formatting? What about bash autocomplete?
Ammonite’s version of this feature allows (A) multiple main methods per program and (B) passing in arguments by their --name
The current proposal is very close, but not quite, enough to satisfy use cases in Mill and Cask which do basically the same thing (in fact, those implementations are all copy-pasted from Ammonite!)

The basic issue is that the compiler should not be dictating parameter parsing for the entire Scala ecosystem with so little thought. If we have a deep discussion and wide consensus across the community on what parameter parsing should look like, and have a complete and holistic implementation, then I can accept hard-coding it forever in the compiler. But we don’t!

The sensible thing in this scenario is to make it properly extensible. The compiler can still provide a default, but people should be able to hook in their own logic where necessary to satisfy their own use cases. This will also allow the state of the art to improve over time, rather than having a half-baked implementation set in stone forever. The community can develop their own logic for --help message generation, bash autocomplete, --keyword params, and so on.

Requirements

I have experience implementing similar features in four separate places: Autowire, Ammonite, Mill, and Cask. What’s shared between them? The shared logic is essentially:

Resolve a type-class for handling the method return value, and method name
Resolve a (potentially different) type-class for parsing every parameter, and store its name, and default value
We need to store the resolved method/parameter metadata somewhere

We can see this in the following existing implementations:

Ammonite’s EntryPoint and ArgSig
Cask’s EntryPoint and ArgSig
Mill’s EntryPoint and ArgSig

Once we have this metadata, the remaining handling can be done in user-land code: whether serving an HTTP endpoint, a main method, or a Mill command-line command.

How To Fix This

The simplest way to fix the current proposal to make it satisfy these requirements is to do two things:

Split up the compiler-level feature from the user-land implementation code. There are three compiler-level features here:
- Resolving typeclasses and other metadata for methods, arguments, and return values
- Storing the metadata somewhere
- Synthesizing a wrapper class and main method that makes use of that metadata
Make the user-land implementation code swappable

Resolving Typeclasses

To make the typeclass resolution logic swappable, we could turn @main from a hardcoded compiler-level annotation to instead support any annotation class that implements a trait. This could be something like:

trait main extends MainMethod[FromString, DummyImplicit]
trait MainMethod[ArgHandler[_], ReturnHandler[_]] extends StaticAnnotation{
  def visitMethod[T: ReturnHandler](name: String): Unit
  def visitArg[T: ArgHandler](name: String, default: => T): Unit
  def visitMethodEnd(): Unit
}

Above I have described a Visitor-pattern API, but we could easily use whatever API style people would prefer. What’s important is that if someone wants to change the typeclass resolution logic, e.g. if we wanted to make our main method take JSON instead of positional parameters, we can do so. (This is not a hypothetical: I do this right now in Cask!)

Storing Metadata

The above visitor-based definition is sufficient to also store the metadata: the visitFoo methods return Unit and are expected to side effect. If the MainMethod trait is instantiated as a side-effecting statement right above where the method is defined in the source code, it can then store its metadata wherever the implementor chooses.

Note that while the above specification relies on side effects and the visitor pattern, it is trivial to come up with a specification that works more “purely” by having the MainMethod interface methods return the metadata we care about as a value.
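
For illustration, a "pure" variant could be sketched roughly as follows. The types `ArgSig`, `EntryPoint` and `Collector` here are simplified, hypothetical stand-ins (types and defaults are kept as plain strings), not the actual definitions used in Ammonite, Cask or Mill.

```scala
// Hypothetical metadata types, loosely modelled on the ArgSig/EntryPoint idea.
final case class ArgSig(name: String, typeName: String, default: Option[String])
final case class EntryPoint(name: String, args: List[ArgSig])

// A "pure" collector: each step returns a new value instead of mutating state.
final case class Collector(name: String, args: List[ArgSig] = Nil):
  def visitArg(argName: String, typeName: String, default: Option[String] = None): Collector =
    copy(args = args :+ ArgSig(argName, typeName, default))
  def result: EntryPoint = EntryPoint(name, args)

// What compiler-generated code might produce for `def add(x: Int, y: Int = 1)`.
val addEntryPoint: EntryPoint =
  Collector("add")
    .visitArg("x", "Int")
    .visitArg("y", "Int", Some("1"))
    .result
```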

Constructing the Wrapper Class and main method

The above two sections, Resolving Typeclasses and Storing Metadata, are sufficient for all my use cases in Cask, Mill, and Ammonite: they each have their own launcher code that can inspect the metadata and act accordingly. However, if we make the wrapper class and main method configurable, we could dispense with the custom launchers entirely and converge those implementations closer to plain-old-Scala-programs. The easiest thing would be a change like the following:

 trait main extends MainMethod[FromString, DummyImplicit]
 trait MainMethod[ArgHandler[_], ReturnHandler[_]] extends StaticAnnotation{
   def visitMethod[T: ReturnHandler](name: String): Unit
   def visitArg[T: ArgHandler](name: String, default: => T): Unit
   def visitMethodEnd(): Unit
+ def main0(args: Array[String]): Unit
 }

We could specify that the wrapper class always has a def main(args: Array[String]): Unit entrypoint method that calls new MyMainMethodAnnotationClass{}.main0(args). The official @main entrypoint could then do parameter parsing using the stored method/param metadata via a simple/naive positional approach, but the community could easily override it to do parameter parsing in other ways: adding support for --help text or --keyword params, Ammonite could plug in support for multiple main methods with --keyword params and default values, Cask could make def main0 start the HTTP server and use the metadata for routing, and so on.
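
As a sketch of what a community-provided `main0` could do with the stored metadata, the following binds `--keyword value` pairs to parameters and falls back to defaults. `Param` and `bindKeywordArgs` are hypothetical names, and values are kept as raw strings to keep the example short.

```scala
// Hypothetical parameter metadata: a name plus an optional default (as a string).
final case class Param(name: String, default: Option[String])

// Resolve each parameter from `--name value` pairs, falling back to its default.
// Returns either an error message or the raw string values in declaration order.
def bindKeywordArgs(params: List[Param], args: List[String]): Either[String, List[String]] =
  val supplied = args.grouped(2).collect {
    case List(key, value) if key.startsWith("--") => key.drop(2) -> value
  }.toMap
  val missing = params.filter(p => !supplied.contains(p.name) && p.default.isEmpty)
  if missing.nonEmpty then Left(s"Missing argument: ${missing.map("--" + _.name).mkString(", ")}")
  else Right(params.map(p => supplied.getOrElse(p.name, p.default.get)))
```

A positional launcher could be written against the same metadata, which is the point of keeping the parsing strategy swappable.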

Note that the interfaces proposed in this post are rough sketches, so they may have holes and be incomplete. You’ll have to trust me when I say that they can be made to work, since I maintain exactly such interfaces in three widely-used applications.

Conclusion

Overall, I think the idea is a good one, but I do not think the current proposal passes the bar: I think it is too narrow and too incomplete to be worthy of including in the Scala standard library, where once the user passes “hello world” they will find it immediately inadequate and need to discard it. This risks it becoming a “good for slides and tutorials and nothing else” feature which we have to warn people against using: scala.util.parsing all over again.

However, with a bit of extensibility, I think all the concerns can be solved: we just need typeclass resolution and the runtime def main argument parsing to be swappable. Then it doesn’t matter how incomplete the built-in standard library @main is: people who want to use the feature for more realistic workloads will be able to extend it to provide the functionality they need, while at the same time standardizing the whole community on a style of defining program entrypoints that is common throughout a myriad of application domains.

We don’t need to support every use case out of the box, e.g. Cask’s composable/stackable annotations which are very cool are probably out of scope. But we should aim to provide a language feature that can scale to support a developer as they grow throughout their career, and not just at the start. After all, being a “scalable language” is what Scala is all about.

odersky · April 18, 2020, 9:00am

I like the idea of making @main customizable by making it an annotation class with some API that is then used by compiler-generated code. The question is, which API? Let me take your idea and run with it a bit. Since we are talking about a compiler-supported feature, the API should be as simple and small as we can make it. A good guideline is the inherent information content of the @main-annotated function. If we manage to pass exactly that info and nothing extraneous we have achieved our goal.

Here’s a candidate API that captures all available information and that looks minimal to me:

trait MainAnnotation[ParseArgument[_]] extends StaticAnnotation:
  // get single argument
  def getArg[T](argName: String, fromString: ParseArgument[T], defaultValue: => Option[T] = None): Option[T]

  // get varargs argument
  def getArgs[T](argName: String, fromString: ParseArgument[T]): List[T]

  // check that everything is parsed
  def done(): Boolean

A @main annotation should resolve to a class that extends MainAnnotation and that takes as arguments

the program name as a string
the command line arguments as an array of strings.

For instance, here’s a simple main class:

class main(progName: String, args: Array[String]) extends MainAnnotation[util.FromString]: 
   ...

Let’s illustrate with a simple program:

@main def add(x: Int, y: Int = 1) = println(x + y)

The compiler would generate the following wrapper class for this.

class add:
  def main(args: Array[String]) = 
    val cmd = new main("add", args)
    for
      arg1 <- cmd.getArg[Int]("arg1", summon[FromString[Int]])
      arg2 <- cmd.getArg[Int]("arg2", summon[FromString[Int]], Some(1))
      if cmd.done()
    do
      add(arg1, arg2)

Here’s another program

  @main def layout(itemsPerRow: Int, elems: String*) = 
    for row <- elems.grouped(itemsPerRow) do println(row.mkString(" "))

This would generate the following class

class layout:
  def main(args: Array[String]) = 
    val cmd = new main("layout", args)
    for
      itemsPerRow <- cmd.getArg[Int]("itemsPerRow", summon[FromString[Int]])
      elems <- cmd.getArgs[String]("elems", summon[FromString[String]])
      if cmd.done()
    do
      layout(itemsPerRow, elems: _*)

I believe the principle is clear: We first instantiate the main annotation class, passing program name and command line arguments. The result is a command line parser cmd. Then for every argument x: T of the main method we call the method cmd.getArg("x", ParseArgument[T]), passing a default value if one is given in the signature. A vararg parameter of the method leads to a getArgs call. At the end we check that everything is parsed and call the main function with the parsed arguments.
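
To show how such an annotation class might be implemented, here is a naive positional parser written against a cut-down copy of the API (single arguments only). `PositionalMain`, the local `FromString` trait and `runAdd` are assumptions made for this sketch; `runAdd` stands in for the compiler-generated wrapper and returns the result instead of printing it.

```scala
import scala.annotation.StaticAnnotation

// Stand-in for the proposal's FromString typeclass (hypothetical, simplified).
trait FromString[T]:
  def fromString(s: String): Option[T]

given FromString[Int] with
  def fromString(s: String): Option[Int] = s.toIntOption

// Cut-down version of the API sketched above.
trait MainAnnotation[ParseArgument[_]] extends StaticAnnotation:
  def getArg[T](argName: String, fromString: ParseArgument[T], defaultValue: => Option[T] = None): Option[T]
  def done(): Boolean

// Naive positional parser: each getArg call consumes the next command-line string.
class PositionalMain(progName: String, args: Array[String]) extends MainAnnotation[FromString]:
  private var remaining = args.toList

  def getArg[T](argName: String, fromString: FromString[T], defaultValue: => Option[T] = None): Option[T] =
    remaining match
      case head :: tail =>
        remaining = tail
        fromString.fromString(head)
      case Nil => defaultValue

  // Everything is parsed once no unconsumed arguments are left.
  def done(): Boolean = remaining.isEmpty

def add(x: Int, y: Int = 1): Int = x + y

// Hand-written equivalent of the generated wrapper for `add`.
def runAdd(args: Array[String]): Option[Int] =
  val cmd = new PositionalMain("add", args)
  for
    x <- cmd.getArg[Int]("x", summon[FromString[Int]])
    y <- cmd.getArg[Int]("y", summon[FromString[Int]], Some(1))
    if cmd.done()
  yield add(x, y)
```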

@lihaoyi Would that fit all your use cases?

odersky · April 18, 2020, 9:32am

Here’s a version that is a bit more general than what I showed previously. Previously, every command line parser had to return an argument knowing

the program name and the actual command line
the preceding getArg calls
the argument’s name and string parser.

To achieve full generality we’d like to make it possible for the command line parser to see everything that’s expected to be passed to the main function before assembling any argument values. We can achieve this by changing the return type of getArg from Option[T] to () => T. Any parse failure would then be kept as mutable state in the parser to be acted on when done is called. Then the API would look like this:

trait MainAnnotation[ParseArgument[_]] extends StaticAnnotation:
  // get single argument
  def getArg[T](argName: String, fromString: ParseArgument[T], defaultValue: => Option[T] = None): () => T

  // get varargs argument
  def getArgs[T](argName: String, fromString: ParseArgument[T]): () => List[T]

  // check that everything is parsed
  def done(): Boolean

And the compiler-generated code would look like this:

class add:
  def main(args: Array[String]) = 
    val cmd = new main("add", args)
    val arg1 = cmd.getArg[Int]("arg1", summon[FromString[Int]])
    val arg2 = cmd.getArg[Int]("arg2", summon[FromString[Int]], Some(1))
    if cmd.done() then add(arg1(), arg2())
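
A parser for this revised shape could keep failures as mutable state and report them in `done()`, along the lines of the sketch below. `DeferredMain`, the local `FromString` trait and `runAdd` are hypothetical. For brevity the sketch still converts each string inside `getArg` and only defers the error reporting; a fuller implementation could postpone all parsing until `done()`.

```scala
// Stand-in for the FromString typeclass (hypothetical, simplified).
trait FromString[T]:
  def fromString(s: String): Option[T]

given FromString[Int] with
  def fromString(s: String): Option[Int] = s.toIntOption

// getArg returns a thunk; failures are collected and reported when done() is called.
class DeferredMain(progName: String, args: Array[String]):
  private var remaining = args.toList
  private val errors = scala.collection.mutable.ListBuffer.empty[String]

  def getArg[T](argName: String, parser: FromString[T], defaultValue: => Option[T] = None): () => T =
    val raw = remaining.headOption
    remaining = remaining.drop(1)
    val parsed = raw match
      case Some(s) => parser.fromString(s)
      case None    => defaultValue
    if parsed.isEmpty then
      errors += raw.fold(s"missing argument $argName")(s => s"invalid value '$s' for $argName")
    () => parsed.get

  // Reports every collected problem at once; returns true only if there were none.
  def done(): Boolean =
    if remaining.nonEmpty then errors += s"unexpected arguments: ${remaining.mkString(" ")}"
    errors.foreach(e => Console.err.println(s"$progName: $e"))
    errors.isEmpty

def add(x: Int, y: Int = 1): Int = x + y

// Hand-written equivalent of the generated wrapper; the thunks are only forced
// after done() has confirmed that every argument was parsed.
def runAdd(args: Array[String]): Option[Int] =
  val cmd = new DeferredMain("add", args)
  val x = cmd.getArg[Int]("x", summon[FromString[Int]])
  val y = cmd.getArg[Int]("y", summon[FromString[Int]], Some(1))
  if cmd.done() then Some(add(x(), y())) else None
```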

lihaoyi · April 18, 2020, 9:47am

Thanks for the responses, Martin! Give me a day or two to stew over this if that’s OK, and I’ll get back to you…

scalway · April 18, 2020, 10:03am

I like this idea, but I hope it will also support passing annotations to it:

//example from Ammonite's @main implementation
@doc("doc for method") 
@main def add(
  x: @doc("doc for argument") Int, 
  y: Int = 1
) = println(x + y)

In Ammonite it produces:

$ amm add.sc
Missing argument: (--x: Int)
Arguments provided did not match expected signature:

add
doc for method
  --x  Int: doc for argument
  --y  Int (default 1)

We could simply add:

def getArg[T](
  argName: String, 
  fromString: ParseArgument[T], 
  defaultValue: => Option[T] = None, 
  annotations:Seq[Any] = Seq.empty
): () => T

//or pack all argument data in its own type:
//trait MethodArgument[T] {
//  //def index:Int
//  def label:String
//  def default:Option[String] = None
//  def annotations:Seq[Any] = Seq.empty
//}

//def getArg[T](arg: MethodArgument[T], fromString: ParseArgument[T]): () => T

amsayk · April 18, 2020, 11:46am

It would be nice to be able to abstract over the return type of the main method, so that it can return not just Unit but also F[Unit], e.g. something like IO[Unit]:


def main(...): IO[Unit] = {  /* main code */ }

// generates

def main(...): IO[Unit] = summon[MainContext[IO]].run{  /* main code */ }

// where
trait MainContext[F[_]] {
   def run[T](code: => T): F[T]
}

// then some IO implementor
given MainContext[IO] {
 // ...
}

julienrf · April 18, 2020, 12:56pm

I think it’s interesting to compare this with a solution that requires no compiler support at all:

object add extends CommandApp(
  name = "add",
  header = "doc for method",
  main = (
    option[Int]("x", "doc for argument"),
    option[Int]("y", "").withDefault(1)
  ).mapN { (x, y) =>
    println(x + y)
  }
)

It produces the following message:

Missing expected flag --x!
Usage: add --x <integer> [--y <integer>]
doc for method
Options and flags:
    --help
        Display this help text.
    --x <integer>
        doc for argument
    --y <integer>

Try it.

The code is very similar and I would argue that both versions are very readable.

In the approach proposed by @odersky and Ammonite, we construct the “model” of the application arguments from the signature of the “main” method: method argument names become application option names, and repeated arguments and default parameter values are treated accordingly, the @doc annotation can be used to provide argument-specific documentation.

Each of these features requires special treatment by the compiler. We haven’t seen how to handle optional arguments, but this would have to be specified in the language as well.

Another common feature is to have both a long name and a short name for each argument. This would require a specific annotation (I think?).

I’m sympathetic to the idea of deriving the application arguments from the method signature but I’m also wondering if such a system is necessary at all given that we already have a quite expressive way to define an application arguments’ model only at the library-level (with decline or scopt).

szeiger · April 18, 2020, 1:39pm

I’ve always wanted top-level functions with a @main annotation in Scala. This seems like the right way to do it. But I am skeptical about the complexity of the command-line parsing.

For simple use cases (typically testing, benchmarking or data processing code that gets run directly from an sbt build) an array of strings is good enough. Sure, typed arguments would be better, but does it justify all the machinery required to get them?

Once you get to a “proper” command line app (that you intend to package and ship to other users) the single function entry point is probably not good enough so you need to switch to a different model. For example, I recently used decline for an app. The (relatively little) boilerplate that you need to write for this is mostly concerned with documenting the command line options and mapping combinations of options to individual entry points. Setting the whole machinery in motion starting with the Array[String] in your main method is only a single method call.

In my opinion the useful parts that carry their weight are:

Top-level functions with @main annotation
Allow Seq[String] instead of Array[String] for the arguments
Allow returning Int and turn that into a System.exit call in the synthetic main method.
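
A minimal sketch of the last two points, written by hand rather than synthesized. The names `run` and `RunMain` are hypothetical, and the usage message is invented for the example.

```scala
// Hypothetical user-written entry point: takes Seq[String], returns an exit code.
def run(args: Seq[String]): Int =
  if args.isEmpty then
    Console.err.println("usage: run <file>...")
    2
  else
    args.foreach(println)
    0

// What a synthesized wrapper might look like: adapt the JVM's Array[String]
// signature and turn the returned Int into the process exit status.
object RunMain:
  def main(args: Array[String]): Unit =
    sys.exit(run(args.toSeq))
```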

curoli · April 18, 2020, 4:44pm

The proposal creates poor user interfaces.

When a user enters wrong arguments, you would expect an explanation of what the app expects and why that expectation was not met, so for example, instead of

“Illegal command line: java.lang.NumberFormatException: For input string: "sixty"”

It should be something like:

“The first argument should be an integer signifying the age. "sixty" is not an integer.”

We are much better off if people use their own hand-written error handling code for simple cases and some library (e.g. scallop) for the more involved.
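
For a simple case, hand-written handling of that kind could be as small as the sketch below. The helper name `intArg` and its parameters are made up for illustration.

```scala
// Parse an integer argument, producing a message that names the argument's
// position and meaning instead of surfacing a NumberFormatException.
def intArg(position: String, meaning: String, raw: String): Either[String, Int] =
  raw.toIntOption.toRight(
    s"The $position argument should be an integer signifying $meaning. '$raw' is not an integer."
  )
```

For example, `intArg("first", "the age", "sixty")` yields a `Left` carrying the friendlier message, while `intArg("first", "the age", "60")` yields `Right(60)`.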

nafg · April 19, 2020, 3:53am

What is the objective?

If the objective is to have a way to define a JVM entry point that avoids the boilerplate and rigidity of a proper main method, and that avoids the DelayedInit dependency of App, then I agree with Stefan. Once I have a Seq[String] I can choose to use an existing library, or not.

If the objective is to have an out-of-the-box solution for writing CLI apps then we need some kind of library out of the box that handles all the typical use cases of a CLI.

If the objective is to have a boilerplate-free out of the box solution then it needs to be more magical, like interpreting method argument names as command-line parameter names and method argument types as argument parsers. This could be based on special compiler support, or macros if they’re powerful enough in Scala 3.

curoli · April 19, 2020, 4:14pm

I’m not sure what is meant by distinguishing between “JVM entry point” and command line interface.

The proposal has one example, which is a command-line interface. If this is to support a “JVM entry point” that is somehow not a command line interface, I would love to see an example of that.

odersky · April 19, 2020, 4:44pm

As always, the better is the enemy of the good. The current scheme hits a sweet spot in that it makes use of the info in the function signature. If I write

def add(number: Int, increment: Int = 1)

I implicitly provide several ways to call the function from the same program. E.g.

add(3, 2)
add(3, increment = 2)
add(3)

The proposal makes exactly the same capabilities available from the command line (or more precisely: such capabilities can be implemented in a class that extends MainAnnotation, and the standard @main annotation would do this).

One could adopt a simpler scheme. For instance, that the only allowed signatures for main methods are:

@main def f(): Unit
@main def f(xs: String*): Unit
@main def f(xs: Array[String]): Unit

Then the argument processing is “just one method call away”. I.e. if I want to connect to my add method defined previously, I could write:

@main def run(xs: String*) = processArgs(xs).mapN(add)

Or something like that. Only, it’s not so simple. My processArgs method has to know what arguments add expects, what the names are, and what possible defaults they have. So there’s a lot of info to pass to processArgs! And it’s duplicated info since the same info already exists in the signature of the add method. One could argue that one should simply bypass add as a method and do something like what @julienrf showed. But then we have definitely left behind beginner-friendly territory. A noob will ask “why can I call this method from my program but not from the command line”. I think this is a very reasonable question… Probably most of us have asked this question ourselves when we started out.
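
The duplication can be seen in a hand-written version such as the following sketch, where `processArgs` is written out for this one method (a real library would differ, and `run` is shown without the @main annotation so it can return a value).

```scala
def add(number: Int, increment: Int = 1): Int = number + increment

// Hand-written argument processing: the parameter count, the types and the
// default value 1 all have to be restated here, although `add` already declares them.
def processArgs(xs: Seq[String]): Option[(Int, Int)] =
  xs.toList match
    case List(n)    => n.toIntOption.map(number => (number, 1))
    case List(n, i) => n.toIntOption.zip(i.toIntOption)
    case _          => None

def run(xs: String*): Option[Int] =
  processArgs(xs).map { case (number, increment) => add(number, increment) }
```

If the default of `increment` in `add` ever changes, `processArgs` has to be updated by hand as well.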

So that’s the argument why one might not want to go with the simplest possible scheme.

The other direction, notably by @scalway, is to go further. Can we have doc strings, and can we please pass all other annotations? I agree this would be nice but at least in its full generality it does not look simple. First, we’d need a way to treat annotations as first class values that we pass around. There is currently no way to do this, unless we buy into some form of reflection or meta-programming. If main method generation is entangled with this, it makes it a lot more complicated to define and explain, and a lot harder to change. Second, supporting these things would now make main method wrapping more powerful than normal method calls since a normal method call will not be able to access the same functionality (without meta-programming).

An intermediate solution would be to just honor some @doc annotation and to pass docstrings into the MainAnnotation methods as strings. That could work. Maybe it’s worth the added complexity - I’m open-minded about this.

Or maybe we can wait until we have macro annotations. Presumably if main were a macro annotation, it would be able to “see” everything in add, including all annotations of itself and its parameters. That would probably be enough to generate docstrings and many other things. But macro annotations are not defined yet, so we cannot rely on them at the present stage.

[EDIT] Actually, instead of @doc annotations, maybe we could just pass the doc-comment of the method to the main annotation? That would be closer to the idea that we want to export the same capabilities we have internally to the command line.

scalway · April 19, 2020, 8:42pm

I like the idea of passing the doc comment (isn’t this the first time a comment would be turned into a value in standard Scala?)
The proposed MainAnnotation implementation in fact has a simple built-in dependency injection mechanism, as shown below:

class playMain(...) extends MainAnnotation[play.fancyutil.Provide]
implicit def provideConfig: Provide[Config] = ???
implicit def provideApp(implicit cfg: Config): Provide[play.Application] = ???
//--- user code ---------------
@playMain def dashboard(app: play.Application, servePath:String) = {
    ...
}

I’m not saying it is good or bad. I’m just saying it will eventually be used that way.

To be honest, I was convinced by Stefan. This feature has great potential, but we should take care not to limit ourselves in the future; starting with the most restrictive implementation and leaving room for such extensions later sounds great to me.

Jasper-M · April 20, 2020, 10:25am

I fear that this could be one of those things that will never cover all possible use cases (apparently not even something as seemingly simple as a custom help text). That way it is inevitable that people start using it, only to eventually hit one of those things that can’t be done and have to rewrite everything with a 3rd party library instead.

It might end up being beginner-friendly in the same way that apt-get install scala is beginner-friendly. It’s nice to play around with in the beginning, but once you have to do any serious work you discover that you’re still basically clueless.

Just having some basic functionality would already be a win. Like defining a main method without the object wrapper ceremony. Or being able to define a nullary main method when you don’t care about any possible arguments. Maybe having Seq[String] (backed by an ArraySeq?) args instead of an Array.

LPTK · April 20, 2020, 12:34pm

I don’t think covering all possible use cases was ever a goal of this proposal.

It would be nice if the annotation could just generate the command-line usage info and bind it to a --help command, or alternatively pick that info up from the doc comment of the method.

That seems acceptable to me. Not unlike enums and case classes — they are both super convenient, but once you outgrow them you need to switch to a more verbose and explicit implementation. I still think they are useful features for the majority of cases, especially for beginners.

I really like to have the ability to make a @main function with meaningful parameter types, which can be called both from the program and from the terminal. Now, I think that this would better be done in a library, and that Dotty should provide proper macro annotations instead of ad-hoc implementations of specific features. But I’d rather have it built into Dotty than nothing.

AMatveev · April 20, 2020, 5:43pm

odersky:

One could adopt a simpler scheme. For instance, that the only allowed signatures for main methods are:
@main def f(): Unit
@main def f(xs: String*): Unit
@main def f(xs: Array[String]): Unit

I really like such a scheme.

Currently @main def f(xs: Array[String]): Unit leads to an error. That is a pity, because translating an array to a list and then the list back to an array seems like a leaky abstraction.

I also like the idea of extensibility, but I think such a mechanism should make it possible to implement a richer scheme (like JCommander).
I think it can be achieved by something like:

  trait MainAnnotation[ParseArgument[_]] extends StaticAnnotation:
    def process(argList: Array[Arg], mainFunc: Array[AnyRef] => Unit): Unit

where
mainFunc - some sort of reflection to call main function.

The default implementation of type conversion is not very useful for me.
