Proposal: Simplifying the Scala getting started experience

lihaoyi · March 26, 2019, 9:51am

There has been a lot of discussion recently about ways we can simplify people’s getting started experience with Scala, particularly around the definition of main methods or program entrypoints:

I think it’s worth going over why we would want to do this, from first principles. Rather than starting with any specific feature or syntax we want to add, I will review the current state of the world, why exactly an end user would care about us simplifying the Scala getting started experience, and one possible path to get there.

Why not to Simplify Scala’s getting started experience

There are several reasons for simplifying the Scala definition of main methods or program entry points. I think many of them are invalid, and will go through the proposals here.

Currently, defining a Scala program involves the following:

object Bar {
  def main(args: Array[String]): Unit = {
    println("hello world")
  }
}

This is a bunch of boilerplate, with two existing simplifications that have their own issues:

object Bar extends Application {
  println("hello world")
}

This runs the initialization code in the object’s static initializer, which has a bunch of issues. An alternative,

object Bar extends App {
  println("hello world")
}

relies on the magic DelayedInit trait, which is deprecated and going away.

Various proposals have slimmed it down to:

object Bar {
  def main: Unit = {
    println("hello world")
  }
}

program Bar = {
  println("hello world")
}

These changes have a cost: we are moving away from the standard JVM style of declaring main methods. But in exchange for that cost, these simplifications buy us very little over an existing user-land helper:

class App2(x: => Unit){
  def main(args: Array[String]): Unit = x
}

object Bar extends App2({
  println("Hello World")
})

This is pretty concise already, and both the def main: Unit proposal and program proposal barely reduce boilerplate to below what is already possible with a helper App2 class as shown above. Thus I do not think they are worth doing.

Why to Simplify Scala’s getting started experience

I propose that the main reason to simplify Scala main method definitions is to smooth out the transition between Scala programs of different sizes. This section will be written entirely from the user’s point of view: exactly how we should implement this internally, whether in the build tool or compiler or somewhere else, is very much not the end user’s problem.

Currently, Scala programs of the following sizes exist:

1-liners in the scala REPL
1-10 liners in the amm REPL
10-1000 liners in scala or amm scripts
1000+ liners in projects using sbt, mill, or other build tools.

Currently this is not ideal, for the following reasons:

Needing to swap from the scala REPL to the amm repl for 1-10 line programs is arbitrary and a waste of time
scala scripts are unmaintained, undocumented, and generally not production ready.
Converting from a 10-1000 line amm script to a 1001 line sbt or mill project involves creating a bunch of ancillary files, putting things in subfolders (SBT now suggests a multi-subproject setup by default, and Mill only supports subprojects), and wrapping things in main boilerplate as described above.

In an ideal world, someone getting started with Scala would be able to use the same set of tools to grow their program from 1 line, to 10 lines, 100 lines, 1000 lines, and beyond without needing to swap between 3 arbitrarily different code runners (e.g. scala, amm, mill). A newbie should be able to:

Start out in the REPL:

> println("hello world")

Grow the program to multiple lines:

> println("hello world")
  println("i am cow")

Save the program to a file to run

// Hello.sc
println("hello world")
println("i am cow")

Introduce multiple files as the program grows:

// Hello.sc
import $file.Cow
println("hello world")

// Cow.sc
println("i am cow")

Introduce a proper build tool once that becomes useful:

// Hello.sc
hello()
cow()

// Hello.scala
def hello() = println("hello world")

// Cow.scala
def cow() = println("i am cow")

// build.sc or build.sbt
...

And as the project grows, breaking up the single-module-build into individual submodules or subprojects

Each of these workflows should be official, supported, provide a good user experience, and a smooth transition to the next level. That is very much not the case with the current setup of official/poor-UX Scala REPL, official/unsupported Scala Scripts, unofficial/good-UX Ammonite REPL/Scripts, and the messy transition from scripts to a proper SBT/Mill/etc. project.

This should make it much easier for people getting started with Scala:

Software engineers who are just learning Scala can easily start off with a clean, simple environment (REPL, scripts) without needing to worry about SBT configuration and other irrelevant things
When the time comes to upgrade their project, they can do so without needing to re-write their code, re-name all their files, and add irrelevant def main boilerplate.
Professional novices, who may use Scala incidentally but are not professional programmers, can rely on Scala to provide a good user experience for their small programs in the REPL or scripts without ever wanting, or needing, to learn the intricacies of doing things “properly” with a complex build tool like SBT.

Simplifying and stabilizing support for Scala programs of size 1-1000 lines would not just help professional software engineers getting started with Scala. There is a very large class of data scientists, analysts, system admins, devops, mathematicians, mechanical engineers, and others who would fall under this category: these are people who may spend the majority of their career in the 1-1000 line program phase, only occasionally (or even never) needing to break out a “real” build tool like SBT or Mill to do the heavy lifting.

If we agree that this is a place we want to get to, and that these user-facing properties of the Scala language will get us there, the question is: how?

How to Simplify Scala’s getting started experience

To recap, we want to be able to provide good support for Scala programs of varying sizes:

1-liners
1-10 liners
10-1000 liners
1000+ liners

We want people to have a good, supported, official experience using all of them, while having smooth transitions between sizes without needing to spend time uselessly swapping tools, renaming file extensions, or adding boilerplate.

One way we could get there is the following:

Officialize Ammonite as the way to write small Scala programs: amm supports 1-10 liners perfectly fine in the REPL, and supports scripts that go all the way up to 1000ish lines without issue. Transitioning between REPL and Scripts does not need a change of tools: simply copy-paste your REPL command into a file and run amm on that file.
- Despite ongoing improvements, the scala console and script runner are simply not as good as Ammonite for the majority of purposes, and that doesn’t seem like it’s going to change for the foreseeable future.
- There are currently a small number of places where Ammonite is inferior to the default Scala REPL. These can be trivially fixed with O(more-than-just-me) people helping out.
Support *.sc files in the project root as main-method entrypoints in SBT and Mill builds: allow *.sc entrypoints to use *.scala “library” files, but not vice versa, via a trivial desugaring that wraps the contents of a *.sc file in a standard def main(args: Array[String]): Unit method. The name of the wrapping object can take after the name of the file.
- The *.sc wrapping could take place either in the build tool, or in the compiler. The end user doesn’t care. The wrapping object could be named arbitrarily (name of file, possibly with package declaration, possibly mangled, or not) since there are no backwards compatibility concerns as *.sc entrypoints do not currently exist
- SBT would need to support top-level code in the root project as a “standard” way of getting started, rather than starting off people by putting things in a subproject
- Mill would need to support top-level modules with top-level code in the first place (it currently doesn’t)
As the program grows further, the developer can then break up their SBT/Mill build into subprojects/modules, according to their preferred code organization.
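
The *.sc desugaring described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ScriptWrapper and wrap are hypothetical names, not part of SBT, Mill, Ammonite, or the compiler, and the sketch ignores package declarations, name mangling, and magic imports such as import $file.

```scala
// Hypothetical sketch of how a build tool (or the compiler) might desugar a
// *.sc entrypoint. ScriptWrapper and wrap are illustrative names only.
object ScriptWrapper {
  /** Wraps the top-level statements of a script in an object with a standard
    * main method. The object name is taken from the file name, so "Hello.sc"
    * becomes "object Hello". */
  def wrap(fileName: String, body: String): String = {
    val objectName = fileName.stripSuffix(".sc")
    val header = Seq(
      s"object $objectName {",
      "  def main(args: Array[String]): Unit = {"
    )
    // Indent the user's code so the generated source stays readable.
    val indented = body.linesIterator.map(line => "    " + line).toSeq
    val footer = Seq("  }", "}")
    (header ++ indented ++ footer).mkString("\n")
  }
}
```

Calling ScriptWrapper.wrap("Hello.sc", source) on the two-line hello world script would yield an object Hello whose main method contains those two println calls, which is the boilerplate the user would otherwise write by hand.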

This is just one possible path to get to our desired goal: that any Scala programmer can use a minimal amount of tools as their program scales from a 1-line throwaway, to 10, 100, 1000+ line projects, and should be doable in a relatively short amount of time (1-2 months?). There are likely other paths that could get us there, and if you’re interested in the end goal, I encourage you to write up alternative approaches that can be compared and discussed.

Conclusion

Essentially, this would give us a world where someone getting started with Scala needs to know only two tools: amm, and either one of sbt or mill. They would be able to write programs of any size, from 1 to 10s to 100s to 1000s of lines, all in a well-supported, official, production-quality environment, without needing to constantly pick between tools which each have their own idiosyncrasies and problems. Especially for the “professional novice” class of potential Scala users, this would make it possible to use Scala effectively “in the small”, as a supplement to whatever their real job or profession is, without ever needing to become deep experts in Scala’s tooling. This should greatly help Scala’s adoption outside the current demographic of professional software engineers.

It turns out that this is not a lot of work: all these tools already exist, are well-used, stable, and well liked, and have existed for years. All we need to do is agree on the same subset of tools and then put in a minimal amount of effort to smooth out the rough edges and transitions between them. In exchange for that tiny amount of work, we could get a massive improvement to the Scala getting started experience and make it much easier to attract people, experts and non-experts alike, to grow the Scala community.

smarter · March 26, 2019, 10:57am

Thanks for writing this! I fully agree with the sentiment here, but have some different opinions on the specifics :).

Historically, externalizing our tooling hasn’t worked well for Scala (look at the state of scala-ide). Tools maintained outside of the compiler tend to have regressions (because they’re not part of the compiler CI), take time to catch up to new compiler versions, need to duplicate a lot of logic from the compiler, and sometimes just stop being maintained, leaving everyone scrambling for a replacement. This is why, for example, the Dotty Language Server and the zinc-specific compiler phases are maintained in the Dotty repo instead of somewhere else. The same reasoning applies to the REPL. The Dotty REPL:

Uses JLine3 to provide a nice cross-platform editing experience.
Has tests which are part of our CI, catching issues regularly.
Has syntax highlighting based on the tree that the compiler parser outputs, the implementation is thus both simple and robust, and when we do find syntax highlighting bugs (e.g. because the parser set the wrong position for some tree node), fixing them benefits every other part of the compiler that relies on our parser.
Has completions based on the same APIs we use for the Language Server (http://guillaume.martres.me/ide_paper.pdf and reveal.js talk about this a bit), the same code sharing benefits we get for reusing the parser apply.
Has a :doc command to display scaladoc/javadoc, again using APIs shared with the Language Server.

And there’s a lot more planned in the same spirit:

Now that we have top-level definitions, we can probably simplify some of the REPL logic concerned with wrapping user code into objects.
The Dotty Language Server already has a Worksheet mode based on .sc files. With top-level defs we can get rid of a lot of logic here too, we just need the compiler to understand that you can write top-level statements in .sc files.
Coursier is getting a pure Java API, which means the compiler can depend on it to support import $ivy:... in the REPL and .sc files.
I’ve had a student work on integrating the REPL with Jupyter. To make this reusable, it is based on the idea of having a “REPL Server Protocol” akin to the Language Server Protocol, allowing different frontends (terminal, Jupyter, worksheet) to the REPL.

I think there really is an opportunity for us to get great tooling in Scala 3, just by sharing as much code and infrastructure as possible between the tools and the compiler!

lihaoyi · March 26, 2019, 11:27am

Internalizing the tooling hasn’t worked well for Scala either. The presentation compiler is abandoned. The scala REPL is fossilized with small amounts of progress, and the scala script runner is abandoned. Of course, a simple solution to worrying about it going out of date is to help maintain it. Nobody is contractually obligated to commit only to scala/scala or lampepfl/dotty! Isn’t “community collaboration” something we want more of, vs. a “core team writes everything themselves” model?

I propose that what matters is whether a tool is maintained or not, not what git repo it happens to live in, and Ammonite has demonstrated its ability to be much more easily maintained than any of the scala/scala equivalents over the last few years. I do not see why code inlined in lampepfl/dotty would have a different fate.

And there’s a lot more planned in the same spirit:

Yes, you can write everything again, yourself, in-repo if you wish to. However, all this already works in Ammonite, and has worked well and without issue for 4 years now.

It seems you’re very excited about being able to write a bunch of code and implement a bunch of features in the compiler. I have the opposite perspective: the less code that needs to be written, the less code that needs to be coupled together, the better. Re-using Ammonite is very much a case of maximal re-use with minimal coupling, while re-writing everything again inside lampepfl/dotty is the opposite.

Of course, the last difference is that we could start using Ammonite yesterday, whereas standardizing on Dotty looks two years out most optimistically. All the other changes in the initial post would take O(1-2 months) of work. It also doesn’t need to be either/or: it’s entirely possible to get people to use Ammonite now, and then converge Ammonite and Dotty-REPL/Scripts over the next 2 years so when Dotty lands as Scala 3 it’s a smooth transition over to Dotty.

EDIT: rephrased some stuff

smarter · March 26, 2019, 11:42am

Yes, but community collaboration also means encouraging people to commit to scala/scala and lampepfl/dotty :).

True, Ammonite is awesome and I’m sure it has a bright future.

I’m much more excited about the code I’ll get to delete thanks to top-level defs than the code I’ll have to write, I hope there’s as little of the latter as possible :).

That sounds like a great plan! This is also my hope and why I’m trying to keep you in the loop every step of the way (like when I pinged you to get your opinion on the top-level defs PR), so we can plan these things in advance.

scalway · March 26, 2019, 11:48am

Using amm as the standard could be extremely nice, but it is a project that comes with a lot of assumptions that should be reconsidered before blessing it as the ‘standard’ way:

What to do with the special import $ivy.`org.scalaz::scalaz-core:7.2.7`? It is one of the most helpful and nice features of amm, but it resolves dependencies for us, which is a big thing for scripts. Should we stay with this notation?
What to do with the special import $file.*? It is powerful, but it does things in a totally different way from what we do right now in a standard Scala project. We could allow importing other *.sc files in that way, but normal *.scala classes should look similar to the current import foo.Bar. Maybe we should allow importing classes and assume that some folders (e.g. ./src/**.scala or even the whole ./**.scala) are scanned when an unknown import appears in a script. What libraries should be allowed in such *.scala files? We define them in our *.sc file. Maybe we should assume that we are using all libraries from the starting script? That means we need to recompile all *.scala files for every *.sc file, because each can use a different set of dependencies.
How to parse arguments? I love Ammonite’s way of doing it (see: https://ammonite.io/#ScriptArguments), and I hope to see this kind of API in a future version of Scala.
There is a lot more stuff in amm that is great, but once it becomes standard we will freeze it somehow.

Ammonite is great (I have been using it instead of the Scala REPL for a few years now) and I agree with Li that it is stable (for me even more so than the Scala REPL, which does not recognize the terminal on my machine due to some parsing error, so the arrow keys no longer work).

I agree that having all code near the compiler gives us some additional guarantees, but it also allows for higher coupling. Maybe we should just choose some of Ammonite’s features and try to standardize them, and slowly try to implement them in the scala/dotty REPL?

Jasper-M · March 26, 2019, 12:38pm

Question: does ammonite already work with the dotty compiler? If not, how hard would it be to make it work?

scalway · March 26, 2019, 12:56pm

~~rather not but Ammonite’s terminal module is being used as a base for the Dotty REPL~~

smarter · March 26, 2019, 1:02pm

This used to be the case but we replaced it with JLine3 to get Windows support: https://github.com/lampepfl/dotty/pull/4680

lihaoyi · March 26, 2019, 1:06pm

Does not currently work. Shouldn’t be too hard to make it work, but Dotty has been unstable so it hasn’t been worth trying to play compatibility games with such a fast moving target.

Ammonite’s scala compiler interface is pretty well encapsulated; most of the codebase just deals with strings and/or classfiles, so swapping in the dotty compiler should be a pretty small diff.

jducoeur · March 26, 2019, 1:29pm

In general I like the analysis, but have to note that it’s rather command-line centric. In practice, an awful lot of the community doesn’t come in via that route at all – instead, they come in via IntelliJ.

Now, I think IntelliJ’s current support for this stuff is fairly awful: I’ve never been able to get their worksheets to work right. But the model seems reasonable and desirable. So whatever solution we arrive at, I don’t think it should be specific to command-line tools. It should also support having the 1-1000 range done directly as worksheets in an IDE. (Whether that be IntelliJ, VSCode, or whatever.)

RichType · March 26, 2019, 1:40pm

Can’t Windows be handled through bash for Windows?

lihaoyi · March 26, 2019, 1:42pm

Maybe, but it’s a moot point since Ammonite supports Windows through JLine3 as well.

smarter · March 26, 2019, 1:43pm

If you can get it to compile then we’d be very happy to add it to our community build to make sure it doesn’t regress.

smarter · March 26, 2019, 1:55pm

(Same for anyone’s library by the way, we’re desperate for more projects in our community build!)

sjrd · March 26, 2019, 2:25pm

How would Ammonite’s import $file and import $ivy work in a context where .sc files and .scala coexist? What if an sbt or Mill build is supposed to deal with such files in a larger codebase with .scala files? Is the build tool going to hijack the import $ivys? The compiler itself is never going to resolve $ivy imports, and will probably never deal with $file imports either. So we can’t just delegate the treatment of .sc files to scalac/dotc with a simple wrapping, at least not with those magic imports available.

I’m concerned that this jeopardizes the smooth transition that you are calling for at the time of the transition between .sc-alone and .scala files/build tools thrown into the mix.

smarter · March 26, 2019, 2:27pm

I think it could, through coursier/interface (a lightweight coursier API), as I mentioned above.

sjrd · March 26, 2019, 2:40pm

I don’t think it should. I really don’t think the compiler should have anything to do with resolving things from the Internet. That’s not the compiler’s job IMO.

smarter · March 26, 2019, 2:52pm

No, it’s coursier’s job, but I don’t think it really matters who calls coursier.

Ichoran · March 26, 2019, 2:55pm

I agree, but the compiler should help integrate with external tooling, so it’s not very hard for something to call it and recover line numbers in the original source rather than in whatever preprocessed thing was sent to the compiler.

If the compiler is a service, your tooling can do things that are important but not-actually-compilation. (Things are going this way already for IDE integration, but I haven’t looked into that enough to know whether it’s adequate. The demands of IDEs are somewhat different from the demands of preprocessors.)

lihaoyi · March 26, 2019, 3:10pm

I think forcing someone to move the magic import $ivy metadata into the build file config is an acceptable amount of overhead for switching to a build tool. Definitely not 0 overhead, but not awful. Presumably you want to configure things using your build tool, otherwise you wouldn’t be using a build tool in the first place!
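
To give a sense of how mechanical that move could be, here is a minimal sketch that pulls the coordinates out of import $ivy lines so they can be pasted into a build file. IvyImports and coordinates are hypothetical names used only for illustration, not Ammonite, SBT, or Mill APIs, and the regex assumes the simple single-coordinate backtick form of the import.

```scala
// Hypothetical helper; IvyImports and coordinates are illustrative names only.
object IvyImports {
  // Matches lines such as: import $ivy.`org.scalaz::scalaz-core:7.2.7`
  // and captures the coordinate between the backticks.
  private val IvyImport = """import\s+\$ivy\.`([^`]+)`""".r

  /** Returns every dependency coordinate declared via import $ivy in a script,
    * in the order in which they appear. */
  def coordinates(script: String): List[String] =
    IvyImport.findAllMatchIn(script).map(_.group(1)).toList
}
```

The resulting strings would still need to be translated by hand into the dependency syntax of the chosen build tool.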

On the subject of import $file, perhaps the way forward is to teach Ammonite how to deal with *.scala files, and make that the standard going forward for imported libraries: *.sc would then be relegated to entrypoint scripts only. The old importing of *.sc can remain for backwards compatibility.

I agree with the “compiler shouldn’t do all these random things” argument. The compiler focusing on compiling, and Ammonite focusing on all the peripheral concerns, has worked out much better than scala.tools.nsc.interpreter where it’s all lumped together in a highly coupled monolith.

It is already easy enough to fix line numbers after you’re done wrapping/mangling code before feeding it into the compiler, so I don’t think that’s a good argument for the compiler needing to get any smarter.
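
As a rough sketch of that line-number fix-up, assume the wrapper adds a fixed number of lines before the user’s code (two here: the object line and the def main line). LineMapper, prefixLines, and toOriginal are hypothetical names for illustration; this is not how Ammonite actually implements it.

```scala
// Hypothetical sketch; LineMapper and its members are illustrative names only.
object LineMapper {
  // Assumed number of lines the wrapper inserts before the user's code,
  // e.g. "object Hello {" and "def main(args: Array[String]): Unit = {".
  val prefixLines = 2

  /** Maps a 1-based line number reported against the wrapped source back to
    * the original script, or None if the line belongs to the wrapper itself. */
  def toOriginal(wrappedLine: Int, scriptLineCount: Int): Option[Int] = {
    val original = wrappedLine - prefixLines
    if (original >= 1 && original <= scriptLineCount) Some(original) else None
  }
}
```

With a two-line script, an error reported on wrapped line 3 maps back to script line 1, while lines 1, 2, 5, and 6 belong to the wrapper and map to None.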