Pre SIP: scala-cli as new scala command

bjornregnell · February 28, 2023, 5:26pm

Many thanks @lihaoyi for taking the time to summarize different annotation syntax variants! Several of them are new to me, so I feel enlightened

There are differences in on-boarding different kinds of groups, comparing e.g. seasoned developers unfamiliar with Scala and complete beginner software engineering students. A seasoned developer probably knows that there are many config formats and looks up docs, searches official getting-started pages, or tries tool auto-completion etc., and I think they will figure out how to write what they want - but a complete beginner might have more trouble.

Having a to-the-point explicit syntax of what is going on that reads well is good for both experienced developers and beginners. So I think we should be inspired by existing config approaches but not hesitate to make something that we feel is most to-the-point and intuitive when read by humans. And I actually think our current scheme (or perhaps use instead of using) reads very well compared to all the many various different schemes you have so nicely summarized.

AMatveev · February 28, 2023, 7:07pm

I would suggest adding an info string to minimize conflicts:

/***using
file: utils.scala
dep: org.scalatest::scalatest:3.2.10
jvm: 11
***/

IMHO it would be a much more familiar format.

sideeffffect · February 28, 2023, 11:01pm

If I compare

//> using file "utils.scala"
//> using dep "org.scalatest::scalatest:3.2.10"
//> using jvm "11"

with

/***
file: utils.scala
dep: org.scalatest::scalatest:3.2.10
jvm: 11
***/

they both seem the same to me from the to-the-point and readability point of view. And I suspect novices will find that too, because I don’t think there’s anything about humans as a species that makes them prefer the sequence of characters //> using
And where all things seem equal, I prefer to pick the solution that more resembles already established conventions. In this case I would prefer “Front-matter style” of those presented by @lihaoyi.

Ichoran · February 28, 2023, 11:40pm

But you might be familiar with it. I’m not, and I find it awkwardly wordy and weirdly nonstandard.

The color I like best for the bikeshed, so far, is @AMatveev 's:

/***using
file: Example.scala
jvm: 17
***/

I am moderately in agreement with @lihaoyi that the format should be a known one. Literally just YAML, if we like this syntax. I hesitate because YAML has a lot of features, and if we certainly don’t need them, then having all that complexity probably does more harm than good.

If we’re sure that everything needs to only be one line, unquoted, and every line is a field name followed by a colon and then text, then I guess it’s okay as it is, because the parser is a one-liner:

lines.map { x => val i = x.indexOf(':'); if i < 0 then (x, "") else (x.take(i), x.drop(i + 1).trim) }
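As a minimal, runnable sketch of the one-liner above (Scala 3 syntax): the names `parseLine` and `parseBlock` are hypothetical, and trimming the key is an extra assumption not present in the original snippet. Splitting on the first colon only means that values such as dependency coordinates can contain colons themselves.

```scala
// Split one "key: value" line on the FIRST colon only, so the value
// may itself contain colons (e.g. dependency coordinates).
def parseLine(line: String): (String, String) =
  val i = line.indexOf(':')
  if i < 0 then (line.trim, "")                    // no colon: key with empty value
  else (line.take(i).trim, line.drop(i + 1).trim)

// Hypothetical helper: parse the body of a fenced block, skipping blank lines.
def parseBlock(body: String): List[(String, String)] =
  body.linesIterator.map(_.trim).filter(_.nonEmpty).map(parseLine).toList

// Usage with the fields from the examples earlier in the thread.
val fields = parseBlock(
  """file: utils.scala
    |dep: org.scalatest::scalatest:3.2.10
    |jvm: 11""".stripMargin
)
```

Here `fields` holds three pairs, and the `dep` value keeps its inner colons intact. Quoting, escaping, and multi-line values are deliberately out of scope for this sketch.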

sideeffffect · March 1, 2023, 12:18am

It can be just a trivial subset of YAML, right?

Ichoran · March 1, 2023, 1:17am

I don’t see much value in declaring something to be an ad-hoc subset of YAML. You pick up potentially annoying syntactic constraints (like not being able to use braces or brackets), and because it’s a subset, you can’t safely use a YAML writer to produce it.

lihaoyi · March 1, 2023, 3:11am

For the syntax, the final decision is up to you guys and the SIP committee. I’m just here to provide feedback, speaking for the few hundred Scala developers I work with professionally. These are folks with years-decades of experience programming, but not all of it spent in Scala, who would find the proposed syntax a lot weirder than the alternatives discussed. using, in particular, is a keyword/syntax unique to Scala 3, that appears almost nowhere else in the broader programming community. That makes it an unnecessary stumbling block for anyone who is not already a Scala 3 enthusiast.

One more thing I’d like everyone to consider is use cases beyond Scala-CLI. There are tons of reasons why someone may want to associate metadata with a source file:

Turning on/off linters and static analysis: in Scala you have // scalastyle:off <rule id>, import acyclic.file, // format: off, @nowarn, while in other languages you have Ruby’s Sorbet # typed: true or Python’s MyPy # type: ignore . Some of these may need to continue having a comment syntax option to allow use in sub-sections of a file, but for many use cases people want to configure these things on a file-by-file basis as a convenient default. This is especially important for large codebases where linters or static analyses need to be rolled out incrementally over a codebase.
Authorship/ownership/copyright metadata. Not just the standard boilerplate copyright notices, but also things that could tell a CI system "this file belongs to person-XXX/team-YYY, ensure he/they approve it before allowing a change to be merged". Similar to how OWNERS/CODEOWNERS files are used today in some places.
Source file selection for platforms: currently people who do cross-version or cross-platform builds do this awkward thing where they configure the build tool to select different source folders based on what scala version/platform they target, resulting in source files being scattered over 12 different folders {2.11,2.12,2.13,3.x}x{jvm,js,native} or more (e.g. consider Ammonite’s 20 different source folders). We could imagine a world where these files could live side-by-side in the same folder, and a build tool would select the files for each platform based on the file metadata statement

Not all of these are obviously good ideas, or exist right now, but they could be good ideas or they may exist in future. I just want to make sure that if we’re discussing a standard way of annotating files in the Scala language with metadata, it’s considered in a broad perspective and not over-fit to any specific tool or use case, e.g. Scala-CLI’s scripts where using might make more sense. Standards and language features tend to out-live implementations, and people may be benefitting (or suffering) from whatever format we choose long after Scala-CLI no longer exists.

mushtaq · March 1, 2023, 4:51am

With multiple files to compile, scala-cli recommends to put the directives in a single project.scala file.

project.scala file then contains only comments with directives. It is a little odd to explain why a .scala file does not contain any Scala.

If we choose YAML, it will simply be a project.yaml, easier to explain.

tgodzik · March 1, 2023, 8:42am

Thank you all for the answers!

If we choose YAML, it will simply be a project.yaml, easier to explain.

We are now starting to veer into build tool territory, which we certainly wanted to avoid at this point. project.scala can contain code, so it’s not strictly the same issue, but I do understand why people might want a different format here.

I just want to make sure that if we’re discussing a standard way of annotating files in the Scala language with metadata

I was worried that the discussion would turn into this and I would rather see this as a separate SIP. At this point we were finishing most of the issues raised for ScalaCLI, and I feel this could postpone the SIP acceptance for months, and I believe in the benefits of introducing the ScalaCLI SIP.

lihaoyi · March 1, 2023, 9:13am

Sure, in the end that’s up to the SIP committee. I can just provide my input, but I’m not the one with any part in the deciding

To me, it does make sense to think of the future up front:

Assuming ScalaCLI does well, we can expect it to be heavily used for the next decade+. Consider the age of tools like SBT, or the current scala launcher.
We can thus expect the syntax that ScalaCLI provides to be “locked in”, with dozens of separate groups building support for it: intellij, vscode, scalastyle, scalafmt, maybe SBT and Mill and Ammonite, and tens of thousands of scripts scattered in proprietary codebases
If ScalaCLI is successful, we can also expect that many people will both be moving to ScalaCLI from other languages, “graduating” from ScalaCLI to larger projects, or scaling down from larger projects to smaller scripts in ScalaCLI.
It makes sense to try and make those transitions as smooth as possible. That means maximizing familiarity, following existing standards and conventions where possible, and ensuring that any additional concepts we make people learn are transferrable across these environments
Given the long time scales involved, spending a bit of time thinking up front about how the proposed syntax would scale to various use cases and appear to various parties would pay dividends years or a decade down the road.

If we just wanted a lightweight script launcher with ad-hoc syntax and uncertain stability guarantees, people can already use Ammonite today, and ScalaCLI is already available for use. But presumably the point of “standardising” on ScalaCLI is so the file format and syntax can grow to be more than just “special syntax used by one specific tool”

Again, I’m just a voice on the internet, and have no power to approve or reject the proposal. I’m just providing feedback given my experience in the Scala community, how I expect things to play out, and how we can maximize the expected long-term return on this project’s investment.

sjrd · March 1, 2023, 9:29am

A lot of these arguments should ideally have been brought up earlier in the process. In fact, the main reason this surfaces again is because of one major observation: the current syntax was designed with the expectation that it would not be comments, but rather actual Scala syntax. For example,

//> using scala "3.2.2"

@main def hello(): Unit =
  println("hello")

was initially meant to become

using scala "3.2.2"

@main def hello(): Unit =
  println("hello")

The commented form was designed to be a transition stop gap to allow experimentation without breaking IDEs while doing so.

That assumption led to at least two decisions:

Reusing an existing keyword, namely using (although implicit was decomposed into given, using and Conversion, somehow the winning argument was to merge using with config )
Use the syntax of Scala literals for the values of settings, for example using scala "3.2.2" instead of using scala 3.2.2.

The second point precluded reusing existing config languages like YAML. We wanted Scala syntax as a design goal.

That said, eventually the majority opinion was to keep using the comment-based syntax, rather than integrating the config into the language. But at that time, we did not reconsider the two points above, although their driving motivation had disappeared.

This brings us to this discussion, where we have, perhaps belatedly, realized that we should at least reconsider the syntax choice.

After this recontextualization, here is my opinion. Given that we are not bound by following the spirit of the Scala syntax anymore, I think the two points above have become moot.

I think we should explicitly avoid the using keyword (in fact I had already argued that before, without success) so that searchability gets improved, both for the config keyword and the contextual abstraction keyword. implicit was decomposed into 2 keywords and 1 named trait, although it was very defensible to say “but these three concepts are about terms that the compiler implicitly fills in for us”. In order to be consistent with the reasoning (and complaints) that led to that decision, we should also separate the config keyword from the contextual abstraction keyword.

Regarding the syntax, I had not in fact (re-)thought about using existing config formats like YAML before this discussion. The comments made before mine make a good case for reusing something that exists.

tgodzik · March 1, 2023, 10:03am

I would strongly opt against YAML, it’s far too complex for what we want to achieve here. We only need key-value configuration; anything more than that would be overkill. And key-value is also a well-known format.

I think dropping using is not an issue for us at all. I don’t agree that it’s a problem, but it’s not difficult to drop it.

I would however stay with the //> syntax since that gives the tooling the ability to highlight those comments differently and it makes it very simple to add automatic completions. Perhaps with /*** comments that would work also, but it’s not really a dramatic improvement over what we have currently.

Do developers know of that /*** syntax widely though? As mentioned every tool seems to solve it differently, and in each one users don’t seem to have issues using it. All of them have one thing in common, they are simple and familiarity might help us a bit, but only with the subset of users this is familiar to.

EDIT: Last argument from me is that while ScalaCLI has been in an experimental phase it was already used by a large number of people. Last release had over 10k downloads and the previous one 29k. So this is already widely used and changing the syntax completely will not be without cost. All our docs, all user examples would need to be changed. So this ship, in my opinion, has sailed already.

AMatveev · March 1, 2023, 10:33am

It is an important question, how to know what it is. I sometimes have to puzzle out what to do with a project on GitHub and how it can be started.
And the following is much easier to google:

/***scala-cli
file: Example.scala
***/

It happens.
Thanks for great work!

bishabosha · March 1, 2023, 11:10am

To weigh in, I think we should avoid requiring a multi-line syntax, it seems like a lot of boilerplate to require two extra lines just to get started.

Also it makes it more annoying to e.g. copy-paste an “add this dependency foo:bar:baz” string, because now the user has to also manually ensure to surround that with the appropriate escape, e.g. today with scaladex the user just copies //> using lib "foo:bar:baz" directly into the code and it works.

A single line per config seems to have the simplest barrier to entry.

sideeffffect · March 1, 2023, 12:50pm

I think that’s a good point. But why

//> using lib "foo:bar:baz"

specifically? Why not

//> lib "foo:bar:baz"

or even

//> lib foo:bar:baz

or

//> lib: foo:bar:baz

or

// @lib foo:bar:baz

?

lihaoyi · March 1, 2023, 1:16pm

I agree that one line per entry is a good thing, so that rules out the “fenced” style. However, I must say I look at the alternatives that @sideeffffect has listed, and every single one looks far better than the //> using syntax being proposed. I can speak for all my colleagues on this as well, most of whom are Scala-developers-but-not-enthusiasts.

I can believe that we need “just key value pairs”, but the syntax given seems much more complex than that, to the extent that it does seem worth picking something existing off the shelf that people are familiar with.

Following the SIP process is one thing, and I agree there is value in processes, but in this case the final proposed syntax is just so ugly that I have to speak up. //> using is really the worst of all worlds:

It has a weird symbolic “operator” prefix
Followed by a Scala keyword that is both obscure (to the broader programming community), and out-of-context with an overloaded meaning, and a keyword we just broke off from another keyword to avoid overloading! In fact, we spent a huge amount of effort just recently in Scala 3 removing overloaded meanings for syntax, e.g. splitting up _ to separate *, ?, removing postfix _, etc.
Followed by a non-standard, not well-specified data format. Seems to be more than just key value pairs if we have to start quoting things. What characters need to be quoted? Is whitespace significant? If we can quote things, can we escape quotes within the quoted sections? Can it contain \ns and other similar escape sequences? Is unicode allowed? Why are some things quoted with double-quotes and some by backticks? Why are some things camelCase and some things kebab-case? How come square brackets and pipes are allowed in //> using target ["test"|"main"]? Does it parse like Bash to "[test|main]", or mean something else entirely? I know a lot of languages and at a glance it’s not even clear how I should lex these things. The only languages I can think of that allow syntax like foo "bar", "baz" are CoffeeScript and Ruby, languages whose syntax we really should not be following!

I mean, people rightfully hate on operator overuse, but this is worse. Consider a classic problem operator foo <++= bar: would it be better or worse if we instead wrote it as foo <++= using bar?

Yes, there are a lot of people using ScalaCLI now. But no, changing the syntax does not need to cause breakage: we can leave the old syntax deprecated for compat and push whatever new syntax we desire, and all existing scripts will continue to work.
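To make the compatibility argument concrete, here is a rough sketch (Scala 3 syntax) of a parser that accepts both spellings and merely flags the older one. The `Directive` type, `parseDirective`, and the choice of `//> lib foo:bar:baz` as the "new" form are all assumptions for illustration; the thread has not settled on a syntax, and this is not how Scala CLI is implemented.

```scala
// Hypothetical result type: key, value, and whether the legacy spelling was used.
final case class Directive(key: String, value: String, deprecated: Boolean)

// Strip one pair of surrounding double quotes, if present.
def unquote(v: String): String =
  if v.length >= 2 && v.startsWith("\"") && v.endsWith("\"") then v.substring(1, v.length - 1)
  else v

// Accepts `//> using lib "foo:bar:baz"` (legacy, flagged as deprecated)
// as well as `//> lib foo:bar:baz` (an assumed simplified form).
def parseDirective(line: String): Option[Directive] =
  val trimmed = line.trim
  if !trimmed.startsWith("//>") then None          // not a directive comment
  else
    val rest   = trimmed.stripPrefix("//>").trim
    val legacy = rest.startsWith("using ")
    val body   = if legacy then rest.stripPrefix("using ").trim else rest
    body.split("\\s+", 2) match
      case Array(k, v)            => Some(Directive(k, unquote(v.trim), legacy))
      case Array(k) if k.nonEmpty => Some(Directive(k, "", legacy))
      case _                      => None          // empty directive body
```

A tool built this way could keep running existing scripts unchanged and emit a warning whenever `deprecated` is true.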

Lastly, if we assume that ScalaCLI as the default launcher for Scala will be successful (which I think we all do!) then we have to assume there will be 10x-100x more people using it in future than there are today.

Now is indeed the most painful time to make changes compared to any time in the past. But it is also the least painful time to make changes compared to any time in the foreseeable future. We should not give up such an opportunity for change before things really become hard to change.

jducoeur · March 1, 2023, 1:38pm

From the peanut gallery, a strong +1 here. YAML is enormously complex, and full of weird footguns that sometimes snare even experienced YAML users. Let’s please not go down that road.

lihaoyi · March 1, 2023, 1:49pm

On the topic of YAML, one thing I’d like to raise is we would not need to support all of it: even a YAML subset has most of the benefits of YAML (human-readability, tool-readability, familiarity, standardization) without the more obscure !!footguns that trip people up. We’d have to write our own parser, but for a subset that’s not particularly hard.

This is akin to using a markdown dialect for scaladoc. Even without “full” markdown, and maybe not 100% compatible with the other dialects out there, it’s still a lot better than coming up with our own ad-hoc scaladoc language

This is the approach a lot of frameworks take:

tgodzik · March 1, 2023, 2:20pm

I think we could simplify it to:

//> lib foo:bar:baz

without much further work, but I would opt to stay with //>. I don’t think it’s the same issue that symbolic operators face just because it’s a >. Using /*** would be exactly the same issue and if we have normal comments then there is no sensible way to highlight it and it would become unreadable most of the time.

Compare:

[Screenshot from 2023-03-01 15-18-12]

with:

[Screenshot from 2023-03-01 15-18-29]

The quotes are probably a leftover from the previous approach and are no longer relevant.
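The tooling point can be illustrated with a small sketch (Scala 3 syntax): because directives share the distinctive `//>` prefix, a highlighter or completion provider can locate them with a simple prefix check, leaving ordinary comments alone. The function name `directiveLines` and the sample source are hypothetical, and real editor integrations will certainly differ.

```scala
// Return (1-based line number, directive text) for every `//>` line.
// Ordinary `//` comments and code lines are ignored.
def directiveLines(source: String): List[(Int, String)] =
  source.linesIterator.zipWithIndex.collect {
    case (line, idx) if line.trim.startsWith("//>") =>
      (idx + 1, line.trim.stripPrefix("//>").trim)
  }.toList

// Usage on a made-up three-line source file.
val example =
  """//> lib foo:bar:baz
    |// an ordinary comment
    |val answer = 42""".stripMargin

val found = directiveLines(example)
```

Only the first line of `example` is reported, which is the property a highlighter would rely on.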

tgodzik · March 1, 2023, 2:31pm

On the topic of YAML, one thing I’d like to raise is we would not need to support all of it: even a YAML subset has most of the benefits of YAML (human-readability, tool-readability, familiarity, standardization) without the more obscure !!footguns that trip people up. We’d have to write our own parser, but for a subset that’s not particularly hard.

YAML would require having a multiline comment, which as @bishabosha mentioned is not ideal for us. Besides, it’s way too much for our needs and YAML is notoriously hard to write. Easy to read for sure, hard to write. And if we only need key: value that’s not really YAML anymore in my opinion. And having : would be super hard for dependencies:

//> lib: foo:bar:baz

that’s even more colons now. And that colon is not needed.