Pre SIP: scala-cli as new scala command

Many thanks @lihaoyi for taking the time to summarize different annotation syntax variants! Several of them are new to me, so I feel enlightened :slight_smile:

There are differences in on-boarding different kinds of groups, comparing e.g. seasoned developers unfamiliar with Scala and complete software engineering beginner students. A seasoned developer probably knows that there are many config formats and will look up docs, search official getting-started pages, or try tool auto-completion, and I think they will figure out how to write what they want. A complete beginner, however, might have more trouble.

Having a to-the-point, explicit syntax that reads well and shows what is going on is good for both experienced developers and beginners. So I think we should be inspired by existing config approaches but not hesitate to make something that we feel is most to-the-point and intuitive when read by humans. And I actually think our current scheme (or perhaps use instead of using) reads very well compared to the many different schemes you have so nicely summarized.


I would suggest adding an info string to minimize conflicts:

/***using
file: utils.scala
dep: org.scalatest::scalatest:3.2.10
jvm: 11
***/

IMHO, it would be a much more familiar format :wink:


If I compare

//> using file "utils.scala"
//> using dep "org.scalatest::scalatest:3.2.10"
//> using jvm "11"

with

/***
file: utils.scala
dep: org.scalatest::scalatest:3.2.10
jvm: 11
***/

they both seem the same to me from the to-the-point and readability point of view. And I suspect novices will find that too, because I don’t think there’s anything about humans as a species that makes them prefer the sequence of characters //> using.
And where all things seem equal, I prefer to pick the solution that more closely resembles already established conventions. In this case I would prefer the “Front-matter style” of those presented by @lihaoyi.

But you might be familiar with it. I’m not, and I find it awkwardly wordy and weirdly nonstandard.

The color I like best for the bikeshed, so far, is @AMatveev 's:

/***using
file: Example.scala
jvm: 17
***/

I am moderately in agreement with @lihaoyi that the format should be a known one. Literally just YAML, if we like this syntax. I hesitate because YAML has a lot of features, and if we certainly don’t need them, then having all that complexity probably does more harm than good.

If we’re sure that everything needs to only be one line, unquoted, and every line is a field name followed by a colon and then text, then I guess it’s okay as it is, because the parser is a one-liner:

lines.map { x => val i = x.indexOf(':'); if i < 0 then (x, "") else (x.take(i), x.drop(i + 1).trim) }
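
For illustration, here is a minimal sketch of how that one-liner might extend to a whole fenced block. It assumes the /***using ... ***/ markers and the field names from the examples in this thread; parseFenced and example are hypothetical names, and nothing here reflects scala-cli's actual parser.

```scala
// Hypothetical helper: extracts key/value pairs from a `/***using ... ***/` block.
// The fence markers are taken from the examples in this thread, not from any real tool.
def parseFenced(source: String): List[(String, String)] =
  source.linesIterator
    .map(_.trim)
    .dropWhile(_ != "/***using") // skip everything before the opening fence
    .drop(1)                     // drop the opening fence itself
    .takeWhile(_ != "***/")      // stop at the closing fence
    .filter(_.nonEmpty)          // ignore blank lines inside the block
    .map { line =>
      val i = line.indexOf(':')  // split on the first colon only
      if i < 0 then (line, "") else (line.take(i).trim, line.drop(i + 1).trim)
    }
    .toList

// Sample input: a fenced header followed by ordinary Scala code.
val example =
  """/***using
    |file: utils.scala
    |dep: org.scalatest::scalatest:3.2.10
    |jvm: 11
    |***/
    |@main def hello(): Unit = println("hello")
    |""".stripMargin
```

Splitting on the first colon only is what keeps a value such as org.scalatest::scalatest:3.2.10 intact. A file without the opening fence simply yields an empty list.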

It can be just a trivial subset of YAML, right?

I don’t see much value in declaring something to be an ad-hoc subset of YAML. You pick up potentially annoying syntactic constraints (like not being able to use braces or brackets), and because it’s a subset, you can’t safely use a YAML writer to produce it.

For the syntax, the final decision is up to you guys and the SIP committee. I’m just here to provide feedback, speaking for the few hundred Scala developers I work with professionally. These are folks with years to decades of programming experience, but not all of it spent in Scala, who would find the proposed syntax a lot weirder than the alternatives discussed. using, in particular, is a keyword/syntax unique to Scala 3 that appears almost nowhere else in the broader programming community. That makes it an unnecessary stumbling block for anyone who is not already a Scala 3 enthusiast.

One more thing I’d like everyone to consider is use cases beyond Scala-CLI. There are tons of reasons why someone may want to associate metadata with a source file:

  1. Turning on/off linters and static analysis: in Scala you have // scalastyle:off <rule id>, import acyclic.file, // format: off, @nowarn, while in other languages you have Ruby’s Sorbet # typed: true or Python’s MyPy # type: ignore. Some of these may need to continue having a comment syntax option to allow use in sub-sections of a file, but for many use cases people want to configure these things on a file-by-file basis as a convenient default. This is especially important for large codebases where linters or static analyses need to be rolled out incrementally over a codebase.

  2. Authorship/ownership/copyright metadata. Not just the standard boilerplate copyright notices, but also things that could tell a CI system "this file belongs to person-XXX/team-YYY, ensure he/they approve it before allowing a change to be merged". This is similar to how OWNERS/CODEOWNERS files are used today in some places.

  3. Source file selection for platforms: currently people who do cross-version or cross-platform builds do this awkward thing where they configure the build tool to select different source folders based on what Scala version/platform they target, resulting in source files being scattered over 12 different folders {2.11,2.12,2.13,3.x}x{jvm,js,native} or more (e.g. consider Ammonite’s 20 different source folders). We could imagine a world where these files could live side-by-side in the same folder, and a build tool would select the files for each platform based on the file metadata statement.

Not all of these are obviously good ideas, or exist right now, but they could be good ideas or they may exist in future. I just want to make sure that if we’re discussing a standard way of annotating files in the Scala language with metadata, it’s considered from a broad perspective and not over-fit to any specific tool or use case, e.g. Scala-CLI’s scripts, where using might make more sense. Standards and language features tend to out-live implementations, and people may be benefitting (or suffering) from whatever format we choose long after Scala-CLI no longer exists.


With multiple files to compile, scala-cli recommends putting the directives in a single project.scala file.

The project.scala file then contains only comments with directives. It is a little odd to explain why a .scala file does not contain any Scala.

If we choose YAML, it will simply be a project.yaml, easier to explain.

Thank you all for the answers!

If we choose YAML, it will simply be a project.yaml, easier to explain.

We are now starting to veer into build tool territory, which we certainly wanted to avoid at this point. project.scala can contain code, so it’s not strictly the same issue, but I do understand why people might want a different format here.

I just want to make sure that if we’re discussing a standard way of annotating files in the Scala language with metadata

I was worried that the discussion would turn into this, and I would rather see this as a separate SIP. At this point we were finishing most of the issues raised for ScalaCLI, and I feel this could postpone the SIP acceptance for months, while I believe in the benefits of introducing the ScalaCLI SIP.

Sure, in the end that’s up to the SIP committee. I can just provide my input, but I’m not the one with any part in the deciding.

To me, it does make sense to think of the future up front:

  • Assuming ScalaCLI does well, we can expect it to be heavily used for the next decade+. Consider the age of tools like SBT, or the current scala launcher.

  • We can thus expect the syntax that ScalaCLI provides to be “locked in”, with dozens of separate groups building support for it: intellij, vscode, scalastyle, scalafmt, maybe SBT and Mill and Ammonite, and tens of thousands of scripts scattered in proprietary codebases.

  • If ScalaCLI is successful, we can also expect that many people will be moving to ScalaCLI from other languages, “graduating” from ScalaCLI to larger projects, or scaling down from larger projects to smaller scripts in ScalaCLI.

  • It makes sense to try and make those transitions as smooth as possible. That means maximizing familiarity, following existing standards and conventions where possible, and ensuring that any additional concepts we make people learn are transferrable across these environments

  • Given the long time scales involved, spending a bit of time thinking up front about how the proposed syntax would scale to various use cases and appear to various parties would pay dividends years or a decade down the road.

If we just wanted a lightweight script launcher with ad-hoc syntax and uncertain stability guarantees, people can already use Ammonite today, and ScalaCLI is already available for use. But presumably the point of “standardising” on ScalaCLI is so the file format and syntax can grow to be more than just “special syntax used by one specific tool”.

Again, I’m just a voice on the internet, and have no power to approve or reject the proposal. I’m just providing feedback given my experience in the Scala community, how I expect things to play out, and how we can maximize the expected long-term return on this project’s investment.


A lot of these arguments should ideally have been brought up earlier in the process. In fact, the main reason this surfaces again is because of one major observation: the current syntax was designed with the expectation that it would not be comments, but rather actual Scala syntax. For example,

//> using scala "3.2.2"

@main def hello(): Unit =
  println("hello")

was initially meant to become

using scala "3.2.2"

@main def hello(): Unit =
  println("hello")

The commented form was designed to be a transition stop gap to allow experimentation without breaking IDEs while doing so.

That assumption led to at least two decisions:

  • Reusing an existing keyword, namely using (although implicit was decomposed into given, using and Conversion, somehow the winning argument was to merge using with config :man_shrugging: )
  • Use the syntax of Scala literals for the values of settings, for example using scala "3.2.2" instead of using scala 3.2.2.

The second point precluded reusing existing config languages like YAML. We wanted Scala syntax as a design goal.

That said, eventually the majority opinion was to keep using the comment-based syntax, rather than integrating the config into the language. But at that time, we did not reconsider the two points above, although their driving motivation had disappeared.

This brings us to this discussion, where we have, perhaps belatedly, realized that we should at least reconsider the syntax choice.


After this recontextualization, here is my opinion. Given that we are not bound by following the spirit of the Scala syntax anymore, I think the two points above have become moot.

I think we should explicitly avoid the using keyword (in fact I had already argued that before, without success) so that searchability gets improved, both for the config keyword and the contextual abstraction keyword. implicit was decomposed into 2 keywords and 1 named trait, although it was very defensible to say “but these three concepts are about terms that the compiler implicitly fills in for us”. In order to be consistent with the reasoning (and complaints) that led to that decision, we should also separate the config keyword from the contextual abstraction keyword.

Regarding the syntax, I had not in fact (re-)thought about using existing config formats like YAML before this discussion. The comments made before mine make a good case for reusing something that exists.


I would strongly opt against YAML, it’s far too complex for what we want to achieve here. We only need key-value configuration, anything more than that will be overkill. And key-value is also a well known format.

I think dropping using is not an issue for us at all. I don’t agree that it’s a problem, but it’s not difficult to drop it.

I would however stay with the //> syntax since that gives the tooling the ability to highlight those comments differently and it makes it very simple to add automatic completions. Perhaps that would also work with /*** comments, but it’s not really a dramatic improvement over what we have currently.

Do developers know of that /*** syntax widely though? As mentioned, every tool seems to solve it differently, and in each one users don’t seem to have issues using it. All of them have one thing in common: they are simple. Familiarity might help us a bit, but only with the subset of users who already know the format.

EDIT: Last argument from me is that because ScalaCLI has been in an experimental phase it was already used by a huge number of people. Last release had over 10k downloads and the previous one 29k. So this is already widely used and changing the syntax completely will not be without cost. All our docs, all user examples would need to be changed. So this ship, in my opinion, has sailed already.


It is an important question: how do you know what it is? I sometimes have to puzzle out what to do with a project on GitHub and how it can be started.
And the following is much easier to google:

/***scala-cli
file: Example.scala
***/

It happens.
Thanks for the great work!

To weigh in, I think we should avoid requiring a multi-line syntax, it seems like a lot of boilerplate to require two extra lines just to get started.

Also it makes it more annoying to e.g. copy-paste an “add this dependency foo:bar:baz” string, because now the user has to also manually ensure to surround that with the appropriate escape, e.g. today with scaladex the user just copies //> using lib "foo:bar:baz" directly into the code and it works.

A single line per config seems to have the simplest barrier to entry.


I think that’s a good point. But why

//> using lib "foo:bar:baz"

specifically? Why not

//> lib "foo:bar:baz"

or even

//> lib foo:bar:baz

or

//> lib: foo:bar:baz

or

// @lib foo:bar:baz

?


I agree that one line per entry is a good thing, so that rules out the “fenced” style. However, I must say I look at the alternatives that @sideeffffect has listed, and every single one looks far better than the //> using syntax being proposed. I can speak for all my colleagues on this as well, most of whom are Scala-developers-but-not-enthusiasts.

I can believe that we only need “just key value pairs”, but the syntax given seems much more complex than that. To the extent that it does seem worth picking something existing off the shelf that people are familiar with.

Following the SIP process is one thing, and I agree there is value in processes, but in this case the final proposed syntax is just so ugly that I have to speak up. //> using is really the worst of all worlds:

  • It has a weird symbolic “operator” prefix

  • Followed by a Scala keyword that is both obscure (to the broader programming community), and out-of-context with an overloaded meaning, and a keyword we just broke off from another keyword to avoid overloading! In fact, we spent a huge amount of effort just recently in Scala 3 removing overloaded meanings for syntax, e.g. splitting up _ to separate *, ?, removing postfix _, etc.

  • Followed by a non-standard, not well specified data format. Seems to be more than just key value pairs if we have to start quoting things. What characters need to be quoted? Is whitespace significant? If we can quote things, can we escape quotes within the quoted sections? Can it contain \ns and other similar escape sequences? Is unicode allowed? Why are some things quoted with double-quotes and some by backticks? Why are some things camelCase and some things kebab-case? How come square brackets and pipes are allowed in //> using target ["test"|"main"]? Does it parse like Bash to "[test|main]", or mean something else entirely? I know a lot of languages and at a glance it’s not even clear how I should lex these things. The only languages I can think of that allow syntax like foo "bar", "baz" are Coffeescript and Ruby, languages whose syntax we really should not be following!

I mean, people rightfully hate on operator overuse, but this is worse. Consider a classic problem operator foo <++= bar: would it be better or worse if we instead wrote it as foo <++= using bar?

Yes, there are a lot of people using ScalaCLI now. But no, changing the syntax does not need to cause breakage: we can leave the old syntax deprecated for compat and push whatever new syntax we desire, and all existing scripts will continue to work.
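
As a rough sketch of how an old and a new form could coexist, the parser below accepts both the current //> using key "value" lines and one of the simplified forms floated in this thread (//> key value), and flags the former as deprecated. The Directive case class and parseDirective are hypothetical names chosen for illustration, the simplified form is only an assumption about what a replacement might look like, and this is not how scala-cli is implemented.

```scala
// Hypothetical model of a parsed directive; `deprecated` marks the legacy `using` form.
final case class Directive(key: String, value: String, deprecated: Boolean)

// Accepts `//> using key "value"` (legacy) and `//> key value` (assumed new form).
def parseDirective(line: String): Option[Directive] =
  val trimmed = line.trim
  if !trimmed.startsWith("//>") then None // not a directive comment at all
  else
    val body = trimmed.stripPrefix("//>").trim
    val legacy = body.startsWith("using ")
    val rest = if legacy then body.stripPrefix("using ").trim else body
    // Split into key and value at the first run of whitespace.
    rest.split("\\s+", 2) match
      case Array(key, value) =>
        // Surrounding double quotes are optional and stripped if present.
        Some(Directive(key, value.trim.stripPrefix("\"").stripSuffix("\""), legacy))
      case _ => None // a key without a value is rejected in this sketch
```

A tool built this way could keep existing scripts working while printing a warning whenever deprecated is true.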

Lastly, if we assume that ScalaCLI as the default launcher for Scala will be successful (which I think we all do!) then we have to assume there will be 10x-100x more people using it in future than there are today.

Now is indeed the most painful time to make changes compared to any time in the past. But it is also the least painful time to make changes compared to any time in the foreseeable future. We should not give up such an opportunity for change before things really become hard to change.


From the peanut gallery, a strong +1 here. YAML is enormously complex, and full of weird footguns that sometimes snare even experienced YAML users. Let’s please not go down that road.


On the topic of YAML, one thing I’d like to raise is we would not need to support all of it: even a YAML subset has most of the benefits of YAML (human-readability, tool-readability, familiarity, standardization) without the more obscure !!footguns that trip people up. We’d have to write our own parser, but for a subset that’s not particularly hard.

This is akin to using a markdown dialect for scaladoc. Even without “full” markdown, and maybe not 100% compatible with the other dialects out there, it’s still a lot better than coming up with our own ad-hoc scaladoc language.

This is the approach a lot of frameworks take:


I think we could simplify it to:

//> lib foo:bar:baz

without much further work, but I would opt to stay with //>. I don’t think it’s the same issue that symbolic operators face just because it’s a >. Using /*** would be exactly the same issue and if we have normal comments then there is no sensible way to highlight it and it would become unreadable most of the time.

Compare the two attached screenshots (2023-03-01 15-18-12 and 2023-03-01 15-18-29).

The quotes are probably a leftover from the previous approach and are no longer relevant.


On the topic of YAML, one thing I’d like to raise is we would not need to support all of it: even a YAML subset has most of the benefits of YAML (human-readability, tool-readability, familiarity, standardization) without the more obscure !!footguns that trip people up. We’d have to write our own parser, but for a subset that’s not particularly hard.

YAML would require having a multiline comment, which as @bishabosha mentioned is not ideal for us. Besides, it’s way too much for our needs and YAML is notoriously hard to write. Easy to read for sure, hard to write. And if we only need key: value, that’s not really YAML anymore in my opinion. And having : would be super hard for dependencies:

//> lib: foo:bar:baz

that’s even more colons now. And that colon is not needed.
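
To make the colon concern concrete, here is a small sketch comparing the two separators on a dependency coordinate. splitOnFirstColon, splitOnWhitespace and coordinate are hypothetical names used only for this illustration; neither function is part of scala-cli.

```scala
// A dependency coordinate that itself contains several colons.
val coordinate = "org.scalatest::scalatest:3.2.10"

// Colon-separated form: only the FIRST colon may be treated as the separator.
def splitOnFirstColon(line: String): (String, String) =
  val i = line.indexOf(':')
  if i < 0 then (line, "") else (line.take(i).trim, line.drop(i + 1).trim)

// Whitespace-separated form: the key ends at the first run of whitespace.
def splitOnWhitespace(line: String): (String, String) =
  line.trim.split("\\s+", 2) match
    case Array(key, value) => (key, value.trim)
    case _                 => (line.trim, "")

// A naive split on every colon breaks the coordinate into fragments.
val naiveParts = s"lib: $coordinate".split(":").toList
```

Both helpers recover the key lib and the full coordinate, so the colon form is parseable as long as the parser cuts at the first colon only. The difference is for the human reader, who has to tell the separator apart from the colons inside the value.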
