Pre-SIP: scala-cli as the new scala command

I don’t think use or using does anything but add clutter. If the commands themselves are clear, they document what they’re doing without use. If they are not clear, they probably should be improved anyway. In the rare case where it’s only clear with use you can prepend use- to the keyword. As long as //> has to be at the beginning of a line, it’s hard to mistake.

If you actually want it to be clear that there is magic contained in that line, neither use nor using does the trick. I don’t think it even helps. use scala 3.2.1 looks just like something you’d write to a programmer, not a directive for a tool. If you want to emphasize that it’s magic, you need something that blatantly is not a comment, like directive or pragma (what the heck is a pragma?) or scalacli or even #. (# is probably better than @ because they’re not annotations, and @ suggests that they are; # also means “whoa, something is up here, pay attention!”.)

4 Likes

If use is and will always be the only possible type of directive, then the keyword is redundant and there’s not much value in having it. Especially if you already have the special comment style for indicating where the directives go.

2 Likes

I think 3-letter boilerplate can be good if it adds clarity, colors, and lends itself to tooling support, etc.

3 Likes

I agree that if it can add those things, it can be good.

However, the contention is that it isn’t any more clear, and I don’t see how //> is any harder for tooling to detect than //> use…and you can color //> in the IDE or with ANSI in the terminal (and any tool that doesn’t know about it will not color the use because to that tool it looks like part of a comment!).

So I don’t see how use gets us anything at this point. Once upon a time it may have made sense–e.g. if you were going to open a multi-line block with use: (or using:). But as currently envisioned, it doesn’t seem to have much purpose.

6 Likes

Maybe I am missing some updated SIP document? In its current state, a lot of the proposal seems quite confusing to me.

The rest is a bit of a “documenting confusions as I go through the SIP”, with my conclusions at the end. Feel free to take this or leave it as you may, I bring no leverage beyond these words :slight_smile:

Note that this is specifically an answer to @bjornregnell’s conclusion – I skimmed parts of the thread, and seem to echo some concerns that have been raised aplenty, but I feel are ignored in the summary. I may be missing some other issues raised in the thread, as I have not read everything.

Confusions about abstract directive syntax in the SIP

Promote SIP-46 from experimental to stable, with the addition that using is changed to use

The linked SIP-46 does contain (unspecified) examples of multi-line syntax; I assume from context that you support dropping those?

I don’t think the arguments for changing to YAML or X or Y or Z are really that strong, as there are so many different configuration languages out there, and onboarding will only be easier for those who happen to know the particular language chosen.

I somewhat agree that compatibility with existing formats is not that strong.
However, I do agree that the current syntax is both ad-hoc and not well specified.
Examples from the SIP:

//> using "com.lihaoyi::os-lib:0.7.8"

No key? Is this a typo? A shorthand for libraries?

//> using scala 2.13.8

What is a bare 2.13.8? Should that be a string?
It would be nice for it not to have to be a string, but the pseudo-grammar does not allow for that.

//> using java-options "..."
/*> using 
  Scala 3
  option "-Xfatal-warning"
*/

Multiline syntax that is likely dropped? Is it?

//> using someSettings { setting1 value 1; setting2; }

An example for multiple settings: would that also work without someSettings, i.e., can I rewrite the multi-line example from above to //> using { Scala 3; option "-Xfatal-warning" }?

Proposed directives seem to not make use of the syntax

Most of the syntax seems to be due to “some experiment in scala-cli”. If I look at the “MUST have” directives, I see the following points:

• ident.subident is never used.
• If anything could be considered a hierarchical key at all, it would be java-options and java-home (the latter is a “should have”). The “should have” section also includes many native-* keys.
• The above all have to be quoted using `.
• All directives use string arguments.

It seems that the directives just need

//> using some-key "value"

As the only supported format.
(Where some-key is an unquoted (Scala) string without whitespace, and “value” is a quoted (Scala) string.)

Scrolling through a search for //> using on GitHub, the only exceptions I have seen are one target.platform "..." and one publish.XXX "...", neither of which is supported by the SIP.
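
For illustration, that restricted format is trivially machine-checkable; here is a one-regex sketch (my own illustration, not anything specified by the SIP):

val Directive = """//>\s+using\s+([^\s"]+)\s+"([^"]*)"\s*""".r

"//> using some-key \"value\"" match
  case Directive(key, value) => println(s"$key -> $value")   // prints: some-key -> value
  case _                     => println("not a directive")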

List of must/should have directives seems ad-hoc

• Directives do not always mirror the CLI use. There was some change from lib to dep to be more consistent. In general, I think it would be extremely valuable to have all directives mirror a CLI argument (so users don’t have to learn two new syntaxes).
• Directives seem to have tons of aliases, for example “javaOpt, javaOptions, java-opt, java-options”.
• Naming seems inconsistent: there is “native-mode” but “jsMode”.

Concluding musings

The CLI part of the proposal (make scala-cli the default tool) seems fine to me. I think scala-cli has a lot of idiosyncrasies, but I don’t see how that would be a problem, as it is a tool, not really a standard.

The using directives, however, introduce a standard Scala build file format. I think it is dangerous that people try to pretend that they do not. Are Metals, IntelliJ, sbt, mill, scastie, etc. recommended to implement this standard, or recommended not to implement it? What is the intention here? What will be supported in the future?
Note: I think that scala-cli itself is well-scoped in this regard, but the impact of promoting the ad-hoc definitions needed by scala-cli (the tool) into a standard is not explored/discussed by this SIP.

If I could veto the proposal I would do so with the following conditions for acceptance:
• It should be made explicit if defining a standard Scala build definition format is the goal.
• The directive syntax should be well-defined (such that two independently written parsers are extremely likely to interpret it the same).
• The directive syntax should be scoped to address the concrete proposed use case.
• The naming of directives should be systematic, and there should be only one name per directive (no aliases).

2 Likes

I find this part in particular to be a non sequitur.

For the first part, doesn’t everyone know how widespread YAML is in industry? Every developer will have encountered it. Every language has a parser and serializer for it. Even not-really-developers tweaking HTML in their Shopify e-commerce stores will have seen it, as will the most hard-core infrastructure folks writing k8s/CloudFormation/etc. templates.

Are we really saying “YAML is used by 100s of different programming languages in the broader community and everyone from the least sophisticated to the most advanced users, while our own syntax is used by ourselves and nobody else, and that’s equivalent”?

Furthermore, the cost of ripping up the whole directives syntax from its roots is pretty high, and the experimental stage of this SIP was entered with a general approval of the current design philosophy of the directives.

For the second half, it seems from @sjrd that the syntax has already gone through a lot of changes, and from @tgodzik it seems we are not above continuing to make changes in response to feedback even in the middle of this discussion. I mean, that’s the point of the review/RFC process right? If we thought “it’s too expensive to make changes”, we wouldn’t even be discussing this.

I understand it might be frustrating to pivot the syntax a lot of times, but that too seems entirely expected as part of the process. It’s even in the name, “experimental”: a chance to creatively experiment with things up-front without as strict a formal review, with the assumption that some stricter review will be performed before the “experimental” tag is lifted.

Besides, I find it hard to believe it will be that expensive to make changes.

  1. Keeping the old syntax around, just deprecated, is an easy way to maintain backwards compatibility with zero breakage for users.

  2. Presumably Scala-CLI is already parsing the syntax into some kind of KV/nested-text/JSON-like structure, so it should be possible to swap out the parser for SnakeYAML (or some lightweight/restricted equivalent) that generates the same data structure, while the rest of Scala-CLI’s implementation code continues to work unchanged (sketched below).
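
For illustration, here is a minimal sketch of what such a swap could look like. This is not Scala-CLI’s actual code; it assumes the directive lines simply carry YAML after the //> marker, and it uses SnakeYAML (org.yaml.snakeyaml), which already parses into plain java.util.Map structures:

import org.yaml.snakeyaml.Yaml
import scala.jdk.CollectionConverters._

// Sketch only: strip the //> markers and hand the remaining text to SnakeYAML,
// producing a nested map that the rest of the tool could consume as before.
def parseDirectivesAsYaml(lines: Seq[String]): Map[String, Any] =
  val yamlText = lines
    .filter(_.startsWith("//>"))                 // only directive lines
    .map(_.stripPrefix("//>").stripPrefix(" "))  // keep relative indentation intact
    .mkString("\n")
  val loaded: java.util.Map[String, Object] = new Yaml().load(yamlText)
  Option(loaded).fold(Map.empty[String, Any])(_.asScala.toMap)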

I’m just not seeing what’s so expensive here; it seems perfectly doable with neither user-facing breakage nor unreasonable implementation work. Am I missing something that would become obvious if I tried to put together a Scala-CLI pull request implementing these things?


I’m not just saying this from a user perspective: I would also like to leverage Scala-CLI to replace some of the more bespoke/idiosyncratic syntaxes and implementations that Ammonite and Mill scripts currently use. This would result in broader standardization across the Scala ecosystem and a lower cognitive burden for all developers.

But for that, I’d like Scala-CLI to actually be less bespoke/idiosyncratic than Ammonite/Mill! From the current proposal, it seems like leveraging it would be more of a lateral change than a strict improvement here, replacing one set of bespoke idiosyncrasies with another.

Maybe you guys are only interested in the Scala-CLI use case, but to me it would be missing a golden opportunity for standardization. Standardization kind of sucks sometimes, having to get such a wide range of people into agreement. But I think the potential benefit to Scala here is huge, which is why I spent so much time arguing in this thread.

2 Likes

Many thanks @ragnar for your thorough review of the text in the public SIP - I’m unsure whether the latest changes have been published. The SIP needs to be updated, e.g. as the multi-line syntax has now been dropped.

@tgodzik Could you and the scala-cli team at VirtusLab check, and perhaps submit PRs for, (some of) the findings/inconsistencies in @ragnar’s review that you think are relevant to correct?

Thanks for taking the time and effort to explain your views! I think there is a cost to changing (docs, code, blog posts, etc. have also been written…), but whether it is too high depends on what you compare it with. I get your point about standardization across Scala tools, such as other build tools and IDEs, but I’m not sure we have that level of coordination here. Anyway, I’ll think more about this while input continues in this thread; there is still some time for contemplation before the next SIP meeting. Thanks again.

The question shouldn’t be about how widespread it is, but how good a match it is to the feature-set. If you only ever need a screwdriver, why drag a whole toolbox along?

In particular, if the only thing we need is one-liner key-value pairs, then

//> key this is the value to end of line

is easier to parse and manipulate with every vanilla programming language’s string-handling than is anyone’s YAML library.

Look, a parser:

def extractDirectives(lines: IterableOnce[String]): Map[String, String] =
  lines.iterator
    .filter(_.startsWith("//>"))        // only directive lines
    .map(_.stripPrefix("//>").trim)     // drop the marker; extra spaces before the key are fine
    .map { line =>
      val key = line.takeWhile(!_.isWhitespace)
      key -> line.drop(key.length).trim
    }
    .toMap

And a writer with exception-handling:

def printDirectives(directives: Map[String, String]): Seq[String] =
  directives.toSeq.map { case (key, value) =>
    if key.isEmpty || key.exists(_.isWhitespace) then
      throw new Exception(s"Key '$key' is empty or has whitespace and can't be stored")
    if value.exists(_ == '\n') || value.trim != value then
      throw new Exception(s"Value '$value' has space at ends or newlines")
    s"//> $key $value"
  }

Done! We have a complete parser and writer.
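
For instance, a quick sanity check of the parser above on a few lines (expected output shown as a comment):

val demo = List(
  "//> scala 3.2.1",
  "//>   gorp  org.whatever boop",
  "val x = 1 // ordinary code, ignored"
)

extractDirectives(demo)
// => Map("scala" -> "3.2.1", "gorp" -> "org.whatever boop")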

The specification is super-short:

ScalaCLI directives consist of a key, consisting of
one or more characters with no whitespace, and a
value, consisting of zero or more characters on a
single line.  The key appears after `//>` at the
beginning of a line.  The value appears after a space
after the key.  To increase readability, space is
allowed before the key too.

Example:
    vvvvv----------- This is the key
//> scala 3.2.1
          ^^^^^----- This is the value
      vvvv-------------------- key
//>   gorp  org.whatever boop
            ^^^^^^^^^^^^^^^^^------ value

gorp is not a ScalaCLI key, but that line
defines a valid key-value pair.

Compared to YAML, even for a YAML expert, this should be super-easy to understand, to read, and to write. If this is absolutely truly all that is needed, then this is way easier than YAML.

4 Likes

I think it is right only when there is a choice.
So let’s compare:

  1. > use key value
  2. > key value
  3. @use key=value

The second option is less readable.
The first option is actually more difficult: how many spaces can there be between > and use?

So the third option is actually more readable and much more scalable!

I can easily extend this dialect to add other metadata to files.

//@use scala = 2.13
//@doc description = "Main.\n It is main file"

And it can be highlighted by an IDE in the same way.

When I imagine what a hell this “simplicity” can become when different commands use scala-cli, YAML, and TOML to add metadata, I get a toothache )))

IMHO: it is a very expensive simplicity.

Edit1:

Is it simplicity?
PS:
The road to hell is paved with good intentions

I think that there may be some benefit to having a richer syntax than key/value pairs. For example, sometimes a value needs to be structured information, and currently the only way around that is to reinvent a custom syntax to parse the value (or have an explosion of more keys)!

e.g. look at the value for using publish.developer as seen in virtuslab/toolkit:

//> using publish.developer "szymon-rd|Simon R|https://github.com/szymon-rd"

Clearly this value is not a simple string but a CSV (with a pipe separator); with YAML this could be an actual structured value.

To be clear, I’m not personally looking to upturn the syntax dramatically, but to highlight that there can be a need for a more flexible syntax to support more complex configuration, which will inevitably be required in the future.

Edit:

To make it even more clear: I guess this could have been implemented with more nesting in the keys; is that possible?

//> using publish.developer.user "szymon-rd"
//> using publish.developer.name "Simon R"
//> using publish.developer.home "https://github.com/szymon-rd"

Internally, are these keys turned into hierarchical data, or is it a flat namespace where keys just happen to contain a .?

3 Likes

I think it’s clear that some kind of hierarchical semi-structured format is necessary.

  • Even in the examples given here, we are already seeing ad-hoc embedded-in-string DSLs appear, to squeeze structure into the constraints of the string-only syntax. And these are only toy examples.

  • “Real world” usage would generally have much more complexity/flags/edge-cases/etc. than these. All those are things that hierarchical metadata is good at representing in a standard way (e.g. optional sub-keys).

  • And we have to expect that the complexity of the data we need to provide will grow over time, as ScalaCLI itself grows and evolves: maybe there’ll be configurable plugins, maybe compatibility with Mill/SBT. All of these would benefit from namespacing and hierarchical config.

We have to prepare for that level of configuration, where Scala-CLI is wildly successful and widely used, which basically necessitates nested hierarchical configuration. We shouldn’t make decisions now that would hamstring the project’s future success for reasons that are easily foreseen up front.

We’ve already seen syntax in Scala-CLI that is an ad-hoc flavor of YAML, and in this thread we have seen proposals that are ad-hoc flavors of TOML. So clearly the need is there. It’s not a coincidence that these keep getting re-invented! Trying to pretend “we don’t need that stuff, it’s just KV pairs, simple”, as I see happening in this thread, is a mistake that would haunt us down the road: at best we change the syntax again later; at worst we grow a collection of weird sub-languages as people are forced to somehow squeeze their structured data into the KV world.

I agree that sometimes the brevity is necessary, e.g. the shorthand dependency syntax copied from Mill. But that should be limited to as few situations as possible, to avoid users having to learn a zoo of ad-hoc, bespoke, incompatible sub-languages when all they want to do is pass some nested JSON-like metadata to their build tool.

Just to throw in another use case, let’s say we want Scala-CLI to be able to publish to maven central, since that’s what a lot of one-module SBT and Mill projects can do. The Mill schema for publishing to maven central POMs looks something like:

sealed trait Scope
object Scope {
  case object Compile extends Scope
  case object Provided extends Scope
  case object Runtime extends Scope
  case object Test extends Scope
}

case class Dependency(
    artifact: Artifact,
    scope: Scope,
    optional: Boolean = false,
    configuration: Option[String] = None,
    exclusions: Seq[(String, String)] = Nil
)

case class VersionControl(
    browsableRepository: Option[String] = None,
    connection: Option[String] = None,
    developerConnection: Option[String] = None,
    tag: Option[String] = None
)

case class Developer(
    id: String,
    name: String,
    url: String,
    organization: Option[String] = None,
    organizationUrl: Option[String] = None
)

case class PomSettings(
    description: String,
    organization: String,
    url: String,
    licenses: Seq[License],
    versionControl: VersionControl,
    developers: Seq[Developer],
    packaging: String = "jar"
)

Another example could be the config for Mill’s assembly command:

def assemblyRules: Seq[Assembly.Rule]
sealed trait Rule extends Product with Serializable
object Rule {
  case class Append(path: String, separator: String = defaultSeparator) extends Rule
  case class Exclude(path: String) extends Rule
  case class Relocate(from: String, to: String) extends Rule
  case class ExcludePattern(pattern: Pattern) extends Rule
}

These could easily be handled by a short-ish YAML snippet at the top of the file, but would be a poor fit for raw key-value pairs. This is but one example of a just-slightly-non-trivial use case that basically requires hierarchical config; I’m sure that in the wild, OSS and proprietary ecosystems will have hundreds of other similar examples.
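
To make that concrete, here is a purely hypothetical sketch of such a snippet, written as YAML carried in directive comments. The field names just mirror the PomSettings case class above; none of this is a schema scala-cli actually supports:

//> publish:
//>   description: "My library"
//>   organization: com.example
//>   url: https://github.com/example/my-library
//>   licenses: [Apache-2.0]
//>   versionControl:
//>     connection: scm:git:git://github.com/example/my-library.git
//>   developers:
//>     - id: szymon-rd
//>       name: Simon R
//>       url: https://github.com/szymon-rd
//>   packaging: jar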

4 Likes

@lihaoyi Could the needs in your examples be satisfied with this simple dot-notation for hierarchy (in line with what @bishabosha just suggested)?

//> use publish.developer.user "szymon-rd"
//> use publish.developer.name "Simon R"
//> use publish.developer.home "https://github.com/szymon-rd"
1 Like

Dot-notation encoding of hierarchy seems like it would scale into “painfully tedious” territory really quickly.

2 Likes

I am not sure why you say it is more readable. I’m especially uncertain why you bring up spaces, when it’s far from obvious whether // @use is okay, or //@use key = value.

I personally find it considerably harder to grok, and actively contrary to meaning when giving multiple options that accumulate:

//@use dep="com.github.pathikrit::better-files:3.9.2"
//@use dep="com.lihaoyi::requests:0.8.0"

Does the second overwrite the first? Sure seems like it ought to. With a simple key-value pair, there is less suggestion that replacement might be a thing.

Anyway, I don’t doubt that for you, @use is more intuitive. I agree that if we were going to allow multiple different categories of metadata, having a keyword for the category would be good; and at that point, adding @ makes sense because it helps visually distinguish the category keyword from the keyword within the category.

But then we have to decide if that other tool is going to need more than simple key-value pairs. And if yes, then we should reuse an existing configuration language for that.

So I think that the same argument you give here is an even better argument for TOML.

/***
[cli]
scala = "2.13"

[doc]
description = "Main\n It is main file"
font = ["Calibri", "Arial", "sans-serif"]
***/

Way clearer to me, and then it doesn’t matter if new tools come along who want to get fancy, because we support fancy.

The core question is: can we live with simple?

I share Li Haoyi’s skepticism–it seems like the answer is likely, “We can’t.” I’m not ready to give in to skepticism yet. However, I would be a lot more comfortable if we thought through all the consequences of being thoroughly committed to simplicity.

Note: there’s no particular reason aside from duplicate notation that we can’t have our simplicity and eat it too.

//> scala 2.13.2

/***
[cli]
scala = "2.13.2"
***/

could both be valid. The rule would be that anything that you can write as

//> key value string here

is identical in meaning to

[cli]
key = "value string here"

and that if you can ever write the latter, you can always use it as the former also.

So the duplicate notations needn’t be that confusing. Just a tiny dash of syntactic sugar.
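
A minimal sketch of that bit of sugar (assuming the [cli] table name from the example above, and that the short form is only ever used for plain string values):

// Sketch only: rewrite each simple `//> key value` line into the
// equivalent entry of a TOML [cli] table.
def desugarToToml(lines: Seq[String]): String =
  val pairs = lines.collect {
    case line if line.startsWith("//>") =>
      val body  = line.stripPrefix("//>").trim
      val key   = body.takeWhile(!_.isWhitespace)
      val value = body.drop(key.length).trim
      s"$key = \"$value\""
  }
  ("[cli]" +: pairs).mkString("\n")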

(I would argue against having “use” or “using” be the primary dispatch metadata term if there is any chance for others. It should be something clearly compiler-related like “cli” or “scala” or “compile” or “build” or somesuch.)

(As my examples illustrate, I would go for TOML over YAML due to YAML’s profusion of features, which if we didn’t support would then cause random confusion because “why can’t I do xyz?”. But I don’t have a strong preference among TOML, YAML, or JSON.)

3 Likes

So my examples have disadvantages, and you have helped me understand them better. Thanks.

I am not trying to say that ‘@’ is the best choice. I want to say that the current simple specification is not working: it leads to reinventing the wheel.

I think a good specification should be based on a subset of a well-known language in order to be scalable.

Let’s assume that we must use single-line comments, so we take a subset of the TOML grammar because it has hierarchical keys.

It might be something like this:

meta-line = new-meta group-name start-toml toml-keyval

I don’t think I can define it any further, or better than the authors of scala-cli.
But I think such a specification should exist, and it should be based on a well-known language.
I can suggest ideas, of course:

///use key=val
//*use key=val
//#use key=val
//> use key=val 
//@use key=val 

But actually it doesn’t matter.

I personally like the original idea of the scala-cli directives being very (stupid) simple and easy to use when writing scripts, sharing snippets, or just playing with some scala code or libraries.

My proposal is to keep only the simple //> key value format and delegate all complex definitions to their respective config files via an @file(...) or @url(...) syntax, possibly supporting as many data formats as we like. Example:

//> scala 3.2.2
//> publish @file(config/publisher.json)

or

//> scala 3.2.2
//> dependency com.github.pathikrit::better-files:3.9.2
//> publish @url(https://github.com/foo/foo/publisher.yaml)
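
A rough sketch of how a tool might expand such references before handing the content to a format-specific parser. To be clear, the @file(...)/@url(...) syntax is only the hypothetical proposal above; none of this exists in scala-cli today:

import java.nio.file.{Files, Paths}
import scala.io.Source
import scala.util.Using

// Expand a directive value: @file(...) reads a local file, @url(...) fetches
// a remote document, and anything else is returned unchanged.
def resolveDirectiveValue(value: String): String =
  val FileRef = """@file\((.+)\)""".r
  val UrlRef  = """@url\((.+)\)""".r
  value.trim match
    case FileRef(path) => Files.readString(Paths.get(path))
    case UrlRef(url)   => Using.resource(Source.fromURL(url))(_.mkString)
    case other         => other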

The clear benefits would be:

  • no-brainer syntax for the simplest use cases
  • proper, free tooling support for all other cases, without any limits
5 Likes

I agree, but I would also keep the using keyword. Scala is a keyword-first language; it does not generally start something important with a symbol. Also, using can be syntax-highlighted. Verbosity does not matter here; these are highly specific commands, and it’s important to have some textual clue as to what they are. And yes, I know the argument against the double use of using, but I don’t think it will be an issue.

2 Likes

If you prefer to keep using a Scala keyword, why not given? That is at least congruent with what it means in ordinary Scala.

That’s quite true, but this isn’t Scala, at this point. With the revised conceptualization, it’s just a build tool directive, with ScalaCLI being a very low-ceremony build tool.

In this way it’s more like Scaladoc comments (which IDEs have no trouble syntax highlighting).

/** Hello, scaladoc!
  * @param bar I am an example parameter!
  */
def foo(bar: Bar): Foo = ???

//> main hi.i.am.a.build.directive.MainClass

Aside from the @, which is necessary because scaladoc by default takes raw text, and the multi-line vs. single-line comment, it’s a close parallel.

So I think we have precedent for doing things this way, too.

2 Likes