Pre SIP: scala-cli as new scala command

lihaoyi · March 6, 2023, 9:57am

I think it’s clear that some kind of hierarchical semi-structured format is necessary.

Even in the examples given here, we are already seeing ad hoc embedded-in-string DSLs appear, to squeeze structure into the constraints of the string-only syntax. And these are only toy examples.
“Real world” usage would generally have much more complexity/flags/edge-cases/etc. than these. All those are things that hierarchical metadata is good at representing in a standard way (e.g. optional sub-keys).
And we have to expect that the complexity of the data we need to provide will grow over time, as Scala-CLI itself grows and evolves: maybe there’ll be configurable plugins, maybe compatibility with Mill/SBT. All of these would benefit from namespacing and hierarchical config.

We have to prepare for that level of configuration, where Scala-CLI is wildly successful and widely used, which basically necessitates nested hierarchical configuration. We shouldn’t make decisions now that would hamstring the project’s future success for reasons that are easily foreseen up front.

We’ve already seen syntax in Scala-CLI that is an ad hoc flavor of YAML, and in this thread we have seen proposals that are ad hoc flavors of TOML. So clearly the need is there. It’s not a coincidence these keep getting re-invented! Trying to pretend “we don’t need that stuff, it’s just KV pairs, simple”, as I see happening in this thread, is a mistake that would haunt us down the road: at best we change the syntax again later, at worst we grow a collection of weird sub-languages as people are forced to somehow squeeze their structured data into the KV world.

I agree that brevity is sometimes necessary, e.g. the shorthand dependency syntax copied from Mill. But that should be limited to as few situations as possible, to avoid users having to learn a zoo of ad hoc, bespoke, incompatible sub-languages when all they want to do is pass some nested JSON-like metadata to their build tool.

Just to throw in another use case, let’s say we want Scala-CLI to be able to publish to Maven Central, since that’s what a lot of one-module SBT and Mill projects can do. Mill’s schema for the POM metadata published to Maven Central looks something like:

sealed trait Scope
object Scope {
  case object Compile extends Scope
  case object Provided extends Scope
  case object Runtime extends Scope
  case object Test extends Scope
}

case class Dependency(
    artifact: Artifact,
    scope: Scope,
    optional: Boolean = false,
    configuration: Option[String] = None,
    exclusions: Seq[(String, String)] = Nil
)

case class VersionControl(
    browsableRepository: Option[String] = None,
    connection: Option[String] = None,
    developerConnection: Option[String] = None,
    tag: Option[String] = None
)

case class Developer(
    id: String,
    name: String,
    url: String,
    organization: Option[String] = None,
    organizationUrl: Option[String] = None
)

case class PomSettings(
    description: String,
    organization: String,
    url: String,
    licenses: Seq[License],
    versionControl: VersionControl,
    developers: Seq[Developer],
    packaging: String = "jar"
)

Another example could be the config for Mill’s assembly command:

def assemblyRules: Seq[Assembly.Rule]

sealed trait Rule extends Product with Serializable
object Rule {
  case class Append(path: String, separator: String = defaultSeparator) extends Rule
  case class Exclude(path: String) extends Rule
  case class Relocate(from: String, to: String) extends Rule
  case class ExcludePattern(pattern: Pattern) extends Rule
}

These could be easily handled by a short-ish YAML snippet at the top of the file, but would be a poor fit for raw key-value pairs. This is but one example of a just-slightly-non-trivial use case that basically requires hierarchical config. I’m sure that in the wild, across the OSS and proprietary ecosystems, there’ll be hundreds of other similar examples.
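
To make the “poor fit for raw key-value pairs” point concrete, here is a minimal sketch of what happens when nested publishing metadata has to be squeezed into flat keys. Everything in it is hypothetical and for illustration only: the `Config` tree, the `flatten` helper, the dotted and indexed key convention, and the sample values are my own assumptions, not anything Scala-CLI or Mill actually defines.

```scala
// Hypothetical sketch: none of these names come from Scala-CLI or Mill.
// A tiny JSON-like tree, standing in for "hierarchical config".
sealed trait Config
object Config {
  final case class Str(value: String) extends Config
  final case class Lst(items: Seq[Config]) extends Config
  final case class Obj(fields: Seq[(String, Config)]) extends Config

  // Flatten the tree into key-value pairs. The dotted-path and [index]
  // conventions are invented here; a flat KV syntax would need some such
  // convention, which is exactly the kind of sub-language in question.
  def flatten(config: Config, prefix: String = ""): Seq[(String, String)] =
    config match {
      case Str(value) =>
        Seq(prefix -> value)
      case Lst(items) =>
        items.zipWithIndex.flatMap { case (item, i) =>
          flatten(item, s"${prefix}[${i}]")
        }
      case Obj(fields) =>
        fields.flatMap { case (key, value) =>
          flatten(value, if (prefix.isEmpty) key else s"${prefix}.${key}")
        }
    }
}

object FlattenDemo {
  import Config._

  // Made-up sample data, loosely shaped like the PomSettings schema above.
  val publish: Config = Obj(Seq(
    "organization" -> Str("com.example"),
    "developers" -> Lst(Seq(
      Obj(Seq(
        "id" -> Str("jdoe"),
        "name" -> Str("Jane Doe"),
        "url" -> Str("https://example.com")
      ))
    ))
  ))

  val flat: Seq[(String, String)] = Config.flatten(publish)

  def main(args: Array[String]): Unit =
    flat.foreach { case (key, value) => println(s"$key=$value") }
}
```

Even for one developer entry, the flat form needs keys like `developers[0].id`, so the hierarchy has not gone away. It has just moved into an invented key-naming scheme that every user and every tool has to agree on.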