Pre-SIP: A Syntax for Collection Literals

Circling back to one of the original examples that motivated this discussion, consider this data structure:

  def pomSettings: PomSettings = PomSettings(
    description = "Hello",
    organization = "com.lihaoyi",
    url = "https://github.com/lihaoyi/example",
    licenses = Seq(License.MIT),
    versionControl = VersionControl.github("lihaoyi", "example"),
    developers = Seq(Developer(id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi"))
  )

def pomSettings already has a target type; we already know it is a PomSettings object! Similarly, developers = Seq(Developer(...)) is redundant, and so is licenses = Seq(License.MIT), both to humans (we know the English meaning of developers) and to the compiler (it knows developers is of type Seq[Developer]).

I would like to be able to say

  def pomSettings: PomSettings = (
    description = "Hello",
    organization = "com.lihaoyi",
    url = "https://github.com/lihaoyi/example",
    licenses = [.MIT],
    versionControl = .github("lihaoyi", "example"),
    developers = [(id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")]
  )

To achieve this, we would need (a) collection literals, (b) named-tuple-to-case-class implicit constructors, and (c) some kind of enum-shorthand syntax for .MIT and .github.

It’s a lot of new features, so I don’t expect to be able to spec/discuss/implement/etc. all of them in the near future. But I think allowing Scala to concisely write this kind of common hierarchical data structure would be a nice “north star” that we can slowly work towards, and maybe some day we’ll even get there.

3 Likes

This proposal would have been a good addition somewhere around 2.9, before Scala became popular and before a lot of people became familiar with the syntax. The actual cost of this change is, as was already mentioned, the introduction of a new, alternative way of doing the same thing that will percolate through the ecosystem and boost an already massive complaint of “Scala in project A looks very different from Scala in project B and that makes onboarding and moves between projects difficult”. This isn’t only my opinion: this is one of the core issues voiced by the community in the Maintenance Survey, and one of the reasons the report is not yet ready is that we’re trying to figure out how to address these problems effectively.

In light of the above I think it’s counterproductive to add a brand new way of doing something that really isn’t very common (in software engineering practice, 90% of the time you load data from files or download it from the internet anyway) and that generally has a single way of doing it at the moment: Vector(1, 2, 3) or Map("a" -> 1, "b" -> 2) is fine, you learn it once, there really aren’t alternatives available, and AI writes it for you anyway when you need to write large literals.

My biggest issue, however, is that we change something that has a precise meaning into something that’s ambiguous. In Python [1, 2, 3] is a list - not an array, not a map, not a vector, a list. If we go with these implicit conversions, the answer in Scala will be “it depends”.
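
To make the “it depends” concrete in today's Scala, here is a minimal sketch of target typing driven by a type class. FromElems and lit are hypothetical names invented for this illustration; they are not part of the proposal or of the standard library, and the real desugaring of [1, 2, 3] may well differ.

```scala
// Hypothetical type class: how to build a collection C from elements of type A.
trait FromElems[A, C]:
  def build(elems: Seq[A]): C

object FromElems:
  given listFromElems[A]: FromElems[A, List[A]] with
    def build(elems: Seq[A]): List[A] = elems.toList
  given vectorFromElems[A]: FromElems[A, Vector[A]] with
    def build(elems: Seq[A]): Vector[A] = elems.toVector
  given setFromElems[A]: FromElems[A, Set[A]] with
    def build(elems: Seq[A]): Set[A] = elems.toSet

// Hypothetical stand-in for a bracket literal: the expected type selects the instance.
def lit[A, C](elems: A*)(using f: FromElems[A, C]): C = f.build(elems)

// The same expression, three different results, decided only by the annotation.
val asList: List[Int] = lit(1, 2, 3)
val asVector: Vector[Int] = lit(1, 2, 3)
val asSet: Set[Int] = lit(1, 2, 2)
```

The right-hand sides read the same, yet the values are a List, a Vector, and a Set (with the duplicate dropped). That is the kind of context dependence being objected to here.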

14 Likes

I agree with eed3si9n’s and djspiewak’s concerns about this issue.

Also, in my opinion, this proposal does not add any particular value and does not solve any particular problem, except making the language more pop. (Making string representations with indentation would add more value, imo.)
The examples with JSON are not really good for me. The way circe works with JSON is way better than syntax with [ ].

What’s more, we already have this syntax for types.

def foo[x, y, z]: Unit = ()
foo[1, 2, 3]

Also, I would think twice about using this syntax for maps. 5 characters for Map() vs 2 characters for [] is not much of a difference, but it creates ambiguity.

2 Likes

I can’t agree with you; you can just not specify the type explicitly. Why would you need one more way of defining the same thing?

For me explicit types look much better than this. And how are you going to navigate to sources with such encoding?

I did not see this proposed explicitly in the original thread. But thinking about it, it makes sense. My argument against the original aggregate proposal (which has nothing to do with named tuples) was that it introduces a new way of doing things that is both shorter and more obscure. In my experience many people will prefer short over clear so it’s a recipe to make scala code in the wild harder to read. That argument does not apply to named tuple syntax since you trade the class name for the field names. So the new syntax is usually not shorter, but it can be clearer. So :+1: for this idea. Also, it’s a nice way to introduce default values for named tuples. And finally, it could be extended to () by stating that the () literal can fill in a case class which has fields that all have default values.

If that sentiment wins, Scala is dead. We need progress. Yes, we have to live with the difference in styles but it is not due to this trivial choice of literals but due to the fact that we have the split between pure FP and the rest. That permeates everything. On the other hand, if we fail to improve the language and don’t make it more accessible to newcomers we have nothing to defend. There’s no moat. We have no large corporation funding us. Scala either continues to be the best choice of language for many usage scenarios or it will wither away. I know I’ll be gone long before that.

To expand on this: This is an argument that was prevalent for Java in the long period between Java 5 and Java 8. There’s of course lots more code written in Java, so the argument made even more sense than for Scala and it was very widespread. But Java lost a lot of ground during that time. Now Java is innovating, with every release packed with new features. And I believe it helps the language that they are doing that.

5 Likes

This is a well-known, often-repeated point of view. However, I have a set of questions that I can’t easily answer to prove that this complaint is valid:

  • What are exact examples of such an issue? Are they related to the language itself?

    Personally, I have never faced the described issue, so nothing comes to mind. Different sets of libraries might fit this case, but that’s not about the language.

  • Can we compare the numbers of such things with other PLs? Do we have more, or fewer?

    I’ve been writing only Scala for too long, so I have no idea what happens in other PLs. They do release new features and versions too. They also have different libraries and frameworks. What if we aren’t so different from others?

1 Like

We all agree on that, it’s just there are a number of improvements that are much more valuable than collection literals. I think for most of us the change in usability is only minor, but the work related to it will be major both in terms of tooling and understanding the code.

Take named tuples, for example: while most people were cautious, they only disagreed about the exact implementation. Most people felt this is quite useful and adds to the language. We can now do things that were not possible before.

With collection literals we don’t get that. It doesn’t sit well with the language, as [] are used for something totally different. No one really needs it urgently, and no new usability is added.

And how do you actually solve the issue with having two syntaxes all over the code? We will be able to have snippets like:

val a = [1, 2, 3]
assertEquals(a, Seq(1, 2, 3))

Is that not relevant or problematic?

17 Likes

A fun fact is that declining SIPs like that won’t make these complaining people happier.
Also, it won’t bring new users and won’t create a point for news/discussions outside of the Scala bubble.

However, accepting it means possible new users and some portion of attention. Yep, there will be some people who will be disappointed by this syntax but they will get used to it and they won’t leave Scala because of its addition.

Will this change bring that many people in? This is the biggest issue for most people complaining in this thread. Was the collection syntax ever a blocker?

7 Likes

We agree that collection literals represent a much smaller improvement overall than named tuples, in both effort required and the benefit they bring. But I believe small improvements matter as well.

And yes, the change might help bring new people in since it removes one of these initial “this looks weird for no reason” reactions that people might have when they see their first Scala script using Seq(...). A minor thing, for sure.

2 Likes

This conversation has gotten long, and I’ll admit that I haven’t kept up with it. Based on skimming it, I think my points have been made, but I want to re-iterate them as someone who taught CS1 and CS2 using Scala for over 10 years. I’m not in favor of this proposal for two main reasons.

First, I believe that using [] for type arguments only is helpful for teaching/learning. I could tell students that if they see [], what goes inside of the brackets will always be a type. This proposal removes that clarity, hurting students trying to learn the language. Yes, Python type hints also use brackets, but the majority of people who teach introductory programming with Python don’t use full type hints, and they can do so because type hints are optional. Most students learning Python will see lists as the language’s only use of []. Scala requires [] to be shown when students see functions/methods that accept collections, so this syntactic ambiguity is unavoidable.

I also feel that this type of syntax is only more clear in languages that are highly opinionated about their default collections and don’t provide a rich collections library. The rich collections library is a tremendous strength of Scala, and I think this undercuts that. I consider Python a very poor language for teaching data structures because there is one default implementation for List, Set, and Dict. I want to teach the relative merits of linked lists and array-based lists. Same for hashmaps versus tree maps. Python is poor for that, and I think this proposal would undermine Scala’s value in that area.

As a teacher, I happen to like the explicit collection syntax that currently exists in Scala. If you want a collection, you give the name of the collection followed by an argument list with values. I think this explicit syntax is very clear for students, especially in a language with a rich collections library. This proposal only makes things less clear. It just provides yet another way of doing things that potentially adds confusion.

18 Likes

Who knows, let’s implement it and find out :wink:

As outcomes of this SIP for new users, I see:

  • more chances to land in Scala for those who know Python, JavaScript, etc.
  • more exciting snippets, demos, and educational materials. Take a look at scala-lang.org - there is a snippet with fruits
1 Like

@lihaoyi I quickly tried to implement the named tuple syntax for case class constructors. It was very easy to do. The original JSON definitions I posted now compile with meaningful static types. Here are some variations. So, yes, it looks like statically typed JSON is entirely doable with this.

I think I will file an amendment to the named tuple SIP to support this use case.

Expanding on this. Here’s one of the data values again:

val b1: BuildDescription = (
  declarationMap = true,
  esModuleInterop = true,
  baseUrl = ".",
  rootDir = "typescript",
  declaration = true,
  outDir = pubBundledOut,
  deps = [junitInterface, commonsIo],
  plugins  = [
    ( transform = "typescript-transform-paths" ),
    ( transform = "typescript-transform-paths",
      afterDeclarations = true
    )
  ],
  aliases = ["someValue", "some-value", "a value"],
  moduleResolution = "node",
  module = "CommonJS",
  target = "ES2020"
)

And here are the case classes defining the schema:

case class BuildDescription(
  declarationMap: Boolean = false,
  esModuleInterop: Boolean = true,
  baseUrl: String = ".",
  rootDir: String = "",
  declaration: Boolean = false,
  outDir: String = ".",
  deps: Seq[Dep] = [junitInterface, commonsIo],
  plugins: List[Plugin]  = [],
  aliases: IArray[String] = [],
  moduleResolution: String = "",
  module: String = "",
  target: String = "",
  other: String = ""
)

case class Plugin(
  transform: String,
  afterDeclarations: Boolean = false
)

The first case class contains various collections in its fields; let’s assume that’s done for efficiency considerations. But the actual value just uses [...] everywhere. This is as it should be! The person defining the schema will also write the code to process it and therefore will take care in choosing the right collection types. The person defining the data value should not care about this at all – all that matters is that some sequence of values is defined. So I would argue that in this case it’s actually a good thing that the type is not manifest in the value.

7 Likes

The only reason the Seqs stick out is because they are inlined. junitInterface, commonsIo and pubBundledOut are conveniently defined elsewhere, otherwise they would stick out as well.

Perhaps, before rushing into it, it makes sense to frame what problems this feature is going to solve. Here is my personal view, although all these points have been made by other people many times in the above conversation:

  • Can it simplify learning Scala?
    Unlikely, because
    a) students who begin learning their first language would be better off familiarizing themselves with the rich collection library that Scala offers and understanding the differences between Vector, List, Set, etc., rather than getting used to just [1, 2, 3]. I doubt that the latter is going to be helpful for them.
    b) students/learners who come from languages that have some kind of collection literals, like Python or JavaScript, can get bitten by the fact that literals like [], [1, "two", 3.45] and such are not what they think and work in a very different way under the hood. The initial excitement may wear off pretty quickly.
  • Can it help in real-world applications (like servers, bigdata pipelines, etc)?
    Nope, because in such applications most of the data is received from external sources. Besides, encouraging people to hardcode their data in production-grade apps is a bad idea in general.
  • Can it help with writing tests?
    A little. In some simple test scenarios it could indeed be a win. That said, tests for real-world applications usually contain so much test data that it is not a good idea to hardcode it either. There are better approaches available: either read it from external sources (files, in-memory DBs) or embrace property-based testing.
  • Scripting and build configurations (sbt, mill)?
    Yes, agreed, there will be quite a win. However, I guess that Scala code in this area accounts for less than 1% of all the Scala code written overall. Moreover, build scripts tend to be pretty “static”, i.e. they do not change that often, therefore there’s no real pressure for the feature from this camp.

I totally get the point that “Scala has to move forward to survive” in general. I just don’t feel convinced that this particular feature is the “move forward” for Scala.

8 Likes

I am not a fan of the magic-ness and special-ness in this. It makes things hard to understand and hard to explain. It does not follow from my other knowledge of Scala, and does not contribute to it. It makes the language bigger and therefore harder, not easier, to learn.

The familiarity of square bracket literals would be deceptive. Yes, at first approximation, collection literals would look the same as they would in other languages like JS and Python. But because Scala has many collection types, and the literals’ type is obscured, they would not work the same, for example:

val arr = [1, 2, 3]
def process(input: Array[Int]) = ???
process(arr) // does not work, because arr is a Seq, not an Array

The familiar collection literal has locked the user into incorrect expectations, and led the user astray. Now they have to find out why the “array literal syntax” (as they know it) does not give them an array. They have to learn about all the hard parts of collections anyway.

They’ll get another surprise when they learn that they can’t use arr[n] to get the n-th element, which is also something they would expect from a language that supports [1, 2, 3] literal syntax.
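
For comparison, here is a minimal sketch of the same scenario with today's explicit syntax. The body of process is invented for illustration (it just sums its input); only the signature comes from the snippet above.

```scala
// Today's spelling: the collection type is visible at the definition.
val arr = Seq(1, 2, 3)

// Illustrative body; the original only gives the signature.
def process(input: Array[Int]): Int = input.sum

// Indexing is method application (apply), written with parentheses, not arr[1].
val second = arr(1)

// Passing a Seq where an Array is required needs an explicit conversion.
val total = process(arr.toArray)
```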

I don’t think that Seq(1, 2, 3) or Vector(1, 2, 3) is overly verbose. It’s short enough. It’s also easy to read – it tells me exactly what it gives me. Most importantly, I know if the collection I’m creating is mutable or not. If I read code that provides [1, 2, 3] to some function, I wouldn’t even know if that function accepts it as an immutable Seq or as a mutable Buffer. A JS developer who has never seen an immutable array may be ok with this, but for Scala developers, the type-less literal syntax actually introduces confusion – not because of unfamiliarity, but because it may be unclear what’s going on anymore.

And of course, using the same square brackets for Maps completely defeats any remaining argument for familiarity. Neither JS, nor Python, nor JSON, use square brackets and -> arrows to create maps. They use curly braces. Of course we can’t do that in Scala because it conflicts with existing syntax, so I don’t understand why we need the [1, 2, 3] literal syntax when we can’t possibly meet the other related expectations.

Lastly, consider a new user who learned to use the new proposed syntax – a theoretical success case for this proposal. They will still absolutely have to learn the “normal” non-special way of spelling it, using Seq() and other constructors. If not to write, then at least to read such code. So again, it does not make the language simpler. It requires learning more things, not less.

I do not believe that writing or reading Seq() or Vector() is in any way an impediment to Scala ergonomics. Simplicity and consistency with the rest of Scala trump any superficial familiarity we could achieve with the square brackets here IMO.

22 Likes

Some people said they wanted a shorter way to write bunches of data in tests – if so, why not use e.g. val s = Seq (in your own code, not in Scala), then you could just do s(1, 2, 3) – that is only a single character longer than [1, 2, 3]. Yet, even the people who would supposedly benefit from this brevity aren’t doing it (at least I personally haven’t seen it), so it seems to me that this particular use case isn’t anywhere near important enough to be driving language decisions that affect everyone.
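
A minimal sketch of that alias trick, which works in current Scala because Seq is an ordinary companion object with an apply method (the value names are just for illustration):

```scala
// Alias the Seq companion object under a one-letter name, local to your own code.
val s = Seq

// s(...) calls Seq.apply, so these are plain immutable Seqs.
val small = s(1, 2, 3)
val nested = s(s(1, 0), s(0, 1))
```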

8 Likes

My two cents as a regular Scala user and teacher:

  • I’m neither for nor against it. I like the proposed syntax, but I can live without it.

  • What attracted me the most about the proposal initially was the idea of specifying a sequence type only at the definition site, not at the call site. So, I could write f([1,2,3]) and not care if f requires a List or an IndexedSeq or changes over time. But even that is of limited benefit. Once I start to write:

    val nums = [1,2,3]
    f(nums)
    

    the compiler won’t be able to infer a suitable type for nums based on the definition of f.

  • I don’t buy the argument that test code will benefit much because of the presence of many collection literals in the code. I agree with the previous comments that too many literals in test code is a sign they should be moved to resource files.

  • I also don’t buy the argument that it will be hard on beginners. They’ll write [1,2,3] and it will work most of the time. Already, beginners write List(1,2,3) without any concept of companion object and hidden apply method. Beginners get by with a superficial understanding of things.

  • Now, if the type magically changes by importing a given, that’s harder on beginners. To me, [1,2,3] should mean “I don’t care about the type, pick one that works”, and if I care, I’ll write List(1,2,3) or IndexedSeq(1,2,3). I don’t like the idea of writing [1,2,3] but with a well-chosen given in scope because I care. Either I really don’t care about the type, or I write it explicitly.
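
The val nums point above has a close analogue in current Scala, where the expected type already flows into an inline expression but not through an intermediate val. A minimal sketch, assuming a hypothetical function total that wants a List[Long]:

```scala
// Hypothetical consumer that requires a specific element type.
def total(xs: List[Long]): Long = xs.sum

// Inline: the expected type List[Long] reaches the literals, so 1, 2, 3 are typed as Long.
val direct = total(List(1, 2, 3))

// Bound to a val first: there is no expected type, so this is inferred as List[Int].
val nums = List(1, 2, 3)
// total(nums)  // does not compile: found List[Int], required List[Long]

// The caller now has to convert explicitly.
val viaVal = total(nums.map(_.toLong))
```

A target-typed [1,2,3] would presumably behave the same way: convenient inline, but fixed to some default type as soon as it is given a name without an annotation.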

6 Likes

Take named tuples, for example: while most people were cautious, they only disagreed about the exact implementation. Most people felt this is quite useful and adds to the language. We can now do things that were not possible before.

I don’t think named tuples really add much that you couldn’t do before. Defining a case class and giving things names has always been possible, named tuples just reduce friction in a common case where the name is meaningless, which is valuable because the only thing worse than no-name is meaningless-arbitrary-names. That’s not unlike collection literals reducing friction in common cases where the collection name Seq is meaningless and you just want to instantiate “whatever” collection.

People already use Seq all the time for “don’t care”, and the proposed syntax is a more powerful “don’t care”: apart from being more concise, and more standard across languages, it also fits into whatever target-typing exists so you can “don’t care” in many more scenarios where previously you would need to explicitly specify types such as Vector(Vector(1, 0), Vector(0, 1)).

It’s definitely a cost of the proposal. How often do people have problems with 1 :: 2 :: 3 :: Nil vs List(1, 2, 3) vs Seq(1, 2, 3).toList? Do people have problems with new Runnable{ def run() = ??? } vs () => ???? Or x => x.foo vs _.foo? All of them cause problems at various points in time; the question is whether the problems are worth the benefits or not. Clearly many people are opposed to it, but “two syntaxes” has never been a hard blocker, even if it’s a potential downside.
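
For reference, a minimal sketch of those existing “two syntaxes” pairs side by side. The counter and the use of String.length as a stand-in for foo are assumptions made for illustration:

```scala
// Three spellings of the same immutable list.
val a = 1 :: 2 :: 3 :: Nil
val b = List(1, 2, 3)
val c = Seq(1, 2, 3).toList

// Two spellings of a Runnable: anonymous class vs lambda (SAM conversion).
var runs = 0
val r1: Runnable = new Runnable { def run(): Unit = runs += 1 }
val r2: Runnable = () => runs += 1
r1.run()
r2.run()

// Two spellings of a one-argument function; length stands in for foo.
val f1: String => Int = x => x.length
val f2: String => Int = _.length
```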

2 Likes