Pre-SIP: A Syntax for Collection Literals

Here are some pretty simple syntactic rules that could work in both term and type position for tuples:

  1. A trailing comma is optional in tuples with two or more elements, such as (42, "hi",) or (Int, String,)
  2. If a tuple has zero or one element, the trailing comma is mandatory, such as (42,) or (Int,) or the empty tuple (,) (both rules are sketched below)
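
A minimal sketch of the two rules; the commented lines use the proposed (hypothetical) syntax, while the last two lines show what one has to write today instead:

// val pair  = (42, "hi",)   // rule 1: trailing comma optional with two or more elements
// val one   = (42,)         // rule 2: trailing comma mandatory with one element
// val empty = (,)           // rule 2: the empty tuple
val one: Tuple1[Int]  = Tuple1(42)
val empty: EmptyTuple = EmptyTuple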

To me, that feels very regular and easy to explain. (Rule number 1 is not needed for allowing single-element tuple syntax, but it provides regularity in line with the trailing commas already allowed elsewhere.)

What do you think?

4 Likes

The solution here is easy: don’t solve the problem for one element tuples

How often do people create literal one element tuples? Basically never. I have never done so once in 15 years writing a large part of the Scala ecosystem

How often do people create literal one element sequences? Very often. In the com-lihaoyi repos I see it many, many times in each file.

Why should we block an improvement for a common problem (constructing sequences) because we can’t also solve an ultra-rare problem nobody ever encounters (constructing one element tuples)? That whole argument is premised on a tenuous theoretical relationship between tuples and sequences, but in reality there seems to be no overlap in usage patterns, and we would do well to ignore the tuple-sequence correspondence as a red herring that doesn’t provide any useful insight.

3 Likes

But if we want tuple-based syntax then we do need to solve the problem, if I understand you correctly? Since you often construct single-element lists and want target typing to avoid repeating Seq and to skip the type detail in the expression, you need to be able to write

val xs: Seq[Int] = (42,)

a simple solution to the common problem of single-element lists, and it should work with named tuples as well…

2 Likes

For me, I think it’s best if we can rely on an explicit (a, b, c)*, just like the * used when passing regular sequences to varargs, but only in named application: x = (a, b, c)*.
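
For context, a minimal sketch of what * already means today when splatting a sequence at a varargs call site (f and ys are just illustrative names); the final comment shows the suggested literal analogue, which is not valid syntax today:

def f(xs: Int*): Int = xs.sum
val ys = Seq(1, 2, 3)
val n = f(ys*)   // splat an existing Seq into the varargs parameter
// suggested analogue (hypothetical): val zs: Seq[Int] = (1, 2, 3)*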

1 Like

Yes that’s right. And the solution here is to not use a tuple-based syntax and use square brackets instead!

Scala with square bracket sequences would have basically identical syntax to Python with square bracket sequences. Both use square brackets for types, both use parens for tuples. This overloading of square brackets isn’t a problem in practice for Python, and enough people write Python that we would definitely have heard complaints if there were issues.

Python has a lot of problems, but syntax generally isn’t one of them. People call it “executable pseudocode” because you can transcribe stuff off a whiteboard into Python and it just works. The way sequences are represented in pseudocode is with [] or {}, which is what all popular languages use. Nobody writes x = 123.tup or (,) on a whiteboard to represent a one element or zero element sequence. No programming language in the world uses that syntax for defining sequences.

One difference that has been brought up is that Python uses [] for lookup while Scala uses (), but that’s irrelevant: it’s not like Python dictionaries being defined with {} and looked up with [] causes confusion

Lots of people have suggested using tuple syntax for collections, and we’ve discussed it in great depth in this thread. The outcome certainly seems to be a detailed study of why using parens for sequences is incompatible with Scala’s syntax, or for that matter with the syntax of any language other than Lisps. We’ve re-discovered the reason why people don’t use () for collection literals in any other language.

3 Likes

This makes me think there might be a syntax ambiguity with Martin’s current case class literal proposal, due to parentheses’ role as expression delimiters:

var a: Int = scala.compiletime.uninitialized
val b = (a = 1) // not a (named) tuple

This is an argument in favor of my own proposal to also use square brackets for case class literals.

You mean, if we introduce bracket syntax in term position then we should go all in? Also for tuple terms and named tuple terms? Like in val t2: (Int, Int) = [42, 42]

Whatever we do, the one element tuple problem will be solved, as tuples are sequences:

(To me this seems like a terrible solution, but as you said, singleton tuples are very rare, so it doesn’t really matter)

The real question is:
What’s the better solution for sequence literals:

  1. Convert from tuples and fix the one issue they have
    • Or something tuple-like without the issue, like: (1)*
  2. Add completely new syntax which heavily resembles something completely different in this language (type parameters), without adding any of the features that often go alongside it in other languages (a[1], a[1..3], a[_ > 3])

But we do have a notion of “a sequence of things, but I don’t care how”: Seq
What’s wrong with Seq(a, b, c) other than “it’s not exactly [a, b, c]”?
As a counterpoint, when I write pseudocode, I usually write List(a, b, c) and not [a, b, c],
because in pseudocode I don’t even care that List is actually a linked list, which is bad for performance; I just mean a list of things!

1 Like

In the average data value, the word Seq will be duplicated a thousand times.

If you read my proposal carefully, it is to leave unnamed tuples alone and re-engineer named tuples as case class literals (or records).

Huh, really? I would always use (a, b, c) for a sequence. Maybe this is because my training was more in mathematics (or a difference in continental Europe much like 1,234 being slightly more than 1 instead of over a thousand (or one, and also 234)?).

[1,2] always reads to me “these are (mathematical) vector-like and I intend to use them with matrices which I will also write with square brackets which is why I’m not just using parentheses.”

Obviously having different delimiters is handy for parsing in computer languages, but blackboard algorithm work for me is usually in math notation because it is usually math, and in math I’ve seen and used it’s usually (a, b, c). And the elements wouldn’t be x(1) or x[1], but with a subscript.

Are we sure we’ve explored this idea thoroughly enough? What if they are type parameters? What if a bare [e] is typeof(inline () => e)?

So [2, Foo(7, 9), println("Hi")] is the type of something that can be converted at compile-time to 2, to Foo(7, 9), and to println("Hi"). What you do with that information can depend on context.

For assignment without an expected type, val x = [2, 3, 4], you can convert to a standard collection type, probably Array, for maximal familiarity.

For assignment with an expected type, you can marshal a macro that uses the type information to summon values inline.

For other type signatures, you can convert to the expected return type, e.g.

val x = xs.groupBy(_ < 2)
var y: (Int, [x], String) = (0, x, "foo")

where [x] is typeof(x) in this context.

You could set these as types to reuse them if needed, type T = [2, Foo(2, 9), println("x")]. And if you wanted these to be named, you could do it: [x = 2, f = Foo(2, 9), p = println("x")].

One would need a little care to make sure it was all self-consistent. For instance, one would need to decide whether singletons were just single types, or types-of-length-one; if the latter, [x]* would unpack into a single type.

But otherwise it essentially accomplishes the desired functionality, but with the generalization needed for it not to be a weird totally different thing but just a cool new way to use types. It’s basically the same idea as path-dependent types but with inline code functionality instead of a referent to a stable value; you’d have a reference to a stable type but allow dynamic creation of that type instead of just plugging in the fixed value.

2 Likes

In the average data value, the word Seq will be duplicated a thousand times.

If that is the case, then it may make sense to consider moving such huge data sets into an external file (or files), because if there are going to be thousands of collections, then with or without brackets they can turn code files into a total mess.

Imho, this is the whole point: for small-scale inlined collections, Seq, tuples, or brackets would all work just fine. Brackets would have a little advantage, but would come with all the caveats discussed above. For big-scale collections, however, none of them would work well. I would dare to argue that keeping a lot of data in code files is somewhat of an anti-pattern.

6 Likes

I was not able to make typeof(inline () => e) work (scastie); is it already available?

To my knowledge, types (even singleton types) are much less precise than that; for example println("Hi").type =:= ().type.

Even setting aside side effects, case classes do not have that natively:
Foo(1).type =:= Foo(2).type
You can cheat it with refinements however:
Foo{val x: 1} =/= Foo{val x: 2}
(Adding that would probably be worthwhile, but it would be in another change, in another SIP)
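
To make the refinement trick concrete, a minimal sketch in plain Scala 3 (nothing hypothetical here):

case class Foo(x: Int)

val ev1 = summon[Foo =:= Foo]                                  // fine: the plain types are equal
val ev2 = summon[(Foo { val x: 1 }) =:= (Foo { val x: 1 })]    // fine: identical refinements
// val ev3 = summon[(Foo { val x: 1 }) =:= (Foo { val x: 2 })] // does not compile: the refinements differ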

And it’s hard to know what it would mean for cases like the following:

class Bar():
  println("bar")

def foo[T]: Int = 0

foo[Bar()] // prints bar or doesn't? At runtime or compile-time?

Even if we added this, it’s not clear what going through “it’s all just types” adds.
Even on the clarity side, the explanation becomes:
“[1, 2, 3] is a type converted to a sequence value, where the elements are the values converted to types and then extracted by a macro”

The desire for it to be a type seems solely syntactic, and not semantic

Sorry, I wasn’t clear enough: my proposal is that this would add an extended typeof functionality, which, as far as I know, we don’t have.

What does it mean to have the type of a literal expression?

Well, it means that you know what the return type is. But it also means that you ought to be able to summon the code of the expression (hence it being an expression literal). compiletime.constValue[T] can convert the type T to an instance of the type T e.g. by copying the constant into the code, so compiletime.constExpr[=>T] should be able to convert the type =>T to an expression that returns type T; but you can also simply read off the type T.
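
For reference, here is the existing half of that mechanism in today's Scala: constValue is real, while constExpr is the hypothetical extension being described here:

import scala.compiletime.constValue

def three: 3 = constValue[3]   // turns the singleton type 3 back into the value 3
val n: Int = three             // n == 3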

So foo[Bar()] would be foo[() => { Bar() }: Bar], but we don’t materialize the type anywhere so the only thing to do is to reduce it to its return type Bar. Therefore, foo[Bar()] would be an unnecessarily long way to write foo[Bar]. Because of that, I doubt that it would make sense to even allow it to be written like that with bare expressions inside generic type blocks; I’d suggest that the literal expression type version would have to go inside [] not in the normal type ascription place, and use broadcasting with * to get out. So I would write it as foo[[Bar()]*] which I think would make it plenty clear that at the very least we’d better read some documentation to understand why this weird construct is being used.

The point wouldn’t be to write types weirdly, though. The point would be to express types relatively, which right now isn’t that easy. For example:

class Bar[A, B](val a: A, val b: B) {}
val bar = Bar("hi", 7)
var c: Option[[bar.a]*] = None
// More code that relies upon [bar.a]* being String

would be nice.

Now, it’s entirely possible that a typeof could be added that didn’t have these properties. But it seems to me that it’s very much a two-birds-one-stone situation: it matches the desired syntax, it solves a missing need, and it makes regular the efficient construction of types or the expressions that produce the corresponding values. It also fixes cases where we presently need match types for which we can’t actually quite tell that the match type result is the original result, so we have to live in match-type-land and wait for the compiler to resolve everything properly (or not).
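
A small illustration of that match-type friction with today's features (Elem and first are just illustrative names):

type Elem[T <: Tuple] = T match
  case h *: t => h

// For an abstract T the compiler cannot reduce Elem[T], so we end up casting and
// waiting for T to become concrete at the call site before everything lines up.
def first[T <: NonEmptyTuple](t: T): Elem[T] =
  t.productElement(0).asInstanceOf[Elem[T]]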

For instance, I would envision being able to write code like this:

inline def arr_into[T <: TypeExprLiteral, U >: Widen[T]](a: Array[U], index: Int): Unit =
  inline T match
    case [] =>
    case [X, Y*] =>
      a(index) = constExpr[X]
      arr_into[Y, U](a, index+1)

inline def arr[T <: TypeExprLiteral]: Array[Widen[T]] =
  val a = new Array[Widen[T]](constValue[Size[T]])
  arr_into[T, Widen[T]](a, 0)
  a

val xs = arr[Some(3), None, None, Some(7)]
/* Syntactic sugar for:
val xs = new Array[Option[Int]](4)
xs(0) = Some(3)
xs(1) = None
xs(2) = None
xs(3) = Some(7)
*/

where Widen[T] for [=>A1, =>A2, =>A3, ..., => An] is the usual operation to find A when calling def f[A](args: A*) = ??? with f(_: A1, _: A2, _: A3, ..., _: An), and Size[T] is simply the compile-time value of n.
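
For ordinary tuples, today's standard library already offers rough analogues of those two operators, which is the spirit the sketch above relies on (Widen and Size themselves remain hypothetical names):

import scala.compiletime.constValue

type Elems = (Option[Int], Option[Int], String)
val size: Int = constValue[Tuple.Size[Elems]]   // 3, the compile-time length
val widened: Tuple.Union[Elems] = Some(3)       // Option[Int] | String, the union of the element types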

1 Like

Lots of people have said that collection literals should be moved to a separate file. This works for some cases, when the data is large and standalone, but not for the vast majority of them, which are small and deeply integrated into the surrounding context. To see for yourself, run the following bash command in your repo:

git ls-files | grep \\.scala | grep -v test | xargs cat | grep -Ei '[^a-zA-Z](Seq|Vector|List)\('  

Here is some example output from the Dotty codebase:

lihaoyi scala3$ git ls-files | grep \\.scala | grep -v test | grep -v integration | xargs cat | grep -E '[^a-zA-Z](Seq|Vector|List)\('  | tail -n30
      case inl :: Nil => Some(Body(Seq(Paragraph(inl))))
      case inls => Some(Body(Seq(Paragraph(Chain(inls)))))
          if (checkParaEnded()) List(s) else List(s, getInline(isInlineEnd = false))
              iss ++= List(Text(endOfLine.toString), i2)
      Chain(Seq(Text("^"), i))
        Chain(List(i, Text(".")))
          Seq(m.withOrigin(Origin.ExtensionFrom(source.name, source.dri)).withKind(kind))
      val expandedMembers = c.members.map(expandMember(outerMembers ++ Seq(c)))
    st.flatMap(s => Vector(s -> ltt) ++ getEdges(s, subtypes))
      List(),
    prefix: Signature = List(Plain("")),
    suffix: Signature = List(Plain("")),
    separator: Signature = List(Plain(", ")),
      list(params, List(Plain("(")), List(Plain(")")), List(Plain(", "))){ (bdr, param) => bdr.buildAnnotationParameter(param)}
    val all = prefixMods.map(_.name) ++ Seq(t.visibility.asSignature) ++ suffixMods.map(_.name)
    this.list(paramss, separator = List(Plain(""))) {
    this.list(params.parameters, prefix = List(Plain("("), Keyword(params.modifiers)), suffix = List(Plain(")")), forcePrefixAndSuffix = true) { (bld, p) =>
  def typeParamList(on: TypeParameterList) = list(on.toList, List(Plain("[")), List(Plain("]"))){ (bdr, e) =>
    this.list(paramss, separator = List(Plain(""))) { (bld, pList) => bld.termParamList(pList) }
    inspectAllTastyFiles(Nil, List(jar), Nil)(inspector)
        List(new ReadTasty) :: // Load classes from tasty
        List(new TastyInspectorPhase) ::  // Perform a callback for each compilation unit
    List(List(new QuotedFrontend))
    List(new Inlining) ::
    List(new Staging) ::
    List(new Splicing) ::
    List(new PickleQuotes) ::
    inspectAllTastyFiles(Nil, List(jar), Nil)(inspector)
        List(new ReadTasty) :: // Load classes from tasty
        List(new TastyInspectorPhase) ::  // Perform a callback for each compilation unit

Some things worth noting here:

  1. People construct collection literals all the time! It isn’t just some quirk of test code or com-lihaoyi code.

  2. Most of the time, the collection literal appears in a place where there is already a target type! So stating the type again when constructing it is redundant

  3. These collection literals tend to be small; probably the most common sizes for them are 0, 1, and 2, although other sizes do exist. Thus the “let’s provide an alternative, worse syntax for 0- and 1-element collections” approach defeats the purpose entirely.

  4. Most of these small collections do directly reference things from the enclosing scope! They cannot be simply moved to external data files.

Similar conclusions can be drawn from running this on Ammonite:

lihaoyi Ammonite$ git ls-files | grep \\.scala | grep -v test | grep -v integration | xargs cat | grep -E '[^a-zA-Z](Seq|Vector|List)\('  | tail -n30
        @val bugs = Seq("6302", "8971", "9249", "4438", "8603", "6660", "7953", "6659", "8456", "1067", "8307", "9335")
    @hl.ref(ammoniteTests/"BuiltinTests.scala", Seq("basicConfig", "@"))
      @hl.ref(ammoniteTests/"BuiltinTests.scala", Seq("settings", "@"))
      Seq(
      Seq("bash", "-i"),
        @hl.ref(scriptTests, Seq("loadIvyAdvanced", "@"), "\"\"\"")
        start = Seq("specifyMain", "\"\"\"", ""),
        start = Seq("specifyMainDoc", "\"\"\"", ""),
        case x => Seq(x)
  val NewLine = Seq("\n", "\r")
  val Up = Seq(DefaultUp, WeirdUp)
  val Down = Seq(DefaultDown, WeirdDown)
  val Right = Seq(DefaultRight, WeirdRight)
  val Left = Seq(DefaultLeft, WeirdLeft)
    Seq("sh", "-c", s"$pathedTput $s 2> /dev/tty").!!.trim.toInt
    Seq("sh", "-c", s"$pathedStty $s < /dev/tty"): ProcessBuilder
    s"LazyList(${(rec(this, Nil).reverse ++ Seq("...")).mkString(",")})"
  implicit def stringPrefix(s: String): Strings = Strings(Seq(s))
    wrap(ti => ti.ts.inputs.dropPrefix(Seq(-1)).map(_ => Exit))
        for ((Seq(l, r), i) <- frags) yield {
            val Seq(min, max) = Seq(mark.get, cursor).sorted
          val Seq(min, max) = Seq(cursor, mark).sorted
    up(Vector(), c)
        (Some(start), Vector(), msg, 0)
      up(Vector(), c)
    searchHistory(historyIndex.max(0), 1, b :+ char, Vector())
    searchHistory(historyIndex, 1, b, Vector())
          else if (searchTerm.exists(_.isEmpty)) Vector()
          case (Vector(), Vector('\n', allAfterNewline @ _*)) =>
            append(Vector('\n'))

or Metals

lihaoyi metals$ git ls-files | grep \\.scala | grep -v test | grep -v integration | xargs cat | grep -E '[^a-zA-Z](Seq|Vector|List)\('  | tail -n30 
    List(
   * Consume token stream like "a.b.c" and return List(a, b, c)
            List(UnresolvedOverriddenSymbol(rhsName))
          List(ResolvedOverriddenSymbol(region.owner))
    val indices = text.findIndicesOf(List(c))
        s.TextDocuments(List(document))
    List(
        file <- List(
  val metalsDevs = List(
              Seq(a, b).join
        List(
        List(
      List(out)
            Seq(
  val testGroups = List(
  val eclipseJdt = Seq(
  def deprecatedScala2Versions = Seq(
  def nonDeprecatedScala2Versions = Seq(
    Seq(scala3, "3.3.1") ++ scala3RC.toSeq
    Seq(
    List(
  val tasks = Seq(
      if (isScala3.value || !requiresSemanticdb.value) Seq()
        Seq(
        val configurations = Seq(
        List(
        val compilerPackages = List(
          List(s"-J--add-exports", s"-Jjdk.compiler/$pkg=ALL-UNNAMED")
        Seq(javaHome / "release", javaHome.getParentFile / "release")
        Seq(javaHome / "jre" / "lib" / "rt.jar", javaHome / "lib" / "rt.jar")

Or STTP

lihaoyi sttp$ git ls-files | grep \\.scala | grep -v test | grep -v integration | xargs cat | grep -E '[^a-zA-Z](Seq|Vector|List)\('  | tail -n30
    Seq(socksWithAuth, socks, httpWithAuth, http, httpsWithAuth, https).find(_.isDefined).flatten
      Vector(),
      List(digestOut, realmOut, uriOut, nonceOut, qopOut, challengeOut, cnonceOut, nc, algorithmOut, opaqueOut).flatten
    val params = List(
  private var state: Either[List[Promise[T]], List[T]] = Right(List())
    case Left(Nil)    => state = Right(List(t))
        state = Left(List(p))
      val allHeaders = List(contentDisposition) ++ otherHeaders
  private var multiPartHeaders: Seq[CurlList] = Seq()
          headers = List(Header.contentLength(file.size)),
    Seq(array: _*)
    CurlCode(CCurl.setopt(handle, option.id, toCVarArgList(Seq(parameter))))
          .consumeWith(writeAsync(file.toPath, Seq(StandardOpenOption.WRITE, StandardOpenOption.CREATE)))
          Seq(tf.copy(finalFragment = false), tf.copy(payload = ""))
          Seq(tf)
          Seq(tf.copy(finalFragment = false), tf.copy(payload = ""))
          Seq(tf)
            List(WebSocketFrame.text(s"response to: $payload"))
    val allHeaders = List(contentDisposition) ++ otherHeaders
    val allHeaders = List(contentDisposition) ++ otherHeaders
  val DefaultBuckets: List[Double] = List(.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10)

In these examples, the List() or Seq() or Vector() type names mostly fall into two categories:

  1. The target type is known. In this case you need to write out the collection type, but there’s only one right answer! So there’s only downside in making the wrong choice (you get a compile error) and no upside in being able to do it better (because the target type is fixed!).

  2. Nobody cares about the collection type - they just want some collection - and the collection is small and the code path sufficiently cold that performance doesn’t matter

In both cases this is pure boilerplate: either you are writing some collection name just to match a target type you had no choice in, or you are writing a Seq() when you don’t care about the collection type at all.
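
As a concrete sketch of the kind of call site the grep output shows (list, params, prefix, suffix are illustrative names; the bracketed line is the hypothetical literal syntax under discussion, not something that compiles today):

def list(params: Seq[String], prefix: List[String], suffix: List[String]): String = ???

// Today: the Seq/List names merely restate what the parameter types already fix.
val rendered = list(params = Seq("a", "b"), prefix = List("("), suffix = List(")"))
// With target-typed literals (hypothetical): list(params = ["a", "b"], prefix = ["("], suffix = [")"])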

This isn’t an acute problem. No program is un-writable because of it. But it’s low-grade boilerplate that occurs in every Scala application, and it is entirely unsolvable at the moment.

5 Likes

I think this is a weak argument, or in fact an argument against. To me there does not seem to be a problem at all. Just having a few Seqs here and there is not really boilerplate. Also these grepped examples are not similar to the initial motivation for this SIP (and the previous one like it), which was about lots of bulk / nested data, like matrices…

As for the use case of “we just temporarily need a collection-like thing with 0,1,2 things in it”, I don’t see any issue with Seq. What’s with the obsession to remove just three characters?

Maybe there is a deeper problem here; it’s Scala’s identity / popularity problem. (Of course we can come up with any number of post-hoc rationalizations and technical reasons to back it up “why it’s a good idea regardless”.) Why didn’t people have an issue with writing Seq for the last 20 years? Somehow if we make it more like other popular languages (such as Python), we believe these issues will improve? I believe these problems are unsolvable (in fact not-understandable), because in the social sciences 50% of results are not reproducible, so we have no idea why they happen. Horrible languages (and humans!) can get very popular, wonderful ones can go unnoticed…

Sorry for repeating myself: Scala should stick to its guns.

15 Likes

IIUC Martin’s concept of data value, it is meant to play the role of XML/JSON. At my job, in our code base we have huge chunks of XML that must be validated independently (against XSD) and loaded at runtime by the application (which is not in Scala or even in Java but that’s another story). It would be very convenient to have these as code files in the application language.

Perhaps there could be some exceptional cases where it is indeed convenient, and your case could be one of them, of course. But to be honest, in my personal experience in the industry, as a rule of thumb, that is usually not the case.

Well, build managers like SBT are one example of such an exception, but in my opinion that doesn’t justify these language changes.

Moreover (again according to my experience), long-term projects usually evolve in the opposite direction. They may start off describing some data in code files, but over time, as the data keeps piling up, the code files become hard to manage, so at some point the team has to take the difficult decision to allocate time to refactor the code and extract the data out of the code files. And that is not specific to Scala; it happens regardless of the PL used. Again, there can be exceptions, of course, but usually it goes like that.

3 Likes

Well, usually data lives … in databases. The case where you put it in your code base instead, as either JSON/XML or (my argument) your preferred PL, is when the data is read-only. In my case it’s the configuration of a metro line, which arguably does not change that often. I suppose this kind of situation is not uncommon in a lot of other domains.

The question of the separation of the data from the business logic that you mention is different and independent of the data encoding. But it is indeed important.

About the only thing I actually miss from having to work with Python is slice notation.

Having syntax for slice notation that desugars [1, 2, 3][::-1] into something like List(1,2,3).apply(Slice.End, Slice.Start, -1) as a side effect of this whole thing is something I wouldn’t be mad about.

Especially if untyped collection literals desugared into tuples: that seems a very reasonable choice because it doesn’t lose type information to widening the way defaulting to Seq would, doesn’t invoke a varargs constructor, and would provide syntax for zero- and one-element tuples.

val a: List[Int] = [1,2,3]
val b = [1,2,3] // val b: Tuple3[Int,Int,Int] = (1,2,3)
val c = []      // val c: EmptyTuple          = EmptyTuple
val d = [1]     // val d: Tuple1[Int]         = Tuple1(1)

// val e: Tuple2[Char, String] = 
//   (1,'c', 2, "d", 3).apply(
//      start = Slice.FromStart(1), 
//      end = Slice.FromEnd(1),
//      by = 2
//   )
val e = [1,'c', 2, "d", 3][1:-1:2]

1 Like