Pre-SIP: A Syntax for Collection Literals

Following up on my example with a generic col method:

def col[T, C](values: T*)(using f: Factory[T, C]): C =
  f.newBuilder.addAll(values).result()

Maybe there is a way to enhance the inference machinery to address its limitations (inability to express nested collections) instead of creating entirely new syntax?

The [1,2,3] syntax proposed by Martin does not suffer from the nesting problem, and he claims the implementation is straightforward, so maybe we could just do something similar for the col method?
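For reference, a minimal sketch of the limitation under discussion, assuming the col definition above. The flat case works because the expected type drives the Factory search, but nothing comparable reaches the inner calls when you nest:

```scala
import scala.collection.Factory

def col[T, C](values: T*)(using f: Factory[T, C]): C =
  f.newBuilder.addAll(values).result()

// Flat case: the expected type Vector[Int] fixes C, and T follows.
val xs: Vector[Int] = col(1, 2, 3)

// Nested case: the inner calls are typed before the outer expected type
// can constrain them, so their Factory instances are not pinned down:
// val ys: Vector[Vector[Int]] = col(col(1, 2), col(3, 4)) // does not infer as hoped
```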

Even if it had to be special-cased by the compiler, we could avoid creating an entirely new syntax and making the language more complex, at least per the criterion of grammar size which Martin likes to invoke.

I always very much liked that Scala does not have a special syntax for collection literals.

4 Likes

As I have pointed out earlier, @JD557 has already demonstrated how this can be done without new syntax, including nesting.

Unfortunately, the proponents of this feature have yet to explain why this isn’t good enough.

Incidentally, I think this demonstrates why we need a better tool than a simple forum to discuss these issues. It is simply too easy for points to be buried in these long form threads without structure.

6 Likes

Then why wouldn’t we put effort into making this standard and non-fragile? Why not just use String*?

If it’s because def foo(a: A*, b: B = defaultB) doesn’t work, why isn’t the SIP to make that work? That’s an obvious win anyway–there’s no syntactic ambiguity; you just can’t reach b unless you name it.
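To make the shape concrete (mkPath and sep are invented for illustration): the single-clause form with a default after varargs is rejected today, while splitting into two parameter lists already works; the suggested SIP would simply allow the single-clause form, with the trailing parameter reachable only by name:

```scala
// Rejected today: a repeated parameter must be last in its clause.
// def mkPath(parts: String*, sep: String = "/") // compile error

// Works today with a separate parameter list:
def mkPath(parts: String*)(sep: String = "/"): String =
  parts.mkString(sep)

val p = mkPath("usr", "local", "bin")() // uses the default separator
val q = mkPath("a", "b")(sep = ".")     // overrides it by name
```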

4 Likes

Okay, so the feature isn’t that named tuples have any relationship with case classes, but rather that the same literal notation that can be used for named tuples can also be used for case classes. That definitely makes the syntax change pull its weight better. (But I still think it leaves named tuples as not-shining-as-much-as-they-should.)

However, I note an asymmetry between the proposals as they stand.

val x = (speed = 14, destination = "Paris")
val y = [1, 3, 6, 10, 15]

In the first case there is no doubt that x is a named tuple–you just look at the code and see that that’s what it is. (Possible exception: if we have an abstract class where val x: Trip is defined, will val x be a named tuple or a Trip?)

But in the proposal under consideration, val y is determined by whatever set of givens happens to be in scope. In particular, we can’t assume it’s a Seq[Int] or IArray[Int] or any other natural type; it could be a bitset for all we know.

I think it’s worth revisiting this part of the proposal (as I suggested before). As far as I know, no language allows distant import statements to determine the type of collection literals; it’s strictly an override when the type is known (which is why C# for instance allows partial type annotations). I think you’re right that:

However, the person may end up needing to care after all what the type is–for instance, without H-M type inference, they may wish to factor out the collection creation, and then they need to know what type it is.

In the JSON example, in addition to the compiler yelling at them if they get it wrong, they can also just look at the schema, which clearly says IArray[String] and List[Plugin]. But in your original example, there was no good way to figure out the types.

I think that although you want to enable people to not care about the collection type, you also want to empower them to figure out the type without too much struggle.

If you merely had the equivalent of implicit conversions activated by default whenever [...] was specified, then we’d be on the same kind of ground that we usually are with on-by-default implicit conversions and numeric widening and so on.

But if [...] doesn’t even have that, but rather acts like (...).into where into changes types based on givens and type inference, that’s pretty weird. You can imagine an into method that creates a named tuple from a larger named tuple (same types, but more names/types that are unused). But

val x = (first = "John", last = "Doe", age = 42).into

would just be weird. Who knows what is being picked out and why? Hiding the operation name by using [ ] symbols instead of a name ( ).into doesn’t make the weirdness any less; it just alerts you less and thus is all the more perplexing when something non-obvious is going on.

“Don’t use Scala–we added an innocent import statement to get our config right, and then our production system slowed down to 1/n speed because our Vectors silently got converted to List” sounds like just the sort of Reddit post that would happen about six months after someone actually started using the feature seriously.

(The post would be wrong, of course. “Don’t use collection literals” is the appropriate reaction to that kind of unpleasant outcome–but that also is an argument against giving too much power to givens.)

2 Likes

But Scala already does have collection literals:

val c = (1, 2, 3)

creates a collection of Ints, initially packed into a Tuple (which is also a collection, out of the box), which can then be converted into any other collection, either explicitly (c.toList) or implicitly, if necessary, based on the target type:

val c: Seq[Int] = (1, 2, 3) // easy to support without syntax changes
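For instance, a sketch of how such a conversion could look, with one invented given per arity (a fully generic version would need match-type machinery over Tuple):

```scala
import scala.language.implicitConversions

// Hypothetical: convert a homogeneous triple to a Seq when one is expected.
given tripleToSeq[A]: Conversion[(A, A, A), Seq[A]] =
  t => Seq(t._1, t._2, t._3)

val c: Seq[Int] = (1, 2, 3) // converted by the given above
```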

From that perspective, what is the difference between these two then?

val c = (1, 2, 3) // already available, usable, clear, familiar
val d = [1, 2, 3] // requires additional syntax, not aligned with any other syntax in Scala

What makes d better than c, basically?

Frankly speaking, to me this discussion looks more like: “even though Scala has collection literals via tuple literals already, we refuse to recognize it and want to get brackets anyway just because most other languages use brackets for that”.

10 Likes

You could make it work if qux itself took repeated parameters qux(col(1,2), col(3,4)). It solves the case of not knowing the required target type, though I think repeated parameters just become a Seq? So this isn’t really much different from qux(Seq(Seq(1,2), Seq(3,4))) – the implementation still has to convert a Seq into the desired collection type. So, we’re back at square one.

He did mention that #(a, b, c) or (a, b, c)* were alternatives. This kinda just feels like the same thing, no less complex, except now it’s masquerading as an impossible method instead of a literal. Making a special case for a method is really the same thing as creating new syntax.

That said, your post made me ponder the possibilities of repeated repeated parameters (or repeated parameter lists?), or the case where an inline repeated parameter list is basically treated as a collection literal that can be converted to the appropriate collection type. These ideas fail because they require the callee to do something special to provide this convenience to the caller. So, I think that puts us back at Martin’s proposal; the caller needs new syntax.

I don’t think the API from @JD557 solves the nesting problem. The examples are just picked in a way that matches the priority of implicits.

This one does not compile:

val e: List[(Int, Vector[Int])] = &(1 -> &(1, 2, 3))

The inner & is inferred to be a List - this is driven purely by the fact that the typeclass instance for List has priority over the one for Vector.

The root of the problem here is the fact that type information during inference in Scala flows mostly in one direction: bottom-up. Inference of type parameters based on expected return type is very limited, and (as far as I know) can “look” only one level up.

2 Likes

Thanks @ghik, that finally answers the question of why a library can’t solve the problem.

Every single usage of repeated parameters in Scala creates an intermediate JVM Array - this is unavoidable. Even if you write Seq(1,2,3), an Array[Int] gets created first, and only then is it converted to a List (the “default” implementation of Seq). So, I don’t think switching between repeated vs collection parameters makes any difference.

If qux takes a repeated parameter, then it can no longer decide the expected collection type - it must convert the ArraySeq (the runtime representation of repeated parameters) into it - which defeats the purpose of declaring anything more specific.

The root of the “nesting” problem is mostly one-directional flow of information in type inference (bottom-up). Scala cannot infer a type parameter based on expected return type, except when it’s just one level “up”.

1 Like

It’s sounding like it would be surprising for the following to be invalid:

val box: Rect = ((5, 12), (25, 192))

But I do like that this is approaching something universal…

Imagine that any literal tuple can be interpreted via some constructor, whether case class or collection. This allows any value to be written using nested inline tuples, essentially using only literals and parentheses, following the generic AST of the desired value.

val boxes: List[Rect] = ( // omits constructor `List`
  ((0, 0), (10, 20)),     // omits `Rect` and `Point`
  ((5, 15), (25, 192)))

The biggest issue with this idea is that these look too much like tuples.
What if we removed the commas to distinguish them?

val boxes: List[Rect] =
  (((0 0) (10 20))
   ((5 15) (25 192)))

It cannot be solved at the library level with current type inference.

I’m just wondering: if it was so easy to add the [] syntax with desired type inference, why can’t we solve it without introducing new syntax?

A random thought: maybe we can hint the compiler that a type parameter must be inferred based on return type, and not based on parameters? For example:

def col[T, C@inferFromExpectedReturnType](values: T*)(using Factory[T, C]): C

I know that it’s probably much more difficult than it seems to me now, but maybe the real solution is somewhere in this area?

2 Likes

scala> def foo(ns: Int*) = ns
def foo(ns: Int*): Seq[Int]

scala> foo(3,4,5,6)
val res10: Seq[Int] = ArraySeq(3, 4, 5, 6)

Yes, the Seq is an ArraySeq, but that doesn’t change my point: there’s a concrete collection that needs to be converted to the desired type. col can’t do the work of adapting it to an arbitrary type for a caller unless it’s special-cased by the compiler. And even if it did, there’s a concrete intermediate collection which we could only avoid instantiating by special-casing in the compiler. So we’re back at square one with the original proposal.

1 Like

I mostly agree, but I am hoping that we can find a way to amend the type inference algorithm in a way that avoids the necessity of a new syntax, or even special casing a method in the compiler.

We could avoid that by making col a macro - no need to special-case it in the compiler just to avoid the intermediate collection.

One more comparison to Python (which is one of the most popular and ubiquitous languages, so it is fair to compare to). In Python both are possible:

x = (1, 2, 3)
y = [1, 2, 3]

The difference is basically that x is immutable whereas y is mutable. Not sure it is the difference that Scala needs.

Also, in Python this one is possible just as well:

x = (1, "two", True)
y = [1, "two", True]

However, if Scala gets bracket-based literals, then we’ll end up with the following dichotomy:

val x = (1, "two", true) // all types are captured: (Int, String, Boolean)
val y = [1, "two", true] // what is this? Seq[Any] ?

4 Likes

So, the whole thread points to the fact that this feature is, at the very least, controversial.

It’s absolutely certain it will lead to more fragmentation, as sure as xkcd 927.

It’s also almost as certain it will happen, at least if the history of recent years is any guide regarding syntax changes that mimic Python and are supported by Martin.

There is no doubt it will have an impact on tooling.
And there is a need for ecosystem impact analysis.

Perhaps a good way forward would be to have the feature live only in nightly builds (if that’s still how proposed evolutions are handled) for some releases, and use that time to have the proponents of the feature add support for it across the whole tooling chain, so that the tooling is already up to date when the feature lands in a release.

That would be an amazing shift, and a real test of the maturity of language development and tooling working together, rather than having the latter forever chasing the former.

4 Likes

From my perspective, this is the core issue here that has to be addressed first. Currently, the parentheses-based literals are ambiguous and cannot be used universally:

val a = (1, "2") // tuple
val b = (1) // not a tuple
val c = () // nope!

and on the type level too:

type A = (Int, String) // tuple
type B = (Int) // not a tuple
type C = () // nope!

Personally, that issue bugs me far more than the lack of special syntax for collection literals (to be honest, the latter doesn’t bug me at all).

How does this relate to collection literals? Simple: if Scala manages to fix the syntax for tuples and make it work for arities starting at 0, then we’ll get full-featured collection literals automatically:

type A = ?(Int, String) // Int *: String *: EmptyTuple
type B = ?(Int) // Int *: EmptyTuple
type C = ?() // EmptyTuple

val a = ?(1, "2") // Int *: String *: EmptyTuple
val b = ?(1) // Int *: EmptyTuple
val c = ?() // EmptyTuple

where ?( ... ) assumes any approach that would work here, I don’t mean the question mark specifically.

The important condition though to make it useful: it should work universally for both types and literals and for all arities starting with 0.

Otherwise, piling the [] syntax on top of the existing one would look like Scala leaving one controversy unresolved while introducing yet another:

val a1 = Tuple() // works but sticks out
val a2 = [] // ok, but doesn't align with other Scala syntax
val b1 = Tuple(1) // works but sticks out
val b2 = [1] // ok, but doesn't align with other Scala syntax
val c1 = (1, 2) // ok
val c2 = [1, 2] // ok, but doesn't align with other Scala syntax
val d1 = (1, "two") // just fine too
val d2 = [1, "two"] // don't do this!

Why should there be so many different ways in Scala to express similar things, with so many caveats and catches for particular cases?


Not to mention that if we need to create a collection of a particular type, not just an arbitrary sequence of items, nothing beats the direct syntax Vector(1, 2, 3).

7 Likes

Yes, that would be surprising. But if it worked, that would also be quite bad. Is the encoding ((x0, nx), (y0, ny))? Is it ((cx, cy), (wx, wy))? Is it ((x0, y0), (xN, yN))? Or ((x0, xN), (y0, yN))?

With the status quo, these would be something like Rect(Span(x0, nx), Span(y0, ny)), Rect(Point(cx, cy), Size(wx, wy)), Rect(Point(x0, y0), Point(xN, yN)), and Rect(Span(x0, xN), Span(y0, yN)) respectively–pretty good, though the first and last aren’t distinguished.

With the named-tuple-literal-is-also-case-class-literal proposal from Martin, though, the safety is strictly better. You’d have something like (xs = (start = 5, len = 7), ys = (start = 12, len = 180)), (c = (x = 8, y = 102), w = (wx = 7, wy = 180)), (ul = (x = 5, y = 12), br = (x = 25, y = 192)), or (xs = (start = 5, stop = 25), ys = (start = 12, stop = 192)). That’s brilliant. Every single one is clear–as long as the case classes don’t have really poorly-chosen field names.

So I’m actually arguing that ((5, 12), (25, 192)) is just bad. It’s very natural, but bad.

Because Martin’s proposal is actually a case class literal syntax proposal (not a named tuple conversion proposal), where case class literals can use named tuple literal syntax as long as the expected type is a case class, we don’t have the danger.

So I think this is pretty awesome. Explained properly, you get (almost) all the safety you could want, and don’t have to have any superfluous class names when everything is known.

It is specifically all the information provided by the field names that makes this work.

4 Likes

I just want to point out that in the time I’ve used R and Python and Rust and Scala all for vaguely similar things, the only time I felt a literal syntax was clearly far better was for numeric data. And it’s not that Scala was the outlier; Scala and R were the outliers. [[1, 3], [5, 7]] is clearly a 2x2 matrix. c(c(1, 3), c(5, 7)) is much much less clearly a 2x2 matrix.

Every other wish for slightly shorter collection syntax doesn’t seem to me more than a mild benefit. You can do it all with an extra character; what’s the big deal?

Well, the big deal is, specifically with mathematical vectors and matrices, that anything that interrupts the visual flow gets in the way of visual verification that you have what you want. Matlab and Julia have [1 3; 5 7] which is even better in this regard for just numbers, but kind of falls apart once you have expressions (the commas and extra [] really help then).

Scala can easily match R. We could literally define a c that did the same thing, in a library, if we wanted. We could make it postfix, if we preferred, little-to-no compiler needed, and basically no chance of collision with almost anything anyone uses now: ((1, 3).c, (5, 7).c).c.
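Concretely, such a postfix c could be sketched as an extension method, here for pairs only (a real library would cover more arities; the name c is just mimicking R):

```scala
// Postfix `c` on a homogeneous pair, building a Vector of its elements.
extension [A](p: (A, A))
  def c: Vector[A] = Vector(p._1, p._2)

// Nests naturally: the outer pair has type (Vector[Int], Vector[Int]).
val m = ((1, 3).c, (5, 7).c).c // Vector(Vector(1, 3), Vector(5, 7))
```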

Scala cannot easily match Rust and Python and Julia. Even ((1, 3), (5, 7)) isn’t as nice as [[1, 3], [5, 7]] because the sharp corners help visually delimit the relevant items.

But, again, this really does not matter unless you are dealing with piles and piles of mathematical vectors and matrices all the time.

If you are dealing with piles and piles of mathematical vectors and matrices all the time, the fact that Scala doesn’t have clean slicing notation is even worse than the collection awkwardness. I’ve already mostly fixed that in extension methods I wrote for Array–you can do stuff like xs(1 to End-2) = 7 and it works. Not everything works, because Scala isn’t quite syntactically flexible enough, but it’s close enough so that it’s not a pain point any longer.
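A simplified sketch of that kind of range-based update, without the End sentinel from the post: xs(1 to 3) = 7 desugars to xs.update(1 to 3, 7), which resolves to the extension below because Array’s own update takes a single Int index:

```scala
// Assign one value to every index in a Range.
extension (xs: Array[Int])
  def update(r: Range, value: Int): Unit =
    r.foreach(i => xs(i) = value)

val xs = Array(0, 1, 2, 3, 4, 5)
xs(1 to 3) = 7 // xs is now Array(0, 7, 7, 7, 4, 5)
```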

Anyway, I think that if any proposal for Scala made it by-default really easy for piles of numbers, that would be a substantial win for that use case. People don’t really use Scala much for that use case, so maybe that’s not the target.

But I did want to push back against the idea that c('petal', 'sepal') is way better than Seq("petal", "sepal") but c(c(1, 3), c(5, 7)) and [[1, 3], [5, 7]] are two birds of a feather. At least to me–and this is after many hundreds of hours of use–I’m indifferent to the distinction between the first pair, but the second pair is a big enough deal to make me switch languages unless there’s some other compelling reason not to (which there usually is).

4 Likes

@odersky I would propose using (...) instead of [...].

case class Item(name: String, price: Int)
val items: List[Item] = ( (name="a", price=50),  (name="b", price=100) )
val items2: List[Item] = ( Item(name="a", price=50),  Item(name="b", price=100) )
val items3: List[Item] = List( Item(name="a", price=50),  Item(name="b", price=100) )

versus dichotomy:

val items: List[Item] = [ (name="a", price=50),  (name="b", price=100) ]
val items2: List[Item] = [ Item(name="a", price=50),  Item(name="b", price=100) ]
val items3: List[Item] = List( Item(name="a", price=50),  Item(name="b", price=100) )

This would give case-class literals and collection literals a straightforward explanation: you can skip writing the target class name when it can be inferred from the expected type, something we are used to doing in Scala already, with no new rule to learn.

1 Like

Regarding the use of named tuple literals for case classes: how does that interact with implicit conversions? This is clearly something that’s relevant for Mill, but also for things like zio-k8s which defines an implicit conversion from A to Optional[A] in order to simplify the specification of optional function parameters. Or perhaps we can find a way to solve that problem without implicit conversions? Since we already have a feature for repeated (variadic) parameters, having one for optional parameters doesn’t seem outlandish.
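For concreteness, a sketch of the conversion pattern being asked about, with standard Option standing in for zio-k8s’s Optional and an invented Spec class; the open question is whether a named-tuple case-class literal would still trigger such conversions on its fields:

```scala
import scala.language.implicitConversions

// Lift any bare value into Some(...) where an Option is expected.
given optionLift[A]: Conversion[A, Option[A]] =
  a => Some(a)

case class Spec(replicas: Option[Int] = None, name: Option[String] = None)

// Callers can pass bare values for optional parameters:
val s = Spec(replicas = 3, name = "web")
```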