I’m not sure that this problem is worth solving: collection literals are pretty much guaranteed to be of limited length. And even if it does turn out to be significant, it should be possible to solve it at the library level – we can rewrite `Seq(1,2,3)` to the equivalent builder code with a macro. I also think it would be a mistake to conflate it with aggregate literals, because then you would need to rewrite a lot of existing code to use the new feature in order to get the performance benefits. Let’s try to make something that works well in existing code bases.
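To make that concrete, here is a minimal sketch of the kind of builder code such a macro could expand `Seq(1, 2, 3)` into. The expansion shape is just an assumption for illustration, not an actual macro implementation:

```scala
// Hypothetical hand-written expansion of Seq(1, 2, 3); a real macro
// would generate something equivalent at compile time.
val b = Seq.newBuilder[Int]
b.sizeHint(3)     // avoid intermediate resizing
b += 1
b += 2
b += 3
val result: Seq[Int] = b.result()
```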
Good point, we can always just make `def apply` a macro as necessary.
One issue with the naive target-type companion-apply-inference approach is that it doesn’t work well when the types aren’t exact. E.g. in Mill we would like to be able to use aggregate literals where `Task[Seq[T]]` is expected, and IIRC other libraries in the ZIO or Cats ecosystems have similar requirements.

On further thought, we may be able to work around this issue in Mill by providing a variadic `apply` method in `Task`’s companion object, and it can be a macro if necessary (it probably is for Mill).
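A minimal sketch of that workaround, assuming a much-simplified `Task` (Mill’s real `Task` is more involved and macro-based; the names here are illustrative only):

```scala
// Simplified stand-in for Mill's Task.
class Task[T](val body: () => T)

object Task:
  // Hypothetical variadic apply so that Task(1, 2, 3): Task[Seq[Int]]
  def apply[T](xs: T*): Task[Seq[T]] = new Task(() => xs)

val t: Task[Seq[Int]] = Task(1, 2, 3)
```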
Another approach would be to make aggregate literals always return a `SeqLiteral[T]` type, and then the various libraries can all define implicit conversions from `SeqLiteral` to `Seq` or `Vector` or `Task[Seq]` as desired. Not sure what the tradeoffs are, but it seems like this approach would work as well.
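Roughly, and purely as a sketch (the `SeqLiteral` shape and these conversions are assumptions, not a worked-out design):

```scala
// Hypothetical intermediate type produced by every aggregate literal.
final class SeqLiteral[T](val elems: Seq[T])

// Libraries could then opt in with conversions to their own types:
given [T]: Conversion[SeqLiteral[T], Seq[T]] = _.elems
given [T]: Conversion[SeqLiteral[T], Vector[T]] = _.elems.toVector
```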
That’s a very interesting idea!

It would avoid tying this feature to some specific lib. It would make it nicely generic and extensible.

In theory one could even change the syntax after the fact without affecting any users, as even different syntax would expand to a `SeqLiteral` (or a `MapLiteral`), and any further handling happens in user space (some libs).

I think it would also allow experimenting with “object literals” later on without breaking anything, because interpretation of `SeqLiteral`s (or `MapLiteral`s) as object constructors would again be something in user space.

The idea is very much in the spirit of Scala, where features are expressed as types in the compiler. (There was a meme on Reddit some time ago joking about the amount of “types of types” in Scala, but having a lot of specialized machinery implemented this way is actually a good design, I think.)
A different thing: do we also need `SetLiteral`s, to complete what Python offers?
Actually, I have identified this problem before for ZIO’s `Optional` type, and also proposed a solution: make it possible to override which type’s companion object a `[]` expression should refer to. At the time I couldn’t find any other use for it besides `Optional`, but it would be useful for Mill’s `Task` as well.

We could also consider a scheme where, if the companion object (`Task`, in this case) doesn’t have an `apply` method, it looks for implicit conversions to the expected type in the companion object, and if it finds one, it will try the convertee type’s companion object’s `apply` method.
Example:

```scala
enum Optional[+A]:
  case None
  case Some(a: A)

object Optional:
  implicit def toOptional[A](a: A): Optional[A] =
    Optional.Some(a)

val foo: Optional[List[Int]] = [42]
```
In this example, it would see that `Optional` doesn’t have an `apply` method, so it would look for a suitable implicit conversion in the companion object and would find `toOptional`. So it unifies the return type `Optional[A]` with the expected type `Optional[List[Int]]` and finds that `A =:= List[Int]`. And `List` has a variadic `apply` method, so `[42]` is desugared to `List(42)`.

If there are several implicit conversions, we can ignore those whose return type doesn’t unify with the expected type of the aggregate literal, as well as those where the convertee type doesn’t have a variadic `apply` method.
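Spelled out, and reusing the `Optional` definition above, the elaboration would hypothetically be:

```scala
// What the compiler would conceptually rewrite
//   val foo: Optional[List[Int]] = [42]
// into, under the proposed scheme:
val foo: Optional[List[Int]] = Optional.toOptional(List(42))
```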
Such an approach would make things like `Optional` or Mill’s `Task` work while not requiring any additional language features.
OK, actually this is not right: `[<FOO>]` would unify with `Optional[A]`, so `<FOO>` must be the `A`; then `A =:= List[Int]`, and `42` doesn’t work.
What has been done in Swift/Rust, and even in Scala with generic number literals, is just some type class that specifically resolves from the syntax to a type, e.g. `given [A] => FromArrayLiteral[List] => List[A] = ???`
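A fuller sketch of that type-class shape, with an assumed name and signature (not an agreed design), might look like:

```scala
// Hypothetical type class resolving array-literal syntax to a collection type.
trait FromArrayLiteral[C[_]]:
  def fromArrayLiteral[A](elems: A*): C[A]

given FromArrayLiteral[List]:
  def fromArrayLiteral[A](elems: A*): List[A] = List(elems*)

given FromArrayLiteral[Vector]:
  def fromArrayLiteral[A](elems: A*): Vector[A] = Vector(elems*)
```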
Hi @bishabosha,

I think there’s a misunderstanding here. I was proposing to extend the “expected type” mechanic in order to better handle implicit conversions.

The expected type in my example is `Optional[List[Int]]`. The idea would be that if the expected type’s companion object – `Optional` in this case – doesn’t have a variadic `apply` method, then it would look for an implicit conversion inside `Optional` whose return type can be unified with the expected type. So it finds the `implicit def toOptional[A](a: A): Optional[A]` and unifies its return type, `Optional[A]`, with the expected type, `Optional[List[Int]]`, which yields `A =:= List[Int]`. And now it looks at the convertee of this implicit conversion, namely `a: A`, and sees if it can find a variadic `apply` method in its companion object. Since unification established that `A` is `List[Int]`, the relevant companion object is now `List`, and it has a variadic `apply` method. Hence, the expression `[42]` would be desugared to `List(42)`.

I hope this explanation made the idea clearer.
I think on balance I’d prefer a scheme where we need a type class to decide the result type of an aggregate literal. Something like this:
```scala
/** A typeclass to map sequence literals with `T` elements
 *  to some collection type `C`.
 */
trait FromArray[T, +C]:
  inline def fromArray(inline xs: IArray[T]): C
```
`FromArray` is what I call an inline type class: it’s a type class with inline methods that can be implemented with macros. Here are some given instances:
```scala
/** Straightforward mapping to Seq */
given [T] => FromArray[T, Seq[T]]:
  inline def fromArray(inline xs: IArray[T]) = Seq(xs*)

/** A more specific mapping to Vector */
given [T] => FromArray[T, Vector[T]]:
  inline def fromArray(inline xs: IArray[T]) = Vector(xs*)

/** Some delayed computation */
case class Task[T](body: () => T)

/** A delaying mapping to Task */
given [T] => FromArray[T, Task[Seq[T]]]:
  inline def fromArray(inline xs: IArray[T]) = Task(() => Seq(xs*))
```
The idea is that an aggregate literal like `[a, b, c]` with elements of type `A` and expected type `C` will search for a `FromArray[A, C]` instance `fa`. If one is found, it will expand to `fa.fromArray(IArray(a, b, c))`. Since `fromArray` is an inline method with an inline parameter, it can be implemented as a macro that inspects its argument. So it could even produce some builder pattern. In other words, the aggregate literal is treated by the compiler as if it was a call `seqLit(IArray(a, b, c))` where `seqLit` is defined as follows:
```scala
inline def seqLit[T, C](inline xs: IArray[T])(using inline fa: FromArray[T, C]): C =
  fa.fromArray(xs)
```
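For example, with the `Vector` instance above in scope, an aggregate literal with expected type `Vector[Int]` would elaborate to a call like:

```scala
// [1, 2, 3] with expected type Vector[Int] would become:
val xs: Vector[Int] = seqLit[Int, Vector[Int]](IArray(1, 2, 3))
```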
If the expected type of an aggregate literal is undefined, the implicit search will be ambiguous. In that case we can default to some type. The most user-friendly option is probably to default to `Seq` for plain aggregate literals and to `Map` for literals where all elements are pairs of the form `a -> b`.
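Presumably the `Map` case could be served by the same type class; a sketch of such an instance (my assumption, not part of the strawman text):

```scala
/** Hypothetical mapping for pair elements to Map */
given [K, V] => FromArray[(K, V), Map[K, V]]:
  inline def fromArray(inline xs: IArray[(K, V)]) = Map(xs*)
```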
Note that if `seqLit` was not declared an inline method, the code would be rejected with an error:

```
-- Error: seqlits.scala:21:15 --------------------------------------------------
21 |  f.fromArray(xs)
   |  ^^^^^^^^^^^^^^^
   |  Deferred inline method fromArray in trait FromArray cannot be invoked
```
In other words, methods of inline type classes can be invoked only in a context where the type class instance is statically known. I think that’s what we want here, anyway.
I prototyped this scheme in a test file that is added in A strawman for aggregate literals by odersky · Pull Request #21993 · scala/scala3 · GitHub.
FWIW, as far as I understand this approach is precedented in Swift, which has already been mentioned early in this thread. Specifically, one can peruse ExpressibleByArrayLiteral.
I like the type class approach. It seems in line with other things in the collections.

Type classes are now a broadly used and well-known concept in modern programming languages, so there is no reason to avoid them anymore (like happened in the fight against `CanBuildFrom`, which was justified by “it’s confusing to newcomers”). IMHO we should finally use more type classes across the whole standard library; but that’s another point.
So after the important things here are taken care of, I guess we can do some bikeshed discussion?

The point is: I don’t like the proposed syntax.

Before Martin came here, more or less everybody agreed that using `[]` is a very bad idea, as this syntax is reserved exclusively for types. I don’t get why it’s now OK to break this long-standing rule more or less en passant.

Scala even went against almost all languages and does not use `[]` for indexing. But now it’s OK to use this syntax for something that, in this sense, is not really found in other languages? (For example, `[]` is a heterogeneous array in JS and a few other languages, so more like Scala tuples.)

I liked the prefixed parens much more for the sequence literals.
Maybe not using `#` but instead `*`, as this is also used for varargs, which are related to sequences. (Also, `*` is one key left of `(` on a US keyboard.)
If I’m not mistaken, this concept is basically identical to the macro implicit conversions that the com-lihaoyi libraries make heavy use of today, with the same “can be invoked only in a context where the type class instance is statically known” requirement. It also seems very similar to what we already do in the experimental Numeric Literals.

The main difference is that the inline typeclass as described here would require a bootstrap `def seqLit` to trigger, whereas an implicit conversion can be triggered by either:

1. a mismatch between an expression type and a target type
2. a method call to a non-existent method on the expression type
(1) is the case that `com-lihaoyi` often needs (`sourcecode.Text`, `os.PathChunk`, `mill.Task`), while (2) is the case that com-lihaoyi usually does not want but sometimes does (e.g. in FastParse).
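As an illustration of trigger (1), here is a minimal sketch loosely modeled on the `os.PathChunk` pattern; all names below are illustrative stand-ins, not the real library code:

```scala
import scala.language.implicitConversions

// Illustrative stand-in for a library type like os.PathChunk.
class PathChunk(val segment: String)

given Conversion[String, PathChunk] = s => PathChunk(s)

def child(base: String, chunk: PathChunk): String =
  base + "/" + chunk.segment

// The String argument mismatches the PathChunk parameter type,
// which triggers the conversion (case 1).
val p = child("/usr", "etc")
```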
I suspect that with a bit of tweaking, we’d be able to re-use the same inline type class concept to represent all three concepts (aggregate literals, numeric literals, macro implicit conversions), maybe with a bit less power than present-day implicit conversions (i.e. we usually want (1) above and usually do not want (2)), and maybe have it extensible to other use cases users may come up with in the future that we may not agree on standardizing yet (Haskell-style overloaded strings?? aggregate literals for case classes???)
I agree that there are downsides to overloading the meaning of `[ ]`.

Can’t we just use the tuple syntax `(a, b, c)` and turn it into `seqLit(a, b, c)` using the type class scheme proposed by @odersky, and perhaps some compiler magic if needed?
No, there is no `seqLit`. `seqLit` was just an artifact to simulate the behavior before we have a syntax for seq literals.
Yes, I think we can use the same approach also for numeric literals and macro conversions. I already outlined a draft for macro conversions in a comment for SIP-66 - Implicit macro conversions by Iltotore · Pull Request #86 · scala/improvement-proposals · GitHub.

The advantage of a typeclass approach over implicit conversions is that implicit conversions come with strings attached: you need a language import to enable them. In the future we might offer escape hatches where this is not needed, but that’s not fully worked out yet. Since we don’t need the full power of a conversion for aggregate and numeric literals, I prefer not to use an implicit conversion for them in the first place.
We almost can, but the issue is that the single-element-in-parentheses `(foo)` already has a specific meaning that is explicitly not a tuple.

- We could fake it with compiler magic or implicit conversions, but that adds either some sketchy conversions from `Seq[T]` to `T`, or some sketchy compiler magic to do the same.
- We could have some special syntax for one-element lists, like Python’s `(foo,)` single-element tuple syntax, but as you know most sequences are small, and so this one-element-list scenario probably comes up a lot.

In the end the issue is: do we overload parens (used for tuples and grouping) or do we overload square brackets (used for types)? Both have some degree of overloading, and both could work. Overall I fall on the side of preferring brackets because of the universality of that syntax across all other languages, which for me wins over sharing syntax with Scala tuples.
Thanks. You almost convinced me, but I’m still on the fence… Still hoping for some solution that nobody has thought of yet that is as clean as `(a, b, c)`.
I wonder if `*` could be used judiciously here, since it’s already related to `Seq`s and varargs / multi-valued parameters… (Is this the “splat” operator? Sorry, I’m not sure what the Scala community calls this operator/syntax.)
I think it’s valuable to keep `(a, b, ...)` immediately recognizable as a tuple, distinct from regular collections. Its usage in both value and type positions doesn’t seem to cause any issues.

```scala
val t: (Int, String) = (1, "a")
```

Furthermore, `[A, B, ...]` already means some kind of sequence (of types), so using it as a sequence of values is not too far off.
@bishabosha wrote this in another thread, but it’s relevant here too:
Nice to have more fresh ideas on the syntax dilemma on the table! I think it makes sense, given Scala’s other syntax choices.
`<` and `>` are legitimate term/operator tokens. I don’t see this as a viable option without ambiguities in the parser.