Pre-SIP: a syntax for aggregate literals

The problem seems to be not so much automatically importing members of the companion object, but rather, supposing apply is among them, what to do with it. I think we cannot just take any parenthesized expression as a constructor; that is stretching things too much. On the other hand, if we want to use some symbol such as <>, then why not just alias apply, as in:

class Foo
object Foo:
  def apply(abc: String): Foo = ???
  def <> = apply

import Foo.*

<>("xyz")

Then the question reduces to whether or not to automatically import companion object members, and I do not see a reason why not.
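To illustrate the status quo (with made-up names Distance and fromMeters, not taken from this thread): today companion members must be imported or qualified explicitly, while the discussed auto-import would resolve them from the expected type.

```scala
// Sketch of the status quo; Distance and fromMeters are illustrative names.
case class Distance(meters: Double)

object Distance:
  def fromMeters(m: Double): Distance = Distance(m)

// Today an explicit import (or qualification) is required:
import Distance.*
val d: Distance = fromMeters(100.0)
// Under the discussed auto-import, `fromMeters` would resolve via the
// expected type Distance without the import.
```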

Importing (with or without renaming) the apply method needs to be done explicitly for each type.
Having a single global symbol that (essentially) resolves to the inferred type's companion object solves that problem.

I do agree that automatically importing companion object members could also address the boilerplate concern. I think I would prefer automatic imports over new syntax (i.e., over [] and ..).

However, automatic imports also seem to have more far-reaching consequences. There was a similar feature in Kotlin which was restricted at some point, if I recall correctly. Basically, the issue is that imports are also available in inner scopes, so nested definitions get a lot of imports, and some may be undesirable. My hunch is that it would not be as bad in Scala, because Kotlin used that feature for mutable builders (so, combined with automatic imports, you just had random side effects), whereas in Scala you would just construct immutable data, so resolving an unexpected method would likely just not compile.


Re the import scala.compiletime.companion idea: I think it wouldn't really buy us much, because it would still have very special semantics.

Let's take a very simple example like

import scala.compiletime.companion as <>
class Foo(x: Int)

val foo: Foo = <>(42)

The issue here is that the expression that we need to figure out a type for in order to compile this is <>, but the expression that that type needs to be determined from is <>(42), not <>. This is completely different from how any kind of identifier in Scala works today, it will require support from the compiler, and hence I disagree with the idea that there are "no new principles to learn" here – there definitely are. And I think hiding very different behaviour behind a familiar syntax is actually more confusing than just having a separate syntax, like we do for _ lambda expressions, which are the most similar feature that we have today.

I do not propose that companion by itself has any interesting meaning, but in (companion.abc(xyz): T) the method call must be present, and the return type must somehow be inferred.

But I think, if you want to, the way to think about companion[List[Int]] is that it returns List (the companion object). And type inference is adapted such that companion.abc(xyz): T infers companion[T].abc(xyz).
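As an illustration only (this is not the author's macro implementation, and CompanionOf, Aux and companionOf are made-up names), the "companion resolved from the expected type" idea can be partially emulated today with a type class whose instances carry the companion's precise type:

```scala
// Illustrative sketch only: instances are hand-written here; the real
// proposal would have the compiler (or a macro) provide them.
trait CompanionOf[T]:
  type C
  def value: C

object CompanionOf:
  // Aux pattern keeps the type member C visible at the summon site.
  type Aux[T, C0] = CompanionOf[T] { type C = C0 }

  given listInt: Aux[List[Int], List.type] = new CompanionOf[List[Int]]:
    type C = List.type
    def value: List.type = List

def companionOf[T](using c: CompanionOf[T]): c.C = c.value

// "companion[List[Int]] returns List (the companion object)":
val xs: List[Int] = companionOf[List[Int]].apply(1, 2, 3)
```

The limitation, as in the thread, is that this only hands back the companion object; it does not by itself make the return type of a method call drive the instance selection.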

To make this more concrete, I implemented something as close to my proposed syntax as I could get using a (non-transparent) macro; this is how it can be used:

case class SomeType(v1: Int, v2: String) derives Syntax

object SomeType { def test(): SomeType = new SomeType(1, "test") }

case class NestedType(v1: Int, v2: SomeType) derives Syntax

object Test {

  import companion as <>

  def main(args: Array[String]): Unit = {
    val res1: SomeType = <>(42, "apply")

    val res2: SomeType = <>.test()

    val res3: List[SomeType] = List(<>.test(), <>(-1, "list"))

    val res4: NestedType = <>(42, <>(12, "type 1"): SomeType)

    ()
  }
}

A major limitation is that the implicits hack I use to convince type inference to infer the return type to get the companion object is not very stable, so in res4 the annotation is necessary.
Also, because this is a macro based hack, it has terrible error messages when it does not work.

With the above limitations, this is clearly useless as implicit syntax, but a proper implementation in the compiler might be able to address those.

Being able to mostly express this scheme using existing concepts is what I meant by "no new principles to learn".
Yes, this uses quite a few advanced features, but conceptually it's just method calls (albeit "generated" forwarder methods) and type inference (of the return type). I guess the strange part is why these methods would exist on companion, but :person_shrugging:.

A non-macro sketch of the concept is here:
Scastie - An interactive playground for Scala.
the macro is here:
Scastie - An interactive playground for Scala.


Yes, I understand. My point is that this would be an identifier that behaves completely differently from any other identifier (its type would be determined by a different expression), and so at that point it's effectively a language feature that users would need to learn, and it would have to be supported by the compiler and any other kind of sophisticated tooling (IDEs). It would be like allowing Scala users to use a different character than _ for abbreviated lambda expressions, and I don't see the point of that.

I agree that this is nicer to read, even if it may be a surprise that github is in scope. But I'm hesitant to introduce new syntax like [elem] for values.

What about this, @lihaoyi?

  1. a language import that switches on having companion object members in scope when the expected type has a companion
  2. a language import that switches on converting a named tuple to an apply call of the expected type

Something like:

import language.{companionScope, tupleApply}

def pomSettings: PomSettings = (
  description = artifactName(),
  organization = "com.lihaoyi",
  url = "https://github.com/com-lihaoyi/scalasql",
  licenses = apply(MIT),
  versionControl = github(
    owner = "com-lihaoyi",
    repo = "scalasql"
  ),
  developers = apply(
    (id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")
  )
)

We could also do away with the one-element list apply(MIT) if we allow single-element tuples to be adapted, as in (MIT), if we can accept that parens around an expression alter its meaning when language.tupleApply is imported.

Didn't we want to move away from language imports?
(IMO for good reasons)


I would like to hear your reasoning for this. New (experimental) syntax was recently added for named tuples, and unlike my proposed syntax, it could be used only to create those, so it has a much smaller power-to-weight ratio.
Actually, we could probably supplant the named tuple syntax with this proposal and use the [] syntax for named tuples as well. After all, it's still experimental, so nobody is using it and backwards compatibility is not an issue (and the reason I'd prefer [] is that I'm absolutely positive that wrapping any expression in () must remain a no-op).

Why would it be limited to named tuples rather than tuples of any kind, or even mixed parameter lists where some arguments are named and others are positional? We can philosophize and debate all day long about what is readable and what isn't. But we could also take a look at what people actually do in the real world, and we'd notice a pattern: they come up with all kinds of operators and DSLs to be able to write composite data structures with a compact notation. They use string interpolators, like ivy"org.slf4j:slf4j-api:1.7.25". Or they use operators, like "org.slf4j" % "slf4j-api" % "1.7.25". Why not just provide people with a convenient way to do this stuff, i.e. ["org.slf4j", "slf4j-api", "1.7.25"]?
Requiring argument names would also preclude this from working with collections, which makes it a non-starter as far as Iā€™m concerned.

Please please no, parens around an expression must remain a no-op, otherwise you're going to have people accidentally converting stuff to type-safe wrapper classes all the time. I don't scare easily, but that way lies only madness. Parens around a single expression being a no-op is hard-wired into basically every programmer's brain, except maybe Scheme programmers'. What's wrong with just using unambiguous syntax like []?
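To make the concern concrete (UserId and find are hypothetical names, invented for this illustration), compare what parens do today with what they could do under a ()-based adaptation:

```scala
// Hypothetical names (UserId, find) to illustrate the concern, nothing more.
case class UserId(raw: Int)
def find(id: UserId): String = s"user ${id.raw}"

val n: Int = (42)             // today: parens are a pure no-op, n is just 42
val ok = find(UserId(42))     // today: wrapping in UserId must be explicit
// Under ()-based tuple adaptation, find((42)) could silently construct a
// UserId from a plain Int, which is exactly the accidental conversion
// being warned about here.
```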

:100:


We could use some new kind of brackets, like

  • [[ ā€¦ ]] ā€“ doubling has served to disambiguate in the past, for instance :: and : have opposite meanings vs. Haskell
  • Since we use [ā€¦] the way other languages use <ā€¦>, perhaps angle brackets are free to use for this
  • Maybe something with a colon or into would convey ā€œthis expressionā€™s meaning is based on the expected typeā€
  • Or, perhaps we could do something like >(ā€¦) where > is ā€œjust an objectā€ whose apply method is somehow defined according to the apply method of the expected type.

Hey @nafg,

thanks for joining the discussion.

Is there an ambiguity problem with the proposed [] syntax? I'm not aware of one; expressions cannot currently start with [. At first I thought that there might be a problem with infix operators:

object A:
  def +[A] = ()

But A + [Unit] is currently a syntax error and not, as one might expect, a call to the + method with an explicit type parameter.

I thought about it, but < and > are currently valid identifiers, so you can do this:

object < :
  object foo:
    def apply(a: Any) = ()

object >

Now < foo > is a valid expression.

That's essentially @ragnar's idea, no?

Anyway, I think the main point of contention here isn't so much the syntax, it's whether the feature in its unrestricted form (as I originally proposed) is too easily abused to produce unmaintainable code, and if that is the case, how it could be nerfed to prevent that. I feel that such concerns are misplaced (because whether something is maintainable or not depends too much on the context, and also because I think that enforcing arbitrary and taste-based "readability" rules is firmly in the territory of linters, not the compiler), but I think that's the main objection.

At least one example of expressions that start with [ is Scala 3 polymorphic function literals:

val e0 = Apply(Var("f"), Var("a"))
val e1 = mapSubexpressions(e0)(
  [B] => (se: Expr[B]) => Apply(Var[B => B]("wrap"), se))
println(e1) // Apply(Apply(Var(wrap),Var(f)),Apply(Var(wrap),Var(a)))

The [B] => (se: Expr[B]) => Apply(Var[B => B]("wrap"), se) is a lambda expression that starts with [.


You don't see a potential for confusion with type arguments/parameters? Your single-element ambiguity could be solved with a trailing comma: (MIT,)


One thing that is absolutely illegal today, and so can't be ambiguous with old code, is #().


As an additional data point, I would like to mention that companion object members are not automatically brought into scope for the corresponding class, which I think is similar to the expected-type situation. This surprised me at first, as companion members are like Java's static fields, which are obviously in a class's scope.

// nok

class Foo:
  bar
  ^^^
  Not found: bar

object Foo:
  def bar = ???

// ok

import Foo.bar

class Foo:
  bar

object Foo:
  def bar = ???

Although surprising, I guess there is good reason for it, and that might hinder automatic import of companion members in the expected type situation as well.

At this point in the discussion (re-skimming the thread a bit), I am not quite sure what the discussion is about anymore. I see at least:

  1. new syntax, e.g. [a, b] being somehow automatically converted to the target expression (original proposal)
  2. a new idiom to use existing tuples as literals (e.g., with a FromLiteral typeclass that essentially acts like an always allowed implicit conversion) by soronpo, and mentioned by lihaoyi.
  3. "scope injection" of symbols defined in the companion object of the target type (I found a couple of mentions, but not the original proposal)
  4. a symbol to kinda access the companion object of the target type.

I may also have overlooked some.

1 & 2 seem to be somewhat mutually exclusive, as do 3 & 4.
But overall it seems unclear to me which to prefer.

I am actually with you on this one. Scala already has all the tools to win an obfuscated code challenge.
Moreover, I think that arbitrary "readability restrictions" make code harder to understand, as they require learning all the little exceptions about where something is allowed and where not.

Side note, I think named tuples are an interesting example here, because they bring tuple syntax and parameter list syntax closer together, removing exceptions.


Ah right, good point! I think it's still technically unambiguous, because expressions can't be followed by a => token, whereas the type parameter list of a polymorphic function type must be. So once the parser finds the matching ] token, the next token can determine what kind of expression it is: if it's => then it's a polymorphic lambda, in any other case it's an aggregate literal. But it's still a mess that's probably best avoided.

That's an interesting idea… Trailing commas are not currently allowed in tuples though, so it's still a syntax extension.
Python does it this way, but nevertheless, I find it looks a bit odd, and if we're going to have to extend the syntax, then I think I'd prefer something like your other proposal:

I thought of that too, and it certainly has advantages. I don't think we can get away with any of {}, [] or (), so we're going to need some sort of "decorated paren" thing. And since most characters can be used as identifiers in Scala, we don't have that many left. We should also consider that we probably want to extend this syntax from expressions to patterns some day. If this code works…

val x: Seq[Int] = #(1,2,3)

…then so should this:

x match
  case #(a, b, c) => 42

Hence, syntax that works for expressions but not for patterns should probably be avoided. Specifically, :() and @() would be fine in an expression context, but could lead to confusion in a pattern context because those symbols are already used in patterns. An entirely separate symbol like # would avoid that for human readers. For non-human readers, any of :, @ and # would be fine because a pattern can't start with any of those today.
The last option I can think of is .(), which is largely the same as #(), so I'd be fine with it, although I do prefer #() on a visual level.

Most other symbols are either obviously unsuitable, or a valid identifier, or Unicode (non-obvious to type).

Haha, I can't blame you, because we've explored many different paths from where we started. I think that's actually really good, and it has certainly provided me with new insights.

Going through your list in order, here's how I think about the various options:

  1. That is the original idea. I don't think of it as "conversion" though; rather, it's a kind of syntactic sugar that fills in the correct companion object in front of a parameter list. So you can go from [1,2,3] to List(1,2,3), but also from [x=1, y=2] to Point(x = 1, y = 2).
  2. (tuple conversion) I don't like this idea, for a variety of reasons. It's less flexible, because there are things that just aren't possible with tuples, like multiple parameter lists, using clauses, or having some parameters named and others unnamed. It will also lead to terrible error messages and bad tooling, unless the tools grow specific support for these conversions. Moreover, I don't think it can be made to work for the case where such expressions are nested. I don't think a good solution can be achieved this way.
  3. (scope injection) The original scope injection thread is here. At one point I thought it was a good idea to merge the two, but I'm no longer convinced, because I think the issues are sufficiently distinct that more than one language feature is going to be required to solve them (sorry @soronpo, I'm still convinced that relative scoping of some form is required, due to the reasons I've laid out in another comment, but I think it's a largely separate issue, and I'm still prepared to help out with writing a proposal).
  4. (placeholder for companion object) That is quite similar to number 1, which proposes a syntax for companion object apply calls. It has the added benefit of also allowing things like #.of(1958, 9, 5) to create a LocalDate object (assuming that # means the companion object). I think that's a good solution.

Now that you brought up that last one again, I had some more thoughts about it. At one point I thought it would be nice to have a syntax to select members from the companion object, e.g. #of would select the of member of the companion object (or a static method, for Java classes). That would allow us to get rid of that ugly little dot in #.of(1958, 9, 5). But then I realized that maybe you don't always want to select anything from the companion but rather just refer to the companion object itself. Notably, that is the case for collection conversions:

val foo = List("bar" -> 42)
def baz(m: Map[String, Int]) = ()

baz(foo.to(Map)) // using companion object here
baz(foo.to(#))   // but could use a placeholder too!

So maybe that ugly little . in #.of(1958, 9, 5) is the price to pay to enable this use too.

Now that I've thought about it again and that @lihaoyi has demolished the [] idea, I think that this "companion object placeholder" idea is probably the best solution.

Absolutely, that is what I was trying to express with many more words before. Let's make the language simple and orthogonal, and have linters deal with "readability" for those that deem that necessary.

I wonder about this. Let me phrase it a bit differently than tuple conversions.

The current state of things is that the syntax for method parameter lists at call sites is effectively the syntax for literals in Scala.
The main problem discussed in this thread seems to be that importing and/or repeating the companion object is unnecessary boilerplate (there seems to be little disagreement about this).

A very direct way to address this seems to be to allow omitting that companion object reference. The remaining part would be a "parameter literal". Are parameter literals typeable? Maybe not in general, but in many cases, without other constraints, their type would just be the corresponding tuple type.

Similar to how function literals can be converted to SAM types, a parameter literal could be converted to types marked by something (a type with an apply method on the companion, or marked with some annotation, or using some type class like FromLiteral, does not matter for now).

I think this is very close to many proposals that were made in this thread (it effectively looks like automatic import of apply methods, and implicit conversions for simpler cases). But I want to emphasize it here because it can be explained by analogy to existing language constructs – the syntax exists, and conversions based on the expected type exist.
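The SAM analogy invoked above is existing behaviour in current Scala: a function literal is adapted purely based on the expected type. A minimal sketch (Combine is an illustrative name):

```scala
// Existing precedent: a function literal is adapted to a SAM type based on
// the expected type; the "parameter literal" idea would extend this kind of
// expected-type-driven adaptation to argument lists.
trait Combine { def apply(x: Int, y: Int): Int }

val c: Combine = (x, y) => x + y   // lambda adapted to the SAM trait
val r = c(1, 2)
```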

To me it seems that this use of "just get the companion object" does not fit well with the way type inference works.
In my proposal for this variant, (companion.abc(xyz): T), it's just the return type that needs to be inferred, and as my implementation shows, that is actually possible today using implicits.
The above seems to either require some inverse inference (going from the outer type several levels deep inside) or heuristics on what the scope of # should be, similar to how _ works for anonymous functions (which is to say, I don't think that would work well).


Yes, exactly. That is pretty much point 1 in your list above, and my original proposal (using [] rather than ()).
And thatā€™s definitely a viable proposal, we can make that work. But please do consider the issues that were brought up about this:

  1. It can't be used when you want to use a method other than apply. For example, LocalDate objects aren't created using LocalDate(y, m, d), they're created using LocalDate.of(y, m, d). Or a cats.data.NonEmptyList, which isn't created with NonEmptyList(a,b,c,d) but NonEmptyList.of(a,b,c,d). I don't think this is absolutely crucial, but it's nice to have.
  2. It doesn't work when you have more than one parameter list or a using clause.
  3. If we go with a () syntax, then simply wrapping an expression in parens – which is so far always a no-op – can now cause a constructor to be called. I think this makes it way too easy to accidentally trigger construction of an object that you didn't mean to (e.g. type-safe ID wrapper types).
  4. If we go with a [] syntax, then it complicates the parser, as @lihaoyi has helpfully pointed out.

So overall, this approach doesn't address all the use cases that I would like it to, and both of the syntaxes that have been proposed have drawbacks that I'd rather avoid. That is why currently the "companion object placeholder" model looks best to me.

That's right! And the good news is that we already do that today for other language constructs. For example, this works perfectly fine:

val f: Int => Int => Int =
  x => y => x + y

Neither x nor y needs a type annotation here, so this recursive, incremental type inference thing is actually already happening, and it's a proven approach.

Yes, absolutely, and the scoping issue is exactly what I was trying to get at in my previous comment.
But the good news is that, again, we have a set of proven rules on how that should work, and it's the scoping of _ in lambda expressions. So that's why my suggestion is to use the exact same rules for the scope of the # placeholder. That would work for every reasonable example I can come up with:

val _: List[Int] = #(1, 2, 3)
val _: Duration = #.fromNanos(42)
val _: List[Int] = (1 to 10).to(#)
val _: Future[Unit] = #(println("Hello, ragnar!"))(using ExecutionContext.global)

And actually, we can experiment with that syntax today! We just need to place CompanionObject.type => in front of the expected type and then use an _ instead of # and squint a bit! All these compile:

val _: List.type => List[Int] = _(1, 2, 3)
val _: Duration.type => Duration = _.fromNanos(42)
val _: List.type => List[Int] = (1 to 10).to(_)
val _: Future.type => Future[Unit] = _(println("Hello, ragnar!"))(using ExecutionContext.global)

And here's an extra cool one:

val _: List[List.type] = #(#) 

// to simulate the syntax in current Scala:
val _: (List.type, List.type) => List[List.type] = _(_)

To me, introducing new characters vs mostly reusing existing things is quite a drastic difference in proposal. New characters run out quickly, and change the way a language looks by a lot.

Both the .of(a, b, c, d) and of(a, b, c, d) variants have been proposed, and it seems like they could work. There have been some arguments about ambiguity before, but the way I see it, a "conversion" would happen only in places where there is a known expected result. It might still have issues, but that would need to be explored further.

I don't see why it could not work technically. Just have multiple parameter lists, as in any of the other variants.

The example seems quite different, but I guess you do not view curried functions as a single entity.

I did test it though, and came to the conclusion that baz(foo.to(summon)) works.
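For completeness, a self-contained version of that observation: the expected type Map[String, Int] fixes the target type of `.to`, which lets `summon` resolve the needed Factory instance.

```scala
// The expected type Map[String, Int] determines `.to`'s type parameter, so
// `summon` finds the Factory[(String, Int), Map[String, Int]] given.
val foo = List("bar" -> 42)
def baz(m: Map[String, Int]): Map[String, Int] = m

val m = baz(foo.to(summon))
```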

I guess it's just unfortunate that it does not seem to work with my macro/dynamic hacks :neutral_face:.

A concluding remark: as I said before, I think the "shorthand for the companion object" variant is a good solution, and it does seem more plausible to add, compared to making parameter lists first class (or even just second class…).
