Pre-SIP: a syntax for aggregate literals

I’m going to once again come back to a point that I made earlier, which is that while you’re placing all your focus on this .of(...) expression, you’re neglecting all the cases where you have even less information about what the thing you’re passing to the Person constructor might be. Nobody is complaining about the fact that Person("Martin", _) is a perfectly valid expression to construct a function of type LocalDate => Person, nor did anybody ever insist that this must be written Person("Martin", birthday = _). The reason is that the language couldn’t possibly understand all the context that a human reader can factor in while reading the code, and thus whether to make this explicit is a choice that the programmer must make. In an expression like Person("Martin", .of(1958, 9, 5)), it is relevant context that I know what Martin looks like and therefore have a rough idea of what his year of birth might be. Humans do this kind of subconscious cross-referencing all the time, and it’s not something that any programming language will ever understand. Another example of context is variable names. What if the expression isn’t .of(1958, 9, 5) but .of(year, month, day)? It’s just not possible to argue that anybody could mistake that for anything other than a date.
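For concreteness, the Person point can be checked in today’s Scala (the case class definition here is my sketch of the thread’s running example):

```scala
import java.time.LocalDate

// Hypothetical stand-in for the thread's running example.
case class Person(name: String, birthday: LocalDate)

// A perfectly valid expression of type LocalDate => Person today,
// with no requirement anywhere to write `birthday = _`:
val withBirthday: LocalDate => Person = Person("Martin", _)
val martin: Person = withBirthday(LocalDate.of(1958, 9, 5))
```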

The irony here is that I probably would use a named parameter for this case in order to distinguish between a person’s birthday and other possible relevant dates (like wedding date, signup date or whatever). But when I just write LocalDate.of(y, m, d) I don’t have to do that either, so adding this rule for a relative scoping expression doesn’t really solve the problem, especially given that people can just import LocalDate.of. So you’re not enforcing readable code, you’re just enforcing longer import lists.
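And indeed that import works today; once of is imported, the companion prefix disappears entirely, so the call site carries no more information than the proposed .of(...) would:

```scala
import java.time.LocalDate.of

// After `import LocalDate.of`, the call site is just as "anonymous"
// as the proposed relative-scoping expression would be:
val birthday = of(1958, 9, 5)
```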
And there’s a lesson here: you cannot enforce readable code through language rules. Readable code is the result of developers giving a fuck, and no amount of language legislation is going to change that.
Another example is my zio-aws example from earlier. s3.createBucket([bucket = ["foo"]]) is good code, adding a parameter name is pure noise, and adding noise makes code worse, not better.

When it helps, developers have the possibility to use named parameters. I like named parameters, I probably use them more than the average developer. But I don’t want to have the language tell me when I need to use them, and to me, all these arbitrary restrictions (arbitrary as in not forced by technical reasons) frankly just feel like somebody else trying to force their ideas of what good code should look like on me, when they have no idea what my project or the people working on it are like.
It also shouldn’t be forgotten that these additional restrictions make the language not only harder to learn (because there are more arbitrary rules to memorize) but also less fun to learn. We should strive to have a language where, while you’re learning it, you have those moments where it clicks and you realize that, wait, you can put those two things together in that way too? How cool is that? And that moment shouldn’t be destroyed because your mommy comes in and tells you, no, you can’t do that, it’s too dangerous for you.

I also see that you haven’t considered my other points. If relative scoping expressions are only permitted for named parameters, they

  • can’t be used in val definitions
  • … or return values
  • … or Lists
  • don’t work when calling Java methods (which have no named parameters)
  • are ugly in tuples

Perhaps some teams like to put guardrails like that around themselves, and that’s fine, they can write a scalafix rule for it. I see this kind of rule firmly in the territory of linting tools, not the language proper.


Oh and one more thing: while it’s probably pretty obvious that I don’t agree with everybody on everything, I very much do appreciate the civil discussion of ideas and the time that people have been putting into it. Thanks everybody for your continued engagement.

Adding new syntax and an entirely new way to impose restrictions seems suboptimal. If someone encounters these examples in a codebase, they won’t easily understand what is happening. The code is non-discoverable and difficult to search for.

I prefer the simplicity of @mberndt’s earlier proposal, which recommends using a new symbol to infer the companion object without any magic – just an import:

import scala.compiletime.companion as <>

val l: List[Int] = <>(1,2,3,4)

(Note: <> is chosen randomly since @ is not an allowed symbol)

Delegating the choice of a short symbol to a renaming import is slightly cheating, but it is still an improvement, because the renaming import is needed only once, not once per constructor.

Specifically, (companion.abc(xyz): T) would compile to T.abc(xyz), which seems nearly achievable with macros, except that the inferred return type seems inaccessible.

This approach is clear in terms of documentation and how to find it. There are no new principles to learn, just some inferred types and forwarded methods.

It is not as concise as some other proposals, but it seems to be an improvement without real downsides, except possibly conflicting with a better proposal.

The problem seems to be not so much automatically importing members of the companion object, but rather, if we suppose apply is among them, what to do with it. I think we cannot just take any matching-parentheses expression as a constructor; that is stretching things too much. On the other hand, if we want to use some symbol such as <>, then why not just alias apply, as in:

class Foo

object Foo:
  def apply(abc: String): Foo = ???
  def <> = apply

import Foo.*

<>("xyz")

Then the question reduces to whether or not to automatically import companion object members, and I do not see a reason why not.

Importing (with or without renaming) the apply method needs to be done explicitly for each type.
Having a single global symbol that (essentially) resolves to the inferred type’s companion object solves that problem.

I do agree that automatically importing companion object members could also address the boilerplate concern. I think I would prefer automatic imports over new syntax (i.e., over [] and ..).

However, they seem also to have more far-reaching consequences. There was a similar feature in Kotlin which was restricted at some point, if I recall correctly. Basically, the issue is that imports are also available in inner scopes, so nested definitions get a lot of imports, and some may be undesirable. My hunch is that it would not be as bad in Scala, because Kotlin used that feature for mutable builders (thus, combined with automatic imports, you just had random side effects), whereas in Scala you would just construct immutable data, so resolving an unexpected method would likely just not compile.


Re the import scala.compiletime.companion idea: I think it wouldn’t really buy us much because it would still have very special semantics.

Let’s take a very simple example like

import scala.compiletime.companion as <>
class Foo(x: Int)

val foo: Foo = <>(42)

The issue here is that the expression that we need to figure out a type for in order to compile this is <>, but the expression that that type needs to be determined from is <>(42), not <>. This is completely different from how any kind of identifier in Scala today works, it will require support from the compiler, and hence I disagree about the idea that there are “no new principles to learn” here – there definitely are. And I think hiding very different behaviour behind a familiar syntax is actually more confusing than just having a separate syntax, like we do for _ lambda expressions, which are the most similar feature that we have today.
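The analogy with _ lambdas can be made concrete: the meaning of _ is likewise derived from an expression larger than _ itself, which is exactly why it is dedicated syntax rather than an ordinary identifier:

```scala
// The expansion of `_` is determined by the nearest enclosing expression,
// not by `_` on its own -- here `_ + 1` means `x => x + 1`:
val inc: Int => Int = _ + 1
val answer = inc(41)
```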

I do not propose that companion by itself has any interesting meaning, but for (companion.abc(xyz): T), having the method call present and the return type somehow inferred is necessary.

But I think if you want to, the way to think about companion[List[Int]] is that it returns List (the companion object). And type inference is adapted, such that companion.abc(xyz): T infers companion[T].abc(xyz).
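One way to see why this is at least plausible: companion objects are already ordinary values in today’s Scala, so a resolved companion[List[Int]] could simply denote the object List (IterableFactory is the standard-library abstraction that List’s companion conforms to):

```scala
import scala.collection.IterableFactory

// The companion object `List` is an ordinary value, so a resolved
// `companion[List[Int]]` could simply evaluate to it:
val c: IterableFactory[List] = List
val l: List[Int] = c(1, 2, 3)
```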

To make this more concrete I implemented something as close to my proposed syntax as I could get using a macro (non-transparent), this is how it can be used:

case class SomeType(v1: Int, v2: String) derives Syntax

object SomeType { def test(): SomeType = new SomeType(1, "test") }

case class NestedType(v1: Int, v2: SomeType) derives Syntax

object Test {

  import companion as <>

  def main(args: Array[String]): Unit = {
    val res1: SomeType = <>(42, "apply")

    val res2: SomeType = <>.test()

    val res3: List[SomeType] = List(<>.test(), <>(-1, "list"))

    val res4: NestedType = <>(42, <>(12, "type 1"): SomeType)

    ()
  }
}

A major limitation is that the implicits hack I use to convince type inference to infer the return type to get the companion object is not very stable, so in res4 the annotation is necessary.
Also, because this is a macro based hack, it has terrible error messages when it does not work.

With the above limitations, this is clearly useless as implicit syntax, but a proper implementation in the compiler might be able to address those.

Being able to mostly express this scheme using existing concepts is what I meant with “no new principles to learn”.
Yes, this uses quite a bit of advanced features, but conceptually it’s just method calls (albeit, “generated” forwarder methods), and type inference (of the return type). I guess the strange part is why these methods would exist on companion, but :person_shrugging:.

A non-macro sketch of the concept is here:
Scastie - An interactive playground for Scala.
the macro is here:
Scastie - An interactive playground for Scala.


Yes, I understand. My point is that this would be an identifier that behaves completely differently from any other identifier (its type would be determined by a different expression), and so at that point it’s effectively a language feature that users would need to learn, and it would have to be supported by the compiler and any other kind of sophisticated tooling (IDEs). It would be like allowing Scala users to use a different character than _ for abbreviated lambda expressions, and I don’t see the point of that.

I agree that this is nicer to read even if it may be a surprise that github is in scope. But I’m hesitant to introduce new syntax like [elem] for values.

What about this: @lihaoyi ?

  1. a language import that switches on bringing companion object members into scope when the expected type has a companion
  2. a language import that switches on converting a named tuple to an apply call of the expected type

Something like:

import language.{companionScope, tupleApply}

def pomSettings: PomSettings = (
  description = artifactName(),
  organization = "com.lihaoyi",
  url = "https://github.com/com-lihaoyi/scalasql",
  licenses = apply(MIT),
  versionControl = github(
    owner = "com-lihaoyi",
    repo = "scalasql"
  ),
  developers = apply(
    (id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")
  )
)

We could also do away with the one-element apply(MIT) if we allow single-element tuples to be adapted, as in (MIT), provided we accept that parentheses around an expression alter its meaning when language.tupleApply is imported.

Didn’t we want to move away from language imports?
(IMO for good reasons)


I would like to hear your reasoning for this. New (experimental) syntax was recently added for named tuples, and unlike my proposed syntax, it could be used only to create those, so it has a much smaller power-to-weight ratio.
Actually, we could probably supplant the named tuple syntax with this proposal and use the [] syntax for named tuples as well. After all, it’s still experimental, so nobody is using it and backwards compatibility is not an issue (and the reason I’d prefer [] is that I’m absolutely positive that wrapping any expression in () must remain a no-op).

Why would it be limited to named tuples rather than tuples of any kind, or even mixed parameter lists where some arguments are named and others are positional? We can philosophize and debate all day long about what is readable and what isn’t. But we could also take a look at what people actually do in the real world, and we’d notice a pattern: they come up with all kinds of operators and DSLs to be able to write composite data structures with a compact notation. They use string interpolators, like ivy"org.slf4j:slf4j-api:1.7.25". Or they use operators, like "org.slf4j" % "slf4j-api" % "1.7.25". Why not just provide people with a convenient way to do this stuff, i.e. ["org.slf4j", "slf4j-api", "1.7.25"]?
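Such interpolator DSLs are easy to build today; a minimal sketch (Dep and dep are illustrative names, not from any real library):

```scala
// A toy interpolator in the spirit of ivy"org:name:version".
case class Dep(org: String, name: String, version: String)

extension (sc: StringContext)
  def dep(args: Any*): Dep =
    sc.s(args*).split(':') match
      case Array(o, n, v) => Dep(o, n, v)
      case other => sys.error(s"expected org:name:version, got ${other.mkString(":")}")

val slf4j = dep"org.slf4j:slf4j-api:1.7.25"
```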
Requiring argument names would also preclude this from working with collections, which makes it a non-starter as far as I’m concerned.

Please please no, parens around an expression must remain a no-op, otherwise you’re going to have people accidentally converting stuff to type-safe wrapper classes all the time. I don’t scare easily, but that way lies only madness. Parens around a single expression being a no-op is hard-wired into basically every programmer’s brain, except maybe Scheme programmers’. What’s wrong with just using unambiguous syntax like []?

:100:


We could use some new kind of brackets, like

  • [[]] – doubling has served to disambiguate in the past, for instance :: and : have opposite meanings vs. Haskell
  • Since we use [] the way other languages use <>, perhaps angle brackets are free to use for this
  • Maybe something with a colon or into would convey “this expression’s meaning is based on the expected type”
  • Or, perhaps we could do something like >() where > is “just an object” whose apply method is somehow defined according to the apply method of the expected type.

Hey @nafg,

thanks for joining the discussion.

Is there an ambiguity problem with the proposed [] syntax? I’m not aware of one: expressions cannot currently start with [. At first I thought that there might be a problem with infix operators:

object A:
  def +[A] = ()

But A + [Unit] is currently a syntax error and not, as one might expect, a call to the + method with an explicit type parameter.
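For what it’s worth, the non-infix spelling does parse today; a quick sketch reusing the A from above (the type parameter is renamed to T only to avoid confusion with the object):

```scala
object A:
  def +[T]: Unit = ()

// `A + [Unit]` does not parse, but explicit selection with a type
// argument is perfectly legal:
val u: Unit = A.+[Unit]
```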

I thought about it, but < and > are currently valid identifiers, so you can do this:

object < :
  object foo:
    def apply(a: Any) = ()

object >

Now < foo > is a valid expression.

That’s essentially @ragnar’s idea, no?

Anyway, I think the main point of contention here isn’t so much the syntax, it’s whether the feature in its unrestricted form (as I originally proposed) is too easily abused to produce unmaintainable code, and if that is the case, how it could be nerfed to prevent that. I feel that such concerns are misplaced (because whether something is maintainable or not depends too much on the context, and also because I think that enforcing arbitrary and taste-based “readability” rules is firmly in the territory of linters, not the compiler), but I think that’s the main objection.

At least one kind of expression that starts with [ is the Scala 3 polymorphic function literal:

val e0 = Apply(Var("f"), Var("a"))
val e1 = mapSubexpressions(e0)(
  [B] => (se: Expr[B]) => Apply(Var[B => B]("wrap"), se))
println(e1) // Apply(Apply(Var(wrap),Var(f)),Apply(Var(wrap),Var(a)))

The [B] => (se: Expr[B]) => Apply(Var[B => B]("wrap"), se) is a lambda expression that starts with [.


You don’t see a potential for confusion with type arguments/parameters? Your single-element ambiguity could be solved with a trailing comma: (MIT,)


One thing that is absolutely illegal today, and so can’t be ambiguous with old code, is #()


As an additional data point, I would like to mention that companion object members are not brought automatically into scope for the corresponding class, which I think is similar to the expected-type situation, and which at first surprised me, as companion members are like Java’s static fields, which are obviously in a class’s scope.

// nok

class Foo:
  bar
  ^^^
  Not found: bar

object Foo:
  def bar = ???

// ok

import Foo.bar

class Foo:
  bar

object Foo:
  def bar = ???

Although surprising, I guess there is good reason for it, and that might hinder automatic import of companion members in the expected type situation as well.

At this point in the discussion (re-skimming the thread a bit), I am not quite sure what exactly the discussion is about anymore. I see at least:

  1. new syntax, e.g. [a, b] being somehow automatically converted to the target expression (original proposal)
  2. a new idiom to use existing tuples as literals (e.g., with a FromLiteral typeclass that essentially acts like an always allowed implicit conversion) by soronpo, and mentioned by lihaoyi.
  3. “scope injection” of symbols defined in the companion object of the target type (I found a couple mentions, but not the original proposal)
  4. a symbol to kinda access the companion object of the target type.

I may also have overlooked some.

1&2 seem to be somewhat mutually exclusive, as do 3&4.
But overall it seems unclear to me which to prefer.

I am actually with you on this one. Scala already has all the tools to win an obfuscated code challenge.
Moreover, I think that arbitrary “readability restrictions” make code harder to understand, because they require learning all the little exceptions about where something is allowed and where it is not.

Side note, I think named tuples are an interesting example here, because they bring tuple syntax and parameter list syntax closer together, removing exceptions.


Ah right, good point! I think it’s still technically unambiguous because expressions can’t be followed by a => token, whereas the type parameter list of a polymorphic function type must be. So once the parser finds the matching ] token, the next token can determine what kind of expression it is: if it’s => then it’s a polymorphic lambda, in any other case it’s an aggregate literal. But it’s still a mess that’s probably best avoided.

That’s an interesting idea… Trailing commas are not currently allowed in tuples though, so it’s still a syntax extension.
Python does it this way, but nevertheless, I find it looks a bit odd, and if we’re going to have to extend the syntax, then I think that I’d prefer something like your other proposal:

I thought of that too, and it certainly has advantages. I don’t think we can get away with any of {}, [] or (), so we’re going to need some sort of “decorated paren” thing. And since most characters can be used as identifiers in Scala, we don’t have that many left. We should also consider that we probably want to extend this syntax from expressions to patterns some day. If this code works…

val x: Seq[Int] = #(1,2,3)

…then so should this:

x match
  case #(a, b, c) => 42

Hence, syntax that works for expressions but not for patterns should probably be avoided. Specifically, :() and @() would be fine in an expression context, but could lead to confusion in a pattern context because those symbols are already used in patterns. An entirely separate symbol like # would avoid that for human readers. For non-human readers, any of :, @ and # would be fine because a pattern can’t start with any of those today.
The last option I can think of is .(), which is largely the same as #(), so I’d be fine with it, although I do prefer #() on a visual level.

Most other symbols are either obviously unsuitable, or a valid identifier, or Unicode (non-obvious to type).

Haha, I can’t blame you because we’ve explored many different paths from where we started. I think that’s actually really good and it has certainly provided me with new insights.

Going through your list in order, here’s how I think about the various options:

  1. That is the original idea. I don’t think of it as “conversion” though, rather it’s a kind of syntactic sugar that’ll fill in the correct companion object in front of a parameter list. So you can go from [1,2,3] to List(1,2,3), but also from [x=1, y=2] to Point(x = 1, y = 2).
  2. (tuple conversion) I don’t like this idea for a variety of reasons. It’s less flexible because there are things that just aren’t possible with tuples, like multiple parameter lists or using clauses or having some parameters named and others unnamed. It will also lead to terrible error messages and bad tooling, unless the tools grow specific support for these conversions. Moreover I don’t think it can be made to work for the case where such expressions are nested. I don’t think a good solution can be achieved this way.
  3. (scope injection) The original scope injection thread is here. At one point I thought it was a good idea to merge the two, but I’m no longer convinced because I think the issues are sufficiently distinct that more than one language feature is going to be required to solve them (sorry @soronpo, I’m still convinced that relative scoping of some form is required due to the reasons I’ve laid out in another comment, but I think it’s a largely separate issue, and I’m still prepared to help out with writing a proposal)
  4. (placeholder for companion object) That is quite similar to number 1 which proposes a syntax for companion object apply calls. It has the added benefit of also allowing things like #.of(1958, 9, 5) to create a LocalDate object (assuming that # means companion object). I think that’s a good solution.

Now that you brought up that last one again, I had some more thoughts about it. At one point I thought it would be nice to have a syntax to select members from the companion object, e.g. #of would select the of member of the companion object (or static method for Java classes). That would allow us to get rid of that ugly little dot in #.of(1958, 9, 5). But then I realized that maybe you don’t always want to select anything from the companion but rather just refer to the companion object itself. Notably, that is the case for collection conversions:

val foo = List("bar" -> 42)
def baz(m: Map[String, Int]) = ()

baz(foo.to(Map)) // using companion object here
                 
baz(foo.to(#))   // but could use a placeholder too!

So maybe that ugly little . in #.of(1958, 9, 5) is the price to pay to enable this use too.

Now that I’ve thought about it again and that @lihaoyi has demolished the [] idea, I think that this “companion object placeholder” idea is probably the best solution.

Absolutely, that is what I was trying to express with many more words before. Let’s make the language simple and orthogonal and have linters deal with “readability” for those that deem that necessary.