I’d like to write down a few more thoughts about this that have crossed my mind recently. I’ve come to the conclusion that it’s probably overambitious to try and cover the wide range of problems that have been discussed with a single language feature. Instead, we should consider several smaller changes and extensions.
FP fundamentals should just work
As a functional language, Scala should be able to express typical FP idioms as elegantly and concisely as possible. By this I mean code like the following: a prototypical implementation of map:
extension [A, B](l: List[A]) def map(f: A => B): List[B] =
  l match
    case head :: next => f(head) :: next.map(f)
    case Nil => Nil
(Let’s ignore for now that List already has map and that this is not stack safe.) This is the nice, idiomatic functional code that the language is intended for and that we want people to be able to write.
This code works today only because
- Nil and :: aren’t scoped inside List like they would be if an enum had been used, which would be the idiomatic way to declare types like this
- List has a method called :: defined on it, and there’s a weird syntax rule that says that some identifiers, when used in infix position, are looked up on the right operand rather than the left one

Neither of these would be the case if we had just declared List like so:
enum List[+A]:
  case Nil
  case ::(head: A, next: List[A])
And I feel strongly that the above implementation of map should just work without any ceremony. To me, any uglification like List.::, @:: or ..:: (ew!) is an unacceptable step backwards, and so would be importing Nil and ::.
It follows that we need new name lookup rules for at least some identifiers. When matching against an enum type, its cases must automatically be in scope without imports or the like, and in a position where an enum is expected, its cases should also be in scope. And for binary symbolic cases, it should also be possible to apply them using infix syntax.
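To make the problem concrete, here is a sketch of the status quo in Scala 3. MyList is a made-up stand-in for an enum-encoded List (so it doesn't clash with scala.List); the point is that the cases must be imported before the idiomatic match compiles, and infix construction still doesn't work.

```scala
// Status quo sketch (Scala 3). MyList stands in for an enum-encoded List.
enum MyList[+A]:
  case Nil
  case ::(head: A, next: MyList[A])

// Today, this import is required for Nil and :: to resolve unqualified.
import MyList.*

extension [A](l: MyList[A])
  def map[B](f: A => B): MyList[B] =
    l match
      // Infix *pattern* syntax works once the cases are imported...
      case head :: next => ::(f(head), next.map(f))
      // ...but infix *construction* `f(head) :: next.map(f)` would not
      // compile: it desugars to `next.map(f).::(f(head))`, and MyList
      // has no :: method.
      case Nil => Nil
```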
The same thing is true when matching against a sealed type: the derived case classes and case objects should be in scope both for matching and construction. In fact, I recently changed a bunch of code from sealed trait (with derived types declared in the same scope, not in the companion object) to enum, and while this did improve the declaration of those types, it made using them much less pleasant.
The need to define a separate :: method to construct these things is also a wart, and it requires a weird syntax rule to even work. Maybe we can say that in places where an enum type is expected, the enum cases with a symbolic name (e.g. ::) can be applied with infix syntax?
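For completeness, the workaround this alludes to can be sketched today as a separately defined right-associative constructor method (again with a made-up MyList standing in for an enum-encoded List):

```scala
enum MyList[+A]:
  case Nil
  case ::(head: A, next: MyList[A])

// The "wart": an extra right-associative method whose only purpose is to
// make `h :: t` syntax work. Because the name ends in a colon,
// `h :: t` desugars to `t.::(h)`.
extension [A](tail: MyList[A])
  def ::[B >: A](head: B): MyList[B] = MyList.::(head, tail)

val xs: MyList[Int] = 1 :: 2 :: MyList.Nil
```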
Too much of a good thing?
Assuming that we can agree on the above, it’s easy to jump to the conclusion that everything should be looked up in the companion object scope when the companion object is known. This is tempting because it makes some things very easy. Want to make a LocalDate? Easy, just type of(…) and you’re done!
But many types’ companion objects declare dozens or even hundreds of methods, and that could easily lead to an unacceptable level of namespace pollution.
OTOH, I still feel that we should have some way to make the companion object more accessible, and I think a good compromise is to allow unqualified lookup for enum case/case class/case object symbols while requiring explicit syntax like .. for all other symbols in the companion object scope. ..of(1958, 9, 5) is a reasonable syntax to make a LocalDate, and at the same time there’s a visual clue that some special name lookup is going on.
The apply method
As a special case, apply is used more often than any other name for “factory” functions, for the obvious reason that it is treated specially in various ways by the language. A syntax like ..apply(1,2,3) would defeat the purpose. It was suggested that we might abbreviate this to ..(1,2,3). But when I look at @Ichoran’s example from earlier…

..(II, ..(..("rrf-3", ..("b", 26))),
..(IV, ..(..("fem-1", ..("hc", 17))),

… then I can’t help but feel that it looks excruciatingly ugly, and it’s the dots that are bothering me. I really think that

[II, [["rrf-3", ["b", 26]]],
[IV, [["fem-1", ["hc", 17]]],

looks much cleaner. Regarding multiple parameter lists and using clauses, I think it’s fairly simple to solve: [stuff] is really just a syntax for ..apply(stuff), and so if you need more parameter lists or using clauses, you just do this: [stuff](bla)(using blub), which would desugar to ..apply(stuff)(bla)(using blub). [stuff][A] would desugar to ..apply(stuff).apply[A].
In addition to the motivation that we already discussed (e. g. k8s objects), I ran into another case recently where this kind of syntax just makes sense, and it’s the zio-aws project. Basically all methods in this project look something like this:
def createBucket(
  request: CreateBucketRequest
): IO[AwsError, zio.aws.s3.model.CreateBucketResponse.ReadOnly]
There are hundreds (or even thousands?) of these, and of course everybody and their dog is using AWS these days, so a simpler syntax to create those requests really would make the language feel a lot more lightweight for a large number of people. What would you rather read and write: s3.createBucket(CreateBucketRequest(bucket = BucketName("foo"))) or s3.createBucket([bucket = ["foo"]])? I know which one I’d pick, and it’s not the first one!
Summary
My current feeling is that we’ve been trying to cram too much functionality into one feature, so I would propose four separate ones:
- In expressions where an enum or sealed type is expected, the relevant case/case class/case object identifiers can be used unqualified. The same applies when matching an enum or sealed type: case/case class/case object names can be used unqualified
- Symbolic binary case/case class names can be written with infix syntax
- ..foo (or @foo or whatever else we agree on) is a syntax to modify name lookup to occur in the companion object of the expected type
- [stuff] is syntactic sugar for ..apply(stuff)
That’s actually an interesting example, because it demonstrates that spelling out types is often not helpful at all. Or is anybody going to argue that f(true: Boolean) is any easier to read?
Well, I would write it f(true: true)… pronounced, “true, too (two) true”. Insert ruefully shaking head.
MyFavoriteThing(id = 2, metadata = MyFavoriteThing.Watevr(salt = "eee"))
…really doesn’t make the code any clearer.
It really does actually. By a lot. If we are bothered by length / verbosity we can use imports and renaming to a shorter name. (This would still be a trade-off, losing some clarity.)
import MyFavoriteThing as MyFvTh
import MyFavoriteThing.*
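A self-contained sketch of that suggestion. The classes here are stand-ins made up for this thread, nested in a models object so the renaming import has a stable prefix:

```scala
object models:
  // Stand-in types mirroring the example in the thread.
  case class MyFavoriteThing(id: Int, metadata: MyFavoriteThing.Watevr)
  object MyFavoriteThing:
    case class Watevr(salt: String)

// Rename to a shorter alias, and pull the nested types into scope.
import models.MyFavoriteThing as MyFvTh
import models.MyFavoriteThing.*

val thing = MyFvTh(id = 2, metadata = Watevr(salt = "eee"))
```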
Yep! Data in code is a bad practice / smell. Best to keep data in files and read using helpers / libraries. There are many Scala libraries that read data into specialized efficient classes with tons of features.
It’s always possible to come up with synthetic examples to make any language feature look arbitrarily bad. That’s a pretty meaningless discussion that will get us nowhere.
Here’s some real-world code I just came across that would benefit from this language feature being proposed:
def pomSettings: PomSettings = PomSettings(
  description = artifactName(),
  organization = "com.lihaoyi",
  url = "https://github.com/com-lihaoyi/scalasql",
  licenses = Seq(License.MIT),
  versionControl = VersionControl.github(
    owner = "com-lihaoyi",
    repo = "scalasql"
  ),
  developers = Seq(
    Developer(id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")
  )
)
Perfectly idiomatic Scala: immutable case classes, collections, factory methods, named parameters, explicit type annotations. This is as vanilla Scala as it gets.
But when you look at the code, it’s kind of clunky. licenses = Seq(License...), versionControl = VersionControl..., developers = Seq(Developer(...)). There’s tons of boilerplate and unnecessary duplication, and def pomSettings: PomSettings = PomSettings(...) is the kind of verbosity that you find in Java and that Scala programmers like to feel smug about avoiding. It’s true that Scala is often pretty concise, but in the very common scenario of constructing nested data structures, Scala is pretty verbose as well.
I would much prefer to write something like:
def pomSettings: PomSettings = (
  description = artifactName(),
  organization = "com.lihaoyi",
  url = "https://github.com/com-lihaoyi/scalasql",
  licenses = [MIT],
  versionControl = github(
    owner = "com-lihaoyi",
    repo = "scalasql"
  ),
  developers = [
    (id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi")
  ]
)
With my latest ideas, that would be something like
def pomSettings: PomSettings = [
  description = ..artifactName(),
  organization = "com.lihaoyi",
  url = "https://github.com/com-lihaoyi/scalasql",
  licenses = [..MIT],
  versionControl = ..github(
    owner = "com-lihaoyi",
    repo = "scalasql"
  ),
  developers = [
    [id = "lihaoyi", name = "Li Haoyi", url = "https://github.com/lihaoyi"]
  ]
]
That looks pretty good to me, and it’s explicit about the lookup of items like github or MIT. Though the more I look at it, the more I feel that .. really is a terrible eyesore. Maybe it’s time to get the bikeshedding started on that one? I feel that @ would be OK, but I’m not sure if it would conflict with annotations or pattern aliases.
I had considered using () (tuple syntax) instead of [] for the “companion object apply” syntax, but I don’t like the idea because so far, wrapping any single expression in parens is always a no-op, and I think it should stay that way. I also like [] because then it’ll match the list syntax of most other languages.
I think there is a way that can satisfy most concerns raised here and on the relative-scoping thread. Here is what I think we should do:
Relative-scoping
Relative scoping, as in accessing the target type’s companion object methods/values, will only be available for named argument placement. Due to this restriction, I think we can reduce the relative scoping token to a single ., since the ambiguity is removed, IIUC. For discussion: with this restriction, we can even consider removing the need for a leading relative scoping token entirely.
case class Foo(arg: Int, bar: Bar)
object Foo:
  def func(): Foo = ???
enum Bar:
  case Bar1, Bar2, Bar3

val foo1: Foo = Foo(arg = 0, bar = .Bar1) //OK! (we can even consider removing the need for leading `.`)
val foo2: Foo = Foo(0, .Bar1) //error (relative scoping only available for explicit named arguments)
val foo3: Foo = .func() //error (we could allow this, but I think not)
Note: Due to the above restriction, we need to have named pattern matching officially in the language to have relative scoping within pattern matching.
Aggregated literals
Aggregated literals, as in dropping the explicit type constructor name when invoking apply, will be possible under the following restrictions:
- Only named values or argument placement.
- The syntax of [] is used, and must always have named argument positioning, unless there is a single varargs argument.
case class Foo(arg: Int, bar: Bar)
enum Bar:
  case Bar1, Bar2, Bar3
case class Baz(arg: Int*)

val foo1: Foo = [arg = 0, bar = .Bar1] //OK!
val foo2: Foo = [0, Bar.Bar1] //error: missing argument names
val baz: Baz = [0, 1, 2, 3] //OK!
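For reference, my reading of what the two accepted literals above would be sugar for, written in today's Scala without the proposed syntax:

```scala
case class Foo(arg: Int, bar: Bar)
enum Bar:
  case Bar1, Bar2, Bar3
case class Baz(arg: Int*)

// What the accepted aggregated literals would desugar to:
val foo1: Foo = Foo(arg = 0, bar = Bar.Bar1) // proposed: [arg = 0, bar = .Bar1]
val baz: Baz = Baz(0, 1, 2, 3)               // proposed: [0, 1, 2, 3] (single varargs parameter)
```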
I’m opposed to restricting relative scoping to named arguments because it’s simply unnecessary in many (perhaps even most) cases. .of(year = 1958, month = 9, day = 5) isn’t clearer than .of(1958, 9, 5), it’s less clear because the signal-to-noise ratio is worse. And worse than that, named parameters don’t even work with Java methods.
What is so wrong with simply allowing the developers to make their own decisions, like adults do? I’m honestly sick and tired of being told that we can’t give powerful features to capable and responsible developers because of a few fools who might abuse them to shoot themselves in the foot. Especially given that pretty much the worst thing that can happen is that somebody needs to enable parameter name hints in their editor.
We don’t know better than other people how they should write their code.
Actually, it’s much clearer to me. But what I proposed does not prevent you from writing .of(1958, 9, 5). The named argument placement is a restriction for invoking relative scoping. As in, Foo(arg = 0, date = .of(1958, 9, 5)) works, but Foo(0, .of(1958, 9, 5)) won’t.
I find that hard to believe given that nobody writes dates like that anywhere ever. Humans are really very good at figuring things out from context and world knowledge, and that’s why the ISO date format is 1958-09-05 and not year: 1958, month: 9, day: 5. Because everybody can figure out it’s a date just by glancing at it.
Why, why complicate things with additional arbitrary rules that make them less orthogonal than they need to be? Why this insistence that you know better than the people using the language how they should be writing their code? I’m sorry I’m getting emotional here, but this idea that adults are really children who need to be protected from themselves is spreading everywhere, and it needs to stop. You don’t make the world a better place by preventing every bad thing that could happen, however minor. You make it a better place by giving the capable and well-intentioned all the possibilities to make good things happen. Especially when the worst possible downside can easily be mitigated by decent tooling.
But getting back to the technical side: invoking relative scoping only for named parameters means I can’t use it for variable definitions, I can’t use it in a List, I can’t use it for a function’s return value, and I can barely use it in a tuple (_2 = .Bar1, srsly?). To me, that’s unacceptable.
The restriction of limiting unqualified lookup to enum cases and case classes that inherit from a sealed trait should really be enough.
That is untrue. Without the context I really had no idea what you meant by .of(1958, 9, 5). That’s why date = .of(1958, 9, 5) makes sense, but without any reference to what those numbers mean it’s an absolute hell. You need some kind of anchor for the reader to understand the context. If we take away full constructors, we need to at least leave the argument name.
I’m going to once again come back to a point that I made earlier, which is that while you’re placing all your focus on this .of(...) expression, you’re neglecting all the cases where you have even less information about what the thing you’re passing to the Person constructor might be. Nobody is complaining about the fact that Person("Martin", _) is a perfectly valid expression to construct a function of type LocalDate => Person, nor did anybody ever insist that this must be written Person("Martin", birthday = _). The reason is that the language couldn’t possibly understand all the context that a human reader can factor in while reading the code, and thus whether to make this explicit is a choice that the programmer must make. In an expression like Person("Martin", .of(1958, 9, 5)), it is relevant context that I know what Martin looks like and therefore have a rough idea of what his year of birth might be. Humans do this kind of subconscious cross-referencing all the time, and it’s not something that any programming language will ever understand. Another example of context is variable names. What if the expression isn’t .of(1958, 9, 5) but .of(year, month, day)? It’s just not possible to argue that anybody could mistake that for anything other than a date.
The irony here is that I probably would use a named parameter in this case in order to distinguish between a person’s birthday and other possibly relevant dates (like wedding date, signup date or whatever). But when I just write LocalDate.of(y, m, d) I don’t have to do that either, so adding this rule for a relative scoping expression doesn’t really solve the problem, especially given that people can just import LocalDate.of. So you’re not enforcing readable code, you’re just enforcing longer import lists.
And there’s a lesson here: you cannot enforce readable code through language rules. Readable code is the result of developers giving a fuck, and no amount of language legislation is going to change that.
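For reference, the LocalDate.of import escape hatch mentioned above is already legal Scala today, using java.time from the JDK:

```scala
import java.time.LocalDate.of

// After the import, `of` resolves without the LocalDate prefix,
// named arguments or not.
val birthday = of(1958, 9, 5)
```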
Another example is my zio-aws example from earlier. s3.createBucket([bucket = ["foo"]]) is good code, adding a parameter name is pure noise, and adding noise makes code worse, not better.
When it helps, developers have the possibility to use named parameters. I like named parameters, I probably use them more than the average developer. But I don’t want to have the language tell me when I need to use them, and to me, all these arbitrary restrictions (arbitrary as in not forced by technical reasons) frankly just feel like somebody else trying to force their ideas of what good code should look like on me, when they have no idea what my project or the people working on it are like.
It also shouldn’t be forgotten that these additional restrictions make the language not only harder to learn (because there’s more arbitrary rules to memorize) but also less fun to learn. We should strive to have a language where, while you’re learning it, you have those moments where it clicks and you realize that, wait, you can put those two things together in that way too? How cool is that? And that moment shouldn’t be destroyed because your mommy comes in and tells you, no, you can’t do that, it’s too dangerous for you.
I also see that you haven’t considered my other points. If relative scoping is only permitted for named parameters, such expressions
- can’t be used in val definitions
- … or for return values
- … or in Lists
- don’t work with Java methods (no named parameters)
- are ugly in tuples
Perhaps some teams like to put guardrails like that around themselves, and that’s fine, they can write a scalafix rule for it. I see this kind of rule firmly in the territory of linting tools, not the language proper.
Oh and one more thing: while it’s probably pretty obvious that I don’t agree with everybody on everything, I very much do appreciate the civil discussion of ideas and the time that people have been putting into it. Thanks everybody for your continued engagement.
Adding new syntax and an entirely new way to impose restrictions seems suboptimal. If someone encounters these examples in a codebase, they won’t easily understand what is happening. The code is non-discoverable and difficult to search for.
I prefer the simplicity of @mberndt’s earlier proposal, which recommends using a new symbol to infer the companion object without any magic – just an import:
import scala.compiletime.companion as <>
val l: List[Int] = <>(1, 2, 3, 4)
(Note: <> is chosen randomly since @ is not an allowed symbol)
Delegating the choice of something short to reduce boilerplate is slightly cheating, but it is still an improvement because this renaming import is needed only once, not per constructor.
Specifically, (companion.abc(xyc): T) would compile to T.abc(xyc), which seems nearly achievable with macros, except that the inferred return type seems inaccessible.
This approach is clear in terms of documentation and how to find it. There are no new principles to learn, just some inferred types and forwarded methods.
It is not as concise as some other proposals, but it seems to be an improvement without real downsides, except possibly conflicting with a better proposal.
The problem seems not so much to be automatically importing members of the companion object but rather, if we suppose apply is among these, what to do with it. I think we cannot just take any matching parentheses expression as a constructor, it is stretching things too much. On the other hand, if we want to use some symbol such as <>, then why not just alias apply, as in:

class Foo(abc: String)
object Foo:
  def apply(abc: String): Foo = new Foo(abc)
  def <> = apply

import Foo.*
<>("xyz")

Then the question reduces to whether or not to automatically import companion object members, and I do not see a reason why not.
Then the question reduces to whether or not to automatically import companion object members, and I do not see a reason why not.
Importing (with or without renaming) the apply method needs to be done explicitly for each type. Having a single global symbol that (essentially) resolves to the inferred type’s companion object solves that problem.
I do agree that automatically importing companion object members could also address the boilerplate concern. I think I would prefer automatic imports over new syntax (i.e., over [] and ..).
However, they seem also to have more far-reaching consequences. There was a similar feature in Kotlin which was restricted at some point if I recall correctly. Basically, the issue is that imports are also available in inner scopes, thus nested definitions get a lot of imports, and some may be undesirable. My hunch is that it would be not as bad in Scala, because Kotlin used that feature for mutable builders (thus, combined with automatic imports, you just had random side effects), whereas in Scala you would just construct immutable data, thus resolving an unexpected method would likely just not compile.
Re the import scala.compiletime.companion idea: I think it wouldn’t really buy us much because it would still have very special semantics. Let’s take a very simple example like

import scala.compiletime.companion as <>
class Foo(x: Int)
val foo: Foo = <>(42)

The issue here is that the expression that we need to figure out a type for in order to compile this is <>, but the expression from which that type needs to be determined is <>(42), not <>. This is completely different from how any kind of identifier in Scala works today; it would require support from the compiler, and hence I disagree with the idea that there are “no new principles to learn” here – there definitely are. And I think hiding very different behaviour behind a familiar syntax is actually more confusing than just having a separate syntax, like we do for _ lambda expressions, which are the most similar feature that we have today.
I do not propose that companion by itself has any interesting meaning, but (companion.abc(xyc): T), with the method call present and the return type somehow inferred, is necessary.
But I think if you want to, the way to think about companion[List[Int]] is that it returns List (the companion object). And type inference is adapted such that companion.abc(xyz): T infers companion[T].abc(xyc).
To make this more concrete I implemented something as close to my proposed syntax as I could get using a macro (non-transparent), this is how it can be used:
case class SomeType(v1: Int, v2: String) derives Syntax
object SomeType { def test(): SomeType = new SomeType(1, "test") }
case class NestedType(v1: Int, v2: SomeType) derives Syntax

object Test {
  import companion as <>
  def main(args: Array[String]): Unit = {
    val res1: SomeType = <>(42, "apply")
    val res2: SomeType = <>.test()
    val res3: List[SomeType] = List(<>.test(), <>(-1, "list"))
    val res4: NestedType = <>(42, <>(12, "type 1"): SomeType)
    ()
  }
}
A major limitation is that the implicits hack I use to convince type inference to infer the return type to get the companion object is not very stable, so in res4 the annotation is necessary.
Also, because this is a macro based hack, it has terrible error messages when it does not work.
With the above limitations, this is clearly useless as implicit syntax, but a proper implementation in the compiler might be able to address those.
Being able to mostly express this scheme using existing concepts is what I meant with “no new principles to learn”.
Yes, this uses quite a bit of advanced features, but conceptually it’s just method calls (albeit “generated” forwarder methods) and type inference (of the return type). I guess the strange part is why these methods would exist on companion.
A non-macro sketch of the concept is here: Scastie - An interactive playground for Scala.
The macro is here: Scastie - An interactive playground for Scala.
Yes, I understand. My point is that this would be an identifier that behaves completely differently than any other identifier (its type would be determined by a different expression), and so at that point it’s effectively a language feature that users would need to learn, and it would have to be supported by the compiler and any other kind of sophisticated tooling (IDEs). It would be like allowing Scala users to use a different character than _ for abbreviated lambda expressions, and I don’t see the point of that.