[Option to] forbid declaring nullary methods?

tkroman · April 11, 2020, 12:39am

Not sure if this should be treated as a language-level proposal or just a request for a compiler flag, but here it goes.

I want my code to be consistent wrt the “it behaves like a function if and only if it looks like a function call” principle:
foo means that foo is necessarily a value
foo() means it’s definitely a function.

I realise it’s not going to be totally consistent since there are unary/binary operators which people can define to behave however they like, and restricting those is definitely out of the question, but at least for conventional instance.method() calls - does anyone else see how this could be beneficial?

One of the drawbacks I see would probably be some corner-cases of DSLs, but I don’t think that’s a dealbreaker.

Dropping auto-application (however incomplete (for now?)) is definitely helpful here but I feel like making sure there is no option to confuse function calls and value referencing is a complementary step in the same direction.

Jasper-M · April 11, 2020, 12:14pm

I’m pretty sure it should be easy to define a scalafix rule to enforce this on all your own projects.

I don’t really see it happening that nullary defs are removed from the language. And I don’t really see the problem either, if you follow the convention that nullary defs have to be pure functions. In my mind, being able to override a nullary def with a val or lazy val is a pretty essential property of Scala.
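To make the override point concrete, here is a minimal sketch of one nullary declaration implemented three ways. The Shape hierarchy and every name in it are hypothetical, invented only for illustration; callers write shape.area in each case.

```scala
// Hypothetical example types; nothing here comes from a real library.
trait Shape {
  def area: Double // nullary declaration: no parameter list at all
}

// Implemented as a stored value, computed once at construction.
class Square(side: Double) extends Shape {
  val area: Double = side * side
}

// Implemented lazily: computed on first access, then cached.
class Circle(radius: Double) extends Shape {
  lazy val area: Double = math.Pi * radius * radius
}

// Implemented as a method: recomputed on every access, so it tracks mutation.
class Rect(var w: Double, var h: Double) extends Shape {
  def area: Double = w * h
}
```

Code that only knows about Shape cannot tell these apart, which is the property being defended here.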

jducoeur · April 11, 2020, 2:59pm

Agree with @Jasper-M – as such, this belongs as a scalafix rule for your project, not in the language.

Keep in mind, the ability to implement a declaration as either a value or a function is – well, if not quite a bedrock principle of OO, at least one of those basic concepts that is used widely in Scala. The sort of rigid rule you’re asking for looks reasonable for classes in isolation, but often much less so in the presence of inheritance, where it’s very common for one implementation of a trait method to be done as a function, where another is done as a value. In the face of inheritance, it just isn’t such a clean either/or – you sometimes don’t even know at the time you write the trait what you might eventually need.

And frankly, I don’t think there’s a lot of value in it as such – from an observer’s POV, there shouldn’t be any meaningful difference between a value and a pure function.

That leads to the more interesting version of the idea, which has been floating around for many years: that parens should be required if and only if it has side-effects. It’s not unusual to adopt that as a convention (I often do so myself, albeit inconsistently), and being able to enforce that would be neat and arguably useful. (Although at that point it isn’t clear why nullaries should be special, so it isn’t really sufficient.) But that requires effect tracking, which would be really cool, but is very much an open research project at this point. No clue whether or when we’re likely to get to the point of having that capability, but I believe the Dotty folks are thinking about it…

curoli · April 11, 2020, 3:09pm

That A.foo could be a val or def is called the Uniform Access Principle and it is usually considered a very valuable thing.

In Scala, even a.foo = x could be a direct field assignment or a call to a setter method.
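A small sketch of what that looks like, assuming two hypothetical account classes (the names and the validation rule are invented for illustration). Both accept the same acct.balance = x syntax; the second routes it through a hand-written balance_= setter.

```scala
// Plain var: the compiler generates both the getter and the setter.
class PlainAccount {
  var balance: Long = 0L
}

// Hand-written accessor pair: `acct.balance = x` is rewritten to `acct.balance_=(x)`.
class CheckedAccount {
  private var cents: Long = 0L

  def balance: Long = cents

  def balance_=(value: Long): Unit = {
    // Hypothetical rule, only here to show that the setter can run extra logic.
    require(value >= 0, "balance must not be negative")
    cents = value
  }
}
```

At the use site, `new PlainAccount().balance = 5L` and `new CheckedAccount().balance = 5L` read identically.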

This matters because in object-oriented design, you usually want to separate the API from its implementation. The classical example is Temperature.inCelsius and Temperature.inFahrenheit, where one could be a value and the other be computed from that value, and we do not want the API to reveal which one.
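A minimal sketch of the Temperature example, spelled inCelsius here. Which unit is stored and which is computed is an arbitrary assumption made for the sketch; the point is that the call sites do not show it.

```scala
// Hypothetical class: stores Celsius, derives Fahrenheit on demand.
class Temperature(val inCelsius: Double) {
  // Computed on every access, yet read exactly like the stored value.
  def inFahrenheit: Double = inCelsius * 9.0 / 5.0 + 32.0
}
```

If a later version stored Fahrenheit instead and derived Celsius, code written as t.inCelsius and t.inFahrenheit would not need to change.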

In languages that do not have UAP, such as C/C++ and Java, the canonical design is to make all fields private and if you want to reveal the field you add a getter method, and if you want to be able to change it directly, you add a setter method. This leads to a lot of dull verbosity.

In Scala, a simple (i.e. public) field declaration will be translated into the canonical design on the JVM, i.e. val a will be a private field a and a getter method also called a. A var a will also include a setter method. This can lead to surprises when trying to override a val, which is most of the time a mistake.
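One commonly cited surprise of this kind, which may or may not be the one meant above, is initialisation order: because a val is read through its getter, a superclass constructor can observe an overriding val before it has been assigned. A sketch with hypothetical class names:

```scala
class Base {
  val name: String = "base"
  // Evaluated inside Base's constructor. The read of `name` goes through the
  // getter, which in a Derived instance returns Derived's not-yet-set field.
  val greeting: String = "hello, " + name
}

class Derived extends Base {
  override val name: String = "derived"
}
```

With these definitions, `new Base().greeting` is "hello, base", while `new Derived().greeting` is "hello, null" rather than "hello, derived".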

tkroman · April 11, 2020, 4:30pm

My point is to eliminate this in favour of not “following the convention”. It is one of the features that introduce the cognitive load for both newcomers (when first encountering it and figuring it out) and experienced coders (having to look at a definition every time you are uncertain whether it’s a value or a method call). The fallacy in your argument is that you reply with “but it’s supposed to work this way” to my question of “could we consider if it’s supposed to work this way?”.

In that case one declares a method and lets implementors deal with figuring out what underlying behaviour they want in their implementations.

Not true. Functions are (might be) values but calling a function and reading a field are certainly not equivalent. Consider an example of computation monads and the process of their evaluation (it’s related to what you are saying in the last paragraph) - you always invoke their evaluation function in order to get values, independently of whether a concrete instance is just an eagerly evaluated value inside Now or a complex computation running a program in Delay or whatever.
But I don’t think getting into semantics is useful here, I’m leaning towards the simpler argument of eliminating another source of cognitive load, which IMO should be enough.
After all, Martin claimed (a couple of years ago) that Dotty aims to be more opinionated in certain aspects in order to bring more consistency for end-users and I see this as an opportunity to move in that direction.

The fact that something has a name and had amassed a certain userbase doesn’t automatically mean it’s necessarily the right / the only solution.

On this I completely agree, and I mention it in the original post - I don’t think it’s possible to eliminate this exact pain point.

This is a compile-time surprise so there is no real surprise. A real surprise is figuring out that a call that you thought was a field access is actually costing you seconds in runtime.
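To illustrate the kind of runtime surprise meant here, a sketch with a hypothetical Catalog class whose loader stands in for any expensive work (disk, network, heavy computation). Both members are read without parentheses, but only one of them caches.

```scala
// Hypothetical class; `load` is an assumed stand-in for an expensive operation.
class Catalog(load: () => Vector[String]) {
  // Looks like a field at the call site, but runs the loader on every access.
  def entries: Vector[String] = load()

  // Same syntax at the call site; the loader runs at most once.
  lazy val cachedEntries: Vector[String] = load()
}
```

Nothing in an expression like catalog.entries.size distinguishes the two, so reading entries inside a loop repeats the expensive work on each iteration.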

As with all language features, there is always a way to abuse them, and where there is an opportunity to abuse something, there is always a hidden knowledge that has to be obtained by newcomers through a process of first bumping into the issue and then learning to be careful around it. If a language lets users write code that looks like field access but can have arbitrary complexity behind the scenes, that’s going to happen simply because people love syntactic sugar at first, until they have had their fair share of problems with it.

What I’m saying can be boiled down to this: assuming we have nullary methods AND everyone agrees that they should only be pure functions (AND we also implicitly want these calls to be cheap, right?), consider a coding guide of an arbitrary Scala shop. Is it going to include this point (warning)? Yes, it is. It’s going to have to explicitly mention it and somehow attempt to enforce it, either via automated checks or through code reviews or a combination of both.
If we have a chance to avoid this by getting rid of nullary methods, do we get a different point in the coding guide in exchange? Something that people have to learn to keep in their mind? I don’t think so / I don’t see that now. So maybe it’s worth considering after all?

martijnhoekstra · April 11, 2020, 5:06pm

The entire discussion is about semantics. Semantics are exactly what’s at stake here. To wit, the semantics of function evaluation, of side effects, and the difference in operational semantics between evaluating a def and a val.

Getting rid of nullary methods doesn’t avoid this at all – you lose the option to show that a def will perform side effects by giving it an empty parameter list.

tkroman · April 11, 2020, 6:03pm

There is no evidence in the real world that this ever worked. The opposite is true, I’ve seen both pure foo()s and impure bars. Anything that is supported only by convention is a priori incomplete and does not work outside of sterile environments as opposed to something that is enforced by the rules users can’t work around.

If you insist on this, then I claim that nullary methods break reasoning since I can never be sure that what looks like a field reference doesn’t read from disk.

martijnhoekstra · April 11, 2020, 6:07pm

Everything that I and other people found useful in the real world is a posteriori useful regardless of a priori incompleteness.

I and other people in the real world have found this feature useful.

tkroman · April 11, 2020, 6:10pm

Got you. Hope we’ll see someone leaning towards the opposite side too.

Ichoran · April 11, 2020, 8:21pm

I concur with @martijnhoekstra - I have found it to be useful in practice. Not critically important, not 100% reliable, but useful.

It is/was a way to help document the code. foo() means that foo has side-effects. And, if it’s def foo() { ... }, then it only has side effects. That was quite handy for me–being able to understand what is probably the case with a method just by glancing at its use site and/or its definition site.

Unfortunately for me, it was decided that the uniformity of having all methods have a return value was more important than this visual distinction, so now it has to be def foo(): Unit = { ... }. This has noticeably impeded the speed at which I can understand certain things about my code and about other people’s code. And that’s just because it takes me a bit longer to notice that it’s : Unit = instead of : Int = or something!
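For readers who have not seen the older form, a sketch of the two spellings side by side, using a hypothetical Counter class. The procedure syntax is shown only as a comment, since it is deprecated in Scala 2.13 and no longer accepted in Scala 3.

```scala
class Counter {
  private var count: Int = 0

  // Old procedure syntax, kept as a comment because newer compilers reject it:
  //   def increment() { count += 1 }

  // Current form: the Unit result type and the `=` must be written out.
  def increment(): Unit = { count += 1 }

  // By the convention discussed in this thread: no parentheses, no side effects.
  def current: Int = count
}
```

Under that convention, a call site such as counter.increment() signals a side effect, while counter.current signals a plain read.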

If we were to remove nullary methods, I would expect the negative impact on me to be much greater. I wouldn’t even want to do anything that aids an alternate style where () doesn’t mean side-effecting but rather computation-performing, because that would make it a less reliable guide to what’s going on in other people’s code, even if I were able to use it ubiquitously in all of mine.

So, from direct experience with a change that removes ways to use syntax to suggest things about whether or not code is side-effecting, I say: no thank you!

(I understand that it can be a challenge to understand how much computation is involved when you see something like x.foo; I do a fair bit of high-performance work, and that’s a challenge for me. But I find in practice that the main challenges are elsewhere anyway. If it’s def foo = bar.myFoo, then the computation is still very small, and most of my optimization effort is spent on understanding the characteristics of foo. Just not counting on a lack of () meaning “field access” doesn’t appreciably alter the amount of work I have to do. So I don’t really care. Of course, if you interpret x.foo as just a field access and you’re wrong, it can have major performance implications! But once you learn that you can’t make that assumption, I find it doesn’t much matter.)

curoli · April 12, 2020, 4:27pm

I understand you want to make direct field access syntactically distinct, because it is guaranteed to be cheap, while method calls are potentially expensive.

However, such a desire is not only contrary to the Uniform Access Principle and to the convention that empty parens mean side-effects; it also runs against more fundamental principles of object-oriented design, according to which fields usually should not be accessed directly from the outside.

This necessarily leads to lots and lots of method calls. In OO design, this trade-off is accepted based partly on believing that the benefits of OO outweigh the performance loss and in part on trusting that the compiler will help by method inlining, value propagation and other forms of optimization.

NthPortal · April 14, 2020, 5:09pm

I’m going to do something different and come at this from a Java angle.

Item 16 of Effective Java: Third Edition (and it is present in older editions, though perhaps with a different item number) states:

In public classes, use accessor methods, not public fields

It goes on to describe the reasons as follows:

Because the data fields of such classes [with public fields] are accessed directly, these classes do not offer the benefits of encapsulation (Item 15). You can’t change the representation without changing the API […] and you can’t take auxiliary action when a field is accessed.

[I]f a class is accessible outside its package, provide accessor methods to preserve the flexibility to change the class’s internal representation. If a public class exposes its data fields, all hope of changing its representation is lost because client code can be distributed far and wide.

When following the recommendation of Item 16, there is also not a way just from the method structure to distinguish methods which merely read fields, and those which don’t. This is intentional - if you could distinguish them, you couldn’t swap one for the other, and you would lose the flexibility to evolve APIs. It is the purpose of documentation to explain what methods are expensive, IO-bound, etc.

The Uniform Access Principle is a direct encoding of Item 16 in Scala - vals are actually encoded as private fields and public methods, such that using a val is automatically following the above advice and retaining the ability to change the class’s internal representation.

Forbidding declaring nullary methods is essentially violating Item 16 in all cases. It prevents API evolution, which ends up being quite bad for widely-used libraries.

In general, IO-bound methods should be nilary (declared with an empty parameter list, because they are side-effecting), which would clearly distinguish them from nullary methods (declared with no parameter list) that are pure or field reads.

Forbidding nullary methods severely hampers your ability to evolve APIs. Suppose you write a collection type (that doesn’t follow the scala.collection hierarchy), and you expose its length as val length. This will prevent you in the future from implementing a wrapper collection that forwards its length to the underlying collection. You can’t do that, because it needs to be a val. Now you need to store an extra, redundant copy of the length in a field.
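A sketch of the wrapper scenario, assuming a hypothetical Bag trait that is deliberately unrelated to scala.collection. Because the trait declares length as a nullary def, one implementation can store it and another can forward it.

```scala
// Hypothetical minimal collection interface.
trait Bag[A] {
  def length: Int
  def apply(i: Int): A
}

// Stores its length in a field.
final class ArrayBag[A](items: Vector[A]) extends Bag[A] {
  val length: Int = items.length
  def apply(i: Int): A = items(i)
}

// Wrapper: forwards length to the underlying collection, no redundant copy.
final class ReversedBag[A](underlying: Bag[A]) extends Bag[A] {
  def length: Int = underlying.length
  def apply(i: Int): A = underlying(length - 1 - i)
}
```

If length had to be a val everywhere, ReversedBag would need its own field holding a copy of the underlying length.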

Suppose you have a mutable collection, and you keep track of its length in a mutable var. Because you don’t want users of your class to be able to modify the length, the var must be private and the value must be publicly exposed through a def. Now your mutable and immutable collections have confusingly different APIs (and can’t share an ancestor) because the mutable one can’t be backed by a val.
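The mutable case can be sketched the same way, with hypothetical Sized, Frozen and Growable types. The shared ancestor works only because length may be a val in one subclass and a def over a private var in the other.

```scala
// Hypothetical shared ancestor for both collections.
trait Sized {
  def length: Int
}

// Immutable: the length never changes, so a val is enough.
final class Frozen(items: List[String]) extends Sized {
  val length: Int = items.length
}

// Mutable: the counter is private so callers cannot assign to it,
// and it is exposed read-only through a nullary def.
final class Growable extends Sized {
  private var size: Int = 0
  private var items: List[String] = Nil

  def length: Int = size

  def add(item: String): Unit = {
    items = item :: items
    size += 1
  }
}
```

Code written against Sized can take either collection and read length the same way.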

In short, there are very good reasons for the Uniform Access Principle. It is the extension of a principle even found in good Java development practices. If you want, you can probably use Scalafix to prevent the use of nullary methods (as mentioned by @Jasper-M), but you will likely find it hampering you in many, many ways.

dwijnand · April 16, 2020, 8:47am

Sorry for the aside: why do you say “incomplete”?