Meanings of underscore (including wildcard imports)

That’s certainly one of the dark corners of Scala. There is a core where _ makes sense, and then there are some tacked-on things that confuse the meaning. But arguably the situation has already improved a lot in Scala 3.

The best way to understand an underscore is as a hole, something that is left out. There are two fundamental modes in syntax: definitions (including patterns) and uses (including expressions and types). An underscore in a definition is something we choose not to name. An underscore in a use is a hole that needs to be filled by an argument later, i.e. it is a function parameter. Those are the fundamentals, and I think they are quite defensible that way.
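
As a minimal sketch of the two modes (the object and member names are made up for illustration), the first method below uses _ on the definition side, and the two function values use it on the use side:

```scala
object UnderscoreHoles {
  // Definition side: the second tuple component is matched but not named.
  def firstOf(pair: (Int, String)): Int = pair match {
    case (n, _) => n
  }

  // Use side: each `_` is a hole that becomes a function parameter.
  val increment: Int => Int = _ + 1
  val add: (Int, Int) => Int = _ + _
}
```

In add, the two holes are filled left to right, so UnderscoreHoles.add(2, 3) computes 2 + 3.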

Underscores in imports behave a bit differently. A wildcard import means we leave a hole for what is imported, which means in this case everything is imported. So that’s different from _ in expressions. In retrospect, maybe we should have stuck with import p.* for this. But I believe wildcard imports are not a major hurdle to understanding. Please speak up if you think I am wrong here.

On the other hand, the underscore in a renaming import

import p.{a => _}

is a defining occurrence, so we choose not to give a local name to the thing that gets imported. This is quite analogous to _ in patterns or definitions. So I would say that use of _ is again consistent with the intended meaning.
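
A small sketch of why the exclusion is useful, written in the Scala 2 import syntax discussed here. The objects p and q and their members are hypothetical; without the exclusion, both wildcard imports would bring in a member called a and the reference would be ambiguous:

```scala
object p {
  val a = "p.a"
  val b = "p.b"
}

object q {
  val a = "q.a"
}

object HidingDemo {
  import p.{a => _, _} // everything from p, except that p.a gets no local name
  import q._           // everything from q, including q.a

  val chosen = a // only q.a is visible under the name `a`
  val other = b  // p.b still arrives through the wildcard
}
```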

And then we have the awkward squad.

  • _ in a type F[_] means wildcard type in Scala 2, not type function. This was maybe OK before Scala acquired HK types, but is utterly confusing since. It’s fixed in Scala 3, where wildcard types are written F[?] (underscore is still allowed at the moment for cross compilation with Scala 2 but the meaning will change to type function in the future.)

  • f _ in expressions means “all following parameters” in Scala 2, instead of “one parameter”. This idiom is dropped in Scala 3.

  • In a field definition var x: T = _, underscore means “no initializing assignment”. There’s an issue I have just opened about this.

  • : _* means “flatten to match a vararg parameter” in an argument list. It’s fairly arbitrary as syntax. I am not sure it is confused with _ since _* is probably seen as a separate token.
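
For readers who have not met all four, here is a sketch of the awkward squad in Scala 2 syntax. The object, class, and method names are placeholders chosen for illustration:

```scala
object AwkwardSquad {
  // 1. F[_] as a wildcard type: a list whose element type is unknown.
  def size(xs: List[_]): Int = xs.length

  // 2. f _ turns the method into a function of all its parameters.
  def add(x: Int, y: Int): Int = x + y
  val addFn: (Int, Int) => Int = add _

  // 3. `= _` on a field: no initializing assignment, so the field
  //    starts with the default value of its type (null for String).
  class Cell {
    var value: String = _
  }

  // 4. `: _*` passes a sequence where a vararg parameter is expected.
  def total(xs: Int*): Int = xs.sum
  val six: Int = total(Seq(1, 2, 3): _*)
}
```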

Yes, indeed. So orthogonal != simple, in fact the two are probably inversely related. That’s surprising, but I have come to accept it as a fact.

@odersky on the topic of overusing _, it seems that it’s basically used in two different meanings now: “don’t care” and “everything”, especially since in Scala 3 we’re removing a bunch of misc use cases. What if we standardized on * to mean “everything”? That would leave _ meaning only “don’t care”.

That would give us code like:

import p.*
import p.{a => _, *}
def foo(x: Int*) = x.sum
val nums = Seq(1, 2, 3, 4)
foo(nums*)
case Seq(1, 2, rest*) => ???

* is already “special syntax”, since it’s used to denote varargs in parameter lists and to expand vararg lists at call sites. Such a change would both (1) reduce the number of distinct meanings for _ and (2) bring Scala’s “import all” and “varargs” syntax closer to other languages like Java, Python, Ruby, Kotlin, etc., all of which have similar syntax for their imports and for varargs definitions, expansions, and destructuring.

Would be a small change, but it seems like it would be a step in the right direction both for Scala’s internal understandability as well as its onboarding-friendliness to people with prior programming experience.
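
To see how the vararg forms read in a complete program, here is a minimal sketch. It assumes a Scala 3 compiler that accepts the * splice and * pattern forms, which were only a proposal at this point in the discussion; StarDemo and restAfterOneTwo are made-up names:

```scala
object StarDemo:
  // Vararg parameter, as in the snippet above.
  def foo(x: Int*): Int = x.sum

  val nums = Seq(1, 2, 3, 4)

  // Splice a sequence into the vararg parameter.
  val total: Int = foo(nums*)

  // Bind the remaining elements of a sequence pattern.
  def restAfterOneTwo(xs: Seq[Int]): Seq[Int] = xs match
    case Seq(1, 2, rest*) => rest
    case _                => Seq.empty
```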

Interesting. To me, : _* makes perfect sense, because we cast Array[A] or Seq[A] to A*, and to cast something, we need a colon :. So it should really be : A*, but we don’t care about A, we just care that it is varargs, so : _* makes perfect sense.

To me, fun _ is the one that doesn’t fit and feels arbitrary.

@lihaoyi I am very sympathetic to the proposal. I have opened an issue for the varargs change which is the less controversial one, I think. That and the change for omitting initializers would get rid of all obscure uses of _ except for imports.

For changing imports, I don’t know whether we can do it, even though I agree that

  import p.{a => _, *}

is clearer than

  import p.{a => _, _}

@odersky What about using * for imports seems controversial? I feel both varargs and import-all are pretty similar: the rest of the programming community uses * for both, and Scala’s the odd one out in both cases. Also, if we make * special syntax, then people could still use * as an identifier by wrapping it in backticks, so there’s a reasonably straightforward migration path without needing anyone to rename their variables or anything.

Simply that wildcard imports are a lot more pervasive than vararg splices. So it feels like a bigger change.

There’s this little method called * which is used pretty ubiquitously, in Scala and almost all other programming languages. I think having to write * between backticks every time you want to multiply some numbers would be a significant step back, if * suddenly became a reserved word.

Of course, if * just became a soft keyword in import statements and in postfix position (if postfix operators no longer exist), that would probably be fine.

I actually wrote a lesson all about the use of the underscore in Scala, and how it does have a reasonably consistent interpretation, despite its reputation for being too heavily overloaded.

I think the following type of imports is definitely confusing:

import foo.{ bar => _, _ }

It is worth noting that the x => _ syntax is useful only in conjunction with a wildcard import (or maybe it can also be used to explicitly disable an implicit definition?). So, instead I suggest we introduce a new syntax. Here are some suggestions:

import foo.{ _ except bar, baz }

Or:

import foo._
unimport foo.{ bar, baz}

Or:

import foo.{ _, !bar, !baz }

I also agree with Li Haoyi that it does not make real sense to use _ to mean “everything” as in “import everything”. Using * works OK, IMHO:

import foo.*
import foo.{ * except bar, baz }

Relatedly, I think one issue with alias imports is that they don’t follow the way we usually introduce names.

Consider the following import:

// Introduces the name `baz`
import foo.{ bar => baz }

We import foo.bar as an alias baz.

However, the => syntax usually introduces names on its left-hand side and uses them on its right-hand side, as in:

// Introduces the name `foo`
foo => foo.something

So, I think using => for import aliases is not consistent with how we use => at the other places.

We have a way to introduce aliases in patterns, with the case bar @ baz => syntax, but I don’t think this syntax plays well with imports (consider import foo.{ bar @ baz }).

I think there was an experiment with using as as a way to introduce aliases in patterns, but I can’t find it anymore in the dotty reference documentation. Has it been dropped? It would make sense to use it for imports:

import foo.{ bar as baz }

I think the meaning of “as” is not ambiguous here, and it is consistent with the way we use it in SQL or in ECMAScript imports.

(We should probably fork the discussion if we want to continue discussing the syntax of imports.)

It is an annoying thing in my practice. I have googled it many times :)

import p.{a => _, _}

It is unintuitive, and I could not find its description in documentation quickly.

Some people will freak out about me proposing this: now that all other confusing uses of _ will be eliminated (PR 11231, PR 11240), should we also clean up import syntax? I am pretty much in line with the proposals of @lihaoyi and @julienrf on how to go about this. Concretely:

  • Replace wildcard import _ with *, which is what basically all other languages use
  • Replace renaming => with a soft keyword as.
  • Allow as outside braces, as in
    import scala.collection.mutable as mut
    import NumPy as np
    

I would probably still keep “as _” for an import exclusion. Having a separate except clause is nicer but also more complex, syntactically. It would give rise to new feature interactions, so it would be more risky to do, in particular at this late stage. So I’d rather go with

  import p.{a as _, *}

because its rules follow directly from the combination of renaming imports and wildcard imports.
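
Put together, the three proposed forms would read as in the sketch below. It assumes a Scala 3 compiler that accepts the as and * import syntax; the objects p and q and their members are hypothetical:

```scala
// `as` outside braces: a short alias for a whole package.
import scala.collection.mutable as mut

object p:
  val a = "p.a"
  val b = "p.b"

object q:
  val a = "q.a"

object ImportDemo:
  import p.{a as _, *} // everything from p except a
  import q.*           // everything from q, including q.a

  val chosen = a // q.a, since p.a was excluded
  val other = b  // p.b, via the wildcard
  val buf = mut.ListBuffer(1, 2, 3)
```

The exclusion needs no rule of its own here: "a as _" is a renaming import whose target name is left out, followed by an ordinary wildcard.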

Would this disable * as a method name (backticks excluded)? What about an object * I might want to import?

Of course not. Only if you want to import a thing named “*” you have to put it in backticks:

  import foo.`*`

I’m not sure, does Scala allow having both * and `*`?

I don’t even freak out anymore. I knew this would happen. It started with “small, obviously better changes”, and we couldn’t resist doing that 2 weeks away from an RC1. So now it’s growing into “a better thing but that impacts basically every file in existence”. Over the weekend we’ll have something even bigger.

I’m not freaked out anymore; I’m just discouraged.

All these things could wait until a later release. We can keep improving the language after 3.0. Thinking that everything must be perfect at 3.0 is a chimera.

We will improve the language after 3.0, but we will not be able to do fundamental syntax changes anymore, simply because all the docs and tutorials, including printed books and online courses have to stay relevant for a long time. So after 3.0 we are back to the more settled standard way of language evolution, which means:

  • don’t break existing code on a large scale (except for things that we have announced for removal today, so they won’t be in the Scala 3 docs)
  • add new capabilities only
  • concentrate on more advanced and specialized use cases.

So if something is confusing today, we cannot decide to remove it once 3.0 has shipped. We can add stuff, but I fear that will not make anything less confusing.

We can also deprecate old stuff and introduce better replacements. This happens all the time, even in minor versions.

Honestly I don’t find the case for renaming => to as convincing. How many times in the past decade have you heard someone say the import => syntax is confusing? I haven’t heard that at all.

w.r.t. the 3.0 release candidate, honestly what I have been saying for a while is that the release timeline is crazy aggressive, especially given the “fix all the tiny things” attitude that we have taken. Here’s a strawman timeline that would make the current attitude towards Scala 3.0 work:

  • Mid Q1 2021 - Mid Q2 2021: 3 months consolidate the current feature sets:

    • Write Design-Docs/SIPs for every language change. Most of these changes are pretty trivial, and a good design doc should take <1 day to write, so 1-2 people working on this could cover 50-100 different features in good detail. Design docs aren’t just for the sake of arguing and discussion, but will also sharpen our own thought processes and ensure the features are thought through.
    • Continue expanding the community build (e.g. including Fastparse, Ammonite, etc.): the more real-world code we can get onto Scala 3 at release, the more we can exercise the featureset and the more confident we will be that we do not have any fundamental blockers to upgrades or missed opportunities for improvement (e.g. in the metaprogramming API, which is too large and complex to review just via SIP)
  • Mid Q2 2021: Release the RC1 together with (1) all the SIPs and (2) the expanded community build.

  • Mid Q2 2021 - Mid Q3 2021: 3 months for discussion and review. The expanded community build and SIP writeups would let people exercise the featureset in more-than-toy use cases, and take part in the discussion on the SIPs with the ability to try things out and give better feedback.

    • This also gives time to implement whatever changes arise from the SIP discussions and feedback; I don’t know what the changes will be, but I assume there will be some.
  • Mid Q3 2021 - Mid Q4 2021: 3 months of burn down period: no more feature changes. Just time to exercise things more, try and find bugs, fuzz test, improve error reporting, all the standard polish things that are easily overlooked during the ideation-and-feature-implementation phase.

  • Target release of Scala 3.0 in Mid Q4 2021

IMO there’s nothing inherently wrong with a “try to fix all the long-standing problems” release, we just need to be realistic about the timeline necessary to properly execute it. Given that we’ll be living with the results for the next decade, I think spending a few more months to let the process breathe would definitely be a worthwhile investment.

(This is a strawman timeline I made up in the last 5 minutes, the exact dates can be argued over and tweaked)

“Where will it stop?” is a fair question about doing syntactic changes so late in the game. I believe many will agree that eliminating all confusing uses of _ will make the language simpler. But is that just a drop in the bucket, or the last thing to arrive at a local optimum?

Maybe it is best to take the bull by the horns, and make a list of all things that are syntactically more obscure than they need to be. I am not talking about missing constructs, these can be added later. I also don’t want to start to bikeshed. Let’s just pick things that

  • are widely used,
  • are obscure or confusing the way they are defined today,
  • would have simple fixes so that
    • code becomes clearer, and
    • existing code can migrate without problems

I did a run-down through the syntax summary, and came up with the following candidates:

  1. @unchecked annotations for pattern matches that fail

    def f(xs: List[Int]) = 
      val y :: ys = xs: @unchecked
      ...
    

    It’s definitely an improvement over silent failures in Scala 2, but the @unchecked feels a bit clunky and vaguely hostile. On the other hand, replacing an annotation with special syntax could also be done later, so no urgency here.

  2. Interaction between union types and pattern alternatives

    case x: A | B   => // these are two patterns
    case x: (A | B) => // now it's a union type test
    

    Here the problem is migration. We could change the operator precedence so that x: A | B was a single type test, but that could change the behavior of existing programs. So I think we have to live with it.

  3. Self types

    x: T =>
    

    is a syntax that is not obvious and has to be memorized. Arguably, the underlying concept is not obvious to most people either, even though it is quite general and fundamental. I believe the best way forward here is to investigate a more general “MyType” construct that can supersede self types. This would mean providing a standard name for the type of this, and allowing its definition to be overridden. Ideally, that name would be This. So instead of

    trait T:
      this: U =>
      ...
    

    we’d write

    trait T:
      type This <: U
      ...
    

    This looks strictly more powerful than self types. But it’s something that has to be done later, and can be done later. So, over time self types could be superseded. The only question would be: should we reserve the name This for a type now, to make it possible to do this later? I believe that would be a pretty hard breaking change, as This is quite common in codebases. So, I am not at all sure we can do this.
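
For reference, the three candidates can be exercised as they stand today in one small sketch. It assumes a Scala 3 compiler; A, B, T and U follow the snippets above, while Candidates, Impl and the method names are placeholders chosen for illustration. The proposed This type member is not shown, since it does not exist:

```scala
object Candidates:
  // 1. A pattern binding that can fail, accepted via @unchecked.
  def headOf(xs: List[Int]): Int =
    val y :: _ = xs: @unchecked
    y

  // 2. Pattern alternatives versus a union type test.
  final case class A(n: Int)
  final case class B(s: String)

  def viaAlternatives(v: Any): Boolean = v match
    case _: A | _: B => true  // two separate patterns
    case _           => false

  def viaUnion(v: Any): Boolean = v match
    case _: (A | B) => true   // one type test against the union A | B
    case _          => false

  // 3. Self type: T may only be mixed into something that is also a U.
  trait U:
    def name: String

  trait T:
    this: U =>
    def greet: String = s"hello, $name"

  object Impl extends T, U:
    def name = "world"
```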

That’s all from my side for now. Three candidates, but none can or should be changed for 3.0. So maybe confusing _ was the last major thing to fix for 3.0?

If you have other candidates please mention them. But please, stick to the criteria: a widely used existing feature that is obscure or confusing and can be easily changed with a straightforward code migration path. So “let’s add feature X” or “let’s drop feature Y since I don’t use it” won’t qualify.

Sure, but we cannot deprecate stuff that is in every text book or online course. That would be suicidal.
