In this topic I’d like to discuss concrete ideas for getting better error message reporting into Scalac. @SethTisue has told me the Scala team at Lightbend is looking forward to getting this contribution from the Community, so I’d like to gather some feedback on this front.
I believe there is consensus at both the Scala Center and Lightbend about the importance of this feature. For example, @adriaanm has briefly talked about it as one of the goals for Scala 2.13. As this topic has recently been on my mind a lot, I’d like to start this discussion in preparation for the coming 2.13 release. I would personally love to see good error messages featured in the first 2.13.0 artifact.
Below, I dive a little deeper into this important topic, briefly discuss why error reporting matters, and propose a plan to move this feature forward with the help of the community. I also give my personal take on what makes an error message good, and on what future error messages could look like.
Motivation
Improving the ergonomics of the language is important for two reasons: first, it makes our current userbase happier; second, it increases language adoption. In the case of Scala, good ergonomics fight the perception that Scala is an academic language with a poor developer experience (many people have this impression even though it’s not true). There’s no better way of selling the catch phrase “the compiler helps you” than making the compiler actually help you with actionable and nice error messages.
In my opinion, good error messages are especially useful for folks unfamiliar with statically compiled languages or developers that have had traumatic experiences with languages like C or C++, languages traditionally known for confusing error messages (maybe now the situation is a little bit better?).
Making improvements in this area has a huge impact on our ecosystem and on the future of the language. Other languages have already taken the lead and successfully implemented better error reporting. The closest example is Dotty (/cc @felixmulder), which, in a community-driven effort, has gained nice error messages inspired mainly by languages like Rust and Elm. Check the links if you want background on the approach taken in those languages.
I’ve only seen positive comments about the error messages in languages like Elm and Rust. The most recent example is “Rust in 2018: it’s way easier to use!”, where a developer praises the new usability of the language. I won’t link more articles here, but if you’re interested in the feedback received in those communities, I encourage you to Google around and check Hacker News comments.
Current tools
There are several tools that aim to improve error reporting in Scalac: scala-clippy, imcliptly and sbt-errors-summary, the last of which was previously discussed in “Improving the compilation error reporting of sbt”.
All of these tools require two things: knowing about them (the hard part) and a manual installation process. I believe these tools serve as an inspiration, but we should definitely focus on making the changes in the Scalac compiler so that they are available to everyone, regardless of the tool or knowledge they have.
A plan
Scala 2.13 is “around the corner”: the release candidate is scheduled for April 27th. As this is the kind of initiative that takes a long time to get merged (there are many details to agree on) and requires coordination, I believe it’s better to start planning for it as soon as possible.
I think the best way to move this forward is for someone to work on the compiler infrastructure, and for other contributors to then improve actual error messages (one per PR). The infrastructure work would mainly consist of adding the data structures and architectural changes that make it easy to add, remove or modify error messages. It may also be useful to have some short guidelines about what makes a good error message, so that future error messages follow them.
What makes a good error?
A good error message has IMO the following properties:
- A unique identifier.
- The source code is the main part in a message (*).
- The content is short (the less unnecessary cruft, the better).
- Descriptions are clear; verbs are in the present tense or the imperative mood.
- There are actionable steps to address the issue.
(*) Putting the source code front and center is valuable because sometimes that’s all the context a developer needs to figure out what’s wrong. Some errors are obvious at a glance, and others can be inferred just from the context in which they occur.
An error message template
Rust-style error
Characteristics
- No header, straight to the point: the error ID is visible.
- Clear title, no capital letter.
- Display of the offending source file, line number and column.
- Error message is next to the `^^^` pointers.
- Suggestions have a leading `=` to distinguish them from the message.
- Source code has more space.
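To make the characteristics above concrete, here is a minimal sketch of a renderer that lays out a diagnostic in the Rust style. The `Diagnostic` record, the `RustStyle.render` helper and the `SE001` id are hypothetical and exist only for this illustration; they are not part of scalac.

```scala
// Hypothetical types and helpers for illustration; not part of scalac.
final case class Diagnostic(
    id: String,              // compiler-specific error id, e.g. "SE001"
    title: String,           // short, lowercase title
    file: String,
    line: Int,               // 1-based line number
    column: Int,             // 1-based column where the highlight starts
    width: Int,              // number of characters to underline
    sourceLine: String,      // the offending line, verbatim
    label: String,           // message printed right after the carets
    suggestions: List[String]
)

object RustStyle {
  def render(d: Diagnostic): String = {
    val gutter = d.line.toString     // line number shown in the margin
    val pad = " " * gutter.length    // blank margin of the same width
    val carets = " " * (d.column - 1) + "^" * math.max(1, d.width)
    val header = s"error[${d.id}]: ${d.title}"
    val location = s"$pad--> ${d.file}:${d.line}:${d.column}"
    val body = List(
      s"$pad |",
      s"$gutter | ${d.sourceLine}",
      s"$pad | $carets ${d.label}",   // message sits right next to the carets
      s"$pad |"
    )
    val notes = d.suggestions.map(s => s"$pad = $s") // leading '=' marks suggestions
    (header :: location :: body ::: notes).mkString("\n")
  }
}

object RustStyleDemo {
  val example = Diagnostic(
    id = "SE001",
    title = "value lenght is not a member of String",
    file = "Main.scala",
    line = 3,
    column = 13,
    width = 6,
    sourceLine = "  val n = s.lenght",
    label = "not found",
    suggestions = List("help: did you mean `length`?")
  )

  def main(args: Array[String]): Unit = println(RustStyle.render(example))
}
```

Running `RustStyleDemo.main` prints the id in the header, a `-->` location line, the numbered source line with carets under the misspelled `lenght`, and a `=` suggestion line. Keeping the layout in one function is one way to guarantee that every message gets the same shape.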
Elm-style error
- Header with dotted line, display of offending source file in the header.
- Simple statement of what wasn’t found.
- Error message is on a new line.
Scalac style?
Dotty-style errors are a mix of the two, but they resemble Elm’s more than Rust’s. I personally have a strong preference for Rust-style errors: I find them clearer, more concise and easier to read (and grep for). I think that having no header line, using leading marker characters, showing the offending line number and column, and displaying the error message right next to the carets make them a better alternative.
I propose using Rust-style error messages in Scalac, but I’m open to changing my mind if you have good arguments for preferring Elm-style messages.
The error directory (or catalog)
Having unique identifiers for error messages is great because they become immediately searchable. Unique identifiers must be compiler-specific; that is, the ids for the same error in, say, Dotty and Scalac have to be different to avoid clashes between them. This way, users can quickly google an error id to learn more about an unclear or hard-to-address error.
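One way to keep ids compiler-specific is to give each compiler its own prefix. The sketch below assumes a two-letter prefix followed by a three-digit number (`SE001` for Scalac, plus a made-up `DE` prefix for Dotty); the `Compiler` and `ErrorId` types are hypothetical.

```scala
// Hypothetical id scheme: a per-compiler prefix plus a three-digit number.
sealed abstract class Compiler(val prefix: String)
case object Scalac extends Compiler("SE")
case object Dotty extends Compiler("DE") // made-up prefix, for illustration only

final case class ErrorId(compiler: Compiler, number: Int) {
  require(number >= 0 && number <= 999, "this sketch assumes three-digit numbers")
  override def toString: String = f"${compiler.prefix}%s$number%03d"
}

object ErrorId {
  private val Pattern = """([A-Z]{2})(\d{3})""".r
  private val known: List[Compiler] = List(Scalac, Dotty)

  /** Parses ids such as "SE001"; unknown prefixes or malformed input give None. */
  def parse(s: String): Option[ErrorId] = s match {
    case Pattern(prefix, digits) =>
      known.find(_.prefix == prefix).map(c => ErrorId(c, digits.toInt))
    case _ => None
  }
}
```

Because the prefix is part of the id, `SE042` and `DE042` never collide, even if both compilers happen to number the same error identically.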
As one would expect the ids to be used a lot, I believe it would be useful to have an official Scala error directory where error ids can be searched for (both online and offline). This is a great place to provide longer explanations, link to auxiliary information (blogs or other resources), or ramble about the nature of the error.
As part of this feature, we can provide tools to generate browsable error directories so that the same concept can be used for compiler-dependent tools too (macros and compiler plugins). Down the road, error indices could even be applied to runtime errors in applications (imagine having an error directory for bloop or sbt errors).
Note that the offline error directory may very well be the compiler itself: `scalac -explain SE001`, where `SE001` identifies a Scalac error, would print all the known information about this error message to the console (think `man` pages for compilation errors). This would help bring the Scalac online docs closer to where they are needed.
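The offline directory could be little more than a table compiled into scalac. This sketch assumes a hypothetical `ErrorCatalog` holding a single made-up entry; `explain` shows what the proposed `-explain SE001` option might print, but neither the option nor the catalog exists today.

```scala
// Hypothetical catalog backing a proposed `-explain <id>` option.
// The single entry below is made up for illustration.
final case class CatalogEntry(id: String, title: String, explanation: String, links: List[String])

object ErrorCatalog {
  private val entries: Map[String, CatalogEntry] = List(
    CatalogEntry(
      id = "SE001",
      title = "value is not a member of type",
      explanation =
        "The selected member does not exist on the static type of the expression before the dot. " +
          "Check the spelling, or whether an import that provides the member is missing.",
      links = Nil
    )
  ).map(e => e.id -> e).toMap

  /** Text that `scalac -explain <id>` could print to the console. */
  def explain(id: String): String = entries.get(id) match {
    case Some(e) =>
      val seeAlso = e.links.map(link => s"see also: $link")
      (s"${e.id}: ${e.title}" :: "" :: e.explanation :: seeAlso).mkString("\n")
    case None =>
      s"unknown error id: $id"
  }
}
```

The same entries could also be rendered into the browsable online directory, so both views stay in sync.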
Contents of the error messages
The more suggestions the compiler includes in error messages, the merrier. Error messages should strive to be clear and to the point. Users who want to know more about an error can pass `-explain` to get call-site-dependent information, or search for the id in the error directory.
Technical discussion
On range positions
Range positions are necessary to highlight the offending pieces of code. They currently require the compiler flag `-Yrangepos` because scalac uses “point” positions by default. The implementation of `-Yrangepos` is not infallible and, to the best of my knowledge, cannot be used reliably as of now; Scala’s issue tracker seems to have several bugs about it.
It’s unclear to me whether those issues are inherent to how range positions work or whether they can be fixed, but the overhead of enabling range positions seems to be quite high because of increased memory consumption (I remember some experiments by @xeno-by that showed this).
The natural route to providing range highlights would be to enable `-Yrangepos` by default in 2.13, but given its current flaws I’d like to advocate for a simpler solution. Let’s fiddle with point positions and “end characters” so that we can simulate range positions “reliably”. In practice, we would say that a highlighted region ends when a curly brace, a bracket or another special character is hit. This is the approach taken in Dotty. It would avoid the overhead of enabling range positions.
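The “end characters” idea can be sketched as a forward scan from the point position to the next delimiter. The delimiter set and the `PointToRange.approximateRange` helper are assumptions made for this illustration; this is not Dotty’s actual code.

```scala
// Hypothetical helper: approximates a range position from a point position
// by scanning forward to the next "end character".
object PointToRange {
  // Assumed set of delimiters that close a highlighted region.
  private val EndChars: Set[Char] = Set('{', '}', '(', ')', '[', ']', ',', ';', '\n')

  /** Returns (start, end) offsets into `source`, with `end` exclusive. */
  def approximateRange(source: String, point: Int): (Int, Int) = {
    require(point >= 0 && point < source.length, "point must lie inside the source")
    var end = point
    while (end < source.length && !EndChars(source(end))) end += 1
    // Trim trailing whitespace so the carets stop at the last visible character.
    while (end > point && source(end - 1).isWhitespace) end -= 1
    // Always highlight at least one character.
    (point, math.max(end, point + 1))
  }
}
```

For `foo(bar.baz, qux)` with the point at `bar`, this yields the offsets of `bar.baz`. The scan does not track nesting, so a point at `foo` in `foo(x)` only highlights `foo`; that simplification is what makes it cheap compared to full range positions.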
If this is not reliable enough, we can have the compiler parse the call site of error/warning messages with an improved version of `-Yrangepos` (instead of computing range positions for all tree nodes). To decrease the overhead even more, we could cache this operation with tree attachments.
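Computing ranges only where a diagnostic is actually reported, and remembering the result, could look roughly like this. A plain mutable map stands in for tree attachments, and the `computeRange` function stands in for the improved `-Yrangepos` parsing of the call site; both, like the `LazyRanges` class itself, are assumptions of this sketch.

```scala
import scala.collection.mutable

// Hypothetical sketch: precise ranges are computed only for positions that
// carry an error or warning, and each result is memoized.
final class LazyRanges(source: String, computeRange: (String, Int) => (Int, Int)) {
  // Stand-in for tree attachments: point offset -> computed range.
  private val cache = mutable.HashMap.empty[Int, (Int, Int)]
  private var computations = 0

  /** Range for the given point offset, computed at most once per offset. */
  def rangeAt(point: Int): (Int, Int) =
    cache.getOrElseUpdate(point, { computations += 1; computeRange(source, point) })

  /** How many times the expensive computation actually ran. */
  def computedCount: Int = computations
}
```

Reporting the same position twice (for example, an error and a related warning) then pays for the range computation only once.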
On IDE consumption
Consuming error messages is not easy for an IDE: error messages need to be parsed manually if there’s no easy access to the compiler internals. Sometimes, downstream tools need to create ad-hoc parsers to get file, line and column metadata.
To address this problem, I propose that Scalac define a schema for error messages and provide tools for downstream consumers to read them. In particular, I lean towards having the compiler emit protobuf (or, alternatively, JSON). The benefit of protobuf over JSON is that protobuf has code generators for most languages, so depending on a protobuf file from, say, a Python tool is easy.
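To give an idea of what a structured diagnostic could contain, here is a sketch that serializes one to JSON without external libraries. The field names and the `Span`, `StructuredDiagnostic` and `DiagnosticJson` names are assumptions, not an agreed schema; with protobuf, the same fields would live in a `.proto` file and the classes would be generated.

```scala
// Hypothetical diagnostic schema; field names are assumptions, not an agreed format.
final case class Span(file: String, startLine: Int, startColumn: Int, endLine: Int, endColumn: Int)

final case class StructuredDiagnostic(
    id: String,
    severity: String,
    message: String,
    span: Span,
    suggestions: List[String]
)

object DiagnosticJson {
  /** Quotes a string as a JSON string literal, escaping what JSON requires. */
  def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  // Builds a JSON object from already-encoded values.
  private def obj(fields: (String, String)*): String =
    fields.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")

  def encode(d: StructuredDiagnostic): String = {
    val p = d.span
    val span = obj(
      "file" -> quote(p.file),
      "startLine" -> p.startLine.toString,
      "startColumn" -> p.startColumn.toString,
      "endLine" -> p.endLine.toString,
      "endColumn" -> p.endColumn.toString
    )
    obj(
      "id" -> quote(d.id),
      "severity" -> quote(d.severity),
      "message" -> quote(d.message),
      "span" -> span,
      "suggestions" -> d.suggestions.map(quote).mkString("[", ",", "]")
    )
  }
}
```

With file, line and column as explicit fields, an IDE no longer needs an ad-hoc parser to recover them from the rendered text.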
On how to model errors internally
I think Scalac will need a core `Message` abstraction similar to the one Dotty has. Have a look at the linked PR to get a feeling for how messages are defined.
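A minimal sketch of such an abstraction, loosely inspired by Dotty’s messages: each error is a class carrying an id, a short `msg`, a `kind` and a longer `explanation`. The exact member names, the `SE` id format and the `NotAMember` example are assumptions for illustration, not Scalac or Dotty API.

```scala
// Hypothetical sketch of a Message abstraction; member names are assumptions.
abstract class Message(val errorId: Int) {
  /** Short, actionable text shown next to the offending code. */
  def msg: String
  /** Category of the error, e.g. "member not found". */
  def kind: String
  /** Longer text for `-explain` and the error directory. */
  def explanation: String

  /** Compiler-specific id, e.g. "SE001". */
  final def id: String = f"SE$errorId%03d"
}

/** One concrete message: adding an error means adding one such class. */
final class NotAMember(member: String, owner: String, suggestion: Option[String])
    extends Message(errorId = 1) {
  def kind = "member not found"
  def msg: String =
    s"value $member is not a member of $owner" + suggestion.fold("")(s => s"; did you mean $s?")
  def explanation: String =
    s"`$owner` has no member named `$member`. Check the spelling, or whether an import " +
      "that provides the member is missing."
}
```

With this shape, adding or changing an error touches a single class, which fits the one-message-per-PR workflow proposed above.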
I want to help!
I cannot imagine shipping this task without the help of our Community. We need not only feedback on the best message format and on what you find annoying about our current error messages, but also help improving the error messages themselves. Dotty has shown that this is possible.
When this discussion is more developed, I’d like to put up a list of tasks that contributors could help us with. In the meantime, please do contribute to this discussion and tell us what you think.
I cannot wait for better error messages in Scalac!