Towards better error messages in Scalac

jvican · January 14, 2018, 8:52pm

In this topic I’d like to discuss concrete ideas to get better error message reporting into Scalac. @SethTisue has told me the Scala team at Lightbend is looking forward to getting this contribution from the Community, and so I’d like to gather some feedback on this front.

I believe there is consensus at both the Scala Center and Lightbend about the importance of this feature. For example, @adriaanm has briefly talked about it as one of the goals for Scala 2.13. As this topic has been on my mind a lot recently, I’d like to start this discussion in preparation for the coming 2.13 release. I would personally love to see good error messages in the first 2.13.0 artifact.

Next, I dive a little deeper into this topic, quickly discuss why error reporting is important and propose a plan to move this feature forward with the help of the community. I also give my personal take on what makes an error message good, and what future error messages could look like.

Motivation

Improving the ergonomics of the language is important for two reasons: first, it makes our current userbase happier; second, it increases language adoption. In the case of Scala, good ergonomics fight the perception that Scala is an academic language with a poor developer experience (many people have this impression even though it’s not true). There’s no better way of selling the catchphrase “the compiler helps you” than making the compiler actually help you with nice, actionable error messages.

In my opinion, good error messages are especially useful for folks unfamiliar with statically compiled languages or developers that have had traumatic experiences with languages like C or C++, languages traditionally known for confusing error messages (maybe now the situation is a little bit better?).

Making improvements in this area has a huge impact on our ecosystem and the future of the language. Other languages have already taken the lead and implemented better error reporting with success. The closest example is Dotty (/cc @felixmulder), which got nice error messages through a community-driven effort, inspired mainly by languages like Rust and Elm. Check the links if you want the background and the approach taken in those languages.

I’ve only seen good comments on nice error messages in languages like Elm and Rust. The most recent example has been “Rust in 2018: it’s way easier to use!”, where a developer praises the new usability of the language. I won’t bother to link to more articles, but if you’re interested in the feedback received in those communities I encourage you to Google around and check Hacker News comments.

Current tools

There are several tools that aim to improve error reporting in Scalac: scala-clippy, imclipitly and sbt-errors-summary, which was previously discussed in Improving the compilation error reporting of sbt.

All of these tools require two things: knowing about them (the hard part) and a manual installation process. I believe these tools serve as an inspiration, but we should definitely focus on making the changes in the Scalac compiler so that they are available to everyone, regardless of the tool or knowledge they have.

A plan

Scala 2.13 is “around the corner”; the release candidate is scheduled for April 27th. As this is the kind of initiative that takes a lot of time to be merged (too many details to agree on) and requires coordination, I believe it’s better that we start planning for it as soon as possible.

I think the best way to move this forward is for someone to work on the compiler infrastructure, and then for other contributors to improve the actual error messages (one per PR). The infrastructure work would mainly focus on adding the data structures and architectural changes required to make adding, removing or modifying error messages easy. It would probably be useful to have some short guidelines about what makes a good error message, so that future error messages follow them.

What makes a good error?

A good error message has IMO the following properties:

A unique identifier.
The source code is the main part in a message (*).
The content is short (the less unnecessary cruft, the better).
Descriptions are clear, verbs are in the present or imperative tense.
There are actionable steps to address the issue.

(*) Putting the source code front and center is valuable because sometimes that’s all the context a developer needs to figure out what’s wrong. Some errors may be obvious and visible at a glance, and others can be inferred just from the context they happen at.

An error message template

Rust-style error

[Image: example of a Rust-style error message]

Characteristics

No header, to the point: error ID is visible.
Clear title, no capital letter.
Display of offending source file, line number and column.
Error message is next to the ^^^ pointers.
Suggestions have a leading = to distinguish them from the message.
Source code has more space.

Elm-style error

[Image: example of an Elm-style error message (“naming”)]

Header with dotted line, display of offending source file in the header.
Simple statement of what wasn’t found.
Error message is on a new line.

Scalac style?

Dotty-style errors are a mix between the two, but they resemble Elm’s more than Rust’s. I personally have a strong preference for Rust-style errors: I find them clearer, more concise and easier to read (and grep for). I think that having no header line, using leading characters, showing the offending line number and column, and displaying the error message right next to the carets make them a better alternative.

I propose to use Rust-style error messages in Scalac, but I’m open to changing my mind if you have good arguments for preferring Elm-style messages.
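To make the proposed layout concrete, here is a minimal sketch of a renderer that follows the characteristics listed for Rust-style errors (visible id, lower-case title, file:line:column, message next to the carets, suggestions with a leading =). Everything in it is an assumption for illustration: the `Diagnostic` fields, the `RustStyle` object and the `SC0042` id scheme do not exist in Scalac.

```scala
object RustStyle {
  // Hypothetical model of one reported error; not a scalac data structure.
  final case class Diagnostic(
      id: String,          // made-up id scheme, e.g. "SC0042"
      title: String,       // short title, no capital letter
      file: String,
      line: Int,           // 1-based line number
      column: Int,         // 1-based column where the highlight starts
      length: Int,         // number of characters to underline
      sourceLine: String,  // text of the offending line
      label: String,       // message printed next to the carets
      suggestions: List[String]
  )

  def render(d: Diagnostic): String = {
    // The gutter is as wide as the line number so the bars line up.
    val gutter = " " * d.line.toString.length
    val underline = " " * (d.column - 1) + "^" * math.max(1, d.length)
    val body = List(
      s"error[${d.id}]: ${d.title}",
      s"$gutter--> ${d.file}:${d.line}:${d.column}",
      s"$gutter |",
      s"${d.line} | ${d.sourceLine}",
      s"$gutter | $underline ${d.label}"
    )
    // Suggestions get a leading '=' to set them apart from the message.
    val notes = d.suggestions.map(s => s"$gutter = $s")
    (body ++ notes).mkString("\n")
  }
}
```

The renderer only needs a point position plus a length, so it does not by itself settle how the highlighted region is computed.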

The error directory (or catalog)

Having unique identifiers for error messages is great because they become immediately searchable. Unique identifiers must be compiler specific; that is, the ids for the same error in, say, Dotty and Scalac have to be different so that they don’t interfere with each other. This way, users can quickly google an error id to learn more about an unclear or difficult-to-address error.

As one would expect the ids to be used a lot, I believe it would be useful to have an official Scala error directory where error ids can be searched for (both online and offline). This is a great place to provide longer explanations, link to auxiliary information (blogs or other resources), or ramble about the nature of the error.

As part of this feature, we can provide tools to generate browsable error directories so that the same concept can be used for compiler-dependent tools too (macros and compiler plugins). Down the road, error indices could even be applied to runtime errors in applications (imagine having an error directory for bloop or sbt errors).

Note that the offline error directory may very well be the compiler itself. scalac -explain SE001, where SE001 stands for a Scalac error, would print all the known information about this error message to the console (think manpages for compilation errors). This would help bring the scalac online docs closer to where they are needed.
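As a rough sketch of the offline lookup, assuming the catalog is just a map from id to entry: the `ErrorCatalog` object, the `CatalogEntry` fields and the text stored under `SE001` are all made up here, and `-explain` is the flag proposed above, not an existing scalac option.

```scala
object ErrorCatalog {
  // Hypothetical catalog entry; a real one could also carry links and examples.
  final case class CatalogEntry(id: String, title: String, explanation: String)

  // Illustrative content only; no such ids are defined by scalac.
  private val entries: Map[String, CatalogEntry] = List(
    CatalogEntry(
      "SE001",
      "type mismatch",
      "The expression has a type that does not conform to the expected type."
    )
  ).map(e => e.id -> e).toMap

  // What an `-explain <id>` style lookup could print to the console.
  def explain(id: String): String =
    entries.get(id.toUpperCase) match {
      case Some(e) => s"${e.id}: ${e.title}\n\n${e.explanation}"
      case None    => s"No entry for error id '$id'."
    }
}
```

The same map could back a generated, browsable web page, which is what would keep the online and offline directories in sync.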

Contents of the error messages

The more compiler suggestions error messages have, the merrier. Error messages should strive to be clear and to the point. Those users wanting to know more about errors can do -explain to get call-site dependent information, or search for the id in the error directory.

Technical discussion

On range positions

Range positions are necessary to highlight the offending pieces of code. They currently require the compiler flag -Yrangepos because scalac uses “point” positions by default. The implementation of -Yrangepos is not infallible and, to the best of my knowledge, cannot be used reliably as of now; Scala’s issue tracker seems to have some bugs about it.

It’s unclear to me whether those issues are fundamental to the way they work or they can be fixed, but the overhead of enabling range positions seems to be quite high because of increased memory consumption (I remember some experiments made by @xeno-by proved this claim).

The natural route to provide the range highlights would be to enable -Yrangepos by default in 2.13, but given its current flaws I’d like to advocate for a simpler solution. Let’s fiddle with point positions and “end characters” so that we can simulate range positions “reliably”. In practice, we would say that a highlighting region ends when a curly brace, a bracket or another special character is hit. This is the approach taken in Dotty. This would remove the overhead of enabling range positions.
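A minimal sketch of that heuristic, assuming we only have the source line and the point offset. The `PointRange` object and the particular set of end characters are assumptions for illustration; this is not how Dotty or Scalac implement it.

```scala
object PointRange {
  // Assumed set of "end characters"; a real list would need tuning.
  private val endChars: Set[Char] =
    Set('{', '}', '(', ')', '[', ']', ',', ';', ' ', '\t')

  /** Returns (start, end) offsets, end exclusive, for a highlight starting at `point`. */
  def highlight(line: String, point: Int): (Int, Int) = {
    require(point >= 0 && point < line.length, "point must be inside the line")
    // Start scanning after the point so at least one character is highlighted.
    val stop = line.indexWhere(c => endChars.contains(c), point + 1)
    val end = if (stop == -1) line.length else stop
    (point, end)
  }

  /** Builds the caret line to print under the source line. */
  def carets(line: String, point: Int): String = {
    val (start, end) = highlight(line, point)
    " " * start + "^" * (end - start)
  }
}
```

For the line `foo(bar, baz)` and a point at offset 4, the highlight covers `bar` and stops at the comma. The heuristic highlights a single token-like region, so an error that spans a whole expression would still be underlined only at its start.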

If this is not reliable enough, we can have the compiler parse the call-site of error/warning messages with an improved version of -Yrangepos (instead of getting the range positions for all tree nodes). To decrease the overhead even more, we could cache this operation with tree attachments.

On IDE consumption

Consuming error messages is not easy for an IDE: error messages need to be parsed manually if there’s no easy access to the compiler internals. Sometimes, downstream tools need to create ad-hoc parsers to get file, line and column metadata.

To address this problem, I propose that Scalac defines a schema for error messages and provides downstream tools with the means to read them. In particular, I lean towards having the compiler emit protobuf (or, alternatively, json). The benefit of protobuf over json is that protobuf has code generators for most languages, so depending on a protobuf file in, say, a python tool is easy.
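To illustrate what a tool could receive instead of a string to parse, here is a small sketch of a structured error record with a hand-written JSON encoder. The field names are assumptions, not an agreed schema, and JSON is used only because it needs no extra library; a protobuf definition would carry the same fields.

```scala
object DiagnosticSchema {
  // Hypothetical schema: every field name here is illustrative.
  final case class Position(file: String, line: Int, column: Int)
  final case class Diagnostic(id: String, severity: String, position: Position, message: String)

  // Minimal JSON string escaping (quotes, backslashes, newline, tab).
  private def quote(s: String): String = {
    val escaped = s.flatMap {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case '\t' => "\\t"
      case c    => c.toString
    }
    "\"" + escaped + "\""
  }

  // Metadata fields are kept separate from the human-readable message.
  def toJson(d: Diagnostic): String = {
    val fields = List(
      "id" -> quote(d.id),
      "severity" -> quote(d.severity),
      "file" -> quote(d.position.file),
      "line" -> d.position.line.toString,
      "column" -> d.position.column.toString,
      "message" -> quote(d.message)
    )
    fields.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")
  }
}
```

With file, line and column as separate fields, an IDE or editor plugin never has to recover them from the rendered text.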

On how to model errors internally

I think Scalac will need a core abstraction Message similar to the one that Dotty has. Have a look at the linked PR to get a feeling of how messages are defined.

I want to help!

I cannot imagine shipping this task without the help of our Community. We don’t only need feedback on the best message format or on which things you find annoying about our current error messages; we also need help improving the error messages themselves. Dotty has shown that this is possible.

When this discussion is more developed, I’d like to put up a list of tasks that contributors could help us with. In the meanwhile, please do contribute to this discussion and tell us what you think.

I cannot wait for better error messages in Scalac.

soronpo · January 14, 2018, 9:34pm

In addition, I would love a clickable link to search for the message online, as I mentioned before on Compiler warnings/errors with http document pointers to help resolve them

I’m not entirely sure that’s good. Dotty and Scala have common ground. In the future they are supposed to be merged into one. Why should the same error messages that exist in both have a different ID?

Possibly relevant issues:

Local warning suppression. It would be great if we can properly reference a warning ID and suppress it for a specific scope/line.
Somehow provide better implicitNotFound error cascading messages: Allow cascading custom error messages with `@implicitNotFound`

jvican · January 14, 2018, 9:46pm

Because errors are compiler specific. When I look for an error ID, I only care about that concrete error for my compiler. I would consider it an annoyance to see results for the same error in Dotty, because there’s no guarantee the error is the same.

There’s no plan to merge Scalac and Dotty because they can’t be merged; they are different compilers. I guess you meant that at some point Scalac users will transition to Dotty. When that’s the case, having compiler independent error ids will be important.

Consider what would happen if I’m a beginner, I google around the number and I start visiting SO answers about an error id that actually comes from Scalac (but which I don’t discover until hours after).

My point: the name of the compiler should be in some way in the error id so that I can distinguish where a certain error comes from at first glance.

jvican · January 14, 2018, 9:55pm

In addition, I would love a clickable link to search for the message online, as I mentioned before on Compiler warnings/errors with http document pointers to help resolve them

I’m not sure this is a good idea. I’m afraid of having external links in an error that can die. I would even be wary of linking to the Scala docs website from it. I think having an error catalog and forcing you to search for the id is a better solution, with a lower maintenance cost and lower risk.

Absolutely I’d like the new architecture to allow users to filter warnings.

Any concrete suggestions? I believe this is an area where we can innovate compared to other languages: implicits need special suggestions and tooling support to be nicer to use.

soronpo · January 14, 2018, 10:08pm

I still think it is good to have some common ground. For instance, let’s assume type mismatch is numbered 342. Both Dotty and Scalac have this very same error. I think it is good in this case that one will be ID-ed as DT342 and the other as SC342.

jvican · January 14, 2018, 10:11pm

Can you describe a scenario where that common ground would be useful? I’m not convinced: I think it introduces an unnecessary coordination problem where free will would do just as well.

soronpo · January 14, 2018, 10:14pm

Every time the solution is the same.

In any case, if we want the IDs to be different, then the errors must have unique prefix/suffix to make sure two IDs between the compilers never match.

nafg · January 14, 2018, 10:15pm

tek/splain is a compiler plugin that makes implicit errors a lot more useful

jvican · January 14, 2018, 10:17pm

Not necessarily, error messages are compiler specific and you could see them as implementation details leaking. You cannot assume the solution to the error is the same. I echo my previous concerns.

Forgot to add it to the “Current tools” section, thanks for the reference.

mghildiy · January 15, 2018, 3:17am

Does this mean that error reporting in Scalac is inferior to that of
the other languages mentioned here?
I started learning Scala only very recently, and so far I haven’t faced
anything that made it difficult for me to draw the correct inferences.
Any link/documentation/blog on this topic?

psp · January 15, 2018, 5:21am

There are at least two distinct tasks and you would do very well to decouple them. The sum of the complexities of the separate tasks is much lower than the complexity of the combined task. Separating the tasks also enables a world of uses which will be impossible if they are lumped together.

Those tasks are:

The design of an Error ADT
An interpreter which implements Error => String

We spend half our lives trying to unfry the egg (turn the message string back into its actual meaning) or to refry the egg (filter or transform the message string into a different string suitable for the context) and it’s all very pointless. The error message is just a view of the ADT. Let people fry their own egg. Don’t let the map become the territory.
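A minimal sketch of that split, assuming a tiny made-up error ADT: the cases, their fields and the two interpreters below are hypothetical and only show that the message string is one of several possible views of the same value.

```scala
object ErrorAdt {
  // Hypothetical ADT: the cases and fields are illustrative only.
  sealed trait CompilerError
  final case class TypeMismatch(expected: String, found: String) extends CompilerError
  final case class NotFound(name: String, candidates: List[String]) extends CompilerError
  final case class MissingImplicit(tpe: String, chain: List[String]) extends CompilerError

  // An interpreter is any function CompilerError => String.
  type Interpreter = CompilerError => String

  // Stock one-line rendering.
  val terse: Interpreter = {
    case TypeMismatch(e, f)    => s"type mismatch: expected $e, found $f"
    case NotFound(n, _)        => s"not found: $n"
    case MissingImplicit(t, _) => s"no implicit value of type $t"
  }

  // Alternative rendering that uses the extra data carried by the ADT.
  val verbose: Interpreter = {
    case NotFound(n, cs) if cs.nonEmpty =>
      val hint = cs.mkString(", ")
      s"not found: $n (did you mean $hint?)"
    case MissingImplicit(t, chain) if chain.nonEmpty =>
      s"no implicit value of type $t\n" + chain.map("  required by " + _).mkString("\n")
    case other => terse(other)
  }
}
```

An IDE, a build tool or a plugin could supply its own `Interpreter` without ever parsing the output of the stock one.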

nafg · January 15, 2018, 6:03am

+1 for decoupling them

soronpo · January 15, 2018, 8:25am

Paul, that’s very interesting. Do you have an example of any language and compiler that do that?

dragos · January 15, 2018, 9:54am

It’s a great initiative. I really like the idea of having unique IDs, especially for warnings. This would allow warning levels and individual on/off switches. For an example of how to document error messages, have a look at the C# compiler error reference. They’ve been doing it for ages for all their compilers.

Regarding range positions, I think it’s a distraction. In terms of real value for users they add very little. Once you see the point where the error starts, knowing where it ends won’t make you suddenly understand the error you had no idea about. Given their complexity and memory penalty I think they’re not worth pursuing in this context. The place where they might help is IDEs, but highlighting the current token is usually good enough.

I have some concerns about a hard schema for error messages: it might be too painful to evolve. You’d probably do that by stuffing additional information in a “details” string field, leading back to clients parsing that string (or break the schema and clients built on top of it). On the flip side, the compiler is mature enough to have a relatively stable error message structure by now, so who knows.

However, I think syntactic improvements in how errors are presented miss the biggest problem with errors: those that exhibit “spooky action at a distance”, such as a top-level implicit not found due to a failure deep inside the implicit search tree. You’d actually want to see the innermost error (or errors), but you’re presented with something like this:

Cannot materialize pickler for non-case class: List[model.Command]. If this is a collection, the error can refer to the class inside.
[error]     AutowireClient[Api].exec(cmds).call().toRx.map {
[error]                                        ^
[error] one error found

So, besides better syntax, I think semantic improvements (provide more meaningful context) will be the real win. There’s an entire PhD thesis on that subject, though the overhead of the additional tracking made it prohibitive to include in regular Scala. Maybe a simplified version of -Xlog-implicits trace limited to the current error would be a good start.

jvican · January 15, 2018, 10:25am

I agree with you, these tasks are better split up. That’s what I meant with:

I think the best way to move this forward is that someone works on the compiler infrastructure, and then other contributors improve actual error messages (one per PR).

The idea is that contributors work on the Error => String part while one developer focuses on the ADT design. I guess that your comment emphasizes the need for pluggable message interpreters rather than only having the stock one, am I right? (specifically for different uses like IDE consumption, I suppose)

I think protobuf would make it easier to evolve — not sure how painful that would be but in Zinc we use protobuf for essentially the same task and we find it pretty stable. It’s a part of the schema that hasn’t been changed in ages too.

The error schema should decouple the actual details (as you mentioned) and all the metadata, so that clients can reconstruct messages from it. But I strongly encourage clients not to parse the details: they should just show the details as they are.

I agree. It’s already been linked before, but tek/splain (better implicit errors for Scala) shows implicit resolution chains when an implicit fails. I’ve never used it myself, but I see the value of doing so and of doing some work to make the errors easier to follow.

I think it would be cool to also suggest which imports are missing for the use of an extension method. Imagine I use the extension method map but it’s not in scope because I forgot to import it. I’d like the compiler to suggest the imports I need to add to my file for the code to compile (technically, this may prove quite challenging because it requires the compiler to know where all these extension methods are and what their signatures are).

Yes, Rust has many error messages that benefit from these semantic improvements. I personally don’t mind the syntax, and that’s something that can be worked on after the core reporting abstractions are redesigned.

fanf · January 15, 2018, 11:35am

I think @psp’s idea is extremely important. It would make it possible to let people choose the verbosity of error messages with a flag, and to experiment with several renderings (and use a plugin that better fits their personal taste).

You can even think of one more level of indirection, so that there is a user-actionable Error ADT => Error ADT step between the compiler’s error analysis and its internal error management. It would finally make it possible to selectively ignore warning messages such as deprecation ones ( https://github.com/scala/bug/issues/7934 ), or even to selectively change the error level of a class of errors (“I want non-exhaustive pattern matches to ALWAYS be errors, because why the hell aren’t they?”)
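A small sketch of such an Error ADT => Error ADT step, under the assumption that each report carries an id, a category and a severity. The `WarningPolicy` object, the `Report` fields and the category names are invented for illustration and do not correspond to anything in Scalac.

```scala
object WarningPolicy {
  sealed trait Level
  case object WarningLevel extends Level
  case object ErrorLevel extends Level

  // `category` is a made-up grouping such as "deprecation".
  final case class Report(id: String, category: String, level: Level, message: String)

  // A user-supplied step: None drops the report, Some keeps it (possibly changed).
  type Rewrite = Report => Option[Report]

  // Drop warnings of one category, e.g. deprecations.
  def silence(category: String): Rewrite =
    r => if (r.category == category && r.level == WarningLevel) None else Some(r)

  // Turn every report of one category into an error.
  def promote(category: String): Rewrite =
    r => Some(if (r.category == category) r.copy(level = ErrorLevel) else r)

  // Apply rewrites left to right; a dropped report stays dropped.
  def applyAll(rewrites: List[Rewrite])(reports: List[Report]): List[Report] =
    reports.flatMap { r =>
      rewrites.foldLeft(Option(r))((acc, f) => acc.flatMap(f))
    }
}
```

For example, `applyAll(List(silence("deprecation"), promote("non-exhaustive-match")))` would drop deprecation warnings and report non-exhaustive matches as errors, all before any message string is rendered.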

jvican · January 15, 2018, 11:55am

I’d like to add a few questions to help Scala Contributors give us feedback. Without Community feedback, we only have partial ideas and not actionable items.

What are the improvements in error reporting that you can imagine?
Can you name a few of them and tell us how the compiler would suggest you solutions?
What is a good error message for you?

I want this discussion to shed some light on the best way to see this initiative through, and how this change would be welcome by every single developer in our Community. After that, we can create a plan.

fanf · January 15, 2018, 12:23pm

Your propositions are, for me, very good. I also prefer the Rust style to the Elm style, but I don’t think I’m sure why.
There were some nice discussions on Twitter (at least) when @felixmulder talked about the subject for Dotty around the end of 2016. See for example https://twitter.com/FelixMulder/status/776828995232989184 (but there were a lot more, IIRC). Perhaps he could bring some insights about what he gathered, too?

For me, one important point for error reporting is being able to quickly see the difference between the expected thing and the actual one. In fact, whatever helps me diff at a glance.

Typically, when using a type-intensive lib (Shapeless, Freek, etc.), you don’t care about the 15 identical types; you want to see the actual difference between what was expected and what was given. @psp had a lot of tweets / resources on the subject, but unfortunately they got deleted.
For that, it is also generally preferable to keep type aliases rather than fully resolved types in the summary (and only show the fully resolved types in a detailed message).
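A toy sketch of such a diff, assuming types are represented as a name plus type arguments. The `Tpe` class, the `...` placeholder and the output format are invented here; compiler types are far richer than this.

```scala
object TypeDiff {
  // Hypothetical, simplified type representation: a name and its type arguments.
  final case class Tpe(name: String, args: List[Tpe] = Nil) {
    override def toString: String =
      if (args.isEmpty) name else name + args.mkString("[", ", ", "]")
  }

  /** Renders only what differs; identical subtrees collapse to "...". */
  def diff(expected: Tpe, found: Tpe): String =
    if (expected == found) "..."
    else if (expected.name == found.name && expected.args.length == found.args.length) {
      // Same constructor: recurse so only the differing arguments are spelled out.
      val inner = expected.args.zip(found.args).map { case (e, f) => diff(e, f) }
      expected.name + inner.mkString("[", ", ", "]")
    } else s"<expected $expected, found $found>"
}
```

Comparing `Map[String, List[Int]]` with `Map[String, List[Long]]` yields `Map[..., List[<expected Int, found Long>]]`, so the identical `String` argument disappears from the summary.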

Another example is when you have a case class and you missed a parameter: the compiler should be able to point to the missing one, something like (not the actual syntax/presentation I would like :)):

foo(a,    c) 
Error: missing b?

fommil · January 15, 2018, 1:14pm

Whatever you do, please do not break the parsable format that sbt already uses or you will break Emacs and I do not have the bandwidth to fix that.

Also, bear in mind colour-blind and blind developers. Do not rely on colour to replace text content. You may want a trial group (btw these are the sorts of things that existing systems have spent years optimising, and a rewrite / redesign will need to do the same again).

I am all for better messages, especially for implicits, but I see no reason to change the layout by default. For opt in, do whatever you want.

jvican · January 15, 2018, 1:56pm

The error format will be broken. The point is to have this by default for every Scala developer, not to force them to remember a flag they can use to get more readable errors.

When you parse error messages, you’re relying on an implementation detail of the compiler; you cannot expect the format the compiler uses to stay the same forever. It’s like parsing the output of scalac -Y to know which flags the compiler supports. It probably works short-term, but it’s a terrible idea.

As I’ve said before, the idea is to provide cross-language readers/writers via schema files or protos. You could use brown/protobuf (a Common Lisp implementation of Google's protocol buffers) for a fast migration, or ask other Emacs users to do it. We’re talking long-term: there’s still time before 2.13 is released, and remember that making sure external tools can consume error messages is part of the plan.