Revisiting Dotty Diagnostics for tooling

I’d like to revisit some of the previous discussions and movement around the idea of “structured diagnostics” in Dotty. While the situation of reporting diagnostics to the end user to view has greatly improved in Scala 3 compared to Scala 2, the format that they are actually returned in has pretty much remained unchanged. Meaning our tools are left parsing error strings to get the information the compiler has and has reported, but may be needed for further action.

For example, in Scala 2 while using Metals you can see how certain code actions are determined by regex matching on the actual diagnostic message. This has remained a common approach in various tools dealing with compiler errors. I think the hope was that in Scala 3 this situation would be improved and instead we could rely on something more stable for error messages such as an error index like Rust has or something with a more structured output to more easily tell the type of diagnostic being returned (not referring to INFO, WARNING, ERROR, but type of ERROR).

Many times the compiler knows the action to take, like in the case here:

[error] -- Error: /Users/ckipp/Documents/scala-workspace/sanity/Sanity/src/example/Hello.scala:18:7
[error] 18 |object B extends A
[error]    |       ^
[error]    |object creation impossible, since:
[error]    |it has 2 unimplemented members.
[error]    |/** As seen from module class B$, the missing signatures are as follows.
[error]    | *  For convenience, these are usable as stub implementations.
[error]    | */
[error]    |  def a: Int = ???
[error]    |  def b: Int = ???

The question then becomes, how does an editor use this information to provide an action to the user which will then implement a and b? Parse the error to get the recommendations? Compute it themselves using some other method after detecting this error exists? Instead of just returning the message, position, and level, what else could be returned here to help tooling in this situation?

A while back we tried implementing a code action in Metals to create code actions from dotty suggestion but there was a comment by @smarter here that said:

The exact format of the compiler message strings is not stable and may change at any time, please do not rely on it. Instead, I encoruage you to contribute upstream a way to expose this information semantically in our Reporting API (we also implement scalafix-like rewrites which could be displayed as quick fix actions in vscode).

So based off this comment, what could this look like to expose this information semantically to the reporting API. Do people have ideas of structure? Concerns of this breaking existing tooling, etc? I’d like to start a discussion around this to see how the situation could be improved. I know this has been heavily discussed in the past, mainly with the focus of showing those diagnostics to the users, but I’m more concerned about tools being able to get useful information out of them.

23 Likes

The difficulty is probably the way sbt (and other tools) currently hook into the reporter interface, changes there might be not possible. But I could imagine an extension to BSP or LSP so that Metals can go back and ask for more details about the messages that it received, maybe by position. I don’t know how the different layers talk to each other. What’s the path from report.error (which takes a Message, so has at least some structured representation) in the compiler to what Metals receives? Could the build server store the structured messages and respond to such queries, or enrich the messages in the first place?

It would be awesome to at least get the error code and have list of all possible error codes that we can use. LSP and BSP already have a code field that is not being used at all currently.

Knowing the code we suggest some actions and for anything that requires more information we might use the presentation compiler.

Even better solution would be to have a TextEdit attached to a diagnostic with a ready quick fix, but this would require much more effort in tooling, protocols (LSP and BSP don’t support it). I think having an additional error code would be a good middle ground.

1 Like

So this was an interesting exercise for me, so I’ll post the full chain down below in case others are curious about the details as well. If I got it right, the full chain looks something like this:

  1. We’ll start with the report.error like you mention which can take a Message and a position. That message does actually have a ErrorMessageId, but more on that later
  2. The Message above is part of the Diagnostic that Dotty puts together. That diagnostic message method is what ends up many times being the actually message that is carried on to what the user eventually sees.
  3. For any tooling utilizing Zinc, this Diagnostic will end up being translated to a Problem. You can see these xsbti.Problem in sbt, Mill, and Bloop for example if you poke around, many times with a wrapper specific to that tool. We’ll focus on the Bloop path moving forward.
  4. Once Bloop has this Problem it wraps it in CompilationEvent.Diagnostic. This is what then actually is used to create a BSP Diagnostic that is used to forward a build/publishDiagnostics notification to Metals.
  5. Once Metals gets those diagnostics it basically maps them right to an LSP Diagnostic to which it sends to the client to display.

With all the above in mind, it seems that much of the diagnostic structure that is passed around is based on the Problem which is quite limited and the Dotty implementation of this you can fine here in sbt-bridge and for brevity I’ll just include the fields below:

  private final Position _position;
  private final String _message;
  private final Severity _severity;
  private final Optional<String> _rendered;

The next most relevant structure that we have to work with at the moment is the BSP Diagnostic. Again, for brevity below are the fields:

  private Range range;  
  private DiagnosticSeverity severity;  
  private String code;  
  private String source;  
  private String message;
  private DiagnosticRelatedInformation relatedInformation;

Here we have the addition of source and relatedInformation, neither which are used.

Agreed, I think this would be the most minimal thing that we could do to be able to more reliable tell the type of diagnostic we are dealing with in tooling. A couple observations on this:

  • There seems to at least a handful of errors on first glance in Dotty that don’t have an ErrorMessageId. For example the diagnostic I used to illustrate this from the beginning about object creation impossible has no ID. There is never an actual Message created for this. I’m assuming that we’d have to go through and ensure all of the errors actually get a Diagnostic created for them, which means a Messages, which then means it need an ErrorMessageId.
  • While we have these IDs they aren’t listed anywhere and not really used as a lookup of any type. This may need to be restructured a bit or at least published somewhere so people don’t need to dig into the code to find these out.

Could you expand more on why you think this might not be possible? Is it mainly compat concerns? I think it’s actually important that these tools do take into account some of these changes because many times they are the ones forwarding that info on. So at the very least, changes will have to be made to at least account for the error code, unless we want to parse it out of a string, but then we start losing out on the benefits anyways. Maybe there can be a period where we have legacy mode for reporting or something to give time for tooling to take into account the new structures.

The more I dig into this the more I realize how lacking the structure is for our Diagnostics compared to other languages that we’ve been using as an example. While adding the error code would be a great start, I still think we’re just scratching the surface at what could be reported. For example, take a look through what rustc returns just in the DiagnosticSpan alone.

struct DiagnosticSpan {
    file_name: String,
    byte_start: u32,
    byte_end: u32,
    /// 1-based.
    line_start: usize,
    line_end: usize,
    /// 1-based, character offset.
    column_start: usize,
    column_end: usize,
    /// Is this a "primary" span -- meaning the point, or one of the points,
    /// where the error occurred?
    is_primary: bool,
    /// Source text from the start of line_start to the end of line_end.
    text: Vec<DiagnosticSpanLine>,
    /// Label that should be placed at this location (if any)
    label: Option<String>,
    /// If we are suggesting a replacement, this will contain text
    /// that should be sliced in atop this span.
    suggested_replacement: Option<String>,
    /// If the suggestion is approximate
    suggestion_applicability: Option<Applicability>,
    /// Macro invocations that created the code at this span, if any.
    expansion: Option<Box<DiagnosticSpanMacroExpansion>>,
}

Things like suggested_replacement and suggestion_applicability seem incredibly helpful for tools to be able to easily get what they need to fix the issue and then know if they can reliably apply it or not.

Adding the error code is a great start, but if we are considering breaking things, we might as well do it in a way that truly enhances the diagnostics. Again, given the diagnostic in the original post about impossible object creation, it’d be great to have a suggestedFix field or something that would just directly give you:

def a: Int = ???
def b: Int = ???

Then just like rust, this suggestedFix would have to have some sort of applicability enum or something saying that this has placeholders. All that to say, I’d hate for us to go through a ton of work that will need changes in Dotty, Zinc, and everything that uses it only to add an error code when there could probably be much more that we could do.

In order to at least have a bit of tangible action points to start out with. Here is what I’d propose:

  1. Is it true that all errors should have a DiagnosticId? If so, a good first step would be to go through and tackle these. Give IDs to those that don’t have them, form a Diagnostic for them, and follow through on some of the work proposed in Error-specific Diagnostics subclasses by samuelgruetter · Pull Request #92 · lampepfl/dotty · GitHub to ensure that we are always reporting a Diagnostic.
  2. Related to the above, we should get the DiagnosticIds documented and published somewhere.
  3. Start trying to get some consensus on how the shape of what is reported can change. Already the Message has a DiagnosticId, so if we create Messages for all the errors we’ll have that covered internally, but then what about accounting for this in Problem, which many tools interact with. Maybe someone from the sbt/zinc side should chime in on this.

We can have nice things, let’s move on from just relying on msg, srcPos, and severity.

10 Likes

Sounds like a plan. Relatedly, we’d really benefit from being able to represent what the LSP calls relatedInformation inside the compiler, we already have two situations in which we report a stack of positions, each using its own logic, and none of which maps into something the IDE understands:

4 Likes

Also related (or at least worth keeping in mind): `sourcePositionMapper` of sbt doesn't work with `DelegatingReporter` of Dotty · Issue #14691 · lampepfl/dotty · GitHub / The compiler interface should expose sourcePositionMappers so that Scala 3 can render messages with the proper positions · Issue #1074 · sbt/zinc · GitHub

2 Likes

On that subject, here’s what had to done last time I wanted to enrich sbt reporting (not even counting various backports/forward-ports between branches):

Easy, right? :sweat_smile:

6 Likes

Also regarding errors, it’s worth thinking about an ability to hook your own reporter somehow and replace some of the default error messages with better errors that are specific to a given library. That was somewhat possible with scala-clippy, but no longer is under Scala 3.

Ahh, yea this is another great point, thanks for bringing it up! I actually am curious how editors handle DiagnosticRelatedInformation. I’ll dig into that to get some insight. As for stacking, in theory someone like rust does with a Diagnostic having children should also be doable I think. I know that could be quite useful in different scenarios and then easily translatable to the LSP DiagnosticRelatedInformation.

Also a good point. I think about this one some more as it wasn’t even on my radar.

:sweat_smile: yes, I understand any change here won’t be a small undertaking.

2 Likes

Thanks for digging through and writing down the details Chris!

Yes I had compatibility in mind, old versions of zinc should probably continue to work with newer compilers (?).

If I remember correctly, there were some plans / thoughts at some point for zinc to support xsbti versions, i.e., zinc would dynamically test which xsbti version the compiler implements and then call the compiler through the corresponding interface I didn’t manage to dig it out right now, maybe you find it through the links that Guillaume shared. So if it’s indeed a fundamental problem that each message is reduced down to todays xsbti.Problem, maybe there’s a way around that.

As long as compiler-interface doesn’t break binary compatibility that should be fine. There’s also a versioning scheme: zinc/internal/compiler-interface/src/main/java/xsbti/compile/CompilerInterface2.java at develop · sbt/zinc · GitHub, if needed zinc can add a CompilerInterface3 and the compiler can implement both.

So I played around with this a bit today to get an idea of how editors would handle the related information, and given the “inlined positions” example here is how VS Code could handle this if it was returned structurally.

2022-03-26 09.53.17

This is pretty neat and definitely illustrates the benefits of having something like this mapped to DiagnosticRelatedInformation in terms of code navigation.

8 Likes

That very nice, however I think it can quickly become TMI in code bases that do a lot of meta programming and inlining.

Possibly true, but I’m working from the assumption that the majority of code bases people work in and the majority of users aren’t in code bases where there are a lot of meta programming and inlining. So the benifit of scenarios like this where it is useful far outweigh the potential of TMI in code bases that are full of meta programming. Either way, good to keep in mind.

3 Likes

Related in part to the outlined steps here of getting the current ErrorMessageIDs documented, I’ve been slowly doing this in GitHub - ckipp01/dotty-error-index: A WIP error index for Dotty. It’s actually been a fun exercise so far. And of course, by all means, if anyone feels inclined to help I have more details about how everything works here. There’s even a list of ones that I wasn’t able to reproduce so far :smiley: .

9 Likes

You’re doing god’s work!
This needs more attention, and I suggest you tweet about it and post as a help-wanted issue on the dotty repo. Also consider mention this for the next the Scala spree.

To keep everything linked, I’ve also started a discussion on the sbt side about the changes necessary to Problem, which I’d like to get the various parties on board with before really digging in. Also related to the DiagnosticRelatedInformation that could be used for various things, there are some discrepancies with the way it’s done in LSP vs BSP, so hopefully that can be aligned as well and I’ve opened an issue about it.

1 Like