So this was an interesting exercise for me, so I’ll post the full chain down below in case others are curious about the details as well. If I got it right, the full chain looks something like this:
- We’ll start with the
report.error
like you mention which can take a Message
and a position. That message does actually have a ErrorMessageId
, but more on that later
- The
Message
above is part of the Diagnostic
that Dotty puts together. That diagnostic message
method is what ends up many times being the actually message that is carried on to what the user eventually sees.
- For any tooling utilizing Zinc, this
Diagnostic
will end up being translated to a Problem
. You can see these xsbti.Problem
in sbt, Mill, and Bloop for example if you poke around, many times with a wrapper specific to that tool. We’ll focus on the Bloop path moving forward.
- Once Bloop has this
Problem
it wraps it in CompilationEvent.Diagnostic
. This is what then actually is used to create a BSP Diagnostic
that is used to forward a build/publishDiagnostics
notification to Metals.
- Once Metals gets those diagnostics it basically maps them right to an LSP Diagnostic to which it sends to the client to display.
With all the above in mind, it seems that much of the diagnostic structure that is passed around is based on the Problem
which is quite limited and the Dotty implementation of this you can fine here in sbt-bridge and for brevity I’ll just include the fields below:
private final Position _position;
private final String _message;
private final Severity _severity;
private final Optional<String> _rendered;
The next most relevant structure that we have to work with at the moment is the BSP Diagnostic. Again, for brevity below are the fields:
private Range range;
private DiagnosticSeverity severity;
private String code;
private String source;
private String message;
private DiagnosticRelatedInformation relatedInformation;
Here we have the addition of source
and relatedInformation
, neither which are used.
Agreed, I think this would be the most minimal thing that we could do to be able to more reliable tell the type of diagnostic we are dealing with in tooling. A couple observations on this:
- There seems to at least a handful of errors on first glance in Dotty that don’t have an
ErrorMessageId
. For example the diagnostic I used to illustrate this from the beginning about object creation impossible
has no ID. There is never an actual Message
created for this. I’m assuming that we’d have to go through and ensure all of the errors actually get a Diagnostic
created for them, which means a Messages
, which then means it need an ErrorMessageId
.
- While we have these IDs they aren’t listed anywhere and not really used as a lookup of any type. This may need to be restructured a bit or at least published somewhere so people don’t need to dig into the code to find these out.
Could you expand more on why you think this might not be possible? Is it mainly compat concerns? I think it’s actually important that these tools do take into account some of these changes because many times they are the ones forwarding that info on. So at the very least, changes will have to be made to at least account for the error code, unless we want to parse it out of a string, but then we start losing out on the benefits anyways. Maybe there can be a period where we have legacy
mode for reporting or something to give time for tooling to take into account the new structures.
The more I dig into this the more I realize how lacking the structure is for our Diagnostics compared to other languages that we’ve been using as an example. While adding the error code would be a great start, I still think we’re just scratching the surface at what could be reported. For example, take a look through what rustc returns just in the DiagnosticSpan
alone.
struct DiagnosticSpan {
file_name: String,
byte_start: u32,
byte_end: u32,
/// 1-based.
line_start: usize,
line_end: usize,
/// 1-based, character offset.
column_start: usize,
column_end: usize,
/// Is this a "primary" span -- meaning the point, or one of the points,
/// where the error occurred?
is_primary: bool,
/// Source text from the start of line_start to the end of line_end.
text: Vec<DiagnosticSpanLine>,
/// Label that should be placed at this location (if any)
label: Option<String>,
/// If we are suggesting a replacement, this will contain text
/// that should be sliced in atop this span.
suggested_replacement: Option<String>,
/// If the suggestion is approximate
suggestion_applicability: Option<Applicability>,
/// Macro invocations that created the code at this span, if any.
expansion: Option<Box<DiagnosticSpanMacroExpansion>>,
}
Things like suggested_replacement
and suggestion_applicability
seem incredibly helpful for tools to be able to easily get what they need to fix the issue and then know if they can reliably apply it or not.
Adding the error code is a great start, but if we are considering breaking things, we might as well do it in a way that truly enhances the diagnostics. Again, given the diagnostic in the original post about impossible object creation
, it’d be great to have a suggestedFix
field or something that would just directly give you:
def a: Int = ???
def b: Int = ???
Then just like rust, this suggestedFix
would have to have some sort of applicability enum or something saying that this has placeholders. All that to say, I’d hate for us to go through a ton of work that will need changes in Dotty, Zinc, and everything that uses it only to add an error code when there could probably be much more that we could do.
In order to at least have a bit of tangible action points to start out with. Here is what I’d propose:
- Is it true that all errors should have a
DiagnosticId
? If so, a good first step would be to go through and tackle these. Give IDs to those that don’t have them, form a Diagnostic
for them, and follow through on some of the work proposed in Error-specific Diagnostics subclasses by samuelgruetter · Pull Request #92 · lampepfl/dotty · GitHub to ensure that we are always reporting a Diagnostic
.
- Related to the above, we should get the
DiagnosticId
s documented and published somewhere.
- Start trying to get some consensus on how the shape of what is reported can change. Already the
Message
has a DiagnosticId
, so if we create Message
s for all the errors we’ll have that covered internally, but then what about accounting for this in Problem
, which many tools interact with. Maybe someone from the sbt/zinc side should chime in on this.
We can have nice things, let’s move on from just relying on msg
, srcPos
, and severity
.