-
Notifications
You must be signed in to change notification settings - Fork 40
new RFC: short error identifiers #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
gasche
wants to merge
1
commit into
ocaml:master
Choose a base branch
from
gasche:error-identifiers
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,176 @@ | ||
| # Short identifiers for error messages | ||
|
|
||
| ## Motivation | ||
|
|
||
| We propose to assign stable, short identifiers to each error produced | ||
| by the OCaml compiler, and include this short name in the error | ||
| message. This makes it much easier to refer to those errors message in | ||
| documentation, user questions / answers, and to search on the web for | ||
| material on a specific error. | ||
|
|
||
| This feature was inspired by Rust which does precisely this: | ||
|
|
||
| - Example rust error message: | ||
|
|
||
| ``` | ||
| error[E0268]: `break` outside of a loop | ||
| --> src/lib.rs:2:5 | ||
| | | ||
| 2 | break rust; | ||
| | ^^^^^^^^^^ cannot `break` outside of a loop | ||
| ``` | ||
|
|
||
| The error identifier is `E0268` here. | ||
|
|
||
| - Index of all errors: | ||
| https://doc.rust-lang.org/error-index.html | ||
|
|
||
|
|
||
| ## Imaginary example | ||
|
|
||
| Current behavior: | ||
|
|
||
| ``` | ||
| # let x = 2 + "foo";; | ||
| Error: This expression has type string but an expression was expected of type | ||
| int | ||
| ``` | ||
|
|
||
| New behavior: | ||
|
|
||
| ``` | ||
| # let x = 2 + "foo";; | ||
| Error[tyco007]: This expression has type string but an expression | ||
| was expected of type int | ||
| ``` | ||
|
|
||
| In this example the error identifier is `tyco007`. | ||
|
|
||
| ## Proposed identifier format | ||
|
|
||
| (We are of course happy to change this if there is demand.) | ||
|
|
||
| We propose messages such as "tyco007" formed of a "group", the prefix | ||
| "tyco", and a number, 7. | ||
|
|
||
| We propose to keeep a fixed length for messages, 7 characters, | ||
| following the Rust approach. (A guess about the motivations: it makes | ||
| formatting more consistent and search easier.) Currently the format we | ||
| use has one group per compiler module emitting errors (hre "tyco" is | ||
| short for "typecore"), but one could move to E123456 later in the | ||
| future using a single group "E". | ||
|
|
||
| ```ocaml | ||
| type error_identifier = { | ||
| group: string; (* 4 characters *) | ||
| num: int; (* 0 <= num < 1000 *) | ||
| } | ||
| ``` | ||
|
|
||
| Note that the format proposed here is very different from the format | ||
| of warning identifiers (`missing-record-field-pattern`). Users want to | ||
| refer to specific warnings in their build system and code to | ||
| enable/disable them, so we want descriptive, human-friendly | ||
| identifiers. In constrat we don't know of a reason to refer to | ||
| a specific error in our code, so error identifiers need not be | ||
| readable and informative; we want them short and to the point. | ||
|
|
||
| (Why the group prefix then? Actually it simplifies the implementation | ||
| to give per-module identifiers instead of trying to pick global | ||
| numbers and avoiding conflicts between two errors trying to use the | ||
| same number. It might also help users identify classes of errors.) | ||
|
|
||
|
|
||
| ## Impact on the compiler implementation | ||
|
|
||
| An important question with this proposal is the maintainability impact on the compiler codebase. We sketched an implementation to see how invasive it would be. | ||
|
|
||
| Currently our error-printing logic has code looking as follows, in each module reporting new errors: | ||
|
|
||
| ```ocaml | ||
| let report_error ~loc env = function | ||
| | Constructor_arity_mismatch(lid, expected, provided) -> | ||
| Location.errorf ~loc | ||
| "@[The constructor %a@ expects %i argument(s),@ \ | ||
| but is applied here to %i argument(s)@]" | ||
| longident lid expected provided | ||
| | [..] | ||
| ``` | ||
|
|
||
| We assume a `identifier_of_error : error -> Location.error_identifier` in each such module, | ||
| so the change to `report_error` would be minimal: | ||
|
|
||
| ```ocaml | ||
| let report_error ~loc env err = | ||
| let identifier = identifier_of_error err in | ||
| match err with | ||
| | Constructor_arity_mismatch(lid, expected, provided) -> | ||
| Location.errorf ~loc ~identifier | ||
| "@[The constructor %a@ expects %i argument(s),@ \ | ||
| but is applied here to %i argument(s)@]" | ||
| longident lid expected provided | ||
| | [..] | ||
| ``` | ||
|
|
||
| ### Assigning identifiers to error values | ||
|
|
||
| This cannot easily be done automatically, because the error number | ||
| must be stable, even if the error type changes (a constructor is | ||
| renamed, reordered, split in two, several constructors are factored | ||
| into one...). | ||
|
|
||
| We thus propose to have a manually-written `identifier_of_error` function | ||
|
|
||
| ```ocaml | ||
| let identifier_of_error err = | ||
| let group = "tyco" in | ||
| let open Location in | ||
| match err with | ||
| | Constructor_arity_mismatch _ -> { group; num = 7 } | ||
| | [...] | ||
| ``` | ||
|
|
||
| (Note: in general there is no one-to-one mapping between constructors | ||
| in the error type and the user-level concept of "errors", two | ||
| constructors could be the same error or one constructor could be | ||
| different errors depending on its argument.) | ||
|
|
||
| The exhaustivity check for pattern-matching on this function | ||
| guarantees that we always assign am identifier to all errors, and in | ||
| particular don't forget picking an identifier for new errors. | ||
|
|
||
|
|
||
| #### Advanced | ||
|
|
||
| If we wanted to be able to generate nice documentation for each error, | ||
| or at least compute statically the set of error identifiers currently in | ||
| use, we could attach an "example" of each error (an OCaml value | ||
| belonging to this error case, chosen to be representative for this | ||
| error and read nicely when printed by the error report) to each case | ||
| of the `identifier_of_error` function using a bespoke attribute | ||
| `[error.example]`: | ||
|
|
||
| ```ocaml | ||
| let identifier_of_error err = | ||
| let group = "tyco" in | ||
| let open Location in | ||
| match[@error.table] err with | ||
| | Constructor_arity_mismatch _ -> | ||
| { group; num = 7 } | ||
| [@error.example (Constructor_arity_mismatch (Lident "foo", 1, 2))] | ||
| | [...] | ||
| ``` | ||
|
|
||
| Then a custom ppx could easily extract a list of all "errors" | ||
| expressions in the `[error.list]` match, and for example print all | ||
| errors or build the set of error identifiers. (We already have | ||
| a similar ppx running on utils/warnings.ml to check the links to the | ||
| manual.) | ||
|
|
||
| Having the examples in this function is safer than simply defining | ||
| a global list of all errors separately: if we add a new error | ||
| constructor, exhaustivity warnings will force us to add a new case to | ||
| `identifier_of_error` and thus include an example. | ||
|
|
||
| Such a list of all errors can also be used to check that no two errors | ||
| are given the same identifier. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really convinced by the format nor the rationale.
I think the fact that it's an abbreviation that maps to the module is likely to be interesting only to compiler devs. And the day your code moves to another module what do you do with the uid ? Is it that complicated to have a single module
Error_idwhere you define errors and which you draw from ?Why not keep it to EXXX/WXXX, short and actually tells the user whether that's an error or a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module name in the example was mostly a proxy for the class of errors. For instance, extending the examples
synt: syntax errorstyco: core language type errorstymo: module languagestymimodule inclusion errorstyex: type expression errorslink: linking errorcpmo: module compilation errorIf we ever change the location/classification of an error, it seems fine to change the identifiers of the errors.
One advantage that I see with those errors is that thematically adjacent errors will share similar identifiers even after many renaming and moves (which doesn't happen that often). Contrarily, with dense set of numerical identifiers, a single splits of errors might move two sibling errors far away from each other.
However, I kind of agree that the "themes" themselves might be only partially understandable at first glance for non-compiler developers, but I think that the similarity of the errors will make sense for users.
Defining all error datatypes in a single module does not really work that well with the current architecture: error constructors often carry complex payload that are strongly tied to the concepts of the modules that raise them.
We could define the identifier part centrally, and dispatch the error constructors themselves to those identifiers, but that creates two distinct source of truths and probably make a coherency tool a little more mandatory.
Overall, we could definitively have a single numerical identifiers, but the idea of having group of errors seemed slightly attractive both in term of user experience and compiler development. But it could be totally be than my vision of user experience is far too biased in that specific instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents
This map of knowledge only makes sense when you look at it entirely. When those abbreviations might appear on your console, out of context it wouldn't give any valuable information.
For example when "tyex" appears on your errors for the first time until the N time and realise it has some meaning (and there are other codes that mean other errors) that's already late since you would understand the map as a hole and context isn't needed anymore.
I like the proposal, it adds searchability to errors and given the extended places to ask/receive help online it's huge. I would either make it indexable by a number or give the entire context in the error. Instead of
synt->SYNTAX_ERROR_XPS: Sorry If I jumped into an issue where no one called me out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davesnx this is a reasonable point, but we also want to balance with the size of the identifier which should not become too long, because it is less interesting to humans than the actual human-readable explanation of the error, and still displayed first. We could decide to move the error identifier at the end of the error message, which would un-constrain this aspect of the design space.
@dbuenzli my gut feeling is that having per-class numbers, instead of whole-compiler numbers, is going to make our life easier. For example, it is going to greatly decrease the number of situations where two independent compiler changes conflict (logically and/or textually) because they want to introduce the same error number.