Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lint for broken doc links #13696

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

maxcnunes
Copy link

fixes #2179

changelog: [doc_broken_link]: Add pedantic lint to catch broken doc links that won't produce a link tag by rustdoc.

@rustbot
Copy link
Collaborator

rustbot commented Nov 16, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Jarcho (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Nov 16, 2024
@maxcnunes maxcnunes force-pushed the lint-doc-broken-links branch from 8d859c9 to 6ff53c7 Compare November 16, 2024 16:34
@maxcnunes
Copy link
Author

Just noticed there are tests failing due to a false positive like this:

/// Referencing an slice [T]

This will be considered a broken link although actually it isn't. I guess in order to fix it I won't be able to check fake value from pulldown_cmark::Parser::new_with_broken_link_callback anymore, since it doesn't provide the raw text to check why it was considered a broken link. I will try to work on a different solution, appreciate if there are any suggestions to achieve it.

@maxcnunes
Copy link
Author

@maxcnunes maxcnunes force-pushed the lint-doc-broken-links branch from fc34f5d to 640f282 Compare November 18, 2024 00:24
@maxcnunes maxcnunes force-pushed the lint-doc-broken-links branch from 640f282 to 8deb383 Compare November 18, 2024 00:27
@bors
Copy link
Contributor

bors commented Nov 21, 2024

☔ The latest upstream changes (presumably 8298da7) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Contributor

@Jarcho Jarcho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long wait. Left a couple of specific comments.

One thing you'll need to do is take the text input from the markdown parser. Currently you'll be linting inside code and html sections. Doc attribute can also contain multiple lines.

clippy_lints/src/doc/broken_link.rs Outdated Show resolved Hide resolved
clippy_lints/src/doc/broken_link.rs Outdated Show resolved Hide resolved
@Jarcho Jarcho added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Dec 6, 2024
@maxcnunes
Copy link
Author

@Jarcho thanks for taking the time to review this.

I applied two of your code improvement suggestions. I am a bit confused though about these others comments:

One thing you'll need to do is take the text input from the markdown parser.

Do you mean this lint should run on this parser's output instead of running before that step, which is done against the rust AST attributes? I tried doing that as the first approach, but the new_with_broken_link_callback we use sanitizes broken links replacing them with a fake text and link values, and that makes impossible to run any of this lint logic.

Currently you'll be linting inside code and html sections.

It is being applied only for AttrKind::DocComment attributes. Doesn't it guarantees only code doc comments are covered by this lint, which I imagine is what we want?

Doc attribute can also contain multiple lines.

This current logic is checking for broken links across multiple lines, so I am confused what you mean on this one, as it is already checking for multiple lines.

@maxcnunes
Copy link
Author

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties and removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Dec 7, 2024
@Jarcho
Copy link
Contributor

Jarcho commented Dec 8, 2024

It is being applied only for AttrKind::DocComment attributes. Doesn't it guarantees only code doc comments are covered by this lint, which I imagine is what we want?

Yes. And markdown contain code and html sections. Neither of which will try to parse links.

This current logic is checking for broken links across multiple lines, so I am confused what you mean on this one, as it is already checking for multiple lines.

Right now you're assuming that each attribute contains a single line. This is normally true, but isn't guaranteed.

@maxcnunes
Copy link
Author

@Jarcho So, to make sure I got it right:

Yes. And markdown contain code and html sections. Neither of which will try to parse links.

Are you saying AttrKind::DocComment attributes are markdown, which include code and html sections, and we should not try to parse links for those types, applying it on real document's content only? If that is what you mean, is your suggestion to follow the same approach from doc/mod.rs which uses the pulldown_cmark to parse the doc comments? But in this case we would not use new_with_broken_link_callback in order to properly handle those broken links, since as mentioned before, they get replaced with fake values in that case.

About this other one:

Right now you're assuming that each attribute contains a single line. This is normally true, but isn't guaranteed.

Would attributes with multiple lines be represented with \n so I should handle that case or are multiple lines represented in a different format? Also, is there are way to reproduce those multiple lines without actually adding \n to comments so I can properly have tests for that case?

@Jarcho
Copy link
Contributor

Jarcho commented Dec 11, 2024

Are you saying AttrKind::DocComment attributes are markdown, which include code and html sections, and we should not try to parse links for those types, applying it on real document's content only? If that is what you mean, is your suggestion to follow the same approach from doc/mod.rs which uses the pulldown_cmark to parse the doc comments? But in this case we would not use new_with_broken_link_callback in order to properly handle those broken links, since as mentioned before, they get replaced with fake values in that case.

I don't see why that would stop this from working. You can use the span given to the callback to know where to start parsing the input string for the link's destination.

Would attributes with multiple lines be represented with \n so I should handle that case or are multiple lines represented in a different format? Also, is there are way to reproduce those multiple lines without actually adding \n to comments so I can properly have tests for that case?

rustdoc joins all the doc attributes together with \n as the separator and then passes that to the markdown parser.

@maxcnunes
Copy link
Author

maxcnunes commented Dec 18, 2024

@Jarcho

I don't see why that would stop this from working. You can use the span given to the callback to know where to start parsing the input string for the link's destination.

Ok. I will give the markdown parser another try.

rustdoc joins all the doc attributes together with \n as the separator and then passes that to the markdown parser.

rustdoc might do that, but that is not what I have seen on clippy. Each line is a different AttrKind::DocComment, the linter does not provide a single AttrKind::DocComment merged with \n for all sibling lines of AttrKind::DocComment.

@maxcnunes
Copy link
Author

hey @Jarcho, I just pushed some temporary changes to confirm I am on the right direction.

I added a new function to the pulldown_cmark fake_broken_link_callback handler based on your suggestions. This is the data we have available within that function:

doc="Test invalid link, url part broken across multiple lines.\n[doc invalid link broken url scheme part part](https://\ntest.fake/doc_invalid_link_broken_url_scheme_part)"

 fragments=[
    DocFragment {
        span: tests/ui/doc_broken_link.rs:40:1: 40:62 (#0),
        item_id: None,
        doc: " Test invalid link, url part broken across multiple lines.",
        kind: SugaredDoc,
        indent: 1,
    },
    DocFragment {
        span: tests/ui/doc_broken_link.rs:41:1: 41:60 (#0),
        item_id: None,
        doc: " [doc invalid link broken url scheme part part](https://",
        kind: SugaredDoc,
        indent: 1,
    },
    DocFragment {
        span: tests/ui/doc_broken_link.rs:42:1: 42:55 (#0),
        item_id: None,
        doc: " test.fake/doc_invalid_link_broken_url_scheme_part)",
        kind: SugaredDoc,
        indent: 1,
    },
]

 bl=BrokenLink {
    span: 58..104,
    link_type: Shortcut,
    reference: Borrowed(
        "doc invalid link broken url scheme part part",
    ),
}

 text based on 'bl.span' range="[doc invalid link broken url scheme part part]"

So, as we can see the BrokenLink data we get from fake_broken_link_callback gives a span just for the link title. I could use that span initial position to start reading the link and get the url part too. Is that what you had in mind?

@rustbot review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lint broken URLs in documentation comments
4 participants