Add linkcheck_case_sensitive configuration option #14046

FazeelUsmani · 2025-11-07T15:07:09Z

This PR adds a new configuration option linkcheck_ignore_case to enable case-insensitive URL and anchor checking in the linkcheck builder.

Problem

Some web servers (e.g., GitHub, certain hosting platforms) are case-insensitive and may return URLs with different casing than the original link. This causes the linkcheck builder to report false-positive redirects when the URLs differ only in case, even though they point to the same resource.

Solution

Added linkcheck_ignore_case boolean configuration option (default: False)
Modified URL comparison logic to support case-insensitive matching when enabled
Modified anchor comparison in AnchorCheckParser to support case-insensitive matching when enabled
Added comprehensive tests for both URL and anchor case-insensitive checking
Updated documentation in doc/usage/configuration.rst
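
A minimal sketch of the URL-comparison part of this change, not the PR's actual code. The helper name `classify_response` and its `ignore_case` parameter are hypothetical; the `working` and `redirected` strings are the statuses discussed later in this thread.

```python
def classify_response(documented_url: str, final_url: str, ignore_case: bool = False) -> str:
    """Hypothetical helper: decide which status a followed link should get."""
    if documented_url == final_url:
        return 'working'
    # Assumption: with the option enabled, a difference in letter case alone
    # is not treated as a redirect.
    if ignore_case and documented_url.casefold() == final_url.casefold():
        return 'working'
    return 'redirected'
```

With the default `ignore_case=False`, a server that answers `/Path` for a documented `/path` is still classified as `redirected`.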

jayaddison · 2025-11-08T12:18:26Z

Hi @FazeelUsmani - thank you for developing and describing this pull request.

I have a concern that enabling the option reduces the precision of other hyperlinks that are checked.

Could you explain the use case where it would be easier for a documentation project to enable this option by editing the conf.py instead of fixing the URLs/anchors in their documentation sources to use the correct casing?

FazeelUsmani · 2025-11-10T09:00:09Z

That’s a good point, @jayaddison.
This option is off by default and is meant mainly for large or older documentation sets where many links hit servers that normalise URL casing (like GitHub) or are case-insensitive (like Windows-based servers). Enabling it just filters out harmless casing-related redirects so teams can focus on real link issues instead of noise.

jayaddison · 2025-11-10T10:54:57Z

@FazeelUsmani got it, understood. As often happens, I had a misunderstanding to begin with - you are saying that this only affects whether case-adjusted response URLs are considered to be redirects instead of successful.

Let me think about this a little more; I do understand the value in this now, but am wary of (and trying to think of) any problem side-effects.

jayaddison · 2025-11-10T10:55:11Z

(also, thank you for the explanation)

jayaddison · 2025-11-10T10:56:46Z

Separately: I do think that we should probably isolate the redirect-case-sensitivity handling from the HTML anchor case-sensitivity; they seem fairly functionally different from each other to me.

FazeelUsmani · 2025-11-10T17:32:42Z

Hmm.. makes sense. I can refactor this into two separate options:
linkcheck_ignore_case_urls: For comparing URL paths (the redirect scenario)
linkcheck_ignore_case_anchors: For comparing HTML anchors

This would give users more granular control. Most users would likely want linkcheck_ignore_case_urls = True (for case-insensitive servers) while keeping linkcheck_ignore_case_anchors = False (since HTML IDs are technically case-sensitive per spec). What do you say?

AA-Turner · 2025-11-10T19:05:59Z

Two options seems overkill for this use-case. What do browsers do de facto on case mismatches on fragment IDs?

A

FazeelUsmani · 2025-11-11T08:46:44Z

Fair point — browsers generally treat fragment IDs as case-sensitive, though behavior can vary depending on the HTML generator. My thought was mainly to avoid false negatives in edge cases (like auto-generated anchors that normalize casing differently).
That said, I’m fine keeping it as a single option if we note the anchor behavior clearly in the docs.

jayaddison · 2025-11-11T10:14:55Z

I can't think of drawbacks to the redirect case-folding -- and although it's maybe slightly controversial, I wonder whether we should enable it by default.

The anchor-checking I'm less certain about; given that we believe browsers seem to navigate to anchors case-sensitively -- something I too checked locally and that is certainly the case in Firefox 140.4 -- I'd be reluctant to offer that without a demonstrable use-case (again that can't be solved easily by fixing the source documentation).

FazeelUsmani · 2025-11-11T12:09:35Z

That makes sense — I’ll keep it as a single linkcheck_ignore_case option limited to the URL path. Anchor checks will remain case-sensitive to align with browser behavior, and I’ll clarify this distinction in the docs so users understand the expected behavior.

jayaddison · 2025-11-11T12:29:38Z

Sounds good to me! Thanks @FazeelUsmani.

jayaddison · 2025-11-18T11:08:12Z

doc/usage/configuration.rst

+.. confval:: linkcheck_case_insensitive
+   :type: :code-py:`bool`
+   :default: :code-py:`False`
+
+   When :code-py:`True`, the *linkcheck* builder will compare URL paths
+   case-insensitively when checking for redirects.
+   This is useful for checking links on case-insensitive servers
+   (for example, GitHub, Windows-based servers, or certain hosting platforms)
+   that may return URLs with different case than the original link.
+
+   When enabled, URL paths like ``/Path`` and ``/path`` are considered
+   equivalent, preventing false-positive redirect warnings on
+   case-insensitive servers.
+
+   .. note::
+
+      This option only affects URL path comparison for redirect detection.
+      HTML anchor checking remains case-sensitive to match browser behavior,
+      where fragment identifiers (``#anchor``) are case-sensitive per the
+      HTML specification.
+
+   Example:
+
+   .. code-block:: python
+
+      linkcheck_case_insensitive = True
+
+   .. versionadded:: 8.2
+


This may be my perception, but I would suggest that it may be easier to understand the config setting if we remove the inverse prefix (e.g. using sensitive instead of insensitive).

I also have a few ideas for the documentation text; please let me know what you think:

Suggested change

-.. confval:: linkcheck_case_insensitive
-   :type: :code-py:`bool`
-   :default: :code-py:`False`
-
-   When :code-py:`True`, the *linkcheck* builder will compare URL paths
-   case-insensitively when checking for redirects.
-   This is useful for checking links on case-insensitive servers
-   (for example, GitHub, Windows-based servers, or certain hosting platforms)
-   that may return URLs with different case than the original link.
-
-   When enabled, URL paths like ``/Path`` and ``/path`` are considered
-   equivalent, preventing false-positive redirect warnings on
-   case-insensitive servers.
-
-   .. note::
-
-      This option only affects URL path comparison for redirect detection.
-      HTML anchor checking remains case-sensitive to match browser behavior,
-      where fragment identifiers (``#anchor``) are case-sensitive per the
-      HTML specification.
-
-   Example:
-
-   .. code-block:: python
-
-      linkcheck_case_insensitive = True
-
-   .. versionadded:: 8.2
+.. confval:: linkcheck_case_sensitive
+   :type: :code-py:`bool`
+   :default: :code-py:`True`
+
+   This setting controls how the *linkcheck* builder decides
+   whether a hyperlink's destination is the same as the URL
+   written in the documentation.
+
+   By default, *linkcheck* requires the destination URL to match the
+   written URL case-sensitively. This means that a link to
+   ``http://webserver.test/USERNAME`` in the documentation that the server
+   redirects to ``http://webserver.test/username`` will be reported as
+   ``redirected``.
+
+   To allow a more lenient URL comparison that will report the previous
+   case as ``working`` instead, configure this setting to ``False``.
+
+   .. note::
+
+      HTML anchor checking is always case-sensitive, and is
+      not affected by this setting.
+
+   .. versionadded:: 8.2

Edit: replace status code of successful with the actual value from the code, working.

I've applied your documentation changes.

Thanks again!

jayaddison · 2025-11-18T11:19:58Z

sphinx/builders/linkcheck.py

+            if self.case_insensitive:
+                normalised_url = normalised_url.casefold()


This is probably a rare scenario, but strictly speaking I think we should avoid casefolding any fragment/anchor identifier part of the URL.

e.g.

❌ /USERNAME#Badges -> /username#badges

✅ /USERNAME#Badges -> /username#Badges

(I only realised this after attempting to rewrite the config setting documentation in my own words -- in particular the part about anchor case-sensitivity)

Right. I've fixed the _normalise_url() function to only casefold the URL before the fragment. Now /USERNAME#Badges correctly becomes /username#Badges (preserving the fragment case). The function now splits on # and only applies .casefold() to the URL part.

OK; the updated URL normalisation logic looks good to me; thank you.
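
A minimal sketch of the fragment-preserving behaviour described above. It is not the body of the PR's `_normalise_url()`, which is not shown in this thread; the function name `casefold_before_fragment` is hypothetical.

```python
def casefold_before_fragment(url: str) -> str:
    """Casefold everything before the first '#'; leave the fragment untouched."""
    base, separator, fragment = url.partition('#')
    # str.partition returns empty strings for the separator and fragment
    # when the URL contains no '#', so URLs without anchors work unchanged.
    return base.casefold() + separator + fragment
```

For the example in this review, `casefold_before_fragment('/USERNAME#Badges')` gives `/username#Badges`.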

jayaddison · 2025-11-18T12:42:18Z

sphinx/builders/linkcheck.py

    app.add_event('linkcheck-process-uri')

    # priority 900 to happen after ``check_confval_types()``
+    app.connect('config-inited', handle_deprecated_linkcheck_case_config, priority=899)


We don't need to add any deprecation/migration-handling code for this config setting, because the linkcheck_case_insensitive config setting has not been included in a published release of Sphinx.

I've removed the deprecation handler function and the old config value registration.

jayaddison · 2025-11-18T12:53:16Z

tests/test_builders/test_build_linkcheck.py

+    def do_HEAD(self):
+        if self.path == '/path':
+            # Redirect lowercase /path to uppercase /Path
+            self.send_response(301, 'Moved Permanently')
+            self.send_header('Location', '/Path')
+            self.send_header('Content-Length', '0')
+            self.end_headers()
+        elif self.path == '/Path':
+            self.send_response(200, 'OK')
+            self.send_header('Content-Length', '0')
+            self.end_headers()
+        else:
+            self.send_response(404, 'Not Found')
+            self.send_header('Content-Length', '0')
+            self.end_headers()


We can remove the do_HEAD code block here, as the base HTTP server will invoke do_GET instead if it's missing.

NB: Let's also apply the Location response header simplification to do_GET when we do that, though.

Good point! I've removed the do_HEAD method entirely - the base HTTP server will invoke do_GET instead. I've also simplified the Location header in do_GET to just use /Path instead of the full URL with host.

jayaddison · 2025-11-18T12:58:09Z

tests/test_builders/test_build_linkcheck.py

    } in content
+
+
+class CaseSensitiveHandler(BaseHTTPRequestHandler):


Slightly nitpicky, but: the client and the test case are about case sensitivity -- but this handler is not really. What it does is to normalise/capitalise the path. So I'd suggest renaming it:

Suggested change

-class CaseSensitiveHandler(BaseHTTPRequestHandler):
+class CapitalisePathHandler(BaseHTTPRequestHandler):

Renamed it from CaseSensitiveHandler to CapitalisePathHandler and updated the docstring to reflect what it actually does.

jayaddison · 2025-11-18T12:59:58Z

tests/test_builders/test_build_linkcheck.py

+    lowercase_uri = f'http://{address}/path'
+    assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked'
+    assert rowsby[lowercase_uri]['status'] == 'redirected'


Although the containment check is good defensive coding, we don't perform similar checks in the other tests, so for consistency I would recommend:

Suggested change

-    lowercase_uri = f'http://{address}/path'
-    assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked'
-    assert rowsby[lowercase_uri]['status'] == 'redirected'
+    assert rowsby[f'http://{address}/path']['status'] == 'redirected'

(I briefly considered suggesting renaming the variable to something like origin_url, outbound_url or documented_url -- but on balance I think we should simply omit it)

I've simplified both tests to directly assert on rowsby[f'http://{address}/path']['status'] without the intermediate variable or defensive containment check, matching the style used in other tests in the file

jayaddison · 2025-11-18T13:13:30Z

Ok, I think this is looking pretty good @FazeelUsmani - thanks for accommodating my feedback.

The only significant remaining question I have is whether this should be controlled as a single boolean value, or whether it should be enabled for patterns of URLs -- similar to the way that linkcheck_auth and linkcheck_ignore can be configured for specific patterns.

Currently: I am leaning towards the idea that, ideally, documentation projects should only enable this for the domains/websites where they need the tolerance for case-normalising redirects.

The reason I lean that way is that -- generally -- URLs are in fact case sensitive, and so I have some lingering doubts about making this a flag that is likely to affect cases where it's not worthwhile to casefold.

PS: either way, if you could also update the title of this PR that would be great!

FazeelUsmani · 2025-11-18T13:26:45Z

Thanks for the feedback @jayaddison!

That's a really good point about making this pattern-based rather than a global boolean. I can see how that would be safer and more precise - only applying case-insensitivity to domains that actually need it (like GitHub), similar to how linkcheck_auth and linkcheck_ignore work.

So, the pattern-based approach would be something like:

linkcheck_case_sensitive_patterns = {
    r'https://github\.com/.*': False,
}

Does this sound good to you?

jayaddison · 2025-11-18T13:44:44Z

That mostly sounds good - I would suggest some adjustments, though: either:

Rename the config dictionary's name to linkcheck_case_sensitivity -- for consistency with the auth and ignore settings, that omit the word patterns.

OR

Use a list datastructure instead of a dict, and rename the variable to linkcheck_case_insensitive (my mistake - perhaps I should not have suggested removing the in... prefix earlier! if we choose this route, I think it'd make sense again because it'd be describing a collection of opt-in patterns that disable the default case-sensitive behaviour)

I don't have a strong preference yet -- I'm continuing to think about it.

The second option might make it slightly more apparent to readers that these ~~domains~~ patterns are exceptions and that case-sensitive URL comparison is the default.

Edit: fixup: the matching patterns are not necessarily domains.

FazeelUsmani · 2025-11-18T13:53:13Z

Thanks for clarifying, @jayaddison! I'll go with Option 2 - using a list with linkcheck_case_insensitive. I agree it makes the intent clearer: these are opt-in patterns for URLs that should be treated case-insensitively, with case-sensitive being the default.

So the config would look like:

linkcheck_case_insensitive = [
    r'https://github\.com/.*',
]

Does that sound good to you? I'll start refactoring to implement pattern matching once you confirm.
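
A rough sketch of how such an opt-in pattern list could be applied, under stated assumptions: the helper names are hypothetical, patterns are matched with `re.match` against the URL written in the documentation, and the fragment is left out of the casefolding as agreed earlier in this review. The PR's real implementation may differ on each of these points.

```python
import re


def fold_before_fragment(url: str) -> str:
    """Casefold the part of the URL before '#', keeping the anchor's case."""
    base, separator, fragment = url.partition('#')
    return base.casefold() + separator + fragment


def same_destination(documented_url: str, response_url: str, insensitive_patterns: list) -> bool:
    """Hypothetical check: do the two URLs count as the same destination?"""
    if documented_url == response_url:
        return True
    # Assumption: only URLs matching an opt-in pattern are compared leniently;
    # everything else keeps the default case-sensitive comparison.
    if any(re.match(pattern, documented_url) for pattern in insensitive_patterns):
        return fold_before_fragment(documented_url) == fold_before_fragment(response_url)
    return False
```

With `[r'https://github\.com/.*']` as the pattern list, a case-only difference on a `github.com` URL counts as the same destination, while the same difference on any other host does not.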

jayaddison · 2025-11-18T14:37:18Z

@FazeelUsmani yep, that sounds good to me 👍

jayaddison · 2025-11-18T19:09:56Z

tests/test_builders/test_build_linkcheck.py

+    freshenv=True,
+    confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/.*']},
+)
+def test_linkcheck_case_insensitive(app: SphinxTestApp) -> None:


Optional: I think it would be possible to merge the two test cases into a single def test_linkcheck_case_sensitivity now that we can pattern-match different URL patterns.

(the code for the two of them is very similar currently, so perhaps the end result would be neater/smaller by combining them)

I explored merging the tests, but found that keeping them separate is actually clearer and simpler. Merging them would require either:

Having two different URLs in the test document with hardcoded ports (which don't match the dynamic test server port)

More complex test logic to construct different URLs

Since you mentioned this was optional, I've kept the two separate tests as they clearly demonstrate:

test_linkcheck_case_sensitive: Default behavior (no patterns configured)

test_linkcheck_case_insensitive: Pattern-based behavior (with a specific pattern)

This makes the tests easier to understand and maintain. Please let me know if you still prefer them to be merged.

That seems reasonable to me, yep - thanks! One detail I'd like to ask about: with approach (2), was the extra complexity in the CapitalisePathHandler? (that's what I'd guess, but would like to double-check)

Actually, the complexity wasn't in the CapitalisePathHandler - that part would stay the same (it just redirects /path → /Path for any request).

The complexity would be in the test logic itself. To test both behaviors in one test, we'd need:

Two different URLs in the test RST document (e.g., localhost and 127.0.0.1)

The pattern configured to match only one of them (e.g., only 127.0.0.1)

Test assertions to verify different behavior for each URL

The issue is the RST document has hardcoded ports (e.g., http://localhost:7777/path), but the test server uses a dynamic port. So we'd need logic to construct the correct URLs with the dynamic port in the test assertions, which felt more complex than just having two focused tests.

Ok; yep, good observation about the dynamic port allocation for the test webserver vs the static port in the documentation.

And also point taken/agreed about the single /path -> /Path transform offered by the handler.

We could add more path transforms to the handler.. could that help? (I think it might do, but I haven't experimented with it in code yet)

Yes, adding more path transforms to the handler could definitely help!

Here's what I'm thinking:

class CapitalisePathHandler(BaseHTTPRequestHandler):
    """Test server that capitalises URL paths via redirects."""

    protocol_version = 'HTTP/1.1'

    PATH_REDIRECTS = {
        '/path1': '/Path1',
        '/path2': '/Path2',
    }

    def do_GET(self):
        if self.path in self.PATH_REDIRECTS:
            self.send_response(301, 'Moved Permanently')
            self.send_header('Location', self.PATH_REDIRECTS[self.path])
            self.send_header('Content-Length', '0')
            self.end_headers()
        elif self.path in self.PATH_REDIRECTS.values():
            content = b'ok\n\n'
            self.send_response(200, 'OK')
            self.send_header('Content-Length', str(len(content)))
            self.end_headers()
            self.wfile.write(content)
        else:
            self.send_response(404, 'Not Found')
            self.send_header('Content-Length', '0')
            self.end_headers()

Then the test RST could have:
`path1 <http://localhost:7777/path1>`_
`path2 <http://localhost:7777/path2>`_

And we configure the pattern to match only path1:
confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/path1']}

This way we can assert:

path1 → working (case-insensitive applies)

path2 → redirected (case-sensitive applies)

The dynamic port issue is actually not a problem for assertions since we use the actual address variable there. The hardcoded port in the RST only matters for the pattern matching, which this approach handles cleanly.

Want me to implement this?

Nice, ok! Let's try that, but with a change: I'm not too keen on the hardcoded redirect paths, so perhaps we could instead test for self.path.islower(), and when it is, then redirect the client to self.path.capitalize(). I think that'd help to make the resulting code even more concise.

Done! I've implemented your suggestion with the dynamic path detection.

Ok, thank you - there may be a couple of cleanups possible:

We have both do_GET and do_HEAD implemented again - I think we should apply the same simplification as previously and remove one of those.

There are still two test cases, where I think we could have a single test_linkcheck_case_sensitivity that handles both case sensitive and non-case-sensitive variants.

I've realized that my suggestion to use self.path.capitalize() isn't applicable as-is.. sorry about that - perhaps we could use self.path.upper() instead, though.
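
A sketch of what the `islower()` / `upper()` variant could look like; it is not the handler that was committed. The class name `UppercasePathHandler` and the `probe` helper are hypothetical and exist only for illustration. One reason `capitalize()` does not work here: it uppercases only the first character, which is `/`, so `'/path1'.capitalize()` returns `'/path1'` unchanged.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class UppercasePathHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: redirect all-lowercase paths to uppercase."""

    def do_GET(self):
        if self.path.islower():
            # e.g. /path1 -> /PATH1; digits and '/' have no case
            self.send_response(301, 'Moved Permanently')
            self.send_header('Location', self.path.upper())
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            content = b'ok\n'
            self.send_response(200, 'OK')
            self.send_header('Content-Length', str(len(content)))
            self.end_headers()
            self.wfile.write(content)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def probe(path):
    """Serve a single request on an ephemeral port; return (status, Location)."""
    server = HTTPServer(('127.0.0.1', 0), UppercasePathHandler)
    server.timeout = 5  # do not wait forever if the client never connects
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    try:
        conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
        conn.request('GET', path)
        response = conn.getresponse()
        response.read()
        result = (response.status, response.getheader('Location'))
        conn.close()
    finally:
        thread.join()
        server.server_close()
    return result
```

Because the redirect target is derived from the request path, no table of hardcoded paths is needed, and any lowercase path in the test document is handled the same way.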

jayaddison · 2025-11-18T19:10:28Z

tests/test_builders/test_build_html_numfig.py



 @pytest.mark.sphinx('html', testroot='numfig')
-@pytest.mark.test_params(shared_result='test_build_html_numfig')


Let's make sure to restore this pytest marker (I think you did, but maybe it got removed again somehow).

jayaddison · 2025-11-18T19:47:15Z

Ah, and one more thing: please make sure to update CHANGES.rst to describe your change and add a credit entry for it!

I think it's looking good; we'll also need a core developer/maintainer to approve it, and I can't promise if/when that will be.

jayaddison

This feature implementation looks good to me; thank you @FazeelUsmani!

FazeelUsmani · 2025-11-19T14:37:07Z

Hi @jayaddison,
Thank you for approving my changes. How does the merging process work here?

jayaddison · 2025-11-19T16:17:57Z

Hi @jayaddison, Thank you for approving my changes. How does the merging process work here?

You're welcome. I think the best response I can offer to answer that -- and I think you have followed most/all of the guidance there already -- is to refer to the Sphinx official contributing guide: https://www.sphinx-doc.org/en/master/internals/contributing.html#contribute-code

FazeelUsmani added 6 commits November 7, 2025 20:24

Update configuration.rst

63e108c

Add linkcheck_ignore_case config option

caae7eb

Update i18n.py

9e6dd40

fixed the failing test test_numfig_disabled_warn

eccd6d7

Enable case-insensitive URL and anchor checking for linkcheck builder

6300483

strip ANSI color codes from stderr before assertion

b61366c

FazeelUsmani marked this pull request as draft November 7, 2025 15:08

FazeelUsmani added 2 commits November 7, 2025 21:42

fixed the failing test test_connect_to_selfsigned_fails

7ea45c6

Update test_build_linkcheck.py

99a5dc0

sylvainstjean53-creator approved these changes Nov 9, 2025

Merge branch 'master' into linkcheck_case_insensitive

f99651f

FazeelUsmani marked this pull request as ready for review November 10, 2025 08:56

Update linkcheck.py

ac12d63

FazeelUsmani marked this pull request as draft November 11, 2025 13:13

FazeelUsmani added 2 commits November 11, 2025 18:45

Update test_build_linkcheck.py

1a0d9ed

Update test_build_linkcheck.py

d115b1e

FazeelUsmani force-pushed the linkcheck_case_insensitive branch 3 times, most recently from 56d6a63 to d115b1e Compare November 11, 2025 14:31

jayaddison reviewed Nov 18, 2025

FazeelUsmani added 2 commits November 18, 2025 17:44

Refactor linkcheck case sensitivity: rename config and fix fragment handling

eaa1caa

Improve formatting and update config value handling

57e8b3c

jayaddison reviewed Nov 18, 2025

FazeelUsmani and others added 3 commits November 18, 2025 18:16

Update tests/test_builders/test_build_linkcheck.py

5dffff4

Co-authored-by: James Addison <[email protected]>

Remove deprecated linkcheck_case_insensitive config handling

5e08ab3

Merge branch 'linkcheck_case_insensitive' of github.com:FazeelUsmani/sphinx into linkcheck_case_insensitive

45cf720

jayaddison reviewed Nov 18, 2025

Refactor linkcheck tests: rename handler for case sensitivity and simplify assertions

06663cf

FazeelUsmani changed the title ~~Add linkcheck_ignore_case configuration option~~ Add linkcheck_case_sensitive configuration option Nov 18, 2025

Add support for case-insensitive URL checking in linkcheck builder

5615ffc

jayaddison reviewed Nov 18, 2025

restore @pytest.mark.test_params and update documentation

842b756

jayaddison approved these changes Nov 19, 2025

FazeelUsmani added 3 commits November 20, 2025 17:56

Refactor linkcheck case sensitivity tests with dynamic path handler

1fe4293

"Update test document with path1 and path2 for case sensitivity tests

8c7648b

Apply ruff formatting

d95224b
