Add proper canonicalization of domain names #255

Closed
wants to merge 23 commits into from
23 commits
50c79ec
fix portValue typo
Mar 15, 2024
a8bf21b
add serializing for port and host properties
Mar 15, 2024
c46e981
newline cleanup
Mar 15, 2024
0962aa9
Merge branch 'main' into fix-specs
Apr 17, 2024
1c346f3
Merge branch 'whatwg:main' into fix-specs
rubycon Apr 24, 2024
147b26a
Editorial: Mark IDs referred to by IETF documents as required (#233)
jeremyroman Aug 29, 2024
75bd911
Editorial: Add brief explanations to examples
jeremyroman Aug 29, 2024
90ac4a9
Meta: link Simplified Chinese translation (#238)
sisidovski Sep 27, 2024
bd98b70
Explain how HTTP header fields integrate with URL patterns
jeremyroman Sep 27, 2024
60be5b7
Correct condition for opaque paths in base URL
anonrig Jan 7, 2025
d4b660c
Editorial: Fix broken references to make the build work again
jeremyroman Jan 7, 2025
1fa3d21
Correct condition for opaque paths in base URL
anonrig Jan 7, 2025
20ca299
Meta: Force a build/deploy
jeremyroman Jan 7, 2025
2e38014
Editorial: Fix RFC2119 keyword warnings (#247)
sisidovski Jan 9, 2025
78036b0
Use the basic URL parser when parsing URLs
jeremyroman Jan 9, 2025
c934c6b
Editorial: Correct "set" to "let"
jeremyroman Jan 9, 2025
cc87ea9
Serialize base URL's host when a string is required
jeremyroman Jan 9, 2025
1d3ab52
Correct null handling when computing base URL host string
jeremyroman Jan 15, 2025
1c2d99f
Use WHATWG Infra ASCII code point definition
jeremyroman Jan 16, 2025
9dae792
markup fixes
Jan 29, 2025
d0a4f62
Add proper canonicalization of domain names
Jan 30, 2025
593c5f6
Merge branch 'main' into fix-specs
Jan 30, 2025
38e80d1
Merge branch 'main' into fix-specs
rubycon Mar 24, 2025
28 changes: 20 additions & 8 deletions spec.bs
@@ -472,6 +472,7 @@ A <dfn>component</dfn> is a [=struct=] with the following [=struct/items=]:
1. Set |urlPattern|'s [=URL pattern/username component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/username}}"], [=canonicalize a username=], and [=default options=].
1. Set |urlPattern|'s [=URL pattern/password component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/password}}"], [=canonicalize a password=], and [=default options=].
1. If the result of running [=hostname pattern is an IPv6 address=] given |processedInit|["{{URLPatternInit/hostname}}"] is true, then set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize an IPv6 hostname=], and [=hostname options=].
+ 1. Otherwise, if the result of running [=protocol component matches a special scheme=] given |urlPattern|'s [=URL pattern/protocol component=] is true, or |urlPattern|'s [=URL pattern/protocol component=]'s [=component/pattern string=] is "`*`", then set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize a domain name=], and [=hostname options=].
1. Otherwise, set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize a hostname=], and [=hostname options=].
1. Set |urlPattern|'s [=URL pattern/port component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/port}}"], [=canonicalize a port=], and [=default options=].
1. Let |compileOptions| be a copy of the [=default options=] with the [=options/ignore case=] property set to |options|["{{URLPatternOptions/ignoreCase}}"].
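
Review note: the hostname steps in this hunk now form a three-way branch. The TypeScript sketch below mirrors only that branch, assuming the IPv6 check amounts to looking at the first two characters of the pattern. All names (`pickHostnameCanonicalizer`, `looksLikeIPv6Pattern`, the boolean parameter) are hypothetical and do not appear in spec.bs.

```typescript
// Illustrative sketch only: every name below is hypothetical, not part of spec.bs.
type HostnameCanonicalizer =
  | "canonicalize an IPv6 hostname"
  | "canonicalize a domain name"
  | "canonicalize a hostname";

// Assumed stand-in for "hostname pattern is an IPv6 address": the pattern
// starts with "[", or with "{" or "\" immediately followed by "[".
function looksLikeIPv6Pattern(hostname: string): boolean {
  if (hostname.length < 2) return false;
  if (hostname[0] === "[") return true;
  return (hostname[0] === "{" || hostname[0] === "\\") && hostname[1] === "[";
}

// protocolMatchesSpecialScheme is passed in as a plain boolean because
// computing it requires the compiled protocol component.
function pickHostnameCanonicalizer(
  hostnamePattern: string,
  protocolPatternString: string,
  protocolMatchesSpecialScheme: boolean
): HostnameCanonicalizer {
  if (looksLikeIPv6Pattern(hostnamePattern)) {
    return "canonicalize an IPv6 hostname";
  }
  if (protocolMatchesSpecialScheme || protocolPatternString === "*") {
    return "canonicalize a domain name";
  }
  return "canonicalize a hostname";
}
```

The order matters: the IPv6 check runs first, so a bracketed pattern never reaches the domain-name branch even when the protocol is a special scheme.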
@@ -1434,7 +1435,7 @@ To <dfn>add a part</dfn> given a [=pattern parser=] |parser|, a string |prefix|,
1. Set |type| to "<a for=part/type>`full-wildcard`</a>".
1. Set |regexp value| to the empty string.
1. Let |name| be the empty string.
- <p class=note>Next, we determine the [=part=] [=part/name=]. This can be explicitly provided by a "<a for=token/type>`name`</a>" [=token=] or be automatically assigned.
+ <p class=note>Next, we determine the [=part=] [=part/name=]. This can be explicitly provided by a "<a for=token/type>`name`</a>" [=token=] or be automatically assigned.</p>
1. If |name token| is not null, then set |name| to |name token|'s [=token/value=].
1. Otherwise if |regexp or wildcard token| is not null:
1. Set |name| to |parser|'s [=pattern parser/next numeric name=], [=serialize an integer|serialized=].
@@ -1730,15 +1731,23 @@ To <dfn>convert a modifier to a string</dfn> given a [=part/modifier=] |modifier
</div>

<div algorithm>
- To <dfn>canonicalize a hostname</dfn> given a string |value|:
+ To <dfn>canonicalize a hostname</dfn> given a string |value| and optionally a string |protocolValue|:

1. If |value| is the empty string, return |value|.
1. Let |dummyURL| be a new [=URL record=].
+ 1. If |protocolValue| was given, then set |dummyURL|'s [=url/scheme=] to |protocolValue|.
+ <p class="note">We set the [=URL record=]'s [=url/scheme=] in order for the [=basic URL parser=] to recognize and normalize non-opaque hostname values.</p>
1. Let |parseResult| be the result of running the [=basic URL parser=] given |value| with |dummyURL| as <i>[=basic URL parser/url=]</i> and [=hostname state=] as <i>[=basic URL parser/state override=]</i>.
1. If |parseResult| is failure, then throw a {{TypeError}}.
1. Return |dummyURL|'s [=url/host=], [=host serializer|serialized=], or empty string if it is null.
</div>

+ <div algorithm>
+ To <dfn>canonicalize a domain name</dfn> given a string |value|:
+
+ 1. Return the result of running [=canonicalize a hostname=] given |value| and "`https`".
+ </div>
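
Review note: the effect of passing "`https`" can be approximated outside the spec with the platform `URL` class. The TypeScript sketch below is not the algorithm itself: it builds a throwaway URL string instead of running the basic URL parser with a state override, so it is only meaningful for bare hostnames (no port, path, or credentials). The function names and the `x-opaque` placeholder scheme are hypothetical.

```typescript
// Illustrative sketch only; names and the placeholder scheme are hypothetical.
// Any non-special scheme works here; it stands in for "no |protocolValue| given".
const PLACEHOLDER_SCHEME = "x-opaque";

function canonicalizeHostname(value: string, protocolValue?: string): string {
  if (value === "") return value;
  const scheme = protocolValue || PLACEHOLDER_SCHEME;
  // new URL() throws a TypeError when the host cannot be parsed, which
  // lines up with the "throw a TypeError" step of the algorithm.
  return new URL(`${scheme}://${value}`).hostname;
}

function canonicalizeDomainName(value: string): string {
  // A special scheme makes the parser treat the host as a domain
  // (lowercasing, IDNA processing) rather than as an opaque host.
  return canonicalizeHostname(value, "https");
}
```

Under this approximation, `canonicalizeDomainName("EXAMPLE.com")` gives `example.com`, while `canonicalizeHostname("EXAMPLE.com")` keeps the original case because the placeholder scheme is not special.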

<div algorithm>
To <dfn>canonicalize an IPv6 hostname</dfn> given a string |value|:

@@ -1870,7 +1879,7 @@ To <dfn>convert a modifier to a string</dfn> given a [=part/modifier=] |modifier
1. If |init|["{{URLPatternInit/protocol}}"] [=map/exists=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=process protocol for init=] given |init|["{{URLPatternInit/protocol}}"] and |type|.
1. If |init|["{{URLPatternInit/username}}"] [=map/exists=], then set |result|["{{URLPatternInit/username}}"] to the result of [=process username for init=] given |init|["{{URLPatternInit/username}}"] and |type|.
1. If |init|["{{URLPatternInit/password}}"] [=map/exists=], then set |result|["{{URLPatternInit/password}}"] to the result of [=process password for init=] given |init|["{{URLPatternInit/password}}"] and |type|.
- 1. If |init|["{{URLPatternInit/hostname}}"] [=map/exists=], then set |result|["{{URLPatternInit/hostname}}"] to the result of [=process hostname for init=] given |init|["{{URLPatternInit/hostname}}"] and |type|.
+ 1. If |init|["{{URLPatternInit/hostname}}"] [=map/exists=], then set |result|["{{URLPatternInit/hostname}}"] to the result of [=process hostname for init=] given |init|["{{URLPatternInit/hostname}}"], |result|["{{URLPatternInit/protocol}}"], and |type|.
1. If |init|["{{URLPatternInit/port}}"] [=map/exists=], then set |result|["{{URLPatternInit/port}}"] to the result of [=process port for init=] given |init|["{{URLPatternInit/port}}"], |result|["{{URLPatternInit/protocol}}"], and |type|.
1. If |init|["{{URLPatternInit/pathname}}"] [=map/exists=]:
1. Set |result|["{{URLPatternInit/pathname}}"] to |init|["{{URLPatternInit/pathname}}"].
@@ -1936,10 +1945,12 @@ To <dfn>convert a modifier to a string</dfn> given a [=part/modifier=] |modifier
</div>

<div algorithm>
- To <dfn>process hostname for init</dfn> given a string |value| and a string |type|:
+ To <dfn>process hostname for init</dfn> given a string |hostnameValue|, a string |protocolValue|, and a string |type|:

- 1. If |type| is "`pattern`" then return |value|.
- 1. Return the result of running [=canonicalize a hostname=] given |value|.
+ 1. If |type| is "`pattern`" then return |hostnameValue|.
+ 1. If |protocolValue| is a [=special scheme=] or the empty string, then return the result of running [=canonicalize a domain name=] given |hostnameValue|.
+ <p class="note">If the |protocolValue| is the empty string then no value was provided for {{URLPatternInit/protocol}} in the constructor dictionary. Normally we do not special case empty string dictionary values, but in this case we treat it as a [=special scheme=] in order to default to the most common hostname canonicalization.</p>
+ 1. Return the result of running [=canonicalize a hostname=] given |hostnameValue|.
</div>
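
Review note: the revised [=process hostname for init=] steps can be sketched in TypeScript as below. The two canonicalizers are injected as parameters so the sketch stays independent of any particular URL parser. The names are hypothetical, and the list of special schemes is taken from the WHATWG URL Standard, not from this diff.

```typescript
// Illustrative sketch only; all names are hypothetical.
// Special schemes as listed in the WHATWG URL Standard.
const SPECIAL_SCHEMES = new Set(["ftp", "file", "http", "https", "ws", "wss"]);

type Canonicalizer = (value: string) => string;

function processHostnameForInit(
  hostnameValue: string,
  protocolValue: string,
  type: "pattern" | "url",
  canonicalizeDomainName: Canonicalizer,
  canonicalizeHostname: Canonicalizer
): string {
  // Pattern strings are returned untouched; they are canonicalized later,
  // when the component is compiled.
  if (type === "pattern") return hostnameValue;
  // An empty protocol is treated like a special scheme, per the note above.
  if (protocolValue === "" || SPECIAL_SCHEMES.has(protocolValue)) {
    return canonicalizeDomainName(hostnameValue);
  }
  return canonicalizeHostname(hostnameValue);
}
```

For example, with a lowercasing function standing in for the domain canonicalizer and the identity function for the opaque one, `processHostnameForInit("EXAMPLE.com", "", "url", ...)` takes the domain-name path, whereas a protocol of `"foo"` falls through to the plain hostname path.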

<div algorithm>
@@ -2114,8 +2125,9 @@ Ralph Chelala,
Sangwhan Moon,
Sayan Pal,
Victor Costan,
- Yoshisato Yanagisawa, and
- Youenn Fablet
+ Yoshisato Yanagisawa,
+ Youenn Fablet, and
+ Yves-Marie K. Rinquin
for their contributions to this specification.

Special thanks to Blake Embrey and the other [pillarjs/path-to-regexp](https://github.com/pillarjs/path-to-regexp) [contributors](https://github.com/pillarjs/path-to-regexp/graphs/contributors) for building an excellent open source library that so many have found useful.