From 70925237a88d9802bfe7224fe9c78b146af615be Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
The term empty, when used for an attribute value, Text
node, or
- string, means that the length of the text is zero (i.e. not even containing spaces or control
- characters).
The term empty, when used for an attribute value, Text
node,
+ or string, means that the length of the text is zero (i.e., not even containing controls or U+0020 SPACE).
An element's child text content is the concatenation of the data of all the Text
nodes that are children of the
@@ -2369,9 +2369,11 @@ a.setAttribute('href', 'https://example.com/'); // change the content attribute
character
This is not to be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the Unicode.txt
data file.
The control characters are those whose Unicode "General_Category" property has the
- value "Cc" in the Unicode UnicodeData.txt
data file.
Some of the micro-parsers described below follow the pattern of having an input
@@ -10532,9 +10531,8 @@ console.assert(image.height === 200);
whitespace).
Text
nodes and attribute values must consist of scalar
- values, must not contain U+0000 characters, must not contain permanently undefined
- characters (noncharacters), and must not contain control characters other than
- ASCII whitespace.
+ values, excluding noncharacters, and controls other than ASCII whitespace.
U+000E to U+001F,
- U+007F to U+009F, U+FDD0 to U+FDEF, and
- characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE,
- U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF,
- U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
- U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all
- control characters or permanently undefined characters (noncharacters).
Any character that is a not a scalar value, i.e. any isolated
- surrogate, is a parse error. (These can only find their way into the input stream via
- script APIs such as document.write().)
+ Any occurrences of surrogates, noncharacters, or controls other than
+ ASCII whitespace are parse errors.
+
+ Isolated surrogates can only find their way into the input stream via script APIs
+ such as document.write().
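For reviewers, the reworded rule can be checked against the removed code point list with a small sketch. This is an illustration only, not part of the patch: the helper names (`isSurrogate`, `isNoncharacter`, `isControl`, `isParseErrorCodePoint`, `findParseErrors`) are hypothetical, the control ranges assume the "Cc" definition quoted earlier in this diff, and the ASCII whitespace set is assumed to be U+0009, U+000A, U+000C, U+000D, and U+0020.

```javascript
// Assumption: ASCII whitespace is TAB, LF, FF, CR, and SPACE.
const ASCII_WHITESPACE = new Set([0x09, 0x0a, 0x0c, 0x0d, 0x20]);

// Surrogates: U+D800 to U+DFFF.
function isSurrogate(cp) {
  return cp >= 0xd800 && cp <= 0xdfff;
}

// Noncharacters: U+FDD0 to U+FDEF, plus the last two code points of
// every plane (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF).
function isNoncharacter(cp) {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Controls, assuming General_Category "Cc":
// U+0000 to U+001F and U+007F to U+009F.
function isControl(cp) {
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

// Literal reading of the new sentence. Note that this also flags U+0000,
// which the removed range list did not mention explicitly.
function isParseErrorCodePoint(cp) {
  return (
    isSurrogate(cp) ||
    isNoncharacter(cp) ||
    (isControl(cp) && !ASCII_WHITESPACE.has(cp))
  );
}

// Walk a string by code point and collect every offending position.
// An isolated surrogate is reported as its own code unit value.
function findParseErrors(input) {
  const errors = [];
  for (let i = 0; i < input.length; ) {
    const cp = input.codePointAt(i);
    if (isParseErrorCodePoint(cp)) {
      errors.push({ index: i, codePoint: cp });
    }
    i += cp > 0xffff ? 2 : 1; // astral code points occupy two code units
  }
  return errors;
}
```

For example, `findParseErrors("a\u000Bb\uD800c\td\uFFFE").map(e => e.index)` yields `[1, 3, 7]`: the U+000B, the isolated surrogate, and the noncharacter are reported, while the tab is not.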
U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF) characters are treated specially. Any LF character that immediately follows a CR character must be ignored, and all CR