diff --git a/src/main/xml/bibliography.xml b/src/main/xml/bibliography.xml index 183195e..ca1bf49 100644 --- a/src/main/xml/bibliography.xml +++ b/src/main/xml/bibliography.xml @@ -341,6 +341,12 @@ Internationalized Resource Identifiers (IRIs). M. Duerst and M. Suignard, editors. Internet Engineering Task Force. January, 2005. +RFC 8259 +RFC 8259: +The JavaScript Object Notation (JSON) Data Interchange Format. +T. Bray, editor. +Internet Engineering Task Force. December, 2017. + Unicode TR#17 Unicode Technical Report #17: Character Encoding Model. diff --git a/steps/src/main/xml/references.xml b/steps/src/main/xml/references.xml index bf3efb3..0c4212a 100644 --- a/steps/src/main/xml/references.xml +++ b/steps/src/main/xml/references.xml @@ -22,6 +22,7 @@ + diff --git a/steps/src/main/xml/steps/load.xml b/steps/src/main/xml/steps/load.xml index 76dbe54..c347fdb 100644 --- a/steps/src/main/xml/steps/load.xml +++ b/steps/src/main/xml/steps/load.xml @@ -102,18 +102,86 @@ the processor does not support DTD validation. document is an XPath data model document consisting of a single text node.) It is a dynamic error if the - specifies an encoding, which is not supported -by the processor. + specifies a charset (sometimes called the +character encoding) that is not supported by the processor. Text parameters are implementation-defined. +
+Byte order marks + +UTF-8 and UTF-16 inputs can begin with a byte order mark. The byte order +mark is not considered part of the text and is not included in the text +document. + +In order to identify the byte order mark, it is first necessary to +identify the charset that is being +used. The charset of an external resource is determined as follows: + + + +external charset information is used if available (for example, if the +resource is loaded with HTTP or HTTPS and the server provided a charset), otherwise + + + +the charset from the content type is used if specified, otherwise + + + +the processor may use implementation-defined heuristics to determine the +likely charset, otherwise + + + +UTF-8 is assumed. + + + +Processors must support UTF-8 and UTF-16. Support +for other charsets is implementation-defined. + +If the encoding is UTF-8, UTF-16, UTF-16LE, or UTF-16BE, a byte order mark +may be present. (For any other encoding, there is no byte order mark and all of the text +is returned.) + + + +If the encoding is UTF-8 and the document begins with the bytes +EF BB BF, those bytes are the byte order mark. They are discarded. + + +If the encoding is UTF-16 and the document begins with the bytes +FE FF or FF FE, those bytes are the byte order mark. They are discarded. + + +If the encoding is UTF-16LE and the document begins with the bytes +FF FE, those bytes are the byte order mark. They are discarded. + + +If the encoding is UTF-16BE and the document begins with the bytes +FE FF, those bytes are the byte order mark. They are discarded. + + +If the encoding isn’t specified, but the file begins with a byte +order mark (FE FF or FF FE), treat the charset as UTF-16 and +discard the byte order mark. + + +Otherwise, there is no byte order mark and nothing is discarded. + + + + +
Loading JSON data -For a JSON media type, the content is loaded and parsed as JSON. +For a JSON media type, the content is loaded and parsed as JSON +. The parameters specified for the fn:parse-json function in @@ -133,6 +201,15 @@ map contains an entry whose key is defined in the specification of fn:parse-json and whose value is not valid for that key, or if it contains an entry with the key fallback when the parameter with true() is also present. + +
+Byte order mark + +JSON data transmitted with the UTF-8 encoding may begin with a byte order +mark. If it does, the byte order mark is discarded before parsing the +input. +
+
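
Reviewer note: the byte order mark rules added to load.xml above could be sketched roughly as follows. This is only an illustration of one reading of the new text, not the processor's implementation. The function name `strip_bom` and the `BOMS` table are hypothetical, and falling back to UTF-8 when no charset is known and no UTF-16 mark is found is an assumption taken from the last item of the charset determination list.

```python
from typing import Optional, Tuple

# Byte order marks per charset, as enumerated in the added spec text.
# Keys are lower-cased charset names (a normalization chosen for this sketch).
BOMS = {
    "utf-8": (b"\xef\xbb\xbf",),
    "utf-16": (b"\xfe\xff", b"\xff\xfe"),
    "utf-16le": (b"\xff\xfe",),
    "utf-16be": (b"\xfe\xff",),
}


def strip_bom(data: bytes, charset: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (charset, data without a leading byte order mark).

    `charset` is whatever the earlier determination steps produced,
    or None if nothing was specified.
    """
    if charset is None:
        # Unspecified charset: a leading FE FF or FF FE means UTF-16.
        for bom in BOMS["utf-16"]:
            if data.startswith(bom):
                return "utf-16", data[len(bom):]
        # Assumption: otherwise fall back to UTF-8, per the determination list.
        name = "utf-8"
    else:
        name = charset.strip().lower()

    # Only the four listed charsets can carry a byte order mark;
    # for any other charset all bytes are kept.
    for bom in BOMS.get(name, ()):
        if data.startswith(bom):
            return name, data[len(bom):]
    return name, data
```

Under this reading, a big-endian mark in front of data declared as UTF-16LE is left in place, because only `FF FE` counts as the byte order mark for that charset.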