diff --git a/src/oas.md b/src/oas.md index e91c9cb08a..535167f42c 100644 --- a/src/oas.md +++ b/src/oas.md @@ -309,7 +309,7 @@ Using a `contentEncoding` of `base64url` ensures that URL encoding (as required The `contentMediaType` keyword is redundant if the media type is already set: -* as the key for a [MediaType Object](#media-type-object) +* as the key for a [Media Type Object](#media-type-object) * in the `contentType` field of an [Encoding Object](#encoding-object) If the [Schema Object](#schema-object) will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored. @@ -1257,6 +1257,8 @@ See [Working With Examples](#working-with-examples) for further guidance regardi This object MAY be extended with [Specification Extensions](#specification-extensions). +Note that correlating Encoding Objects with Schema Objects may require [schema searches](#searching-schemas) for keywords such as `properties`, `prefixItems`, and `items`. + See also the [Media Type Registry](#media-type-registry). ##### Complete vs Streaming Content @@ -1639,7 +1641,7 @@ These fields MAY be used either with or without the RFC6570-style serialization | Field Name | Type | Description | | ---- | :----: | ---- | -| contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). Default value depends on the property type as shown in the table below. | +| contentType | `string` | The `Content-Type` for encoding a specific property. The value is a comma-separated list, each element of which is either a specific media type (e.g. `image/png`) or a wildcard media type (e.g. `image/*`). 
Default value depends on the type (determined by a [schema search](#searching-schemas)) as shown in the table below. | | headers | Map[`string`, [Header Object](#header-object) \| [Reference Object](#reference-object)] | A map allowing additional information to be provided as headers. `Content-Type` is described separately and SHALL be ignored in this section. This field SHALL be ignored if the media type is not a `multipart`. | This object MAY be extended with [Specification Extensions](#specification-extensions). @@ -2599,6 +2601,10 @@ Note that JSON Schema Draft 2020-12 does not require an `x-` prefix for extensio The [`format` keyword (when using default format-annotation vocabulary)](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.2.1) and the [`contentMediaType`, `contentEncoding`, and `contentSchema` keywords](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-8.2) define constraints on the data, but are treated as annotations instead of being validated directly. Extended validation is one way that these constraints MAY be enforced. +In addition to extended validation, annotations are the most effective way to determine whether these keywords impact the type and structure of the fully parsed data. +For example, formats such as `int64` can be applied to JSON strings, as JSON numbers have limitations that make large integers non-portable. +If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and MUST document the limitations this imposes. + ###### Validating `readOnly` and `writeOnly` The `readOnly` and `writeOnly` keywords are annotations, as JSON Schema is not aware of how the data it is validating is being used. 
@@ -2611,6 +2617,108 @@ Even when read-only fields are not required, stripping them is burdensome for cl Note that the behavior of `readOnly` in particular differs from that specified by version 3.0 of this specification. +##### Working with Schemas + +In addition to schema evaluation, which encompasses both validation and annotation, some OAS features require inspecting schemas in other ways. + +###### Preparing Data for Schema Evaluation + +When the data source is a JSON document, preparing the data is trivial as parsing JSON produces a suitable data structure. +Some other media types, as well as URL components and header values, lack sufficient type information to parse directly to suitable data types. + +Consider this URL-encoded form: + +```uri +foo=42&bar=42 +``` + +As URL query parameters are strings, this would naturally parse to something equivalent to the following JSON: + +```json +{ + "foo": "42", + "bar": "42" +} +``` + +But consider this [Media Type Object](#media-type-object) for the form: + +```yaml +application/x-www-form-urlencoded: + schema: + type: object + properties: + foo: + type: string + bar: + type: integer +``` + +From the `schema` field, we can tell that the correct data structure would actually be equivalent to: + +```json +{ + "foo": "42", + "bar": 42 +} +``` + +In order to prepare the correct data structure for evaluation in such cases, implementations MUST perform a [schema search](#searching-schemas) for the `type` keyword. + +###### Applying Further Type Information + +The `format` keyword provides more fine-grained type information, and can even change the underlying data type for the purposes of the application. +For example, if `foo` had the schema `{"type": "string", "format": "int64"}`, the data structure used for validation would still be the same, but the application will need to convert the string `"42"` to the 64-bit integer `42`. +Similarly, the `content*` keywords can indicate further structure within a string. 
+ +Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and MUST document which approach they implement. + +Note that parsing string contents based on `contentMediaType` carries the same security risks as parsing HTTP message bodies based on `Content-Type`; see [Handling External Resources](#handling-external-resources) for further information. + +###### Schema Evaluation and Binary Data + +Few JSON Schema implementations directly support working with binary data, as doing so is not a mandatory part of that specification. + +OAS Implementations that do not have access to a binary-instance-supporting JSON Schema implementation MUST examine schemas and apply them in accordance with [Working with Binary Data](#working-with-binary-data). +When the entire instance is binary, this is straightforward as few keywords are relevant. + +However, `multipart` media types can mix binary and text-based data, leaving implementations with two options for schema evaluations: + +1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords). +2. Perform [schema searches](#searching-schemas) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data. + +Implementations MUST document which strategy or strategies they use, as well as any known limitations. + +##### Searching Schemas + +Several OAS features require searching Schema Objects for keywords indicating the data type and/or structure. 
+Each feature that needs such a search documents which keywords or structures need to be found. + +Even if the requirement is given in terms of schema keywords, if the data is in a form [suitable for schema evaluation](#preparing-data-for-schema-evaluation) and the necessary information (including type) can be determined by inspecting the data (and possibly also annotations such as `format`), implementations MUST support doing so as this is effective regardless of how schemas are structured. + +If this is not possible, the schemas MUST be searched to see if the information can be determined without performing evaluation. +As schema organization can become very complex, implementations are not expected to handle every possible schema layout. +However, given a known starting point schema (usually the value of the nearest `schema` field), implementations MUST search the following for the relevant keywords, which vary depending on the use case but might include `type`, `format`, `contentMediaType`, `properties`, `prefixItems`, `items`, etc.: + +* The starting point schema itself +* Any schema reachable from there solely through `$ref` and/or `allOf` + +These schemas are guaranteed to be applied to any instance. + +In some cases, such as correlating [Encoding Objects](#encoding-object) with Schema Objects using fields in a [Media Type Object](#media-type-object), it is necessary to first find a keyword such as `properties`, and then treat its subschema(s) as starting point schemas for further searches. + +Implementations MAY analyze subschemas of other keywords such as `oneOf` or `dependentSchemas`, or examine possible `$dynamicRef` targets, and MUST document the extent and nature of any such additional support. + +###### Handling Multiple Types + +When searching for `type`, if the `type` keyword has multiple values, one of which is `"null"` (e.g. 
`type: ["number", "null"]`), the non-null type MUST be treated as the relevant type if a single type is needed to determine behavior. + +For other multi-valued `type` keywords, the behavior is implementation-defined but MUST either follow a documented process or be documented to produce an informative error. + +If an implementation supports handling multi-valued `type` keywords for type searches, it SHOULD attempt to use non-`"string"` types before using `"string"` (if `"string"` is one of the types) as all current type interpretation use cases involve data stored in string form by default. + +Implementations MAY treat the order of types in the `type` keyword as significant, except when it conflicts with the above requirements. + ##### Data Modeling Techniques ###### Composition and Inheritance (Polymorphism)
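
For illustration only (this is not part of the patch), the minimum search described under "Searching Schemas" could be sketched roughly as below, applied to the `foo=42&bar=42` form example. All function names (`resolve_ref`, `guaranteed_schemas`, `find_keyword`, `single_type`, `parse_form`) are hypothetical. The sketch assumes schemas are already parsed into Python dicts, that every `$ref` is a same-document JSON Pointer fragment (no `$id`-relative or percent-encoded references), and that raising an error is the documented behavior for multi-valued `type` keywords other than the `"null"` case.

```python
from urllib.parse import parse_qsl


def resolve_ref(root, ref):
    """Resolve a same-document reference such as '#/components/schemas/Foo'.

    Assumption: only plain JSON Pointer fragments are handled here.
    """
    if not ref.startswith("#"):
        raise ValueError(f"only same-document references are handled: {ref}")
    node = root
    for token in ref[1:].split("/")[1:]:
        # JSON Pointer unescaping: "~1" is "/", then "~0" is "~".
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def guaranteed_schemas(root, start):
    """Yield the starting schema and every schema reachable from it
    solely through `$ref` and/or `allOf`."""
    stack, seen = [start], set()
    while stack:
        schema = stack.pop()
        if not isinstance(schema, dict) or id(schema) in seen:
            continue  # skip boolean schemas and reference cycles
        seen.add(id(schema))
        yield schema
        if "$ref" in schema:
            stack.append(resolve_ref(root, schema["$ref"]))
        stack.extend(schema.get("allOf", []))


def find_keyword(root, start, keyword):
    """Collect every value of `keyword` found in the guaranteed schemas."""
    return [s[keyword] for s in guaranteed_schemas(root, start) if keyword in s]


def single_type(type_value):
    """Reduce a `type` value to a single type, ignoring "null"."""
    if isinstance(type_value, str):
        return type_value
    non_null = [t for t in type_value if t != "null"]
    if len(non_null) == 1:
        return non_null[0]
    # Assumed policy: other multi-valued types produce an informative error.
    raise ValueError(f"ambiguous multi-valued type: {type_value}")


def parse_form(root, schema, body):
    """Parse a URL-encoded form, coercing values using searched `type`s."""
    properties = {}
    for props in find_keyword(root, schema, "properties"):
        properties.update(props)
    result = {}
    for name, raw in parse_qsl(body):
        # Each property subschema becomes a new starting point schema.
        types = find_keyword(root, properties.get(name, {}), "type")
        kind = single_type(types[0]) if types else "string"
        if kind == "integer":
            result[name] = int(raw)
        elif kind == "number":
            result[name] = float(raw)
        elif kind == "boolean":
            result[name] = raw == "true"
        else:
            result[name] = raw
    return result


doc = {"components": {"schemas": {"Bar": {"type": ["integer", "null"]}}}}
schema = {
    "allOf": [
        {"type": "object", "properties": {"foo": {"type": "string"}}},
        {"properties": {"bar": {"$ref": "#/components/schemas/Bar"}}},
    ]
}
print(parse_form(doc, schema, "foo=42&bar=42"))
```

The sketch deliberately ignores `oneOf`, `dependentSchemas`, and `$dynamicRef`, which the text leaves optional, and it uses only the first `type` value found rather than reconciling several.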