Sequential (Streaming) media types and link to registry #4518
@@ -84,6 +84,131 @@ Some examples of possible media type definitions:

```
  application/vnd.github.v3.patch
```

#### Media Type Registry

While the [Schema Object](#schema-object) is designed to describe and validate JSON, several other media types are commonly used in APIs.
Requirements regarding support for other media types are documented in this Media Types section and in several Object sections later in this specification.
For convenience and future extensibility, these are cataloged in the OpenAPI Initiative's [Media Type Registry](https://spec.openapis.org/registry/media-type/), which indicates where in this specification the relevant requirements can be found.

#### Sequential Media Types

Several media types exist to transport a sequence of values, separated by some delimiter, either as a single document or as multiple documents representing chunks of a logical stream.
Depending on the media type, the values could either be in another existing format such as JSON, or in a custom format specific to the sequential media type.

Implementations MUST support modeling sequential media types with the [Schema Object](#schema-object) by treating the sequence as an array with the same items and ordering as the sequence.
This requirement applies to the in-memory data structure corresponding to a sequential media type document, and does not change the behavior or restrict the capabilities of the Schema Object itself.

##### Working With Indefinite-Length Streams

In addition to regular document-style use, sequential media types can be used to represent some portion of a stream that may not have a well-defined beginning or end.
In such use cases, either the client or server makes a decision to work with one or more elements in the sequence at a time, but this subsequence is not a complete array in the sense of normal JSON arrays.

OpenAPI Description authors are responsible for avoiding the use of JSON Schema keywords such as `prefixItems`, `minItems`, `maxItems`, `contains`, `minContains`, or `maxContains` that rely on a beginning (for relative positioning) or an ending (to determine if a threshold has been reached or a limit has been exceeded) when the sequence is intended to represent a subsequence of a larger stream.
If such keywords are used, their behavior remains well-defined but may be counter-intuitive for users that expect them to apply to the stream as a whole rather than each subsequence as it is processed.
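
To make the subsequence behavior above concrete, here is a minimal Python sketch that reads a JSON Lines source a few items at a time. The helper names `iter_items` and `iter_subsequences` and the batch size are illustrative assumptions, not part of this specification or of any particular tool.

```python
import json
from itertools import islice


def iter_items(lines):
    """Yield one parsed JSON value per non-blank line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def iter_subsequences(lines, size):
    """Yield successive lists of up to `size` items (hypothetical batching policy)."""
    items = iter_items(lines)
    while True:
        batch = list(islice(items, size))
        if not batch:
            return
        yield batch


# Each batch is what a schema would be applied to, not the stream as a whole.
batches = list(iter_subsequences(["1", "2", "3", "4", "5"], 2))
```

Under this sketch, a keyword such as `maxItems: 2` would be evaluated against each two-item batch and would pass, even though the stream carried five items in total, which is the counter-intuitive result the text warns about.
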
I personally wonder if this trade-off in slight confusion is worth it. The modelling of jsonl/sse in OpenAPI I've personally seen has always been for an indefinite-length stream, and I feel it might be a bit confusing for OAS authors and tool vendors to represent those as a An alternative modelling is to have the schema model purely the JSON within the stream, and to validate the
This approach has a few advantages for both JSONL and SSE. For JSONL, it:
Note E.g. as Speakeasy, one of the client SDK generators as we convert the schema into a native type in each language, with An alternative modelling that supports/indicates a finite length JSONL response (note: we haven't actually seen any of these APIs yet, but my variant proposal otherwise closes the door on them) could be to represent that information within a new entry under the media type object, perhaps by following the example set by the
For SSE, there are also advantages. Consider the special data types for
By continuing to represent the stream this way, we could open the door to richer modelling of the top level properties to also fit into the "encoding" object. E.g. consider the "sentinel" event; something that's become popularised by the AI/LLM APIs by sending
By modelling it as

@ThomasRooney first, let me apologize for not tagging you in the original PR comment- I knew I was missing someone! I'm going to take a while to think through this further, and also tag @gregsdennis who asked about this direction on Slack. For now, I'll just state a few important principles that are guiding me here:

##### Sequential JSON

For any media type where the items in the sequence are JSON values, no conversion beyond treating the sequence as an array is required.
JSON Text Sequences (`application/json-seq` and the `+json-seq` suffix, [[?RFC7464]]), JSON Lines (`application/jsonl`), and NDJSON (`application/x-ndjson`) are all in this category.
Note that the media types for JSON Lines and NDJSON are not registered with the IANA, but are in common use.
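
As a rough illustration of "treating the sequence as an array", the following Python sketch turns each of these formats into a list. The function names are hypothetical, and the parsing is deliberately simplified: for example, it does not implement the truncation-recovery behavior that RFC 7464 describes for `application/json-seq`.

```python
import json

# Record separator that precedes each value in application/json-seq (RFC 7464).
RS = "\x1e"


def parse_json_lines(text):
    """Parse JSON Lines / NDJSON text into a list, one value per non-blank line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def parse_json_seq(text):
    """Parse an application/json-seq document into a list of values."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]
```

Either function returns an ordinary list with the same items and ordering as the document, which is the data structure a Schema Object would then be applied to.
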

The following example, which uses `application/json-seq` but would be identical aside from the media type for either `application/jsonl` or `application/x-ndjson`, models a finite stream consisting of a single metadata document followed by an indefinite number of data documents consisting of numeric measurements with units:

```YAML
content:
  application/json-seq:
    schema:
      type: array
      prefixItems:
      - $comment: Metadata for all subsequent data documents
        type: object
        required:
        - subject
        - dateCollected
        properties:
          subject:
            type: string
          dateCollected:
            type: string
            format: date-time
      items:
        $comment: A JSON document holding data
        type: object
        required:
        - measurement
        - unit
        properties:
          measurement:
            type: number
          unit:
            type: string
```
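
To show what the schema above asks of the in-memory array, here is a hand-rolled Python check of the same shape. It is a sketch only, not a JSON Schema validator: the function names are hypothetical, and `format: date-time` is not checked.

```python
def is_metadata(doc):
    """First item: an object with string `subject` and `dateCollected`."""
    return (
        isinstance(doc, dict)
        and isinstance(doc.get("subject"), str)
        and isinstance(doc.get("dateCollected"), str)
    )


def is_measurement(doc):
    """Later items: an object with numeric `measurement` and string `unit`."""
    if not isinstance(doc, dict):
        return False
    value = doc.get("measurement")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and isinstance(doc.get("unit"), str)


def check_sequence(items):
    """Check the whole sequence, already converted to a list."""
    if not isinstance(items, list):
        return False
    if not items:
        # `prefixItems` does not by itself require the first item to exist.
        return True
    return is_metadata(items[0]) and all(is_measurement(d) for d in items[1:])
```

The check only makes sense when `items` starts at the true beginning of the stream, since the first element is treated as the metadata document.
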

##### Server-Sent Event Streams

The `text/event-stream` media type from the [HTML specification](https://html.spec.whatwg.org/multipage/iana.html#text/event-stream), which is also not IANA-registered, uses a custom named field format for its items.
Field names can be repeated within an item to allow splitting the value across multiple lines; such split values MUST be treated the same as if they were a single field, with newlines added as required by the `text/event-stream` specification.

Field value types MUST be handled as specified by the `text/event-stream` specification (e.g. the `retry` field value is modeled as a JSON number that is expected to be of JSON Schema `type: integer`), and fields not given an explicit value type MUST be handled as strings.

The `text/event-stream` specification requires that fields with unknown names, as well as `id` fields where the value contains `U+0000 NULL`, be ignored.
These fields SHOULD NOT be present in the data used with the Schema Object.

For example, the following `text/event-stream` document:

```EVENTSTREAM
event: add
data: This data is formatted
data: across two lines
retry: 5

event: add
data: 1234.5678
unknown-field: this is ignored
```

is equivalent to this JSON instance for the purpose of working with the Schema Object:

```JSON
[
  {
    "event": "add",
    "data": "This data is formatted\nacross two lines",
    "retry": 5
  },
  {
    "event": "add",
    "data": "1234.5678"
  }
]
```

Note that `"1234.5678"` is a string, which avoids problems with number sizes and precision.
See [Data Type Format](#data-type-format) for options for handling numbers transported as strings.
Note also the newline inserted in the string in the first entry, and the absence of the field labeled `unknown-field` in the second entry.
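
The conversion described above can be sketched in Python as follows. This is a simplified illustration rather than a complete `text/event-stream` parser: the names `parse_event_stream` and `KNOWN_FIELDS` are hypothetical, and, to match the example above, an item that is not followed by a blank line at the end of the document is still included.

```python
import re

KNOWN_FIELDS = {"event", "data", "id", "retry"}


def parse_event_stream(text):
    """Convert a text/event-stream document into a list of dicts."""
    events = []
    current = {}
    data_lines = []

    def flush():
        nonlocal current, data_lines
        if data_lines:
            # Repeated `data` fields are joined with newlines.
            current["data"] = "\n".join(data_lines)
        if current:
            events.append(current)
        current, data_lines = {}, []

    for line in re.split(r"\r\n|\n|\r", text):
        if line == "":
            flush()  # a blank line ends the current item
            continue
        if line.startswith(":"):
            continue  # comment line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional leading space
        if name not in KNOWN_FIELDS:
            continue  # unknown field names are ignored
        if name == "data":
            data_lines.append(value)
        elif name == "retry":
            if value.isascii() and value.isdigit():
                current["retry"] = int(value)
        elif name == "id":
            if "\x00" not in value:
                current["id"] = value
        else:
            current["event"] = value
    flush()
    return events
```

The returned list is the array that the Schema Object is applied to; `data` values stay strings, and only `retry` is converted to an integer.
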

The following Schema Object is a generic schema for the `text/event-stream` media type as documented by the HTML specification as of the time of this writing:

```YAML
type: array
items:
  type: object
  required:
  - data
  properties:
    data:
      type: string
    event:
      type: string
    id:
      type: string
    retry:
      type: integer
```

Some users of `text/event-stream` use a format such as JSON for field values, particularly the `data` field.
Use JSON Schema's keywords for working with the [contents of string-encoded data](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#name-a-vocabulary-for-the-conten), particularly `contentMediaType` and `contentSchema`, to describe and validate such fields with more detail than string-related validation keywords such as `pattern` can support.
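
As a small illustration of string-encoded content, the Python sketch below decodes a `data` value that is assumed to carry JSON, as a schema with `contentMediaType: application/json` would suggest. The helper name `decode_data` and the sample item are hypothetical; the item itself keeps `data` as a string, and only the decoded copy would be checked against a `contentSchema`.

```python
import json


def decode_data(item):
    """Return (True, value) if `data` holds valid JSON, else (False, None)."""
    try:
        return True, json.loads(item["data"])
    except (KeyError, TypeError, ValueError):
        return False, None


# Hypothetical item whose `data` field carries a JSON document as a string.
item = {"event": "add", "data": '{"measurement": 1.5, "unit": "m"}'}
ok, decoded = decode_data(item)
```

Keeping the decoded value separate from the item reflects that these keywords describe the contents of the string rather than changing the type of the `data` field.
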

### HTTP Status Codes

The HTTP Status Codes are used to indicate the status of the executed operation.

This wording is confusing to me, and doesn't seem to reflect the requirement that the Schema Object modeling the sequence must itself be of `type: array`.

There is no requirement that the Schema Object include `type: array`, although it would be a good practice.
What we're talking about here is not so much what to put in the Schema Object, but what data structure to convert the document to in order to use that data structure with the Schema Object.
Implementations don't get that from the Schema Object, they get that from these requirements, so it would be an error on the part of the implementation to pass anything but an array here. Of course, it's good practice to put the `type: array` in, and if you have other tools that depend on the `type` keyword and aren't paying attention to the media type with which the Schema Object is used, then you have to do that. But there's no requirement for it to be in the Schema Object.

@duncanbeevers my most recent commit (after a force-push that was a re-base of the unchanged original commit) added some clarification here, please see if that helps!