Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra template data conflicts with msgspec discriminated union support #2211

Open
ncoghlan opened this issue Dec 9, 2024 · 4 comments · May be fixed by #2215
Open

Extra template data conflicts with msgspec discriminated union support #2211

ncoghlan opened this issue Dec 9, 2024 · 4 comments · May be fixed by #2215

Comments

@ncoghlan
Copy link

ncoghlan commented Dec 9, 2024

I am attempting to use the msgspec support for the OpenAPI discriminator fields in JSON Schema (as added in da7d054)

I've been reverse engineering the expected usage from the test suite, but have hit a point where the behaviour is so strange it feels like a bug:

  • even though only a handful of union fields have discriminator set, every Struct subclass in the generated file has tag_field="type: and tag=None (even the ones that have nothing to do with the updated union fields)
  • the classes that are legitimately part of the discriminated unions also have tag=None set, and their type field is just Literal["value"] rather than the expected class-var-with-default-value that the test suite says I should be getting

I did notice the test suite used type_ instead of type as its discriminator, so I attempted to set {"type": "type_"} as the alias map, but that didn't make any difference to the output. I can't change the field name in the schema itself, as that's defined by the API I'm writing a client for, not by my client code.

I also couldn't find anything in the main documentation indicating that type was problematic as a field name.

The generate call looks like this (kw_only is set via the extra template data because setting the keyword_only=True call option didn't have any effect):

    generate(
        _SCHEMA_PATH,
        input_file_type=InputFileType.JsonSchema,
        output=_MODEL_PATH,
        output_model_type=DataModelType.MsgspecStruct,
        field_constraints=True,
        use_annotated=True,
        use_double_quotes=True,
        use_subclass_enum=True,
        use_union_operator=True,
        extra_template_data=defaultdict(
            dict,
            {
                "#all#": {
                    "base_class_kwargs": {
                        # "None" fields should be omitted, not sent as "null"
                        "omit_defaults": True,
                        # Allow non-default fields after default fields
                        "kw_only": True,
                    }
                }
            },
        ),
    )

And a snippet of the "definitions" entry in the input schema:

    "chatMessagePartData": {
      "anyOf": [
        {
          "$ref": "#/definitions/chatMessagePartTextData"
        },
        {
          "$ref": "#/definitions/chatMessagePartFileData"
        },
        {
          "$ref": "#/definitions/chatMessagePartToolCallRequestData"
        },
        {
          "$ref": "#/definitions/chatMessagePartToolCallResultData"
        }
      ],
      "discriminator": {
        "mapping": {
          "text": "#/definitions/chatMessagePartTextData",
          "file": "#/definitions/chatMessagePartFileData",
          "toolCallRequest": "#/definitions/chatMessagePartToolCallRequestData",
          "toolCallResult": "#/definitions/chatMessagePartToolCallResultData"
        },
        "propertyName": "type"
      }
    },
    "chatMessagePartFileData": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string",
          "const": "file"
        },
        "name": {
          "type": "string"
        },
        "identifier": {
          "type": "string"
        },
        "sizeBytes": {
          "type": "number"
        },
        "fileType": {
          "$ref": "#/definitions/fileType"
        }
      },
      "required": [
        "type",
        "name",
        "identifier",
        "sizeBytes",
        "fileType"
      ],
      "additionalProperties": false
    },
@ncoghlan
Copy link
Author

ncoghlan commented Dec 9, 2024

I tried an experiment with omitting the extra_template_data argument. That resolved the problem that spurious tag_field and tag entries appeared everywhere (suggesting there may be a dict somewhere in datamodel-code-generator that should be getting copied but isn't), but didn't fix the definitions of the union members. Those are still coming out like:

class ChatMessagePartFileData(Struct, tag_field="type", tag=None):
    type: Literal["file"]
    name: str
    identifier: str
    sizeBytes: float
    fileType: FileType

@ncoghlan
Copy link
Author

ncoghlan commented Dec 10, 2024

I've started work on a test case PR that adds additional tests for the extra data handling, including adding an unrelated type (not involved in the union) to the discriminator literals test case for JSON schema. That confirmed there's a genuine problem with that interaction, at least in the jsonschema-to-msgspec case (the openapi-to-msgspec test cases don't cover discriminators).

The keyword_only=True option appears to be working fine in the test cases, so something else must have been wrong when I tried it to make me think it wasn't working.

@ncoghlan
Copy link
Author

ncoghlan commented Dec 10, 2024

Well, that was a journey. Turns out my initial instinct was almost right. Inadvertent data sharing does happen in the base model constructor, but it's specifically in the self.extra_template_data.update(all_model_extra_template_data) call (which I initially dismissed as being OK, because it's updating a fresh dictionary every time).

The reason it fails is because the values in the "#all# entry may be mutable, as they are for "base_class_kwargs" in the example. The constructor needs to use copy.deepcopy to ensure mutable values aren't shared between model instances.

@ncoghlan ncoghlan changed the title Unclear how to use the msgspec discriminated union support with JSON Schema Conflict between msgspec discriminated union support and extra template data Dec 10, 2024
@ncoghlan ncoghlan changed the title Conflict between msgspec discriminated union support and extra template data Extra template data conflicts with msgspec discriminated union support Dec 10, 2024
@ncoghlan ncoghlan linked a pull request Dec 10, 2024 that will close this issue
@ncoghlan
Copy link
Author

With the patched version from #2215 clearing out the incorrect state sharing errors, I worked out that the other misbehaviours I encountered were due to missing property definition fields in my input schema:

  • title must be present on the discriminator field definitions for the ClassVar annotation to be generated
  • default must be set on the discriminator field definitions for the tag to be set properly (only defining const isn't enough)

I have retitled this issue accordingly (it now just covers the state sharing bug).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant