Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sr/avro: fix missing default compat check for null (v2) #24032

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

pgellert
Copy link
Contributor

@pgellert pgellert commented Nov 6, 2024

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types.

Make it easier to extend by simplifying the definition of a test case.
This adds more tests for the avro incompatibility
`reader_field_missing_default_value`. All of these tests are already
passing, while the tests added in the following commit show the change
in behaviour.
This fixes a bug in the avro compatibility check for fields present in
the reader but missing from the writer. In this case the writer and the
reader are compatible only if the reader field has a specified default
value.

The Avro library sets the default value to `GenericDatum()` which has
type `AVRO_NULL` when the default is not specified. This is what we are
trying to detect in this check. However, the check was confusing an
explicit null default for a union-type as a missing value. The former is
represented as `GenericDatum(Union(Null))` and because of the way
`GenericDatum::type()` delegates to the type of the first union leaf
value when `GenericDatum` contains a union, the check thought that the
value was missing, when it was in fact specified as null.

To solve for the above, we also check that the type is not a union type
to be able to detect when the default is not set. (The same check is
used in the Avro library to detect unset default values, for example,
when printing an Avro schema as JSON.)

The caveat is that we are still overly restrictive for fields of type
null, because in that case, the default value will be `GenericDatum()`
regardless of whether the default value was explicitly set to null or
not. This is a limitation of the Avro library's API and could only be
fixed by making breaking changes to the Avro library's API.

Also note that the checks against `reader.leafAt(...)` have been
removed. They were attempting to compare the type of the field against
the type of the default value. This is not necessary because this is
ensured by the Avro library's built-in schema validation done while
parsing the Avro schema.
@pgellert pgellert self-assigned this Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant