Implement ordering of results (default order and through QueryParameters argument) #171

Open
kevinstadler opened this issue Dec 11, 2024 · 5 comments · May be fixed by #227
Assignees
Labels
enhancement New feature or request priority Needs urgent fix

Comments

@kevinstadler

Ordering of results should be possible based on a single scalar field of the root model.

  1. should be able to specify default order via ConfigDict of root python model
  2. should be specifiable dynamically through a QueryParameter (that can be exposed to FastAPI)
@kevinstadler kevinstadler added enhancement New feature or request priority Needs urgent fix labels Dec 11, 2024
@kevinstadler kevinstadler changed the title Implement ordering (via QueryParameters argument) Implement ordering of results (default order and through QueryParameters argument) Dec 11, 2024
@lu-pl lu-pl removed the priority Needs urgent fix label Dec 11, 2024
@lu-pl lu-pl self-assigned this Dec 17, 2024
@lu-pl lu-pl added the priority Needs urgent fix label Feb 2, 2025
lu-pl added a commit that referenced this issue Feb 6, 2025
lu-pl added a commit that referenced this issue Feb 6, 2025
lu-pl added a commit that referenced this issue Feb 6, 2025
@lu-pl
Contributor

lu-pl commented Feb 6, 2025

Note: If ordering should be possible 1. via query parameters and 2. model_config, then query parameter ordering should take precedence over model_config orderings.

We discussed possible ordering by all scalar fields, i.e. also scalar fields of potentially nested models. This introduces the significant problem of how to handle alias name clashes though. So I am for restricting order-able fields to the top model scalar fields.

It is still possible to order by arbitrary SPARQL bindings through the model then, i.e. by using excluded fields in the top model. See e.g. the ungrouped wikidata example.

lu-pl added a commit that referenced this issue Feb 6, 2025
@lu-pl lu-pl added this to the v0.3.0 release milestone Feb 10, 2025
@lu-pl
Contributor

lu-pl commented Feb 17, 2025

The prototype/ordering branch implements an experimental feature that allows parametrizing the QueryParameters model with a model type. Order-able fields are then computed dynamically and injected into the QueryParameters model as enum.StrEnums, so that they are displayed in the OpenAPI docs and also get validated by FastAPI.

E.g. given a model

class Work(BaseModel):
    model_config = ConfigDict(group_by="name")

    name: Annotated[str, SPARQLBinding("workName")]
    viafs: Annotated[list[str], SPARQLBinding("viaf")]


class Author(BaseModel):
    model_config = ConfigDict(group_by="gnd")

    gnd: str
    surname: Annotated[str, SPARQLBinding("nameLabel")]
    works: list[Work]

one could parametrize QueryParameters like so:

@app.get("/")
def base_route(
    query_parameters: Annotated[QueryParameters[Author], Query()],
) -> Page[Author]:
    return adapter.query(query_parameters)

This would yield the following additional OpenAPI docs field:

[Image: the additional order-by field in the OpenAPI docs]

See the initial draft of this: https://gist.github.com/lu-pl/718ebf86f5b81f68f2f88005c3663be6

Note: The most recent implementation does not use a metaclass for this anymore but the metaclass-ish __class_getitem__ hook in the mesaclass.
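
The mechanism can be sketched without FastAPI or Pydantic. The following is a minimal, stdlib-only illustration and not the code on the prototype/ordering branch: `QueryParams`, `SCALAR_TYPES` and the plain `Author` class are hypothetical stand-ins, and a str-mixin `Enum` stands in for `enum.StrEnum` so that the sketch also runs on Python versions before 3.11.

```python
from enum import Enum
from typing import get_type_hints

# Assumption: only these types count as scalar (order-able) in this sketch.
SCALAR_TYPES = (str, int, float, bool)


class Author:
    """Plain stand-in for the Pydantic model; only the annotations matter."""

    gnd: str
    surname: str
    works: list


class QueryParams:
    """Hypothetical stand-in for rdfproxy's QueryParameters."""

    OrderBy = None  # filled in when the class is parametrized

    def __class_getitem__(cls, model):
        # Collect the scalar field names of the model given in QueryParams[model].
        names = {
            name: name
            for name, hint in get_type_hints(model).items()
            if hint in SCALAR_TYPES
        }
        # Build a str-valued enum of order-able fields at parametrization time.
        order_by = Enum(f"{model.__name__}OrderBy", names, type=str)
        # Return a specialized subclass that carries the generated enum.
        return type(
            f"{cls.__name__}[{model.__name__}]", (cls,), {"OrderBy": order_by}
        )


AuthorParams = QueryParams[Author]
orderable = [member.value for member in AuthorParams.OrderBy]
```

Because the enum exists as a real type on the parametrized class, a framework that introspects annotations can list and validate the allowed values. In this sketch, `AuthorParams.OrderBy("works")` raises a `ValueError` because `works` is not a scalar field.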

@lu-pl
Contributor

lu-pl commented Feb 17, 2025

For the order-able feature described above, namespacing is necessary: given the rdfproxy.SPARQLBinding aliasing feature, model field names are not necessarily unambiguous.

Basically, there are two ways of going about namespacing in this case:

  1. model name namespacing,
  2. model field namespacing.

The current namespacing mechanism uses model name namespacing, i.e. any order-able model field is unambiguously referenced using the model name and the field name. A good example of this is wikidata_ungrouped_person_fastapi_example.py. Here, the model defines two ambiguous name fields with aliasing:

class Work(BaseModel):
    name: Annotated[str, SPARQLBinding("title")]

class Person(BaseModel):
    name: str
    work: Work

Which name?

With the currently implemented model name namespacing, the name fields can be unambiguously addressed with name (not namespaced because Person is the top model) and Work.name:

[Image: OpenAPI docs listing name and Work.name as order-able fields]
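
As a rough illustration of model name namespacing, the sketch below derives the order-able keys and their bindings for the Person/Work example using only the standard library. It is not RDFProxy's implementation: `Binding` is a hypothetical stand-in for rdfproxy.SPARQLBinding, the models are plain classes instead of Pydantic models, and `orderable_fields` is an invented helper name.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints

SCALAR_TYPES = (str, int, float, bool)  # assumption: what counts as scalar


@dataclass(frozen=True)
class Binding:
    """Hypothetical stand-in for rdfproxy.SPARQLBinding."""

    name: str


class Work:
    name: Annotated[str, Binding("title")]


class Person:
    name: str
    work: Work


def orderable_fields(model, top=True):
    """Map namespaced field keys to the SPARQL binding they resolve to."""
    result = {}
    for field, hint in get_type_hints(model, include_extras=True).items():
        binding = field  # without an alias, the field name is the binding
        if get_origin(hint) is Annotated:
            hint, *metadata = get_args(hint)
            binding = next(
                (m.name for m in metadata if isinstance(m, Binding)), field
            )
        if hint in SCALAR_TYPES:
            # Fields of the top model stay bare; nested ones get "<Model>.".
            key = field if top else f"{model.__name__}.{field}"
            result[key] = binding
        elif isinstance(hint, type):
            # Treat any other class as a nested model and recurse into it.
            result.update(orderable_fields(hint, top=False))
    return result


fields = orderable_fields(Person)
```

Here `fields` maps `name` to the binding `name` and `Work.name` to the binding `title`, so the two name fields no longer clash.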

@lu-pl
Contributor

lu-pl commented Feb 17, 2025

@katharinawuensche made an excellent point in arguing that model name namespacing could still be ambiguous in the case where two fields in a model refer to the same nested model.

class Place(BaseModel):
    name: str

class Author(BaseModel):
    birth_place: Place
    death_place: Place

This however is a model that will never work in RDFProxy because RDFProxy is a mapper, not an ORM!

By design and intent, RDFProxy maps SPARQL bindings to (nested) Pydantic models and provides dataframe-based grouping and aggregation functionality. So the above snippet tries to map a name binding to both a (birth)place and a (death)place SPARQL binding, which is obviously not possible due to the set-nature of SPARQL binding projections.
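
The set nature of the projection can be illustrated with plain dictionaries. In this sketch, which is an illustration and not RDFProxy code, one result row is a flat mapping from variable name to value (`row` and `build_place` are hypothetical), so a single name binding cannot feed two different Place instances:

```python
# One SPARQL result row: each projected variable occurs exactly once.
row = {"name": "Vienna"}


def build_place(row: dict) -> dict:
    """Map the flat row onto the fields of Place (only 'name' here)."""
    return {"name": row["name"]}


# Both nested fields read the same binding, so they can never differ.
author = {"birth_place": build_place(row), "death_place": build_place(row)}
```

Telling the two places apart would presumably need two distinct variables in the query, and with them two distinct aliases on the model side.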

I nonetheless implemented a mapper for model field namespacing:

class NamespacedFieldBindingsMap(FieldsBindingsMap):
    """Recursive FieldBindingsMap that generates namespaced key entries."""

    @staticmethod
    def _get_field_binding_mapping(model: type[_TModelInstance]) -> dict[str, str]:
        """Resolve model fields against rdfproxy.SPARQLBindings."""

        def _construct_bindings(model, _namespace: str = ""):
            bindings_map = FieldsBindingsMap(model)

            for k, v in model.model_fields.items():
                if isinstance(v.annotation, type(BaseModel)):
                    # Extend the namespace for this branch only; mutating
                    # _namespace in place would leak the prefix into siblings.
                    yield from _construct_bindings(
                        v.annotation, _namespace=f"{_namespace}{k}."
                    )

                if _is_scalar_type(v.annotation):
                    yield (
                        k if not _namespace else f"{_namespace}{k}",
                        bindings_map[k],
                    )

        return dict(_construct_bindings(model))

For a model

class ReallyDeeplyNestedModel(BaseModel):
    field: Annotated[str, SPARQLBinding("alias_field")]

class DeeplyNestedModel(BaseModel):
    really_deeply_nested: ReallyDeeplyNestedModel

class NestedModel(BaseModel):
    deeply_nested: DeeplyNestedModel

class TopModel(BaseModel):
    nested: NestedModel

this would generate

{'nested.deeply_nested.really_deeply_nested.field': 'alias_field'}

as order-able field.
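
The same idea can be tried out without RDFProxy. The following stdlib-only sketch uses plain annotated classes instead of Pydantic models, and `field_paths` is a hypothetical helper. It builds the dotted path per branch, so that two sibling fields pointing at the same nested model each get their own prefix:

```python
from typing import get_type_hints

SCALAR_TYPES = (str, int, float, bool)  # assumption: what counts as scalar


class Place:
    name: str


class Author:
    surname: str
    birth_place: Place
    death_place: Place


def field_paths(model, prefix=""):
    """Yield dotted field paths for all scalar fields, depth first."""
    for field, hint in get_type_hints(model).items():
        if hint in SCALAR_TYPES:
            yield f"{prefix}{field}"
        elif isinstance(hint, type):
            # Pass the extended prefix down instead of mutating it, so the
            # prefix of one nested field never carries over to the next one.
            yield from field_paths(hint, prefix=f"{prefix}{field}.")


paths = list(field_paths(Author))
```

For this model, `paths` contains `surname`, `birth_place.name` and `death_place.name`. The keys are distinct, although both nested paths would still resolve to the same `name` binding.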

I am more in favor of model name namespacing, as it is much more concise and will always suffice for RDFProxy.

@lu-pl
Contributor

lu-pl commented Feb 20, 2025

> Ordering of results should be possible based on a single scalar field of the root model.
>
> 1. should be able to specify default order via ConfigDict of root python model
> 2. should be specifiable dynamically through a QueryParameter (that can be exposed to FastAPI)

There already are semantics in place for default ordering (see #128), so "default order via ConfigDict of root python model" as mentioned in the issue description is not necessary.

The fact that grouped models are default-ordered by the grouping binding and ungrouped models are default-ordered by the first binding of the projection must be prominently documented however!
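
For documentation purposes, the fallback chain could be summarized roughly as follows. This is a sketch of the semantics described in this thread, not RDFProxy code: `effective_order_by` and its parameters are hypothetical, and it assumes that an explicit query parameter wins over both defaults.

```python
from typing import Optional, Sequence


def effective_order_by(
    query_order_by: Optional[str],
    group_by_binding: Optional[str],
    projection: Sequence[str],
) -> str:
    """Pick the binding to order by (hypothetical helper)."""
    # 1. An explicit order-by query parameter takes precedence.
    if query_order_by is not None:
        return query_order_by
    # 2. Grouped models default to the grouping binding.
    if group_by_binding is not None:
        return group_by_binding
    # 3. Ungrouped models default to the first binding of the projection.
    return projection[0]


grouped_default = effective_order_by(None, "gnd", ["gnd", "nameLabel"])
ungrouped_default = effective_order_by(None, None, ["name", "title"])
explicit = effective_order_by("title", None, ["name", "title"])
```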

@lu-pl lu-pl linked a pull request Feb 27, 2025 that will close this issue