Commit f842d9b

Pydantic integration (elastic#3086)
* Support `Annotated` typing hint
* Add option to exclude DSL class field from mapping
* Pydantic integration with the BaseESModel class
* object and nested fields
* complete CRUD example
* Use a smaller dataset
* documentation
* Update examples/quotes/backend/pyproject.toml
  Co-authored-by: Quentin Pradet <[email protected]>
* Update examples/quotes/backend/quotes.py
  Co-authored-by: Quentin Pradet <[email protected]>
* Use a better screenshot

---------

Co-authored-by: Quentin Pradet <[email protected]>
1 parent 419c1ff commit f842d9b

30 files changed: +6402 -225 lines changed

docs/reference/dsl_how_to_guides.md

Lines changed: 120 additions & 11 deletions
@@ -630,7 +630,7 @@ For more comprehensive examples have a look at the [DSL examples](https://github
630630

631631
### Document [doc_type]
632632

633-
If you want to create a model-like wrapper around your documents, use the `Document` class. It can also be used to create all the necessary mappings and settings in elasticsearch (see `life-cycle` for details).
633+
If you want to create a model-like wrapper around your documents, use the `Document` class (or the equivalent `AsyncDocument` for asynchronous applications). It can also be used to create all the necessary mappings and settings in Elasticsearch (see [Document life cycle](#life-cycle) below for details).
634634

635635
```python
636636
from datetime import datetime
@@ -721,9 +721,19 @@ class Post(Document):
721721
published: bool # same as published = Boolean(required=True)
722722
```
723723

724-
It is important to note that when using `Field` subclasses such as `Text`, `Date` and `Boolean`, they must be given in the right-side of an assignment, as shown in examples above. Using these classes as type hints will result in errors.
724+
::::{note}
725+
When using `Field` subclasses such as `Text`, `Date` and `Boolean` to define attributes, these classes must be given on the right-hand side of an assignment.
726+
727+
```python
728+
class Post(Document):
729+
title = Text() # correct
730+
subtitle: Text # incorrect
731+
```
725732

726-
Python types are mapped to their corresponding field types according to the following table:
733+
Using a `Field` subclass as a Python type hint will result in errors.
734+
::::
735+
736+
Python types are mapped to their corresponding `Field` types according to the following table:
727737

728738
| Python type | DSL field |
729739
| --- | --- |
@@ -735,7 +745,7 @@ Python types are mapped to their corresponding field types according to the foll
735745
| `datetime` | `Date(required=True)` |
736746
| `date` | `Date(format="yyyy-MM-dd", required=True)` |
737747

738-
To type a field as optional, the standard `Optional` modifier from the Python `typing` package can be used. When using Python 3.10 or newer, "pipe" syntax can also be used, by adding `| None` to a type. The `List` modifier can be added to a field to convert it to an array, similar to using the `multi=True` argument on the field object.
748+
To type a field as optional, the standard `Optional` modifier from the Python `typing` package can be used. When using Python 3.10 or newer, "pipe" syntax can also be used, by adding `| None` to a type. The `List` modifier can be added to a field to convert it to an array, similar to using the `multi=True` argument on the `Field` object.
739749

740750
```python
741751
from typing import Optional, List
@@ -763,7 +773,7 @@ class Post(Document):
763773
comments: List[Comment] # same as comments = Nested(Comment, required=True)
764774
```
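
The effect of the `Optional` and `List` modifiers can be illustrated with a small sketch that uses only the standard library. This is not the library's implementation: `describe_hint` is a hypothetical helper that reports which `required` and `multi` flags a type hint would imply under the rules described above.

```python
import types
from typing import List, Optional, Union, get_args, get_origin

# `X | None` produces types.UnionType on Python 3.10+; fall back to Union otherwise
_UNION_TYPES = (Union, getattr(types, "UnionType", Union))


def describe_hint(hint):
    """Return (base_type, required, multi) for a type hint.

    Hypothetical helper for illustration only; it mirrors the rules
    described in the text, not the library's actual code.
    """
    required = True
    multi = False

    # Optional[X] is Union[X, None]: strip None and mark as not required
    if get_origin(hint) in _UNION_TYPES:
        args = [a for a in get_args(hint) if a is not type(None)]
        if len(args) == 1 and len(get_args(hint)) == 2:
            required = False
            hint = args[0]

    # List[X] or list[X]: mark the field as multi-valued
    if get_origin(hint) is list:
        multi = True
        hint = get_args(hint)[0]

    return hint, required, multi


print(describe_hint(str))
print(describe_hint(Optional[str]))
print(describe_hint(List[int]))
print(describe_hint(Optional[List[int]]))
```

The returned flags correspond to the `required` and `multi` arguments that would otherwise be passed explicitly to a `Field` instance.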
765775

766-
Unfortunately it is impossible to have Python type hints that uniquely identify every possible Elasticsearch field type. To choose a field type that is different than the one that is assigned according to the table above, the desired field instance can be added explicitly as a right-side assignment in the field declaration. The next example creates a field that is typed as `Optional[str]`, but is mapped to `Keyword` instead of `Text`:
776+
Unfortunately it is impossible to have Python type hints that uniquely identify every possible Elasticsearch `Field` type. To choose a type that is different than the one that is assigned according to the table above, the desired `Field` instance can be added explicitly as a right-side assignment in the field declaration. The next example creates a field that is typed as `Optional[str]`, but is mapped to `Keyword` instead of `Text`:
767777

768778
```python
769779
class MyDocument(Document):
@@ -787,7 +797,7 @@ class MyDocument(Document):
787797
category: str = mapped_field(Keyword(), default="general")
788798
```
789799

790-
When using the `mapped_field()` wrapper function, an explicit field type instance can be passed as a first positional argument, as the `category` field does in the example above.
800+
The `mapped_field()` wrapper function can optionally be given an explicit field type instance as its first positional argument. In the example above, the `category` field uses this option to be mapped as `Keyword` instead of the default `Text`.
791801

792802
Static type checkers such as [mypy](https://mypy-lang.org/) and [pyright](https://github.com/microsoft/pyright) can use the type hints and the dataclass-specific options added to the `mapped_field()` function to improve type inference and provide better real-time code completion and suggestions in IDEs.
793803

@@ -829,17 +839,17 @@ s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)
829839

830840
When specifying sorting order, the `+` and `-` unary operators can be used on the class field attributes to indicate ascending and descending order.
831841

832-
Finally, the `ClassVar` annotation can be used to define a regular class attribute that should not be mapped to the Elasticsearch index:
842+
Finally, it is also possible to define class attributes that are ignored when building the Elasticsearch mapping. One way is to type attributes with the `ClassVar` annotation. Alternatively, the `mapped_field()` wrapper function accepts an `exclude` argument that can be set to `True`:
833843

834844
```python
835845
from typing import ClassVar
836846

837847
class MyDoc(Document):
838848
title: M[str]
created_at: M[datetime] = mapped_field(default_factory=datetime.now)
839849
my_var: ClassVar[str] # regular class variable, ignored by Elasticsearch
850+
another_custom_var: int = mapped_field(exclude=True) # also ignored by Elasticsearch
840851
```
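
The two exclusion mechanisms can be sketched with the standard library alone. The code below is an assumption-laden illustration, not the library's metaclass: `FieldMetadata`, the simplified `mapped_field()` and `mapped_names()` are hypothetical stand-ins. It also shows why this commit checks for a dedicated `dict` subclass (`_FieldMetadataDict`) instead of a plain `dict`: an attribute whose default value happens to be a dictionary is then not mistaken for `mapped_field()` output.

```python
from typing import Any, ClassVar, get_origin


class FieldMetadata(dict):
    """Marker dict so wrapper results can be told apart from plain dicts."""


def mapped_field(*, default: Any = None, exclude: bool = False) -> Any:
    """Simplified, hypothetical stand-in for the real mapped_field()."""
    return FieldMetadata(default=default, exclude=exclude)


def mapped_names(cls) -> list:
    """List the annotated attributes that would be kept in the mapping."""
    names = []
    for name, hint in cls.__annotations__.items():
        if get_origin(hint) is ClassVar:
            continue  # regular class variable, never mapped
        value = cls.__dict__.get(name)
        if isinstance(value, FieldMetadata) and value.get("exclude"):
            continue  # explicitly excluded through mapped_field()
        names.append(name)
    return names


class MyDoc:
    title: str
    options: dict = {"exclude": True}  # plain dict default, still mapped
    my_var: ClassVar[str]
    another_custom_var: int = mapped_field(exclude=True)


print(mapped_names(MyDoc))
```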
841852

842-
843853
#### Note on dates [_note_on_dates]
844854

845855
The DSL module will always respect the timezone information (or lack thereof) on the `datetime` objects passed in or stored in Elasticsearch. Elasticsearch itself interprets all datetimes with no timezone information as `UTC`. If you wish to reflect this in your Python code, you can specify `default_timezone` when instantiating a `Date` field:
@@ -878,7 +888,7 @@ first.meta.id = 47
878888
first.save()
879889
```
880890

881-
All the metadata fields (`id`, `routing`, `index` etc) can be accessed (and set) via a `meta` attribute or directly using the underscored variant:
891+
All the metadata fields (`id`, `routing`, `index`, etc.) can be accessed (and set) via a `meta` attribute or directly using the underscored variant:
882892

883893
```python
884894
post = Post(meta={'id': 42})
@@ -961,12 +971,111 @@ first = Post.get(id=42)
961971
first.delete()
962972
```
963973

974+
#### Integration with Pydantic models
975+
976+
::::{warning}
977+
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
978+
::::
979+
980+
::::{note}
981+
This feature is available in the Python Elasticsearch client starting with release 9.2.0.
982+
::::
983+
984+
Applications that define their data models using [Pydantic](https://docs.pydantic.dev/latest/) can combine these
985+
models with Elasticsearch DSL annotations. To take advantage of this option, Pydantic's `BaseModel` base class
986+
needs to be replaced with `BaseESModel` (or `AsyncBaseESModel` for asynchronous applications), and then the model
987+
can include type annotations for both Pydantic and Elasticsearch, as demonstrated in the following example:
988+
989+
```python
990+
from typing import Annotated
991+
from pydantic import Field
992+
from elasticsearch import dsl
993+
from elasticsearch.dsl.pydantic import BaseESModel
994+
995+
class Quote(BaseESModel):
996+
quote: str
997+
author: Annotated[str, dsl.Keyword()]
998+
tags: Annotated[list[str], dsl.Keyword(normalizer="lowercase")]
999+
embedding: Annotated[list[float], dsl.DenseVector()] = Field(init=False, default=[])
1000+
1001+
class Index:
1002+
name = "quotes"
1003+
```
1004+
1005+
In this example, the `quote` attribute is annotated with a `str` type hint. Both Pydantic and Elasticsearch use this
1006+
annotation.
1007+
1008+
The `author` and `tags` attributes have a Python type hint and an Elasticsearch annotation, both wrapped with
1009+
Python's `typing.Annotated`. When using the `BaseESModel` class, the typing information intended for Elasticsearch needs
1010+
to be defined inside `Annotated`.
1011+
1012+
The `embedding` attribute includes a base Python type and an Elasticsearch annotation in the same format as the
1013+
other fields, but it adds Pydantic's `Field` definition as a right-hand side assignment.
1014+
1015+
Finally, any other items that need to be defined for the Elasticsearch document class, such as `class Index` and
1016+
`class Meta` entries (discussed later), can be added as well.
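
To see how metadata placed inside `Annotated` can be recovered at class-creation time, consider the following standard-library sketch. The metaclass change in this commit reads the `__metadata__` and `__origin__` attributes of the annotation directly; the sketch uses the public `typing` helpers instead, and its `Keyword` class and `split_annotation()` function are hypothetical stand-ins rather than parts of the Elasticsearch client.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class Keyword:
    """Hypothetical stand-in for an Elasticsearch DSL field class."""

    def __init__(self, **options):
        self.options = options


class Quote:
    quote: str
    author: Annotated[str, Keyword()]
    tags: Annotated[list[str], Keyword(normalizer="lowercase")]


def split_annotation(hint):
    """Return (python_type, metadata) for a possibly Annotated hint."""
    if get_origin(hint) is Annotated:
        python_type, *metadata = get_args(hint)
        return python_type, metadata
    return hint, []


# include_extras=True keeps the Annotated wrapper instead of stripping it
hints = get_type_hints(Quote, include_extras=True)
for name, hint in hints.items():
    python_type, metadata = split_annotation(hint)
    fields = [m for m in metadata if isinstance(m, Keyword)]
    print(name, python_type, [f.options for f in fields])
```

The plain Python type remains available to Pydantic for validation, while the extra metadata items can be inspected separately to choose the Elasticsearch field type.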
1017+
1018+
The next example demonstrates how to define `Object` and `Nested` fields:
1019+
1020+
```python
1021+
from typing import Annotated
1022+
from pydantic import BaseModel, Field
1023+
from elasticsearch import dsl
1024+
from elasticsearch.dsl.pydantic import BaseESModel
1025+
1026+
class Phone(BaseModel):
1027+
type: Annotated[str, dsl.Keyword()] = Field(default="Home")
1028+
number: str
1029+
1030+
class Person(BaseESModel):
1031+
name: str
1032+
main_phone: Phone # same as Object(Phone)
1033+
other_phones: list[Phone] # same as Nested(Phone)
1034+
1035+
class Index:
1036+
name = "people"
1037+
```
1038+
1039+
Note that inner classes do not need to be defined with a custom base class; these should be standard Pydantic model
1040+
classes. The attributes defined in these classes can include Elasticsearch annotations, as long as they are given
1041+
in an `Annotated` type hint.
1042+
1043+
All model classes that are created as described in this section function like normal Pydantic models and can be used
1044+
anywhere standard Pydantic models are used, but they have some added attributes:
1045+
1046+
- `_doc`: a class attribute that is a dynamically generated `Document` class to use with the Elasticsearch index.
1047+
- `meta`: an attribute added to all models that includes Elasticsearch document metadata items such as `id`, `score`, etc.
1048+
- `to_doc()`: a method that converts the Pydantic model to an Elasticsearch document.
1049+
- `from_doc()`: a class method that accepts an Elasticsearch document as an argument and returns an equivalent Pydantic model.
1050+
1051+
These are demonstrated in the examples below:
1052+
1053+
```python
1054+
from elasticsearch.dsl.query import Match  # needed by the search example below

# create a Pydantic model
1055+
quote = Quote(
1056+
quote="An unexamined life is not worth living.",
1057+
author="Socrates",
1058+
tags=["philosophy"]
1059+
)
1060+
1061+
# save the model to the Elasticsearch index
1062+
quote.to_doc().save()
1063+
1064+
# get a document from the Elasticsearch index as a Pydantic model
1065+
quote = Quote.from_doc(Quote._doc.get(id=42))
1066+
1067+
# run a search and print the Pydantic models
1068+
s = Quote._doc.search().query(Match(Quote._doc.quote, "life"))
1069+
for doc in s:
1070+
quote = Quote.from_doc(doc)
1071+
print(quote.meta.id, quote.meta.score, quote.quote)
1072+
```
9641073

9651074
#### Analysis [_analysis]
9661075

9671076
To specify `analyzer` values for `Text` fields you can just use the name of the analyzer (as a string) and either rely on the analyzer being defined (like built-in analyzers) or define the analyzer yourself manually.
9681077

969-
Alternatively you can create your own analyzer and have the persistence layer handle its creation, from our example earlier:
1078+
Alternatively, you can create your own analyzer and have the persistence layer handle its creation, from our example earlier:
9701079

9711080
```python
9721081
from elasticsearch.dsl import analyzer, tokenizer
@@ -1634,7 +1743,7 @@ for response in responses:
16341743

16351744
### Asynchronous Documents, Indexes, and more [_asynchronous_documents_indexes_and_more]
16361745

1637-
The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and `FacetedSearch` classes all have asynchronous versions that use the same name with an `Async` prefix. These classes expose the same interfaces as the synchronous versions, but any methods that perform I/O are defined as coroutines.
1746+
The `Document`, `BaseESModel`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and `FacetedSearch` classes all have asynchronous versions that use the same name with an `Async` prefix. These classes expose the same interfaces as the synchronous versions, but any methods that perform I/O are defined as coroutines.
16381747

16391748
Auxiliary classes that do not perform I/O do not have asynchronous versions. The same classes can be used in synchronous and asynchronous applications.
16401749

docs/reference/dsl_tutorials.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -134,7 +134,7 @@ In this example you can see:
134134
* retrieving and saving the object into Elasticsearch
135135
* accessing the underlying client for other APIs
136136

137-
You can see more in the `persistence` chapter.
137+
You can see more in the [persistence](dsl_how_to_guides.md#_persistence_2) chapter.
138138

139139

140140
## Pre-built Faceted Search [_pre_built_faceted_search]

elasticsearch/dsl/document_base.py

Lines changed: 38 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,8 @@
3434
overload,
3535
)
3636

37+
from typing_extensions import _AnnotatedAlias
38+
3739
try:
3840
import annotationlib
3941
except ImportError:
@@ -358,6 +360,10 @@ def __init__(self, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
358360
# the field has a type annotation, so next we try to figure out
359361
# what field type we can use
360362
type_ = annotations[name]
363+
type_metadata = []
364+
if isinstance(type_, _AnnotatedAlias):
365+
type_metadata = type_.__metadata__
366+
type_ = type_.__origin__
361367
skip = False
362368
required = True
363369
multi = False
@@ -404,6 +410,12 @@ def __init__(self, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
404410
# use best field type for the type hint provided
405411
field, field_kwargs = self.type_annotation_map[type_] # type: ignore[assignment]
406412

413+
# if this field does not have a right-hand value, we look in the metadata
414+
# of the annotation to see if we find it there
415+
for md in type_metadata:
416+
if isinstance(md, (_FieldMetadataDict, Field)):
417+
attrs[name] = md
418+
407419
if field:
408420
field_kwargs = {
409421
"multi": multi,
@@ -416,17 +428,20 @@ def __init__(self, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
416428
# this field has a right-side value, which can be field
417429
# instance on its own or wrapped with mapped_field()
418430
attr_value = attrs[name]
419-
if isinstance(attr_value, dict):
431+
if isinstance(attr_value, _FieldMetadataDict):
420432
# the mapped_field() wrapper function was used so we need
421433
# to look for the field instance and also record any
422434
# dataclass-style defaults
435+
if attr_value.get("exclude"):
436+
# skip this field
437+
continue
423438
attr_value = attrs[name].get("_field")
424439
default_value = attrs[name].get("default") or attrs[name].get(
425440
"default_factory"
426441
)
427442
if default_value:
428443
field_defaults[name] = default_value
429-
if attr_value:
444+
if isinstance(attr_value, Field):
430445
value = attr_value
431446
if required is not None:
432447
value._required = required
@@ -505,12 +520,19 @@ def __delete__(self, instance: Any) -> None: ...
505520
M = Mapped
506521

507522

523+
class _FieldMetadataDict(dict[str, Any]):
524+
"""This class is used to identify metadata returned by the `mapped_field()` function."""
525+
526+
pass
527+
528+
508529
def mapped_field(
509530
field: Optional[Field] = None,
510531
*,
511532
init: bool = True,
512533
default: Any = None,
513534
default_factory: Optional[Callable[[], Any]] = None,
535+
exclude: bool = False,
514536
**kwargs: Any,
515537
) -> Any:
516538
"""Construct a field using dataclass behaviors
@@ -520,22 +542,25 @@ def mapped_field(
520542
options.
521543
522544
:param field: The instance of ``Field`` to use for this field. If not provided,
523-
an instance that is appropriate for the type given to the field is used.
545+
an instance that is appropriate for the type given to the field is used.
524546
:param init: a value of ``True`` adds this field to the constructor, and a
525-
value of ``False`` omits it from it. The default is ``True``.
547+
value of ``False`` omits it from it. The default is ``True``.
526548
:param default: a default value to use for this field when one is not provided
527-
explicitly.
549+
explicitly.
528550
:param default_factory: a callable that returns a default value for the field,
529-
when one isn't provided explicitly. Only one of ``default`` and
530-
``default_factory`` can be used.
551+
when one isn't provided explicitly. Only one of ``default`` and
552+
``default_factory`` can be used.
553+
:param exclude: Set to ``True`` to exclude this field from the Elasticsearch
554+
index.
531555
"""
532-
return {
533-
"_field": field,
534-
"init": init,
535-
"default": default,
536-
"default_factory": default_factory,
556+
return _FieldMetadataDict(
557+
_field=field,
558+
init=init,
559+
default=default,
560+
default_factory=default_factory,
561+
exclude=exclude,
537562
**kwargs,
538-
}
563+
)
539564

540565

541566
@dataclass_transform(field_specifiers=(mapped_field,))
