Add SearchIndex and VectorSearchIndex #264

WaVEV · 2025-03-03T02:45:35Z

No description provided.

django_mongodb_backend/schema.py

django_mongodb_backend/base.py

requirements.txt

tests/indexes_/test_atlas_indexes.py

django_mongodb_backend/introspection.py

django_mongodb_backend/schema.py

django_mongodb_backend/base.py

django_mongodb_backend/indexes.py

django_mongodb_backend/schema.py

django_mongodb_backend/indexes.py

django_mongodb_backend/fields/array.py

django_mongodb_backend/schema.py

django_mongodb_backend/indexes.py

django_mongodb_backend/schema.py

django_mongodb_backend/indexes.py

tests/indexes_/test_atlas_indexes.py

django_mongodb_backend/indexes.py

tests/indexes_/test_atlas_indexes.py

django_mongodb_backend/indexes.py

django_mongodb_backend/base.py

.github/workflows/mongodb_settings.py

django_mongodb_backend/fields/array.py

django_mongodb_backend/indexes.py

django_mongodb_backend/introspection.py

django_mongodb_backend/features.py

.github/workflows/test-python.yml

tests/system_checks/tests.py

django_mongodb_backend/indexes.py

tests/system_checks/tests.py

tests/indexes_/models.py

django_mongodb_backend/indexes.py

django_mongodb_backend/checks.py

tests/system_checks/tests.py

django_mongodb_backend/schema.py

tests/indexes_/test_checks.py

tests/indexes_/models.py

django_mongodb_backend/indexes.py

django_mongodb_backend/checks.py

.github/workflows/mongodb_settings.py

django_mongodb_backend/indexes.py

tests/indexes_/test_atlas_indexes.py

django_mongodb_backend/indexes.py

django_mongodb_backend/schema.py

.github/workflows/start_local_atlas.sh

django_mongodb_backend/indexes.py

tests/indexes_/test_checks.py

tests/indexes_/test_atlas_indexes.py

django_mongodb_backend/indexes.py

timgraham · 2025-04-22T02:29:18Z

django_mongodb_backend/indexes.py

+        if field.get_internal_type() == "UUIDField":
+            return "uuid"


As currently implemented, I believe this should be string:

django-mongodb-backend/django_mongodb_backend/base.py

Line 62 in 43f477c

"UUIDField": "string",

Separately, maybe it's worth confirming that UUIDField can't store it values as BSON uuid. Setting DatabaseFeatures.has_native_uuid_field = True raises the error: ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.

I don't think I tried any further to make that work.

yes, good catch. I think we could support UUIDFields if we adapt UUIDField.get_db_prep_value and maybe to_python (have to check); like DurationField was adapted. It seems to be out of scope, we could add another task for that.

django_mongodb_backend/indexes.py

timgraham · 2025-04-23T01:03:19Z

django_mongodb_backend/indexes.py

+        if field.get_internal_type() == "EmbeddedModelField":
+            return "embeddedDocuments"


This is the type we also will want for EmbeddedModelArrayField, correct? We might check if EmbeddedModelField.db_type() should be defined and could return embeddedDocuments.

How about support for ArrayField? https://www.mongodb.com/docs/atlas/atlas-search/field-types/array-type/

And JSONField? (search type="document"?)

JSONField is supported if I am not mistaken. It types as object so is mapped as document. We could map ArrayField also. I don't see any cons to define EmbeddedModelField.db_type() as embeddedDocuments.

EDIT: the only cons is the last S. All the other types are in singular.

Supporting ArrayField would be as I explained in here. I will add and if it is not very convincing, we could remove it. Also I noticed that we could have an array of integers or float that not necessarily be an embedded thing (a vector to do a search), also it could be with size. So the current classification won't work.

For Vector search I cannot add the support as simple as Search Indexes. It might need something that separates filter fields from vector fields. It will change the class sign. It is not very difficult but I suggest to work on that in a next iteration.

django_mongodb_backend/indexes.py

timgraham · 2025-04-23T01:25:44Z

django_mongodb_backend/indexes.py

+                            id=f"{self._error_id_prefix}.E002",
+                        )
+                    )
+                if not isinstance(field_.base_field, FloatField | DecimalField):


Is isinstance() definitely what we want here as opposed to say checking db_type()? (That would be double and decimal for those fields.) On the one hand, an error message that references FloatField & DecimalField probably covers most common use cases and is going to be easier to understand that referencing double/decimal. Just wondering if you had any thoughts on this.

Is this the relevant documentation for this restriction? https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/#about-the-vector-type I see double there but it's not clear to me how DecimalField qualifies. I'm not sure what the doucmentation means when it mentions:

BSON BinData vector subtype float32

BSON BinData vector subtype int1

BSON BinData vector subtype int8

DecimalField is not used for that, is Integer field. But Given that not all the similarities are supported for int1. Shall we support only float32?

I'm not sure, maybe we ask the team.

tests/indexes_/test_search_indexes.py

timgraham · 2025-04-23T01:34:28Z

django_mongodb_backend/indexes.py

+                    )
+            else:
+                field_type = field_.db_type(connection)
+                search_type = self.search_index_data_types(field_, field_type)


The linked doc says, "You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types."
Does this account for arrays?

Well, will try to add some light over this behaviours. The array could be defined in the left hand side if I try to make a query like:

db.t.t.aggregate([ { "$vectorSearch": { "index": "vector_index", "filter": { "name": ["example", "example3"] }, "path": "values", "queryVector": [0,0,0,0,0], "numCandidates": 150, "limit": 10, "quantization": "scalar" } }])

I got the following error:
MongoServerError[UnknownError]: PlanExecutor error during aggregation :: caused by :: "filter" must be a boolean, objectId, number, string, date, uuid, or null
So I cannot pass an array as a filter.

But I can define a documents like:

db.t.t.insertOne({ name: "example", values: [1.23, 4.56, 7.89, 0.12,3.45] }) db.t.t.insertOne({ name: ["example", "example2"], values: [1.23, 4.56, 7.89, 0.12,3.45] })

Then I filter by:

db.t.t.aggregate([ { "$vectorSearch": { "index": "vector_index", "filter": { "name": "example2" }, "path": "values", "queryVector": [0,0,0,0,0], "numCandidates": 150, "limit": 10, "quantization": "scalar" } }])

I will get the second and if I filter by example I will get both.

But indeed we could support array of those types.

django_mongodb_backend/indexes.py

tests/indexes_/test_search_indexes.py

tests/indexes_/models.py

tests/indexes_/test_search_indexes.py

django_mongodb_backend/indexes.py

.github/workflows/test-python-atlas.yml

+    name: Django Test Suite
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout django-mongodb-backend
+        uses: actions/checkout@v4
+        with:
+          persist-credentials: false
+      - name: install django-mongodb-backend
+        run: |
+          pip3 install --upgrade pip
+          pip3 install -e .
+      - name: Checkout Django
+        uses: actions/checkout@v4
+        with:
+          repository: 'mongodb-forks/django'
+          ref: 'mongodb-5.1.x'
+          path: 'django_repo'
+          persist-credentials: false
+      - name: Install system packages for Django's Python test dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install libmemcached-dev
+      - name: Install Django and its Python test dependencies
+        run: |
+          cd django_repo/tests/
+          pip3 install -e ..
+          pip3 install -r requirements/py3.txt
+      - name: Copy the test settings file
+        run: cp .github/workflows/mongodb_settings.py django_repo/tests/
+      - name: Copy the test runner file
+        run: cp .github/workflows/runtests.py django_repo/tests/runtests_.py
+      - name: Start local Atlas
+        working-directory: .
+        run: bash .github/workflows/start_local_atlas.sh mongodb/mongodb-atlas-local:7
+      - name: Run tests
+        run: python3 django_repo/tests/runtests_.py


django_mongodb_backend/indexes.py

docs/source/ref/models/indexes.rst

Co-authored-by: Tim Graham <[email protected]>

WaVEV requested review from timgraham and Jibola March 3, 2025 02:45

timgraham reviewed Mar 4, 2025

View reviewed changes

WaVEV force-pushed the create-atlas-indexes branch from 03629ae to de3d245 Compare March 8, 2025 03:50

timgraham reviewed Mar 11, 2025

View reviewed changes

django_mongodb_backend/introspection.py Outdated Show resolved Hide resolved

django_mongodb_backend/schema.py Outdated Show resolved Hide resolved

django_mongodb_backend/base.py Outdated Show resolved Hide resolved

WaVEV force-pushed the create-atlas-indexes branch 2 times, most recently from 1bf4717 to 7dc04ab Compare March 20, 2025 21:49

WaVEV commented Mar 20, 2025

View reviewed changes

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

WaVEV marked this pull request as ready for review March 20, 2025 23:05

timgraham reviewed Mar 20, 2025

View reviewed changes

timgraham reviewed Mar 21, 2025

View reviewed changes

django_mongodb_backend/schema.py Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

django_mongodb_backend/schema.py Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

timgraham mentioned this pull request Mar 23, 2025

Rename ArrayField.size to max_size #273

Merged

timgraham reviewed Mar 23, 2025

View reviewed changes

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

tests/indexes_/test_atlas_indexes.py Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

timgraham changed the title ~~Create atlas indexes~~ Add SearchIndex and VectorSearchIndex Mar 23, 2025

timgraham reviewed Mar 23, 2025

View reviewed changes

tests/indexes_/test_atlas_indexes.py Outdated Show resolved Hide resolved

WaVEV force-pushed the create-atlas-indexes branch from 60d49de to 2865e13 Compare March 25, 2025 03:05

timgraham reviewed Mar 25, 2025

View reviewed changes

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

Jibola reviewed Mar 25, 2025

View reviewed changes

WaVEV force-pushed the create-atlas-indexes branch 2 times, most recently from 9fdc143 to 15e3450 Compare March 31, 2025 03:05

WaVEV force-pushed the create-atlas-indexes branch 3 times, most recently from b06db74 to e69da64 Compare April 9, 2025 06:00

This comment was marked as resolved.

Sign in to view

WaVEV force-pushed the create-atlas-indexes branch from 92caf14 to 61b1c05 Compare April 12, 2025 16:19

ShaneHarvey reviewed Apr 15, 2025

View reviewed changes

django_mongodb_backend/features.py Outdated Show resolved Hide resolved

WaVEV force-pushed the create-atlas-indexes branch from 00b0323 to 08654ec Compare April 15, 2025 02:17

timgraham reviewed Apr 16, 2025

View reviewed changes

timgraham reviewed Apr 17, 2025

View reviewed changes

timgraham reviewed Apr 18, 2025

View reviewed changes

django_mongodb_backend/schema.py Outdated Show resolved Hide resolved

.github/workflows/start_local_atlas.sh Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

WaVEV force-pushed the create-atlas-indexes branch from 3237ac8 to f55a410 Compare April 18, 2025 22:57

timgraham reviewed Apr 19, 2025

View reviewed changes

timgraham reviewed Apr 20, 2025

View reviewed changes

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

WaVEV force-pushed the create-atlas-indexes branch from 3d9e815 to f4a5b9a Compare April 21, 2025 16:53

timgraham reviewed Apr 22, 2025

View reviewed changes

WaVEV commented Apr 22, 2025

View reviewed changes

django_mongodb_backend/indexes.py Show resolved Hide resolved

WaVEV commented Apr 22, 2025

View reviewed changes

django_mongodb_backend/indexes.py Show resolved Hide resolved

timgraham reviewed Apr 23, 2025

View reviewed changes

tests/indexes_/test_search_indexes.py Outdated Show resolved Hide resolved

tests/indexes_/models.py Show resolved Hide resolved

tests/indexes_/test_search_indexes.py Outdated Show resolved Hide resolved

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

timgraham force-pushed the create-atlas-indexes branch from 21c9047 to 0d6f719 Compare April 24, 2025 18:28

github-advanced-security bot found potential problems Apr 24, 2025

View reviewed changes

This comment was marked as resolved.

Sign in to view

timgraham reviewed Apr 26, 2025

View reviewed changes

django_mongodb_backend/indexes.py Outdated Show resolved Hide resolved

docs/source/ref/models/indexes.rst Outdated Show resolved Hide resolved

timgraham force-pushed the create-atlas-indexes branch from fd52b43 to c59297c Compare April 26, 2025 01:07

timgraham force-pushed the create-atlas-indexes branch from 455f98e to 05766b4 Compare May 3, 2025 21:26

timgraham and others added 2 commits May 4, 2025 19:52

Move assertAddRemoveIndex() to a mixin

38cf2b4

Add SearchIndex and VectorSearchIndex

8804c87

Co-authored-by: Tim Graham <[email protected]>

timgraham force-pushed the create-atlas-indexes branch from 05766b4 to 8804c87 Compare May 4, 2025 23:59

timgraham approved these changes May 5, 2025

View reviewed changes

timgraham merged commit 8804c87 into mongodb:main May 5, 2025
15 checks passed

blink1073 temporarily deployed to release May 5, 2025 05:37 — with GitHub Actions Inactive

blink1073 temporarily deployed to release May 5, 2025 05:39 — with GitHub Actions Inactive

		if field.get_internal_type() == "EmbeddedModelField":
		return "embeddedDocuments"

Add SearchIndex and VectorSearchIndex #264

Add SearchIndex and VectorSearchIndex #264

Uh oh!

Conversation

WaVEV commented Mar 3, 2025 • edited by timgraham Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

WaVEV commented Mar 3, 2025 •

edited by timgraham

Loading

WaVEV Apr 22, 2025 •

edited

Loading

WaVEV Apr 23, 2025 •

edited

Loading

WaVEV Apr 23, 2025 •

edited

Loading