INTPYTHON-527 Add Queryable Encryption support #329

Open
wants to merge 179 commits into
base: main
from

Conversation

aclark4life
Collaborator

@aclark4life aclark4life commented Jun 27, 2025

(see previous attempts in #318, #319 and #323 for additional context)

@aclark4life
Collaborator Author

Wrong commit message for 65bd15a and I don't want to force push yet. It should have said:

"Only create an encrypted connection once then reuse it."

I'm aware that _nodb_cursor is slated for removal but in the meantime I can keep going with other fixes with this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

  • An unencrypted connection, used unless encryption is needed
  • An encrypted connection, created once and reused when encryption is needed

@timgraham
Collaborator

timgraham commented Jun 27, 2025

> I'm aware that _nodb_cursor is slated for removal but in the meantime I can keep going with other fixes with this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default) with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure if using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.
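To make the "encrypted by default, unencrypted only as a fallback" idea concrete, here is a minimal sketch. It is not the patch referred to above: `LazyConnections` and `client_factory` are hypothetical names, and the only assumption about PyMongo is that `MongoClient` accepts an `auto_encryption_opts` keyword argument.

```python
class LazyConnections:
    """Hold an encrypted client by default plus an unencrypted fallback.

    Hypothetical helper for illustration; not part of django_mongodb_backend.
    """

    def __init__(self, options, client_factory):
        # Shallow-copy so that adding or removing keys on the caller's dict
        # later does not change what either client is built with.
        self._options = dict(options)
        self._client_factory = client_factory
        self._encrypted = None
        self._unencrypted = None

    @property
    def encrypted(self):
        # Created on first use, then reused.
        if self._encrypted is None:
            self._encrypted = self._client_factory(**self._options)
        return self._encrypted

    @property
    def unencrypted(self):
        # Same options minus auto_encryption_opts, for commands that
        # auto encryption rejects (e.g. buildInfo).
        if self._unencrypted is None:
            plain = {
                key: value
                for key, value in self._options.items()
                if key != "auto_encryption_opts"
            }
            self._unencrypted = self._client_factory(**plain)
        return self._unencrypted
```

With PyMongo, `client_factory` would presumably be `pymongo.MongoClient`; each client is created at most once and only when first accessed.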

@aclark4life
Collaborator Author

> It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

I don't disagree, but it feels a lot like _start_transaction_under_autocommit, which gets called by start_transaction_under_autocommit because autocommit is False. Django appears to stumble into _nodb_cursor when the encrypted connection fails to get the database version. While we don't use a cursor in this backend, we do have a "nosql" cursor that has __enter__ and __exit__ (I assume to meet Django's expectations), and that gives us an opportunity to modify the connection. @Jibola mentioned yesterday that this design is suspect, and I agree with both of you, particularly with regard to the desire to start with and maintain an encrypted connection first.

> Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

Yes, it works by design, not as a side effect. I'm deep-copying settings_dict when DatabaseWrapper is initialized, so when DatabaseWrapper.connection is initialized it's unencrypted. When the schema needs encryption later, the encryption options are retrieved from _settings_dict.
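The difference between the "side effect" and the "by design" readings comes down to which dictionary gets mutated. A small self-contained illustration, using placeholder values rather than the real settings:

```python
import copy

# Placeholder settings; "<opts>" stands in for a real AutoEncryptionOpts.
settings_dict = {
    "NAME": "encrypted_djangotests",
    "OPTIONS": {"auto_encryption_opts": "<opts>"},
}

# Snapshot taken up front, before anything touches the live settings.
snapshot = copy.deepcopy(settings_dict)

# pop() mutates the live dictionary in place, so anything that builds a
# connection from settings_dict afterwards no longer sees the option.
settings_dict["OPTIONS"].pop("auto_encryption_opts")

live_has_opts = "auto_encryption_opts" in settings_dict["OPTIONS"]
snapshot_has_opts = "auto_encryption_opts" in snapshot["OPTIONS"]
```

A shallow copy would not be enough here, because the nested OPTIONS dictionary would still be shared between the copy and the original.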

> I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default) with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure if using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.

I made a few passes at it but did not get anywhere. I'll try again, though.

@timgraham
Collaborator

Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

@aclark4life
Collaborator Author

aclark4life commented Jun 28, 2025

> Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

Copy that, thanks!

I've removed _nodb_cursor in 8e83ada and discovered the version check is the only time that error occurs. I now get errors like:

Traceback (most recent call last):
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
    return self._command(
           ~~~~~~~~~~~~~^
        connection,
        ^^^^^^^^^^^
    ...<7 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
    return conn.command(
           ~~~~~~~~~~~~^
        self._name,
        ^^^^^^^^^^^
    ...<8 lines>...
        client=self._client,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
    return func(*args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        response_doc,
        ^^^^^^^^^^^^^
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection., full error: RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

Still working on an unencrypted connection, but perhaps the only time we need it is for the version check.

@aclark4life
Collaborator Author

aclark4life commented Jul 2, 2025

@ShaneHarvey @Jibola @timgraham FYI here is the pipeline that causes the let error:

(Pdb) pprint.pprint(pipeline)
[{'$lookup': {'as': 'django_content_type',
              'from': 'django_content_type',
              'let': {'parent__field__0': '$content_type_id'},
              'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0',
                                                                   '$_id']}]}}}]}},
 {'$unwind': '$django_content_type'},
 {'$match': {'$expr': {'$in': ['$content_type_id',
                               (ObjectId('6864933ec7cf8179e3ef1f8d'),)]}}},
 {'$project': {'codename': 1,
               'content_type_id': 1,
               'django_content_type': {'app_label': 1, 'model': 1}}},
 {'$sort': SON([('django_content_type.app_label', 1), ('django_content_type.model', 1), ('codename', 1)])}]

And here is the error again with some additional debug:

(Pdb) errmsg
"Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection."
(Pdb) code
51208
(Pdb) response
RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))
(Pdb) max_wire_version
26

The full traceback is identical to the one in my previous comment, except that it is preceded by this line:

Running post-migrate handlers for application contenttypes

Test settings:

import os

from django_mongodb_backend import encryption, parse_uri

kms_providers = encryption.get_kms_providers()

auto_encryption_opts = encryption.get_auto_encryption_opts(
    kms_providers=kms_providers,
)

DATABASE_URL = os.environ.get("MONGODB_URI", "mongodb://localhost:27017")
DATABASES = {
    "default": parse_uri(
        DATABASE_URL,
        db_name="djangotests",
    ),
    "encrypted": parse_uri(
        DATABASE_URL,
        options={"auto_encryption_opts": auto_encryption_opts},
        db_name="encrypted_djangotests",
    ),
}

DEFAULT_AUTO_FIELD = "django_mongodb_backend.fields.ObjectIdAutoField"
PASSWORD_HASHERS = ("django.contrib.auth.hashers.MD5PasswordHasher",)
SECRET_KEY = "django_tests_secret_key"
USE_TZ = False

This is happening in the encryption_ tests with a database router configured to use the encrypted database, but it happens before any tests are run or any routing occurs. I've confirmed that the encrypted database is created, so something needs to change in either our backend or PyMongo to address this. The ideal candidate, perhaps, is a change to the MQL in the pipeline, if possible.
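As one possible shape for "a change to the MQL", the sketch below rewrites a $lookup that uses a single `let` variable and a single `$eq` condition into the `localField`/`foreignField` form, which has no `let` field. The function names are hypothetical, and nothing in this thread establishes that the server accepts the rewritten stage on an encrypted collection. The two forms are also not strictly equivalent when the joined fields are arrays or missing.

```python
def _only(mapping, key):
    """Return mapping[key] if key is the mapping's sole key, else raise ValueError."""
    if not isinstance(mapping, dict) or set(mapping) != {key}:
        raise ValueError(key)
    return mapping[key]


def _field_name(expr):
    """Return the field name for a '$field' path, or None for anything else."""
    if isinstance(expr, str) and expr.startswith("$") and not expr.startswith("$$"):
        return expr[1:]
    return None


def rewrite_simple_lookup(stage):
    """Rewrite a single-equality let/pipeline $lookup; return other stages unchanged."""
    try:
        lookup = _only(stage, "$lookup")
        if set(lookup) != {"from", "let", "pipeline", "as"}:
            return stage
        ((variable, local_expr),) = lookup["let"].items()
        (match_stage,) = lookup["pipeline"]
        (condition,) = _only(_only(_only(match_stage, "$match"), "$expr"), "$and")
        left, right = _only(condition, "$eq")
    except (AttributeError, TypeError, ValueError):
        return stage
    local_field = _field_name(local_expr)
    foreign_field = _field_name(right)
    # Only rewrite when the condition is exactly "$$variable == $foreign_field".
    if left != f"$${variable}" or not local_field or not foreign_field:
        return stage
    return {
        "$lookup": {
            "from": lookup["from"],
            "localField": local_field,
            "foreignField": foreign_field,
            "as": lookup["as"],
        }
    }
```

Applied to the first stage of the pipeline above, this yields a $lookup with `localField` set to `content_type_id` and `foreignField` set to `_id`; stages such as `$unwind` pass through untouched.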

Maybe folks can use the mixin with any Django fields we don't provide?
Subclassing `dict` to support `queries=EqualityQuery()` API
- Move aws creds to on-demand credentials provided by libmongocrypt
  (requires `pip install pymongo[aws]`).
- Mock boto3 response
- Not sure if KMS_CREDENTIALS are being used since the tests succeed
  after they pass the boto3 mock.
- Test var cleanup
- Local provider has no configurable env setting
- Kmip provider has configurable provider env only
- Schema map in the client is for development.
- Schema map in collection creation is for production.
- Create data keys for schema map in the client.
- If a schema map is found in the client, use it.
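For the `queries=EqualityQuery()` note above, a `dict` subclass could look roughly like the following. This is a guess at the shape, not the PR's implementation: the `contention` parameter is an assumption, included only because it is an option that Queryable Encryption query configurations can carry.

```python
import json


class EqualityQuery(dict):
    """A query configuration that behaves like a plain dict.

    Hypothetical sketch; the constructor signature is an assumption.
    """

    def __init__(self, *, contention=None):
        super().__init__(queryType="equality")
        if contention is not None:
            self["contention"] = contention


# Because it is a real dict, it serializes without a custom encoder.
serialized = json.dumps(EqualityQuery())
```

Subclassing `dict` means the object can be dropped straight into a fields map and compared against plain dictionaries.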
Still leaving the failing assert in because the diff now looks like

        {
          "bsonType": "bool",
          "path": "is_active",
          "queries": {
            "queryType": "equality"
          },
          "keyId": {
            "$binary": {
-             "base64": "srXESzUzQdq5Vqapl5TqOw==",
+             "base64": "AaTpZO7vSCiDQ/zH7+dfzw==",
              "subType": "04"
            }
          }
        }

which is expected since the command is generating new data keys and
we're comparing the map to the map from the client.
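If the intent is to compare everything except the freshly generated data keys, one option is to drop `keyId` entries from both maps before asserting. A minimal sketch, where `strip_key_ids` is a hypothetical helper and not something in the PR:

```python
def strip_key_ids(value):
    """Recursively remove "keyId" entries from nested dicts and lists."""
    if isinstance(value, dict):
        return {
            key: strip_key_ids(item)
            for key, item in value.items()
            if key != "keyId"
        }
    if isinstance(value, list):
        return [strip_key_ids(item) for item in value]
    return value
```

A test could then assert `strip_key_ids(generated) == strip_key_ids(expected)`, which would still catch differences in `bsonType`, `path`, or `queries`.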
@@ -588,9 +588,17 @@ def django_test_expected_failures(self):
        },
    }

    @cached_property
    def mongodb_version(self):
        return self.connection.get_database_version()  # e.g., (6, 3, 0)

Generally speaking, version checks by drivers should be made through the wire protocol version, not the output of commands like buildInfo (which may be unavailable in situations like e.g. using the stable/versioned API)
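A wire-version check along those lines might look like the sketch below. It assumes the `hello` response is available as a dictionary with a `maxWireVersion` key, and that 21 (the wire version reported by MongoDB 7.0) is the right threshold; both the constant and the function name are illustrative, not part of the backend.

```python
# Assumption: MongoDB 7.0 reports maxWireVersion 21. Adjust if the feature
# being gated has a different minimum server version.
MIN_WIRE_VERSION = 21


def meets_min_wire_version(hello_response, minimum=MIN_WIRE_VERSION):
    """Return True if the server's reported maxWireVersion is at least minimum."""
    return hello_response.get("maxWireVersion", 0) >= minimum
```

With PyMongo the response could come from `client.admin.command("hello")`. Whether that command is usable on an auto-encrypting client is not something this thread confirms.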

Via Anna Henningsen

- Server-side schemas prevent a misconfigured client from accidentally writing unencrypted data
- Client-side schemas prevent a malicious or compromised server from advertising an incorrect schema
- ip address field supported
- slug field unsupported
- TimeField
- URLField
@aclark4life aclark4life requested review from a team, Copilot, timgraham and WaVEV July 25, 2025 20:08

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive support for MongoDB's Queryable Encryption feature to Django MongoDB Backend. It introduces new encrypted model fields, router support, management commands, and documentation for using Queryable Encryption in Django applications.

  • Adds EncryptedModel base class and encrypted field types for sensitive data storage
  • Implements router support for directing encrypted models to encrypted databases
  • Provides management commands and helper utilities for encryption configuration

Reviewed Changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/encryption_/tests.py | Comprehensive test suite covering encrypted fields, KMS credentials, and database operations |
| tests/encryption_/routers.py | Test router for encrypted models directing them to encrypted database |
| tests/encryption_/models.py | Test models demonstrating various encrypted field types and query configurations |
| tests/backend_/test_features.py | Tests for queryable encryption feature detection |
| docs/source/topics/encrypted-models.rst | Documentation explaining encrypted models usage and querying |
| docs/source/topics/known-issues.rst | Known limitations and restrictions for Queryable Encryption |
| docs/source/topics/index.rst | Added encrypted-models to documentation index |
| docs/source/releases/5.2.x.rst | Release notes mentioning Queryable Encryption support |
| docs/source/ref/models/models.rst | Documentation for EncryptedModel class |
| docs/source/ref/models/fields.rst | Documentation for encrypted field types and unsupported fields |
| docs/source/ref/django-admin.rst | Documentation for get_encrypted_fields_map command |
| docs/source/intro/configure.rst | Configuration guidance for encrypted models |
| docs/source/index.rst | Updated main documentation index |
| docs/source/howto/index.rst | Added encryption howto guide |
| docs/source/howto/encryption.rst | Detailed encryption configuration guide |
| docs/source/contents.rst | Updated table of contents |
| docs/source/conf.py | Removed root_doc configuration |
| django_mongodb_backend/schema.py | Schema editor support for creating encrypted collections |
| django_mongodb_backend/routers.py | Router extensions for KMS provider support |
| django_mongodb_backend/models.py | EncryptedModel base class implementation |
| django_mongodb_backend/management/commands/get_encrypted_fields_map.py | Management command for generating encryption schema maps |
| django_mongodb_backend/fields/encrypted_model.py | Encrypted field implementations |
| django_mongodb_backend/fields/__init__.py | Exports for encrypted fields |
| django_mongodb_backend/features.py | Feature detection for queryable encryption support |
| django_mongodb_backend/encryption.py | Helper classes and settings for encryption configuration |
| django_mongodb_backend/base.py | Database version detection fix for encrypted connections |
| django_mongodb_backend/__init__.py | Registration of router extensions |
| .evergreen/setup.sh | CI setup improvements |
| .evergreen/run-encryption-tests.sh | Encryption test runner script |
| .evergreen/config.yml | CI configuration for encryption tests |

# On Evergreen jobs, "CI" will be set, and if "CI" is set, add
# "/opt/python/Current/bin" to PATH to pick up `just` and `uv`.
if [ "${CI:-}" == "true" ]; then
PATH_EXT="opt/python/Current/bin:\$PATH"

Copilot AI Jul 25, 2025

The PATH_EXT variable is missing a leading slash. It should be PATH_EXT="/opt/python/Current/bin:\$PATH" to properly reference the absolute path.

Suggested change
- PATH_EXT="opt/python/Current/bin:\$PATH"
+ PATH_EXT="/opt/python/Current/bin:\$PATH"
