INTPYTHON-527 Add Queryable Encryption support #329
base: main
This comment was marked as resolved.
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
This comment was marked as resolved.
89205af
to
293656a
Compare
tests/encryption_/tests.py
Outdated
# FIXME: Or remove if wontfix.
#
# This test fails due to
# pymongo.errors.OperationFailure: Index not allowed on, or a prefix
# of, the encrypted field slug
with self.assertRaises(AssertionError):  # noqa: SIM117
    with self.assertRaises(pymongo.errors.OperationFailure):

        class SlugFieldTest(models.Model):
            slug = EncryptedSlugField(EqualityQuery())
I believe the issue is that SlugField has db_index=True. Slugs are generally used in URLs and it seems they would generally not be sensitive data that needs to be encrypted.
The limitation that encrypted fields can't be indexed seems a point worth documenting.
I'm not sure if they can't be indexed or if they are auto indexed. The docs say: "Queryable Encryption does not support TTL Indexes or Unique Indexes."
f0d4e92
to
4ef6b84
Compare
3815a70
to
a6a12e7
Compare
The encryption tests are passing locally for me on Enterprise and on the Atlas VM. On GitHub actions, this first issue was solved by adding
But this issue remains:
tests/encryption_/tests.py
Outdated
self.assertEqual(
    PatientRecord.objects.get(ssn="123-45-6789").profile_picture, b"image data"
)
with self.assertRaises(AssertionError):
What's your thinking about the usefulness of this assertion? If assertEqual(profile_picture, b"image data") passed, then of course asserting it's not equal to something else is going to work? (Incidentally, assertNotEqual() is more natural than assertEqual() + assertRaises().)
More generally, it seems like you weren't sure exactly what to test here, so you wrote various things that came to mind. Maybe we need to define the test conditions so we can have some more standardized testing.
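The point about assertion style can be illustrated with a small, self-contained unittest sketch. The `Record` class is a hypothetical stand-in, not the PR's `PatientRecord` model, and the byte strings are made up; the sketch only shows that wrapping `assertEqual()` in `assertRaises(AssertionError)` checks the same condition as a single `assertNotEqual()`.

```python
import unittest


class Record:
    """Hypothetical stand-in for a model instance (not the PR's PatientRecord)."""

    profile_picture = b"image data"


class AssertionStyleTest(unittest.TestCase):
    def test_roundabout(self):
        record = Record()
        # Passes only because the inner assertEqual() fails.
        with self.assertRaises(AssertionError):
            self.assertEqual(record.profile_picture, b"bad image data")

    def test_direct(self):
        record = Record()
        # Same check, stated directly.
        self.assertNotEqual(record.profile_picture, b"bad image data")


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionStyleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests pass for the same reason, which is why the second form is usually easier to read and gives a clearer failure message.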
Since we're relying on the encryption algorithm to detect changes I wanted to see it pass and fail, and yes the plan was to add fields to the patient-themed test suite, expanding on the tutorial example.
Since we're relying on the encryption algorithm to detect changes
I don't understand.
tests/encryption_/tests.py
Outdated
# FIXME: pymongo.errors.EncryptionError: Cannot encrypt element of type int
# because schema requires that type is one of: [ long ]
# pos_int=1,
The data type for each field is determined by this mapping:
django-mongodb-backend/django_mongodb_backend/base.py
Lines 36 to 63 in d5aa1a7
data_types = {
    "AutoField": "int",
    "BigAutoField": "long",
    "BinaryField": "binData",
    "BooleanField": "bool",
    "CharField": "string",
    "DateField": "date",
    "DateTimeField": "date",
    "DecimalField": "decimal",
    "DurationField": "long",
    "FileField": "string",
    "FilePathField": "string",
    "FloatField": "double",
    "IntegerField": "int",
    "BigIntegerField": "long",
    "GenericIPAddressField": "string",
    "JSONField": "object",
    "OneToOneField": "int",
    "PositiveBigIntegerField": "int",
    "PositiveIntegerField": "long",
    "PositiveSmallIntegerField": "int",
    "SlugField": "string",
    "SmallAutoField": "int",
    "SmallIntegerField": "int",
    "TextField": "string",
    "TimeField": "date",
    "UUIDField": "string",
}
It seems the mapping has some mistakes. For example, PositiveBigIntegerField should be long (64-bit) [I think]. That said, this is an issue that should be corrected in a separate PR.
The question remains how to send the value to the database as a long to avoid this error. Frankly, I wouldn't expect any special handling to be needed, but maybe Jib has some idea. (Was this already discussed when you ran into the error for DurationField?)
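One likely explanation for the error: PyMongo encodes a plain Python int as a 32-bit BSON int when the value fits and as a 64-bit long otherwise, so a small value such as 1 would not match a field whose schema only allows long. The sketch below models that range rule with the standard library only. `bson_int_type` and `matches_schema` are hypothetical helpers, not PyMongo functions. Wrapping the value in `bson.int64.Int64` is one possible way to force a long, though whether that is the right fix for this backend is still open in this thread.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def bson_int_type(value: int) -> str:
    """Return the BSON type alias a plain Python int would get under the
    assumed rule: "int" if it fits in 32 bits, otherwise "long"."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected a plain int")
    if INT32_MIN <= value <= INT32_MAX:
        return "int"
    if INT64_MIN <= value <= INT64_MAX:
        return "long"
    raise OverflowError("value does not fit in a 64-bit BSON integer")


def matches_schema(value: int, allowed_types: list[str]) -> bool:
    """True if the inferred BSON type is one the encrypted field accepts."""
    return bson_int_type(value) in allowed_types
```

Under this rule, `matches_schema(1, ["long"])` is false, which mirrors the "Cannot encrypt element of type int" message in the FIXME above.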
It has not been discussed and now that you mention it, DurationField may have been similar.
6ab0a86
to
fb1e120
Compare
docs/source/faq.rst
Outdated
"OPTIONS": {
    "auto_encryption_opts": AutoEncryptionOpts(
        …
        schema_map= {
I would very strongly recommend not using the phrase "schema_map" in the context of QE.
"schema_map" and "encrypted_fields_map" are both examples of existing terminology that we already use, with both playing the role of referring to a map that declares which fields should be encrypted and how.
But "schema_map" is specific to CSFLE and "encrypted_fields_map" specific to QE, and calling this "schema_map" feels like it's bound to:
- make people believe they're using CSFLE instead of QE and/or
- make our support staff believe they're using CSFLE instead of QE and/or
- lead users to try to pass it in the wrong place (i.e. as the "schema_map" auto-encryption option of PyMongo, when they really should be passing it as the "encrypted_fields_map" option).
Anna, thanks for the clarification. The differences between CSFLE and QE have really been confusing for us. The documentation for AutoEncryptionOpts describes it as "Automatic Client-Side Field Level Encryption" but apparently it's for Queryable Encryption too...
One point perhaps you can clarify. Do we need to specify keyId in the encrypted_fields_map? This example says, "If you are using explicit encryption, add a keyId field with the DEK ID". On the other hand, pymongo's docs for encrypted_fields_map includes keyId.
One point perhaps you can clarify. Do we need to specify keyId in the encrypted_fields_map? This example says, "If you are using explicit encryption, add a keyId field with the DEK ID". On the other hand, pymongo's docs for encrypted_fields_map includes keyId.
I believe I have clarified this with @addaleax, but it never hurts to hear it again! The manual tests I've done prove we need keyId for the client, and ClientEncryption.create_encrypted_collection creates them on the server side.
@addaleax what do you recommend instead of schema_map, schema maybe? qe_schema?
Isn't it supposed to be AutoEncryptionOpts(encrypted_fields_map=...)?
The documentation for AutoEncryptionOpts describes it as "Automatic Client-Side Field Level Encryption" but apparently it's for Queryable Encryption too...
Yeah, the naming of pretty much everything in this area is slightly unfortunate in general. Since CSFLE predates QE but a lot of the technology stack and the configuration is shared, you'll unfortunately find references to CSFLE in docs for features of the QE stack as well. In theory, we've decided to adopt "In-Use Encryption" as the umbrella term for both CSFLE and QE, but I wouldn't say that it has really caught on.
what do you recommend instead of schema_map, schema maybe? qe_schema?
If this maps to the encrypted_fields_map option in AutoEncryptionOpts, which it appears to do, yes, I'd definitely recommend sticking to that name here too (i.e. encrypted_fields_map).
One point perhaps you can clarify. Do we need to specify keyId in the encrypted_fields_map? This example says, "If you are using explicit encryption, add a keyId field with the DEK ID". On the other hand, pymongo's docs for encrypted_fields_map includes keyId.
Yeah, another big "sigh" to let out here 😅 If you're using the driver's create_encrypted_collection helper and you're using automatic encryption, then yes, the driver will create key IDs for you. That sounds simple in theory, but is still a bit of a hassle in practice, because you do need to persist the resulting encrypted_fields configuration which includes key IDs if you intend to use it as part of a client-side encrypted_fields map (which is a good practice).
You're never going to do anything wrong by just creating keys yourself here. The key creation feature of create_encrypted_collection is purely a convenience feature, and ultimately when the application runs, it will have to have the correct key IDs available. That can come from the server-side encryptedFields map, but if you have a client-side encrypted_fields_map, then that will also need to include the right key IDs.
(This was supposed to help with the fact that you have a bit of a chicken-and-egg situation when setting up QE manually; you need a QE-enabled MongoClient to create keys for an encrypted collection, but you can only specify the correct key IDs after creating them, so you can't start out with the right encryptedFieldsMap for that initial MongoClient because it does require those…)
That's not something that will be well-received by Djangonauts and at the very least we'll need to:
Document a workaround (export/import ? )
Give an ETA on when the next QE release will address the issue (assuming this is possible)
Yeah, re-creating collections and migrating data would be the primary workaround. First-class migration support in mongosync is part of https://jira.mongodb.org/browse/REP-3483, but I don't know when this would realistically be scheduled (you may want to check in #fle-qe-devs).
Not sure if it's the most secure solution, but from a usability perspective, I think the easiest thing would be to keep the existing workflow and have showschemamap retrieve the keyIds from the server so it can include them in its output.
I'm also wondering about the workflow for a dev/prod environment. For example, user commits AutoEncryptionOpts(encrypted_fields_map=...) with the keyIds for their local environment. Will this break if they use the same settings in production, i.e. will create_encrypted_collection() use keyIds from encrypted_fields_map?
(And frankly, I'm wondering if client-side schema validation is even in scope for our v1 of queryable encryption since this is complicated and the design document doesn't mention anything about it!)
i.e. will create_encrypted_collection() use keyIds from encrypted_fields_map?
Yes. create_encrypted_collection() will only create new keys if the encrypted_fields_map doesn't already provide keys for all fields.
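The behavior described in this answer can be sketched as a check over the `fields` list of an encrypted fields document. This is a standard-library illustration, not PyMongo's implementation: `fields_needing_keys` and the sample field definitions are hypothetical, and treating a missing or `None` `keyId` as "needs a key" is an assumption.

```python
def fields_needing_keys(encrypted_fields: dict) -> list[str]:
    """Return the paths of fields that do not yet carry a keyId."""
    return [
        field["path"]
        for field in encrypted_fields.get("fields", [])
        if field.get("keyId") is None
    ]


# Hypothetical field definitions. The keyId here is a placeholder string;
# a real map would hold a BSON Binary UUID instead.
patient_fields = {
    "fields": [
        {
            "path": "ssn",
            "bsonType": "string",
            "keyId": "placeholder-key-id",
            "queries": {"queryType": "equality"},
        },
        {"path": "profile_picture", "bsonType": "binData", "keyId": None},
    ]
}

missing = fields_needing_keys(patient_fields)
```

With this sample, only `profile_picture` would need a new data key. A map copied from a development environment with every `keyId` filled in would leave nothing to create, which is the dev/prod concern raised in the question.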
I'm also wondering about the workflow for a dev/prod environment. For example, user commits AutoEncryptionOpts(encrypted_fields_map=...) with the keyIds for their local environment.
Yes, this is one of the pains of bootstrapping CSFLE/QE applications. I'd typically consider client-side schemas to be configuration data that shouldn't be committed to the mainline repository for this reason, but I am sure there are developers who do it without that necessarily being a wrong path.
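One way to treat a client-side map as per-environment configuration, along the lines suggested here, is to load it from a file whose path comes from the environment. This is a minimal standard-library sketch. The `ENCRYPTED_FIELDS_MAP_PATH` variable name and the `load_encrypted_fields_map` helper are hypothetical. A real map containing binary key IDs would likely need `bson.json_util` rather than plain `json`.

```python
import json
import os
from typing import Optional


def load_encrypted_fields_map(
    env_var: str = "ENCRYPTED_FIELDS_MAP_PATH",
) -> Optional[dict]:
    """Load a per-environment encrypted fields map from a JSON file.

    Returns None when the environment variable is unset, so callers can
    fall back to a setup without a client-side map.
    """
    path = os.environ.get(env_var)
    if not path:
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A settings module could call this helper and pass the result on only when it is not `None`, keeping development key IDs out of the committed settings.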
(And frankly, I'm wondering if client-side schema validation is even in scope for our v1 of queryable encryption since this is complicated and the design document doesn't mention anything about it!)
Yes, you may want to verify whether this is the case or not. Client-side schemas are something that all our tools support and there are relevant security reasons for doing so (in a similar vein to what I already mentioned above, client-side schemas protect against compromised database servers which advertise incorrect schemas) – but they're still not necessarily part of every setup.
I'd also encourage you to reach out in #fle-qe-devs – otherwise I'm also happy to be the one to start a discussion there. While I have a deep understanding of the technical aspects of QE and CSFLE, the PMs for QE are more familiar with what customers do in the real world and what the best practices are that we recommend (for example, I know that client-side schemas are something that isn't considered a necessity for every use case).
(And frankly, I'm wondering if client-side schema validation is even in scope for our v1 of queryable encryption since this is complicated and the design document doesn't mention anything about it!)
We can consider leaving out client side for GA, however I'm inclined to go for it since we want to present as complete and compelling of a feature as we can given the inherent limitations and we've made a lot of progress in understanding the requirements.
I'm rethinking the design now, as well as confirming what PyMongo does for us re: data keys when we call AutoEncryptionOpts with encrypted_fields_map instead of schema_map.
I'll have some pushes coming later tonight and/or tomorrow morning in which I hope to resolve a significant amount of the issues raised today. Thanks @timgraham and @addaleax !
- Client-side QE configuration mistakenly used `schema_map` to pass the encrypted fields map to Django's schema editor through `AutoEncryptionOpts`. Although confusing, and despite the error, client-side configuration still succeeded because the map given to `AutoEncryptionOpts` in `schema_map` was then correctly passed to `create_collection` via the `encryptedFields` arg.
- Re-confirmed in local manual testing that client-side configuration works as expected and requires data keys. There is no code (as far as I can tell) to create data keys in PyMongo that is initiated by the existence of `encrypted_fields_map` alone. Rather, data keys appear to be created in `create_encrypted_collection` and only in `create_encrypted_collection`.
- Renamed `showschemamap` -> `showfieldsmap` and updated tests and docs accordingly.
- `showfieldsmap` needs to be renamed to `createfieldsmap` and we need another management command called `showfieldsmap` to retrieve the keys from the key vault (either from server side or client side generation.)
- In order to retrieve keys from the vault they need to have a keyVaultName and PyMongo's `create_data_keys` does not provide this.
- If `encrypted_fields_map` is present, use it to create the collection, else create the collection with `create_collection` and our `encrypted_fields_map`.
- We may consider requiring users to provide an empty dictionary to initiate server side encryption rather than relying on the absence of `encrypted_fields_map`. PyMongo uses this convention in its KMS code and in this case it emphasizes the need for a map and when and how the map is created.
data_key = ce.create_data_key(
    kms_provider=kms_provider,
    master_key=master_key,
    key_alt_names=[key_alt_name],
)
field["keyId"] = data_key
field["keyAltName"] = key_alt_name
Should keyId and keyAltName be generated in _get_encrypted_fields_map() rather than in both places after _get_encrypted_fields_map() is called? It looks like a lot of repeated logic. Is the management command but not _create_collection() supposed to set field["keyAltName"]?
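The refactoring this comment asks about could look roughly like the sketch below: one helper that fills in `keyId` and, optionally, `keyAltName`, called from both places. Everything here is hypothetical and standard-library only. `assign_data_keys`, the `prefix.path` alt-name format, and `fake_create_data_key` (a stand-in for `ClientEncryption.create_data_key()`) are illustrative names, not code from the PR.

```python
import copy
import uuid
from typing import Callable


def assign_data_keys(
    fields: list[dict],
    create_data_key: Callable[[str], object],
    alt_name_prefix: str,
    include_alt_name: bool = True,
) -> list[dict]:
    """Return a copy of `fields` with keyId (and optionally keyAltName) set.

    `create_data_key` receives the key alt name and returns the new key ID,
    so the caller decides how keys are actually created.
    """
    result = copy.deepcopy(fields)
    for field in result:
        key_alt_name = f"{alt_name_prefix}.{field['path']}"
        field["keyId"] = create_data_key(key_alt_name)
        if include_alt_name:
            field["keyAltName"] = key_alt_name
    return result


def fake_create_data_key(key_alt_name: str) -> uuid.UUID:
    """Stand-in key factory that returns a deterministic UUID per alt name."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, key_alt_name)
```

The `include_alt_name` flag makes the difference between the two call sites explicit, which is the inconsistency the last question points at.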
@@ -0,0 +1,52 @@
from bson import json_util
Why not showencryptedfieldsmap for the command name?
django_mongodb_backend/schema.py
Outdated
ae = getattr(options, "auto_encryption_opts", None)
if not ae:
Two letter variable names making a comeback? 👿
docs/source/faq.rst
Outdated
In addition to the
:ref:`settings described in the how-to guide <server-side-queryable-encryption-settings>`,
you will need to provide a ``schema_map`` to the ``AutoEncryptionOpts``.
There are still several mentions of schema_map throughout this PR.
"ObjectIdAutoField",
"ObjectIdField",
"PolymorphicEmbeddedModelArrayField",
"PolymorphicEmbeddedModelField",
"RangeQuery",
"has_encrypted_fields",
has_encrypted_fields looks strange to me here. django.db.models.fields doesn't provide any ancillary functions like this. Is it meant to be a public API? I'm not sure if users would need it or not. Anyway, I'm thinking something like django_mongodb_backend.models.utils.model_has_encrypted_field() might be better.
Moved to model_utils because model/utils sent me down a circular imports spiral …
for app_config in apps.get_app_configs():
    for model in app_config.get_models():
        db_table = model._meta.db_table
I've lost track of your comment on why router.get_migratable_models() can no longer be used. If we don't consult the database routers, we'll incorrectly include models that are excluded from the specified --database.
I was guessing at the time that it had something to do with no longer having EncryptedModel but not sure.
- Remove query type helpers
- Remove KMS AWS patches
- Refactor encrypted fields map creation
- Remove unused test code
- Doc fixes and updates
- Move has_encrypted_fields to model_utils
- Renamed showfieldsmap -> createencryptedfieldsmap