Skip to content

Slow joins for ManyToMany fields #309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
skonstant opened this issue May 31, 2025 · 5 comments
Open

Slow joins for ManyToMany fields #309

skonstant opened this issue May 31, 2025 · 5 comments

Comments

@skonstant
Copy link

I am running into a tricky issue, the joins are slow for my ManyToMany fields.

observation.subscopes.all()

Generates this:

[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$match': { '$expr': { '$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')] } } }, { '$facet': { 'group': [{ '$group': { '__count': { '$sum': { '$cond': { 'if': { '$in': [{ '$type': { '$literal': True } }, ['missing', 'null'] ] }, 'then': None, 'else': 1 } } }, '_id': None } }] } }, { '$addFields': { '__count': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '__count' } }, '_id': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '_id' } } } }, { '$project': { '__count': { '$ifNull': ['$__count', { '$literal': 0 }] } } }]

And this takes 3 seconds because the "pivot table" is very large, even though the join is on a specific "observation id", a field that is indexed.

If I move the match inside the lookup, the same qery takes milliseconds.

[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] },{ '$eq': ['$observation_id', ObjectId('6836e1718c5bb0a20fab6467')] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$limit': 21 }]

I took a quick look at the code that generates the lookup and at first sight I was not clear where to make the change, I'll dig a bit deeper but perhaps you already know where to look to implement the fix.

This is particularly penalizing, as we use a few foreign keys and many to manys on fairly large collections, and would like to keep it this way.

It looks like the fix in the query is simple, it is probably much more complex in the generator.

@skonstant
Copy link
Author

I wrote a little patch that places the eq on the join column in the lookup pipeline here

@skonstant
Copy link
Author

Subject: [PATCH] add join where eq to lookup
---
Index: django_mongodb_backend/query.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/django_mongodb_backend/query.py b/django_mongodb_backend/query.py
--- a/django_mongodb_backend/query.py	(revision c5bfebfb8a18b3e130826bdaf4ea58b80150051c)
+++ b/django_mongodb_backend/query.py	(date 1748801564404)
@@ -172,6 +172,16 @@
     else:
         extra_condition = []
 
+        where = compiler.get_where()
+        try:
+            expr = where.as_mql(compiler, connection) if where else {}
+        except FullResultSet:
+            expr = None
+
+        if expr and "$eq" in expr and len(expr["$eq"]) > 0 and expr["$eq"][0].startswith("$" + self.table_alias):
+            expr["$eq"][0] = "$" + expr["$eq"][0].removeprefix("$" + self.table_alias + ".")
+            extra_condition.append(expr)
+
     lookup_pipeline = [
         {
             "$lookup": {

@WaVEV
Copy link
Collaborator

WaVEV commented Jun 1, 2025

Hi Stephane, thanks for reaching out. Just to clarify, the improvements came from splitting the match in two parts, right?
EDIT: Could you share the code that makes this query? it seems that the first one is count and the second is a listing with a limit.

@skonstant
Copy link
Author

yes, the first one is a count, here it is again:

The subscope field i the observation model looks like this:

subscopes = models.ManyToManyField(Subscope)

Then I run this, just to expand the subscopes queryset:

o = Observation.objects.all()[0]
list(o.subscopes.all())

And without my patch I get this log trace with DB logging set to debug:

(0.079) db.orm_observation.aggregate([{'$match': {'$expr': {}}}, {'$limit': 1}])
(6.261) db.orm_subscope.aggregate([{'$lookup': {'from': 'orm_observation_subscopes', 'let': {'parent__field__0': '$_id'}, 'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0', '$subscope_id']}]}}}], 'as': 'orm_observation_subscopes'}}, {'$unwind': '$orm_observation_subscopes'}, {'$match': {'$expr': {'$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')]}}}])

And with my patch:

(0.079) db.orm_observation.aggregate([{'$match': {'$expr': {}}}, {'$limit': 1}])
(0.077) db.orm_subscope.aggregate([{'$lookup': {'from': 'orm_observation_subscopes', 'let': {'parent__field__0': '$_id'}, 'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0', '$subscope_id']}, {'$eq': ['$observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')]}]}}}], 'as': 'orm_observation_subscopes'}}, {'$unwind': '$orm_observation_subscopes'}, {'$match': {'$expr': {'$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')]}}}])

@aclark4life
Copy link
Collaborator

Thanks for the feedback, @skonstant ! Much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants