Slow joins for ManyToMany fields #309

skonstant · 2025-05-31T23:07:45Z

I am running into a tricky issue, the joins are slow for my ManyToMany fields.

observation.subscopes.all()

Generates this:

[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$match': { '$expr': { '$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')] } } }, { '$facet': { 'group': [{ '$group': { '__count': { '$sum': { '$cond': { 'if': { '$in': [{ '$type': { '$literal': True } }, ['missing', 'null'] ] }, 'then': None, 'else': 1 } } }, '_id': None } }] } }, { '$addFields': { '__count': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '__count' } }, '_id': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '_id' } } } }, { '$project': { '__count': { '$ifNull': ['$__count', { '$literal': 0 }] } } }]

And this takes 3 seconds because the "pivot table" is very large, even though the join is on a specific "observation id", a field that is indexed.

If I move the match inside the lookup, the same qery takes milliseconds.

[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] },{ '$eq': ['$observation_id', ObjectId('6836e1718c5bb0a20fab6467')] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$limit': 21 }]

I took a quick look at the code that generates the lookup and at first sight I was not clear where to make the change, I'll dig a bit deeper but perhaps you already know where to look to implement the fix.

This is particularly penalizing, as we use a few foreign keys and many to manys on fairly large collections, and would like to keep it this way.

It looks like the fix in the query is simple, it is probably much more complex in the generator.

The text was updated successfully, but these errors were encountered:

skonstant · 2025-06-01T18:36:28Z

I wrote a little patch that places the eq on the join column in the lookup pipeline here

skonstant · 2025-06-01T18:40:31Z

Subject: [PATCH] add join where eq to lookup
---
Index: django_mongodb_backend/query.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/django_mongodb_backend/query.py b/django_mongodb_backend/query.py
--- a/django_mongodb_backend/query.py	(revision c5bfebfb8a18b3e130826bdaf4ea58b80150051c)
+++ b/django_mongodb_backend/query.py	(date 1748801564404)
@@ -172,6 +172,16 @@
     else:
         extra_condition = []
 
+        where = compiler.get_where()
+        try:
+            expr = where.as_mql(compiler, connection) if where else {}
+        except FullResultSet:
+            expr = None
+
+        if expr and "$eq" in expr and len(expr["$eq"]) > 0 and expr["$eq"][0].startswith("$" + self.table_alias):
+            expr["$eq"][0] = "$" + expr["$eq"][0].removeprefix("$" + self.table_alias + ".")
+            extra_condition.append(expr)
+
     lookup_pipeline = [
         {
             "$lookup": {

WaVEV · 2025-06-01T21:35:00Z

Hi Stephane, thanks for reaching out. Just to clarify, the improvements came from splitting the match in two parts, right?
EDIT: Could you share the code that makes this query? it seems that the first one is count and the second is a listing with a limit.

skonstant · 2025-06-02T06:34:04Z

yes, the first one is a count, here it is again:

The subscope field i the observation model looks like this:

subscopes = models.ManyToManyField(Subscope)

Then I run this, just to expand the subscopes queryset:

o = Observation.objects.all()[0]
list(o.subscopes.all())

And without my patch I get this log trace with DB logging set to debug:

(0.079) db.orm_observation.aggregate([{'$match': {'$expr': {}}}, {'$limit': 1}])
(6.261) db.orm_subscope.aggregate([{'$lookup': {'from': 'orm_observation_subscopes', 'let': {'parent__field__0': '$_id'}, 'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0', '$subscope_id']}]}}}], 'as': 'orm_observation_subscopes'}}, {'$unwind': '$orm_observation_subscopes'}, {'$match': {'$expr': {'$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')]}}}])

And with my patch:

(0.079) db.orm_observation.aggregate([{'$match': {'$expr': {}}}, {'$limit': 1}])
(0.077) db.orm_subscope.aggregate([{'$lookup': {'from': 'orm_observation_subscopes', 'let': {'parent__field__0': '$_id'}, 'pipeline': [{'$match': {'$expr': {'$and': [{'$eq': ['$$parent__field__0', '$subscope_id']}, {'$eq': ['$observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')]}]}}}], 'as': 'orm_observation_subscopes'}}, {'$unwind': '$orm_observation_subscopes'}, {'$match': {'$expr': {'$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')]}}}])

aclark4life · 2025-06-02T13:20:02Z

Thanks for the feedback, @skonstant ! Much appreciated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Slow joins for ManyToMany fields #309

Slow joins for ManyToMany fields #309

skonstant commented May 31, 2025

skonstant commented Jun 1, 2025

Uh oh!

skonstant commented Jun 1, 2025

Uh oh!

WaVEV commented Jun 1, 2025 •

edited

Loading

Uh oh!

skonstant commented Jun 2, 2025

Uh oh!

aclark4life commented Jun 2, 2025

Uh oh!

Slow joins for ManyToMany fields #309

Slow joins for ManyToMany fields #309

Comments

skonstant commented May 31, 2025

skonstant commented Jun 1, 2025

Uh oh!

skonstant commented Jun 1, 2025

Uh oh!

WaVEV commented Jun 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

skonstant commented Jun 2, 2025

Uh oh!

aclark4life commented Jun 2, 2025

Uh oh!

WaVEV commented Jun 1, 2025 •

edited

Loading