-
Notifications
You must be signed in to change notification settings - Fork 20
Slow joins for ManyToMany fields #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I wrote a little patch that places the eq on the join column in the lookup pipeline here |
|
Hi Stephane, thanks for reaching out. Just to clarify, the improvements came from splitting the match in two parts, right? |
yes, the first one is a count, here it is again: The subscope field i the observation model looks like this:
Then I run this, just to expand the subscopes queryset:
And without my patch I get this log trace with DB logging set to debug: (0.079) db.orm_observation.aggregate([{'$match': {'$expr': {}}}, {'$limit': 1}]) And with my patch: (0.079) db.orm_observation.aggregate([{'$match': {'$expr': {}}}, {'$limit': 1}]) |
Thanks for the feedback, @skonstant ! Much appreciated. |
I am running into a tricky issue, the joins are slow for my ManyToMany fields.
observation.subscopes.all()
Generates this:
[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$match': { '$expr': { '$eq': ['$orm_observation_subscopes.observation_id', ObjectId('6836e1278c5bb0a20fa1aed1')] } } }, { '$facet': { 'group': [{ '$group': { '__count': { '$sum': { '$cond': { 'if': { '$in': [{ '$type': { '$literal': True } }, ['missing', 'null'] ] }, 'then': None, 'else': 1 } } }, '_id': None } }] } }, { '$addFields': { '__count': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '__count' } }, '_id': { '$getField': { 'input': { '$arrayElemAt': ['$group', 0] }, 'field': '_id' } } } }, { '$project': { '__count': { '$ifNull': ['$__count', { '$literal': 0 }] } } }]
And this takes 3 seconds because the "pivot table" is very large, even though the join is on a specific "observation id", a field that is indexed.
If I move the match inside the lookup, the same qery takes milliseconds.
[{ '$lookup': { 'from': 'orm_observation_subscopes', 'let': { 'parent__field__0': '$_id' }, 'pipeline': [{ '$match': { '$expr': { '$and': [{ '$eq': ['$$parent__field__0', '$subscope_id'] },{ '$eq': ['$observation_id', ObjectId('6836e1718c5bb0a20fab6467')] }] } } }], 'as': 'orm_observation_subscopes' } }, { '$unwind': '$orm_observation_subscopes' }, { '$limit': 21 }]
I took a quick look at the code that generates the lookup and at first sight I was not clear where to make the change, I'll dig a bit deeper but perhaps you already know where to look to implement the fix.
This is particularly penalizing, as we use a few foreign keys and many to manys on fairly large collections, and would like to keep it this way.
It looks like the fix in the query is simple, it is probably much more complex in the generator.
The text was updated successfully, but these errors were encountered: