fix #1806: N+1 query issue on dashboard index page #1813

ontowhee · 2024-12-09T15:45:34Z

The existing view looped over the metrics and fetched the latest Datum for each one, which django-debug-toolbar reported as 18 similar queries. The page ran 45 SQL queries overall.

This is WIP. The change here replaces those per-metric queries with a single large statement: a union of each metric's latest datum. The page now runs 9 SQL queries overall.

Changes include:

Query the content type for each metric, which is used to build the union.
Query Datum by building a union of all the per-metric querysets.
Add select_related('category') to the metric query, since the category is used for sorting by display_position.
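
To make the difference concrete, here is a minimal sketch of the two query shapes using the standard-library sqlite3 module rather than the Django ORM. The table layout, the sample rows, and the helper names (`latest_n_plus_one`, `latest_union`) are assumptions for illustration only, not the project's actual schema or code.

```python
import sqlite3

# Hypothetical, simplified schema: a single table standing in for Datum.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datum "
    "(content_type_id INT, object_id INT, timestamp INT, measurement INT)"
)
conn.executemany(
    "INSERT INTO datum VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, 5), (1, 10, 200, 7),  # metric (1, 10): latest is 7
        (1, 11, 150, 3),                   # metric (1, 11): latest is 3
        (2, 10, 120, 9), (2, 10, 90, 4),   # metric (2, 10): latest is 9
    ],
)
metrics = [(1, 10), (1, 11), (2, 10)]  # (content_type_id, object_id) pairs

LATEST_ONE = (
    "SELECT content_type_id, object_id, measurement FROM datum "
    "WHERE content_type_id = ? AND object_id = ? "
    "ORDER BY timestamp DESC LIMIT 1"
)


def latest_n_plus_one(conn, metrics):
    """One query per metric (the N+1 pattern). Returns (result, query count)."""
    result, query_count = {}, 0
    for ct_id, obj_id in metrics:
        row = conn.execute(LATEST_ONE, (ct_id, obj_id)).fetchone()
        query_count += 1
        if row is not None:
            result[(row[0], row[1])] = row[2]
    return result, query_count


def latest_union(conn, metrics):
    """A single statement: the UNION of every per-metric 'latest row' query."""
    if not metrics:
        return {}, 0
    sql = " UNION ".join(f"SELECT * FROM ({LATEST_ONE})" for _ in metrics)
    params = [value for pair in metrics for value in pair]
    rows = conn.execute(sql, params).fetchall()
    return {(ct, obj): m for ct, obj, m in rows}, 1
```

Both helpers return the same mapping; only the number of round trips to the database differs.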

ontowhee · 2024-12-10T18:57:52Z

dashboard/views.py

+            metrics.extend(
+                MC.objects.filter(show_on_dashboard=True).select_related("category")
+            )
+
+        content_types = ContentType.objects.get_for_models(*metrics)
+        datum_queryset = Datum.objects.none()
+        for metric, content_type in content_types.items():
+            datum_queryset = datum_queryset.union(
+                Datum.objects.filter(
+                    content_type_id=content_type.id, object_id=metric.id
+                ).order_by("-timestamp")[0:1]
+            )
+
+        latest_datums = {
+            (datum.object_id, datum.content_type_id): datum for datum in datum_queryset
+        }

        data = []
-        for metric in metrics:
-            data.append({"metric": metric, "latest": metric.data.latest()})
+        for metric, content_type in content_types.items():
+            if latest := latest_datums.get((metric.id, content_type.id)):
+                data.append({"metric": metric, "latest": latest})


Is there a better way to write this?

bmispelon · Dec 10, 2024

I've been trying to find an alternate approach. I got pretty close with Datum.objects.values('content_type', 'object_id').annotate(Max('timestamp')), but unfortunately that can only get the latest timestamp for each metric, not the associated measurement.
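
The limitation is easier to see in plain SQL. The sketch below uses sqlite3 with a hypothetical, simplified datum table: grouping yields only the grouping columns plus the aggregate, so the measurement has to be recovered some other way. The join-back shown here is one generic option and is not what this PR does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datum "
    "(content_type_id INT, object_id INT, timestamp INT, measurement INT)"
)
conn.executemany(
    "INSERT INTO datum VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 5), (1, 10, 200, 7), (1, 11, 150, 3), (2, 10, 120, 9)],
)

# Aggregate only: the latest timestamp per metric, with no measurement column.
latest_timestamps = conn.execute(
    "SELECT content_type_id, object_id, MAX(timestamp) FROM datum "
    "GROUP BY content_type_id, object_id "
    "ORDER BY content_type_id, object_id"
).fetchall()

# Joining the aggregate back onto the table recovers the full row.
# Note: two rows sharing the maximum timestamp would both be returned.
latest_rows = conn.execute(
    "SELECT d.content_type_id, d.object_id, d.timestamp, d.measurement "
    "FROM datum AS d JOIN ("
    "  SELECT content_type_id, object_id, MAX(timestamp) AS ts FROM datum "
    "  GROUP BY content_type_id, object_id"
    ") AS latest "
    "ON d.content_type_id = latest.content_type_id "
    "AND d.object_id = latest.object_id "
    "AND d.timestamp = latest.ts "
    "ORDER BY d.content_type_id, d.object_id"
).fetchall()
```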

Your original approach using UNION seems quite good (and it's quite clever).

I've made a suggestion for a slight rewrite that is a bit shorter and also closer to the original view implementation (it does so by pushing some logic onto the Datum manager). If you like it I can push a commit onto this branch.

ontowhee · Dec 10, 2024

Sounds good! It's very interesting how you've moved most of the logic to DatumQuerySet!

The only concern is that latest = latest_by_metric.get((metric.content_type.pk, metric.pk)) could return None. Will the frontend be able to handle rendering of None?

bmispelon · Dec 10, 2024

Well, the current behavior for a metric with no data is for the view to crash, so it can't be worse than that 😁
(good call though, I'll test it out)
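
For reference, the None case can be sketched in plain Python. The dictionary contents and the helper names (`build_rows`, `format_latest`) are hypothetical stand-ins, not the view's real code; the point is only that a metric with no datum maps to None, so whatever renders the row has to check for it.

```python
from typing import Optional

# Hypothetical stand-ins: keys are (content_type_id, object_id) pairs.
latest_by_metric = {(1, 10): 7, (1, 11): 3}
metrics = [(1, 10), (1, 11), (2, 10)]  # (2, 10) has no data


def build_rows(metrics, latest_by_metric):
    """Pair each metric with its latest value, or None when it has no data."""
    return [
        {"metric": key, "latest": latest_by_metric.get(key)} for key in metrics
    ]


def format_latest(latest: Optional[int]) -> str:
    """Render a placeholder instead of failing on a missing value."""
    return "no data" if latest is None else str(latest)
```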

ontowhee · 2024-12-10T18:58:28Z

dashboard/tests.py

@@ -33,10 +33,12 @@ def setUp(self):
    def test_index(self):
        for MC in Metric.__subclasses__():
            for metric in MC.objects.filter(show_on_dashboard=True):
+                metric.data.create(measurement=44)


Adding extra data to ensure the query will retrieve the latest datum for each metric.

bmispelon · 2024-12-12T20:22:27Z

@ontowhee It took a few tries, but I finally landed on something that I think I like. I moved a lot of the logic onto manager methods so I could simplify the view (and it also simplified the N+1 tests, since I could test directly against the manager method rather than involve the view).

I managed to switch to a subquery instead of a union which I think simplifies the code, but I don't know if the performance is better/worse/equivalent. Do you know anything about that?
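
As a rough picture of the subquery shape, here is a sketch in raw SQL via sqlite3. The two tables and their rows are hypothetical and ignore content types entirely. In the Django ORM this kind of query is usually built with Subquery and OuterRef, but whether commit 2288df6 does exactly that is not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metric (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE datum (object_id INT, timestamp INT, measurement INT);
    INSERT INTO metric VALUES (10, 'open tickets'), (11, 'patches'), (12, 'empty');
    INSERT INTO datum VALUES (10, 100, 5), (10, 200, 7), (11, 150, 3);
    """
)

# One statement: each metric row carries a correlated subquery that picks
# the measurement of its most recent datum.
rows = conn.execute(
    """
    SELECT m.name,
           (SELECT d.measurement
              FROM datum AS d
             WHERE d.object_id = m.id
             ORDER BY d.timestamp DESC
             LIMIT 1) AS latest
      FROM metric AS m
     ORDER BY m.id
    """
).fetchall()
```

A metric without any datum still appears in the result, with NULL (None in Python) as its latest value, which is why the no-data case needs handling downstream.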

I also added an explicit last_updated into the template to deal with metrics that have no data. That's not really supposed to happen in production, but I did run into a crash a few times when I was testing things locally.

Let me know what you think. If you're OK with it, I'll merge all commits on this branch into one, with both of us as co-authors.

ontowhee · 2024-12-13T07:57:36Z

For commit 8fe1e01, the overall performance was 110ms.

For commit 2288df6, using the subquery seems to be worse for TracTicketMetric. It took 186ms to query that table, and overall it took 224ms. Will this performance be noticeable by the end user or the server?

Django debug toolbar also identifies "4 similar" queries on ContentType due to calling .for_dashboard() in the loop. Will Sentry flag these 4 similar queries as an N+1 issue?

bmispelon · 2024-12-13T12:09:45Z

Thanks for doing the measurements! The difference between 110 and 186ms doesn't seem so bad to me, but I would like to know your opinion too.

> Django debug toolbar also identifies "4 similar" queries on ContentType due to calling .for_dashboard() in the loop. Will Sentry flag these 4 similar queries as an N+1 issue?

Does that warning persist after a page refresh? Django should cache the content type globally (across requests), so I would expect those queries in the loop to happen only on the first load, but not once the cache has been populated.

ontowhee · 2024-12-13T15:13:44Z

> The difference between 110 and 186ms doesn't seem so bad to me, but I would like to know your opinion too.

It doesn't seem too bad to me either. I wasn't sure how similar the local database was to production. I noticed that the results are returned in much less time in production too.

> Django should cache the content type globally (across requests), so I would expect those queries in the loop to happen only on the first load, but not once the cache has been populated.

You're right, on first load it shows the warning, but on refresh, the content type queries are cached along with all the other queries. (0 queries!)
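
The first-load-only behavior matches what a process-wide memoized lookup would produce. The class below is a simplified model of that idea, not Django's actual ContentType manager; the class name, the second model name, and the ids are made up for illustration.

```python
class CachedLookup:
    """Memoize lookups in a dict that outlives any single request."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable performing the real (expensive) lookup
        self._cache = {}
        self.fetch_count = 0  # how many times the real lookup ran

    def get(self, key):
        if key not in self._cache:
            self.fetch_count += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


# Hypothetical model names mapped to hypothetical content type ids.
FAKE_TABLE = {"tracticketmetric": 1, "othermetric": 2}
content_types = CachedLookup(FAKE_TABLE.__getitem__)


def render_dashboard():
    """Look up every content type twice, as a loop over metrics might."""
    return [content_types.get(name) for name in FAKE_TABLE for _ in range(2)]
```

Calling `render_dashboard()` once triggers one real lookup per model; calling it again triggers none, which is the pattern seen in the toolbar after a refresh.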

I think this is good to merge! Thanks so much for refactoring this. I learned so much from your code!!

ontowhee · 2024-12-14T20:55:17Z

I was playing around to understand the queries better. Here's another approach that queries on the Datum model. It uses .distinct() and .order_by() to get the latest measurement for each metric, and it uses prefetch_related() to fetch the metric. (I didn't know prefetch_related could be called on generic foreign keys.) This avoids making subqueries, and it also avoids building the unions or Q expressions that were done in the previous approaches.
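
For readers unfamiliar with the pattern: on PostgreSQL, passing field names to .distinct() makes Django emit SELECT DISTINCT ON, which keeps the first row of each group in the given ordering. The sketch below emulates that semantics in plain Python on hypothetical rows; it is an illustration of the idea, not the query from this comment.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (content_type_id, object_id, timestamp, measurement)
rows = [
    (1, 10, 100, 5),
    (1, 10, 200, 7),
    (1, 11, 150, 3),
    (2, 10, 90, 4),
    (2, 10, 120, 9),
]


def latest_per_metric(rows):
    """Keep the newest row per (content_type_id, object_id) group."""
    group_key = itemgetter(0, 1)
    # Order by the group columns, then by timestamp descending, so the
    # first row of every group is the most recent one.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], -r[2]))
    return [next(group) for _, group in groupby(ordered, key=group_key)]
```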

Not asking for any changes here. I think your approach of querying on the Metric is easier to understand, and it’s already an improvement from the current dashboard!

ontowhee and others added 4 commits December 9, 2024 07:29

wip optimize queries for dashboard index page

eab41bd

[pre-commit.ci] auto fixes from pre-commit.com hooks

785a17a

for more information, see https://pre-commit.ci

Update test to check num queries

8fe1e01

[pre-commit.ci] auto fixes from pre-commit.com hooks

f69cc41

for more information, see https://pre-commit.ci

ontowhee marked this pull request as ready for review December 10, 2024 18:58

bmispelon added 3 commits December 10, 2024 20:43

Alternate approach

14e2b13

Alternate alternate approach 🙃

ce82b7a

Alternate alternate alternate approach (last one I swear)

2288df6

bmispelon force-pushed the fix-1806-dashboard-n-plus-1-query branch from a529fc0 to 2288df6 Compare December 11, 2024 21:56
