
Conversation

@gmierz (Collaborator) commented on Dec 2, 2025:

This patch adds multiprocessing to the MWU test version processing. It should provide significant performance gains for API requests from PerfCompare: local runs of a worst-case scenario suggest an improvement from ~40s to ~18s when silverman/KDE is enabled, for a request like this one: https://treeherder.mozilla.org/api/perfcompare/results/?base_repository=try&base_revision=bddcdb7187bbbae40da023ff24315d5e499dc0a3&new_repository=try&new_revision=7a30b2c44b19a3576f8c87c56a001792d549e5e0&framework=13&no_subtests=true&replicates=true&test_version=mann-whitney-u

When silverman/KDE is disabled, we still see a sizable improvement, going from ~4s without multiprocessing to ~2.5s with it.
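
For context, the parallelization pattern is roughly the one sketched below (a minimal sketch only; `_run_mwu_task` and the task tuples are illustrative names, not the actual helpers in this patch):

```python
import multiprocessing

from scipy.stats import mannwhitneyu


def _run_mwu_task(task):
    # Hypothetical worker: each task holds the base/new replicate
    # arrays for one (signature, platform) pair.
    base_values, new_values = task
    result = mannwhitneyu(base_values, new_values)
    return {"statistic": result.statistic, "pvalue": result.pvalue}


def run_mwu_in_parallel(tasks):
    # Fan the per-signature comparisons out across all available cores.
    workers = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(_run_mwu_task, tasks)
```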

@gmierz marked this pull request as draft on December 2, 2025 15:06
@gmierz changed the title from "Run new stats with multiprocessing." to "Bug 2004406 - Use multiprocessing for the MWU test version, and clean up the code." on Dec 5, 2025
@gmierz changed the title from "Bug 2004406 - Use multiprocessing for the MWU test version, and clean up the code." to "Bug 2004406 - Use multiprocessing for the MWU test version, and clean up the API code." on Dec 5, 2025
@gmierz marked this pull request as ready for review on December 5, 2025 16:25
Comment on lines 1165 to 1169:

```python
sig_identifier = perfcompare_utils.get_sig_identifier(header, platform)
base_sig = base_signatures_map.get(sig_identifier, {})
base_sig_id = base_sig.get("id", None)
new_sig = new_signatures_map.get(sig_identifier, {})
new_sig_id = new_sig.get("id", None)
```
Collaborator commented:
I think this could be part of the _build_common_result logic, as it is repeated in _process_mann_whitney_u_version and _process_student_t_version.
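
For illustration, folding that lookup into the shared helper could look roughly like this (the helper's exact signature and return keys here are assumptions, not taken from the patch):

```python
def _build_common_result(header, platform, base_signatures_map, new_signatures_map):
    # Resolve the base/new signatures once, instead of repeating this
    # lookup in each test-version-specific processor.
    sig_identifier = perfcompare_utils.get_sig_identifier(header, platform)
    base_sig = base_signatures_map.get(sig_identifier, {})
    new_sig = new_signatures_map.get(sig_identifier, {})
    return {
        "base_signature_id": base_sig.get("id"),
        "new_signature_id": new_sig.get("id"),
        # ...plus the other common fields already built here...
    }
```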

Comment on lines +1294 to +1301:

```python
base_avg_value = perfcompare_utils.get_avg(statistics_base_perf_data, header)
base_stddev = perfcompare_utils.get_stddev(statistics_base_perf_data, header)
base_median_value = perfcompare_utils.get_median(statistics_base_perf_data)
new_avg_value = perfcompare_utils.get_avg(statistics_new_perf_data, header)
new_stddev = perfcompare_utils.get_stddev(statistics_new_perf_data, header)
new_median_value = perfcompare_utils.get_median(statistics_new_perf_data)
base_stddev_pct = perfcompare_utils.get_stddev_pct(base_avg_value, base_stddev)
new_stddev_pct = perfcompare_utils.get_stddev_pct(new_avg_value, new_stddev)
```
Collaborator commented:

Not for now, but these fields seem to me like they could be part of the _build_common_result method; in the mann-whitney logic we retrieve them for the base_standard_stats and new_standard_stats fields.

@gmierz (Author) replied:

Good point! Actually, it could also be useful to pull those out of the MWU process_stats so we could at least provide some measures even if the comparison fails. Let me file a bug for this.
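
As a rough illustration of that follow-up, the descriptive stats could be computed in a standalone helper so they survive a failed comparison (the helper name and return shape are assumptions):

```python
def _build_standard_stats(statistics_base_perf_data, statistics_new_perf_data, header):
    # Compute the descriptive stats up front so they can be returned
    # even when the MWU comparison itself fails.
    base_avg = perfcompare_utils.get_avg(statistics_base_perf_data, header)
    base_stddev = perfcompare_utils.get_stddev(statistics_base_perf_data, header)
    new_avg = perfcompare_utils.get_avg(statistics_new_perf_data, header)
    new_stddev = perfcompare_utils.get_stddev(statistics_new_perf_data, header)
    return {
        "base_avg_value": base_avg,
        "base_stddev": base_stddev,
        "base_stddev_pct": perfcompare_utils.get_stddev_pct(base_avg, base_stddev),
        "base_median_value": perfcompare_utils.get_median(statistics_base_perf_data),
        "new_avg_value": new_avg,
        "new_stddev": new_stddev,
        "new_stddev_pct": perfcompare_utils.get_stddev_pct(new_avg, new_stddev),
        "new_median_value": perfcompare_utils.get_median(statistics_new_perf_data),
    }
```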

```python
)

row_result = {
    **common_result,
```
Collaborator commented:
Nit: Simplify the task method by only passing the base/new stat arrays; keep the large common_result in _process_mann_whitney_u_version and merge the results there (each result could have a task index to easily merge the results).

@gmierz (Author) replied on Dec 11, 2025:
Can you elaborate on what kind of benefits you think that may provide? It's definitely something we could do, but I think it would make the code more complex when we're handling the results, with limited potential benefit.
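
For reference, the suggested index-based merge would look roughly like this (a sketch only; `process_stats` is the existing MWU helper mentioned above, while the other names are hypothetical):

```python
def _run_stats_task(indexed_task):
    # Worker returns only the computed stats plus the index of the task
    # it processed, so the parent can re-attach the large common_result.
    idx, (base_values, new_values) = indexed_task
    return idx, process_stats(base_values, new_values)


def merge_results(common_results, stats_results):
    # Re-join each worker result with its pre-built common_result.
    return [{**common_results[idx], **stats} for idx, stats in stats_results]
```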

```python
)
# Process tasks in parallel using multiprocessing
workers = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=workers) as pool:
```
Collaborator commented:
In the event that multiprocessing fails, should the exception be handled by defaulting to sequential processing?

@gmierz (Author) replied:
Hmm, good question. I'm not sure, since we would probably hit the same failure with sequential processing. Also, there's a chance that if it fails and we then try a sequential run, we would hit a network timeout on the perfcompare side. I'm going to run a small test to see what happens when we trigger an artificial failure, though.
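
For what it's worth, the fallback being discussed would look something like this (a sketch; `run_comparison_task` and `tasks` are placeholder names):

```python
try:
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(run_comparison_task, tasks)
except Exception:
    # Fall back to sequential processing in this process. Per the
    # caveat above, a deterministic failure would likely recur here,
    # and the retry may push the request past PerfCompare's timeout.
    results = [run_comparison_task(task) for task in tasks]
```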

@beatrice-acasandrei (Collaborator) left a review comment:
I don't have other suggestions besides the nit comments.
