Bug 2004406 - Use multiprocessing for the MWU test version, and clean up the API code. #9102
base: master
Conversation
Force-pushed from 6a41748 to 156750b
    sig_identifier = perfcompare_utils.get_sig_identifier(header, platform)
    base_sig = base_signatures_map.get(sig_identifier, {})
    base_sig_id = base_sig.get("id", None)
    new_sig = new_signatures_map.get(sig_identifier, {})
    new_sig_id = new_sig.get("id", None)
I think this could be part of the _build_common_result logic, as it is repeated in _process_mann_whitney_u_version and _process_student_t_version.
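A minimal sketch of the shared helper this comment describes, assuming it would live in the module that already imports perfcompare_utils; the _get_signature_ids name is made up, while get_sig_identifier and the two signature maps come from the diff above:

    # Sketch only: _get_signature_ids is a hypothetical name; perfcompare_utils
    # and the two signature maps are taken from the surrounding module.
    def _get_signature_ids(header, platform, base_signatures_map, new_signatures_map):
        """Resolve the base/new signature ids once, for both test versions."""
        sig_identifier = perfcompare_utils.get_sig_identifier(header, platform)
        base_sig_id = base_signatures_map.get(sig_identifier, {}).get("id")
        new_sig_id = new_signatures_map.get(sig_identifier, {}).get("id")
        return sig_identifier, base_sig_id, new_sig_id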
    base_avg_value = perfcompare_utils.get_avg(statistics_base_perf_data, header)
    base_stddev = perfcompare_utils.get_stddev(statistics_base_perf_data, header)
    base_median_value = perfcompare_utils.get_median(statistics_base_perf_data)
    new_avg_value = perfcompare_utils.get_avg(statistics_new_perf_data, header)
    new_stddev = perfcompare_utils.get_stddev(statistics_new_perf_data, header)
    new_median_value = perfcompare_utils.get_median(statistics_new_perf_data)
    base_stddev_pct = perfcompare_utils.get_stddev_pct(base_avg_value, base_stddev)
    new_stddev_pct = perfcompare_utils.get_stddev_pct(new_avg_value, new_stddev)
Not for now, but these fields seem to me like they could be part of the _build_common_result method; in the Mann-Whitney logic we retrieve them for the base_standard_stats and new_standard_stats fields.
Good point! Actually, it could also be useful to get those out of the MWU process_stats so we could then at least provide some measures even if the comparison fails. Let me file a bug for this.
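A rough sketch of that idea, with assumed names (in particular, the process_stats call and its signature are guesses): compute the descriptive measures first and only then attempt the comparison, so the measures survive a failure:

    # Sketch only: process_stats and its signature are assumptions.
    def _stats_then_compare(header, base_values, new_values):
        """Compute descriptive stats up front, then attempt the MWU comparison."""
        result = {
            "base_avg_value": perfcompare_utils.get_avg(base_values, header),
            "base_median_value": perfcompare_utils.get_median(base_values),
            "new_avg_value": perfcompare_utils.get_avg(new_values, header),
            "new_median_value": perfcompare_utils.get_median(new_values),
        }
        try:
            result.update(process_stats(base_values, new_values))
        except ValueError:
            # The comparison failed; the basic measures above are still reported.
            result["comparison_failed"] = True
        return result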
    )

    row_result = {
        **common_result,
Nit: Simplify the task method by only passing the base/new stat arrays; keep the large common_result in _process_mann_whitney_u_version and merge the results there (each result could carry a task index to make merging easy).
Can you elaborate on what kind of benefits you think that may provide? It's definitely something we could do, but I think it may make the result handling more complex for limited potential benefit.
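To make the suggestion above concrete, here is a hedged, self-contained sketch (all names here are placeholders, not code from the patch): the worker receives only the stat arrays plus a task index, and the small per-task results are merged back into the prebuilt common_result dicts by that index:

    import multiprocessing
    from statistics import mean


    def _compare_stats(args):
        """Placeholder per-task work: only the stat arrays travel to the worker."""
        task_index, base_values, new_values = args
        return task_index, {"delta_value": mean(new_values) - mean(base_values)}


    def merge_results(common_results, task_args):
        """Merge the small per-task results back into the large common_result dicts."""
        with multiprocessing.Pool() as pool:
            for task_index, stats in pool.map(_compare_stats, task_args):
                common_results[task_index].update(stats)
        return common_results


    if __name__ == "__main__":
        common = [{"header": "a"}, {"header": "b"}]
        args = [(0, [1.0, 2.0], [2.0, 3.0]), (1, [5.0], [4.0])]
        print(merge_results(common, args))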
    )
    # Process tasks in parallel using multiprocessing
    workers = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=workers) as pool:
In the event that multiprocessing fails, should the exception be handled by defaulting to sequential processing?
Hmm, good question. I'm not sure, since we would probably hit the same failure with sequential processing? Also, there's a chance that, if it fails and we then fall back to a sequential run, we would hit a network timeout on the PerfCompare side. I'm going to run a small test to see what happens when we trigger an artificial failure, though.
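For reference, a minimal, self-contained sketch of the fallback pattern being discussed, with a dummy task standing in for the real per-signature work; it only illustrates the try/except shape, not what the patch currently does:

    import multiprocessing


    def _process_task(args):
        """Dummy stand-in for the real per-signature comparison work."""
        base_values, new_values = args
        return sum(new_values) - sum(base_values)


    def run_tasks(task_args):
        """Try the multiprocessing pool first; fall back to a sequential loop."""
        try:
            workers = multiprocessing.cpu_count()
            with multiprocessing.Pool(processes=workers) as pool:
                return pool.map(_process_task, task_args)
        except OSError:
            # e.g. the pool could not be created: process the tasks in order instead.
            return [_process_task(args) for args in task_args]


    if __name__ == "__main__":
        print(run_tasks([([1, 2, 3], [2, 2, 2]), ([4, 5], [1, 1])]))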
beatrice-acasandrei left a comment
I don't have other suggestions besides the nit comments.
This patch adds multiprocessing for the MWU test version processing. It should provide significant performance gains for API requests coming from PerfCompare. Local runs of a worst-case scenario suggest an improvement from 40s to 18s, with silverman/KDE enabled, for a request like this one: https://treeherder.mozilla.org/api/perfcompare/results/?base_repository=try&base_revision=bddcdb7187bbbae40da023ff24315d5e499dc0a3&new_repository=try&new_revision=7a30b2c44b19a3576f8c87c56a001792d549e5e0&framework=13&no_subtests=true&replicates=true&test_version=mann-whitney-u
When silverman/KDE is disabled, we still see a sizable improvement, going from 4s without multiprocessing to ~2.5s with it.
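For anyone wanting to reproduce this kind of timing locally, a rough sketch (the exact measurement method behind the numbers above is not specified here; point the URL at whichever instance has the patch deployed):

    import time

    import requests

    # The PerfCompare results request quoted above.
    URL = (
        "https://treeherder.mozilla.org/api/perfcompare/results/"
        "?base_repository=try&base_revision=bddcdb7187bbbae40da023ff24315d5e499dc0a3"
        "&new_repository=try&new_revision=7a30b2c44b19a3576f8c87c56a001792d549e5e0"
        "&framework=13&no_subtests=true&replicates=true&test_version=mann-whitney-u"
    )

    start = time.monotonic()
    response = requests.get(URL, timeout=120)
    print(f"status {response.status_code} in {time.monotonic() - start:.1f}s")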