-
Notifications
You must be signed in to change notification settings - Fork 338
DAOS-18431 bio: Set power management register on NVMe #17355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Tom Nabarro <[email protected]>
|
Ticket title is 'NVMe power control feature' |
|
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17355/1/testReport/ |
5e0062d to
0be43f7
Compare
Signed-off-by: Tom Nabarro <[email protected]>
0be43f7 to
8b31c49
Compare
kjacque
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks reasonable. Just trying to understand the right way to interact with the spdk stuff.
If we're letting users configure this generally, it might be good to add an explicit parameter to the config file with more intuitively named values, based on whatever the 0-4 stand for. Not necessary in this PR, but something to think about.
| memset(&cmd, 0, sizeof(cmd)); | ||
| cmd.opc = SPDK_NVME_OPC_SET_FEATURES; | ||
| cmd.cdw10 = SPDK_NVME_FEAT_POWER_MANAGEMENT; | ||
| cmd.cdw11 = bio_spdk_power_mgmt_val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
union spdk_nvme_cmd_cdw11 has a member union spdk_nvme_feat_power_management feat_power_management that has multiple fields, not just the power state. So I think we should be using that to make sure we're setting the value correctly.
IMO we should also check the value taken from the env variable is in an acceptable range, preferably when we first ingest it. I had some trouble finding definitions for the power state values. Does SPDK define those somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I wait for @Michael-Hennecke to verify the PR. this is more of an enablement change that will unblock some testing. Once verified we can look at tidying things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we at least need to check the value. I was wondering about what would happen if we set the env variable to some value larger than 5 bits, potentially overflowing into the other bit field ("workload hint"). Here's the struct definition I found: https://spdk.io/doc/unionspdk__nvme__feat__power__management.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Michael-Hennecke thoughts on acceptable values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restricting the value to <= 0x1F to avoid overflowing the workload hint bits may prevent some experimentation capability. As an environment variable capability I would like to leave this unfiltered for the moment with the understanding that it should be used by someone who understands the values to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there someone ready to perform that experiment and see what happens? I would prefer to know what the worst case scenario consequences are before signing off, but if someone can verify it's harmless to toy with these fields with strange/invalid input ASAP, I'm fine with landing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Michael-Hennecke can you please comment with regard to @kjacque 's concerns?
Signed-off-by: Tom Nabarro <[email protected]>
src/bio/bio_monitor.c
Outdated
| bb->bb_dev_health.bdh_io_channel = channel; | ||
|
|
||
| /* Set NVMe power management */ | ||
| rc = bio_set_power_mgmt(bb->bb_dev, channel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't want this being skipped when health monitor is disabled? (see bypass_health_collect() at the beginning of this function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it okay to move this function and the channel fetch to before the bypass check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this function uses the io_channel created by bio_init_health_monitoring().
src/bio/bio_device.c
Outdated
| cmd.cdw10 = SPDK_NVME_FEAT_POWER_MANAGEMENT; | ||
| cmd.cdw11 = bio_spdk_power_mgmt_val; | ||
|
|
||
| rc = spdk_bdev_nvme_admin_passthru(d_bdev->bb_desc, channel, &cmd, NULL, 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses bdh_io_channel but doesn't hold reference count, that could cause issue when bio_dev_health being finalized before the completion. See bio_fini_health_monitoring().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decoupled implementation from health monitoring, is it better now?
Signed-off-by: Tom Nabarro <[email protected]>
cde98e1 to
2b38adf
Compare
Set NVMe power management values for SSDs by setting the new
engine DAOS_NVME_POWER_MGMT environment variable to an integer
(sets register bits 0-4). Value will be applied by SPDK on devices
attached to an engine process. The value will not be reset on engine
exit.
Steps for the author:
After all prior steps are complete: