maurizio-lombardi
Contributor

@maurizio-lombardi maurizio-lombardi commented Aug 14, 2025

The nvme_get_log_page() function may return an NVMe status code on failure, leaving the errno number set to zero.
The nvme_discovery_log() function wasn't propagating this error information to its caller; it would just return NULL without setting a meaningful errno value.

nvme-cli interprets the NULL pointer as an error and then tries to print the string associated with the errno value, resulting in a funny output:

failed to get discovery log: Success
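
A minimal sketch of how this message comes about, assuming a caller that treats "NULL means errno is set" as a contract. The names demo_lookup_without_errno() and demo_report() are hypothetical and are not libnvme or nvme-cli functions. The exact text strerror(0) produces is platform dependent; glibc returns "Success".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a lookup that fails with an NVMe status
 * code: it returns NULL but never touches errno.
 */
static void *demo_lookup_without_errno(void)
{
	return NULL;
}

/*
 * Caller that assumes a NULL return implies errno was set. On failure it
 * stores the errno value it observed in *seen_errno, formats the message
 * into buf and returns -1. Returns 0 on success.
 */
static int demo_report(char *buf, size_t len, int *seen_errno)
{
	errno = 0;
	if (demo_lookup_without_errno())
		return 0;

	*seen_errno = errno;
	snprintf(buf, len, "failed to get discovery log: %s",
		 strerror(*seen_errno));
	return -1;
}
```

Calling demo_report() here always takes the failure path with an observed errno of 0, so the message ends with whatever string the C library associates with "no error".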

Fix this by capturing the status code returned by nvme_get_log_page(), converting it to an errno value using nvme_status_to_errno(), and setting errno before returning.
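
The approach can be sketched roughly as follows. This is not the actual patch: demo_get_log_page() and demo_status_to_errno() are hypothetical stand-ins for nvme_get_log_page() and nvme_status_to_errno(), the mapping of every status to EIO is a simplification, and clearing errno before the call is an assumption of this sketch.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_log {
	int entries;
};

/*
 * Hypothetical stand-in for nvme_get_log_page(). outcome == 0 succeeds,
 * outcome < 0 simulates a transport failure (-1 with errno set), and
 * outcome > 0 simulates an NVMe status code with errno left untouched.
 */
static int demo_get_log_page(int outcome)
{
	if (outcome < 0) {
		errno = ETIMEDOUT;
		return -1;
	}
	return outcome;
}

/* Hypothetical stand-in for nvme_status_to_errno(): any status maps to EIO. */
static int demo_status_to_errno(int status)
{
	return status ? EIO : 0;
}

static struct demo_log *demo_discovery_log(int outcome)
{
	struct demo_log *log = calloc(1, sizeof(*log));
	int err, saved;

	if (!log) {
		errno = ENOMEM;
		return NULL;
	}

	errno = 0;
	err = demo_get_log_page(outcome);
	if (!err)
		return log;

	/* Keep errno if the callee set it, otherwise derive it from the status. */
	saved = errno ? errno : demo_status_to_errno(err);
	free(log);
	errno = saved;
	return NULL;
}
```

With this shape, a caller that sees NULL always finds a non-zero errno, whether the failure came from the transport or from an NVMe status code.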

Additionally, the error message has been improved to include the NVMe status code and the log level has been raised from LOG_INFO to LOG_ERR.

@igaw Daniel, do you have any plans to fix the error handling in libnvme/nvme-cli? I remember we had a small chat about this some time ago, maybe at ALPSS, I don't remember.

I think this needs to be fixed, even if it results in API breakages.
In general, I think that the usage of errno should be restricted to only those functions that directly call ioctl(); IMO libnvme users like nvme-cli should never use errno.

I also think that prototypes like this:
struct nvmf_discovery_log *nvmf_get_discovery_wargs(struct nvme_get_discovery_args *args)
were a mistake.

I think it would have been better if it was like this:
int nvmf_get_discovery_wargs(struct nvme_get_discovery_args *args, struct nvmf_discovery_log **log)
That way it could return an integer < 0 for errors like ENOMEM, EIO, etc.,
an integer > 0 for an NVMe command status code,
and 0 for success, with *log pointing to the allocated discovery log.

It would have been much easier to propagate the error code to its callers.
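
To illustrate the return convention proposed above, here is a small sketch. demo_get_discovery() and demo_classify() are hypothetical names, and the "simulated" parameter is only a knob for this example; none of this is an existing libnvme interface.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_log {
	int entries;
};

/*
 * Hypothetical function following the proposed convention: negative
 * errno-style value on local failure, positive NVMe status code on
 * command failure, 0 on success with *log set. A non-zero "simulated"
 * value is returned as-is to mimic a failure.
 */
static int demo_get_discovery(int simulated, struct demo_log **log)
{
	struct demo_log *l;

	*log = NULL;
	if (simulated)
		return simulated;

	l = calloc(1, sizeof(*l));
	if (!l)
		return -ENOMEM;

	*log = l;
	return 0;
}

/* The caller can tell the three outcomes apart without reading errno. */
static const char *demo_classify(int ret)
{
	if (ret < 0)
		return "errno-style error";
	if (ret > 0)
		return "NVMe status code";
	return "success";
}
```

Because the status travels in the return value, each layer can simply pass it up to its caller, and the global errno never needs to be consulted.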

@maurizio-lombardi maurizio-lombardi force-pushed the discovery_err branch 2 times, most recently from 8d83112 to a755dbe August 14, 2025 11:50
@igaw igaw added the enhancement New feature or request label Aug 22, 2025
@igaw igaw added this to the 2.0 milestone Aug 22, 2025
@igaw
Collaborator

igaw commented Aug 22, 2025

This is v2 material and a WIP for the API change is here: https://github.com/nvme-experiments/libnvme/tree/libnvme2

src: return error codes directly nvme-experiments@e47e515

@maurizio-lombardi
Contributor Author

@igaw while I agree that in general this is addressed by V2, I think that this particular change should be considered for libnvme 1.x as a temporary fix for the "failed to get discovery log: Success" error message that our customers are hitting now.
If you look at the diff I am not changing the API, I am just setting the errno variable in the error code path.

I am also ok with avoiding raising the log level to NVME_ERROR if you want to make the change even less intrusive.

@igaw
Collaborator

igaw commented Aug 22, 2025

Ah sorry, I only skimmed over the patch. Let me look closer now.

@igaw igaw removed this from the 2.0 milestone Aug 22, 2025
@@ -1234,6 +1238,8 @@ static struct nvmf_discovery_log *nvme_discovery_log(

out_free_log:
	free(log);
	if (!errno)
Collaborator

IIRC nvme_get_log_page() won't set errno when the return value is positive. So in theory errno could be set... Leave it as it is, just saying it is not 100% safe. Did I mention that errno is stupid?

Contributor Author

@maurizio-lombardi maurizio-lombardi Sep 3, 2025

Not sure what you mean here. Yes, in some cases nvme_get_log_page() doesn't set errno... and this is precisely the problem. The caller sees that nvme_discovery_log() returns a NULL pointer and assumes that errno is set.
This is why the "failed to get discovery log: Success" error message appears.

Signed-off-by: Maurizio Lombardi <[email protected]>
@igaw igaw merged commit 8eace59 into linux-nvme:master Sep 3, 2025
12 checks passed
@igaw
Collaborator

igaw commented Sep 3, 2025

Thanks!
