[storage]: Front-end Attach / Detach Namespace Proto-Bufs #177
Comments
Isn't NVMeNamespaceCreate the same as Attach? We can rename it if that helps... |
I opened #178 as well for enable/disable NS... |
Not quite. When a namespace is created it isn't attached to a controller. We're providing an API to create subsystems/namespaces/controllers etc., so it's not like the backend, where you get this information along with the attachment information. Having said that, looking at your additional issue with enable/disable - maybe it's somehow related and provides another way to make this attachment. In enable/disable, are you referring to enabling a namespace for a specific controller? |
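For illustration only, a minimal proto-buf sketch of the distinction being made here (message and field names are hypothetical, not the actual OPI definitions): creating a namespace only parents it to a subsystem, and nothing in the request names a controller.

```proto
// Hypothetical sketch only - not the real OPI storage API definitions.
syntax = "proto3";

package opi.sketch.v1;

// Creating a namespace places it inside a subsystem; note there is no
// controller field, so creation says nothing about which controller
// (if any) exposes the namespace to a host.
message CreateNamespaceRequest {
  string subsystem_id = 1;  // parent subsystem for the new namespace
  string volume_id = 2;     // middle/back-end volume backing the namespace
}
```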
yes |
A controller is a "view" or a "session" between a host and a subsystem. A subsystem contains the namespaces. I don't think it makes any sense in the NVMe model to attach a namespace to a controller. Can you outline the use case here? Is this for private namespaces? Or is this some dual port device case where the PCI endpoints see different namespaces? Also, it does make sense for NVMe over fabrics to set up visibility rules based on the hostnqn via a call we don't have in the API yet. But I don't think that's what this issue is talking about. |
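A rough sketch of that model (hypothetical messages, assumed purely for illustration): the subsystem owns the namespaces, a controller is just a host-facing view onto one subsystem, and any fabrics-style visibility rule would key off the hostnqn rather than an explicit controller-to-namespace attachment.

```proto
// Hypothetical model sketch - not the real OPI storage API definitions.
syntax = "proto3";

package opi.sketch.v1;

// A subsystem owns namespaces and is identified by its NQN.
message Subsystem {
  string nqn = 1;
  repeated string namespace_ids = 2;  // namespaces live here, not on controllers
}

// A controller is a host-facing "view"/"session" onto one subsystem,
// e.g. exposed on a particular PCIe physical or virtual function.
message Controller {
  string subsystem_nqn = 1;
  int32 pcie_function = 2;
}

// Fabrics-style visibility would hang off the host NQN (a call the API
// does not have yet), not off an individual controller.
message HostVisibilityRule {
  string host_nqn = 1;
  repeated string allowed_namespace_ids = 2;
}
```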
If we have 1 subsystem with 1 namespace and 10 controllers, each on a separate PF/VF, that we want to pass through to 10 VMs... |
The best approach is to create 1 subsystem containing 1 namespace for each PF/VF. You do not want the VMs to share state of any kind, and if they share a subsystem there is shared state in there, even if you filter out shared namespaces from view. |
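For illustration only, a minimal hypothetical wrapper message (names assumed, not part of any real API) capturing that per-PF/VF layout: one subsystem, one namespace, and one controller per function, so no state is shared between the VMs the functions are passed through to.

```proto
// Hypothetical sketch only - not the real OPI storage API definitions.
syntax = "proto3";

package opi.sketch.v1;

// One fully isolated slice per pass-through function: its own subsystem,
// its own namespace, and its own controller.
message PerFunctionSlice {
  string subsystem_nqn = 1;  // e.g. "nqn.2023-01.io.example:vf1" (illustrative)
  string namespace_id = 2;   // the single namespace for this VM
  int32 pcie_function = 3;   // the PF/VF passed through to the VM
}

message DeviceLayout {
  repeated PerFunctionSlice slices = 1;  // ten entries for ten VMs
}
```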
There may be cases where you would want the host "view" to match a backend, with shared namespaces etc... |
Unfortunately, that's in direct opposition to the defined OPI view of separate "front", "middle", and "back" ends. The front-end is entirely virtualized and emulated, and back/middle-end volumes are added as namespaces. The front-end and the host never see anything about back-end subsystems or controllers, and in fact the back-end may not even be NVMe. Things like NVMe admin commands that come into the front-end are not simply passed through to some "middle" or "back" end object, because there isn't a middle/back-end object that corresponds to a controller/subsystem - only volumes. In order to think through solutions, I think we'll need a lot more detail on the motivating use case for the host "view" to match some backend. Are you able to provide that? |
I'm not arguing about the best approach. I think the API should allow, and not limit, a user who wants to share a subsystem. Even if we don't see the use case today, it is reasonable to allow this. |
A subsystem can have namespaces that are not attached to any controller, and similarly a subsystem can have a namespace that is attached to multiple controllers. This is all still the front-end view (nothing to do with the backend). This means that there should be an ability to attach/detach a namespace to/from a controller. There is also an ability to attach or detach namespaces from NVMe controllers at runtime using an async event (Namespace Attribute Changed). This event causes the host to re-enumerate the namespaces. This is a feature most of our customers use. |
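To make the requested semantics concrete, a hypothetical RPC pair along these lines (illustrative naming, not a concrete proposal for OPI) would express per-controller attach/detach; on a successful attach or detach the device would raise the Namespace Attribute Changed asynchronous event so the host re-enumerates without a driver reload.

```proto
// Hypothetical sketch - illustrative names, not the real OPI storage API.
syntax = "proto3";

package opi.sketch.v1;

import "google/protobuf/empty.proto";

message NamespaceAttachRequest {
  string controller_id = 1;  // front-end controller that should gain the namespace
  string namespace_id = 2;   // namespace already created in the parent subsystem
}

message NamespaceDetachRequest {
  string controller_id = 1;  // controller that should lose the namespace
  string namespace_id = 2;
}

// On success the device would emit a Namespace Attribute Changed async
// event so the host re-enumerates its active namespace list at runtime.
service FrontendNamespaceAttachment {
  rpc AttachNamespace(NamespaceAttachRequest) returns (google.protobuf.Empty);
  rpc DetachNamespace(NamespaceDetachRequest) returns (google.protobuf.Empty);
}
```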
I think there may be some mixing of very different concepts going on here. These API definitions are not doing the equivalent thing of the NVMe Namespace Attach/Detach commands. Those NVMe commands are looking at a list of namespaces already attached to a subsystem, and then adding a reference, via an NSID, to a controller's view. That's not OPI's job - OPI is the thing that put the namespaces into the subsystem to begin with. So again, can you clarify what the actual use case is for this feature? An example flow and justification of some kind, even very high level? I'm certainly not advocating that we don't support all sorts of dynamic hot insert/removal of namespaces with all of the appropriate async events, nor am I advocating that we preclude support for any specific NVMe namespace management commands. OPI is definitely going to have to support all of that stuff. I'm really focused just on the need to set up mappings between controllers and namespaces during provisioning, based on the controller identity itself (as opposed to based on a hostnqn), and only in the front-end for NVMe devices. It's a very narrow thing I'm advocating against currently. The reason I'd like to avoid adding support for this is relatively straightforward:
|
This is not about the optional namespace management support from the host side. I am not actually sure we should let the host do namespace management in this case anyway, as we want the DPU to be in control. This is about how the DPU provisions new namespaces to a controller at runtime. Unfortunately, the only way the spec supports this is through the host side (using namespace management commands). But the same mechanism lets the DPU side also do this by issuing an async event. The high-level use case for this feature is to provide additional storage to a tenant without the need to burn another VF/PF resource. |
Can't what you're describing be accomplished by adding a namespace to a subsystem instead of adding it to the controller? I believe it can if the default behavior of a controller is to automatically see all namespaces in the subsystem. This is really the core of this discussion. What you outlined above can be done with the calls that already exist. What's the new/extra functionality needed that warrants an extra step to attach/detach from specific controllers? |
A subsystem can have multiple controllers and multiple namespaces. Not every controller in a subsystem needs to have every namespace in the subsystem active in its CNS:02 list. Even if you added a namespace to a subsystem, the host is not able to see it right away. It would not be able to enumerate the new namespace until it issues the identify (CNS:02) command again (which generally means a reboot/nvme driver reload). The above API lets the host see a new namespace that is added to the subsystem become active for a controller at runtime without having to reload the driver. |
My line of questioning isn't whether you can or cannot do this. My line of questioning has been focused on why you want to do this. I've provided some reasons above why you probably don't, and should instead have made two subsystems. What's the use case where this is the superior solution to using two subsystems?
This may be the core misconception. I'm advocating that all namespaces in a subsystem (for the front-end) are surfaced through all associated controllers immediately and automatically when they're added or removed from the subsystem. So adding a namespace does in fact generate the asynchronous event notifications, which causes the driver to issue the identify namespace commands and see the new namespace without being reloaded. Is there some problem with this behavior at least being the default? |
Ok. In that case, the only comment I have is that it is more restrictive than what the spec allows for. |
I agree we can't / shouldn't implement the entire spec. If there is a real need, we can add new RPCs and new fields in existing RPCs. But I wouldn't want to go in a direction where we just add more things without clear value... it will be hard to support long term. |
Also, I wonder how native multipathing would be supported for the front end if every host controller ends up with its own subsystem, as is being advocated here. If the same namespace is exposed to the host (considered the same due to reporting the same unique identifier in "Identify Namespace") through 2 different controllers which are not in the same subsystem, the host nvme driver will end up creating 2 block devices, which defeats the purpose. |
I'm not advocating that every controller has its own subsystem. I'm advocating that every controller associated with a given subsystem has the same view of the namespaces in that subsystem. This is a very minor narrowing of the flexibility which yields a large reduction in overall complexity. For example, in my proposed system, the common steps to configure a front-end device would be:
But if we add the ability for each controller to see a different set of namespaces, the steps would become:
If N is anything but 1, is it clear how the second option suddenly becomes much more complex than the first? And I've provided a few reasons that the complexity isn't actually adding any value - there are other ways to accomplish almost every use case that maintain the simpler provisioning flow. |
link to DOCA for reference |
Almost a year later, my opinion remains that OPI should not support "private" namespaces. In other words, all controllers in a subsystem shall see the same set of namespaces. If a namespace is added to a subsystem, all controllers in the subsystem are notified. The normal/common case is one controller per subsystem (although we'd still allow multiple controllers for "dual-port" type setups). This also means that the normal configuration for these DPU products is to create one subsystem per VF (as opposed to one giant subsystem for the whole device). This has other really important ramifications - it makes it much simpler to live migrate a controller between two subsystems if there is only one controller in it, for example. It also makes it clear what other components in the system are impacted when you request a subsystem reset (just your VF). I really do believe it's the only reasonable design choice to make, so OPI is safe to be opinionated here. |
@mkalderon is on leave. As I stated earlier, I am OK with the restriction as long as there is a way for a subsystem to have multiple controllers and multipathing. The Marvell SDK also supports the attach/detach semantics. |
Need to add support for attaching / detaching controllers from a namespace.
We can take this. Just raising it as an issue in case someone has prior thoughts on this.
@glimchb