Message ID | 20190725172335.6825-1-logang@deltatee.com |
---|---|
Series | nvmet: add target passthru commands support |
Hi Logan,

On 7/25/19 7:23 PM, Logan Gunthorpe wrote:
> Hi,
>
> Chaitanya has asked us to take on these patches as we have an
> interest in getting them into upstream. To that end, we've done
> a large amount of testing, bug fixes and cleanup.
>
> Passthru support for nvmet allows users to export an entire
> NVMe controller through NVMe-oF. When exported in this way (as opposed
> to exporting each namespace as a block device), all the NVMe commands
> are passed to the given controller unmodified, including most admin
> commands and Vendor Unique Commands (VUCs). A passthru target will
> expose all namespaces for a given device to the remote host.
>
In general I'm very much in favour of this, yet there are some issues
which I'm not quite clear about.

> There are three major non-bugfix changes that we've done to the series:
>
> 1) Instead of using a separate special passthru subsystem in
>    configfs, simply add a passthru directory that's analogous to
>    the existing namespace directories. The directories have
>    very similar attributes to namespaces but are mutually exclusive.
>    If a user enables a namespace, they can't then enable a
>    passthru controller and vice versa. This simplifies the code
>    required to implement passthru configfs and IMO creates a much
>    clearer and more uniform interface.
>
How do you handle subsystem naming?
If you enable the 'passthru' device, the (nvmet) subsystem (and its
name) is already created. Yet the passthru device will have its own
internal subsystem naming, so if you're not extra careful you'll end up
with an nvmet subsystem which doesn't have any relationship with the
passthru subsystem, making addressing etc. tricky.
Any thoughts about that?

Similarly: how do you propose to handle multipath devices?
Any NVMe device with several paths will enable NVMe multipathing
automatically, presenting you with a single multipathed namespace.
How will these devices be treated?
Will the multipathed namespace be used for passthru?

Cheers,

Hannes
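For reference, a rough userspace sketch of how the configfs layout described in change (1) might be driven. The attribute names (`passthru/device_path`, `passthru/enable`), the subsystem name and the backend device path are assumptions based on the cover letter's description, not a confirmed ABI:

```c
/*
 * Sketch of driving the proposed nvmet passthru configfs layout from
 * userspace.  The "passthru" attribute names and the backend device
 * path are assumptions based on the cover letter, not a confirmed
 * interface.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int write_attr(const char *path, const char *val)
{
	ssize_t ret;
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror(path);
		return -1;
	}
	ret = write(fd, val, strlen(val));
	close(fd);
	return ret < 0 ? -1 : 0;
}

int main(void)
{
	const char *subsys = "/sys/kernel/config/nvmet/subsystems/pt-subsys";
	char path[256];

	/* Creating the subsystem also creates its passthru/ directory. */
	if (mkdir(subsys, 0755) && errno != EEXIST) {
		perror("mkdir");
		return 1;
	}

	/* Point the passthru directory at a backend NVMe controller. */
	snprintf(path, sizeof(path), "%s/passthru/device_path", subsys);
	if (write_attr(path, "/dev/nvme0"))
		return 1;

	/* Enable it; this is mutually exclusive with namespaces/. */
	snprintf(path, sizeof(path), "%s/passthru/enable", subsys);
	if (write_attr(path, "1"))
		return 1;

	return 0;
}
```

The subsystem would then be linked to a port and to allowed hosts exactly as for a namespace-based subsystem; nothing in the sketch above touches that part of the setup.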
On 2019-07-26 12:23 a.m., Hannes Reinecke wrote:
> How do you handle subsystem naming?
> If you enable the 'passthru' device, the (nvmet) subsystem (and its
> name) is already created. Yet the passthru device will have its own
> internal subsystem naming, so if you're not extra careful you'll end up
> with an nvmet subsystem which doesn't have any relationship with the
> passthru subsystem, making addressing etc. tricky.
> Any thoughts about that?

Well, I can't say I have a great understanding of how multipath works,
but...

I don't think it necessarily makes sense for the target subsysnqn and
the target's device nqn to be the same. It would be weird for a user to
want to use the same device and a passed through device (through a
loop) as part of the same subsystem. That being said, it's possible for
the user to use the subsysnqn from the passed through device for the
name of the subsys of the target. I tried this and it works except for
the fact that the device I'm passing through doesn't set id->cmic.

> Similarly: how do you propose to handle multipath devices?
> Any NVMe device with several paths will enable NVMe multipathing
> automatically, presenting you with a single multipathed namespace.
> How will these devices be treated?

Well, passthru works on the controller level, not on the namespace
level. So it can't make use of the multipath handling on the target
system.

The one case that I think makes sense to me, but that I don't know if
we can handle, is if the user had a couple of multipath-enabled
controllers with the same subsysnqn and wanted to passthru all of them
to another system and use multipath on the host with both controllers.
This would require having multiple target subsystems with the same
name, which I don't think will work too well.

> Will the multipathed namespace be used for passthru?

Nope.

Honestly, I think the answer is that if someone wants to use multipathed
controllers they should use regular NVMe-oF, as it doesn't really mesh
well with the passthru approach.

Logan
>> How do you handle subsystem naming?
>> If you enable the 'passthru' device, the (nvmet) subsystem (and its
>> name) is already created. Yet the passthru device will have its own
>> internal subsystem naming, so if you're not extra careful you'll end up
>> with an nvmet subsystem which doesn't have any relationship with the
>> passthru subsystem, making addressing etc. tricky.
>> Any thoughts about that?
>
> Well, I can't say I have a great understanding of how multipath works,
> but...

Why is this related to multipath?

> I don't think it necessarily makes sense for the target subsysnqn and
> the target's device nqn to be the same. It would be weird for a user to
> want to use the same device and a passed through device (through a
> loop) as part of the same subsystem. That being said, it's possible for
> the user to use the subsysnqn from the passed through device for the
> name of the subsys of the target. I tried this and it works except for
> the fact that the device I'm passing through doesn't set id->cmic.

I don't see why the subsystem nqn should be the same name. It's just
like any other nvmet subsystem, it just happens to have an nvme
controller in the backend (which it knows about). No reason to have the
same name IMO.

>> Similarly: how do you propose to handle multipath devices?
>> Any NVMe device with several paths will enable NVMe multipathing
>> automatically, presenting you with a single multipathed namespace.
>> How will these devices be treated?
>
> Well, passthru works on the controller level, not on the namespace
> level. So it can't make use of the multipath handling on the target
> system.

Why? If nvmet is capable, why shouldn't we support it?

> The one case that I think makes sense to me, but that I don't know if
> we can handle, is if the user had a couple of multipath-enabled
> controllers with the same subsysnqn

That is usually the case; there is no multipathing defined across NVM
subsystems (at least for now).

> and wanted to passthru all of them
> to another system and use multipath on the host with both controllers.
> This would require having multiple target subsystems with the same
> name, which I don't think will work too well.

Don't understand why this is the case?

AFAICT, all nvmet needs to do is:
1. override cmic
2. allow allocating multiple controllers to the pt ctrl as long as the
   hostnqn matches.
3. answer all the ANA stuff.

What else is missing?

>> Will the multipathed namespace be used for passthru?
>
> Nope.
>
> Honestly, I think the answer is that if someone wants to use multipathed
> controllers they should use regular NVMe-oF, as it doesn't really mesh
> well with the passthru approach.

Maybe I'm missing something, but they should be orthogonal. I know that
it's sort of not real passthru, but we are exposing an nvme device
across a fabric, so I think it's reasonable to have some adjustments on
top.
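To make item (1) in the list above concrete, here is a minimal sketch of what overriding CMIC in the Identify Controller data returned by a passthru target could look like. The helper name is hypothetical; the bit positions are from the NVMe specification (bit 1: the subsystem may contain two or more controllers, bit 3: ANA reporting supported):

```c
#include <linux/nvme.h>

/*
 * Hypothetical helper: patch the Identify Controller data a passthru
 * target returns so the fabrics host sees a multi-controller,
 * ANA-capable subsystem even if the backend PCIe controller does not
 * advertise either capability.
 */
static void nvmet_passthru_override_cmic(struct nvme_id_ctrl *id)
{
	id->cmic |= 1 << 1;	/* two or more controllers in the subsystem */
	id->cmic |= 1 << 3;	/* ANA reporting supported */
}
```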
On 2019-07-26 4:21 p.m., Sagi Grimberg wrote:
>> I don't think it necessarily makes sense for the target subsysnqn and
>> the target's device nqn to be the same. It would be weird for a user to
>> want to use the same device and a passed through device (through a
>> loop) as part of the same subsystem. That being said, it's possible for
>> the user to use the subsysnqn from the passed through device for the
>> name of the subsys of the target. I tried this and it works except for
>> the fact that the device I'm passing through doesn't set id->cmic.
>
> I don't see why the subsystem nqn should be the same name. It's just
> like any other nvmet subsystem, it just happens to have an nvme
> controller in the backend (which it knows about). No reason to have the
> same name IMO.

Agreed.

>>> Similarly: how do you propose to handle multipath devices?
>>> Any NVMe device with several paths will enable NVMe multipathing
>>> automatically, presenting you with a single multipathed namespace.
>>> How will these devices be treated?
>>
>> Well, passthru works on the controller level, not on the namespace
>> level. So it can't make use of the multipath handling on the target
>> system.
>
> Why? If nvmet is capable, why shouldn't we support it?

I'm saying that passthru is exporting a specific controller and submits
commands (both admin and IO) straight to the nvme_ctrl's queues. It's
not exporting an nvme_subsys, and I think it would be troublesome to do
so; for example, if the target receives an admin command, which ctrl of
the subsystem should it send it to? There's also no userspace handle
for a given subsystem; we'd maybe have to use the subsysnqn.

>> The one case that I think makes sense to me, but that I don't know if
>> we can handle, is if the user had a couple of multipath-enabled
>> controllers with the same subsysnqn
>
> That is usually the case; there is no multipathing defined across NVM
> subsystems (at least for now).
>
>> and wanted to passthru all of them
>> to another system and use multipath on the host with both controllers.
>> This would require having multiple target subsystems with the same
>> name, which I don't think will work too well.
>
> Don't understand why this is the case?
>
> AFAICT, all nvmet needs to do is:
> 1. override cmic
> 2. allow allocating multiple controllers to the pt ctrl as long as the
>    hostnqn matches.
> 3. answer all the ANA stuff.

But with this scheme the host will only see one controller, and then
the target would have to make decisions on which ctrl to send any
commands to. Maybe it could be done for I/O, but I don't see how it
could be done correctly for admin commands.

And from the host's perspective, having cmic set doesn't help anything
because we've limited the passthru code to only accept one connection
from one host, so the host can only actually have one route to this
controller.

Logan
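For context on "submits commands straight to the nvme_ctrl's queues", the following is a rough sketch of such a forwarding path, not the code from the posted series. Function names are hypothetical, buffer mapping and result propagation are omitted, and it assumes access to the nvme host core and nvmet internals with the APIs roughly as they looked at the time of this thread:

```c
/*
 * Rough sketch only: turn an nvmet request into a request on the
 * backend controller's admin or namespace queue and execute it
 * asynchronously.  Assumes the nvme host core (nvme.h) and nvmet
 * internals (nvmet.h); data mapping and result translation omitted.
 */
static void nvmet_passthru_req_done(struct request *rq, blk_status_t blk_sts)
{
	struct nvmet_req *req = rq->end_io_data;

	/* Propagate the backend NVMe status back to the fabrics host. */
	nvmet_req_complete(req, nvme_req(rq)->status);
	blk_mq_free_request(rq);
}

static u16 nvmet_passthru_execute_cmd(struct nvmet_req *req,
				      struct nvme_ctrl *ctrl,
				      struct nvme_ns *ns)
{
	/* Admin commands go to the admin queue, I/O to the ns queue. */
	struct request_queue *q = ns ? ns->queue : ctrl->admin_q;
	struct request *rq;

	rq = nvme_alloc_request(q, req->cmd, 0, NVME_QID_ANY);
	if (IS_ERR(rq))
		return NVME_SC_INTERNAL;

	rq->end_io_data = req;
	blk_execute_rq_nowait(q, NULL, rq, 0, nvmet_passthru_req_done);
	return NVME_SC_SUCCESS;
}
```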
>> Why? If nvmet is capable, why shouldn't we support it?
>
> I'm saying that passthru is exporting a specific controller and submits
> commands (both admin and IO) straight to the nvme_ctrl's queues. It's
> not exporting an nvme_subsys, and I think it would be troublesome to do
> so; for example, if the target receives an admin command, which ctrl of
> the subsystem should it send it to?

It's the same controller in the backend, what is the difference from
which fabrics controller the admin command came from?

> There's also no userspace handle
> for a given subsystem; we'd maybe have to use the subsysnqn.

Umm, not sure I understand what you mean.

>>> The one case that I think makes sense to me, but that I don't know if
>>> we can handle, is if the user had a couple of multipath-enabled
>>> controllers with the same subsysnqn
>>
>> That is usually the case; there is no multipathing defined across NVM
>> subsystems (at least for now).
>>
>>> and wanted to passthru all of them
>>> to another system and use multipath on the host with both controllers.
>>> This would require having multiple target subsystems with the same
>>> name, which I don't think will work too well.
>>
>> Don't understand why this is the case?
>>
>> AFAICT, all nvmet needs to do is:
>> 1. override cmic
>> 2. allow allocating multiple controllers to the pt ctrl as long as the
>>    hostnqn matches.
>> 3. answer all the ANA stuff.
>
> But with this scheme the host will only see one controller, and then
> the target would have to make decisions on which ctrl to send any
> commands to. Maybe it could be done for I/O, but I don't see how it
> could be done correctly for admin commands.

I haven't thought this through so it's very possible that I'm missing
something, but why can't the host see multiple controllers if it has
more than one path to the target?

What specific admin commands are you concerned about? What exactly
would clash?

> And from the host's perspective, having cmic set doesn't help anything
> because we've limited the passthru code to only accept one connection
> from one host, so the host can only actually have one route to this
> controller.

And I'm suggesting to allow more than a single controller given that all
controller allocations match a single hostnqn. It wouldn't make sense to
expose this controller to multiple hosts (although that might be doable,
but definitely requires non-trivial infrastructure around it).

Look, when it comes to fabrics, multipath is a fundamental piece of the
puzzle. Not supporting multipathing significantly diminishes the value
of this in my mind (assuming this answers a real-world use case).
On 2019-07-26 5:13 p.m., Sagi Grimberg wrote:
>>> Why? If nvmet is capable, why shouldn't we support it?
>>
>> I'm saying that passthru is exporting a specific controller and submits
>> commands (both admin and IO) straight to the nvme_ctrl's queues. It's
>> not exporting an nvme_subsys, and I think it would be troublesome to do
>> so; for example, if the target receives an admin command, which ctrl of
>> the subsystem should it send it to?
>
> It's the same controller in the backend, what is the difference from
> which fabrics controller the admin command came from?

This is not my understanding. It's not really the same controller in
the back end, and there are admin commands that operate on the
controller, like the namespace attachment command, which takes a list
of cntlids (though admittedly this is not something I'm too familiar
with because I don't have any hardware to play around with). Though
that command is already a bit problematic for passthru because we have
different cntlid address spaces.

> I haven't thought this through so it's very possible that I'm missing
> something, but why can't the host see multiple controllers if it has
> more than one path to the target?

Well, a target controller is created for each connection. So if the
host wanted to see multiple controllers, it would have to do multiple
"nvme connects" and somehow need to address the individual controllers
for each connection. Right now a connect is based on the subsysnqn,
which would be the same for every multipath controller.

> What specific admin commands are you concerned about? What exactly
> would clash?

Namespace attach comes to mind.

> And I'm suggesting to allow more than a single controller given that all
> controller allocations match a single hostnqn. It wouldn't make sense to
> expose this controller to multiple hosts (although that might be doable,
> but definitely requires non-trivial infrastructure around it).
>
> Look, when it comes to fabrics, multipath is a fundamental piece of the
> puzzle. Not supporting multipathing significantly diminishes the value
> of this in my mind (assuming this answers a real-world use case).

I'd agree with that. But it's the multipath through different ports
that seems important for fabrics, i.e. if I have a host with a path
through RDMA and a path through TCP, they should both work and allow
failover. This is quite orthogonal to passthru and would be easily
supported if we dropped the multiple hosts restriction (I'm not sure
what the objection really is to that).

This is different from multipath on, say, a multi-controller PCI device
and trying to expose both those controllers through passthru. This is
where the problems we are discussing come up. Supporting this is what
is hard, and I think the sensible answer is that if users want to do
something like that, they use non-passthru NVMe-oF and the multipath
code will just work as designed.

Our real-world use case is to support our PCI device, which has a bunch
of vendor unique commands and isn't likely to support multiple
controllers in the foreseeable future.

Logan
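To make the Namespace Attach concern concrete, here is a hypothetical sketch of the kind of admin-opcode filtering a passthru target would need. Namespace Attachment (opcode 0x15) carries a list of cntlids that are only meaningful in the backend controller's cntlid space, so it cannot simply be forwarded unmodified:

```c
/*
 * Hypothetical filter (assumes nvmet internals): decide whether an
 * admin command can be forwarded to the backend controller as-is.
 * Namespace Attachment carries a controller list whose cntlids belong
 * to the backend's cntlid space, so passing it through blindly would
 * be wrong.
 */
static u16 nvmet_passthru_check_admin_cmd(struct nvmet_req *req)
{
	switch (req->cmd->common.opcode) {
	case nvme_admin_ns_attach:
		return NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
	default:
		return NVME_SC_SUCCESS;	/* forward to the backend */
	}
}
```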
> This is different from multipath on, say, a multi-controller PCI device
> and trying to expose both those controllers through passthru. This is
> where the problems we are discussing come up.

I *think* there is some confusion. I *think* Sagi is talking about
network multi-path (i.e. the ability for the host to connect to a
controller on the target via two different network paths that fail over
as needed). I *think* Logan is talking about multi-port PCIe NVMe
devices that allow namespaces to be accessed via more than one
controller over PCIe (dual-port NVMe SSDs being the most obvious
example of this today).

> But it's the multipath through different ports that seems important
> for fabrics, i.e. if I have a host with a path through RDMA and a path
> through TCP, they should both work and allow failover.

Yes, or even two paths that are both RDMA or both TCP but which take a
different path through the network from host to target.

> Our real-world use case is to support our PCI device, which has a bunch
> of vendor unique commands and isn't likely to support multiple
> controllers in the foreseeable future.

I think solving passthru for single-port PCIe controllers would be a
good start.

Stephen
>> This is different from multipath on, say, a multi-controller PCI device
>> and trying to expose both those controllers through passthru. This is
>> where the problems we are discussing come up.
>
> I *think* there is some confusion. I *think* Sagi is talking about
> network multi-path (i.e. the ability for the host to connect to a
> controller on the target via two different network paths that fail over
> as needed). I *think* Logan is talking about multi-port PCIe NVMe
> devices that allow namespaces to be accessed via more than one
> controller over PCIe (dual-port NVMe SSDs being the most obvious
> example of this today).

Yes, I was referring to fabrics multipathing, which is somewhat
orthogonal to the backend PCI multipathing (unless I'm missing
something).

>> But it's the multipath through different ports that seems important
>> for fabrics, i.e. if I have a host with a path through RDMA and a path
>> through TCP, they should both work and allow failover.
>
> Yes, or even two paths that are both RDMA or both TCP but which take a
> different path through the network from host to target.
>
>> Our real-world use case is to support our PCI device, which has a bunch
>> of vendor unique commands and isn't likely to support multiple
>> controllers in the foreseeable future.
>
> I think solving passthru for single-port PCIe controllers would be a
> good start.

Me too.
On 2019-07-29 10:15 a.m., Sagi Grimberg wrote:
>>> This is different from multipath on, say, a multi-controller PCI device
>>> and trying to expose both those controllers through passthru. This is
>>> where the problems we are discussing come up.
>>
>> I *think* there is some confusion. I *think* Sagi is talking about
>> network multi-path (i.e. the ability for the host to connect to a
>> controller on the target via two different network paths that fail over
>> as needed). I *think* Logan is talking about multi-port PCIe NVMe
>> devices that allow namespaces to be accessed via more than one
>> controller over PCIe (dual-port NVMe SSDs being the most obvious
>> example of this today).
>
> Yes, I was referring to fabrics multipathing, which is somewhat
> orthogonal to the backend PCI multipathing (unless I'm missing
> something).

Yes, so if we focus on the fabrics multipathing, the only issue I see
is that only one controller can be connected to a passthru target (I
believe this was at your request), so two paths simply cannot exist to
begin with. I can easily remove that restriction.

Logan
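For reference, a sketch of what relaxing that single-controller restriction might look like, following the per-hostnqn condition Sagi suggested earlier in the thread. The helper is hypothetical and assumes the existing nvmet_subsys/nvmet_ctrl structures:

```c
/*
 * Hypothetical check, called with subsys->lock held when a new fabrics
 * controller is allocated for a passthru subsystem: allow more than one
 * controller, but only if every existing controller belongs to the same
 * host (same hostnqn), so a single host can bring up multiple paths.
 */
static bool nvmet_passthru_ctrl_allowed(struct nvmet_subsys *subsys,
					const char *hostnqn)
{
	struct nvmet_ctrl *ctrl;

	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
		if (strcmp(ctrl->hostnqn, hostnqn))
			return false;	/* a different host already owns it */
	}
	return true;	/* no controllers yet, or same host */
}
```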