Message ID: 20200327171545.98970-7-maxg@mellanox.com
State: Not Applicable
Series: nvme-rdma/nvmet-rdma: Add metadata/T10-PI support

Looks good modulo the interaction with the reordering in the previous patch.
On Fri, Mar 27, 2020 at 08:15:33PM +0300, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> Preparation for adding metadata (T10-PI) over fabric support. This will
> allow end-to-end protection information passthrough and validation for
> NVMe over Fabric.

So actually - for PCIe we enable PI by default. Not sure why RDMA would
be any different? If we have a switch to turn it off we probably want it
to work similarly (it can't be identical, due to the lack of a connect
command) for PCIe as well.

On 4/21/2020 6:17 PM, Christoph Hellwig wrote:
> So actually - for PCIe we enable PI by default. Not sure why RDMA would
> be any different? If we have a switch to turn it off we probably want it
> to work similarly (it can't be identical, due to the lack of a connect
> command) for PCIe as well.

For PCI we use a format command to configure metadata. In fabrics we can
choose to do it in the connect command, and we can also choose to have
"protected" controllers and "non-protected" controllers.

I don't think it's an all-or-nothing case, and configuration using
nvme-cli (or another tool) seems reasonable and flexible.

On 4/22/2020 3:07 PM, Max Gurtovoy wrote:
> For PCI we use a format command to configure metadata. In fabrics we can
> choose to do it in the connect command, and we can also choose to have
> "protected" controllers and "non-protected" controllers.
>
> I don't think it's an all-or-nothing case, and configuration using
> nvme-cli (or another tool) seems reasonable and flexible.

I think you need to change this to "some fabrics".

With FC, we don't do anything in connect. The transport passes attributes
describing what it can do for PI support, including: passing metadata
through (no checks); checking of metadata (normal); generation/stripping
of metadata on the wire (meaning the OS does not have to have a metadata
buffer); and so on. Enablement is just like PCI - format the ns, then
match up the attribute with the behavior. There is no such thing as
protected and non-protected controllers. A ns either has metadata or it
doesn't. If it has metadata and the attributes of the transport can't
support it, the ns is inaccessible.

I understand why you are describing it as you are, but I'm a bit
concerned about the creation of things that aren't covered by the
standards at all right now (protected vs. non-protected controllers).
This affects how multipath configures as well.

-- james

On 4/23/2020 1:24 AM, James Smart wrote:
> I think you need to change this to "some fabrics".
>
> With FC, we don't do anything in connect. The transport passes attributes
> describing what it can do for PI support, including: passing metadata
> through (no checks); checking of metadata (normal); generation/stripping
> of metadata on the wire (meaning the OS does not have to have a metadata
> buffer); and so on. Enablement is just like PCI - format the ns, then
> match up the attribute with the behavior. There is no such thing as
> protected and non-protected controllers. A ns either has metadata or it
> doesn't. If it has metadata and the attributes of the transport can't
> support it, the ns is inaccessible.
>
> I understand why you are describing it as you are, but I'm a bit
> concerned about the creation of things that aren't covered by the
> standards at all right now (protected vs. non-protected controllers).
> This affects how multipath configures as well.

It's a bit late for me now, so I probably wrote a non-standard sentence
above.

BUT what I meant to say is that I would like to give the user an option
to decide whether to use E2E protection or not (of course a controller
can contain both protected and non-protected namespaces :) ).

AFAIK, there is no option to format an ns in NVMf (at least for RDMA
there is only 1 LBAF exposed by the target), so I'm not sure how exactly
this will work. We took an all-or-nothing approach in iSER for T10-PI and
I prefer not to repeat that mistake for NVMf.

When NVMe/FC supports metadata in Linux we'll be able to adjust the code
and pi_enable according to HBA capabilities or any other logic that fits.

On Thu, Apr 23, 2020 at 01:07:47AM +0300, Max Gurtovoy wrote:
> For PCI we use a format command to configure metadata. In fabrics we can
> choose to do it in the connect command, and we can also choose to have
> "protected" controllers and "non-protected" controllers.
>
> I don't think it's an all-or-nothing case, and configuration using
> nvme-cli (or another tool) seems reasonable and flexible.

Format applies to a namespace and is not limited to PCIe.

On Thu, Apr 23, 2020 at 01:39:26AM +0300, Max Gurtovoy wrote:
> It's a bit late for me now, so I probably wrote a non-standard sentence
> above.
>
> BUT what I meant to say is that I would like to give the user an option
> to decide whether to use E2E protection or not (of course a controller
> can contain both protected and non-protected namespaces :) ).

I don't really have a problem with an opt-out, but I'd like to apply it
consistently over all transports.

> AFAIK, there is no option to format an ns in NVMf (at least for RDMA
> there is only 1 LBAF exposed by the target), so I'm not sure how exactly
> this will work.

The NVMe protocol Format NVM support is independent of the transport.

On 4/23/2020 8:54 AM, Christoph Hellwig wrote:
> I don't really have a problem with an opt-out, but I'd like to apply it
> consistently over all transports.
>
> The NVMe protocol Format NVM support is independent of the transport.

OK, but it's not supported in Linux.

Are you saying we should implement Format NVM for fabrics, or stay
consistent for NVMf (and not NVMf + PCI)?

On Thu, Apr 23, 2020 at 10:30:44AM +0300, Max Gurtovoy wrote:
> OK, but it's not supported in Linux.
>
> Are you saying we should implement Format NVM for fabrics, or stay
> consistent for NVMf (and not NVMf + PCI)?

I see no reason not to support a simple Format NVM for our fabrics target
implementation. But that isn't the point - you don't really need Format,
as you can also control it from configfs in your series. So for the
initial version I don't think we need Format NVM, but I don't mind adding
it later.

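For reference, the target-side control referred to here is per-port via
configfs. A minimal sketch, assuming the series exposes a port attribute
named param_pi_enable (the attribute name is an assumption, not confirmed
in this thread):

  # sketch: enable PI on an nvmet RDMA port via configfs before linking
  # a subsystem to it; param_pi_enable is an assumed attribute name
  cd /sys/kernel/config/nvmet/ports/1
  echo 1 > param_pi_enable
  ln -s /sys/kernel/config/nvmet/subsystems/testnqn subsystems/testnqn
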
On 4/24/2020 10:06 AM, Christoph Hellwig wrote:
> I see no reason not to support a simple Format NVM for our fabrics target
> implementation. But that isn't the point - you don't really need Format,
> as you can also control it from configfs in your series. So for the
> initial version I don't think we need Format NVM, but I don't mind adding
> it later.

So we're OK with passing -p in nvme-cli during the connect command?

On Sun, Apr 26, 2020 at 12:48:18PM +0300, Max Gurtovoy wrote:
> So we're OK with passing -p in nvme-cli during the connect command?

PI should be enabled by default. We can think of a hook for disabling it,
but please keep it at the end of the series.

On 4/27/2020 9:04 AM, Christoph Hellwig wrote:
> PI should be enabled by default. We can think of a hook for disabling it,
> but please keep it at the end of the series.

But the default format on NVMe PCI drives is without metadata, and to
enable it we issue an NVM Format command. So for this model we need to
take some action to enable T10/metadata. Regular users will usually use
the default configuration and are not aware of the pros/cons of adding
metadata (resources/performance/etc.).

Also, adding a module parameter as we did in iSER is less flexible than
having a per-controller flag in the CLI.

I'll think about another option to replace -p in the connect command (the
only one I can think of now is always "on", as suggested...).

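For illustration, the per-controller opt-in discussed here would look
roughly like the following on the host. The -p flag is the one proposed
in this thread and should be read as a sketch, not an existing nvme-cli
option; addresses and the NQN are placeholders:

  # sketch: host-side opt-in at connect time using the proposed -p flag
  # (hypothetical nvme-cli option; traddr/trsvcid/nqn are examples)
  nvme connect -t rdma -a 10.0.0.1 -s 4420 -n testnqn -p
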
On Mon, Apr 27, 2020 at 04:52:52PM +0300, Max Gurtovoy wrote:
> But the default format on NVMe PCI drives is without metadata, and to
> enable it we issue an NVM Format command.

The default namespace format is entirely vendor specific for both PCIe
and Fabrics. For OEMs that rely on PI, the drives will usually be shipped
with a PI-enabled format.

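For comparison, on a PCIe drive the PI opt-in is done by reformatting the
namespace into a metadata-capable LBA format, e.g. with nvme-cli. The
LBAF index and PI type below are placeholders and vary per drive; check
`nvme id-ns` first:

  # example: reformat namespace 1 into a PI-capable LBA format
  # (--lbaf and --pi values are device-specific placeholders)
  nvme format /dev/nvme0n1 --lbaf=1 --pi=1
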
On 4/27/2020 4:54 PM, Christoph Hellwig wrote:
> The default namespace format is entirely vendor specific for both PCIe
> and Fabrics. For OEMs that rely on PI, the drives will usually be shipped
> with a PI-enabled format.

OK, I'll mark the controller as capable whenever the HCA is capable, and
then act according to the namespaces exposed by the target. It will cost
us integrity MRs (which are expensive) for ConnectX-4 and above, but I
guess we can live with that for now.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2feb7fb..0b4d847 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1900,7 +1900,8 @@ static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	 * controller.
 	 */
 	if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
-		if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
+		if ((ns->ctrl->ops->flags & NVME_F_FABRICS &&
+		     ns->ctrl->opts->pi_enable) ||
 		    !(ns->features & NVME_NS_EXT_LBAS))
 			ns->features |= NVME_NS_MD_HOST_SUPPORTED;
 	}
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2a6c819..38f4213 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -612,6 +612,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d" },
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d" },
 	{ NVMF_OPT_TOS,			"tos=%d" },
+	{ NVMF_OPT_PI_ENABLE,		"pi_enable" },
 	{ NVMF_OPT_ERR,			NULL }
 };
 
@@ -634,6 +635,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	opts->hdr_digest = false;
 	opts->data_digest = false;
 	opts->tos = -1; /* < 0 == use transport default */
+	opts->pi_enable = false;
 
 	options = o = kstrdup(buf, GFP_KERNEL);
 	if (!options)
@@ -867,6 +869,15 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->tos = token;
 			break;
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+		case NVMF_OPT_PI_ENABLE:
+			if (opts->discovery_nqn) {
+				pr_debug("Ignoring pi_enable value for discovery controller\n");
+				break;
+			}
+			opts->pi_enable = true;
+			break;
+#endif
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a0ec40a..773f748 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -56,6 +56,7 @@ enum {
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_PI_ENABLE	= 1 << 20,
 };
 
 /**
@@ -89,6 +90,7 @@ enum {
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @pi_enable: Enable metadata (T10-PI) support
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +113,7 @@ struct nvmf_ctrl_options {
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	bool			pi_enable;
 };
 
 /*
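
With the above, the new option can also be exercised directly through
/dev/nvme-fabrics; a minimal sketch, where the transport address and NQN
are placeholders:

  # sketch: create an RDMA controller with PI enabled by passing the new
  # pi_enable token in the options string (traddr/nqn values are examples)
  echo "transport=rdma,traddr=10.0.0.1,trsvcid=4420,nqn=testnqn,pi_enable" \
      > /dev/nvme-fabrics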