
[05/17] nvme-fabrics: Allow user enabling metadata/T10-PI support

Message ID 20200327171545.98970-7-maxg@mellanox.com (mailing list archive)
State Not Applicable
Series nvme-rdma/nvmet-rdma: Add metadata/T10-PI support

Commit Message

Max Gurtovoy March 27, 2020, 5:15 p.m. UTC
From: Israel Rukshin <israelr@mellanox.com>

Preparation for adding metadata (T10-PI) over fabric support. This will
allow end-to-end protection information passthrough and validation for
NVMe over Fabric.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/host/core.c    |  3 ++-
 drivers/nvme/host/fabrics.c | 11 +++++++++++
 drivers/nvme/host/fabrics.h |  3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig April 21, 2020, 12:12 p.m. UTC | #1
Looks good modulo the interaction with the reordering in the previous
patch.
Christoph Hellwig April 21, 2020, 3:17 p.m. UTC | #2
On Fri, Mar 27, 2020 at 08:15:33PM +0300, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
> 
> Preparation for adding metadata (T10-PI) over fabric support. This will
> allow end-to-end protection information passthrough and validation for
> NVMe over Fabric.

So actually - for PCIe we enable PI by default.  Not sure why RDMA would
be any different?  If we have a switch to turn it off, we probably want it
to work similarly (it can't be identical, due to the lack of connect) for
PCIe as well.
Max Gurtovoy April 22, 2020, 10:07 p.m. UTC | #3
On 4/21/2020 6:17 PM, Christoph Hellwig wrote:
> On Fri, Mar 27, 2020 at 08:15:33PM +0300, Max Gurtovoy wrote:
>> From: Israel Rukshin <israelr@mellanox.com>
>>
>> Preparation for adding metadata (T10-PI) over fabric support. This will
>> allow end-to-end protection information passthrough and validation for
>> NVMe over Fabric.
> So actually - for PCIe we enable PI by default.  Not sure why RDMA would
> be any different?  If we have a switch to turn it off we probably want
> it work similar (can't be the same due to the lack of connect) for PCIe
> as well.

For PCI we use a Format command to configure metadata. In fabrics we can 
choose to do it in the connect command, and we can also choose to have 
"protected" controllers and "non-protected" controllers.

I don't think it's an all-or-nothing case, and configuration using nvme-cli 
(or another tool) seems reasonable and flexible.
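
As a concrete illustration of the connect-time configuration being discussed: 
the pi_enable token added by this patch is just another key in the options 
string that nvme-cli writes to /dev/nvme-fabrics. The sketch below is not 
part of the series; the transport address, NQN, and the assumption that 
nvme-cli's proposed -p flag simply appends pi_enable are all illustrative.

/*
 * Sketch only: connect a fabrics controller with the pi_enable token
 * this patch teaches the kernel to parse.  nvme-cli normally builds
 * this string; the transport/traddr/trsvcid/nqn values are made up.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *opts = "nqn=testnqn,transport=rdma,traddr=1.1.1.1,"
			   "trsvcid=4420,pi_enable";
	int fd = open("/dev/nvme-fabrics", O_RDWR);

	if (fd < 0) {
		perror("open /dev/nvme-fabrics");
		return 1;
	}
	if (write(fd, opts, strlen(opts)) < 0)
		perror("connect");
	close(fd);
	return 0;
}

On a successful write the controller is created just as with the existing 
fabrics options; pi_enable only changes whether host metadata support is 
advertised for its namespaces.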
James Smart April 22, 2020, 10:24 p.m. UTC | #4
On 4/22/2020 3:07 PM, Max Gurtovoy wrote:
> 
> On 4/21/2020 6:17 PM, Christoph Hellwig wrote:
>> On Fri, Mar 27, 2020 at 08:15:33PM +0300, Max Gurtovoy wrote:
>>> From: Israel Rukshin <israelr@mellanox.com>
>>>
>>> Preparation for adding metadata (T10-PI) over fabric support. This will
>>> allow end-to-end protection information passthrough and validation for
>>> NVMe over Fabric.
>> So actually - for PCIe we enable PI by default.  Not sure why RDMA would
>> be any different?  If we have a switch to turn it off we probably want
>> it work similar (can't be the same due to the lack of connect) for PCIe
>> as well.
> 
> For PCI we use a format command to configure metadata. In fabrics we can 
> choose doing it in the connect command and we can also choose to have 
> "protected" controllers and "non-protected" controllers.
> 
> I don't think it's all or nothing case, and configuration using nvme-cli 
> (or other tool) seems reasonable and flexible.

I think you need to change this to "some fabrics".

With FC, we don't do anything in connect. The transport passes 
attributes on what it can do for PI support, including: passing through 
metadata (no checks); checking of metadata (normal); generation/strip of 
metadata on the wire (meaning the OS does not have to have a metadata 
buffer); and so on.  Enablement is just like PCI - format the ns, then 
match up the attributes with the behavior. There is no such thing as 
protected and non-protected controllers.  There's either a ns that has 
metadata or not. If the ns has metadata and the transport's attributes 
can't support it, the ns is inaccessible.

I understand why you are describing it as you are, but I'm a bit 
concerned about creating concepts that aren't covered by the standards 
at all right now (protected vs. non-protected controllers). 
This affects how multipath is configured as well.

-- james
Max Gurtovoy April 22, 2020, 10:39 p.m. UTC | #5
On 4/23/2020 1:24 AM, James Smart wrote:
> On 4/22/2020 3:07 PM, Max Gurtovoy wrote:
>>
>> On 4/21/2020 6:17 PM, Christoph Hellwig wrote:
>>> On Fri, Mar 27, 2020 at 08:15:33PM +0300, Max Gurtovoy wrote:
>>>> From: Israel Rukshin <israelr@mellanox.com>
>>>>
>>>> Preparation for adding metadata (T10-PI) over fabric support. This 
>>>> will
>>>> allow end-to-end protection information passthrough and validation for
>>>> NVMe over Fabric.
>>> So actually - for PCIe we enable PI by default.  Not sure why RDMA 
>>> would
>>> be any different?  If we have a switch to turn it off we probably want
>>> it work similar (can't be the same due to the lack of connect) for PCIe
>>> as well.
>>
>> For PCI we use a format command to configure metadata. In fabrics we 
>> can choose doing it in the connect command and we can also choose to 
>> have "protected" controllers and "non-protected" controllers.
>>
>> I don't think it's all or nothing case, and configuration using 
>> nvme-cli (or other tool) seems reasonable and flexible.
>
> I think you need to change this to "some fabrics".
>
> With FC, we don't do anything in connect. The transport passes 
> attributes on what it can do for PI support, including: passing 
> through metadata (no checks); checking of metadata (normal); 
> generation/strip of metadata on the wire (meaning OS does not have to 
> have a metadata buffer), and so on.  Enablement is just like pci - 
> format the ns, then match up the attribute with the behavior. There is 
> no such thing as protected and non-protected controllers.  There's 
> either a ns that has metadata or not. If metadata and the attributes 
> of the transport can't support it, the ns is inaccessible.
>
> I understand why you are describing it as you are, but I'm a bit 
> concerned about the creation of things that aren't comprehended in the 
> standards at all right now (protected vs non-protected controllers). 
> This affects how multipath configures as well.

It's a bit late for me now, so I probably wrote a non-standard sentence above.

BUT what I meant to say is that I would like to give the user an option to 
decide whether to use E2E protection or not (of course a controller can 
serve both protected and non-protected namespaces :) ).

AFAIK, there is no option to format a ns in NVMf (at least for RDMA 
there is only 1 lbaf exposed by the target), so I'm not sure how exactly 
this will work.

We took an all-or-nothing approach in iSER for T10, and I prefer not to 
make the same mistake for NVMf.

When NVMe/FC supports metadata in Linux, we'll be able to adjust the code 
and pi_enable according to HBA capabilities or any other logic that fits.

>
> -- james
>
>
Christoph Hellwig April 23, 2020, 5:53 a.m. UTC | #6
On Thu, Apr 23, 2020 at 01:07:47AM +0300, Max Gurtovoy wrote:
>
> On 4/21/2020 6:17 PM, Christoph Hellwig wrote:
>> On Fri, Mar 27, 2020 at 08:15:33PM +0300, Max Gurtovoy wrote:
>>> From: Israel Rukshin <israelr@mellanox.com>
>>>
>>> Preparation for adding metadata (T10-PI) over fabric support. This will
>>> allow end-to-end protection information passthrough and validation for
>>> NVMe over Fabric.
>> So actually - for PCIe we enable PI by default.  Not sure why RDMA would
>> be any different?  If we have a switch to turn it off we probably want
>> it work similar (can't be the same due to the lack of connect) for PCIe
>> as well.
>
> For PCI we use a format command to configure metadata. In fabrics we can 
> choose doing it in the connect command and we can also choose to have 
> "protected" controllers and "non-protected" controllers.
>
> I don't think it's all or nothing case, and configuration using nvme-cli 
> (or other tool) seems reasonable and flexible.

Format applies to a namespace and is not limited to PCIe.
Christoph Hellwig April 23, 2020, 5:54 a.m. UTC | #7
On Thu, Apr 23, 2020 at 01:39:26AM +0300, Max Gurtovoy wrote:
> it's a bit late for me now so I probably wrote non standard sentence above.
>
> BUT what I meant to say is I would like to give the user an option to 
> decide whether use E2E protection or not (of course a controller can 
> control protected and non-protected namespaces :) )

I don't really have a problem with an opt-out, but I'd like to apply it
consistently over all transports.

>
> AFAIK, there is no option to format a ns in NVMf (at least for RDMA there 
> is only 1 lbaf exposed by the target) so i'm not sure how exactly this will 
> work.

The NVMe protocol Format NVM support is independent of the transport.
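
For readers unfamiliar with how a PI-capable format is selected in the first 
place: Format NVM carries the LBA format index and PI settings in cdw10. The 
helper below is a sketch of that encoding as described in the NVMe 
specification (field positions should be double-checked against the spec 
revision you target); it is not code from this series.

#include <stdbool.h>
#include <stdint.h>

/*
 * Format NVM, cdw10 (sketch of the spec layout):
 *   LBAF [3:0]  - LBA format index to format to
 *   MSET [4]    - metadata as extended LBAs (1) or in a separate buffer (0)
 *   PI   [7:5]  - protection information type (0 = none, 1..3 = Type 1..3)
 *   PIL  [8]    - PI placement within the metadata
 *   SES  [11:9] - secure erase settings
 */
static inline uint32_t nvme_format_cdw10(uint8_t lbaf, bool ext_lba,
					 uint8_t pi_type)
{
	return (lbaf & 0xf) | ((uint32_t)ext_lba << 4) |
	       ((uint32_t)(pi_type & 0x7) << 5);
}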
Max Gurtovoy April 23, 2020, 7:30 a.m. UTC | #8
On 4/23/2020 8:54 AM, Christoph Hellwig wrote:
> On Thu, Apr 23, 2020 at 01:39:26AM +0300, Max Gurtovoy wrote:
>> it's a bit late for me now so I probably wrote non standard sentence above.
>>
>> BUT what I meant to say is I would like to give the user an option to
>> decide whether use E2E protection or not (of course a controller can
>> control protected and non-protected namespaces :) )
> I don't really have a problem with an opt-out, but I'd like to apply it
> consistently over all transports.
>
>> AFAIK, there is no option to format a ns in NVMf (at least for RDMA there
>> is only 1 lbaf exposed by the target) so i'm not sure how exactly this will
>> work.
> The NVMe protocol Format NVM support is independent of the transport.

OK, but it's not supported in Linux.

Are you saying we should implement Format NVM for fabrics, or stay 
consistent across NVMf only (and not NVMf + PCI)?
Christoph Hellwig April 24, 2020, 7:06 a.m. UTC | #9
On Thu, Apr 23, 2020 at 10:30:44AM +0300, Max Gurtovoy wrote:
>
> On 4/23/2020 8:54 AM, Christoph Hellwig wrote:
>> On Thu, Apr 23, 2020 at 01:39:26AM +0300, Max Gurtovoy wrote:
>>> it's a bit late for me now so I probably wrote non standard sentence above.
>>>
>>> BUT what I meant to say is I would like to give the user an option to
>>> decide whether use E2E protection or not (of course a controller can
>>> control protected and non-protected namespaces :) )
>> I don't really have a problem with an opt-out, but I'd like to apply it
>> consistently over all transports.
>>
>>> AFAIK, there is no option to format a ns in NVMf (at least for RDMA there
>>> is only 1 lbaf exposed by the target) so i'm not sure how exactly this will
>>> work.
>> The NVMe protocol Format NVM support is independent of the transport.
>
> Ok, but it's not supported in Linux.
>
> Are you saying we should implement Format NVM for fabrics ? or stay 
> consistent for NVMf (and not nvmf + pci) ?

I see no reason not to support a simple Format NVM for our fabrics target
implementation.  But that isn't the point - you don't really need Format
as you can also control it from configfs in your series.  So for the
initial version I don't think we need Format NVM, but I don't mind
adding it later.
Max Gurtovoy April 26, 2020, 9:48 a.m. UTC | #10
On 4/24/2020 10:06 AM, Christoph Hellwig wrote:
> On Thu, Apr 23, 2020 at 10:30:44AM +0300, Max Gurtovoy wrote:
>> On 4/23/2020 8:54 AM, Christoph Hellwig wrote:
>>> On Thu, Apr 23, 2020 at 01:39:26AM +0300, Max Gurtovoy wrote:
>>>> it's a bit late for me now so I probably wrote non standard sentence above.
>>>>
>>>> BUT what I meant to say is I would like to give the user an option to
>>>> decide whether use E2E protection or not (of course a controller can
>>>> control protected and non-protected namespaces :) )
>>> I don't really have a problem with an opt-out, but I'd like to apply it
>>> consistently over all transports.
>>>
>>>> AFAIK, there is no option to format a ns in NVMf (at least for RDMA there
>>>> is only 1 lbaf exposed by the target) so i'm not sure how exactly this will
>>>> work.
>>> The NVMe protocol Format NVM support is independent of the transport.
>> Ok, but it's not supported in Linux.
>>
>> Are you saying we should implement Format NVM for fabrics ? or stay
>> consistent for NVMf (and not nvmf + pci) ?
> I see no reason not to support a simple Format NVM for our fabrics target
> implementation.  But that isn't the point - you don't really need Format
> as you can also control it from configfs in your series.  So for the
> initial version I don't think we need Format NVM, but I don't mind
> adding it later.

So are we OK with passing -p to nvme-cli during the connect command?
Christoph Hellwig April 27, 2020, 6:04 a.m. UTC | #11
On Sun, Apr 26, 2020 at 12:48:18PM +0300, Max Gurtovoy wrote:
>> I see no reason not to support a simple Format NVM for our fabrics target
>> implementation.  But that isn't the point - you don't really need Format
>> as you can also control it from configfs in your series.  So for the
>> initial version I don't think we need Format NVM, but I don't mind
>> adding it later.
>
> so we're ok with passing -p in nvme-cli during connect command ?

PI should be enabled by default.  We can think of a hook for disabling it,
but please keep it at the end of the series.
Max Gurtovoy April 27, 2020, 1:52 p.m. UTC | #12
On 4/27/2020 9:04 AM, Christoph Hellwig wrote:
> On Sun, Apr 26, 2020 at 12:48:18PM +0300, Max Gurtovoy wrote:
>>> I see no reason not to support a simple Format NVM for our fabrics target
>>> implementation.  But that isn't the point - you don't really need Format
>>> as you can also control it from configfs in your series.  So for the
>>> initial version I don't think we need Format NVM, but I don't mind
>>> adding it later.
>> so we're ok with passing -p in nvme-cli during connect command ?
> PI should be enable by default.  We can think of a hook disabling it,
> but please keep it at the end of the series.

But the default format on NVMe PCI drives is without metadata, and to 
enable it we issue an NVM Format command.

So for this model we need to take some action to enable T10/metadata.

Regular users will usually stick with the default configuration and are not 
aware of the pros/cons of adding metadata (resources/performance/etc.).

Also, adding a module param as we did in iSER is less flexible than having 
a per-controller flag in the CLI.

I'll think about another option to replace -p in the connect command (the 
only one I can think of now is always "on", as suggested...).
Christoph Hellwig April 27, 2020, 1:54 p.m. UTC | #13
On Mon, Apr 27, 2020 at 04:52:52PM +0300, Max Gurtovoy wrote:
> but the default format on NVMe pci drives is without metadata and to enable 
> it we do an NVM format command.

The default namespace format is entirely vendor specific for both
PCIe and Fabrics.  For OEMs that rely on PI, the drives will usually
be shipped with a PI-enabled format.
Max Gurtovoy April 28, 2020, 9:18 a.m. UTC | #14
On 4/27/2020 4:54 PM, Christoph Hellwig wrote:
> On Mon, Apr 27, 2020 at 04:52:52PM +0300, Max Gurtovoy wrote:
>> but the default format on NVMe pci drives is without metadata and to enable
>> it we do an NVM format command.
> The defaut namespace format is entirely vendor specific for both
> PCIe and Fabrics.  For OEMs that rely on PI the drives will usually
> shipped with a PI-enabled format.

OK, I'll set the controller to be capable in case the HCA is capable, and 
then act according to the namespace exposed by the target.

It will cost us integrity MRs (which are expensive) for ConnectX-4 and 
above, but I guess we can live with that for now.
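
What that resolution could look like on the nvme-rdma side, roughly: clear 
the user's pi_enable request when the HCA cannot do the offload. This is a 
sketch under assumptions, not code from the series; in particular, the 
IB_DEVICE_INTEGRITY_HANDOVER capability check and the warning text are 
guesses at how the driver would probe the device.

/*
 * Sketch only: disable pi_enable when the RDMA device lacks T10-PI
 * (integrity/signature handover) offload.  The capability flag used
 * here is an assumption about the eventual nvme-rdma implementation.
 */
static void nvme_rdma_check_pi_support(struct nvme_rdma_ctrl *ctrl,
				       struct ib_device *ibdev)
{
	if (!ctrl->ctrl.opts->pi_enable)
		return;

	if (!(ibdev->attrs.device_cap_flags & IB_DEVICE_INTEGRITY_HANDOVER)) {
		dev_warn(ctrl->ctrl.device,
			 "T10-PI requested but not supported by the HCA, disabling it\n");
		ctrl->ctrl.opts->pi_enable = false;
	}
}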

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2feb7fb..0b4d847 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1900,7 +1900,8 @@  static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		 * controller.
 		 */
 		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
-			if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
+			if ((ns->ctrl->ops->flags & NVME_F_FABRICS &&
+			     ns->ctrl->opts->pi_enable) ||
 			    !(ns->features & NVME_NS_EXT_LBAS))
 				ns->features |= NVME_NS_MD_HOST_SUPPORTED;
 		}
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2a6c819..38f4213 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -612,6 +612,7 @@  bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_TOS,			"tos=%d"		},
+	{ NVMF_OPT_PI_ENABLE,		"pi_enable"		},
 	{ NVMF_OPT_ERR,			NULL			}
 };
 
@@ -634,6 +635,7 @@  static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	opts->hdr_digest = false;
 	opts->data_digest = false;
 	opts->tos = -1; /* < 0 == use transport default */
+	opts->pi_enable = false;
 
 	options = o = kstrdup(buf, GFP_KERNEL);
 	if (!options)
@@ -867,6 +869,15 @@  static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->tos = token;
 			break;
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+		case NVMF_OPT_PI_ENABLE:
+			if (opts->discovery_nqn) {
+				pr_debug("Ignoring pi_enable value for discovery controller\n");
+				break;
+			}
+			opts->pi_enable = true;
+			break;
+#endif
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a0ec40a..773f748 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -56,6 +56,7 @@  enum {
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_PI_ENABLE	= 1 << 20,
 };
 
 /**
@@ -89,6 +90,7 @@  enum {
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @pi_enable: Enable metadata (T10-PI) support
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +113,7 @@  struct nvmf_ctrl_options {
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	bool			pi_enable;
 };
 
 /*