[v2,2/2] nvme-multipath: fix path failover for integrity ns

Message ID 20230424225442.18916-3-mgurtovoy@nvidia.com (mailing list archive)
State New, archived
Series: Fix failover to non integrity NVMe path

Commit Message

Max Gurtovoy April 24, 2023, 10:54 p.m. UTC
In case the integrity capabilities of the failed path and the failover
path don't match, we may run into a NULL dereference. Free the integrity
context during path failover and let the block layer prepare it again if
needed during submit_bio().

Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Tested-by: Ori Evron <oevron@nvidia.com>
Signed-off-by: Ori Evron <oevron@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/host/multipath.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Martin K. Petersen April 25, 2023, 2:12 a.m. UTC | #1
Hi Max!

> In case the integrity capabilities of the failed path and the failover
> path don't match, we may run into NULL dereference. Free the integrity
> context during the path failover and let the block layer prepare it
> again if needed during bio_submit.

This assumes that the protection information is just an ephemeral
checksum. However, that is not always the case. The application may
store values in the application or storage tags which must be returned
on a subsequent read.

In addition, in some overseas markets (financial, government), PI is a
regulatory requirement. It would be really bad for us to expose a device
claiming PI support and then it turns out the protection isn't actually
always active.

DM multipathing doesn't allow mismatched integrity profiles. I don't
think NVMe should either.
Max Gurtovoy April 25, 2023, 10:24 a.m. UTC | #2
On 25/04/2023 5:12, Martin K. Petersen wrote:
> 
> Hi Max!
> 
>> In case the integrity capabilities of the failed path and the failover
>> path don't match, we may run into NULL dereference. Free the integrity
>> context during the path failover and let the block layer prepare it
>> again if needed during bio_submit.
> 
> This assumes that the protection information is just an ephemeral
> checksum. However, that is not always the case. The application may
> store values in the application or storage tags which must be returned
> on a subsequent read.

Interesting.

Maybe you can point me to the API that allows applications to store an 
application tag in the PI field?

I see that app_tag is 0 in Linux and we don't set nvme_cmd->apptag to a 
non-zero value.
It's been a while since I worked on this, so I might be wrong here :).

I've noticed that in t10_pi_generate and ext_pi_crc64_generate we set it 
to 0 as well.

The way I see it now, and I might be wrong, is that the Linux kernel 
does not support applications storing apptag values unless they use some 
passthrough command.

> 
> In addition, in some overseas markets (financial, government), PI is a
> regulatory requirement. It would be really bad for us to expose a device
> claiming PI support and then it turns out the protection isn't actually
> always active.
> 
> DM multipathing doesn't allow mismatched integrity profiles. I don't
> think NVMe should either.
> 

AFAIU, DM multipath is not a specification but a Linux implementation. 
NVMe multipathing follows a standard.
Keith Busch April 25, 2023, 10:39 p.m. UTC | #3
On Tue, Apr 25, 2023 at 01:24:31PM +0300, Max Gurtovoy wrote:
> 
> 
> On 25/04/2023 5:12, Martin K. Petersen wrote:
> > 
> > Hi Max!
> > 
> > > In case the integrity capabilities of the failed path and the failover
> > > path don't match, we may run into NULL dereference. Free the integrity
> > > context during the path failover and let the block layer prepare it
> > > again if needed during bio_submit.
> > 
> > This assumes that the protection information is just an ephemeral
> > checksum. However, that is not always the case. The application may
> > store values in the application or storage tags which must be returned
> > on a subsequent read.
> 
> Interesting.
> 
> Maybe you can point me to this API that allow applications to store
> application tag in PI field ?

I think Martin might be considering kernel modules as apps, since they can
access the integrity payload without a problem. I know of at least one
out-of-tree module (e.g. OpenCAS) that utilizes the apptag (not that
upstream necessarily should care about such modules...).

Outside that, passthrough ioctls and io_uring_cmd can access the fields. I
don't think those interfaces apply to what you're changing, though, since these
bypass the submit_bio() interface.
 
> I see that app_tag is 0 in Linux and we don't set the nvme_cmd->apptag to
> non zero value.
> It's been a while since I was working on this so I might be wrong here :).
> 
> I've noticed that in t10_pi_generate and ext_pi_crc64_generate we set it to
> 0 as well.
> 
> The way I see it now, and I might be wrong, is that the Linux kernel is not
> supporting application to store apptag values unless it's using some
> passthrough command.

If we're considering only in-tree usage, I think you're correct.

> > In addition, in some overseas markets (financial, government), PI is a
> > regulatory requirement. It would be really bad for us to expose a device
> > claiming PI support and then it turns out the protection isn't actually
> > always active.
> > 
> > DM multipathing doesn't allow mismatched integrity profiles. I don't
> > think NVMe should either.
> > 
> 
> AFAIU, the DM multipath is not a specification but a Linux implementation.
> NVMe multipathing follows a standard.

If we're talking about any of the nvme passthrough interfaces, which format
will user space need to construct its command for? This seems confusing,
since user space doesn't have direct control over which path the command will
be dispatched on, so it won't know which format it needs to cater to, nor
will it get deterministic behavior.
Martin K. Petersen April 26, 2023, 1:22 a.m. UTC | #4
Hi Max!

> The way I see it now, and I might be wrong, is that the Linux kernel
> is not supporting application to store apptag values unless it's using
> some passthrough command.

As Keith mentioned, there are various ways.

But let's assume you are a multipathed storage device that says "Yes, I
support encryption". And then one of the paths doesn't and just lets
things go across the wire in plain text.

I think most people would agree that would be a *bad* thing.

The expectation is that when a device reports a capability as enabled,
that capability is actually effective. Protection information is no
different: the PI is part of the data, and it needs to be validated by
the recipient.

Silently throwing away the protection information defies the very
premise of why data integrity protection was defined in the first place.

Patch

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9171452e2f6d..f439916f4447 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -6,6 +6,7 @@ 
 #include <linux/backing-dev.h>
 #include <linux/moduleparam.h>
 #include <linux/vmalloc.h>
+#include <linux/blk-integrity.h>
 #include <trace/events/block.h>
 #include "nvme.h"
 
@@ -106,6 +107,14 @@  void nvme_failover_req(struct request *req)
 			bio->bi_opf &= ~REQ_POLLED;
 			bio->bi_cookie = BLK_QC_T_NONE;
 		}
+		/*
+		 * If the failover path is not integrity capable, the bio
+		 * must not carry an integrity context.
+		 * If the failover path is integrity capable, the bio will
+		 * be prepared for integrity again when it is resubmitted.
+		 */
+		if (bio_integrity(bio))
+			bio_integrity_free(bio);
 	}
 	blk_steal_bios(&ns->head->requeue_list, req);
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);