Message ID | 20230424225442.18916-3-mgurtovoy@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fix failover to non integrity NVMe path |
Hi Max!

> In case the integrity capabilities of the failed path and the failover
> path don't match, we may run into NULL dereference. Free the integrity
> context during the path failover and let the block layer prepare it
> again if needed during bio_submit.

This assumes that the protection information is just an ephemeral
checksum. However, that is not always the case. The application may
store values in the application or storage tags which must be returned
on a subsequent read.

In addition, in some overseas markets (financial, government), PI is a
regulatory requirement. It would be really bad for us to expose a device
claiming PI support and then it turns out the protection isn't actually
always active.

DM multipathing doesn't allow mismatched integrity profiles. I don't
think NVMe should either.
On 25/04/2023 5:12, Martin K. Petersen wrote:
>
> Hi Max!
>
>> In case the integrity capabilities of the failed path and the failover
>> path don't match, we may run into NULL dereference. Free the integrity
>> context during the path failover and let the block layer prepare it
>> again if needed during bio_submit.
>
> This assumes that the protection information is just an ephemeral
> checksum. However, that is not always the case. The application may
> store values in the application or storage tags which must be returned
> on a subsequent read.

Interesting.

Maybe you can point me to the API that allows applications to store an
application tag in the PI field? I see that app_tag is 0 in Linux and we
don't set nvme_cmd->apptag to a non-zero value. It's been a while since I
worked on this, so I might be wrong here :).

I've noticed that in t10_pi_generate and ext_pi_crc64_generate we set it
to 0 as well.

The way I see it now, and I might be wrong, is that the Linux kernel does
not support applications storing apptag values unless they use some
passthrough command.

> In addition, in some overseas markets (financial, government), PI is a
> regulatory requirement. It would be really bad for us to expose a device
> claiming PI support and then it turns out the protection isn't actually
> always active.
>
> DM multipathing doesn't allow mismatched integrity profiles. I don't
> think NVMe should either.

AFAIU, DM multipath is not a specification but a Linux implementation.
NVMe multipathing follows a standard.
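For context, the generation path referenced above has roughly the following shape; this is a kernel-side sketch paraphrasing the Type 1 case of t10_pi_generate() in block/t10-pi.c, not the exact upstream code:

/*
 * Kernel-side sketch, simplified paraphrase of t10_pi_generate() in
 * block/t10-pi.c (Type 1 case only); not the exact upstream code.
 * Whenever the block layer generates PI on behalf of the submitter,
 * the application tag is written as zero.
 */
#include <linux/blk-integrity.h>
#include <linux/crc-t10dif.h>
#include <linux/t10-pi.h>

static void t10_pi_generate_sketch(struct blk_integrity_iter *iter)
{
	unsigned int i;

	for (i = 0; i < iter->data_size; i += iter->interval) {
		struct t10_pi_tuple *pi = iter->prot_buf;

		pi->guard_tag = cpu_to_be16(crc_t10dif(iter->data_buf,
						       iter->interval));
		pi->app_tag = 0;	/* apptag is always written as zero */
		pi->ref_tag = cpu_to_be32(lower_32_bits(iter->seed));

		iter->data_buf += iter->interval;
		iter->prot_buf += iter->tuple_size;
		iter->seed++;
	}
}

ext_pi_crc64_generate() does the equivalent for the 64-bit Guard format with a 16-byte tuple, and likewise leaves the apptag at zero.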
On Tue, Apr 25, 2023 at 01:24:31PM +0300, Max Gurtovoy wrote:
> On 25/04/2023 5:12, Martin K. Petersen wrote:
> > Hi Max!
> >
> > > In case the integrity capabilities of the failed path and the failover
> > > path don't match, we may run into NULL dereference. Free the integrity
> > > context during the path failover and let the block layer prepare it
> > > again if needed during bio_submit.
> >
> > This assumes that the protection information is just an ephemeral
> > checksum. However, that is not always the case. The application may
> > store values in the application or storage tags which must be returned
> > on a subsequent read.
>
> Interesting.
>
> Maybe you can point me to the API that allows applications to store an
> application tag in the PI field?

I think Martin might be considering kernel modules as apps since they can
access the integrity payload no problem. I know of at least one out-of-tree
module (ex: OpenCAS) that utilizes the apptag (not that upstream necessarily
should care about such modules..).

Outside that, passthrough ioctls and io_uring_cmd can access the fields. I
don't think those interfaces apply to what you're changing, though, since
they bypass the submit_bio() interface.

> I see that app_tag is 0 in Linux and we don't set nvme_cmd->apptag to a
> non-zero value. It's been a while since I worked on this, so I might be
> wrong here :).
>
> I've noticed that in t10_pi_generate and ext_pi_crc64_generate we set it
> to 0 as well.
>
> The way I see it now, and I might be wrong, is that the Linux kernel does
> not support applications storing apptag values unless they use some
> passthrough command.

If we're considering only in-tree usage, I think you're correct.

> > In addition, in some overseas markets (financial, government), PI is a
> > regulatory requirement. It would be really bad for us to expose a device
> > claiming PI support and then it turns out the protection isn't actually
> > always active.
> >
> > DM multipathing doesn't allow mismatched integrity profiles. I don't
> > think NVMe should either.
>
> AFAIU, DM multipath is not a specification but a Linux implementation.
> NVMe multipathing follows a standard.

If we're talking about any of the nvme passthrough interfaces, which format
will user space need to construct its command for? This seems confusing
since userspace doesn't have direct control over which path the command will
be dispatched on, so it won't know which format it needs to cater to, nor
will it have deterministic behavior.
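To illustrate the passthrough route, a minimal userspace sketch using the NVMe passthrough ioctl could look like the following; the device path, nsid, LBA, buffer sizes and tag value are illustrative assumptions, and the namespace must be formatted with 512B data plus 8B Type 1 PI (separate metadata) for the buffer to carry a DIF tuple at all:

/*
 * Userspace sketch: store an application tag via the NVMe passthrough
 * ioctl. Device path, nsid, LBA, sizes and tag values are illustrative.
 * PRINFO is left at 0 so the controller stores the tuple without
 * checking it; a real application would also fill in guard/ref tags.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	unsigned char data[512] = { 0 };	/* one logical block              */
	unsigned char meta[8]   = { 0 };	/* 8B DIF tuple: guard, app, ref  */
	struct nvme_passthru_cmd cmd = { 0 };
	int fd, err;

	fd = open("/dev/nvme0n1", O_RDWR);	/* illustrative device path       */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	meta[2] = 0x12;				/* application tag, big endian... */
	meta[3] = 0x34;				/* ...bytes 2-3 of the DIF tuple  */

	cmd.opcode       = 0x01;		/* NVMe Write                     */
	cmd.nsid         = 1;
	cmd.addr         = (uint64_t)(uintptr_t)data;
	cmd.data_len     = sizeof(data);
	cmd.metadata     = (uint64_t)(uintptr_t)meta;
	cmd.metadata_len = sizeof(meta);
	cmd.cdw10        = 0;			/* starting LBA (low 32 bits)     */
	cmd.cdw11        = 0;			/* starting LBA (high 32 bits)    */
	cmd.cdw12        = 0;			/* NLB = 0 -> one block, PRINFO=0 */

	err = ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
	if (err)
		fprintf(stderr, "NVMe write failed: %d\n", err);

	close(fd);
	return 0;
}

As noted above, this route bypasses submit_bio(), so it is not affected by the patch; it only shows that user-visible apptag contents do exist.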
Hi Max!

> The way I see it now, and I might be wrong, is that the Linux kernel
> does not support applications storing apptag values unless they use
> some passthrough command.

As Keith mentioned, there are various ways.

But let's assume you are a multipathed storage device that says "Yes, I
support encryption". And then one of the paths doesn't and just lets
things go across the wire in plain text. I think most people would agree
that would be a *bad* thing.

The expectation is that when a device reports that a capability is
enabled, that capability is actually effective. Protection information is
no different. The PI is part of the data and it needs to be validated by
the recipient. Silently throwing away the protection information defies
the very premise of why data integrity protection was defined in the
first place.
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9171452e2f6d..f439916f4447 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -6,6 +6,7 @@
 #include <linux/backing-dev.h>
 #include <linux/moduleparam.h>
 #include <linux/vmalloc.h>
+#include <linux/blk-integrity.h>
 #include <trace/events/block.h>
 #include "nvme.h"
 
@@ -106,6 +107,14 @@ void nvme_failover_req(struct request *req)
                         bio->bi_opf &= ~REQ_POLLED;
                         bio->bi_cookie = BLK_QC_T_NONE;
                 }
+                /*
+                 * If the failover path will not be integrity capable the bio
+                 * should not have integrity context.
+                 * If the failover path will be integrity capable the bio will
+                 * be prepared for integrity again.
+                 */
+                if (bio_integrity(bio))
+                        bio_integrity_free(bio);
         }
         blk_steal_bios(&ns->head->requeue_list, req);
         spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
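For reference, the re-preparation the new comment relies on happens when the requeued bio is resubmitted on the freshly selected path; the decision reduces to checks of roughly this shape (a paraphrase of the early part of bio_integrity_prep() in block/bio-integrity.c, not the exact upstream code):

/*
 * Kernel-side sketch paraphrasing the early checks of bio_integrity_prep()
 * (block/bio-integrity.c); not the exact upstream code. When the requeued
 * bio reaches a PI-capable path without an attached payload, the block
 * layer allocates a fresh one and regenerates the tags for writes.
 */
#include <linux/bio.h>
#include <linux/blk-integrity.h>

static bool failover_path_wants_fresh_pi(struct bio *bio)
{
	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);

	if (!bi)
		return false;		/* failover path exposes no PI     */
	if (!bio_sectors(bio))
		return false;		/* nothing to protect              */
	if (bio_integrity(bio))
		return false;		/* a payload is still attached     */

	return true;			/* prep will allocate and generate */
}

Any regeneration on the new path produces fresh guard/ref tags and a zero apptag, which is the information loss being debated above.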