Message ID | 20220809000419.10674-13-michael.christie@oracle.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Delegated to: | Mike Snitzer |
Series | Use block pr_ops in LIO |

On Mon, Aug 08, 2022 at 07:04:11PM -0500, Mike Christie wrote:
> To handle both cases, this patch adds a blk_status_t arg to the pr_ops
> callouts. The lower levels will convert their device specific error to
> the blk_status_t then the upper levels can easily check that code
> without knowing the device type. It also allows us to keep userspace
> compat where it expects a negative -Exyz error code if the command fails
> before it's sent to the device or a device/transport specific value if the
> error is > 0.

Why do we need two return values here?

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

On 8/9/22 2:21 AM, Christoph Hellwig wrote:
> On Mon, Aug 08, 2022 at 07:04:11PM -0500, Mike Christie wrote:
>> To handle both cases, this patch adds a blk_status_t arg to the pr_ops
>> callouts. The lower levels will convert their device specific error to
>> the blk_status_t then the upper levels can easily check that code
>> without knowing the device type. It also allows us to keep userspace
>> compat where it expects a negative -Exyz error code if the command fails
>> before it's sent to the device or a device/transport specific value if the
>> error is > 0.
>
> Why do we need two return values here?

I know the 2 return values are gross :) I can do it in one, but I wasn't sure
what's worse. See below for the other possible solutions. I think they are all
bad.


0. Convert device specific conflict error to -EBADE then back:

sd_pr_command()

	.....

	/* would add similar check for NVME_SC_RESERVATION_CONFLICT in nvme */
	if (result == SAM_STAT_RESERVATION_CONFLICT)
		return -EBADE;
	else
		return result;


LIO then just checks for -EBADE but when going to userspace we have to
convert:


blkdev_pr_register()

	...
	result = ops->pr_register()
	if (result < 0) {
		/* For compat we must convert back to the nvme/scsi code */
		if (result == -EBADE) {
			/* need some helper for this that calls down the stack */
			if (bdev == SCSI)
				return SAM_STAT_RESERVATION_CONFLICT
			else
				return NVME_SC_RESERVATION_CONFLICT
		} else
			return result
	} else
		return result;


The conversion is kind of gross and I was thinking in the future it's going
to get worse. I'm going to want to have more advanced error handling in LIO
and dm-multipath. Like dm-multipath wants to know if a pr_op failed because
of a path failure, so it can retry another one, or a hard device/target error.
It would be nice for LIO if a PGR had bad/illegal values and the device
returned an error, then I could detect that.


1. Drop the -Exyz error type and use blk_status_t in the kernel:

sd_pr_command()

	.....
	if (result < 0)
		return -errno_to_blk_status(result);
	else if (result == SAM_STAT_RESERVATION_CONFLICT)
		return -BLK_STS_NEXUS;
	else
		return result;

blkdev_pr_register()

	...
	result = ops->pr_register()
	if (result < 0) {
		/* For compat we must convert back to the nvme/scsi code */
		if (result == -BLK_STS_NEXUS) {
			/* need some helper for this that calls down the stack */
			if (bdev == SCSI)
				return SAM_STAT_RESERVATION_CONFLICT
			else
				return NVME_SC_RESERVATION_CONFLICT
		} else
			return blk_status_to_errno(-result)
	} else
		return result;

This has similar issues as #0 where we have to convert before returning to
userspace.


Note: In this case, if the block layer uses an -Exyz error code there's no
BLK_STS for, then we would return -EIO to userspace now. I was thinking
that might not be ok but I could also just add a BLK_STS error code
for errors like EINVAL, EWOULDBLOCK, ENOMEM, etc so that doesn't happen.


2. We could do something like below where the low levels are not changed but the
caller converts:

sd_pr_command()
	/* no changes */

lio()
	result = ops->pr_register()
	if (result > 0) {
		/* add some stacked helper again that goes through dm and
		 * to the low level device
		 */
		if (bdev == SCSI)
			result = scsi_result_to_blk_status(result)
		else
			result = nvme_error_status(result)
	}


This looks simple, but it felt wrong having the upper layers know the
device type and call conversion functions.
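
To make the round trip in option 0 concrete, here is a minimal userspace mock-up. It is a sketch, not kernel code: every `mock_` name is hypothetical, the two conflict codes are written as plain constants for illustration only, and `EBADE` is assumed to be provided by `<errno.h>`, as it is on Linux.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the device types and their conflict codes. */
enum mock_dev_type { MOCK_DEV_SCSI, MOCK_DEV_NVME };

#define MOCK_SCSI_RESERVATION_CONFLICT	0x18	/* illustrative value */
#define MOCK_NVME_RESERVATION_CONFLICT	0x83	/* illustrative value */

/* Device specific "reservation conflict" code for one device type. */
static int mock_conflict_code(enum mock_dev_type type)
{
	assert(type == MOCK_DEV_SCSI || type == MOCK_DEV_NVME);
	return type == MOCK_DEV_SCSI ? MOCK_SCSI_RESERVATION_CONFLICT :
				       MOCK_NVME_RESERVATION_CONFLICT;
}

/* Low level driver side: fold the device's conflict status into -EBADE. */
static int mock_lower_pr_result(enum mock_dev_type type, int dev_status)
{
	if (dev_status == mock_conflict_code(type))
		return -EBADE;
	return dev_status;
}

/* ioctl side: undo the folding so userspace keeps seeing the old value. */
static int mock_ioctl_pr_result(enum mock_dev_type type, int result)
{
	if (result == -EBADE)
		return mock_conflict_code(type);
	return result;
}
```

An in-kernel caller would only compare against -EBADE. The ioctl path is the one that needs the device type to restore the old value, which is the "helper that calls down the stack" the pseudocode mentions.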

On 8/9/22 11:08, Mike Christie wrote:
> [...]
> This looks simple, but it felt wrong having upper layers having to
> know the device type and calling conversion functions.

Has it been considered to introduce a new enumeration type instead of
choosing (0), (1) or (2)?

Thanks,

Bart.

On 8/9/22 2:33 PM, Bart Van Assche wrote:
> On 8/9/22 11:08, Mike Christie wrote:
>> [...]
>> This looks simple, but it felt wrong having upper layers having to
>> know the device type and calling conversion functions.
>
> Has it been considered to introduce a new enumeration type instead of
> choosing (0), (1) or (2)?

The problem is that userspace currently gets the nvme status value or the
scsi_cmnd->result, which can carry host/status byte values like with SG_IO.
So you could just do a new enum or add every possible error to blk_status_t,
but before passing it back to userspace you still have to convert to the
format userspace is getting today. So for scsi devices, you have to mimic
the host_byte.
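
The point about having to mimic the host_byte can be illustrated with a small userspace sketch. All names and the enum are hypothetical, the constants are illustrative copies and not taken from kernel headers, and the result layout (status byte in bits 0-7, host byte in bits 16-23) is an assumption of this mock. It shows that once two host byte values collapse into one generic code, the conversion back has to pick one of them.

```c
#include <assert.h>

/* Illustrative copies of a few SCSI values; not taken from kernel headers. */
#define MOCK_DID_NO_CONNECT			0x01
#define MOCK_DID_ERROR				0x07
#define MOCK_DID_TRANSPORT_DISRUPTED		0x0e
#define MOCK_STATUS_RESERVATION_CONFLICT	0x18

/* Hypothetical generic enum of the kind asked about above. */
enum mock_pr_status {
	MOCK_PR_OK,
	MOCK_PR_CONFLICT,
	MOCK_PR_PATH_ERROR,
	MOCK_PR_OTHER,
};

/* Assumed layout: status byte in bits 0-7, host byte in bits 16-23. */
static int mock_status_byte(int result) { return result & 0xff; }
static int mock_host_byte(int result) { return (result >> 16) & 0xff; }

/* Device result to generic enum: several host bytes share one code. */
static enum mock_pr_status mock_scsi_to_enum(int result)
{
	assert(result >= 0);
	if (result == 0)
		return MOCK_PR_OK;
	switch (mock_host_byte(result)) {
	case MOCK_DID_NO_CONNECT:
	case MOCK_DID_TRANSPORT_DISRUPTED:
		return MOCK_PR_PATH_ERROR;
	default:
		break;
	}
	if (mock_status_byte(result) == MOCK_STATUS_RESERVATION_CONFLICT)
		return MOCK_PR_CONFLICT;
	return MOCK_PR_OTHER;
}

/* Generic enum back to the format userspace gets today. */
static int mock_enum_to_scsi(enum mock_pr_status sts)
{
	switch (sts) {
	case MOCK_PR_OK:
		return 0;
	case MOCK_PR_CONFLICT:
		return MOCK_STATUS_RESERVATION_CONFLICT;
	case MOCK_PR_PATH_ERROR:
		/* Two host bytes were merged, so one has to be picked. */
		return MOCK_DID_NO_CONNECT << 16;
	case MOCK_PR_OTHER:
		break;
	}
	return MOCK_DID_ERROR << 16;
}
```

In this mock a reservation conflict survives the round trip, but a result carrying MOCK_DID_TRANSPORT_DISRUPTED comes back as MOCK_DID_NO_CONNECT, so userspace would see a different value than it does today.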

diff --git a/block/ioctl.c b/block/ioctl.c
index 60121e89052b..72338c56e235 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -269,7 +269,8 @@ static int blkdev_pr_register(struct block_device *bdev,

 	if (reg.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags);
+	return ops->pr_register(bdev, reg.old_key, reg.new_key, reg.flags,
+				NULL);
 }

 static int blkdev_pr_reserve(struct block_device *bdev,
@@ -287,7 +288,7 @@ static int blkdev_pr_reserve(struct block_device *bdev,

 	if (rsv.flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
-	return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags);
+	return ops->pr_reserve(bdev, rsv.key, rsv.type, rsv.flags, NULL);
 }

 static int blkdev_pr_release(struct block_device *bdev,
@@ -305,7 +306,7 @@ static int blkdev_pr_release(struct block_device *bdev,

 	if (rsv.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_release(bdev, rsv.key, rsv.type);
+	return ops->pr_release(bdev, rsv.key, rsv.type, NULL);
 }

 static int blkdev_pr_preempt(struct block_device *bdev,
@@ -323,7 +324,7 @@ static int blkdev_pr_preempt(struct block_device *bdev,

 	if (p.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort);
+	return ops->pr_preempt(bdev, p.old_key, p.new_key, p.type, abort, NULL);
 }

 static int blkdev_pr_clear(struct block_device *bdev,
@@ -341,7 +342,7 @@ static int blkdev_pr_clear(struct block_device *bdev,

 	if (c.flags)
 		return -EOPNOTSUPP;
-	return ops->pr_clear(bdev, c.key);
+	return ops->pr_clear(bdev, c.key, NULL);
 }

 static int blkdev_flushbuf(struct block_device *bdev, fmode_t mode,
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b15295bdf24..ac39e5d303b9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3080,7 +3080,8 @@ struct dm_pr {
 	bool abort;
 	bool fail_early;
 	int ret;
-	enum pr_type type;
+	enum pr_type	type;
+	blk_status_t	*blk_stat;
 };

 static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
@@ -3131,7 +3132,8 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev,
 		return -1;
 	}

-	ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);
+	ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags,
+			       pr->blk_stat);
 	if (!ret)
 		return 0;

@@ -3145,7 +3147,7 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev,
 }

 static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
-			  u32 flags)
+			  u32 flags, blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.old_key = old_key,
@@ -3153,6 +3155,7 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 		.flags = flags,
 		.fail_early = true,
 		.ret = 0,
+		.blk_stat = blk_stat,
 	};
 	int ret;

@@ -3190,7 +3193,8 @@ static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
 		return -1;
 	}

-	pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags);
+	pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags,
+				  pr->blk_stat);
 	if (!pr->ret)
 		return -1;

@@ -3198,7 +3202,7 @@ static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
 }

 static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
-			 u32 flags)
+			 u32 flags, blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.old_key = key,
@@ -3206,6 +3210,7 @@ static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 		.type = type,
 		.fail_early = false,
 		.ret = 0,
+		.blk_stat = blk_stat,
 	};
 	int ret;

@@ -3233,19 +3238,22 @@ static int __dm_pr_release(struct dm_target *ti, struct dm_dev *dev,
 		return -1;
 	}

-	pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type);
+	pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type,
+				  pr->blk_stat);
 	if (pr->ret)
 		return -1;

 	return 0;
 }

-static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+			 blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.old_key = key,
 		.type = type,
 		.fail_early = false,
+		.blk_stat = blk_stat,
 	};
 	int ret;

@@ -3268,7 +3276,7 @@ static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev,
 	}

 	pr->ret = ops->pr_preempt(dev->bdev, pr->old_key, pr->new_key, pr->type,
-				  pr->abort);
+				  pr->abort, pr->blk_stat);
 	if (!pr->ret)
 		return -1;

@@ -3276,13 +3284,14 @@ static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev,
 }

 static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
-			 enum pr_type type, bool abort)
+			 enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	struct dm_pr pr = {
 		.new_key = new_key,
 		.old_key = old_key,
 		.type = type,
 		.fail_early = false,
+		.blk_stat = blk_stat,
 	};
 	int ret;

@@ -3293,7 +3302,8 @@ static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
 	return pr.ret;
 }

-static int dm_pr_clear(struct block_device *bdev, u64 key)
+static int dm_pr_clear(struct block_device *bdev, u64 key,
+		       blk_status_t *blk_stat)
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	const struct pr_ops *ops;
@@ -3305,7 +3315,7 @@ static int dm_pr_clear(struct block_device *bdev, u64 key)

 	ops = bdev->bd_disk->fops->pr_ops;
 	if (ops && ops->pr_clear)
-		r = ops->pr_clear(bdev, key);
+		r = ops->pr_clear(bdev, key, blk_stat);
 	else
 		r = -EOPNOTSUPP;
 out:
@@ -3314,7 +3324,7 @@ static int dm_pr_clear(struct block_device *bdev, u64 key)
 }

 static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,
-			   u32 keys_len)
+			   u32 keys_len, blk_status_t *blk_stat)
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	const struct pr_ops *ops;
@@ -3326,7 +3336,7 @@ static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,

 	ops = bdev->bd_disk->fops->pr_ops;
 	if (ops && ops->pr_read_keys)
-		r = ops->pr_read_keys(bdev, keys, keys_len);
+		r = ops->pr_read_keys(bdev, keys, keys_len, blk_stat);
 	else
 		r = -EOPNOTSUPP;
 out:
@@ -3335,7 +3345,8 @@ static int dm_pr_read_keys(struct block_device *bdev, struct pr_keys *keys,
 }

 static int dm_pr_read_reservation(struct block_device *bdev,
-				  struct pr_held_reservation *rsv)
+				  struct pr_held_reservation *rsv,
+				  blk_status_t *blk_stat)
 {
 	struct mapped_device *md = bdev->bd_disk->private_data;
 	const struct pr_ops *ops;
@@ -3347,7 +3358,7 @@ static int dm_pr_read_reservation(struct block_device *bdev,

 	ops = bdev->bd_disk->fops->pr_ops;
 	if (ops && ops->pr_read_reservation)
-		r = ops->pr_read_reservation(bdev, rsv);
+		r = ops->pr_read_reservation(bdev, rsv, blk_stat);
 	else
 		r = -EOPNOTSUPP;
 out:
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5bbc1d84a87e..49bd745d28e2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2148,7 +2148,7 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 }

 static int nvme_pr_register(struct block_device *bdev, u64 old,
-		u64 new, unsigned flags)
+		u64 new, unsigned flags, blk_status_t *blk_stat)
 {
 	u32 cdw10;

@@ -2162,7 +2162,7 @@ static int nvme_pr_register(struct block_device *bdev, u64 old,
 }

 static int nvme_pr_reserve(struct block_device *bdev, u64 key,
-		enum pr_type type, unsigned flags)
+		enum pr_type type, unsigned flags, blk_status_t *blk_stat)
 {
 	u32 cdw10;

@@ -2175,21 +2175,23 @@ static int nvme_pr_reserve(struct block_device *bdev, u64 key,
 }

 static int nvme_pr_preempt(struct block_device *bdev, u64 old, u64 new,
-		enum pr_type type, bool abort)
+		enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	u32 cdw10 = nvme_pr_type(type) << 8 | (abort ? 2 : 1);

 	return nvme_pr_command(bdev, cdw10, old, new, nvme_cmd_resv_acquire);
 }

-static int nvme_pr_clear(struct block_device *bdev, u64 key)
+static int nvme_pr_clear(struct block_device *bdev, u64 key,
+		blk_status_t *blk_stat)
 {
 	u32 cdw10 = 1 | (key ? 1 << 3 : 0);

 	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_register);
 }

-static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int nvme_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+		blk_status_t *blk_stat)
 {
 	u32 cdw10 = nvme_pr_type(type) << 8 | (key ? 1 << 3 : 0);

@@ -2224,7 +2226,7 @@ static int nvme_pr_resv_report(struct block_device *bdev, u8 *data,
 }

 static int nvme_pr_read_keys(struct block_device *bdev,
-		struct pr_keys *keys_info, u32 keys_len)
+		struct pr_keys *keys_info, u32 keys_len, blk_status_t *blk_stat)
 {
 	struct nvme_reservation_status *status;
 	u32 data_len, num_ret_keys;
@@ -2268,7 +2270,7 @@ static int nvme_pr_read_keys(struct block_device *bdev,
 }

 static int nvme_pr_read_reservation(struct block_device *bdev,
-		struct pr_held_reservation *resv)
+		struct pr_held_reservation *resv, blk_status_t *blk_stat)
 {
 	struct nvme_reservation_status tmp_status, *status;
 	int ret, i, num_regs;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f1d4d0568075..bf080de9866d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1708,7 +1708,7 @@ static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 }

 static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info,
-			   u32 keys_len)
+			   u32 keys_len, blk_status_t *blk_stat)
 {
 	int result, i, data_offset, num_copy_keys;
 	int data_len = keys_len + 8;
@@ -1739,7 +1739,8 @@ static int sd_pr_read_keys(struct block_device *bdev, struct pr_keys *keys_info,
 }

 static int sd_pr_read_reservation(struct block_device *bdev,
-				  struct pr_held_reservation *rsv)
+				  struct pr_held_reservation *rsv,
+				  blk_status_t *blk_stat)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1769,8 +1770,8 @@ static int sd_pr_read_reservation(struct block_device *bdev,
 	return 0;
 }

-static int sd_pr_out_command(struct block_device *bdev, u8 sa,
-		u64 key, u64 sa_key, u8 type, u8 flags)
+static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
+			     u64 sa_key, u8 type, u8 flags)
 {
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
@@ -1801,7 +1802,7 @@ static int sd_pr_out_command(struct block_device *bdev, u8 sa,
 }

 static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
-		u32 flags)
+		u32 flags, blk_status_t *blk_stat)
 {
 	if (flags & ~PR_FL_IGNORE_KEY)
 		return -EOPNOTSUPP;
@@ -1811,7 +1812,7 @@ static int sd_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
 }

 static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
-		u32 flags)
+		u32 flags, blk_status_t *blk_stat)
 {
 	if (flags)
 		return -EOPNOTSUPP;
@@ -1819,20 +1820,22 @@ static int sd_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
 			block_pr_type_to_scsi(type), 0);
 }

-static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
+static int sd_pr_release(struct block_device *bdev, u64 key, enum pr_type type,
+		blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, 0x02, key, 0,
 			block_pr_type_to_scsi(type), 0);
 }

 static int sd_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
-		enum pr_type type, bool abort)
+		enum pr_type type, bool abort, blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, abort ? 0x05 : 0x04, old_key, new_key,
 			block_pr_type_to_scsi(type), 0);
 }

-static int sd_pr_clear(struct block_device *bdev, u64 key)
+static int sd_pr_clear(struct block_device *bdev, u64 key,
+		blk_status_t *blk_stat)
 {
 	return sd_pr_out_command(bdev, 0x03, key, 0, 0, 0);
 }
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index 5e56da748b2a..8726c1473d55 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -29,7 +29,7 @@ bl_free_device(struct pnfs_block_dev *dev)
 			int error;

 			error = ops->pr_register(dev->bdev, dev->pr_key, 0,
-				false);
+				false, NULL);
 			if (error)
 				pr_err("failed to unregister PR key.\n");
 		}
@@ -382,7 +382,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
 		goto out_blkdev_put;
 	}

-	error = ops->pr_register(d->bdev, 0, d->pr_key, true);
+	error = ops->pr_register(d->bdev, 0, d->pr_key, true, NULL);
 	if (error) {
 		pr_err("pNFS: failed to register key for block device %s.",
 				d->bdev->bd_disk->disk_name);
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index b6d01d51a746..a302ea026f72 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -277,7 +277,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
 		goto out_free_dev;
 	}

-	ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true);
+	ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true, NULL);
 	if (ret) {
 		pr_err("pNFS: failed to register key for device %s.\n",
 			sb->s_id);
@@ -285,7 +285,7 @@ nfsd4_block_get_device_info_scsi(struct super_block *sb,
 	}

 	ret = ops->pr_reserve(sb->s_bdev, NFSD_MDS_PR_KEY,
-			PR_EXCLUSIVE_ACCESS_REG_ONLY, 0);
+			PR_EXCLUSIVE_ACCESS_REG_ONLY, 0, NULL);
 	if (ret) {
 		pr_err("pNFS: failed to reserve device %s.\n",
 			sb->s_id);
@@ -331,7 +331,7 @@ nfsd4_scsi_fence_client(struct nfs4_layout_stateid *ls)
 	struct block_device *bdev = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_bdev;

 	bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
-			nfsd4_scsi_pr_key(clp), 0, true);
+			nfsd4_scsi_pr_key(clp), 0, true, NULL);
 }

 const struct nfsd4_layout_ops scsi_layout_ops = {
diff --git a/include/linux/pr.h b/include/linux/pr.h
index 79b3d2853a20..2cbe97f06490 100644
--- a/include/linux/pr.h
+++ b/include/linux/pr.h
@@ -18,14 +18,15 @@ struct pr_held_reservation {

 struct pr_ops {
 	int (*pr_register)(struct block_device *bdev, u64 old_key, u64 new_key,
-			u32 flags);
+			u32 flags, blk_status_t *blk_stat);
 	int (*pr_reserve)(struct block_device *bdev, u64 key,
-			enum pr_type type, u32 flags);
+			enum pr_type type, u32 flags, blk_status_t *blk_stat);
 	int (*pr_release)(struct block_device *bdev, u64 key,
-			enum pr_type type);
+			enum pr_type type, blk_status_t *blk_stat);
 	int (*pr_preempt)(struct block_device *bdev, u64 old_key, u64 new_key,
-			enum pr_type type, bool abort);
-	int (*pr_clear)(struct block_device *bdev, u64 key);
+			enum pr_type type, bool abort, blk_status_t *blk_stat);
+	int (*pr_clear)(struct block_device *bdev, u64 key,
+			blk_status_t *blk_stat);
 	/*
 	 * pr_read_keys - Read the registered keys and return them in the
 	 * pr_keys->keys array. The keys array will have been allocated at the
@@ -35,9 +36,11 @@ struct pr_ops {
 	 * contains, so the caller can retry with a larger array.
 	 */
 	int (*pr_read_keys)(struct block_device *bdev,
-			struct pr_keys *keys_info, u32 keys_len);
+			struct pr_keys *keys_info, u32 keys_len,
+			blk_status_t *blk_stat);
 	int (*pr_read_reservation)(struct block_device *bdev,
-			struct pr_held_reservation *rsv);
+			struct pr_held_reservation *rsv,
+			blk_status_t *blk_stat);
 };

 #endif /* LINUX_PR_H */

Kernel pr_ops users like LIO need to know whether a failure was the result
of a reservation conflict, and then be able to convert from the lower
level's definition of that error to SCSI so it can be returned to the
initiator. To do this they currently have to know the lower level device
type, and this can be difficult when we have dm-multipath between LIO and
the device. dm-multipath would also like to be able to distinguish between
path failures and reservation conflicts so it can optimize its error
handlers for its pr_ops.

To handle both cases, this patch adds a blk_status_t arg to the pr_ops
callouts. The lower levels will convert their device specific error to
the blk_status_t, then the upper levels can easily check that code
without knowing the device type. It also allows us to keep userspace
compat where it expects a negative -Exyz error code if the command fails
before it's sent to the device or a device/transport specific value if the
error is > 0.

This patch just wires in the blk_status_t to the pr_ops callouts. The
next patches will then have the drivers pass up a blk_status_t.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 block/ioctl.c            | 11 ++++++-----
 drivers/md/dm.c          | 41 +++++++++++++++++++++++++---------------
 drivers/nvme/host/core.c | 16 +++++++++-------
 drivers/scsi/sd.c        | 21 +++++++++++---------
 fs/nfs/blocklayout/dev.c |  4 ++--
 fs/nfsd/blocklayout.c    |  6 +++---
 include/linux/pr.h       | 17 ++++++++++-------
 7 files changed, 68 insertions(+), 48 deletions(-)
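
The calling convention described in the commit message can be sketched in userspace as follows. This is only an illustration under stated assumptions: the `mock_` names, the status values and the conflict code are hypothetical stand-ins, and the translation inside `mock_pr_release()` is a guess at what the follow-up driver patches might do, since this patch only wires in the argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in kernel headers. */
typedef unsigned char mock_blk_status_t;
#define MOCK_BLK_STS_OK		0
#define MOCK_BLK_STS_NEXUS	6	/* illustrative: reservation conflict */
#define MOCK_BLK_STS_IOERR	10	/* illustrative: any other failure */

#define MOCK_SCSI_RESERVATION_CONFLICT	0x18	/* illustrative value */

/*
 * Mock low level callout. The int return keeps the value userspace sees
 * today; the generic status is reported only when the caller asks for it.
 */
static int mock_pr_release(int dev_result, mock_blk_status_t *blk_stat)
{
	if (blk_stat) {
		if (dev_result == 0)
			*blk_stat = MOCK_BLK_STS_OK;
		else if (dev_result == MOCK_SCSI_RESERVATION_CONFLICT)
			*blk_stat = MOCK_BLK_STS_NEXUS;
		else
			*blk_stat = MOCK_BLK_STS_IOERR;
	}
	return dev_result;
}

/* ioctl-style caller: not interested in the generic status. */
static int mock_ioctl_release(int dev_result)
{
	return mock_pr_release(dev_result, NULL);
}

/* LIO-style caller: checks for a conflict without knowing the device. */
static int mock_upper_saw_conflict(int dev_result)
{
	mock_blk_status_t sts = MOCK_BLK_STS_OK;
	int ret = mock_pr_release(dev_result, &sts);

	/* In this mock, success and MOCK_BLK_STS_OK always go together. */
	assert((ret == 0) == (sts == MOCK_BLK_STS_OK));
	return sts == MOCK_BLK_STS_NEXUS;
}
```

Callers that pass NULL, as the ioctl and NFS call sites in the diff do, see exactly the return value they saw before, while a caller that passes a pointer gets a device independent code to test.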