Message ID | 20231023092837.33786-10-hare@suse.de (mailing list archive) |
---|---|
State | Deferred |
Headers | show |
Series | scsi: EH rework, main part | expand |
On Mon, Oct 23, 2023 at 11:28:36AM +0200, Hannes Reinecke wrote: > Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly > valid, and indicates that the request could not be executed due to > the transport being busy. > But that is not an I/O error, and we should return -EAGAIN from > scsi_ioctl_reset() to correctly inform userspace. I've had a short look at at least one user of this interface `sg_reset`, and that seems to handle this already: https://github.com/doug-gilbert/sg3_utils/blob/0e955c48621f7dc512e34554a060fe9f5cbc8d67/src/sg_reset.c#L251-L267 So this looks good to me. > > Suggested-by: Benjamin Block <bblock@linux.ibm.com> > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/scsi_error.c | 23 ++++++++++++++++++----- > 1 file changed, 18 insertions(+), 5 deletions(-) > Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
On 10/23/23 4:28 AM, Hannes Reinecke wrote: > Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly > valid, and indicates that the request could not be executed due to > the transport being busy. I'm not sure if that's completely correct or maybe we have a different view of what it means to be busy. FC, iSCSI and SRP normally return it when the transport is marked as offline/lost, so for normal IO we fail it upwards and userspace gets an error. What drivers use it as temp busy error code? Is it the lpfc one or a snic one? > But that is not an I/O error, and we should return -EAGAIN from > scsi_ioctl_reset() to correctly inform userspace. For the sg_reset example, if you tried to run sg_reset again it would fail. When sg_reset tries to open the device the open function will return failure in the open call because the device is in the transport-offline state for FC/iSCSI/SRP. If you are going to change the return value why not sync it with what we return for normal IO and return BLK_STS_TRANSPORT/-ENOLINK?
On 10/26/23 18:50, Mike Christie wrote: > On 10/23/23 4:28 AM, Hannes Reinecke wrote: >> Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly >> valid, and indicates that the request could not be executed due to >> the transport being busy. > > I'm not sure if that's completely correct or maybe we have a different > view of what it means to be busy. > > FC, iSCSI and SRP normally return it when the transport is marked as > offline/lost, so for normal IO we fail it upwards and userspace gets > an error. > > What drivers use it as temp busy error code? Is it the lpfc one > or a snic one? > This patch is primarily for sg_reset(), which would return -EIO whenever one of the EH functions would return FAST_IO_FAIL. It got triggered by my patch to zfcp, which now returns FAST_IO_FAIL from host reset as remote port login is happening from a worker thread, and the devices are not (yet) available. >> But that is not an I/O error, and we should return -EAGAIN from >> scsi_ioctl_reset() to correctly inform userspace. > For the sg_reset example, if you tried to run sg_reset again it would > fail. When sg_reset tries to open the device the open function will > return failure in the open call because the device is in the > transport-offline state for FC/iSCSI/SRP. > > If you are going to change the return value why not sync it with > what we return for normal IO and return BLK_STS_TRANSPORT/-ENOLINK? Good point. The alternative would be to map FAST_IO_FAIL back to SUCCESS as after all the host reset completed successfully, and blocked ports are not really an error. Lemme check. Cheers, Hannes
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 7c655d08a305..8e184d92abe9 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -2453,22 +2453,25 @@ scsi_ioctl_reset(struct scsi_device *dev, int __user *arg) break; case SG_SCSI_RESET_DEVICE: rtn = scsi_try_bus_device_reset(dev); - if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE)) + if (rtn == SUCCESS || rtn == FAST_IO_FAIL || + (val & SG_SCSI_RESET_NO_ESCALATE)) break; fallthrough; case SG_SCSI_RESET_TARGET: rtn = scsi_try_target_reset(shost, starget); - if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE)) + if (rtn == SUCCESS || rtn == FAST_IO_FAIL || + (val & SG_SCSI_RESET_NO_ESCALATE)) break; fallthrough; case SG_SCSI_RESET_BUS: rtn = scsi_try_bus_reset(shost, dev->channel); - if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE)) + if (rtn == SUCCESS || rtn == FAST_IO_FAIL || + (val & SG_SCSI_RESET_NO_ESCALATE)) break; fallthrough; case SG_SCSI_RESET_HOST: rtn = scsi_try_host_reset(shost); - if (rtn == SUCCESS) + if (rtn == SUCCESS || rtn == FAST_IO_FAIL) break; fallthrough; default: @@ -2476,7 +2479,17 @@ scsi_ioctl_reset(struct scsi_device *dev, int __user *arg) break; } - error = (rtn == SUCCESS) ? 0 : -EIO; + switch (rtn) { + case SUCCESS: + error = 0; + break; + case FAST_IO_FAIL: + error = -EAGAIN; + break; + default: + error = -EIO; + break; + } spin_lock_irqsave(shost->host_lock, flags); shost->tmf_in_progress = 0;
Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly valid, and indicates that the request could not be executed due to the transport being busy. But that is not an I/O error, and we should return -EAGAIN from scsi_ioctl_reset() to correctly inform userspace. Suggested-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/scsi/scsi_error.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)