[09/10] scsi_error: map FAST_IO_FAIL to -EAGAIN in SCSI EH

Message ID	20231023092837.33786-10-hare@suse.de (mailing list archive)
State	Deferred
Headers	show Return-Path: <linux-scsi-owner@vger.kernel.org> From: Hannes Reinecke <hare@suse.de> To: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de>, James Bottomley <james.bottomley@hansenpartnership.com>, linux-scsi@vger.kernel.org, Hannes Reinecke <hare@suse.de>, Benjamin Block <bblock@linux.ibm.com> Subject: [PATCH 09/10] scsi_error: map FAST_IO_FAIL to -EAGAIN in SCSI EH Date: Mon, 23 Oct 2023 11:28:36 +0200 Message-Id: <20231023092837.33786-10-hare@suse.de> In-Reply-To: <20231023092837.33786-1-hare@suse.de> References: <20231023092837.33786-1-hare@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	scsi: EH rework, main part \| expand [PATCHv8,00/10] scsi: EH rework, main part [01/10] scsi: Use Scsi_Host as argument for eh_host_reset_handler [02/10] scsi: Use Scsi_Host and channel number as argument for eh_bus_reset_handler() [03/10] scsi: Use scsi_target as argument for eh_target_reset_handler() [04/10] scsi: Use scsi_device as argument to eh_device_reset_handler() [05/10] scsi: set host byte after EH completed [06/10] scsi_error: iterate over list of failed commands in scsi_eh_bus_reset() [07/10] scsi: Do not allocate scsi command in scsi_ioctl_reset() [08/10] scsi_error: iterate over list of failed commands in scsi_eh_bus_device_reset() [09/10] scsi_error: map FAST_IO_FAIL to -EAGAIN in SCSI EH [10/10] scsi: remove SUBMITTED_BY_SCSI_RESET_IOCTL

Message ID

20231023092837.33786-10-hare@suse.de (mailing list archive)

State

Deferred

Headers

From: Hannes Reinecke <hare@suse.de>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
        James Bottomley <james.bottomley@hansenpartnership.com>,
        linux-scsi@vger.kernel.org, Hannes Reinecke <hare@suse.de>,
        Benjamin Block <bblock@linux.ibm.com>
Subject: [PATCH 09/10] scsi_error: map FAST_IO_FAIL to -EAGAIN in SCSI EH
Date: Mon, 23 Oct 2023 11:28:36 +0200
Message-Id: <20231023092837.33786-10-hare@suse.de>
In-Reply-To: <20231023092837.33786-1-hare@suse.de>
References: <20231023092837.33786-1-hare@suse.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

scsi: EH rework, main part | expand

Commit Message

Hannes Reinecke Oct. 23, 2023, 9:28 a.m. UTC

Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
valid, and indicates that the request could not be executed due to
the transport being busy.
But that is not an I/O error, and we should return -EAGAIN from
scsi_ioctl_reset() to correctly inform userspace.

Suggested-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

Comments

Benjamin Block Oct. 26, 2023, 2:08 p.m. UTC | #1

On Mon, Oct 23, 2023 at 11:28:36AM +0200, Hannes Reinecke wrote:
> Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
> valid, and indicates that the request could not be executed due to
> the transport being busy.
> But that is not an I/O error, and we should return -EAGAIN from
> scsi_ioctl_reset() to correctly inform userspace.

I've had a short look at at least one user of this interface `sg_reset`, and
that seems to handle this already:

    https://github.com/doug-gilbert/sg3_utils/blob/0e955c48621f7dc512e34554a060fe9f5cbc8d67/src/sg_reset.c#L251-L267

So this looks good to me.

> 
> Suggested-by: Benjamin Block <bblock@linux.ibm.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/scsi_error.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 


Reviewed-by: Benjamin Block <bblock@linux.ibm.com>

Mike Christie Oct. 26, 2023, 4:50 p.m. UTC | #2

On 10/23/23 4:28 AM, Hannes Reinecke wrote:
> Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
> valid, and indicates that the request could not be executed due to
> the transport being busy.

I'm not sure if that's completely correct or maybe we have a different
view of what it means to be busy.

FC, iSCSI and SRP normally return it when the transport is marked as
offline/lost, so for normal IO we fail it upwards and userspace gets
an error.

What drivers use it as temp busy error code? Is it the lpfc one
or a snic one?

> But that is not an I/O error, and we should return -EAGAIN from
> scsi_ioctl_reset() to correctly inform userspace.
For the sg_reset example, if you tried to run sg_reset again it would
fail. When sg_reset tries to open the device the open function will
return failure in the open call because the device is in the
transport-offline state for FC/iSCSI/SRP.

If you are going to change the return value why not sync it with
what we return for normal IO and return BLK_STS_TRANSPORT/-ENOLINK?

Hannes Reinecke Oct. 27, 2023, 5:05 a.m. UTC | #3

On 10/26/23 18:50, Mike Christie wrote:
> On 10/23/23 4:28 AM, Hannes Reinecke wrote:
>> Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
>> valid, and indicates that the request could not be executed due to
>> the transport being busy.
> 
> I'm not sure if that's completely correct or maybe we have a different
> view of what it means to be busy.
> 
> FC, iSCSI and SRP normally return it when the transport is marked as
> offline/lost, so for normal IO we fail it upwards and userspace gets
> an error.
> 
> What drivers use it as temp busy error code? Is it the lpfc one
> or a snic one?
> 
This patch is primarily for sg_reset(), which would return -EIO whenever
one of the EH functions would return FAST_IO_FAIL.
It got triggered by my patch to zfcp, which now returns FAST_IO_FAIL
from host reset as remote port login is happening from a worker thread,
and the devices are not (yet) available.

>> But that is not an I/O error, and we should return -EAGAIN from
>> scsi_ioctl_reset() to correctly inform userspace.
> For the sg_reset example, if you tried to run sg_reset again it would
> fail. When sg_reset tries to open the device the open function will
> return failure in the open call because the device is in the
> transport-offline state for FC/iSCSI/SRP.
> 
> If you are going to change the return value why not sync it with
> what we return for normal IO and return BLK_STS_TRANSPORT/-ENOLINK?

Good point.
The alternative would be to map FAST_IO_FAIL back to SUCCESS as after
all the host reset completed successfully, and blocked ports are not
really an error.
Lemme check.

Cheers,

Hannes

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7c655d08a305..8e184d92abe9 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2453,22 +2453,25 @@  scsi_ioctl_reset(struct scsi_device *dev, int __user *arg)
 		break;
 	case SG_SCSI_RESET_DEVICE:
 		rtn = scsi_try_bus_device_reset(dev);
-		if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE))
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL ||
+		    (val & SG_SCSI_RESET_NO_ESCALATE))
 			break;
 		fallthrough;
 	case SG_SCSI_RESET_TARGET:
 		rtn = scsi_try_target_reset(shost, starget);
-		if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE))
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL ||
+		    (val & SG_SCSI_RESET_NO_ESCALATE))
 			break;
 		fallthrough;
 	case SG_SCSI_RESET_BUS:
 		rtn = scsi_try_bus_reset(shost, dev->channel);
-		if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE))
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL ||
+		    (val & SG_SCSI_RESET_NO_ESCALATE))
 			break;
 		fallthrough;
 	case SG_SCSI_RESET_HOST:
 		rtn = scsi_try_host_reset(shost);
-		if (rtn == SUCCESS)
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL)
 			break;
 		fallthrough;
 	default:
@@ -2476,7 +2479,17 @@  scsi_ioctl_reset(struct scsi_device *dev, int __user *arg)
 		break;
 	}
 
-	error = (rtn == SUCCESS) ? 0 : -EIO;
+	switch (rtn) {
+	case SUCCESS:
+		error = 0;
+		break;
+	case FAST_IO_FAIL:
+		error = -EAGAIN;
+		break;
+	default:
+		error = -EIO;
+		break;
+	}
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->tmf_in_progress = 0;

[09/10] scsi_error: map FAST_IO_FAIL to -EAGAIN in SCSI EH

Commit Message

Comments

Patch