diff mbox series

[v2,14/20] scsi: Retry pr_ops commands if a UA is returned.

Message ID 20220809000419.10674-15-michael.christie@oracle.com (mailing list archive)
State Changes Requested, archived
Delegated to: Mike Snitzer
Headers show
Series Use block pr_ops in LIO | expand

Commit Message

Mike Christie Aug. 9, 2022, 12:04 a.m. UTC
It's common to get a UA when doing PR commands. It could be due to a
target restarting, transport level relogin or other PR commands like a
release causing it. The upper layers don't get the sense and in some cases
have no idea if it's a SCSI device, so this has the sd layer retry.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/sd.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig Aug. 9, 2022, 7:16 a.m. UTC | #1
On Mon, Aug 08, 2022 at 07:04:13PM -0500, Mike Christie wrote:
> It's common to get a UA when doing PR commands. It could be due to a
> target restarting, transport level relogin or other PR commands like a
> release causing it. The upper layers don't get the sense and in some cases
> have no idea if it's a SCSI device, so this has the sd layer retry.

This seems like another case for the generic in-kernel passthrugh
command retry discussed in the other thread.

Can you split out two series with just bug fixes for nvme and scsi
as I think we should probably get those into 6.0, and then we can
do the actual feature on top of those?

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Mike Christie Aug. 9, 2022, 4:24 p.m. UTC | #2
On 8/9/22 2:16 AM, Christoph Hellwig wrote:
> On Mon, Aug 08, 2022 at 07:04:13PM -0500, Mike Christie wrote:
>> It's common to get a UA when doing PR commands. It could be due to a
>> target restarting, transport level relogin or other PR commands like a
>> release causing it. The upper layers don't get the sense and in some cases
>> have no idea if it's a SCSI device, so this has the sd layer retry.
> 
> This seems like another case for the generic in-kernel passthrugh
> command retry discussed in the other thread.

It is.

> 
> Can you split out two series with just bug fixes for nvme and scsi
> as I think we should probably get those into 6.0, and then we can
> do the actual feature on top of those?

Ok.

Martin W, I'll submit a patch with a new SCMD flag that will fix
both our problems.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Bart Van Assche Aug. 9, 2022, 7:31 p.m. UTC | #3
On 8/9/22 09:24, Mike Christie wrote:
> On 8/9/22 2:16 AM, Christoph Hellwig wrote:
>> On Mon, Aug 08, 2022 at 07:04:13PM -0500, Mike Christie wrote:
>>> It's common to get a UA when doing PR commands. It could be due to a
>>> target restarting, transport level relogin or other PR commands like a
>>> release causing it. The upper layers don't get the sense and in some cases
>>> have no idea if it's a SCSI device, so this has the sd layer retry.
>>
>> This seems like another case for the generic in-kernel passthrugh
>> command retry discussed in the other thread.
> 
> It is.

Has it been considered to introduce a flag that makes scsi_noretry_cmd() 
retry passthrough commands?

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bf080de9866d..61e88c7ffa44 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1683,6 +1683,8 @@  static int sd_get_unique_id(struct gendisk *disk, u8 id[16],
 	return ret;
 }
 
+#define SCSI_PR_UA_RETRIES 5
+
 static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 			    unsigned char *data, int data_len)
 {
@@ -1690,8 +1692,9 @@  static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 	struct scsi_device *sdev = sdkp->device;
 	struct scsi_sense_hdr sshdr;
 	u8 cmd[10] = { 0, };
-	int result;
+	int result, ua_retries = SCSI_PR_UA_RETRIES;
 
+retry:
 	cmd[0] = PERSISTENT_RESERVE_IN;
 	cmd[1] = sa;
 	put_unaligned_be16(data_len, &cmd[7]);
@@ -1700,6 +1703,9 @@  static int sd_pr_in_command(struct block_device *bdev, u8 sa,
 				  &sshdr, SD_TIMEOUT, sdkp->max_retries, NULL);
 	if (scsi_status_is_check_condition(result) &&
 	    scsi_sense_valid(&sshdr)) {
+		if (sshdr.sense_key == UNIT_ATTENTION && ua_retries-- > 0)
+			goto retry;
+
 		sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result);
 		scsi_print_sense_hdr(sdev, NULL, &sshdr);
 	}
@@ -1776,10 +1782,11 @@  static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
 	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
 	struct scsi_device *sdev = sdkp->device;
 	struct scsi_sense_hdr sshdr;
-	int result;
+	int result, ua_retries = SCSI_PR_UA_RETRIES;
 	u8 cmd[16] = { 0, };
 	u8 data[24] = { 0, };
 
+retry:
 	cmd[0] = PERSISTENT_RESERVE_OUT;
 	cmd[1] = sa;
 	cmd[2] = type;
@@ -1794,6 +1801,9 @@  static int sd_pr_out_command(struct block_device *bdev, u8 sa, u64 key,
 
 	if (scsi_status_is_check_condition(result) &&
 	    scsi_sense_valid(&sshdr)) {
+		if (sshdr.sense_key == UNIT_ATTENTION && ua_retries-- > 0)
+			goto retry;
+
 		sdev_printk(KERN_INFO, sdev, "PR command failed: %d\n", result);
 		scsi_print_sense_hdr(sdev, NULL, &sshdr);
 	}