Message ID | 20210907071605.48968-1-hare@suse.de (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | I/O errors for ALUA state transitions | expand |
On 9/7/21 2:16 AM, Hannes Reinecke wrote: > From: Rajashekhar M A <rajs@netapp.com> > > When a host is configured with a few LUNs and IO is running, > injecting FC faults repeatedly leads to path recovery problems. > The LUNs have 4 paths each and 3 of them come back active after > say an FC fault which makes two of the paths go down, instead of > all 4. This happens after several iterations of continuous FC faults. > > Reason here is that we're returning an I/O error whenever we're > encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, > ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying. > > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/scsi_error.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 03a2ff547b22..1185083105ae 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > sshdr.asc == 0x3f && sshdr.ascq == 0x0e) > return NEEDS_RETRY; > /* > - * if the device is in the process of becoming ready, we > - * should retry. > + * if the device is in the process of becoming ready, or > + * transitions between ALUA states, we should retry. > */ > - if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01)) > + if ((sshdr.asc == 0x04) && > + (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a)) > return NEEDS_RETRY; Why put this here instead of in alua_check_sense with the other ALUA state transition check?
On 9/7/21 5:59 PM, michael.christie@oracle.com wrote: > On 9/7/21 2:16 AM, Hannes Reinecke wrote: >> From: Rajashekhar M A <rajs@netapp.com> >> >> When a host is configured with a few LUNs and IO is running, >> injecting FC faults repeatedly leads to path recovery problems. >> The LUNs have 4 paths each and 3 of them come back active after >> say an FC fault which makes two of the paths go down, instead of >> all 4. This happens after several iterations of continuous FC faults. >> >> Reason here is that we're returning an I/O error whenever we're >> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, >> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying. >>> Signed-off-by: Hannes Reinecke <hare@suse.de> >> --- >> drivers/scsi/scsi_error.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index 03a2ff547b22..1185083105ae 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) >> sshdr.asc == 0x3f && sshdr.ascq == 0x0e) >> return NEEDS_RETRY; >> /* >> - * if the device is in the process of becoming ready, we >> - * should retry. >> + * if the device is in the process of becoming ready, or >> + * transitions between ALUA states, we should retry. >> */ >> - if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01)) >> + if ((sshdr.asc == 0x04) && >> + (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a)) >> return NEEDS_RETRY; > > Why put this here instead of in alua_check_sense with the other ALUA > state transition check? > Good point. Will be updating the patch. Cheers, Hannes
alua_check_sense() checks for NOT READY / 04h 0Ah but not UNIT ATTENTION with the same ASC/ASCQ. Careful about updating pg->state in this case though because that may cause subsequent I/O in the probe to fail, we didn't do this before and we *may* only get the UA once. -Ewan On Wed, Sep 8, 2021 at 2:12 AM Hannes Reinecke <hare@suse.de> wrote: > > On 9/7/21 5:59 PM, michael.christie@oracle.com wrote: > > On 9/7/21 2:16 AM, Hannes Reinecke wrote: > >> From: Rajashekhar M A <rajs@netapp.com> > >> > >> When a host is configured with a few LUNs and IO is running, > >> injecting FC faults repeatedly leads to path recovery problems. > >> The LUNs have 4 paths each and 3 of them come back active after > >> say an FC fault which makes two of the paths go down, instead of > >> all 4. This happens after several iterations of continuous FC faults. > >> > >> Reason here is that we're returning an I/O error whenever we're > >> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, > >> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying. > >>> Signed-off-by: Hannes Reinecke <hare@suse.de> > >> --- > >> drivers/scsi/scsi_error.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > >> index 03a2ff547b22..1185083105ae 100644 > >> --- a/drivers/scsi/scsi_error.c > >> +++ b/drivers/scsi/scsi_error.c > >> @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > >> sshdr.asc == 0x3f && sshdr.ascq == 0x0e) > >> return NEEDS_RETRY; > >> /* > >> - * if the device is in the process of becoming ready, we > >> - * should retry. > >> + * if the device is in the process of becoming ready, or > >> + * transitions between ALUA states, we should retry. > >> */ > >> - if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01)) > >> + if ((sshdr.asc == 0x04) && > >> + (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a)) > >> return NEEDS_RETRY; > > > > Why put this here instead of in alua_check_sense with the other ALUA > > state transition check? > > > Good point. Will be updating the patch. > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Kernel Storage Architect > hare@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg > HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer >
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 03a2ff547b22..1185083105ae 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) sshdr.asc == 0x3f && sshdr.ascq == 0x0e) return NEEDS_RETRY; /* - * if the device is in the process of becoming ready, we - * should retry. + * if the device is in the process of becoming ready, or + * transitions between ALUA states, we should retry. */ - if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01)) + if ((sshdr.asc == 0x04) && + (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a)) return NEEDS_RETRY; /* * if the device is not started, we need to wake