Message ID | 1430127309-90412-1-git-send-email-hare@suse.de (mailing list archive) |
---|---|
State | New, archived |
On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
> During ALUA state transitions the device might return
> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> access state transition). As this is a transient error
> we should just retry the READ CAPACITY call until
> the state transition finishes and the correct
> capacity can be returned.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/sd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 79beebf..7178b05 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  				 * give it one more chance */
>  				if (--reset_retries > 0)
>  					continue;
> +			if (sense_valid &&
> +			    sshdr.sense_key == NOT_READY &&
> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> +				/* ALUA state transition; always retry */
> +				continue;
>  		}
>  		retries--;
>
> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  				 * give it one more chance */
>  				if (--reset_retries > 0)
>  					continue;
> +			if (sense_valid &&
> +			    sshdr.sense_key == NOT_READY &&
> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> +				/* ALUA state transition; always retry */
> +				continue;
>  		}
>  		retries--;
>

Got to say I really don't like this infinite retry possibility. How
long does the ALUA transition take? Would increasing retries work (or
even hijacking reset_retries)?

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/28/2015 11:18 PM, James Bottomley wrote:
> On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
>> During ALUA state transitions the device might return
>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>> access state transition). As this is a transient error
>> we should just retry the READ CAPACITY call until
>> the state transition finishes and the correct
>> capacity can be returned.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>  drivers/scsi/sd.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 79beebf..7178b05 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>  				 * give it one more chance */
>>  				if (--reset_retries > 0)
>>  					continue;
>> +			if (sense_valid &&
>> +			    sshdr.sense_key == NOT_READY &&
>> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>> +				/* ALUA state transition; always retry */
>> +				continue;
>>  		}
>>  		retries--;
>>
>> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>  				 * give it one more chance */
>>  				if (--reset_retries > 0)
>>  					continue;
>> +			if (sense_valid &&
>> +			    sshdr.sense_key == NOT_READY &&
>> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>> +				/* ALUA state transition; always retry */
>> +				continue;
>>  		}
>>  		retries--;
>>
>
> Got to say I really don't like this infinite retry possibility. How
> long does the ALUA transition take? Would increasing retries work (or
> even hijacking reset_retries)?
>
Well ... transitioning could be quite long (NetApp FAS has a
transition timeout of 30 _minutes_ ...).
But yeah, I could see to limit this somewhat.

Cheers,

Hannes
On 4/30/2015 5:56 PM, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
>> On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
>>> During ALUA state transitions the device might return
>>> a sense code 02/04/0a (Logical unit not accessible, asymmetric
>>> access state transition). As this is a transient error
>>> we should just retry the READ CAPACITY call until
>>> the state transition finishes and the correct
>>> capacity can be returned.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> ---
>>>  drivers/scsi/sd.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index 79beebf..7178b05 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>>  				 * give it one more chance */
>>>  				if (--reset_retries > 0)
>>>  					continue;
>>> +			if (sense_valid &&
>>> +			    sshdr.sense_key == NOT_READY &&
>>> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>>> +				/* ALUA state transition; always retry */
>>> +				continue;
>>>  		}
>>>  		retries--;
>>>
>>> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>>>  				 * give it one more chance */
>>>  				if (--reset_retries > 0)
>>>  					continue;
>>> +			if (sense_valid &&
>>> +			    sshdr.sense_key == NOT_READY &&
>>> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
>>> +				/* ALUA state transition; always retry */
>>> +				continue;
>>>  		}
>>>  		retries--;
>>>
>>
>> Got to say I really don't like this infinite retry possibility. How
>> long does the ALUA transition take? Would increasing retries work (or
>> even hijacking reset_retries)?
>>
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).

Well, actually NetApp FAS has a transition timeout of 2 minutes, and not
30 minutes - as reported in the IMPLICIT TRANSITION TIMEOUT value in the
extended RTPG data.
-Martin
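For readers who want to see where that value comes from: a minimal sketch of reading the implicit transition time out of a REPORT TARGET PORT GROUPS response, assuming the SPC-4 extended header layout (format type in bits 6..4 of byte 4, transition time in seconds in byte 5). The helper name `rtpg_transition_timeout` and the `fallback_secs` parameter are hypothetical and are not part of the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTPG_FMT_MASK    0x70 /* FORMAT TYPE field: byte 4, bits 6..4 */
#define RTPG_FMT_EXT_HDR 0x10 /* 001b: extended header parameter data */

/*
 * Hypothetical helper: return the implicit transition time (seconds)
 * reported in an extended-header RTPG response, or fallback_secs when
 * the device uses the short header or reports zero.
 */
static unsigned int rtpg_transition_timeout(const uint8_t *buf, size_t len,
					    unsigned int fallback_secs)
{
	assert(buf != NULL);

	/* Bytes 0..3 hold the return data length; the header needs 8 bytes. */
	if (len < 8)
		return fallback_secs;
	if ((buf[4] & RTPG_FMT_MASK) != RTPG_FMT_EXT_HDR)
		return fallback_secs;
	if (buf[5] == 0)
		return fallback_secs;

	return buf[5];
}
```

Under this layout the field is a single byte, so a device could report at most 255 seconds this way; a probe-time retry budget derived from it would stay within a few minutes.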
On Thu, 2015-04-30 at 14:26 +0200, Hannes Reinecke wrote:
> On 04/28/2015 11:18 PM, James Bottomley wrote:
> > On Mon, 2015-04-27 at 11:35 +0200, Hannes Reinecke wrote:
> >> During ALUA state transitions the device might return
> >> a sense code 02/04/0a (Logical unit not accessible, asymmetric
> >> access state transition). As this is a transient error
> >> we should just retry the READ CAPACITY call until
> >> the state transition finishes and the correct
> >> capacity can be returned.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >> ---
> >>  drivers/scsi/sd.c | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 79beebf..7178b05 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >>  				 * give it one more chance */
> >>  				if (--reset_retries > 0)
> >>  					continue;
> >> +			if (sense_valid &&
> >> +			    sshdr.sense_key == NOT_READY &&
> >> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> >> +				/* ALUA state transition; always retry */
> >> +				continue;
> >>  		}
> >>  		retries--;
> >>
> >> @@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >>  				 * give it one more chance */
> >>  				if (--reset_retries > 0)
> >>  					continue;
> >> +			if (sense_valid &&
> >> +			    sshdr.sense_key == NOT_READY &&
> >> +			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
> >> +				/* ALUA state transition; always retry */
> >> +				continue;
> >>  		}
> >>  		retries--;
> >>
> >
> > Got to say I really don't like this infinite retry possibility. How
> > long does the ALUA transition take? Would increasing retries work (or
> > even hijacking reset_retries)?
> >
> Well ... transitioning could be quite long (NetApp FAS has a
> transition timeout of 30 _minutes_ ...).
> But yeah, I could see to limit this somewhat.

I think that might be a good idea.
We can't hold this device (and the corresponding asynchronous probe
thread) in a continuous loop for 30 minutes ...

James
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 79beebf..7178b05 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1987,6 +1987,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;
 
@@ -2069,6 +2074,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * give it one more chance */
 				if (--reset_retries > 0)
 					continue;
+			if (sense_valid &&
+			    sshdr.sense_key == NOT_READY &&
+			    sshdr.asc == 0x04 && sshdr.ascq == 0x0A)
+				/* ALUA state transition; always retry */
+				continue;
 		}
 		retries--;
During ALUA state transitions the device might return
a sense code 02/04/0a (Logical unit not accessible, asymmetric
access state transition). As this is a transient error
we should just retry the READ CAPACITY call until
the state transition finishes and the correct
capacity can be returned.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)