Message ID | 20221208105947.2399894-24-niklas.cassel@wdc.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | Add Command Duration Limits support | expand |
On 12/8/22 4:59 AM, Niklas Cassel wrote: > Commands using a duration limit descriptor that has limit policies set > to a value other than 0x0 may be failed by the device if one of the > limits are exceeded. For such commands, since the failure is the result > of the user duration limit configuration and workload, the commands > should not be retried and terminated immediately. Furthermore, to allow > the user to differentiate these "soft" failures from hard errors due to > hardware problem, a different error code than EIO should be returned. > > There are 2 cases to consider: > (1) The failure is due to a limit policy failing the command with a > check condition sense key, that is, any limit policy other than 0xD. > For this case, scsi_check_sense() is modified to detect failures with > the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING > or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING > PROCESSING DUE TO ERROR RECOVERY additional sense code. For these > failures, a SUCCESS disposition is returned so that > scsi_finish_command() is called to terminate the command. > > (2) The failure is due to a limit policy set to 0xD, which result in the > command being terminated with a GOOD status, COMPLETED sense key, and > DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case, > the scsi_check_sense() is modified to return a SUCCESS disposition so > that scsi_finish_command() is called to terminate the command. > In addition, scsi_decide_disposition() has to be modified to see if a > command being terminated with GOOD status has sense data. > This is as defined in SCSI Primary Commands - 6 (SPC-6), so all > according to spec, even if GOOD status commands were not checked before. > > If scsi_check_sense() detects sense data representing a duration limit, > scsi_check_sense() will set the newly introduced SCSI ML byte > SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in > scsi_noretry_cmd(), so that a command that failed because of a CDL > timeout cannot be retried. The SCSI ML byte is also checked in > scsi_result_to_blk_status() to complete the command request with the > BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME > errors for the failed commands. > > Co-developed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> > --- > drivers/scsi/scsi_error.c | 46 +++++++++++++++++++++++++++++++++++++++ > drivers/scsi/scsi_lib.c | 4 ++++ > drivers/scsi/scsi_priv.h | 1 + > 3 files changed, 51 insertions(+) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 51aa5c1e31b5..1bdab5385985 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status) > */ > enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > { > + struct request *req = scsi_cmd_to_rq(scmd); > struct scsi_device *sdev = scmd->device; > struct scsi_sense_hdr sshdr; > > @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > if (sshdr.asc == 0x10) /* DIF */ > return SUCCESS; > > + /* > + * Check aborts due to command duration limit policy: > + * ABORTED COMMAND additional sense code with the > + * COMMAND TIMEOUT BEFORE PROCESSING or > + * COMMAND TIMEOUT DURING PROCESSING or > + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY > + * additional sense code qualifiers. > + */ > + if (sshdr.asc == 0x2e && > + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) { > + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); > + req->cmd_flags |= REQ_FAILFAST_DEV; Why are you setting the REQ_FAILFAST_DEV bit? Does libata check for it? I thought you might have set it because DID_TIME_OUT was set and you wanted to hit that check in scsi_noretry_cmd. However, I see that patch where you added the new flag so DID_TIME_OUT does not get set sometimes so you probably don't hit that path, and you have that check for SCSIML_STAT_DL_TIMEOUT in there below. > + req->rq_flags |= RQF_QUIET; > + return SUCCESS; > + } > + > if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF) > return ADD_TO_MLQUEUE; > if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 && > @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) > } > return SUCCESS; > > + case COMPLETED: > + if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) { > + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); > + req->cmd_flags |= REQ_FAILFAST_DEV; > + req->rq_flags |= RQF_QUIET; > + return SUCCESS; > + } > + return SUCCESS; > + > default: > return SUCCESS; > } > @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd) > switch (get_status_byte(scmd)) { > case SAM_STAT_GOOD: > scsi_handle_queue_ramp_up(scmd->device); > + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) > + /* > + * If we have sense data, call scsi_check_sense() in > + * order to set the correct SCSI ML byte (if any). > + * No point in checking the return value, since the > + * command has already completed successfully. > + */ > + scsi_check_sense(scmd); > fallthrough; > case SAM_STAT_COMMAND_TERMINATED: > return SUCCESS; > @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd) > return !!(req->cmd_flags & REQ_FAILFAST_DRIVER); > } > > + /* Never retry commands aborted due to a duration limit timeout */ > + if (get_scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT) > + return true; > + > if (!scsi_status_is_check_condition(scmd->result)) > return false; > > @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd) > if (scmd->cmnd[0] == REPORT_LUNS) > scmd->device->sdev_target->expecting_lun_change = 0; > scsi_handle_queue_ramp_up(scmd->device); > + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) > + /* > + * If we have sense data, call scsi_check_sense() in > + * order to set the correct SCSI ML byte (if any). > + * No point in checking the return value, since the > + * command has already completed successfully. > + */ > + scsi_check_sense(scmd); > fallthrough; > case SAM_STAT_COMMAND_TERMINATED: > return SUCCESS; > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index e64fd8f495d7..4f317c6593aa 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -602,6 +602,8 @@ static blk_status_t scsi_result_to_blk_status(int result) > return BLK_STS_MEDIUM; > case SCSIML_STAT_TGT_FAILURE: > return BLK_STS_TARGET; > + case SCSIML_STAT_DL_TIMEOUT: > + return BLK_STS_DURATION_LIMIT; > } > > switch (host_byte(result)) { > @@ -799,6 +801,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result) > blk_stat = BLK_STS_ZONE_OPEN_RESOURCE; > } > break; > + case COMPLETED: > + fallthrough; > default: > action = ACTION_FAIL; > break; > diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h > index 4f97e126c6fb..f8da92428ff6 100644 > --- a/drivers/scsi/scsi_priv.h > +++ b/drivers/scsi/scsi_priv.h > @@ -27,6 +27,7 @@ enum scsi_ml_status { > SCSIML_STAT_NOSPC = 0x02, /* Space allocation on the dev failed */ > SCSIML_STAT_MED_ERROR = 0x03, /* Medium error */ > SCSIML_STAT_TGT_FAILURE = 0x04, /* Permanent target failure */ > + SCSIML_STAT_DL_TIMEOUT = 0x05, /* Command Duration Limit timeout */ > }; > > static inline u8 get_scsi_ml_byte(int result)
On 12/9/22 09:13, Mike Christie wrote: >> @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) >> if (sshdr.asc == 0x10) /* DIF */ >> return SUCCESS; >> >> + /* >> + * Check aborts due to command duration limit policy: >> + * ABORTED COMMAND additional sense code with the >> + * COMMAND TIMEOUT BEFORE PROCESSING or >> + * COMMAND TIMEOUT DURING PROCESSING or >> + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY >> + * additional sense code qualifiers. >> + */ >> + if (sshdr.asc == 0x2e && >> + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) { >> + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); >> + req->cmd_flags |= REQ_FAILFAST_DEV; > > Why are you setting the REQ_FAILFAST_DEV bit? Does libata check for it? > > I thought you might have set it because DID_TIME_OUT was set and you wanted > to hit that check in scsi_noretry_cmd. However, I see that patch where you > added the new flag so DID_TIME_OUT does not get set sometimes so you probably > don't hit that path, and you have that check for SCSIML_STAT_DL_TIMEOUT in there > below. This is for the block layer (blk_noretry_request() helper) so that the remainder of the request is not retried. Retrying a CDL command that timedout goes against the goal of CDL.
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 51aa5c1e31b5..1bdab5385985 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -536,6 +536,7 @@ static inline void set_scsi_ml_byte(struct scsi_cmnd *cmd, u8 status) */ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) { + struct request *req = scsi_cmd_to_rq(scmd); struct scsi_device *sdev = scmd->device; struct scsi_sense_hdr sshdr; @@ -595,6 +596,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) if (sshdr.asc == 0x10) /* DIF */ return SUCCESS; + /* + * Check aborts due to command duration limit policy: + * ABORTED COMMAND additional sense code with the + * COMMAND TIMEOUT BEFORE PROCESSING or + * COMMAND TIMEOUT DURING PROCESSING or + * COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY + * additional sense code qualifiers. + */ + if (sshdr.asc == 0x2e && + sshdr.ascq >= 0x01 && sshdr.ascq <= 0x03) { + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); + req->cmd_flags |= REQ_FAILFAST_DEV; + req->rq_flags |= RQF_QUIET; + return SUCCESS; + } + if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF) return ADD_TO_MLQUEUE; if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 && @@ -691,6 +708,15 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd) } return SUCCESS; + case COMPLETED: + if (sshdr.asc == 0x55 && sshdr.ascq == 0x0a) { + set_scsi_ml_byte(scmd, SCSIML_STAT_DL_TIMEOUT); + req->cmd_flags |= REQ_FAILFAST_DEV; + req->rq_flags |= RQF_QUIET; + return SUCCESS; + } + return SUCCESS; + default: return SUCCESS; } @@ -785,6 +811,14 @@ static enum scsi_disposition scsi_eh_completed_normally(struct scsi_cmnd *scmd) switch (get_status_byte(scmd)) { case SAM_STAT_GOOD: scsi_handle_queue_ramp_up(scmd->device); + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) + /* + * If we have sense data, call scsi_check_sense() in + * order to set the correct SCSI ML byte (if any). + * No point in checking the return value, since the + * command has already completed successfully. + */ + scsi_check_sense(scmd); fallthrough; case SAM_STAT_COMMAND_TERMINATED: return SUCCESS; @@ -1807,6 +1841,10 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd) return !!(req->cmd_flags & REQ_FAILFAST_DRIVER); } + /* Never retry commands aborted due to a duration limit timeout */ + if (get_scsi_ml_byte(scmd->result) == SCSIML_STAT_DL_TIMEOUT) + return true; + if (!scsi_status_is_check_condition(scmd->result)) return false; @@ -1966,6 +2004,14 @@ enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *scmd) if (scmd->cmnd[0] == REPORT_LUNS) scmd->device->sdev_target->expecting_lun_change = 0; scsi_handle_queue_ramp_up(scmd->device); + if (scmd->sense_buffer && SCSI_SENSE_VALID(scmd)) + /* + * If we have sense data, call scsi_check_sense() in + * order to set the correct SCSI ML byte (if any). + * No point in checking the return value, since the + * command has already completed successfully. + */ + scsi_check_sense(scmd); fallthrough; case SAM_STAT_COMMAND_TERMINATED: return SUCCESS; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index e64fd8f495d7..4f317c6593aa 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -602,6 +602,8 @@ static blk_status_t scsi_result_to_blk_status(int result) return BLK_STS_MEDIUM; case SCSIML_STAT_TGT_FAILURE: return BLK_STS_TARGET; + case SCSIML_STAT_DL_TIMEOUT: + return BLK_STS_DURATION_LIMIT; } switch (host_byte(result)) { @@ -799,6 +801,8 @@ static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result) blk_stat = BLK_STS_ZONE_OPEN_RESOURCE; } break; + case COMPLETED: + fallthrough; default: action = ACTION_FAIL; break; diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index 4f97e126c6fb..f8da92428ff6 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -27,6 +27,7 @@ enum scsi_ml_status { SCSIML_STAT_NOSPC = 0x02, /* Space allocation on the dev failed */ SCSIML_STAT_MED_ERROR = 0x03, /* Medium error */ SCSIML_STAT_TGT_FAILURE = 0x04, /* Permanent target failure */ + SCSIML_STAT_DL_TIMEOUT = 0x05, /* Command Duration Limit timeout */ }; static inline u8 get_scsi_ml_byte(int result)