Message ID | 20231114013750.76609-5-michael.christie@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | scsi: Allow scsi_execute users to request retries | expand |
On 14/11/2023 01:37, Mike Christie wrote: Feel free to pick up: Reviewed-by: John Garry <john.g.garry@oracle.com> > + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer, > + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, BTW, some might find it a little confusing that we have scsi_execute_cmd() retries arg as well as being able to pass exec_args.failure, and exec_args.failure may allow us to retry. Could this be improved, even in terms of arg or struct member naming/comments? Thanks, John > + &exec_args); > > - } while (the_result && retries); > + if (the_result > 0) { > + if (media_not_present(sdkp, &sshdr)) > + return -ENODEV; > + > + sense_valid = scsi_sense_valid(&sshdr); > + if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST && > + (sshdr.asc == 0x20 || sshdr.asc == 0x24) && > + sshdr.ascq == 0x00) { > + /* > + * Invalid Command Operation Code or Invalid Field in > + * CDB, just retry silently with RC10 > + */ > + return -EINVAL; > + } > + }
On 11/16/23 5:39 AM, John Garry wrote: > On 14/11/2023 01:37, Mike Christie wrote: > > Feel free to pick up: > Reviewed-by: John Garry <john.g.garry@oracle.com> > >> + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer, >> + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, > > BTW, some might find it a little confusing that we have scsi_execute_cmd() retries arg as well as being able to pass exec_args.failure, and exec_args.failure may allow us to retry. Could this be improved, even in terms of arg or struct member naming/comments? Will add some comments. Martin W, if you are going to ask about this again, the answer is that in a future patchset, we can kill the reties passed directly into scsi_execute_cmd. We could make scsi_decide_disposition's failures an array of scsi_failure structs that gets checked in scsi_decide_disposition and scsi_check_passthrough. We would need to add a function callout to the scsi_failure struct for some of the more complicated failure handling. That could then be used for some of the other scsi_execute_cmd cases as well. For the normal/fast path case though we would need something to avoid function pointers.
On Thu, 2023-11-16 at 11:15 -0600, Mike Christie wrote: > On 11/16/23 5:39 AM, John Garry wrote: > > On 14/11/2023 01:37, Mike Christie wrote: > > > > Feel free to pick up: > > Reviewed-by: John Garry <john.g.garry@oracle.com> > > > > > + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, > > > buffer, > > > + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, > > > > BTW, some might find it a little confusing that we have > > scsi_execute_cmd() retries arg as well as being able to pass > > exec_args.failure, and exec_args.failure may allow us to retry. > > Could this be improved, even in terms of arg or struct member > > naming/comments? > > Will add some comments. > > Martin W, if you are going to ask about this again, the answer is > that > in a future patchset, we can kill the reties passed directly into > scsi_execute_cmd. > Did I ask about this? ... I dimly remember ;-) > We could make scsi_decide_disposition's failures an array of > scsi_failure structs that gets checked in scsi_decide_disposition > and scsi_check_passthrough. We would need to add a function callout > to the scsi_failure struct for some of the more complicated failure > handling. That could then be used for some of the other > scsi_execute_cmd > cases as well. For the normal/fast path case though we would need > something to avoid function pointers. I am not sure if I like this idea. scsi_decide_disposition() is challenging to read already. I am not convinced that converting it into a loop over "struct scsi_failure" would make it easier to understand. I guess we'd need to see how it would look like. To my understanding, the two retry counts are currently orthogonal, the one in scsi_execute_cmd() representing the standard mid-layer "maybe_retry" case in scsi_decide_disposition, and the the other one representing the additional "high level" passthrough retry. We need to be careful when merging them. Wrt the callout, I'd actually thought about that when looking at your previous set, but I didn't want to interfere too much. I thought that using callouts might simplify the scsi_failure handling, and might actually make the new code easier to read. I can't judge possible effects on performance. Cheers, Martin
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index fa00dd503cbf..1af04b01e1df 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2453,55 +2453,89 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp, static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, unsigned char *buffer) { - unsigned char cmd[16]; - struct scsi_sense_hdr sshdr; - const struct scsi_exec_args exec_args = { - .sshdr = &sshdr, + static const u8 cmd[16] = { + [0] = SERVICE_ACTION_IN_16, + [1] = SAI_READ_CAPACITY_16, + [13] = RC16_LEN, }; + struct scsi_sense_hdr sshdr; int sense_valid = 0; int the_result; - int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET; unsigned int alignment; unsigned long long lba; unsigned sector_size; + struct scsi_failure failure_defs[] = { + /* + * Do not retry Invalid Command Operation Code or Invalid + * Field in CDB. + */ + { + .sense = ILLEGAL_REQUEST, + .asc = 0x20, + .result = SAM_STAT_CHECK_CONDITION, + }, + { + .sense = ILLEGAL_REQUEST, + .asc = 0x24, + .result = SAM_STAT_CHECK_CONDITION, + }, + /* Do not retry Medium Not Present */ + { + .sense = UNIT_ATTENTION, + .asc = 0x3A, + .result = SAM_STAT_CHECK_CONDITION, + }, + { + .sense = NOT_READY, + .asc = 0x3A, + .result = SAM_STAT_CHECK_CONDITION, + }, + /* Device reset might occur several times so retry a lot */ + { + .sense = UNIT_ATTENTION, + .asc = 0x29, + .allowed = READ_CAPACITY_RETRIES_ON_RESET, + .result = SAM_STAT_CHECK_CONDITION, + }, + /* Any other error not listed above retry 3 times */ + { + .result = SCMD_FAILURE_RESULT_ANY, + .allowed = 3, + }, + {} + }; + struct scsi_failures failures = { + .failure_definitions = failure_defs, + }; + const struct scsi_exec_args exec_args = { + .sshdr = &sshdr, + .failures = &failures, + }; if (sdp->no_read_capacity_16) return -EINVAL; - do { - memset(cmd, 0, 16); - cmd[0] = SERVICE_ACTION_IN_16; - cmd[1] = SAI_READ_CAPACITY_16; - cmd[13] = RC16_LEN; - memset(buffer, 0, RC16_LEN); - - the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, - buffer, RC16_LEN, SD_TIMEOUT, - sdkp->max_retries, &exec_args); - if (the_result > 0) { - if (media_not_present(sdkp, &sshdr)) - return -ENODEV; + memset(buffer, 0, RC16_LEN); - sense_valid = scsi_sense_valid(&sshdr); - if (sense_valid && - sshdr.sense_key == ILLEGAL_REQUEST && - (sshdr.asc == 0x20 || sshdr.asc == 0x24) && - sshdr.ascq == 0x00) - /* Invalid Command Operation Code or - * Invalid Field in CDB, just retry - * silently with RC10 */ - return -EINVAL; - if (sense_valid && - sshdr.sense_key == UNIT_ATTENTION && - sshdr.asc == 0x29 && sshdr.ascq == 0x00) - /* Device reset might occur several times, - * give it one more chance */ - if (--reset_retries > 0) - continue; - } - retries--; + the_result = scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, buffer, + RC16_LEN, SD_TIMEOUT, sdkp->max_retries, + &exec_args); - } while (the_result && retries); + if (the_result > 0) { + if (media_not_present(sdkp, &sshdr)) + return -ENODEV; + + sense_valid = scsi_sense_valid(&sshdr); + if (sense_valid && sshdr.sense_key == ILLEGAL_REQUEST && + (sshdr.asc == 0x20 || sshdr.asc == 0x24) && + sshdr.ascq == 0x00) { + /* + * Invalid Command Operation Code or Invalid Field in + * CDB, just retry silently with RC10 + */ + return -EINVAL; + } + } if (the_result) { sd_print_result(sdkp, "Read Capacity(16) failed", the_result);
This has read_capacity_16 have scsi-ml retry errors instead of driving them itself. There are 2 behavior changes with this patch: 1. There is one behavior change where we no longer retry when scsi_execute_cmd returns < 0, but we should be ok. We don't need to retry for failures like the queue being removed, and for the case where there are no tags/reqs since the block layer waits/retries for us. For possible memory allocation failures from blk_rq_map_kern we use GFP_NOIO, so retrying will probably not help. 2. For the specific UAs we checked for and retried, we would get READ_CAPACITY_RETRIES_ON_RESET retries plus whatever retries were left from the main loop's retries. Each UA now gets READ_CAPACITY_RETRIES_ON_RESET reties, and the other errors get up to 3 retries. This is most likely ok, because READ_CAPACITY_RETRIES_ON_RESET is already 10 and is not based on anything specific like a spec or device, so the extra 3 we got from the main loop was probably just an accident and is not going to help. Signed-off-by: Mike Christie <michael.christie@oracle.com> --- drivers/scsi/sd.c | 108 ++++++++++++++++++++++++++++++---------------- 1 file changed, 71 insertions(+), 37 deletions(-)