Message ID | 10894b48-b215-f2ef-6df0-d48a94ec7158@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On 03/13/2017 02:37 PM, Mauricio Faria de Oliveira wrote: > Hannes, > > On 03/01/2017 06:15 AM, Hannes Reinecke wrote: >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >> index f2cafae..cec439c 100644 >> --- a/drivers/scsi/scsi_error.c >> +++ b/drivers/scsi/scsi_error.c >> @@ -58,6 +58,7 @@ >> static int scsi_eh_try_stu(struct scsi_cmnd *scmd); >> static int scsi_try_to_abort_cmd(struct scsi_host_template *, >> struct scsi_cmnd *); >> +static int scsi_eh_reset(struct scsi_cmnd *scmd); >> >> /* called with shost->host_lock held */ >> void scsi_eh_wakeup(struct Scsi_Host *shost) >> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int >> eh_flag) >> if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) >> eh_flag &= ~SCSI_EH_CANCEL_CMD; >> scmd->eh_eflags |= eh_flag; >> + scsi_eh_reset(scmd); >> list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); >> shost->host_failed++; >> scsi_eh_wakeup(shost); >> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd >> *scmd, int rtn) >> if (!blk_rq_is_passthrough(scmd->request)) { >> struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); >> if (sdrv->eh_action) >> - rtn = sdrv->eh_action(scmd, rtn); >> + rtn = sdrv->eh_action(scmd, rtn, false); >> + } >> + return rtn; >> +} >> + >> +static int scsi_eh_reset(struct scsi_cmnd *scmd) >> +{ >> + int rtn = SUCCESS; >> + >> + if (!blk_rq_is_passthrough(scmd->request)) { >> + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); >> + if (sdrv->eh_action) >> + rtn = sdrv->eh_action(scmd, rtn, true); >> } >> return rtn; >> } > > Apparently we can de-duplicate code in scsi_eh_reset()/scsi_eh_action(), > and change less of sd_eh_action() (i.e., no new parameter). > > Something like this. Thoughts? > > > - Deduplicate code in scsi_eh_reset() and scsi_eh_action() > - A call to scsi_eh_reset() with reset == false calls into eh_action() > > - A call to scsi_eh_reset() with reset == true returns early, > (as happens with/if sd_eh_action() is called in your patch) > > - A call to scsi_eh_reset() in scsi_eh_scmd_add() uses SUCCESS just > for consistency with scsi_eh_reset() in your patch, as the return > value is ignored in it. > > - Less changes to sd_eh_action() > - Removed one chunk in sd_eh_action() ('reset gate variable') > - No more parameter changes in sd_eh_action() (no 'reset' parameter) > > - Removed forward declaration of scsi_eh_reset(), as already suggested > > > > static int scsi_eh_reset(struct scsi_cmnd *scmd, int eh_disp, bool reset) > { > int rtn = eh_disp; > > if (!blk_rq_is_passthrough(scmd->request)) { > struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); > if (sdrv->eh_action) { > if (reset) { > struct scsi_disk *sdkp = > scsi_disk(scmd->request->rq_disk); > > /* New SCSI EH run, reset gate variable */ > sdkp->medium_access_reset = 0; > return rtn; > } > rtn = sdrv->eh_action(scmd, rtn); > } > } > return rtn; > } > Nope. This is assuming that we're always running on a scsi_disk, and that scsi_disk is the only one implementing 'eh_action'. Neither of which is necessarily true. Cheers, Hannes
On 03/13/2017 11:48 AM, Hannes Reinecke wrote: > This is assuming that we're always running on a scsi_disk, and that > scsi_disk is the only one implementing 'eh_action'. > Neither of which is necessarily true. Ah, OK. Thanks for explaining.
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f2cafae..cec439c 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) eh_flag &= ~SCSI_EH_CANCEL_CMD; scmd->eh_eflags |= eh_flag; + scsi_eh_reset(scmd, SUCCESS, true); // return value is ignored (early exit due to reset) list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); shost->host_failed++; scsi_eh_wakeup(shost); @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) - if (!blk_rq_is_passthrough(scmd->request)) { - struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); - if (sdrv->eh_action) - rtn = sdrv->eh_action(scmd, rtn); - } - return rtn; + return scsi_eh_reset(scmd, rtn, false); } And the rest, without the 'reset' parameter. diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index c7839f6..c794686 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1689,X +1689,Y @@ static int sd_pr_clear(struct block_device *bdev, u64 key) * sd_eh_action - error handling callback * @scmd: sd-issued command that has failed * @eh_disp: The recovery disposition suggested by the midlayer * * This function is called by the SCSI midlayer upon completion of an * error test command (currently TEST UNIT READY). The result of sending * the eh command is passed in eh_disp. We're looking for devices that * fail medium access commands but are OK with non access commands like * test unit ready (so wrongly see the device as having a successful - * recovery) + * recovery). + * We have to be careful to count a medium access failure only once + * per SCSI EH run; there might be several timed out commands which + * will cause the 'max_medium_access_timeouts' counter to trigger + * after the first SCSI EH run already and set the device to offline. **/ @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp) * process of recovering or has it suffered an internal failure * that prevents access to the storage medium. */ - sdkp->medium_access_timed_out++; + if (!sdkp->medium_access_reset) { + sdkp->medium_access_timed_out++; + sdkp->medium_access_reset = 1; + } /* * If the device keeps failing read/write commands but TEST UNIT diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 4dac35e..6a4f75a 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -106,6 +106,7 @@ struct scsi_disk { unsigned rc_basis: 2; unsigned zoned: 2; unsigned urswrz : 1; + unsigned medium_access_reset : 1; }; #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)