From patchwork Mon Mar 13 13:37:28 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 9620891 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 49FDC60414 for ; Mon, 13 Mar 2017 13:37:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3B03D223B3 for ; Mon, 13 Mar 2017 13:37:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F85128498; Mon, 13 Mar 2017 13:37:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 155BD28497 for ; Mon, 13 Mar 2017 13:37:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750929AbdCMNhn (ORCPT ); Mon, 13 Mar 2017 09:37:43 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:53310 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751097AbdCMNhn (ORCPT ); Mon, 13 Mar 2017 09:37:43 -0400 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v2DDXjv7017281 for ; Mon, 13 Mar 2017 09:37:42 -0400 Received: from e24smtp01.br.ibm.com (e24smtp01.br.ibm.com [32.104.18.85]) by mx0a-001b2d01.pphosted.com with ESMTP id 294emfb7vp-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 13 Mar 2017 09:37:41 -0400 Received: from localhost by e24smtp01.br.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 13 Mar 2017 10:37:39 -0300 Received: from d24relay02.br.ibm.com (9.18.232.42) by e24smtp01.br.ibm.com (10.172.0.143) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 13 Mar 2017 10:37:36 -0300 Received: from d24av04.br.ibm.com (d24av04.br.ibm.com [9.8.31.97]) by d24relay02.br.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id v2DDbZAr32243732 for ; Mon, 13 Mar 2017 10:37:36 -0300 Received: from d24av04.br.ibm.com (localhost [127.0.0.1]) by d24av04.br.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id v2DDbYUd004347 for ; Mon, 13 Mar 2017 10:37:35 -0300 Received: from [9.85.154.66] ([9.85.154.66]) by d24av04.br.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id v2DDbSwL004318; Mon, 13 Mar 2017 10:37:29 -0300 Subject: Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run To: Hannes Reinecke References: <1488359720-130871-1-git-send-email-hare@suse.de> <1488359720-130871-2-git-send-email-hare@suse.de> Cc: "Martin K. Petersen" , Christoph Hellwig , James Bottomley , Bart van Assche , linux-scsi@vger.kernel.org, Ewan Milne , Lawrence Obermann , Benjamin Block , Steffen Maier , Hannes Reinecke From: Mauricio Faria de Oliveira Date: Mon, 13 Mar 2017 10:37:28 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: <1488359720-130871-2-git-send-email-hare@suse.de> X-TM-AS-MML: disable x-cbid: 17031313-1523-0000-0000-0000028931FB X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17031313-1524-0000-0000-00002A1F6045 Message-Id: <10894b48-b215-f2ef-6df0-d48a94ec7158@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-03-13_06:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1702020001 definitions=main-1703130109 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hannes, On 03/01/2017 06:15 AM, Hannes Reinecke wrote: > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index f2cafae..cec439c 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -58,6 +58,7 @@ > static int scsi_eh_try_stu(struct scsi_cmnd *scmd); > static int scsi_try_to_abort_cmd(struct scsi_host_template *, > struct scsi_cmnd *); > +static int scsi_eh_reset(struct scsi_cmnd *scmd); > > /* called with shost->host_lock held */ > void scsi_eh_wakeup(struct Scsi_Host *shost) > @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) > if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) > eh_flag &= ~SCSI_EH_CANCEL_CMD; > scmd->eh_eflags |= eh_flag; > + scsi_eh_reset(scmd); > list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); > shost->host_failed++; > scsi_eh_wakeup(shost); > @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) > if (!blk_rq_is_passthrough(scmd->request)) { > struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); > if (sdrv->eh_action) > - rtn = sdrv->eh_action(scmd, rtn); > + rtn = sdrv->eh_action(scmd, rtn, false); > + } > + return rtn; > +} > + > +static int scsi_eh_reset(struct scsi_cmnd *scmd) > +{ > + int rtn = SUCCESS; > + > + if (!blk_rq_is_passthrough(scmd->request)) { > + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); > + if (sdrv->eh_action) > + rtn = sdrv->eh_action(scmd, rtn, true); > } > return rtn; > } Apparently we can de-duplicate code in scsi_eh_reset()/scsi_eh_action(), and change less of sd_eh_action() (i.e., no new parameter). Something like this. Thoughts? - Deduplicate code in scsi_eh_reset() and scsi_eh_action() - A call to scsi_eh_reset() with reset == false calls into eh_action() - A call to scsi_eh_reset() with reset == true returns early, (as happens with/if sd_eh_action() is called in your patch) - A call to scsi_eh_reset() in scsi_eh_scmd_add() uses SUCCESS just for consistency with scsi_eh_reset() in your patch, as the return value is ignored in it. - Less changes to sd_eh_action() - Removed one chunk in sd_eh_action() ('reset gate variable') - No more parameter changes in sd_eh_action() (no 'reset' parameter) - Removed forward declaration of scsi_eh_reset(), as already suggested static int scsi_eh_reset(struct scsi_cmnd *scmd, int eh_disp, bool reset) { int rtn = eh_disp; if (!blk_rq_is_passthrough(scmd->request)) { struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); if (sdrv->eh_action) { if (reset) { struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk); /* New SCSI EH run, reset gate variable */ sdkp->medium_access_reset = 0; return rtn; } rtn = sdrv->eh_action(scmd, rtn); } } return rtn; } Plus, diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index f2cafae..cec439c 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) eh_flag &= ~SCSI_EH_CANCEL_CMD; scmd->eh_eflags |= eh_flag; + scsi_eh_reset(scmd, SUCCESS, true); // return value is ignored (early exit due to reset) list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); shost->host_failed++; scsi_eh_wakeup(shost); @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) - if (!blk_rq_is_passthrough(scmd->request)) { - struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); - if (sdrv->eh_action) - rtn = sdrv->eh_action(scmd, rtn); - } - return rtn; + return scsi_eh_reset(scmd, rtn, false); } And the rest, without the 'reset' parameter. diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index c7839f6..c794686 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1689,X +1689,Y @@ static int sd_pr_clear(struct block_device *bdev, u64 key) * sd_eh_action - error handling callback * @scmd: sd-issued command that has failed * @eh_disp: The recovery disposition suggested by the midlayer * * This function is called by the SCSI midlayer upon completion of an * error test command (currently TEST UNIT READY). The result of sending * the eh command is passed in eh_disp. We're looking for devices that * fail medium access commands but are OK with non access commands like * test unit ready (so wrongly see the device as having a successful - * recovery) + * recovery). + * We have to be careful to count a medium access failure only once + * per SCSI EH run; there might be several timed out commands which + * will cause the 'max_medium_access_timeouts' counter to trigger + * after the first SCSI EH run already and set the device to offline. **/ @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp) * process of recovering or has it suffered an internal failure * that prevents access to the storage medium. */ - sdkp->medium_access_timed_out++; + if (!sdkp->medium_access_reset) { + sdkp->medium_access_timed_out++; + sdkp->medium_access_reset = 1; + } /* * If the device keeps failing read/write commands but TEST UNIT diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index 4dac35e..6a4f75a 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -106,6 +106,7 @@ struct scsi_disk { unsigned rc_basis: 2; unsigned zoned: 2; unsigned urswrz : 1; + unsigned medium_access_reset : 1; }; #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)