From patchwork Sun Sep 13 14:52:51 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yaniv Gardi X-Patchwork-Id: 7169731 Return-Path: X-Original-To: patchwork-linux-scsi@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 61A7F9F326 for ; Sun, 13 Sep 2015 14:56:41 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 432BB2068F for ; Sun, 13 Sep 2015 14:56:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1C63D2063E for ; Sun, 13 Sep 2015 14:56:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755436AbbIMO40 (ORCPT ); Sun, 13 Sep 2015 10:56:26 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:49083 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755220AbbIMOyO (ORCPT ); Sun, 13 Sep 2015 10:54:14 -0400 Received: from smtp.codeaurora.org (localhost [127.0.0.1]) by smtp.codeaurora.org (Postfix) with ESMTP id D1FE81403D4; Sun, 13 Sep 2015 14:54:13 +0000 (UTC) Received: by smtp.codeaurora.org (Postfix, from userid 486) id B25D11403F3; Sun, 13 Sep 2015 14:54:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 Received: from lx-ygardi.mea.qualcomm.com (unknown [185.23.60.4]) (using TLSv1.1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: ygardi@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 7825B1403D4; Sun, 13 Sep 2015 14:54:09 +0000 (UTC) From: Yaniv Gardi To: James.Bottomley@HansenPartnership.com, pebolle@tiscali.nl, hch@infradead.org Cc: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-arm-msm@vger.kernel.org, santoshsy@gmail.com, linux-scsi-owner@vger.kernel.org, subhashj@codeaurora.org, ygardi@codeaurora.org, gbroner@codeaurora.org, draviv@codeaurora.org, Vinayak Holikatti , "James E.J. Bottomley" Subject: [PATCH v1 11/17] scsi: ufs: add error recovery after DL NAC error Date: Sun, 13 Sep 2015 17:52:51 +0300 Message-Id: <1442155977-7686-12-git-send-email-ygardi@codeaurora.org> X-Mailer: git-send-email 1.8.5.2 In-Reply-To: <1442155977-7686-1-git-send-email-ygardi@codeaurora.org> References: <1442155977-7686-1-git-send-email-ygardi@codeaurora.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Some vendor's UFS device sends back to back NACs for the DL data frames causing the host controller to raise the DFES error status. Sometimes such UFS devices send back to back NAC without waiting for new retransmitted DL frame from the host and in such cases it might be possible the Host UniPro goes into bad state without raising the DFES error interrupt. If this happens then all the pending commands would timeout only after respective SW command (which is generally too large). This change workarounds such device behaviour like this: - As soon as SW sees the DL NAC error, it would schedule the error handler - Error handler would sleep for 50ms to see if there any fatal errors raised by UFS controller. - If there are fatal errors then SW does normal error recovery. - If there are no fatal errors then SW sends the NOP command to device to check if link is alive. - If NOP command times out, SW does normal error recovery - If NOP command succeed, skip the error handling. If DL NAC error is seen multiple times with some vendor's UFS devices then enable this quirk to initiate quick error recovery and also silence related error logs to reduce spamming of kernel logs. Signed-off-by: Subhash Jadavani Signed-off-by: Yaniv Gardi --- drivers/scsi/ufs/ufshcd.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++ drivers/scsi/ufs/ufshci.h | 2 + 2 files changed, 95 insertions(+) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 6a1a8cd..a649250 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -3837,6 +3837,79 @@ static void ufshcd_complete_requests(struct ufs_hba *hba) } /** + * ufshcd_quirk_dl_nac_errors - This function checks if error handling is + * to recover from the DL NAC errors or not. + * @hba: per-adapter instance + * + * Returns true if error handling is required, false otherwise + */ +static bool ufshcd_quirk_dl_nac_errors(struct ufs_hba *hba) +{ + unsigned long flags; + bool err_handling = true; + + spin_lock_irqsave(hba->host->host_lock, flags); + /* + * UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS only workaround the + * device fatal error and/or DL NAC & REPLAY timeout errors. + */ + if (hba->saved_err & (CONTROLLER_FATAL_ERROR | SYSTEM_BUS_FATAL_ERROR)) + goto out; + + if ((hba->saved_err & DEVICE_FATAL_ERROR) || + ((hba->saved_err & UIC_ERROR) && + (hba->saved_uic_err & UFSHCD_UIC_DL_TCx_REPLAY_ERROR))) + goto out; + + if ((hba->saved_err & UIC_ERROR) && + (hba->saved_uic_err & UFSHCD_UIC_DL_NAC_RECEIVED_ERROR)) { + int err; + /* + * wait for 50ms to see if we can get any other errors or not. + */ + spin_unlock_irqrestore(hba->host->host_lock, flags); + msleep(50); + spin_lock_irqsave(hba->host->host_lock, flags); + + /* + * now check if we have got any other severe errors other than + * DL NAC error? + */ + if ((hba->saved_err & INT_FATAL_ERRORS) || + ((hba->saved_err & UIC_ERROR) && + (hba->saved_uic_err & ~UFSHCD_UIC_DL_NAC_RECEIVED_ERROR))) + goto out; + + /* + * As DL NAC is the only error received so far, send out NOP + * command to confirm if link is still active or not. + * - If we don't get any response then do error recovery. + * - If we get response then clear the DL NAC error bit. + */ + + spin_unlock_irqrestore(hba->host->host_lock, flags); + err = ufshcd_verify_dev_init(hba); + spin_lock_irqsave(hba->host->host_lock, flags); + + if (err) + goto out; + + /* Link seems to be alive hence ignore the DL NAC errors */ + if (hba->saved_uic_err == UFSHCD_UIC_DL_NAC_RECEIVED_ERROR) + hba->saved_err &= ~UIC_ERROR; + /* clear NAC error */ + hba->saved_uic_err &= ~UFSHCD_UIC_DL_NAC_RECEIVED_ERROR; + if (!hba->saved_uic_err) { + err_handling = false; + goto out; + } + } +out: + spin_unlock_irqrestore(hba->host->host_lock, flags); + return err_handling; +} + +/** * ufshcd_err_handler - handle UFS errors that require s/w attention * @work: pointer to work structure */ @@ -3864,6 +3937,17 @@ static void ufshcd_err_handler(struct work_struct *work) /* Complete requests that have door-bell cleared by h/w */ ufshcd_complete_requests(hba); + + if (hba->dev_quirks & UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS) { + bool ret; + + spin_unlock_irqrestore(hba->host->host_lock, flags); + /* release the lock as ufshcd_quirk_dl_nac_errors() may sleep */ + ret = ufshcd_quirk_dl_nac_errors(hba); + spin_lock_irqsave(hba->host->host_lock, flags); + if (!ret) + goto skip_err_handling; + } if ((hba->saved_err & INT_FATAL_ERRORS) || ((hba->saved_err & UIC_ERROR) && (hba->saved_uic_err & (UFSHCD_UIC_DL_PA_INIT_ERROR | @@ -3939,6 +4023,7 @@ skip_pending_xfer_clear: hba->saved_uic_err = 0; } +skip_err_handling: if (!needs_reset) { hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL; if (hba->saved_err || hba->saved_uic_err) @@ -3967,6 +4052,14 @@ static void ufshcd_update_uic_error(struct ufs_hba *hba) reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DATA_LINK_LAYER); if (reg & UIC_DATA_LINK_LAYER_ERROR_PA_INIT) hba->uic_error |= UFSHCD_UIC_DL_PA_INIT_ERROR; + else if (hba->dev_quirks & + UFS_DEVICE_QUIRK_RECOVERY_FROM_DL_NAC_ERRORS) { + if (reg & UIC_DATA_LINK_LAYER_ERROR_NAC_RECEIVED) + hba->uic_error |= + UFSHCD_UIC_DL_NAC_RECEIVED_ERROR; + else if (reg & UIC_DATA_LINK_LAYER_ERROR_TCx_REPLAY_TIMEOUT) + hba->uic_error |= UFSHCD_UIC_DL_TCx_REPLAY_ERROR; + } /* UIC NL/TL/DME errors needs software retry */ reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_NETWORK_LAYER); diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h index 0ae0967..2b05bfb 100644 --- a/drivers/scsi/ufs/ufshci.h +++ b/drivers/scsi/ufs/ufshci.h @@ -170,6 +170,8 @@ enum { #define UIC_DATA_LINK_LAYER_ERROR UFS_BIT(31) #define UIC_DATA_LINK_LAYER_ERROR_CODE_MASK 0x7FFF #define UIC_DATA_LINK_LAYER_ERROR_PA_INIT 0x2000 +#define UIC_DATA_LINK_LAYER_ERROR_NAC_RECEIVED 0x0001 +#define UIC_DATA_LINK_LAYER_ERROR_TCx_REPLAY_TIMEOUT 0x0002 /* UECN - Host UIC Error Code Network Layer 40h */ #define UIC_NETWORK_LAYER_ERROR UFS_BIT(31)