From patchwork Sun Aug 27 17:40:50 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sinan Kaya X-Patchwork-Id: 9923969 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 74F23604D9 for ; Sun, 27 Aug 2017 17:41:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6160A285DE for ; Sun, 27 Aug 2017 17:41:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5633428640; Sun, 27 Aug 2017 17:41:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F06A1285DE for ; Sun, 27 Aug 2017 17:41:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751336AbdH0RlH (ORCPT ); Sun, 27 Aug 2017 13:41:07 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:42690 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751146AbdH0RlE (ORCPT ); Sun, 27 Aug 2017 13:41:04 -0400 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 181D560731; Sun, 27 Aug 2017 17:41:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1503855664; bh=hrKWq7cNzJjcUOpB6WWD58XupwSseUI5Er5ARz5p6LE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ET1UDThDKzMWDm24cCDWrchfuo2S+I41ulG0J/m22O1pyUhfnmeZEb4cZ7uLi/yG6 R5r1k7z5dS5OPi1jmMO2s3ahARaXKUO5qpisrgHlNFjenMPvvYyFzFyUoJ/RV2ta4s eIUgSL2TqNowuDJPJGZef0VD6lirJNDWZYY7CSFs= Received: from drakthul.qualcomm.com (global_nat1_iad_fw.qualcomm.com [129.46.232.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: okaya@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 7AF7F6072D; Sun, 27 Aug 2017 17:41:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1503855663; bh=hrKWq7cNzJjcUOpB6WWD58XupwSseUI5Er5ARz5p6LE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=m2v9fpKsB2b6L34t2TEg7dBDhCNwyjMgSojRuSBHDBITk7usMlfpnpCg/xHMKgAW6 GLEbzhtjGibZAq5tK5RehGZXVljIofX1T47J96CN5ak8VUPQv8I8dZhR6Q/F1dnJpz vVXga1TiJAVYCq6oaBIWVK2WmMZA7uDkBhNpBLpY= DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 7AF7F6072D Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=okaya@codeaurora.org From: Sinan Kaya To: linux-pci@vger.kernel.org, timur@codeaurora.org, alex.williamson@redhat.com Cc: linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Sinan Kaya , Bjorn Helgaas , linux-kernel@vger.kernel.org Subject: [PATCH V13 3/4] PCI: Handle CRS ('device not ready') returned by device after FLR Date: Sun, 27 Aug 2017 13:40:50 -0400 Message-Id: <1503855651-17409-3-git-send-email-okaya@codeaurora.org> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1503855651-17409-1-git-send-email-okaya@codeaurora.org> References: <1503855651-17409-1-git-send-email-okaya@codeaurora.org> Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sporadic reset issues have been observed with Intel 750 NVMe drive while assigning the physical function to the guest machine. The sequence of events observed is as follows: - perform a Function Level Reset (FLR) - sleep up to 1000ms total - read ~0 from PCI_COMMAND - warn that the device didn't return from FLR - touch the device before it's ready - device drops config writes when we restore register settings - incomplete register restore leaves device in inconsistent state - device probe fails because device is in inconsistent state After reset, an endpoint may respond to config requests with Configuration Request Retry Status (CRS) to indicate that it is not ready to accept new requests. See PCIe r3.1, sec 2.3.1 and 6.6.2. Increase the timeout value from 1 second to 60 seconds to cover the period where device responds with CRS and also report polling progress. Signed-off-by: Sinan Kaya --- drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 15 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index af0cc34..abab64b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3811,27 +3811,49 @@ int pci_wait_for_pending_transaction(struct pci_dev *dev) } EXPORT_SYMBOL(pci_wait_for_pending_transaction); -/* - * We should only need to wait 100ms after FLR, but some devices take longer. - * Wait for up to 1000ms for config space to return something other than -1. - * Intel IGD requires this when an LCD panel is attached. We read the 2nd - * dword because VFs don't implement the 1st dword. - */ static void pci_flr_wait(struct pci_dev *dev) { - int i = 0; + int delay = 1, timeout = 60000; u32 id; - do { - msleep(100); + /* + * Per PCIe r3.1, sec 6.6.2, a device must complete an FLR within + * 100ms, but may silently discard requests while the FLR is in + * progress. Wait 100ms before trying to access the device. + */ + msleep(100); + + /* + * After 100ms, the device should not silently discard config + * requests, but it may still indicate that it needs more time by + * responding to them with CRS completions. The Root Port will + * generally synthesize ~0 data to complete the read (except when + * CRS SV is enabled and the read was for the Vendor ID; in that + * case it synthesizes 0x0001 data). + * + * Wait for the device to return a non-CRS completion. Read the + * Command register instead of Vendor ID so we don't have to + * contend with the CRS SV value. + */ + pci_read_config_dword(dev, PCI_COMMAND, &id); + while (id == ~0) { + if (delay > timeout) { + dev_warn(&dev->dev, "not ready %dms after FLR; giving up\n", + delay - 1); + return; + } + + if (delay > 1000) + dev_info(&dev->dev, "not ready %dms after FLR; waiting\n", + delay - 1); + + msleep(delay); + delay *= 2; pci_read_config_dword(dev, PCI_COMMAND, &id); - } while (i++ < 10 && id == ~0); + } - if (id == ~0) - dev_warn(&dev->dev, "Failed to return from FLR\n"); - else if (i > 1) - dev_info(&dev->dev, "Required additional %dms to return from FLR\n", - (i - 1) * 100); + if (delay > 1000) + dev_info(&dev->dev, "ready %dms after FLR\n", delay - 1); } /**