From patchwork Sun Nov 27 11:32:32 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cao jin
X-Patchwork-Id: 9448779
From: Cao jin
Date: Sun, 27 Nov 2016 19:32:32 +0800
Message-ID: <1480246353-10297-10-git-send-email-caoj.fnst@cn.fujitsu.com>
In-Reply-To: <1480246353-10297-1-git-send-email-caoj.fnst@cn.fujitsu.com>
References: <1480246353-10297-1-git-send-email-caoj.fnst@cn.fujitsu.com>
Subject: [Qemu-devel] [PATCH v10 09/10] vfio-pci: pass the aer error to guest
Cc: Chen Fan, izumi.taku@jp.fujitsu.com, alex.williamson@redhat.com, Dou Liyang, mst@redhat.com

From: Chen Fan

When a physical device hits an uncorrectable error, the vfio_pci driver
signals the corresponding QEMU vfio-pci device through the eventfd that
the device registered. QEMU's error eventfd handler is then invoked in
the event loop.
The handler constructs an AER message and passes it to the root port.
The root port then raises an interrupt to signal the guest, and the
guest driver performs the recovery.

Signed-off-by: Chen Fan
Signed-off-by: Dou Liyang
Signed-off-by: Cao jin
---
 hw/vfio/pci.c | 52 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d9236ed..8bdd889 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2716,18 +2716,56 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
+    PCIDevice *dev = &vdev->pdev;
+    Error *err = NULL;
+    PCIEAERMsg msg = {
+        .severity = 0,
+        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+    };
+    int len;
+    uint64_t uncor_status;
+
+    /* Read the uncorrectable error status from vfio-pci driver via the eventfd */
+    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
+    if (len != sizeof(uncor_status)) {
+        error_report("vfio-pci: uncor error status reading returns"
+                     " invalid number of bytes: %d", len);
+        return;
+    }
+
+    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+        goto stop;
+    }
+
+    /*
+     * In case the actual hardware configuration has been changed,
+     * should re-check the bus reset capability here.
+     */
+    vfio_check_hot_bus_reset(vdev, &err);
+    if (err) {
+        error_report_err(err);
+        goto stop;
+    }
+
+    /* Complete the aer msg and send it to root port */
+    if (dev->exp.aer_cap) {
+        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+        bool isfatal = uncor_status &
+                       pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
 
-    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
+        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
+
+        error_report("vfio-pci device %d sending AER to root port. uncor"
+                     " status = 0x%"PRIx64, dev->devfn, uncor_status);
+        pcie_aer_msg(dev, &msg);
         return;
     }
 
+stop:
     /*
-     * TBD. Retrieve the error details and decide what action
-     * needs to be taken. One of the actions could be to pass
-     * the error to the guest and have the guest driver recover
-     * from the error. This requires that PCIe capabilities be
-     * exposed to the guest. For now, we just terminate the
-     * guest to contain the error.
+     * If the aer capability is not exposed to the guest, we just
+     * terminate the guest to contain the error.
      */
 
     error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
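
For reviewers, the two values the handler fills into the AER message can be
sketched in isolation. This is only an illustration, not code from the patch:
aer_source_id() and aer_severity() are hypothetical helpers, and the
ROOT_CMD_* macros are local stand-ins assumed to match the
PCI_ERR_ROOT_CMD_NONFATAL_EN and PCI_ERR_ROOT_CMD_FATAL_EN bits from
pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (assumed values) for the PCI_ERR_ROOT_CMD_* bits. */
#define ROOT_CMD_NONFATAL_EN 0x00000002u
#define ROOT_CMD_FATAL_EN    0x00000004u

/*
 * Hypothetical helper: build the source ID the same way the patch does,
 * with the bus number in bits 15:8 and devfn (slot << 3 | func) in bits 7:0.
 */
static inline uint16_t aer_source_id(uint8_t bus, uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    uint8_t devfn = (uint8_t)((slot << 3) | func);
    return (uint16_t)((bus << 8) | devfn);
}

/*
 * Hypothetical helper: the error is treated as fatal when any bit set in
 * the reported uncorrectable status is also set in the severity register.
 */
static inline uint32_t aer_severity(uint64_t uncor_status, uint32_t uncor_sever)
{
    bool is_fatal = (uncor_status & uncor_sever) != 0;
    return is_fatal ? ROOT_CMD_FATAL_EN : ROOT_CMD_NONFATAL_EN;
}
```

With these helpers, a status bit that is not marked in the severity register
yields the non-fatal value, which mirrors the isfatal test in the hunk above.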