From patchwork Thu Nov 21 11:23:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257935 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3530D6C1 for ; Fri, 22 Nov 2019 11:42:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1F3EC20708 for ; Fri, 22 Nov 2019 11:42:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727706AbfKVLmJ (ORCPT ); Fri, 22 Nov 2019 06:42:09 -0500 Received: from mga01.intel.com ([192.55.52.88]:15269 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726548AbfKVLmI (ORCPT ); Fri, 22 Nov 2019 06:42:08 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:08 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110473" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:05 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 01/10] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header Date: Thu, 21 Nov 2019 19:23:38 +0800 Message-Id: <1574335427-3763-2-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch fix an issue regards to always_inline. e.g.: "error: inlining failed in call to always_inline ‘vfio_pci_is_vga’: function body not available". Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci.c | 14 -------------- drivers/vfio/pci/vfio_pci_private.h | 14 ++++++++++++++ 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 0220616..79460c3 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -54,15 +54,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(disable_idle_d3, "Disable using the PCI D3 low power state for idle, unused devices"); -static inline bool vfio_vga_disabled(void) -{ -#ifdef CONFIG_VFIO_PCI_VGA - return disable_vga; -#else - return true; -#endif -} - /* * Our VGA arbiter participation is limited since we don't know anything * about the device itself. However, if the device is the only VGA device @@ -102,11 +93,6 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) return decodes; } -static inline bool vfio_pci_is_vga(struct pci_dev *pdev) -{ - return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; -} - static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) { struct resource *res; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index ee6ee91..f12d92c 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -130,6 +130,20 @@ struct vfio_pci_device { #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev))) #define irq_is(vdev, type) (vdev->irq_type == type) +static inline bool vfio_pci_is_vga(struct pci_dev *pdev) +{ + return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; +} + +static inline bool vfio_vga_disabled(void) +{ +#ifdef CONFIG_VFIO_PCI_VGA + return disable_vga; +#else + return true; +#endif +} + extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev); extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev); From patchwork Thu Nov 21 11:23:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257953 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 656FE138C for ; Fri, 22 Nov 2019 11:42:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4FF5B2071F for ; Fri, 22 Nov 2019 11:42:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727752AbfKVLmN (ORCPT ); Fri, 22 Nov 2019 06:42:13 -0500 Received: from mga01.intel.com ([192.55.52.88]:15269 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726548AbfKVLmL (ORCPT ); Fri, 22 Nov 2019 06:42:11 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:11 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110481" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:08 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 02/10] vfio_pci: refine user config reference in vfio-pci module Date: Thu, 21 Nov 2019 19:23:39 +0800 Message-Id: <1574335427-3763-3-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds three fields in struct vfio_pci_device to pass the user configs of vfio-pci module to some functions which could be common in future usage. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++++++++--------- drivers/vfio/pci/vfio_pci_private.h | 12 ++++++++++-- 2 files changed, 34 insertions(+), 11 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 79460c3..b04e43a 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -69,7 +69,8 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) unsigned char max_busnr; unsigned int decodes; - if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus)) + if (single_vga || !vfio_vga_disabled(vdev) || + pci_is_root_bus(pdev->bus)) return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; @@ -273,7 +274,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev) if (!vdev->pci_saved_state) pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__); - if (likely(!nointxmask)) { + if (likely(!vdev->nointxmask)) { if (vfio_pci_nointx(pdev)) { pci_info(pdev, "Masking broken INTx support\n"); vdev->nointx = true; @@ -310,7 +311,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev) } else vdev->msix_bar = 0xFF; - if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev)) + if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev)) vdev->has_vga = true; @@ -445,10 +446,17 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) vfio_pci_try_bus_reset(vdev); - if (!disable_idle_d3) + if (!vdev->disable_idle_d3) vfio_pci_set_power_state(vdev, PCI_D3hot); } +void vfio_pci_refresh_config(struct vfio_pci_device *vdev, + bool nointxmask, bool disable_idle_d3) +{ + vdev->nointxmask = nointxmask; + vdev->disable_idle_d3 = disable_idle_d3; +} + static void vfio_pci_release(void *device_data) { struct vfio_pci_device *vdev = device_data; @@ -473,6 +481,8 @@ static int vfio_pci_open(void *device_data) if (!try_module_get(THIS_MODULE)) return -ENODEV; + vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3); + mutex_lock(&vdev->reflck->lock); if (!vdev->refcnt) { @@ -1313,6 +1323,11 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) spin_lock_init(&vdev->irqlock); mutex_init(&vdev->ioeventfds_lock); INIT_LIST_HEAD(&vdev->ioeventfds_list); + vdev->nointxmask = nointxmask; +#ifdef CONFIG_VFIO_PCI_VGA + vdev->disable_vga = disable_vga; +#endif + vdev->disable_idle_d3 = disable_idle_d3; ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); if (ret) { @@ -1337,7 +1352,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) vfio_pci_probe_power_state(vdev); - if (!disable_idle_d3) { + if (!vdev->disable_idle_d3) { /* * pci-core sets the device power state to an unknown value at * bootup and after being removed from a driver. The only @@ -1368,7 +1383,7 @@ static void vfio_pci_remove(struct pci_dev *pdev) kfree(vdev->region); mutex_destroy(&vdev->ioeventfds_lock); - if (!disable_idle_d3) + if (!vdev->disable_idle_d3) vfio_pci_set_power_state(vdev, PCI_D0); kfree(vdev->pm_save); @@ -1603,7 +1618,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) if (!ret) { tmp->needs_reset = false; - if (tmp != vdev && !disable_idle_d3) + if (tmp != vdev && !tmp->disable_idle_d3) vfio_pci_set_power_state(tmp, PCI_D3hot); } @@ -1619,7 +1634,7 @@ static void __exit vfio_pci_cleanup(void) vfio_pci_uninit_perm_bits(); } -static void __init vfio_pci_fill_ids(void) +static void __init vfio_pci_fill_ids(char *ids) { char *p, *id; int rc; @@ -1674,7 +1689,7 @@ static int __init vfio_pci_init(void) if (ret) goto out_driver; - vfio_pci_fill_ids(); + vfio_pci_fill_ids(ids); return 0; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index f12d92c..eae8d94 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -122,6 +122,11 @@ struct vfio_pci_device { struct list_head dummy_resources_list; struct mutex ioeventfds_lock; struct list_head ioeventfds_list; + bool nointxmask; +#ifdef CONFIG_VFIO_PCI_VGA + bool disable_vga; +#endif + bool disable_idle_d3; }; #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX) @@ -135,15 +140,18 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev) return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; } -static inline bool vfio_vga_disabled(void) +static inline bool vfio_vga_disabled(struct vfio_pci_device *vdev) { #ifdef CONFIG_VFIO_PCI_VGA - return disable_vga; + return vdev->disable_vga; #else return true; #endif } +extern void vfio_pci_refresh_config(struct vfio_pci_device *vdev, + bool nointxmask, bool disable_idle_d3); + extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev); extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev); From patchwork Thu Nov 21 11:23:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257937 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0558A138C for ; Fri, 22 Nov 2019 11:42:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E33DA20718 for ; Fri, 22 Nov 2019 11:42:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727783AbfKVLmQ (ORCPT ); Fri, 22 Nov 2019 06:42:16 -0500 Received: from mga01.intel.com ([192.55.52.88]:15269 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727770AbfKVLmO (ORCPT ); Fri, 22 Nov 2019 06:42:14 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110486" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:11 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 03/10] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c Date: Thu, 21 Nov 2019 19:23:40 +0800 Message-Id: <1574335427-3763-4-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch replaces the vfio_pci_driver reference in vfio_pci.c with pci_dev_driver(vdev->pdev) which is more helpful to make the functions be generic to module types. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index b04e43a..2096e66 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1460,24 +1460,25 @@ static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck) static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) { - struct vfio_pci_reflck **preflck = data; + struct vfio_pci_device *vdev = data; + struct vfio_pci_reflck **preflck = &vdev->reflck; struct vfio_device *device; - struct vfio_pci_device *vdev; + struct vfio_pci_device *tmp; device = vfio_device_get_from_dev(&pdev->dev); if (!device) return 0; - if (pci_dev_driver(pdev) != &vfio_pci_driver) { + if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) { vfio_device_put(device); return 0; } - vdev = vfio_device_data(device); + tmp = vfio_device_data(device); - if (vdev->reflck) { - vfio_pci_reflck_get(vdev->reflck); - *preflck = vdev->reflck; + if (tmp->reflck) { + vfio_pci_reflck_get(tmp->reflck); + *preflck = tmp->reflck; vfio_device_put(device); return 1; } @@ -1494,7 +1495,7 @@ static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) if (pci_is_root_bus(vdev->pdev->bus) || vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find, - &vdev->reflck, slot) <= 0) + vdev, slot) <= 0) vdev->reflck = vfio_pci_reflck_alloc(); mutex_unlock(&reflck_lock); @@ -1519,6 +1520,7 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) struct vfio_devices { struct vfio_device **devices; + struct vfio_pci_device *vdev; int cur_index; int max_index; }; @@ -1527,7 +1529,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; struct vfio_device *device; - struct vfio_pci_device *vdev; + struct vfio_pci_device *tmp; if (devs->cur_index == devs->max_index) return -ENOSPC; @@ -1536,15 +1538,15 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) if (!device) return -EINVAL; - if (pci_dev_driver(pdev) != &vfio_pci_driver) { + if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) { vfio_device_put(device); return -EBUSY; } - vdev = vfio_device_data(device); + tmp = vfio_device_data(device); /* Fault if the device is not unused */ - if (vdev->refcnt) { + if (tmp->refcnt) { vfio_device_put(device); return -EBUSY; } @@ -1590,6 +1592,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) if (!devs.devices) return; + devs.vdev = vdev; if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_get_unused_devs, &devs, slot)) @@ -1634,7 +1637,7 @@ static void __exit vfio_pci_cleanup(void) vfio_pci_uninit_perm_bits(); } -static void __init vfio_pci_fill_ids(char *ids) +static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) { char *p, *id; int rc; @@ -1662,7 +1665,7 @@ static void __init vfio_pci_fill_ids(char *ids) continue; } - rc = pci_add_dynid(&vfio_pci_driver, vendor, device, + rc = pci_add_dynid(driver, vendor, device, subvendor, subdevice, class, class_mask, 0); if (rc) pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n", @@ -1689,7 +1692,7 @@ static int __init vfio_pci_init(void) if (ret) goto out_driver; - vfio_pci_fill_ids(ids); + vfio_pci_fill_ids(ids, &vfio_pci_driver); return 0; From patchwork Thu Nov 21 11:23:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257941 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 095016C1 for ; Fri, 22 Nov 2019 11:42:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DF5E62071C for ; Fri, 22 Nov 2019 11:42:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726548AbfKVLmU (ORCPT ); Fri, 22 Nov 2019 06:42:20 -0500 Received: from mga01.intel.com ([192.55.52.88]:15269 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727790AbfKVLmR (ORCPT ); Fri, 22 Nov 2019 06:42:17 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110491" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:14 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 04/10] vfio_pci: make common functions be extern Date: Thu, 21 Nov 2019 19:23:41 +0800 Message-Id: <1574335427-3763-5-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch makes the common functions (module agnostic functions) in vfio_pci.c to extern. So that such functions could be moved to a common source file. *) vfio_pci_set_vga_decode *) vfio_pci_enable *) vfio_pci_disable *) vfio_pci_ioctl *) vfio_pci_read *) vfio_pci_write *) vfio_pci_mmap *) vfio_pci_request *) vfio_pci_fill_ids *) vfio_pci_reflck_attach *) vfio_pci_reflck_put *) vfio_pci_probe_power_state Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci.c | 30 +++++++++++++----------------- drivers/vfio/pci/vfio_pci_private.h | 15 +++++++++++++++ 2 files changed, 28 insertions(+), 17 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 2096e66..80a6b15 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -62,7 +62,7 @@ MODULE_PARM_DESC(disable_idle_d3, * has no way to get to it and routing can be disabled externally at the * bridge. */ -static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) +unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) { struct vfio_pci_device *vdev = opaque; struct pci_dev *tmp = NULL, *pdev = vdev->pdev; @@ -163,7 +163,6 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) } static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); -static void vfio_pci_disable(struct vfio_pci_device *vdev); /* * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND @@ -194,7 +193,7 @@ static bool vfio_pci_nointx(struct pci_dev *pdev) return false; } -static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev) +void vfio_pci_probe_power_state(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev->pdev; u16 pmcsr; @@ -245,7 +244,7 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state) return ret; } -static int vfio_pci_enable(struct vfio_pci_device *vdev) +int vfio_pci_enable(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev->pdev; int ret; @@ -352,7 +351,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev) return ret; } -static void vfio_pci_disable(struct vfio_pci_device *vdev) +void vfio_pci_disable(struct vfio_pci_device *vdev) { struct pci_dev *pdev = vdev->pdev; struct vfio_pci_dummy_resource *dummy_res, *tmp; @@ -684,8 +683,8 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, return 0; } -static long vfio_pci_ioctl(void *device_data, - unsigned int cmd, unsigned long arg) +long vfio_pci_ioctl(void *device_data, + unsigned int cmd, unsigned long arg) { struct vfio_pci_device *vdev = device_data; unsigned long minsz; @@ -1170,7 +1169,7 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf, return -EINVAL; } -static ssize_t vfio_pci_read(void *device_data, char __user *buf, +ssize_t vfio_pci_read(void *device_data, char __user *buf, size_t count, loff_t *ppos) { if (!count) @@ -1179,7 +1178,7 @@ static ssize_t vfio_pci_read(void *device_data, char __user *buf, return vfio_pci_rw(device_data, buf, count, ppos, false); } -static ssize_t vfio_pci_write(void *device_data, const char __user *buf, +ssize_t vfio_pci_write(void *device_data, const char __user *buf, size_t count, loff_t *ppos) { if (!count) @@ -1188,7 +1187,7 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf, return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true); } -static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) +int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) { struct vfio_pci_device *vdev = device_data; struct pci_dev *pdev = vdev->pdev; @@ -1250,7 +1249,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) req_len, vma->vm_page_prot); } -static void vfio_pci_request(void *device_data, unsigned int count) +void vfio_pci_request(void *device_data, unsigned int count) { struct vfio_pci_device *vdev = device_data; struct pci_dev *pdev = vdev->pdev; @@ -1282,9 +1281,6 @@ static const struct vfio_device_ops vfio_pci_ops = { .request = vfio_pci_request, }; -static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev); -static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck); - static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct vfio_pci_device *vdev; @@ -1487,7 +1483,7 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) return 0; } -static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) +int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) { bool slot = !pci_probe_reset_slot(vdev->pdev->slot); @@ -1513,7 +1509,7 @@ static void vfio_pci_reflck_release(struct kref *kref) mutex_unlock(&reflck_lock); } -static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) +void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) { kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock); } @@ -1637,7 +1633,7 @@ static void __exit vfio_pci_cleanup(void) vfio_pci_uninit_perm_bits(); } -static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) +void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) { char *p, *id; int rc; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index eae8d94..670268e 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -185,6 +185,21 @@ extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state); +extern unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga); +extern int vfio_pci_enable(struct vfio_pci_device *vdev); +extern void vfio_pci_disable(struct vfio_pci_device *vdev); +extern long vfio_pci_ioctl(void *device_data, + unsigned int cmd, unsigned long arg); +extern ssize_t vfio_pci_read(void *device_data, char __user *buf, + size_t count, loff_t *ppos); +extern ssize_t vfio_pci_write(void *device_data, const char __user *buf, + size_t count, loff_t *ppos); +extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma); +extern void vfio_pci_request(void *device_data, unsigned int count); +extern void vfio_pci_fill_ids(char *ids, struct pci_driver *driver); +extern int vfio_pci_reflck_attach(struct vfio_pci_device *vdev); +extern void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck); +extern void vfio_pci_probe_power_state(struct vfio_pci_device *vdev); #ifdef CONFIG_VFIO_PCI_IGD extern int vfio_pci_igd_init(struct vfio_pci_device *vdev); From patchwork Thu Nov 21 11:23:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257939 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BCF50138C for ; Fri, 22 Nov 2019 11:42:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 79CD420704 for ; Fri, 22 Nov 2019 11:42:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727922AbfKVLmW (ORCPT ); Fri, 22 Nov 2019 06:42:22 -0500 Received: from mga01.intel.com ([192.55.52.88]:15269 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727869AbfKVLmV (ORCPT ); Fri, 22 Nov 2019 06:42:21 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:20 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110497" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:17 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 05/10] vfio_pci: duplicate vfio_pci.c Date: Thu, 21 Nov 2019 19:23:42 +0800 Message-Id: <1574335427-3763-6-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch has no code change, just a file copy. In following patches, vfio_pci_common.c will be modified to only include the common functions and related static functions in original vfio_pci.c. Meanwhile, vfio_pci.c will be modified to only include vfio-pci module specific codes. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci_common.c | 1706 ++++++++++++++++++++++++++++++++++++ 1 file changed, 1706 insertions(+) create mode 100644 drivers/vfio/pci/vfio_pci_common.c diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c new file mode 100644 index 0000000..80a6b15 --- /dev/null +++ b/drivers/vfio/pci/vfio_pci_common.c @@ -0,0 +1,1706 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, pugs@cisco.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#define dev_fmt pr_fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vfio_pci_private.h" + +#define DRIVER_VERSION "0.2" +#define DRIVER_AUTHOR "Alex Williamson " +#define DRIVER_DESC "VFIO PCI - User Level meta-driver" + +static char ids[1024] __initdata; +module_param_string(ids, ids, sizeof(ids), 0); +MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified"); + +static bool nointxmask; +module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(nointxmask, + "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag."); + +#ifdef CONFIG_VFIO_PCI_VGA +static bool disable_vga; +module_param(disable_vga, bool, S_IRUGO); +MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci"); +#endif + +static bool disable_idle_d3; +module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(disable_idle_d3, + "Disable using the PCI D3 low power state for idle, unused devices"); + +/* + * Our VGA arbiter participation is limited since we don't know anything + * about the device itself. However, if the device is the only VGA device + * downstream of a bridge and VFIO VGA support is disabled, then we can + * safely return legacy VGA IO and memory as not decoded since the user + * has no way to get to it and routing can be disabled externally at the + * bridge. + */ +unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) +{ + struct vfio_pci_device *vdev = opaque; + struct pci_dev *tmp = NULL, *pdev = vdev->pdev; + unsigned char max_busnr; + unsigned int decodes; + + if (single_vga || !vfio_vga_disabled(vdev) || + pci_is_root_bus(pdev->bus)) + return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; + + max_busnr = pci_bus_max_busnr(pdev->bus); + decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM; + + while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) { + if (tmp == pdev || + pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) || + pci_is_root_bus(tmp->bus)) + continue; + + if (tmp->bus->number >= pdev->bus->number && + tmp->bus->number <= max_busnr) { + pci_dev_put(tmp); + decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; + break; + } + } + + return decodes; +} + +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) +{ + struct resource *res; + int bar; + struct vfio_pci_dummy_resource *dummy_res; + + INIT_LIST_HEAD(&vdev->dummy_resources_list); + + for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { + res = vdev->pdev->resource + bar; + + if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP)) + goto no_mmap; + + if (!(res->flags & IORESOURCE_MEM)) + goto no_mmap; + + /* + * The PCI core shouldn't set up a resource with a + * type but zero size. But there may be bugs that + * cause us to do that. + */ + if (!resource_size(res)) + goto no_mmap; + + if (resource_size(res) >= PAGE_SIZE) { + vdev->bar_mmap_supported[bar] = true; + continue; + } + + if (!(res->start & ~PAGE_MASK)) { + /* + * Add a dummy resource to reserve the remainder + * of the exclusive page in case that hot-add + * device's bar is assigned into it. + */ + dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL); + if (dummy_res == NULL) + goto no_mmap; + + dummy_res->resource.name = "vfio sub-page reserved"; + dummy_res->resource.start = res->end + 1; + dummy_res->resource.end = res->start + PAGE_SIZE - 1; + dummy_res->resource.flags = res->flags; + if (request_resource(res->parent, + &dummy_res->resource)) { + kfree(dummy_res); + goto no_mmap; + } + dummy_res->index = bar; + list_add(&dummy_res->res_next, + &vdev->dummy_resources_list); + vdev->bar_mmap_supported[bar] = true; + continue; + } + /* + * Here we don't handle the case when the BAR is not page + * aligned because we can't expect the BAR will be + * assigned into the same location in a page in guest + * when we passthrough the BAR. And it's hard to access + * this BAR in userspace because we have no way to get + * the BAR's location in a page. + */ +no_mmap: + vdev->bar_mmap_supported[bar] = false; + } +} + +static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); + +/* + * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND + * _and_ the ability detect when the device is asserting INTx via PCI_STATUS. + * If a device implements the former but not the latter we would typically + * expect broken_intx_masking be set and require an exclusive interrupt. + * However since we do have control of the device's ability to assert INTx, + * we can instead pretend that the device does not implement INTx, virtualizing + * the pin register to report zero and maintaining DisINTx set on the host. + */ +static bool vfio_pci_nointx(struct pci_dev *pdev) +{ + switch (pdev->vendor) { + case PCI_VENDOR_ID_INTEL: + switch (pdev->device) { + /* All i40e (XL710/X710/XXV710) 10/20/25/40GbE NICs */ + case 0x1572: + case 0x1574: + case 0x1580 ... 0x1581: + case 0x1583 ... 0x158b: + case 0x37d0 ... 0x37d2: + return true; + default: + return false; + } + } + + return false; +} + +void vfio_pci_probe_power_state(struct vfio_pci_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + u16 pmcsr; + + if (!pdev->pm_cap) + return; + + pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr); + + vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET); +} + +/* + * pci_set_power_state() wrapper handling devices which perform a soft reset on + * D3->D0 transition. Save state prior to D0/1/2->D3, stash it on the vdev, + * restore when returned to D0. Saved separately from pci_saved_state for use + * by PM capability emulation and separately from pci_dev internal saved state + * to avoid it being overwritten and consumed around other resets. + */ +int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state) +{ + struct pci_dev *pdev = vdev->pdev; + bool needs_restore = false, needs_save = false; + int ret; + + if (vdev->needs_pm_restore) { + if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) { + pci_save_state(pdev); + needs_save = true; + } + + if (pdev->current_state >= PCI_D3hot && state <= PCI_D0) + needs_restore = true; + } + + ret = pci_set_power_state(pdev, state); + + if (!ret) { + /* D3 might be unsupported via quirk, skip unless in D3 */ + if (needs_save && pdev->current_state >= PCI_D3hot) { + vdev->pm_save = pci_store_saved_state(pdev); + } else if (needs_restore) { + pci_load_and_free_saved_state(pdev, &vdev->pm_save); + pci_restore_state(pdev); + } + } + + return ret; +} + +int vfio_pci_enable(struct vfio_pci_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + int ret; + u16 cmd; + u8 msix_pos; + + vfio_pci_set_power_state(vdev, PCI_D0); + + /* Don't allow our initial saved state to include busmaster */ + pci_clear_master(pdev); + + ret = pci_enable_device(pdev); + if (ret) + return ret; + + /* If reset fails because of the device lock, fail this path entirely */ + ret = pci_try_reset_function(pdev); + if (ret == -EAGAIN) { + pci_disable_device(pdev); + return ret; + } + + vdev->reset_works = !ret; + pci_save_state(pdev); + vdev->pci_saved_state = pci_store_saved_state(pdev); + if (!vdev->pci_saved_state) + pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__); + + if (likely(!vdev->nointxmask)) { + if (vfio_pci_nointx(pdev)) { + pci_info(pdev, "Masking broken INTx support\n"); + vdev->nointx = true; + pci_intx(pdev, 0); + } else + vdev->pci_2_3 = pci_intx_mask_supported(pdev); + } + + pci_read_config_word(pdev, PCI_COMMAND, &cmd); + if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) { + cmd &= ~PCI_COMMAND_INTX_DISABLE; + pci_write_config_word(pdev, PCI_COMMAND, cmd); + } + + ret = vfio_config_init(vdev); + if (ret) { + kfree(vdev->pci_saved_state); + vdev->pci_saved_state = NULL; + pci_disable_device(pdev); + return ret; + } + + msix_pos = pdev->msix_cap; + if (msix_pos) { + u16 flags; + u32 table; + + pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags); + pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table); + + vdev->msix_bar = table & PCI_MSIX_TABLE_BIR; + vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET; + vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16; + } else + vdev->msix_bar = 0xFF; + + if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev)) + vdev->has_vga = true; + + + if (vfio_pci_is_vga(pdev) && + pdev->vendor == PCI_VENDOR_ID_INTEL && + IS_ENABLED(CONFIG_VFIO_PCI_IGD)) { + ret = vfio_pci_igd_init(vdev); + if (ret) { + pci_warn(pdev, "Failed to setup Intel IGD regions\n"); + goto disable_exit; + } + } + + if (pdev->vendor == PCI_VENDOR_ID_NVIDIA && + IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { + ret = vfio_pci_nvdia_v100_nvlink2_init(vdev); + if (ret && ret != -ENODEV) { + pci_warn(pdev, "Failed to setup NVIDIA NV2 RAM region\n"); + goto disable_exit; + } + } + + if (pdev->vendor == PCI_VENDOR_ID_IBM && + IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { + ret = vfio_pci_ibm_npu2_init(vdev); + if (ret && ret != -ENODEV) { + pci_warn(pdev, "Failed to setup NVIDIA NV2 ATSD region\n"); + goto disable_exit; + } + } + + vfio_pci_probe_mmaps(vdev); + + return 0; + +disable_exit: + vfio_pci_disable(vdev); + return ret; +} + +void vfio_pci_disable(struct vfio_pci_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + struct vfio_pci_dummy_resource *dummy_res, *tmp; + struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; + int i, bar; + + /* Stop the device from further DMA */ + pci_clear_master(pdev); + + vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE | + VFIO_IRQ_SET_ACTION_TRIGGER, + vdev->irq_type, 0, 0, NULL); + + /* Device closed, don't need mutex here */ + list_for_each_entry_safe(ioeventfd, ioeventfd_tmp, + &vdev->ioeventfds_list, next) { + vfio_virqfd_disable(&ioeventfd->virqfd); + list_del(&ioeventfd->next); + kfree(ioeventfd); + } + vdev->ioeventfds_nr = 0; + + vdev->virq_disabled = false; + + for (i = 0; i < vdev->num_regions; i++) + vdev->region[i].ops->release(vdev, &vdev->region[i]); + + vdev->num_regions = 0; + kfree(vdev->region); + vdev->region = NULL; /* don't krealloc a freed pointer */ + + vfio_config_free(vdev); + + for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { + if (!vdev->barmap[bar]) + continue; + pci_iounmap(pdev, vdev->barmap[bar]); + pci_release_selected_regions(pdev, 1 << bar); + vdev->barmap[bar] = NULL; + } + + list_for_each_entry_safe(dummy_res, tmp, + &vdev->dummy_resources_list, res_next) { + list_del(&dummy_res->res_next); + release_resource(&dummy_res->resource); + kfree(dummy_res); + } + + vdev->needs_reset = true; + + /* + * If we have saved state, restore it. If we can reset the device, + * even better. Resetting with current state seems better than + * nothing, but saving and restoring current state without reset + * is just busy work. + */ + if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) { + pci_info(pdev, "%s: Couldn't reload saved state\n", __func__); + + if (!vdev->reset_works) + goto out; + + pci_save_state(pdev); + } + + /* + * Disable INTx and MSI, presumably to avoid spurious interrupts + * during reset. Stolen from pci_reset_function() + */ + pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE); + + /* + * Try to get the locks ourselves to prevent a deadlock. The + * success of this is dependent on being able to lock the device, + * which is not always possible. + * We can not use the "try" reset interface here, which will + * overwrite the previously restored configuration information. + */ + if (vdev->reset_works && pci_cfg_access_trylock(pdev)) { + if (device_trylock(&pdev->dev)) { + if (!__pci_reset_function_locked(pdev)) + vdev->needs_reset = false; + device_unlock(&pdev->dev); + } + pci_cfg_access_unlock(pdev); + } + + pci_restore_state(pdev); +out: + pci_disable_device(pdev); + + vfio_pci_try_bus_reset(vdev); + + if (!vdev->disable_idle_d3) + vfio_pci_set_power_state(vdev, PCI_D3hot); +} + +void vfio_pci_refresh_config(struct vfio_pci_device *vdev, + bool nointxmask, bool disable_idle_d3) +{ + vdev->nointxmask = nointxmask; + vdev->disable_idle_d3 = disable_idle_d3; +} + +static void vfio_pci_release(void *device_data) +{ + struct vfio_pci_device *vdev = device_data; + + mutex_lock(&vdev->reflck->lock); + + if (!(--vdev->refcnt)) { + vfio_spapr_pci_eeh_release(vdev->pdev); + vfio_pci_disable(vdev); + } + + mutex_unlock(&vdev->reflck->lock); + + module_put(THIS_MODULE); +} + +static int vfio_pci_open(void *device_data) +{ + struct vfio_pci_device *vdev = device_data; + int ret = 0; + + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3); + + mutex_lock(&vdev->reflck->lock); + + if (!vdev->refcnt) { + ret = vfio_pci_enable(vdev); + if (ret) + goto error; + + vfio_spapr_pci_eeh_open(vdev->pdev); + } + vdev->refcnt++; +error: + mutex_unlock(&vdev->reflck->lock); + if (ret) + module_put(THIS_MODULE); + return ret; +} + +static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) +{ + if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { + u8 pin; + + if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || + vdev->nointx || vdev->pdev->is_virtfn) + return 0; + + pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin); + + return pin ? 1 : 0; + } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) { + u8 pos; + u16 flags; + + pos = vdev->pdev->msi_cap; + if (pos) { + pci_read_config_word(vdev->pdev, + pos + PCI_MSI_FLAGS, &flags); + return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1); + } + } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) { + u8 pos; + u16 flags; + + pos = vdev->pdev->msix_cap; + if (pos) { + pci_read_config_word(vdev->pdev, + pos + PCI_MSIX_FLAGS, &flags); + + return (flags & PCI_MSIX_FLAGS_QSIZE) + 1; + } + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) { + if (pci_is_pcie(vdev->pdev)) + return 1; + } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) { + return 1; + } + + return 0; +} + +static int vfio_pci_count_devs(struct pci_dev *pdev, void *data) +{ + (*(int *)data)++; + return 0; +} + +struct vfio_pci_fill_info { + int max; + int cur; + struct vfio_pci_dependent_device *devices; +}; + +static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_fill_info *fill = data; + struct iommu_group *iommu_group; + + if (fill->cur == fill->max) + return -EAGAIN; /* Something changed, try again */ + + iommu_group = iommu_group_get(&pdev->dev); + if (!iommu_group) + return -EPERM; /* Cannot reset non-isolated devices */ + + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group); + fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus); + fill->devices[fill->cur].bus = pdev->bus->number; + fill->devices[fill->cur].devfn = pdev->devfn; + fill->cur++; + iommu_group_put(iommu_group); + return 0; +} + +struct vfio_pci_group_entry { + struct vfio_group *group; + int id; +}; + +struct vfio_pci_group_info { + int count; + struct vfio_pci_group_entry *groups; +}; + +static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_group_info *info = data; + struct iommu_group *group; + int id, i; + + group = iommu_group_get(&pdev->dev); + if (!group) + return -EPERM; + + id = iommu_group_id(group); + + for (i = 0; i < info->count; i++) + if (info->groups[i].id == id) + break; + + iommu_group_put(group); + + return (i == info->count) ? -EINVAL : 0; +} + +static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot) +{ + for (; pdev; pdev = pdev->bus->self) + if (pdev->bus == slot->bus) + return (pdev->slot == slot); + return false; +} + +struct vfio_pci_walk_info { + int (*fn)(struct pci_dev *, void *data); + void *data; + struct pci_dev *pdev; + bool slot; + int ret; +}; + +static int vfio_pci_walk_wrapper(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_walk_info *walk = data; + + if (!walk->slot || vfio_pci_dev_below_slot(pdev, walk->pdev->slot)) + walk->ret = walk->fn(pdev, walk->data); + + return walk->ret; +} + +static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev, + int (*fn)(struct pci_dev *, + void *data), void *data, + bool slot) +{ + struct vfio_pci_walk_info walk = { + .fn = fn, .data = data, .pdev = pdev, .slot = slot, .ret = 0, + }; + + pci_walk_bus(pdev->bus, vfio_pci_walk_wrapper, &walk); + + return walk.ret; +} + +static int msix_mmappable_cap(struct vfio_pci_device *vdev, + struct vfio_info_cap *caps) +{ + struct vfio_info_cap_header header = { + .id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, + .version = 1 + }; + + return vfio_info_add_capability(caps, &header, sizeof(header)); +} + +int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, + unsigned int type, unsigned int subtype, + const struct vfio_pci_regops *ops, + size_t size, u32 flags, void *data) +{ + struct vfio_pci_region *region; + + region = krealloc(vdev->region, + (vdev->num_regions + 1) * sizeof(*region), + GFP_KERNEL); + if (!region) + return -ENOMEM; + + vdev->region = region; + vdev->region[vdev->num_regions].type = type; + vdev->region[vdev->num_regions].subtype = subtype; + vdev->region[vdev->num_regions].ops = ops; + vdev->region[vdev->num_regions].size = size; + vdev->region[vdev->num_regions].flags = flags; + vdev->region[vdev->num_regions].data = data; + + vdev->num_regions++; + + return 0; +} + +long vfio_pci_ioctl(void *device_data, + unsigned int cmd, unsigned long arg) +{ + struct vfio_pci_device *vdev = device_data; + unsigned long minsz; + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + info.flags = VFIO_DEVICE_FLAGS_PCI; + + if (vdev->reset_works) + info.flags |= VFIO_DEVICE_FLAGS_RESET; + + info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { + struct pci_dev *pdev = vdev->pdev; + struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + int i, ret; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = pdev->cfg_size; + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = pci_resource_len(pdev, info.index); + if (!info.size) { + info.flags = 0; + break; + } + + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + if (vdev->bar_mmap_supported[info.index]) { + info.flags |= VFIO_REGION_INFO_FLAG_MMAP; + if (info.index == vdev->msix_bar) { + ret = msix_mmappable_cap(vdev, &caps); + if (ret) + return ret; + } + } + + break; + case VFIO_PCI_ROM_REGION_INDEX: + { + void __iomem *io; + size_t size; + u16 orig_cmd; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.flags = 0; + + /* Report the BAR size, not the ROM size */ + info.size = pci_resource_len(pdev, info.index); + if (!info.size) { + /* Shadow ROMs appear as PCI option ROMs */ + if (pdev->resource[PCI_ROM_RESOURCE].flags & + IORESOURCE_ROM_SHADOW) + info.size = 0x20000; + else + break; + } + + /* + * Is it really there? Enable memory decode for + * implicit access in pci_map_rom(). + */ + pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd); + pci_write_config_word(pdev, PCI_COMMAND, + orig_cmd | PCI_COMMAND_MEMORY); + + io = pci_map_rom(pdev, &size); + if (io) { + info.flags = VFIO_REGION_INFO_FLAG_READ; + pci_unmap_rom(pdev, io); + } else { + info.size = 0; + } + + pci_write_config_word(pdev, PCI_COMMAND, orig_cmd); + break; + } + case VFIO_PCI_VGA_REGION_INDEX: + if (!vdev->has_vga) + return -EINVAL; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0xc0000; + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + + break; + default: + { + struct vfio_region_info_cap_type cap_type = { + .header.id = VFIO_REGION_INFO_CAP_TYPE, + .header.version = 1 }; + + if (info.index >= + VFIO_PCI_NUM_REGIONS + vdev->num_regions) + return -EINVAL; + info.index = array_index_nospec(info.index, + VFIO_PCI_NUM_REGIONS + + vdev->num_regions); + + i = info.index - VFIO_PCI_NUM_REGIONS; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vdev->region[i].size; + info.flags = vdev->region[i].flags; + + cap_type.type = vdev->region[i].type; + cap_type.subtype = vdev->region[i].subtype; + + ret = vfio_info_add_capability(&caps, &cap_type.header, + sizeof(cap_type)); + if (ret) + return ret; + + if (vdev->region[i].ops->add_capability) { + ret = vdev->region[i].ops->add_capability(vdev, + &vdev->region[i], &caps); + if (ret) + return ret; + } + } + } + + if (caps.size) { + info.flags |= VFIO_REGION_INFO_FLAG_CAPS; + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + info.cap_offset = 0; + } else { + vfio_info_cap_shift(&caps, sizeof(info)); + if (copy_to_user((void __user *)arg + + sizeof(info), caps.buf, + caps.size)) { + kfree(caps.buf); + return -EFAULT; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { + struct vfio_irq_info info; + + minsz = offsetofend(struct vfio_irq_info, count); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) + return -EINVAL; + + switch (info.index) { + case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX: + case VFIO_PCI_REQ_IRQ_INDEX: + break; + case VFIO_PCI_ERR_IRQ_INDEX: + if (pci_is_pcie(vdev->pdev)) + break; + /* fall through */ + default: + return -EINVAL; + } + + info.flags = VFIO_IRQ_INFO_EVENTFD; + + info.count = vfio_pci_get_irq_count(vdev, info.index); + + if (info.index == VFIO_PCI_INTX_IRQ_INDEX) + info.flags |= (VFIO_IRQ_INFO_MASKABLE | + VFIO_IRQ_INFO_AUTOMASKED); + else + info.flags |= VFIO_IRQ_INFO_NORESIZE; + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_SET_IRQS) { + struct vfio_irq_set hdr; + u8 *data = NULL; + int max, ret = 0; + size_t data_size = 0; + + minsz = offsetofend(struct vfio_irq_set, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + max = vfio_pci_get_irq_count(vdev, hdr.index); + + ret = vfio_set_irqs_validate_and_prepare(&hdr, max, + VFIO_PCI_NUM_IRQS, &data_size); + if (ret) + return ret; + + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), + data_size); + if (IS_ERR(data)) + return PTR_ERR(data); + } + + mutex_lock(&vdev->igate); + + ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, + hdr.start, hdr.count, data); + + mutex_unlock(&vdev->igate); + kfree(data); + + return ret; + + } else if (cmd == VFIO_DEVICE_RESET) { + return vdev->reset_works ? + pci_try_reset_function(vdev->pdev) : -EINVAL; + + } else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) { + struct vfio_pci_hot_reset_info hdr; + struct vfio_pci_fill_info fill = { 0 }; + struct vfio_pci_dependent_device *devices = NULL; + bool slot = false; + int ret = 0; + + minsz = offsetofend(struct vfio_pci_hot_reset_info, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + if (hdr.argsz < minsz) + return -EINVAL; + + hdr.flags = 0; + + /* Can we do a slot or bus reset or neither? */ + if (!pci_probe_reset_slot(vdev->pdev->slot)) + slot = true; + else if (pci_probe_reset_bus(vdev->pdev->bus)) + return -ENODEV; + + /* How many devices are affected? */ + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_count_devs, + &fill.max, slot); + if (ret) + return ret; + + WARN_ON(!fill.max); /* Should always be at least one */ + + /* + * If there's enough space, fill it now, otherwise return + * -ENOSPC and the number of devices affected. + */ + if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) { + ret = -ENOSPC; + hdr.count = fill.max; + goto reset_info_exit; + } + + devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL); + if (!devices) + return -ENOMEM; + + fill.devices = devices; + + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_fill_devs, + &fill, slot); + + /* + * If a device was removed between counting and filling, + * we may come up short of fill.max. If a device was + * added, we'll have a return of -EAGAIN above. + */ + if (!ret) + hdr.count = fill.cur; + +reset_info_exit: + if (copy_to_user((void __user *)arg, &hdr, minsz)) + ret = -EFAULT; + + if (!ret) { + if (copy_to_user((void __user *)(arg + minsz), devices, + hdr.count * sizeof(*devices))) + ret = -EFAULT; + } + + kfree(devices); + return ret; + + } else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) { + struct vfio_pci_hot_reset hdr; + int32_t *group_fds; + struct vfio_pci_group_entry *groups; + struct vfio_pci_group_info info; + bool slot = false; + int i, count = 0, ret = 0; + + minsz = offsetofend(struct vfio_pci_hot_reset, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + if (hdr.argsz < minsz || hdr.flags) + return -EINVAL; + + /* Can we do a slot or bus reset or neither? */ + if (!pci_probe_reset_slot(vdev->pdev->slot)) + slot = true; + else if (pci_probe_reset_bus(vdev->pdev->bus)) + return -ENODEV; + + /* + * We can't let userspace give us an arbitrarily large + * buffer to copy, so verify how many we think there + * could be. Note groups can have multiple devices so + * one group per device is the max. + */ + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_count_devs, + &count, slot); + if (ret) + return ret; + + /* Somewhere between 1 and count is OK */ + if (!hdr.count || hdr.count > count) + return -EINVAL; + + group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL); + groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL); + if (!group_fds || !groups) { + kfree(group_fds); + kfree(groups); + return -ENOMEM; + } + + if (copy_from_user(group_fds, (void __user *)(arg + minsz), + hdr.count * sizeof(*group_fds))) { + kfree(group_fds); + kfree(groups); + return -EFAULT; + } + + /* + * For each group_fd, get the group through the vfio external + * user interface and store the group and iommu ID. This + * ensures the group is held across the reset. + */ + for (i = 0; i < hdr.count; i++) { + struct vfio_group *group; + struct fd f = fdget(group_fds[i]); + if (!f.file) { + ret = -EBADF; + break; + } + + group = vfio_group_get_external_user(f.file); + fdput(f); + if (IS_ERR(group)) { + ret = PTR_ERR(group); + break; + } + + groups[i].group = group; + groups[i].id = vfio_external_user_iommu_id(group); + } + + kfree(group_fds); + + /* release reference to groups on error */ + if (ret) + goto hot_reset_release; + + info.count = hdr.count; + info.groups = groups; + + /* + * Test whether all the affected devices are contained + * by the set of groups provided by the user. + */ + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_validate_devs, + &info, slot); + if (!ret) + /* User has access, do the reset */ + ret = pci_reset_bus(vdev->pdev); + +hot_reset_release: + for (i--; i >= 0; i--) + vfio_group_put_external_user(groups[i].group); + + kfree(groups); + return ret; + } else if (cmd == VFIO_DEVICE_IOEVENTFD) { + struct vfio_device_ioeventfd ioeventfd; + int count; + + minsz = offsetofend(struct vfio_device_ioeventfd, fd); + + if (copy_from_user(&ioeventfd, (void __user *)arg, minsz)) + return -EFAULT; + + if (ioeventfd.argsz < minsz) + return -EINVAL; + + if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK) + return -EINVAL; + + count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK; + + if (hweight8(count) != 1 || ioeventfd.fd < -1) + return -EINVAL; + + return vfio_pci_ioeventfd(vdev, ioeventfd.offset, + ioeventfd.data, count, ioeventfd.fd); + } + + return -ENOTTY; +} + +static ssize_t vfio_pci_rw(void *device_data, char __user *buf, + size_t count, loff_t *ppos, bool iswrite) +{ + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + struct vfio_pci_device *vdev = device_data; + + if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) + return -EINVAL; + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite); + + case VFIO_PCI_ROM_REGION_INDEX: + if (iswrite) + return -EINVAL; + return vfio_pci_bar_rw(vdev, buf, count, ppos, false); + + case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: + return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite); + + case VFIO_PCI_VGA_REGION_INDEX: + return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite); + default: + index -= VFIO_PCI_NUM_REGIONS; + return vdev->region[index].ops->rw(vdev, buf, + count, ppos, iswrite); + } + + return -EINVAL; +} + +ssize_t vfio_pci_read(void *device_data, char __user *buf, + size_t count, loff_t *ppos) +{ + if (!count) + return 0; + + return vfio_pci_rw(device_data, buf, count, ppos, false); +} + +ssize_t vfio_pci_write(void *device_data, const char __user *buf, + size_t count, loff_t *ppos) +{ + if (!count) + return 0; + + return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true); +} + +int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) +{ + struct vfio_pci_device *vdev = device_data; + struct pci_dev *pdev = vdev->pdev; + unsigned int index; + u64 phys_len, req_len, pgoff, req_start; + int ret; + + index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); + + if (vma->vm_end < vma->vm_start) + return -EINVAL; + if ((vma->vm_flags & VM_SHARED) == 0) + return -EINVAL; + if (index >= VFIO_PCI_NUM_REGIONS) { + int regnum = index - VFIO_PCI_NUM_REGIONS; + struct vfio_pci_region *region = vdev->region + regnum; + + if (region && region->ops && region->ops->mmap && + (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) + return region->ops->mmap(vdev, region, vma); + return -EINVAL; + } + if (index >= VFIO_PCI_ROM_REGION_INDEX) + return -EINVAL; + if (!vdev->bar_mmap_supported[index]) + return -EINVAL; + + phys_len = PAGE_ALIGN(pci_resource_len(pdev, index)); + req_len = vma->vm_end - vma->vm_start; + pgoff = vma->vm_pgoff & + ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); + req_start = pgoff << PAGE_SHIFT; + + if (req_start + req_len > phys_len) + return -EINVAL; + + /* + * Even though we don't make use of the barmap for the mmap, + * we need to request the region and the barmap tracks that. + */ + if (!vdev->barmap[index]) { + ret = pci_request_selected_regions(pdev, + 1 << index, "vfio-pci"); + if (ret) + return ret; + + vdev->barmap[index] = pci_iomap(pdev, index, 0); + if (!vdev->barmap[index]) { + pci_release_selected_regions(pdev, 1 << index); + return -ENOMEM; + } + } + + vma->vm_private_data = vdev; + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; + + return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, + req_len, vma->vm_page_prot); +} + +void vfio_pci_request(void *device_data, unsigned int count) +{ + struct vfio_pci_device *vdev = device_data; + struct pci_dev *pdev = vdev->pdev; + + mutex_lock(&vdev->igate); + + if (vdev->req_trigger) { + if (!(count % 10)) + pci_notice_ratelimited(pdev, + "Relaying device request to user (#%u)\n", + count); + eventfd_signal(vdev->req_trigger, 1); + } else if (count == 0) { + pci_warn(pdev, + "No device request channel registered, blocked until released by user\n"); + } + + mutex_unlock(&vdev->igate); +} + +static const struct vfio_device_ops vfio_pci_ops = { + .name = "vfio-pci", + .open = vfio_pci_open, + .release = vfio_pci_release, + .ioctl = vfio_pci_ioctl, + .read = vfio_pci_read, + .write = vfio_pci_write, + .mmap = vfio_pci_mmap, + .request = vfio_pci_request, +}; + +static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct vfio_pci_device *vdev; + struct iommu_group *group; + int ret; + + if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) + return -EINVAL; + + /* + * Prevent binding to PFs with VFs enabled, this too easily allows + * userspace instance with VFs and PFs from the same device, which + * cannot work. Disabling SR-IOV here would initiate removing the + * VFs, which would unbind the driver, which is prone to blocking + * if that VF is also in use by vfio-pci. Just reject these PFs + * and let the user sort it out. + */ + if (pci_num_vf(pdev)) { + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); + return -EBUSY; + } + + group = vfio_iommu_group_get(&pdev->dev); + if (!group) + return -EINVAL; + + vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); + if (!vdev) { + vfio_iommu_group_put(group, &pdev->dev); + return -ENOMEM; + } + + vdev->pdev = pdev; + vdev->irq_type = VFIO_PCI_NUM_IRQS; + mutex_init(&vdev->igate); + spin_lock_init(&vdev->irqlock); + mutex_init(&vdev->ioeventfds_lock); + INIT_LIST_HEAD(&vdev->ioeventfds_list); + vdev->nointxmask = nointxmask; +#ifdef CONFIG_VFIO_PCI_VGA + vdev->disable_vga = disable_vga; +#endif + vdev->disable_idle_d3 = disable_idle_d3; + + ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); + if (ret) { + vfio_iommu_group_put(group, &pdev->dev); + kfree(vdev); + return ret; + } + + ret = vfio_pci_reflck_attach(vdev); + if (ret) { + vfio_del_group_dev(&pdev->dev); + vfio_iommu_group_put(group, &pdev->dev); + kfree(vdev); + return ret; + } + + if (vfio_pci_is_vga(pdev)) { + vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode); + vga_set_legacy_decoding(pdev, + vfio_pci_set_vga_decode(vdev, false)); + } + + vfio_pci_probe_power_state(vdev); + + if (!vdev->disable_idle_d3) { + /* + * pci-core sets the device power state to an unknown value at + * bootup and after being removed from a driver. The only + * transition it allows from this unknown state is to D0, which + * typically happens when a driver calls pci_enable_device(). + * We're not ready to enable the device yet, but we do want to + * be able to get to D3. Therefore first do a D0 transition + * before going to D3. + */ + vfio_pci_set_power_state(vdev, PCI_D0); + vfio_pci_set_power_state(vdev, PCI_D3hot); + } + + return ret; +} + +static void vfio_pci_remove(struct pci_dev *pdev) +{ + struct vfio_pci_device *vdev; + + vdev = vfio_del_group_dev(&pdev->dev); + if (!vdev) + return; + + vfio_pci_reflck_put(vdev->reflck); + + vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev); + kfree(vdev->region); + mutex_destroy(&vdev->ioeventfds_lock); + + if (!vdev->disable_idle_d3) + vfio_pci_set_power_state(vdev, PCI_D0); + + kfree(vdev->pm_save); + kfree(vdev); + + if (vfio_pci_is_vga(pdev)) { + vga_client_register(pdev, NULL, NULL, NULL); + vga_set_legacy_decoding(pdev, + VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); + } +} + +static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct vfio_pci_device *vdev; + struct vfio_device *device; + + device = vfio_device_get_from_dev(&pdev->dev); + if (device == NULL) + return PCI_ERS_RESULT_DISCONNECT; + + vdev = vfio_device_data(device); + if (vdev == NULL) { + vfio_device_put(device); + return PCI_ERS_RESULT_DISCONNECT; + } + + mutex_lock(&vdev->igate); + + if (vdev->err_trigger) + eventfd_signal(vdev->err_trigger, 1); + + mutex_unlock(&vdev->igate); + + vfio_device_put(device); + + return PCI_ERS_RESULT_CAN_RECOVER; +} + +static const struct pci_error_handlers vfio_err_handlers = { + .error_detected = vfio_pci_aer_err_detected, +}; + +static struct pci_driver vfio_pci_driver = { + .name = "vfio-pci", + .id_table = NULL, /* only dynamic ids */ + .probe = vfio_pci_probe, + .remove = vfio_pci_remove, + .err_handler = &vfio_err_handlers, +}; + +static DEFINE_MUTEX(reflck_lock); + +static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) +{ + struct vfio_pci_reflck *reflck; + + reflck = kzalloc(sizeof(*reflck), GFP_KERNEL); + if (!reflck) + return ERR_PTR(-ENOMEM); + + kref_init(&reflck->kref); + mutex_init(&reflck->lock); + + return reflck; +} + +static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck) +{ + kref_get(&reflck->kref); +} + +static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_device *vdev = data; + struct vfio_pci_reflck **preflck = &vdev->reflck; + struct vfio_device *device; + struct vfio_pci_device *tmp; + + device = vfio_device_get_from_dev(&pdev->dev); + if (!device) + return 0; + + if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) { + vfio_device_put(device); + return 0; + } + + tmp = vfio_device_data(device); + + if (tmp->reflck) { + vfio_pci_reflck_get(tmp->reflck); + *preflck = tmp->reflck; + vfio_device_put(device); + return 1; + } + + vfio_device_put(device); + return 0; +} + +int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) +{ + bool slot = !pci_probe_reset_slot(vdev->pdev->slot); + + mutex_lock(&reflck_lock); + + if (pci_is_root_bus(vdev->pdev->bus) || + vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find, + vdev, slot) <= 0) + vdev->reflck = vfio_pci_reflck_alloc(); + + mutex_unlock(&reflck_lock); + + return PTR_ERR_OR_ZERO(vdev->reflck); +} + +static void vfio_pci_reflck_release(struct kref *kref) +{ + struct vfio_pci_reflck *reflck = container_of(kref, + struct vfio_pci_reflck, + kref); + + kfree(reflck); + mutex_unlock(&reflck_lock); +} + +void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) +{ + kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock); +} + +struct vfio_devices { + struct vfio_device **devices; + struct vfio_pci_device *vdev; + int cur_index; + int max_index; +}; + +static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) +{ + struct vfio_devices *devs = data; + struct vfio_device *device; + struct vfio_pci_device *tmp; + + if (devs->cur_index == devs->max_index) + return -ENOSPC; + + device = vfio_device_get_from_dev(&pdev->dev); + if (!device) + return -EINVAL; + + if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) { + vfio_device_put(device); + return -EBUSY; + } + + tmp = vfio_device_data(device); + + /* Fault if the device is not unused */ + if (tmp->refcnt) { + vfio_device_put(device); + return -EBUSY; + } + + devs->devices[devs->cur_index++] = device; + return 0; +} + +/* + * If a bus or slot reset is available for the provided device and: + * - All of the devices affected by that bus or slot reset are unused + * (!refcnt) + * - At least one of the affected devices is marked dirty via + * needs_reset (such as by lack of FLR support) + * Then attempt to perform that bus or slot reset. Callers are required + * to hold vdev->reflck->lock, protecting the bus/slot reset group from + * concurrent opens. A vfio_device reference is acquired for each device + * to prevent unbinds during the reset operation. + * + * NB: vfio-core considers a group to be viable even if some devices are + * bound to drivers like pci-stub or pcieport. Here we require all devices + * to be bound to vfio_pci since that's the only way we can be sure they + * stay put. + */ +static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) +{ + struct vfio_devices devs = { .cur_index = 0 }; + int i = 0, ret = -EINVAL; + bool slot = false; + struct vfio_pci_device *tmp; + + if (!pci_probe_reset_slot(vdev->pdev->slot)) + slot = true; + else if (pci_probe_reset_bus(vdev->pdev->bus)) + return; + + if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs, + &i, slot) || !i) + return; + + devs.max_index = i; + devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL); + if (!devs.devices) + return; + + devs.vdev = vdev; + if (vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_get_unused_devs, + &devs, slot)) + goto put_devs; + + /* Does at least one need a reset? */ + for (i = 0; i < devs.cur_index; i++) { + tmp = vfio_device_data(devs.devices[i]); + if (tmp->needs_reset) { + ret = pci_reset_bus(vdev->pdev); + break; + } + } + +put_devs: + for (i = 0; i < devs.cur_index; i++) { + tmp = vfio_device_data(devs.devices[i]); + + /* + * If reset was successful, affected devices no longer need + * a reset and we should return all the collateral devices + * to low power. If not successful, we either didn't reset + * the bus or timed out waiting for it, so let's not touch + * the power state. + */ + if (!ret) { + tmp->needs_reset = false; + + if (tmp != vdev && !tmp->disable_idle_d3) + vfio_pci_set_power_state(tmp, PCI_D3hot); + } + + vfio_device_put(devs.devices[i]); + } + + kfree(devs.devices); +} + +static void __exit vfio_pci_cleanup(void) +{ + pci_unregister_driver(&vfio_pci_driver); + vfio_pci_uninit_perm_bits(); +} + +void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) +{ + char *p, *id; + int rc; + + /* no ids passed actually */ + if (ids[0] == '\0') + return; + + /* add ids specified in the module parameter */ + p = ids; + while ((id = strsep(&p, ","))) { + unsigned int vendor, device, subvendor = PCI_ANY_ID, + subdevice = PCI_ANY_ID, class = 0, class_mask = 0; + int fields; + + if (!strlen(id)) + continue; + + fields = sscanf(id, "%x:%x:%x:%x:%x:%x", + &vendor, &device, &subvendor, &subdevice, + &class, &class_mask); + + if (fields < 2) { + pr_warn("invalid id string \"%s\"\n", id); + continue; + } + + rc = pci_add_dynid(driver, vendor, device, + subvendor, subdevice, class, class_mask, 0); + if (rc) + pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n", + vendor, device, subvendor, subdevice, + class, class_mask, rc); + else + pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n", + vendor, device, subvendor, subdevice, + class, class_mask); + } +} + +static int __init vfio_pci_init(void) +{ + int ret; + + /* Allocate shared config space permision data used by all devices */ + ret = vfio_pci_init_perm_bits(); + if (ret) + return ret; + + /* Register and scan for devices */ + ret = pci_register_driver(&vfio_pci_driver); + if (ret) + goto out_driver; + + vfio_pci_fill_ids(ids, &vfio_pci_driver); + + return 0; + +out_driver: + vfio_pci_uninit_perm_bits(); + return ret; +} + +module_init(vfio_pci_init); +module_exit(vfio_pci_cleanup); + +MODULE_VERSION(DRIVER_VERSION); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR(DRIVER_AUTHOR); +MODULE_DESCRIPTION(DRIVER_DESC); From patchwork Thu Nov 21 11:23:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257951 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64EC7138C for ; Fri, 22 Nov 2019 11:42:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 468F220708 for ; Fri, 22 Nov 2019 11:42:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727969AbfKVLma (ORCPT ); Fri, 22 Nov 2019 06:42:30 -0500 Received: from mga01.intel.com ([192.55.52.88]:15269 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727869AbfKVLmY (ORCPT ); Fri, 22 Nov 2019 06:42:24 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110502" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:21 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 06/10] vfio_pci: shrink vfio_pci_common.c Date: Thu, 21 Nov 2019 19:23:43 +0800 Message-Id: <1574335427-3763-7-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch removes the vfio-pci module specific codes in vfio_pci_common.c to make vfio_pci_common.c be a common source file. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci_common.c | 235 ------------------------------------- 1 file changed, 235 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c index 80a6b15..4664733a 100644 --- a/drivers/vfio/pci/vfio_pci_common.c +++ b/drivers/vfio/pci/vfio_pci_common.c @@ -30,30 +30,6 @@ #include "vfio_pci_private.h" -#define DRIVER_VERSION "0.2" -#define DRIVER_AUTHOR "Alex Williamson " -#define DRIVER_DESC "VFIO PCI - User Level meta-driver" - -static char ids[1024] __initdata; -module_param_string(ids, ids, sizeof(ids), 0); -MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified"); - -static bool nointxmask; -module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); -MODULE_PARM_DESC(nointxmask, - "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag."); - -#ifdef CONFIG_VFIO_PCI_VGA -static bool disable_vga; -module_param(disable_vga, bool, S_IRUGO); -MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci"); -#endif - -static bool disable_idle_d3; -module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); -MODULE_PARM_DESC(disable_idle_d3, - "Disable using the PCI D3 low power state for idle, unused devices"); - /* * Our VGA arbiter participation is limited since we don't know anything * about the device itself. However, if the device is the only VGA device @@ -456,49 +432,6 @@ void vfio_pci_refresh_config(struct vfio_pci_device *vdev, vdev->disable_idle_d3 = disable_idle_d3; } -static void vfio_pci_release(void *device_data) -{ - struct vfio_pci_device *vdev = device_data; - - mutex_lock(&vdev->reflck->lock); - - if (!(--vdev->refcnt)) { - vfio_spapr_pci_eeh_release(vdev->pdev); - vfio_pci_disable(vdev); - } - - mutex_unlock(&vdev->reflck->lock); - - module_put(THIS_MODULE); -} - -static int vfio_pci_open(void *device_data) -{ - struct vfio_pci_device *vdev = device_data; - int ret = 0; - - if (!try_module_get(THIS_MODULE)) - return -ENODEV; - - vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3); - - mutex_lock(&vdev->reflck->lock); - - if (!vdev->refcnt) { - ret = vfio_pci_enable(vdev); - if (ret) - goto error; - - vfio_spapr_pci_eeh_open(vdev->pdev); - } - vdev->refcnt++; -error: - mutex_unlock(&vdev->reflck->lock); - if (ret) - module_put(THIS_MODULE); - return ret; -} - static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) { if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { @@ -1270,129 +1203,6 @@ void vfio_pci_request(void *device_data, unsigned int count) mutex_unlock(&vdev->igate); } -static const struct vfio_device_ops vfio_pci_ops = { - .name = "vfio-pci", - .open = vfio_pci_open, - .release = vfio_pci_release, - .ioctl = vfio_pci_ioctl, - .read = vfio_pci_read, - .write = vfio_pci_write, - .mmap = vfio_pci_mmap, - .request = vfio_pci_request, -}; - -static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) -{ - struct vfio_pci_device *vdev; - struct iommu_group *group; - int ret; - - if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) - return -EINVAL; - - /* - * Prevent binding to PFs with VFs enabled, this too easily allows - * userspace instance with VFs and PFs from the same device, which - * cannot work. Disabling SR-IOV here would initiate removing the - * VFs, which would unbind the driver, which is prone to blocking - * if that VF is also in use by vfio-pci. Just reject these PFs - * and let the user sort it out. - */ - if (pci_num_vf(pdev)) { - pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); - return -EBUSY; - } - - group = vfio_iommu_group_get(&pdev->dev); - if (!group) - return -EINVAL; - - vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); - if (!vdev) { - vfio_iommu_group_put(group, &pdev->dev); - return -ENOMEM; - } - - vdev->pdev = pdev; - vdev->irq_type = VFIO_PCI_NUM_IRQS; - mutex_init(&vdev->igate); - spin_lock_init(&vdev->irqlock); - mutex_init(&vdev->ioeventfds_lock); - INIT_LIST_HEAD(&vdev->ioeventfds_list); - vdev->nointxmask = nointxmask; -#ifdef CONFIG_VFIO_PCI_VGA - vdev->disable_vga = disable_vga; -#endif - vdev->disable_idle_d3 = disable_idle_d3; - - ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); - if (ret) { - vfio_iommu_group_put(group, &pdev->dev); - kfree(vdev); - return ret; - } - - ret = vfio_pci_reflck_attach(vdev); - if (ret) { - vfio_del_group_dev(&pdev->dev); - vfio_iommu_group_put(group, &pdev->dev); - kfree(vdev); - return ret; - } - - if (vfio_pci_is_vga(pdev)) { - vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode); - vga_set_legacy_decoding(pdev, - vfio_pci_set_vga_decode(vdev, false)); - } - - vfio_pci_probe_power_state(vdev); - - if (!vdev->disable_idle_d3) { - /* - * pci-core sets the device power state to an unknown value at - * bootup and after being removed from a driver. The only - * transition it allows from this unknown state is to D0, which - * typically happens when a driver calls pci_enable_device(). - * We're not ready to enable the device yet, but we do want to - * be able to get to D3. Therefore first do a D0 transition - * before going to D3. - */ - vfio_pci_set_power_state(vdev, PCI_D0); - vfio_pci_set_power_state(vdev, PCI_D3hot); - } - - return ret; -} - -static void vfio_pci_remove(struct pci_dev *pdev) -{ - struct vfio_pci_device *vdev; - - vdev = vfio_del_group_dev(&pdev->dev); - if (!vdev) - return; - - vfio_pci_reflck_put(vdev->reflck); - - vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev); - kfree(vdev->region); - mutex_destroy(&vdev->ioeventfds_lock); - - if (!vdev->disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D0); - - kfree(vdev->pm_save); - kfree(vdev); - - if (vfio_pci_is_vga(pdev)) { - vga_client_register(pdev, NULL, NULL, NULL); - vga_set_legacy_decoding(pdev, - VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | - VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); - } -} - static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, pci_channel_state_t state) { @@ -1425,14 +1235,6 @@ static const struct pci_error_handlers vfio_err_handlers = { .error_detected = vfio_pci_aer_err_detected, }; -static struct pci_driver vfio_pci_driver = { - .name = "vfio-pci", - .id_table = NULL, /* only dynamic ids */ - .probe = vfio_pci_probe, - .remove = vfio_pci_remove, - .err_handler = &vfio_err_handlers, -}; - static DEFINE_MUTEX(reflck_lock); static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) @@ -1627,12 +1429,6 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) kfree(devs.devices); } -static void __exit vfio_pci_cleanup(void) -{ - pci_unregister_driver(&vfio_pci_driver); - vfio_pci_uninit_perm_bits(); -} - void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) { char *p, *id; @@ -1673,34 +1469,3 @@ void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) class, class_mask); } } - -static int __init vfio_pci_init(void) -{ - int ret; - - /* Allocate shared config space permision data used by all devices */ - ret = vfio_pci_init_perm_bits(); - if (ret) - return ret; - - /* Register and scan for devices */ - ret = pci_register_driver(&vfio_pci_driver); - if (ret) - goto out_driver; - - vfio_pci_fill_ids(ids, &vfio_pci_driver); - - return 0; - -out_driver: - vfio_pci_uninit_perm_bits(); - return ret; -} - -module_init(vfio_pci_init); -module_exit(vfio_pci_cleanup); - -MODULE_VERSION(DRIVER_VERSION); -MODULE_LICENSE("GPL v2"); -MODULE_AUTHOR(DRIVER_AUTHOR); -MODULE_DESCRIPTION(DRIVER_DESC); From patchwork Thu Nov 21 11:23:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257943 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C332B6C1 for ; Fri, 22 Nov 2019 11:42:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8A96820718 for ; Fri, 22 Nov 2019 11:42:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727947AbfKVLm3 (ORCPT ); Fri, 22 Nov 2019 06:42:29 -0500 Received: from mga01.intel.com ([192.55.52.88]:15287 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727949AbfKVLm2 (ORCPT ); Fri, 22 Nov 2019 06:42:28 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:28 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110511" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:23 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 07/10] vfio_pci: shrink vfio_pci.c Date: Thu, 21 Nov 2019 19:23:44 +0800 Message-Id: <1574335427-3763-8-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch removes the common codes in vfio_pci.c, instead, vfio-pci module will leverage the common functions implemented in vfio_pci_common.c. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/Makefile | 3 +- drivers/vfio/pci/vfio_pci.c | 1440 ----------------------------------- drivers/vfio/pci/vfio_pci_common.c | 2 +- drivers/vfio/pci/vfio_pci_private.h | 2 + 4 files changed, 5 insertions(+), 1442 deletions(-) diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index f027f8a..d94317a 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only -vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o +vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \ + vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 80a6b15..7e24da2 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -54,408 +54,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(disable_idle_d3, "Disable using the PCI D3 low power state for idle, unused devices"); -/* - * Our VGA arbiter participation is limited since we don't know anything - * about the device itself. However, if the device is the only VGA device - * downstream of a bridge and VFIO VGA support is disabled, then we can - * safely return legacy VGA IO and memory as not decoded since the user - * has no way to get to it and routing can be disabled externally at the - * bridge. - */ -unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) -{ - struct vfio_pci_device *vdev = opaque; - struct pci_dev *tmp = NULL, *pdev = vdev->pdev; - unsigned char max_busnr; - unsigned int decodes; - - if (single_vga || !vfio_vga_disabled(vdev) || - pci_is_root_bus(pdev->bus)) - return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | - VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; - - max_busnr = pci_bus_max_busnr(pdev->bus); - decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM; - - while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) { - if (tmp == pdev || - pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) || - pci_is_root_bus(tmp->bus)) - continue; - - if (tmp->bus->number >= pdev->bus->number && - tmp->bus->number <= max_busnr) { - pci_dev_put(tmp); - decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; - break; - } - } - - return decodes; -} - -static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) -{ - struct resource *res; - int bar; - struct vfio_pci_dummy_resource *dummy_res; - - INIT_LIST_HEAD(&vdev->dummy_resources_list); - - for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { - res = vdev->pdev->resource + bar; - - if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP)) - goto no_mmap; - - if (!(res->flags & IORESOURCE_MEM)) - goto no_mmap; - - /* - * The PCI core shouldn't set up a resource with a - * type but zero size. But there may be bugs that - * cause us to do that. - */ - if (!resource_size(res)) - goto no_mmap; - - if (resource_size(res) >= PAGE_SIZE) { - vdev->bar_mmap_supported[bar] = true; - continue; - } - - if (!(res->start & ~PAGE_MASK)) { - /* - * Add a dummy resource to reserve the remainder - * of the exclusive page in case that hot-add - * device's bar is assigned into it. - */ - dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL); - if (dummy_res == NULL) - goto no_mmap; - - dummy_res->resource.name = "vfio sub-page reserved"; - dummy_res->resource.start = res->end + 1; - dummy_res->resource.end = res->start + PAGE_SIZE - 1; - dummy_res->resource.flags = res->flags; - if (request_resource(res->parent, - &dummy_res->resource)) { - kfree(dummy_res); - goto no_mmap; - } - dummy_res->index = bar; - list_add(&dummy_res->res_next, - &vdev->dummy_resources_list); - vdev->bar_mmap_supported[bar] = true; - continue; - } - /* - * Here we don't handle the case when the BAR is not page - * aligned because we can't expect the BAR will be - * assigned into the same location in a page in guest - * when we passthrough the BAR. And it's hard to access - * this BAR in userspace because we have no way to get - * the BAR's location in a page. - */ -no_mmap: - vdev->bar_mmap_supported[bar] = false; - } -} - -static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); - -/* - * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND - * _and_ the ability detect when the device is asserting INTx via PCI_STATUS. - * If a device implements the former but not the latter we would typically - * expect broken_intx_masking be set and require an exclusive interrupt. - * However since we do have control of the device's ability to assert INTx, - * we can instead pretend that the device does not implement INTx, virtualizing - * the pin register to report zero and maintaining DisINTx set on the host. - */ -static bool vfio_pci_nointx(struct pci_dev *pdev) -{ - switch (pdev->vendor) { - case PCI_VENDOR_ID_INTEL: - switch (pdev->device) { - /* All i40e (XL710/X710/XXV710) 10/20/25/40GbE NICs */ - case 0x1572: - case 0x1574: - case 0x1580 ... 0x1581: - case 0x1583 ... 0x158b: - case 0x37d0 ... 0x37d2: - return true; - default: - return false; - } - } - - return false; -} - -void vfio_pci_probe_power_state(struct vfio_pci_device *vdev) -{ - struct pci_dev *pdev = vdev->pdev; - u16 pmcsr; - - if (!pdev->pm_cap) - return; - - pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr); - - vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET); -} - -/* - * pci_set_power_state() wrapper handling devices which perform a soft reset on - * D3->D0 transition. Save state prior to D0/1/2->D3, stash it on the vdev, - * restore when returned to D0. Saved separately from pci_saved_state for use - * by PM capability emulation and separately from pci_dev internal saved state - * to avoid it being overwritten and consumed around other resets. - */ -int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state) -{ - struct pci_dev *pdev = vdev->pdev; - bool needs_restore = false, needs_save = false; - int ret; - - if (vdev->needs_pm_restore) { - if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) { - pci_save_state(pdev); - needs_save = true; - } - - if (pdev->current_state >= PCI_D3hot && state <= PCI_D0) - needs_restore = true; - } - - ret = pci_set_power_state(pdev, state); - - if (!ret) { - /* D3 might be unsupported via quirk, skip unless in D3 */ - if (needs_save && pdev->current_state >= PCI_D3hot) { - vdev->pm_save = pci_store_saved_state(pdev); - } else if (needs_restore) { - pci_load_and_free_saved_state(pdev, &vdev->pm_save); - pci_restore_state(pdev); - } - } - - return ret; -} - -int vfio_pci_enable(struct vfio_pci_device *vdev) -{ - struct pci_dev *pdev = vdev->pdev; - int ret; - u16 cmd; - u8 msix_pos; - - vfio_pci_set_power_state(vdev, PCI_D0); - - /* Don't allow our initial saved state to include busmaster */ - pci_clear_master(pdev); - - ret = pci_enable_device(pdev); - if (ret) - return ret; - - /* If reset fails because of the device lock, fail this path entirely */ - ret = pci_try_reset_function(pdev); - if (ret == -EAGAIN) { - pci_disable_device(pdev); - return ret; - } - - vdev->reset_works = !ret; - pci_save_state(pdev); - vdev->pci_saved_state = pci_store_saved_state(pdev); - if (!vdev->pci_saved_state) - pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__); - - if (likely(!vdev->nointxmask)) { - if (vfio_pci_nointx(pdev)) { - pci_info(pdev, "Masking broken INTx support\n"); - vdev->nointx = true; - pci_intx(pdev, 0); - } else - vdev->pci_2_3 = pci_intx_mask_supported(pdev); - } - - pci_read_config_word(pdev, PCI_COMMAND, &cmd); - if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) { - cmd &= ~PCI_COMMAND_INTX_DISABLE; - pci_write_config_word(pdev, PCI_COMMAND, cmd); - } - - ret = vfio_config_init(vdev); - if (ret) { - kfree(vdev->pci_saved_state); - vdev->pci_saved_state = NULL; - pci_disable_device(pdev); - return ret; - } - - msix_pos = pdev->msix_cap; - if (msix_pos) { - u16 flags; - u32 table; - - pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags); - pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table); - - vdev->msix_bar = table & PCI_MSIX_TABLE_BIR; - vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET; - vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16; - } else - vdev->msix_bar = 0xFF; - - if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev)) - vdev->has_vga = true; - - - if (vfio_pci_is_vga(pdev) && - pdev->vendor == PCI_VENDOR_ID_INTEL && - IS_ENABLED(CONFIG_VFIO_PCI_IGD)) { - ret = vfio_pci_igd_init(vdev); - if (ret) { - pci_warn(pdev, "Failed to setup Intel IGD regions\n"); - goto disable_exit; - } - } - - if (pdev->vendor == PCI_VENDOR_ID_NVIDIA && - IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { - ret = vfio_pci_nvdia_v100_nvlink2_init(vdev); - if (ret && ret != -ENODEV) { - pci_warn(pdev, "Failed to setup NVIDIA NV2 RAM region\n"); - goto disable_exit; - } - } - - if (pdev->vendor == PCI_VENDOR_ID_IBM && - IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { - ret = vfio_pci_ibm_npu2_init(vdev); - if (ret && ret != -ENODEV) { - pci_warn(pdev, "Failed to setup NVIDIA NV2 ATSD region\n"); - goto disable_exit; - } - } - - vfio_pci_probe_mmaps(vdev); - - return 0; - -disable_exit: - vfio_pci_disable(vdev); - return ret; -} - -void vfio_pci_disable(struct vfio_pci_device *vdev) -{ - struct pci_dev *pdev = vdev->pdev; - struct vfio_pci_dummy_resource *dummy_res, *tmp; - struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; - int i, bar; - - /* Stop the device from further DMA */ - pci_clear_master(pdev); - - vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE | - VFIO_IRQ_SET_ACTION_TRIGGER, - vdev->irq_type, 0, 0, NULL); - - /* Device closed, don't need mutex here */ - list_for_each_entry_safe(ioeventfd, ioeventfd_tmp, - &vdev->ioeventfds_list, next) { - vfio_virqfd_disable(&ioeventfd->virqfd); - list_del(&ioeventfd->next); - kfree(ioeventfd); - } - vdev->ioeventfds_nr = 0; - - vdev->virq_disabled = false; - - for (i = 0; i < vdev->num_regions; i++) - vdev->region[i].ops->release(vdev, &vdev->region[i]); - - vdev->num_regions = 0; - kfree(vdev->region); - vdev->region = NULL; /* don't krealloc a freed pointer */ - - vfio_config_free(vdev); - - for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { - if (!vdev->barmap[bar]) - continue; - pci_iounmap(pdev, vdev->barmap[bar]); - pci_release_selected_regions(pdev, 1 << bar); - vdev->barmap[bar] = NULL; - } - - list_for_each_entry_safe(dummy_res, tmp, - &vdev->dummy_resources_list, res_next) { - list_del(&dummy_res->res_next); - release_resource(&dummy_res->resource); - kfree(dummy_res); - } - - vdev->needs_reset = true; - - /* - * If we have saved state, restore it. If we can reset the device, - * even better. Resetting with current state seems better than - * nothing, but saving and restoring current state without reset - * is just busy work. - */ - if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) { - pci_info(pdev, "%s: Couldn't reload saved state\n", __func__); - - if (!vdev->reset_works) - goto out; - - pci_save_state(pdev); - } - - /* - * Disable INTx and MSI, presumably to avoid spurious interrupts - * during reset. Stolen from pci_reset_function() - */ - pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE); - - /* - * Try to get the locks ourselves to prevent a deadlock. The - * success of this is dependent on being able to lock the device, - * which is not always possible. - * We can not use the "try" reset interface here, which will - * overwrite the previously restored configuration information. - */ - if (vdev->reset_works && pci_cfg_access_trylock(pdev)) { - if (device_trylock(&pdev->dev)) { - if (!__pci_reset_function_locked(pdev)) - vdev->needs_reset = false; - device_unlock(&pdev->dev); - } - pci_cfg_access_unlock(pdev); - } - - pci_restore_state(pdev); -out: - pci_disable_device(pdev); - - vfio_pci_try_bus_reset(vdev); - - if (!vdev->disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D3hot); -} - -void vfio_pci_refresh_config(struct vfio_pci_device *vdev, - bool nointxmask, bool disable_idle_d3) -{ - vdev->nointxmask = nointxmask; - vdev->disable_idle_d3 = disable_idle_d3; -} - static void vfio_pci_release(void *device_data) { struct vfio_pci_device *vdev = device_data; @@ -499,777 +97,6 @@ static int vfio_pci_open(void *device_data) return ret; } -static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) -{ - if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { - u8 pin; - - if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || - vdev->nointx || vdev->pdev->is_virtfn) - return 0; - - pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin); - - return pin ? 1 : 0; - } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) { - u8 pos; - u16 flags; - - pos = vdev->pdev->msi_cap; - if (pos) { - pci_read_config_word(vdev->pdev, - pos + PCI_MSI_FLAGS, &flags); - return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1); - } - } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) { - u8 pos; - u16 flags; - - pos = vdev->pdev->msix_cap; - if (pos) { - pci_read_config_word(vdev->pdev, - pos + PCI_MSIX_FLAGS, &flags); - - return (flags & PCI_MSIX_FLAGS_QSIZE) + 1; - } - } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) { - if (pci_is_pcie(vdev->pdev)) - return 1; - } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) { - return 1; - } - - return 0; -} - -static int vfio_pci_count_devs(struct pci_dev *pdev, void *data) -{ - (*(int *)data)++; - return 0; -} - -struct vfio_pci_fill_info { - int max; - int cur; - struct vfio_pci_dependent_device *devices; -}; - -static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_fill_info *fill = data; - struct iommu_group *iommu_group; - - if (fill->cur == fill->max) - return -EAGAIN; /* Something changed, try again */ - - iommu_group = iommu_group_get(&pdev->dev); - if (!iommu_group) - return -EPERM; /* Cannot reset non-isolated devices */ - - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group); - fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus); - fill->devices[fill->cur].bus = pdev->bus->number; - fill->devices[fill->cur].devfn = pdev->devfn; - fill->cur++; - iommu_group_put(iommu_group); - return 0; -} - -struct vfio_pci_group_entry { - struct vfio_group *group; - int id; -}; - -struct vfio_pci_group_info { - int count; - struct vfio_pci_group_entry *groups; -}; - -static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_group_info *info = data; - struct iommu_group *group; - int id, i; - - group = iommu_group_get(&pdev->dev); - if (!group) - return -EPERM; - - id = iommu_group_id(group); - - for (i = 0; i < info->count; i++) - if (info->groups[i].id == id) - break; - - iommu_group_put(group); - - return (i == info->count) ? -EINVAL : 0; -} - -static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot) -{ - for (; pdev; pdev = pdev->bus->self) - if (pdev->bus == slot->bus) - return (pdev->slot == slot); - return false; -} - -struct vfio_pci_walk_info { - int (*fn)(struct pci_dev *, void *data); - void *data; - struct pci_dev *pdev; - bool slot; - int ret; -}; - -static int vfio_pci_walk_wrapper(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_walk_info *walk = data; - - if (!walk->slot || vfio_pci_dev_below_slot(pdev, walk->pdev->slot)) - walk->ret = walk->fn(pdev, walk->data); - - return walk->ret; -} - -static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev, - int (*fn)(struct pci_dev *, - void *data), void *data, - bool slot) -{ - struct vfio_pci_walk_info walk = { - .fn = fn, .data = data, .pdev = pdev, .slot = slot, .ret = 0, - }; - - pci_walk_bus(pdev->bus, vfio_pci_walk_wrapper, &walk); - - return walk.ret; -} - -static int msix_mmappable_cap(struct vfio_pci_device *vdev, - struct vfio_info_cap *caps) -{ - struct vfio_info_cap_header header = { - .id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, - .version = 1 - }; - - return vfio_info_add_capability(caps, &header, sizeof(header)); -} - -int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, - unsigned int type, unsigned int subtype, - const struct vfio_pci_regops *ops, - size_t size, u32 flags, void *data) -{ - struct vfio_pci_region *region; - - region = krealloc(vdev->region, - (vdev->num_regions + 1) * sizeof(*region), - GFP_KERNEL); - if (!region) - return -ENOMEM; - - vdev->region = region; - vdev->region[vdev->num_regions].type = type; - vdev->region[vdev->num_regions].subtype = subtype; - vdev->region[vdev->num_regions].ops = ops; - vdev->region[vdev->num_regions].size = size; - vdev->region[vdev->num_regions].flags = flags; - vdev->region[vdev->num_regions].data = data; - - vdev->num_regions++; - - return 0; -} - -long vfio_pci_ioctl(void *device_data, - unsigned int cmd, unsigned long arg) -{ - struct vfio_pci_device *vdev = device_data; - unsigned long minsz; - - if (cmd == VFIO_DEVICE_GET_INFO) { - struct vfio_device_info info; - - minsz = offsetofend(struct vfio_device_info, num_irqs); - - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; - - if (info.argsz < minsz) - return -EINVAL; - - info.flags = VFIO_DEVICE_FLAGS_PCI; - - if (vdev->reset_works) - info.flags |= VFIO_DEVICE_FLAGS_RESET; - - info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions; - info.num_irqs = VFIO_PCI_NUM_IRQS; - - return copy_to_user((void __user *)arg, &info, minsz) ? - -EFAULT : 0; - - } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { - struct pci_dev *pdev = vdev->pdev; - struct vfio_region_info info; - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; - int i, ret; - - minsz = offsetofend(struct vfio_region_info, offset); - - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; - - if (info.argsz < minsz) - return -EINVAL; - - switch (info.index) { - case VFIO_PCI_CONFIG_REGION_INDEX: - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = pdev->cfg_size; - info.flags = VFIO_REGION_INFO_FLAG_READ | - VFIO_REGION_INFO_FLAG_WRITE; - break; - case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = pci_resource_len(pdev, info.index); - if (!info.size) { - info.flags = 0; - break; - } - - info.flags = VFIO_REGION_INFO_FLAG_READ | - VFIO_REGION_INFO_FLAG_WRITE; - if (vdev->bar_mmap_supported[info.index]) { - info.flags |= VFIO_REGION_INFO_FLAG_MMAP; - if (info.index == vdev->msix_bar) { - ret = msix_mmappable_cap(vdev, &caps); - if (ret) - return ret; - } - } - - break; - case VFIO_PCI_ROM_REGION_INDEX: - { - void __iomem *io; - size_t size; - u16 orig_cmd; - - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.flags = 0; - - /* Report the BAR size, not the ROM size */ - info.size = pci_resource_len(pdev, info.index); - if (!info.size) { - /* Shadow ROMs appear as PCI option ROMs */ - if (pdev->resource[PCI_ROM_RESOURCE].flags & - IORESOURCE_ROM_SHADOW) - info.size = 0x20000; - else - break; - } - - /* - * Is it really there? Enable memory decode for - * implicit access in pci_map_rom(). - */ - pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd); - pci_write_config_word(pdev, PCI_COMMAND, - orig_cmd | PCI_COMMAND_MEMORY); - - io = pci_map_rom(pdev, &size); - if (io) { - info.flags = VFIO_REGION_INFO_FLAG_READ; - pci_unmap_rom(pdev, io); - } else { - info.size = 0; - } - - pci_write_config_word(pdev, PCI_COMMAND, orig_cmd); - break; - } - case VFIO_PCI_VGA_REGION_INDEX: - if (!vdev->has_vga) - return -EINVAL; - - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = 0xc0000; - info.flags = VFIO_REGION_INFO_FLAG_READ | - VFIO_REGION_INFO_FLAG_WRITE; - - break; - default: - { - struct vfio_region_info_cap_type cap_type = { - .header.id = VFIO_REGION_INFO_CAP_TYPE, - .header.version = 1 }; - - if (info.index >= - VFIO_PCI_NUM_REGIONS + vdev->num_regions) - return -EINVAL; - info.index = array_index_nospec(info.index, - VFIO_PCI_NUM_REGIONS + - vdev->num_regions); - - i = info.index - VFIO_PCI_NUM_REGIONS; - - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = vdev->region[i].size; - info.flags = vdev->region[i].flags; - - cap_type.type = vdev->region[i].type; - cap_type.subtype = vdev->region[i].subtype; - - ret = vfio_info_add_capability(&caps, &cap_type.header, - sizeof(cap_type)); - if (ret) - return ret; - - if (vdev->region[i].ops->add_capability) { - ret = vdev->region[i].ops->add_capability(vdev, - &vdev->region[i], &caps); - if (ret) - return ret; - } - } - } - - if (caps.size) { - info.flags |= VFIO_REGION_INFO_FLAG_CAPS; - if (info.argsz < sizeof(info) + caps.size) { - info.argsz = sizeof(info) + caps.size; - info.cap_offset = 0; - } else { - vfio_info_cap_shift(&caps, sizeof(info)); - if (copy_to_user((void __user *)arg + - sizeof(info), caps.buf, - caps.size)) { - kfree(caps.buf); - return -EFAULT; - } - info.cap_offset = sizeof(info); - } - - kfree(caps.buf); - } - - return copy_to_user((void __user *)arg, &info, minsz) ? - -EFAULT : 0; - - } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { - struct vfio_irq_info info; - - minsz = offsetofend(struct vfio_irq_info, count); - - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; - - if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) - return -EINVAL; - - switch (info.index) { - case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX: - case VFIO_PCI_REQ_IRQ_INDEX: - break; - case VFIO_PCI_ERR_IRQ_INDEX: - if (pci_is_pcie(vdev->pdev)) - break; - /* fall through */ - default: - return -EINVAL; - } - - info.flags = VFIO_IRQ_INFO_EVENTFD; - - info.count = vfio_pci_get_irq_count(vdev, info.index); - - if (info.index == VFIO_PCI_INTX_IRQ_INDEX) - info.flags |= (VFIO_IRQ_INFO_MASKABLE | - VFIO_IRQ_INFO_AUTOMASKED); - else - info.flags |= VFIO_IRQ_INFO_NORESIZE; - - return copy_to_user((void __user *)arg, &info, minsz) ? - -EFAULT : 0; - - } else if (cmd == VFIO_DEVICE_SET_IRQS) { - struct vfio_irq_set hdr; - u8 *data = NULL; - int max, ret = 0; - size_t data_size = 0; - - minsz = offsetofend(struct vfio_irq_set, count); - - if (copy_from_user(&hdr, (void __user *)arg, minsz)) - return -EFAULT; - - max = vfio_pci_get_irq_count(vdev, hdr.index); - - ret = vfio_set_irqs_validate_and_prepare(&hdr, max, - VFIO_PCI_NUM_IRQS, &data_size); - if (ret) - return ret; - - if (data_size) { - data = memdup_user((void __user *)(arg + minsz), - data_size); - if (IS_ERR(data)) - return PTR_ERR(data); - } - - mutex_lock(&vdev->igate); - - ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, - hdr.start, hdr.count, data); - - mutex_unlock(&vdev->igate); - kfree(data); - - return ret; - - } else if (cmd == VFIO_DEVICE_RESET) { - return vdev->reset_works ? - pci_try_reset_function(vdev->pdev) : -EINVAL; - - } else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) { - struct vfio_pci_hot_reset_info hdr; - struct vfio_pci_fill_info fill = { 0 }; - struct vfio_pci_dependent_device *devices = NULL; - bool slot = false; - int ret = 0; - - minsz = offsetofend(struct vfio_pci_hot_reset_info, count); - - if (copy_from_user(&hdr, (void __user *)arg, minsz)) - return -EFAULT; - - if (hdr.argsz < minsz) - return -EINVAL; - - hdr.flags = 0; - - /* Can we do a slot or bus reset or neither? */ - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return -ENODEV; - - /* How many devices are affected? */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_count_devs, - &fill.max, slot); - if (ret) - return ret; - - WARN_ON(!fill.max); /* Should always be at least one */ - - /* - * If there's enough space, fill it now, otherwise return - * -ENOSPC and the number of devices affected. - */ - if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) { - ret = -ENOSPC; - hdr.count = fill.max; - goto reset_info_exit; - } - - devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL); - if (!devices) - return -ENOMEM; - - fill.devices = devices; - - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_fill_devs, - &fill, slot); - - /* - * If a device was removed between counting and filling, - * we may come up short of fill.max. If a device was - * added, we'll have a return of -EAGAIN above. - */ - if (!ret) - hdr.count = fill.cur; - -reset_info_exit: - if (copy_to_user((void __user *)arg, &hdr, minsz)) - ret = -EFAULT; - - if (!ret) { - if (copy_to_user((void __user *)(arg + minsz), devices, - hdr.count * sizeof(*devices))) - ret = -EFAULT; - } - - kfree(devices); - return ret; - - } else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) { - struct vfio_pci_hot_reset hdr; - int32_t *group_fds; - struct vfio_pci_group_entry *groups; - struct vfio_pci_group_info info; - bool slot = false; - int i, count = 0, ret = 0; - - minsz = offsetofend(struct vfio_pci_hot_reset, count); - - if (copy_from_user(&hdr, (void __user *)arg, minsz)) - return -EFAULT; - - if (hdr.argsz < minsz || hdr.flags) - return -EINVAL; - - /* Can we do a slot or bus reset or neither? */ - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return -ENODEV; - - /* - * We can't let userspace give us an arbitrarily large - * buffer to copy, so verify how many we think there - * could be. Note groups can have multiple devices so - * one group per device is the max. - */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_count_devs, - &count, slot); - if (ret) - return ret; - - /* Somewhere between 1 and count is OK */ - if (!hdr.count || hdr.count > count) - return -EINVAL; - - group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL); - groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL); - if (!group_fds || !groups) { - kfree(group_fds); - kfree(groups); - return -ENOMEM; - } - - if (copy_from_user(group_fds, (void __user *)(arg + minsz), - hdr.count * sizeof(*group_fds))) { - kfree(group_fds); - kfree(groups); - return -EFAULT; - } - - /* - * For each group_fd, get the group through the vfio external - * user interface and store the group and iommu ID. This - * ensures the group is held across the reset. - */ - for (i = 0; i < hdr.count; i++) { - struct vfio_group *group; - struct fd f = fdget(group_fds[i]); - if (!f.file) { - ret = -EBADF; - break; - } - - group = vfio_group_get_external_user(f.file); - fdput(f); - if (IS_ERR(group)) { - ret = PTR_ERR(group); - break; - } - - groups[i].group = group; - groups[i].id = vfio_external_user_iommu_id(group); - } - - kfree(group_fds); - - /* release reference to groups on error */ - if (ret) - goto hot_reset_release; - - info.count = hdr.count; - info.groups = groups; - - /* - * Test whether all the affected devices are contained - * by the set of groups provided by the user. - */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_validate_devs, - &info, slot); - if (!ret) - /* User has access, do the reset */ - ret = pci_reset_bus(vdev->pdev); - -hot_reset_release: - for (i--; i >= 0; i--) - vfio_group_put_external_user(groups[i].group); - - kfree(groups); - return ret; - } else if (cmd == VFIO_DEVICE_IOEVENTFD) { - struct vfio_device_ioeventfd ioeventfd; - int count; - - minsz = offsetofend(struct vfio_device_ioeventfd, fd); - - if (copy_from_user(&ioeventfd, (void __user *)arg, minsz)) - return -EFAULT; - - if (ioeventfd.argsz < minsz) - return -EINVAL; - - if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK) - return -EINVAL; - - count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK; - - if (hweight8(count) != 1 || ioeventfd.fd < -1) - return -EINVAL; - - return vfio_pci_ioeventfd(vdev, ioeventfd.offset, - ioeventfd.data, count, ioeventfd.fd); - } - - return -ENOTTY; -} - -static ssize_t vfio_pci_rw(void *device_data, char __user *buf, - size_t count, loff_t *ppos, bool iswrite) -{ - unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); - struct vfio_pci_device *vdev = device_data; - - if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) - return -EINVAL; - - switch (index) { - case VFIO_PCI_CONFIG_REGION_INDEX: - return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite); - - case VFIO_PCI_ROM_REGION_INDEX: - if (iswrite) - return -EINVAL; - return vfio_pci_bar_rw(vdev, buf, count, ppos, false); - - case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: - return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite); - - case VFIO_PCI_VGA_REGION_INDEX: - return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite); - default: - index -= VFIO_PCI_NUM_REGIONS; - return vdev->region[index].ops->rw(vdev, buf, - count, ppos, iswrite); - } - - return -EINVAL; -} - -ssize_t vfio_pci_read(void *device_data, char __user *buf, - size_t count, loff_t *ppos) -{ - if (!count) - return 0; - - return vfio_pci_rw(device_data, buf, count, ppos, false); -} - -ssize_t vfio_pci_write(void *device_data, const char __user *buf, - size_t count, loff_t *ppos) -{ - if (!count) - return 0; - - return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true); -} - -int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) -{ - struct vfio_pci_device *vdev = device_data; - struct pci_dev *pdev = vdev->pdev; - unsigned int index; - u64 phys_len, req_len, pgoff, req_start; - int ret; - - index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); - - if (vma->vm_end < vma->vm_start) - return -EINVAL; - if ((vma->vm_flags & VM_SHARED) == 0) - return -EINVAL; - if (index >= VFIO_PCI_NUM_REGIONS) { - int regnum = index - VFIO_PCI_NUM_REGIONS; - struct vfio_pci_region *region = vdev->region + regnum; - - if (region && region->ops && region->ops->mmap && - (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) - return region->ops->mmap(vdev, region, vma); - return -EINVAL; - } - if (index >= VFIO_PCI_ROM_REGION_INDEX) - return -EINVAL; - if (!vdev->bar_mmap_supported[index]) - return -EINVAL; - - phys_len = PAGE_ALIGN(pci_resource_len(pdev, index)); - req_len = vma->vm_end - vma->vm_start; - pgoff = vma->vm_pgoff & - ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); - req_start = pgoff << PAGE_SHIFT; - - if (req_start + req_len > phys_len) - return -EINVAL; - - /* - * Even though we don't make use of the barmap for the mmap, - * we need to request the region and the barmap tracks that. - */ - if (!vdev->barmap[index]) { - ret = pci_request_selected_regions(pdev, - 1 << index, "vfio-pci"); - if (ret) - return ret; - - vdev->barmap[index] = pci_iomap(pdev, index, 0); - if (!vdev->barmap[index]) { - pci_release_selected_regions(pdev, 1 << index); - return -ENOMEM; - } - } - - vma->vm_private_data = vdev; - vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); - vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; - - return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - req_len, vma->vm_page_prot); -} - -void vfio_pci_request(void *device_data, unsigned int count) -{ - struct vfio_pci_device *vdev = device_data; - struct pci_dev *pdev = vdev->pdev; - - mutex_lock(&vdev->igate); - - if (vdev->req_trigger) { - if (!(count % 10)) - pci_notice_ratelimited(pdev, - "Relaying device request to user (#%u)\n", - count); - eventfd_signal(vdev->req_trigger, 1); - } else if (count == 0) { - pci_warn(pdev, - "No device request channel registered, blocked until released by user\n"); - } - - mutex_unlock(&vdev->igate); -} - static const struct vfio_device_ops vfio_pci_ops = { .name = "vfio-pci", .open = vfio_pci_open, @@ -1393,38 +220,6 @@ static void vfio_pci_remove(struct pci_dev *pdev) } } -static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, - pci_channel_state_t state) -{ - struct vfio_pci_device *vdev; - struct vfio_device *device; - - device = vfio_device_get_from_dev(&pdev->dev); - if (device == NULL) - return PCI_ERS_RESULT_DISCONNECT; - - vdev = vfio_device_data(device); - if (vdev == NULL) { - vfio_device_put(device); - return PCI_ERS_RESULT_DISCONNECT; - } - - mutex_lock(&vdev->igate); - - if (vdev->err_trigger) - eventfd_signal(vdev->err_trigger, 1); - - mutex_unlock(&vdev->igate); - - vfio_device_put(device); - - return PCI_ERS_RESULT_CAN_RECOVER; -} - -static const struct pci_error_handlers vfio_err_handlers = { - .error_detected = vfio_pci_aer_err_detected, -}; - static struct pci_driver vfio_pci_driver = { .name = "vfio-pci", .id_table = NULL, /* only dynamic ids */ @@ -1433,247 +228,12 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, }; -static DEFINE_MUTEX(reflck_lock); - -static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) -{ - struct vfio_pci_reflck *reflck; - - reflck = kzalloc(sizeof(*reflck), GFP_KERNEL); - if (!reflck) - return ERR_PTR(-ENOMEM); - - kref_init(&reflck->kref); - mutex_init(&reflck->lock); - - return reflck; -} - -static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck) -{ - kref_get(&reflck->kref); -} - -static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_device *vdev = data; - struct vfio_pci_reflck **preflck = &vdev->reflck; - struct vfio_device *device; - struct vfio_pci_device *tmp; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return 0; - - if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) { - vfio_device_put(device); - return 0; - } - - tmp = vfio_device_data(device); - - if (tmp->reflck) { - vfio_pci_reflck_get(tmp->reflck); - *preflck = tmp->reflck; - vfio_device_put(device); - return 1; - } - - vfio_device_put(device); - return 0; -} - -int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) -{ - bool slot = !pci_probe_reset_slot(vdev->pdev->slot); - - mutex_lock(&reflck_lock); - - if (pci_is_root_bus(vdev->pdev->bus) || - vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find, - vdev, slot) <= 0) - vdev->reflck = vfio_pci_reflck_alloc(); - - mutex_unlock(&reflck_lock); - - return PTR_ERR_OR_ZERO(vdev->reflck); -} - -static void vfio_pci_reflck_release(struct kref *kref) -{ - struct vfio_pci_reflck *reflck = container_of(kref, - struct vfio_pci_reflck, - kref); - - kfree(reflck); - mutex_unlock(&reflck_lock); -} - -void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) -{ - kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock); -} - -struct vfio_devices { - struct vfio_device **devices; - struct vfio_pci_device *vdev; - int cur_index; - int max_index; -}; - -static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_devices *devs = data; - struct vfio_device *device; - struct vfio_pci_device *tmp; - - if (devs->cur_index == devs->max_index) - return -ENOSPC; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return -EINVAL; - - if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) { - vfio_device_put(device); - return -EBUSY; - } - - tmp = vfio_device_data(device); - - /* Fault if the device is not unused */ - if (tmp->refcnt) { - vfio_device_put(device); - return -EBUSY; - } - - devs->devices[devs->cur_index++] = device; - return 0; -} - -/* - * If a bus or slot reset is available for the provided device and: - * - All of the devices affected by that bus or slot reset are unused - * (!refcnt) - * - At least one of the affected devices is marked dirty via - * needs_reset (such as by lack of FLR support) - * Then attempt to perform that bus or slot reset. Callers are required - * to hold vdev->reflck->lock, protecting the bus/slot reset group from - * concurrent opens. A vfio_device reference is acquired for each device - * to prevent unbinds during the reset operation. - * - * NB: vfio-core considers a group to be viable even if some devices are - * bound to drivers like pci-stub or pcieport. Here we require all devices - * to be bound to vfio_pci since that's the only way we can be sure they - * stay put. - */ -static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) -{ - struct vfio_devices devs = { .cur_index = 0 }; - int i = 0, ret = -EINVAL; - bool slot = false; - struct vfio_pci_device *tmp; - - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return; - - if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs, - &i, slot) || !i) - return; - - devs.max_index = i; - devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL); - if (!devs.devices) - return; - - devs.vdev = vdev; - if (vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_get_unused_devs, - &devs, slot)) - goto put_devs; - - /* Does at least one need a reset? */ - for (i = 0; i < devs.cur_index; i++) { - tmp = vfio_device_data(devs.devices[i]); - if (tmp->needs_reset) { - ret = pci_reset_bus(vdev->pdev); - break; - } - } - -put_devs: - for (i = 0; i < devs.cur_index; i++) { - tmp = vfio_device_data(devs.devices[i]); - - /* - * If reset was successful, affected devices no longer need - * a reset and we should return all the collateral devices - * to low power. If not successful, we either didn't reset - * the bus or timed out waiting for it, so let's not touch - * the power state. - */ - if (!ret) { - tmp->needs_reset = false; - - if (tmp != vdev && !tmp->disable_idle_d3) - vfio_pci_set_power_state(tmp, PCI_D3hot); - } - - vfio_device_put(devs.devices[i]); - } - - kfree(devs.devices); -} - static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(&vfio_pci_driver); vfio_pci_uninit_perm_bits(); } -void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) -{ - char *p, *id; - int rc; - - /* no ids passed actually */ - if (ids[0] == '\0') - return; - - /* add ids specified in the module parameter */ - p = ids; - while ((id = strsep(&p, ","))) { - unsigned int vendor, device, subvendor = PCI_ANY_ID, - subdevice = PCI_ANY_ID, class = 0, class_mask = 0; - int fields; - - if (!strlen(id)) - continue; - - fields = sscanf(id, "%x:%x:%x:%x:%x:%x", - &vendor, &device, &subvendor, &subdevice, - &class, &class_mask); - - if (fields < 2) { - pr_warn("invalid id string \"%s\"\n", id); - continue; - } - - rc = pci_add_dynid(driver, vendor, device, - subvendor, subdevice, class, class_mask, 0); - if (rc) - pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n", - vendor, device, subvendor, subdevice, - class, class_mask, rc); - else - pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n", - vendor, device, subvendor, subdevice, - class, class_mask); - } -} - static int __init vfio_pci_init(void) { int ret; diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c index 4664733a..42b46d0 100644 --- a/drivers/vfio/pci/vfio_pci_common.c +++ b/drivers/vfio/pci/vfio_pci_common.c @@ -1231,7 +1231,7 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, return PCI_ERS_RESULT_CAN_RECOVER; } -static const struct pci_error_handlers vfio_err_handlers = { +const struct pci_error_handlers vfio_err_handlers = { .error_detected = vfio_pci_aer_err_detected, }; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 670268e..5d17394 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -135,6 +135,8 @@ struct vfio_pci_device { #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev))) #define irq_is(vdev, type) (vdev->irq_type == type) +extern const struct pci_error_handlers vfio_err_handlers; + static inline bool vfio_pci_is_vga(struct pci_dev *pdev) { return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; From patchwork Thu Nov 21 11:23:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257947 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 324AF138C for ; Fri, 22 Nov 2019 11:42:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1C02E20704 for ; Fri, 22 Nov 2019 11:42:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728051AbfKVLmg (ORCPT ); Fri, 22 Nov 2019 06:42:36 -0500 Received: from mga01.intel.com ([192.55.52.88]:15289 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727987AbfKVLmb (ORCPT ); Fri, 22 Nov 2019 06:42:31 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110518" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:28 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 08/10] vfio/pci: protect cap/ecap_perm bits alloc/free Date: Thu, 21 Nov 2019 19:23:45 +0800 Message-Id: <1574335427-3763-9-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch add a user numer track for the shared cap/ecap_perms bits, and the alloc/free will hold a semaphore to protect the operations. With the changes, first caller of vfio_pci_init_perm_bits() will initialize the bits. While the last caller of vfio_pci_uninit_perm_bits() will free the bits. This is a preparation to have multiple cap/ecap_perms bits users. Cc: Kevin Tian Cc: Lu Baolu Suggested-by: Alex Williamson Signed-off-by: Liu Yi L Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci_config.c | 33 +++++++++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index f0891bd..274c993 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -36,6 +36,13 @@ (offset >= PCI_ROM_ADDRESS && offset < PCI_ROM_ADDRESS + 4)) /* + * vfio_perm_bits_sem: prorects the shared perm_bits alloc/free + * vfio_pci_perm_bits_users: tracks the user of the shared perm_bits + */ +static DEFINE_SEMAPHORE(vfio_perm_bits_sem); +static int vfio_pci_perm_bits_users; + +/* * Lengths of PCI Config Capabilities * 0: Removed from the user visible capability list * FF: Variable length @@ -995,7 +1002,7 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm) /* * Initialize the shared permission tables */ -void vfio_pci_uninit_perm_bits(void) +static void vfio_pci_uninit_perm_bits_internal(void) { free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]); @@ -1009,10 +1016,30 @@ void vfio_pci_uninit_perm_bits(void) free_perm_bits(&ecap_perms[PCI_EXT_CAP_ID_PWR]); } +void vfio_pci_uninit_perm_bits(void) +{ + down(&vfio_perm_bits_sem); + + if (--vfio_pci_perm_bits_users > 0) + goto out; + + vfio_pci_uninit_perm_bits_internal(); + +out: + up(&vfio_perm_bits_sem); +} + int __init vfio_pci_init_perm_bits(void) { int ret; + down(&vfio_perm_bits_sem); + + if (++vfio_pci_perm_bits_users > 1) { + ret = 0; + goto out; + } + /* Basic config space */ ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]); @@ -1030,8 +1057,10 @@ int __init vfio_pci_init_perm_bits(void) ecap_perms[PCI_EXT_CAP_ID_VNDR].writefn = vfio_raw_config_write; if (ret) - vfio_pci_uninit_perm_bits(); + vfio_pci_uninit_perm_bits_internal(); +out: + up(&vfio_perm_bits_sem); return ret; } From patchwork Thu Nov 21 11:23:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257949 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3300E6C1 for ; Fri, 22 Nov 2019 11:42:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 017CF20704 for ; Fri, 22 Nov 2019 11:42:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728153AbfKVLmq (ORCPT ); Fri, 22 Nov 2019 06:42:46 -0500 Received: from mga01.intel.com ([192.55.52.88]:15289 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727482AbfKVLmg (ORCPT ); Fri, 22 Nov 2019 06:42:36 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:34 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110536" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:31 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Masahiro Yamada Subject: [PATCH v3 09/10] samples: add vfio-mdev-pci driver Date: Thu, 21 Nov 2019 19:23:46 +0800 Message-Id: <1574335427-3763-10-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds sample driver named vfio-mdev-pci. It is to wrap a PCI device as a mediated device. For a pci device, once bound to vfio-mdev-pci driver, user space access of this device will go through vfio mdev framework. The usage of the device follows mdev management method. e.g. user should create a mdev before exposing the device to user-space. Benefit of this new driver would be acting as a sample driver for recent changes from "vfio/mdev: IOMMU aware mediated device" patchset. Also it could be a good experiment driver for future device specific mdev migration support. This sample driver only supports singleton iommu groups, for non-singleton iommu groups, this sample driver doesn't work. It will fail when trying to assign the non-singleton iommu group to VMs. To use this driver: a) build and load vfio-mdev-pci.ko module execute "make menuconfig" and config CONFIG_SAMPLE_VFIO_MDEV_PCI then load it with following command > sudo modprobe vfio > sudo modprobe vfio-pci > sudo insmod drivers/vfio/pci/vfio-mdev-pci.ko b) unbind original device driver e.g. use following command to unbind its original driver > echo $dev_bdf > /sys/bus/pci/devices/$dev_bdf/driver/unbind c) bind vfio-mdev-pci driver to the physical device > echo $vend_id $dev_id > /sys/bus/pci/drivers/vfio-mdev-pci/new_id d) check the supported mdev instances > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/ vfio-mdev-pci-type_name > ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\ vfio-mdev-pci-type_name/ available_instances create device_api devices name e) create mdev on this physical device (only 1 instance) > echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1003" > \ /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\ vfio-mdev-pci-type_name/create f) passthru the mdev to guest add the following line in QEMU boot command -device vfio-pci,\ sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003 g) destroy mdev > echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003/\ remove Cc: Kevin Tian Cc: Lu Baolu Cc: Masahiro Yamada Suggested-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/pci/Makefile | 6 + drivers/vfio/pci/vfio_mdev_pci.c | 407 +++++++++++++++++++++++++++++++++++++++ samples/Kconfig | 11 ++ 3 files changed, 424 insertions(+) create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index d94317a..ac118ef 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -5,4 +5,10 @@ vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \ vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o +vfio-mdev-pci-y := vfio_mdev_pci.o vfio_pci_common.o vfio_pci_intrs.o \ + vfio_pci_rdwr.o vfio_pci_config.o +vfio-mdev-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o +vfio-mdev-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o + obj-$(CONFIG_VFIO_PCI) += vfio-pci.o +obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c new file mode 100644 index 0000000..1eb4ddc --- /dev/null +++ b/drivers/vfio/pci/vfio_mdev_pci.c @@ -0,0 +1,407 @@ +/* + * Copyright © 2019 Intel Corporation. + * Author: Liu Yi L + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio_pci.c: + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. + * Author: Alex Williamson + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, pugs@cisco.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vfio_pci_private.h" + +#define DRIVER_VERSION "0.1" +#define DRIVER_AUTHOR "Liu Yi L " +#define DRIVER_DESC "VFIO Mdev PCI - Sample driver for PCI device as a mdev" + +#define VFIO_MDEV_PCI_NAME "vfio-mdev-pci" + +static char ids[1024] __initdata; +module_param_string(ids, ids, sizeof(ids), 0); +MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified"); + +static bool nointxmask; +module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(nointxmask, + "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag."); + +#ifdef CONFIG_VFIO_PCI_VGA +static bool disable_vga; +module_param(disable_vga, bool, S_IRUGO); +MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-mdev-pci"); +#endif + +static bool disable_idle_d3; +module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(disable_idle_d3, + "Disable using the PCI D3 low power state for idle, unused devices"); + +static struct pci_driver vfio_mdev_pci_driver; + +static ssize_t +name_show(struct kobject *kobj, struct device *dev, char *buf) +{ + return sprintf(buf, "%s-type1\n", dev_name(dev)); +} + +MDEV_TYPE_ATTR_RO(name); + +static ssize_t +available_instances_show(struct kobject *kobj, struct device *dev, char *buf) +{ + return sprintf(buf, "%d\n", 1); +} + +MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct kobject *kobj, struct device *dev, + char *buf) +{ + return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} + +MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *vfio_mdev_pci_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group vfio_mdev_pci_type_group1 = { + .name = "type1", + .attrs = vfio_mdev_pci_types_attrs, +}; + +struct attribute_group *vfio_mdev_pci_type_groups[] = { + &vfio_mdev_pci_type_group1, + NULL, +}; + +struct vfio_mdev_pci { + struct vfio_pci_device *vdev; + struct mdev_device *mdev; + unsigned long handle; +}; + +static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev) +{ + struct device *pdev; + struct vfio_pci_device *vdev; + struct vfio_mdev_pci *pmdev; + int ret; + + pdev = mdev_parent_dev(mdev); + vdev = dev_get_drvdata(pdev); + pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL); + if (pmdev == NULL) { + ret = -EBUSY; + goto out; + } + + pmdev->mdev = mdev; + pmdev->vdev = vdev; + mdev_set_drvdata(mdev, pmdev); + ret = mdev_set_iommu_device(mdev_dev(mdev), pdev); + if (ret) { + pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n", + __func__, dev_name(mdev_dev(mdev)), dev_name(pdev)); + goto out; + } + + pr_info("%s, creation succeeded for mdev: %s\n", __func__, + dev_name(mdev_dev(mdev))); +out: + return ret; +} + +static int vfio_mdev_pci_remove(struct mdev_device *mdev) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + kfree(pmdev); + pr_info("%s, succeeded for mdev: %s\n", __func__, + dev_name(mdev_dev(mdev))); + + return 0; +} + +static int vfio_mdev_pci_open(struct mdev_device *mdev) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + struct vfio_pci_device *vdev = pmdev->vdev; + int ret = 0; + + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + vfio_pci_refresh_config(vdev, nointxmask, disable_idle_d3); + + mutex_lock(&vdev->reflck->lock); + + if (!vdev->refcnt) { + ret = vfio_pci_enable(vdev); + if (ret) + goto error; + + vfio_spapr_pci_eeh_open(vdev->pdev); + } + vdev->refcnt++; +error: + mutex_unlock(&vdev->reflck->lock); + if (!ret) + pr_info("Succeeded to open mdev: %s on pf: %s\n", + dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev)); + else { + pr_info("Failed to open mdev: %s on pf: %s\n", + dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev)); + module_put(THIS_MODULE); + } + return ret; +} + +static void vfio_mdev_pci_release(struct mdev_device *mdev) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + struct vfio_pci_device *vdev = pmdev->vdev; + + pr_info("Release mdev: %s on pf: %s\n", + dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev)); + + mutex_lock(&vdev->reflck->lock); + + if (!(--vdev->refcnt)) { + vfio_spapr_pci_eeh_release(vdev->pdev); + vfio_pci_disable(vdev); + } + + mutex_unlock(&vdev->reflck->lock); + + module_put(THIS_MODULE); +} + +static long vfio_mdev_pci_ioctl(struct mdev_device *mdev, unsigned int cmd, + unsigned long arg) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_ioctl(pmdev->vdev, cmd, arg); +} + +static int vfio_mdev_pci_mmap(struct mdev_device *mdev, + struct vm_area_struct *vma) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_mmap(pmdev->vdev, vma); +} + +static ssize_t vfio_mdev_pci_read(struct mdev_device *mdev, char __user *buf, + size_t count, loff_t *ppos) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_read(pmdev->vdev, buf, count, ppos); +} + +static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev, + const char __user *buf, + size_t count, loff_t *ppos) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos); +} + +static const struct mdev_parent_ops vfio_mdev_pci_ops = { + .supported_type_groups = vfio_mdev_pci_type_groups, + .create = vfio_mdev_pci_create, + .remove = vfio_mdev_pci_remove, + + .open = vfio_mdev_pci_open, + .release = vfio_mdev_pci_release, + + .read = vfio_mdev_pci_read, + .write = vfio_mdev_pci_write, + .mmap = vfio_mdev_pci_mmap, + .ioctl = vfio_mdev_pci_ioctl, +}; + +static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, + const struct pci_device_id *id) +{ + struct vfio_pci_device *vdev; + int ret; + + if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) + return -EINVAL; + + /* + * Prevent binding to PFs with VFs enabled, this too easily allows + * userspace instance with VFs and PFs from the same device, which + * cannot work. Disabling SR-IOV here would initiate removing the + * VFs, which would unbind the driver, which is prone to blocking + * if that VF is also in use by vfio-pci or vfio-mdev-pci. Just + * reject these PFs and let the user sort it out. + */ + if (pci_num_vf(pdev)) { + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); + return -EBUSY; + } + + vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); + if (!vdev) + return -ENOMEM; + + vdev->pdev = pdev; + vdev->irq_type = VFIO_PCI_NUM_IRQS; + mutex_init(&vdev->igate); + spin_lock_init(&vdev->irqlock); + mutex_init(&vdev->ioeventfds_lock); + INIT_LIST_HEAD(&vdev->ioeventfds_list); + vdev->nointxmask = nointxmask; +#ifdef CONFIG_VFIO_PCI_VGA + vdev->disable_vga = disable_vga; +#endif + vdev->disable_idle_d3 = disable_idle_d3; + + pci_set_drvdata(pdev, vdev); + + ret = vfio_pci_reflck_attach(vdev); + if (ret) { + pci_set_drvdata(pdev, NULL); + kfree(vdev); + return ret; + } + + if (vfio_pci_is_vga(pdev)) { + vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode); + vga_set_legacy_decoding(pdev, + vfio_pci_set_vga_decode(vdev, false)); + } + + vfio_pci_probe_power_state(vdev); + + if (!vdev->disable_idle_d3) { + /* + * pci-core sets the device power state to an unknown value at + * bootup and after being removed from a driver. The only + * transition it allows from this unknown state is to D0, which + * typically happens when a driver calls pci_enable_device(). + * We're not ready to enable the device yet, but we do want to + * be able to get to D3. Therefore first do a D0 transition + * before going to D3. + */ + vfio_pci_set_power_state(vdev, PCI_D0); + vfio_pci_set_power_state(vdev, PCI_D3hot); + } + + ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops); + if (ret) + pr_err("Cannot register mdev for device %s\n", + dev_name(&pdev->dev)); + else + pr_info("Wrap device %s as a mdev\n", dev_name(&pdev->dev)); + + return ret; +} + +static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev) +{ + struct vfio_pci_device *vdev; + + vdev = pci_get_drvdata(pdev); + if (!vdev) + return; + + vfio_pci_reflck_put(vdev->reflck); + + kfree(vdev->region); + mutex_destroy(&vdev->ioeventfds_lock); + + if (!disable_idle_d3) + vfio_pci_set_power_state(vdev, PCI_D0); + + kfree(vdev->pm_save); + + if (vfio_pci_is_vga(pdev)) { + vga_client_register(pdev, NULL, NULL, NULL); + vga_set_legacy_decoding(pdev, + VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); + } + + kfree(vdev); +} + +static struct pci_driver vfio_mdev_pci_driver = { + .name = VFIO_MDEV_PCI_NAME, + .id_table = NULL, /* only dynamic ids */ + .probe = vfio_mdev_pci_driver_probe, + .remove = vfio_mdev_pci_driver_remove, + .err_handler = &vfio_err_handlers, +}; + +static void __exit vfio_mdev_pci_cleanup(void) +{ + pci_unregister_driver(&vfio_mdev_pci_driver); + vfio_pci_uninit_perm_bits(); +} + +static int __init vfio_mdev_pci_init(void) +{ + int ret; + + /* Allocate shared config space permision data used by all devices */ + ret = vfio_pci_init_perm_bits(); + if (ret) + return ret; + + /* Register and scan for devices */ + ret = pci_register_driver(&vfio_mdev_pci_driver); + if (ret) + goto out_driver; + + vfio_pci_fill_ids(ids, &vfio_mdev_pci_driver); + + return 0; +out_driver: + vfio_pci_uninit_perm_bits(); + return ret; +} + +module_init(vfio_mdev_pci_init); +module_exit(vfio_mdev_pci_cleanup); + +MODULE_VERSION(DRIVER_VERSION); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR(DRIVER_AUTHOR); +MODULE_DESCRIPTION(DRIVER_DESC); diff --git a/samples/Kconfig b/samples/Kconfig index c8dacb4..1513fef 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -169,4 +169,15 @@ config SAMPLE_VFS as mount API and statx(). Note that this is restricted to the x86 arch whilst it accesses system calls that aren't yet in all arches. +config SAMPLE_VFIO_MDEV_PCI + tristate "Sample driver for wrapping PCI device as a mdev" + depends on PCI && EVENTFD && VFIO_MDEV_DEVICE + select VFIO_VIRQFD + select IRQ_BYPASS_MANAGER + help + Sample driver for wrapping a PCI device as a mdev. Once bound to + this driver, device passthru should through mdev path. + + If you don't know what to do here, say N. + endif # SAMPLES From patchwork Thu Nov 21 11:23:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11257945 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 60C10138C for ; Fri, 22 Nov 2019 11:42:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4282F20730 for ; Fri, 22 Nov 2019 11:42:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728100AbfKVLmi (ORCPT ); Fri, 22 Nov 2019 06:42:38 -0500 Received: from mga01.intel.com ([192.55.52.88]:15289 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728073AbfKVLmi (ORCPT ); Fri, 22 Nov 2019 06:42:38 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Nov 2019 03:42:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,229,1571727600"; d="scan'208";a="358110542" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by orsmga004.jf.intel.com with ESMTP; 22 Nov 2019 03:42:34 -0800 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, jean-philippe.brucker@arm.com, peterx@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Subject: [PATCH v3 10/10] samples: refine vfio-mdev-pci driver Date: Thu, 21 Nov 2019 19:23:47 +0800 Message-Id: <1574335427-3763-11-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> References: <1574335427-3763-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Alex Williamson This patch refines the implementation of original vfio-mdev-pci driver. And the vfio-mdev-pci-type_name will be named per the following rule: vmdev->attr.name = kasprintf(GFP_KERNEL, "%04x:%04x:%04x:%04x:%06x:%02x", pdev->vendor, pdev->device, pdev->subsystem_vendor, pdev->subsystem_device, pdev->class, pdev->revision); Before usage, check the /sys/bus/pci/devices/$bdf/mdev_supported_types/ to ensure the final mdev_supported_types for the specific device. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_mdev_pci.c | 123 +++++++++++++++++++++++---------------- 1 file changed, 73 insertions(+), 50 deletions(-) diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c index 1eb4ddc..a1f62cf 100644 --- a/drivers/vfio/pci/vfio_mdev_pci.c +++ b/drivers/vfio/pci/vfio_mdev_pci.c @@ -65,18 +65,22 @@ MODULE_PARM_DESC(disable_idle_d3, static struct pci_driver vfio_mdev_pci_driver; -static ssize_t -name_show(struct kobject *kobj, struct device *dev, char *buf) -{ - return sprintf(buf, "%s-type1\n", dev_name(dev)); -} - -MDEV_TYPE_ATTR_RO(name); +struct vfio_mdev_pci_device { + struct vfio_pci_device vdev; + struct mdev_parent_ops ops; + struct attribute_group *groups[2]; + struct attribute_group attr; + atomic_t avail; +}; static ssize_t available_instances_show(struct kobject *kobj, struct device *dev, char *buf) { - return sprintf(buf, "%d\n", 1); + struct vfio_mdev_pci_device *vmdev; + + vmdev = pci_get_drvdata(to_pci_dev(dev)); + + return sprintf(buf, "%d\n", atomic_read(&vmdev->avail)); } MDEV_TYPE_ATTR_RO(available_instances); @@ -90,64 +94,61 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev, MDEV_TYPE_ATTR_RO(device_api); static struct attribute *vfio_mdev_pci_types_attrs[] = { - &mdev_type_attr_name.attr, &mdev_type_attr_device_api.attr, &mdev_type_attr_available_instances.attr, NULL, }; -static struct attribute_group vfio_mdev_pci_type_group1 = { - .name = "type1", - .attrs = vfio_mdev_pci_types_attrs, -}; - -struct attribute_group *vfio_mdev_pci_type_groups[] = { - &vfio_mdev_pci_type_group1, - NULL, -}; - struct vfio_mdev_pci { struct vfio_pci_device *vdev; struct mdev_device *mdev; - unsigned long handle; }; static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev) { struct device *pdev; - struct vfio_pci_device *vdev; + struct vfio_mdev_pci_device *vmdev; struct vfio_mdev_pci *pmdev; int ret; pdev = mdev_parent_dev(mdev); - vdev = dev_get_drvdata(pdev); + vmdev = dev_get_drvdata(pdev); + + if (atomic_dec_if_positive(&vmdev->avail) < 0) + return -ENOSPC; + pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL); - if (pmdev == NULL) { - ret = -EBUSY; - goto out; + if (!pmdev) { + atomic_inc(&vmdev->avail); + return -ENOMEM; } pmdev->mdev = mdev; - pmdev->vdev = vdev; + pmdev->vdev = &vmdev->vdev; mdev_set_drvdata(mdev, pmdev); ret = mdev_set_iommu_device(mdev_dev(mdev), pdev); if (ret) { pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n", __func__, dev_name(mdev_dev(mdev)), dev_name(pdev)); - goto out; + kfree(pmdev); + atomic_inc(&vmdev->avail); + return ret; } pr_info("%s, creation succeeded for mdev: %s\n", __func__, dev_name(mdev_dev(mdev))); -out: - return ret; + return 0; } static int vfio_mdev_pci_remove(struct mdev_device *mdev) { struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + struct vfio_mdev_pci_device *vmdev; + + vmdev = container_of(pmdev->vdev, struct vfio_mdev_pci_device, vdev); kfree(pmdev); + atomic_inc(&vmdev->avail); pr_info("%s, succeeded for mdev: %s\n", __func__, dev_name(mdev_dev(mdev))); @@ -241,24 +242,12 @@ static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev, return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos); } -static const struct mdev_parent_ops vfio_mdev_pci_ops = { - .supported_type_groups = vfio_mdev_pci_type_groups, - .create = vfio_mdev_pci_create, - .remove = vfio_mdev_pci_remove, - - .open = vfio_mdev_pci_open, - .release = vfio_mdev_pci_release, - - .read = vfio_mdev_pci_read, - .write = vfio_mdev_pci_write, - .mmap = vfio_mdev_pci_mmap, - .ioctl = vfio_mdev_pci_ioctl, -}; - static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id) { + struct vfio_mdev_pci_device *vmdev; struct vfio_pci_device *vdev; + const struct mdev_parent_ops *ops; int ret; if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) @@ -277,10 +266,38 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, return -EBUSY; } - vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); - if (!vdev) + vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL); + if (!vmdev) return -ENOMEM; + vmdev->attr.name = kasprintf(GFP_KERNEL, + "%04x:%04x:%04x:%04x:%06x:%02x", + pdev->vendor, pdev->device, + pdev->subsystem_vendor, + pdev->subsystem_device, pdev->class, + pdev->revision); + if (!vmdev->attr.name) { + kfree(vmdev); + return -ENOMEM; + } + + atomic_set(&vmdev->avail, 1); + + vmdev->attr.attrs = vfio_mdev_pci_types_attrs; + vmdev->groups[0] = &vmdev->attr; + + vmdev->ops.supported_type_groups = vmdev->groups; + vmdev->ops.create = vfio_mdev_pci_create; + vmdev->ops.remove = vfio_mdev_pci_remove; + vmdev->ops.open = vfio_mdev_pci_open; + vmdev->ops.release = vfio_mdev_pci_release; + vmdev->ops.read = vfio_mdev_pci_read; + vmdev->ops.write = vfio_mdev_pci_write; + vmdev->ops.mmap = vfio_mdev_pci_mmap; + vmdev->ops.ioctl = vfio_mdev_pci_ioctl; + ops = &vmdev->ops; + + vdev = &vmdev->vdev; vdev->pdev = pdev; vdev->irq_type = VFIO_PCI_NUM_IRQS; mutex_init(&vdev->igate); @@ -293,7 +310,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, #endif vdev->disable_idle_d3 = disable_idle_d3; - pci_set_drvdata(pdev, vdev); + pci_set_drvdata(pdev, vmdev); ret = vfio_pci_reflck_attach(vdev); if (ret) { @@ -324,7 +341,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, vfio_pci_set_power_state(vdev, PCI_D3hot); } - ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops); + ret = mdev_register_device(&pdev->dev, ops); if (ret) pr_err("Cannot register mdev for device %s\n", dev_name(&pdev->dev)); @@ -336,12 +353,17 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev) { + struct vfio_mdev_pci_device *vmdev; struct vfio_pci_device *vdev; - vdev = pci_get_drvdata(pdev); - if (!vdev) + mdev_unregister_device(&pdev->dev); + + vmdev = pci_get_drvdata(pdev); + if (!vmdev) return; + vdev = &vmdev->vdev; + vfio_pci_reflck_put(vdev->reflck); kfree(vdev->region); @@ -359,7 +381,8 @@ static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev) VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); } - kfree(vdev); + kfree(vmdev->attr.name); + kfree(vmdev); } static struct pci_driver vfio_mdev_pci_driver = {