From patchwork Tue Sep 15 23:27:42 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778349
Subject: [PATCH v3 01/18] irqchip: Add IMS (Interrupt Message Storage) driver
From: Dave Jiang
To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com,
    tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com,
    ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com,
    kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com,
    jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com,
    eric.auger@redhat.com, parav@mellanox.com, rafael@kernel.org,
    netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com,
    pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com
Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org,
    linux-pci@vger.kernel.org, kvm@vger.kernel.org
Date: Tue, 15 Sep 2020 16:27:42 -0700
Message-ID: <160021246221.67751.16280230469654363209.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>
References: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>

From: Thomas Gleixner

Generic IMS irq chips and irq domain implementations for IMS based devices
which store the interrupt messages in an array in device memory.

Allocation and freeing of interrupts happens via the generic
msi_domain_alloc/free_irqs() interface. No special purpose IMS magic is
required as long as the interrupt domain is stored in the underlying
device struct.
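As a usage sketch (not part of the patch): a device driver would create the
domain once, stash it in the subdevice's struct device, and then use the
generic MSI interfaces. Everything except the two interfaces named above is
a hypothetical placeholder here:

    struct ims_array_info ims_info = {
        .slots     = ims_regs,  /* __iomem pointer to the slot array (device specific) */
        .max_slots = 128,       /* device specific */
    };
    struct irq_domain *domain;

    domain = pci_ims_array_create_msi_irq_domain(pdev, &ims_info);

    /* Store the domain in the subdevice, then allocate its vectors */
    dev_set_msi_domain(&subdev->dev, domain);
    rc = msi_domain_alloc_irqs(domain, &subdev->dev, nvec);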
[Megha : Fixed compile time errors
         Added necessary dependencies to IMS_MSI_ARRAY config
         Fixed polarity of IMS_VECTOR_CTRL
         Added missing read after write in write_msg
         Tested the IMS infrastructure with the IDXD driver]

Reviewed-by: Ashok Raj
Signed-off-by: Thomas Gleixner
Signed-off-by: Megha Dey
Signed-off-by: Dave Jiang
---
 drivers/irqchip/Kconfig             |   14 ++
 drivers/irqchip/Makefile            |    1 
 drivers/irqchip/irq-ims-msi.c       |  181 +++++++++++++++++++++++++++++++++++
 include/linux/irqchip/irq-ims-msi.h |   45 +++++++++
 4 files changed, 241 insertions(+)
 create mode 100644 drivers/irqchip/irq-ims-msi.c
 create mode 100644 include/linux/irqchip/irq-ims-msi.h

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index bfc9719dbcdc..76479e9b1b4c 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -571,4 +571,18 @@ config LOONGSON_PCH_MSI
 	help
 	  Support for the Loongson PCH MSI Controller.
 
+config IMS_MSI
+	depends on PCI
+	select DEVICE_MSI
+	bool
+
+config IMS_MSI_ARRAY
+	bool "IMS Interrupt Message Storage MSI controller for device memory storage arrays"
+	depends on PCI
+	select IMS_MSI
+	select GENERIC_MSI_IRQ_DOMAIN
+	help
+	  Support for IMS Interrupt Message Storage MSI controller
+	  with IMS slot storage in a slot array in device memory
+
 endmenu

diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 133f9c45744a..5ccd8336f8e3 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -111,3 +111,4 @@ obj-$(CONFIG_LOONGSON_HTPIC) += irq-loongson-htpic.o
 obj-$(CONFIG_LOONGSON_HTVEC)	+= irq-loongson-htvec.o
 obj-$(CONFIG_LOONGSON_PCH_PIC)	+= irq-loongson-pch-pic.o
 obj-$(CONFIG_LOONGSON_PCH_MSI)	+= irq-loongson-pch-msi.o
+obj-$(CONFIG_IMS_MSI)		+= irq-ims-msi.o

diff --git a/drivers/irqchip/irq-ims-msi.c b/drivers/irqchip/irq-ims-msi.c
new file mode 100644
index 000000000000..581febf2bca0
--- /dev/null
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -0,0 +1,181 @@
+// SPDX-License-Identifier: GPL-2.0
+// (C) Copyright 2020 Thomas Gleixner
+/*
+ * Shared interrupt chips and irq domains for IMS devices
+ */
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#ifdef CONFIG_IMS_MSI_ARRAY
+
+struct ims_array_data {
+	struct ims_array_info	info;
+	unsigned long		map[0];
+};
+
+static void ims_array_mask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_MASK, ctrl);
+	ioread32(ctrl); /* Flush write to device */
+}
+
+static void ims_array_unmask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_MASK, ctrl);
+}
+
+static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+
+	iowrite32(msg->address_lo, &slot->address_lo);
+	iowrite32(msg->address_hi, &slot->address_hi);
+	iowrite32(msg->data, &slot->data);
+	ioread32(slot);
+}
+
+static const struct irq_chip ims_array_msi_controller = {
+	.name			= "IMS",
+	.irq_mask		= ims_array_mask_irq,
+	.irq_unmask		= ims_array_unmask_irq,
+	.irq_write_msi_msg	= ims_array_write_msi_msg,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void ims_array_reset_slot(struct ims_slot __iomem *slot)
+{
+	iowrite32(0, &slot->address_lo);
+	iowrite32(0, &slot->address_hi);
+	iowrite32(0, &slot->data);
+	iowrite32(0, &slot->ctrl);
+}
+
+static void ims_array_free_msi_store(struct irq_domain *domain,
+				     struct device *dev)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_array_data *ims = info->data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		if (entry->device_msi.priv_iomem) {
+			clear_bit(entry->device_msi.hwirq, ims->map);
+			ims_array_reset_slot(entry->device_msi.priv_iomem);
+			entry->device_msi.priv_iomem = NULL;
+			entry->device_msi.hwirq = 0;
+		}
+	}
+}
+
+static int ims_array_alloc_msi_store(struct irq_domain *domain,
+				     struct device *dev, int nvec)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_array_data *ims = info->data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		unsigned int idx;
+
+		idx = find_first_zero_bit(ims->map, ims->info.max_slots);
+		if (idx >= ims->info.max_slots)
+			goto fail;
+		set_bit(idx, ims->map);
+		entry->device_msi.priv_iomem = &ims->info.slots[idx];
+		entry->device_msi.hwirq = idx;
+	}
+	return 0;
+
+fail:
+	ims_array_free_msi_store(domain, dev);
+	return -ENOSPC;
+}
+
+struct ims_array_domain_template {
+	struct msi_domain_ops	ops;
+	struct msi_domain_info	info;
+};
+
+static const struct ims_array_domain_template ims_array_domain_template = {
+	.ops = {
+		.msi_alloc_store	= ims_array_alloc_msi_store,
+		.msi_free_store		= ims_array_free_msi_store,
+	},
+	.info = {
+		.flags		= MSI_FLAG_USE_DEF_DOM_OPS |
+				  MSI_FLAG_USE_DEF_CHIP_OPS,
+		.handler	= handle_edge_irq,
+		.handler_name	= "edge",
+	},
+};
+
+struct irq_domain *
+pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+				    struct ims_array_info *ims_info)
+{
+	struct ims_array_domain_template *info;
+	struct ims_array_data *data;
+	struct irq_domain *domain;
+	struct irq_chip *chip;
+	unsigned int size;
+
+	/* Allocate new domain storage */
+	info = kmemdup(&ims_array_domain_template,
+		       sizeof(ims_array_domain_template), GFP_KERNEL);
+	if (!info)
+		return NULL;
+	/* Link the ops */
+	info->info.ops = &info->ops;
+
+	/* Allocate ims_info along with the bitmap */
+	size = sizeof(*data);
+	size += BITS_TO_LONGS(ims_info->max_slots) * sizeof(unsigned long);
+	data = kzalloc(size, GFP_KERNEL);
+	if (!data)
+		goto err_info;
+
+	data->info = *ims_info;
+	info->info.data = data;
+
+	/*
+	 * Allocate an interrupt chip because the core needs to be able to
+	 * update it with default callbacks.
+	 */
+	chip = kmemdup(&ims_array_msi_controller,
+		       sizeof(ims_array_msi_controller), GFP_KERNEL);
+	if (!chip)
+		goto err_data;
+	info->info.chip = chip;
+
+	domain = pci_subdevice_msi_create_irq_domain(pdev, &info->info);
+	if (!domain)
+		goto err_chip;
+
+	return domain;
+
+err_chip:
+	kfree(chip);
+err_data:
+	kfree(data);
+err_info:
+	kfree(info);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_ims_array_create_msi_irq_domain);
+
+#endif /* CONFIG_IMS_MSI_ARRAY */

diff --git a/include/linux/irqchip/irq-ims-msi.h b/include/linux/irqchip/irq-ims-msi.h
new file mode 100644
index 000000000000..2982ef3c0ed4
--- /dev/null
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Copyright 2020 Thomas Gleixner */
+
+#ifndef _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+#define _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+
+#include
+
+/**
+ * struct ims_slot - The hardware layout of an IMS based MSI message
+ * @address_lo:	Lower 32bit address
+ * @address_hi:	Upper 32bit address
+ * @data:	Message data
+ * @ctrl:	Control word
+ *
+ * This structure is used by both the device memory array and the queue
+ * memory variants of IMS.
+ */
+struct ims_slot {
+	u32	address_lo;
+	u32	address_hi;
+	u32	data;
+	u32	ctrl;
+} __packed;
+
+/* Bit to mask the interrupt in ims_slot::ctrl */
+#define IMS_VECTOR_CTRL_MASK	0x01
+
+/**
+ * struct ims_array_info - Information to create an IMS array domain
+ * @slots:	Pointer to the start of the array
+ * @max_slots:	Maximum number of slots in the array
+ */
+struct ims_array_info {
+	struct ims_slot	__iomem	*slots;
+	unsigned int		max_slots;
+};
+
+struct pci_dev;
+struct irq_domain;
+
+struct irq_domain *pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+						       struct ims_array_info *ims_info);
+
+#endif
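For illustration (not part of the patch): each 16-byte slot defined above is
programmed with plain MMIO writes plus a read-back flush, mirroring what the
irq chip callbacks do. Slot base and message values here are hypothetical:

    struct ims_slot __iomem *slot = &slots[idx];	/* 16 bytes per slot */

    iowrite32(lower_32_bits(msi_addr), &slot->address_lo);
    iowrite32(upper_32_bits(msi_addr), &slot->address_hi);
    iowrite32(msi_data, &slot->data);
    iowrite32(IMS_VECTOR_CTRL_MASK, &slot->ctrl);	/* keep masked until unmask */
    ioread32(&slot->ctrl);				/* flush posted writes */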
From patchwork Tue Sep 15 23:27:49 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778399
Subject: [PATCH v3 02/18] iommu/vt-d: Add DEV-MSI support
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:27:49 -0700
Message-ID: <160021246905.67751.1674517279122764758.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>
References: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>

From: Megha Dey

Add required support in the interrupt remapping driver for devices
which generate dev-msi interrupts and use the intel remapping domain
as the parent domain.

Signed-off-by: Megha Dey
Signed-off-by: Dave Jiang
Reviewed-by: Ashok Raj
---
 drivers/iommu/intel/irq_remapping.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..75e388263b78 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1303,9 +1303,10 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 	case X86_IRQ_ALLOC_TYPE_HPET:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
+	case X86_IRQ_ALLOC_TYPE_DEV_MSI:
 		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
 			set_hpet_sid(irte, info->devid);
-		else
+		else if (info->type != X86_IRQ_ALLOC_TYPE_DEV_MSI)
 			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
 		msg->address_hi = MSI_ADDR_BASE_HI;
@@ -1358,7 +1359,8 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
 	if (!info || !iommu)
 		return -EINVAL;
 	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
-	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
+	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX &&
+	    info->type != X86_IRQ_ALLOC_TYPE_DEV_MSI)
 		return -EINVAL;
 
 	/*
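To make the intent of the first hunk explicit, the source-id selection after
this change behaves as follows (an equivalent rewrite for readability only,
not part of the patch; the message composition code after it is shared by
all three types):

    switch (info->type) {
    case X86_IRQ_ALLOC_TYPE_HPET:
        set_hpet_sid(irte, info->devid);
        break;
    case X86_IRQ_ALLOC_TYPE_PCI_MSI:
    case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
        set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
        break;
    case X86_IRQ_ALLOC_TYPE_DEV_MSI:
        /* No PCI device behind a dev-msi allocation, so no SID is set */
        break;
    }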
From patchwork Tue Sep 15 23:28:09 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778357
Subject: [PATCH v3 05/18] dmaengine: idxd: add IMS support in base driver
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:28:09 -0700
Message-ID: <160021248979.67751.3799965857372703876.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>
References: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>

In preparation for supporting VFIO mediated devices in the idxd driver, add
enabling for Interrupt Message Store (IMS) interrupts to the base driver.
With IMS support the idxd driver can dynamically allocate interrupts on a
per-mdev basis, based on how many IMS vectors are mapped to the mdev device.
This commit only provides the support functions in the base driver, not the
VFIO mdev code that uses them.

The commit also has some portal related changes. A "portal" is a special
location within the MMIO BAR2 of the DSA device where descriptors are
submitted via the CPU instruction MOVDIR64B or ENQCMD(S). The offset of the
portal address determines whether the submitted descriptor is for MSI-X or
IMS notification.

See the Intel SIOV spec for more details:
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
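Concretely, the portal helpers changed below lay out the four portal pages
of each WQ like this (values derived from the offset math in this patch,
assuming 4KB pages; each WQ owns a 16KB slice of BAR2 selected by wq_id):

    idxd_get_wq_portal_full_offset(0, IDXD_PORTAL_UNLIMITED, IDXD_IRQ_MSIX) == 0x0000
    idxd_get_wq_portal_full_offset(0, IDXD_PORTAL_LIMITED,   IDXD_IRQ_MSIX) == 0x1000
    idxd_get_wq_portal_full_offset(0, IDXD_PORTAL_UNLIMITED, IDXD_IRQ_IMS)  == 0x2000
    idxd_get_wq_portal_full_offset(0, IDXD_PORTAL_LIMITED,   IDXD_IRQ_IMS)  == 0x3000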
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/cdev.c   |    4 ++-
 drivers/dma/idxd/idxd.h   |   14 +++++++++---
 drivers/dma/idxd/init.c   |   52 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/submit.c |   11 ++++++++--
 drivers/dma/idxd/sysfs.c  |   11 ++++++++++
 5 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index c3976156db2f..06fed39ff17f 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -157,8 +157,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
 		return rc;
 
 	vma->vm_flags |= VM_DONTCOPY;
-	pfn = (base + idxd_get_wq_portal_full_offset(wq->id,
-				IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT;
+	pfn = (base + idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED,
+						     IDXD_IRQ_MSIX)) >> PAGE_SHIFT;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_private_data = ctx;

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 2cd541c5faab..acc444ad9db6 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -146,6 +146,7 @@ enum idxd_device_state {
 enum idxd_device_flag {
 	IDXD_FLAG_CONFIGURABLE = 0,
 	IDXD_FLAG_CMD_RUNNING,
+	IDXD_FLAG_SIOV_SUPPORTED,
 };
 
 struct idxd_device {
@@ -170,6 +171,7 @@ struct idxd_device {
 
 	int num_groups;
 
+	u32 ims_offset;
 	u32 msix_perm_offset;
 	u32 wqcfg_offset;
 	u32 grpcfg_offset;
@@ -177,6 +179,7 @@ struct idxd_device {
 
 	u64 max_xfer_bytes;
 	u32 max_batch_size;
+	int ims_size;
 	int max_groups;
 	int max_engines;
 	int max_tokens;
@@ -196,6 +199,7 @@ struct idxd_device {
 	struct work_struct work;
 
 	int *int_handles;
+	struct sbitmap ims_sbmap;
 };
 
 /* IDXD software descriptor */
@@ -233,15 +237,17 @@ enum idxd_interrupt_type {
 	IDXD_IRQ_IMS,
 };
 
-static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot)
+static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
+					    enum idxd_interrupt_type irq_type)
 {
-	return prot * 0x1000;
+	return prot * 0x1000 + irq_type * 0x2000;
 }
 
 static inline int idxd_get_wq_portal_full_offset(int wq_id,
-						 enum idxd_portal_prot prot)
+						 enum idxd_portal_prot prot,
+						 enum idxd_interrupt_type irq_type)
 {
-	return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot);
+	return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot, irq_type);
 }
 
 static inline void idxd_set_type(struct idxd_device *idxd)

diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index fb8b1001d35a..7d08f2b92de7 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -231,10 +231,51 @@ static void idxd_read_table_offsets(struct idxd_device *idxd)
 	idxd->msix_perm_offset = offsets.msix_perm * 0x100;
 	dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n",
 		idxd->msix_perm_offset);
+	idxd->ims_offset = offsets.ims * 0x100;
+	dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset);
 	idxd->perfmon_offset = offsets.perfmon * 0x100;
 	dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset);
 }
 
+#define PCI_DEVSEC_CAP		0x23
+#define SIOVDVSEC1(offset)	((offset) + 0x4)
+#define SIOVDVSEC2(offset)	((offset) + 0x8)
+#define DVSECID			0x5
+#define SIOVCAP(offset)		((offset) + 0x14)
+
+static void idxd_check_siov(struct idxd_device *idxd)
+{
+	struct pci_dev *pdev = idxd->pdev;
+	struct device *dev = &pdev->dev;
+	int dvsec;
+	u16 val16;
+	u32 val32;
+
+	dvsec = pci_find_ext_capability(pdev, PCI_DEVSEC_CAP);
+	pci_read_config_word(pdev, SIOVDVSEC1(dvsec), &val16);
+	if (val16 != PCI_VENDOR_ID_INTEL) {
+		dev_dbg(&pdev->dev, "DVSEC vendor id is not Intel\n");
+		return;
+	}
+
+	pci_read_config_word(pdev, SIOVDVSEC2(dvsec), &val16);
+	if (val16 != DVSECID) {
+		dev_dbg(&pdev->dev, "DVSEC ID is not SIOV\n");
+		return;
+	}
+
+	pci_read_config_dword(pdev, SIOVCAP(dvsec), &val32);
+	if ((val32 & 0x1) && idxd->hw.gen_cap.max_ims_mult) {
+		idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL;
+		dev_dbg(dev, "IMS size: %u\n", idxd->ims_size);
+		set_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags);
+		dev_dbg(&pdev->dev, "IMS supported for device\n");
+		return;
+	}
+
+	dev_dbg(&pdev->dev, "SIOV unsupported for device\n");
+}
+
 static void idxd_read_caps(struct idxd_device *idxd)
 {
 	struct device *dev = &idxd->pdev->dev;
@@ -253,6 +294,7 @@ static void idxd_read_caps(struct idxd_device *idxd)
 	dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes);
 	idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift;
 	dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size);
+	idxd_check_siov(idxd);
 	if (idxd->hw.gen_cap.config_en)
 		set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags);
@@ -347,9 +389,19 @@ static int idxd_probe(struct idxd_device *idxd)
 
 	idxd->major = idxd_cdev_get_major(idxd);
 
+	if (idxd->ims_size) {
+		rc = sbitmap_init_node(&idxd->ims_sbmap, idxd->ims_size, -1,
+				       GFP_KERNEL, dev_to_node(dev));
+		if (rc < 0)
+			goto sbitmap_fail;
+	}
 	dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
 	return 0;
 
+ sbitmap_fail:
+	mutex_lock(&idxd_idr_lock);
+	idr_remove(&idxd_idrs[idxd->type], idxd->id);
+	mutex_unlock(&idxd_idr_lock);
  err_idr_fail:
 	idxd_mask_error_interrupts(idxd);
 	idxd_mask_msix_vectors(idxd);

diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c
index 2b3c2f132af7..8dd72cf3c838 100644
--- a/drivers/dma/idxd/submit.c
+++ b/drivers/dma/idxd/submit.c
@@ -27,7 +27,13 @@ static struct idxd_desc *__get_desc(struct idxd_wq *wq, int idx, int cpu)
 		desc->hw->int_handle = wq->vec_ptr;
 	} else {
 		desc->vec_ptr = wq->vec_ptr;
-		desc->hw->int_handle = idxd->int_handles[desc->vec_ptr];
+		/*
+		 * int_handles are only for descriptor completion. However for
+		 * device MSIX enumeration, vec 0 is used for misc interrupts.
+		 * Therefore even though we are rotating through 1...N for
+		 * descriptor interrupts, we need to acquire the int_handles
+		 * from 0..N-1.
+		 */
+		desc->hw->int_handle = idxd->int_handles[desc->vec_ptr - 1];
 	}
 
 	return desc;
@@ -87,7 +93,8 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc)
 	if (idxd->state != IDXD_DEV_ENABLED)
 		return -EIO;
 
-	portal = wq->dportal + idxd_get_wq_portal_offset(IDXD_PORTAL_UNLIMITED);
+	portal = wq->dportal + idxd_get_wq_portal_offset(IDXD_PORTAL_LIMITED,
+							 IDXD_IRQ_MSIX);
 
 	/*
 	 * The wmb() flushes writes to coherent DMA data before possibly
 	 * triggering a DMA read. The wmb() is necessary even on UP because

diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 00d2347399bb..4914316ae493 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1237,6 +1237,16 @@ static ssize_t numa_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t ims_size_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct idxd_device *idxd =
+		container_of(dev, struct idxd_device, conf_dev);
+
+	return sprintf(buf, "%u\n", idxd->ims_size);
+}
+static DEVICE_ATTR_RO(ims_size);
+
 static ssize_t max_batch_size_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -1422,6 +1432,7 @@ static struct attribute *idxd_device_attributes[] = {
 	&dev_attr_max_work_queues_size.attr,
 	&dev_attr_max_engines.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_ims_size.attr,
 	&dev_attr_max_batch_size.attr,
 	&dev_attr_max_transfer_size.attr,
 	&dev_attr_op_cap.attr,
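As a usage sketch (not part of this patch): a later consumer such as the mdev
code could hand out per-mdev IMS indices from the ims_sbmap initialized above
roughly like this; both helpers are hypothetical:

    static int idxd_ims_index_get(struct idxd_device *idxd)
    {
        /* sbitmap_get() returns -1 when all IMS slots are taken */
        return sbitmap_get(&idxd->ims_sbmap, 0, false);
    }

    static void idxd_ims_index_put(struct idxd_device *idxd, int idx)
    {
        sbitmap_clear_bit(&idxd->ims_sbmap, idx);
    }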
From patchwork Tue Sep 15 23:28:24 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778363
Subject: [PATCH v3 07/18] dmaengine: idxd: add basic mdev registration and
 helper functions
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:28:24 -0700
Message-ID: <160021250454.67751.3119489448651243709.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>
References: <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>

Create a mediated device through the VFIO mediated device framework. The
mdev framework allows creation of a mediated device by the driver with a
portion of the device's resources. The driver will emulate the slow path,
such as the PCI config space, MMIO bar, and the command registers. The
descriptor submission portal(s) will be mmapped to the guest in order to
submit descriptors directly by the guest kernel or apps. The mediated
device support code in idxd will be referred to as the Virtual Device
Composition Module (vdcm). Add the basic plumbing to fill out the
mdev_parent_ops struct that VFIO mdev requires to support a mediated
device.
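For orientation (not part of the patch): the single mdev type array declared
below, idxd_mdev_types[], is left unpopulated here; a later patch would fill
it in along these lines, with all field values assumed for illustration:

    static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = {
        {
            .name = "1dwq",                          /* assumed type name */
            .description = "1 dedicated workqueue",  /* assumed */
            .type = IDXD_MDEV_TYPE_1_DWQ,
            .avail_instance = 1,
        },
    };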
Signed-off-by: Dave Jiang
---
 drivers/dma/Kconfig       |    7 
 drivers/dma/idxd/Makefile |    2 
 drivers/dma/idxd/idxd.h   |   11 +
 drivers/dma/idxd/init.c   |   11 +
 drivers/dma/idxd/mdev.c   |  957 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/mdev.h   |  118 ++++++
 drivers/dma/idxd/vdev.c   |   87 ++++
 drivers/dma/idxd/vdev.h   |   21 +
 8 files changed, 1214 insertions(+)
 create mode 100644 drivers/dma/idxd/mdev.c
 create mode 100644 drivers/dma/idxd/mdev.h
 create mode 100644 drivers/dma/idxd/vdev.c
 create mode 100644 drivers/dma/idxd/vdev.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 518a1437862a..a39392157dc2 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -296,6 +296,13 @@ config INTEL_IDXD
 
 	  If unsure, say N.
 
+config INTEL_IDXD_MDEV
+	bool "IDXD VFIO Mediated Device Support"
+	depends on INTEL_IDXD
+	depends on VFIO_MDEV
+	depends on VFIO_MDEV_DEVICE
+	depends on IRQ_BYPASS_MANAGER
+
 config INTEL_IOATDMA
 	tristate "Intel I/OAT DMA support"
 	depends on PCI && X86_64

diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile
index 8978b898d777..30cad704a95a 100644
--- a/drivers/dma/idxd/Makefile
+++ b/drivers/dma/idxd/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_INTEL_IDXD) += idxd.o
 idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o
+
+idxd-$(CONFIG_INTEL_IDXD_MDEV) += mdev.o vdev.o

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 22d00e980908..649b389f07e6 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -116,6 +116,7 @@ struct idxd_wq {
 	char name[WQ_NAME_SIZE + 1];
 	u64 max_xfer_bytes;
 	u32 max_batch_size;
+	struct list_head vdcm_list;
 };
 
 struct idxd_engine {
@@ -147,6 +148,7 @@ enum idxd_device_flag {
 	IDXD_FLAG_CONFIGURABLE = 0,
 	IDXD_FLAG_CMD_RUNNING,
 	IDXD_FLAG_SIOV_SUPPORTED,
+	IDXD_FLAG_MDEV_ENABLED,
 };
 
 struct idxd_device {
@@ -227,6 +229,11 @@ static inline bool wq_dedicated(struct idxd_wq *wq)
 	return test_bit(WQ_FLAG_DEDICATED, &wq->flags);
 }
 
+static inline bool device_mdev_enabled(struct idxd_device *idxd)
+{
+	return test_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);
+}
+
 enum idxd_portal_prot {
 	IDXD_PORTAL_UNLIMITED = 0,
 	IDXD_PORTAL_LIMITED,
@@ -345,4 +352,8 @@ int idxd_cdev_get_major(struct idxd_device *idxd);
 int idxd_wq_add_cdev(struct idxd_wq *wq);
 void idxd_wq_del_cdev(struct idxd_wq *wq);
 
+/* mdev */
+int idxd_mdev_host_init(struct idxd_device *idxd);
+void idxd_mdev_host_release(struct idxd_device *idxd);
+
 #endif

diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 7d08f2b92de7..7c5bfa14c269 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -201,6 +201,7 @@ static int idxd_setup_internals(struct idxd_device *idxd)
 		wq->idxd_cdev.minor = -1;
 		wq->max_xfer_bytes = idxd->max_xfer_bytes;
 		wq->max_batch_size = idxd->max_batch_size;
+		INIT_LIST_HEAD(&wq->vdcm_list);
 	}
 
 	for (i = 0; i < idxd->max_engines; i++) {
@@ -462,6 +463,14 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return -ENODEV;
 	}
 
+	if (IS_ENABLED(CONFIG_INTEL_IDXD_MDEV)) {
+		rc = idxd_mdev_host_init(idxd);
+		if (rc < 0)
+			dev_warn(dev, "VFIO mdev not setup: %d\n", rc);
+		else
+			set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags);
+	}
+
 	rc = idxd_setup_sysfs(idxd);
 	if (rc) {
 		dev_err(dev, "IDXD sysfs setup failed\n");
@@ -536,6 +545,8 @@ static void idxd_remove(struct pci_dev *pdev)
 	dev_dbg(&pdev->dev, "%s called\n", __func__);
 	idxd_cleanup_sysfs(idxd);
 	idxd_shutdown(pdev);
+	if (IS_ENABLED(CONFIG_INTEL_IDXD_MDEV) && device_mdev_enabled(idxd))
+		idxd_mdev_host_release(idxd);
 	mutex_lock(&idxd_idr_lock);
 	idr_remove(&idxd_idrs[idxd->type], idxd->id);
 	mutex_unlock(&idxd_idr_lock);

diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c
new file mode 100644
index 000000000000..8219e23b80b2
--- /dev/null
+++ b/drivers/dma/idxd/mdev.c
@@ -0,0 +1,957 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "registers.h"
+#include "idxd.h"
+#include "../../vfio/pci/vfio_pci_private.h"
+#include "mdev.h"
+#include "vdev.h"
+
+static u64 idxd_pci_config[] = {
+	0x001000000b258086ULL,
+	0x0080000008800000ULL,
+	0x000000000000000cULL,
+	0x000000000000000cULL,
+	0x0000000000000000ULL,
+	0x2010808600000000ULL,
+	0x0000004000000000ULL,
+	0x000000ff00000000ULL,
+	0x0000060000015011ULL, /* MSI-X capability, hardcoded 2 entries, Encoded as N-1 */
+	0x0000070000000000ULL,
+	0x0000000000920010ULL, /* PCIe capability */
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+	0x0000000000000000ULL,
+};
+
+static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index,
+			      unsigned int start, unsigned int count, void *data);
+
+static inline void reset_vconfig(struct vdcm_idxd *vidxd)
+{
+	memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ);
+	memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config));
+}
+
+static inline void reset_vmmio(struct vdcm_idxd *vidxd)
+{
+	memset(&vidxd->bar0, 0, VIDXD_MAX_MMIO_SPACE_SZ);
+}
+
+static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
+{
+	struct idxd_wq *wq = vidxd->wq;
+
+	reset_vconfig(vidxd);
+	reset_vmmio(vidxd);
+
+	vidxd->bar_size[0] = VIDXD_BAR0_SIZE;
+	vidxd->bar_size[1] = VIDXD_BAR2_SIZE;
+
+	vidxd_mmio_init(vidxd);
+
+	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
+		idxd_wq_disable(wq);
+}
+
+static void idxd_vdcm_release(struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "vdcm_idxd_release %d\n", vidxd->type->type);
+	mutex_lock(&vidxd->dev_lock);
+	if (!vidxd->refcount)
+		goto out;
+
+	idxd_vdcm_set_irqs(vidxd, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+			   VFIO_PCI_MSIX_IRQ_INDEX, 0, 0, NULL);
+
+	vidxd_free_ims_entries(vidxd);
+
+	/* Re-initialize the VIDXD to a pristine state for re-use */
+	idxd_vdcm_init(vidxd);
+	vidxd->refcount--;
+
+ out:
+	mutex_unlock(&vidxd->dev_lock);
+}
+
+static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev,
+					   struct vdcm_idxd_type *type)
+{
+	struct vdcm_idxd *vidxd;
+	struct idxd_wq *wq = NULL;
+	int i;
+
+	/* PLACEHOLDER, wq matching comes later */
+
+	if (!wq)
+		return ERR_PTR(-ENODEV);
+
+	vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL);
+	if (!vidxd)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_init(&vidxd->dev_lock);
+	vidxd->idxd = idxd;
+	vidxd->vdev.mdev = mdev;
+	vidxd->wq = wq;
+	mdev_set_drvdata(mdev, vidxd);
+	vidxd->type = type;
+	vidxd->num_wqs = VIDXD_MAX_WQS;
+
+	for (i = 0; i < VIDXD_MAX_MSIX_VECS - 1; i++)
+		vidxd->ims_index[i] = -1;
+
+	idxd_vdcm_init(vidxd);
+	mutex_lock(&wq->wq_lock);
+	idxd_wq_get(wq);
+	mutex_unlock(&wq->wq_lock);
+
+	return vidxd;
+}
+
+static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES];
+
+static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev,
+							const char *name)
+{
+	int i;
+	char dev_name[IDXD_MDEV_NAME_LEN];
+
+	for (i = 0; i < IDXD_MDEV_TYPES; i++) {
+		snprintf(dev_name, IDXD_MDEV_NAME_LEN, "idxd-%s",
+			 idxd_mdev_types[i].name);
+
+		if (!strncmp(name, dev_name, IDXD_MDEV_NAME_LEN))
+			return &idxd_mdev_types[i];
+	}
+
+	return NULL;
+}
+
+static int idxd_vdcm_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd;
+	struct vdcm_idxd_type *type;
+	struct device *dev, *parent;
+	struct idxd_device *idxd;
+	struct idxd_wq *wq;
+
+	parent = mdev_parent_dev(mdev);
+	idxd = dev_get_drvdata(parent);
+	dev = mdev_dev(mdev);
+
+	mdev_set_iommu_device(dev, parent);
+	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+	if (!type) {
+		dev_err(dev, "failed to find type %s to create\n",
+			kobject_name(kobj));
+		return -EINVAL;
+	}
+
+	vidxd = vdcm_vidxd_create(idxd, mdev, type);
+	if (IS_ERR(vidxd)) {
+		dev_err(dev, "failed to create vidxd: %ld\n", PTR_ERR(vidxd));
+		return PTR_ERR(vidxd);
+	}
+
+	wq = vidxd->wq;
+	mutex_lock(&wq->wq_lock);
+	list_add(&vidxd->list, &wq->vdcm_list);
+	mutex_unlock(&wq->wq_lock);
+	dev_dbg(dev, "mdev creation success: %s\n", dev_name(mdev_dev(mdev)));
+
+	return 0;
+}
+
+static int idxd_vdcm_remove(struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	struct idxd_device *idxd = vidxd->idxd;
+	struct device *dev = &idxd->pdev->dev;
+	struct idxd_wq *wq = vidxd->wq;
+
+	dev_dbg(dev, "%s: removing for wq %d\n", __func__, vidxd->wq->id);
+
+	mutex_lock(&wq->wq_lock);
+	list_del(&vidxd->list);
+	idxd_wq_put(wq);
+	mutex_unlock(&wq->wq_lock);
+
+	kfree(vidxd);
+	return 0;
+}
+
+static int idxd_vdcm_open(struct mdev_device *mdev)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	int rc = 0;
+	struct vdcm_idxd_type *type = vidxd->type;
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "%s: type: %d\n", __func__, type->type);
+
+	mutex_lock(&vidxd->dev_lock);
+	if (vidxd->refcount)
+		goto out;
+
+	/* allocate and setup IMS entries */
+	rc = vidxd_setup_ims_entries(vidxd);
+	if (rc < 0)
+		goto out;
+
+	vidxd->refcount++;
+	mutex_unlock(&vidxd->dev_lock);
+
+	return rc;
+
+ out:
+	mutex_unlock(&vidxd->dev_lock);
+	return rc;
+}
+
+static ssize_t idxd_vdcm_rw(struct mdev_device *mdev, char *buf, size_t count, loff_t *ppos,
+			    enum idxd_vdcm_rw mode)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	u64 pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	struct device *dev = mdev_dev(mdev);
+	int rc = -EINVAL;
+
+	if (index >= VFIO_PCI_NUM_REGIONS) {
+		dev_err(dev, "invalid index: %u\n", index);
+		return -EINVAL;
+	}
+
+	switch (index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		if (mode == IDXD_VDCM_WRITE)
+			rc = vidxd_cfg_write(vidxd, pos, buf, count);
+		else
+			rc = vidxd_cfg_read(vidxd, pos, buf, count);
+		break;
+	case VFIO_PCI_BAR0_REGION_INDEX:
+	case VFIO_PCI_BAR1_REGION_INDEX:
+		if (mode == IDXD_VDCM_WRITE)
+			rc = vidxd_mmio_write(vidxd, vidxd->bar_val[0] + pos, buf, count);
+		else
+			rc = vidxd_mmio_read(vidxd, vidxd->bar_val[0] + pos, buf, count);
+		break;
+	case VFIO_PCI_BAR2_REGION_INDEX:
+	case VFIO_PCI_BAR3_REGION_INDEX:
+	case VFIO_PCI_BAR4_REGION_INDEX:
+	case VFIO_PCI_BAR5_REGION_INDEX:
+	case VFIO_PCI_VGA_REGION_INDEX:
+	case VFIO_PCI_ROM_REGION_INDEX:
+	default:
+		dev_err(dev, "unsupported region: %u\n", index);
+	}
+
+	return rc == 0 ? count : rc;
+}
+
+static ssize_t idxd_vdcm_read(struct mdev_device *mdev, char __user *buf, size_t count,
+			      loff_t *ppos)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned int done = 0;
+	int rc;
+
+	mutex_lock(&vidxd->dev_lock);
+	while (count) {
+		size_t filled;
+
+		if (count >= 4 && !(*ppos % 4)) {
+			u32 val;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val),
+					  ppos, IDXD_VDCM_READ);
+			if (rc <= 0)
+				goto read_err;
+
+			if (copy_to_user(buf, &val, sizeof(val)))
+				goto read_err;
+
+			filled = 4;
+		} else if (count >= 2 && !(*ppos % 2)) {
+			u16 val;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val),
+					  ppos, IDXD_VDCM_READ);
+			if (rc <= 0)
+				goto read_err;
+
+			if (copy_to_user(buf, &val, sizeof(val)))
+				goto read_err;
+
+			filled = 2;
+		} else {
+			u8 val;
+
+			rc = idxd_vdcm_rw(mdev, &val, sizeof(val), ppos,
+					  IDXD_VDCM_READ);
+			if (rc <= 0)
+				goto read_err;
+
+			if (copy_to_user(buf, &val, sizeof(val)))
+				goto read_err;
+
+			filled = 1;
+		}
+
+		count -= filled;
+		done += filled;
+		*ppos += filled;
+		buf += filled;
+	}
+
+	mutex_unlock(&vidxd->dev_lock);
+	return done;
+
+ read_err:
+	mutex_unlock(&vidxd->dev_lock);
+	return -EFAULT;
+}
+
+static ssize_t idxd_vdcm_write(struct mdev_device *mdev, const char __user *buf, size_t count,
+			       loff_t *ppos)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned int done = 0;
+	int rc;
+
+	mutex_lock(&vidxd->dev_lock);
+	while (count) {
+		size_t filled;
+
+		if (count >= 4 && !(*ppos % 4)) {
+			u32 val;
+
+			if (copy_from_user(&val, buf, sizeof(val)))
+				goto write_err;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val),
+					  ppos, IDXD_VDCM_WRITE);
+			if (rc <= 0)
+				goto write_err;
+
+			filled = 4;
+		} else if (count >= 2 && !(*ppos % 2)) {
+			u16 val;
+
+			if (copy_from_user(&val, buf, sizeof(val)))
+				goto write_err;
+
+			rc = idxd_vdcm_rw(mdev, (char *)&val,
+					  sizeof(val), ppos, IDXD_VDCM_WRITE);
+			if (rc <= 0)
+				goto write_err;
+
+			filled = 2;
+		} else {
+			u8 val;
+
+			if (copy_from_user(&val, buf, sizeof(val)))
+				goto write_err;
+
+			rc = idxd_vdcm_rw(mdev, &val, sizeof(val),
+					  ppos, IDXD_VDCM_WRITE);
+			if (rc <= 0)
+				goto write_err;
+
+			filled = 1;
+		}
+
+		count -= filled;
+		done += filled;
+		*ppos += filled;
+		buf += filled;
+	}
+
+	mutex_unlock(&vidxd->dev_lock);
+	return done;
+
+write_err:
+	mutex_unlock(&vidxd->dev_lock);
+	return -EFAULT;
+}
+
+static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma)
+{
+	if (vma->vm_end < vma->vm_start)
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
+{
+	unsigned int wq_idx, rc;
+	unsigned long req_size, pgoff = 0, offset;
+	pgprot_t pg_prot;
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	struct idxd_wq *wq = vidxd->wq;
+	struct idxd_device *idxd = vidxd->idxd;
+	enum idxd_portal_prot virt_portal, phys_portal;
+	phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR);
+	struct device *dev = mdev_dev(mdev);
+
+	rc = check_vma(wq, vma);
+	if (rc)
+		return rc;
+
+	pg_prot = vma->vm_page_prot;
+	req_size = vma->vm_end - vma->vm_start;
+	vma->vm_flags |= VM_DONTCOPY;
+
+	offset = (vma->vm_pgoff << PAGE_SHIFT) &
+		 ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1);
+
+	wq_idx = offset >> (PAGE_SHIFT + 2);
+	if (wq_idx >= 1) {
+		dev_err(dev, "mapping invalid wq %d off %lx\n",
+			wq_idx, offset);
+		return -EINVAL;
+	}
+
+	/*
+	 * Check and see if the guest wants to map to the limited or unlimited portal.
+	 * The driver will allow mapping to the unlimited portal only if the wq is a
+	 * dedicated wq. Otherwise, it goes to limited.
+	 */
+	virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
+	phys_portal = IDXD_PORTAL_LIMITED;
+	if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
+		phys_portal = IDXD_PORTAL_UNLIMITED;
+
+	/* We always map IMS portals to the guest */
+	pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
+						       IDXD_IRQ_IMS)) >> PAGE_SHIFT;
+
+	dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
+		pgprot_val(pg_prot));
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vma->vm_private_data = mdev;
+	vma->vm_pgoff = pgoff;
+
+	return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot);
+}
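Worked example for the offset decoding above (editorial note, not part of
the patch; assumes 4KB pages): a guest mmap of vBAR2 page 0 on a dedicated
WQ decodes as

    offset      = 0x0
    wq_idx      = offset >> (PAGE_SHIFT + 2)          -> 0 (only WQ 0 exists)
    virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1 -> 0, i.e. unlimited requested
    phys_portal = IDXD_PORTAL_UNLIMITED                  (wq is dedicated)

and the page is backed by idxd_get_wq_portal_full_offset(wq->id,
IDXD_PORTAL_UNLIMITED, IDXD_IRQ_IMS), so the guest always lands on an IMS
portal, never on a host MSI-X portal.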
+static int idxd_vdcm_get_irq_count(struct vdcm_idxd *vidxd, int type)
+{
+	/*
+	 * Even though the number of MSIX vectors supported is not tied to the
+	 * number of wqs being exported, the current design is to allow 1
+	 * vector per WQ for the guest. So here we end up with the number of
+	 * wqs plus 1 vector that handles the misc interrupts.
+	 */
+	if (type == VFIO_PCI_MSI_IRQ_INDEX || type == VFIO_PCI_MSIX_IRQ_INDEX)
+		return VIDXD_MAX_MSIX_VECS;
+
+	return 0;
+}
+
+static irqreturn_t idxd_guest_wq_completion(int irq, void *data)
+{
+	struct ims_irq_entry *irq_entry = data;
+	struct vdcm_idxd *vidxd = irq_entry->vidxd;
+	int msix_idx = irq_entry->int_src;
+
+	vidxd_send_interrupt(vidxd, msix_idx + 1);
+	return IRQ_HANDLED;
+}
+
+static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct ims_irq_entry *irq_entry;
+	int rc;
+
+	if (!vidxd->vdev.msix_trigger[index])
+		return 0;
+
+	dev_dbg(dev, "disable MSIX trigger %d\n", index);
+	if (index) {
+		struct irq_bypass_producer *producer;
+
+		producer = &vidxd->vdev.producer[index];
+		irq_bypass_unregister_producer(producer);
+
+		irq_entry = &vidxd->irq_entries[index - 1];
+		if (irq_entry->irq_set) {
+			free_irq(irq_entry->irq, irq_entry);
+			irq_entry->irq_set = false;
+		}
+		rc = vidxd_disable_host_ims_pasid(vidxd, index - 1);
+		if (rc)
+			return rc;
+	}
+	eventfd_ctx_put(vidxd->vdev.msix_trigger[index]);
+	vidxd->vdev.msix_trigger[index] = NULL;
+
+	return 0;
+}
+
+static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct ims_irq_entry *irq_entry;
+	struct eventfd_ctx *trigger;
+	int rc;
+
+	/* No need to do anything if trigger already created */
+	if (vidxd->vdev.msix_trigger[index])
+		return 0;
+
+	dev_dbg(dev, "enable MSIX trigger %d\n", index);
+	trigger = eventfd_ctx_fdget(fd);
+	if (IS_ERR(trigger)) {
+		dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index);
+		return PTR_ERR(trigger);
+	}
+
+	/*
+	 * The MSIX vector 0 is emulated by the mdev. Starting with vector 1
+	 * the interrupt is backed by IMS and needs to be set up, but we
+	 * will be setting up entry 0 of the IMS vectors. So here we pass
+	 * in i - 1 to the host setup and irq_entries.
+	 */
+	if (index) {
+		struct irq_bypass_producer *producer;
+
+		producer = &vidxd->vdev.producer[index];
+		irq_entry = &vidxd->irq_entries[index - 1];
+		rc = vidxd_enable_host_ims_pasid(vidxd, index - 1);
+		if (rc) {
+			dev_warn(dev, "failed to enable host ims pasid\n");
+			eventfd_ctx_put(trigger);
+			return rc;
+		}
+
+		rc = request_irq(irq_entry->irq, idxd_guest_wq_completion, 0, "idxd-ims", irq_entry);
+		if (rc) {
+			dev_warn(dev, "failed to request ims irq\n");
+			eventfd_ctx_put(trigger);
+			vidxd_disable_host_ims_pasid(vidxd, index - 1);
+			return rc;
+		}
+
+		producer->token = trigger;
+		producer->irq = irq_entry->irq;
+		rc = irq_bypass_register_producer(producer);
+		if (unlikely(rc))
+			dev_info(dev, "irq bypass producer (token %p) registration failed: %d\n",
+				 producer->token, rc);
+
+		irq_entry->irq_set = true;
+	}
+
+	vidxd->vdev.msix_trigger[index] = trigger;
+	return 0;
+}
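To summarize the index juggling in the two helpers above (editorial note,
not part of the patch): guest MSI-X vector 0 is fully emulated, and every
higher vector is backed by the IMS entry one below it:

    guest MSI-X vector 0 -> emulated, delivered via vidxd_send_interrupt()
    guest MSI-X vector 1 -> irq_entries[0] / host IMS entry 0
    guest MSI-X vector n -> irq_entries[n - 1] / host IMS entry n - 1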
+static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd,
+				      unsigned int index, unsigned int start,
+				      unsigned int count, uint32_t flags,
+				      void *data)
+{
+	int i, rc = 0;
+
+	if (count > VIDXD_MAX_MSIX_ENTRIES - 1)
+		count = VIDXD_MAX_MSIX_ENTRIES - 1;
+
+	/*
+	 * The MSIX vector 0 is emulated by the mdev. Starting with vector 1
+	 * the interrupt is backed by IMS and needs to be set up, but we
+	 * will be setting up entry 0 of the IMS vectors. So here we pass
+	 * in i - 1 to the host setup and irq_entries.
+	 */
+	if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) {
+		/* Disable all MSIX entries */
+		for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) {
+			rc = msix_trigger_unregister(vidxd, i);
+			if (rc < 0)
+				return rc;
+		}
+		return 0;
+	}
+
+	for (i = 0; i < count; i++) {
+		if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+			u32 fd = *(u32 *)(data + i * sizeof(u32));
+
+			rc = msix_trigger_register(vidxd, fd, i);
+			if (rc < 0)
+				return rc;
+		} else if (flags & VFIO_IRQ_SET_DATA_NONE) {
+			rc = msix_trigger_unregister(vidxd, i);
+			if (rc < 0)
+				return rc;
+		}
+	}
+	return rc;
+}
+
+static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags,
+			      unsigned int index, unsigned int start,
+			      unsigned int count, void *data)
+{
+	int (*func)(struct vdcm_idxd *vidxd, unsigned int index,
+		    unsigned int start, unsigned int count, uint32_t flags,
+		    void *data) = NULL;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	switch (index) {
+	case VFIO_PCI_INTX_IRQ_INDEX:
+		dev_warn(dev, "intx interrupts not supported.\n");
+		break;
+	case VFIO_PCI_MSI_IRQ_INDEX:
+		dev_dbg(dev, "msi interrupt.\n");
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_MASK:
+		case VFIO_IRQ_SET_ACTION_UNMASK:
+			break;
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			func = vdcm_idxd_set_msix_trigger;
+			break;
+		}
+		break;
+	case VFIO_PCI_MSIX_IRQ_INDEX:
+		switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
+		case VFIO_IRQ_SET_ACTION_MASK:
+		case VFIO_IRQ_SET_ACTION_UNMASK:
+			break;
+		case VFIO_IRQ_SET_ACTION_TRIGGER:
+			func = vdcm_idxd_set_msix_trigger;
+			break;
+		}
+		break;
+	default:
+		return -ENOTTY;
+	}
+
+	if (!func)
+		return -ENOTTY;
+
+	return func(vidxd, index, start, count, flags, data);
+}
+
+static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd)
+{
+	vidxd_reset(vidxd);
+}
+
+static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd,
+			    unsigned long arg)
+{
+	struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev);
+	unsigned long minsz;
+	int rc = -EINVAL;
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "vidxd %p ioctl, cmd: %d\n", vidxd, cmd);
+
+	mutex_lock(&vidxd->dev_lock);
+	if (cmd == VFIO_DEVICE_GET_INFO) {
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		info.flags = VFIO_DEVICE_FLAGS_PCI;
+		info.flags |= VFIO_DEVICE_FLAGS_RESET;
+		info.num_regions = VFIO_PCI_NUM_REGIONS;
+		info.num_irqs = VFIO_PCI_NUM_IRQS;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+		struct vfio_region_info info;
+		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+		struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
+		size_t size;
+		int nr_areas = 1;
+		int cap_type_id = 0;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		switch (info.index) {
+		case VFIO_PCI_CONFIG_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = VIDXD_MAX_CFG_SPACE_SZ;
+			info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			break;
+		case VFIO_PCI_BAR0_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = vidxd->bar_size[info.index];
+			if (!info.size) {
+				info.flags = 0;
+				break;
+			}
+
+			info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			break;
+		case VFIO_PCI_BAR1_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0;
+			info.flags = 0;
+			break;
+		case VFIO_PCI_BAR2_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.flags = VFIO_REGION_INFO_FLAG_CAPS | VFIO_REGION_INFO_FLAG_MMAP |
+				     VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			info.size = vidxd->bar_size[1];
+
+			/*
+			 * Every WQ has two areas for unlimited and limited
+			 * MSI-X portals. IMS portals are not reported.
+			 */
+			nr_areas = 2;
+
+			size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
+			sparse = kzalloc(size, GFP_KERNEL);
+			if (!sparse) {
+				rc = -ENOMEM;
+				goto out;
+			}
+
+			sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+			sparse->header.version = 1;
+			sparse->nr_areas = nr_areas;
+			cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+
+			sparse->areas[0].offset = 0;
+			sparse->areas[0].size = PAGE_SIZE;
+
+			sparse->areas[1].offset = PAGE_SIZE;
+			sparse->areas[1].size = PAGE_SIZE;
+			break;
+
+		case VFIO_PCI_BAR3_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0;
+			info.flags = 0;
+			dev_dbg(dev, "get region info bar:%d\n", info.index);
+			break;
+
+		case VFIO_PCI_ROM_REGION_INDEX:
+		case VFIO_PCI_VGA_REGION_INDEX:
+			dev_dbg(dev, "get region info index:%d\n", info.index);
+			break;
+		default: {
+			if (info.index >= VFIO_PCI_NUM_REGIONS)
+				rc = -EINVAL;
+			else
+				rc = 0;
+			goto out;
+		} /* default */
+		} /* info.index switch */
+
+		if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) {
+			if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) {
+				rc = vfio_info_add_capability(&caps, &sparse->header,
+							      sizeof(*sparse) + (sparse->nr_areas *
+							      sizeof(*sparse->areas)));
+				kfree(sparse);
+				if (rc)
+					goto out;
+			}
+		}
+
+		if (caps.size) {
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg + sizeof(info),
+						 caps.buf, caps.size)) {
+					kfree(caps.buf);
+					rc = -EFAULT;
+					goto out;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		switch (info.index) {
+		case VFIO_PCI_MSI_IRQ_INDEX:
+		case VFIO_PCI_MSIX_IRQ_INDEX:
+			break;
+		default:
+			rc = -EINVAL;
+			goto out;
+		} /* switch(info.index) */
+
+		info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
+		info.count = idxd_vdcm_get_irq_count(vidxd, info.index);
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
+		struct vfio_irq_set hdr;
+		u8 *data = NULL;
+		size_t data_size = 0;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
+			int max = idxd_vdcm_get_irq_count(vidxd, hdr.index);
+
+			rc = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS,
+								&data_size);
+			if (rc) {
+				dev_err(dev, "intel:vfio_set_irqs_validate_and_prepare failed\n");
+				rc = -EINVAL;
+				goto out;
+			}
+			if (data_size) {
+				data = memdup_user((void __user *)(arg + minsz), data_size);
+				if (IS_ERR(data)) {
+					rc = PTR_ERR(data);
+					goto out;
+				}
+			}
+		}
+
+		if (!data) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, hdr.start, hdr.count, data);
+		kfree(data);
+		goto out;
+	} else if (cmd == VFIO_DEVICE_RESET) {
+		vidxd_vdcm_reset(vidxd);
+	}
+
+ out:
+	mutex_unlock(&vidxd->dev_lock);
+	return rc;
+}
+
+static const struct mdev_parent_ops idxd_vdcm_ops = {
+	.create			= idxd_vdcm_create,
+	.remove			= idxd_vdcm_remove,
+	.open			= idxd_vdcm_open,
+	.release		= idxd_vdcm_release,
+	.read			= idxd_vdcm_read,
+	.write			= idxd_vdcm_write,
+	.mmap			= idxd_vdcm_mmap,
+	.ioctl			= idxd_vdcm_ioctl,
+};
+
+int idxd_mdev_host_init(struct idxd_device *idxd)
+{
+	struct device *dev = &idxd->pdev->dev;
+	int rc;
+
+	if (!test_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags))
+		return -EOPNOTSUPP;
+
+	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+		rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
+		if (rc < 0) {
+			dev_warn(dev, "Failed to enable aux-domain: %d\n", rc);
+			return rc;
+		}
+	} else {
+		dev_warn(dev, "No aux-domain feature.\n");
+		return -EOPNOTSUPP;
+	}
+
+	return mdev_register_device(dev, &idxd_vdcm_ops);
+}
+
+void idxd_mdev_host_release(struct idxd_device *idxd)
+{
+	struct device *dev = &idxd->pdev->dev;
+	int rc;
+
+	mdev_unregister_device(dev);
+	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
+		rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
+		if (rc < 0)
+			dev_warn(dev, "Failed to disable aux-domain: %d\n",
+				 rc);
+	}
+}
*/ + +#ifndef _IDXD_MDEV_H_ +#define _IDXD_MDEV_H_ + +/* two 64-bit BARs implemented */ +#define VIDXD_MAX_BARS 2 +#define VIDXD_MAX_CFG_SPACE_SZ 4096 +#define VIDXD_MAX_MMIO_SPACE_SZ 8192 +#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42 +#define VIDXD_CAP_CTRL_SZ 0x100 +#define VIDXD_GRP_CTRL_SZ 0x100 +#define VIDXD_WQ_CTRL_SZ 0x100 +#define VIDXD_WQ_OCPY_INT_SZ 0x20 +#define VIDXD_MSIX_TBL_SZ 0x90 +#define VIDXD_MSIX_PERM_TBL_SZ 0x48 + +#define VIDXD_MSIX_TABLE_OFFSET 0x600 +#define VIDXD_MSIX_PERM_OFFSET 0x300 +#define VIDXD_GRPCFG_OFFSET 0x400 +#define VIDXD_WQCFG_OFFSET 0x500 +#define VIDXD_IMS_OFFSET 0x1000 + +#define VIDXD_BAR0_SIZE 0x2000 +#define VIDXD_BAR2_SIZE 0x20000 +#define VIDXD_MAX_MSIX_ENTRIES (VIDXD_MSIX_TBL_SZ / 0x10) +#define VIDXD_MAX_WQS 1 +#define VIDXD_MAX_MSIX_VECS 2 + +#define VIDXD_ATS_OFFSET 0x100 +#define VIDXD_PRS_OFFSET 0x110 +#define VIDXD_PASID_OFFSET 0x120 +#define VIDXD_MSIX_PBA_OFFSET 0x700 + +struct ims_irq_entry { + struct vdcm_idxd *vidxd; + int int_src; + unsigned int irq; + bool irq_set; +}; + +struct idxd_vdev { + struct mdev_device *mdev; + struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; + struct irq_bypass_producer producer[VIDXD_MAX_MSIX_ENTRIES]; +}; + +struct vdcm_idxd { + struct idxd_device *idxd; + struct idxd_wq *wq; + struct idxd_vdev vdev; + struct vdcm_idxd_type *type; + int num_wqs; + u64 ims_index[VIDXD_MAX_MSIX_VECS - 1]; + struct msix_entry ims_entry; + struct ims_irq_entry irq_entries[VIDXD_MAX_MSIX_VECS - 1]; + + /* For VM use case */ + u64 bar_val[VIDXD_MAX_BARS]; + u64 bar_size[VIDXD_MAX_BARS]; + u8 cfg[VIDXD_MAX_CFG_SPACE_SZ]; + u8 bar0[VIDXD_MAX_MMIO_SPACE_SZ]; + struct list_head list; + struct mutex dev_lock; /* lock for vidxd resources */ + + int refcount; +}; + +static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev) +{ + return container_of(vdev, struct vdcm_idxd, vdev); +} + +#define IDXD_MDEV_NAME_LEN 16 +#define IDXD_MDEV_DESCRIPTION_LEN 64 + +enum idxd_mdev_type { + IDXD_MDEV_TYPE_1_DWQ = 0, +}; + +#define IDXD_MDEV_TYPES 1 + +struct vdcm_idxd_type { + char name[IDXD_MDEV_NAME_LEN]; + char description[IDXD_MDEV_DESCRIPTION_LEN]; + enum idxd_mdev_type type; + unsigned int avail_instance; +}; + +enum idxd_vdcm_rw { + IDXD_VDCM_READ = 0, + IDXD_VDCM_WRITE, +}; + +static inline u64 get_reg_val(void *buf, int size) +{ + u64 val = 0; + + switch (size) { + case 8: + val = *(uint64_t *)buf; + break; + case 4: + val = *(uint32_t *)buf; + break; + case 2: + val = *(uint16_t *)buf; + break; + case 1: + val = *(uint8_t *)buf; + break; + } + + return val; +} + +#endif diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c new file mode 100644 index 000000000000..4da89ee61b6e --- /dev/null +++ b/drivers/dma/idxd/vdev.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
 */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "registers.h"
+#include "idxd.h"
+#include "../../vfio/pci/vfio_pci_private.h"
+#include "mdev.h"
+#include "vdev.h"
+
+int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
+
+void vidxd_mmio_init(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+}
+
+void vidxd_reset(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+}
+
+void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+}
+
+int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
+{
+	/* PLACEHOLDER */
+	return 0;
+}
diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h
new file mode 100644
index 000000000000..cd5a30d90e9a
--- /dev/null
+++ b/drivers/dma/idxd/vdev.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */
+
+#ifndef _IDXD_VDEV_H_
+#define _IDXD_VDEV_H_
+
+#include "mdev.h"
+
+int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size);
+int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count);
+int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size);
+void vidxd_mmio_init(struct vdcm_idxd *vidxd);
+void vidxd_reset(struct vdcm_idxd *vidxd);
+int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx);
+int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx);
+int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx);
+void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
+int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
+
+#endif
From patchwork Tue Sep 15 23:28:45 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778389
Subject: [PATCH v3 10/18] dmaengine: idxd: virtual device commands emulation
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:28:45 -0700
Message-ID: <160021252504.67751.15983528921860572465.stgit@djiang5-desk3.ch.intel.com>

Add all the helper functions that support the emulation of the commands
that are submitted to the device command register.
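The flow, end to end, is that a guest write to the emulated command
register in BAR0 is dispatched to these helpers, which complete by writing
CMDSTS and raising the command-completion interrupt cause if the command
requested it. A minimal illustrative sketch, not part of the patch: the
trap helper name below is hypothetical, while vidxd_do_command(),
idxd_complete_command() behavior, IDXD_CMD_OFFSET and the vidxd->bar0
shadow come from this series:

	/* Hypothetical BAR0 write trap; only the dispatch to
	 * vidxd_do_command() is prescribed by this series. */
	static void vidxd_bar0_write32(struct vdcm_idxd *vidxd, u64 offset, u32 val)
	{
		if (offset == IDXD_CMD_OFFSET) {
			/* emulate; idxd_complete_command() sets CMDSTS
			 * and injects IDXD_INTC_CMD when requested */
			vidxd_do_command(vidxd, val);
			return;
		}
		/* all other registers land in the bar0 shadow */
		memcpy(vidxd->bar0 + offset, &val, sizeof(val));
	}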
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/registers.h |   15 +-
 drivers/dma/idxd/vdev.c      |  410 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 420 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 97acd0aaecbe..6531a40fad2e 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -115,7 +115,8 @@ union gencfg_reg {
 union genctrl_reg {
 	struct {
 		u32 softerr_int_en:1;
-		u32 rsvd:31;
+		u32 halt_state_int_en:1;
+		u32 rsvd:30;
 	};
 	u32 bits;
 } __packed;
@@ -137,6 +138,8 @@ enum idxd_device_status_state {
 	IDXD_DEVICE_STATE_HALT,
 };
 
+#define IDXD_GENSTATS_MASK	0x03
+
 enum idxd_device_reset_type {
 	IDXD_DEVICE_RESET_SOFTWARE = 0,
 	IDXD_DEVICE_RESET_FLR,
@@ -149,6 +152,7 @@
 #define IDXD_INTC_CMD		0x02
 #define IDXD_INTC_OCCUPY	0x04
 #define IDXD_INTC_PERFMON_OVFL	0x08
+#define IDXD_INTC_HALT_STATE	0x10
 
 #define IDXD_CMD_OFFSET		0xa0
 union idxd_command_reg {
@@ -160,6 +164,7 @@
 	};
 	u32 bits;
 } __packed;
+#define IDXD_CMD_INT_MASK	0x80000000
 
 enum idxd_cmd {
 	IDXD_CMD_ENABLE_DEVICE = 1,
@@ -217,7 +222,7 @@ enum idxd_cmdsts_err {
 	/* disable device errors */
 	IDXD_CMDSTS_ERR_DIS_DEV_EN = 0x31,
 	/* disable WQ, drain WQ, abort WQ, reset WQ */
-	IDXD_CMDSTS_ERR_DEV_NOT_EN,
+	IDXD_CMDSTS_ERR_WQ_NOT_EN,
 	/* request interrupt handle */
 	IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41,
 	IDXD_CMDSTS_ERR_NO_HANDLE,
@@ -354,4 +359,10 @@ union wqcfg {
 	 (n) * sizeof(union wqcfg) +\
 	 sizeof(u32) * (ofs))
 
+enum idxd_wq_hw_state {
+	IDXD_WQ_DEV_DISABLED = 0,
+	IDXD_WQ_DEV_ENABLED,
+	IDXD_WQ_DEV_BUSY,
+};
+
 #endif
diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c
index 2357d4c596d5..efdfb67ae51a 100644
--- a/drivers/dma/idxd/vdev.c
+++ b/drivers/dma/idxd/vdev.c
@@ -81,6 +81,30 @@ static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error)
 	}
 }
 
+static int idxd_get_mdev_pasid(struct mdev_device *mdev)
+{
+	struct iommu_domain *domain;
+	struct device *dev = mdev_dev(mdev);
+	struct device *iommu_device;
+	struct device *(*fn)(struct device *dev);
+
+	fn = symbol_get(mdev_get_iommu_device);
+	if (fn) {
+		iommu_device = fn(dev);
+		symbol_put(mdev_get_iommu_device);
+	} else {
+		return -ENODEV;
+	}
+
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain)
+		return -EINVAL;
+
+	return iommu_aux_get_pasid(domain, iommu_device);
+}
+
 int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
 {
 	u32 offset = pos & (vidxd->bar_size[0] - 1);
@@ -470,12 +494,347 @@ void vidxd_mmio_init(struct vdcm_idxd *vidxd)
 
 static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val)
 {
-	/* PLACEHOLDER */
+	u8 *bar0 = vidxd->bar0;
+	u32 *cmd = (u32 *)(bar0 + IDXD_CMD_OFFSET);
+	u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET);
+	u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	*cmdsts = val;
+	dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val);
+
+	if (*cmd & IDXD_CMD_INT_MASK) {
+		*intcause |= IDXD_INTC_CMD;
+		vidxd_send_interrupt(vidxd, 0);
+	}
+}
+
+static void vidxd_enable(struct vdcm_idxd *vidxd)
+{
+	u8 *bar0 = vidxd->bar0;
+	union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	dev_dbg(dev, "%s\n", __func__);
+	if (gensts->state == IDXD_DEVICE_STATE_ENABLED)
+		return
idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED); + + /* Check PCI configuration */ + if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER)) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN); + + gensts->state = IDXD_DEVICE_STATE_ENABLED; + + return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_disable(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (gensts->state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + wq = vidxd->wq; + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable (wq disable) failed: %#x\n", status); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) + dev_warn(dev, "vidxd disable (wq drain) failed: %#x\n", status); + } + + wqcfg->wq_state = 0; + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_drain_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + + idxd_wq_drain(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_drain(wq, &status); + if (status) { + dev_dbg(dev, "wq drain failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_abort_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + idxd_wq_abort(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "wq abort failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } void vidxd_reset(struct vdcm_idxd *vidxd) { - /* PLACEHOLDER */ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + 
IDXD_GENSTATS_OFFSET); + struct idxd_wq *wq; + + dev_dbg(dev, "%s\n", __func__); + gensts->state = IDXD_DEVICE_STATE_DRAIN; + wq = vidxd->wq; + + if (wq->state == IDXD_WQ_ENABLED) { + idxd_wq_abort(wq, NULL); + idxd_wq_disable(wq, NULL); + } + + vidxd_mmio_init(vidxd); + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + wq = vidxd->wq; + dev_dbg(dev, "vidxd reset wq %u:%u\n", 0, wq->id); + + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "vidxd reset wq failed to abort: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_wq_disable(wq, &status); + if (status) { + dev_dbg(dev, "vidxd reset wq failed to disable: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int vidx) +{ + bool ims = (vidx >> 16) & 1; + u32 cmdsts; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + vidx = vidx & 0xffff; + + dev_dbg(dev, "allocating int handle for %x\n", vidx); + + if (vidx != 1) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX); + return; + } + + if (ims) { + dev_warn(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE); + } else { + vidx--; /* MSIX idx 0 is a slow path interrupt */ + cmdsts = vidxd->ims_index[vidx] << 8; + dev_dbg(dev, "int handle %d:%lld\n", vidx, vidxd->ims_index[vidx]); + idxd_complete_command(vidxd, cmdsts); + } +} + +static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wq_cap_reg *wqcap; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_device *idxd; + union wqcfg *vwqcfg, *wqcfg; + unsigned long flags; + int wq_pasid; + u32 status; + int priv; + + if (wq_id >= VIDXD_MAX_WQS) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + idxd = vidxd->idxd; + wq = vidxd->wq; + + dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id); + + vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32); + wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + wqcfg = &wq->wqcfg; + + if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN); + return; + } + + if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED); + return; + } + + if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE); + return; + } + + wq_pasid = idxd_get_mdev_pasid(mdev); + priv = 1; + + if (wq_pasid >= 0) { + wqcfg->bits[2] &= ~0x3fffff00; + wqcfg->priv = priv; + wqcfg->pasid_en = 1; + wqcfg->pasid = wq_pasid; + dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_setup_pasid(wq, wq_pasid); + idxd_wq_setup_priv(wq, priv); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + 
idxd_wq_enable(wq, &status);
+		if (status) {
+			dev_err(dev, "vidxd enable wq %d failed\n", wq->id);
+			idxd_complete_command(vidxd, status);
+			return;
+		}
+	} else {
+		dev_err(dev, "idxd pasid setup failed wq %d wq_pasid %d\n", wq->id, wq_pasid);
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN);
+		return;
+	}
+
+	vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask)
+{
+	struct idxd_wq *wq;
+	union wqcfg *wqcfg;
+	u8 *bar0 = vidxd->bar0;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u32 status;
+
+	wq = vidxd->wq;
+
+	dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id);
+
+	wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+		return;
+	}
+
+	/* If it is a DWQ, need to disable the DWQ as well */
+	if (wq_dedicated(wq)) {
+		idxd_wq_disable(wq, &status);
+		if (status) {
+			dev_warn(dev, "vidxd disable wq failed: %#x\n", status);
+			idxd_complete_command(vidxd, status);
+			return;
+		}
+	} else {
+		idxd_wq_drain(wq, &status);
+		if (status) {
+			dev_warn(dev, "vidxd disable drain wq failed: %#x\n", status);
+			idxd_complete_command(vidxd, status);
+			return;
+		}
+	}
+
+	wqcfg->wq_state = IDXD_WQ_DEV_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
 }
 
 void vidxd_free_ims_entries(struct vdcm_idxd *vidxd)
@@ -491,5 +850,50 @@ int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
 
 void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
 {
-	/* PLACEHOLDER */
+	union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+
+	reg->bits = val;
+
+	dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits);
+
+	switch (reg->cmd) {
+	case IDXD_CMD_ENABLE_DEVICE:
+		vidxd_enable(vidxd);
+		break;
+	case IDXD_CMD_DISABLE_DEVICE:
+		vidxd_disable(vidxd);
+		break;
+	case IDXD_CMD_DRAIN_ALL:
+		vidxd_drain_all(vidxd);
+		break;
+	case IDXD_CMD_ABORT_ALL:
+		vidxd_abort_all(vidxd);
+		break;
+	case IDXD_CMD_RESET_DEVICE:
+		vidxd_reset(vidxd);
+		break;
+	case IDXD_CMD_ENABLE_WQ:
+		vidxd_wq_enable(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_DISABLE_WQ:
+		vidxd_wq_disable(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_DRAIN_WQ:
+		vidxd_wq_drain(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_ABORT_WQ:
+		vidxd_wq_abort(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_RESET_WQ:
+		vidxd_wq_reset(vidxd, reg->operand);
+		break;
+	case IDXD_CMD_REQUEST_INT_HANDLE:
+		vidxd_alloc_int_handle(vidxd, reg->operand);
+		break;
+	default:
+		idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD);
+		break;
+	}
 }
From patchwork Tue Sep 15 23:28:58 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778369
Subject: [PATCH v3 12/18] dmaengine: idxd: add mdev type as a new wq type
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:28:58 -0700
Message-ID: <160021253826.67751.7453808239640664490.stgit@djiang5-desk3.ch.intel.com>

Add "mdev" wq type and its support helpers. The mdev wq type marks the wq
to be utilized as a VFIO mediated device.
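With this in place, a wq is designated for mdev use through the existing
wq type attribute on the dsa bus. A hedged userspace sketch follows; the
wq0.0 instance path is just an example, and the wq must be disabled before
its type can be changed:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		/* example wq instance; substitute the real one */
		const char *path = "/sys/bus/dsa/devices/wq0.0/type";
		int fd = open(path, O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (write(fd, "mdev", strlen("mdev")) < 0)
			perror("write");
		close(fd);
		return 0;
	}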
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/idxd.h  |    2 ++
 drivers/dma/idxd/sysfs.c |   13 +++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 23d3ebca1da5..23287f8fd19a 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -65,6 +65,7 @@ enum idxd_wq_type {
 	IDXD_WQT_NONE = 0,
 	IDXD_WQT_KERNEL,
 	IDXD_WQT_USER,
+	IDXD_WQT_MDEV,
 };
 
 struct idxd_cdev {
@@ -290,6 +291,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd);
 int idxd_register_driver(void);
 void idxd_unregister_driver(void);
 struct bus_type *idxd_get_bus_type(struct idxd_device *idxd);
+bool is_idxd_wq_mdev(struct idxd_wq *wq);
 
 /* device interrupt control */
 irqreturn_t idxd_irq_handler(int vec, void *data);
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index cbb1038f1ed0..6eb67f744c8e 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -14,6 +14,7 @@ static char *idxd_wq_type_names[] = {
 	[IDXD_WQT_NONE] = "none",
 	[IDXD_WQT_KERNEL] = "kernel",
 	[IDXD_WQT_USER] = "user",
+	[IDXD_WQT_MDEV] = "mdev",
 };
 
 static void idxd_conf_device_release(struct device *dev)
@@ -69,6 +70,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq)
 	return wq->type == IDXD_WQT_USER;
 }
 
+inline bool is_idxd_wq_mdev(struct idxd_wq *wq)
+{
+	return wq->type == IDXD_WQT_MDEV;
+}
+
 static int idxd_config_bus_match(struct device *dev,
				 struct device_driver *drv)
 {
@@ -987,8 +993,9 @@ static ssize_t wq_type_show(struct device *dev,
 		return sprintf(buf, "%s\n",
			       idxd_wq_type_names[IDXD_WQT_KERNEL]);
 	case IDXD_WQT_USER:
-		return sprintf(buf, "%s\n",
-			       idxd_wq_type_names[IDXD_WQT_USER]);
+		return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]);
+	case IDXD_WQT_MDEV:
+		return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]);
 	case IDXD_WQT_NONE:
 	default:
 		return sprintf(buf, "%s\n",
@@ -1015,6 +1022,8 @@ static ssize_t wq_type_store(struct device *dev,
 		wq->type = IDXD_WQT_KERNEL;
 	else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER]))
 		wq->type = IDXD_WQT_USER;
+	else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV]))
+		wq->type = IDXD_WQT_MDEV;
 	else
 		return -EINVAL;
From patchwork Tue Sep 15 23:29:04 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778405
Subject: [PATCH v3 13/18] dmaengine: idxd: add dedicated wq mdev type
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:29:04 -0700
Message-ID: <160021254470.67751.9829487993127571356.stgit@djiang5-desk3.ch.intel.com>

Add the support code for the "1dwq" mdev type. This mdev type follows the
standard VFIO mdev flow. The "1dwq" type exports a single dedicated wq to
the mdev; the wq configuration is programmed by the host and is read-only
to the guest. The mdev type does not support PASID and SVA and matches the
stage 1 driver in functional support. For backward compatibility, the mdev
will maintain the DSA spec definition of this mdev type once the commit
goes upstream.
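Once such a wq exists, an instance of this type would be created through
the standard mdev sysfs interface. A sketch under stated assumptions: the
parent PCI address below is made up, and the "idxd-1dwq-v1" directory name
assumes mdev composes it from the parent driver name plus the type group
name added by this patch; adjust both for the real system:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		/* assumed parent device and type directory names */
		const char *create = "/sys/bus/pci/devices/0000:00:0a.0/"
				     "mdev_supported_types/idxd-1dwq-v1/create";
		const char *uuid = "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001";
		int fd = open(create, O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (write(fd, uuid, strlen(uuid)) < 0)
			perror("write");
		close(fd);
		return 0;
	}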
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/mdev.c |  142 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 133 insertions(+), 9 deletions(-)

diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c
index 6b91abd4d8d9..2d3ff1a50d39 100644
--- a/drivers/dma/idxd/mdev.c
+++ b/drivers/dma/idxd/mdev.c
@@ -99,21 +99,58 @@ static void idxd_vdcm_release(struct mdev_device *mdev)
 	mutex_unlock(&vidxd->dev_lock);
 }
 
+static struct idxd_wq *find_any_dwq(struct idxd_device *idxd)
+{
+	int i;
+	struct idxd_wq *wq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&idxd->dev_lock, flags);
+	for (i = 0; i < idxd->max_wqs; i++) {
+		wq = &idxd->wqs[i];
+
+		if (wq->state != IDXD_WQ_ENABLED)
+			continue;
+
+		if (!wq_dedicated(wq))
+			continue;
+
+		if (idxd_wq_refcount(wq) != 0)
+			continue;
+
+		spin_unlock_irqrestore(&idxd->dev_lock, flags);
+		mutex_lock(&wq->wq_lock);
+		if (idxd_wq_refcount(wq)) {
+			mutex_unlock(&wq->wq_lock);
+			spin_lock_irqsave(&idxd->dev_lock, flags);
+			continue;
+		}
+
+		idxd_wq_get(wq);
+		mutex_unlock(&wq->wq_lock);
+		return wq;
+	}
+
+	spin_unlock_irqrestore(&idxd->dev_lock, flags);
+	return NULL;
+}
+
 static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev,
					   struct vdcm_idxd_type *type)
 {
 	struct vdcm_idxd *vidxd;
 	struct idxd_wq *wq = NULL;
-	int i;
-
-	/* PLACEHOLDER, wq matching comes later */
+	int i, rc;
 
+	if (type->type == IDXD_MDEV_TYPE_1_DWQ)
+		wq = find_any_dwq(idxd);
 	if (!wq)
 		return ERR_PTR(-ENODEV);
 
 	vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL);
-	if (!vidxd)
-		return ERR_PTR(-ENOMEM);
+	if (!vidxd) {
+		rc = -ENOMEM;
+		goto err;
+	}
 
 	mutex_init(&vidxd->dev_lock);
 	vidxd->idxd = idxd;
@@ -127,14 +164,23 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev
 		vidxd->ims_index[i] = -1;
 
 	idxd_vdcm_init(vidxd);
-	mutex_lock(&wq->wq_lock);
-	idxd_wq_get(wq);
-	mutex_unlock(&wq->wq_lock);
 
 	return vidxd;
+
+ err:
+	mutex_lock(&wq->wq_lock);
+	idxd_wq_put(wq);
+	mutex_unlock(&wq->wq_lock);
+	return ERR_PTR(rc);
 }
 
-static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES];
+static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = {
+	{
+		.name = "1dwq-v1",
+		.description = "IDXD MDEV with 1 dedicated workqueue",
+		.type = IDXD_MDEV_TYPE_1_DWQ,
+	},
+};
 
 static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev,
							const char *name)
@@ -910,7 +956,85 @@ static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd,
 	return rc;
 }
 
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	struct vdcm_idxd_type *type;
+
+	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+
+	if (type)
+		return sprintf(buf, "%s\n", type->description);
+
+	return -EINVAL;
+}
+static MDEV_TYPE_ATTR_RO(name);
+
+static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type)
+{
+	int count = 0, i;
+	unsigned long flags;
+
+	if (type->type != IDXD_MDEV_TYPE_1_DWQ)
+		return 0;
+
+	spin_lock_irqsave(&idxd->dev_lock, flags);
+	for (i = 0; i < idxd->max_wqs; i++) {
+		struct idxd_wq *wq;
+
+		wq = &idxd->wqs[i];
+		if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq))
+			continue;
+
+		count++;
+	}
+	spin_unlock_irqrestore(&idxd->dev_lock, flags);
+
+	return count;
+}
+
+static ssize_t available_instances_show(struct kobject *kobj,
+					struct device *dev, char *buf)
+{
+	int count;
+	struct idxd_device *idxd = dev_get_drvdata(dev);
+	struct vdcm_idxd_type *type;
+
+	type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj));
+	if (!type)
+		return -EINVAL;
+
+	count = find_available_mdev_instances(idxd, type);
+
+	return sprintf(buf, "%d\n", count);
+}
+static MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+			       char *buf)
+{
+	return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
+}
+static MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *idxd_mdev_types_attrs[] = {
+	&mdev_type_attr_name.attr,
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group idxd_mdev_type_group0 = {
+	.name = "1dwq-v1",
+	.attrs = idxd_mdev_types_attrs,
+};
+
+static struct attribute_group *idxd_mdev_type_groups[] = {
+	&idxd_mdev_type_group0,
+	NULL,
+};
+
 static const struct mdev_parent_ops idxd_vdcm_ops = {
+	.supported_type_groups = idxd_mdev_type_groups,
 	.create = idxd_vdcm_create,
 	.remove = idxd_vdcm_remove,
 	.open = idxd_vdcm_open,
From patchwork Tue Sep 15 23:29:11 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778387
Subject: [PATCH v3 14/18] dmaengine: idxd: add new wq state for mdev
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:29:11 -0700
Message-ID: <160021255105.67751.1103109119558046722.stgit@djiang5-desk3.ch.intel.com>

When a dedicated wq is enabled as an mdev, the wq must be disabled on the
device in order to program the pasid to the wq. Introduce a software-only
wq state, IDXD_WQ_LOCKED, for this window. A wq in this state is neither
DISABLED nor ENABLED, so the user can neither modify its configuration nor
perform the actions that an enabled wq allows.

Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/idxd.h  |    1 +
 drivers/dma/idxd/mdev.c  |    4 +++-
 drivers/dma/idxd/sysfs.c |    2 ++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 23287f8fd19a..f67c0036f968 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -55,6 +55,7 @@ struct idxd_group {
 enum idxd_wq_state {
 	IDXD_WQ_DISABLED = 0,
 	IDXD_WQ_ENABLED,
+	IDXD_WQ_LOCKED,
 };
 
 enum idxd_wq_flag {
diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c
index 2d3ff1a50d39..ea07e7c1ba31 100644
--- a/drivers/dma/idxd/mdev.c
+++ b/drivers/dma/idxd/mdev.c
@@ -72,8 +72,10 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd)
 
 	vidxd_mmio_init(vidxd);
 
-	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED)
+	if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) {
 		idxd_wq_disable(wq, NULL);
+		wq->state = IDXD_WQ_LOCKED;
+	}
 }
 
 static void idxd_vdcm_release(struct mdev_device *mdev)
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 6eb67f744c8e..4cc05392130c 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -797,6 +797,8 @@ static ssize_t wq_state_show(struct device *dev,
 		return sprintf(buf, "disabled\n");
 	case IDXD_WQ_ENABLED:
 		return sprintf(buf, "enabled\n");
+	case IDXD_WQ_LOCKED:
+		return sprintf(buf, "locked\n");
 	}
 
 	return sprintf(buf, "unknown\n");
From patchwork Tue Sep 15 23:29:17 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778375
Subject: [PATCH v3 15/18] dmaengine: idxd: add error notification from host
 driver to mediated device
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:29:17 -0700
Message-ID: <160021255739.67751.18063125769657114052.stgit@djiang5-desk3.ch.intel.com>

When a device error occurs, the mediated devices need to be notified so
the error can be forwarded to the guests. Add support to notify the
specific mdev when an error is wq-specific, and to broadcast errors to all
mdevs when it is a generic device error.
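For context, the guest-visible effect can be sketched as follows
(illustrative only, not part of the patch: it assumes the guest maps BAR0
and reuses the IDXD_INTCAUSE_OFFSET/IDXD_SWERR_OFFSET offsets referenced
elsewhere in this series, with IDXD_INTC_ERR assumed as the error cause
bit). After the host mirrors the SWERR words into the virtual registers
and signals vector 0, a guest handler would read the cause and the four
SWERR words back out:

	/* Hypothetical guest-side handler for the misc interrupt vector. */
	static void guest_handle_misc_irq(void __iomem *bar0)
	{
		u32 cause = ioread32(bar0 + IDXD_INTCAUSE_OFFSET);
		u64 swerr[4];
		int i;

		if (!(cause & IDXD_INTC_ERR))
			return;

		for (i = 0; i < 4; i++)
			swerr[i] = readq(bar0 + IDXD_SWERR_OFFSET + i * sizeof(u64));
		/* decode valid/overflow and the wq index from swerr[0] */
	}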
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/idxd.h |   12 ++++++++++++
 drivers/dma/idxd/irq.c  |    4 ++++
 drivers/dma/idxd/vdev.c |   32 ++++++++++++++++++++++++++++++++
 drivers/dma/idxd/vdev.h |    1 +
 4 files changed, 49 insertions(+)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index f67c0036f968..07e1e3fcd4aa 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -359,4 +359,16 @@ void idxd_wq_del_cdev(struct idxd_wq *wq);
 int idxd_mdev_host_init(struct idxd_device *idxd);
 void idxd_mdev_host_release(struct idxd_device *idxd);
 
+#ifdef CONFIG_INTEL_IDXD_MDEV
+void idxd_vidxd_send_errors(struct idxd_device *idxd);
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq);
+#else
+static inline void idxd_vidxd_send_errors(struct idxd_device *idxd)
+{
+}
+static inline void idxd_wq_vidxd_send_errors(struct idxd_wq *wq)
+{
+}
+#endif /* CONFIG_INTEL_IDXD_MDEV */
+
 #endif
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 04f00255dae0..f9df0910b799 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -79,6 +79,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data)
 
 			if (wq->type == IDXD_WQT_USER)
 				wake_up_interruptible(&wq->idxd_cdev.err_queue);
+			else if (wq->type == IDXD_WQT_MDEV)
+				idxd_wq_vidxd_send_errors(wq);
 		} else {
 			int i;
 
@@ -87,6 +89,8 @@
 
 				if (wq->type == IDXD_WQT_USER)
 					wake_up_interruptible(&wq->idxd_cdev.err_queue);
+				else if (wq->type == IDXD_WQT_MDEV)
+					idxd_wq_vidxd_send_errors(wq);
 			}
 		}
 
diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c
index fa7c27d746cc..ce2f19b9b860 100644
--- a/drivers/dma/idxd/vdev.c
+++ b/drivers/dma/idxd/vdev.c
@@ -978,3 +978,35 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
 		break;
 	}
 }
+
+static void vidxd_send_errors(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	u8 *bar0 = vidxd->bar0;
+	union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+	union genctrl_reg *genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+	int i;
+
+	if (swerr->valid) {
+		if (!swerr->overflow)
+			swerr->overflow = 1;
+		return;
+	}
+
+	lockdep_assert_held(&idxd->dev_lock);
+	for (i = 0; i < 4; i++)
+		swerr->bits[i] = idxd->sw_err.bits[i];
+
+	if (genctrl->softerr_int_en)
+		vidxd_send_interrupt(vidxd, 0);
+}
+
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq)
+{
+	struct vdcm_idxd *vidxd;
+
+	list_for_each_entry(vidxd, &wq->vdcm_list, list)
+		vidxd_send_errors(vidxd);
+}
diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h
index e04c92c432d8..1bfdcdeed8e7 100644
--- a/drivers/dma/idxd/vdev.h
+++ b/drivers/dma/idxd/vdev.h
@@ -25,5 +25,6 @@ int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx);
 void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
 int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
 void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val);
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq);
 
 #endif
From patchwork Tue Sep 15 23:29:24 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 11778377
Subject: [PATCH v3 16/18] dmaengine: idxd: add ABI documentation for mediated
 device support
From: Dave Jiang
Date: Tue, 15 Sep 2020 16:29:24 -0700
Message-ID: <160021256431.67751.2268297306094926564.stgit@djiang5-desk3.ch.intel.com>

From: Jing Lin

Add the sysfs attribute bits in ABI/stable for mediated device and guest
support.

Signed-off-by: Jing Lin
Signed-off-by: Dave Jiang
---
 Documentation/ABI/stable/sysfs-driver-dma-idxd |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index b44183880935..6bb925b55027 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -77,6 +77,12 @@ Contact: dmaengine@vger.kernel.org
 Description: The operation capability bit mask specify the operation types
		supported by the this device.
 
+What:           /sys/bus/dsa/devices/dsa<m>/ims_size
+Date:           Sep 8, 2020
+KernelVersion:  5.10.0
+Contact:        dmaengine@vger.kernel.org
+Description:    Number of entries in the interrupt message storage table.
+ What: /sys/bus/dsa/devices/dsa/state Date: Oct 25, 2019 KernelVersion: 5.6.0 @@ -139,8 +145,9 @@ Date: Oct 25, 2019 KernelVersion: 5.6.0 Contact: dmaengine@vger.kernel.org Description: The type of this work queue, it can be "kernel" type for work - queue usages in the kernel space or "user" type for work queue - usages by applications in user space. + queue usages in the kernel space, "user" type for work queue + usages by applications in user space, or "mdev" type for + VFIO mediated devices. What: /sys/bus/dsa/devices/wq./cdev_minor Date: Oct 25, 2019