From: Liu Yi L
To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com
Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan, Yi Sun, Cornelia Huck
Subject: [RFC v3.1 01/22] scripts/update-linux-headers: Import iommu.h
Date: Sat, 22 Feb 2020 00:07:02 -0800

From: Eric Auger

Update the script to import the new iommu.h uapi header.

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Michael S. Tsirkin Cc: Cornelia Huck Cc: Paolo Bonzini Acked-by: Cornelia Huck Signed-off-by: Eric Auger --- scripts/update-linux-headers.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 29c27f4..5b64ee3 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -141,7 +141,7 @@ done rm -rf "$output/linux-headers/linux" mkdir -p "$output/linux-headers/linux" -for header in kvm.h vfio.h vfio_ccw.h vhost.h \ +for header in kvm.h vfio.h vfio_ccw.h vhost.h iommu.h \ psci.h psp-sev.h userfaultfd.h mman.h; do cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux" done

From: Liu Yi L
To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com
Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan, Yi Sun, Cornelia Huck
Subject: [RFC v3.1 02/22] header file update VFIO/IOMMU vSVA APIs
Date: Sat, 22 Feb 2020 00:07:03 -0800

The kernel uapi/linux/iommu.h header file includes the extensions for vSVA support, e.g. the bind gpasid and IOMMU fault reporting user structures.

Note: this should be replaced with a full header file update once the vSVA uAPI is stable.

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Michael S.
Tsirkin Cc: Cornelia Huck Cc: Paolo Bonzini Signed-off-by: Liu Yi L --- linux-headers/linux/iommu.h | 372 ++++++++++++++++++++++++++++++++++++++++++++ linux-headers/linux/vfio.h | 127 +++++++++++++++ 2 files changed, 499 insertions(+) create mode 100644 linux-headers/linux/iommu.h diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h new file mode 100644 index 0000000..04cc4b0 --- /dev/null +++ b/linux-headers/linux/iommu.h @@ -0,0 +1,372 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * IOMMU user API definitions + */ + +#ifndef _IOMMU_H +#define _IOMMU_H + +#include + +/** + * Current version of the IOMMU user API. This is intended for query + * between user and kernel to determine compatible data structures. + * + * Having a single UAPI version to govern the user-kernel data structures + * makes compatibility check straightforward. On the contrary, supporting + * combinations of multiple versions of the data can be a nightmare. + * + * UAPI version can be bumped up with the following rules: + * 1. All data structures passed between user and kernel space share + * the same version number. i.e. any extension to any structure + * results in version bump up. + * + * 2. Data structures are open to extension but closed to modification. + * New fields must be added at the end of each data structure with + * 64bit alignment. Flag bits can be added without size change but + * existing ones cannot be altered. + * + * 3. Versions are backward compatible. + * + * 4. Version to size lookup is supported by kernel internal API for each + * API function type. @version is mandatory for new data structures + * and must be at the beginning with type of __u32. + */ +#define IOMMU_UAPI_VERSION 1 +static __inline__ int iommu_get_uapi_version(void) +{ + return IOMMU_UAPI_VERSION; +} + +/* + * Supported UAPI features that can be reported to user space. + * These types represent the capability available in the kernel.
+ * + * REVISIT: UAPI version also implies the capabilities. Should we + * report them explicitly? + */ +enum IOMMU_UAPI_DATA_TYPES { + IOMMU_UAPI_BIND_GPASID, + IOMMU_UAPI_CACHE_INVAL, + IOMMU_UAPI_PAGE_RESP, + NR_IOMMU_UAPI_TYPE, +}; + +#define IOMMU_UAPI_CAP_MASK ((1 << IOMMU_UAPI_BIND_GPASID) | \ + (1 << IOMMU_UAPI_CACHE_INVAL) | \ + (1 << IOMMU_UAPI_PAGE_RESP)) + +#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */ +#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */ +#define IOMMU_FAULT_PERM_EXEC (1 << 2) /* exec */ +#define IOMMU_FAULT_PERM_PRIV (1 << 3) /* privileged */ + +/* Generic fault types; can be expanded to cover IRQ remapping faults */ +enum iommu_fault_type { + IOMMU_FAULT_DMA_UNRECOV = 1, /* unrecoverable fault */ + IOMMU_FAULT_PAGE_REQ, /* page request fault */ +}; + +enum iommu_fault_reason { + IOMMU_FAULT_REASON_UNKNOWN = 0, + + /* Could not access the PASID table (fetch caused external abort) */ + IOMMU_FAULT_REASON_PASID_FETCH, + + /* PASID entry is invalid or has configuration errors */ + IOMMU_FAULT_REASON_BAD_PASID_ENTRY, + + /* + * PASID is out of range (e.g. exceeds the maximum PASID + * supported by the IOMMU) or disabled.
+ */ + IOMMU_FAULT_REASON_PASID_INVALID, + + /* + * An external abort occurred fetching (or updating) a translation + * table descriptor + */ + IOMMU_FAULT_REASON_WALK_EABT, + + /* + * Could not access the page table entry (Bad address), + * actual translation fault + */ + IOMMU_FAULT_REASON_PTE_FETCH, + + /* Protection flag check failed */ + IOMMU_FAULT_REASON_PERMISSION, + + /* access flag check failed */ + IOMMU_FAULT_REASON_ACCESS, + + /* Output address of a translation stage caused Address Size fault */ + IOMMU_FAULT_REASON_OOR_ADDRESS, +}; + +/** + * struct iommu_fault_unrecoverable - Unrecoverable fault data + * @reason: reason of the fault, from &enum iommu_fault_reason + * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values) + * @pasid: Process Address Space ID + * @perm: requested permission access used by the incoming transaction + * (IOMMU_FAULT_PERM_* values) + * @addr: offending page address + * @fetch_addr: address that caused a fetch abort, if any + */ +struct iommu_fault_unrecoverable { + __u32 reason; +#define IOMMU_FAULT_UNRECOV_PASID_VALID (1 << 0) +#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1) +#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID (1 << 2) + __u32 flags; + __u32 pasid; + __u32 perm; + __u64 addr; + __u64 fetch_addr; +}; + +/** + * struct iommu_fault_page_request - Page Request data + * @flags: encodes whether the corresponding fields are valid and whether this + * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values) + * @pasid: Process Address Space ID + * @grpid: Page Request Group Index + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values) + * @addr: page address + * @private_data: device-specific private information + */ +struct iommu_fault_page_request { +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0) +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1) +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2) + __u32 flags; + __u32 pasid; + __u32 grpid; + __u32 perm; + __u64 addr; + __u64
private_data[2]; +}; + +/** + * struct iommu_fault - Generic fault data + * @type: fault type from &enum iommu_fault_type + * @padding: reserved for future use (should be zero) + * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV + * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ + * @padding2: sets the fault size to allow for future extensions + */ +struct iommu_fault { + __u32 type; + __u32 padding; + union { + struct iommu_fault_unrecoverable event; + struct iommu_fault_page_request prm; + __u8 padding2[56]; + }; +}; + +/** + * enum iommu_page_response_code - Return status of fault handlers + * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables + * populated, retry the access. This is "Success" in PCI PRI. + * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from + * this device if possible. This is "Response Failure" in PCI PRI. + * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the + * access. This is "Invalid Request" in PCI PRI. 
+ */ +enum iommu_page_response_code { + IOMMU_PAGE_RESP_SUCCESS = 0, + IOMMU_PAGE_RESP_INVALID, + IOMMU_PAGE_RESP_FAILURE, +}; + +/** + * struct iommu_page_response - Generic page response information + * @version: IOMMU_UAPI_VERSION + * @flags: encodes whether the corresponding fields are valid + * (IOMMU_FAULT_PAGE_RESPONSE_* values) + * @pasid: Process Address Space ID + * @grpid: Page Request Group Index + * @code: response code from &enum iommu_page_response_code + */ +struct iommu_page_response { + __u32 version; +#define IOMMU_PAGE_RESP_PASID_VALID (1 << 0) + __u32 flags; + __u32 pasid; + __u32 grpid; + __u32 code; +}; + +/* defines the granularity of the invalidation */ +enum iommu_inv_granularity { + IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */ + IOMMU_INV_GRANU_PASID, /* PASID-selective invalidation */ + IOMMU_INV_GRANU_ADDR, /* page-selective invalidation */ + IOMMU_INV_GRANU_NR, /* number of invalidation granularities */ +}; + +/** + * struct iommu_inv_addr_info - Address Selective Invalidation Structure + * + * @flags: indicates the granularity of the address-selective invalidation + * - If the PASID bit is set, the @pasid field is populated and the invalidation + * relates to cache entries tagged with this PASID and matching the address + * range. + * - If the ARCHID bit is set, @archid is populated and the invalidation relates + * to cache entries tagged with this architecture specific ID and matching + * the address range. + * - Both PASID and ARCHID can be set as they may tag different caches. + * - If neither PASID nor ARCHID is set, global addr invalidation applies. + * - The LEAF flag indicates whether only the leaf PTE caching needs to be + * invalidated and other paging structure caches can be preserved.
+ * @pasid: process address space ID + * @archid: architecture-specific ID + * @addr: first stage/level input address + * @granule_size: page/block size of the mapping in bytes + * @nb_granules: number of contiguous granules to be invalidated + */ +struct iommu_inv_addr_info { +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0) +#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1) +#define IOMMU_INV_ADDR_FLAGS_LEAF (1 << 2) + __u32 flags; + __u32 archid; + __u64 pasid; + __u64 addr; + __u64 granule_size; + __u64 nb_granules; +}; + +/** + * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure + * + * @flags: indicates the granularity of the PASID-selective invalidation + * - If the PASID bit is set, the @pasid field is populated and the invalidation + * relates to cache entries tagged with this PASID and matching the address + * range. + * - If the ARCHID bit is set, the @archid is populated and the invalidation + * relates to cache entries tagged with this architecture specific ID and + * matching the address range. + * - Both PASID and ARCHID can be set as they may tag different caches. + * - At least one of PASID or ARCHID must be set. 
+ * @pasid: process address space ID + * @archid: architecture-specific ID + */ +struct iommu_inv_pasid_info { +#define IOMMU_INV_PASID_FLAGS_PASID (1 << 0) +#define IOMMU_INV_PASID_FLAGS_ARCHID (1 << 1) + __u32 flags; + __u32 archid; + __u64 pasid; +}; + +/** + * struct iommu_cache_invalidate_info - First level/stage invalidation + * information + * @version: IOMMU_UAPI_VERSION + * @cache: bitfield selecting which caches to invalidate + * @granularity: defines the lowest granularity used for the invalidation: + * domain > PASID > addr + * @padding: reserved for future use (should be zero) + * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID + * @addr_info: invalidation data when @granularity is %IOMMU_INV_GRANU_ADDR + * + * Not all the combinations of cache/granularity are valid: + * + * +--------------+---------------+---------------+---------------+ + * | type / | DEV_IOTLB | IOTLB | PASID | + * | granularity | | | cache | + * +==============+===============+===============+===============+ + * | DOMAIN | N/A | Y | Y | + * +--------------+---------------+---------------+---------------+ + * | PASID | Y | Y | Y | + * +--------------+---------------+---------------+---------------+ + * | ADDR | Y | Y | N/A | + * +--------------+---------------+---------------+---------------+ + * + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument other than + * @version and @cache. + * + * If multiple cache types are invalidated simultaneously, they all + * must support the used granularity.
+ */ +struct iommu_cache_invalidate_info { + __u32 version; +/* IOMMU paging structure cache */ +#define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */ +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */ +#define IOMMU_CACHE_INV_TYPE_PASID (1 << 2) /* PASID cache */ +#define IOMMU_CACHE_INV_TYPE_NR (3) + __u8 cache; + __u8 granularity; + __u8 padding[2]; + union { + struct iommu_inv_pasid_info pasid_info; + struct iommu_inv_addr_info addr_info; + }; +}; + +/** + * struct iommu_gpasid_bind_data_vtd - Intel VT-d specific data on device and guest + * SVA binding. + * + * @flags: VT-d PASID table entry attributes + * @pat: Page attribute table data to compute effective memory type + * @emt: Extended memory type + * + * Only guest vIOMMU selectable and effective options are passed down to + * the host IOMMU. + */ +struct iommu_gpasid_bind_data_vtd { +#define IOMMU_SVA_VTD_GPASID_SRE (1 << 0) /* supervisor request */ +#define IOMMU_SVA_VTD_GPASID_EAFE (1 << 1) /* extended access enable */ +#define IOMMU_SVA_VTD_GPASID_PCD (1 << 2) /* page-level cache disable */ +#define IOMMU_SVA_VTD_GPASID_PWT (1 << 3) /* page-level write through */ +#define IOMMU_SVA_VTD_GPASID_EMTE (1 << 4) /* extended mem type enable */ +#define IOMMU_SVA_VTD_GPASID_CD (1 << 5) /* PASID-level cache disable */ + __u64 flags; + __u32 pat; + __u32 emt; +}; +#define IOMMU_SVA_VTD_GPASID_EMT_MASK (IOMMU_SVA_VTD_GPASID_CD | \ + IOMMU_SVA_VTD_GPASID_EMTE | \ + IOMMU_SVA_VTD_GPASID_PCD | \ + IOMMU_SVA_VTD_GPASID_PWT) +/** + * struct iommu_gpasid_bind_data - Information about device and guest PASID binding + * @version: IOMMU_UAPI_VERSION + * @format: PASID table entry format + * @flags: Additional information on guest bind request + * @gpgd: Guest page directory base of the guest mm to bind + * @hpasid: Process address space ID used for the guest mm in host IOMMU + * @gpasid: Process address space ID used for the guest mm in guest IOMMU + * @addr_width: Guest virtual address width + * 
@padding: Reserved for future use (should be zero) + * @vtd: Intel VT-d specific data + * + * Guest to host PASID mapping can be an identity or non-identity, where guest + * has its own PASID space. For non-identity mapping, guest to host PASID lookup + * is needed when VM programs guest PASID into an assigned device. VMM may + * trap such PASID programming and then request host IOMMU driver to convert + * guest PASID to host PASID based on this bind data. + */ +struct iommu_gpasid_bind_data { + __u32 version; +#define IOMMU_PASID_FORMAT_INTEL_VTD 1 + __u32 format; +#define IOMMU_SVA_GPASID_VAL (1 << 0) /* guest PASID valid */ + __u64 flags; + __u64 gpgd; + __u64 hpasid; + __u64 gpasid; + __u32 addr_width; + __u8 padding[12]; + /* Vendor specific data */ + union { + struct iommu_gpasid_bind_data_vtd vtd; + }; +}; + +#endif /* _IOMMU_H */ diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index fb10370..7f87ccd 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -14,6 +14,7 @@ #include #include +#include #define VFIO_API_VERSION 0 @@ -47,6 +48,15 @@ #define VFIO_NOIOMMU_IOMMU 8 /* + * Hardware IOMMUs with two-stage translation capability give userspace + * the ownership of stage-1 translation structures (e.g. page tables). + * VFIO exposes the two-stage IOMMU programming capability to userspace + * based on the IOMMU UAPIs. Therefore users of VFIO_TYPE1_NESTING_IOMMU + * should check the IOMMU UAPI version compatibility. + */ +#define VFIO_NESTING_IOMMU_UAPI 9 + +/* * The IOCTL interface is designed for extensibility by embedding the * structure length (argsz) and flags into structures passed between * kernel and userspace.
We therefore use the _IO() macro for these @@ -748,6 +758,15 @@ struct vfio_iommu_type1_info_cap_iova_range { struct vfio_iova_range iova_ranges[]; }; +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 2 + +struct vfio_iommu_type1_info_cap_nesting { + struct vfio_info_cap_header header; +#define VFIO_IOMMU_PASID_REQS (1 << 0) + __u32 nesting_capabilities; + __u32 stage1_format; +}; + #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) /** @@ -794,6 +813,114 @@ struct vfio_iommu_type1_dma_unmap { #define VFIO_IOMMU_ENABLE _IO(VFIO_TYPE, VFIO_BASE + 15) #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16) +/* + * PASID (Process Address Space ID) is a PCIe concept which + * has been extended to support fine-grained DMA isolation. + * With devices assigned to user space (e.g. VMs), PASID + * allocation and free need to be system wide. This structure + * defines the info for PASID alloc/free between user space and + * kernel space. + * + * @flags=VFIO_IOMMU_PASID_ALLOC: refer to @alloc_pasid + * @flags=VFIO_IOMMU_PASID_FREE: refer to @free_pasid + */ +struct vfio_iommu_type1_pasid_request { + __u32 argsz; +#define VFIO_IOMMU_PASID_ALLOC (1 << 0) +#define VFIO_IOMMU_PASID_FREE (1 << 1) + __u32 flags; + union { + struct { + __u32 min; + __u32 max; + __u32 result; + } alloc_pasid; + __u32 free_pasid; + }; +}; + +#define VFIO_PASID_REQUEST_MASK (VFIO_IOMMU_PASID_ALLOC | \ + VFIO_IOMMU_PASID_FREE) + +/** + * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 22, + * struct vfio_iommu_type1_pasid_request) + * + * Availability of this feature depends on PASID support in the device, + * its bus, the underlying IOMMU and the CPU architecture. In VFIO, it + * is available after VFIO_SET_IOMMU. + * + * returns: 0 on success, -errno on failure. + */ +#define VFIO_IOMMU_PASID_REQUEST _IO(VFIO_TYPE, VFIO_BASE + 22) + +/** + * Supported flags: + * - VFIO_IOMMU_BIND_GUEST_PGTBL: bind guest page tables to host for + * nesting type IOMMUs.
The @data field takes a struct + * iommu_gpasid_bind_data. + * - VFIO_IOMMU_UNBIND_GUEST_PGTBL: undo a bind guest page table operation + * invoked by VFIO_IOMMU_BIND_GUEST_PGTBL. + * + */ +struct vfio_iommu_type1_bind { + __u32 argsz; + __u32 flags; +#define VFIO_IOMMU_BIND_GUEST_PGTBL (1 << 0) +#define VFIO_IOMMU_UNBIND_GUEST_PGTBL (1 << 1) + __u8 data[]; +}; + +#define VFIO_IOMMU_BIND_MASK (VFIO_IOMMU_BIND_GUEST_PGTBL | \ + VFIO_IOMMU_UNBIND_GUEST_PGTBL) + +/** + * VFIO_IOMMU_BIND - _IOW(VFIO_TYPE, VFIO_BASE + 23, + * struct vfio_iommu_type1_bind) + * + * Manage address spaces of devices in this container. Initially a TYPE1 + * container can only have one address space, managed with + * VFIO_IOMMU_MAP/UNMAP_DMA. + * + * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP + * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page + * tables, and BIND manages the stage-1 (guest) page tables. Other types of + * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls + * the traffic that only requires single-stage translation while BIND controls + * the traffic that requires nested translation. However, this depends on the + * underlying IOMMU architecture and isn't guaranteed. An example is guest SVA + * traffic, which needs nested translation to perform gVA->gPA and then + * gPA->hPA translation. + * + * Availability of this feature depends on the device, its bus, the underlying + * IOMMU and the CPU architecture. + * + * returns: 0 on success, -errno on failure. + */ +#define VFIO_IOMMU_BIND _IO(VFIO_TYPE, VFIO_BASE + 23) + +/** + * VFIO_IOMMU_CACHE_INVALIDATE - _IOW(VFIO_TYPE, VFIO_BASE + 24, + * struct vfio_iommu_type1_cache_invalidate) + * + * Propagate guest IOMMU cache invalidation to the host. The cache + * invalidation information is conveyed by @cache_info, whose content + * format is defined by the structures in uapi/linux/iommu.h.
Users + * should be aware that struct iommu_cache_invalidate_info + * has a @version field, which VFIO needs to parse before retrieving + * the rest of the data from userspace. + * + * This IOCTL is available after VFIO_SET_IOMMU. + * + * returns: 0 on success, -errno on failure. + */ +struct vfio_iommu_type1_cache_invalidate { + __u32 argsz; + __u32 flags; + struct iommu_cache_invalidate_info cache_info; +}; +#define VFIO_IOMMU_CACHE_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 24) + /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ /*

From: Liu Yi L
To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com
Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan, Yi Sun
Subject: [RFC v3.1 03/22] vfio: check VFIO_TYPE1_NESTING_IOMMU support
Date: Sat, 22 Feb 2020 00:07:04 -0800

VFIO needs to check VFIO_TYPE1_NESTING_IOMMU support with the kernel before using it; e.g. it needs to check the IOMMU UAPI version.

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu Yi L Signed-off-by: Yi Sun --- hw/vfio/common.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 5ca1148..be1a9e5 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1157,12 +1157,21 @@ static void vfio_put_address_space(VFIOAddressSpace *space) static int vfio_get_iommu_type(VFIOContainer *container, Error **errp) { - int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU, + int iommu_types[] = { VFIO_TYPE1_NESTING_IOMMU, + VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU }; - int i; + int i, version; for (i = 0; i < ARRAY_SIZE(iommu_types); i++) { if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) { + if (iommu_types[i] == VFIO_TYPE1_NESTING_IOMMU) { + version = ioctl(container->fd, VFIO_CHECK_EXTENSION, + VFIO_NESTING_IOMMU_UAPI); + if (version < IOMMU_UAPI_VERSION) { + info_report("IOMMU UAPI incompatible for nesting"); + continue; + } + } return iommu_types[i]; } } @@ -1278,6 +1287,7 @@ static int vfio_connect_container(VFIOGroup *group,
AddressSpace *as, } switch (container->iommu_type) { + case VFIO_TYPE1_NESTING_IOMMU: case VFIO_TYPE1v2_IOMMU: case VFIO_TYPE1_IOMMU: {

From: Liu Yi L
To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com
Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan, Yi Sun
Subject: [RFC v3.1 04/22] hw/iommu: introduce HostIOMMUContext
Date: Sat, 22 Feb 2020 00:07:05 -0800

Currently, many platform vendors provide dual-stage DMA address translation in hardware, for example nested translation in Intel VT-d scalable mode and nested stage translation on ARM SMMUv3.

In dual-stage DMA address translation, there are two stages of translation structures: stage-1 (a.k.a. first-level) and stage-2 (a.k.a. second-level). Stage-1 translation results are also subject to stage-2 translation.

Take vSVA (Virtual Shared Virtual Addressing) as an example: the guest IOMMU driver owns the stage-1 translation structures (covering GVA->GPA translation), and the host IOMMU driver owns the stage-2 translation structures (covering GPA->HPA translation). The VMM is responsible for binding the stage-1 translation structures to the host, so that hardware can perform GVA->GPA and then GPA->HPA translation. For more background on SVA, refer to the links below.
 - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
 - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
   Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf

In QEMU, vIOMMU emulators expose IOMMUs to the VM per their own specs. Devices are passed through to the guest via device pass-through components like VFIO. VFIO is a userspace driver framework which exposes host IOMMU programming capability to userspace in a secure way, e.g. IOVA MAP/UNMAP requests. Thus the major connection between VFIO and vIOMMU is MAP/UNMAP. However, with dual-stage DMA translation support, there are more interactions between vIOMMU and VFIO, as below:
 1) PASID allocation (allow host to intercept in PASID allocation)
 2) bind stage-1 translation structures to host
 3) propagate stage-1 cache invalidation to host
 4) DMA address translation fault (I/O page fault) servicing etc.
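As a rough illustration of the dual-stage translation described in this commit message, the sketch below models each stage as a single-level table of 4 KiB pages: a guest-owned stage-1 table maps GVA to GPA, and a host-owned stage-2 table maps GPA to HPA. Everything in it (`ToyPageTable`, `toy_walk()`, `toy_nested_translate()`, the 16-entry tables, the `TOY_*` constants) is hypothetical and not part of this series, VT-d, or SMMUv3. Real hardware walks multi-level tables, and the stage-1 table pages themselves live at GPAs that are also translated through stage-2, which this toy ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model of dual-stage (nested) DMA address translation.
 * Each stage is a single-level table of 4 KiB pages. Real IOMMUs use
 * multi-level tables, and stage-1 table pages are themselves GPAs that
 * hardware translates through stage-2 (not modelled here).
 */
#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_MASK   ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NR_ENTRIES  16
#define TOY_PTE_PRESENT UINT64_C(1)

enum {
    TOY_OK = 0,
    TOY_FAULT_STAGE1 = -1, /* GVA not mapped by the guest-owned table */
    TOY_FAULT_STAGE2 = -2, /* GPA not mapped by the host-owned table */
};

typedef struct {
    /* Entry = (output frame number << TOY_PAGE_SHIFT) | TOY_PTE_PRESENT */
    uint64_t pte[TOY_NR_ENTRIES];
} ToyPageTable;

/* Walk one stage: returns 0 and sets *out, or -1 if the page is unmapped. */
static int toy_walk(const ToyPageTable *pt, uint64_t in, uint64_t *out)
{
    uint64_t idx = in >> TOY_PAGE_SHIFT;

    assert(pt != NULL && out != NULL);
    if (idx >= TOY_NR_ENTRIES || !(pt->pte[idx] & TOY_PTE_PRESENT)) {
        return -1;
    }
    /* Take the output frame from the entry; keep the page offset as-is. */
    *out = (pt->pte[idx] & ~TOY_PAGE_MASK) | (in & TOY_PAGE_MASK);
    return 0;
}

/* GVA -> GPA through stage-1, then GPA -> HPA through stage-2. */
static int toy_nested_translate(const ToyPageTable *stage1,
                                const ToyPageTable *stage2,
                                uint64_t gva, uint64_t *hpa)
{
    uint64_t gpa;

    if (toy_walk(stage1, gva, &gpa) < 0) {
        return TOY_FAULT_STAGE1;
    }
    if (toy_walk(stage2, gpa, hpa) < 0) {
        return TOY_FAULT_STAGE2;
    }
    return TOY_OK;
}
```

For example, if stage-1 entry 1 points at GPA frame 3 and stage-2 entry 3 points at HPA frame 9, GVA 0x1234 becomes GPA 0x3234 and then HPA 0x9234. The toy reports which stage faulted because, in a vSVA-style split, a stage-1 fault concerns guest-owned structures. Such a fault would generally have to be reported up to the vIOMMU rather than resolved by the host, which loosely corresponds to item 4 above.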
The interactions listed above require an abstraction layer that wraps these operations and gives vIOMMU emulators an explicit way to call into VFIO. This patch introduces HostIOMMUContext to represent a hardware IOMMU with dual-stage DMA address translation capability. PASID allocation and free are the first two operations included, to demonstrate the design; more operations, such as bind_stage1_pgtbl and invalidate_stage1_cache, will be added in the future.

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Michael S. Tsirkin Signed-off-by: Liu Yi L --- hw/Makefile.objs | 1 + hw/iommu/Makefile.objs | 1 + hw/iommu/host_iommu_context.c | 55 +++++++++++++++++++++++++++++++ include/hw/iommu/host_iommu_context.h | 61 +++++++++++++++++++++++++++++++++++ 4 files changed, 118 insertions(+) create mode 100644 hw/iommu/Makefile.objs create mode 100644 hw/iommu/host_iommu_context.c create mode 100644 include/hw/iommu/host_iommu_context.h diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 660e2b4..cab83fe 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -40,6 +40,7 @@ devices-dirs-$(CONFIG_MEM_DEVICE) += mem/ devices-dirs-$(CONFIG_NUBUS) += nubus/ devices-dirs-y += semihosting/ devices-dirs-y += smbios/ +devices-dirs-y += iommu/ endif common-obj-y += $(devices-dirs-y) diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs new file mode 100644 index 0000000..e6eed4e --- /dev/null +++ b/hw/iommu/Makefile.objs @@ -0,0 +1 @@ +obj-y += host_iommu_context.o diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c new file mode 100644 index 0000000..11b092f --- /dev/null +++ b/hw/iommu/host_iommu_context.c @@ -0,0 +1,55 @@ +/* + * QEMU abstraction of Host IOMMU + * + * Copyright (C) 2020 Intel Corporation.
+ * + * Authors: Liu Yi L + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License along + * with this program; if not, see . + */ + +#include "qemu/osdep.h" +#include "hw/iommu/host_iommu_context.h" + +int host_iommu_ctx_pasid_alloc(HostIOMMUContext *host_icx, uint32_t min, + uint32_t max, uint32_t *pasid) +{ + if (host_icx && (host_icx->flags & HOST_IOMMU_PASID_REQUEST) && + host_icx->ops && host_icx->ops->pasid_alloc) { + return host_icx->ops->pasid_alloc(host_icx, min, max, pasid); + } + return -ENOENT; +} + +int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid) +{ + if (host_icx && (host_icx->flags & HOST_IOMMU_PASID_REQUEST) && + host_icx->ops && host_icx->ops->pasid_free) { + return host_icx->ops->pasid_free(host_icx, pasid); + } + return -ENOENT; +} + +void host_iommu_ctx_init(HostIOMMUContext *host_icx, + uint64_t flags, HostIOMMUOps *ops) +{ + host_icx->flags = flags; + host_icx->ops = ops; +} + +void host_iommu_ctx_destroy(HostIOMMUContext *host_icx) +{ + host_icx->flags = 0x0; + host_icx->ops = NULL; +} diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h new file mode 100644 index 0000000..f4d811a --- /dev/null +++ b/include/hw/iommu/host_iommu_context.h @@ -0,0 +1,61 @@ +/* + * QEMU abstraction of Host IOMMU + * + * Copyright (C) 2020 Intel Corporation. 
+ * + * Authors: Liu Yi L + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License along + * with this program; if not, see . + */ + +#ifndef HW_IOMMU_CONTEXT_H +#define HW_IOMMU_CONTEXT_H + +#include "qemu/queue.h" +#ifndef CONFIG_USER_ONLY +#include "exec/hwaddr.h" +#endif + +typedef struct HostIOMMUContext HostIOMMUContext; +typedef struct HostIOMMUOps HostIOMMUOps; + +struct HostIOMMUOps { + /* Allocate pasid from HostIOMMUContext (a.k.a. host software) */ + int (*pasid_alloc)(HostIOMMUContext *host_icx, + uint32_t min, + uint32_t max, + uint32_t *pasid); + /* Reclaim pasid from HostIOMMUContext (a.k.a. 
host software) */ + int (*pasid_free)(HostIOMMUContext *host_icx, + uint32_t pasid); +}; + +/* + * This is an abstraction of host IOMMU with dual-stage capability + */ +struct HostIOMMUContext { +#define HOST_IOMMU_PASID_REQUEST (1ULL << 0) + uint64_t flags; + HostIOMMUOps *ops; +}; + +int host_iommu_ctx_pasid_alloc(HostIOMMUContext *host_icx, uint32_t min, + uint32_t max, uint32_t *pasid); +int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid); + +void host_iommu_ctx_init(HostIOMMUContext *host_icx, + uint64_t flags, HostIOMMUOps *ops); +void host_iommu_ctx_destroy(HostIOMMUContext *host_icx); + +#endif From patchwork Sat Feb 22 08:07:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397883 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD1AC138D for ; Sat, 22 Feb 2020 08:02:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C77EF208C3 for ; Sat, 22 Feb 2020 08:02:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727186AbgBVICb (ORCPT ); Sat, 22 Feb 2020 03:02:31 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727021AbgBVIB7 (ORCPT ); Sat, 22 Feb 2020 03:01:59 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547665" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:56 -0800 From: Liu Yi L To: 
qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 05/22] hw/pci: add pci_device_setup_iommu Date: Sat, 22 Feb 2020 00:07:06 -0800 Message-Id: <1582358843-51931-6-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

HostIOMMUContext was introduced to give vIOMMU emulators an explicit way to call into pass-through components (e.g. VFIO). A vIOMMU needs to obtain the HostIOMMUContext before using it. This patch adds a new callback to PCIDevice, which is set by pass-through components and used by vIOMMU emulators to get the HostIOMMUContext.

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Michael S.
Tsirkin Signed-off-by: Liu, Yi L --- hw/pci/pci.c | 10 ++++++++++ include/hw/pci/pci.h | 6 ++++++ 2 files changed, 16 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index e1ed667..3166cc3 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2695,6 +2695,16 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque) bus->iommu_opaque = opaque; } +void pci_device_setup_iommu(PCIDevice *dev, PCIHostIOMMUFunc fn) +{ + dev->host_iommu_fn = fn; +} + +void pci_device_unset_iommu(PCIDevice *dev) +{ + dev->host_iommu_fn = NULL; +} + static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque) { Range *range = opaque; diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 2acd832..e44eefb 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -8,6 +8,7 @@ #include "hw/isa/isa.h" #include "hw/pci/pcie.h" +#include "hw/iommu/host_iommu_context.h" extern bool pci_available; @@ -248,6 +249,7 @@ typedef void (*MSIVectorReleaseNotifier)(PCIDevice *dev, unsigned int vector); typedef void (*MSIVectorPollNotifier)(PCIDevice *dev, unsigned int vector_start, unsigned int vector_end); +typedef HostIOMMUContext *(*PCIHostIOMMUFunc)(PCIDevice *); enum PCIReqIDType { PCI_REQ_ID_INVALID = 0, @@ -356,6 +358,8 @@ struct PCIDevice { /* ID of standby device in net_failover pair */ char *failover_pair_id; + /* Callback to get host iommu context */ + PCIHostIOMMUFunc host_iommu_fn; }; void pci_register_bar(PCIDevice *pci_dev, int region_num, @@ -488,6 +492,8 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int); AddressSpace *pci_device_iommu_address_space(PCIDevice *dev); void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque); +void pci_device_setup_iommu(PCIDevice *dev, PCIHostIOMMUFunc fn); +void pci_device_unset_iommu(PCIDevice *dev); static inline void pci_set_byte(uint8_t *config, uint8_t val) From patchwork Sat Feb 22 08:07:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397845 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EA01F18EC for ; Sat, 22 Feb 2020 08:02:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D3549214DB for ; Sat, 22 Feb 2020 08:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727125AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726836AbgBVIB7 (ORCPT ); Sat, 22 Feb 2020 03:01:59 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547668" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:56 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 06/22] vfio/pci: init HostIOMMUContext per-container Date: Sat, 22 Feb 2020 00:07:07 -0800 Message-Id: <1582358843-51931-7-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org After confirming dual stage DMA translation support 
with the kernel by checking VFIO_TYPE1_NESTING_IOMMU, VFIO initializes a HostIOMMUContext instance and exposes it to the PCI layer, so vIOMMU emulators can make use of this capability through the ops provided by HostIOMMUContext.

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu Yi L --- hw/vfio/common.c | 11 +++++++++++ hw/vfio/pci.c | 21 +++++++++++++++++++++ include/hw/vfio/vfio-common.h | 2 ++ 3 files changed, 34 insertions(+) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index be1a9e5..9ab62a6 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1179,10 +1179,15 @@ static int vfio_get_iommu_type(VFIOContainer *container, return -EINVAL; } +static struct HostIOMMUOps vfio_host_icx_ops = { +/* To be added later */ +}; + static int vfio_init_container(VFIOContainer *container, int group_fd, Error **errp) { int iommu_type, ret; + uint64_t flags = 0; iommu_type = vfio_get_iommu_type(container, errp); if (iommu_type < 0) { @@ -1210,6 +1215,11 @@ static int vfio_init_container(VFIOContainer *container, int group_fd, return -errno; } + if (iommu_type == VFIO_TYPE1_NESTING_IOMMU) { + host_iommu_ctx_init(&container->host_icx, + flags, &vfio_host_icx_ops); + } + container->iommu_type = iommu_type; return 0; } @@ -1456,6 +1466,7 @@ static void vfio_disconnect_container(VFIOGroup *group) } trace_vfio_disconnect_container(container->fd); + host_iommu_ctx_destroy(&container->host_icx); close(container->fd); g_free(container); diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 5e75a95..df79675 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2712,11 +2712,20 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev) vdev->req_enabled = false; } +static HostIOMMUContext *vfio_host_dma_iommu(PCIDevice *pdev) +{ + VFIOPCIDevice *vdev = PCI_VFIO(pdev); + VFIOContainer *container = vdev->vbasedev.group->container; + + return &container->host_icx; +} + static void vfio_realize(PCIDevice *pdev, Error
**errp) { VFIOPCIDevice *vdev = PCI_VFIO(pdev); VFIODevice *vbasedev_iter; VFIOGroup *group; + VFIOContainer *container; char *tmp, *subsys, group_path[PATH_MAX], *group_name; Error *err = NULL; ssize_t len; @@ -3028,6 +3037,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) vfio_register_req_notifier(vdev); vfio_setup_resetfn_quirk(vdev); + container = vdev->vbasedev.group->container; + if (container->host_icx.ops) { + pci_device_setup_iommu(pdev, vfio_host_dma_iommu); + } + return; out_deregister: @@ -3072,9 +3086,16 @@ static void vfio_instance_finalize(Object *obj) static void vfio_exitfn(PCIDevice *pdev) { VFIOPCIDevice *vdev = PCI_VFIO(pdev); + VFIOContainer *container; vfio_unregister_req_notifier(vdev); vfio_unregister_err_notifier(vdev); + + container = vdev->vbasedev.group->container; + if (container->host_icx.ops) { + pci_device_unset_iommu(pdev); + } + pci_device_set_intx_routing_notifier(&vdev->pdev, NULL); if (vdev->irqchip_change_notifier.notify) { kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier); diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index fd56420..36abe04 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -26,6 +26,7 @@ #include "qemu/notify.h" #include "ui/console.h" #include "hw/display/ramfb.h" +#include "hw/iommu/host_iommu_context.h" #ifdef CONFIG_LINUX #include #endif @@ -71,6 +72,7 @@ typedef struct VFIOContainer { MemoryListener listener; MemoryListener prereg_listener; unsigned iommu_type; + HostIOMMUContext host_icx; Error *error; bool initialized; unsigned long pgsizes; From patchwork Sat Feb 22 08:07:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397841 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F1BF17F0 for ; Sat, 22 Feb 
2020 08:02:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7F6E2214DB for ; Sat, 22 Feb 2020 08:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726975AbgBVIB7 (ORCPT ); Sat, 22 Feb 2020 03:01:59 -0500 Received: from mga04.intel.com ([192.55.52.120]:65090 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726884AbgBVIB6 (ORCPT ); Sat, 22 Feb 2020 03:01:58 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547671" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:56 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 07/22] vfio: get nesting iommu cap info from Kernel Date: Sat, 22 Feb 2020 00:07:08 -0800 Message-Id: <1582358843-51931-8-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

VFIO exposes the host IOMMU's dual-stage DMA translation programming capability to userspace through the VFIO_TYPE1_NESTING_IOMMU type. However, userspace needs more information about the nesting type, e.g. the supported stage-1 format and whether PASID alloc/free requests are available.
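The nesting information comes back through VFIO's capability-chain convention: the info buffer has `VFIO_IOMMU_INFO_CAPS` set in `flags`, `cap_offset` points at the first header, and each header's `next` field is an offset from the start of the buffer, with 0 ending the chain. Below is a minimal standalone sketch of that walk. `ToyCapHeader`, `toy_put_cap()` and `toy_find_cap()` are hypothetical names, and the struct only mirrors the layout of `struct vfio_info_cap_header` so the example builds without kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct vfio_info_cap_header layout from <linux/vfio.h>. */
typedef struct {
    uint16_t id;       /* capability ID */
    uint16_t version;  /* capability-specific version */
    uint32_t next;     /* offset of next header from buffer start; 0 ends the chain */
} ToyCapHeader;

/* Helper: write a capability header at offset @off in @buf. */
static void toy_put_cap(uint8_t *buf, size_t len, uint32_t off,
                        uint16_t id, uint32_t next)
{
    ToyCapHeader hdr = { .id = id, .version = 1, .next = next };

    assert((size_t)off + sizeof(hdr) <= len);
    memcpy(buf + off, &hdr, sizeof(hdr));
}

/*
 * Return the offset of the first capability with @id, or 0 if absent.
 * Headers are copied out with memcpy() to avoid unaligned access, every
 * offset is bounds-checked, and a hop limit stops a cyclic chain.
 */
static uint32_t toy_find_cap(const uint8_t *buf, size_t len,
                             uint32_t cap_offset, uint16_t id)
{
    uint32_t off = cap_offset;
    size_t hops = 0;

    while (off != 0 && (size_t)off + sizeof(ToyCapHeader) <= len &&
           hops++ < len / sizeof(ToyCapHeader)) {
        ToyCapHeader hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id) {
            return off;
        }
        off = hdr.next;
    }
    return 0;
}
```

The `vfio_get_iommu_info_cap()` helper in this patch expresses the same termination as `hdr != ptr` (since `ptr + 0 == ptr`) and trusts the kernel-supplied offsets; the bounds and hop checks above are extra defensiveness for the toy.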
This patch gets the IOMMU nesting capability info from the kernel using the VFIO_IOMMU_GET_INFO ioctl. Some of the code is reused from Shameer Kolothum: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html

Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Shameer Kolothum Signed-off-by: Liu Yi L --- hw/iommu/host_iommu_context.c | 5 +- hw/vfio/common.c | 97 ++++++++++++++++++++++++++++++++++- include/hw/iommu/host_iommu_context.h | 10 +++- 3 files changed, 108 insertions(+), 4 deletions(-) diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c index 11b092f..689a087 100644 --- a/hw/iommu/host_iommu_context.c +++ b/hw/iommu/host_iommu_context.c @@ -42,10 +42,13 @@ int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid) } void host_iommu_ctx_init(HostIOMMUContext *host_icx, - uint64_t flags, HostIOMMUOps *ops) + uint64_t flags, HostIOMMUOps *ops, + HostIOMMUInfo *uinfo) { host_icx->flags = flags; host_icx->ops = ops; + + host_icx->uinfo.stage1_format = uinfo->stage1_format; } void host_iommu_ctx_destroy(HostIOMMUContext *host_icx) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 9ab62a6..f9be68d 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1183,6 +1183,84 @@ static struct HostIOMMUOps vfio_host_icx_ops = { /* To be added later */ }; +/** + * Get iommu info from host. The caller of this function should free + * the memory pointed to by @info after a successful call, + * once it is finished using it.
+ */ +static int vfio_get_iommu_info(VFIOContainer *container, + struct vfio_iommu_type1_info **info) +{ + + size_t argsz = sizeof(struct vfio_iommu_type1_info); + + *info = g_malloc0(argsz); + +retry: + (*info)->argsz = argsz; + + if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) { + g_free(*info); + *info = NULL; + return -errno; + } + + if (((*info)->argsz > argsz)) { + argsz = (*info)->argsz; + *info = g_realloc(*info, argsz); + goto retry; + } + + return 0; +} + +static struct vfio_info_cap_header * +vfio_get_iommu_info_cap(struct vfio_iommu_type1_info *info, uint16_t id) +{ + struct vfio_info_cap_header *hdr; + void *ptr = info; + + if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) { + return NULL; + } + + for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) { + if (hdr->id == id) { + return hdr; + } + } + + return NULL; +} + +static int vfio_get_nesting_iommu_cap(VFIOContainer *container, + struct vfio_iommu_type1_info_cap_nesting *cap_nesting) +{ + struct vfio_iommu_type1_info *info; + struct vfio_info_cap_header *hdr; + struct vfio_iommu_type1_info_cap_nesting *cap; + int ret; + + ret = vfio_get_iommu_info(container, &info); + if (ret) { + return ret; + } + + hdr = vfio_get_iommu_info_cap(info, + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING); + if (!hdr) { + g_free(info); + return -ENOENT; + } + + cap = container_of(hdr, + struct vfio_iommu_type1_info_cap_nesting, header); + *cap_nesting = *cap; + + g_free(info); + return 0; +} + static int vfio_init_container(VFIOContainer *container, int group_fd, Error **errp) { @@ -1216,8 +1294,23 @@ static int vfio_init_container(VFIOContainer *container, int group_fd, } if (iommu_type == VFIO_TYPE1_NESTING_IOMMU) { - host_iommu_ctx_init(&container->host_icx, - flags, &vfio_host_icx_ops); + struct vfio_iommu_type1_info_cap_nesting nesting = { + .nesting_capabilities = 0x0, + .stage1_format = 0, }; + HostIOMMUInfo uinfo; + + ret = vfio_get_nesting_iommu_cap(container, &nesting); + if (ret) { + 
error_setg_errno(errp,
-ret, + "Failed to get nesting iommu cap"); + return ret; + } + + uinfo.stage1_format = nesting.stage1_format; + flags |= (nesting.nesting_capabilities & VFIO_IOMMU_PASID_REQS) ? + HOST_IOMMU_PASID_REQUEST : 0; + host_iommu_ctx_init(&container->host_icx, flags, + &vfio_host_icx_ops, &uinfo); } container->iommu_type = iommu_type; diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h index f4d811a..6797f6d 100644 --- a/include/hw/iommu/host_iommu_context.h +++ b/include/hw/iommu/host_iommu_context.h @@ -23,12 +23,14 @@ #define HW_IOMMU_CONTEXT_H #include "qemu/queue.h" +#include #ifndef CONFIG_USER_ONLY #include "exec/hwaddr.h" #endif typedef struct HostIOMMUContext HostIOMMUContext; typedef struct HostIOMMUOps HostIOMMUOps; +typedef struct HostIOMMUInfo HostIOMMUInfo; struct HostIOMMUOps { /* Allocate pasid from HostIOMMUContext (a.k.a. host software) */ @@ -41,6 +43,10 @@ struct HostIOMMUOps { uint32_t pasid); }; +struct HostIOMMUInfo { + uint32_t stage1_format; +}; + /* * This is an abstraction of host IOMMU with dual-stage capability */ @@ -48,6 +54,7 @@ struct HostIOMMUContext { #define HOST_IOMMU_PASID_REQUEST (1ULL << 0) uint64_t flags; HostIOMMUOps *ops; + HostIOMMUInfo uinfo; }; int host_iommu_ctx_pasid_alloc(HostIOMMUContext *host_icx, uint32_t min, @@ -55,7 +62,8 @@ int host_iommu_ctx_pasid_alloc(HostIOMMUContext *host_icx, uint32_t min, int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid); void host_iommu_ctx_init(HostIOMMUContext *host_icx, - uint64_t flags, HostIOMMUOps *ops); + uint64_t flags, HostIOMMUOps *ops, + HostIOMMUInfo *uinfo); void host_iommu_ctx_destroy(HostIOMMUContext *host_icx); #endif From patchwork Sat Feb 22 08:07:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397869 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9EF63138D for ; Sat, 22 Feb 2020 08:02:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 86C41208C3 for ; Sat, 22 Feb 2020 08:02:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727148AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727099AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547674" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 08/22] vfio/common: add pasid_alloc/free support Date: Sat, 22 Feb 2020 00:07:09 -0800 Message-Id: <1582358843-51931-9-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch adds VFIO PASID alloc/free support, allowing the host to intercept PASID allocation for the VM, by providing a VFIO implementation of the HostIOMMUOps pasid_alloc/free callbacks.
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu Yi L --- hw/vfio/common.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index f9be68d..8f30a52 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1179,8 +1179,53 @@ static int vfio_get_iommu_type(VFIOContainer *container, return -EINVAL; } +static int vfio_host_icx_pasid_alloc(HostIOMMUContext *host_icx, + uint32_t min, uint32_t max, uint32_t *pasid) +{ + VFIOContainer *container = container_of(host_icx, VFIOContainer, host_icx); + struct vfio_iommu_type1_pasid_request req; + unsigned long argsz; + int ret; + + argsz = sizeof(req); + req.argsz = argsz; + req.flags = VFIO_IOMMU_PASID_ALLOC; + req.alloc_pasid.min = min; + req.alloc_pasid.max = max; + + if (ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req)) { + ret = -errno; + error_report("%s: %d, alloc failed", __func__, ret); + return ret; + } + *pasid = req.alloc_pasid.result; + return 0; +} + +static int vfio_host_icx_pasid_free(HostIOMMUContext *host_icx, + uint32_t pasid) +{ + VFIOContainer *container = container_of(host_icx, VFIOContainer, host_icx); + struct vfio_iommu_type1_pasid_request req; + unsigned long argsz; + int ret; + + argsz = sizeof(req); + req.argsz = argsz; + req.flags = VFIO_IOMMU_PASID_FREE; + req.free_pasid = pasid; + + if (ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req)) { + ret = -errno; + error_report("%s: %d, free failed", __func__, ret); + return ret; + } + return 0; +} + static struct HostIOMMUOps vfio_host_icx_ops = { -/* To be added later */ + .pasid_alloc = vfio_host_icx_pasid_alloc, + .pasid_free = vfio_host_icx_pasid_free, }; /** From patchwork Sat Feb 22 08:07:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397881 Return-Path: Received: from 
mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F08CE17F0 for ; Sat, 22 Feb 2020 08:02:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CFC62214DB for ; Sat, 22 Feb 2020 08:02:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727286AbgBVIC2 (ORCPT ); Sat, 22 Feb 2020 03:02:28 -0500 Received: from mga04.intel.com ([192.55.52.120]:65090 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727030AbgBVIB7 (ORCPT ); Sat, 22 Feb 2020 03:01:59 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547677" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 09/22] hw/pci: add pci_device_host_iommu_context() Date: Sat, 22 Feb 2020 00:07:10 -0800 Message-Id: <1582358843-51931-10-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch adds pci_device_host_iommu_context() to expose HostIOMMUContext to vIOMMU emulators via the PCI layer.
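The lookup added here is a NULL-safe indirection: the pass-through driver registers a per-device getter, and the vIOMMU asks devices in turn. The following self-contained sketch models that handshake, with hypothetical `Toy*` types standing in for PCIDevice, HostIOMMUContext and the VFIO container; only the body of `toy_host_iommu_context()` is modeled directly on the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyHostCtx { int id; } ToyHostCtx;  /* stands in for HostIOMMUContext */
typedef struct ToyPCIDevice ToyPCIDevice;
typedef ToyHostCtx *(*ToyHostIOMMUFunc)(ToyPCIDevice *dev);

struct ToyPCIDevice {
    ToyHostIOMMUFunc host_iommu_fn;  /* set by a pass-through driver, else NULL */
    ToyHostCtx *container_ctx;       /* stands in for the VFIO container's context */
};

/* Pass-through side: register and unregister the per-device getter. */
static void toy_setup_iommu(ToyPCIDevice *dev, ToyHostIOMMUFunc fn)
{
    assert(dev != NULL);
    dev->host_iommu_fn = fn;
}

static void toy_unset_iommu(ToyPCIDevice *dev)
{
    assert(dev != NULL);
    dev->host_iommu_fn = NULL;
}

/* The getter a VFIO-like driver would register. */
static ToyHostCtx *toy_vfio_getter(ToyPCIDevice *dev)
{
    return dev->container_ctx;
}

/* vIOMMU side: NULL-safe lookup, shaped like pci_device_host_iommu_context(). */
static ToyHostCtx *toy_host_iommu_context(ToyPCIDevice *dev)
{
    if (dev && dev->host_iommu_fn) {
        return dev->host_iommu_fn(dev);
    }
    return NULL;
}

/* Scan a sparse device array (empty slots are NULL) for the first context. */
static ToyHostCtx *toy_first_context(ToyPCIDevice **devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ToyHostCtx *ctx = toy_host_iommu_context(devs[i]);

        if (ctx) {
            return ctx;
        }
    }
    return NULL;
}
```

Because the lookup tolerates both empty slots and emulated devices without a getter, a vIOMMU can scan every devfn without first checking which devices are assigned.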
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Michael S. Tsirkin Signed-off-by: Liu Yi L --- hw/pci/pci.c | 8 ++++++++ include/hw/pci/pci.h | 1 + 2 files changed, 9 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 3166cc3..288576f 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2689,6 +2689,14 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) return &address_space_memory; } +HostIOMMUContext *pci_device_host_iommu_context(PCIDevice *dev) +{ + if (dev && dev->host_iommu_fn) { + return dev->host_iommu_fn(dev); + } + return NULL; +} + void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque) { bus->iommu_fn = fn; diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index e44eefb..cb514d0 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -494,6 +494,7 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev); void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque); void pci_device_setup_iommu(PCIDevice *dev, PCIHostIOMMUFunc fn); void pci_device_unset_iommu(PCIDevice *dev); +HostIOMMUContext *pci_device_host_iommu_context(PCIDevice *dev); static inline void pci_set_byte(uint8_t *config, uint8_t val) From patchwork Sat Feb 22 08:07:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397847 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 21AAB1892 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F38D2208C3 for ; Sat, 22 Feb 2020 08:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727126AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 Received: from mga04.intel.com ([192.55.52.120]:65092 "EHLO 
mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727086AbgBVIB7 (ORCPT ); Sat, 22 Feb 2020 03:01:59 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547680" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 10/22] intel_iommu: add virtual command capability support Date: Sat, 22 Feb 2020 00:07:11 -0800 Message-Id: <1582358843-51931-11-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch adds virtual command support to the Intel vIOMMU per the Intel VT-d 3.1 spec, along with two virtual commands: allocate PASID and free PASID.
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Reviewed-by: Peter Xu Signed-off-by: Liu Yi L Signed-off-by: Yi Sun --- hw/i386/intel_iommu.c | 166 +++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 37 +++++++++ hw/i386/trace-events | 1 + include/hw/i386/intel_iommu.h | 6 +- 4 files changed, 209 insertions(+), 1 deletion(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 6258c58..fcb80cd 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2649,6 +2649,148 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s) } } +static int vtd_request_pasid_alloc(IntelIOMMUState *s, uint32_t *pasid) +{ + VTDBus *vtd_bus; + int bus_n, devfn, ret = -1; + + for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) { + vtd_bus = vtd_find_as_from_bus_num(s, bus_n); + if (!vtd_bus) { + continue; + } + for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) { + HostIOMMUContext *host_icx; + PCIDevice *dev; + + dev = vtd_bus->bus->devices[devfn]; + host_icx = pci_device_host_iommu_context(dev); + if (!host_icx) { + continue; + } + + /* + * We'll return the first valid result we got. It's + * a bit hackish in that we don't have a good global + * interface yet to talk to modules like vfio to deliver + * this allocation request, so we're leveraging this + * per-device iommu context to do the same thing just + * to make sure the allocation happens only once. 
+ */ + ret = host_iommu_ctx_pasid_alloc(host_icx, VTD_MIN_HPASID, + VTD_MAX_HPASID, pasid); + if (!ret) { + break; + } + } + } + return ret; +} + +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid) +{ + VTDBus *vtd_bus; + int bus_n, devfn, ret = -1; + + for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) { + vtd_bus = vtd_find_as_from_bus_num(s, bus_n); + if (!vtd_bus) { + continue; + } + for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) { + HostIOMMUContext *host_icx; + PCIDevice *dev; + + dev = vtd_bus->bus->devices[devfn]; + host_icx = pci_device_host_iommu_context(dev); + if (!host_icx) { + continue; + } + /* + * Similar to pasid allocation. We'll free the pasid + * on the first successful free operation. It's a bit + * hackish in that we don't have a good global interface + * yet to talk to modules like vfio to deliver this pasid + * free request, so we're leveraging this per-device iommu + * context to do the same thing just to make sure the free + * happens only once. + */ + ret = host_iommu_ctx_pasid_free(host_icx, pasid); + if (!ret) { + break; + } + } + } + return ret; +} + +/* + * Set IP (In Progress) and discard any previous status or result. + * The caller has already returned early if IP was set. + */ +static void vtd_vcmd_set_ip(IntelIOMMUState *s) +{ + s->vcrsp = 1; + vtd_set_quad_raw(s, DMAR_VCRSP_REG, + ((uint64_t) s->vcrsp)); +} + +static void vtd_vcmd_clear_ip(IntelIOMMUState *s) +{ + s->vcrsp &= (~((uint64_t)(0x1))); + vtd_set_quad_raw(s, DMAR_VCRSP_REG, + ((uint64_t) s->vcrsp)); +} + +/* Handle write to Virtual Command Register */ +static int vtd_handle_vcmd_write(IntelIOMMUState *s, uint64_t val) +{ + uint32_t pasid; + int ret = -1; + + trace_vtd_reg_write_vcmd(s->vcrsp, val); + + if (!(s->vccap & VTD_VCCAP_PAS) || + (s->vcrsp & 1)) { + return -1; + } + + /* + * The writing vCPU is blocked while this trapped guest VCMD + * write is handled, and well-written guest software serializes + * VCMD accesses, so no other vCPU should access VCMD here.
+ * However, we still emulate the IP bit here in case of + * bad guest software. Also align with the spec. + */ + vtd_vcmd_set_ip(s); + + switch (val & VTD_VCMD_CMD_MASK) { + case VTD_VCMD_ALLOC_PASID: + ret = vtd_request_pasid_alloc(s, &pasid); + if (ret) { + s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_NO_AVAILABLE_PASID); + } else { + s->vcrsp |= VTD_VCRSP_RSLT(pasid); + } + break; + + case VTD_VCMD_FREE_PASID: + pasid = VTD_VCMD_PASID_VALUE(val); + ret = vtd_request_pasid_free(s, pasid); + if (ret < 0) { + s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_FREE_INVALID_PASID); + } + break; + + default: + s->vcrsp |= VTD_VCRSP_SC(VTD_VCMD_UNDEFINED_CMD); + error_report_once("Virtual Command: unsupported command!!!"); + break; + } + vtd_vcmd_clear_ip(s); + return 0; +} + static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size) { IntelIOMMUState *s = opaque; @@ -2937,6 +3079,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr, vtd_set_long(s, addr, val); break; + case DMAR_VCMD_REG: + if (!vtd_handle_vcmd_write(s, val)) { + if (size == 4) { + vtd_set_long(s, addr, val); + } else { + vtd_set_quad(s, addr, val); + } + } + break; + + case DMAR_VCMD_REG_HI: + assert(size == 4); + if (!vtd_handle_vcmd_write(s, val)) { + vtd_set_long(s, addr, val); + } + break; + default: if (size == 4) { vtd_set_long(s, addr, val); @@ -3697,6 +3856,13 @@ static void vtd_init(IntelIOMMUState *s) * Interrupt remapping registers. 
+ */ vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0); + + /* + * Virtual Command Definitions + */ + vtd_define_quad(s, DMAR_VCCAP_REG, s->vccap, 0, 0); + vtd_define_quad(s, DMAR_VCMD_REG, 0, 0xffffffffffffffffULL, 0); + vtd_define_quad(s, DMAR_VCRSP_REG, 0, 0, 0); } /* Should not reset address_spaces when reset because devices will still use diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 862033e..1d997a1 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -85,6 +85,12 @@ #define DMAR_MTRRCAP_REG_HI 0x104 #define DMAR_MTRRDEF_REG 0x108 /* MTRR default type */ #define DMAR_MTRRDEF_REG_HI 0x10c +#define DMAR_VCCAP_REG 0xE00 /* Virtual Command Capability Register */ +#define DMAR_VCCAP_REG_HI 0xE04 +#define DMAR_VCMD_REG 0xE10 /* Virtual Command Register */ +#define DMAR_VCMD_REG_HI 0xE14 +#define DMAR_VCRSP_REG 0xE20 /* Virtual Command Response Register */ +#define DMAR_VCRSP_REG_HI 0xE24 /* IOTLB registers */ #define DMAR_IOTLB_REG_OFFSET 0xf0 /* Offset to the IOTLB registers */ @@ -312,6 +318,37 @@ typedef enum VTDFaultReason { #define VTD_CONTEXT_CACHE_GEN_MAX 0xffffffffUL +/* VCCAP_REG */ +#define VTD_VCCAP_PAS (1UL << 0) + +/* + * The basic idea is to let the hypervisor set a range of available + * PASIDs for VMs. One reason is that PASID #0 is reserved for + * RID_PASID usage. Since it is unknown how many PASIDs will be reserved + * in the future, this is just an estimated value. Setting it to "1" is + * enough at the current stage.
+ */ +#define VTD_MIN_HPASID 1 +#define VTD_MAX_HPASID 0xFFFFF + +/* Virtual Command Register */ +enum { + VTD_VCMD_NULL_CMD = 0, + VTD_VCMD_ALLOC_PASID = 1, + VTD_VCMD_FREE_PASID = 2, + VTD_VCMD_CMD_NUM, +}; + +#define VTD_VCMD_CMD_MASK 0xffUL +#define VTD_VCMD_PASID_VALUE(val) (((val) >> 8) & 0xfffff) + +#define VTD_VCRSP_RSLT(val) ((val) << 8) +#define VTD_VCRSP_SC(val) (((val) & 0x3) << 1) + +#define VTD_VCMD_UNDEFINED_CMD 1ULL +#define VTD_VCMD_NO_AVAILABLE_PASID 2ULL +#define VTD_VCMD_FREE_INVALID_PASID 2ULL + /* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */ struct VTDInvDescIEC { uint32_t type:4; /* Should always be 0x4 */ diff --git a/hw/i386/trace-events b/hw/i386/trace-events index e48bef2..71536a7 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -51,6 +51,7 @@ vtd_reg_write_gcmd(uint32_t status, uint32_t val) "status 0x%"PRIx32" value 0x%" vtd_reg_write_fectl(uint32_t value) "value 0x%"PRIx32 vtd_reg_write_iectl(uint32_t value) "value 0x%"PRIx32 vtd_reg_ics_clear_ip(void) "" +vtd_reg_write_vcmd(uint32_t status, uint32_t val) "status 0x%"PRIx32" value 0x%"PRIx32 vtd_dmar_translate(uint8_t bus, uint8_t slot, uint8_t func, uint64_t iova, uint64_t gpa, uint64_t mask) "dev %02x:%02x.%02x iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64 vtd_dmar_enable(bool en) "enable %d" vtd_dmar_fault(uint16_t sid, int fault, uint64_t addr, bool is_write) "sid 0x%"PRIx16" fault %d addr 0x%"PRIx64" write %d" diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index 66b931e..d8a79d3 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -46,7 +46,7 @@ #define VTD_SID_TO_BUS(sid) (((sid) >> 8) & 0xff) #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) -#define DMAR_REG_SIZE 0x230 +#define DMAR_REG_SIZE 0xF00 #define VTD_HOST_AW_39BIT 39 #define VTD_HOST_AW_48BIT 48 #define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT @@ -271,6 +271,10 @@ struct IntelIOMMUState { uint8_t aw_bits; /* Host/IOVA address 
width (in bits) */ bool dma_drain; /* Whether DMA r/w draining enabled */ + /* Virtual Command Register */ + uint64_t vccap; /* The value of vcmd capability reg */ + uint64_t vcrsp; /* Current value of VCMD RSP REG */ + /* * Protects IOMMU states in general. Currently it protects the * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace. From patchwork Sat Feb 22 08:07:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397875 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9CFD7138D for ; Sat, 22 Feb 2020 08:02:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 87B37208C4 for ; Sat, 22 Feb 2020 08:02:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727327AbgBVICX (ORCPT ); Sat, 22 Feb 2020 03:02:23 -0500 Received: from mga05.intel.com ([192.55.52.43]:63022 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727100AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547683" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , 
Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 11/22] intel_iommu: process pasid cache invalidation Date: Sat, 22 Feb 2020 00:07:12 -0800 Message-Id: <1582358843-51931-12-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds PASID cache invalidation handling. When the guest enables PASID usage (e.g. SVA), guest software should issue a proper PASID cache invalidation when caching-mode is exposed. This patch only adds a draft of the pasid cache invalidation handling. Detailed handling will be added in subsequent patches. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Reviewed-by: Peter Xu Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 66 ++++++++++++++++++++++++++++++++++++++---- hw/i386/intel_iommu_internal.h | 12 ++++++++ hw/i386/trace-events | 3 ++ 3 files changed, 76 insertions(+), 5 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index fcb80cd..462449c 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2393,6 +2393,63 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) return true; } +static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id) +{ + return 0; +} + +static int vtd_pasid_cache_psi(IntelIOMMUState *s, + uint16_t domain_id, uint32_t pasid) +{ + return 0; +} + +static int vtd_pasid_cache_gsi(IntelIOMMUState *s) +{ + return 0; +} + +static bool vtd_process_pasid_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + uint16_t domain_id; + uint32_t pasid; + int ret = 0; + + if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) || + (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) || + (inv_desc->val[2] & 
VTD_INV_DESC_PASIDC_RSVD_VAL2) || +
(inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) { + error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]); + pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]); + + switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) { + case VTD_INV_DESC_PASIDC_DSI: + ret = vtd_pasid_cache_dsi(s, domain_id); + break; + + case VTD_INV_DESC_PASIDC_PASID_SI: + ret = vtd_pasid_cache_psi(s, domain_id, pasid); + break; + + case VTD_INV_DESC_PASIDC_GLOBAL: + ret = vtd_pasid_cache_gsi(s); + break; + + default: + error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + return (ret == 0) ? true : false; +} + static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -2499,12 +2556,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; - /* - * TODO: the entity of below two cases will be implemented in future series. - * To make guest (which integrates scalable mode support patch set in - * iommu driver) work, just return true is enough so far. 
- */ case VTD_INV_DESC_PC: + trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]); + if (!vtd_process_pasid_desc(s, &inv_desc)) { + return false; + } break; case VTD_INV_DESC_PIOTLB: diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 1d997a1..0ca5f0b 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -444,6 +444,18 @@ typedef union VTDInvDesc VTDInvDesc; (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM | VTD_SL_TM)) : \ (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) +#define VTD_INV_DESC_PASIDC_G (3ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL) +#define VTD_INV_DESC_PASIDC_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PASIDC_RSVD_VAL0 0xfff000000000ffc0ULL +#define VTD_INV_DESC_PASIDC_RSVD_VAL1 0xffffffffffffffffULL +#define VTD_INV_DESC_PASIDC_RSVD_VAL2 0xffffffffffffffffULL +#define VTD_INV_DESC_PASIDC_RSVD_VAL3 0xffffffffffffffffULL + +#define VTD_INV_DESC_PASIDC_DSI (0ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) +#define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 71536a7..f7cd4e5 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -22,6 +22,9 @@ vtd_inv_qi_head(uint16_t head) "read head %d" vtd_inv_qi_tail(uint16_t head) "write tail %d" vtd_inv_qi_fetch(void) "" vtd_context_cache_reset(void) "" +vtd_pasid_cache_gsi(void) "" +vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16 +vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present" vtd_iotlb_page_hit(uint16_t 
sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16 From patchwork Sat Feb 22 08:07:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397851 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48740196C for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 28A80208C3 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727180AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726884AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547687" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 12/22] intel_iommu: add PASID cache management infrastructure Date: Sat, 22 Feb 2020 00:07:13 -0800 Message-Id: 
<1582358843-51931-13-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds a PASID cache management infrastructure based on the newly added structure VTDPASIDAddressSpace, which is used to track PASID usage and to support future PASID-tagged DMA address translation in the vIOMMU. struct VTDPASIDAddressSpace { VTDBus *vtd_bus; uint8_t devfn; AddressSpace as; uint32_t pasid; IntelIOMMUState *iommu_state; VTDContextCacheEntry context_cache_entry; QLIST_ENTRY(VTDPASIDAddressSpace) next; VTDPASIDCacheEntry pasid_cache_entry; }; Ideally, a VTDPASIDAddressSpace instance is created when a PASID is bound to a DMA AddressSpace. The Intel VT-d spec requires guest software to issue a pasid cache invalidation when binding or unbinding a pasid with an address space under caching-mode. However, since VTDPASIDAddressSpace instances also act as the pasid cache in this implementation, they can also be created during vIOMMU PASID-tagged DMA translation. Creation in that path is not added in this patch since there are no PASID-capable emulated devices for now. The implementation in this patch manages VTDPASIDAddressSpace instances per PASID+BDF (lookup and insert use PASID and BDF) since the Intel VT-d spec allows a per-BDF PASID table. When a guest binds a PASID to an AddressSpace, QEMU captures the guest's pasid selective pasid cache invalidation, and allocates or removes a VTDPASIDAddressSpace instance depending on the invalidation reason: *) a present pasid entry moved to non-present *) a present pasid entry modified to another present entry *) a non-present pasid entry moved to present The vIOMMU emulator can figure out the reason by fetching the latest guest pasid entry.
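The per-PASID+BDF lookup described above, together with the pasid_cache_gen generation counter this patch adds to IntelIOMMUState, can be illustrated with a small self-contained sketch. This is not the patch's code: a fixed-size linear-probing table replaces the GHashTable, a simple multiplicative mix replaces the Jenkins hash, and the cached guest PASID entry is reduced to a domain ID. The names toy_state, toy_add_find(), toy_fill(), toy_cached() and toy_global_inv() are hypothetical. The point it models is that an entry is valid only while its cache_gen equals the global generation, so a global invalidation is a single increment, with a full reset once the counter passes its maximum (512 here, mirroring PASID_CACHE_GEN_MAX).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUCKETS        64   /* hypothetical fixed table size */
#define TOY_CACHE_GEN_MAX  512  /* mirrors PASID_CACHE_GEN_MAX in the patch */

struct toy_key {
    uint32_t pasid;
    uint16_t sid;              /* bus << 8 | devfn, as in the patch */
};

struct toy_pasid_as {
    bool used;
    struct toy_key key;
    uint32_t cache_gen;        /* valid only while equal to toy_state.cache_gen */
    uint16_t did;              /* stand-in for the cached guest PASID entry */
};

struct toy_state {
    uint32_t cache_gen;        /* kept in [1, TOY_CACHE_GEN_MAX] */
    struct toy_pasid_as slots[TOY_BUCKETS];
};

static uint32_t toy_hash(struct toy_key k)
{
    /* Simple multiplicative mix for illustration; the patch uses Jenkins hash. */
    return ((k.pasid * 2654435761u) ^ k.sid) % TOY_BUCKETS;
}

/* Drop every tracked entry and restart the generation at 1. */
static void toy_reset(struct toy_state *s)
{
    memset(s->slots, 0, sizeof(s->slots));
    s->cache_gen = 1;
}

/* Find or insert the object for (pasid, sid), like vtd_add_find_pasid_as(). */
static struct toy_pasid_as *toy_add_find(struct toy_state *s,
                                         uint32_t pasid, uint16_t sid)
{
    struct toy_key k = { pasid, sid };
    uint32_t i = toy_hash(k);

    for (uint32_t n = 0; n < TOY_BUCKETS; n++, i = (i + 1) % TOY_BUCKETS) {
        struct toy_pasid_as *e = &s->slots[i];

        if (!e->used) {                 /* not found: claim this slot */
            e->used = true;
            e->key = k;
            e->cache_gen = 0;           /* 0 never matches a live generation */
            return e;
        }
        if (e->key.pasid == pasid && e->key.sid == sid) {
            return e;
        }
    }
    return NULL;                        /* table full */
}

/* Cache a freshly read guest entry, like vtd_fill_in_pe_cache(). */
static void toy_fill(struct toy_state *s, struct toy_pasid_as *e, uint16_t did)
{
    assert(s->cache_gen >= 1 && s->cache_gen <= TOY_CACHE_GEN_MAX);
    e->did = did;
    e->cache_gen = s->cache_gen;
}

static bool toy_cached(const struct toy_state *s, const struct toy_pasid_as *e)
{
    return e->cache_gen == s->cache_gen;
}

/* Global invalidation: bump the generation, full reset on overflow. */
static void toy_global_inv(struct toy_state *s)
{
    if (++s->cache_gen > TOY_CACHE_GEN_MAX) {
        toy_reset(s);
    }
}
```

Because new entries start at generation 0 while the global generation stays in [1, MAX], a freshly inserted entry always reads as uncached until it is filled from the guest PASID table.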
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 386 +++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 14 ++ hw/i386/trace-events | 1 + include/hw/i386/intel_iommu.h | 33 +++- 4 files changed, 433 insertions(+), 1 deletion(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 462449c..b032a7c 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -40,6 +40,7 @@ #include "kvm_i386.h" #include "migration/vmstate.h" #include "trace.h" +#include "qemu/jhash.h" /* context entry operations */ #define VTD_CE_GET_RID2PASID(ce) \ @@ -65,6 +66,8 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); +static void vtd_pasid_cache_reset(IntelIOMMUState *s); + static void vtd_panic_require_caching_mode(void) { error_report("We need to set caching-mode=on for intel-iommu to enable " @@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s) vtd_iommu_lock(s); vtd_reset_iotlb_locked(s); vtd_reset_context_cache_locked(s); + vtd_pasid_cache_reset(s); vtd_iommu_unlock(s); } @@ -686,6 +690,11 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu, return true; } +static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe) +{ + return VTD_SM_PASID_ENTRY_DID((pe)->val[1]); +} + static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire) { return pdire->val & 1; @@ -2393,19 +2402,394 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) return true; } +static inline void vtd_init_pasid_key(uint32_t pasid, + uint16_t sid, + struct pasid_key *key) +{ + key->pasid = pasid; + key->sid = sid; +} + +static guint vtd_pasid_as_key_hash(gconstpointer v) +{ + struct pasid_key *key = (struct pasid_key *)v; + uint32_t a, b, c; + + /* Jenkins hash */ + a = b = c = JHASH_INITVAL + sizeof(*key); + a += 
key->sid; + b += extract32(key->pasid, 0, 16); + c += extract32(key->pasid, 16, 16); + + __jhash_mix(a, b, c); + __jhash_final(a, b, c); + + return c; +} + +static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2) +{ + const struct pasid_key *k1 = v1; + const struct pasid_key *k2 = v2; + + return (k1->pasid == k2->pasid) && (k1->sid == k2->sid); +} + +static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s, + uint8_t bus_num, + uint8_t devfn, + uint32_t pasid, + VTDPASIDEntry *pe) +{ + VTDContextEntry ce; + int ret; + dma_addr_t pasid_dir_base; + + if (!s->root_scalable) { + return -VTD_FR_PASID_TABLE_INV; + } + + ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce); + if (ret) { + return ret; + } + + pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce); + ret = vtd_get_pe_from_pasid_table(s, + pasid_dir_base, pasid, pe); + + return ret; +} + +static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2) +{ + return !memcmp(p1, p2, sizeof(*p1)); +} + +/** + * This function is used to clear pasid_cache_gen of cached pasid + * entry in vtd_pasid_as instances. Caller of this function should + * hold iommu_lock. 
+ */ +static gboolean vtd_flush_pasid(gpointer key, gpointer value, + gpointer user_data) +{ + VTDPASIDCacheInfo *pc_info = user_data; + VTDPASIDAddressSpace *vtd_pasid_as = value; + IntelIOMMUState *s = vtd_pasid_as->iommu_state; + VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + VTDBus *vtd_bus = vtd_pasid_as->vtd_bus; + VTDPASIDEntry pe; + uint16_t did; + uint32_t pasid; + uint16_t devfn; + int ret; + + did = vtd_pe_get_domain_id(&pc_entry->pasid_entry); + pasid = vtd_pasid_as->pasid; + devfn = vtd_pasid_as->devfn; + + if (!(pc_entry->pasid_cache_gen == s->pasid_cache_gen)) { + return false; + } + + switch (pc_info->flags & VTD_PASID_CACHE_INFO_MASK) { + case VTD_PASID_CACHE_PASIDSI: + if (pc_info->pasid != pasid) { + return false; + } + /* Fall through */ + case VTD_PASID_CACHE_DOMSI: + if (pc_info->domain_id != did) { + return false; + } + /* Fall through */ + case VTD_PASID_CACHE_GLOBAL: + break; + default: + error_report("invalid pc_info->flags"); + abort(); + } + + /* + * pasid cache invalidation may indicate a present pasid + * entry to present pasid entry modification. To cover such + * case, vIOMMU emulator needs to fetch latest guest pasid + * entry and check cached pasid entry, then update pasid + * cache and send pasid bind/unbind to host properly. + */ + ret = vtd_dev_get_pe_from_pasid(s, + pci_bus_num(vtd_bus->bus), devfn, pasid, &pe); + if (ret) { + /* + * No valid pasid entry in guest memory. e.g. pasid entry + * was modified to be either all-zero or non-present. Either + * case means existing pasid cache should be removed. 
+ */ + goto remove; + } + /* Compare cached pasid entry and latest pasid entry */ + if (!vtd_pasid_entry_compare(&pe, &pc_entry->pasid_entry)) { + /* pasid entry was updated, thus update the pasid cache */ + pc_entry->pasid_entry = pe; + pc_entry->pasid_cache_gen = s->pasid_cache_gen; + /* + * TODO: + * - send pasid bind to host for passthru devices + * - when pasid-based-iotlb(piotlb) infrastructure is ready, + * should invalidate QEMU piotlb together with this change. + */ + } + return false; +remove: + /* + * TODO: + * - send pasid unbind to host for passthru devices + * - when pasid-based-iotlb(piotlb) infrastructure is ready, + * should invalidate QEMU piotlb together with this change. + */ + return true; +} + static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id) { + VTDPASIDCacheInfo pc_info; + + trace_vtd_pasid_cache_dsi(domain_id); + + pc_info.flags = VTD_PASID_CACHE_DOMSI; + pc_info.domain_id = domain_id; + + /* + * Loop over all existing pasid caches and update them. + */ + vtd_iommu_lock(s); + g_hash_table_foreach_remove(s->vtd_pasid_as, + vtd_flush_pasid, &pc_info); + vtd_iommu_unlock(s); + + /* + * TODO: Domain selective PASID cache invalidation + * flushes all the pasid caches within a domain. To + * be safe, after invalidating the pasid caches, emulator + * needs to replay the pasid bindings by walking guest + * pasid dir and pasid table, e.g. when the guest sets up a + * new PASID entry and then sends a PASID DSI. + */ return 0; } +/** + * This function finds or adds a VTDPASIDAddressSpace for a device + * when it is bound to a pasid. Caller of this function should hold + * iommu_lock.
+ */ +static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s, + VTDBus *vtd_bus, + int devfn, + uint32_t pasid) +{ + struct pasid_key key; + struct pasid_key *new_key; + VTDPASIDAddressSpace *vtd_pasid_as; + uint16_t sid; + + sid = vtd_make_source_id(pci_bus_num(vtd_bus->bus), devfn); + vtd_init_pasid_key(pasid, sid, &key); + vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, &key); + + if (!vtd_pasid_as) { + new_key = g_malloc0(sizeof(*new_key)); + vtd_init_pasid_key(pasid, sid, new_key); + /* + * Initialize the vtd_pasid_as structure. + * + * This structure is used to track the guest pasid + * binding and also serves as the pasid-cache management entry. + * + * TODO: in the future, to support SVA-aware DMA + * emulation, the vtd_pasid_as should include an + * AddressSpace to support DMA emulation. + */ + vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace)); + vtd_pasid_as->iommu_state = s; + vtd_pasid_as->vtd_bus = vtd_bus; + vtd_pasid_as->devfn = devfn; + vtd_pasid_as->context_cache_entry.context_cache_gen = 0; + vtd_pasid_as->pasid = pasid; + vtd_pasid_as->pasid_cache_entry.pasid_cache_gen = 0; + g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as); + } + return vtd_pasid_as; +} + +/** + * This function caches the pasid entry in &vtd_pasid_as. + * Caller of this function should hold iommu_lock.
+ */ +static inline void vtd_fill_in_pe_cache( + VTDPASIDAddressSpace *vtd_pasid_as, VTDPASIDEntry *pe) +{ + IntelIOMMUState *s = vtd_pasid_as->iommu_state; + VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + + pc_entry->pasid_entry = *pe; + pc_entry->pasid_cache_gen = s->pasid_cache_gen; +} + +/** + * Caller of this function should hold iommu_lock + */ +static void vtd_new_pasid_bind_for_dev(IntelIOMMUState *s, VTDBus *vtd_bus, + uint16_t devfn, uint16_t domain_id, + uint32_t pasid) +{ + VTDPASIDAddressSpace *vtd_pasid_as; + VTDPASIDEntry pe; + VTDPASIDCacheEntry *pc_entry; + int bus_n = pci_bus_num(vtd_bus->bus); + + /* i) fetch vtd_pasid_as and check if it is valid */ + vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus, + devfn, pasid); + pc_entry = &vtd_pasid_as->pasid_cache_entry; + if (s->pasid_cache_gen == pc_entry->pasid_cache_gen) { + /* + * A pasid_cache_gen equal to s->pasid_cache_gen means + * vtd_pasid_as is valid after the above s->vtd_pasid_as + * updates, so the steps below are not needed. + */ + return; + } + + /* + * ii) vtd_pasid_as is not valid, so it is potentially a new + * pasid bind. Fetch the guest pasid entry. + */ + if (vtd_dev_get_pe_from_pasid(s, bus_n, devfn, pasid, &pe)) { + return; + } + + /* + * iii) pasid entry exists, update pasid cache + * + * The domain ID must be checked here since the guest pasid entry + * exists.
What needs to be done: + * - update the pc_entry in the vtd_pasid_as + * - set the proper pc_entry.pasid_cache_gen + * - pass down the latest guest pasid entry config to host + * (will be added in a later patch) + */ + if (domain_id == vtd_pe_get_domain_id(&pe)) { + vtd_fill_in_pe_cache(vtd_pasid_as, &pe); + } +} + static int vtd_pasid_cache_psi(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid) { + VTDPASIDCacheInfo pc_info; + VTDBus *vtd_bus; + int bus_n, devfn; + + /* PASID selective implies a DID selective */ + pc_info.flags = VTD_PASID_CACHE_PASIDSI; + pc_info.domain_id = domain_id; + pc_info.pasid = pasid; + + /* + * A pasid selective pasid cache invalidation (PSI) may + * correspond to any of the cases below: + * a) a present pasid entry moved to non-present + * b) a present pasid entry modified to another present entry + * c) a non-present pasid entry moved to present + * + * A PSI is handled in the following steps: + * 1) loop over all the existing vtd_pasid_as instances to update them + * according to the latest guest pasid entry in the pasid table. + * This makes sure affected existing vtd_pasid_as instances + * cache the latest pasid entries. Also, during the loop, the + * host should be notified if needed, e.g. pasid unbind or pasid + * update. This covers case a) and case b). + * + * 2) loop over all devices to cover case c) + * - For devices which have HostIOMMUContext instances, + * check whether the guest pasid entry exists. If yes, + * it is case c), so update the pasid cache and also notify + * the host. + * - For devices which have no HostIOMMUContext, it is not + * necessary to create a pasid cache at this phase since it + * could be created when the vIOMMU does DMA address translation. + * This is not yet implemented since there are no emulated + * pasid-capable devices today. If such devices appear in the + * future, the pasid cache shall be created there.
+ */ + + vtd_iommu_lock(s); + /* Step 1: loop over all the existing vtd_pasid_as instances */ + g_hash_table_foreach_remove(s->vtd_pasid_as, + vtd_flush_pasid, &pc_info); + + /* Step 2: loop over all devices to cover case c) */ + for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) { + vtd_bus = vtd_find_as_from_bus_num(s, bus_n); + if (!vtd_bus) { + continue; + } + for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) { + PCIDevice *dev; + + dev = vtd_bus->bus->devices[devfn]; + if (pci_device_host_iommu_context(dev)) { + vtd_new_pasid_bind_for_dev(s, vtd_bus, devfn, + domain_id, pasid); + } + } + } + vtd_iommu_unlock(s); return 0; } +/** + * Caller of this function should hold iommu_lock + */ +static void vtd_pasid_cache_reset(IntelIOMMUState *s) +{ + VTDPASIDCacheInfo pc_info; + + trace_vtd_pasid_cache_reset(); + + pc_info.flags = VTD_PASID_CACHE_GLOBAL; + + /* + * Resetting the pasid cache is a big hammer, so use + * g_hash_table_foreach_remove, which frees the + * vtd_pasid_as instances; newly created instances + * then start again from pasid_cache_gen 0. + */ + g_hash_table_foreach_remove(s->vtd_pasid_as, + vtd_flush_pasid, &pc_info); + s->pasid_cache_gen = 1; +} + static int vtd_pasid_cache_gsi(IntelIOMMUState *s) { + trace_vtd_pasid_cache_gsi(); + + vtd_iommu_lock(s); + s->pasid_cache_gen++; + if (s->pasid_cache_gen > PASID_CACHE_GEN_MAX) { + vtd_pasid_cache_reset(s); + } + vtd_iommu_unlock(s); + + /* + * TODO: Global PASID cache invalidation + * flushes all the pasid caches. To be safe, after + * invalidating the pasid caches, the emulator needs + * to replay the pasid bindings by walking the guest + * pasid dir and pasid table.
+ */ return 0; } @@ -4052,6 +4436,8 @@ static void vtd_realize(DeviceState *dev, Error **errp) g_free, g_free); s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal, g_free, g_free); + s->vtd_pasid_as = g_hash_table_new_full(vtd_pasid_as_key_hash, + vtd_pasid_as_key_equal, g_free, g_free); vtd_init(s); sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR); pci_setup_iommu(bus, vtd_host_dma_iommu, dev); diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 0ca5f0b..2684769 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -307,6 +307,7 @@ typedef enum VTDFaultReason { VTD_FR_IR_SID_ERR = 0x26, /* Invalid Source-ID */ VTD_FR_PASID_TABLE_INV = 0x58, /*Invalid PASID table entry */ + VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasid-entry is 0 */ /* This is not a normal fault reason. We use this to indicate some faults * that are not referenced by the VT-d specification. @@ -481,6 +482,19 @@ struct VTDRootEntry { }; typedef struct VTDRootEntry VTDRootEntry; +struct VTDPASIDCacheInfo { +#define VTD_PASID_CACHE_GLOBAL (1ULL << 0) +#define VTD_PASID_CACHE_DOMSI (1ULL << 1) +#define VTD_PASID_CACHE_PASIDSI (1ULL << 2) + uint32_t flags; + uint16_t domain_id; + uint32_t pasid; +}; +#define VTD_PASID_CACHE_INFO_MASK (VTD_PASID_CACHE_GLOBAL | \ + VTD_PASID_CACHE_DOMSI | \ + VTD_PASID_CACHE_PASIDSI) +typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; + /* Masks for struct VTDRootEntry */ #define VTD_ROOT_ENTRY_P 1ULL #define VTD_ROOT_ENTRY_CTP (~0xfffULL) diff --git a/hw/i386/trace-events b/hw/i386/trace-events index f7cd4e5..87364a3 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -22,6 +22,7 @@ vtd_inv_qi_head(uint16_t head) "read head %d" vtd_inv_qi_tail(uint16_t head) "write tail %d" vtd_inv_qi_fetch(void) "" vtd_context_cache_reset(void) "" +vtd_pasid_cache_reset(void) "" vtd_pasid_cache_gsi(void) "" vtd_pasid_cache_dsi(uint16_t domain) "Domian
slective PC invalidation domain 0x%"PRIx16 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index d8a79d3..ff41af0 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -68,6 +68,8 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry; typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress; typedef struct VTDPASIDDirEntry VTDPASIDDirEntry; typedef struct VTDPASIDEntry VTDPASIDEntry; +typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry; +typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace; /* Context-Entry */ struct VTDContextEntry { @@ -100,6 +102,31 @@ struct VTDPASIDEntry { uint64_t val[8]; }; +struct pasid_key { + uint32_t pasid; + uint16_t sid; +}; + +struct VTDPASIDCacheEntry { + /* + * The cache entry is obsolete if + * pasid_cache_gen!=IntelIOMMUState.pasid_cache_gen + */ + uint32_t pasid_cache_gen; + struct VTDPASIDEntry pasid_entry; +}; + +struct VTDPASIDAddressSpace { + VTDBus *vtd_bus; + uint8_t devfn; + AddressSpace as; + uint32_t pasid; + IntelIOMMUState *iommu_state; + VTDContextCacheEntry context_cache_entry; + QLIST_ENTRY(VTDPASIDAddressSpace) next; + VTDPASIDCacheEntry pasid_cache_entry; +}; + struct VTDAddressSpace { PCIBus *bus; uint8_t devfn; @@ -258,6 +285,9 @@ struct IntelIOMMUState { GHashTable *vtd_as_by_busptr; /* VTDBus objects indexed by PCIBus* reference */ VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */ + GHashTable *vtd_pasid_as; /* VTDPASIDAddressSpace instances */ +#define PASID_CACHE_GEN_MAX 512 + uint32_t pasid_cache_gen; /* Should be in [1,MAX] */ /* list of registered notifiers */ QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers; @@ -277,7 +307,8 @@ struct IntelIOMMUState { /* * Protects IOMMU states in general. Currently it protects the - * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace. 
+ * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace, + * and pasid cache in VTDPASIDAddressSpace. */ QemuMutex iommu_lock; }; From patchwork Sat Feb 22 08:07:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397873 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D6BB138D for ; Sat, 22 Feb 2020 08:02:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6EB28208C4 for ; Sat, 22 Feb 2020 08:02:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727297AbgBVICW (ORCPT ); Sat, 22 Feb 2020 03:02:22 -0500 Received: from mga04.intel.com ([192.55.52.120]:65090 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727093AbgBVICA (ORCPT ); Sat, 22 Feb 2020 03:02:00 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547689" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 13/22] vfio: add bind stage-1 page table support Date: Sat, 22 Feb 2020 00:07:14 -0800 Message-Id: <1582358843-51931-14-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 
In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds bind_stage1_pgtbl() and unbind_stage1_pgtbl() definitions to HostIOMMUOps, along with the corresponding implementation in VFIO. This exposes a way for the vIOMMU to set up dual-stage DMA translation in hardware for passthrough devices. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu, Yi L --- hw/iommu/host_iommu_context.c | 20 ++++++++++++++ hw/vfio/common.c | 49 +++++++++++++++++++++++++++++++++++ include/hw/iommu/host_iommu_context.h | 23 ++++++++++++++++ 3 files changed, 92 insertions(+) diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c index 689a087..5f7eb92 100644 --- a/hw/iommu/host_iommu_context.c +++ b/hw/iommu/host_iommu_context.c @@ -41,6 +41,26 @@ int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid) return -ENOENT; } +int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *host_icx, + DualIOMMUStage1BindData *data) +{ + if (host_icx && (host_icx->flags & HOST_IOMMU_NESTING) && + host_icx->ops && host_icx->ops->bind_stage1_pgtbl) { + return host_icx->ops->bind_stage1_pgtbl(host_icx, data); + } + return -ENOENT; +} + +int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *host_icx, + DualIOMMUStage1BindData *data) +{ + if (host_icx && (host_icx->flags & HOST_IOMMU_NESTING) && + host_icx->ops && host_icx->ops->unbind_stage1_pgtbl) { + return host_icx->ops->unbind_stage1_pgtbl(host_icx, data); + } + return -ENOENT; +} + void host_iommu_ctx_init(HostIOMMUContext *host_icx, uint64_t flags, HostIOMMUOps *ops, HostIOMMUInfo *uinfo) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 8f30a52..b560fdb 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1223,9 +1223,57 @@ static int 
vfio_host_icx_pasid_free(HostIOMMUContext
*host_icx, return 0; } +static int vfio_host_icx_bind_stage1_pgtbl(HostIOMMUContext *host_icx, + DualIOMMUStage1BindData *bind_data) +{ + VFIOContainer *container = container_of(host_icx, VFIOContainer, host_icx); + struct vfio_iommu_type1_bind *bind; + unsigned long argsz; + int ret = 0; + + argsz = sizeof(*bind) + sizeof(bind_data->bind_data); + bind = g_malloc0(argsz); + bind->argsz = argsz; + bind->flags = VFIO_IOMMU_BIND_GUEST_PGTBL; + memcpy(&bind->data, &bind_data->bind_data, sizeof(bind_data->bind_data)); + + if (ioctl(container->fd, VFIO_IOMMU_BIND, bind)) { + ret = -errno; + error_report("%s: pasid (%u) bind failed: %d", + __func__, bind_data->pasid, ret); + } + g_free(bind); + return ret; +} + +static int vfio_host_icx_unbind_stage1_pgtbl(HostIOMMUContext *host_icx, + DualIOMMUStage1BindData *bind_data) +{ + VFIOContainer *container = container_of(host_icx, VFIOContainer, host_icx); + struct vfio_iommu_type1_bind *bind; + unsigned long argsz; + int ret = 0; + + argsz = sizeof(*bind) + sizeof(bind_data->bind_data); + bind = g_malloc0(argsz); + bind->argsz = argsz; + bind->flags = VFIO_IOMMU_UNBIND_GUEST_PGTBL; + memcpy(&bind->data, &bind_data->bind_data, sizeof(bind_data->bind_data)); + + if (ioctl(container->fd, VFIO_IOMMU_BIND, bind)) { + ret = -errno; + error_report("%s: pasid (%u) unbind failed: %d", + __func__, bind_data->pasid, ret); + } + g_free(bind); + return ret; +} + static struct HostIOMMUOps vfio_host_icx_ops = { .pasid_alloc = vfio_host_icx_pasid_alloc, .pasid_free = vfio_host_icx_pasid_free, + .bind_stage1_pgtbl = vfio_host_icx_bind_stage1_pgtbl, + .unbind_stage1_pgtbl = vfio_host_icx_unbind_stage1_pgtbl, }; /** @@ -1354,6 +1402,7 @@ static int vfio_init_container(VFIOContainer *container, int group_fd, uinfo.stage1_format = nesting.stage1_format; flags |= (nesting.nesting_capabilities & VFIO_IOMMU_PASID_REQS) ? 
HOST_IOMMU_PASID_REQUEST : 0; + flags |= HOST_IOMMU_NESTING; host_iommu_ctx_init(&container->host_icx, flags, &vfio_host_icx_ops, &uinfo); } diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h index 6797f6d..660fab8 100644 --- a/include/hw/iommu/host_iommu_context.h +++ b/include/hw/iommu/host_iommu_context.h @@ -31,6 +31,7 @@ typedef struct HostIOMMUContext HostIOMMUContext; typedef struct HostIOMMUOps HostIOMMUOps; typedef struct HostIOMMUInfo HostIOMMUInfo; +typedef struct DualIOMMUStage1BindData DualIOMMUStage1BindData; struct HostIOMMUOps { /* Allocate pasid from HostIOMMUContext (a.k.a. host software) */ @@ -41,6 +42,16 @@ struct HostIOMMUOps { /* Reclaim pasid from HostIOMMUContext (a.k.a. host software) */ int (*pasid_free)(HostIOMMUContext *host_icx, uint32_t pasid); + /* + * Bind stage-1 page table to a hostIOMMU w/ dual stage + * DMA translation capability. + * @bind_data specifies the bind configurations. + */ + int (*bind_stage1_pgtbl)(HostIOMMUContext *dsi_obj, + DualIOMMUStage1BindData *bind_data); + /* Undo a previous bind. @bind_data specifies the unbind info. 
*/ + int (*unbind_stage1_pgtbl)(HostIOMMUContext *dsi_obj, + DualIOMMUStage1BindData *bind_data); }; struct HostIOMMUInfo { @@ -52,14 +63,26 @@ struct HostIOMMUInfo { */ struct HostIOMMUContext { #define HOST_IOMMU_PASID_REQUEST (1ULL << 0) +#define HOST_IOMMU_NESTING (1ULL << 1) uint64_t flags; HostIOMMUOps *ops; HostIOMMUInfo uinfo; }; +struct DualIOMMUStage1BindData { + uint32_t pasid; + union { + struct iommu_gpasid_bind_data gpasid_bind; + } bind_data; +}; + int host_iommu_ctx_pasid_alloc(HostIOMMUContext *host_icx, uint32_t min, uint32_t max, uint32_t *pasid); int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid); +int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *host_icx, + DualIOMMUStage1BindData *data); +int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *host_icx, + DualIOMMUStage1BindData *data); void host_iommu_ctx_init(HostIOMMUContext *host_icx, uint64_t flags, HostIOMMUOps *ops, From patchwork Sat Feb 22 08:07:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397863 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 010E092A for ; Sat, 22 Feb 2020 08:02:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D54D1214DB for ; Sat, 22 Feb 2020 08:02:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727250AbgBVICP (ORCPT ); Sat, 22 Feb 2020 03:02:15 -0500 Received: from mga05.intel.com ([192.55.52.43]:63022 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727124AbgBVICC (ORCPT ); Sat, 22 Feb 2020 03:02:02 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with 
ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547692" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 14/22] intel_iommu: bind/unbind guest page table to host Date: Sat, 22 Feb 2020 00:07:15 -0800 Message-Id: <1582358843-51931-15-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch captures the guest PASID table entry modifications and propagates the changes to host to setup dual stage DMA translation. The guest page table is configured as 1st level page table (GVA->GPA) whose translation result would further go through host VT-d 2nd level page table(GPA->HPA) under nested translation mode. This is a key part of vSVA support, and also a key to support IOVA over 1st level page table for Intel VT-d in virtualization environment. 
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 93 ++++++++++++++++++++++++++++++++++++++++-- hw/i386/intel_iommu_internal.h | 26 ++++++++++++ 2 files changed, 115 insertions(+), 4 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index b032a7c..8bd27b1 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -41,6 +41,7 @@ #include "migration/vmstate.h" #include "trace.h" #include "qemu/jhash.h" +#include /* context entry operations */ #define VTD_CE_GET_RID2PASID(ce) \ @@ -695,6 +696,16 @@ static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe) return VTD_SM_PASID_ENTRY_DID((pe)->val[1]); } +static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe) +{ + return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9; +} + +static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe) +{ + return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR; +} + static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire) { return pdire->val & 1; @@ -1854,6 +1865,73 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) vtd_iommu_replay_all(s); } +static int vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus, + int devfn, int pasid, VTDPASIDEntry *pe, + VTDPASIDOp op) +{ + HostIOMMUContext *host_icx; + DualIOMMUStage1BindData *bind_data; + struct iommu_gpasid_bind_data *g_bind_data; + PCIDevice *dev; + int ret = -1; + + dev = vtd_bus->bus->devices[devfn]; + host_icx = pci_device_host_iommu_context(dev); + if (!host_icx) { + return ret; + } + + if (host_icx->uinfo.stage1_format + != IOMMU_PASID_FORMAT_INTEL_VTD) + { + error_report_once("IOMMU Stage 1 format is not compatible!\n"); + } + + bind_data = g_malloc0(sizeof(*bind_data)); + bind_data->pasid = pasid; + g_bind_data = &bind_data->bind_data.gpasid_bind; + + g_bind_data->flags = 0; + g_bind_data->vtd.flags = 0; + switch (op) { + case VTD_PASID_BIND: + case 
VTD_PASID_UPDATE: + g_bind_data->version = IOMMU_UAPI_VERSION; + g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD; + g_bind_data->gpgd = vtd_pe_get_flpt_base(pe); + g_bind_data->addr_width = vtd_pe_get_fl_aw(pe); + g_bind_data->hpasid = pasid; + g_bind_data->gpasid = pasid; + g_bind_data->flags |= IOMMU_SVA_GPASID_VAL; + g_bind_data->vtd.flags = + (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? IOMMU_SVA_VTD_GPASID_SRE : 0) + | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? IOMMU_SVA_VTD_GPASID_EAFE : 0) + | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? IOMMU_SVA_VTD_GPASID_PCD : 0) + | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? IOMMU_SVA_VTD_GPASID_PWT : 0) + | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? IOMMU_SVA_VTD_GPASID_EMTE : 0) + | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? IOMMU_SVA_VTD_GPASID_CD : 0); + g_bind_data->vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]); + g_bind_data->vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]); + ret = host_iommu_ctx_bind_stage1_pgtbl(host_icx, bind_data); + break; + case VTD_PASID_UNBIND: + g_bind_data->version = IOMMU_UAPI_VERSION; + g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD; + g_bind_data->gpgd = 0; + g_bind_data->addr_width = 0; + g_bind_data->hpasid = pasid; + g_bind_data->gpasid = pasid; + g_bind_data->flags |= IOMMU_SVA_GPASID_VAL; + ret = host_iommu_ctx_unbind_stage1_pgtbl(host_icx, bind_data); + break; + default: + error_report_once("Unknown VTDPASIDOp!"); + break; + } + g_free(bind_data); + return ret; +} + /* Do a context-cache device-selective invalidation. * @func_mask: FM field after shifting */ @@ -2533,18 +2611,20 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, /* pasid entry was updated, thus update the pasid cache */ pc_entry->pasid_entry = pe; pc_entry->pasid_cache_gen = s->pasid_cache_gen; + vtd_bind_guest_pasid(s, vtd_bus, devfn, + pasid, &pe, VTD_PASID_UPDATE); /* * TODO: - * - send pasid bind to host for passthru devices * - when pasid-base-iotlb(piotlb) infrastructure is ready, * should invalidate QEMU piotlb togehter with this change. 
*/ } return false; remove: + vtd_bind_guest_pasid(s, vtd_bus, devfn, + pasid, NULL, VTD_PASID_UNBIND); /* * TODO: - * - send pasid unbind to host for passthru devices * - when pasid-base-iotlb(piotlb) infrastructure is ready, * should invalidate QEMU piotlb togehter with this change. */ @@ -2624,8 +2704,9 @@ static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s, } /** - * This function cached the pasid entry in &vtd_pasid_as. - * Caller of this function should hold iommu_lock. + * This function cached the pasid entry in &vtd_pasid_as. Also + * notifies host about the new pasid binding. Caller of this + * function should hold iommu_lock. */ static inline void vtd_fill_in_pe_cache( VTDPASIDAddressSpace *vtd_pasid_as, VTDPASIDEntry *pe) @@ -2635,6 +2716,10 @@ static inline void vtd_fill_in_pe_cache( pc_entry->pasid_entry = *pe; pc_entry->pasid_cache_gen = s->pasid_cache_gen; + vtd_bind_guest_pasid(s, vtd_pasid_as->vtd_bus, + vtd_pasid_as->devfn, + vtd_pasid_as->pasid, + pe, VTD_PASID_BIND); } /** diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 2684769..9ee5856 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -482,6 +482,20 @@ struct VTDRootEntry { }; typedef struct VTDRootEntry VTDRootEntry; +enum VTD_DUAL_STAGE_UAPI { + UAPI_BIND_GPASID, + UAPI_NUM +}; +typedef enum VTD_DUAL_STAGE_UAPI VTD_DUAL_STAGE_UAPI; + +enum VTDPASIDOp { + VTD_PASID_BIND, + VTD_PASID_UNBIND, + VTD_PASID_UPDATE, + VTD_OP_NUM +}; +typedef enum VTDPASIDOp VTDPASIDOp; + struct VTDPASIDCacheInfo { #define VTD_PASID_CACHE_GLOBAL (1ULL << 0) #define VTD_PASID_CACHE_DOMSI (1ULL << 1) @@ -552,6 +566,18 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; #define VTD_SM_PASID_ENTRY_AW 7ULL /* Adjusted guest-address-width */ #define VTD_SM_PASID_ENTRY_DID(val) ((val) & VTD_DOMAIN_ID_MASK) +/* Adjusted guest-address-width */ +#define VTD_SM_PASID_ENTRY_FLPM 3ULL +#define VTD_SM_PASID_ENTRY_FLPTPTR (~0xfffULL) +#define 
VTD_SM_PASID_ENTRY_SRE_BIT(val) (!!((val) & 1ULL)) +#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL)) +#define VTD_SM_PASID_ENTRY_PCD_BIT(val) (!!(((val) >> 31) & 1ULL)) +#define VTD_SM_PASID_ENTRY_PWT_BIT(val) (!!(((val) >> 30) & 1ULL)) +#define VTD_SM_PASID_ENTRY_EMTE_BIT(val) (!!(((val) >> 26) & 1ULL)) +#define VTD_SM_PASID_ENTRY_CD_BIT(val) (!!(((val) >> 25) & 1ULL)) +#define VTD_SM_PASID_ENTRY_PAT(val) (((val) >> 32) & 0xFFFFFFFFULL) +#define VTD_SM_PASID_ENTRY_EMT(val) (((val) >> 27) & 0x7ULL) + /* Second Level Page Translation Pointer*/ #define VTD_SM_PASID_ENTRY_SLPTPTR (~0xfffULL) From patchwork Sat Feb 22 08:07:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397871 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CEE7092A for ; Sat, 22 Feb 2020 08:02:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AFD4F214DB for ; Sat, 22 Feb 2020 08:02:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727275AbgBVICT (ORCPT ); Sat, 22 Feb 2020 03:02:19 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727086AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547695" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, 
alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 15/22] intel_iommu: replay guest pasid bindings to host Date: Sat, 22 Feb 2020 00:07:16 -0800 Message-Id: <1582358843-51931-16-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds replay of guest pasid bindings for domain selective pasid cache invalidation (dsi) and global pasid cache invalidation, by walking the guest pasid table. Reason: the guest OS may flush the pasid cache with a larger granularity than needed, e.g. the guest does an svm_bind() but then flushes the pasid cache with a global or domain selective invalidation instead of a pasid selective (psi) invalidation. This works on native hardware: per spec, a global or domain selective pasid cache invalidation covers everything a pasid selective invalidation does. The only concern there is performance degradation, since dsi and global invalidations flush more than psi. To align with native behavior, the vIOMMU emulator needs to replay the bindings for these two invalidation granularities so that the host reflects the latest pasid bindings in the guest pasid table.
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 183 ++++++++++++++++++++++++++++++++++++++--- hw/i386/intel_iommu_internal.h | 1 + 2 files changed, 173 insertions(+), 11 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 8bd27b1..e7c9677 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -68,6 +68,8 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); static void vtd_pasid_cache_reset(IntelIOMMUState *s); +static int vtd_update_pe_cache_for_dev(IntelIOMMUState *s, + VTDBus *vtd_bus, int devfn, int pasid, VTDPASIDEntry *pe); static void vtd_panic_require_caching_mode(void) { @@ -2631,6 +2633,127 @@ remove: return true; } +/** + * Constant information used during pasid table walk + * @vtd_bus, @devfn: device info + * @flags: indicates whether this is a domain selective walk + * @did: domain ID of the pasid table walk + */ +typedef struct { + VTDBus *vtd_bus; + uint16_t devfn; +#define VTD_PASID_TABLE_DID_SEL_WALK (1ULL << 0) + uint32_t flags; + uint16_t did; +} vtd_pasid_table_walk_info; + +static bool vtd_sm_pasid_table_walk_one(IntelIOMMUState *s, + dma_addr_t pt_base, + int start, + int end, + vtd_pasid_table_walk_info *info) +{ + VTDPASIDEntry pe; + int pasid = start; + int pasid_next; + + while (pasid < end) { + pasid_next = pasid + 1; + + if (!vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe) + && vtd_pe_present(&pe)) { + if (vtd_update_pe_cache_for_dev(s, info->vtd_bus, + info->devfn, pasid, &pe)) { + error_report_once("%s, bus: %d, devfn: %d, pasid: %d", + __func__, + pci_bus_num(info->vtd_bus->bus), + info->devfn, pasid); + return false; + } + } + pasid = pasid_next; + } + return true; +} + +/* + * Currently, the VT-d scalable mode pasid table is a two level table. 
+ * This function loops over a range of PASIDs in a given
pasid + * table to identify the pasid config in the guest. + */ +static void vtd_sm_pasid_table_walk(IntelIOMMUState *s, + dma_addr_t pdt_base, + int start, + int end, + vtd_pasid_table_walk_info *info) +{ + VTDPASIDDirEntry pdire; + int pasid = start; + int pasid_next; + dma_addr_t pt_base; + + while (pasid < end) { + pasid_next = pasid + VTD_PASID_TBL_ENTRY_NUM; + if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire) + && vtd_pdire_present(&pdire)) { + pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK; + if (!vtd_sm_pasid_table_walk_one(s, + pt_base, pasid, pasid_next, info)) { + break; + } + } + pasid = pasid_next; + } +} + +/** + * This function replays the guest pasid bindings to the host by + * walking the guest PASID table. This ensures the host has the + * latest guest pasid bindings. + */ +static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s, + uint16_t *did, + bool is_dsi) +{ + VTDContextEntry ce; + VTDBus *vtd_bus; + int bus_n, devfn; + vtd_pasid_table_walk_info info = { 0 }; + + if (is_dsi) { + info.flags = VTD_PASID_TABLE_DID_SEL_WALK; + info.did = *did; + } + + /* + * In this replay, only devices that have an iommu_context + * need to be handled. Devices without an iommu_context do + * not need their bindings replayed, since their cache can + * be re-created on the next DMA address translation.
+ */ + for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) { + vtd_bus = vtd_find_as_from_bus_num(s, bus_n); + if (!vtd_bus) { + continue; + } + for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) { + PCIDevice *dev; + + dev = vtd_bus->bus->devices[devfn]; + if (pci_device_host_iommu_context(dev) && + !vtd_dev_to_context_entry(s, bus_n, devfn, &ce)) { + info.vtd_bus = vtd_bus; + info.devfn = devfn; + vtd_sm_pasid_table_walk(s, + VTD_CE_GET_PASID_DIR_TABLE(&ce), + 0, + VTD_MAX_HPASID, + &info); + } + } + } +} + static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id) { VTDPASIDCacheInfo pc_info; @@ -2649,13 +2772,14 @@ static int vtd_pasid_cache_dsi(IntelIOMMUState *s, uint16_t domain_id) vtd_iommu_unlock(s); /* - * TODO: Domain selective PASID cache invalidation - * flushes all the pasid caches within a domain. To - * be safe, after invalidating the pasid caches, emulator - * needs to replay the pasid bindings by walking guest - * pasid dir and pasid table. e.g. When the guest setup a - * new PASID entry then send a PASID DSI. + * Domain selective PASID cache invalidation flushes + * all the pasid caches within a domain. To be safe, + * after invalidating the pasid caches, emulator needs + * to replay the pasid bindings by walking guest pasid + * dir and pasid table, e.g. when the guest sets up a new + * PASID entry and then sends a PASID DSI. + */ + vtd_replay_guest_pasid_bindings(s, &domain_id, true); return 0; } @@ -2723,6 +2847,42 @@ static inline void vtd_fill_in_pe_cache( } /** + * This function updates the pasid entry cached in &vtd_pasid_as.
+ */ +static int vtd_update_pe_cache_for_dev(IntelIOMMUState *s, + VTDBus *vtd_bus, + int devfn, int pasid, + VTDPASIDEntry *pe) +{ + VTDPASIDAddressSpace *vtd_pasid_as; + VTDPASIDCacheEntry *pc_entry; + int ret; + + vtd_iommu_lock(s); + vtd_pasid_as = vtd_add_find_pasid_as(s, vtd_bus, + devfn, pasid); + if (!vtd_pasid_as) { + error_report_once("%s, fatal error happened!\n", __func__); + ret = -1; + goto out; + } + + pc_entry = &vtd_pasid_as->pasid_cache_entry; + if (pc_entry->pasid_cache_gen == s->pasid_cache_gen && + vtd_pasid_entry_compare(pe, &pc_entry->pasid_entry)) { + /* No need to go further as cached pasid entry is latest */ + ret = 0; + goto out; + } + + vtd_fill_in_pe_cache(vtd_pasid_as, pe); + ret = 0; +out: + vtd_iommu_unlock(s); + return ret; +} + +/** * Caller of this function should hold iommu_lock */ static void vtd_new_pasid_bind_for_dev(IntelIOMMUState *s, VTDBus *vtd_bus, @@ -2869,12 +3029,13 @@ static int vtd_pasid_cache_gsi(IntelIOMMUState *s) vtd_iommu_unlock(s); /* - * TODO: Global PASID cache invalidation may be - * flushes all the pasid caches. To be safe, after - * invalidating the pasid caches, emulator needs - * to replay the pasid bindings by walking guest - * pasid dir and pasid table. + * Global PASID cache invalidation flushes all + * the pasid caches. To be safe, after invalidating + * the pasid caches, emulator needs to replay the + * pasid bindings by walking guest pasid dir and + * pasid table. 
*/ + vtd_replay_guest_pasid_bindings(s, NULL, false); return 0; } diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 9ee5856..46cec5c 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -554,6 +554,7 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; #define VTD_PASID_TABLE_BITS_MASK (0x3fULL) #define VTD_PASID_TABLE_INDEX(pasid) ((pasid) & VTD_PASID_TABLE_BITS_MASK) #define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */ +#define VTD_PASID_TBL_ENTRY_NUM (1ULL << 6) /* PASID Granular Translation Type Mask */ #define VTD_PASID_ENTRY_P 1ULL From patchwork Sat Feb 22 08:07:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397853 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 881691A2B for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 71F53208C3 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727219AbgBVICC (ORCPT ); Sat, 22 Feb 2020 03:02:02 -0500 Received: from mga04.intel.com ([192.55.52.120]:65090 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727152AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547699" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: 
qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 16/22] intel_iommu: replay pasid binds after context cache invalidation Date: Sat, 22 Feb 2020 00:07:17 -0800 Message-Id: <1582358843-51931-17-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch replays guest pasid bindings after a context cache invalidation. This is done as a safety measure: strictly speaking, software should issue a pasid cache invalidation with the proper granularity after issuing a context cache invalidation.
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 67 ++++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 6 +++- hw/i386/trace-events | 1 + 3 files changed, 73 insertions(+), 1 deletion(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index e7c9677..b85aad3 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -70,6 +70,10 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); static void vtd_pasid_cache_reset(IntelIOMMUState *s); static int vtd_update_pe_cache_for_dev(IntelIOMMUState *s, VTDBus *vtd_bus, int devfn, int pasid, VTDPASIDEntry *pe); +static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s, + uint16_t *did, bool is_dsi); +static void vtd_pasid_cache_devsi(IntelIOMMUState *s, + VTDBus *vtd_bus, uint16_t devfn); static void vtd_panic_require_caching_mode(void) { @@ -1865,6 +1869,8 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) * VT-d emulation codes. */ vtd_iommu_replay_all(s); + + vtd_replay_guest_pasid_bindings(s, NULL, false); } static int vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus, @@ -1991,6 +1997,22 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s, * happened. */ vtd_sync_shadow_page_table(vtd_as); + /* + * Per spec, a context flush should also be followed by PASID + * cache and iotlb flushes. Regarding a device selective + * context cache invalidation: + * if (emulated_device) + * modify the pasid cache gen and pasid-based iotlb gen + * value (will be added in following patches) + * else if (assigned_device) + * check if the device has been bound to any pasid + * invoke pasid_unbind for each bound pasid + * Here, we have vtd_pasid_cache_devsi() to invalidate pasid + * caches, while for piotlb in QEMU, we don't have it yet, so + * no handling.
For assigned devices, the host iommu driver would + * flush piotlb when a pasid unbind is passed down to it. + */ + vtd_pasid_cache_devsi(s, vtd_bus, devfn_it); } } } @@ -2586,6 +2608,12 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, /* Fall through */ case VTD_PASID_CACHE_GLOBAL: break; + case VTD_PASID_CACHE_DEVSI: + if (pc_info->vtd_bus != vtd_bus || + pc_info->devfn != devfn) { + return false; + } + break; default: error_report("invalid pc_info->flags"); abort(); @@ -2995,6 +3023,45 @@ static int vtd_pasid_cache_psi(IntelIOMMUState *s, return 0; } +static void vtd_pasid_cache_devsi(IntelIOMMUState *s, + VTDBus *vtd_bus, uint16_t devfn) +{ + VTDPASIDCacheInfo pc_info; + VTDContextEntry ce; + PCIDevice *dev; + vtd_pasid_table_walk_info info; + + trace_vtd_pasid_cache_devsi(devfn); + + pc_info.flags = VTD_PASID_CACHE_DEVSI; + pc_info.vtd_bus = vtd_bus; + pc_info.devfn = devfn; + + vtd_iommu_lock(s); + g_hash_table_foreach_remove(s->vtd_pasid_as, vtd_flush_pasid, &pc_info); + vtd_iommu_unlock(s); + + /* + * To be safe, after invalidating the pasid caches, + * emulator needs to replay the pasid bindings by + * walking guest pasid dir and pasid table.
+ */ + dev = vtd_bus->bus->devices[devfn]; + if (pci_device_host_iommu_context(dev) && + !vtd_dev_to_context_entry(s, pci_bus_num(vtd_bus->bus), + devfn, &ce)) { + info.flags = 0x0; + info.did = 0; + info.vtd_bus = vtd_bus; + info.devfn = devfn; + vtd_sm_pasid_table_walk(s, + VTD_CE_GET_PASID_DIR_TABLE(&ce), + 0, + VTD_MAX_HPASID, + &info); + } +} + /** * Caller of this function should hold iommu_lock */ diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 46cec5c..d427895 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -500,13 +500,17 @@ struct VTDPASIDCacheInfo { #define VTD_PASID_CACHE_GLOBAL (1ULL << 0) #define VTD_PASID_CACHE_DOMSI (1ULL << 1) #define VTD_PASID_CACHE_PASIDSI (1ULL << 2) +#define VTD_PASID_CACHE_DEVSI (1ULL << 3) uint32_t flags; uint16_t domain_id; uint32_t pasid; + VTDBus *vtd_bus; + uint16_t devfn; }; #define VTD_PASID_CACHE_INFO_MASK (VTD_PASID_CACHE_GLOBAL | \ VTD_PASID_CACHE_DOMSI | \ - VTD_PASID_CACHE_PASIDSI) + VTD_PASID_CACHE_PASIDSI | \ + VTD_PASID_CACHE_DEVSI) typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; /* Masks for struct VTDRootEntry */ diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 87364a3..34bab09 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -26,6 +26,7 @@ vtd_pasid_cache_reset(void) "" vtd_pasid_cache_gsi(void) "" vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 +vtd_pasid_cache_devsi(uint16_t devfn) "Dev selective PC invalidation dev: 0x%"PRIx16 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present" vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" 
domain 0x%"PRIx16 From patchwork Sat Feb 22 08:07:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397867 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB6CE138D for ; Sat, 22 Feb 2020 08:02:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9525D214DB for ; Sat, 22 Feb 2020 08:02:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727189AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 Received: from mga04.intel.com ([192.55.52.120]:65092 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727156AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547702" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 17/22] intel_iommu: do not pass down pasid bind for PASID #0 Date: Sat, 22 Feb 2020 00:07:18 -0800 Message-Id: <1582358843-51931-18-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: 
<1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The RID_PASID field was introduced in the VT-d 3.0 spec; it is used for DMA requests without PASID (i.e. requests carrying IOVAs) in scalable-mode VT-d. The VT-d 3.1 spec defines it as follows: "Implementations not supporting RID_PASID capability (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address translation for requests without PASID." This patch adds a check against the PASIDs that are going to be bound to a device. For PASID #0, it is not necessary to pass down the pasid bind request, since PASID #0 is used as RID_PASID for DMA requests without PASID. A further reason is that the current Intel vIOMMU supports gIOVA by shadowing the guest second-level page table. However, if in the future the guest IOMMU driver uses the first-level page table to store IOVA mappings, guest IOVA support will also be done via nested translation. When gIOVA is over FLPT, the vIOMMU should pass down the pasid bind request for PASID #0 to the host, and the host needs to bind the guest IOVA page table to a proper PASID, e.g. the PASID value in the RID_PASID field for a PF/VF if ECAP_REG.RPS is clear, or the default PASID for an ADI (Assignable Device Interface in the Scalable IOV solution). IOVA over FLPT support on Intel VT-d: https://lkml.org/lkml/2019/9/23/297 Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index b85aad3..cacc38b 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -1883,6 +1883,16 @@ static int vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus, PCIDevice *dev; int ret = -1; + if (pasid < VTD_MIN_HPASID) { + /* + * If pasid < VTD_MIN_HPASID, this pasid is not allocated + * from host. 
No need to pass down the changes on it to host.
+ * TODO: when IOVA over FLPT is ready, this switch should be + * refined. + */ + return 0; + } + dev = vtd_bus->bus->devices[devfn]; host_icx = pci_device_host_iommu_context(dev); if (!host_icx) { From patchwork Sat Feb 22 08:07:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397849 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 685921921 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 51E0E208C3 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727201AbgBVICC (ORCPT ); Sat, 22 Feb 2020 03:02:02 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727164AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547704" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v3.1 18/22] vfio/common: add support for flush iommu stage-1 cache Date: Sat, 22 Feb 2020 00:07:19 -0800 Message-Id: 
<1582358843-51931-19-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds a flush_stage1_cache() callback to HostIOMMUOps and a corresponding implementation in VFIO. This gives the vIOMMU a way to flush the stage-1 cache on the host side, which is needed because the guest owns the stage-1 translation structures in dual-stage DMA translation. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu Yi L --- hw/iommu/host_iommu_context.c | 10 ++++++++++ hw/vfio/common.c | 24 ++++++++++++++++++++++++ include/hw/iommu/host_iommu_context.h | 14 ++++++++++++++ 3 files changed, 48 insertions(+) diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c index 5f7eb92..90be684 100644 --- a/hw/iommu/host_iommu_context.c +++ b/hw/iommu/host_iommu_context.c @@ -61,6 +61,16 @@ int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *host_icx, return -ENOENT; } +int host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *host_icx, + DualIOMMUStage1Cache *cache) +{ + if (host_icx && (host_icx->flags & HOST_IOMMU_NESTING) && + host_icx->ops && host_icx->ops->flush_stage1_cache) { + return host_icx->ops->flush_stage1_cache(host_icx, cache); + } + return -ENOENT; +} + void host_iommu_ctx_init(HostIOMMUContext *host_icx, uint64_t flags, HostIOMMUOps *ops, HostIOMMUInfo *uinfo) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index b560fdb..305796b 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1269,11 +1269,35 @@ static int vfio_host_icx_unbind_stage1_pgtbl(HostIOMMUContext *host_icx, return ret; } +static int vfio_host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *host_icx, + DualIOMMUStage1Cache *cache) +{ + VFIOContainer *container =
container_of(host_icx, VFIOContainer, host_icx); + struct vfio_iommu_type1_cache_invalidate *cache_inv; + unsigned long argsz; + int ret = 0; + + argsz = sizeof(*cache_inv) + sizeof(cache->cache_info); + cache_inv = g_malloc0(argsz); + cache_inv->argsz = argsz; + cache_inv->flags = 0; + memcpy(&cache_inv->cache_info, &cache->cache_info, + sizeof(cache->cache_info)); + + if (ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, cache_inv)) { + ret = -errno; + error_report("%s: iommu cache flush failed: %d", __func__, ret); + } + g_free(cache_inv); + return ret; +} + static struct HostIOMMUOps vfio_host_icx_ops = { .pasid_alloc = vfio_host_icx_pasid_alloc, .pasid_free = vfio_host_icx_pasid_free, .bind_stage1_pgtbl = vfio_host_icx_bind_stage1_pgtbl, .unbind_stage1_pgtbl = vfio_host_icx_unbind_stage1_pgtbl, + .flush_stage1_cache = vfio_host_iommu_ctx_flush_stage1_cache, }; /** diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h index 660fab8..a55d49a 100644 --- a/include/hw/iommu/host_iommu_context.h +++ b/include/hw/iommu/host_iommu_context.h @@ -32,6 +32,7 @@ typedef struct HostIOMMUContext HostIOMMUContext; typedef struct HostIOMMUOps HostIOMMUOps; typedef struct HostIOMMUInfo HostIOMMUInfo; typedef struct DualIOMMUStage1BindData DualIOMMUStage1BindData; +typedef struct DualIOMMUStage1Cache DualIOMMUStage1Cache; struct HostIOMMUOps { /* Allocate pasid from HostIOMMUContext (a.k.a. host software) */ @@ -52,6 +53,12 @@ struct HostIOMMUOps { /* Undo a previous bind. @bind_data specifies the unbind info.
*/ int (*unbind_stage1_pgtbl)(HostIOMMUContext *dsi_obj, DualIOMMUStage1BindData *bind_data); + /* + * Propagate stage-1 cache flush to host IOMMU, cache + * info specified in @cache + */ + int (*flush_stage1_cache)(HostIOMMUContext *host_icx, + DualIOMMUStage1Cache *cache); }; struct HostIOMMUInfo { @@ -76,6 +83,11 @@ struct DualIOMMUStage1BindData { } bind_data; }; +struct DualIOMMUStage1Cache { + uint32_t pasid; + struct iommu_cache_invalidate_info cache_info; +}; + int host_iommu_ctx_pasid_alloc(HostIOMMUContext *host_icx, uint32_t min, uint32_t max, uint32_t *pasid); int host_iommu_ctx_pasid_free(HostIOMMUContext *host_icx, uint32_t pasid); @@ -83,6 +95,8 @@ int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *host_icx, DualIOMMUStage1BindData *data); int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *host_icx, DualIOMMUStage1BindData *data); +int host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *host_icx, + DualIOMMUStage1Cache *cache); void host_iommu_ctx_init(HostIOMMUContext *host_icx, uint64_t flags, HostIOMMUOps *ops, From patchwork Sat Feb 22 08:07:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397865 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 87F2F138D for ; Sat, 22 Feb 2020 08:02:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7237C214DB for ; Sat, 22 Feb 2020 08:02:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727193AbgBVICC (ORCPT ); Sat, 22 Feb 2020 03:02:02 -0500 Received: from mga04.intel.com ([192.55.52.120]:65096 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727158AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547708" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 19/22] intel_iommu: process PASID-based iotlb invalidation Date: Sat, 22 Feb 2020 00:07:20 -0800 Message-Id: <1582358843-51931-20-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds basic PASID-based IOTLB (piotlb) invalidation support. The piotlb caches translations obtained while walking the Intel VT-d first-level page table. This patch only adds the basic descriptor processing; detailed handling will be added in the next patch. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 57 ++++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 13 ++++++++++ 2 files changed, 70 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index cacc38b..b712eae 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3157,6 +3157,59 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return (ret == 0) ?
true : false; } +static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, + uint16_t domain_id, + uint32_t pasid) +{ +} + +static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, + uint32_t pasid, hwaddr addr, uint8_t am, bool ih) +{ +} + +static bool vtd_process_piotlb_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + uint16_t domain_id; + uint32_t pasid; + uint8_t am; + hwaddr addr; + + if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) || + (inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) { + error_report_once("non-zero-field-in-piotlb_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]); + pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]); + switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) { + case VTD_INV_DESC_PIOTLB_ALL_IN_PASID: + vtd_piotlb_pasid_invalidate(s, domain_id, pasid); + break; + + case VTD_INV_DESC_PIOTLB_PSI_IN_PASID: + am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]); + addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]); + if (am > VTD_MAMV) { + error_report_once("Invalid am, > max am value, hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + vtd_piotlb_page_invalidate(s, domain_id, pasid, + addr, am, VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1])); + break; + + default: + error_report_once("Invalid granularity in P-IOTLB desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + return true; +} + static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -3271,6 +3324,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) break; case VTD_INV_DESC_PIOTLB: + trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]); + if (!vtd_process_piotlb_desc(s, &inv_desc)) { + return false; + } break; case VTD_INV_DESC_WAIT: diff --git a/hw/i386/intel_iommu_internal.h 
b/hw/i386/intel_iommu_internal.h index d427895..17c6e84 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -457,6 +457,19 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) #define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4) +#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID (2ULL << 4) +#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID (3ULL << 4) + +#define VTD_INV_DESC_PIOTLB_RSVD_VAL0 0xfff000000000ffc0ULL +#define VTD_INV_DESC_PIOTLB_RSVD_VAL1 0xf80ULL + +#define VTD_INV_DESC_PIOTLB_PASID(val) (((val) >> 32) & 0xfffffULL) +#define VTD_INV_DESC_PIOTLB_DID(val) (((val) >> 16) & \ + VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PIOTLB_ADDR(val) ((val) & ~0xfffULL) +#define VTD_INV_DESC_PIOTLB_AM(val) ((val) & 0x3fULL) +#define VTD_INV_DESC_PIOTLB_IH(val) (((val) >> 6) & 0x1) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; From patchwork Sat Feb 22 08:07:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397859 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8125138D for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D0B67208C3 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727247AbgBVICE (ORCPT ); Sat, 22 Feb 2020 03:02:04 -0500 Received: from mga04.intel.com ([192.55.52.120]:65092 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727166AbgBVICD (ORCPT ); Sat, 22 Feb 2020 03:02:03 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with 
ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547711" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 20/22] intel_iommu: propagate PASID-based iotlb invalidation to host Date: Sat, 22 Feb 2020 00:07:21 -0800 Message-Id: <1582358843-51931-21-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch propagates PASID-based IOTLB invalidation to the host. Intel VT-d 3.0 supports nested translation at PASID granularity. Guest SVA support can be implemented by configuring nested translation on a specific PASID; this is also known as dual-stage DMA translation. Under such a configuration, the guest owns the GVA->GPA translation, which is configured as the first-level page table on the host side for a specific PASID, and the host owns the GPA->HPA translation. Because the guest owns the first-level translation table, piotlb invalidations must be propagated to the host, since the host IOMMU caches first-level page table mappings during DMA address translation. This patch traps the guest PASID-based IOTLB flush and propagates it to the host.
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 131 +++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 7 +++ 2 files changed, 138 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index b712eae..e6326ef 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3157,15 +3157,146 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return (ret == 0) ? true : false; } +static void vtd_invalidate_piotlb(IntelIOMMUState *s, + VTDBus *vtd_bus, + int devfn, + DualIOMMUStage1Cache *stage1_cache) +{ + PCIDevice *dev; + HostIOMMUContext *host_icx; + dev = vtd_bus->bus->devices[devfn]; + + host_icx = pci_device_host_iommu_context(dev); + if (!host_icx) { + return; + } + if (host_iommu_ctx_flush_stage1_cache(host_icx, stage1_cache)) { + error_report("Cache flush failed"); + } +} + +static inline bool vtd_pasid_cache_valid( + VTDPASIDAddressSpace *vtd_pasid_as) +{ + return vtd_pasid_as->iommu_state && + (vtd_pasid_as->iommu_state->pasid_cache_gen + == vtd_pasid_as->pasid_cache_entry.pasid_cache_gen); +} + +/** + * This function is a loop function for the s->vtd_pasid_as + * list with VTDPIOTLBInvInfo as execution filter. It propagates + * the piotlb invalidation to host. Caller of this function + * should hold iommu_lock. + */ +static void vtd_flush_pasid_iotlb(gpointer key, gpointer value, + gpointer user_data) +{ + VTDPIOTLBInvInfo *piotlb_info = user_data; + VTDPASIDAddressSpace *vtd_pasid_as = value; + uint16_t did; + + /* + * Needs to check whether the pasid entry cache stored in + * vtd_pasid_as is valid or not. "invalid" means the pasid + * cache has been flushed, thus host should have done piotlb + * invalidation together with a pasid cache invalidation, so + * no need to pass down piotlb invalidation to host for better + * performance. 
Only when pasid entry cache is "valid", should + * a piotlb invalidation be propagated to host since it means + * guest just modified a mapping in its page table. + */ + if (!vtd_pasid_cache_valid(vtd_pasid_as)) { + return; + } + + did = vtd_pe_get_domain_id( + &(vtd_pasid_as->pasid_cache_entry.pasid_entry)); + + if ((piotlb_info->domain_id == did) && + (piotlb_info->pasid == vtd_pasid_as->pasid)) { + vtd_invalidate_piotlb(vtd_pasid_as->iommu_state, + vtd_pasid_as->vtd_bus, + vtd_pasid_as->devfn, + piotlb_info->stage1_cache); + } + + /* + * TODO: needs to add QEMU piotlb flush when QEMU piotlb + * infrastructure is ready. For now, it is enough for passthru + * devices. + */ +} + static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid) { + VTDPIOTLBInvInfo piotlb_info; + DualIOMMUStage1Cache *stage1_cache; + struct iommu_cache_invalidate_info *cache_info; + + stage1_cache = g_malloc0(sizeof(*stage1_cache)); + stage1_cache->pasid = pasid; + + cache_info = &stage1_cache->cache_info; + cache_info->version = IOMMU_UAPI_VERSION; + cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB; + cache_info->granularity = IOMMU_INV_GRANU_PASID; + cache_info->pasid_info.pasid = pasid; + cache_info->pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID; + + piotlb_info.domain_id = domain_id; + piotlb_info.pasid = pasid; + piotlb_info.stage1_cache = stage1_cache; + + vtd_iommu_lock(s); + /* + * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as + * to find out the affected devices since piotlb invalidation + * should check pasid cache per architecture point of view. 
+ */ + g_hash_table_foreach(s->vtd_pasid_as, + vtd_flush_pasid_iotlb, &piotlb_info); + vtd_iommu_unlock(s); + g_free(stage1_cache); } static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid, hwaddr addr, uint8_t am, bool ih) { + VTDPIOTLBInvInfo piotlb_info; + DualIOMMUStage1Cache *stage1_cache; + struct iommu_cache_invalidate_info *cache_info; + + stage1_cache = g_malloc0(sizeof(*stage1_cache)); + stage1_cache->pasid = pasid; + + cache_info = &stage1_cache->cache_info; + cache_info->version = IOMMU_UAPI_VERSION; + cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB; + cache_info->granularity = IOMMU_INV_GRANU_ADDR; + cache_info->addr_info.flags = IOMMU_INV_ADDR_FLAGS_PASID; + cache_info->addr_info.flags |= ih ? IOMMU_INV_ADDR_FLAGS_LEAF : 0; + cache_info->addr_info.pasid = pasid; + cache_info->addr_info.addr = addr; + cache_info->addr_info.granule_size = 1 << (12 + am); + cache_info->addr_info.nb_granules = 1; + + piotlb_info.domain_id = domain_id; + piotlb_info.pasid = pasid; + piotlb_info.stage1_cache = stage1_cache; + + vtd_iommu_lock(s); + /* + * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as + * to find out the affected devices since piotlb invalidation + * should check pasid cache per architecture point of view. 
+ */ + g_hash_table_foreach(s->vtd_pasid_as, + vtd_flush_pasid_iotlb, &piotlb_info); + vtd_iommu_unlock(s); + g_free(stage1_cache); } static bool vtd_process_piotlb_desc(IntelIOMMUState *s, diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 17c6e84..bd241cb 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -526,6 +526,13 @@ struct VTDPASIDCacheInfo { VTD_PASID_CACHE_DEVSI) typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; +struct VTDPIOTLBInvInfo { + uint16_t domain_id; + uint32_t pasid; + DualIOMMUStage1Cache *stage1_cache; +}; +typedef struct VTDPIOTLBInvInfo VTDPIOTLBInvInfo; + /* Masks for struct VTDRootEntry */ #define VTD_ROOT_ENTRY_P 1ULL #define VTD_ROOT_ENTRY_CTP (~0xfffULL) From patchwork Sat Feb 22 08:07:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11397857 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C765617F0 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B12F2208C3 for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727225AbgBVICD (ORCPT ); Sat, 22 Feb 2020 03:02:03 -0500 Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727174AbgBVICB (ORCPT ); Sat, 22 Feb 2020 03:02:01 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 Feb 2020 00:01:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547714" Received: from 
jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v3.1 21/22] intel_iommu: process PASID-based Device-TLB invalidation Date: Sat, 22 Feb 2020 00:07:22 -0800 Message-Id: <1582358843-51931-22-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds empty (no-op) handling for PASID-based Device-TLB invalidation. For now this is sufficient: the invalidation does not need to be propagated to the host for passthrough devices, and no emulated device has a device TLB.
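A sketch of how such a no-op case can sit in a descriptor dispatcher. The type codes 0x6 to 0x8 come from intel_iommu_internal.h. The inv_desc_type() decoding assumes the VT-d 3.x layout in which the 7-bit descriptor type is split across bits 3:0 and 11:9 of the low quadword. dispatch() and DescResult are hypothetical names. The point is only that a PASID-based Device-TLB descriptor is recognised and consumed without action, rather than rejected as invalid.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes as defined in hw/i386/intel_iommu_internal.h. */
enum {
    INV_DESC_PIOTLB     = 0x6,   /* PASID-IOTLB invalidate */
    INV_DESC_PC         = 0x7,   /* PASID-cache invalidate */
    INV_DESC_DEV_PIOTLB = 0x8,   /* PASID-based Device-TLB invalidate */
};

/* Assumed layout: type bits 6:4 live in bits 11:9, bits 3:0 in bits 3:0. */
static unsigned inv_desc_type(uint64_t lo)
{
    return (unsigned)(((lo >> 5) & 0x70ULL) | (lo & 0xfULL));
}

typedef enum { DESC_HANDLED, DESC_IGNORED, DESC_INVALID } DescResult;

/* Hypothetical dispatcher: the Device-TLB case succeeds without doing
 * anything, mirroring the empty vtd_process_device_piotlb_desc(). */
static DescResult dispatch(uint64_t lo)
{
    switch (inv_desc_type(lo)) {
    case INV_DESC_PIOTLB:
    case INV_DESC_PC:
        return DESC_HANDLED;   /* real code decodes and propagates these */
    case INV_DESC_DEV_PIOTLB:
        return DESC_IGNORED;   /* no emulated device has a device TLB yet */
    default:
        return DESC_INVALID;
    }
}
```

Returning a distinct "ignored" result keeps the no-op visible to a caller; the patch itself simply returns true so the invalidation queue keeps advancing.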
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 18 ++++++++++++++++++ hw/i386/intel_iommu_internal.h | 1 + 2 files changed, 19 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index e6326ef..f5faa75 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3354,6 +3354,17 @@ static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, return true; } +static bool vtd_process_device_piotlb_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + /* + * no need to handle it for passthru device, for emulated + * devices with device tlb, it may be required, but for now, + * return is enough + */ + return true; +} + static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -3475,6 +3486,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; + case VTD_INV_DESC_DEV_PIOTLB: + trace_vtd_inv_desc("device-piotlb", inv_desc.hi, inv_desc.lo); + if (!vtd_process_device_piotlb_desc(s, &inv_desc)) { + return false; + } + break; + case VTD_INV_DESC_DEVICE: trace_vtd_inv_desc("device", inv_desc.hi, inv_desc.lo); if (!vtd_process_device_iotlb_desc(s, &inv_desc)) { diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index bd241cb..dfb54fc 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -386,6 +386,7 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_WAIT 0x5 /* Invalidation Wait Descriptor */ #define VTD_INV_DESC_PIOTLB 0x6 /* PASID-IOTLB Invalidate Desc */ #define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */ +#define VTD_INV_DESC_DEV_PIOTLB 0x8 /* PASID-based-DIOTLB inv_desc*/ #define VTD_INV_DESC_NONE 0 /* Not an Invalidate Descriptor */ /* Masks for Invalidation Wait Descriptor*/ From patchwork Sat Feb 22 08:07:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Yi Liu
X-Patchwork-Id: 11397855
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
    by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A804992A
    for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
    by mail.kernel.org (Postfix) with ESMTP id 91815208C3
    for ; Sat, 22 Feb 2020 08:02:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1727238AbgBVICD (ORCPT ); Sat, 22 Feb 2020 03:02:03 -0500
Received: from mga05.intel.com ([192.55.52.43]:63018 "EHLO mga05.intel.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1727181AbgBVICC (ORCPT ); Sat, 22 Feb 2020 03:02:02 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga006.jf.intel.com ([10.7.209.51])
    by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
    22 Feb 2020 00:01:59 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.70,471,1574150400"; d="scan'208";a="240547717"
Received: from jacob-builder.jf.intel.com ([10.7.199.155])
    by orsmga006.jf.intel.com with ESMTP; 22 Feb 2020 00:01:57 -0800
From: Liu Yi L
To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com
Cc: pbonzini@redhat.com, mst@redhat.com, eric.auger@redhat.com,
    david@gibson.dropbear.id.au, kevin.tian@intel.com, yi.l.liu@intel.com,
    jun.j.tian@intel.com, yi.y.sun@intel.com, kvm@vger.kernel.org,
    Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost
Subject: [RFC v3.1 22/22] intel_iommu: modify x-scalable-mode to be string option
Date: Sat, 22 Feb 2020 00:07:23 -0800
Message-Id: <1582358843-51931-23-git-send-email-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com>
References: <1582358843-51931-1-git-send-email-yi.l.liu@intel.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

Intel VT-d 3.0 introduces scalable mode, which brings a number of
capabilities related to scalable-mode translation, so many combinations
are possible. This vIOMMU implementation simplifies this for users by
providing typical combinations, selected with the "x-scalable-mode"
option. Usage:

    -device intel-iommu,x-scalable-mode=["legacy"|"modern"|"off"]

 - "legacy": supports the second-level (SL) page table
 - "modern": supports the first-level (FL) page table, PASID and
   virtual command
 - "off": no scalable-mode support
 - if not configured: no scalable-mode support
 - if configured with any other value: an error is reported

Note: this patch is intended to be merged only once the whole vSVA
patch series has been merged.

Cc: Kevin Tian
Cc: Jacob Pan
Cc: Peter Xu
Cc: Yi Sun
Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: Eduardo Habkost
Signed-off-by: Liu Yi L
Signed-off-by: Yi Sun
---
 hw/i386/intel_iommu.c          | 29 +++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h |  4 ++++
 include/hw/i386/intel_iommu.h  |  2 ++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f5faa75..51b00ee 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4205,7 +4205,7 @@ static Property vtd_properties[] = {
     DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
                       VTD_HOST_ADDRESS_WIDTH),
     DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
-    DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+    DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode_str),
     DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -4768,8 +4768,12 @@ static void vtd_init(IntelIOMMUState *s)
     }
 
     /* TODO: read cap/ecap from host to decide which cap to be exposed. */
-    if (s->scalable_mode) {
+    if (s->scalable_mode && !s->scalable_modern) {
         s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
+    } else if (s->scalable_mode && s->scalable_modern) {
+        s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID
+                   | VTD_ECAP_FLTS | VTD_ECAP_PSS | VTD_ECAP_VCS;
+        s->vccap |= VTD_VCCAP_PAS;
     }
 
     vtd_reset_caches(s);
@@ -4895,6 +4899,27 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
         return false;
     }
 
+    if (s->scalable_mode_str && strcmp(s->scalable_mode_str, "off") &&
+        strcmp(s->scalable_mode_str, "modern") &&
+        strcmp(s->scalable_mode_str, "legacy")) {
+        error_setg(errp, "Invalid x-scalable-mode config, "
+                         "please use \"modern\", \"legacy\" or \"off\"");
+        return false;
+    }
+
+    if (s->scalable_mode_str &&
+        !strcmp(s->scalable_mode_str, "legacy")) {
+        s->scalable_mode = true;
+        s->scalable_modern = false;
+    } else if (s->scalable_mode_str &&
+               !strcmp(s->scalable_mode_str, "modern")) {
+        s->scalable_mode = true;
+        s->scalable_modern = true;
+    } else {
+        s->scalable_mode = false;
+        s->scalable_modern = false;
+    }
+
     return true;
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index dfb54fc..f7de046 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -196,8 +196,12 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_MHMV               (15ULL << 20)
 #define VTD_ECAP_SRS                (1ULL << 31)
+#define VTD_ECAP_PSS                (19ULL << 35)
+#define VTD_ECAP_PASID              (1ULL << 40)
 #define VTD_ECAP_SMTS               (1ULL << 43)
+#define VTD_ECAP_VCS                (1ULL << 44)
 #define VTD_ECAP_SLTS               (1ULL << 46)
+#define VTD_ECAP_FLTS               (1ULL << 47)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index ff41af0..94ead20 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -259,6 +259,8 @@ struct IntelIOMMUState {
 
     bool caching_mode;              /* RO - is cap CM enabled? */
     bool scalable_mode;             /* RO - is Scalable Mode supported? */
+    char *scalable_mode_str;        /* RO - admin's Scalable Mode config */
+    bool scalable_modern;           /* RO - is modern SM supported? */
 
     dma_addr_t root;                /* Current root table pointer */
     bool root_scalable;             /* Type of root table (scalable or not) */