[v4,11/11] xen/arm: smmuv3: Add support for SMMUv3 driver

Message ID	9ba174516e01e2a53390420fa957ea563219141f.1610115608.git.rahul.singh@arm.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=/Lw+=GL=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ABAA22388B Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> From: Rahul Singh <rahul.singh@arm.com> To: xen-devel@lists.xenproject.org Cc: bertrand.marquis@arm.com, rahul.singh@arm.com, Andrew Cooper <andrew.cooper3@citrix.com>, George Dunlap <george.dunlap@citrix.com>, Ian Jackson <iwj@xenproject.org>, Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>, Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>, Paul Durrant <paul@xen.org>, Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> Subject: [PATCH v4 11/11] xen/arm: smmuv3: Add support for SMMUv3 driver Date: Fri, 8 Jan 2021 14:46:31 +0000 Message-Id: <9ba174516e01e2a53390420fa957ea563219141f.1610115608.git.rahul.singh@arm.com> In-Reply-To: <cover.1610115608.git.rahul.singh@arm.com> References: <cover.1610115608.git.rahul.singh@arm.com> In-Reply-To: <cover.1610115608.git.rahul.singh@arm.com> References: <cover.1610115608.git.rahul.singh@arm.com>
Series	xen/arm: Add support for SMMUv3 driver \| expand [v4,00/11] xen/arm: Add support for SMMUv3 driver [v4,01/11] xen/arm: smmuv3: Import the SMMUv3 driver from Linux [v4,02/11] xen/arm: Revert atomic operation related command-queue insertion patch [v4,03/11] xen/arm: smmuv3: Revert patch related to XArray [v4,04/11] xen/arm: smmuv3: Remove support for Stage-1 translation on SMMUv3. [v4,05/11] xen/arm: smmuv3: Remove Linux specific code that is not usable in XEN [v4,06/11] xen/device-tree: Add dt_property_match_string helper [v4,07/11] xen/arm: bitops: Implement a ffsll function [v4,08/11] xen/compiler: import 'fallthrough' keyword from linux [v4,09/11] xen/arm: smmuv3: Use fallthrough pseudo-keyword [v4,10/11] xen/arm: smmuv3: Replace linux functions with xen functions. [v4,11/11] xen/arm: smmuv3: Add support for SMMUv3 driver

Rahul Singh Jan. 8, 2021, 2:46 p.m. UTC

Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.

Driver is currently supported as Tech Preview.

Major differences with regard to Linux driver are as follows:
2. Only Stage-2 translation is supported as compared to the Linux driver
   that supports both Stage-1 and Stage-2 translations.
3. Use P2M  page table instead of creating one as SMMUv3 has the
   capability to share the page tables with the CPU.
4. Tasklets are used in place of threaded IRQ's in Linux for event queue
   and priority queue IRQ handling.
5. Latest version of the Linux SMMUv3 code implements the commands queue
   access functions based on atomic operations implemented in Linux.
   Atomic functions used by the commands queue access functions are not
   implemented in XEN therefore we decided to port the earlier version
   of the code. Atomic operations are introduced to fix the bottleneck
   of the SMMU command queue insertion operation. A new algorithm for
   inserting commands into the queue is introduced, which is lock-free
   on the fast-path.
   Consequence of reverting the patch is that the command queue
   insertion will be slow for large systems as spinlock will be used to
   serializes accesses from all CPUs to the single queue supported by
   the hardware. Once the proper atomic operations will be available in
   XEN the driver can be updated.
6. Spin lock is used in place of mutex when attaching a device to the
   SMMU, as there is no blocking locks implementation available in XEN.
   This might introduce latency in XEN. Need to investigate before
   driver is out for tech preview.
7. PCI ATS functionality is not supported, as there is no support
   available in XEN to test the functionality. Code is not tested and
   compiled. Code is guarded by the flag CONFIG_PCI_ATS.
8. MSI interrupts are not supported as there is no support available in
   XEN to request MSI interrupts. Code is not tested and compiled. Code
   is guarded by the flag CONFIG_MSI.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
---
Changes in V3:
- added return statement for readx_poll_timeout function.
- remove iommu_get_dma_cookie and iommu_put_dma_cookie.
- remove struct arm_smmu_xen_device as not required.
- move dt_property_match_string to device_tree.c file.
- replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
- use ARM_SMMU_REG_SZ as size when map memory to XEN.
- remove bypass keyword to make sure when device-tree probe is failed we
  are reporting error and not continuing to configure SMMU in bypass
  mode.
- fixed minor comments.
Changes in V4:
- Fixed typo for CONFIG_MSI
- Added back the mutex code
- Rebase the patch on top of newly added WARN_ON().
- Remove the direct read of register VTCR_EL2.
- Fixed minor comments.
---
 MAINTAINERS                           |   6 +
 SUPPORT.md                            |   1 +
 xen/drivers/passthrough/Kconfig       |  11 +
 xen/drivers/passthrough/arm/Makefile  |   1 +
 xen/drivers/passthrough/arm/smmu-v3.c | 808 ++++++++++++++++++++++----
 5 files changed, 715 insertions(+), 112 deletions(-)

Stefano Stabellini Jan. 9, 2021, 1:45 a.m. UTC | #1

On Fri, 8 Jan 2021, Rahul Singh wrote:
> Add support for ARM architected SMMUv3 implementation. It is based on
> the Linux SMMUv3 driver.
> 
> Driver is currently supported as Tech Preview.
> 
> Major differences with regard to Linux driver are as follows:
> 2. Only Stage-2 translation is supported as compared to the Linux driver
>    that supports both Stage-1 and Stage-2 translations.
> 3. Use P2M  page table instead of creating one as SMMUv3 has the
>    capability to share the page tables with the CPU.
> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>    and priority queue IRQ handling.
> 5. Latest version of the Linux SMMUv3 code implements the commands queue
>    access functions based on atomic operations implemented in Linux.
>    Atomic functions used by the commands queue access functions are not
>    implemented in XEN therefore we decided to port the earlier version
>    of the code. Atomic operations are introduced to fix the bottleneck
>    of the SMMU command queue insertion operation. A new algorithm for
>    inserting commands into the queue is introduced, which is lock-free
>    on the fast-path.
>    Consequence of reverting the patch is that the command queue
>    insertion will be slow for large systems as spinlock will be used to
>    serializes accesses from all CPUs to the single queue supported by
>    the hardware. Once the proper atomic operations will be available in
>    XEN the driver can be updated.
> 6. Spin lock is used in place of mutex when attaching a device to the
>    SMMU, as there is no blocking locks implementation available in XEN.
>    This might introduce latency in XEN. Need to investigate before
>    driver is out for tech preview.
> 7. PCI ATS functionality is not supported, as there is no support
>    available in XEN to test the functionality. Code is not tested and
>    compiled. Code is guarded by the flag CONFIG_PCI_ATS.
> 8. MSI interrupts are not supported as there is no support available in
>    XEN to request MSI interrupts. Code is not tested and compiled. Code
>    is guarded by the flag CONFIG_MSI.
>
> Signed-off-by: Rahul Singh <rahul.singh@arm.com>

Going through the code you addressed all my previous comments and I
don't have any to add, so:

Acked-by: Stefano Stabellini <sstabellini@kernel.org>

Oleksandr Tyshchenko Jan. 11, 2021, 4:09 p.m. UTC | #2

On 08.01.21 16:46, Rahul Singh wrote:

Hi Rahul

> Add support for ARM architected SMMUv3 implementation. It is based on
> the Linux SMMUv3 driver.
>
> Driver is currently supported as Tech Preview.
>
> Major differences with regard to Linux driver are as follows:
> 2. Only Stage-2 translation is supported as compared to the Linux driver
>     that supports both Stage-1 and Stage-2 translations.
> 3. Use P2M  page table instead of creating one as SMMUv3 has the
>     capability to share the page tables with the CPU.
> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>     and priority queue IRQ handling.
> 5. Latest version of the Linux SMMUv3 code implements the commands queue
>     access functions based on atomic operations implemented in Linux.
>     Atomic functions used by the commands queue access functions are not
>     implemented in XEN therefore we decided to port the earlier version
>     of the code. Atomic operations are introduced to fix the bottleneck
>     of the SMMU command queue insertion operation. A new algorithm for
>     inserting commands into the queue is introduced, which is lock-free
>     on the fast-path.
>     Consequence of reverting the patch is that the command queue
>     insertion will be slow for large systems as spinlock will be used to
>     serializes accesses from all CPUs to the single queue supported by
>     the hardware. Once the proper atomic operations will be available in
>     XEN the driver can be updated.
> 6. Spin lock is used in place of mutex when attaching a device to the
>     SMMU, as there is no blocking locks implementation available in XEN.
>     This might introduce latency in XEN. Need to investigate before
>     driver is out for tech preview.
> 7. PCI ATS functionality is not supported, as there is no support
>     available in XEN to test the functionality. Code is not tested and
>     compiled. Code is guarded by the flag CONFIG_PCI_ATS.
> 8. MSI interrupts are not supported as there is no support available in
>     XEN to request MSI interrupts. Code is not tested and compiled. Code
>     is guarded by the flag CONFIG_MSI.
>
> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
> ---
> Changes in V3:
> - added return statement for readx_poll_timeout function.
> - remove iommu_get_dma_cookie and iommu_put_dma_cookie.
> - remove struct arm_smmu_xen_device as not required.
> - move dt_property_match_string to device_tree.c file.
> - replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
> - use ARM_SMMU_REG_SZ as size when map memory to XEN.
> - remove bypass keyword to make sure when device-tree probe is failed we
>    are reporting error and not continuing to configure SMMU in bypass
>    mode.
> - fixed minor comments.
> Changes in V4:
> - Fixed typo for CONFIG_MSI
> - Added back the mutex code
> - Rebase the patch on top of newly added WARN_ON().
> - Remove the direct read of register VTCR_EL2.
> - Fixed minor comments.
> ---
>   MAINTAINERS                           |   6 +
>   SUPPORT.md                            |   1 +
>   xen/drivers/passthrough/Kconfig       |  11 +
>   xen/drivers/passthrough/arm/Makefile  |   1 +
>   xen/drivers/passthrough/arm/smmu-v3.c | 808 ++++++++++++++++++++++----
>   5 files changed, 715 insertions(+), 112 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6dbd99aff4..d832e8fd65 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -249,6 +249,12 @@ F:	xen/include/asm-arm/
>   F:	xen/include/public/arch-arm/
>   F:	xen/include/public/arch-arm.h
>   
> +ARM SMMUv3
> +M:	Bertrand Marquis <bertrand.marquis@arm.com>
> +M:	Rahul Singh <rahul.singh@arm.com>
> +S:	Supported
> +F:	xen/drivers/passthrough/arm/smmu-v3.c
> +
>   Change Log
>   M:	Paul Durrant <paul@xen.org>
>   R:	Community Manager <community.manager@xenproject.org>
> diff --git a/SUPPORT.md b/SUPPORT.md
> index ab02aca5f4..5ee3c8651a 100644
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -67,6 +67,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
>       Status, Intel VT-d: Supported
>       Status, ARM SMMUv1: Supported, not security supported
>       Status, ARM SMMUv2: Supported, not security supported
> +    Status, ARM SMMUv3: Tech Preview
>       Status, Renesas IPMMU-VMSA: Supported, not security supported
>   
>   ### ARM/GICv3 ITS
> diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
> index 0036007ec4..341ba92b30 100644
> --- a/xen/drivers/passthrough/Kconfig
> +++ b/xen/drivers/passthrough/Kconfig
> @@ -13,6 +13,17 @@ config ARM_SMMU
>   	  Say Y here if your SoC includes an IOMMU device implementing the
>   	  ARM SMMU architecture.
>   
> +config ARM_SMMU_V3
> +	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
> +	depends on ARM_64
> +	---help---
> +	 Support for implementations of the ARM System MMU architecture
> +	 version 3. Driver is in experimental stage and should not be used in
> +	 production.
> +
> +	 Say Y here if your system includes an IOMMU device implementing
> +	 the ARM SMMUv3 architecture.
> +
>   config IPMMU_VMSA
>   	bool "Renesas IPMMU-VMSA found in R-Car Gen3 SoCs"
>   	depends on ARM_64
> diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
> index fcd918ea3e..c5fb3b58a5 100644
> --- a/xen/drivers/passthrough/arm/Makefile
> +++ b/xen/drivers/passthrough/arm/Makefile
> @@ -1,3 +1,4 @@
>   obj-y += iommu.o iommu_helpers.o iommu_fwspec.o
>   obj-$(CONFIG_ARM_SMMU) += smmu.o
>   obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
> +obj-$(CONFIG_ARM_SMMU_V3) += smmu-v3.o
> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
> index 2dfadc6a65..e53f9f4c21 100644
> --- a/xen/drivers/passthrough/arm/smmu-v3.c
> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
> @@ -2,37 +2,263 @@
>   /*
>    * IOMMU API for ARM architected SMMUv3 implementations.
>    *
> + * Based on Linux's SMMUv3 driver:
> + *    drivers/iommu/arm-smmu-v3.c
> + *    commit: ab435ce49bd1d02e33dfec24f76955dc1196970b
> + * and Xen's SMMU driver:
> + *    xen/drivers/passthrough/arm/smmu.c
> + *
> + * Major differences with regard to Linux driver are as follows:
> + *  1. Driver is currently supported as Tech Preview.
> + *  2. Only Stage-2 translation is supported as compared to the Linux driver
> + *     that supports both Stage-1 and Stage-2 translations.
> + *  3. Use P2M  page table instead of creating one as SMMUv3 has the
> + *     capability to share the page tables with the CPU.
> + *  4. Tasklets are used in place of threaded IRQ's in Linux for event queue
> + *     and priority queue IRQ handling.
> + *  5. Latest version of the Linux SMMUv3 code implements the commands queue
> + *     access functions based on atomic operations implemented in Linux.
> + *     Atomic functions used by the commands queue access functions are not
> + *     implemented in XEN therefore we decided to port the earlier version
> + *     of the code. Atomic operations are introduced to fix the bottleneck of
> + *     the SMMU command queue insertion operation. A new algorithm for
> + *     inserting commands into the queue is introduced, which is
> + *     lock-free on the fast-path.
> + *     Consequence of reverting the patch is that the command queue insertion
> + *     will be slow for large systems as spinlock will be used to serializes
> + *     accesses from all CPUs to the single queue supported by the hardware.
> + *     Once the proper atomic operations will be available in XEN the driver
> + *     can be updated.
> + *  6. Spin lock is used in place of Mutex when attaching a device to the SMMU,
> + *     as there is no blocking locks implementation available in XEN.This might
> + *     introduce latency in XEN. Need to investigate before driver is out for
> + *     Tech Preview.
> + *  7. PCI ATS functionality is not supported, as there is no support available
> + *     in XEN to test the functionality. Code is not tested and compiled. Code
> + *     is guarded by the flag CONFIG_PCI_ATS.
> + *  8. MSI interrupts are not supported as there is no support available
> + *     in XEN to request MSI interrupts. Code is not tested and compiled. Code
> + *     is guarded by the flag CONFIG_MSI.
> + *
> + * Following functionality should be supported before driver is out for tech
> + * preview
> + *
> + *  1. Investigate the timing analysis of using spin lock in place of mutex
> + *     when attaching devices to SMMU.
> + *  2. Merged the latest Linux SMMUv3 driver code once atomic operation is
> + *     available in XEN.
> + *  3. PCI ATS and MSI interrupts should be supported.
> + *  4. Investigate side-effect of using tasklet in place of threaded IRQ and
> + *     fix if any.
> + *
>    * Copyright (C) 2015 ARM Limited
>    *
>    * Author: Will Deacon <will.deacon@arm.com>
>    *
> - * This driver is powered by bad coffee and bombay mix.
> + * Copyright (C) 2020 Arm Ltd
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#include <xen/acpi.h>
> +#include <xen/config.h>
> +#include <xen/delay.h>
> +#include <xen/errno.h>
> +#include <xen/err.h>
> +#include <xen/irq.h>
> +#include <xen/lib.h>
> +#include <xen/list.h>
> +#include <xen/mm.h>
> +#include <xen/rbtree.h>
> +#include <xen/sched.h>
> +#include <xen/sizes.h>
> +#include <xen/vmap.h>
> +#include <asm/atomic.h>
> +#include <asm/device.h>
> +#include <asm/io.h>
> +#include <asm/iommu_fwspec.h>
> +#include <asm/platform.h>
> +
> +
> +#define ARM_SMMU_VTCR_SH_IS		3
> +#define ARM_SMMU_VTCR_RGN_WBWA		1
> +#define ARM_SMMU_VTCR_TG0_4K		0
> +#define ARM_SMMU_VTCR_PS_32_BIT		0x0ULL
> +#define ARM_SMMU_VTCR_PS_36_BIT		0x1ULL
> +#define ARM_SMMU_VTCR_PS_40_BIT		0x2ULL
> +#define ARM_SMMU_VTCR_PS_42_BIT		0x3ULL
> +#define ARM_SMMU_VTCR_PS_44_BIT		0x4ULL
> +#define ARM_SMMU_VTCR_PS_48_BIT		0x5ULL
> +#define ARM_SMMU_VTCR_PS_52_BIT		0x6ULL
> +
> +/* Linux compatibility functions. */
> +typedef paddr_t		dma_addr_t;
> +typedef paddr_t		phys_addr_t;
> +typedef unsigned int		gfp_t;
> +
> +#define platform_device		device
> +
> +#define GFP_KERNEL		0
> +
> +/* Alias to Xen lock functions */
> +#define mutex spinlock
> +#define mutex_init spin_lock_init
> +#define mutex_lock spin_lock
> +#define mutex_unlock spin_unlock
> +
> +/* Device logger functions */
> +#define dev_name(dev)	dt_node_full_name(dev->of_node)
> +#define dev_dbg(dev, fmt, ...)			\
> +	printk(XENLOG_DEBUG "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
> +#define dev_notice(dev, fmt, ...)		\
> +	printk(XENLOG_INFO "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
> +#define dev_warn(dev, fmt, ...)			\
> +	printk(XENLOG_WARNING "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
> +#define dev_err(dev, fmt, ...)			\
> +	printk(XENLOG_ERR "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
> +#define dev_info(dev, fmt, ...)			\
> +	printk(XENLOG_INFO "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
> +#define dev_err_ratelimited(dev, fmt, ...)			\
> +	printk(XENLOG_ERR "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
> +
> +/*
> + * Periodically poll an address and wait between reads in us until a
> + * condition is met or a timeout occurs.
> + *
> + * @return: 0 when cond met, -ETIMEDOUT upon timeout
> + */
> +#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \
> +({ \
> +	s_time_t deadline = NOW() + MICROSECS(timeout_us); \
> +	for (;;) { \
> +		(val) = op(addr); \
> +		if (cond) \
> +			break; \
> +		if (NOW() > deadline) { \
> +			(val) = op(addr); \
> +			break; \
> +		} \
> +		udelay(sleep_us); \
> +	} \
> +	(cond) ? 0 : -ETIMEDOUT; \
> +})
> +
> +#define readl_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us)	\
> +	readx_poll_timeout(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define FIELD_PREP(_mask, _val)			\
> +	(((typeof(_mask))(_val) << (ffsll(_mask) - 1)) & (_mask))
> +
> +#define FIELD_GET(_mask, _reg)			\
> +	((typeof(_mask))(((_reg) & (_mask)) >> (ffsll(_mask) - 1)))
> +
> +/*
> + * Helpers for DMA allocation. Just the function name is reused for
> + * porting code, these allocation are not managed allocations
>    */
> +static void *dmam_alloc_coherent(struct device *dev, size_t size,
> +				paddr_t *dma_handle, gfp_t gfp)
> +{
> +	void *vaddr;
> +	unsigned long alignment = size;
> +
> +	/*
> +	 * _xzalloc requires that the (align & (align -1)) = 0. Most of the
> +	 * allocations in SMMU code should send the right value for size. In
> +	 * case this is not true print a warning and align to the size of a
> +	 * (void *)
> +	 */
> +	if (size & (size - 1)) {
> +		printk(XENLOG_WARNING "SMMUv3: Fixing alignment for the DMA buffer\n");
> +		alignment = sizeof(void *);
> +	}
> +
> +	vaddr = _xzalloc(size, alignment);
> +	if (!vaddr) {
> +		printk(XENLOG_ERR "SMMUv3: DMA allocation failed\n");
> +		return NULL;
> +	}
> +
> +	*dma_handle = virt_to_maddr(vaddr);
> +
> +	return vaddr;
> +}
> +
> +
> +/* Xen specific code. */
> +struct iommu_domain {
> +	/* Runtime SMMU configuration for this iommu_domain */
> +	atomic_t		ref;
> +	/*
> +	 * Used to link iommu_domain contexts for a same domain.
> +	 * There is at least one per-SMMU to used by the domain.
> +	 */
> +	struct list_head		list;
> +};
>   
> -#include <linux/acpi.h>
> -#include <linux/acpi_iort.h>
> -#include <linux/bitfield.h>
> -#include <linux/bitops.h>
> -#include <linux/crash_dump.h>
> -#include <linux/delay.h>
> -#include <linux/dma-iommu.h>
> -#include <linux/err.h>
> -#include <linux/interrupt.h>
> -#include <linux/io-pgtable.h>
> -#include <linux/iommu.h>
> -#include <linux/iopoll.h>
> -#include <linux/module.h>
> -#include <linux/msi.h>
> -#include <linux/of.h>
> -#include <linux/of_address.h>
> -#include <linux/of_iommu.h>
> -#include <linux/of_platform.h>
> -#include <linux/pci.h>
> -#include <linux/pci-ats.h>
> -#include <linux/platform_device.h>
> -
> -#include <linux/amba/bus.h>
> +/* Describes information required for a Xen domain */
> +struct arm_smmu_xen_domain {
> +	spinlock_t		lock;
>   
> +	/* List of iommu domains associated to this domain */
> +	struct list_head	contexts;
> +};
> +
> +
> +/* Keep a list of devices associated with this driver */
> +static DEFINE_SPINLOCK(arm_smmu_devices_lock);
> +static LIST_HEAD(arm_smmu_devices);
> +
> +static inline void *dev_iommu_priv_get(struct device *dev)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> +	return fwspec && fwspec->iommu_priv ? fwspec->iommu_priv : NULL;
> +}
> +
> +static inline void dev_iommu_priv_set(struct device *dev, void *priv)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +
> +	fwspec->iommu_priv = priv;
> +}
> +
> +static int platform_get_irq_byname_optional(struct device *dev,
> +				const char *name)
> +{
> +	int index, ret;
> +	struct dt_device_node *np  = dev_to_dt(dev);
> +
> +	if (unlikely(!name))
> +		return -EINVAL;
> +
> +	index = dt_property_match_string(np, "interrupt-names", name);
> +	if (index < 0) {
> +		dev_info(dev, "IRQ %s not found\n", name);
> +		return index;
> +	}
> +
> +	ret = platform_get_irq(np, index);
> +	if (ret < 0) {
> +		dev_err(dev, "failed to get irq index %d\n", index);
> +		return -ENODEV;
> +	}
> +
> +	return ret;
> +}
> +
> +/* Start of Linux SMMUv3 code */
>   /* MMIO registers */
>   #define ARM_SMMU_IDR0			0x0
>   #define IDR0_ST_LVL			GENMASK(28, 27)
> @@ -402,6 +628,7 @@ enum pri_resp {
>   	PRI_RESP_SUCC = 2,
>   };
>   
> +#ifdef CONFIG_MSI
>   enum arm_smmu_msi_index {
>   	EVTQ_MSI_INDEX,
>   	GERROR_MSI_INDEX,
> @@ -426,6 +653,7 @@ static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
>   		ARM_SMMU_PRIQ_IRQ_CFG2,
>   	},
>   };
> +#endif /* CONFIG_MSI */
>   
>   struct arm_smmu_cmdq_ent {
>   	/* Common fields */
> @@ -613,8 +841,13 @@ struct arm_smmu_device {
>   		u64			padding;
>   	};
>   
> -	/* IOMMU core code handle */
> -	struct iommu_device		iommu;
> +	/* Need to keep a list of SMMU devices */
> +	struct list_head		devices;
> +
> +	/* Tasklets for handling evts/faults and pci page request IRQs*/
> +	struct tasklet		evtq_irq_tasklet;
> +	struct tasklet		priq_irq_tasklet;
> +	struct tasklet		combined_irq_tasklet;
>   };
>   
>   /* SMMU private data for each master */
> @@ -646,6 +879,9 @@ struct arm_smmu_domain {
>   	enum arm_smmu_domain_stage	stage;
>   	struct arm_smmu_s2_cfg	s2_cfg;
>   
> +	/* Xen domain associated with this SMMU domain */
> +	struct domain		*d;
> +
>   	struct iommu_domain		domain;
>   
>   	struct list_head		devices;
> @@ -984,6 +1220,7 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>   	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
>   }
>   
> +#ifdef CONFIG_MSI
>   /*
>    * The difference between val and sync_idx is bounded by the maximum size of
>    * a queue at 2^20 entries, so 32 bits is plenty for wrap-safe arithmetic.
> @@ -1027,6 +1264,13 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
>   
>   	return __arm_smmu_sync_poll_msi(smmu, ent.sync.msidata);
>   }
> +#else
> +static inline int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_MSI */
> +
>   
>   static int __arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>   {
> @@ -1069,7 +1313,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
>   	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
>   
>   	/* See comment in arm_smmu_write_ctx_desc() */
> -	WRITE_ONCE(*dst, cpu_to_le64(val));
> +	write_atomic(dst, cpu_to_le64(val));
>   }
>   
>   static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
> @@ -1184,7 +1428,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>   						 STRTAB_STE_1_EATS_TRANS));
>   
>   	arm_smmu_sync_ste_for_sid(smmu, sid);
> -	WRITE_ONCE(dst[0], cpu_to_le64(val));
> +	write_atomic(&dst[0], cpu_to_le64(val));
>   	arm_smmu_sync_ste_for_sid(smmu, sid);
>   
>   	/* It's likely that we'll want to use the new STE soon */
> @@ -1231,7 +1475,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>   }
>   
>   /* IRQ and event handlers */
> -static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
> +static void arm_smmu_evtq_tasklet(void *dev)
>   {
>   	int i;
>   	struct arm_smmu_device *smmu = dev;
> @@ -1261,7 +1505,6 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>   	/* Sync our overflow flag, as we believe we're up to speed */
>   	llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) |
>   		    Q_IDX(llq, llq->cons);
> -	return IRQ_HANDLED;
>   }
>   
>   static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
> @@ -1302,7 +1545,7 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>   	}
>   }
>   
> -static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
> +static void arm_smmu_priq_tasklet(void *dev)
>   {
>   	struct arm_smmu_device *smmu = dev;
>   	struct arm_smmu_queue *q = &smmu->priq.q;
> @@ -1321,12 +1564,12 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>   	llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) |
>   		      Q_IDX(llq, llq->cons);
>   	queue_sync_cons_out(q);
> -	return IRQ_HANDLED;
>   }
>   
>   static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
>   
> -static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
> +static void arm_smmu_gerror_handler(int irq, void *dev,
> +				struct cpu_user_regs *regs)
>   {
>   	u32 gerror, gerrorn, active;
>   	struct arm_smmu_device *smmu = dev;
> @@ -1336,7 +1579,7 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
>   
>   	active = gerror ^ gerrorn;
>   	if (!(active & GERROR_ERR_MASK))
> -		return IRQ_NONE; /* No errors pending */
> +		return; /* No errors pending */
>   
>   	dev_warn(smmu->dev,
>   		 "unexpected global error reported (0x%08x), this could be serious\n",
> @@ -1369,26 +1612,44 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
>   		arm_smmu_cmdq_skip_err(smmu);
>   
>   	writel(gerror, smmu->base + ARM_SMMU_GERRORN);
> -	return IRQ_HANDLED;
>   }
>   
> -static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev)
> +static void arm_smmu_combined_irq_handler(int irq, void *dev,
> +				struct cpu_user_regs *regs)
> +{
> +	struct arm_smmu_device *smmu = dev;
> +
> +	arm_smmu_gerror_handler(irq, dev, regs);
> +
> +	tasklet_schedule(&(smmu->combined_irq_tasklet));
> +}
> +
> +static void arm_smmu_combined_irq_tasklet(void *dev)
>   {
>   	struct arm_smmu_device *smmu = dev;
>   
> -	arm_smmu_evtq_thread(irq, dev);
> +	arm_smmu_evtq_tasklet(dev);
>   	if (smmu->features & ARM_SMMU_FEAT_PRI)
> -		arm_smmu_priq_thread(irq, dev);
> +		arm_smmu_priq_tasklet(dev);
> +}
> +
> +static void arm_smmu_evtq_irq_tasklet(int irq, void *dev,
> +				struct cpu_user_regs *regs)
> +{
> +	struct arm_smmu_device *smmu = dev;
>   
> -	return IRQ_HANDLED;
> +	tasklet_schedule(&(smmu->evtq_irq_tasklet));
>   }
>   
> -static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
> +static void arm_smmu_priq_irq_tasklet(int irq, void *dev,
> +				struct cpu_user_regs *regs)
>   {
> -	arm_smmu_gerror_handler(irq, dev);
> -	return IRQ_WAKE_THREAD;
> +	struct arm_smmu_device *smmu = dev;
> +
> +	tasklet_schedule(&(smmu->priq_irq_tasklet));
>   }
>   
> +#ifdef CONFIG_PCI_ATS
>   static void
>   arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
>   			struct arm_smmu_cmdq_ent *cmd)
> @@ -1495,6 +1756,7 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
>   
>   	return ret ? -ETIMEDOUT : 0;
>   }
> +#endif /* CONFIG_PCI_ATS */
>   
>   static void arm_smmu_tlb_inv_context(void *cookie)
>   {
> @@ -1525,7 +1787,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(void)
>   	 * We can't really do anything meaningful until we've added a
>   	 * master.
>   	 */
> -	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
> +	smmu_domain = xzalloc(struct arm_smmu_domain);
>   	if (!smmu_domain)
>   		return NULL;
>   
> @@ -1563,7 +1825,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>   	if (cfg->vmid)
>   		arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
>   
> -	kfree(smmu_domain);
> +	xfree(smmu_domain);
>   }
>   
>   
> @@ -1576,6 +1838,44 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>   	struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
>   	typeof(&arm_lpae_s2_cfg.vtcr) vtcr = &arm_lpae_s2_cfg.vtcr;
>   
> +	vtcr->sh = ARM_SMMU_VTCR_SH_IS;
> +	vtcr->irgn = ARM_SMMU_VTCR_RGN_WBWA;
> +	vtcr->orgn = ARM_SMMU_VTCR_RGN_WBWA;
> +
> +	BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
> +	vtcr->tg = ARM_SMMU_VTCR_TG0_4K;
> +
> +	switch (smmu->oas) {
> +	case 32:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_32_BIT;
> +		break;
> +	case 36:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_36_BIT;
> +		break;
> +	case 40:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_40_BIT;
> +		break;
> +	case 42:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_42_BIT;
> +		break;
> +	case 44:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_44_BIT;
> +		break;
> +	case 48:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_48_BIT;
> +		break;
> +	case 52:
> +		vtcr->ps = ARM_SMMU_VTCR_PS_52_BIT;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	vtcr->tsz = 64 - p2m_ipa_bits;
> +	vtcr->sl = 2 - P2M_ROOT_LEVEL;
> +
> +	arm_lpae_s2_cfg.vttbr  = page_to_maddr(smmu_domain->d->arch.p2m.root);
> +
>   	vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
>   	if (vmid < 0)
>   		return vmid;
> @@ -1589,6 +1889,11 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>   			  FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) |
>   			  FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) |
>   			  FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps);
> +
> +	printk(XENLOG_DEBUG
> +		   "SMMUv3: d%u: vmid 0x%x vtcr 0x%"PRIpaddr" p2maddr 0x%"PRIpaddr"\n",
> +		   smmu_domain->d->domain_id, cfg->vmid, cfg->vtcr, cfg->vttbr);
> +
>   	return 0;
>   }
>   
> @@ -1650,6 +1955,7 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
>   	}
>   }
>   
> +#ifdef CONFIG_PCI_ATS
>   static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
>   {
>   	struct device *dev = master->dev;
> @@ -1748,6 +2054,23 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
>   
>   	pci_disable_pasid(pdev);
>   }
> +#else
> +static inline bool arm_smmu_ats_supported(struct arm_smmu_master *master)
> +{
> +	return false;
> +}
> +
> +static inline void arm_smmu_enable_ats(struct arm_smmu_master *master) { }
> +
> +static inline void arm_smmu_disable_ats(struct arm_smmu_master *master) { }
> +
> +static inline int arm_smmu_enable_pasid(struct arm_smmu_master *master)
> +{
> +	return 0;
> +}
> +
> +static inline void arm_smmu_disable_pasid(struct arm_smmu_master *master) { }
> +#endif /* CONFIG_PCI_ATS */
>   
>   static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>   {
> @@ -1830,8 +2153,10 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>   
>   	return sid < limit;
>   }
> +/* Forward declaration */
> +static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev);
>   
> -static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> +static int arm_smmu_add_device(u8 devfn, struct device *dev)
>   {
>   	int i, ret;
>   	struct arm_smmu_device *smmu;
> @@ -1839,14 +2164,15 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>   	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>   
>   	if (!fwspec)
> -		return ERR_PTR(-ENODEV);
> +		return -ENODEV;
>   
> -	if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
> -		return ERR_PTR(-EBUSY);
> +	smmu = arm_smmu_get_by_dev(fwspec->iommu_dev);
> +	if (!smmu)
> +		return -ENODEV;
>   
> -	master = kzalloc(sizeof(*master), GFP_KERNEL);
> +	master = xzalloc(struct arm_smmu_master);
>   	if (!master)
> -		return ERR_PTR(-ENOMEM);
> +		return -ENOMEM;
>   
>   	master->dev = dev;
>   	master->smmu = smmu;
> @@ -1881,17 +2207,36 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>   	 */
>   	arm_smmu_enable_pasid(master);
>   
> -	return &smmu->iommu;
> +	return 0;
>   
>   err_free_master:
> -	kfree(master);
> +	xfree(master);
>   	dev_iommu_priv_set(dev, NULL);
> -	return ERR_PTR(ret);
> +	return ret;
>   }
>   
> -static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
> +static int arm_smmu_dt_xlate(struct device *dev,
> +				const struct dt_phandle_args *args)
>   {
> -	return iommu_fwspec_add_ids(dev, args->args, 1);
> +	int ret;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
Please bear in mind I am not familiar with the SMMU, but don't we need 
to perform a some kind
of sanity check of passed DT IOMMU specifier here?


> +
> +	ret = iommu_fwspec_add_ids(dev, args->args, 1);
> +	if (ret)
> +		return ret;
> +
> +	if (dt_device_is_protected(dev_to_dt(dev))) {
> +		dev_err(dev, "Already added to SMMUv3\n");
> +		return -EEXIST;
> +	}
> +
> +	/* Let Xen know that the master device is protected by an IOMMU. */
> +	dt_device_set_protected(dev_to_dt(dev));
> +
> +	dev_info(dev, "Added master device (SMMUv3 %s StreamIds %u)\n",
> +			dev_name(fwspec->iommu_dev), fwspec->num_ids);
> +
> +	return 0;
>   }
>   
>   /* Probing and initialisation functions */
> @@ -1972,7 +2317,7 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
>   	size_t size = sizeof(*cfg->l1_desc) * cfg->num_l1_ents;
>   	void *strtab = smmu->strtab_cfg.strtab;
>   
> -	cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
> +	cfg->l1_desc = _xzalloc(size, sizeof(void *));
>   	if (!cfg->l1_desc) {
>   		dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");
>   		return -ENOMEM;
> @@ -2118,6 +2463,7 @@ static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
>   	return ret;
>   }
>   
> +#ifdef CONFIG_MSI
>   static void arm_smmu_free_msis(void *data)
>   {
>   	struct device *dev = data;
> @@ -2188,6 +2534,9 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
>   	/* Add callback to free MSIs on teardown */
>   	devm_add_action(dev, arm_smmu_free_msis, dev);
>   }
> +#else
> +static inline void arm_smmu_setup_msis(struct arm_smmu_device *smmu) { }
> +#endif /* CONFIG_MSI */
>   
>   static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>   {
> @@ -2198,9 +2547,7 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>   	/* Request interrupt lines */
>   	irq = smmu->evtq.q.irq;
>   	if (irq) {
> -		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
> -						arm_smmu_evtq_thread,
> -						IRQF_ONESHOT,
> +		ret = request_irq(irq, 0, arm_smmu_evtq_irq_tasklet,
>   						"arm-smmu-v3-evtq", smmu);
>   		if (ret < 0)
>   			dev_warn(smmu->dev, "failed to enable evtq irq\n");
> @@ -2210,8 +2557,8 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>   
>   	irq = smmu->gerr_irq;
>   	if (irq) {
> -		ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
> -				       0, "arm-smmu-v3-gerror", smmu);
> +		ret = request_irq(irq, 0, arm_smmu_gerror_handler,
> +						"arm-smmu-v3-gerror", smmu);
>   		if (ret < 0)
>   			dev_warn(smmu->dev, "failed to enable gerror irq\n");
>   	} else {
> @@ -2221,11 +2568,8 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>   	if (smmu->features & ARM_SMMU_FEAT_PRI) {
>   		irq = smmu->priq.q.irq;
>   		if (irq) {
> -			ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
> -							arm_smmu_priq_thread,
> -							IRQF_ONESHOT,
> -							"arm-smmu-v3-priq",
> -							smmu);
> +			ret = request_irq(irq, 0, arm_smmu_priq_irq_tasklet,
> +							"arm-smmu-v3-priq", smmu);
>   			if (ret < 0)
>   				dev_warn(smmu->dev,
>   					 "failed to enable priq irq\n");
> @@ -2254,11 +2598,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
>   		 * Cavium ThunderX2 implementation doesn't support unique irq
>   		 * lines. Use a single irq line for all the SMMUv3 interrupts.
>   		 */
> -		ret = devm_request_threaded_irq(smmu->dev, irq,
> -					arm_smmu_combined_irq_handler,
> -					arm_smmu_combined_irq_thread,
> -					IRQF_ONESHOT,
> -					"arm-smmu-v3-combined-irq", smmu);
> +		ret = request_irq(irq, 0, arm_smmu_combined_irq_handler,
> +						"arm-smmu-v3-combined-irq", smmu);
>   		if (ret < 0)
>   			dev_warn(smmu->dev, "failed to enable combined irq\n");
>   	} else
> @@ -2287,7 +2628,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
>   	return ret;
>   }
>   
> -static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
> +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
>   {
>   	int ret;
>   	u32 reg, enables;
> @@ -2297,7 +2638,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>   	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>   	if (reg & CR0_SMMUEN) {
>   		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> -		WARN_ON(is_kdump_kernel() && !disable_bypass);
> +		WARN_ON(!disable_bypass);
>   		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>   	}
>   
> @@ -2401,11 +2742,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>   		return ret;
>   	}
>   
> -	if (is_kdump_kernel())
> -		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
> +	/* Initialize tasklets for threaded IRQs*/
> +	tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
> +	tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
> +	tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
> +				 smmu);
>   
>   	/* Enable the SMMU interface, or ensure bypass */
> -	if (!bypass || disable_bypass) {
> +	if (disable_bypass) {
>   		enables |= CR0_SMMUEN;
>   	} else {
>   		ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT);
> @@ -2470,8 +2814,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>   	if (reg & IDR0_SEV)
>   		smmu->features |= ARM_SMMU_FEAT_SEV;
>   
> +#ifdef CONFIG_MSI
>   	if (reg & IDR0_MSI)
>   		smmu->features |= ARM_SMMU_FEAT_MSI;
> +#endif
>   
>   	if (reg & IDR0_HYP)
>   		smmu->features |= ARM_SMMU_FEAT_HYP;
> @@ -2496,7 +2842,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>   		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
>   
>   	if (!(reg & IDR0_S2P)) {
> -		dev_err(smmu->dev, "no translation support!\n");
> +		dev_err(smmu->dev, "no stage-2 translation support!\n");
>   		return -ENXIO;
>   	}
>   
> @@ -2593,8 +2939,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>   		smmu->oas = 48;
>   	}
>   
> +	smmu->oas = min_t(unsigned long, PADDR_BITS, smmu->oas);
>   	smmu->ias = max(smmu->ias, smmu->oas);
>   
> +	/* Xen: Set maximum Stage-2 input size supported by the SMMU. */
> +	p2m_restrict_ipa_bits(smmu->ias);
> +
>   	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
>   		 smmu->ias, smmu->oas, smmu->features);
>   	return 0;
> @@ -2645,7 +2995,7 @@ static inline int arm_smmu_device_acpi_probe(struct platform_device *pdev,
>   static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>   				    struct arm_smmu_device *smmu)
>   {
> -	struct device *dev = &pdev->dev;
> +	struct device *dev = pdev;
>   	u32 cells;
>   	int ret = -EINVAL;
>   
> @@ -2658,7 +3008,7 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>   
>   	parse_driver_options(smmu);
>   
> -	if (of_dma_is_coherent(dev->of_node))
> +	if (dt_get_property(dev->of_node, "dma-coherent", NULL))
>   		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
>   
>   	return ret;
> @@ -2672,63 +3022,49 @@ static unsigned long arm_smmu_resource_size(struct arm_smmu_device *smmu)
>   		return SZ_128K;
>   }
>   
> -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start,
> -				      resource_size_t size)
> -{
> -	struct resource res = {
> -		.flags = IORESOURCE_MEM,
> -		.start = start,
> -		.end = start + size - 1,
> -	};
> -
> -	return devm_ioremap_resource(dev, &res);
> -}
> -
>   static int arm_smmu_device_probe(struct platform_device *pdev)
>   {
>   	int irq, ret;
> -	struct resource *res;
> -	resource_size_t ioaddr;
> +	paddr_t ioaddr, iosize;
>   	struct arm_smmu_device *smmu;
> -	struct device *dev = &pdev->dev;
> -	bool bypass;
>   
> -	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
> +	smmu = xzalloc(struct arm_smmu_device);
>   	if (!smmu) {
> -		dev_err(dev, "failed to allocate arm_smmu_device\n");
> +		dev_err(pdev, "failed to allocate arm_smmu_device\n");
>   		return -ENOMEM;
>   	}
> -	smmu->dev = dev;
> +	smmu->dev = pdev;
>   
> -	if (dev->of_node) {
> +	if (pdev->of_node) {
>   		ret = arm_smmu_device_dt_probe(pdev, smmu);
> +		if (ret)
> +			return -EINVAL;
>   	} else {
>   		ret = arm_smmu_device_acpi_probe(pdev, smmu);
>   		if (ret == -ENODEV)
>   			return ret;
>   	}
>   
> -	/* Set bypass mode according to firmware probing result */
> -	bypass = !!ret;
> -
>   	/* Base address */
> -	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	if (resource_size(res) < arm_smmu_resource_size(smmu)) {
> -		dev_err(dev, "MMIO region too small (%pr)\n", res);
> +	ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
> +	if (ret)
> +		return -ENODEV;
> +
> +	if (iosize < arm_smmu_resource_size(smmu)) {
> +		dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>   		return -EINVAL;
>   	}
> -	ioaddr = res->start;
>   
>   	/*
>   	 * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
> -	 * the PMCG registers which are reserved by the PMU driver.
> +	 * the PMCG registers which are optional and currently not supported.
>   	 */
> -	smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
> +	smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>   	if (IS_ERR(smmu->base))
>   		return PTR_ERR(smmu->base);
>   
> -	if (arm_smmu_resource_size(smmu) > SZ_64K) {
> -		smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
> +	if (iosize > SZ_64K) {
> +		smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>   					       ARM_SMMU_REG_SZ);
>   		if (IS_ERR(smmu->page1))
>   			return PTR_ERR(smmu->page1);
> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>   		return ret;
>   
>   	/* Reset the device */
> -	ret = arm_smmu_device_reset(smmu, bypass);
> +	ret = arm_smmu_device_reset(smmu);
>   	if (ret)
>   		return ret;
>   
> +	/*
> +	 * Keep a list of all probed devices. This will be used to query
> +	 * the smmu devices based on the fwnode.
> +	 */
> +	INIT_LIST_HEAD(&smmu->devices);
> +
> +	spin_lock(&arm_smmu_devices_lock);
> +	list_add(&smmu->devices, &arm_smmu_devices);
> +	spin_unlock(&arm_smmu_devices_lock);
> +
>   	return 0;
>   }
>   
> -static const struct of_device_id arm_smmu_of_match[] = {
> +static const struct dt_device_match arm_smmu_of_match[] = {
>   	{ .compatible = "arm,smmu-v3", },
>   	{ },
>   };
> +
> +/* Start of Xen specific code. */
> +static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
> +{
> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
> +	struct iommu_domain *io_domain;
> +
> +	spin_lock(&xen_domain->lock);
> +
> +	list_for_each_entry(io_domain, &xen_domain->contexts, list) {
> +		/*
> +		 * Only invalidate the context when SMMU is present.
> +		 * This is because the context initialization is delayed
> +		 * until a master has been added.
> +		 */
> +		if (unlikely(!ACCESS_ONCE(to_smmu_domain(io_domain)->smmu)))
> +			continue;
> +
> +		arm_smmu_tlb_inv_context(to_smmu_domain(io_domain));
> +	}
> +
> +	spin_unlock(&xen_domain->lock);
> +
> +	return 0;
> +}
> +
> +static int __must_check arm_smmu_iotlb_flush(struct domain *d, dfn_t dfn,
> +				unsigned long page_count, unsigned int flush_flags)
> +{
> +	return arm_smmu_iotlb_flush_all(d);
> +}
> +
> +static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev)
> +{
> +	struct arm_smmu_device *smmu = NULL;
> +
> +	spin_lock(&arm_smmu_devices_lock);
> +
> +	list_for_each_entry(smmu, &arm_smmu_devices, devices) {
> +		if (smmu->dev  == dev) {
> +			spin_unlock(&arm_smmu_devices_lock);
> +			return smmu;
> +		}
> +	}
> +
> +	spin_unlock(&arm_smmu_devices_lock);
> +
> +	return NULL;
> +}
> +
> +static struct iommu_domain *arm_smmu_get_domain(struct domain *d,
> +				struct device *dev)
> +{
> +	struct iommu_domain *io_domain;
> +	struct arm_smmu_domain *smmu_domain;
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
> +	struct arm_smmu_device *smmu = arm_smmu_get_by_dev(fwspec->iommu_dev);
> +
> +	if (!smmu)
> +		return NULL;
> +
> +	/*
> +	 * Loop through the &xen_domain->contexts to locate a context
> +	 * assigned to this SMMU
> +	 */
> +	list_for_each_entry(io_domain, &xen_domain->contexts, list) {
> +		smmu_domain = to_smmu_domain(io_domain);
> +		if (smmu_domain->smmu == smmu)
> +			return io_domain;
> +	}
> +	return NULL;
> +}
> +
> +static void arm_smmu_destroy_iommu_domain(struct iommu_domain *io_domain)
> +{
> +	list_del(&io_domain->list);
> +	arm_smmu_domain_free(io_domain);
> +}
> +
> +static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
> +		struct device *dev, u32 flag)
> +{
> +	int ret = 0;
> +	struct iommu_domain *io_domain;
> +	struct arm_smmu_domain *smmu_domain;
> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
> +
> +	spin_lock(&xen_domain->lock);
> +
> +	/*
> +	 * Check to see if an iommu_domain already exists for this xen domain
> +	 * under the same SMMU
> +	 */
> +	io_domain = arm_smmu_get_domain(d, dev);
> +	if (!io_domain) {
> +		io_domain = arm_smmu_domain_alloc();
> +		if (!io_domain) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		smmu_domain = to_smmu_domain(io_domain);
> +		smmu_domain->d = d;
> +
> +		/* Chain the new context to the domain */
> +		list_add(&io_domain->list, &xen_domain->contexts);
> +	}
> +
> +	ret = arm_smmu_attach_dev(io_domain, dev);
> +	if (ret) {
> +		if (io_domain->ref.counter == 0)
> +			arm_smmu_destroy_iommu_domain(io_domain);
> +	} else {
> +		atomic_inc(&io_domain->ref);
> +	}
> +
> +out:
> +	spin_unlock(&xen_domain->lock);
> +	return ret;
> +}
> +
> +static int arm_smmu_deassign_dev(struct domain *d, struct device *dev)
> +{
> +	struct iommu_domain *io_domain = arm_smmu_get_domain(d, dev);
> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(io_domain);
> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +
> +	if (!smmu_domain || smmu_domain->d != d) {
> +		dev_err(dev, " not attached to domain %d\n", d->domain_id);
> +		return -ESRCH;
> +	}
> +
> +	spin_lock(&xen_domain->lock);
> +
> +	arm_smmu_detach_dev(master);
> +	atomic_dec(&io_domain->ref);
> +
> +	if (io_domain->ref.counter == 0)
> +		arm_smmu_destroy_iommu_domain(io_domain);
> +
> +	spin_unlock(&xen_domain->lock);
> +
> +	return 0;
> +}
> +
> +static int arm_smmu_reassign_dev(struct domain *s, struct domain *t,
> +				u8 devfn,  struct device *dev)
> +{
> +	int ret = 0;
> +
> +	/* Don't allow remapping on other domain than hwdom */
> +	if (t && t != hardware_domain)
> +		return -EPERM;
> +
> +	if (t == s)
> +		return 0;
> +
> +	ret = arm_smmu_deassign_dev(s, dev);
> +	if (ret)
> +		return ret;
> +
> +	if (t) {
> +		/* No flags are defined for ARM. */
> +		ret = arm_smmu_assign_dev(t, devfn, dev, 0);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int arm_smmu_iommu_xen_domain_init(struct domain *d)
> +{
> +	struct arm_smmu_xen_domain *xen_domain;
> +
> +	xen_domain = xzalloc(struct arm_smmu_xen_domain);
> +	if (!xen_domain)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&xen_domain->lock);
> +	INIT_LIST_HEAD(&xen_domain->contexts);
> +
> +	dom_iommu(d)->arch.priv = xen_domain;
> +	return 0;
> +
> +}
> +
> +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
> +{

Both SMMUv2 and IPMMU perform some actions here. Any reason we don't 
need to do the same here?

     /* Set to false options not supported on ARM. */
     if ( iommu_hwdom_inclusive )
         printk(XENLOG_WARNING
         "map-inclusive dom0-iommu option is not supported on ARM\n");
     iommu_hwdom_inclusive = false;
     if ( iommu_hwdom_reserved == 1 )
         printk(XENLOG_WARNING
         "map-reserved dom0-iommu option is not supported on ARM\n");
     iommu_hwdom_reserved = 0;

     arch_iommu_hwdom_init(d);


> +}
> +
> +static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
> +{
> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
> +
> +	ASSERT(list_empty(&xen_domain->contexts));
> +	xfree(xen_domain);
> +}
> +
> +static const struct iommu_ops arm_smmu_iommu_ops = {
> +	.init		= arm_smmu_iommu_xen_domain_init,
> +	.hwdom_init		= arm_smmu_iommu_hwdom_init,
> +	.teardown		= arm_smmu_iommu_xen_domain_teardown,
> +	.iotlb_flush		= arm_smmu_iotlb_flush,
> +	.iotlb_flush_all	= arm_smmu_iotlb_flush_all,
> +	.assign_device		= arm_smmu_assign_dev,
> +	.reassign_device	= arm_smmu_reassign_dev,
> +	.map_page		= arm_iommu_map_page,
> +	.unmap_page		= arm_iommu_unmap_page,
> +	.dt_xlate		= arm_smmu_dt_xlate,
> +	.add_device		= arm_smmu_add_device,
> +};
> +
> +static __init int arm_smmu_dt_init(struct dt_device_node *dev,
> +				const void *data)
> +{
> +	int rc;
> +
> +	/*
> +	 * Even if the device can't be initialized, we don't want to
> +	 * give the SMMU device to dom0.
> +	 */
> +	dt_device_set_used_by(dev, DOMID_XEN);
> +
> +	rc = arm_smmu_device_probe(dt_to_dev(dev));
> +	if (rc)
> +		return rc;
> +
> +	iommu_set_ops(&arm_smmu_iommu_ops);
> +
> +	return 0;
> +}
> +
> +DT_DEVICE_START(smmuv3, "ARM SMMU V3", DEVICE_IOMMU)
> +.dt_match = arm_smmu_of_match,
> +.init = arm_smmu_dt_init,
> +DT_DEVICE_END

Oleksandr Tyshchenko Jan. 11, 2021, 4:39 p.m. UTC | #3

Hi Rahul


>> -
>>   static int arm_smmu_device_probe(struct platform_device *pdev)
>>   {
>>       int irq, ret;
>> -    struct resource *res;
>> -    resource_size_t ioaddr;
>> +    paddr_t ioaddr, iosize;
>>       struct arm_smmu_device *smmu;
>> -    struct device *dev = &pdev->dev;
>> -    bool bypass;
>>   -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>> +    smmu = xzalloc(struct arm_smmu_device);
>>       if (!smmu) {
>> -        dev_err(dev, "failed to allocate arm_smmu_device\n");
>> +        dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>           return -ENOMEM;
>>       }
>> -    smmu->dev = dev;
>> +    smmu->dev = pdev;
>>   -    if (dev->of_node) {
>> +    if (pdev->of_node) {
>>           ret = arm_smmu_device_dt_probe(pdev, smmu);
>> +        if (ret)
>> +            return -EINVAL;
>>       } else {
>>           ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>           if (ret == -ENODEV)
>>               return ret;
>>       }
>>   -    /* Set bypass mode according to firmware probing result */
>> -    bypass = !!ret;
>> -
>>       /* Base address */
>> -    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>> -        dev_err(dev, "MMIO region too small (%pr)\n", res);
>> +    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>> +    if (ret)
>> +        return -ENODEV;
>> +
>> +    if (iosize < arm_smmu_resource_size(smmu)) {
>> +        dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>           return -EINVAL;
>>       }
>> -    ioaddr = res->start;
>>         /*
>>        * Don't map the IMPLEMENTATION DEFINED regions, since they may 
>> contain
>> -     * the PMCG registers which are reserved by the PMU driver.
>> +     * the PMCG registers which are optional and currently not 
>> supported.
>>        */
>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>> +    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>       if (IS_ERR(smmu->base))
>>           return PTR_ERR(smmu->base);
>>   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>> -        smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>> +    if (iosize > SZ_64K) {
>> +        smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>                              ARM_SMMU_REG_SZ);
>>           if (IS_ERR(smmu->page1))
>>               return PTR_ERR(smmu->page1);
>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct 
>> platform_device *pdev)
>>           return ret;
>>         /* Reset the device */
>> -    ret = arm_smmu_device_reset(smmu, bypass);
>> +    ret = arm_smmu_device_reset(smmu);
>>       if (ret)
>>           return ret;
>>   +    /*
>> +     * Keep a list of all probed devices. This will be used to query
>> +     * the smmu devices based on the fwnode.
>> +     */
>> +    INIT_LIST_HEAD(&smmu->devices);
>> +
>> +    spin_lock(&arm_smmu_devices_lock);
>> +    list_add(&smmu->devices, &arm_smmu_devices);
>> +    spin_unlock(&arm_smmu_devices_lock);

Looks like that we need some kind of manual roll-back logic here in case 
of error during probe (there is no real devm_*):

iounmap, xfree, etc.


>>
>> +
>>       return 0;
>>   }
>

Rahul Singh Jan. 12, 2021, 9:41 a.m. UTC | #4

Hello Oleksandr ,

Thanks for reviewing the code.

> On 11 Jan 2021, at 4:09 pm, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> On 08.01.21 16:46, Rahul Singh wrote:
> 
> Hi Rahul
> 
>> Add support for ARM architected SMMUv3 implementation. It is based on
>> the Linux SMMUv3 driver.
>> 
>> Driver is currently supported as Tech Preview.
>> 
>> Major differences with regard to Linux driver are as follows:
>> 2. Only Stage-2 translation is supported as compared to the Linux driver
>>    that supports both Stage-1 and Stage-2 translations.
>> 3. Use P2M  page table instead of creating one as SMMUv3 has the
>>    capability to share the page tables with the CPU.
>> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>>    and priority queue IRQ handling.
>> 5. Latest version of the Linux SMMUv3 code implements the commands queue
>>    access functions based on atomic operations implemented in Linux.
>>    Atomic functions used by the commands queue access functions are not
>>    implemented in XEN therefore we decided to port the earlier version
>>    of the code. Atomic operations are introduced to fix the bottleneck
>>    of the SMMU command queue insertion operation. A new algorithm for
>>    inserting commands into the queue is introduced, which is lock-free
>>    on the fast-path.
>>    Consequence of reverting the patch is that the command queue
>>    insertion will be slow for large systems as spinlock will be used to
>>    serializes accesses from all CPUs to the single queue supported by
>>    the hardware. Once the proper atomic operations will be available in
>>    XEN the driver can be updated.
>> 6. Spin lock is used in place of mutex when attaching a device to the
>>    SMMU, as there is no blocking locks implementation available in XEN.
>>    This might introduce latency in XEN. Need to investigate before
>>    driver is out for tech preview.
>> 7. PCI ATS functionality is not supported, as there is no support
>>    available in XEN to test the functionality. Code is not tested and
>>    compiled. Code is guarded by the flag CONFIG_PCI_ATS.
>> 8. MSI interrupts are not supported as there is no support available in
>>    XEN to request MSI interrupts. Code is not tested and compiled. Code
>>    is guarded by the flag CONFIG_MSI.
>> 
>> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
>> ---
>> Changes in V3:
>> - added return statement for readx_poll_timeout function.
>> - remove iommu_get_dma_cookie and iommu_put_dma_cookie.
>> - remove struct arm_smmu_xen_device as not required.
>> - move dt_property_match_string to device_tree.c file.
>> - replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
>> - use ARM_SMMU_REG_SZ as size when map memory to XEN.
>> - remove bypass keyword to make sure when device-tree probe is failed we
>>   are reporting error and not continuing to configure SMMU in bypass
>>   mode.
>> - fixed minor comments.
>> Changes in V4:
>> - Fixed typo for CONFIG_MSI
>> - Added back the mutex code
>> - Rebase the patch on top of newly added WARN_ON().
>> - Remove the direct read of register VTCR_EL2.
>> - Fixed minor comments.
>> ---
>>  MAINTAINERS                           |   6 +
>>  SUPPORT.md                            |   1 +
>>  xen/drivers/passthrough/Kconfig       |  11 +
>>  xen/drivers/passthrough/arm/Makefile  |   1 +
>>  xen/drivers/passthrough/arm/smmu-v3.c | 808 ++++++++++++++++++++++----
>>  5 files changed, 715 insertions(+), 112 deletions(-)
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 6dbd99aff4..d832e8fd65 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -249,6 +249,12 @@ F:	xen/include/asm-arm/
>>  F:	xen/include/public/arch-arm/
>>  F:	xen/include/public/arch-arm.h
>>  +ARM SMMUv3
>> +M:	Bertrand Marquis <bertrand.marquis@arm.com>
>> +M:	Rahul Singh <rahul.singh@arm.com>
>> +S:	Supported
>> +F:	xen/drivers/passthrough/arm/smmu-v3.c
>> +
>>  Change Log
>>  M:	Paul Durrant <paul@xen.org>
>>  R:	Community Manager <community.manager@xenproject.org>
>> diff --git a/SUPPORT.md b/SUPPORT.md
>> index ab02aca5f4..5ee3c8651a 100644
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -67,6 +67,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
>>      Status, Intel VT-d: Supported
>>      Status, ARM SMMUv1: Supported, not security supported
>>      Status, ARM SMMUv2: Supported, not security supported
>> +    Status, ARM SMMUv3: Tech Preview
>>      Status, Renesas IPMMU-VMSA: Supported, not security supported
>>    ### ARM/GICv3 ITS
>> diff --git a/xen/drivers/passthrough/Kconfig b/xen/drivers/passthrough/Kconfig
>> index 0036007ec4..341ba92b30 100644
>> --- a/xen/drivers/passthrough/Kconfig
>> +++ b/xen/drivers/passthrough/Kconfig
>> @@ -13,6 +13,17 @@ config ARM_SMMU
>>  	  Say Y here if your SoC includes an IOMMU device implementing the
>>  	  ARM SMMU architecture.
>>  +config ARM_SMMU_V3
>> +	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
>> +	depends on ARM_64
>> +	---help---
>> +	 Support for implementations of the ARM System MMU architecture
>> +	 version 3. Driver is in experimental stage and should not be used in
>> +	 production.
>> +
>> +	 Say Y here if your system includes an IOMMU device implementing
>> +	 the ARM SMMUv3 architecture.
>> +
>>  config IPMMU_VMSA
>>  	bool "Renesas IPMMU-VMSA found in R-Car Gen3 SoCs"
>>  	depends on ARM_64
>> diff --git a/xen/drivers/passthrough/arm/Makefile b/xen/drivers/passthrough/arm/Makefile
>> index fcd918ea3e..c5fb3b58a5 100644
>> --- a/xen/drivers/passthrough/arm/Makefile
>> +++ b/xen/drivers/passthrough/arm/Makefile
>> @@ -1,3 +1,4 @@
>>  obj-y += iommu.o iommu_helpers.o iommu_fwspec.o
>>  obj-$(CONFIG_ARM_SMMU) += smmu.o
>>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
>> +obj-$(CONFIG_ARM_SMMU_V3) += smmu-v3.o
>> diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
>> index 2dfadc6a65..e53f9f4c21 100644
>> --- a/xen/drivers/passthrough/arm/smmu-v3.c
>> +++ b/xen/drivers/passthrough/arm/smmu-v3.c
>> @@ -2,37 +2,263 @@
>>  /*
>>   * IOMMU API for ARM architected SMMUv3 implementations.
>>   *
>> + * Based on Linux's SMMUv3 driver:
>> + *    drivers/iommu/arm-smmu-v3.c
>> + *    commit: ab435ce49bd1d02e33dfec24f76955dc1196970b
>> + * and Xen's SMMU driver:
>> + *    xen/drivers/passthrough/arm/smmu.c
>> + *
>> + * Major differences with regard to Linux driver are as follows:
>> + *  1. Driver is currently supported as Tech Preview.
>> + *  2. Only Stage-2 translation is supported as compared to the Linux driver
>> + *     that supports both Stage-1 and Stage-2 translations.
>> + *  3. Use P2M  page table instead of creating one as SMMUv3 has the
>> + *     capability to share the page tables with the CPU.
>> + *  4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>> + *     and priority queue IRQ handling.
>> + *  5. Latest version of the Linux SMMUv3 code implements the commands queue
>> + *     access functions based on atomic operations implemented in Linux.
>> + *     Atomic functions used by the commands queue access functions are not
>> + *     implemented in XEN therefore we decided to port the earlier version
>> + *     of the code. Atomic operations are introduced to fix the bottleneck of
>> + *     the SMMU command queue insertion operation. A new algorithm for
>> + *     inserting commands into the queue is introduced, which is
>> + *     lock-free on the fast-path.
>> + *     Consequence of reverting the patch is that the command queue insertion
>> + *     will be slow for large systems as spinlock will be used to serializes
>> + *     accesses from all CPUs to the single queue supported by the hardware.
>> + *     Once the proper atomic operations will be available in XEN the driver
>> + *     can be updated.
>> + *  6. Spin lock is used in place of Mutex when attaching a device to the SMMU,
>> + *     as there is no blocking locks implementation available in XEN.This might
>> + *     introduce latency in XEN. Need to investigate before driver is out for
>> + *     Tech Preview.
>> + *  7. PCI ATS functionality is not supported, as there is no support available
>> + *     in XEN to test the functionality. Code is not tested and compiled. Code
>> + *     is guarded by the flag CONFIG_PCI_ATS.
>> + *  8. MSI interrupts are not supported as there is no support available
>> + *     in XEN to request MSI interrupts. Code is not tested and compiled. Code
>> + *     is guarded by the flag CONFIG_MSI.
>> + *
>> + * Following functionality should be supported before driver is out for tech
>> + * preview
>> + *
>> + *  1. Investigate the timing analysis of using spin lock in place of mutex
>> + *     when attaching devices to SMMU.
>> + *  2. Merged the latest Linux SMMUv3 driver code once atomic operation is
>> + *     available in XEN.
>> + *  3. PCI ATS and MSI interrupts should be supported.
>> + *  4. Investigate side-effect of using tasklet in place of threaded IRQ and
>> + *     fix if any.
>> + *
>>   * Copyright (C) 2015 ARM Limited
>>   *
>>   * Author: Will Deacon <will.deacon@arm.com>
>>   *
>> - * This driver is powered by bad coffee and bombay mix.
>> + * Copyright (C) 2020 Arm Ltd
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + */
>> +
>> +#include <xen/acpi.h>
>> +#include <xen/config.h>
>> +#include <xen/delay.h>
>> +#include <xen/errno.h>
>> +#include <xen/err.h>
>> +#include <xen/irq.h>
>> +#include <xen/lib.h>
>> +#include <xen/list.h>
>> +#include <xen/mm.h>
>> +#include <xen/rbtree.h>
>> +#include <xen/sched.h>
>> +#include <xen/sizes.h>
>> +#include <xen/vmap.h>
>> +#include <asm/atomic.h>
>> +#include <asm/device.h>
>> +#include <asm/io.h>
>> +#include <asm/iommu_fwspec.h>
>> +#include <asm/platform.h>
>> +
>> +
>> +#define ARM_SMMU_VTCR_SH_IS		3
>> +#define ARM_SMMU_VTCR_RGN_WBWA		1
>> +#define ARM_SMMU_VTCR_TG0_4K		0
>> +#define ARM_SMMU_VTCR_PS_32_BIT		0x0ULL
>> +#define ARM_SMMU_VTCR_PS_36_BIT		0x1ULL
>> +#define ARM_SMMU_VTCR_PS_40_BIT		0x2ULL
>> +#define ARM_SMMU_VTCR_PS_42_BIT		0x3ULL
>> +#define ARM_SMMU_VTCR_PS_44_BIT		0x4ULL
>> +#define ARM_SMMU_VTCR_PS_48_BIT		0x5ULL
>> +#define ARM_SMMU_VTCR_PS_52_BIT		0x6ULL
>> +
>> +/* Linux compatibility functions. */
>> +typedef paddr_t		dma_addr_t;
>> +typedef paddr_t		phys_addr_t;
>> +typedef unsigned int		gfp_t;
>> +
>> +#define platform_device		device
>> +
>> +#define GFP_KERNEL		0
>> +
>> +/* Alias to Xen lock functions */
>> +#define mutex spinlock
>> +#define mutex_init spin_lock_init
>> +#define mutex_lock spin_lock
>> +#define mutex_unlock spin_unlock
>> +
>> +/* Device logger functions */
>> +#define dev_name(dev)	dt_node_full_name(dev->of_node)
>> +#define dev_dbg(dev, fmt, ...)			\
>> +	printk(XENLOG_DEBUG "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
>> +#define dev_notice(dev, fmt, ...)		\
>> +	printk(XENLOG_INFO "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
>> +#define dev_warn(dev, fmt, ...)			\
>> +	printk(XENLOG_WARNING "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
>> +#define dev_err(dev, fmt, ...)			\
>> +	printk(XENLOG_ERR "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
>> +#define dev_info(dev, fmt, ...)			\
>> +	printk(XENLOG_INFO "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
>> +#define dev_err_ratelimited(dev, fmt, ...)			\
>> +	printk(XENLOG_ERR "SMMUv3: %s: " fmt, dev_name(dev), ## __VA_ARGS__)
>> +
>> +/*
>> + * Periodically poll an address and wait between reads in us until a
>> + * condition is met or a timeout occurs.
>> + *
>> + * @return: 0 when cond met, -ETIMEDOUT upon timeout
>> + */
>> +#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \
>> +({ \
>> +	s_time_t deadline = NOW() + MICROSECS(timeout_us); \
>> +	for (;;) { \
>> +		(val) = op(addr); \
>> +		if (cond) \
>> +			break; \
>> +		if (NOW() > deadline) { \
>> +			(val) = op(addr); \
>> +			break; \
>> +		} \
>> +		udelay(sleep_us); \
>> +	} \
>> +	(cond) ? 0 : -ETIMEDOUT; \
>> +})
>> +
>> +#define readl_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us)	\
>> +	readx_poll_timeout(readl_relaxed, addr, val, cond, delay_us, timeout_us)
>> +
>> +#define FIELD_PREP(_mask, _val)			\
>> +	(((typeof(_mask))(_val) << (ffsll(_mask) - 1)) & (_mask))
>> +
>> +#define FIELD_GET(_mask, _reg)			\
>> +	((typeof(_mask))(((_reg) & (_mask)) >> (ffsll(_mask) - 1)))
>> +
>> +/*
>> + * Helpers for DMA allocation. Just the function name is reused for
>> + * porting code, these allocation are not managed allocations
>>   */
>> +static void *dmam_alloc_coherent(struct device *dev, size_t size,
>> +				paddr_t *dma_handle, gfp_t gfp)
>> +{
>> +	void *vaddr;
>> +	unsigned long alignment = size;
>> +
>> +	/*
>> +	 * _xzalloc requires that the (align & (align -1)) = 0. Most of the
>> +	 * allocations in SMMU code should send the right value for size. In
>> +	 * case this is not true print a warning and align to the size of a
>> +	 * (void *)
>> +	 */
>> +	if (size & (size - 1)) {
>> +		printk(XENLOG_WARNING "SMMUv3: Fixing alignment for the DMA buffer\n");
>> +		alignment = sizeof(void *);
>> +	}
>> +
>> +	vaddr = _xzalloc(size, alignment);
>> +	if (!vaddr) {
>> +		printk(XENLOG_ERR "SMMUv3: DMA allocation failed\n");
>> +		return NULL;
>> +	}
>> +
>> +	*dma_handle = virt_to_maddr(vaddr);
>> +
>> +	return vaddr;
>> +}
>> +
>> +
>> +/* Xen specific code. */
>> +struct iommu_domain {
>> +	/* Runtime SMMU configuration for this iommu_domain */
>> +	atomic_t		ref;
>> +	/*
>> +	 * Used to link iommu_domain contexts for a same domain.
>> +	 * There is at least one per-SMMU to used by the domain.
>> +	 */
>> +	struct list_head		list;
>> +};
>>  -#include <linux/acpi.h>
>> -#include <linux/acpi_iort.h>
>> -#include <linux/bitfield.h>
>> -#include <linux/bitops.h>
>> -#include <linux/crash_dump.h>
>> -#include <linux/delay.h>
>> -#include <linux/dma-iommu.h>
>> -#include <linux/err.h>
>> -#include <linux/interrupt.h>
>> -#include <linux/io-pgtable.h>
>> -#include <linux/iommu.h>
>> -#include <linux/iopoll.h>
>> -#include <linux/module.h>
>> -#include <linux/msi.h>
>> -#include <linux/of.h>
>> -#include <linux/of_address.h>
>> -#include <linux/of_iommu.h>
>> -#include <linux/of_platform.h>
>> -#include <linux/pci.h>
>> -#include <linux/pci-ats.h>
>> -#include <linux/platform_device.h>
>> -
>> -#include <linux/amba/bus.h>
>> +/* Describes information required for a Xen domain */
>> +struct arm_smmu_xen_domain {
>> +	spinlock_t		lock;
>>  +	/* List of iommu domains associated to this domain */
>> +	struct list_head	contexts;
>> +};
>> +
>> +
>> +/* Keep a list of devices associated with this driver */
>> +static DEFINE_SPINLOCK(arm_smmu_devices_lock);
>> +static LIST_HEAD(arm_smmu_devices);
>> +
>> +static inline void *dev_iommu_priv_get(struct device *dev)
>> +{
>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> +
>> +	return fwspec && fwspec->iommu_priv ? fwspec->iommu_priv : NULL;
>> +}
>> +
>> +static inline void dev_iommu_priv_set(struct device *dev, void *priv)
>> +{
>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> +
>> +	fwspec->iommu_priv = priv;
>> +}
>> +
>> +static int platform_get_irq_byname_optional(struct device *dev,
>> +				const char *name)
>> +{
>> +	int index, ret;
>> +	struct dt_device_node *np  = dev_to_dt(dev);
>> +
>> +	if (unlikely(!name))
>> +		return -EINVAL;
>> +
>> +	index = dt_property_match_string(np, "interrupt-names", name);
>> +	if (index < 0) {
>> +		dev_info(dev, "IRQ %s not found\n", name);
>> +		return index;
>> +	}
>> +
>> +	ret = platform_get_irq(np, index);
>> +	if (ret < 0) {
>> +		dev_err(dev, "failed to get irq index %d\n", index);
>> +		return -ENODEV;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +/* Start of Linux SMMUv3 code */
>>  /* MMIO registers */
>>  #define ARM_SMMU_IDR0			0x0
>>  #define IDR0_ST_LVL			GENMASK(28, 27)
>> @@ -402,6 +628,7 @@ enum pri_resp {
>>  	PRI_RESP_SUCC = 2,
>>  };
>>  +#ifdef CONFIG_MSI
>>  enum arm_smmu_msi_index {
>>  	EVTQ_MSI_INDEX,
>>  	GERROR_MSI_INDEX,
>> @@ -426,6 +653,7 @@ static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
>>  		ARM_SMMU_PRIQ_IRQ_CFG2,
>>  	},
>>  };
>> +#endif /* CONFIG_MSI */
>>    struct arm_smmu_cmdq_ent {
>>  	/* Common fields */
>> @@ -613,8 +841,13 @@ struct arm_smmu_device {
>>  		u64			padding;
>>  	};
>>  -	/* IOMMU core code handle */
>> -	struct iommu_device		iommu;
>> +	/* Need to keep a list of SMMU devices */
>> +	struct list_head		devices;
>> +
>> +	/* Tasklets for handling evts/faults and pci page request IRQs*/
>> +	struct tasklet		evtq_irq_tasklet;
>> +	struct tasklet		priq_irq_tasklet;
>> +	struct tasklet		combined_irq_tasklet;
>>  };
>>    /* SMMU private data for each master */
>> @@ -646,6 +879,9 @@ struct arm_smmu_domain {
>>  	enum arm_smmu_domain_stage	stage;
>>  	struct arm_smmu_s2_cfg	s2_cfg;
>>  +	/* Xen domain associated with this SMMU domain */
>> +	struct domain		*d;
>> +
>>  	struct iommu_domain		domain;
>>    	struct list_head		devices;
>> @@ -984,6 +1220,7 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
>>  }
>>  +#ifdef CONFIG_MSI
>>  /*
>>   * The difference between val and sync_idx is bounded by the maximum size of
>>   * a queue at 2^20 entries, so 32 bits is plenty for wrap-safe arithmetic.
>> @@ -1027,6 +1264,13 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
>>    	return __arm_smmu_sync_poll_msi(smmu, ent.sync.msidata);
>>  }
>> +#else
>> +static inline int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
>> +{
>> +	return 0;
>> +}
>> +#endif /* CONFIG_MSI */
>> +
>>    static int __arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>>  {
>> @@ -1069,7 +1313,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
>>  	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
>>    	/* See comment in arm_smmu_write_ctx_desc() */
>> -	WRITE_ONCE(*dst, cpu_to_le64(val));
>> +	write_atomic(dst, cpu_to_le64(val));
>>  }
>>    static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
>> @@ -1184,7 +1428,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>>  						 STRTAB_STE_1_EATS_TRANS));
>>    	arm_smmu_sync_ste_for_sid(smmu, sid);
>> -	WRITE_ONCE(dst[0], cpu_to_le64(val));
>> +	write_atomic(&dst[0], cpu_to_le64(val));
>>  	arm_smmu_sync_ste_for_sid(smmu, sid);
>>    	/* It's likely that we'll want to use the new STE soon */
>> @@ -1231,7 +1475,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>>  }
>>    /* IRQ and event handlers */
>> -static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>> +static void arm_smmu_evtq_tasklet(void *dev)
>>  {
>>  	int i;
>>  	struct arm_smmu_device *smmu = dev;
>> @@ -1261,7 +1505,6 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>>  	/* Sync our overflow flag, as we believe we're up to speed */
>>  	llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) |
>>  		    Q_IDX(llq, llq->cons);
>> -	return IRQ_HANDLED;
>>  }
>>    static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>> @@ -1302,7 +1545,7 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
>>  	}
>>  }
>>  -static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>> +static void arm_smmu_priq_tasklet(void *dev)
>>  {
>>  	struct arm_smmu_device *smmu = dev;
>>  	struct arm_smmu_queue *q = &smmu->priq.q;
>> @@ -1321,12 +1564,12 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>>  	llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) |
>>  		      Q_IDX(llq, llq->cons);
>>  	queue_sync_cons_out(q);
>> -	return IRQ_HANDLED;
>>  }
>>    static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
>>  -static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
>> +static void arm_smmu_gerror_handler(int irq, void *dev,
>> +				struct cpu_user_regs *regs)
>>  {
>>  	u32 gerror, gerrorn, active;
>>  	struct arm_smmu_device *smmu = dev;
>> @@ -1336,7 +1579,7 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
>>    	active = gerror ^ gerrorn;
>>  	if (!(active & GERROR_ERR_MASK))
>> -		return IRQ_NONE; /* No errors pending */
>> +		return; /* No errors pending */
>>    	dev_warn(smmu->dev,
>>  		 "unexpected global error reported (0x%08x), this could be serious\n",
>> @@ -1369,26 +1612,44 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
>>  		arm_smmu_cmdq_skip_err(smmu);
>>    	writel(gerror, smmu->base + ARM_SMMU_GERRORN);
>> -	return IRQ_HANDLED;
>>  }
>>  -static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev)
>> +static void arm_smmu_combined_irq_handler(int irq, void *dev,
>> +				struct cpu_user_regs *regs)
>> +{
>> +	struct arm_smmu_device *smmu = dev;
>> +
>> +	arm_smmu_gerror_handler(irq, dev, regs);
>> +
>> +	tasklet_schedule(&(smmu->combined_irq_tasklet));
>> +}
>> +
>> +static void arm_smmu_combined_irq_tasklet(void *dev)
>>  {
>>  	struct arm_smmu_device *smmu = dev;
>>  -	arm_smmu_evtq_thread(irq, dev);
>> +	arm_smmu_evtq_tasklet(dev);
>>  	if (smmu->features & ARM_SMMU_FEAT_PRI)
>> -		arm_smmu_priq_thread(irq, dev);
>> +		arm_smmu_priq_tasklet(dev);
>> +}
>> +
>> +static void arm_smmu_evtq_irq_tasklet(int irq, void *dev,
>> +				struct cpu_user_regs *regs)
>> +{
>> +	struct arm_smmu_device *smmu = dev;
>>  -	return IRQ_HANDLED;
>> +	tasklet_schedule(&(smmu->evtq_irq_tasklet));
>>  }
>>  -static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
>> +static void arm_smmu_priq_irq_tasklet(int irq, void *dev,
>> +				struct cpu_user_regs *regs)
>>  {
>> -	arm_smmu_gerror_handler(irq, dev);
>> -	return IRQ_WAKE_THREAD;
>> +	struct arm_smmu_device *smmu = dev;
>> +
>> +	tasklet_schedule(&(smmu->priq_irq_tasklet));
>>  }
>>  +#ifdef CONFIG_PCI_ATS
>>  static void
>>  arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
>>  			struct arm_smmu_cmdq_ent *cmd)
>> @@ -1495,6 +1756,7 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
>>    	return ret ? -ETIMEDOUT : 0;
>>  }
>> +#endif /* CONFIG_PCI_ATS */
>>    static void arm_smmu_tlb_inv_context(void *cookie)
>>  {
>> @@ -1525,7 +1787,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(void)
>>  	 * We can't really do anything meaningful until we've added a
>>  	 * master.
>>  	 */
>> -	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
>> +	smmu_domain = xzalloc(struct arm_smmu_domain);
>>  	if (!smmu_domain)
>>  		return NULL;
>>  @@ -1563,7 +1825,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
>>  	if (cfg->vmid)
>>  		arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid);
>>  -	kfree(smmu_domain);
>> +	xfree(smmu_domain);
>>  }
>>    @@ -1576,6 +1838,44 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>>  	struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
>>  	typeof(&arm_lpae_s2_cfg.vtcr) vtcr = &arm_lpae_s2_cfg.vtcr;
>>  +	vtcr->sh = ARM_SMMU_VTCR_SH_IS;
>> +	vtcr->irgn = ARM_SMMU_VTCR_RGN_WBWA;
>> +	vtcr->orgn = ARM_SMMU_VTCR_RGN_WBWA;
>> +
>> +	BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
>> +	vtcr->tg = ARM_SMMU_VTCR_TG0_4K;
>> +
>> +	switch (smmu->oas) {
>> +	case 32:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_32_BIT;
>> +		break;
>> +	case 36:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_36_BIT;
>> +		break;
>> +	case 40:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_40_BIT;
>> +		break;
>> +	case 42:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_42_BIT;
>> +		break;
>> +	case 44:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_44_BIT;
>> +		break;
>> +	case 48:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_48_BIT;
>> +		break;
>> +	case 52:
>> +		vtcr->ps = ARM_SMMU_VTCR_PS_52_BIT;
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	vtcr->tsz = 64 - p2m_ipa_bits;
>> +	vtcr->sl = 2 - P2M_ROOT_LEVEL;
>> +
>> +	arm_lpae_s2_cfg.vttbr  = page_to_maddr(smmu_domain->d->arch.p2m.root);
>> +
>>  	vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits);
>>  	if (vmid < 0)
>>  		return vmid;
>> @@ -1589,6 +1889,11 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
>>  			  FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) |
>>  			  FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) |
>>  			  FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps);
>> +
>> +	printk(XENLOG_DEBUG
>> +		   "SMMUv3: d%u: vmid 0x%x vtcr 0x%"PRIpaddr" p2maddr 0x%"PRIpaddr"\n",
>> +		   smmu_domain->d->domain_id, cfg->vmid, cfg->vtcr, cfg->vttbr);
>> +
>>  	return 0;
>>  }
>>  @@ -1650,6 +1955,7 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
>>  	}
>>  }
>>  +#ifdef CONFIG_PCI_ATS
>>  static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
>>  {
>>  	struct device *dev = master->dev;
>> @@ -1748,6 +2054,23 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
>>    	pci_disable_pasid(pdev);
>>  }
>> +#else
>> +static inline bool arm_smmu_ats_supported(struct arm_smmu_master *master)
>> +{
>> +	return false;
>> +}
>> +
>> +static inline void arm_smmu_enable_ats(struct arm_smmu_master *master) { }
>> +
>> +static inline void arm_smmu_disable_ats(struct arm_smmu_master *master) { }
>> +
>> +static inline int arm_smmu_enable_pasid(struct arm_smmu_master *master)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline void arm_smmu_disable_pasid(struct arm_smmu_master *master) { }
>> +#endif /* CONFIG_PCI_ATS */
>>    static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>>  {
>> @@ -1830,8 +2153,10 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
>>    	return sid < limit;
>>  }
>> +/* Forward declaration */
>> +static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev);
>>  -static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>> +static int arm_smmu_add_device(u8 devfn, struct device *dev)
>>  {
>>  	int i, ret;
>>  	struct arm_smmu_device *smmu;
>> @@ -1839,14 +2164,15 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>>    	if (!fwspec)
>> -		return ERR_PTR(-ENODEV);
>> +		return -ENODEV;
>>  -	if (WARN_ON_ONCE(dev_iommu_priv_get(dev)))
>> -		return ERR_PTR(-EBUSY);
>> +	smmu = arm_smmu_get_by_dev(fwspec->iommu_dev);
>> +	if (!smmu)
>> +		return -ENODEV;
>>  -	master = kzalloc(sizeof(*master), GFP_KERNEL);
>> +	master = xzalloc(struct arm_smmu_master);
>>  	if (!master)
>> -		return ERR_PTR(-ENOMEM);
>> +		return -ENOMEM;
>>    	master->dev = dev;
>>  	master->smmu = smmu;
>> @@ -1881,17 +2207,36 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  	 */
>>  	arm_smmu_enable_pasid(master);
>>  -	return &smmu->iommu;
>> +	return 0;
>>    err_free_master:
>> -	kfree(master);
>> +	xfree(master);
>>  	dev_iommu_priv_set(dev, NULL);
>> -	return ERR_PTR(ret);
>> +	return ret;
>>  }
>>  -static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>> +static int arm_smmu_dt_xlate(struct device *dev,
>> +				const struct dt_phandle_args *args)
>>  {
>> -	return iommu_fwspec_add_ids(dev, args->args, 1);
>> +	int ret;
>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> Please bear in mind I am not familiar with the SMMU, but don't we need to perform a some kind
> of sanity check of passed DT IOMMU specifier here?

I checked the code follow we will never hit the dt_xlate without IOMMU specifier but anyway I will add the sanity check. 
> 
> 
>> +
>> +	ret = iommu_fwspec_add_ids(dev, args->args, 1);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (dt_device_is_protected(dev_to_dt(dev))) {
>> +		dev_err(dev, "Already added to SMMUv3\n");
>> +		return -EEXIST;
>> +	}
>> +
>> +	/* Let Xen know that the master device is protected by an IOMMU. */
>> +	dt_device_set_protected(dev_to_dt(dev));
>> +
>> +	dev_info(dev, "Added master device (SMMUv3 %s StreamIds %u)\n",
>> +			dev_name(fwspec->iommu_dev), fwspec->num_ids);
>> +
>> +	return 0;
>>  }
>>    /* Probing and initialisation functions */
>> @@ -1972,7 +2317,7 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
>>  	size_t size = sizeof(*cfg->l1_desc) * cfg->num_l1_ents;
>>  	void *strtab = smmu->strtab_cfg.strtab;
>>  -	cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
>> +	cfg->l1_desc = _xzalloc(size, sizeof(void *));
>>  	if (!cfg->l1_desc) {
>>  		dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");
>>  		return -ENOMEM;
>> @@ -2118,6 +2463,7 @@ static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
>>  	return ret;
>>  }
>>  +#ifdef CONFIG_MSI
>>  static void arm_smmu_free_msis(void *data)
>>  {
>>  	struct device *dev = data;
>> @@ -2188,6 +2534,9 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
>>  	/* Add callback to free MSIs on teardown */
>>  	devm_add_action(dev, arm_smmu_free_msis, dev);
>>  }
>> +#else
>> +static inline void arm_smmu_setup_msis(struct arm_smmu_device *smmu) { }
>> +#endif /* CONFIG_MSI */
>>    static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>>  {
>> @@ -2198,9 +2547,7 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>>  	/* Request interrupt lines */
>>  	irq = smmu->evtq.q.irq;
>>  	if (irq) {
>> -		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
>> -						arm_smmu_evtq_thread,
>> -						IRQF_ONESHOT,
>> +		ret = request_irq(irq, 0, arm_smmu_evtq_irq_tasklet,
>>  						"arm-smmu-v3-evtq", smmu);
>>  		if (ret < 0)
>>  			dev_warn(smmu->dev, "failed to enable evtq irq\n");
>> @@ -2210,8 +2557,8 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>>    	irq = smmu->gerr_irq;
>>  	if (irq) {
>> -		ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
>> -				       0, "arm-smmu-v3-gerror", smmu);
>> +		ret = request_irq(irq, 0, arm_smmu_gerror_handler,
>> +						"arm-smmu-v3-gerror", smmu);
>>  		if (ret < 0)
>>  			dev_warn(smmu->dev, "failed to enable gerror irq\n");
>>  	} else {
>> @@ -2221,11 +2568,8 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
>>  	if (smmu->features & ARM_SMMU_FEAT_PRI) {
>>  		irq = smmu->priq.q.irq;
>>  		if (irq) {
>> -			ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
>> -							arm_smmu_priq_thread,
>> -							IRQF_ONESHOT,
>> -							"arm-smmu-v3-priq",
>> -							smmu);
>> +			ret = request_irq(irq, 0, arm_smmu_priq_irq_tasklet,
>> +							"arm-smmu-v3-priq", smmu);
>>  			if (ret < 0)
>>  				dev_warn(smmu->dev,
>>  					 "failed to enable priq irq\n");
>> @@ -2254,11 +2598,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
>>  		 * Cavium ThunderX2 implementation doesn't support unique irq
>>  		 * lines. Use a single irq line for all the SMMUv3 interrupts.
>>  		 */
>> -		ret = devm_request_threaded_irq(smmu->dev, irq,
>> -					arm_smmu_combined_irq_handler,
>> -					arm_smmu_combined_irq_thread,
>> -					IRQF_ONESHOT,
>> -					"arm-smmu-v3-combined-irq", smmu);
>> +		ret = request_irq(irq, 0, arm_smmu_combined_irq_handler,
>> +						"arm-smmu-v3-combined-irq", smmu);
>>  		if (ret < 0)
>>  			dev_warn(smmu->dev, "failed to enable combined irq\n");
>>  	} else
>> @@ -2287,7 +2628,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
>>  	return ret;
>>  }
>>  -static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>> +static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
>>  {
>>  	int ret;
>>  	u32 reg, enables;
>> @@ -2297,7 +2638,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>>  	if (reg & CR0_SMMUEN) {
>>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>> -		WARN_ON(is_kdump_kernel() && !disable_bypass);
>> +		WARN_ON(!disable_bypass);
>>  		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>>  	}
>>  @@ -2401,11 +2742,14 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  		return ret;
>>  	}
>>  -	if (is_kdump_kernel())
>> -		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>> +	/* Initialize tasklets for threaded IRQs*/
>> +	tasklet_init(&smmu->evtq_irq_tasklet, arm_smmu_evtq_tasklet, smmu);
>> +	tasklet_init(&smmu->priq_irq_tasklet, arm_smmu_priq_tasklet, smmu);
>> +	tasklet_init(&smmu->combined_irq_tasklet, arm_smmu_combined_irq_tasklet,
>> +				 smmu);
>>    	/* Enable the SMMU interface, or ensure bypass */
>> -	if (!bypass || disable_bypass) {
>> +	if (disable_bypass) {
>>  		enables |= CR0_SMMUEN;
>>  	} else {
>>  		ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT);
>> @@ -2470,8 +2814,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>>  	if (reg & IDR0_SEV)
>>  		smmu->features |= ARM_SMMU_FEAT_SEV;
>>  +#ifdef CONFIG_MSI
>>  	if (reg & IDR0_MSI)
>>  		smmu->features |= ARM_SMMU_FEAT_MSI;
>> +#endif
>>    	if (reg & IDR0_HYP)
>>  		smmu->features |= ARM_SMMU_FEAT_HYP;
>> @@ -2496,7 +2842,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>>  		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
>>    	if (!(reg & IDR0_S2P)) {
>> -		dev_err(smmu->dev, "no translation support!\n");
>> +		dev_err(smmu->dev, "no stage-2 translation support!\n");
>>  		return -ENXIO;
>>  	}
>>  @@ -2593,8 +2939,12 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>>  		smmu->oas = 48;
>>  	}
>>  +	smmu->oas = min_t(unsigned long, PADDR_BITS, smmu->oas);
>>  	smmu->ias = max(smmu->ias, smmu->oas);
>>  +	/* Xen: Set maximum Stage-2 input size supported by the SMMU. */
>> +	p2m_restrict_ipa_bits(smmu->ias);
>> +
>>  	dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n",
>>  		 smmu->ias, smmu->oas, smmu->features);
>>  	return 0;
>> @@ -2645,7 +2995,7 @@ static inline int arm_smmu_device_acpi_probe(struct platform_device *pdev,
>>  static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>>  				    struct arm_smmu_device *smmu)
>>  {
>> -	struct device *dev = &pdev->dev;
>> +	struct device *dev = pdev;
>>  	u32 cells;
>>  	int ret = -EINVAL;
>>  @@ -2658,7 +3008,7 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
>>    	parse_driver_options(smmu);
>>  -	if (of_dma_is_coherent(dev->of_node))
>> +	if (dt_get_property(dev->of_node, "dma-coherent", NULL))
>>  		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
>>    	return ret;
>> @@ -2672,63 +3022,49 @@ static unsigned long arm_smmu_resource_size(struct arm_smmu_device *smmu)
>>  		return SZ_128K;
>>  }
>>  -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start,
>> -				      resource_size_t size)
>> -{
>> -	struct resource res = {
>> -		.flags = IORESOURCE_MEM,
>> -		.start = start,
>> -		.end = start + size - 1,
>> -	};
>> -
>> -	return devm_ioremap_resource(dev, &res);
>> -}
>> -
>>  static int arm_smmu_device_probe(struct platform_device *pdev)
>>  {
>>  	int irq, ret;
>> -	struct resource *res;
>> -	resource_size_t ioaddr;
>> +	paddr_t ioaddr, iosize;
>>  	struct arm_smmu_device *smmu;
>> -	struct device *dev = &pdev->dev;
>> -	bool bypass;
>>  -	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>> +	smmu = xzalloc(struct arm_smmu_device);
>>  	if (!smmu) {
>> -		dev_err(dev, "failed to allocate arm_smmu_device\n");
>> +		dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>  		return -ENOMEM;
>>  	}
>> -	smmu->dev = dev;
>> +	smmu->dev = pdev;
>>  -	if (dev->of_node) {
>> +	if (pdev->of_node) {
>>  		ret = arm_smmu_device_dt_probe(pdev, smmu);
>> +		if (ret)
>> +			return -EINVAL;
>>  	} else {
>>  		ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>  		if (ret == -ENODEV)
>>  			return ret;
>>  	}
>>  -	/* Set bypass mode according to firmware probing result */
>> -	bypass = !!ret;
>> -
>>  	/* Base address */
>> -	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -	if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>> -		dev_err(dev, "MMIO region too small (%pr)\n", res);
>> +	ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>> +	if (ret)
>> +		return -ENODEV;
>> +
>> +	if (iosize < arm_smmu_resource_size(smmu)) {
>> +		dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>  		return -EINVAL;
>>  	}
>> -	ioaddr = res->start;
>>    	/*
>>  	 * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>> -	 * the PMCG registers which are reserved by the PMU driver.
>> +	 * the PMCG registers which are optional and currently not supported.
>>  	 */
>> -	smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>> +	smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>  	if (IS_ERR(smmu->base))
>>  		return PTR_ERR(smmu->base);
>>  -	if (arm_smmu_resource_size(smmu) > SZ_64K) {
>> -		smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>> +	if (iosize > SZ_64K) {
>> +		smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>  					       ARM_SMMU_REG_SZ);
>>  		if (IS_ERR(smmu->page1))
>>  			return PTR_ERR(smmu->page1);
>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>  		return ret;
>>    	/* Reset the device */
>> -	ret = arm_smmu_device_reset(smmu, bypass);
>> +	ret = arm_smmu_device_reset(smmu);
>>  	if (ret)
>>  		return ret;
>>  +	/*
>> +	 * Keep a list of all probed devices. This will be used to query
>> +	 * the smmu devices based on the fwnode.
>> +	 */
>> +	INIT_LIST_HEAD(&smmu->devices);
>> +
>> +	spin_lock(&arm_smmu_devices_lock);
>> +	list_add(&smmu->devices, &arm_smmu_devices);
>> +	spin_unlock(&arm_smmu_devices_lock);
>> +
>>  	return 0;
>>  }
>>  -static const struct of_device_id arm_smmu_of_match[] = {
>> +static const struct dt_device_match arm_smmu_of_match[] = {
>>  	{ .compatible = "arm,smmu-v3", },
>>  	{ },
>>  };
>> +
>> +/* Start of Xen specific code. */
>> +static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
>> +{
>> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
>> +	struct iommu_domain *io_domain;
>> +
>> +	spin_lock(&xen_domain->lock);
>> +
>> +	list_for_each_entry(io_domain, &xen_domain->contexts, list) {
>> +		/*
>> +		 * Only invalidate the context when SMMU is present.
>> +		 * This is because the context initialization is delayed
>> +		 * until a master has been added.
>> +		 */
>> +		if (unlikely(!ACCESS_ONCE(to_smmu_domain(io_domain)->smmu)))
>> +			continue;
>> +
>> +		arm_smmu_tlb_inv_context(to_smmu_domain(io_domain));
>> +	}
>> +
>> +	spin_unlock(&xen_domain->lock);
>> +
>> +	return 0;
>> +}
>> +
>> +static int __must_check arm_smmu_iotlb_flush(struct domain *d, dfn_t dfn,
>> +				unsigned long page_count, unsigned int flush_flags)
>> +{
>> +	return arm_smmu_iotlb_flush_all(d);
>> +}
>> +
>> +static struct arm_smmu_device *arm_smmu_get_by_dev(struct device *dev)
>> +{
>> +	struct arm_smmu_device *smmu = NULL;
>> +
>> +	spin_lock(&arm_smmu_devices_lock);
>> +
>> +	list_for_each_entry(smmu, &arm_smmu_devices, devices) {
>> +		if (smmu->dev  == dev) {
>> +			spin_unlock(&arm_smmu_devices_lock);
>> +			return smmu;
>> +		}
>> +	}
>> +
>> +	spin_unlock(&arm_smmu_devices_lock);
>> +
>> +	return NULL;
>> +}
>> +
>> +static struct iommu_domain *arm_smmu_get_domain(struct domain *d,
>> +				struct device *dev)
>> +{
>> +	struct iommu_domain *io_domain;
>> +	struct arm_smmu_domain *smmu_domain;
>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
>> +	struct arm_smmu_device *smmu = arm_smmu_get_by_dev(fwspec->iommu_dev);
>> +
>> +	if (!smmu)
>> +		return NULL;
>> +
>> +	/*
>> +	 * Loop through the &xen_domain->contexts to locate a context
>> +	 * assigned to this SMMU
>> +	 */
>> +	list_for_each_entry(io_domain, &xen_domain->contexts, list) {
>> +		smmu_domain = to_smmu_domain(io_domain);
>> +		if (smmu_domain->smmu == smmu)
>> +			return io_domain;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static void arm_smmu_destroy_iommu_domain(struct iommu_domain *io_domain)
>> +{
>> +	list_del(&io_domain->list);
>> +	arm_smmu_domain_free(io_domain);
>> +}
>> +
>> +static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
>> +		struct device *dev, u32 flag)
>> +{
>> +	int ret = 0;
>> +	struct iommu_domain *io_domain;
>> +	struct arm_smmu_domain *smmu_domain;
>> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
>> +
>> +	spin_lock(&xen_domain->lock);
>> +
>> +	/*
>> +	 * Check to see if an iommu_domain already exists for this xen domain
>> +	 * under the same SMMU
>> +	 */
>> +	io_domain = arm_smmu_get_domain(d, dev);
>> +	if (!io_domain) {
>> +		io_domain = arm_smmu_domain_alloc();
>> +		if (!io_domain) {
>> +			ret = -ENOMEM;
>> +			goto out;
>> +		}
>> +		smmu_domain = to_smmu_domain(io_domain);
>> +		smmu_domain->d = d;
>> +
>> +		/* Chain the new context to the domain */
>> +		list_add(&io_domain->list, &xen_domain->contexts);
>> +	}
>> +
>> +	ret = arm_smmu_attach_dev(io_domain, dev);
>> +	if (ret) {
>> +		if (io_domain->ref.counter == 0)
>> +			arm_smmu_destroy_iommu_domain(io_domain);
>> +	} else {
>> +		atomic_inc(&io_domain->ref);
>> +	}
>> +
>> +out:
>> +	spin_unlock(&xen_domain->lock);
>> +	return ret;
>> +}
>> +
>> +static int arm_smmu_deassign_dev(struct domain *d, struct device *dev)
>> +{
>> +	struct iommu_domain *io_domain = arm_smmu_get_domain(d, dev);
>> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
>> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(io_domain);
>> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>> +
>> +	if (!smmu_domain || smmu_domain->d != d) {
>> +		dev_err(dev, " not attached to domain %d\n", d->domain_id);
>> +		return -ESRCH;
>> +	}
>> +
>> +	spin_lock(&xen_domain->lock);
>> +
>> +	arm_smmu_detach_dev(master);
>> +	atomic_dec(&io_domain->ref);
>> +
>> +	if (io_domain->ref.counter == 0)
>> +		arm_smmu_destroy_iommu_domain(io_domain);
>> +
>> +	spin_unlock(&xen_domain->lock);
>> +
>> +	return 0;
>> +}
>> +
>> +static int arm_smmu_reassign_dev(struct domain *s, struct domain *t,
>> +				u8 devfn,  struct device *dev)
>> +{
>> +	int ret = 0;
>> +
>> +	/* Don't allow remapping on other domain than hwdom */
>> +	if (t && t != hardware_domain)
>> +		return -EPERM;
>> +
>> +	if (t == s)
>> +		return 0;
>> +
>> +	ret = arm_smmu_deassign_dev(s, dev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (t) {
>> +		/* No flags are defined for ARM. */
>> +		ret = arm_smmu_assign_dev(t, devfn, dev, 0);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int arm_smmu_iommu_xen_domain_init(struct domain *d)
>> +{
>> +	struct arm_smmu_xen_domain *xen_domain;
>> +
>> +	xen_domain = xzalloc(struct arm_smmu_xen_domain);
>> +	if (!xen_domain)
>> +		return -ENOMEM;
>> +
>> +	spin_lock_init(&xen_domain->lock);
>> +	INIT_LIST_HEAD(&xen_domain->contexts);
>> +
>> +	dom_iommu(d)->arch.priv = xen_domain;
>> +	return 0;
>> +
>> +}
>> +
>> +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
>> +{
> 
> Both SMMUv2 and IPMMU perform some actions here. Any reason we don't need to do the same here?
> 
>     /* Set to false options not supported on ARM. */
>     if ( iommu_hwdom_inclusive )
>         printk(XENLOG_WARNING
>         "map-inclusive dom0-iommu option is not supported on ARM\n");
>     iommu_hwdom_inclusive = false;
>     if ( iommu_hwdom_reserved == 1 )
>         printk(XENLOG_WARNING
>         "map-reserved dom0-iommu option is not supported on ARM\n");
>     iommu_hwdom_reserved = 0;
> 
>     arch_iommu_hwdom_init(d);

I will add the above code for SMMUv3 also.

Regards,
Rahul

>> +}
>> +
>> +static void arm_smmu_iommu_xen_domain_teardown(struct domain *d)
>> +{
>> +	struct arm_smmu_xen_domain *xen_domain = dom_iommu(d)->arch.priv;
>> +
>> +	ASSERT(list_empty(&xen_domain->contexts));
>> +	xfree(xen_domain);
>> +}
>> +
>> +static const struct iommu_ops arm_smmu_iommu_ops = {
>> +	.init		= arm_smmu_iommu_xen_domain_init,
>> +	.hwdom_init		= arm_smmu_iommu_hwdom_init,
>> +	.teardown		= arm_smmu_iommu_xen_domain_teardown,
>> +	.iotlb_flush		= arm_smmu_iotlb_flush,
>> +	.iotlb_flush_all	= arm_smmu_iotlb_flush_all,
>> +	.assign_device		= arm_smmu_assign_dev,
>> +	.reassign_device	= arm_smmu_reassign_dev,
>> +	.map_page		= arm_iommu_map_page,
>> +	.unmap_page		= arm_iommu_unmap_page,
>> +	.dt_xlate		= arm_smmu_dt_xlate,
>> +	.add_device		= arm_smmu_add_device,
>> +};
>> +
>> +static __init int arm_smmu_dt_init(struct dt_device_node *dev,
>> +				const void *data)
>> +{
>> +	int rc;
>> +
>> +	/*
>> +	 * Even if the device can't be initialized, we don't want to
>> +	 * give the SMMU device to dom0.
>> +	 */
>> +	dt_device_set_used_by(dev, DOMID_XEN);
>> +
>> +	rc = arm_smmu_device_probe(dt_to_dev(dev));
>> +	if (rc)
>> +		return rc;
>> +
>> +	iommu_set_ops(&arm_smmu_iommu_ops);
>> +
>> +	return 0;
>> +}
>> +
>> +DT_DEVICE_START(smmuv3, "ARM SMMU V3", DEVICE_IOMMU)
>> +.dt_match = arm_smmu_of_match,
>> +.init = arm_smmu_dt_init,
>> +DT_DEVICE_END
> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko

Oleksandr Tyshchenko Jan. 12, 2021, 8:59 p.m. UTC | #5

On 12.01.21 11:41, Rahul Singh wrote:

Hi Rahul


>
>>>   -static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>>> +static int arm_smmu_dt_xlate(struct device *dev,
>>> +				const struct dt_phandle_args *args)
>>>   {
>>> -	return iommu_fwspec_add_ids(dev, args->args, 1);
>>> +	int ret;
>>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> Please bear in mind I am not familiar with the SMMU, but don't we need to perform a some kind
>> of sanity check of passed DT IOMMU specifier here?
> I checked the code follow we will never hit the dt_xlate without IOMMU specifier but anyway I will add the sanity check.
By sanity check I meant to make sure that device ID (stream ID) is in 
allowed range (of course, if this is relevant for SMMU).
For example, for IPMMU-VMSA we have a check that device ID (uTLB) is 
less than max uTLB number.


>   
>
>
>>> +
>>> +static int arm_smmu_iommu_xen_domain_init(struct domain *d)
>>> +{
>>> +	struct arm_smmu_xen_domain *xen_domain;
>>> +
>>> +	xen_domain = xzalloc(struct arm_smmu_xen_domain);
>>> +	if (!xen_domain)
>>> +		return -ENOMEM;
>>> +
>>> +	spin_lock_init(&xen_domain->lock);
>>> +	INIT_LIST_HEAD(&xen_domain->contexts);
>>> +
>>> +	dom_iommu(d)->arch.priv = xen_domain;
>>> +	return 0;
>>> +
>>> +}
>>> +
>>> +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
>>> +{
>> Both SMMUv2 and IPMMU perform some actions here. Any reason we don't need to do the same here?
>>
>>      /* Set to false options not supported on ARM. */
>>      if ( iommu_hwdom_inclusive )
>>          printk(XENLOG_WARNING
>>          "map-inclusive dom0-iommu option is not supported on ARM\n");
>>      iommu_hwdom_inclusive = false;
>>      if ( iommu_hwdom_reserved == 1 )
>>          printk(XENLOG_WARNING
>>          "map-reserved dom0-iommu option is not supported on ARM\n");
>>      iommu_hwdom_reserved = 0;
>>
>>      arch_iommu_hwdom_init(d);
> I will add the above code for SMMUv3 also.

Great.

I was thinking about it, this is the third IOMMU driver on Arm which has 
to disable the _same_ unsupported options, probably this code wants to 
be folded in arch_iommu_hwdom_init() to avoid duplication?

Julien Grall Jan. 15, 2021, 12:37 p.m. UTC | #6

Hi Rahul,

On 08/01/2021 14:46, Rahul Singh wrote:
> Add support for ARM architected SMMUv3 implementation. It is based on
> the Linux SMMUv3 driver.
> 
> Driver is currently supported as Tech Preview.
> 
> Major differences with regard to Linux driver are as follows:
> 2. Only Stage-2 translation is supported as compared to the Linux driver
>     that supports both Stage-1 and Stage-2 translations.
> 3. Use P2M  page table instead of creating one as SMMUv3 has the
>     capability to share the page tables with the CPU.
> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>     and priority queue IRQ handling.
> 5. Latest version of the Linux SMMUv3 code implements the commands queue
>     access functions based on atomic operations implemented in Linux.
>     Atomic functions used by the commands queue access functions are not
>     implemented in XEN therefore we decided to port the earlier version
>     of the code. Atomic operations are introduced to fix the bottleneck
>     of the SMMU command queue insertion operation. A new algorithm for
>     inserting commands into the queue is introduced, which is lock-free
>     on the fast-path.
>     Consequence of reverting the patch is that the command queue
>     insertion will be slow for large systems as spinlock will be used to
>     serializes accesses from all CPUs to the single queue supported by
>     the hardware. Once the proper atomic operations will be available in
>     XEN the driver can be updated.
> 6. Spin lock is used in place of mutex when attaching a device to the
>     SMMU, as there is no blocking locks implementation available in XEN.
>     This might introduce latency in XEN. Need to investigate before
>     driver is out for tech preview.
> 7. PCI ATS functionality is not supported, as there is no support
>     available in XEN to test the functionality. Code is not tested and
>     compiled. Code is guarded by the flag CONFIG_PCI_ATS.
> 8. MSI interrupts are not supported as there is no support available in
>     XEN to request MSI interrupts. Code is not tested and compiled. Code
>     is guarded by the flag CONFIG_MSI.
> 
> Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Aside what Oleksandr pointed out, this code looks ok to me.

Cheers,

Rahul Singh Jan. 15, 2021, 12:38 p.m. UTC | #7

Hello Oleksandr,

> On 12 Jan 2021, at 8:59 pm, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> On 12.01.21 11:41, Rahul Singh wrote:
> 
> Hi Rahul
> 
> 
>> 
>>>>  -static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>>>> +static int arm_smmu_dt_xlate(struct device *dev,
>>>> +				const struct dt_phandle_args *args)
>>>>  {
>>>> -	return iommu_fwspec_add_ids(dev, args->args, 1);
>>>> +	int ret;
>>>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>>> Please bear in mind I am not familiar with the SMMU, but don't we need to perform a some kind
>>> of sanity check of passed DT IOMMU specifier here?
>> I checked the code follow we will never hit the dt_xlate without IOMMU specifier but anyway I will add the sanity check.
> By sanity check I meant to make sure that device ID (stream ID) is in allowed range (of course, if this is relevant for SMMU).
> For example, for IPMMU-VMSA we have a check that device ID (uTLB) is less than max uTLB number.

Sorry I misunderstood your previous comments. Yes SMMUv3 driver is performing the sanity check for Stream Id before configuring the hardware in function arm_smmu_sid_in_range(). 
> 
>>  
>> 
>>>> +
>>>> +static int arm_smmu_iommu_xen_domain_init(struct domain *d)
>>>> +{
>>>> +	struct arm_smmu_xen_domain *xen_domain;
>>>> +
>>>> +	xen_domain = xzalloc(struct arm_smmu_xen_domain);
>>>> +	if (!xen_domain)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	spin_lock_init(&xen_domain->lock);
>>>> +	INIT_LIST_HEAD(&xen_domain->contexts);
>>>> +
>>>> +	dom_iommu(d)->arch.priv = xen_domain;
>>>> +	return 0;
>>>> +
>>>> +}
>>>> +
>>>> +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
>>>> +{
>>> Both SMMUv2 and IPMMU perform some actions here. Any reason we don't need to do the same here?
>>> 
>>>     /* Set to false options not supported on ARM. */
>>>     if ( iommu_hwdom_inclusive )
>>>         printk(XENLOG_WARNING
>>>         "map-inclusive dom0-iommu option is not supported on ARM\n");
>>>     iommu_hwdom_inclusive = false;
>>>     if ( iommu_hwdom_reserved == 1 )
>>>         printk(XENLOG_WARNING
>>>         "map-reserved dom0-iommu option is not supported on ARM\n");
>>>     iommu_hwdom_reserved = 0;
>>> 
>>>     arch_iommu_hwdom_init(d);
>> I will add the above code for SMMUv3 also.
> 
> Great.
> 
> I was thinking about it, this is the third IOMMU driver on Arm which has to disable the _same_ unsupported options, probably this code wants to be folded in arch_iommu_hwdom_init() to avoid duplication?

Yes I also agree with you to avoid duplication we can move the come code to the function arch_iommu_hwdom_init().
I will submit the patch(not part of this series)  if everyone is ok to move the common code to arch_iommu_hwdom_init().

Regards,
Rahul
 
> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko
>

Julien Grall Jan. 15, 2021, 12:40 p.m. UTC | #8

On 15/01/2021 12:38, Rahul Singh wrote:
> Hello Oleksandr,
> 
>> On 12 Jan 2021, at 8:59 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>
>>
>> On 12.01.21 11:41, Rahul Singh wrote:
>>
>> Hi Rahul
>>
>>
>>>
>>>>>   -static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>>>>> +static int arm_smmu_dt_xlate(struct device *dev,
>>>>> +				const struct dt_phandle_args *args)
>>>>>   {
>>>>> -	return iommu_fwspec_add_ids(dev, args->args, 1);
>>>>> +	int ret;
>>>>> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>>>> Please bear in mind I am not familiar with the SMMU, but don't we need to perform a some kind
>>>> of sanity check of passed DT IOMMU specifier here?
>>> I checked the code follow we will never hit the dt_xlate without IOMMU specifier but anyway I will add the sanity check.
>> By sanity check I meant to make sure that device ID (stream ID) is in allowed range (of course, if this is relevant for SMMU).
>> For example, for IPMMU-VMSA we have a check that device ID (uTLB) is less than max uTLB number.
> 
> Sorry I misunderstood your previous comments. Yes SMMUv3 driver is performing the sanity check for Stream Id before configuring the hardware in function arm_smmu_sid_in_range().
>>
>>>   
>>>
>>>>> +
>>>>> +static int arm_smmu_iommu_xen_domain_init(struct domain *d)
>>>>> +{
>>>>> +	struct arm_smmu_xen_domain *xen_domain;
>>>>> +
>>>>> +	xen_domain = xzalloc(struct arm_smmu_xen_domain);
>>>>> +	if (!xen_domain)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	spin_lock_init(&xen_domain->lock);
>>>>> +	INIT_LIST_HEAD(&xen_domain->contexts);
>>>>> +
>>>>> +	dom_iommu(d)->arch.priv = xen_domain;
>>>>> +	return 0;
>>>>> +
>>>>> +}
>>>>> +
>>>>> +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
>>>>> +{
>>>> Both SMMUv2 and IPMMU perform some actions here. Any reason we don't need to do the same here?
>>>>
>>>>      /* Set to false options not supported on ARM. */
>>>>      if ( iommu_hwdom_inclusive )
>>>>          printk(XENLOG_WARNING
>>>>          "map-inclusive dom0-iommu option is not supported on ARM\n");
>>>>      iommu_hwdom_inclusive = false;
>>>>      if ( iommu_hwdom_reserved == 1 )
>>>>          printk(XENLOG_WARNING
>>>>          "map-reserved dom0-iommu option is not supported on ARM\n");
>>>>      iommu_hwdom_reserved = 0;
>>>>
>>>>      arch_iommu_hwdom_init(d);
>>> I will add the above code for SMMUv3 also.
>>
>> Great.
>>
>> I was thinking about it, this is the third IOMMU driver on Arm which has to disable the _same_ unsupported options, probably this code wants to be folded in arch_iommu_hwdom_init() to avoid duplication?
> 
> Yes I also agree with you to avoid duplication we can move the come code to the function arch_iommu_hwdom_init().
> I will submit the patch(not part of this series)  if everyone is ok to move the common code to arch_iommu_hwdom_init().

+1 on a follow-up.

Cheers,

Rahul Singh Jan. 18, 2021, 3:33 p.m. UTC | #9

Hello Oleksandr,

> On 11 Jan 2021, at 4:39 pm, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> Hi Rahul
> 
> 
>>> -
>>>   static int arm_smmu_device_probe(struct platform_device *pdev)
>>>   {
>>>       int irq, ret;
>>> -    struct resource *res;
>>> -    resource_size_t ioaddr;
>>> +    paddr_t ioaddr, iosize;
>>>       struct arm_smmu_device *smmu;
>>> -    struct device *dev = &pdev->dev;
>>> -    bool bypass;
>>>   -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>>> +    smmu = xzalloc(struct arm_smmu_device);
>>>       if (!smmu) {
>>> -        dev_err(dev, "failed to allocate arm_smmu_device\n");
>>> +        dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>>           return -ENOMEM;
>>>       }
>>> -    smmu->dev = dev;
>>> +    smmu->dev = pdev;
>>>   -    if (dev->of_node) {
>>> +    if (pdev->of_node) {
>>>           ret = arm_smmu_device_dt_probe(pdev, smmu);
>>> +        if (ret)
>>> +            return -EINVAL;
>>>       } else {
>>>           ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>>           if (ret == -ENODEV)
>>>               return ret;
>>>       }
>>>   -    /* Set bypass mode according to firmware probing result */
>>> -    bypass = !!ret;
>>> -
>>>       /* Base address */
>>> -    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>> -    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>>> -        dev_err(dev, "MMIO region too small (%pr)\n", res);
>>> +    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>>> +    if (ret)
>>> +        return -ENODEV;
>>> +
>>> +    if (iosize < arm_smmu_resource_size(smmu)) {
>>> +        dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>>           return -EINVAL;
>>>       }
>>> -    ioaddr = res->start;
>>>         /*
>>>        * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>>> -     * the PMCG registers which are reserved by the PMU driver.
>>> +     * the PMCG registers which are optional and currently not supported.
>>>        */
>>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>>> +    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>>       if (IS_ERR(smmu->base))
>>>           return PTR_ERR(smmu->base);
>>>   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>>> -        smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>>> +    if (iosize > SZ_64K) {
>>> +        smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>>                              ARM_SMMU_REG_SZ);
>>>           if (IS_ERR(smmu->page1))
>>>               return PTR_ERR(smmu->page1);
>>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>           return ret;
>>>         /* Reset the device */
>>> -    ret = arm_smmu_device_reset(smmu, bypass);
>>> +    ret = arm_smmu_device_reset(smmu);
>>>       if (ret)
>>>           return ret;
>>>   +    /*
>>> +     * Keep a list of all probed devices. This will be used to query
>>> +     * the smmu devices based on the fwnode.
>>> +     */
>>> +    INIT_LIST_HEAD(&smmu->devices);
>>> +
>>> +    spin_lock(&arm_smmu_devices_lock);
>>> +    list_add(&smmu->devices, &arm_smmu_devices);
>>> +    spin_unlock(&arm_smmu_devices_lock);
> 
> Looks like that we need some kind of manual roll-back logic here in case of error during probe (there is no real devm_*):
> 
> iounmap, xfree, etc.

I agree with you that manual roll-back logic is good to have clean code but in this scenario what I have found out that if there is an error during probe arm_smmu_device_probe() will return and XEN will not continue to boot (call panic function) , in that case if we free the memory also there is no much difference. That why I decided not to modify the code that we ported from Linux.

XEN) I/O virtualisation disabled
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Couldn't configure correctly all the IOMMUs.
(XEN) ****************************************
(XEN) 
(XEN) Manual reset required ('noreboot' specified)

Do we have a requirement to continue to boot the XEN if there is an IOMMU available in the system and IOMMU probe is failed? If yes then I will modify the code to free all the resources if there is error during probe.


Regards,
Rahul
> 
> 
>>> 
>>> +
>>>       return 0;
>>>   }
>> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko
>

Oleksandr Tyshchenko Jan. 18, 2021, 4:20 p.m. UTC | #10

On 18.01.21 17:33, Rahul Singh wrote:
> Hello Oleksandr,
>
>> On 11 Jan 2021, at 4:39 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>
>>
>> Hi Rahul

Hi Rahul


>>
>>
>>>> -
>>>>    static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>    {
>>>>        int irq, ret;
>>>> -    struct resource *res;
>>>> -    resource_size_t ioaddr;
>>>> +    paddr_t ioaddr, iosize;
>>>>        struct arm_smmu_device *smmu;
>>>> -    struct device *dev = &pdev->dev;
>>>> -    bool bypass;
>>>>    -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>>>> +    smmu = xzalloc(struct arm_smmu_device);
>>>>        if (!smmu) {
>>>> -        dev_err(dev, "failed to allocate arm_smmu_device\n");
>>>> +        dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>>>            return -ENOMEM;
>>>>        }
>>>> -    smmu->dev = dev;
>>>> +    smmu->dev = pdev;
>>>>    -    if (dev->of_node) {
>>>> +    if (pdev->of_node) {
>>>>            ret = arm_smmu_device_dt_probe(pdev, smmu);
>>>> +        if (ret)
>>>> +            return -EINVAL;
>>>>        } else {
>>>>            ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>>>            if (ret == -ENODEV)
>>>>                return ret;
>>>>        }
>>>>    -    /* Set bypass mode according to firmware probing result */
>>>> -    bypass = !!ret;
>>>> -
>>>>        /* Base address */
>>>> -    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>>> -    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>>>> -        dev_err(dev, "MMIO region too small (%pr)\n", res);
>>>> +    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>>>> +    if (ret)
>>>> +        return -ENODEV;
>>>> +
>>>> +    if (iosize < arm_smmu_resource_size(smmu)) {
>>>> +        dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>>>            return -EINVAL;
>>>>        }
>>>> -    ioaddr = res->start;
>>>>          /*
>>>>         * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>>>> -     * the PMCG registers which are reserved by the PMU driver.
>>>> +     * the PMCG registers which are optional and currently not supported.
>>>>         */
>>>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>>>> +    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>>>        if (IS_ERR(smmu->base))
>>>>            return PTR_ERR(smmu->base);
>>>>    -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>>>> -        smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>>>> +    if (iosize > SZ_64K) {
>>>> +        smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>>>                               ARM_SMMU_REG_SZ);
>>>>            if (IS_ERR(smmu->page1))
>>>>                return PTR_ERR(smmu->page1);
>>>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>            return ret;
>>>>          /* Reset the device */
>>>> -    ret = arm_smmu_device_reset(smmu, bypass);
>>>> +    ret = arm_smmu_device_reset(smmu);
>>>>        if (ret)
>>>>            return ret;
>>>>    +    /*
>>>> +     * Keep a list of all probed devices. This will be used to query
>>>> +     * the smmu devices based on the fwnode.
>>>> +     */
>>>> +    INIT_LIST_HEAD(&smmu->devices);
>>>> +
>>>> +    spin_lock(&arm_smmu_devices_lock);
>>>> +    list_add(&smmu->devices, &arm_smmu_devices);
>>>> +    spin_unlock(&arm_smmu_devices_lock);
>> Looks like that we need some kind of manual roll-back logic here in case of error during probe (there is no real devm_*):
>>
>> iounmap, xfree, etc.
> I agree with you that manual roll-back logic is good to have clean code but in this scenario what I have found out that if there is an error during probe arm_smmu_device_probe() will return and XEN will not continue to boot (call panic function) , in that case if we free the memory also there is no much difference. That why I decided not to modify the code that we ported from Linux.
>
> XEN) I/O virtualisation disabled
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Couldn't configure correctly all the IOMMUs.
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> Do we have a requirement to continue to boot the XEN if there is an IOMMU available in the system and IOMMU probe is failed? If yes then I will modify the code to free all the resources if there is error during probe.

Xen won't call panic if IOMMU driver returns -ENODEV and will continue 
to boot. For example, if the IOMMU is present but cannot be used in Xen 
for some reason (doesn't support page table sharing, etc)

Rahul Singh Jan. 18, 2021, 4:57 p.m. UTC | #11

Hello Oleksandr,

> On 18 Jan 2021, at 4:20 pm, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> On 18.01.21 17:33, Rahul Singh wrote:
>> Hello Oleksandr,
>> 
>>> On 11 Jan 2021, at 4:39 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>> 
>>> 
>>> Hi Rahul
> 
> Hi Rahul
> 
> 
>>> 
>>> 
>>>>> -
>>>>>   static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>>   {
>>>>>       int irq, ret;
>>>>> -    struct resource *res;
>>>>> -    resource_size_t ioaddr;
>>>>> +    paddr_t ioaddr, iosize;
>>>>>       struct arm_smmu_device *smmu;
>>>>> -    struct device *dev = &pdev->dev;
>>>>> -    bool bypass;
>>>>>   -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>>>>> +    smmu = xzalloc(struct arm_smmu_device);
>>>>>       if (!smmu) {
>>>>> -        dev_err(dev, "failed to allocate arm_smmu_device\n");
>>>>> +        dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>>>>           return -ENOMEM;
>>>>>       }
>>>>> -    smmu->dev = dev;
>>>>> +    smmu->dev = pdev;
>>>>>   -    if (dev->of_node) {
>>>>> +    if (pdev->of_node) {
>>>>>           ret = arm_smmu_device_dt_probe(pdev, smmu);
>>>>> +        if (ret)
>>>>> +            return -EINVAL;
>>>>>       } else {
>>>>>           ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>>>>           if (ret == -ENODEV)
>>>>>               return ret;
>>>>>       }
>>>>>   -    /* Set bypass mode according to firmware probing result */
>>>>> -    bypass = !!ret;
>>>>> -
>>>>>       /* Base address */
>>>>> -    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>>>> -    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>>>>> -        dev_err(dev, "MMIO region too small (%pr)\n", res);
>>>>> +    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>>>>> +    if (ret)
>>>>> +        return -ENODEV;
>>>>> +
>>>>> +    if (iosize < arm_smmu_resource_size(smmu)) {
>>>>> +        dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>>>>           return -EINVAL;
>>>>>       }
>>>>> -    ioaddr = res->start;
>>>>>         /*
>>>>>        * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>>>>> -     * the PMCG registers which are reserved by the PMU driver.
>>>>> +     * the PMCG registers which are optional and currently not supported.
>>>>>        */
>>>>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>>>>> +    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>>>>       if (IS_ERR(smmu->base))
>>>>>           return PTR_ERR(smmu->base);
>>>>>   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>>>>> -        smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>>>>> +    if (iosize > SZ_64K) {
>>>>> +        smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>>>>                              ARM_SMMU_REG_SZ);
>>>>>           if (IS_ERR(smmu->page1))
>>>>>               return PTR_ERR(smmu->page1);
>>>>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>>           return ret;
>>>>>         /* Reset the device */
>>>>> -    ret = arm_smmu_device_reset(smmu, bypass);
>>>>> +    ret = arm_smmu_device_reset(smmu);
>>>>>       if (ret)
>>>>>           return ret;
>>>>>   +    /*
>>>>> +     * Keep a list of all probed devices. This will be used to query
>>>>> +     * the smmu devices based on the fwnode.
>>>>> +     */
>>>>> +    INIT_LIST_HEAD(&smmu->devices);
>>>>> +
>>>>> +    spin_lock(&arm_smmu_devices_lock);
>>>>> +    list_add(&smmu->devices, &arm_smmu_devices);
>>>>> +    spin_unlock(&arm_smmu_devices_lock);
>>> Looks like that we need some kind of manual roll-back logic here in case of error during probe (there is no real devm_*):
>>> 
>>> iounmap, xfree, etc.
>> I agree with you that manual roll-back logic is good to have clean code but in this scenario what I have found out that if there is an error during probe arm_smmu_device_probe() will return and XEN will not continue to boot (call panic function) , in that case if we free the memory also there is no much difference. That why I decided not to modify the code that we ported from Linux.
>> 
>> XEN) I/O virtualisation disabled
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Couldn't configure correctly all the IOMMUs.
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Manual reset required ('noreboot' specified)
>> 
>> Do we have a requirement to continue to boot the XEN if there is an IOMMU available in the system and IOMMU probe is failed? If yes then I will modify the code to free all the resources if there is error during probe.
> 
> Xen won't call panic if IOMMU driver returns -ENODEV and will continue to boot. For example, if the IOMMU is present but cannot be used in Xen for some reason (doesn't support page table sharing, etc)

Yes you are right in case of IOMMU driver probe failed and return -ENODEV XEN will continue to boot. 

I am thinking of if there is a problem with configuring the IOMMU HW and return -ENODEV or  for some reason if IOMMU is present cannot not be used in XEN why we are silently allows XEN to boot and make the system insecure.
As end user might miss the error logs during boot and will think IOMMU is enabled and system is secure but IOMMU is either disable or is working in bypass mode. 

I might be wrong, in that case as per my understanding we should return error and call panic and let user decide either to fix the issue on next boot or boot XEN with cmdline option "iommu=no”

Regards,
Rahul   

> 
> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko

Oleksandr Tyshchenko Jan. 19, 2021, 2:43 p.m. UTC | #12

On 18.01.21 18:57, Rahul Singh wrote:
> Hello Oleksandr,

Hi Rahul


>
>> On 18 Jan 2021, at 4:20 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>
>>
>> On 18.01.21 17:33, Rahul Singh wrote:
>>> Hello Oleksandr,
>>>
>>>> On 11 Jan 2021, at 4:39 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>>>
>>>>
>>>> Hi Rahul
>> Hi Rahul
>>
>>
>>>>
>>>>>> -
>>>>>>    static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>>>    {
>>>>>>        int irq, ret;
>>>>>> -    struct resource *res;
>>>>>> -    resource_size_t ioaddr;
>>>>>> +    paddr_t ioaddr, iosize;
>>>>>>        struct arm_smmu_device *smmu;
>>>>>> -    struct device *dev = &pdev->dev;
>>>>>> -    bool bypass;
>>>>>>    -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>>>>>> +    smmu = xzalloc(struct arm_smmu_device);
>>>>>>        if (!smmu) {
>>>>>> -        dev_err(dev, "failed to allocate arm_smmu_device\n");
>>>>>> +        dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>>>>>            return -ENOMEM;
>>>>>>        }
>>>>>> -    smmu->dev = dev;
>>>>>> +    smmu->dev = pdev;
>>>>>>    -    if (dev->of_node) {
>>>>>> +    if (pdev->of_node) {
>>>>>>            ret = arm_smmu_device_dt_probe(pdev, smmu);
>>>>>> +        if (ret)
>>>>>> +            return -EINVAL;
>>>>>>        } else {
>>>>>>            ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>>>>>            if (ret == -ENODEV)
>>>>>>                return ret;
>>>>>>        }
>>>>>>    -    /* Set bypass mode according to firmware probing result */
>>>>>> -    bypass = !!ret;
>>>>>> -
>>>>>>        /* Base address */
>>>>>> -    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>>>>> -    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>>>>>> -        dev_err(dev, "MMIO region too small (%pr)\n", res);
>>>>>> +    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>>>>>> +    if (ret)
>>>>>> +        return -ENODEV;
>>>>>> +
>>>>>> +    if (iosize < arm_smmu_resource_size(smmu)) {
>>>>>> +        dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>>>>>            return -EINVAL;
>>>>>>        }
>>>>>> -    ioaddr = res->start;
>>>>>>          /*
>>>>>>         * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>>>>>> -     * the PMCG registers which are reserved by the PMU driver.
>>>>>> +     * the PMCG registers which are optional and currently not supported.
>>>>>>         */
>>>>>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>>>>>> +    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>>>>>        if (IS_ERR(smmu->base))
>>>>>>            return PTR_ERR(smmu->base);
>>>>>>    -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>>>>>> -        smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>>>>>> +    if (iosize > SZ_64K) {
>>>>>> +        smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>>>>>                               ARM_SMMU_REG_SZ);
>>>>>>            if (IS_ERR(smmu->page1))
>>>>>>                return PTR_ERR(smmu->page1);
>>>>>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>>>            return ret;
>>>>>>          /* Reset the device */
>>>>>> -    ret = arm_smmu_device_reset(smmu, bypass);
>>>>>> +    ret = arm_smmu_device_reset(smmu);
>>>>>>        if (ret)
>>>>>>            return ret;
>>>>>>    +    /*
>>>>>> +     * Keep a list of all probed devices. This will be used to query
>>>>>> +     * the smmu devices based on the fwnode.
>>>>>> +     */
>>>>>> +    INIT_LIST_HEAD(&smmu->devices);
>>>>>> +
>>>>>> +    spin_lock(&arm_smmu_devices_lock);
>>>>>> +    list_add(&smmu->devices, &arm_smmu_devices);
>>>>>> +    spin_unlock(&arm_smmu_devices_lock);
>>>> Looks like that we need some kind of manual roll-back logic here in case of error during probe (there is no real devm_*):
>>>>
>>>> iounmap, xfree, etc.
>>> I agree with you that manual roll-back logic is good to have clean code but in this scenario what I have found out that if there is an error during probe arm_smmu_device_probe() will return and XEN will not continue to boot (call panic function) , in that case if we free the memory also there is no much difference. That why I decided not to modify the code that we ported from Linux.
>>>
>>> XEN) I/O virtualisation disabled
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Couldn't configure correctly all the IOMMUs.
>>> (XEN) ****************************************
>>> (XEN)
>>> (XEN) Manual reset required ('noreboot' specified)
>>>
>>> Do we have a requirement to continue to boot the XEN if there is an IOMMU available in the system and IOMMU probe is failed? If yes then I will modify the code to free all the resources if there is error during probe.
>> Xen won't call panic if IOMMU driver returns -ENODEV and will continue to boot. For example, if the IOMMU is present but cannot be used in Xen for some reason (doesn't support page table sharing, etc)
> Yes you are right in case of IOMMU driver probe failed and return -ENODEV XEN will continue to boot.
>
> I am thinking of if there is a problem with configuring the IOMMU HW and return -ENODEV or  for some reason if IOMMU is present cannot not be used in XEN why we are silently allows XEN to boot and make the system insecure.
> As end user might miss the error logs during boot and will think IOMMU is enabled and system is secure but IOMMU is either disable or is working in bypass mode.

But, wouldn't end user notice that device passthrough is not functional 
then?


>   
>
> I might be wrong, in that case as per my understanding we should return error and call panic and let user decide either to fix the issue on next boot or boot XEN with cmdline option "iommu=no”
I got your point, but I am not sure I can answer precisely how Xen 
should behave in the situation above, I will let the maintainers comment 
on that. Just a note, the -ENODEV is also returned by the framework if 
the IOMMU is not present (please see iommu_hardware_setup() in 
drivers/passthrough/arm/iommu.c for the details), either Xen doesn't 
have a suitable driver for it or the IOMMU H/W is not available in the 
target SoC, etc. I am not quite sure we should call panic in such cases.


Regarding the cleanup my point is that driver should be responsible of 
doing it if there is an error during initialization (and it cannot 
continue) regardless on how the common code would handle that (returned 
by driver) error. Now it panics on some conditions, tomorrow it will act 
differently, etc. If driver called panic by itself, it could _probably_ 
be in a position to leave resources unreleased then... This is my 
viewpoint which might be wrong.


>
> Regards,
> Rahul
>
>>
>> -- 
>> Regards,
>>
>> Oleksandr Tyshchenko

Rahul Singh Jan. 19, 2021, 2:53 p.m. UTC | #13

Hello Oleksandr,

> On 19 Jan 2021, at 2:43 pm, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> On 18.01.21 18:57, Rahul Singh wrote:
>> Hello Oleksandr,
> 
> Hi Rahul
> 
> 
>> 
>>> On 18 Jan 2021, at 4:20 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>> 
>>> 
>>> On 18.01.21 17:33, Rahul Singh wrote:
>>>> Hello Oleksandr,
>>>> 
>>>>> On 11 Jan 2021, at 4:39 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Hi Rahul
>>> Hi Rahul
>>> 
>>> 
>>>>> 
>>>>>>> -
>>>>>>>   static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>>>>   {
>>>>>>>       int irq, ret;
>>>>>>> -    struct resource *res;
>>>>>>> -    resource_size_t ioaddr;
>>>>>>> +    paddr_t ioaddr, iosize;
>>>>>>>       struct arm_smmu_device *smmu;
>>>>>>> -    struct device *dev = &pdev->dev;
>>>>>>> -    bool bypass;
>>>>>>>   -    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
>>>>>>> +    smmu = xzalloc(struct arm_smmu_device);
>>>>>>>       if (!smmu) {
>>>>>>> -        dev_err(dev, "failed to allocate arm_smmu_device\n");
>>>>>>> +        dev_err(pdev, "failed to allocate arm_smmu_device\n");
>>>>>>>           return -ENOMEM;
>>>>>>>       }
>>>>>>> -    smmu->dev = dev;
>>>>>>> +    smmu->dev = pdev;
>>>>>>>   -    if (dev->of_node) {
>>>>>>> +    if (pdev->of_node) {
>>>>>>>           ret = arm_smmu_device_dt_probe(pdev, smmu);
>>>>>>> +        if (ret)
>>>>>>> +            return -EINVAL;
>>>>>>>       } else {
>>>>>>>           ret = arm_smmu_device_acpi_probe(pdev, smmu);
>>>>>>>           if (ret == -ENODEV)
>>>>>>>               return ret;
>>>>>>>       }
>>>>>>>   -    /* Set bypass mode according to firmware probing result */
>>>>>>> -    bypass = !!ret;
>>>>>>> -
>>>>>>>       /* Base address */
>>>>>>> -    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>>>>>> -    if (resource_size(res) < arm_smmu_resource_size(smmu)) {
>>>>>>> -        dev_err(dev, "MMIO region too small (%pr)\n", res);
>>>>>>> +    ret = dt_device_get_address(dev_to_dt(pdev), 0, &ioaddr, &iosize);
>>>>>>> +    if (ret)
>>>>>>> +        return -ENODEV;
>>>>>>> +
>>>>>>> +    if (iosize < arm_smmu_resource_size(smmu)) {
>>>>>>> +        dev_err(pdev, "MMIO region too small (%lx)\n", iosize);
>>>>>>>           return -EINVAL;
>>>>>>>       }
>>>>>>> -    ioaddr = res->start;
>>>>>>>         /*
>>>>>>>        * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>>>>>>> -     * the PMCG registers which are reserved by the PMU driver.
>>>>>>> +     * the PMCG registers which are optional and currently not supported.
>>>>>>>        */
>>>>>>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>>>>>>> +    smmu->base = ioremap_nocache(ioaddr, ARM_SMMU_REG_SZ);
>>>>>>>       if (IS_ERR(smmu->base))
>>>>>>>           return PTR_ERR(smmu->base);
>>>>>>>   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>>>>>>> -        smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>>>>>>> +    if (iosize > SZ_64K) {
>>>>>>> +        smmu->page1 = ioremap_nocache(ioaddr + SZ_64K,
>>>>>>>                              ARM_SMMU_REG_SZ);
>>>>>>>           if (IS_ERR(smmu->page1))
>>>>>>>               return PTR_ERR(smmu->page1);
>>>>>>> @@ -2765,14 +3101,262 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>>>>           return ret;
>>>>>>>         /* Reset the device */
>>>>>>> -    ret = arm_smmu_device_reset(smmu, bypass);
>>>>>>> +    ret = arm_smmu_device_reset(smmu);
>>>>>>>       if (ret)
>>>>>>>           return ret;
>>>>>>>   +    /*
>>>>>>> +     * Keep a list of all probed devices. This will be used to query
>>>>>>> +     * the smmu devices based on the fwnode.
>>>>>>> +     */
>>>>>>> +    INIT_LIST_HEAD(&smmu->devices);
>>>>>>> +
>>>>>>> +    spin_lock(&arm_smmu_devices_lock);
>>>>>>> +    list_add(&smmu->devices, &arm_smmu_devices);
>>>>>>> +    spin_unlock(&arm_smmu_devices_lock);
>>>>> Looks like that we need some kind of manual roll-back logic here in case of error during probe (there is no real devm_*):
>>>>> 
>>>>> iounmap, xfree, etc.
>>>> I agree with you that manual roll-back logic is good to have clean code but in this scenario what I have found out that if there is an error during probe arm_smmu_device_probe() will return and XEN will not continue to boot (call panic function) , in that case if we free the memory also there is no much difference. That why I decided not to modify the code that we ported from Linux.
>>>> 
>>>> XEN) I/O virtualisation disabled
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Couldn't configure correctly all the IOMMUs.
>>>> (XEN) ****************************************
>>>> (XEN)
>>>> (XEN) Manual reset required ('noreboot' specified)
>>>> 
>>>> Do we have a requirement to continue to boot the XEN if there is an IOMMU available in the system and IOMMU probe is failed? If yes then I will modify the code to free all the resources if there is error during probe.
>>> Xen won't call panic if IOMMU driver returns -ENODEV and will continue to boot. For example, if the IOMMU is present but cannot be used in Xen for some reason (doesn't support page table sharing, etc)
>> Yes you are right in case of IOMMU driver probe failed and return -ENODEV XEN will continue to boot.
>> 
>> I am thinking of if there is a problem with configuring the IOMMU HW and return -ENODEV or  for some reason if IOMMU is present cannot not be used in XEN why we are silently allows XEN to boot and make the system insecure.
>> As end user might miss the error logs during boot and will think IOMMU is enabled and system is secure but IOMMU is either disable or is working in bypass mode.
> 
> But, wouldn't end user notice that device passthrough is not functional then?

I am no sure but might be yes as I think if iommu is disabled we cannot passthrough the device.
> 
> 
>>  
>> I might be wrong, in that case as per my understanding we should return error and call panic and let user decide either to fix the issue on next boot or boot XEN with cmdline option "iommu=no”
> I got your point, but I am not sure I can answer precisely how Xen should behave in the situation above, I will let the maintainers comment on that. Just a note, the -ENODEV is also returned by the framework if the IOMMU is not present (please see iommu_hardware_setup() in drivers/passthrough/arm/iommu.c for the details), either Xen doesn't have a suitable driver for it or the IOMMU H/W is not available in the target SoC, etc. I am not quite sure we should call panic in such cases.
> 
> 
> Regarding the cleanup my point is that driver should be responsible of doing it if there is an error during initialization (and it cannot continue) regardless on how the common code would handle that (returned by driver) error. Now it panics on some conditions, tomorrow it will act differently, etc. If driver called panic by itself, it could _probably_ be in a position to leave resources unreleased then... This is my viewpoint which might be wrong.

Yes I agree with you and  I will add the code to free resources if probe failed and will send next version of the patch for review.

Regards,
Rahul
> 
> 
>> 
>> Regards,
>> Rahul
>> 
>>> 
>>> -- 
>>> Regards,
>>> 
>>> Oleksandr Tyshchenko
> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko

[v4,11/11] xen/arm: smmuv3: Add support for SMMUv3 driver

Commit Message

Comments

Patch