From patchwork Mon Mar 6 17:30:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timothy Pearson X-Patchwork-Id: 13162046 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71F4AC61DA4 for ; Mon, 6 Mar 2023 17:48:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229725AbjCFRsH (ORCPT ); Mon, 6 Mar 2023 12:48:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36816 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229618AbjCFRsG (ORCPT ); Mon, 6 Mar 2023 12:48:06 -0500 Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 360CD6B5C7 for ; Mon, 6 Mar 2023 09:47:34 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id DD87037E2D608F; Mon, 6 Mar 2023 11:30:21 -0600 (CST) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id C0UBHeda5DkJ; Mon, 6 Mar 2023 11:30:20 -0600 (CST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 3D82E37E2D608C; Mon, 6 Mar 2023 11:30:20 -0600 (CST) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 3D82E37E2D608C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1678123820; bh=IIMqz1QQYrje3bbf91KJUZcVhibsDYEdfwdZwwDI3R8=; h=Date:From:To:Message-ID:MIME-Version; b=rPL6QstU9zX7HYDbw2GGTvg7rq0YEwiZVo/eVuPp7JpIKBUhaBryow+/qpK1/gFO5 2PkPqB1QrrWDaQWk5lvxjiSUHqbKX8lNl9wd8KWUWzYTchhm+7hI2tcEHODgkirYh1 oBIZ0aX05lA96l/7csbZ54Wzxgr7ci7KxQRwsDmc= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id RxuhlV1h4t_W; Mon, 6 Mar 2023 11:30:20 -0600 (CST) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 2028737E2D6089; Mon, 6 Mar 2023 11:30:20 -0600 (CST) Date: Mon, 6 Mar 2023 11:30:20 -0600 (CST) From: Timothy Pearson To: kvm Cc: linuxppc-dev Message-ID: <525438831.16998517.1678123820075.JavaMail.zimbra@raptorengineeringinc.com> Subject: [PATCH v2 1/4] powerpc/iommu: Add "borrowing" iommu_table_group_ops MIME-Version: 1.0 X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC110 (Linux)/8.5.0_GA_3042) Thread-Index: hUSfuDDPqLKZwFP9gR3jsjjw6MwO/A== Thread-Topic: powerpc/iommu: Add "borrowing" iommu_table_group_ops Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows for PEs: control the ownership, create/set/unset a table the hardware for dynamic DMA windows (DDW). VFIO uses the API to implement support on POWER. So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused existing iommu_table structs. This means 1) no DDW 2) ownership transfer is done directly in the VFIO SPAPR TCE driver. Soon POWER is going to get its own iommu_ops and ownership control is going to move there. This implements spapr_tce_table_group_ops which borrows iommu_table tables. The upside is that VFIO needs to know less about POWER. The new ops returns the existing table from create_table() and only checks if the same window is already set. This is only going to work if the default DMA window starts table_group.tce32_start and as big as pe->table_group.tce32_size (not the case for IODA2+ PowerNV). This changes iommu_table_group_ops::take_ownership() to return an error if borrowing a table failed. This should not cause any visible change in behavior for PowerNV. pSeries was not that well tested/supported anyway. Signed-off-by: Alexey Kardashevskiy Signed-off-by: Timothy Pearson --- arch/powerpc/include/asm/iommu.h | 6 +- arch/powerpc/kernel/iommu.c | 98 ++++++++++++++++++++++- arch/powerpc/platforms/powernv/pci-ioda.c | 6 +- arch/powerpc/platforms/pseries/iommu.c | 3 + drivers/vfio/vfio_iommu_spapr_tce.c | 94 ++++------------------ 5 files changed, 121 insertions(+), 86 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 7e29c73e3dd4..678b5bdc79b1 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -175,7 +175,7 @@ struct iommu_table_group_ops { long (*unset_window)(struct iommu_table_group *table_group, int num); /* Switch ownership from platform code to external user (e.g. VFIO) */ - void (*take_ownership)(struct iommu_table_group *table_group); + long (*take_ownership)(struct iommu_table_group *table_group); /* Switch ownership from external user (e.g. VFIO) back to core */ void (*release_ownership)(struct iommu_table_group *table_group); }; @@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm, enum dma_data_direction *direction); extern void iommu_tce_kill(struct iommu_table *tbl, unsigned long entry, unsigned long pages); + +extern struct iommu_table_group_ops spapr_tce_table_group_ops; #else static inline void iommu_register_group(struct iommu_table_group *table_group, int pci_domain_number, @@ -303,8 +305,6 @@ extern int iommu_tce_check_gpa(unsigned long page_shift, iommu_tce_check_gpa((tbl)->it_page_shift, (gpa))) extern void iommu_flush_tce(struct iommu_table *tbl); -extern int iommu_take_ownership(struct iommu_table *tbl); -extern void iommu_release_ownership(struct iommu_table *tbl); extern enum dma_data_direction iommu_tce_direction(unsigned long tce); extern unsigned long iommu_direction_to_tce_perm(enum dma_data_direction dir); diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index ee95937bdaf1..f9f5a9418092 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1086,7 +1086,7 @@ void iommu_tce_kill(struct iommu_table *tbl, } EXPORT_SYMBOL_GPL(iommu_tce_kill); -int iommu_take_ownership(struct iommu_table *tbl) +static int iommu_take_ownership(struct iommu_table *tbl) { unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; int ret = 0; @@ -1118,9 +1118,8 @@ int iommu_take_ownership(struct iommu_table *tbl) return ret; } -EXPORT_SYMBOL_GPL(iommu_take_ownership); -void iommu_release_ownership(struct iommu_table *tbl) +static void iommu_release_ownership(struct iommu_table *tbl) { unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; @@ -1137,7 +1136,6 @@ void iommu_release_ownership(struct iommu_table *tbl) spin_unlock(&tbl->pools[i].lock); spin_unlock_irqrestore(&tbl->large_pool.lock, flags); } -EXPORT_SYMBOL_GPL(iommu_release_ownership); int iommu_add_device(struct iommu_table_group *table_group, struct device *dev) { @@ -1179,4 +1177,96 @@ void iommu_del_device(struct device *dev) iommu_group_remove_device(dev); } EXPORT_SYMBOL_GPL(iommu_del_device); + +/* + * A simple iommu_table_group_ops which only allows reusing the existing + * iommu_table. This handles VFIO for POWER7 or the nested KVM. + * The ops does not allow creating windows and only allows reusing the existing + * one if it matches table_group->tce32_start/tce32_size/page_shift. + */ +static unsigned long spapr_tce_get_table_size(__u32 page_shift, + __u64 window_size, __u32 levels) +{ + unsigned long size; + + if (levels > 1) + return ~0U; + size = window_size >> (page_shift - 3); + return size; +} + +static long spapr_tce_create_table(struct iommu_table_group *table_group, + int num, __u32 page_shift, __u64 window_size, __u32 levels, + struct iommu_table **ptbl) +{ + struct iommu_table *tbl = table_group->tables[0]; + + if (num > 0) + return -EPERM; + + if (tbl->it_page_shift != page_shift || + tbl->it_size != (window_size >> page_shift) || + tbl->it_indirect_levels != levels - 1) + return -EINVAL; + + *ptbl = iommu_tce_table_get(tbl); + return 0; +} + +static long spapr_tce_set_window(struct iommu_table_group *table_group, + int num, struct iommu_table *tbl) +{ + return tbl == table_group->tables[num] ? 0 : -EPERM; +} + +static long spapr_tce_unset_window(struct iommu_table_group *table_group, int num) +{ + return 0; +} + +static long spapr_tce_take_ownership(struct iommu_table_group *table_group) +{ + int i, j, rc = 0; + + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { + struct iommu_table *tbl = table_group->tables[i]; + + if (!tbl || !tbl->it_map) + continue; + + rc = iommu_take_ownership(tbl); + if (!rc) + continue; + for (j = 0; j < i; ++j) + iommu_release_ownership(table_group->tables[j]); + return rc; + } + return 0; +} + +static void spapr_tce_release_ownership(struct iommu_table_group *table_group) +{ + int i; + + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { + struct iommu_table *tbl = table_group->tables[i]; + + if (!tbl) + continue; + + iommu_table_clear(tbl); + if (tbl->it_map) + iommu_release_ownership(tbl); + } +} + +struct iommu_table_group_ops spapr_tce_table_group_ops = { + .get_table_size = spapr_tce_get_table_size, + .create_table = spapr_tce_create_table, + .set_window = spapr_tce_set_window, + .unset_window = spapr_tce_unset_window, + .take_ownership = spapr_tce_take_ownership, + .release_ownership = spapr_tce_release_ownership, +}; + #endif /* CONFIG_IOMMU_API */ diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 4f6e20a35aa1..539934c76002 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1554,6 +1554,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb, if (WARN_ON(!tbl)) return; + pe->table_group.ops = &spapr_tce_table_group_ops; + pe->table_group.pgsizes = SZ_4K; iommu_register_group(&pe->table_group, phb->hose->global_number, pe->pe_number); pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group); @@ -1888,7 +1890,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus) } } -static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group) +static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group) { struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe, table_group); @@ -1902,6 +1904,8 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group) else if (pe->pdev) set_iommu_table_base(&pe->pdev->dev, NULL); iommu_tce_table_put(tbl); + + return 0; } static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index c74b71d4733d..6769d2c55187 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -74,6 +74,9 @@ static struct iommu_table_group *iommu_pseries_alloc_group(int node) if (!table_group) return NULL; + table_group->ops = &spapr_tce_table_group_ops; + table_group->pgsizes = SZ_4K; + table_group->tables[0] = iommu_pseries_alloc_table(node); if (table_group->tables[0]) return table_group; diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 60a50ce8701e..c3f8ae102ece 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -1189,52 +1189,6 @@ static long tce_iommu_ioctl(void *iommu_data, static void tce_iommu_release_ownership(struct tce_container *container, struct iommu_table_group *table_group) -{ - int i; - - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { - struct iommu_table *tbl = container->tables[i]; - - if (!tbl) - continue; - - tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size); - if (tbl->it_map) - iommu_release_ownership(tbl); - - container->tables[i] = NULL; - } -} - -static int tce_iommu_take_ownership(struct tce_container *container, - struct iommu_table_group *table_group) -{ - int i, j, rc = 0; - - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { - struct iommu_table *tbl = table_group->tables[i]; - - if (!tbl || !tbl->it_map) - continue; - - rc = iommu_take_ownership(tbl); - if (rc) { - for (j = 0; j < i; ++j) - iommu_release_ownership( - table_group->tables[j]); - - return rc; - } - } - - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) - container->tables[i] = table_group->tables[i]; - - return 0; -} - -static void tce_iommu_release_ownership_ddw(struct tce_container *container, - struct iommu_table_group *table_group) { long i; @@ -1250,18 +1204,14 @@ static void tce_iommu_release_ownership_ddw(struct tce_container *container, table_group->ops->release_ownership(table_group); } -static long tce_iommu_take_ownership_ddw(struct tce_container *container, +static long tce_iommu_take_ownership(struct tce_container *container, struct iommu_table_group *table_group) { long i, ret = 0; - if (!table_group->ops->create_table || !table_group->ops->set_window || - !table_group->ops->release_ownership) { - WARN_ON_ONCE(1); - return -EFAULT; - } - - table_group->ops->take_ownership(table_group); + ret = table_group->ops->take_ownership(table_group); + if (ret) + return ret; /* Set all windows to the new group */ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { @@ -1307,9 +1257,14 @@ static int tce_iommu_attach_group(void *iommu_data, goto unlock_exit; } - if (tce_groups_attached(container) && (!table_group->ops || - !table_group->ops->take_ownership || - !table_group->ops->release_ownership)) { + /* v2 requires full support of dynamic DMA windows */ + if (container->v2 && table_group->max_dynamic_windows_supported == 0) { + ret = -EINVAL; + goto unlock_exit; + } + + /* v1 reuses TCE tables and does not share them among PEs */ + if (!container->v2 && tce_groups_attached(container)) { ret = -EBUSY; goto unlock_exit; } @@ -1344,29 +1299,15 @@ static int tce_iommu_attach_group(void *iommu_data, goto unlock_exit; } - if (!table_group->ops || !table_group->ops->take_ownership || - !table_group->ops->release_ownership) { - if (container->v2) { - ret = -EPERM; - goto free_exit; - } - ret = tce_iommu_take_ownership(container, table_group); - } else { - if (!container->v2) { - ret = -EPERM; - goto free_exit; - } - ret = tce_iommu_take_ownership_ddw(container, table_group); - if (!tce_groups_attached(container) && !container->tables[0]) - container->def_window_pending = true; - } + ret = tce_iommu_take_ownership(container, table_group); + if (!tce_groups_attached(container) && !container->tables[0]) + container->def_window_pending = true; if (!ret) { tcegrp->grp = iommu_group; list_add(&tcegrp->next, &container->group_list); } -free_exit: if (ret && tcegrp) kfree(tcegrp); @@ -1405,10 +1346,7 @@ static void tce_iommu_detach_group(void *iommu_data, table_group = iommu_group_get_iommudata(iommu_group); BUG_ON(!table_group); - if (!table_group->ops || !table_group->ops->release_ownership) - tce_iommu_release_ownership(container, table_group); - else - tce_iommu_release_ownership_ddw(container, table_group); + tce_iommu_release_ownership(container, table_group); unlock_exit: mutex_unlock(&container->lock); From patchwork Mon Mar 6 17:30:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timothy Pearson X-Patchwork-Id: 13162018 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D67AC6FA99 for ; Mon, 6 Mar 2023 17:38:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229826AbjCFRiQ (ORCPT ); Mon, 6 Mar 2023 12:38:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48082 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229915AbjCFRiN (ORCPT ); Mon, 6 Mar 2023 12:38:13 -0500 Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65ECE37737 for ; Mon, 6 Mar 2023 09:37:40 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 958C937E2D60D1; Mon, 6 Mar 2023 11:30:42 -0600 (CST) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id HYDUPU0u9TJI; Mon, 6 Mar 2023 11:30:42 -0600 (CST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 33B8537E2D60CE; Mon, 6 Mar 2023 11:30:42 -0600 (CST) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 33B8537E2D60CE DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1678123842; bh=kiK4u61n03Ewv9GiwNQjcFB9//YPBFFbrjx8hSx1Y+c=; h=Date:From:To:Message-ID:MIME-Version; b=KpZ/1BX9ebbx6jGZSpuBBOBFbjFdvhoRBWG9+oBSQ+x4DVjNAuBhynoMcXBJ0WxBU 8kpfHHKMxWt77SgaObYmZNhveYByVYGIqNOo+fXDiR1HaUiu5pLHnlSrEle2YkruTt b04EhKOKwc0jfOXQnIddNfibXi1qpWzHOd2jzBr4= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id BL7M-Xkg8qh2; Mon, 6 Mar 2023 11:30:42 -0600 (CST) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 1362237E2D60C8; Mon, 6 Mar 2023 11:30:42 -0600 (CST) Date: Mon, 6 Mar 2023 11:30:42 -0600 (CST) From: Timothy Pearson To: kvm Cc: linuxppc-dev Message-ID: <12303156.16998521.1678123842049.JavaMail.zimbra@raptorengineeringinc.com> Subject: [PATCH v2 2/4] powerpc/pci_64: Init pcibios subsys a bit later MIME-Version: 1.0 X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC110 (Linux)/8.5.0_GA_3042) Thread-Index: xJigH1RzbFL1MLjO/JsgLFkXazYTjQ== Thread-Topic: powerpc/pci_64: Init pcibios subsys a bit later Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The following patches are going to add dependency/use of iommu_ops which is initialized in subsys_initcall as well. This moves pciobios_init() to the next initcall level. This should not cause behavioral change. Signed-off-by: Alexey Kardashevskiy Signed-off-by: Timothy Pearson --- arch/powerpc/kernel/pci_64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c index fd42059ae2a5..e27342ef128b 100644 --- a/arch/powerpc/kernel/pci_64.c +++ b/arch/powerpc/kernel/pci_64.c @@ -73,7 +73,7 @@ static int __init pcibios_init(void) return 0; } -subsys_initcall(pcibios_init); +subsys_initcall_sync(pcibios_init); int pcibios_unmap_io_space(struct pci_bus *bus) { From patchwork Mon Mar 6 17:31:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timothy Pearson X-Patchwork-Id: 13162045 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7129DC61DA4 for ; Mon, 6 Mar 2023 17:44:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229816AbjCFRns (ORCPT ); Mon, 6 Mar 2023 12:43:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229812AbjCFRnr (ORCPT ); Mon, 6 Mar 2023 12:43:47 -0500 Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 630C518B39 for ; Mon, 6 Mar 2023 09:43:07 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id A173E37E2D60E5; Mon, 6 Mar 2023 11:31:02 -0600 (CST) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id uuLrLi4_1t_N; Mon, 6 Mar 2023 11:31:00 -0600 (CST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 57EE937E2D60DA; Mon, 6 Mar 2023 11:31:00 -0600 (CST) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 57EE937E2D60DA DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1678123860; bh=yebnO1LO3S3uTuA0LRdMAMfR8loATotNKkTWX/e7OlE=; h=Date:From:To:Message-ID:MIME-Version; b=SA6l+T5I8WLMDPE3WyvJpAQjGWFCI57QcPGeWYQpS6rqRbn7ruZhX/mzPVVmdX8R8 5QYjmta2/KWSdOSY6fqZChCSpWTSvt0sgFZsWgQloWRztFKxvNkf3QpFOh9n2WRV3z ohaG4REddOpKFLag5+HVgTVt/puZGYfybCcTNbxo= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 9jaesIDJkhYb; Mon, 6 Mar 2023 11:31:00 -0600 (CST) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 2F93537E2D60D7; Mon, 6 Mar 2023 11:31:00 -0600 (CST) Date: Mon, 6 Mar 2023 11:31:00 -0600 (CST) From: Timothy Pearson To: kvm Cc: linuxppc-dev Message-ID: <2000135730.16998523.1678123860135.JavaMail.zimbra@raptorengineeringinc.com> Subject: [PATCH v2 3/4] powerpc/iommu: Add iommu_ops to report capabilities and MIME-Version: 1.0 X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC110 (Linux)/8.5.0_GA_3042) Thread-Index: 6jUIKsIr3C/DQkflHEmExIa6wjx3Kg== Thread-Topic: powerpc/iommu: Add iommu_ops to report capabilities and Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org allow blocking domains Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in the Type1 VFIO driver. Recent development added 2 uses of iommu_ops to the generic VFIO which broke POWER: - a coherency capability check; - blocking IOMMU domain - iommu_group_dma_owner_claimed()/... This adds a simple iommu_ops which reports support for cache coherency and provides a basic support for blocking domains. No other domain types are implemented so the default domain is NULL. Since now iommu_ops controls the group ownership, this takes it out of VFIO. This adds an IOMMU device into a pci_controller (=PHB) and registers it in the IOMMU subsystem, iommu_ops is registered at this point. This setup is done in postcore_initcall_sync. This replaces iommu_group_add_device() with iommu_probe_device() as the former misses necessary steps in connecting PCI devices to IOMMU devices. This adds a comment about why explicit iommu_probe_device() is still needed. The previous discussion is here: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-aik@ozlabs.ru/ https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/ Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence") Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices") Cc: Deming Wang Cc: Robin Murphy Cc: Jason Gunthorpe Cc: Alex Williamson Cc: Daniel Henrique Barboza Cc: Fabiano Rosas Cc: Murilo Opsfelder Araujo Cc: Nicholas Piggin Co-authored-by: Timothy Pearson Signed-off-by: Alexey Kardashevskiy Signed-off-by: Timothy Pearson --- arch/powerpc/include/asm/pci-bridge.h | 7 + arch/powerpc/kernel/iommu.c | 148 +++++++++++++++++++++- arch/powerpc/platforms/powernv/pci-ioda.c | 30 +++++ arch/powerpc/platforms/pseries/iommu.c | 24 ++++ arch/powerpc/platforms/pseries/pseries.h | 4 + arch/powerpc/platforms/pseries/setup.c | 3 + drivers/vfio/vfio_iommu_spapr_tce.c | 8 -- 7 files changed, 214 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h index 71c1d26f2400..2aa3a091ef20 100644 --- a/arch/powerpc/include/asm/pci-bridge.h +++ b/arch/powerpc/include/asm/pci-bridge.h @@ -8,6 +8,7 @@ #include #include #include +#include struct device_node; @@ -44,6 +45,9 @@ struct pci_controller_ops { #endif void (*shutdown)(struct pci_controller *hose); + + struct iommu_group *(*device_group)(struct pci_controller *hose, + struct pci_dev *pdev); }; /* @@ -131,6 +135,9 @@ struct pci_controller { struct irq_domain *dev_domain; struct irq_domain *msi_domain; struct fwnode_handle *fwnode; + + /* iommu_ops support */ + struct iommu_device iommu; }; /* These are used for config access before all the PCI probing diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index f9f5a9418092..b42e202af3bc 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -35,6 +35,7 @@ #include #include #include +#include #define DBG(...) @@ -1156,8 +1157,14 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev) pr_debug("%s: Adding %s to iommu group %d\n", __func__, dev_name(dev), iommu_group_id(table_group->group)); - - return iommu_group_add_device(table_group->group, dev); + /* + * This is still not adding devices via the IOMMU bus notifier because + * of pcibios_init() from arch/powerpc/kernel/pci_64.c which calls + * pcibios_scan_phb() first (and this guy adds devices and triggers + * the notifier) and only then it calls pci_bus_add_devices() which + * configures DMA for buses which also creates PEs and IOMMU groups. + */ + return iommu_probe_device(dev); } EXPORT_SYMBOL_GPL(iommu_add_device); @@ -1237,6 +1244,7 @@ static long spapr_tce_take_ownership(struct iommu_table_group *table_group) rc = iommu_take_ownership(tbl); if (!rc) continue; + for (j = 0; j < i; ++j) iommu_release_ownership(table_group->tables[j]); return rc; @@ -1269,4 +1277,140 @@ struct iommu_table_group_ops spapr_tce_table_group_ops = { .release_ownership = spapr_tce_release_ownership, }; +/* + * A simple iommu_ops to allow less cruft in generic VFIO code. + */ +static int spapr_tce_blocking_iommu_attach_dev(struct iommu_domain *dom, + struct device *dev) +{ + struct iommu_group *grp = iommu_group_get(dev); + struct iommu_table_group *table_group; + int ret = -EINVAL; + + if (!grp) + return -ENODEV; + + table_group = iommu_group_get_iommudata(grp); + ret = table_group->ops->take_ownership(table_group); + iommu_group_put(grp); + + return ret; +} + +static void spapr_tce_blocking_iommu_set_platform_dma(struct device *dev) +{ + struct iommu_group *grp = iommu_group_get(dev); + struct iommu_table_group *table_group; + + table_group = iommu_group_get_iommudata(grp); + table_group->ops->release_ownership(table_group); +} + +static const struct iommu_domain_ops spapr_tce_blocking_domain_ops = { + .attach_dev = spapr_tce_blocking_iommu_attach_dev, +}; + +static bool spapr_tce_iommu_capable(struct device *dev, enum iommu_cap cap) +{ + switch (cap) { + case IOMMU_CAP_CACHE_COHERENCY: + return true; + default: + break; + } + + return false; +} + +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type) +{ + struct iommu_domain *dom; + + if (type != IOMMU_DOMAIN_BLOCKED) + return NULL; + + dom = kzalloc(sizeof(*dom), GFP_KERNEL); + if (!dom) + return NULL; + + dom->ops = &spapr_tce_blocking_domain_ops; + + return dom; +} + +static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev) +{ + struct pci_dev *pdev; + struct pci_controller *hose; + + if (!dev_is_pci(dev)) + return ERR_PTR(-EPERM); + + pdev = to_pci_dev(dev); + hose = pdev->bus->sysdata; + + return &hose->iommu; +} + +static void spapr_tce_iommu_release_device(struct device *dev) +{ +} + +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev) +{ + struct pci_controller *hose; + struct pci_dev *pdev; + + pdev = to_pci_dev(dev); + hose = pdev->bus->sysdata; + + if (!hose->controller_ops.device_group) + return ERR_PTR(-ENOENT); + + return hose->controller_ops.device_group(hose, pdev); +} + +static const struct iommu_ops spapr_tce_iommu_ops = { + .capable = spapr_tce_iommu_capable, + .domain_alloc = spapr_tce_iommu_domain_alloc, + .probe_device = spapr_tce_iommu_probe_device, + .release_device = spapr_tce_iommu_release_device, + .device_group = spapr_tce_iommu_device_group, + .set_platform_dma_ops = spapr_tce_blocking_iommu_set_platform_dma, +}; + +static struct attribute *spapr_tce_iommu_attrs[] = { + NULL, +}; + +static struct attribute_group spapr_tce_iommu_group = { + .name = "spapr-tce-iommu", + .attrs = spapr_tce_iommu_attrs, +}; + +static const struct attribute_group *spapr_tce_iommu_groups[] = { + &spapr_tce_iommu_group, + NULL, +}; + +/* + * This registers IOMMU devices of PHBs. This needs to happen + * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and + * before subsys_initcall(iommu_subsys_init). + */ +static int __init spapr_tce_setup_phb_iommus_initcall(void) +{ + struct pci_controller *hose; + + list_for_each_entry(hose, &hose_list, list_node) { + iommu_device_sysfs_add(&hose->iommu, hose->parent, + spapr_tce_iommu_groups, "iommu-phb%04x", + hose->global_number); + iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops, + hose->parent); + } + return 0; +} +postcore_initcall_sync(spapr_tce_setup_phb_iommus_initcall); + #endif /* CONFIG_IOMMU_API */ diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 539934c76002..6e9a0967bab1 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -1897,6 +1897,13 @@ static long pnv_ioda2_take_ownership(struct iommu_table_group *table_group) /* Store @tbl as pnv_pci_ioda2_unset_window() resets it */ struct iommu_table *tbl = pe->table_group.tables[0]; + /* + * iommu_ops transfers the ownership per a device and we mode + * the group ownership with the first device in the group. + */ + if (!tbl) + return 0; + pnv_pci_ioda2_set_bypass(pe, false); pnv_pci_ioda2_unset_window(&pe->table_group, 0); if (pe->pbus) @@ -1913,6 +1920,9 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group) struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe, table_group); + /* See the comment about iommu_ops above */ + if (pe->table_group.tables[0]) + return; pnv_pci_ioda2_setup_default_config(pe); if (pe->pbus) pnv_ioda_setup_bus_dma(pe, pe->pbus); @@ -2919,6 +2929,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus) } } +static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose, + struct pci_dev *pdev) +{ + struct pnv_phb *phb = hose->private_data; + struct pnv_ioda_pe *pe; + + if (WARN_ON(!phb)) + return ERR_PTR(-ENODEV); + + pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8)); + if (!pe) + return ERR_PTR(-ENODEV); + + if (!pe->table_group.group) + return ERR_PTR(-ENODEV); + + return iommu_group_ref_get(pe->table_group.group); +} + static const struct pci_controller_ops pnv_pci_ioda_controller_ops = { .dma_dev_setup = pnv_pci_ioda_dma_dev_setup, .dma_bus_setup = pnv_pci_ioda_dma_bus_setup, @@ -2929,6 +2958,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = { .setup_bridge = pnv_pci_fixup_bridge_resources, .reset_secondary_bus = pnv_pci_reset_secondary_bus, .shutdown = pnv_pci_ioda_shutdown, + .device_group = pnv_pci_device_group, }; static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = { diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 6769d2c55187..1c911eedf8fb 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -1727,3 +1727,27 @@ static int __init tce_iommu_bus_notifier_init(void) return 0; } machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init); + +#ifdef CONFIG_SPAPR_TCE_IOMMU +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose, + struct pci_dev *pdev) +{ + struct device_node *pdn, *dn = pdev->dev.of_node; + struct iommu_group *grp; + struct pci_dn *pci; + + pdn = pci_dma_find(dn, NULL); + if (!pdn || !PCI_DN(pdn)) + return ERR_PTR(-ENODEV); + + pci = PCI_DN(pdn); + if (!pci->table_group) + return ERR_PTR(-ENODEV); + + grp = pci->table_group->group; + if (!grp) + return ERR_PTR(-ENODEV); + + return iommu_group_ref_get(grp); +} +#endif diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h index 1d75b7742ef0..f8bce40ebd0c 100644 --- a/arch/powerpc/platforms/pseries/pseries.h +++ b/arch/powerpc/platforms/pseries/pseries.h @@ -123,5 +123,9 @@ static inline void pseries_lpar_read_hblkrm_characteristics(void) { } #endif void pseries_rng_init(void); +#ifdef CONFIG_SPAPR_TCE_IOMMU +struct iommu_group *pSeries_pci_device_group(struct pci_controller *hose, + struct pci_dev *pdev); +#endif #endif /* _PSERIES_PSERIES_H */ diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 4a0cec8cf623..94a7617eb044 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -1118,6 +1118,9 @@ static int pSeries_pci_probe_mode(struct pci_bus *bus) struct pci_controller_ops pseries_pci_controller_ops = { .probe_mode = pSeries_pci_probe_mode, +#ifdef CONFIG_SPAPR_TCE_IOMMU + .device_group = pSeries_pci_device_group, +#endif }; define_machine(pseries) { diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index c3f8ae102ece..a94ec6225d31 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -1200,8 +1200,6 @@ static void tce_iommu_release_ownership(struct tce_container *container, for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) if (container->tables[i]) table_group->ops->unset_window(table_group, i); - - table_group->ops->release_ownership(table_group); } static long tce_iommu_take_ownership(struct tce_container *container, @@ -1209,10 +1207,6 @@ static long tce_iommu_take_ownership(struct tce_container *container, { long i, ret = 0; - ret = table_group->ops->take_ownership(table_group); - if (ret) - return ret; - /* Set all windows to the new group */ for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { struct iommu_table *tbl = container->tables[i]; @@ -1231,8 +1225,6 @@ static long tce_iommu_take_ownership(struct tce_container *container, for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) table_group->ops->unset_window(table_group, i); - table_group->ops->release_ownership(table_group); - return ret; } From patchwork Mon Mar 6 17:31:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timothy Pearson X-Patchwork-Id: 13162044 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0778EC6FA99 for ; Mon, 6 Mar 2023 17:43:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230188AbjCFRnH (ORCPT ); Mon, 6 Mar 2023 12:43:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229750AbjCFRnF (ORCPT ); Mon, 6 Mar 2023 12:43:05 -0500 Received: from raptorengineering.com (mail.raptorengineering.com [23.155.224.40]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3723D3E09C for ; Mon, 6 Mar 2023 09:42:25 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id BF64737E2D6111; Mon, 6 Mar 2023 11:31:29 -0600 (CST) Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id SmH_P4v-Vbwj; Mon, 6 Mar 2023 11:31:29 -0600 (CST) Received: from localhost (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id 0AAB037E2D610E; Mon, 6 Mar 2023 11:31:29 -0600 (CST) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.rptsys.com 0AAB037E2D610E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=raptorengineering.com; s=B8E824E6-0BE2-11E6-931D-288C65937AAD; t=1678123889; bh=fTU8lI8Un7GOGF+BJicHxq+cOFMCDRta41UtYHK65M0=; h=Date:From:To:Message-ID:MIME-Version; b=PiV7TdGr8IO82iim6h2i3e1dweRFs3Ez0um5+JxUVhcNp/hijecIMQpaBGOivT6GA TuU5CaDqM5dO8d/Ynk2UM2qUlbfLnCNJauVdKA9W1at+xrlpu8Is1vf4l2swJFwpj+ +ji3t8sWm4pG2waH4jHQ9XT4X4YgtlHWgou0r3UE= X-Virus-Scanned: amavisd-new at rptsys.com Received: from mail.rptsys.com ([127.0.0.1]) by localhost (vali.starlink.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id MEEI5Y3J0Rx7; Mon, 6 Mar 2023 11:31:28 -0600 (CST) Received: from vali.starlink.edu (localhost [127.0.0.1]) by mail.rptsys.com (Postfix) with ESMTP id E327937E2D6101; Mon, 6 Mar 2023 11:31:28 -0600 (CST) Date: Mon, 6 Mar 2023 11:31:28 -0600 (CST) From: Timothy Pearson To: kvm Cc: linuxppc-dev Message-ID: <256219069.16998525.1678123888896.JavaMail.zimbra@raptorengineeringinc.com> Subject: [PATCH v2 4/4] Add myself to MAINTAINERS for Power VFIO support MIME-Version: 1.0 X-Mailer: Zimbra 8.5.0_GA_3042 (ZimbraWebClient - GC110 (Linux)/8.5.0_GA_3042) Thread-Index: AKxleKwX8tG6hTXHlIF77Cy3UrdaGA== Thread-Topic: Add myself to MAINTAINERS for Power VFIO support Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Timothy Pearson --- MAINTAINERS | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 8d5bc223f305..876f96e82d66 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9836,6 +9836,11 @@ F: drivers/crypto/vmx/ghash* F: drivers/crypto/vmx/ppc-xlate.pl F: drivers/crypto/vmx/vmx.c +IBM Power VFIO Support +M: Timothy Pearson +S: Supported +F: drivers/vfio/vfio_iommu_spapr_tce.c + IBM ServeRAID RAID DRIVER S: Orphan F: drivers/scsi/ips.*