From patchwork Thu Aug 15 15:11:17 2024
X-Patchwork-Submitter: Jason Gunthorpe
X-Patchwork-Id: 13764921
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
    iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
    linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
    Sean Christopherson, Tina Zhang
Subject: [PATCH 01/16] genpt: Generic Page Table base API
Date: Thu, 15 Aug 2024 12:11:17 -0300
Message-ID: <1-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>

The generic API is intended to be separated from the implementation of
page table algorithms. It contains only accessors for walking and
manipulating the table and helpers that are useful for building an
implementation. Memory management is part of the implementation.

Using a multi-compilation approach the implementation module would
include headers in this order:

  common.h
  defs_FMT.h
  pt_defs.h
  FMT.h
  pt_common.h
  IMPLEMENTATION.h

Where each compilation unit would have a combination of FMT and
IMPLEMENTATION to produce a per-format per-implementation module.

The API is designed so that the format headers have minimal logic, and
default implementations are provided if the format doesn't include one.
Generally formats provide their code via an inline function using the
pattern:

  static inline FMTpt_XX(..) {}
  #define pt_XX FMTpt_XX

The common code then enforces a function signature so that there is no
drift in function arguments, or accidental polymorphic functions (as has
been slightly troublesome in mm).
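As a minimal sketch of that pattern (the "myfmt" format name, its PA mask
and the entry layout below are hypothetical, purely to illustrate the
convention):

	/* In a hypothetical format header, fmt/myfmt.h: */
	#define MYFMT_PA_MASK GENMASK_ULL(51, 12)	/* assumed entry layout */

	static inline pt_oaddr_t myfmt_pt_table_pa(const struct pt_state *pts)
	{
		return pts->entry & MYFMT_PA_MASK;
	}
	#define pt_table_pa myfmt_pt_table_pa

Because pt_common.h later declares the bodyless prototype
"static inline pt_oaddr_t pt_table_pa(const struct pt_state *pts);", the
#define makes that declaration apply to myfmt_pt_table_pa, so any drift in
the format's argument or return types becomes a compile error.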
Use of function-like #defines is avoided in the format even though many
of the functions are small enough.

Provide kdocs for the API surface.

This is enough to implement the 8 initial format variations with all of
their features:
 * Entries comprised of contiguous blocks of IO PTEs for larger page sizes
 * Multi-level tables, up to 6 levels. Runtime selected top level
 * Runtime variable table level size (ARM's concatenated tables)
 * Expandable top level
 * Optional leaf entries at any level
 * 32 bit/64 bit virtual and output addresses, using every bit
 * Dirty tracking
 * DMA incoherent table walkers
 * Bottom up and top down tables

A basic simple format takes about 200 lines to declare the required
inline functions.

Signed-off-by: Jason Gunthorpe
---
 .clang-format                              |   1 +
 drivers/iommu/Kconfig                      |   2 +
 drivers/iommu/Makefile                     |   1 +
 drivers/iommu/generic_pt/Kconfig           |  22 +
 drivers/iommu/generic_pt/Makefile          |   1 +
 drivers/iommu/generic_pt/pt_common.h       | 311 ++++++++++++++
 drivers/iommu/generic_pt/pt_defs.h         | 276 ++++++++++++
 drivers/iommu/generic_pt/pt_fmt_defaults.h | 109 +++++
 drivers/iommu/generic_pt/pt_iter.h         | 468 +++++++++++++++++++++
 drivers/iommu/generic_pt/pt_log2.h         | 131 ++++++
 include/linux/generic_pt/common.h          | 103 +++++
 11 files changed, 1425 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/Kconfig
 create mode 100644 drivers/iommu/generic_pt/Makefile
 create mode 100644 drivers/iommu/generic_pt/pt_common.h
 create mode 100644 drivers/iommu/generic_pt/pt_defs.h
 create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
 create mode 100644 drivers/iommu/generic_pt/pt_iter.h
 create mode 100644 drivers/iommu/generic_pt/pt_log2.h
 create mode 100644 include/linux/generic_pt/common.h

diff --git a/.clang-format b/.clang-format
index 252820d9c80a15..88b7b42c7170fd 100644
--- a/.clang-format
+++ b/.clang-format
@@ -381,6 +381,7 @@ ForEachMacros:
   - 'for_each_prop_dlc_cpus'
   - 'for_each_prop_dlc_platforms'
   - 'for_each_property_of_node'
+  - 'for_each_pt_level_item'
   - 'for_each_reg'
   - 'for_each_reg_filtered'
   - 'for_each_reloc'
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a82f10054aec86..70ee313fb3fe93 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -509,3 +509,5 @@ config SPRD_IOMMU
	  Say Y here if you want to use the multimedia devices listed above.

 endif # IOMMU_SUPPORT
+
+source "drivers/iommu/generic_pt/Kconfig"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 542760d963ec7c..b978af18b94598 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y += amd/ intel/ arm/ iommufd/
+obj-$(CONFIG_GENERIC_PT) += generic_pt/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
new file mode 100644
index 00000000000000..775a3afb563f72
--- /dev/null
+++ b/drivers/iommu/generic_pt/Kconfig
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menuconfig GENERIC_PT
+	bool "Generic Radix Page Table"
+	default n
+	help
+	  Generic library for building radix tree page tables.
+
+	  Generic PT provides a set of HW page table formats and a common
+	  set of APIs to work with them.
+
+if GENERIC_PT
+config DEBUG_GENERIC_PT
+	bool "Extra debugging checks for GENERIC_PT"
+	default n
+	help
+	  Enable extra run time debugging checks for GENERIC_PT code. This
+	  incurs a runtime cost and should not be enabled for production
+	  kernels.
+
+	  The kunit tests require this to be enabled to get full coverage.
+endif
diff --git a/drivers/iommu/generic_pt/Makefile b/drivers/iommu/generic_pt/Makefile
new file mode 100644
index 00000000000000..f66554cd5c4518
--- /dev/null
+++ b/drivers/iommu/generic_pt/Makefile
@@ -0,0 +1 @@
+# SPDX-License-Identifier: GPL-2.0
diff --git a/drivers/iommu/generic_pt/pt_common.h b/drivers/iommu/generic_pt/pt_common.h
new file mode 100644
index 00000000000000..c5c09ea95850b5
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_common.h
@@ -0,0 +1,311 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * This header is included after the format. It contains definitions
+ * that build on the format definitions to create the basic format API.
+ *
+ * The format API is listed here, with kdocs, in alphabetical order. The
+ * functions without bodies are implemented in the format using the pattern:
+ *    static inline FMTpt_XXX(..) {..}
+ *    #define pt_XXX FMTpt_XXX
+ *
+ * The routines marked "@pts: Entry to query" operate on the entire contiguous
+ * entry and can be called with a pts->index pointing to any sub item that
+ * makes up that entry.
+ *
+ * The header order is:
+ *  pt_defs.h
+ *  fmt_XX.h
+ *  pt_common.h
+ */
+#ifndef __GENERIC_PT_PT_COMMON_H
+#define __GENERIC_PT_PT_COMMON_H
+
+#include "pt_defs.h"
+#include "pt_fmt_defaults.h"
+
+/**
+ * pt_attr_from_entry() - Convert the permission bits back to attrs
+ * @pts: Entry to convert from
+ * @attrs: Resulting attrs
+ *
+ * Fill in the attrs with the permission bits encoded in the current leaf
+ * entry. The attrs should be usable with pt_install_leaf_entry() to
+ * reconstruct the same entry.
+ */
+static inline void pt_attr_from_entry(const struct pt_state *pts,
+				      struct pt_write_attrs *attrs);
+
+/**
+ * pt_can_have_leaf() - True if the current level can have an OA entry
+ * @pts: The current level
+ *
+ * True if the current level can support pt_install_leaf_entry(). A leaf
+ * entry produces an OA.
+ */
+static inline bool pt_can_have_leaf(const struct pt_state *pts);
+
+/**
+ * pt_can_have_table() - True if the current level can have a lower table
+ * @pts: The current level
+ *
+ * Every level except 0 is allowed to have a lower table.
+ */
+static inline bool pt_can_have_table(const struct pt_state *pts)
+{
+	/* No further tables at level 0 */
+	return pts->level > 0;
+}
+
+/**
+ * pt_clear_entry() - Make entries empty (non-present)
+ * @pts: Starting table index
+ * @num_contig_lg2: Number of contiguous items to clear
+ *
+ * Clear a run of entries. A cleared entry will load back as PT_ENTRY_EMPTY
+ * and does not have any effect on table walking. The starting index must be
+ * aligned to num_contig_lg2.
+ */
+static inline void pt_clear_entry(struct pt_state *pts,
+				  unsigned int num_contig_lg2);
+
+/**
+ * pt_entry_num_contig_lg2() - Number of contiguous items for this leaf entry
+ * @pts: Entry to query
+ *
+ * Returns the number of contiguous items this leaf entry spans. If the entry
+ * is a single item it returns ilog2(1).
+ */
+static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts);
+
+/**
+ * pt_entry_oa() - Output Address for this leaf entry
+ * @pts: Entry to query
+ *
+ * Return the output address for the start of the entry. If the entry
+ * is contiguous this returns the same value for each sub-item. Ie:
+ *   log2_mod(pt_entry_oa(), pt_entry_oa_lg2sz()) == 0
+ *
+ * See pt_item_oa().
+ * The format should implement one of these two functions
+ * depending on how it stores the OA's in the table.
+ */
+static inline pt_oaddr_t pt_entry_oa(const struct pt_state *pts);
+
+/**
+ * pt_entry_oa_lg2sz() - Return the size of a OA entry
+ * @pts: Entry to query
+ *
+ * If the entry is not contiguous this returns pt_table_item_lg2sz(),
+ * otherwise it returns the total VA/OA size of the entire contiguous entry.
+ */
+static inline unsigned int pt_entry_oa_lg2sz(const struct pt_state *pts)
+{
+	return pt_entry_num_contig_lg2(pts) + pt_table_item_lg2sz(pts);
+}
+
+/**
+ * pt_entry_oa_full() - Return the full OA for an entry
+ * @pts: Entry to query
+ *
+ * During iteration the first entry could have a VA with an offset from the
+ * natural start of the entry. Return the true full OA considering this VA
+ * offset.
+ */
+/* Include the sub page bits as well */
+static inline pt_oaddr_t pt_entry_oa_full(const struct pt_state *pts)
+{
+	return _pt_entry_oa_fast(pts) |
+	       log2_mod(pts->range->va, pt_entry_oa_lg2sz(pts));
+}
+
+/**
+ * pt_entry_set_write_clean() - Make the entry write clean
+ * @pts: Table index to change
+ *
+ * Modify the entry so that pt_entry_write_is_dirty() == false. The HW will
+ * eventually be notified of this change via a TLB flush, which is the point
+ * that the HW must become synchronized. Any "write dirty" prior to the TLB
+ * flush can be lost, but once the TLB flush completes all writes must make
+ * their entries write dirty.
+ *
+ * The format should alter the entry in a way that is compatible with any
+ * concurrent update from HW.
+ */
+static inline void pt_entry_set_write_clean(struct pt_state *pts);
+
+/**
+ * pt_entry_write_is_dirty() - True if the entry has been written to
+ * @pts: Entry to query
+ *
+ * "write dirty" means that the HW has written to the OA translated
+ * by this entry. If the entry is contiguous then the consolidated
+ * "write dirty" for all the items must be returned.
+ */
+static inline bool pt_entry_write_is_dirty(const struct pt_state *pts);
+
+/**
+ * pt_full_va_prefix() - The top bits of the VA
+ * @common: Page table to query
+ *
+ * This is usually 0, but some formats have their VA space going downward from
+ * PT_VADDR_MAX, and will return that instead. This value must always be
+ * adjusted by struct pt_common max_vasz_lg2.
+ */
+static inline pt_vaddr_t pt_full_va_prefix(const struct pt_common *common);
+
+/**
+ * pt_install_leaf_entry() - Write a leaf entry to the table
+ * @pts: Table index to change
+ * @oa: Output Address for this leaf
+ * @oasz_lg2: Size in VA for this leaf
+ * @attrs: Attributes to modify the table index
+ *
+ * A leaf OA entry will return PT_ENTRY_OA from pt_load_entry(). It translates
+ * the VA indicated by pts to the given OA.
+ *
+ * For a single item non-contiguous entry oasz_lg2 is pt_table_item_lg2sz().
+ * For contiguous it is pt_table_item_lg2sz() + num_contig_lg2.
+ *
+ * This must not be called if pt_can_have_leaf() == false. Contiguous sizes
+ * not indicated by pt_possible_sizes() must not be specified.
+ */
+static inline void pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa,
+					 unsigned int oasz_lg2,
+					 const struct pt_write_attrs *attrs);
+
+/**
+ * pt_install_table() - Write a table entry to the table
+ * @pts: Table index to change
+ * @table_pa: CPU physical address of the lower table's memory
+ * @attrs: Attributes to modify the table index
+ *
+ * A table entry will return PT_ENTRY_TABLE from pt_load_entry(). The table_pa
+ * is the table at pts->level - 1.
+ *
+ * This must not be called if pt_can_have_table() == false.
+ */
+static inline bool pt_install_table(struct pt_state *pts, pt_oaddr_t table_pa,
+				    const struct pt_write_attrs *attrs);
+
+/**
+ * pt_item_oa() - Output Address for this leaf item
+ * @pts: Item to query
+ *
+ * Return the output address for this item. If the item is part of a
+ * contiguous entry it returns the value of the OA for this individual
+ * sub item.
+ *
+ * See pt_entry_oa(). The format should implement one of these two functions
+ * depending on how it stores the OA's in the table.
+ */
+static inline pt_oaddr_t pt_item_oa(const struct pt_state *pts);
+
+/**
+ * pt_load_entry_raw() - Read from the location pts points at into the pts
+ * @pts: Table index to load
+ *
+ * Return the type of entry that was loaded. pts->entry will be filled in with
+ * the entry's content. See pt_load_entry()
+ */
+static inline enum pt_entry_type pt_load_entry_raw(struct pt_state *pts);
+
+/**
+ * pt_max_output_address_lg2() - Return the maximum OA the table format can
+ *	hold
+ * @common: Page table to query
+ *
+ * The value oalog2_to_max_int(pt_max_output_address_lg2()) is the MAX for the
+ * OA. This is the absolute maximum address the table can hold. struct
+ * pt_common max_oasz_lg2 sets a lower dynamic maximum based on HW capability.
+ */
+static inline unsigned int
+pt_max_output_address_lg2(const struct pt_common *common);
+
+/**
+ * pt_num_items_lg2() - Return the number of items in this table level
+ * @pts: The current level
+ *
+ * The number of items in a table level defines the number of bits this level
+ * decodes from the VA. This function is not called for the top level,
+ * so it does not need to compute a special value for the top case. The
+ * result for the top is based on pt_common max_vasz_lg2.
+ *
+ * The value is used as part of determining the table indexes via the
+ * equation:
+ *   log2_mod(log2_div(VA, pt_table_item_lg2sz()), pt_num_items_lg2())
+ */
+static inline unsigned int pt_num_items_lg2(const struct pt_state *pts);
+
+/**
+ * pt_possible_sizes() - Return a bitmap of possible output sizes at this
+ *	level
+ * @pts: The current level
+ *
+ * Each level has a list of possible output sizes that can be installed as
+ * leaf entries. If pt_can_have_leaf() is false returns zero.
+ *
+ * Otherwise the bit in position pt_table_item_lg2sz() should be set
+ * indicating that a non-contiguous single item leaf entry is supported. The
+ * following pt_num_items_lg2() number of bits can be set indicating
+ * contiguous entries are supported. Bit
+ * pt_table_item_lg2sz() + pt_num_items_lg2() must not be set, contiguous
+ * entries cannot span the entire table.
+ *
+ * The OR of pt_possible_sizes() of all levels is the typical bitmask of all
+ * supported sizes in the entire table.
+ */
+static inline pt_vaddr_t pt_possible_sizes(const struct pt_state *pts);
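As an illustration of that encoding (the level geometry below is assumed,
not taken from any real format):

	/*
	 * Illustrative only: a level with 4K items (pt_table_item_lg2sz()
	 * == 12) and 512 items per table (pt_num_items_lg2() == 9) that
	 * supports every contiguous grouping would return:
	 *
	 *	pt_possible_sizes() == GENMASK_ULL(20, 12)
	 *
	 * ie leaf sizes 4K, 8K, ... 1M. Bit 21 (= 12 + 9) stays clear since
	 * a contiguous entry may not cover the whole table.
	 */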
+
+/**
+ * pt_table_item_lg2sz() - Size of a single item entry in this table level
+ * @pts: The current level
+ *
+ * The size of the item specifies how much VA and OA a single item occupies.
+ *
+ * See pt_entry_oa_lg2sz() for the same value including the effect of
+ * contiguous entries.
+ */
+static inline unsigned int pt_table_item_lg2sz(const struct pt_state *pts);
+
+/**
+ * pt_table_oa_lg2sz() - Return the VA/OA size of the entire table
+ * @pts: The current level
+ *
+ * Return the size of VA decoded by the entire table level.
+ */
+static inline unsigned int pt_table_oa_lg2sz(const struct pt_state *pts)
+{
+	return min_t(unsigned int, pts->range->common->max_vasz_lg2,
+		     pt_num_items_lg2(pts) + pt_table_item_lg2sz(pts));
+}
+
+/**
+ * pt_table_pa() - Return the CPU physical address of the table entry
+ * @pts: Entry to query
+ *
+ * This is only ever called on PT_ENTRY_TABLE entries. Must return the same
+ * value passed to pt_install_table().
+ */
+static inline pt_oaddr_t pt_table_pa(const struct pt_state *pts);
+
+/**
+ * pt_table_ptr() - Return a CPU pointer for a table item
+ * @pts: Entry to query
+ *
+ * Same as pt_table_pa() but returns a CPU pointer.
+ */
+static inline struct pt_table_p *pt_table_ptr(const struct pt_state *pts)
+{
+	return __va(pt_table_pa(pts));
+}
+
+/**
+ * pt_load_entry() - Read from the location pts points at into the pts
+ * @pts: Table index to load
+ *
+ * Set the type of entry that was loaded. pts->entry and pts->table_lower
+ * will be filled in with the entry's content.
+ */
+static inline void pt_load_entry(struct pt_state *pts)
+{
+	pts->type = pt_load_entry_raw(pts);
+	if (pts->type == PT_ENTRY_TABLE)
+		pts->table_lower = pt_table_ptr(pts);
+}
+#endif
diff --git a/drivers/iommu/generic_pt/pt_defs.h b/drivers/iommu/generic_pt/pt_defs.h
new file mode 100644
index 00000000000000..80ca5beb286ff4
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_defs.h
@@ -0,0 +1,276 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * This header is included before the format. It contains definitions
+ * that are required to compile the format. The header order is:
+ *  pt_defs.h
+ *  fmt_XX.h
+ *  pt_common.h
+ */
+#ifndef __GENERIC_PT_DEFS_H
+#define __GENERIC_PT_DEFS_H
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include "pt_log2.h"
+
+/* Header self-compile default defines */
+#ifndef pt_write_attrs
+typedef u64 pt_vaddr_t;
+typedef u64 pt_oaddr_t;
+#endif
+
+enum {
+	PT_VADDR_MAX = sizeof(pt_vaddr_t) == 8 ? U64_MAX : U32_MAX,
+	PT_VADDR_MAX_LG2 = sizeof(pt_vaddr_t) == 8 ? 64 : 32,
+	PT_OADDR_MAX = sizeof(pt_oaddr_t) == 8 ? U64_MAX : U32_MAX,
+	PT_OADDR_MAX_LG2 = sizeof(pt_oaddr_t) == 8 ? 64 : 32,
+};
+
+/*
+ * When in debug mode we compile all formats with all features. This allows
+ * the kunit to test the full matrix.
+ */
+#if IS_ENABLED(CONFIG_DEBUG_GENERIC_PT)
+#undef PT_SUPPORTED_FEATURES
+#define PT_SUPPORTED_FEATURES UINT_MAX
+#endif
+
+/*
+ * The format instantiation can have features wired off or on to optimize the
+ * code gen. Supported features are just a reflection of what the current set
+ * of kernel users want to use.
+ */
+#ifndef PT_SUPPORTED_FEATURES
+#define PT_SUPPORTED_FEATURES 0
+#endif
+
+#ifndef PT_FORCE_ENABLED_FEATURES
+#define PT_FORCE_ENABLED_FEATURES 0
+#endif
+
+#define PT_GRANUAL_SIZE (1 << PT_GRANUAL_LG2SZ)
+
+/*
+ * Language used in Generic Page Table
+ *  va: The input address to the page table
+ *  oa: The output address from the page table.
+ *  leaf: An entry that results in an output address. Ie a page pointer
+ *  start/end: An open range, eg [0,0) refers to no VA
+ *  start/last: An inclusive closed range, eg [0,0] refers to the VA 0
+ *  common: The generic page table container struct pt_common
+ *  level: The number of table hops from the lowest leaf. Level 0
+ *         is always a table of only leaves of the least significant VA bits
+ *  top_level: The inclusive highest level of the table. A two level table
+ *             has a top level of 1.
+ *  table: A linear array of entries representing the translation for that
+ *         level.
+ *  entry: A single element in a table
+ *  index: The position in a table of an element: entry = table[index]
+ *  entry_size: The number of bytes of VA the entry translates for.
+ *              If the entry is a table entry then the next table covers
+ *              this size. If the entry is an output address then the
+ *              full OA is: OA | (VA % entry_size)
+ *  contig_count: The number of consecutive entries fused into a single OA.
+ *                entry_size * contig_count is the size of that translation.
+ *                This is often called contiguous pages
+ *  lg2: Indicates the value is encoded as log2, ie 1 << x is the actual
+ *       value
+ */
+
+enum pt_entry_type {
+	PT_ENTRY_EMPTY,
+	/* Entry points to a lower table level */
+	PT_ENTRY_TABLE,
+	/* Entry results in an output address */
+	PT_ENTRY_OA,
+};
+
+struct pt_range {
+	struct pt_common *common;
+	struct pt_table_p *top_table;
+	pt_vaddr_t va;
+	pt_vaddr_t last_va;
+	u8 top_level;
+	u8 max_vasz_lg2;
+};
+
+struct pt_state {
+	struct pt_range *range;
+	struct pt_table_p *table;
+	struct pt_table_p *table_lower;
+	u64 entry;
+	enum pt_entry_type type;
+	unsigned short index;
+	unsigned short end_index;
+	u8 level;
+};
+
+#define PT_SUPPORTED_FEATURE(feature_nr) \
+	(PT_SUPPORTED_FEATURES & BIT(feature_nr))
+
+static inline bool pt_feature(const struct pt_common *common,
+			      unsigned int feature_nr)
+{
+	if (PT_FORCE_ENABLED_FEATURES & BIT(feature_nr))
+		return true;
+	if (!PT_SUPPORTED_FEATURE(feature_nr))
+		return false;
+	return common->features & BIT(feature_nr);
+}
+
+static inline bool pts_feature(const struct pt_state *pts,
+			       unsigned int feature_nr)
+{
+	return pt_feature(pts->range->common, feature_nr);
+}
+
+/*
+ * PT_WARN_ON is used for invariants that the kunit should be checking can't
+ * happen.
+ */
+#if IS_ENABLED(CONFIG_DEBUG_GENERIC_PT)
+#define PT_WARN_ON WARN_ON
+#else
+static inline bool PT_WARN_ON(bool condition)
+{
+	return false;
+}
+#endif
+
+/* These all work on the VA type */
+#define log2_to_int(a_lg2) log2_to_int_t(pt_vaddr_t, a_lg2)
+#define log2_to_max_int(a_lg2) log2_to_max_int_t(pt_vaddr_t, a_lg2)
+#define log2_div(a, b_lg2) log2_div_t(pt_vaddr_t, a, b_lg2)
+#define log2_div_eq(a, b, c_lg2) log2_div_eq_t(pt_vaddr_t, a, b, c_lg2)
+#define log2_mod(a, b_lg2) log2_mod_t(pt_vaddr_t, a, b_lg2)
+#define log2_mod_eq_max(a, b_lg2) log2_mod_eq_max_t(pt_vaddr_t, a, b_lg2)
+#define log2_set_mod(a, val, b_lg2) log2_set_mod_t(pt_vaddr_t, a, val, b_lg2)
+#define log2_set_mod_max(a, b_lg2) log2_set_mod_max_t(pt_vaddr_t, a, b_lg2)
+#define log2_mul(a, b_lg2) log2_mul_t(pt_vaddr_t, a, b_lg2)
+#define log2_ffs(a) log2_ffs_t(pt_vaddr_t, a)
+#define log2_fls(a) log2_fls_t(pt_vaddr_t, a)
+#define log2_ffz(a) log2_ffz_t(pt_vaddr_t, a)
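A few worked results may help make the helper semantics concrete
(illustrative values, assuming pt_vaddr_t is u64 and a 4K granule):

	log2_to_int(12)              == 0x1000
	log2_div(0x1234, 12)         == 0x1	/* 0x1234 / 4096 */
	log2_mod(0x1234, 12)         == 0x234	/* 0x1234 % 4096 */
	log2_set_mod(0x1234, 0, 12)  == 0x1000	/* zero the low 12 bits */
	log2_set_mod_max(0x1234, 12) == 0x1fff	/* set the low 12 bits */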
+
+/*
+ * The full va (fva) versions permit the lg2 value to be == PT_VADDR_MAX_LG2
+ * and generate a useful defined result. The non fva versions will malfunction
+ * at this extreme.
+ */
+static inline pt_vaddr_t fvalog2_div(pt_vaddr_t a, unsigned int b_lg2)
+{
+	if (PT_SUPPORTED_FEATURE(PT_FEAT_FULL_VA) && b_lg2 == PT_VADDR_MAX_LG2)
+		return 0;
+	return log2_div_t(pt_vaddr_t, a, b_lg2);
+}
+
+static inline bool fvalog2_div_eq(pt_vaddr_t a, pt_vaddr_t b,
+				  unsigned int c_lg2)
+{
+	if (PT_SUPPORTED_FEATURE(PT_FEAT_FULL_VA) && c_lg2 == PT_VADDR_MAX_LG2)
+		return true;
+	return log2_div_eq_t(pt_vaddr_t, a, b, c_lg2);
+}
+
+static inline pt_vaddr_t fvalog2_set_mod(pt_vaddr_t a, pt_vaddr_t val,
+					 unsigned int b_lg2)
+{
+	if (PT_SUPPORTED_FEATURE(PT_FEAT_FULL_VA) && b_lg2 == PT_VADDR_MAX_LG2)
+		return val;
+	return log2_set_mod_t(pt_vaddr_t, a, val, b_lg2);
+}
+
+static inline pt_vaddr_t fvalog2_set_mod_max(pt_vaddr_t a, unsigned int b_lg2)
+{
+	if (PT_SUPPORTED_FEATURE(PT_FEAT_FULL_VA) && b_lg2 == PT_VADDR_MAX_LG2)
+		return PT_VADDR_MAX;
+	return log2_set_mod_max_t(pt_vaddr_t, a, b_lg2);
+}
+
+/* These all work on the OA type */
+#define oalog2_to_int(a_lg2) log2_to_int_t(pt_oaddr_t, a_lg2)
+#define oalog2_to_max_int(a_lg2) log2_to_max_int_t(pt_oaddr_t, a_lg2)
+#define oalog2_div(a, b_lg2) log2_div_t(pt_oaddr_t, a, b_lg2)
+#define oalog2_div_eq(a, b, c_lg2) log2_div_eq_t(pt_oaddr_t, a, b, c_lg2)
+#define oalog2_mod(a, b_lg2) log2_mod_t(pt_oaddr_t, a, b_lg2)
+#define oalog2_mod_eq_max(a, b_lg2) log2_mod_eq_max_t(pt_oaddr_t, a, b_lg2)
+#define oalog2_set_mod(a, val, b_lg2) log2_set_mod_t(pt_oaddr_t, a, val, b_lg2)
+#define oalog2_set_mod_max(a, b_lg2) log2_set_mod_max_t(pt_oaddr_t, a, b_lg2)
+#define oalog2_mul(a, b_lg2) log2_mul_t(pt_oaddr_t, a, b_lg2)
+#define oalog2_ffs(a) log2_ffs_t(pt_oaddr_t, a)
+#define oalog2_fls(a) log2_fls_t(pt_oaddr_t, a)
+#define oalog2_ffz(a) log2_ffz_t(pt_oaddr_t, a)
+
+#define pt_cur_table(pts, type) ((type *)((pts)->table))
+
+static inline uintptr_t _pt_top_set(struct pt_table_p *table_mem,
+				    unsigned int top_level)
+{
+	return top_level | (uintptr_t)table_mem;
+}
+
+static inline void pt_top_set(struct pt_common *common,
+			      struct pt_table_p *table_mem,
+			      unsigned int top_level)
+{
+	WRITE_ONCE(common->top_of_table, _pt_top_set(table_mem, top_level));
+}
+
+static inline void pt_top_set_level(struct pt_common *common,
+				    unsigned int top_level)
+{
+	pt_top_set(common, NULL, top_level);
+}
+
+static inline unsigned int pt_top_get_level(const struct pt_common *common)
+{
+	return READ_ONCE(common->top_of_table) % (1 << PT_TOP_LEVEL_BITS);
+}
+
+#endif
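To make the top_of_table packing concrete (the pointer value is invented
for the example): a top table at PA 0x1000 with a three level walk
(top_level == 2) would be stored and unpacked as:

	uintptr_t top = _pt_top_set((struct pt_table_p *)0x1000, 2); /* 0x1002 */

	top % (1 << PT_TOP_LEVEL_BITS);		/* level == 2 */
	top & ~(uintptr_t)PT_TOP_LEVEL_MASK;	/* table == 0x1000 */

Keeping both values in one word is what lets a reader capture the pointer
and the level atomically with a single READ_ONCE().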
diff --git a/drivers/iommu/generic_pt/pt_fmt_defaults.h b/drivers/iommu/generic_pt/pt_fmt_defaults.h
new file mode 100644
index 00000000000000..4532a1146c5eca
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_fmt_defaults.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * Default definitions for formats that don't define these functions.
+ */
+#ifndef __GENERIC_PT_PT_FMT_DEFAULTS_H
+#define __GENERIC_PT_PT_FMT_DEFAULTS_H
+
+#include "pt_defs.h"
+#include
+
+/* Header self-compile default defines */
+#ifndef pt_load_entry_raw
+#include "fmt/amdv1.h"
+#endif
+
+/* If not supplied by the format then contiguous pages are not supported */
+#ifndef pt_entry_num_contig_lg2
+static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts)
+{
+	return ilog2(1);
+}
+
+static inline unsigned short pt_contig_count_lg2(const struct pt_state *pts)
+{
+	return ilog2(1);
+}
+#endif
+
+/* If not supplied by the format then dirty tracking is not supported */
+#ifndef pt_entry_write_is_dirty
+static inline bool pt_entry_write_is_dirty(const struct pt_state *pts)
+{
+	return false;
+}
+
+static inline void pt_entry_set_write_clean(struct pt_state *pts)
+{
+}
+#endif
+
+/*
+ * Format supplies either:
+ *  pt_entry_oa - OA is at the start of a contiguous entry
+ * or
+ *  pt_item_oa - OA is correct for every item in a contiguous entry
+ *
+ * Build the missing one
+ */
+#ifdef pt_entry_oa
+static inline pt_oaddr_t pt_item_oa(const struct pt_state *pts)
+{
+	return pt_entry_oa(pts) |
+	       log2_mul(pts->index, pt_table_item_lg2sz(pts));
+}
+#define _pt_entry_oa_fast pt_entry_oa
+#endif
+
+#ifdef pt_item_oa
+static inline pt_oaddr_t pt_entry_oa(const struct pt_state *pts)
+{
+	return log2_set_mod(pt_item_oa(pts), 0,
+			    pt_entry_num_contig_lg2(pts) +
+				    pt_table_item_lg2sz(pts));
+}
+#define _pt_entry_oa_fast pt_item_oa
+#endif
+
+/*
+ * If not supplied by the format then use the constant
+ * PT_MAX_OUTPUT_ADDRESS_LG2.
+ */
+#ifndef pt_max_output_address_lg2
+static inline unsigned int
+pt_max_output_address_lg2(const struct pt_common *common)
+{
+	return PT_MAX_OUTPUT_ADDRESS_LG2;
+}
+#endif
+
+/*
+ * If not supplied by the format then assume only one contiguous size
+ * determined by pt_contig_count_lg2()
+ */
+#ifndef pt_possible_sizes
+static inline unsigned short pt_contig_count_lg2(const struct pt_state *pts);
+
+/* Return a bitmap of possible leaf page sizes at this level */
+static inline pt_vaddr_t pt_possible_sizes(const struct pt_state *pts)
+{
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+
+	if (!pt_can_have_leaf(pts))
+		return 0;
+	return log2_to_int(isz_lg2) |
+	       log2_to_int(pt_contig_count_lg2(pts) + isz_lg2);
+}
+#endif
+
+/* If not supplied by the format then use 0. */
+#ifndef pt_full_va_prefix
+static inline pt_vaddr_t pt_full_va_prefix(const struct pt_common *common)
+{
+	return 0;
+}
+#endif
+
+#endif
diff --git a/drivers/iommu/generic_pt/pt_iter.h b/drivers/iommu/generic_pt/pt_iter.h
new file mode 100644
index 00000000000000..a36dade62a6f32
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_iter.h
@@ -0,0 +1,468 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * Iterators for Generic Page Table
+ */
+#ifndef __GENERIC_PT_PT_ITER_H
+#define __GENERIC_PT_PT_ITER_H
+
+#include "pt_common.h"
+
+#include
+
+/*
+ * Used to mangle symbols so that backtraces and the symbol table are
+ * understandable. Any non-inlined function should get mangled like this.
+ */
+#define NS(fn) CONCATENATE(PTPFX, fn)
+
+/*
+ * With range->va being the start, set range->last_va and validate that the
+ * range is within the allowed VA span.
+ */
+static inline int pt_check_range(struct pt_range *range)
+{
+	pt_vaddr_t prefix = pt_full_va_prefix(range->common);
+
+	PT_WARN_ON(!range->max_vasz_lg2);
+
+	if (!fvalog2_div_eq(range->va, prefix, range->max_vasz_lg2) ||
+	    !fvalog2_div_eq(range->last_va, prefix, range->max_vasz_lg2))
+		return -ERANGE;
+	return 0;
+}
+
+/*
+ * Adjust the va to match the current index.
+ */
+static inline void pt_index_to_va(struct pt_state *pts)
+{
+	unsigned int table_lg2sz = pt_table_oa_lg2sz(pts);
+	pt_vaddr_t lower_va;
+
+	lower_va = log2_mul(pts->index, pt_table_item_lg2sz(pts));
+	pts->range->va = fvalog2_set_mod(pts->range->va, lower_va,
+					 table_lg2sz);
+}
+
+/*
+ * Add index_count_lg2 number of entries to pts's VA and index. The va will be
+ * adjusted to the end of the contiguous block if it is currently in the
+ * middle.
+ */
+static inline void _pt_advance(struct pt_state *pts,
+			       unsigned int index_count_lg2)
+{
+	pts->index = log2_set_mod(pts->index + log2_to_int(index_count_lg2), 0,
+				  index_count_lg2);
+	pt_index_to_va(pts);
+}
+
+/* True if the current entry is fully enclosed by the range of va to last_va. */
+static inline bool pt_entry_fully_covered(const struct pt_state *pts,
+					  unsigned int oasz_lg2)
+{
+	struct pt_range *range = pts->range;
+
+	/* Range begins at the start of the entry */
+	if (log2_mod(pts->range->va, oasz_lg2))
+		return false;
+
+	/* Range ends past the end of the entry */
+	if (!log2_div_eq(range->va, range->last_va, oasz_lg2))
+		return true;
+
+	/* Range ends at the end of the entry */
+	return log2_mod_eq_max(range->last_va, oasz_lg2);
+}
+
+static inline unsigned int pt_range_to_index(struct pt_state *pts)
+{
+	unsigned int num_entries_lg2 = pt_num_items_lg2(pts);
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+
+	if (pts->range->top_level == pts->level)
+		return log2_div(pts->range->va, isz_lg2);
+	return log2_mod(log2_div(pts->range->va, isz_lg2), num_entries_lg2);
+}
+
+static inline void _pt_iter_first(struct pt_state *pts)
+{
+	unsigned int num_entries_lg2 = pt_num_items_lg2(pts);
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+	struct pt_range *range = pts->range;
+
+	pts->index = pt_range_to_index(pts);
+	if (range->va == range->last_va) {
+		pts->end_index = pts->index + 1;
+		return;
+	}
+
+	/* last_va falls within this table */
+	if (pts->range->top_level == pts->level ||
+	    log2_div_eq(range->va, range->last_va, num_entries_lg2 + isz_lg2)) {
+		pts->end_index = log2_mod(log2_div(range->last_va, isz_lg2),
+					  num_entries_lg2) +
+				 1;
+		return;
+	}
+	pts->end_index = log2_to_int(num_entries_lg2);
+}
+
+static inline bool _pt_iter_load(struct pt_state *pts)
+{
+	if (pts->index == pts->end_index)
+		return false;
+	pt_load_entry(pts);
+	return true;
+}
+
+/* Update pts to go to the next index at this level */
+static inline void pt_next_entry(struct pt_state *pts)
+{
+	if (pts->type == PT_ENTRY_OA)
+		_pt_advance(pts, pt_entry_num_contig_lg2(pts));
+	else
+		_pt_advance(pts, ilog2(1));
+}
+
+#define for_each_pt_level_item(pts) \
+	for (_pt_iter_first(pts); _pt_iter_load(pts); pt_next_entry(pts))
+
+/* Version of pt_load_entry() usable within a walker */
+static inline enum pt_entry_type pt_load_single_entry(struct pt_state *pts)
+{
+	pts->index = pt_range_to_index(pts);
+	pt_load_entry(pts);
+	return pts->type;
+}
+
+static __always_inline struct pt_range _pt_top_range(struct pt_common *common,
+						     uintptr_t top_of_table)
+{
+	struct pt_range range = {
+		.common = common,
+		.top_table =
+			(struct pt_table_p *)(top_of_table &
+					      ~(uintptr_t)PT_TOP_LEVEL_MASK),
+#ifdef PT_FIXED_TOP_LEVEL
+		.top_level = PT_FIXED_TOP_LEVEL,
+#else
+		.top_level = top_of_table % (1 << PT_TOP_LEVEL_BITS),
+#endif
+	};
+	struct pt_state pts = { .range = &range, .level = range.top_level };
+
+	range.max_vasz_lg2 =
+		min_t(unsigned int, common->max_vasz_lg2,
+		      pt_num_items_lg2(&pts) + pt_table_item_lg2sz(&pts));
+	range.va = fvalog2_set_mod(pt_full_va_prefix(common), 0,
+				   range.max_vasz_lg2);
+	range.last_va = fvalog2_set_mod_max(pt_full_va_prefix(common),
+					    range.max_vasz_lg2);
+	return range;
+}
+
+/* Span the whole table */
+static __always_inline struct pt_range pt_top_range(struct pt_common *common)
+{
+	/*
+	 * The top pointer can change without locking. We capture the value
+	 * and its level here and are safe to walk it so long as both values
+	 * are captured without tearing.
+	 */
+	return _pt_top_range(common, READ_ONCE(common->top_of_table));
+}
+
+/* Span a slice of the table starting at the top */
+static __always_inline struct pt_range
+pt_make_range(struct pt_common *common, pt_vaddr_t va, pt_vaddr_t last_va)
+{
+	struct pt_range range =
+		_pt_top_range(common, READ_ONCE(common->top_of_table));
+
+	range.va = va;
+	range.last_va = last_va;
+	return range;
+}
+
+/*
+ * Span a slice of the table starting at a lower table level from an active
+ * walk.
+ */
+static __always_inline struct pt_range
+pt_make_child_range(const struct pt_range *parent, pt_vaddr_t va,
+		    pt_vaddr_t last_va)
+{
+	struct pt_range range = *parent;
+
+	range.va = va;
+	range.last_va = last_va;
+
+	PT_WARN_ON(last_va < va);
+	PT_WARN_ON(pt_check_range(&range));
+
+	return range;
+}
+
+static __always_inline struct pt_state
+pt_init(struct pt_range *range, unsigned int level, struct pt_table_p *table)
+{
+	struct pt_state pts = {
+		.range = range,
+		.table = table,
+		.level = level,
+	};
+	return pts;
+}
+
+static __always_inline struct pt_state pt_init_top(struct pt_range *range)
+{
+	return pt_init(range, range->top_level, range->top_table);
+}
+
+typedef int (*pt_level_fn_t)(struct pt_range *range, void *arg,
+			     unsigned int level, struct pt_table_p *table);
+
+static __always_inline int pt_descend(struct pt_state *pts, void *arg,
+				      pt_level_fn_t fn)
+{
+	int ret;
+
+	if (PT_WARN_ON(!pts->table_lower))
+		return -EINVAL;
+
+	ret = (*fn)(pts->range, arg, pts->level - 1, pts->table_lower);
+	return ret;
+}
+
+/*
+ * Walk over an IOVA range. The caller should have done a validity check, at
+ * least calling pt_check_range(), when building range.
+ */
+static __always_inline int pt_walk_range(struct pt_range *range,
+					 pt_level_fn_t fn, void *arg)
+{
+	return fn(range, arg, range->top_level, range->top_table);
+}
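Typical use by an implementation might look like this sketch (count_fn is
a hypothetical pt_level_fn_t; the PT_MAKE_LEVELS machinery further below
shows how such a function is built):

	u64 count = 0;
	struct pt_range range = pt_make_range(common, va, last_va);
	int ret;

	ret = pt_check_range(&range);
	if (ret)
		return ret;
	return pt_walk_range(&range, count_fn, &count);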
+
+/*
+ * With parent_pts pointing at a table this will prepare to walk over a slice
+ * of the child table of the current entry.
+ */
+static __always_inline int
+pt_walk_child_range(const struct pt_state *parent_pts, pt_vaddr_t va,
+		    pt_vaddr_t last_va, pt_level_fn_t fn, void *arg)
+{
+	struct pt_range range =
+		pt_make_child_range(parent_pts->range, va, last_va);
+
+	if (PT_WARN_ON(!pt_can_have_table(parent_pts)) ||
+	    PT_WARN_ON(!parent_pts->table_lower))
+		return -EINVAL;
+
+	return fn(&range, arg, parent_pts->level - 1, parent_pts->table_lower);
+}
+
+/*
+ * With parent_pts pointing at a table this will prepare to walk over the
+ * entire child table
+ */
+static __always_inline int pt_walk_child_all(const struct pt_state *parent_pts,
+					     pt_level_fn_t fn, void *arg)
+{
+	unsigned int isz_lg2 = pt_table_item_lg2sz(parent_pts);
+
+	return pt_walk_child_range(
+		parent_pts, log2_set_mod(parent_pts->range->va, 0, isz_lg2),
+		log2_set_mod_max(parent_pts->range->va, isz_lg2), fn, arg);
+}
+
+/* Create a range that spans an index range of the current pt_state */
+static inline struct pt_range pt_range_slice(const struct pt_state *pts,
+					     unsigned int start_index,
+					     unsigned int end_index)
+{
+	unsigned int table_lg2sz = pt_table_oa_lg2sz(pts);
+	pt_vaddr_t last_va;
+	pt_vaddr_t va;
+
+	va = fvalog2_set_mod(pts->range->va,
+			     log2_mul(start_index, pt_table_item_lg2sz(pts)),
+			     table_lg2sz);
+	last_va = fvalog2_set_mod(
+		pts->range->va,
+		log2_mul(end_index, pt_table_item_lg2sz(pts)) - 1, table_lg2sz);
+	return pt_make_child_range(pts->range, va, last_va);
+}
+
+/*
+ * Compute the size of the top table. For PT_FEAT_DYNAMIC_TOP this will
+ * compute the top size assuming the table will grow.
+ */
+static inline unsigned int pt_top_memsize_lg2(struct pt_common *common,
+					      uintptr_t top_of_table)
+{
+	struct pt_range range = _pt_top_range(common, top_of_table);
+	struct pt_state pts = pt_init_top(&range);
+	unsigned int num_items_lg2;
+
+	num_items_lg2 = common->max_vasz_lg2 - pt_table_item_lg2sz(&pts);
+	if (range.top_level != PT_MAX_TOP_LEVEL &&
+	    pt_feature(common, PT_FEAT_DYNAMIC_TOP))
+		num_items_lg2 = min(num_items_lg2, pt_num_items_lg2(&pts));
+
+	return num_items_lg2 + ilog2(PT_ENTRY_WORD_SIZE);
+}
+
+static inline unsigned int __pt_compute_best_pgsize(pt_vaddr_t pgsz_bitmap,
+						    pt_vaddr_t va,
+						    pt_vaddr_t last_va,
+						    pt_oaddr_t oa)
+{
+	unsigned int best_pgsz_lg2;
+	unsigned int pgsz_lg2;
+	pt_vaddr_t len = last_va - va + 1;
+	pt_vaddr_t mask;
+
+	if (PT_WARN_ON(va >= last_va))
+		return 0;
+
+	/*
+	 * Given a VA/OA pair the best page size is the largest page size
+	 * where:
+	 *
+	 * 1) VA and OA start at the page. Bitwise this is the count of least
+	 *    significant 0 bits.
+	 *    This also implies that last_va/oa has the same prefix as va/oa.
+	 */
+	mask = va | oa;
+
+	/*
+	 * 2) The page size is not larger than the last_va (length). Since
+	 *    page sizes are always power of two this can't be larger than
+	 *    the largest power of two factor of the length.
+	 */
+	mask |= log2_to_int(log2_fls(len) - 1);
+
+	best_pgsz_lg2 = log2_ffs(mask);
+
+	/* Choose the highest bit <= best_pgsz_lg2 */
+	if (best_pgsz_lg2 < PT_VADDR_MAX_LG2 - 1)
+		pgsz_bitmap = log2_mod(pgsz_bitmap, best_pgsz_lg2 + 1);
+
+	pgsz_lg2 = log2_fls(pgsz_bitmap);
+	if (!pgsz_lg2)
+		return 0;
+
+	pgsz_lg2--;
+
+	PT_WARN_ON(log2_mod(va, pgsz_lg2) != 0);
+	PT_WARN_ON(oalog2_mod(oa, pgsz_lg2) != 0);
+	PT_WARN_ON(va + log2_to_int(pgsz_lg2) - 1 > last_va);
+	PT_WARN_ON(!log2_div_eq(va, va + log2_to_int(pgsz_lg2) - 1, pgsz_lg2));
+	PT_WARN_ON(
+		!oalog2_div_eq(oa, oa + log2_to_int(pgsz_lg2) - 1, pgsz_lg2));
+	return pgsz_lg2;
+}
+
+/*
+ * Compute the largest page size for va, last_va, and oa together and return
+ * it in lg2.
+ */
+static inline unsigned int pt_compute_best_pgsize(struct pt_state *pts,
+						  pt_oaddr_t oa)
+{
+	return __pt_compute_best_pgsize(pt_possible_sizes(pts), pts->range->va,
+					pts->range->last_va, oa);
+}
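A worked example (all values invented): mapping VA [0x200000, 0x600000)
to OA 0x400000 with a format offering 4K and 2M leaves:

	/*
	 * __pt_compute_best_pgsize(BIT(12) | BIT(21),
	 *			    0x200000, 0x5fffff, 0x400000) == 21
	 *
	 * va | oa == 0x600000 limits alignment to 2^21; len == 4M limits
	 * the size to 2^22; the largest size in the bitmap satisfying both
	 * is 2M (lg2 == 21), so the first chunk becomes one 2M leaf.
	 */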
+
+#define _PT_MAKE_CALL_LEVEL2(fn)                                              \
+	static __always_inline int fn(struct pt_range *range, void *arg,      \
+				      unsigned int level,                     \
+				      struct pt_table_p *table)               \
+	{                                                                     \
+		switch (level) {                                              \
+		case 0:                                                       \
+			return CONCATENATE(fn, 0)(range, arg, level, table);  \
+		case 1:                                                       \
+			if (1 > PT_MAX_TOP_LEVEL)                             \
+				break;                                        \
+			return CONCATENATE(fn, 1)(range, arg, level, table);  \
+		case 2:                                                       \
+			if (2 > PT_MAX_TOP_LEVEL)                             \
+				break;                                        \
+			return CONCATENATE(fn, 2)(range, arg, level, table);  \
+		case 3:                                                       \
+			if (3 > PT_MAX_TOP_LEVEL)                             \
+				break;                                        \
+			return CONCATENATE(fn, 3)(range, arg, level, table);  \
+		case 4:                                                       \
+			if (4 > PT_MAX_TOP_LEVEL)                             \
+				break;                                        \
+			return CONCATENATE(fn, 4)(range, arg, level, table);  \
+		default:                                                      \
+			break;                                                \
+		}                                                             \
+		return -EPROTOTYPE;                                           \
+	}
+
+#define _PT_MAKE_CALL_LEVEL(fn)                                               \
+	static __always_inline int fn(struct pt_range *range, void *arg,      \
+				      unsigned int level,                     \
+				      struct pt_table_p *table)               \
+	{                                                                     \
+		static_assert(PT_MAX_TOP_LEVEL <= 5);                         \
+		if (level == 0)                                               \
+			return CONCATENATE(fn, 0)(range, arg, 0, table);      \
+		if (level == 1 || PT_MAX_TOP_LEVEL == 1)                      \
+			return CONCATENATE(fn, 1)(range, arg, 1, table);      \
+		if (level == 2 || PT_MAX_TOP_LEVEL == 2)                      \
+			return CONCATENATE(fn, 2)(range, arg, 2, table);      \
+		if (level == 3 || PT_MAX_TOP_LEVEL == 3)                      \
+			return CONCATENATE(fn, 3)(range, arg, 3, table);      \
+		if (level == 4 || PT_MAX_TOP_LEVEL == 4)                      \
+			return CONCATENATE(fn, 4)(range, arg, 4, table);      \
+		return CONCATENATE(fn, 5)(range, arg, 5, table);              \
+	}
+
+static inline int __pt_make_level_fn_err(struct pt_range *range, void *arg,
+					 unsigned int unused_level,
+					 struct pt_table_p *table)
+{
+	static_assert(PT_MAX_TOP_LEVEL <= 5);
+	return -EPROTOTYPE;
+}
+
+#define __PT_MAKE_LEVEL_FN(fn, level, descend_fn, do_fn)                      \
+	static inline int fn(struct pt_range *range, void *arg,               \
+			     unsigned int unused_level,                       \
+			     struct pt_table_p *table)                        \
+	{                                                                     \
+		return do_fn(range, arg, level, table, descend_fn);           \
+	}
+
+/*
+ * This builds a function call tree that can be fully inlined.
+ * The caller must provide a function body in an __always_inline function:
+ *
+ * static __always_inline int do(struct pt_range *range, void *arg,
+ *				 unsigned int level, struct pt_table_p *table,
+ *				 pt_level_fn_t descend_fn)
+ *
+ * An inline function will be created for each table level that calls do with
+ * a compile time constant for level and a pointer to the next lower function.
+ * This generates an optimally inlined walk where each of the functions sees a
+ * constant level and can codegen the exact constants/etc for that level.
+ *
+ * Note this can produce a lot of code!
+ */
+#define PT_MAKE_LEVELS(fn, do_fn)                                             \
+	__PT_MAKE_LEVEL_FN(CONCATENATE(fn, 0), 0, __pt_make_level_fn_err,     \
+			   do_fn);                                            \
+	__PT_MAKE_LEVEL_FN(CONCATENATE(fn, 1), 1, CONCATENATE(fn, 0), do_fn); \
+	__PT_MAKE_LEVEL_FN(CONCATENATE(fn, 2), 2, CONCATENATE(fn, 1), do_fn); \
+	__PT_MAKE_LEVEL_FN(CONCATENATE(fn, 3), 3, CONCATENATE(fn, 2), do_fn); \
+	__PT_MAKE_LEVEL_FN(CONCATENATE(fn, 4), 4, CONCATENATE(fn, 3), do_fn); \
+	__PT_MAKE_LEVEL_FN(CONCATENATE(fn, 5), 5, CONCATENATE(fn, 4), do_fn); \
+	_PT_MAKE_CALL_LEVEL(fn)
+
+#endif
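As a sketch of how an implementation wires this up (the counting walker
below is hypothetical; it simply tallies present OA entries, and is the
count_fn assumed in the earlier pt_walk_range() sketch):

	static __always_inline int __do_count(struct pt_range *range, void *arg,
					      unsigned int level,
					      struct pt_table_p *table,
					      pt_level_fn_t descend_fn)
	{
		struct pt_state pts = pt_init(range, level, table);
		u64 *count = arg;
		int ret;

		for_each_pt_level_item(&pts) {
			if (pts.type == PT_ENTRY_OA)
				(*count)++;
			else if (pts.type == PT_ENTRY_TABLE) {
				/* Recurse via the constant-level function */
				ret = pt_descend(&pts, arg, descend_fn);
				if (ret)
					return ret;
			}
		}
		return 0;
	}
	PT_MAKE_LEVELS(count_fn, __do_count);

Each generated count_fnN calls __do_count with a compile time constant
level N and with count_fn(N-1) as its descend_fn, so the whole walk
inlines into one function per top level.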
diff --git a/drivers/iommu/generic_pt/pt_log2.h b/drivers/iommu/generic_pt/pt_log2.h
new file mode 100644
index 00000000000000..8eb966e31c3afd
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_log2.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * Helper macros for working with log2 values
+ */
+#ifndef __GENERIC_PT_LOG2_H
+#define __GENERIC_PT_LOG2_H
+#include
+#include
+#include
+
+/* Compute a */
+#define log2_to_int_t(type, a_lg2) ((type)(((type)1) << (a_lg2)))
+static_assert(log2_to_int_t(unsigned int, 0) == 1);
+
+/* Compute a - 1 (aka all low bits set) */
+#define log2_to_max_int_t(type, a_lg2) ((type)(log2_to_int_t(type, a_lg2) - 1))
+
+/* Compute a / b */
+#define log2_div_t(type, a, b_lg2) ((type)(((type)a) >> (b_lg2)))
+static_assert(log2_div_t(unsigned int, 4, 2) == 1);
+
+/*
+ * Compute:
+ *   a / c == b / c
+ * aka the high bits are equal
+ */
+#define log2_div_eq_t(type, a, b, c_lg2) \
+	(log2_div_t(type, (a) ^ (b), c_lg2) == 0)
+static_assert(log2_div_eq_t(unsigned int, 1, 1, 2));
+
+/* Compute a % b */
+#define log2_mod_t(type, a, b_lg2) \
+	((type)(((type)a) & log2_to_max_int_t(type, b_lg2)))
+static_assert(log2_mod_t(unsigned int, 1, 2) == 1);
+
+/*
+ * Compute:
+ *   a % b == b - 1
+ * aka the low bits are all 1s
+ */
+#define log2_mod_eq_max_t(type, a, b_lg2) \
+	(log2_mod_t(type, a, b_lg2) == log2_to_max_int_t(type, b_lg2))
+static_assert(log2_mod_eq_max_t(unsigned int, 3, 2));
+
+/*
+ * Return a value such that:
+ *   a / b == ret / b
+ *   ret % b == val
+ * aka set the low bits to val. val must be < b
+ */
+#define log2_set_mod_t(type, a, val, b_lg2) \
+	((((type)(a)) & (~log2_to_max_int_t(type, b_lg2))) | ((type)(val)))
+static_assert(log2_set_mod_t(unsigned int, 3, 1, 2) == 1);
+
+/*
+ * Return a value such that:
+ *   a / b == ret / b
+ *   ret % b == b - 1
+ * aka set the low bits to all 1s
+ */
+#define log2_set_mod_max_t(type, a, b_lg2) \
+	(((type)(a)) | log2_to_max_int_t(type, b_lg2))
+static_assert(log2_set_mod_max_t(unsigned int, 2, 2) == 3);
+
+/* Compute a * b */
+#define log2_mul_t(type, a, b_lg2) ((type)(((type)a) << (b_lg2)))
+static_assert(log2_mul_t(unsigned int, 2, 2) == 8);
+
+#define _dispatch_sz(type, fn, a) \
+	(sizeof(type) == 4 ? fn##32((u32)a) : fn##64(a))
+
+/*
+ * Return the highest value such that:
+ *   log2_fls(0) == 0
+ *   log2_fls(1) == 1
+ *   a >= log2_to_int(ret - 1)
+ * aka find last set bit
+ */
+static inline unsigned int log2_fls32(u32 a)
+{
+	return fls(a);
+}
+static inline unsigned int log2_fls64(u64 a)
+{
+	return fls64(a);
+}
+#define log2_fls_t(type, a) _dispatch_sz(type, log2_fls, a)
+
+/*
+ * Return the highest value such that:
+ *   log2_ffs(0) == UNDEFINED
+ *   log2_ffs(1) == 0
+ *   log2_mod(a, ret) == 0
+ * aka find first set bit
+ */
+static inline unsigned int log2_ffs32(u32 a)
+{
+	return __ffs(a);
+}
+static inline unsigned int log2_ffs64(u64 a)
+{
+	return __ffs64(a);
+}
+#define log2_ffs_t(type, a) _dispatch_sz(type, log2_ffs, a)
+
+/*
+ * Return the highest value such that:
+ *   log2_ffz(MAX) == UNDEFINED
+ *   log2_ffz(0) == 0
+ *   log2_ffz(1) == 1
+ *   log2_mod(a, ret) == log2_to_max_int(ret)
+ * aka find first zero bit
+ */
+static inline unsigned int log2_ffz32(u32 a)
+{
+	return ffz(a);
+}
+static inline unsigned int log2_ffz64(u64 a)
+{
+	if (sizeof(u64) == sizeof(unsigned long))
+		return ffz(a);
+
+	if ((u32)a == U32_MAX)
+		return log2_ffz32(a >> 32) + 32;
+	return log2_ffz32(a);
+}
+#define log2_ffz_t(type, a) _dispatch_sz(type, log2_ffz, a)
+
+#endif
diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h
new file mode 100644
index 00000000000000..6a865dbf075192
--- /dev/null
+++ b/include/linux/generic_pt/common.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __GENERIC_PT_COMMON_H
+#define __GENERIC_PT_COMMON_H
+
+#include
+#include
+#include
+
+struct pt_table_p;
+
+/**
+ * DOC: Generic Radix Page Table
+ *
+ * Generic Radix Page Table is a set of functions and helpers to efficiently
+ * parse radix style page tables typically seen in HW implementations. The
+ * interface is built to deliver similar code generation to the mm's
+ * pte/pmd/etc system by fully inlining the exact code required to handle
+ * each level.
+ *
+ * Like the MM each format contributes its parsing implementation under
+ * common names and the common code implements an algorithm.
+ *
+ * The system is divided into three logical levels:
+ *  - The page table format and its accessors
+ *  - Generic helpers to make use of the format
+ *  - An implementation (eg IOMMU/DRM/KVM/MM)
+ *
+ * Multiple implementations are supported, the intention is to have the
+ * generic format code be re-usable for whatever specialized implementation
+ * is required.
+ */
+struct pt_common {
+	/**
+	 * @top_of_table: Encodes the table top pointer and the top level in
+	 * a single value. Must use READ_ONCE/WRITE_ONCE to access it. The
+	 * lower bits of the aligned table pointer are used for the level.
+	 */
+	uintptr_t top_of_table;
+	/**
+	 * @max_oasz_lg2: Maximum number of bits the OA can contain. Upper
+	 * bits must be zero. This may be less than what the page table
+	 * format supports, but must not be more. It reflects the active HW
+	 * capability.
+	 */
+	u8 max_oasz_lg2;
+	/**
+	 * @max_vasz_lg2: Maximum number of bits the VA can contain. Upper
+	 * bits are 0 or 1 depending on PT_FEAT_TOP_DOWN. This may be less
+	 * than what the page table format supports, but must not be more.
+	 * When PT_FEAT_DYNAMIC_TOP this reflects the maximum VA capability,
+	 * not the current maximum VA size for the current top level.
+	 */
+	u8 max_vasz_lg2;
+	unsigned int features;
+};
+
+enum {
+	PT_TOP_LEVEL_BITS = 3,
+	PT_TOP_LEVEL_MASK = GENMASK(PT_TOP_LEVEL_BITS - 1, 0),
+};
+
+enum {
+	/*
+	 * Cache flush page table memory before assuming the HW can read it.
+	 * Otherwise a SMP release is sufficient for HW to read it.
+	 */
+	PT_FEAT_DMA_INCOHERENT,
+	/*
+	 * An OA entry can change size while still present. For instance an
+	 * item can be up-sized to a contiguous entry, a contiguous entry
+	 * down-sized to single items, or the size of a contiguous entry
+	 * changed. Changes are hitless to ongoing translation. Otherwise an
+	 * OA has to be made non present and flushed before it can be
+	 * re-established with a new size.
+	 */
+	PT_FEAT_OA_SIZE_CHANGE,
+	/*
+	 * A non-contiguous OA entry can be converted to a populated table
+	 * and vice versa while still present. For instance an OA with a high
+	 * size can be replaced with a table mapping the same OA using a
+	 * lower size. Assuming the table has the same translation as the OA
+	 * then it is hitless to ongoing translation. Otherwise an OA or
+	 * populated table can only be stored over a non-present item.
+	 *
+	 * Note this does not apply to tables which have entirely non present
+	 * items. A non present table can be replaced with an OA or vice
+	 * versa freely so long as nothing is made present without flushing.
+	 */
+	PT_FEAT_OA_TABLE_XCHG,
+	/*
+	 * The table can span the full VA range from 0 to PT_VADDR_MAX.
+	 */
+	PT_FEAT_FULL_VA,
+	/*
+	 * The table's top level can be increased dynamically during map.
+	 * This requires HW support for atomically setting both the table top
+	 * pointer and the starting table level.
+	 */
+	PT_FEAT_DYNAMIC_TOP,
+	PT_FEAT_FMT_START,
+};
+
+#endif
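The features word is indexed by these enum values as a bitmap, so an
implementation might configure and test it like this sketch (the feature
choice is invented for the example):

	common->features = BIT(PT_FEAT_FULL_VA) | BIT(PT_FEAT_DYNAMIC_TOP);

	if (pt_feature(common, PT_FEAT_DYNAMIC_TOP))
		/* map is allowed to grow the top level */;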
From patchwork Thu Aug 15 15:11:18 2024
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
 iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
 linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
 Sean Christopherson, Tina Zhang
Subject: [PATCH 02/16] genpt: Add a specialized allocator for page table
 levels
Date: Thu, 15 Aug 2024 12:11:18 -0300
Message-ID: <2-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
A radix or "page table level" is the memory inside the page table used to
store the data.

Generally formats have a fixed size for these tables and all are uniform. It
is usually PAGE_SIZE of their respective architectures, but not always. Often
the topmost table level has a different size than the rest.

The key function of this allocator is a way to maintain a linked list of the
memory, and an RCU free capability of those lists. Most of the algorithms in
the iommu implementation rely on the linked lists, and the RCU is necessary
for debugfs support.

Use the new folio-ish infrastructure for creating a custom struct page to
store the additional data.

Included in this is some support for managing the CPU cache invalidation
algorithm that ARM uses. The folio is used to record when the table memory
has been DMA mapped along with helpers to DMA API map/unmap the memory.

FIXME: Several of the formats require sub-page sizes (i.e. ARMv7s uses 1k
table pages on a 4k architecture, ARMv8 can use 4k/16k/64k pages regardless
of the CPU PAGE_SIZE). 4:1 can be handled by giving up on the no-allocate RCU
and storing 4 next pointers directly in the folio. The 16:1 case would
require allocating additional memory to hold the metadata, much like
Matthew's proposed memdesc. In a future memdesc world the per-folio metadata
would be allocated to the required size. This logic is not implemented yet.

FIXME:
 - sub-page sizes. Without support it wastes memory but is suitable for
   functional testing.
 - This has become weirdly named
 - This is general, except it does use NR_IOMMU_PAGES

Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/Kconfig    |   8 ++
 drivers/iommu/generic_pt/Makefile   |   4 +
 drivers/iommu/generic_pt/pt_alloc.c | 174 ++++++++++++++++++++++++++++
 drivers/iommu/generic_pt/pt_alloc.h |  98 ++++++++++++++++
 4 files changed, 284 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.c
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.h

diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
index 775a3afb563f72..c22a55b00784d0 100644
--- a/drivers/iommu/generic_pt/Kconfig
+++ b/drivers/iommu/generic_pt/Kconfig
@@ -19,4 +19,12 @@ config DEBUG_GENERIC_PT
 	  kernels.

 	  The kunit tests require this to be enabled to get full coverage.
+
+config IOMMU_PT
+	tristate "IOMMU Page Tables"
+	depends on IOMMU_SUPPORT
+	depends on GENERIC_PT
+	default n
+	help
+	  Generic library for building IOMMU page tables
 endif
diff --git a/drivers/iommu/generic_pt/Makefile b/drivers/iommu/generic_pt/Makefile
index f66554cd5c4518..f7862499642237 100644
--- a/drivers/iommu/generic_pt/Makefile
+++ b/drivers/iommu/generic_pt/Makefile
@@ -1 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+iommu_pt-y := \
+	pt_alloc.o
+
+obj-$(CONFIG_IOMMU_PT) += iommu_pt.o
diff --git a/drivers/iommu/generic_pt/pt_alloc.c b/drivers/iommu/generic_pt/pt_alloc.c
new file mode 100644
index 00000000000000..4ee032161103f3
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_alloc.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include "pt_alloc.h"
+#include "pt_log2.h"
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+
+#define RADIX_MATCH(pg, rl) \
+	static_assert(offsetof(struct page, pg) == \
+		      offsetof(struct pt_radix_meta, rl))
+RADIX_MATCH(flags, __page_flags);
+RADIX_MATCH(rcu_head, rcu_head); /* Ensure bit 0 is clear */
+RADIX_MATCH(mapping, __page_mapping);
+RADIX_MATCH(private, free_next);
+RADIX_MATCH(page_type, __page_type);
+RADIX_MATCH(_refcount, __page_refcount);
+#ifdef CONFIG_MEMCG
+RADIX_MATCH(memcg_data, memcg_data);
+#endif
+#undef RADIX_MATCH
+static_assert(sizeof(struct pt_radix_meta) <= sizeof(struct page));
+
+static inline struct folio *meta_to_folio(struct pt_radix_meta *meta)
+{
+	return (struct folio *)meta;
+}
+
+void *pt_radix_alloc(struct pt_common *owner, int nid, size_t lg2sz, gfp_t gfp)
+{
+	struct pt_radix_meta *meta;
+	unsigned int order;
+	struct folio *folio;
+
+	/*
+	 * FIXME we need to support sub page size tables, eg to allow a 4K
+	 * table on a 64K kernel. This should be done by allocating extra
+	 * memory per page and placing the pointer in the meta. The extra
+	 * memory can contain the additional list heads and rcu's required.
+	 */
+	if (lg2sz <= PAGE_SHIFT)
+		order = 0;
+	else
+		order = lg2sz - PAGE_SHIFT;
+
+	folio = (struct folio *)alloc_pages_node(
+		nid, gfp | __GFP_ZERO | __GFP_COMP, order);
+	if (!folio)
+		return ERR_PTR(-ENOMEM);
+
+	meta = folio_to_meta(folio);
+	meta->owner = owner;
+	meta->free_next = NULL;
+	meta->lg2sz = lg2sz;
+
+	mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES,
+			    log2_to_int_t(long, order));
+	lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE,
+			      log2_to_int_t(long, order));
+
+	return folio_address(folio);
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_alloc, GENERIC_PT);
+
+void pt_radix_free_list(struct pt_radix_list_head *list)
+{
+	struct pt_radix_meta *cur = list->head;
+
+	while (cur) {
+		struct folio *folio = meta_to_folio(cur);
+		unsigned int order = folio_order(folio);
+		long pgcnt = 1UL << order;
+
+		mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES,
+				    -pgcnt);
+		lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, -pgcnt);
+
+		cur = cur->free_next;
+		folio->mapping = NULL;
+		__free_pages(&folio->page, order);
+	}
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_free_list, GENERIC_PT);
+
+void pt_radix_free(void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+	struct pt_radix_list_head list = { .head = meta };
+
+	pt_radix_free_list(&list);
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_free, GENERIC_PT);
+
+static void pt_radix_free_list_rcu_cb(struct rcu_head *head)
+{
+	struct pt_radix_meta *meta =
+		container_of(head, struct pt_radix_meta, rcu_head);
+	struct pt_radix_list_head list = { .head = meta };
+
+	pt_radix_free_list(&list);
+}
+
+void pt_radix_free_list_rcu(struct pt_radix_list_head *list)
+{
+	if (!list->head)
+		return;
+	call_rcu(&list->head->rcu_head, pt_radix_free_list_rcu_cb);
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_free_list_rcu, GENERIC_PT);
+
+/*
+ * For incoherent memory we use the DMA API to manage the cache flushing. This
+ * is a lot of complexity compared to just calling arch_sync_dma_for_device(),
+ * but it is what the existing iommu drivers have been doing.
+ */
+int pt_radix_start_incoherent(void *radix, struct device *dma_dev,
+			      bool still_flushing)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+	dma_addr_t dma;
+
+	dma = dma_map_single(dma_dev, radix, log2_to_int_t(size_t, meta->lg2sz),
+			     DMA_TO_DEVICE);
+	if (dma_mapping_error(dma_dev, dma))
+		return -EINVAL;
+
+	/*
+	 * The DMA API is not allowed to do anything other than DMA direct.
+	 */
+	if (WARN_ON(dma != virt_to_phys(radix))) {
+		dma_unmap_single(dma_dev, dma,
+				 log2_to_int_t(size_t, meta->lg2sz),
+				 DMA_TO_DEVICE);
+		return -EOPNOTSUPP;
+	}
+	meta->incoherent = 1;
+	meta->still_flushing = 1;
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_start_incoherent, GENERIC_PT);
+
+int pt_radix_start_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev)
+{
+	struct pt_radix_meta *cur;
+	int ret;
+
+	for (cur = list->head; cur; cur = cur->free_next) {
+		if (cur->incoherent)
+			continue;
+
+		ret = pt_radix_start_incoherent(
+			folio_address(meta_to_folio(cur)), dma_dev, false);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_start_incoherent_list, GENERIC_PT);
+
+void pt_radix_stop_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev)
+{
+	struct pt_radix_meta *cur;
+
+	for (cur = list->head; cur; cur = cur->free_next) {
+		struct folio *folio = meta_to_folio(cur);
+
+		if (!cur->incoherent)
+			continue;
+		dma_unmap_single(dma_dev, virt_to_phys(folio_address(folio)),
+				 log2_to_int_t(size_t, cur->lg2sz),
+				 DMA_TO_DEVICE);
+	}
+}
+EXPORT_SYMBOL_NS_GPL(pt_radix_stop_incoherent_list, GENERIC_PT);
diff --git a/drivers/iommu/generic_pt/pt_alloc.h b/drivers/iommu/generic_pt/pt_alloc.h
new file mode 100644
index 00000000000000..9751cc63b7d13f
--- /dev/null
+++ b/drivers/iommu/generic_pt/pt_alloc.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __GENERIC_PT_PT_ALLOC_H
+#define __GENERIC_PT_PT_ALLOC_H
+
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/device.h>
+
+/*
+ * Per radix table level allocation meta data. This is very similar in purpose
+ * to the struct ptdesc.
+ *
+ * radix levels have special properties:
+ *  - Always a power of two size
+ *  - Can be threaded on a list without a memory allocation
+ *  - Can be RCU freed without a memory allocation
+ */
+struct pt_radix_meta {
+	unsigned long __page_flags;
+
+	struct rcu_head rcu_head;
+	union {
+		struct {
+			u8 lg2sz;
+			u8 incoherent;
+			u8 still_flushing;
+		};
+		unsigned long __page_mapping;
+	};
+	struct pt_common *owner;
+	struct pt_radix_meta *free_next;
+
+	unsigned int __page_type;
+	atomic_t __page_refcount;
+#ifdef CONFIG_MEMCG
+	unsigned long memcg_data;
+#endif
+};
+
+static inline struct pt_radix_meta *folio_to_meta(struct folio *folio)
+{
+	return (struct pt_radix_meta *)folio;
+}
+
+static inline struct pt_radix_meta *virt_to_meta(const void *addr)
+{
+	return folio_to_meta(virt_to_folio(addr));
+}
+
+struct pt_radix_list_head {
+	struct pt_radix_meta *head;
+};
+
+void *pt_radix_alloc(struct pt_common *owner, int nid, size_t lg2sz,
+		     gfp_t gfp);
+void pt_radix_free(void *radix);
+void pt_radix_free_list(struct pt_radix_list_head *list);
+void pt_radix_free_list_rcu(struct pt_radix_list_head *list);
+
+static inline void pt_radix_add_list(struct pt_radix_list_head *head,
+				     void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+
+	/*
+	 * Push to the front of the list. head->head must point at the new
+	 * entry, not back at the old head that was just chained behind it.
+	 */
+	meta->free_next = head->head;
+	head->head = meta;
+}
+
+int pt_radix_start_incoherent(void *radix, struct device *dma_dev,
+			      bool still_flushing);
+int pt_radix_start_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev);
+void pt_radix_stop_incoherent_list(struct pt_radix_list_head *list,
+				   struct device *dma_dev);
+
+static inline void pt_radix_done_incoherent_flush(void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+
+	/*
+	 * Release/acquire is against the cache flush,
+	 * pt_radix_incoherent_still_flushing()
+	 * must not return 0 until the HW observes the flush.
+	 */
+	smp_store_release(&meta->still_flushing, 0);
+}
+
+static inline bool pt_radix_incoherent_still_flushing(void *radix)
+{
+	struct pt_radix_meta *meta = virt_to_meta(radix);
+
+	return smp_load_acquire(&meta->still_flushing);
+}
+
+#endif
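To make the list threading concrete, a short usage sketch; the caller context
is hypothetical, 'common' stands for the owning pt_common, and error handling
is elided:

	struct pt_radix_list_head free_list = {};
	void *tbl;

	/* One 4k table level (lg2sz == 12) on any NUMA node */
	tbl = pt_radix_alloc(common, NUMA_NO_NODE, 12, GFP_KERNEL);
	if (IS_ERR(tbl))
		return PTR_ERR(tbl);

	/*
	 * Threading needs no extra allocation, the next pointer lives in the
	 * folio's pt_radix_meta.
	 */
	pt_radix_add_list(&free_list, tbl);

	/* Free everything on the list after an RCU grace period */
	pt_radix_free_list_rcu(&free_list);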
From patchwork Thu Aug 15 15:11:19 2024
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
 iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
 linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
 Sean Christopherson, Tina Zhang
Subject: [PATCH 03/16] iommupt: Add the basic structure of the iommu
 implementation
Date: Thu, 15 Aug 2024 12:11:19 -0300
Message-ID: <3-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
The iommu implementation is a single version of the iommu domain operations:
iova_to_phys, map, unmap, read_and_clear_dirty and flushing. It is intended
to be a near drop-in replacement for existing iopt users.

By using the Generic Page Table mechanism it is a single algorithmic
implementation that operates all the different page table formats with
consistent characteristics.

Implement the basic starting point: alloc(), get_info() and deinit().

Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/fmt/iommu_template.h |  37 ++++
 drivers/iommu/generic_pt/iommu_pt.h           | 166 ++++++++++++++++++
 include/linux/generic_pt/iommu.h              |  87 +++++++++
 3 files changed, 290 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
 create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
 create mode 100644 include/linux/generic_pt/iommu.h

diff --git a/drivers/iommu/generic_pt/fmt/iommu_template.h b/drivers/iommu/generic_pt/fmt/iommu_template.h
new file mode 100644
index 00000000000000..d6ca1582e11ca4
--- /dev/null
+++ b/drivers/iommu/generic_pt/fmt/iommu_template.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * Template to build the iommu module and kunit from the format and
+ * implementation headers.
+ *
+ * The format should have:
+ *  #define PT_FMT <name>
+ *  #define PT_SUPPORTED_FEATURES (BIT(PT_FEAT_xx) | BIT(PT_FEAT_yy))
+ * And optionally:
+ *  #define PT_FORCE_ENABLED_FEATURES ..
+ *  #define PT_FMT_VARIANT <suffix>
+ */
+#include <linux/args.h>
+#include <linux/stringify.h>
+
+#ifdef PT_FMT_VARIANT
+#define PTPFX \
+	CONCATENATE(CONCATENATE(PT_FMT, _), CONCATENATE(PT_FMT_VARIANT, _))
+#else
+#define PTPFX CONCATENATE(PT_FMT, _)
+#endif
+
+#define _PT_FMT_H PT_FMT.h
+#define PT_FMT_H __stringify(_PT_FMT_H)
+
+#define _PT_DEFS_H CONCATENATE(defs_, _PT_FMT_H)
+#define PT_DEFS_H __stringify(_PT_DEFS_H)
+
+#include <linux/generic_pt/common.h>
+#include PT_DEFS_H
+#include "../pt_defs.h"
+#include PT_FMT_H
+#include "../pt_common.h"
+
+#include "../iommu_pt.h"
diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
new file mode 100644
index 00000000000000..708beaf5d812f7
--- /dev/null
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * "Templated C code" for implementing the iommu operations for page tables.
+ * This is compiled multiple times, over all the page table formats to pick up
+ * the per-format definitions.
+ */
+#ifndef __GENERIC_PT_IOMMU_PT_H
+#define __GENERIC_PT_IOMMU_PT_H
+
+#include "pt_iter.h"
+#include "pt_alloc.h"
+
+#include <linux/err.h>
+#include <linux/module.h>
+
+struct pt_iommu_collect_args {
+	struct pt_radix_list_head free_list;
+	u8 ignore_mapped : 1;
+};
+
+static int __collect_tables(struct pt_range *range, void *arg,
+			    unsigned int level, struct pt_table_p *table)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	struct pt_iommu_collect_args *collect = arg;
+	int ret;
+
+	if (collect->ignore_mapped && !pt_can_have_table(&pts))
+		return 0;
+
+	for_each_pt_level_item(&pts) {
+		if (pts.type == PT_ENTRY_TABLE) {
+			pt_radix_add_list(&collect->free_list, pts.table_lower);
+			ret = pt_descend(&pts, arg, __collect_tables);
+			if (ret)
+				return ret;
+			continue;
+		}
+		if (pts.type == PT_ENTRY_OA && !collect->ignore_mapped)
+			return -EADDRINUSE;
+	}
+	return 0;
+}
+
+static void NS(get_info)(struct pt_iommu *iommu_table,
+			 struct pt_iommu_info *info)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_range range = pt_top_range(common);
+	struct pt_state pts = pt_init_top(&range);
+	pt_vaddr_t pgsize_bitmap = 0;
+
+	if (pt_feature(common, PT_FEAT_DYNAMIC_TOP)) {
+		for (pts.level = 0; pts.level <= PT_MAX_TOP_LEVEL;
+		     pts.level++) {
+			if (pt_table_item_lg2sz(&pts) >= common->max_vasz_lg2)
+				break;
+			pgsize_bitmap |= pt_possible_sizes(&pts);
+		}
+	} else {
+		for (pts.level = 0; pts.level <= range.top_level; pts.level++)
+			pgsize_bitmap |= pt_possible_sizes(&pts);
+	}
+
+	/* Hide page sizes larger than the maximum OA */
+	info->pgsize_bitmap = oalog2_mod(pgsize_bitmap, common->max_oasz_lg2);
+}
+
+static void NS(deinit)(struct pt_iommu *iommu_table)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_range range = pt_top_range(common);
+	struct pt_iommu_collect_args collect = {
+		.ignore_mapped = true,
+	};
+
+	pt_radix_add_list(&collect.free_list, range.top_table);
+	pt_walk_range(&range, __collect_tables, &collect);
+	if (pt_feature(common, PT_FEAT_DMA_INCOHERENT))
+		pt_radix_stop_incoherent_list(&collect.free_list,
+					      iommu_table->iommu_device);
+	pt_radix_free_list(&collect.free_list);
+}
+
+static const struct pt_iommu_ops NS(ops) = {
+	.iova_to_phys = NS(iova_to_phys),
+	.get_info = NS(get_info),
+	.deinit = NS(deinit),
+};
+
+static int pt_init_common(struct pt_common *common)
+{
+	struct pt_range top_range = pt_top_range(common);
+
+	if (PT_WARN_ON(top_range.top_level > PT_MAX_TOP_LEVEL))
+		return -EINVAL;
+
+	if (top_range.top_level == PT_MAX_TOP_LEVEL ||
+	    common->max_vasz_lg2 == top_range.max_vasz_lg2)
+		common->features &= ~BIT(PT_FEAT_DYNAMIC_TOP);
+
+	if (!pt_feature(common, PT_FEAT_DYNAMIC_TOP))
+		common->max_vasz_lg2 = top_range.max_vasz_lg2;
+
+	if (top_range.max_vasz_lg2 == PT_VADDR_MAX_LG2)
+		common->features |= BIT(PT_FEAT_FULL_VA);
+
+	/* Requested features must match features compiled into this format */
+	if ((common->features & ~(unsigned int)PT_SUPPORTED_FEATURES) ||
+	    (common->features & PT_FORCE_ENABLED_FEATURES) !=
+		    PT_FORCE_ENABLED_FEATURES)
+		return -EOPNOTSUPP;
+
+	/* FIXME generalize the oa/va maximums from HW better in the cfg */
+	if (common->max_oasz_lg2 == 0)
+		common->max_oasz_lg2 = pt_max_output_address_lg2(common);
+	else
+		common->max_oasz_lg2 = min(common->max_oasz_lg2,
+					   pt_max_output_address_lg2(common));
+	return 0;
+}
+
+#define pt_iommu_table_cfg CONCATENATE(pt_iommu_table, _cfg)
+#define pt_iommu_init CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), init)
+int pt_iommu_init(struct pt_iommu_table *fmt_table,
+		  struct
pt_iommu_table_cfg *cfg, gfp_t gfp)
+{
+	struct pt_iommu *iommu_table = &fmt_table->iommu;
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_table_p *table_mem;
+	int ret;
+
+	memset(fmt_table, 0, sizeof(*fmt_table));
+	spin_lock_init(&iommu_table->table_lock);
+	common->features = cfg->features;
+	common->max_vasz_lg2 = PT_MAX_VA_ADDRESS_LG2;
+	iommu_table->iommu_device = cfg->iommu_device;
+	iommu_table->nid = dev_to_node(cfg->iommu_device);
+
+	ret = pt_iommu_fmt_init(fmt_table, cfg);
+	if (ret)
+		return ret;
+
+	ret = pt_init_common(common);
+	if (ret)
+		return ret;
+
+	table_mem = table_alloc_top(common, common->top_of_table, gfp, false);
+	if (IS_ERR(table_mem))
+		return PTR_ERR(table_mem);
+#ifdef PT_FIXED_TOP_LEVEL
+	pt_top_set(common, table_mem, PT_FIXED_TOP_LEVEL);
+#else
+	pt_top_set(common, table_mem, pt_top_get_level(common));
+#endif
+	iommu_table->ops = &NS(ops);
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pt_iommu_init, GENERIC_PT_IOMMU);
+
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(GENERIC_PT);
+
+#endif
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
new file mode 100644
index 00000000000000..d9d3da49dc0fe2
--- /dev/null
+++ b/include/linux/generic_pt/iommu.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __GENERIC_PT_IOMMU_H
+#define __GENERIC_PT_IOMMU_H
+
+#include <linux/types.h>
+#include <linux/spinlock.h>
+
+struct pt_iommu_ops;
+
+/**
+ * DOC: IOMMU Radix Page Table
+ *
+ * The iommu implementation of the Generic Page Table provides an ops struct
+ * that is useful to go with an iommu_domain to serve the DMA API, IOMMUFD and
+ * the generic map/unmap interface.
+ *
+ * This interface uses a caller provided locking approach. The caller must
+ * have a VA range lock concept that prevents concurrent threads from calling
+ * ops on the same VA. Generally the range lock must be at least as large as
+ * a single map call.
+ */
+
+/**
+ * struct pt_iommu - Base structure for iommu page tables
+ *
+ * The format specific struct will include this as the first member.
+ */
+struct pt_iommu {
+	/**
+	 * @ops: Function pointers to access the API
+	 */
+	const struct pt_iommu_ops *ops;
+	/**
+	 * @nid: Node ID to use for table memory allocations. This defaults to
+	 * dev_to_node(iommu_device). The iommu driver may want to set the NID
+	 * to the device's NID, if there are multiple table walkers.
+	 */
+	int nid;
+	/* private: */
+	/* Write lock for pt_common top_of_table */
+	spinlock_t table_lock;
+	struct device *iommu_device;
+};
+
+/**
+ * struct pt_iommu_info - Details about the iommu page table
+ *
+ * Returned from pt_iommu_ops->get_info()
+ */
+struct pt_iommu_info {
+	/**
+	 * @pgsize_bitmap: A bitmask where each set bit indicates
+	 * a page size that can be natively stored in the page table.
+	 */
+	u64 pgsize_bitmap;
+};
+
+/* See the function comments in iommu_pt.c for kdocs */
+struct pt_iommu_ops {
+	/**
+	 * get_info() - Return the pt_iommu_info structure
+	 * @iommu_table: Table to query
+	 *
+	 * Return some basic static information about the page table.
+	 */
+	void (*get_info)(struct pt_iommu *iommu_table,
+			 struct pt_iommu_info *info);
+
+	/**
+	 * deinit() - Undo a format specific init operation
+	 * @iommu_table: Table to destroy
+	 *
+	 * Release all of the memory. The caller must have already removed the
+	 * table from all HW access and all caches.
+	 */
+	void (*deinit)(struct pt_iommu *iommu_table);
+};
+
+static inline void pt_iommu_deinit(struct pt_iommu *iommu_table)
+{
+	iommu_table->ops->deinit(iommu_table);
+}
+
+#endif
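To ground the API shape, a hedged sketch of how a driver might bring a table
up and tear it down. The "amdv1" init name is a hypothetical placeholder
generated by the template's PTPFX, and pt_iommu_table here stands for the
per-format container that embeds struct pt_iommu as .iommu:

	struct pt_iommu_table table;
	struct pt_iommu_table_cfg cfg = {
		.iommu_device = dev,
		.features = BIT(PT_FEAT_DYNAMIC_TOP),
	};
	struct pt_iommu_info info;
	int ret;

	ret = pt_iommu_amdv1_init(&table, &cfg, GFP_KERNEL);
	if (ret)
		return ret;

	table.iommu.ops->get_info(&table.iommu, &info);
	/* eg populate iommu_domain->pgsize_bitmap from info.pgsize_bitmap */

	/* On domain destruction, once HW no longer references the table: */
	pt_iommu_deinit(&table.iommu);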
From patchwork Thu Aug 15 15:11:20 2024
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
 iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
 linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
 Sean Christopherson, Tina Zhang
Subject: [PATCH 04/16] iommupt: Add iova_to_phys op
Date: Thu, 15 Aug 2024 12:11:20 -0300
Message-ID: <4-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
iova_to_phys is a performance path for the DMA API and iommufd. Implement it
using an unrolled, get_user_pages()-like function waterfall scheme. The
implementation itself is fairly trivial.
Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/iommu_pt.h | 58 +++++++++++++++++++++++++++
 include/linux/generic_pt/iommu.h    | 16 ++++++++
 2 files changed, 74 insertions(+)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 708beaf5d812f7..835c84ea716093 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -15,6 +15,64 @@
 #include <linux/err.h>
 #include <linux/module.h>

+static int make_range(struct pt_common *common, struct pt_range *range,
+		      dma_addr_t iova, dma_addr_t len)
+{
+	dma_addr_t last;
+
+	if (unlikely(len == 0))
+		return -EINVAL;
+
+	if (check_add_overflow(iova, len - 1, &last))
+		return -EOVERFLOW;
+
+	*range = pt_make_range(common, iova, last);
+	if (sizeof(iova) > sizeof(range->va)) {
+		if (unlikely(range->va != iova || range->last_va != last))
+			return -EOVERFLOW;
+	}
+	return pt_check_range(range);
+}
+
+static __always_inline int __do_iova_to_phys(struct pt_range *range, void *arg,
+					     unsigned int level,
+					     struct pt_table_p *table,
+					     pt_level_fn_t descend_fn)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	pt_oaddr_t *res = arg;
+
+	switch (pt_load_single_entry(&pts)) {
+	case PT_ENTRY_EMPTY:
+		return -ENOENT;
+	case PT_ENTRY_TABLE:
+		return pt_descend(&pts, arg, descend_fn);
+	case PT_ENTRY_OA:
+		*res = pt_entry_oa_full(&pts);
+		return 0;
+	}
+	return -ENOENT;
+}
+PT_MAKE_LEVELS(__iova_to_phys, __do_iova_to_phys);
+
+static phys_addr_t NS(iova_to_phys)(struct pt_iommu *iommu_table,
+				    dma_addr_t iova)
+{
+	struct pt_range range;
+	pt_oaddr_t res;
+	int ret;
+
+	ret = make_range(common_from_iommu(iommu_table), &range, iova, 1);
+	if (ret)
+		/* phys_addr_t cannot carry an errno, 0 means no translation */
+		return 0;
+
+	ret = pt_walk_range(&range, __iova_to_phys, &res);
+	/* PHYS_ADDR_MAX would be a better error code */
+	if (ret)
+		return 0;
+	return res;
+}
+
 struct pt_iommu_collect_args {
 	struct pt_radix_list_head free_list;
 	u8 ignore_mapped : 1;
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index d9d3da49dc0fe2..5cd56eac14b41d 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -60,6 +60,22 @@ struct pt_iommu_info {

 /* See the function comments in iommu_pt.c for kdocs */
 struct pt_iommu_ops {
+	/**
+	 * iova_to_phys() - Return the output address for the given IOVA
+	 * @iommu_table: Table to query
+	 * @iova: IO virtual address to query
+	 *
+	 * Determine the output address from the given IOVA. @iova may have
+	 * any alignment, the returned physical address will be adjusted with
+	 * any sub page offset.
+	 *
+	 * Context: The caller must hold a read range lock that includes
+	 * @iova.
+	 *
+	 * Return: 0 if there is no translation for the given iova.
+	 */
+	phys_addr_t (*iova_to_phys)(struct pt_iommu *iommu_table,
+				    dma_addr_t iova);
+
 	/**
 	 * get_info() - Return the pt_iommu_info structure
 	 * @iommu_table: Table to query
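A hedged sketch of how a driver's iommu_domain callback might wrap this op;
struct my_domain and my_iova_to_phys are hypothetical names for a container
that embeds both the iommu_domain and the format's table:

	struct my_domain {
		struct iommu_domain domain;
		struct pt_iommu_table table; /* format specific container */
	};

	static phys_addr_t my_iova_to_phys(struct iommu_domain *domain,
					   dma_addr_t iova)
	{
		struct my_domain *md =
			container_of(domain, struct my_domain, domain);

		/* Returns 0 when there is no translation for iova */
		return md->table.iommu.ops->iova_to_phys(&md->table.iommu,
							 iova);
	}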
From patchwork Thu Aug 15 15:11:21 2024
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
 iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
 linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
 Sean Christopherson, Tina Zhang
Subject: [PATCH 05/16] iommupt: Add unmap_pages op
Date: Thu, 15 Aug 2024 12:11:21 -0300
Message-ID: <5-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
unmap_pages removes mappings and any fully contained interior tables from the
given range. This follows the strict iommu_domain API definition where it
does not split up larger page sizes into smaller. The caller must perform
unmap only on ranges created by map, or it must have somehow otherwise
determined safe cut points (eg iommufd/vfio use iova_to_phys to scan for
them). A following patch will provide 'cut' which explicitly does the page
size split if the HW can support it.

unmap is implemented with a recursive descent of the tree. It has an
additional cost of checking that the entire VA range is mapped. If the caller
provides a VA range that spans an entire table item then the table can be
freed as well.

Cache incoherent HW is handled by keeping track of which table memory ranges
need CPU cache invalidation at each level and performing that invalidation
once when ascending from that level.

Currently, the only user I know of for partial unmap is VFIO type 1 v1.0.
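A minimal caller sketch for this op (hypothetical driver context: iommu_table,
iova and len are in scope, the write range lock is held, and the gather is
later flushed, eg via iommu_iotlb_sync()):

	struct iommu_iotlb_gather gather;
	size_t unmapped;

	iommu_iotlb_gather_init(&gather);
	unmapped = iommu_table->ops->unmap_pages(iommu_table, iova, len,
						 &gather);
	if (unmapped != len) {
		/* iova + unmapped is the point where unmapping stopped */
	}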
Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/iommu_pt.h | 143 ++++++++++++++++++++++++++++
 include/linux/generic_pt/iommu.h    |  24 +++++
 2 files changed, 167 insertions(+)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 835c84ea716093..6d1c59b33d02f3 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -14,6 +14,63 @@
 #include
 #include
+#include
+#include
+
+/*
+ * Keep track of what table items are being written to during mutation
+ * operations. When the HW is DMA Incoherent these have to be cache flushed
+ * before they are visible. The write_log batches flushes together and uses a C
+ * cleanup to make sure the table memory is flushed before walking concludes
+ * with that table.
+ *
+ * There are two notable cases that need special flushing:
+ * 1) Installing a table entry requires the new table memory (and all of its
+ *    children) to be flushed.
+ * 2) Installing a shared table requires that other threads using the shared
+ *    table ensure it is flushed before they attempt to use it.
+ */
+struct iommu_write_log {
+	struct pt_range *range;
+	struct pt_table_p *table;
+	unsigned int start_idx;
+	unsigned int last_idx;
+};
+
+static void record_write(struct iommu_write_log *wlog,
+			 const struct pt_state *pts,
+			 unsigned int index_count_lg2)
+{
+	if (!(PT_SUPPORTED_FEATURES & BIT(PT_FEAT_DMA_INCOHERENT)))
+		return;
+
+	if (!wlog->table) {
+		wlog->table = pts->table;
+		wlog->start_idx = pts->index;
+	}
+	wlog->last_idx =
+		max(wlog->last_idx,
+		    log2_set_mod(pts->index + log2_to_int(index_count_lg2), 0,
+				 index_count_lg2));
+}
+
+static void done_writes(struct iommu_write_log *wlog)
+{
+	struct pt_iommu *iommu_table = iommu_from_common(wlog->range->common);
+	dma_addr_t dma;
+
+	if (!pt_feature(wlog->range->common, PT_FEAT_DMA_INCOHERENT) ||
+	    !wlog->table)
+		return;
+
+	dma = virt_to_phys(wlog->table);
+	dma_sync_single_for_device(iommu_table->iommu_device,
+				   dma + wlog->start_idx * PT_ENTRY_WORD_SIZE,
+				   (wlog->last_idx - wlog->start_idx + 1) *
+					   PT_ENTRY_WORD_SIZE,
+				   DMA_TO_DEVICE);
+	wlog->table = NULL;
+}
 
 static int make_range(struct pt_common *common, struct pt_range *range,
 		      dma_addr_t iova, dma_addr_t len)
@@ -102,6 +159,91 @@ static int __collect_tables(struct pt_range *range, void *arg,
 	return 0;
 }
 
+struct pt_unmap_args {
+	struct pt_radix_list_head free_list;
+	pt_vaddr_t unmapped;
+};
+
+static int __unmap_pages(struct pt_range *range, void *arg, unsigned int level,
+			 struct pt_table_p *table)
+{
+	struct iommu_write_log wlog __cleanup(done_writes) = { .range = range };
+	struct pt_state pts = pt_init(range, level, table);
+	struct pt_unmap_args *unmap = arg;
+	int ret;
+
+	for_each_pt_level_item(&pts) {
+		switch (pts.type) {
+		case PT_ENTRY_TABLE: {
+			/* descend will change va */
+			bool fully_covered = pt_entry_fully_covered(
+				&pts, pt_table_item_lg2sz(&pts));
+
+			ret = pt_descend(&pts, arg, __unmap_pages);
+			if (ret)
+				return ret;
+
+			/*
+			 * If the unmapping range fully covers the table then we
+			 * can free it as well. The clear is delayed until we
+			 * succeed in clearing the lower table levels.
+			 */
+			if (fully_covered) {
+				pt_radix_add_list(&unmap->free_list,
+						  pts.table_lower);
+				record_write(&wlog, &pts, ilog2(1));
+				pt_clear_entry(&pts, ilog2(1));
+			}
+			break;
+		}
+		case PT_ENTRY_EMPTY:
+			return -EFAULT;
+		case PT_ENTRY_OA:
+			/*
+			 * The IOMMU API does not require drivers to support
+			 * unmapping parts of pages.
+			 * Only legacy VFIO type 1 v1
+			 * will attempt it after probing for "fine-grained
+			 * superpages" support. There it allows the v1 version
+			 * of VFIO (that nobody uses) to pass more than
+			 * PAGE_SIZE to map.
+			 */
+			if (!pt_entry_fully_covered(&pts,
+						    pt_entry_oa_lg2sz(&pts)))
+				return -EADDRINUSE;
+			unmap->unmapped += log2_to_int(pt_entry_oa_lg2sz(&pts));
+			record_write(&wlog, &pts,
+				     pt_entry_num_contig_lg2(&pts));
+			pt_clear_entry(&pts, pt_entry_num_contig_lg2(&pts));
+			break;
+		}
+	}
+	return 0;
+}
+
+static size_t NS(unmap_pages)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			      dma_addr_t len,
+			      struct iommu_iotlb_gather *iotlb_gather)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_unmap_args unmap = {};
+	struct pt_range range;
+	int ret;
+
+	ret = make_range(common_from_iommu(iommu_table), &range, iova, len);
+	if (ret)
+		return ret;
+
+	pt_walk_range(&range, __unmap_pages, &unmap);
+
+	if (pt_feature(common, PT_FEAT_DMA_INCOHERENT))
+		pt_radix_stop_incoherent_list(&unmap.free_list,
+					      iommu_table->iommu_device);
+
+	/* FIXME into gather */
+	pt_radix_free_list_rcu(&unmap.free_list);
+	return unmap.unmapped;
+}
+
 static void NS(get_info)(struct pt_iommu *iommu_table,
 			 struct pt_iommu_info *info)
 {
@@ -143,6 +285,7 @@ static void NS(deinit)(struct pt_iommu *iommu_table)
 }
 
 static const struct pt_iommu_ops NS(ops) = {
+	.unmap_pages = NS(unmap_pages),
 	.iova_to_phys = NS(iova_to_phys),
 	.get_info = NS(get_info),
 	.deinit = NS(deinit),
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index 5cd56eac14b41d..bdb6bf2c2ebe85 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -8,6 +8,7 @@
 #include
 #include
 
+struct iommu_iotlb_gather;
 struct pt_iommu_ops;
 
 /**
@@ -60,6 +61,29 @@ struct pt_iommu_info {
 
 /* See the function comments in iommu_pt.c for kdocs */
 struct pt_iommu_ops {
+	/**
+	 * unmap_pages() - Make a range of IOVA empty/not present
+	 * @iommu_table: Table to manipulate
+	 * @iova: IO virtual address to start
+	 * @len: Length of the range starting from @iova
+	 * @gather: Gather struct that must be flushed on return
+	 *
+	 * unmap_pages() will remove the translation created by map_pages().
+	 * It cannot subdivide a mapping created by map_pages(), so it should
+	 * be called with IOVA ranges that match those passed to map_pages().
+	 * The IOVA range can aggregate contiguous map_pages() calls so long
+	 * as no individual range is split.
+	 *
+	 * Context: The caller must hold a write range lock that includes
+	 * the whole range.
+	 *
+	 * Returns: Number of bytes of VA unmapped. iova + res will be the
+	 * point unmapping stopped.
+	 */
+	size_t (*unmap_pages)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			      dma_addr_t len,
+			      struct iommu_iotlb_gather *iotlb_gather);
+
 	/**
 	 * iova_to_phys() - Return the output address for the given IOVA
 	 * @iommu_table: Table to query
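An aside on the write_log plumbing in the hunk above: __cleanup() is just the
GCC/Clang cleanup attribute, used so that each walk level batches its CPU
cache flushes and emits them exactly once on scope exit. A standalone toy in
plain C showing the same shape (all names here are invented for
illustration, not the patch's code):

	#include <stdio.h>

	struct write_log {
		int start_idx, last_idx, active;
	};

	/* Runs automatically when the annotated variable leaves scope */
	static void done_writes(struct write_log *wlog)
	{
		if (wlog->active)
			printf("flush entries [%d, %d]\n",
			       wlog->start_idx, wlog->last_idx);
	}

	static void record_write(struct write_log *wlog, int idx)
	{
		if (!wlog->active) {
			wlog->start_idx = idx;
			wlog->active = 1;
		}
		if (idx > wlog->last_idx)
			wlog->last_idx = idx;
	}

	int main(void)
	{
		struct write_log wlog __attribute__((cleanup(done_writes))) = {};
		int i;

		for (i = 3; i <= 7; i++)
			record_write(&wlog, i);	/* five writes... */
		return 0;			/* ...one flush: [3, 7] */
	}
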
From patchwork Thu Aug 15 15:11:22 2024

From: Jason Gunthorpe
To:
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
    iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
    linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
    Sean Christopherson, Tina Zhang
Subject: [PATCH 06/16] iommupt: Add map_pages op
Date: Thu, 15 Aug 2024 12:11:22 -0300
Message-ID: <6-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>

Implement a self-segmenting algorithm for map_pages. This can handle any
valid input VA/length and will automatically break it up into appropriately
sized table entries using a recursive descent algorithm. The appropriate
page size is computed each step using some bitwise calculations.

map is slightly complicated because it has to handle a number of special
edge cases:
 - Overmapping a previously shared table with an OA - requires validating
   and discarding the possibly empty tables
 - Doing the above across an entire to-be-created contiguous entry
 - Installing a new table concurrently with another thread
 - Racing table installation with CPU cache flushing
 - Expanding the table by adding more top levels on the fly

Managing the table installation race is done using a flag in the folio.
When the shared table entry is possibly unflushed the flag will be set.
This works for all pagetable formats but is less efficient than the
io-pgtable-arm-lpae approach of using a SW table bit. It may be interesting
to provide the latter as an option.

Table expansion is a unique feature of AMDv1; this version is quite similar
except we handle racing concurrent lockless map. The table top pointer and
starting level are encoded in a single uintptr_t, which ensures we can
READ_ONCE() without tearing. Any op will do the READ_ONCE() and use that
fixed point as its starting point. Concurrent expansion is handled with a
table global spinlock.

When inserting a new table entry, map checks that the portion of the table
is empty. This includes removing empty interior tables. The approach here
is atomic per entry. Either the new entry is written, or no change is made
to the table. This is done by keeping a list of interior tables to free and
only progressing once the entire space is checked to be empty.

Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/iommu_pt.h | 337 ++++++++++++++++++++++++++++
 include/linux/generic_pt/iommu.h    |  29 +++
 2 files changed, 366 insertions(+)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 6d1c59b33d02f3..a886c94a33eb6c 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -159,6 +159,342 @@ static int __collect_tables(struct pt_range *range, void *arg,
 	return 0;
 }
 
+/* Allocate a table, the empty table will be ready to be installed. */
+static inline struct pt_table_p *_table_alloc(struct pt_common *common,
+					      size_t lg2sz, gfp_t gfp,
+					      bool no_incoherent_start)
+{
+	struct pt_iommu *iommu_table = iommu_from_common(common);
+	struct pt_table_p *table_mem;
+
+	table_mem = pt_radix_alloc(common, iommu_table->nid, lg2sz, gfp);
+	if (pt_feature(common, PT_FEAT_DMA_INCOHERENT) &&
+	    !no_incoherent_start) {
+		int ret = pt_radix_start_incoherent(
+			table_mem, iommu_table->iommu_device, true);
+		if (ret) {
+			pt_radix_free(table_mem);
+			return ERR_PTR(ret);
+		}
+	}
+	return table_mem;
+}
+
+static inline struct pt_table_p *table_alloc_top(struct pt_common *common,
+						 uintptr_t top_of_table,
+						 gfp_t gfp,
+						 bool no_incoherent_start)
+{
+	/*
+	 * FIXME top is special: it doesn't need RCU or the list, and it
+	 * might be small.
+	 * For now just waste a page on it regardless.
+	 */
+	return _table_alloc(common,
+			    max(pt_top_memsize_lg2(common, top_of_table),
+				PAGE_SHIFT),
+			    gfp, no_incoherent_start);
+}
+
+/* Allocate an interior table */
+static inline struct pt_table_p *table_alloc(struct pt_state *pts, gfp_t gfp,
+					     bool no_incoherent_start)
+{
+	return _table_alloc(pts->range->common,
+			    pt_num_items_lg2(pts) + ilog2(PT_ENTRY_WORD_SIZE),
+			    gfp, no_incoherent_start);
+}
+
+static inline int pt_iommu_new_table(struct pt_state *pts,
+				     struct pt_write_attrs *attrs,
+				     bool no_incoherent_start)
+{
+	struct pt_table_p *table_mem;
+
+	/* Given PA/VA/length can't be represented */
+	if (unlikely(!pt_can_have_table(pts)))
+		return -ENXIO;
+
+	table_mem = table_alloc(pts, attrs->gfp, no_incoherent_start);
+	if (IS_ERR(table_mem))
+		return PTR_ERR(table_mem);
+
+	if (!pt_install_table(pts, virt_to_phys(table_mem), attrs)) {
+		pt_radix_free(table_mem);
+		return -EAGAIN;
+	}
+	pts->table_lower = table_mem;
+	return 0;
+}
+
+struct pt_iommu_map_args {
+	struct pt_radix_list_head free_list;
+	struct pt_write_attrs attrs;
+	pt_oaddr_t oa;
+};
+
+/*
+ * Check that the items in a contiguous block are all empty. This will
+ * recursively check any tables in the block to validate they are empty and
+ * accumulate them on the free list. Makes no change on failure. On success
+ * the caller must fill the items.
+ */
+static int pt_iommu_clear_contig(const struct pt_state *start_pts,
+				 struct pt_iommu_map_args *map,
+				 struct iommu_write_log *wlog,
+				 unsigned int pgsize_lg2)
+{
+	struct pt_range range = *start_pts->range;
+	struct pt_state pts =
+		pt_init(&range, start_pts->level, start_pts->table);
+	struct pt_iommu_collect_args collect = {
+		.free_list = map->free_list,
+	};
+	int ret;
+
+	pts.index = start_pts->index;
+	pts.table_lower = start_pts->table_lower;
+	pts.end_index = start_pts->index +
+			log2_to_int(pgsize_lg2 - pt_table_item_lg2sz(&pts));
+	pts.type = start_pts->type;
+	pts.entry = start_pts->entry;
+	while (true) {
+		if (pts.type == PT_ENTRY_TABLE) {
+			ret = pt_walk_child_all(&pts, __collect_tables,
+						&collect);
+			if (ret)
+				return ret;
+			pt_radix_add_list(&collect.free_list,
+					  pt_table_ptr(&pts));
+		} else if (pts.type != PT_ENTRY_EMPTY) {
+			return -EADDRINUSE;
+		}
+
+		_pt_advance(&pts, ilog2(1));
+		if (pts.index == pts.end_index)
+			break;
+		pt_load_entry(&pts);
+	}
+	map->free_list = collect.free_list;
+	return 0;
+}
+
+static int __map_pages(struct pt_range *range, void *arg, unsigned int level,
+		       struct pt_table_p *table)
+{
+	struct iommu_write_log wlog __cleanup(done_writes) = { .range = range };
+	struct pt_state pts = pt_init(range, level, table);
+	struct pt_iommu_map_args *map = arg;
+	int ret;
+
+again:
+	for_each_pt_level_item(&pts) {
+		/*
+		 * FIXME: This allows us to segment on our own, but there is
+		 * probably a better performing way to implement it.
+		 */
+		unsigned int pgsize_lg2 = pt_compute_best_pgsize(&pts, map->oa);
+
+		/*
+		 * Our mapping fully covers this page size of items starting
+		 * here
+		 */
+		if (pgsize_lg2) {
+			if (pgsize_lg2 != pt_table_item_lg2sz(&pts) ||
+			    pts.type != PT_ENTRY_EMPTY) {
+				ret = pt_iommu_clear_contig(&pts, map, &wlog,
+							    pgsize_lg2);
+				if (ret)
+					return ret;
+			}
+
+			record_write(&wlog, &pts, pgsize_lg2);
+			pt_install_leaf_entry(&pts, map->oa, pgsize_lg2,
+					      &map->attrs);
+			pts.type = PT_ENTRY_OA;
+			map->oa += log2_to_int(pgsize_lg2);
+			continue;
+		}
+
+		/* Otherwise we need to descend to a child table */
+
+		if (pts.type == PT_ENTRY_EMPTY) {
+			record_write(&wlog, &pts, ilog2(1));
+			ret = pt_iommu_new_table(&pts, &map->attrs, false);
+			if (ret) {
+				/*
+				 * Racing with another thread installing a table
+				 */
+				if (ret == -EAGAIN)
+					goto again;
+				return ret;
+			}
+			if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT)) {
+				done_writes(&wlog);
+				pt_radix_done_incoherent_flush(pts.table_lower);
+			}
+		} else if (pts.type == PT_ENTRY_TABLE) {
+			/*
+			 * Racing with a shared pt_iommu_new_table()? The other
+			 * thread is still flushing the cache, so we have to
+			 * also flush it to ensure that when our thread's map
+			 * completes our mapping is working.
+			 *
+			 * Using the folio memory means we don't have to rely on
+			 * an available PTE bit to keep track.
+			 */
+			if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) &&
+			    pt_radix_incoherent_still_flushing(pts.table_lower))
+				record_write(&wlog, &pts, ilog2(1));
+		} else {
+			return -EADDRINUSE;
+		}
+
+		/*
+		 * Notice the already present table can possibly be shared with
+		 * another concurrent map.
+		 */
+		ret = pt_descend(&pts, arg, __map_pages);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/*
+ * Add a table to the top, increasing the top level as much as necessary to
+ * encompass range.
+ */
+static int increase_top(struct pt_iommu *iommu_table, struct pt_range *range,
+			struct pt_write_attrs *attrs)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	uintptr_t top_of_table = READ_ONCE(common->top_of_table);
+	uintptr_t new_top_of_table = top_of_table;
+	struct pt_radix_list_head free_list = {};
+	unsigned long flags;
+	int ret;
+
+	while (true) {
+		struct pt_range top_range =
+			_pt_top_range(common, new_top_of_table);
+		struct pt_state pts = pt_init_top(&top_range);
+		struct pt_table_p *table_mem;
+
+		top_range.va = range->va;
+		top_range.last_va = range->last_va;
+
+		if (!pt_check_range(&top_range))
+			break;
+
+		pts.level++;
+		if (pts.level > PT_MAX_TOP_LEVEL ||
+		    pt_table_item_lg2sz(&pts) >= common->max_vasz_lg2) {
+			ret = -ERANGE;
+			goto err_free;
+		}
+
+		table_mem = table_alloc_top(
+			common, _pt_top_set(NULL, pts.level), attrs->gfp, true);
+		if (IS_ERR(table_mem))
+			return PTR_ERR(table_mem);
+		pt_radix_add_list(&free_list, table_mem);
+
+		/* The new table links to the lower table always at index 0 */
+		top_range.va = 0;
+		pts.table_lower = pts.table;
+		pts.table = table_mem;
+		pt_load_single_entry(&pts);
+		PT_WARN_ON(pts.index != 0);
+		pt_install_table(&pts, virt_to_phys(pts.table_lower), attrs);
+		new_top_of_table = _pt_top_set(pts.table, pts.level);
+
+		top_range = _pt_top_range(common, new_top_of_table);
+	}
+
+	if (pt_feature(common, PT_FEAT_DMA_INCOHERENT)) {
+		ret = pt_radix_start_incoherent_list(
+			&free_list, iommu_from_common(common)->iommu_device);
+		if (ret)
+			goto err_free;
+	}
+
+	/*
+	 * top_of_table is write locked by the spinlock, but readers can use
+	 * READ_ONCE() to get the value.
+	 * Since we encode both the level and the pointer in one quanta the
+	 * lockless reader will always see something valid. The HW must be
+	 * updated to the new level under the spinlock before top_of_table is
+	 * updated so that concurrent readers don't map into the new level
+	 * until it is fully functional. If another thread already updated it
+	 * while we were working then throw everything away and try again.
+	 */
+	spin_lock_irqsave(&iommu_table->table_lock, flags);
+	if (common->top_of_table != top_of_table) {
+		spin_unlock_irqrestore(&iommu_table->table_lock, flags);
+		ret = -EAGAIN;
+		goto err_free;
+	}
+
+	/* FIXME update the HW here */
+	WRITE_ONCE(common->top_of_table, new_top_of_table);
+	spin_unlock_irqrestore(&iommu_table->table_lock, flags);
+
+	*range = pt_make_range(common, range->va, range->last_va);
+	PT_WARN_ON(pt_check_range(range));
+	return 0;
+
+err_free:
+	if (pt_feature(common, PT_FEAT_DMA_INCOHERENT))
+		pt_radix_stop_incoherent_list(
+			&free_list, iommu_from_common(common)->iommu_device);
+	pt_radix_free_list(&free_list);
+	return ret;
+}
+
+static int NS(map_pages)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			 phys_addr_t paddr, dma_addr_t len, unsigned int prot,
+			 gfp_t gfp, size_t *mapped,
+			 struct iommu_iotlb_gather *iotlb_gather)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_iommu_map_args map = { .oa = paddr };
+	struct pt_range range;
+	int ret;
+
+	if (WARN_ON(!(prot & (IOMMU_READ | IOMMU_WRITE))))
+		return -EINVAL;
+
+	if ((sizeof(pt_oaddr_t) > sizeof(paddr) && paddr > PT_VADDR_MAX) ||
+	    (common->max_oasz_lg2 != PT_VADDR_MAX_LG2 &&
+	     oalog2_div(paddr, common->max_oasz_lg2)))
+		return -ERANGE;
+
+	ret = pt_iommu_set_prot(common, &map.attrs, prot);
+	if (ret)
+		return ret;
+	map.attrs.gfp = gfp;
+
+again:
+	ret = make_range(common_from_iommu(iommu_table), &range, iova, len);
+	if (pt_feature(common, PT_FEAT_DYNAMIC_TOP) && ret == -ERANGE) {
+		ret = increase_top(iommu_table, &range, &map.attrs);
+		if (ret) {
+			if (ret == -EAGAIN)
+				goto again;
+			return ret;
+		}
+	}
+	if (ret)
+		return ret;
+
+	ret = pt_walk_range(&range, __map_pages, &map);
+
+	/* Bytes successfully mapped */
+	*mapped += map.oa - paddr;
+	return ret;
+}
+
 struct pt_unmap_args {
 	struct pt_radix_list_head free_list;
 	pt_vaddr_t unmapped;
@@ -285,6 +621,7 @@ static void NS(deinit)(struct pt_iommu *iommu_table)
 }
 
 static const struct pt_iommu_ops NS(ops) = {
+	.map_pages = NS(map_pages),
 	.unmap_pages = NS(unmap_pages),
 	.iova_to_phys = NS(iova_to_phys),
 	.get_info = NS(get_info),
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index bdb6bf2c2ebe85..88e45d21dd21c4 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -61,6 +61,35 @@ struct pt_iommu_info {
 
 /* See the function comments in iommu_pt.c for kdocs */
 struct pt_iommu_ops {
+	/**
+	 * map_pages() - Install translation for an IOVA range
+	 * @iommu_table: Table to manipulate
+	 * @iova: IO virtual address to start
+	 * @paddr: Physical/Output address to start
+	 * @len: Length of the range starting from @iova
+	 * @prot: A bitmap of IOMMU_READ/WRITE/CACHE/NOEXEC/MMIO
+	 * @gfp: GFP flags for any memory allocations
+	 * @gather: Gather struct that must be flushed on return
+	 *
+	 * The range starting at IOVA will have paddr installed into it. The
+	 * range is automatically segmented into optimally sized table entries,
+	 * and can have any valid alignment.
+	 *
+	 * On error the caller will probably want to invoke unmap on the range
+	 * from iova up to the amount indicated by @mapped to return the table
+	 * back to an unchanged state.
+	 *
+	 * Context: The caller must hold a write range lock that includes
+	 * the whole range.
+	 *
+	 * Returns: -ERRNO on failure, 0 on success. The number of bytes of VA
+	 * that were mapped are added to @mapped; @mapped is not zeroed first.
+	 */
+	int (*map_pages)(struct pt_iommu *iommu_table, dma_addr_t iova,
+			 phys_addr_t paddr, dma_addr_t len, unsigned int prot,
+			 gfp_t gfp, size_t *mapped,
+			 struct iommu_iotlb_gather *iotlb_gather);
+
 	/**
 	 * unmap_pages() - Make a range of IOVA empty/not present
 	 * @iommu_table: Table to manipulate
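The "bitwise calculations" the commit message refers to boil down to: the
best page size is the largest supported size that divides both the combined
IOVA/OA alignment and the remaining length. A self-contained sketch of that
idea in plain C (this is an illustration only, not the patch's
pt_compute_best_pgsize()):

	#include <stdio.h>

	static int best_pgsize_lg2(unsigned long long size_bitmap,
				   unsigned long long iova,
				   unsigned long long oa,
				   unsigned long long len)
	{
		unsigned long long align = iova | oa;

		/* Discard sizes bigger than the remaining length */
		size_bitmap &= (2ULL << (63 - __builtin_clzll(len))) - 1;
		/* Discard sizes bigger than the iova/oa co-alignment */
		if (align)
			size_bitmap &= (align & -align) |
				       ((align & -align) - 1);
		if (!size_bitmap)
			return -1;
		return 63 - __builtin_clzll(size_bitmap); /* highest set bit */
	}

	int main(void)
	{
		/* Format supports 4K, 2M and 1G entries */
		unsigned long long bm =
			(1ULL << 12) | (1ULL << 21) | (1ULL << 30);

		/* 2M-aligned, 4M long: use 2M entries -> prints 21 */
		printf("%d\n", best_pgsize_lg2(bm, 0x200000, 0x200000,
					       0x400000));
		/* Only 4K-aligned: fall back to 4K -> prints 12 */
		printf("%d\n", best_pgsize_lg2(bm, 0x201000, 0x201000,
					       0x400000));
		return 0;
	}
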
From patchwork Thu Aug 15 15:11:23 2024

From: Jason Gunthorpe
To:
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
    iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
    linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
    Sean Christopherson, Tina Zhang
Subject: [PATCH 07/16] iommupt: Add cut_mapping op
Date: Thu, 15 Aug 2024 12:11:23 -0300
Message-ID: <7-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>

The cut operation allows breaking large pages into smaller pages, with the
specific purpose of allowing unmapping of a portion of the previously large
page. Specifically it ensures that no large page crosses the cut point, and
thus we can successfully unmap starting/ending on the cut point. This is
the operation that VFIO type 1 v1.0 implicitly supported and some iommu
drivers provided internal to their unmap operation.

Implement the cut operation to be hitless: changes to the page table during
cutting must cause zero disruption to any ongoing DMA. This is the
expectation of the VFIO type 1 uAPI. Hitless requires HW support; it is
incompatible with HW requiring break-before-make.

Having it separate from unmap makes it much easier to handle failure cases.
Since cut is fully hitless, even in failure cases, a caller can simply do
nothing if cut fails. Cut during unmap requires dealing with the
potentially nasty case where some of the IOVA range has been unmapped but
some cannot be unmapped.

The operation is generalized into a form that iommufd can use for its
existing area split operation. By placing cuts on the boundaries of the
split we can subdivide an area and maintain all existing semantics even
with large pages in the page table.

Cut is optimal and generates a page table that is equivalent to calling map
twice; that is, the two halves will still use maximal page sizes.

FIXME: requires deeper kunit tests

Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/iommu_pt.h | 263 ++++++++++++++++++++++++++++
 include/linux/generic_pt/iommu.h    |  29 +++
 2 files changed, 292 insertions(+)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index a886c94a33eb6c..4fccdcd58d4ba6 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -232,6 +232,268 @@ struct pt_iommu_map_args {
 	pt_oaddr_t oa;
 };
 
+/*
+ * Build an entire sub tree of tables separate from the active table. This is
+ * used to build an entire mapping and then once complete atomically place it
+ * in the table. This is a simplified version of map since we know there is no
+ * concurrency and all the tables start zero filled.
+ */
+static int __build_tree(struct pt_range *range, void *arg, unsigned int level,
+			struct pt_table_p *table)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	struct pt_iommu_map_args *build = arg;
+	int ret;
+
+	for_each_pt_level_item(&pts) {
+		unsigned int pgsize_lg2 =
+			pt_compute_best_pgsize(&pts, build->oa);
+
+		if (pgsize_lg2) {
+			/*
+			 * Private population can not see table entries other
+			 * than 0.
+			 */
+			if (PT_WARN_ON(pts.type != PT_ENTRY_EMPTY))
+				return -EADDRINUSE;
+
+			pt_install_leaf_entry(&pts, build->oa, pgsize_lg2,
+					      &build->attrs);
+			pts.type = PT_ENTRY_OA;
+			build->oa += log2_to_int(pgsize_lg2);
+			continue;
+		}
+
+		if (pts.type == PT_ENTRY_EMPTY) {
+			/* start_incoherent is done after the table is filled */
+			ret = pt_iommu_new_table(&pts, &build->attrs, true);
+			if (ret)
+				return ret;
+			pt_radix_add_list(&build->free_list, pts.table_lower);
+		} else if (PT_WARN_ON(pts.type != PT_ENTRY_TABLE)) {
+			return -EINVAL;
+		}
+
+		ret = pt_descend(&pts, arg, __build_tree);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/*
+ * Replace the OA entry parent_pts points at with a tree of OA entries. The
+ * tree is organized so that parent_pts->va is a cut point. The created
+ * mappings will still have optimized page sizes within the cut point.
+ */
+static int replace_cut_table(struct pt_state *parent_pts,
+			     const struct pt_write_attrs *parent_attrs)
+{
+	struct pt_common *common = parent_pts->range->common;
+	struct pt_iommu_map_args build = {
+		.attrs = *parent_attrs,
+		.oa = pt_entry_oa(parent_pts),
+	};
+	pt_vaddr_t cut_start_va = parent_pts->range->va;
+	pt_vaddr_t entry_start_va =
+		log2_set_mod(cut_start_va, 0, pt_table_item_lg2sz(parent_pts));
+	pt_vaddr_t entry_last_va =
+		log2_set_mod_max(cut_start_va, pt_table_item_lg2sz(parent_pts));
+	struct pt_table_p *table_mem;
+	int ret;
+
+	if (unlikely(!pt_can_have_table(parent_pts)))
+		return -ENXIO;
+
+	if (PT_WARN_ON(entry_start_va == cut_start_va))
+		return -ENXIO;
+
+	if (!pts_feature(parent_pts, PT_FEAT_OA_TABLE_XCHG))
+		return -EOPNOTSUPP;
+
+	table_mem = table_alloc(parent_pts, parent_attrs->gfp, true);
+	if (IS_ERR(table_mem))
+		return PTR_ERR(table_mem);
+	pt_radix_add_list(&build.free_list, table_mem);
+	parent_pts->table_lower = table_mem;
+
+	/* Fill from the start of the table to the cut point */
+	ret = pt_walk_child_range(parent_pts, entry_start_va, cut_start_va - 1,
+				  __build_tree, &build);
+	if (ret)
+		goto err_free;
+
+	/* Fill from the cut point to the end of the table */
+	ret = pt_walk_child_range(parent_pts, cut_start_va, entry_last_va,
+				  __build_tree, &build);
+	if (ret)
+		goto err_free;
+
+	/*
+	 * Avoid double flushing when building a tree privately. All the tree
+	 * memory is initialized now so flush it before installing it. This
+	 * thread is the exclusive owner of the item being split so we don't
+	 * need to worry about still flushing.
+	 */
+	if (pt_feature(common, PT_FEAT_DMA_INCOHERENT)) {
+		ret = pt_radix_start_incoherent_list(
+			&build.free_list,
+			iommu_from_common(common)->iommu_device);
+		if (ret)
+			goto err_free;
+	}
+
+	if (!pt_install_table(parent_pts, virt_to_phys(table_mem),
+			      parent_attrs)) {
+		/*
+		 * This only fails if the table entry changed while we were
+		 * building the sub tree, which would be a locking violation.
+ */ + WARN(true, "Locking violating during cut"); + ret = -EINVAL; + goto err_free; + } + + return 0; + +err_free: + /* + * None of the allocated memory was ever reachable outside this function + */ + if (pt_feature(common, PT_FEAT_DMA_INCOHERENT)) + pt_radix_stop_incoherent_list( + &build.free_list, + iommu_from_common(common)->iommu_device); + pt_radix_free_list(&build.free_list); + parent_pts->table_lower = NULL; + return ret; +} + +static void __replace_cut_entry(const struct pt_state *parent_pts, + struct pt_iommu_map_args *replace, + unsigned int start_index, + unsigned int end_index) +{ + struct pt_range range = + pt_range_slice(parent_pts, start_index, end_index); + struct pt_state pts = + pt_init(&range, parent_pts->level, parent_pts->table); + + if (start_index == end_index) + return; + + for_each_pt_level_item(&pts) { + unsigned int pgsize_lg2 = + pt_compute_best_pgsize(&pts, replace->oa); + + if (PT_WARN_ON(pts.type != PT_ENTRY_OA) || + PT_WARN_ON(!pgsize_lg2)) + continue; + + pt_install_leaf_entry(&pts, replace->oa, pgsize_lg2, + &replace->attrs); + pts.type = PT_ENTRY_OA; + replace->oa += log2_to_int(pgsize_lg2); + } +} + +/* + * This is a little more complicated than just clearing a contig bit because + * some formats have multi-size contigs and we still want to use best page sizes + * for each half of the cut. So we remap over the current values with new + * correctly sized entries. + */ +static void replace_cut_entry(const struct pt_state *parent_pts, + const struct pt_write_attrs *parent_attrs) +{ + struct pt_iommu_map_args replace = { + .attrs = *parent_attrs, + .oa = pt_entry_oa(parent_pts), + }; + unsigned int start_index = log2_set_mod( + parent_pts->index, 0, pt_entry_num_contig_lg2(parent_pts)); + unsigned int cut_index = parent_pts->index; + unsigned int last_index = log2_set_mod( + parent_pts->index, + log2_to_int(pt_entry_num_contig_lg2(parent_pts)) - 1, + pt_entry_num_contig_lg2(parent_pts)); + + pt_attr_from_entry(parent_pts, &replace.attrs); + + if (!log2_mod(parent_pts->range->va, pt_table_item_lg2sz(parent_pts))) { + /* The cut start at an item boundary */ + __replace_cut_entry(parent_pts, &replace, start_index, + cut_index); + __replace_cut_entry(parent_pts, &replace, cut_index, + last_index + 1); + } else { + /* cut_index will be replaced by a table */ + if (start_index != cut_index) + __replace_cut_entry(parent_pts, &replace, start_index, + cut_index - 1); + __replace_cut_entry(parent_pts, &replace, cut_index, + cut_index + 1); + if (cut_index != last_index) + __replace_cut_entry(parent_pts, &replace, cut_index + 1, + last_index + 1); + } +} + +static int __cut_mapping(struct pt_range *range, void *arg, unsigned int level, + struct pt_table_p *table) +{ + struct iommu_write_log wlog __cleanup(done_writes) = { .range = range }; + struct pt_state pts = pt_init(range, level, table); + const struct pt_write_attrs *cut_attrs = arg; + +again: + switch (pt_load_single_entry(&pts)) { + case PT_ENTRY_EMPTY: + return -ENOENT; + case PT_ENTRY_TABLE: + return pt_descend(&pts, arg, __cut_mapping); + case PT_ENTRY_OA: { + /* This entry's OA starts at the cut point, all done */ + if (!log2_mod(range->va, pt_entry_oa_lg2sz(&pts))) + return 0; + + record_write(&wlog, &pts, pt_entry_num_contig_lg2(&pts)); + + /* This is a contiguous entry, split it down */ + if (pt_entry_num_contig_lg2(&pts) != ilog2(1)) { + if (!pts_feature(&pts, PT_FEAT_OA_SIZE_CHANGE)) + return -EOPNOTSUPP; + replace_cut_entry(&pts, cut_attrs); + goto again; + } + + /* + * Need to replace an OA with a 
+		 * table. The new table will map the same OA as the table
+		 * item, just with smaller granularity.
+		 */
+		return replace_cut_table(&pts, cut_attrs);
+	}
+	}
+	return -ENOENT;
+}
+
+/*
+ * FIXME this is currently incompatible with active dirty tracking as we
+ * don't take care to capture or propagate the dirty bits during the mutation.
+ */
+static int NS(cut_mapping)(struct pt_iommu *iommu_table, dma_addr_t cut_iova,
+			   gfp_t gfp)
+{
+	struct pt_write_attrs cut_attrs = {
+		.gfp = gfp,
+	};
+	struct pt_range range;
+	int ret;
+
+	ret = make_range(common_from_iommu(iommu_table), &range, cut_iova, 1);
+	if (ret)
+		return ret;
+
+	return pt_walk_range(&range, __cut_mapping, &cut_attrs);
+}
+
 /*
  * Check that the items in a contiguous block are all empty. This will
  * recursively check any tables in the block to validate they are empty and
@@ -624,6 +886,7 @@ static const struct pt_iommu_ops NS(ops) = {
 	.map_pages = NS(map_pages),
 	.unmap_pages = NS(unmap_pages),
 	.iova_to_phys = NS(iova_to_phys),
+	.cut_mapping = NS(cut_mapping),
 	.get_info = NS(get_info),
 	.deinit = NS(deinit),
 };
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index 88e45d21dd21c4..d83f293209fa77 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -113,6 +113,35 @@ struct pt_iommu_ops {
 			      dma_addr_t len,
 			      struct iommu_iotlb_gather *iotlb_gather);
 
+	/**
+	 * cut_mapping() - Split a mapping
+	 * @iommu_table: Table to manipulate
+	 * @iova: IO virtual address to cut at
+	 * @gfp: GFP flags for any memory allocations
+	 *
+	 * If map was used on [iova_a, iova_b] then unmap must be used on the
+	 * same interval. When called twice this is useful to unmap a portion
+	 * of a larger mapping.
+	 *
+	 * cut_mapping() changes the page table so that unmap of both:
+	 *    [iova_a, iova_c - 1]
+	 *    [iova_c, iova_b]
+	 * will work.
+	 *
+	 * In practice this is done by breaking up large pages into smaller
+	 * pages so that no large page crosses iova_c.
+	 *
+	 * cut_mapping() works to ensure all page sizes that don't cross the
+	 * cut remain at the optimal sizes.
+	 *
+	 * Context: The caller must hold a write range lock that includes the
+	 * entire range used with the map that contains iova.
+	 *
+	 * Returns: -ERRNO on failure, 0 on success.
+	 */
+	int (*cut_mapping)(struct pt_iommu *iommu_table, dma_addr_t cut_iova,
+			   gfp_t gfp);
+
 	/**
 	 * iova_to_phys() - Return the output address for the given IOVA
 	 * @iommu_table: Table to query
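To tie the two ops together, a hypothetical fragment (same caveat as
earlier: it assumes an `ops` member on pt_iommu, and the helper itself is
invented) showing how a caller could unmap just the tail of a larger
mapping by cutting first:

	/* Hypothetical helper: unmap [cut, end] out of a larger mapping. */
	static size_t demo_unmap_tail(struct pt_iommu *tbl, dma_addr_t cut,
				      dma_addr_t end,
				      struct iommu_iotlb_gather *gather)
	{
		int ret;

		/*
		 * cut_mapping is hitless: on failure the table is simply
		 * left unchanged, so there is nothing to undo.
		 */
		ret = tbl->ops->cut_mapping(tbl, cut, GFP_KERNEL);
		if (ret)
			return 0;

		/* Both halves now meet exactly at 'cut'; drop the tail */
		return tbl->ops->unmap_pages(tbl, cut, end - cut + 1, gather);
	}
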
From patchwork Thu Aug 15 15:11:24 2024

From: Jason Gunthorpe
To:
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig,
    iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org,
    linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts,
    Sean Christopherson, Tina Zhang
Subject: [PATCH 08/16] iommupt: Add read_and_clear_dirty op
Date: Thu, 15 Aug 2024 12:11:24 -0300
Message-ID: <8-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CH3PR12MB7763.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(366016)(1800799024)(7416014)(376014);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: Ir7ZpkrRUOqHNZea6VR2AyfWPQHG1IwReI25VUlwWUB0fD/XlSQS7UmLhTzSCpocQeysA6cn9fQ3KnJgrHHMNfu05EdxRI5uspOOkLzkoGwG+zbaW6dnCprQzhtdjOxicIQo3u08QnXHgfK4FtteF7vD6uY/134c5Isq0RQjs1p8IWfwOkzbKHfQzaoRw4d2717zXYDIXKGQKEXjIqHXUactvrAF3zDihyoGtU/rTabSPd0uFOfolXZTVK5RhRB/mRDGupqy4OVXBcs5ORftOwgvQtryPxlHhv0ObnWhfqqplLkWHcKojrI1S/nXe0JeBLi2dY0GIMSDO4VrDdGv2dvmIez0BABxccrhHV/sokbm4MbTvC7SYOK62PjF1eXbM4wyLkCSL/M8gQVdbbAWOhY0Kg2IMvKCQS+uiWVwYG6KDacrRZr+WYvtH/xi2/34AthRw1JySsBp10dtlSA+dGsdn4eAdahopvsO3C4G3JssSpMqnFJA85mxN1JMtBQ0LHOGbZnb/mW0oWgJxBV8jnRxEdRkCq4GSjmVn02mRjoGUxP27/5Tq2oEeSxw60W0KRw4Cn0Nof7sxnEI3Q7BEGlQK93qhsvlBD7Wshu/CUbcXChAn50pMgUL4FCCstEsMSa1IX+ZQcVqF0wFBX7rkWKTwCjdUWSlwiHNmCq/NdszcnjHdghWQ3C2OvAJlRjVFKEaAgrFnBwC6QhYxHH5iWjIWGDEO1ESg6iIxPF+7HNrdSXkljQrI9YJV+1mMfSNEwZ/hZr7Vq39l98CWPfqbgeadVxhhrNHN5rROpABD36qO9PZVR+uhmBL8KyFWhdj3RlhhUS3rwRBvO4jWsqrVrlxgIjSKo6lOLGnoUQespTLXjOIGd4CjJe2EixDG55eRj9dAJm/T3waMPZI2KvEdZ7L9VdhEmnl9SoX7id00FIH/tf0l8NgMqHil1d11LpDq2yMh4r5pQeJ3eRUlVhuAOvsZI1ErBYXzmOxl+5cSUGrTbU7KIivsiBtoHf1bysIPsjgGhuxycl37mbLum/j9C0jJP+k/swHBB+qEblNyMcdKVyDdJwz6GckSvstIlzjnP3KJ/keTshzT5nMoHYE+NhQ84tANwxYkbZ3eFjqrG7EGD+QDouAcmZlaG6fdM6XbKLRU7XYZI4sjn3LAjt9K5xTKitxbjIj7ZnmwGO26k6cRdZUSM+SJJ4Pmmyiy4wFewYInWLe21+gs+b/3v4drS9XzSAVoirkeEJ8piD324sABJbhfP5+37HRATAb6vPKGl8NG7IzhFCuv1Ygd9GzYlPZ/eT//KkYL5a88E8o4cbCrGZRU70EsZGjBENSNeNC0yq49BW5w1Ml7aIWAC6H73UPe5ZLMTYfZVS0d7vZXw8ckzgH+j7U+hPOzDJnv5tQoB8LBVQIvE6D/ixPOWrI9JtxnkxGBRY+DLfYqpvZd5SlihBRIZi0b9gKht8q73gz7sUjGafYD7mlWGmIamUHzDAdTkShla+pDb+uwu02BU45izip5glnbua/6O3BL8voRUHLtGwvWL9K71WYQjlCjvISPgRh3fAcMQKneZQ2law= X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-Network-Message-Id: 6e4e0405-7df8-4d98-055f-08dcbd3c8d29 X-MS-Exchange-CrossTenant-AuthSource: CH3PR12MB7763.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Aug 2024 15:11:34.4977 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: I7X6qejelVoqfoseLkmQuK1oXVBfGoK8hA/OonVnJvvblp49TVse+P18MA3cR990 X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN7PR12MB8146 IOMMU HW now supports updating a dirty bit in an entry when a DMA writes to the entry's VA range. iommufd has a uAPI to read and clear the dirty bits from the tables. This is a trivial recrusive descent algorithm unwound into a function call waterfall. The format needs a function to tell if a contiguous entry is dirty, and a function to clear a contiguous entry back to clean. 
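As an illustration only, the two hooks a format must supply could look
roughly like the sketch below, assuming a made-up format whose 64-bit
entries keep the HW-set dirty flag in bit 6 (EX_DIRTY, the bit position,
and the helper names are inventions for this example, not any in-tree
format definition):

#include <linux/atomic.h>
#include <linux/bits.h>

#define EX_DIRTY BIT_ULL(6)	/* assumed dirty bit position, example only */

/* Report whether HW marked this entry dirty since the last clear */
static inline bool ex_entry_write_is_dirty(u64 entry)
{
	return entry & EX_DIRTY;
}

/* Return the entry to clean without disturbing any other bits */
static inline void ex_entry_set_write_clean(u64 *entryp)
{
	u64 old = READ_ONCE(*entryp);

	/* The IOMMU can set the bit concurrently, so clear it atomically */
	while (!try_cmpxchg64(entryp, &old, old & ~EX_DIRTY))
		;
}

A real format would implement these against its pt_state accessors so that
the contiguous-entry cases mentioned above are handled as well.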
FIXME: needs kunit testing

Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/iommu_pt.h | 63 +++++++++++++++++++++++++++++
 include/linux/generic_pt/iommu.h    | 22 ++++++++++
 2 files changed, 85 insertions(+)

diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
index 4fccdcd58d4ba6..79b0ecbdc1adf6 100644
--- a/drivers/iommu/generic_pt/iommu_pt.h
+++ b/drivers/iommu/generic_pt/iommu_pt.h
@@ -130,6 +130,64 @@ static phys_addr_t NS(iova_to_phys)(struct pt_iommu *iommu_table,
 	return res;
 }
 
+struct pt_iommu_dirty_args {
+	struct iommu_dirty_bitmap *dirty;
+	unsigned int flags;
+};
+
+/* FIXME this is a bit big on formats with contig.. */
+static __always_inline int
+__do_read_and_clear_dirty(struct pt_range *range, void *arg, unsigned int level,
+			  struct pt_table_p *table, pt_level_fn_t descend_fn)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	struct pt_iommu_dirty_args *dirty = arg;
+
+	for_each_pt_level_item(&pts) {
+		if (pts.type == PT_ENTRY_TABLE)
+			return pt_descend(&pts, arg, descend_fn);
+		if (pts.type == PT_ENTRY_EMPTY)
+			continue;
+
+		if (!pt_entry_write_is_dirty(&pts))
+			continue;
+
+		/* FIXME we should probably do our own gathering? */
+		iommu_dirty_bitmap_record(dirty->dirty, range->va,
+					  log2_to_int(pt_entry_oa_lg2sz(&pts)));
+		if (!(dirty->flags & IOMMU_DIRTY_NO_CLEAR)) {
+			/*
+			 * No write log required because DMA incoherence and
+			 * atomic dirty tracking bits can't work together
+			 */
+			pt_entry_set_write_clean(&pts);
+		}
+		break;
+	}
+	return 0;
+}
+PT_MAKE_LEVELS(__read_and_clear_dirty, __do_read_and_clear_dirty);
+
+static int __maybe_unused NS(read_and_clear_dirty)(
+	struct pt_iommu *iommu_table, dma_addr_t iova, dma_addr_t len,
+	unsigned long flags, struct iommu_dirty_bitmap *dirty_bitmap)
+{
+	struct pt_iommu_dirty_args dirty = {
+		.dirty = dirty_bitmap,
+		.flags = flags,
+	};
+	struct pt_range range;
+	int ret;
+
+	ret = make_range(common_from_iommu(iommu_table), &range, iova, len);
+	if (ret)
+		return ret;
+
+	ret = pt_walk_range(&range, __read_and_clear_dirty, &dirty);
+	PT_WARN_ON(ret);
+	return ret;
+}
+
 struct pt_iommu_collect_args {
 	struct pt_radix_list_head free_list;
 	u8 ignore_mapped : 1;
@@ -887,6 +945,9 @@ static const struct pt_iommu_ops NS(ops) = {
 	.unmap_pages = NS(unmap_pages),
 	.iova_to_phys = NS(iova_to_phys),
 	.cut_mapping = NS(cut_mapping),
+#if IS_ENABLED(CONFIG_IOMMUFD_DRIVER) && defined(pt_entry_write_is_dirty)
+	.read_and_clear_dirty = NS(read_and_clear_dirty),
+#endif
 	.get_info = NS(get_info),
 	.deinit = NS(deinit),
 };
@@ -963,5 +1024,7 @@ EXPORT_SYMBOL_NS_GPL(pt_iommu_init, GENERIC_PT_IOMMU);
 MODULE_LICENSE("GPL");
 MODULE_IMPORT_NS(GENERIC_PT);
+/* For iommu_dirty_bitmap_record() */
+MODULE_IMPORT_NS(IOMMUFD);
 
 #endif
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index d83f293209fa77..f77f6aef3f5958 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -10,6 +10,7 @@
 struct iommu_iotlb_gather;
 struct pt_iommu_ops;
+struct iommu_dirty_bitmap;
 
 /**
  * DOC: IOMMU Radix Page Table
@@ -158,6 +159,27 @@ struct pt_iommu_ops {
 	phys_addr_t (*iova_to_phys)(struct pt_iommu *iommu_table,
 				    dma_addr_t iova);
 
+	/**
+	 * read_and_clear_dirty() - Manipulate the HW set write dirty state
+	 * @iommu_table: Table to manipulate
+	 * @iova: IO virtual address to start
+	 * @len: Length of the IOVA range
+	 * @flags: A bitmap of IOMMU_DIRTY_NO_CLEAR
+	 *
+	 * Iterate over all the entries in the mapped range and record their
+	 * write dirty status in iommu_dirty_bitmap.
+	 * If IOMMU_DIRTY_NO_CLEAR is specified then the entries will be left
+	 * dirty, otherwise they are returned to being not write dirty.
+	 *
+	 * Context: The caller must hold a read range lock that includes @iova.
+	 *
+	 * Returns: -ERRNO on failure, 0 on success.
+	 */
+	int (*read_and_clear_dirty)(struct pt_iommu *iommu_table,
+				    dma_addr_t iova, dma_addr_t len,
+				    unsigned long flags,
+				    struct iommu_dirty_bitmap *dirty_bitmap);
+
 	/**
 	 * get_info() - Return the pt_iommu_info structure
 	 * @iommu_table: Table to query
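As a usage illustration (not part of this patch), a caller of the op
documented above might look roughly like the following; the iova_bitmap
setup and error handling are elided, and example_harvest_dirty() is a
made-up name:

/* Hypothetical caller sketch: record dirty bits for the first 2M of IOVA */
static int example_harvest_dirty(struct pt_iommu *table,
				 struct iova_bitmap *bitmap)
{
	struct iommu_iotlb_gather gather;
	struct iommu_dirty_bitmap dirty;

	iommu_dirty_bitmap_init(&dirty, bitmap, &gather);

	/* flags == 0: record dirty entries and return them to clean */
	return table->ops->read_and_clear_dirty(table, 0, SZ_2M, 0, &dirty);
}

Passing IOMMU_DIRTY_NO_CLEAR in flags instead would record the bits while
leaving the entries dirty.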
From patchwork Thu Aug 15 15:11:25 2024
From: Jason Gunthorpe
Subject: [PATCH 09/16] iommupt: Add a kunit test for Generic Page Table and the IOMMU implementation
Date: Thu, 15 Aug 2024 12:11:25 -0300
Message-ID: <9-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
This intends to have high coverage of the page table format functions and
the IOMMU implementation itself, exercising the various corner cases.

The kunit can be run in the kunit framework, using commands like:

tools/testing/kunit/kunit.py run --build_dir build_kunit_arm64 --arch arm64 --make_options LD=ld.lld-18 --make_options 'CC=clang-18 --target=aarch64-linux-gnu' --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig
tools/testing/kunit/kunit.py run --build_dir build_kunit_uml --make_options CC=gcc-13 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig --kconfig_add CONFIG_WERROR=n
tools/testing/kunit/kunit.py run --build_dir build_kunit_x86_64 --arch x86_64 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig
tools/testing/kunit/kunit.py run --build_dir build_kunit_i386 --arch i386 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig
tools/testing/kunit/kunit.py run --build_dir build_kunit_i386pae --arch i386 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig --kconfig_add CONFIG_X86_PAE=y

There are several interesting corner cases on the 32 bit platforms that
need checking.

FIXME: further improve the tests

Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/.kunitconfig         |  13 +
 drivers/iommu/generic_pt/Kconfig              |   7 +
 drivers/iommu/generic_pt/Makefile             |   2 +
 drivers/iommu/generic_pt/fmt/Makefile         |  21 +
 drivers/iommu/generic_pt/fmt/iommu_template.h |   9 +
 drivers/iommu/generic_pt/kunit_generic_pt.h   | 576 ++++++++++++++++++
 drivers/iommu/generic_pt/kunit_iommu.h        | 105 ++++
 drivers/iommu/generic_pt/kunit_iommu_pt.h     | 352 +++++++++++
 8 files changed, 1085 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/.kunitconfig
 create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
 create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h

diff --git a/drivers/iommu/generic_pt/.kunitconfig b/drivers/iommu/generic_pt/.kunitconfig
new file mode 100644
index 00000000000000..f428cae8ce584c
--- /dev/null
+++ b/drivers/iommu/generic_pt/.kunitconfig
@@ -0,0 +1,13 @@
+CONFIG_KUNIT=y
+CONFIG_GENERIC_PT=y
+CONFIG_DEBUG_GENERIC_PT=y
+CONFIG_IOMMU_PT=y
+CONFIG_IOMMU_PT_AMDV1=y
+CONFIG_IOMMU_PT_ARMV7S=y
+CONFIG_IOMMU_PT_ARMV8_4K=y
+CONFIG_IOMMU_PT_ARMV8_16K=y
+CONFIG_IOMMU_PT_ARMV8_64K=y
+CONFIG_IOMMU_PT_DART=y
+CONFIG_IOMMU_PT_VTDSS=y
+CONFIG_IOMMU_PT_X86PAE=y
+CONFIG_IOMMUT_PT_KUNIT_TEST=y
diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
index c22a55b00784d0..2c5c2bc59bf8ea 100644
--- a/drivers/iommu/generic_pt/Kconfig
+++ b/drivers/iommu/generic_pt/Kconfig
@@ -27,4 +27,11 @@ config IOMMU_PT
 	default n
 	help
 	  Generic library for building IOMMU page tables
+
+if IOMMU_PT
+config IOMMUT_PT_KUNIT_TEST
+	tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+endif
 endif
diff --git a/drivers/iommu/generic_pt/Makefile b/drivers/iommu/generic_pt/Makefile
index f7862499642237..2c9f23551b9f6f 100644
--- a/drivers/iommu/generic_pt/Makefile
+++ b/drivers/iommu/generic_pt/Makefile
@@ -1,4 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-y += fmt/
+
 iommu_pt-y := \
 	pt_alloc.o
diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile
new file mode 100644
index 00000000000000..0c35b9ae4dfb34
--- /dev/null
+++ b/drivers/iommu/generic_pt/fmt/Makefile
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: GPL-2.0
+
+IOMMU_PT_KUNIT_TEST :=
+define create_format
+obj-$(2) += iommu_$(1).o
+iommu_pt_kunit_test-y += kunit_iommu_$(1).o
+CFLAGS_kunit_iommu_$(1).o += -DGENERIC_PT_KUNIT=1
+IOMMU_PT_KUNIT_TEST := iommu_pt_kunit_test.o
+
+endef
+
+$(eval $(foreach fmt,$(iommu_pt_fmt-y),$(call create_format,$(fmt),y)))
+$(eval $(foreach fmt,$(iommu_pt_fmt-m),$(call create_format,$(fmt),m)))
+
+# The kunit objects are constructed by compiling the main source
+# with -DGENERIC_PT_KUNIT
+$(obj)/kunit_iommu_%.o: $(src)/iommu_%.c FORCE
+	$(call rule_mkdir)
+	$(call if_changed_dep,cc_o_c)
+
+obj-$(CONFIG_IOMMUT_PT_KUNIT_TEST) += $(IOMMU_PT_KUNIT_TEST)
diff --git a/drivers/iommu/generic_pt/fmt/iommu_template.h b/drivers/iommu/generic_pt/fmt/iommu_template.h
index d6ca1582e11ca4..809f4ce6874591 100644
--- a/drivers/iommu/generic_pt/fmt/iommu_template.h
+++ b/drivers/iommu/generic_pt/fmt/iommu_template.h
@@ -34,4 +34,13 @@
 #include PT_FMT_H
 #include "../pt_common.h"
 
+#ifndef GENERIC_PT_KUNIT
 #include "../iommu_pt.h"
+#else
+/*
+ * The makefile will compile the .c file twice, once with GENERIC_PT_KUNIT
+ * set, which means we are building the kunit module.
+ */
+#include "../kunit_generic_pt.h"
+#include "../kunit_iommu_pt.h"
+#endif
diff --git a/drivers/iommu/generic_pt/kunit_generic_pt.h b/drivers/iommu/generic_pt/kunit_generic_pt.h
new file mode 100644
index 00000000000000..dad13ac4b6d14f
--- /dev/null
+++ b/drivers/iommu/generic_pt/kunit_generic_pt.h
@@ -0,0 +1,576 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ *
+ * Test the format API directly.
+ *
+ */
+#include "kunit_iommu.h"
+#include "pt_iter.h"
+
+/* FIXME */
+static void do_map(struct kunit *test, pt_vaddr_t va, pt_oaddr_t pa,
+		   pt_vaddr_t len);
+
+#define KUNIT_ASSERT_PT_LOAD(kunit, pts, entry)            \
+	({                                                 \
+		pt_load_entry(pts);                        \
+		KUNIT_ASSERT_EQ(test, (pts)->type, entry); \
+	})
+
+struct check_levels_arg {
+	struct kunit *test;
+	void *fn_arg;
+	void (*fn)(struct kunit *test, struct pt_state *pts, void *arg);
+};
+
+static int __check_all_levels(struct pt_range *range, void *arg,
+			      unsigned int level, struct pt_table_p *table)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	struct check_levels_arg *chk = arg;
+	struct kunit *test = chk->test;
+	int ret;
+
+	// FIXME check that the index is max
+	if (pt_can_have_table(&pts)) {
+		pt_load_single_entry(&pts);
+		KUNIT_ASSERT_EQ(test, pts.type, PT_ENTRY_TABLE);
+		ret = pt_descend(&pts, arg, __check_all_levels);
+		KUNIT_ASSERT_EQ(test, ret, 0);
+
+		/* Index 0 is used by the test */
+		if (IS_32BIT && !pts.index)
+			return 0;
+		KUNIT_ASSERT_NE(chk->test, pts.index, 0);
+	}
+
+	/*
+	 * A format should not create a table with only one entry; at the
+	 * least, this test approach won't work on one.
+	 */
+	KUNIT_ASSERT_NE(chk->test, pt_num_items_lg2(&pts), ilog2(1));
+
+	pts.index = 0;
+	pt_index_to_va(&pts);
+	(*chk->fn)(chk->test, &pts, chk->fn_arg);
+	return 0;
+}
+
+/*
+ * Call fn for each level in the table with a pts setup to index 0 in a table
+ * for that level. This allows writing tests that run on every level.
+ * The test can use every index in the table except the last one.
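+ * (The populating mapping that check_all_levels() installs at the highest
+ * VA occupies that last index at each level.)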
+ */
+static void check_all_levels(struct kunit *test,
+			     void (*fn)(struct kunit *test,
+					struct pt_state *pts, void *arg),
+			     void *fn_arg)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	struct pt_range range = pt_top_range(priv->common);
+	struct check_levels_arg chk = {
+		.test = test,
+		.fn = fn,
+		.fn_arg = fn_arg,
+	};
+	int ret;
+
+	/*
+	 * Map a page at the highest VA; this will populate all the levels so
+	 * we can then iterate over them. Index 0 will be used for testing.
+	 */
+	if (IS_32BIT && range.max_vasz_lg2 > 32)
+		range.last_va = (u32)range.last_va;
+	range.va = range.last_va - (priv->smallest_pgsz - 1);
+
+	do_map(test, range.va, 0, priv->smallest_pgsz);
+	ret = pt_walk_range(&range, __check_all_levels, &chk);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+}
+
+static void test_init(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+
+	/* Fixture does the setup */
+	KUNIT_ASSERT_NE(test, priv->info.pgsize_bitmap, 0);
+}
+
+static void test_bitops(struct kunit *test)
+{
+	int i;
+
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u32, 0), 0);
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u32, 1), 1);
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u32, BIT(2)), 3);
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u32, U32_MAX), 32);
+
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u64, 0), 0);
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u64, 1), 1);
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u64, BIT(2)), 3);
+	KUNIT_ASSERT_EQ(test, log2_fls_t(u64, U64_MAX), 64);
+
+	KUNIT_ASSERT_EQ(test, log2_ffs_t(u32, 1), 0);
+	KUNIT_ASSERT_EQ(test, log2_ffs_t(u32, BIT(2)), 2);
+	KUNIT_ASSERT_EQ(test, log2_ffs_t(u32, BIT(31)), 31);
+
+	KUNIT_ASSERT_EQ(test, log2_ffs_t(u64, 1), 0);
+	KUNIT_ASSERT_EQ(test, log2_ffs_t(u64, BIT(2)), 2);
+	KUNIT_ASSERT_EQ(test, log2_ffs_t(u64, BIT_ULL(63)), 63);
+
+	for (i = 0; i != 31; i++)
+		KUNIT_ASSERT_EQ(test, log2_ffz_t(u64, BIT_ULL(i) - 1), i);
+
+	for (i = 0; i != 63; i++)
+		KUNIT_ASSERT_EQ(test, log2_ffz_t(u64, BIT_ULL(i) - 1), i);
+
+	for (i = 0; i != 32; i++) {
+		u64 val = get_random_u64();
+
+		KUNIT_ASSERT_EQ(test,
+				log2_mod_t(u32, val, log2_ffs_t(u32, val)), 0);
+		KUNIT_ASSERT_EQ(test,
+				log2_mod_t(u64, val, log2_ffs_t(u64, val)), 0);
+
+		KUNIT_ASSERT_EQ(test,
+				log2_mod_t(u32, val, log2_ffz_t(u32, val)),
+				log2_to_max_int_t(u32, log2_ffz_t(u32, val)));
+		KUNIT_ASSERT_EQ(test,
+				log2_mod_t(u64, val, log2_ffz_t(u64, val)),
+				log2_to_max_int_t(u64, log2_ffz_t(u64, val)));
+	}
+}
+
+static unsigned int ref_best_pgsize(pt_vaddr_t pgsz_bitmap, pt_vaddr_t va,
+				    pt_vaddr_t last_va, pt_oaddr_t oa)
+{
+	pt_vaddr_t pgsz_lg2;
+
+	/* Brute force the constraints described in __pt_compute_best_pgsize() */
+	for (pgsz_lg2 = PT_VADDR_MAX_LG2 - 1; pgsz_lg2 != 0; pgsz_lg2--) {
+		if ((pgsz_bitmap & log2_to_int(pgsz_lg2)) &&
+		    log2_mod(va, pgsz_lg2) == 0 &&
+		    oalog2_mod(oa, pgsz_lg2) == 0 &&
+		    va + log2_to_int(pgsz_lg2) - 1 <= last_va &&
+		    log2_div_eq(va, va + log2_to_int(pgsz_lg2) - 1, pgsz_lg2) &&
+		    oalog2_div_eq(oa, oa + log2_to_int(pgsz_lg2) - 1, pgsz_lg2))
+			return pgsz_lg2;
+	}
+	return 0;
+}
+
+/*
+ * Check that the bit logic in __pt_compute_best_pgsize() works.
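+ * Every result is checked against the brute force ref_best_pgsize()
+ * reference above.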
+ */
+static void test_best_pgsize(struct kunit *test)
+{
+	unsigned int a_lg2;
+	unsigned int b_lg2;
+	unsigned int c_lg2;
+
+	/* Try random prefixes with every suffix combination */
+	for (a_lg2 = 1; a_lg2 != 10; a_lg2++) {
+		for (b_lg2 = 1; b_lg2 != 10; b_lg2++) {
+			for (c_lg2 = 1; c_lg2 != 10; c_lg2++) {
+				pt_vaddr_t pgsz_bitmap = get_random_u64();
+				pt_vaddr_t va = get_random_u64() << a_lg2;
+				pt_oaddr_t oa = get_random_u64() << b_lg2;
+				pt_vaddr_t last_va = log2_set_mod_max(
+					get_random_u64(), c_lg2);
+
+				if (va > last_va)
+					swap(va, last_va);
+				KUNIT_ASSERT_EQ(test,
+						__pt_compute_best_pgsize(
+							pgsz_bitmap, va,
+							last_va, oa),
+						ref_best_pgsize(pgsz_bitmap, va,
+								last_va, oa));
+			}
+		}
+	}
+
+	/* 0 prefix, every suffix */
+	for (c_lg2 = 1; c_lg2 != PT_VADDR_MAX_LG2 - 1; c_lg2++) {
+		pt_vaddr_t pgsz_bitmap = get_random_u64();
+		pt_vaddr_t va = 0;
+		pt_oaddr_t oa = 0;
+		pt_vaddr_t last_va = log2_set_mod_max(0, c_lg2);
+
+		KUNIT_ASSERT_EQ(test,
+				__pt_compute_best_pgsize(pgsz_bitmap, va,
+							 last_va, oa),
+				ref_best_pgsize(pgsz_bitmap, va, last_va, oa));
+	}
+
+	/* 1's prefix, every suffix */
+	for (a_lg2 = 1; a_lg2 != 10; a_lg2++) {
+		for (b_lg2 = 1; b_lg2 != 10; b_lg2++) {
+			for (c_lg2 = 1; c_lg2 != 10; c_lg2++) {
+				pt_vaddr_t pgsz_bitmap = get_random_u64();
+				pt_vaddr_t va = PT_VADDR_MAX << a_lg2;
+				pt_oaddr_t oa = PT_VADDR_MAX << b_lg2;
+				pt_vaddr_t last_va = PT_VADDR_MAX;
+
+				KUNIT_ASSERT_EQ(test,
+						__pt_compute_best_pgsize(
+							pgsz_bitmap, va,
+							last_va, oa),
+						ref_best_pgsize(pgsz_bitmap, va,
+								last_va, oa));
+			}
+		}
+	}
+
+	/* pgsize_bitmap is always 0 */
+	for (a_lg2 = 1; a_lg2 != 10; a_lg2++) {
+		for (b_lg2 = 1; b_lg2 != 10; b_lg2++) {
+			for (c_lg2 = 1; c_lg2 != 10; c_lg2++) {
+				pt_vaddr_t pgsz_bitmap = 0;
+				pt_vaddr_t va = get_random_u64() << a_lg2;
+				pt_oaddr_t oa = get_random_u64() << b_lg2;
+				pt_vaddr_t last_va = log2_set_mod_max(
+					get_random_u64(), c_lg2);
+
+				if (va > last_va)
+					swap(va, last_va);
+				KUNIT_ASSERT_EQ(test,
+						__pt_compute_best_pgsize(
+							pgsz_bitmap, va,
+							last_va, oa),
+						0);
+			}
+		}
+	}
+
+	if (sizeof(pt_vaddr_t) <= 4)
+		return;
+
+	/* over 32 bit page sizes */
+	for (a_lg2 = 32; a_lg2 != 42; a_lg2++) {
+		for (b_lg2 = 32; b_lg2 != 42; b_lg2++) {
+			for (c_lg2 = 32; c_lg2 != 42; c_lg2++) {
+				pt_vaddr_t pgsz_bitmap = get_random_u64();
+				pt_vaddr_t va = get_random_u64() << a_lg2;
+				pt_oaddr_t oa = get_random_u64() << b_lg2;
+				pt_vaddr_t last_va = log2_set_mod_max(
+					get_random_u64(), c_lg2);
+
+				if (va > last_va)
+					swap(va, last_va);
+				KUNIT_ASSERT_EQ(test,
+						__pt_compute_best_pgsize(
+							pgsz_bitmap, va,
+							last_va, oa),
+						ref_best_pgsize(pgsz_bitmap, va,
+								last_va, oa));
+			}
+		}
+	}
+}
+
+/*
+ * Check that pt_install_table() and pt_table_pa() match
+ */
+static void test_lvl_table_ptr(struct kunit *test, struct pt_state *pts,
+			       void *arg)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	pt_oaddr_t paddr =
+		log2_set_mod(priv->test_oa, 0, priv->smallest_pgsz_lg2);
+	struct pt_write_attrs attrs = {};
+
+	if (!pt_can_have_table(pts))
+		return;
+
+	KUNIT_ASSERT_NO_ERRNO_FN(test, "pt_iommu_set_prot",
+				 pt_iommu_set_prot(pts->range->common, &attrs,
+						   IOMMU_READ));
+
+	KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_EMPTY);
+
+	KUNIT_ASSERT_TRUE(test, pt_install_table(pts, paddr, &attrs));
+
+	/*
+	 * A second install should fail because pt_install_table() does not
+	 * update pts->entry. The expected entry is still empty even though
+	 * the install above succeeded, so this must fail with a cmpxchg
+	 * collision.
+	 */
+	KUNIT_ASSERT_EQ(test, pt_install_table(pts, paddr, &attrs), false);
+
+	KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_TABLE);
+	KUNIT_ASSERT_EQ(test, pt_table_pa(pts), paddr);
+
+	pt_clear_entry(pts, ilog2(1));
+	KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_EMPTY);
+}
+
+static void test_table_ptr(struct kunit *test)
+{
+	check_all_levels(test, test_lvl_table_ptr, NULL);
+}
+
+struct lvl_radix_arg {
+	pt_vaddr_t vbits;
+};
+
+/*
+ * Check pt_num_items_lg2(), pt_table_item_lg2sz(), and amdv1pt_va_lg2sz();
+ * they need to decode a continuous list of VA across all the levels that
+ * covers the entire advertised VA space.
+ */
+static void test_lvl_radix(struct kunit *test, struct pt_state *pts, void *arg)
+{
+	unsigned int table_lg2sz = pt_table_oa_lg2sz(pts);
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+	struct lvl_radix_arg *radix = arg;
+
+	/* Every bit below us is decoded */
+	KUNIT_ASSERT_EQ(test, log2_set_mod_max(0, isz_lg2), radix->vbits);
+
+	/* We are not decoding bits someone else is */
+	KUNIT_ASSERT_EQ(test, log2_div(radix->vbits, isz_lg2), 0);
+
+	/* Can't decode past the pt_vaddr_t size */
+	KUNIT_ASSERT_LE(test, table_lg2sz, PT_VADDR_MAX_LG2);
+
+	radix->vbits = fvalog2_set_mod_max(0, table_lg2sz);
+}
+
+static void test_table_radix(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	struct lvl_radix_arg radix = { .vbits = priv->smallest_pgsz - 1 };
+	struct pt_range range = pt_top_range(priv->common);
+
+	check_all_levels(test, test_lvl_radix, &radix);
+
+	if (range.max_vasz_lg2 == PT_VADDR_MAX_LG2) {
+		KUNIT_ASSERT_EQ(test, radix.vbits, PT_VADDR_MAX);
+	} else {
+		if (!IS_32BIT)
+			KUNIT_ASSERT_EQ(test,
+					log2_set_mod_max(0, range.max_vasz_lg2),
+					radix.vbits);
+		KUNIT_ASSERT_EQ(test, log2_div(radix.vbits, range.max_vasz_lg2),
+				0);
+	}
+}
+
+static void test_lvl_possible_sizes(struct kunit *test, struct pt_state *pts,
+				    void *arg)
+{
+	unsigned int num_entries_lg2 = pt_num_items_lg2(pts);
+	pt_vaddr_t pgsize_bitmap = pt_possible_sizes(pts);
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+
+	if (!pt_can_have_leaf(pts)) {
+		KUNIT_ASSERT_EQ(test, pgsize_bitmap, 0);
+		return;
+	}
+
+	/* No bits for sizes that would be outside this table */
+	KUNIT_ASSERT_EQ(test, log2_mod(pgsize_bitmap, isz_lg2), 0);
+	if (num_entries_lg2 + isz_lg2 != PT_VADDR_MAX_LG2)
+		KUNIT_ASSERT_EQ(
+			test,
+			log2_div(pgsize_bitmap, num_entries_lg2 + isz_lg2), 0);
+
+	/* Non contiguous must be supported */
+	KUNIT_ASSERT_TRUE(test, pgsize_bitmap & log2_to_int(isz_lg2));
+
+	/* A contiguous entry should not span the whole table */
+	if (num_entries_lg2 + isz_lg2 != PT_VADDR_MAX_LG2)
+		KUNIT_ASSERT_FALSE(
+			test,
+			pgsize_bitmap & log2_to_int(num_entries_lg2 + isz_lg2));
+}
+
+static void test_entry_possible_sizes(struct kunit *test)
+{
+	check_all_levels(test, test_lvl_possible_sizes, NULL);
+}
+
+static void sweep_all_pgsizes(struct kunit *test, struct pt_state *pts,
+			      struct pt_write_attrs *attrs,
+			      pt_oaddr_t test_oaddr)
+{
+	pt_vaddr_t pgsize_bitmap = pt_possible_sizes(pts);
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+	unsigned int len_lg2;
+
+	for (len_lg2 = 0; len_lg2 < PT_VADDR_MAX_LG2 - 1; len_lg2++) {
+		struct pt_state sub_pts = *pts;
+		pt_oaddr_t oaddr;
+
+		if (!(pgsize_bitmap & log2_to_int(len_lg2)))
+			continue;
+
+		oaddr = log2_set_mod(test_oaddr, 0, len_lg2);
+		pt_install_leaf_entry(pts, oaddr, len_lg2, attrs);
+		/* Verify that every contiguous item translates correctly */
+		for (sub_pts.index = 0;
+		     sub_pts.index != log2_to_int(len_lg2 - isz_lg2);
+		     sub_pts.index++) {
+			KUNIT_ASSERT_PT_LOAD(test, &sub_pts, PT_ENTRY_OA);
+			KUNIT_ASSERT_EQ(test, pt_item_oa(&sub_pts),
+					oaddr + sub_pts.index *
+						oalog2_mul(1, isz_lg2));
+			KUNIT_ASSERT_EQ(test, pt_entry_oa(&sub_pts), oaddr);
+			KUNIT_ASSERT_EQ(test, pt_entry_num_contig_lg2(&sub_pts),
+					len_lg2 - isz_lg2);
+		}
+
+		pt_clear_entry(pts, len_lg2 - isz_lg2);
+		KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_EMPTY);
+	}
+}
+
+/*
+ * Check that pt_install_leaf_entry() and pt_entry_oa() match.
+ * Check that pt_clear_entry() works.
+ */
+static void test_lvl_entry_oa(struct kunit *test, struct pt_state *pts,
+			      void *arg)
+{
+	unsigned int max_oa_lg2 = pts->range->common->max_oasz_lg2;
+	struct kunit_iommu_priv *priv = test->priv;
+	struct pt_write_attrs attrs = {};
+
+	if (!pt_can_have_leaf(pts))
+		return;
+
+	KUNIT_ASSERT_NO_ERRNO_FN(test, "pt_iommu_set_prot",
+				 pt_iommu_set_prot(pts->range->common, &attrs,
+						   IOMMU_READ));
+
+	sweep_all_pgsizes(test, pts, &attrs, priv->test_oa);
+
+	/* Check that the table can store the boundary OAs */
+	sweep_all_pgsizes(test, pts, &attrs, 0);
+	if (max_oa_lg2 == PT_OADDR_MAX_LG2)
+		sweep_all_pgsizes(test, pts, &attrs, PT_OADDR_MAX);
+	else
+		sweep_all_pgsizes(test, pts, &attrs,
+				  oalog2_to_max_int(max_oa_lg2));
+}
+
+static void test_entry_oa(struct kunit *test)
+{
+	check_all_levels(test, test_lvl_entry_oa, NULL);
+}
+
+/* Test pt_attr_from_entry() */
+static void test_lvl_attr_from_entry(struct kunit *test, struct pt_state *pts,
+				     void *arg)
+{
+	pt_vaddr_t pgsize_bitmap = pt_possible_sizes(pts);
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+	struct kunit_iommu_priv *priv = test->priv;
+	unsigned int len_lg2;
+	unsigned int prot;
+
+	if (!pt_can_have_leaf(pts))
+		return;
+
+	for (len_lg2 = 0; len_lg2 < PT_VADDR_MAX_LG2; len_lg2++) {
+		if (!(pgsize_bitmap & log2_to_int(len_lg2)))
+			continue;
+		for (prot = 0; prot <= (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |
+					IOMMU_NOEXEC | IOMMU_MMIO);
+		     prot++) {
+			pt_oaddr_t oaddr;
+			struct pt_write_attrs attrs = {};
+			u64 good_entry;
+
+			/*
+			 * If the format doesn't support this combination of
+			 * prot bits, skip it
+			 */
+			if (pt_iommu_set_prot(pts->range->common, &attrs,
+					      prot)) {
+				/* But RW has to be supported */
+				KUNIT_ASSERT_NE(test, prot,
+						IOMMU_READ | IOMMU_WRITE);
+				continue;
+			}
+
+			oaddr = log2_set_mod(priv->test_oa, 0, len_lg2);
+			pt_install_leaf_entry(pts, oaddr, len_lg2, &attrs);
+			KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_OA);
+
+			good_entry = pts->entry;
+
+			memset(&attrs, 0, sizeof(attrs));
+			pt_attr_from_entry(pts, &attrs);
+
+			pt_clear_entry(pts, len_lg2 - isz_lg2);
+			KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_EMPTY);
+
+			pt_install_leaf_entry(pts, oaddr, len_lg2, &attrs);
+			KUNIT_ASSERT_PT_LOAD(test, pts, PT_ENTRY_OA);
+
+			/*
+			 * The attrs produced by pt_attr_from_entry() must
+			 * produce an identical entry value when re-written
+			 */
+			KUNIT_ASSERT_EQ(test, good_entry, pts->entry);
+
+			pt_clear_entry(pts, len_lg2 - isz_lg2);
+		}
+	}
+}
+
+static void test_attr_from_entry(struct kunit *test)
+{
+	check_all_levels(test, test_lvl_attr_from_entry, NULL);
+}
+
+/* FIXME possible sizes can not return values outside the OA mask?
+ */
+
+static struct kunit_case generic_pt_test_cases[] = {
+	KUNIT_CASE(test_init),
+	KUNIT_CASE(test_bitops),
+	KUNIT_CASE(test_best_pgsize),
+	KUNIT_CASE(test_table_ptr),
+	KUNIT_CASE(test_table_radix),
+	KUNIT_CASE(test_entry_possible_sizes),
+	KUNIT_CASE(test_entry_oa),
+	KUNIT_CASE(test_attr_from_entry),
+	{},
+};
+
+static int pt_kunit_generic_pt_init(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv;
+	int ret;
+
+	test->priv = priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	ret = pt_kunit_priv_init(priv);
+	if (ret) {
+		kfree(test->priv);
+		test->priv = NULL;
+		return ret;
+	}
+	return 0;
+}
+
+static void pt_kunit_generic_pt_exit(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+
+	if (!test->priv)
+		return;
+
+	pt_iommu_deinit(priv->iommu);
+	kfree(test->priv);
+}
+
+static struct kunit_suite NS(generic_pt_suite) = {
+	.name = __stringify(NS(fmt_test)),
+	.init = pt_kunit_generic_pt_init,
+	.exit = pt_kunit_generic_pt_exit,
+	.test_cases = generic_pt_test_cases,
+};
+kunit_test_suites(&NS(generic_pt_suite));
diff --git a/drivers/iommu/generic_pt/kunit_iommu.h b/drivers/iommu/generic_pt/kunit_iommu.h
new file mode 100644
index 00000000000000..e0adea69596858
--- /dev/null
+++ b/drivers/iommu/generic_pt/kunit_iommu.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __GENERIC_PT_KUNIT_IOMMU_H
+#define __GENERIC_PT_KUNIT_IOMMU_H
+
+#define GENERIC_PT_KUNIT 1
+#include
+#include "pt_common.h"
+
+#define pt_iommu_table_cfg CONCATENATE(pt_iommu_table, _cfg)
+#define pt_iommu_init CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), init)
+int pt_iommu_init(struct pt_iommu_table *fmt_table,
+		  struct pt_iommu_table_cfg *cfg, gfp_t gfp);
+
+#ifndef pt_kunit_setup_cfg
+#define pt_kunit_setup_cfg(cfg)
+#endif
+
+#ifndef pt_iommu_free_pgtbl_cfg
+#define pt_iommu_free_pgtbl_cfg(cfg)
+#endif
+
+#define KUNIT_ASSERT_NO_ERRNO(test, ret)                                       \
+	KUNIT_ASSERT_EQ_MSG(test, ret, 0, KUNIT_SUBSUBTEST_INDENT "errno %pe", \
+			    ERR_PTR(ret))
+
+#define KUNIT_ASSERT_NO_ERRNO_FN(test, fn, ret)                          \
+	KUNIT_ASSERT_EQ_MSG(test, ret, 0,                                \
+			    KUNIT_SUBSUBTEST_INDENT "errno %pe from %s", \
+			    ERR_PTR(ret), fn)
+
+/*
+ * When the test is run on a 32 bit system dma_addr_t can be 32 bits. This
+ * causes the iommu op signatures to be restricted to 32 bits, meaning the
+ * test has to be mindful not to create any VAs over the 32 bit limit.
+ * Reduce the scope of the testing, as the main purpose of checking on full
+ * 32 bit is to look for 32bitisms in the core code. Run the test on i386
+ * with X86_PAE=y to get the full coverage.
+ */
+#define IS_32BIT (sizeof(dma_addr_t) == 4)
+
+struct kunit_iommu_priv {
+	struct pt_iommu_table fmt_table;
+	struct device dummy_dev;
+	struct pt_iommu *iommu;
+	struct pt_common *common;
+	struct pt_iommu_table_cfg cfg;
+	struct pt_iommu_info info;
+	unsigned int smallest_pgsz_lg2;
+	pt_vaddr_t smallest_pgsz;
+	unsigned int largest_pgsz_lg2;
+	pt_oaddr_t test_oa;
+	pt_vaddr_t safe_pgsize_bitmap;
+};
+
+static int pt_kunit_priv_init(struct kunit_iommu_priv *priv)
+{
+	unsigned int va_lg2sz;
+	int ret;
+
+	/* Enough so the memory allocator works */
+	set_dev_node(&priv->dummy_dev, NUMA_NO_NODE);
+	priv->cfg.iommu_device = &priv->dummy_dev;
+	priv->cfg.features = PT_SUPPORTED_FEATURES &
+			     ~BIT(PT_FEAT_DMA_INCOHERENT);
+
+	pt_kunit_setup_cfg(&priv->cfg);
+	ret = pt_iommu_init(&priv->fmt_table, &priv->cfg, GFP_KERNEL);
+	if (ret)
+		return ret;
+	priv->iommu = &priv->fmt_table.iommu;
+	priv->common = common_from_iommu(&priv->fmt_table.iommu);
+
+	priv->iommu->ops->get_info(priv->iommu, &priv->info);
+
+	/*
+	 * size_t is used to pass the mapping length; it can be 32 bit, so
+	 * truncate the pagesizes so we don't use large sizes.
+	 */
+	priv->info.pgsize_bitmap = (size_t)priv->info.pgsize_bitmap;
+
+	priv->smallest_pgsz_lg2 = log2_ffs(priv->info.pgsize_bitmap);
+	priv->smallest_pgsz = log2_to_int(priv->smallest_pgsz_lg2);
+	priv->largest_pgsz_lg2 =
+		log2_fls((dma_addr_t)priv->info.pgsize_bitmap) - 1;
+
+	priv->test_oa = oalog2_mod(0x74a71445deadbeef,
+				   pt_max_output_address_lg2(priv->common));
+
+	/*
+	 * We run out of VA space if the mappings get too big, so make
+	 * something smaller that can safely pass through the dma_addr_t API.
+	 */
+	va_lg2sz = priv->common->max_vasz_lg2;
+	if (IS_32BIT && va_lg2sz > 32)
+		va_lg2sz = 32;
+	priv->safe_pgsize_bitmap =
+		log2_mod(priv->info.pgsize_bitmap, va_lg2sz - 1);
+
+	return 0;
+}
+
+#endif
diff --git a/drivers/iommu/generic_pt/kunit_iommu_pt.h b/drivers/iommu/generic_pt/kunit_iommu_pt.h
new file mode 100644
index 00000000000000..047ef240d067ff
--- /dev/null
+++ b/drivers/iommu/generic_pt/kunit_iommu_pt.h
@@ -0,0 +1,352 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include "kunit_iommu.h"
+#include "pt_iter.h"
+#include
+#include
+
+static unsigned int next_smallest_pgsz_lg2(struct kunit_iommu_priv *priv,
+					   unsigned int pgsz_lg2)
+{
+	WARN_ON(!(priv->info.pgsize_bitmap & log2_to_int(pgsz_lg2)));
+	pgsz_lg2--;
+	for (; pgsz_lg2 > 0; pgsz_lg2--) {
+		if (priv->info.pgsize_bitmap & log2_to_int(pgsz_lg2))
+			return pgsz_lg2;
+	}
+	WARN_ON(true);
+	return priv->smallest_pgsz_lg2;
+}
+
+struct count_valids {
+	u64 per_size[PT_VADDR_MAX_LG2];
+};
+
+static int __count_valids(struct pt_range *range, void *arg, unsigned int level,
+			  struct pt_table_p *table)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	struct count_valids *valids = arg;
+
+	for_each_pt_level_item(&pts) {
+		if (pts.type == PT_ENTRY_TABLE) {
+			pt_descend(&pts, arg, __count_valids);
+			continue;
+		}
+		if (pts.type == PT_ENTRY_OA) {
+			valids->per_size[pt_entry_oa_lg2sz(&pts)]++;
+			continue;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Number of valid table entries. This counts contiguous entries as a single
+ * valid.
+ */
+static unsigned int count_valids(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	struct pt_range range = pt_top_range(priv->common);
+	struct count_valids valids = {};
+	u64 total = 0;
+	unsigned int i;
+
+	KUNIT_ASSERT_NO_ERRNO(test,
+			      pt_walk_range(&range, __count_valids, &valids));
+
+	for (i = 0; i != ARRAY_SIZE(valids.per_size); i++)
+		total += valids.per_size[i];
+	return total;
+}
+
+/* Only a single page size is present; count the number of valid entries */
+static unsigned int count_valids_single(struct kunit *test, pt_vaddr_t pgsz)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	struct pt_range range = pt_top_range(priv->common);
+	struct count_valids valids = {};
+	u64 total = 0;
+	unsigned int i;
+
+	KUNIT_ASSERT_NO_ERRNO(test,
+			      pt_walk_range(&range, __count_valids, &valids));
+
+	for (i = 0; i != ARRAY_SIZE(valids.per_size); i++) {
+		if ((1ULL << i) == pgsz)
+			total = valids.per_size[i];
+		else
+			KUNIT_ASSERT_EQ(test, valids.per_size[i], 0);
+	}
+	return total;
+}
+
+static void do_map(struct kunit *test, pt_vaddr_t va, pt_oaddr_t pa,
+		   pt_vaddr_t len)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	const struct pt_iommu_ops *ops = priv->iommu->ops;
+	size_t mapped;
+	int ret;
+
+	KUNIT_ASSERT_EQ(test, len, (size_t)len);
+
+	/* Mapped accumulates */
+	mapped = 1;
+	ret = ops->map_pages(priv->iommu, va, pa, len, IOMMU_READ | IOMMU_WRITE,
+			     GFP_KERNEL, &mapped, NULL);
+	KUNIT_ASSERT_NO_ERRNO_FN(test, "map_pages", ret);
+	KUNIT_ASSERT_EQ(test, mapped, len + 1);
+}
+
+static void do_unmap(struct kunit *test, pt_vaddr_t va, pt_vaddr_t len)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	const struct pt_iommu_ops *ops = priv->iommu->ops;
+	size_t ret;
+
+	ret = ops->unmap_pages(priv->iommu, va, len, NULL);
+	KUNIT_ASSERT_EQ(test, ret, len);
+}
+
+static void do_cut(struct kunit *test, pt_vaddr_t va)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	const struct pt_iommu_ops *ops = priv->iommu->ops;
+	size_t ret;
+
+	ret = ops->cut_mapping(priv->iommu, va, GFP_KERNEL);
+	if (ret == -EOPNOTSUPP)
+		kunit_skip(
+			test,
+			"ops->cut_mapping not supported (enable CONFIG_DEBUG_GENERIC_PT)");
+	KUNIT_ASSERT_NO_ERRNO_FN(test, "ops->cut_mapping", ret);
+}
+
+static void check_iova(struct kunit *test, pt_vaddr_t va, pt_oaddr_t pa,
+		       pt_vaddr_t len)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	const struct pt_iommu_ops *ops = priv->iommu->ops;
+	pt_vaddr_t pfn = log2_div(va, priv->smallest_pgsz_lg2);
+	pt_vaddr_t end_pfn = pfn + log2_div(len, priv->smallest_pgsz_lg2);
+
+	for (; pfn != end_pfn; pfn++) {
+		phys_addr_t res = ops->iova_to_phys(priv->iommu,
+						    pfn * priv->smallest_pgsz);
+
+		KUNIT_ASSERT_EQ(test, res, (phys_addr_t)pa);
+		if (res != pa)
+			break;
+		pa += priv->smallest_pgsz;
+	}
+}
+
+static void test_increase_level(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	struct pt_common *common = priv->common;
+
+	if (!pt_feature(common, PT_FEAT_DYNAMIC_TOP))
+		kunit_skip(test, "PT_FEAT_DYNAMIC_TOP not set for this format");
+
+	if (IS_32BIT)
+		kunit_skip(test, "Unable to test on 32bit");
+
+	KUNIT_ASSERT_GT(test, common->max_vasz_lg2,
+			pt_top_range(common).max_vasz_lg2);
+
+	/* Add every possible level to the max */
+	while (common->max_vasz_lg2 != pt_top_range(common).max_vasz_lg2) {
+		struct pt_range top_range = pt_top_range(common);
+
+		if (top_range.va == 0)
+			do_map(test, top_range.last_va + 1, 0,
+			       priv->smallest_pgsz);
+		else
+			do_map(test, top_range.va - priv->smallest_pgsz, 0,
+			       priv->smallest_pgsz);
+
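+		/* Each mapping outside the old top must add exactly one level */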
+		KUNIT_ASSERT_EQ(test, pt_top_range(common).top_level,
+				top_range.top_level + 1);
+		KUNIT_ASSERT_GE(test, common->max_vasz_lg2,
+				pt_top_range(common).max_vasz_lg2);
+	}
+}
+
+static void test_map_simple(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	struct pt_range range = pt_top_range(priv->common);
+	struct count_valids valids = {};
+	pt_vaddr_t pgsize_bitmap = priv->safe_pgsize_bitmap;
+	unsigned int pgsz_lg2;
+	pt_vaddr_t cur_va;
+
+	/* Map every reported page size */
+	cur_va = range.va + priv->smallest_pgsz * 256;
+	for (pgsz_lg2 = 0; pgsz_lg2 != PT_VADDR_MAX_LG2; pgsz_lg2++) {
+		pt_oaddr_t paddr = log2_set_mod(priv->test_oa, 0, pgsz_lg2);
+		u64 len = log2_to_int(pgsz_lg2);
+
+		if (!(pgsize_bitmap & len))
+			continue;
+
+		cur_va = ALIGN(cur_va, len);
+		do_map(test, cur_va, paddr, len);
+		if (len <= SZ_2G)
+			check_iova(test, cur_va, paddr, len);
+		cur_va += len;
+	}
+
+	/* The read interface reports that every page size was created */
+	range = pt_top_range(priv->common);
+	KUNIT_ASSERT_NO_ERRNO(test,
+			      pt_walk_range(&range, __count_valids, &valids));
+	for (pgsz_lg2 = 0; pgsz_lg2 != PT_VADDR_MAX_LG2; pgsz_lg2++) {
+		if (pgsize_bitmap & (1ULL << pgsz_lg2))
+			KUNIT_ASSERT_EQ(test, valids.per_size[pgsz_lg2], 1);
+		else
+			KUNIT_ASSERT_EQ(test, valids.per_size[pgsz_lg2], 0);
+	}
+
+	/* Unmap works */
+	range = pt_top_range(priv->common);
+	cur_va = range.va + priv->smallest_pgsz * 256;
+	for (pgsz_lg2 = 0; pgsz_lg2 != PT_VADDR_MAX_LG2; pgsz_lg2++) {
+		u64 len = log2_to_int(pgsz_lg2);
+
+		if (!(pgsize_bitmap & len))
+			continue;
+		cur_va = ALIGN(cur_va, len);
+		do_unmap(test, cur_va, len);
+		cur_va += len;
+	}
+	KUNIT_ASSERT_EQ(test, count_valids(test), 0);
+}
+
+/*
+ * Test to convert a table pointer into an OA by mapping something small,
+ * unmapping it so as to leave behind a table pointer, then mapping something
+ * larger that will convert the table into an OA.
+ */
+static void test_map_table_to_oa(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	pt_vaddr_t limited_pgbitmap =
+		priv->info.pgsize_bitmap % (IS_32BIT ?
+							    SZ_2G : SZ_16G);
+	struct pt_range range = pt_top_range(priv->common);
+	unsigned int pgsz_lg2;
+	pt_vaddr_t max_pgsize;
+	pt_vaddr_t cur_va;
+
+	max_pgsize = 1ULL << (fls64(limited_pgbitmap) - 1);
+	KUNIT_ASSERT_TRUE(test, priv->info.pgsize_bitmap & max_pgsize);
+
+	/* FIXME pgsz_lg2 should be random order */
+	/* FIXME we need to check we didn't leak memory */
+	for (pgsz_lg2 = 0; pgsz_lg2 != PT_VADDR_MAX_LG2; pgsz_lg2++) {
+		pt_oaddr_t paddr = log2_set_mod(priv->test_oa, 0, pgsz_lg2);
+		u64 len = log2_to_int(pgsz_lg2);
+		pt_vaddr_t offset;
+
+		if (!(priv->info.pgsize_bitmap & len))
+			continue;
+		if (len > max_pgsize)
+			break;
+
+		cur_va = ALIGN(range.va + priv->smallest_pgsz * 256,
+			       max_pgsize);
+		for (offset = 0; offset != max_pgsize; offset += len)
+			do_map(test, cur_va + offset, paddr + offset, len);
+		check_iova(test, cur_va, paddr, max_pgsize);
+		KUNIT_ASSERT_EQ(test, count_valids_single(test, len),
+				max_pgsize / len);
+
+		do_unmap(test, cur_va, max_pgsize);
+
+		KUNIT_ASSERT_EQ(test, count_valids(test), 0);
+	}
+}
+
+static void test_cut_simple(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+	pt_oaddr_t paddr =
+		log2_set_mod(priv->test_oa, 0, priv->largest_pgsz_lg2);
+	pt_vaddr_t pgsz = log2_to_int(priv->largest_pgsz_lg2);
+	pt_vaddr_t vaddr = pt_top_range(priv->common).va;
+
+	if (priv->largest_pgsz_lg2 == priv->smallest_pgsz_lg2) {
+		kunit_skip(test, "Format has only one page size");
+		return;
+	}
+
+	/* Chop a big page in half */
+	do_map(test, vaddr, paddr, pgsz);
+	KUNIT_ASSERT_EQ(test, count_valids_single(test, pgsz), 1);
+	do_cut(test, vaddr + pgsz / 2);
+	KUNIT_ASSERT_EQ(test, count_valids(test),
+			log2_to_int(priv->largest_pgsz_lg2 -
+				    next_smallest_pgsz_lg2(
+					    priv, priv->largest_pgsz_lg2)));
+	do_unmap(test, vaddr, pgsz / 2);
+	do_unmap(test, vaddr + pgsz / 2, pgsz / 2);
+
+	/* Replace the first item with the smallest page size */
+	do_map(test, vaddr, paddr, pgsz);
+	KUNIT_ASSERT_EQ(test, count_valids_single(test, pgsz), 1);
+	do_cut(test, vaddr + priv->smallest_pgsz);
+	do_unmap(test, vaddr, priv->smallest_pgsz);
+	do_unmap(test, vaddr + priv->smallest_pgsz, pgsz - priv->smallest_pgsz);
+}
+
+static struct kunit_case iommu_test_cases[] = {
+	KUNIT_CASE(test_increase_level),
+	KUNIT_CASE(test_map_simple),
+	KUNIT_CASE(test_map_table_to_oa),
+	KUNIT_CASE(test_cut_simple),
+	{},
+};
+
+static int pt_kunit_iommu_init(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv;
+	int ret;
+
+	test->priv = priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	ret = pt_kunit_priv_init(priv);
+	if (ret) {
+		kfree(test->priv);
+		test->priv = NULL;
+		return ret;
+	}
+	return 0;
+}
+
+static void pt_kunit_iommu_exit(struct kunit *test)
+{
+	struct kunit_iommu_priv *priv = test->priv;
+
+	if (!test->priv)
+		return;
+
+	pt_iommu_deinit(priv->iommu);
+	kfree(test->priv);
+}
+
+static struct kunit_suite NS(iommu_suite) = {
+	.name = __stringify(NS(iommu_test)),
+	.init = pt_kunit_iommu_init,
+	.exit = pt_kunit_iommu_exit,
+	.test_cases = iommu_test_cases,
+};
+kunit_test_suites(&NS(iommu_suite));
+
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(GENERIC_PT_IOMMU);
bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B3CF1B3740 for ; Thu, 15 Aug 2024 15:11:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.92.81 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723734716; cv=fail; b=Jn6jcZw11mtKgIVz4uJV77tde4xbHJW6kJM2a+Q4HDuVyPguhvcEhTvx+EkWeHdCW3YFsw38q9DFJUan+pw5vcDFPjCBZv/mbdS0k0lFdnqasnDqOqc4xCc3S12VCgjFSV2flGrvFRSADQPw0c9vtDeJnnpnoyz0iPAQv3zzAAk= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723734716; c=relaxed/simple; bh=CwR9hoANP7Il6EfEizLmR0uj+9y6oks3YYIFklMaO6g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: Content-Type:MIME-Version; b=i5x4WantZqdv9VPdvAzxTIJzRJs185pOodg9AeZ57yo3xI6oqiCwC1P0kVP76Z0cKHs2q8gvu4WY53C/56y9YVq3D9UcGdxGCMYNZub3eM/SwjZ7BH2plnLJA5HufdPLdVjgQxdq+agavf/YtkJxkgwdk1hOJH0MTAPl82xJM6U= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=T5M2Luh1; arc=fail smtp.client-ip=40.107.92.81 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="T5M2Luh1" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=yTscM4liO+R6nmvnPivwawETKgVyhRgZRhQgahehDU/bQEkhdLykjArDPFBBSVRgwfec+oL1a2MyoYWijVD3NhutN4HGw2jQA8lJPWj0EWKubbrzInlqP8Tjdv8Kl6W/am7FDHwNbBt33gnjKjU4wINQNRJntz47pN6qQhPZOBTcra6XQ8uan1wctXyCWCQOStuRSDj7CXVTcARKRrBSqOy8jlLoIpLZ0ljboPJdBLm4o0YJ6S+fMjnQZBW7fNbTd0Y35cgE59jMfwuATbui+1fR7bYP2Xge+l6hiJyI0Dlfug8e6ASZihbnrJKtOeyDgTEZivk0WXdXCtCVnCLR9g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=UMO+N4m33e4uKA41iVkBD8LJ8dCbYmDwvtksBzyKWAI=; b=NtHFwf63PH+tcXUO2C7bAxuOAn+Kh/U/kWRMJ5ap04LhrIblNKGuw+uqDwRetv8kBgW2no+aPUcSAnRCKWcwUBfak0f5BtkFVv3hF4SnN2EZNSv7hXYrwJWEvKJcT0iltkPvH/trTU0xmWDW/0oj4WvwdKELlB25cakZu7jk/asff4pfkF+N9001SxleeCapVszBPRIMoKEpsQJRfo0gCgvbK+L2e+gyWcTJEzHaPfrQHv+ywn/Vtsd4RK7UoAxcdOZpxwbn4nejORfcvxKHFXBsBDI3Kp3LHs7dx3HyKo+F8gbnSB1o+VZ9Q9hatzFmfUx5barusEsfEMwSCHralg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=UMO+N4m33e4uKA41iVkBD8LJ8dCbYmDwvtksBzyKWAI=; b=T5M2Luh1mUhyZNSO2tXzTIkr2ck/JMOU7HZQz9sC1s5Emt9DyhcvSNLl7vkTpEXSEzL/U5IW0AUP47D8INxdw8m8CG6uhRHugmgoeKSRaDuWAQZij/bila6aFdtNarfamJF3gXoJK+yORfCInvzxuNlhaxoEoXaRW7NktkCGL++4N/gEDLbzmbtxupalvjCtMH8I0G205kSy570xSzEpP12HT//fVCI553E/pBhYCCWOND4waRgTznvgyf64Ss04kZmOwlB4rzii6Xejlj9si67L++igdJqyA3UDS/3lnt9kTNeTHYlOAQALybZ/w86wOq+2OWS8MlE2gnWLfSethA== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=nvidia.com; Received: from CH3PR12MB7763.namprd12.prod.outlook.com (2603:10b6:610:145::10) by 
SN7PR12MB8146.namprd12.prod.outlook.com (2603:10b6:806:323::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7875.17; Thu, 15 Aug 2024 15:11:44 +0000 Received: from CH3PR12MB7763.namprd12.prod.outlook.com ([fe80::8b63:dd80:c182:4ce8]) by CH3PR12MB7763.namprd12.prod.outlook.com ([fe80::8b63:dd80:c182:4ce8%3]) with mapi id 15.20.7875.016; Thu, 15 Aug 2024 15:11:44 +0000 From: Jason Gunthorpe To: Cc: Alejandro Jimenez , Lu Baolu , David Hildenbrand , Christoph Hellwig , iommu@lists.linux.dev, Joao Martins , Kevin Tian , kvm@vger.kernel.org, linux-mm@kvack.org, Pasha Tatashin , Peter Xu , Ryan Roberts , Sean Christopherson , Tina Zhang Subject: [PATCH 10/16] iommupt: Add a kunit test to compare against iopt Date: Thu, 15 Aug 2024 12:11:26 -0300 Message-ID: <10-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com> In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com> References: X-ClientProxiedBy: BL1PR13CA0221.namprd13.prod.outlook.com (2603:10b6:208:2bf::16) To CH3PR12MB7763.namprd12.prod.outlook.com (2603:10b6:610:145::10) Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CH3PR12MB7763:EE_|SN7PR12MB8146:EE_ X-MS-Office365-Filtering-Correlation-Id: 4298e728-2e3e-40eb-64ca-08dcbd3c8f0e X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|366016|1800799024|7416014|376014; X-Microsoft-Antispam-Message-Info: qBKw0ZmHC3j22Z1LC4shsphPj4WbwKntTamwvdFeJmKJtnsXC0vwC3arMV2vh2vGs1nKzR2RsM2zjw7f2iP/7DKcDuYpKrJDWvQ5FUxJiBIoMxUgUtc/3hYY1vvSwMYjVXuoOJkhTNMumsmupseercgy171ecyinbgPtpiPOnaig6w2v4aYiCqoAUrwB5+oIQtYn8eZxPN+9tuTGN5Sx/Qlc70fqSf7WX4jeRA8V+pQOaDU5ozpbJzcTGjfP+Wggp4U9Nu19E8J9c8FHFKDXMcMeTY3A65zRx2P4OyOxqXbnlfpPi2pi0Vzjj6Fha4iaMogb5ME30vuVnEKCiOJG1shXN0cY5MgiWA5r9CzudElIBRDRihOHKhOQuJtI38BJyERAqM3H5td98s6QaJXNWRrk1PCJ5lIfoPUIy4HSAJe/lAZVd4BgiP7elz367nElfDW1twBJJFZjYoone15qHlErprDyzH/04GRmY+/k4U6igpNOzATsBo9lp1tw/UkhFl7SRbphNf3Q8/6BD7mVXAYI4CG6ODWdnwD++mrGNNZlx+EXMZtbiRdfOvzPt/sQtYuMqNcFpZeouHE2yvgAZNhiWeiP64/qBR03UJpGx3mRKMZ2igNbhG+pwyVoNXMeB8nJVAhtUlLrefr6RcvzMCHPshC2yTmAB47qQY5gW9UfxVidCLU4ACiSh2Ik2HXA1ibIRCi5W6vU1CnduQ8f4/LvxfVGaciM2d/OSO415jZg/G2xKFmmecKSLs899yLLcuWaZvOt8qP6YWjwGFmN0XRnO8RbAH4hXF+6JNGkej5VjnFLPl1ZebnjFvtCK2rOgUQGUNVAojYlsIBufR3hmXnt0HnLbRqh+hq/4YQIxewLN0f0O0S1RWAnW/CUKoKagZYh6RBbpvBKvFD8uCAhBhL+eDBunluF8qrSAtpv9CGOR9tV66HFFw3kshGnEkVQUD+CAyGnVY0Cxsl6aJiGVhKNlw4FRVveTT8R10+bZ6z1YfU+9LlPtEmeqo+NA7rFvXPNt3QWkBxB1xQ8CCwC0rU4442/c/Ak2zrS6ZIKda3AW/Ey4x8lvPLkcM7X+bEPWqvp3bkWNhUXqs8eoa19oPVG0N3GqhrbFdfv2VDsA30oYJSGWksqgIjzdKfIynkVSFNeexuhdNmcYGUicu2/omeleIxqerd4GxZJ6wz2UaBj16UnXdbb8vcTLF7Be4DY1NqWvb8nZzXsSxHYzqMrDAyw/raHgzohisPV+kPqPkBPm4HUwEJzUAnBL5P4a5PdESGPP3xb3oSxm/jtjZZPu0HDa5eD2+BSO6fogvJiXzlAA/G4bjNxEtaFrhh9bjoUQH2KT5K5VJqKsWzk8Yq+UA== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CH3PR12MB7763.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(366016)(1800799024)(7416014)(376014);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
The comparison test checks the memory layout of the page table against the memory layout created by the io-pgtable version to ensure they are the same. This gives high confidence that these aspects of the formats are working correctly. Most likely this would never be merged into the kernel; it is a useful development tool for building the formats.
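For orientation, the core of the comparison is a lockstep walk of both trees that compares each entry after masking bits the two implementations are allowed to set differently (software-only bits, the contiguous hint). A minimal sketch of that idea, with hypothetical names, not the patch code:

	/*
	 * Compare two PTEs while ignoring don't-care bits, e.g. SW-only
	 * or hint bits that may legitimately differ between the two
	 * implementations.
	 */
	static bool pte_layout_equal(u64 a, u64 b, u64 dont_care)
	{
		return (a & ~dont_care) == (b & ~dont_care);
	}

The test below builds the don't-care mask per format via pt_kunit_cmp_mask_entry() and recurses into child tables with pt_descend().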
The compare tests for AMDv1, x86PAE and VTD SS require a bunch of hacky patches to those drivers and this kunit command: ./tools/testing/kunit/kunit.py run --build_dir build_kunit_x86_64 --arch x86_64 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig --kconfig_add CONFIG_PCI=y --kconfig_add CONFIG_AMD_IOMMU=y --kconfig_add CONFIG_INTEL_IOMMU=y --kconfig_add CONFIG_CONFIG_IOMMU_IO_PGTABLE_VTD=y Signed-off-by: Jason Gunthorpe --- drivers/iommu/generic_pt/.kunitconfig | 10 + drivers/iommu/generic_pt/Kconfig | 1 + drivers/iommu/generic_pt/fmt/iommu_template.h | 3 + drivers/iommu/generic_pt/kunit_iommu_cmp.h | 272 ++++++++++++++++++ 4 files changed, 286 insertions(+) create mode 100644 drivers/iommu/generic_pt/kunit_iommu_cmp.h diff --git a/drivers/iommu/generic_pt/.kunitconfig b/drivers/iommu/generic_pt/.kunitconfig index f428cae8ce584c..a16ca5f72a7c5b 100644 --- a/drivers/iommu/generic_pt/.kunitconfig +++ b/drivers/iommu/generic_pt/.kunitconfig @@ -11,3 +11,13 @@ CONFIG_IOMMU_PT_DART=y CONFIG_IOMMU_PT_VTDSS=y CONFIG_IOMMU_PT_X86PAE=y CONFIG_IOMMUT_PT_KUNIT_TEST=y + +CONFIG_COMPILE_TEST=y +CONFIG_IOMMU_IO_PGTABLE_LPAE=y +CONFIG_IOMMU_IO_PGTABLE_ARMV7S=y +CONFIG_IOMMU_IO_PGTABLE_DART=y +# These are x86 specific and can't be turned on generally +# Turn them on to compare test x86pae and vtdss +#CONFIG_AMD_IOMMU=y +#CONFIG_INTEL_IOMMU=y +#CONFIG_CONFIG_IOMMU_IO_PGTABLE_VTD=y diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig index 2c5c2bc59bf8ea..3ac9b2324ebd98 100644 --- a/drivers/iommu/generic_pt/Kconfig +++ b/drivers/iommu/generic_pt/Kconfig @@ -31,6 +31,7 @@ config IOMMU_PT if IOMMU_PT config IOMMUT_PT_KUNIT_TEST tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS + select IOMMU_IO_PGTABLE depends on KUNIT default KUNIT_ALL_TESTS endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_template.h b/drivers/iommu/generic_pt/fmt/iommu_template.h index 809f4ce6874591..8d113cc68ec485 100644 --- a/drivers/iommu/generic_pt/fmt/iommu_template.h +++ b/drivers/iommu/generic_pt/fmt/iommu_template.h @@ -43,4 +43,7 @@ */ #include "../kunit_generic_pt.h" #include "../kunit_iommu_pt.h" +#ifdef pt_iommu_alloc_io_pgtable +#include "../kunit_iommu_cmp.h" +#endif #endif diff --git a/drivers/iommu/generic_pt/kunit_iommu_cmp.h b/drivers/iommu/generic_pt/kunit_iommu_cmp.h new file mode 100644 index 00000000000000..283b3f2b07425e --- /dev/null +++ b/drivers/iommu/generic_pt/kunit_iommu_cmp.h @@ -0,0 +1,272 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#include "kunit_iommu.h" +#include "pt_iter.h" +#include +#include + +struct kunit_iommu_cmp_priv { + /* Generic PT version */ + struct kunit_iommu_priv fmt; + + /* IO pagetable version */ + struct io_pgtable_ops *pgtbl_ops; + struct io_pgtable_cfg *fmt_memory; + struct pt_iommu_table ref_table; +}; + +struct compare_tables { + struct kunit *test; + struct pt_range ref_range; + struct pt_table_p *ref_table; +}; + +static int __compare_tables(struct pt_range *range, void *arg, + unsigned int level, struct pt_table_p *table) +{ + struct pt_state pts = pt_init(range, level, table); + struct compare_tables *cmp = arg; + struct pt_state ref_pts = + pt_init(&cmp->ref_range, level, cmp->ref_table); + struct kunit *test = cmp->test; + int ret; + + for_each_pt_level_item(&pts) { + u64 entry, ref_entry; + + cmp->ref_range.va = range->va; + ref_pts.index = pts.index; + pt_load_entry(&ref_pts); + + entry = pt_kunit_cmp_mask_entry(&pts); + ref_entry = 
pt_kunit_cmp_mask_entry(&ref_pts); + + /*if (entry != 0 || ref_entry != 0) + printk("Check %llx Level %u index %u ptr %px refptr %px: %llx (%llx) %llx (%llx)\n", + pts.range->va, pts.level, pts.index, + pts.table, + ref_pts.table, + pts.entry, entry, + ref_pts.entry, ref_entry);*/ + + KUNIT_ASSERT_EQ(test, pts.type, ref_pts.type); + KUNIT_ASSERT_EQ(test, entry, ref_entry); + if (entry != ref_entry) + return 0; + + if (pts.type == PT_ENTRY_TABLE) { + cmp->ref_table = ref_pts.table_lower; + ret = pt_descend(&pts, arg, __compare_tables); + if (ret) + return ret; + } + + /* Defeat contiguous entry aggregation */ + pts.type = PT_ENTRY_EMPTY; + } + + return 0; +} + +static void compare_tables(struct kunit *test) +{ + struct kunit_iommu_cmp_priv *cmp_priv = test->priv; + struct kunit_iommu_priv *priv = &cmp_priv->fmt; + struct pt_range range = pt_top_range(priv->common); + struct compare_tables cmp = { + .test = test, + }; + struct pt_state pts = pt_init_top(&range); + struct pt_state ref_pts; + + pt_iommu_setup_ref_table(&cmp_priv->ref_table, cmp_priv->pgtbl_ops); + cmp.ref_range = + pt_top_range(common_from_iommu(&cmp_priv->ref_table.iommu)); + ref_pts = pt_init_top(&cmp.ref_range); + KUNIT_ASSERT_EQ(test, pts.level, ref_pts.level); + + cmp.ref_table = ref_pts.table; + KUNIT_ASSERT_EQ(test, pt_walk_range(&range, __compare_tables, &cmp), 0); +} + +static void test_cmp_init(struct kunit *test) +{ + struct kunit_iommu_cmp_priv *cmp_priv = test->priv; + struct kunit_iommu_priv *priv = &cmp_priv->fmt; + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(cmp_priv->pgtbl_ops)->cfg; + + /* Fixture does the setup */ + KUNIT_EXPECT_NE(test, priv->info.pgsize_bitmap, 0); + + /* pt_iommu has a superset of page sizes (ARM supports contiguous) */ + KUNIT_EXPECT_EQ(test, + priv->info.pgsize_bitmap & pgtbl_cfg->pgsize_bitmap, + pgtbl_cfg->pgsize_bitmap); + + /* Empty compare works */ + compare_tables(test); +} + +static void do_cmp_map(struct kunit *test, pt_vaddr_t va, pt_oaddr_t pa, + pt_oaddr_t len, unsigned int prot) +{ + struct kunit_iommu_cmp_priv *cmp_priv = test->priv; + struct kunit_iommu_priv *priv = &cmp_priv->fmt; + const struct pt_iommu_ops *ops = priv->iommu->ops; + size_t mapped; + int ret; + + /* This lacks pagination, must call with perfectly aligned everything */ + if (sizeof(unsigned long) == 8) { + KUNIT_EXPECT_EQ(test, va % len, 0); + KUNIT_EXPECT_EQ(test, pa % len, 0); + } + + mapped = 0; + ret = ops->map_pages(priv->iommu, va, pa, len, prot, GFP_KERNEL, + &mapped, NULL); + KUNIT_EXPECT_EQ(test, ret, 0); + KUNIT_EXPECT_EQ(test, mapped, len); + + mapped = 0; + ret = cmp_priv->pgtbl_ops->map_pages(cmp_priv->pgtbl_ops, va, pa, len, + 1, prot, GFP_KERNEL, &mapped); + KUNIT_EXPECT_EQ(test, ret, 0); + KUNIT_EXPECT_EQ(test, mapped, len); +} + +static void do_cmp_unmap(struct kunit *test, pt_vaddr_t va, pt_vaddr_t len) +{ + struct kunit_iommu_cmp_priv *cmp_priv = test->priv; + struct kunit_iommu_priv *priv = &cmp_priv->fmt; + const struct pt_iommu_ops *ops = priv->iommu->ops; + size_t ret; + + KUNIT_EXPECT_EQ(test, va % len, 0); + + ret = ops->unmap_pages(priv->iommu, va, len, NULL); + KUNIT_EXPECT_EQ(test, ret, len); + ret = cmp_priv->pgtbl_ops->unmap_pages(cmp_priv->pgtbl_ops, va, len, 1, + NULL); + KUNIT_EXPECT_EQ(test, ret, len); +} + +static void test_cmp_one_map(struct kunit *test) +{ + struct kunit_iommu_cmp_priv *cmp_priv = test->priv; + struct kunit_iommu_priv *priv = &cmp_priv->fmt; + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(cmp_priv->pgtbl_ops)->cfg; 
+	const pt_oaddr_t addr =
+		oalog2_mod(0x74a71445deadbeef, priv->common->max_oasz_lg2);
+	pt_vaddr_t pgsize_bitmap = priv->safe_pgsize_bitmap &
+				   pgtbl_cfg->pgsize_bitmap;
+	pt_vaddr_t cur_va;
+	unsigned int prot = 0;
+	unsigned int pgsz_lg2;
+
+	/*
+	 * Check that every prot combination at every page size level generates
+	 * the same data in the page table.
+	 */
+	for (prot = 0; prot <= (IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE |
+				IOMMU_NOEXEC | IOMMU_MMIO);
+	     prot++) {
+		/* Page tables usually cannot represent inaccessible memory */
+		if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+			continue;
+
+		/* Try every supported page size */
+		cur_va = priv->smallest_pgsz * 256;
+		for (pgsz_lg2 = 0; pgsz_lg2 != PT_VADDR_MAX_LG2; pgsz_lg2++) {
+			pt_vaddr_t len = log2_to_int(pgsz_lg2);
+
+			if (!(pgsize_bitmap & len))
+				continue;
+
+			cur_va = ALIGN(cur_va, len);
+			do_cmp_map(test, cur_va,
+				   oalog2_set_mod(addr, 0, pgsz_lg2), len,
+				   prot);
+			compare_tables(test);
+			cur_va += len;
+		}
+
+		cur_va = priv->smallest_pgsz * 256;
+		for (pgsz_lg2 = 0; pgsz_lg2 != PT_VADDR_MAX_LG2; pgsz_lg2++) {
+			pt_vaddr_t len = log2_to_int(pgsz_lg2);
+
+			if (!(pgsize_bitmap & len))
+				continue;
+
+			cur_va = ALIGN(cur_va, len);
+			do_cmp_unmap(test, cur_va, len);
+			compare_tables(test);
+			cur_va += len;
+		}
+	}
+}
+
+static int pt_kunit_iommu_cmp_init(struct kunit *test)
+{
+	struct kunit_iommu_cmp_priv *cmp_priv;
+	struct kunit_iommu_priv *priv;
+	int ret;
+
+	test->priv = cmp_priv = kzalloc(sizeof(*cmp_priv), GFP_KERNEL);
+	if (!cmp_priv)
+		return -ENOMEM;
+	priv = &cmp_priv->fmt;
+
+	ret = pt_kunit_priv_init(priv);
+	if (ret)
+		goto err_priv;
+
+	cmp_priv->pgtbl_ops = pt_iommu_alloc_io_pgtable(
+		&priv->cfg, &priv->dummy_dev, &cmp_priv->fmt_memory);
+	if (!cmp_priv->pgtbl_ops) {
+		ret = -ENOMEM;
+		goto err_fmt_table;
+	}
+
+	cmp_priv->ref_table = priv->fmt_table;
+	return 0;
+
+err_fmt_table:
+	pt_iommu_deinit(priv->iommu);
+err_priv:
+	kfree(test->priv);
+	test->priv = NULL;
+	return ret;
+}
+
+static void pt_kunit_iommu_cmp_exit(struct kunit *test)
+{
+	struct kunit_iommu_cmp_priv *cmp_priv = test->priv;
+	struct kunit_iommu_priv *priv;
+
+	/* Check for init allocation failure before using the priv */
+	if (!cmp_priv)
+		return;
+	priv = &cmp_priv->fmt;
+
+	pt_iommu_deinit(priv->iommu);
+	free_io_pgtable_ops(cmp_priv->pgtbl_ops);
+	pt_iommu_free_pgtbl_cfg(cmp_priv->fmt_memory);
+	kfree(test->priv);
+}
+
+static struct kunit_case cmp_test_cases[] = {
+	KUNIT_CASE(test_cmp_init),
+	KUNIT_CASE(test_cmp_one_map),
+	{},
+};
+
+static struct kunit_suite NS(cmp_suite) = {
+	.name = __stringify(NS(iommu_cmp_test)),
+	.init = pt_kunit_iommu_cmp_init,
+	.exit = pt_kunit_iommu_cmp_exit,
+	.test_cases = cmp_test_cases,
+};
+kunit_test_suites(&NS(cmp_suite));

From patchwork Thu Aug 15 15:11:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Gunthorpe
X-Patchwork-Id: 13764929
From: Jason Gunthorpe
To:
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig, iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org, linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts, Sean Christopherson, Tina Zhang
Subject: [PATCH 11/16] iommupt: Add the 64 bit ARMv8 page table format
Date: Thu, 15 Aug 2024 12:11:27 -0300
Message-ID: <11-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

The features, format and naming are taken from the ARMv8 VMSAv8-64 chapter. ARMv8 uses almost all the features of the common implementation:
- Contiguous pages
- Leaf pages at many levels
- Variable top level
- Variable size top level, including super-sized (concatenated tables)
- Dirty tracking
- Low or high starting VA
Compared to the io-pgtable version this also implements the contiguous page hint, and supports dirty readback from the S2. The common algorithms use a bit in the folio to keep track of the cache invalidation race, while the io-pgtable version uses a SW bit in the table PTE.
In part as a demonstration, to be evaluated with performance data, ARMv8 is multi-compiled for each of the 4k/16k/64k granule sizes. This gives 3x the .text usage for an as-yet unmeasured performance improvement. It shows how Generic PT can be used to optimize code generation.
FIXME: Not every detail around the variable VA width is fully completed and tested yet.
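To make the multi-compilation point concrete, here is a minimal sketch (hypothetical names, not code from this series) of why building one object per granule size helps: the granule becomes a compile-time constant, so the per-level index math folds into constant shifts and masks.

	/* Built once per granule; GRANULE_LG2 is 12/14/16 per object file */
	#define GRANULE_LG2 12
	#define ITEMS_LG2 (GRANULE_LG2 - 3)	/* 8-byte PTEs per table */

	static inline unsigned int pte_index(u64 va, unsigned int level)
	{
		/* Item size grows by ITEMS_LG2 bits at each level up */
		unsigned int item_lg2 = GRANULE_LG2 + ITEMS_LG2 * level;

		return (va >> item_lg2) & ((1u << ITEMS_LG2) - 1);
	}

With a runtime-variable granule the shift amounts would instead have to be loaded from the configuration on every walk step.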
Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/generic_pt/Kconfig | 39 ++
 drivers/iommu/generic_pt/fmt/Makefile | 4 +
 drivers/iommu/generic_pt/fmt/armv8.h | 621 ++++++++++++++++++
 drivers/iommu/generic_pt/fmt/defs_armv8.h | 28 +
 .../iommu/generic_pt/fmt/iommu_armv8_16k.c | 13 +
 drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c | 13 +
 .../iommu/generic_pt/fmt/iommu_armv8_64k.c | 13 +
 include/linux/generic_pt/common.h | 22 +
 include/linux/generic_pt/iommu.h | 73 ++
 9 files changed, 826 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/fmt/armv8.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_armv8.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_16k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_64k.c

diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
index 3ac9b2324ebd98..260fff5daa6e57 100644
--- a/drivers/iommu/generic_pt/Kconfig
+++ b/drivers/iommu/generic_pt/Kconfig
@@ -29,10 +29,49 @@ config IOMMU_PT
 	  Generic library for building IOMMU page tables
 
 if IOMMU_PT
+config IOMMU_PT_ARMV8_4K
+	tristate "IOMMU page table for 64 bit ARMv8 4k page size"
+	depends on !GENERIC_ATOMIC64 # for cmpxchg64
+	default n
+	help
+	  Enable support for the ARMv8 VMSAv8-64 and the VMSAv8-32 long
+	  descriptor pagetable format. This format supports both stage-1 and
+	  stage-2, as well as address spaces up to 48-bits in size. 4K
+	  granule size version.
+
+	  If unsure, say N here.
+
+config IOMMU_PT_ARMV8_16K
+	tristate "IOMMU page table for 64 bit ARMv8 16k page size"
+	depends on !GENERIC_ATOMIC64 # for cmpxchg64
+	default n
+	help
+	  Enable support for the ARMv8 VMSAv8-64 and the VMSAv8-32 long
+	  descriptor pagetable format. This format supports both stage-1 and
+	  stage-2, as well as address spaces up to 48-bits in size. 16K
+	  granule size version.
+
+	  If unsure, say N here.
+
+config IOMMU_PT_ARMV8_64K
+	tristate "IOMMU page table for 64 bit ARMv8 64k page size"
+	depends on !GENERIC_ATOMIC64 # for cmpxchg64
+	default n
+	help
+	  Enable support for the ARMv8 VMSAv8-64 and the VMSAv8-32 long
+	  descriptor pagetable format. This format supports both stage-1 and
+	  stage-2, as well as address spaces up to 48-bits in size. 64K
+	  granule size version.
+
+	  If unsure, say N here.
+ config IOMMUT_PT_KUNIT_TEST tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS select IOMMU_IO_PGTABLE depends on KUNIT + depends on IOMMU_PT_ARMV8_4K || !IOMMU_PT_ARMV8_4K + depends on IOMMU_PT_ARMV8_16K || !IOMMU_PT_ARMV8_16K + depends on IOMMU_PT_ARMV8_64K || !IOMMU_PT_ARMV8_64K default KUNIT_ALL_TESTS endif endif diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile index 0c35b9ae4dfb34..9a9173ce85e075 100644 --- a/drivers/iommu/generic_pt/fmt/Makefile +++ b/drivers/iommu/generic_pt/fmt/Makefile @@ -1,5 +1,9 @@ # SPDX-License-Identifier: GPL-2.0 +iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_4K) += armv8_4k +iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_16K) += armv8_16k +iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_64K) += armv8_64k + IOMMU_PT_KUNIT_TEST := define create_format obj-$(2) += iommu_$(1).o diff --git a/drivers/iommu/generic_pt/fmt/armv8.h b/drivers/iommu/generic_pt/fmt/armv8.h new file mode 100644 index 00000000000000..73bccbfa72b19e --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/armv8.h @@ -0,0 +1,621 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * The page table format described by the ARMv8 VMSAv8-64 chapter in the + * Architecture Reference Manual. With the right cfg this will also implement + * the VMSAv8-32 Long Descriptor format. + * + * This was called io-pgtable-arm.c and ARM_xx_LPAE_Sx. + * + * NOTE! The level numbering is consistent with the Generic Page Table API, but + * is backwards from what the ARM documents use. What ARM calls level 3 this + * calls level 0. + * + * Present in io-pgtable-arm.c but not here: + * ARM_MALI_LPAE + * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA + */ +#ifndef __GENERIC_PT_FMT_ARMV8_H +#define __GENERIC_PT_FMT_ARMV8_H + +#include "defs_armv8.h" +#include "../pt_defs.h" + +#include +#include +#include +#include +#include +#include +#include + +#if ARMV8_GRANUAL_SIZE == 4096 +enum { + PT_MAX_TOP_LEVEL = 3, + PT_GRANUAL_LG2SZ = 12, +}; +#elif ARMV8_GRANUAL_SIZE == 16384 +enum { + PT_MAX_TOP_LEVEL = 3, + PT_GRANUAL_LG2SZ = 14, +}; +#elif ARMV8_GRANUAL_SIZE == 65536 +enum { + PT_MAX_TOP_LEVEL = 2, + PT_GRANUAL_LG2SZ = 16, +}; +#else +#error "Invalid ARMV8_GRANUAL_SIZE" +#endif + +enum { + PT_MAX_OUTPUT_ADDRESS_LG2 = 48, + /* + * Currently only support up to 48 bits of usable address, the 64k 52 + * bit mode is not supported. 
+ */
+	PT_MAX_VA_ADDRESS_LG2 = 48,
+	PT_TABLEMEM_LG2SZ = PT_GRANUAL_LG2SZ,
+	PT_ENTRY_WORD_SIZE = sizeof(u64),
+};
+
+/* Common PTE bits */
+enum {
+	ARMV8PT_FMT_VALID = BIT(0),
+	ARMV8PT_FMT_PAGE = BIT(1),
+	ARMV8PT_FMT_TABLE = BIT(1),
+	ARMV8PT_FMT_NS = BIT(5),
+	ARMV8PT_FMT_SH = GENMASK(9, 8),
+	ARMV8PT_FMT_AF = BIT(10),
+
+	ARMV8PT_FMT_OA52 = GENMASK_ULL(15, 12),
+	ARMV8PT_FMT_OA48 = GENMASK_ULL(47, PT_GRANUAL_LG2SZ),
+
+	ARMV8PT_FMT_DBM = BIT_ULL(51),
+	ARMV8PT_FMT_CONTIG = BIT_ULL(52),
+	ARMV8PT_FMT_UXN = BIT_ULL(53),
+	ARMV8PT_FMT_PXN = BIT_ULL(54),
+	ARMV8PT_FMT_NSTABLE = BIT_ULL(63),
+};
+
+/* S1 PTE bits */
+enum {
+	ARMV8PT_FMT_ATTRINDX = GENMASK(4, 2),
+	ARMV8PT_FMT_AP = GENMASK(7, 6),
+	ARMV8PT_FMT_nG = BIT(11),
+};
+
+enum {
+	ARMV8PT_MAIR_ATTR_IDX_CACHE = 1,
+	ARMV8PT_MAIR_ATTR_IDX_DEV = 2,
+
+	ARMV8PT_SH_IS = 3,
+	ARMV8PT_SH_OS = 2,
+
+	ARMV8PT_AP_UNPRIV = 1,
+	ARMV8PT_AP_RDONLY = 2,
+};
+
+/* S2 PTE bits */
+enum {
+	ARMV8PT_FMT_S2MEMATTR = GENMASK(5, 2),
+	ARMV8PT_FMT_S2AP = GENMASK(7, 6),
+};
+
+enum {
+	/*
+	 * For !S2FWB these code to:
+	 * 1111 = Normal outer write back cachable / Inner Write Back Cachable
+	 *        Permit S1 to override
+	 * 0101 = Normal Non-cachable / Inner Non-cachable
+	 * 0001 = Device / Device-nGnRE
+	 * For S2FWB these code to:
+	 * 0110 Force Normal Write Back
+	 * 0101 Normal* is forced Normal-NC, Device unchanged
+	 * 0001 Force Device-nGnRE
+	 */
+	ARMV8PT_MEMATTR_FWB_WB = 6,
+	ARMV8PT_MEMATTR_OIWB = 0xf,
+	ARMV8PT_MEMATTR_NC = 5,
+	ARMV8PT_MEMATTR_DEV = 1,
+
+	ARMV8PT_S2AP_READ = 1,
+	ARMV8PT_S2AP_WRITE = 2,
+};
+
+#define common_to_armv8pt(common_ptr) \
+	container_of_const(common_ptr, struct pt_armv8, common)
+#define to_armv8pt(pts) common_to_armv8pt((pts)->range->common)
+
+static inline pt_oaddr_t armv8pt_oa(const struct pt_state *pts)
+{
+	u64 entry = pts->entry;
+	pt_oaddr_t oa;
+
+	oa = log2_mul(FIELD_GET(ARMV8PT_FMT_OA48, entry), PT_GRANUAL_LG2SZ);
+
+	/* LPA support on 64K page size */
+	if (PT_GRANUAL_SIZE == SZ_64K)
+		oa |= ((pt_oaddr_t)FIELD_GET(ARMV8PT_FMT_OA52, entry)) << 52;
+	return oa;
+}
+
+static inline pt_oaddr_t armv8pt_table_pa(const struct pt_state *pts)
+{
+	return armv8pt_oa(pts);
+}
+#define pt_table_pa armv8pt_table_pa
+
+/*
+ * Return a block or page entry pointing at a physical address. Returns the
+ * address adjusted for the item in a contiguous case.
+ */
+static inline pt_oaddr_t armv8pt_item_oa(const struct pt_state *pts)
+{
+	return armv8pt_oa(pts);
+}
+#define pt_item_oa armv8pt_item_oa
+
+static inline bool armv8pt_can_have_leaf(const struct pt_state *pts)
+{
+	/*
+	 * See D5-18 Translation granule sizes, with block and page sizes, and
+	 * output address ranges
+	 */
+	if ((PT_GRANUAL_SIZE == SZ_4K && pts->level > 2) ||
+	    (PT_GRANUAL_SIZE == SZ_16K && pts->level > 1) ||
+	    (PT_GRANUAL_SIZE == SZ_64K && pts_feature(pts, PT_FEAT_ARMV8_LPA) && pts->level > 2) ||
+	    (PT_GRANUAL_SIZE == SZ_64K && !pts_feature(pts, PT_FEAT_ARMV8_LPA) && pts->level > 1))
+		return false;
+	return true;
+}
+#define pt_can_have_leaf armv8pt_can_have_leaf
+
+static inline unsigned int armv8pt_table_item_lg2sz(const struct pt_state *pts)
+{
+	return PT_GRANUAL_LG2SZ +
+	       (PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64))) * pts->level;
+}
+#define pt_table_item_lg2sz armv8pt_table_item_lg2sz
+
+/* Number of contiguous entries that ARMV8PT_FMT_CONTIG will join at this level */
+static inline unsigned short
+armv8pt_contig_count_lg2(const struct pt_state *pts)
+{
+	if (PT_GRANUAL_SIZE == SZ_4K)
+		return ilog2(16); /* 64KB, 2MB */
+	else if (PT_GRANUAL_SIZE == SZ_16K && pts->level == 1)
+		return ilog2(32); /* 1GB */
+	else if (PT_GRANUAL_SIZE == SZ_16K && pts->level == 0)
+		return ilog2(128); /* 2M */
+	else if (PT_GRANUAL_SIZE == SZ_64K)
+		return ilog2(32); /* 2M, 16G */
+	return ilog2(1);
+}
+#define pt_contig_count_lg2 armv8pt_contig_count_lg2
+
+static inline unsigned int
+armv8pt_entry_num_contig_lg2(const struct pt_state *pts)
+{
+	if (pts->entry & ARMV8PT_FMT_CONTIG)
+		return armv8pt_contig_count_lg2(pts);
+	return ilog2(1);
+}
+#define pt_entry_num_contig_lg2 armv8pt_entry_num_contig_lg2
+
+static inline pt_vaddr_t armv8pt_full_va_prefix(const struct pt_common *common)
+{
+	if (pt_feature(common, PT_FEAT_ARMV8_TTBR1))
+		return PT_VADDR_MAX;
+	return 0;
+}
+#define pt_full_va_prefix armv8pt_full_va_prefix
+
+static inline unsigned int armv8pt_num_items_lg2(const struct pt_state *pts)
+{
+	/* FIXME S2 concatenated tables */
+	return PT_GRANUAL_LG2SZ - ilog2(sizeof(u64));
+}
+#define pt_num_items_lg2 armv8pt_num_items_lg2
+
+static inline enum pt_entry_type armv8pt_load_entry_raw(struct pt_state *pts)
+{
+	const u64 *tablep = pt_cur_table(pts, u64);
+	u64 entry;
+
+	pts->entry = entry = READ_ONCE(tablep[pts->index]);
+	if (!(entry & ARMV8PT_FMT_VALID))
+		return PT_ENTRY_EMPTY;
+	if (pts->level != 0 && (entry & ARMV8PT_FMT_TABLE))
+		return PT_ENTRY_TABLE;
+
+	/*
+	 * Suppress returning VALID for levels that cannot have a page to remove
+	 * code.
+	 */
+	if (!armv8pt_can_have_leaf(pts))
+		return PT_ENTRY_EMPTY;
+
+	/* Must be a block or page, don't check the page bit on level 0 */
+	return PT_ENTRY_OA;
+}
+#define pt_load_entry_raw armv8pt_load_entry_raw
+
+static inline void
+armv8pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa,
+			   unsigned int oasz_lg2,
+			   const struct pt_write_attrs *attrs)
+{
+	unsigned int isz_lg2 = pt_table_item_lg2sz(pts);
+	u64 *tablep = pt_cur_table(pts, u64);
+	u64 entry;
+
+	PT_WARN_ON(log2_mod(oa, oasz_lg2));
+
+	entry = ARMV8PT_FMT_VALID |
+		FIELD_PREP(ARMV8PT_FMT_OA48, log2_div(oa, PT_GRANUAL_LG2SZ)) |
+		FIELD_PREP(ARMV8PT_FMT_OA52, oa >> 48) | attrs->descriptor_bits;
+
+	/*
+	 * On the last level the leaf is called a page and has the page/table bit set,
+	 * on other levels it is called a block and has it clear.
+ */ + if (pts->level == 0) + entry |= ARMV8PT_FMT_PAGE; + + if (oasz_lg2 != isz_lg2) { + u64 *end; + + PT_WARN_ON(oasz_lg2 != isz_lg2 + armv8pt_contig_count_lg2(pts)); + PT_WARN_ON(log2_mod(pts->index, armv8pt_contig_count_lg2(pts))); + + entry |= ARMV8PT_FMT_CONTIG; + tablep += pts->index; + end = tablep + log2_to_int(armv8pt_contig_count_lg2(pts)); + for (; tablep != end; tablep++) { + WRITE_ONCE(*tablep, entry); + entry += FIELD_PREP( + ARMV8PT_FMT_OA48, + log2_to_int(isz_lg2 - PT_GRANUAL_LG2SZ)); + } + } else { + WRITE_ONCE(tablep[pts->index], entry); + } + pts->entry = entry; +} +#define pt_install_leaf_entry armv8pt_install_leaf_entry + +static inline bool armv8pt_install_table(struct pt_state *pts, + pt_oaddr_t table_pa, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + entry = ARMV8PT_FMT_VALID | ARMV8PT_FMT_TABLE | + FIELD_PREP(ARMV8PT_FMT_OA48, + log2_div(table_pa, PT_GRANUAL_LG2SZ)) | + FIELD_PREP(ARMV8PT_FMT_OA52, table_pa >> 48); + + if (pts_feature(pts, PT_FEAT_ARMV8_NS)) + entry |= ARMV8PT_FMT_NSTABLE; + + return pt_table_install64(&tablep[pts->index], entry, pts->entry); +} +#define pt_install_table armv8pt_install_table + +static inline void armv8pt_attr_from_entry(const struct pt_state *pts, + struct pt_write_attrs *attrs) +{ + attrs->descriptor_bits = + pts->entry & + (ARMV8PT_FMT_SH | ARMV8PT_FMT_AF | ARMV8PT_FMT_UXN | + ARMV8PT_FMT_PXN | ARMV8PT_FMT_ATTRINDX | ARMV8PT_FMT_AP | + ARMV8PT_FMT_nG | ARMV8PT_FMT_S2MEMATTR | ARMV8PT_FMT_S2AP); +} +#define pt_attr_from_entry armv8pt_attr_from_entry + +static inline void armv8pt_clear_entry(struct pt_state *pts, + unsigned int num_contig_lg2) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 *end; + + PT_WARN_ON(log2_mod(pts->index, num_contig_lg2)); + + tablep += pts->index; + end = tablep + log2_to_int(num_contig_lg2); + for (; tablep != end; tablep++) + WRITE_ONCE(*tablep, 0); +} +#define pt_clear_entry armv8pt_clear_entry + +/* + * Call fn over all the items in an entry. If the entry is contiguous this + * iterates over the entire contiguous entry, including items preceding + * pts->va. always_inline avoids an indirect function call. 
+ */
+static __always_inline bool armv8pt_reduce_contig(const struct pt_state *pts,
+						  bool (*fn)(u64 *tablep,
+							     u64 entry))
+{
+	u64 *tablep = pt_cur_table(pts, u64);
+
+	if (pts->entry & ARMV8PT_FMT_CONTIG) {
+		unsigned int num_contig_lg2 = armv8pt_contig_count_lg2(pts);
+		u64 *end;
+
+		tablep += log2_set_mod(pts->index, 0, num_contig_lg2);
+		end = tablep + log2_to_int(num_contig_lg2);
+		for (; tablep != end; tablep++)
+			if (fn(tablep, READ_ONCE(*tablep)))
+				return true;
+		return false;
+	}
+	return fn(tablep + pts->index, pts->entry);
+}
+
+static inline bool armv8pt_check_is_dirty_s1(u64 *tablep, u64 entry)
+{
+	return (entry & (ARMV8PT_FMT_DBM |
+			 FIELD_PREP(ARMV8PT_FMT_AP, ARMV8PT_AP_RDONLY))) ==
+	       ARMV8PT_FMT_DBM;
+}
+
+static bool armv8pt_clear_dirty_s1(u64 *tablep, u64 entry)
+{
+	WRITE_ONCE(*tablep,
+		   entry | FIELD_PREP(ARMV8PT_FMT_AP, ARMV8PT_AP_RDONLY));
+	return false;
+}
+
+static inline bool armv8pt_check_is_dirty_s2(u64 *tablep, u64 entry)
+{
+	const u64 DIRTY = ARMV8PT_FMT_DBM |
+			  FIELD_PREP(ARMV8PT_FMT_S2AP, ARMV8PT_S2AP_WRITE);
+
+	return (entry & DIRTY) == DIRTY;
+}
+
+static bool armv8pt_clear_dirty_s2(u64 *tablep, u64 entry)
+{
+	WRITE_ONCE(*tablep, entry & ~(u64)FIELD_PREP(ARMV8PT_FMT_S2AP,
+						     ARMV8PT_S2AP_WRITE));
+	return false;
+}
+
+static inline bool armv8pt_entry_write_is_dirty(const struct pt_state *pts)
+{
+	if (!pts_feature(pts, PT_FEAT_ARMV8_S2))
+		return armv8pt_reduce_contig(pts, armv8pt_check_is_dirty_s1);
+	else
+		return armv8pt_reduce_contig(pts, armv8pt_check_is_dirty_s2);
+}
+#define pt_entry_write_is_dirty armv8pt_entry_write_is_dirty
+
+static inline void armv8pt_entry_set_write_clean(struct pt_state *pts)
+{
+	if (!pts_feature(pts, PT_FEAT_ARMV8_S2))
+		armv8pt_reduce_contig(pts, armv8pt_clear_dirty_s1);
+	else
+		armv8pt_reduce_contig(pts, armv8pt_clear_dirty_s2);
+}
+#define pt_entry_set_write_clean armv8pt_entry_set_write_clean
+
+/* --- iommu */
+#include
+#include
+
+#define pt_iommu_table pt_iommu_armv8
+
+/* The common struct is in the per-format common struct */
+static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table)
+{
+	return &container_of(iommu_table, struct pt_iommu_table, iommu)
+			->armpt.common;
+}
+
+static inline struct pt_iommu *iommu_from_common(struct pt_common *common)
+{
+	return &container_of(common, struct pt_iommu_table, armpt.common)->iommu;
+}
+
+static inline int armv8pt_iommu_set_prot(struct pt_common *common,
+					 struct pt_write_attrs *attrs,
+					 unsigned int iommu_prot)
+{
+	bool is_s1 = !pt_feature(common, PT_FEAT_ARMV8_S2);
+	u64 pte = 0;
+
+	if (is_s1) {
+		u64 ap = 0;
+
+		if (!(iommu_prot & IOMMU_WRITE) && (iommu_prot & IOMMU_READ))
+			ap |= ARMV8PT_AP_RDONLY;
+		if (!(iommu_prot & IOMMU_PRIV))
+			ap |= ARMV8PT_AP_UNPRIV;
+		pte = ARMV8PT_FMT_nG | FIELD_PREP(ARMV8PT_FMT_AP, ap);
+
+		if (iommu_prot & IOMMU_MMIO)
+			pte |= FIELD_PREP(ARMV8PT_FMT_ATTRINDX,
+					  ARMV8PT_MAIR_ATTR_IDX_DEV);
+		else if (iommu_prot & IOMMU_CACHE)
+			pte |= FIELD_PREP(ARMV8PT_FMT_ATTRINDX,
+					  ARMV8PT_MAIR_ATTR_IDX_CACHE);
+	} else {
+		u64 s2ap = 0;
+
+		if (iommu_prot & IOMMU_READ)
+			s2ap |= ARMV8PT_S2AP_READ;
+		if (iommu_prot & IOMMU_WRITE)
+			s2ap |= ARMV8PT_S2AP_WRITE;
+		pte = FIELD_PREP(ARMV8PT_FMT_S2AP, s2ap);
+
+		if (iommu_prot & IOMMU_MMIO)
+			pte |= FIELD_PREP(ARMV8PT_FMT_S2MEMATTR,
+					  ARMV8PT_MEMATTR_DEV);
+		else if ((iommu_prot & IOMMU_CACHE) &&
+			 pt_feature(common, PT_FEAT_ARMV8_S2FWB))
+			pte |= FIELD_PREP(ARMV8PT_FMT_S2MEMATTR,
+					  ARMV8PT_MEMATTR_FWB_WB);
+		else if (iommu_prot & IOMMU_CACHE)
+			pte |=
FIELD_PREP(ARMV8PT_FMT_S2MEMATTR, + ARMV8PT_MEMATTR_OIWB); + else + pte |= FIELD_PREP(ARMV8PT_FMT_S2MEMATTR, + ARMV8PT_MEMATTR_NC); + } + + /* + * For DBM the writable entry starts out dirty to avoid the HW doing + * memory accesses to dirty it. We can just leave the DBM bit + * permanently set with no cost. + */ + if (pt_feature(common, PT_FEAT_ARMV8_DBM) && (iommu_prot & IOMMU_WRITE)) + pte |= ARMV8PT_FMT_DBM; + + if (iommu_prot & IOMMU_CACHE) + pte |= FIELD_PREP(ARMV8PT_FMT_SH, ARMV8PT_SH_IS); + else + pte |= FIELD_PREP(ARMV8PT_FMT_SH, ARMV8PT_SH_OS); + + /* FIXME for mali: + pte |= ARM_LPAE_PTE_SH_OS; + */ + + if (iommu_prot & IOMMU_NOEXEC) + pte |= ARMV8PT_FMT_UXN | ARMV8PT_FMT_PXN; + + if (pt_feature(common, PT_FEAT_ARMV8_NS)) + pte |= ARMV8PT_FMT_NS; + + // FIXME not on mali: + pte |= ARMV8PT_FMT_AF; + + attrs->descriptor_bits = pte; + return 0; +} +#define pt_iommu_set_prot armv8pt_iommu_set_prot + +static inline int armv8pt_iommu_fmt_init(struct pt_iommu_armv8 *iommu_table, + struct pt_iommu_armv8_cfg *cfg) +{ + struct pt_armv8 *armv8pt = &iommu_table->armpt; + unsigned int levels; + + /* Atomicity of dirty bits conflicts with an incoherent cache */ + if ((cfg->features & PT_FEAT_ARMV8_DBM) && + (cfg->features & PT_FEAT_DMA_INCOHERENT)) + return -EOPNOTSUPP; + + /* FIXME are these inputs supposed to be an exact request, or a HW capability? */ + + if (cfg->ias_lg2 <= PT_GRANUAL_LG2SZ) + return -EINVAL; + + if ((PT_GRANUAL_SIZE == SZ_64K && cfg->oas_lg2 > 52) || + (PT_GRANUAL_SIZE != SZ_64K && cfg->oas_lg2 > 48)) + return -EINVAL; + + /*if (cfg->ias > 48) + table->feat_lva = true; */ + + cfg->ias_lg2 = min(cfg->ias_lg2, PT_MAX_VA_ADDRESS_LG2); + + levels = DIV_ROUND_UP(cfg->ias_lg2 - PT_GRANUAL_LG2SZ, + PT_GRANUAL_LG2SZ - ilog2(sizeof(u64))); + if (levels > PT_MAX_TOP_LEVEL + 1) + return -EINVAL; + + /* + * Table D5-6 PA size implications for the VTCR_EL2.{T0SZ, SL0} + * Single level is not supported without FEAT_TTST, which we are not + * implementing. 
+ */ + if (pt_feature(&armv8pt->common, PT_FEAT_ARMV8_S2) && + PT_GRANUAL_SIZE == SZ_4K && levels == 1) + return -EINVAL; + + /* FIXME - test me S2 concatenated translation tables + if (levels > 1 && cfg->is_s2 && + cfg->ias_lg2 - (ARMV8PT_LVL0_ITEM_LG2SZ * (levels - 1)) <= 4) + levels--; + */ + pt_top_set_level(&armv8pt->common, levels - 1); + armv8pt->common.max_vasz_lg2 = cfg->ias_lg2; + armv8pt->common.max_oasz_lg2 = cfg->oas_lg2; + return 0; +} +#define pt_iommu_fmt_init armv8pt_iommu_fmt_init + +#if defined(GENERIC_PT_KUNIT) +static inline void armv8pt_kunit_setup_cfg(struct pt_iommu_armv8_cfg *cfg) +{ + cfg->ias_lg2 = 48; + cfg->oas_lg2 = 48; + + cfg->features &= ~(BIT(PT_FEAT_ARMV8_TTBR1) | BIT(PT_FEAT_ARMV8_S2) | + BIT(PT_FEAT_ARMV8_DBM) | BIT(PT_FEAT_ARMV8_S2FWB) | + BIT(PT_FEAT_ARMV8_NS)); +} +#define pt_kunit_setup_cfg armv8pt_kunit_setup_cfg +#endif + +#if defined(GENERIC_PT_KUNIT) && IS_ENABLED(CONFIG_IOMMU_IO_PGTABLE_LPAE) +#include + +static struct io_pgtable_ops * +armv8pt_iommu_alloc_io_pgtable(struct pt_iommu_armv8_cfg *cfg, + struct device *iommu_dev, + struct io_pgtable_cfg **unused_pgtbl_cfg) +{ + struct io_pgtable_cfg pgtbl_cfg = {}; + enum io_pgtable_fmt fmt; + + pgtbl_cfg.ias = cfg->ias_lg2; + pgtbl_cfg.oas = cfg->oas_lg2; + if (PT_GRANUAL_SIZE == SZ_64K) + pgtbl_cfg.pgsize_bitmap |= SZ_64K | SZ_512M; + if (PT_GRANUAL_SIZE == SZ_16K) + pgtbl_cfg.pgsize_bitmap |= SZ_16K | SZ_32M; + if (PT_GRANUAL_SIZE == SZ_4K) + pgtbl_cfg.pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; + pgtbl_cfg.coherent_walk = true; + + if (cfg->features & BIT(PT_FEAT_ARMV8_S2)) + fmt = ARM_64_LPAE_S2; + else + fmt = ARM_64_LPAE_S1; + + return alloc_io_pgtable_ops(fmt, &pgtbl_cfg, NULL); +} +#define pt_iommu_alloc_io_pgtable armv8pt_iommu_alloc_io_pgtable + +static void armv8pt_iommu_setup_ref_table(struct pt_iommu_armv8 *iommu_table, + struct io_pgtable_ops *pgtbl_ops) +{ + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(pgtbl_ops)->cfg; + struct pt_common *common = &iommu_table->armpt.common; + + /* FIXME should determine the level from the pgtbl_cfg */ + if (pt_feature(common, PT_FEAT_ARMV8_S2)) + pt_top_set(common, __va(pgtbl_cfg->arm_lpae_s2_cfg.vttbr), + pt_top_get_level(common)); + else + pt_top_set(common, __va(pgtbl_cfg->arm_lpae_s1_cfg.ttbr), + pt_top_get_level(common)); +} +#define pt_iommu_setup_ref_table armv8pt_iommu_setup_ref_table + +static u64 armv8pt_kunit_cmp_mask_entry(struct pt_state *pts) +{ + if (pts->type == PT_ENTRY_TABLE) + return pts->entry & (~(u64)(ARMV8PT_FMT_OA48)); + return pts->entry & (~(u64)ARMV8PT_FMT_CONTIG); +} +#define pt_kunit_cmp_mask_entry armv8pt_kunit_cmp_mask_entry +#endif + +#endif diff --git a/drivers/iommu/generic_pt/fmt/defs_armv8.h b/drivers/iommu/generic_pt/fmt/defs_armv8.h new file mode 100644 index 00000000000000..751372a6024e4a --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/defs_armv8.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * VMSAv8-64 translation table in AArch64 mode + * + */ +#ifndef __GENERIC_PT_FMT_DEFS_ARMV8_H +#define __GENERIC_PT_FMT_DEFS_ARMV8_H + +#include +#include + +/* Header self-compile default defines */ +#ifndef ARMV8_GRANUAL_SIZE +#define ARMV8_GRANUAL_SIZE 4096 +#endif + +typedef u64 pt_vaddr_t; +typedef u64 pt_oaddr_t; + +struct armv8pt_write_attrs { + u64 descriptor_bits; + gfp_t gfp; +}; +#define pt_write_attrs armv8pt_write_attrs + +#endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_armv8_16k.c 
b/drivers/iommu/generic_pt/fmt/iommu_armv8_16k.c new file mode 100644 index 00000000000000..46a5aead0007fc --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_armv8_16k.c @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#define PT_FMT armv8 +#define PT_FMT_VARIANT 16k +#define PT_SUPPORTED_FEATURES \ + (BIT(PT_FEAT_DMA_INCOHERENT) | BIT(PT_FEAT_ARMV8_LPA) | \ + BIT(PT_FEAT_ARMV8_S2) | BIT(PT_FEAT_ARMV8_DBM) | \ + BIT(PT_FEAT_ARMV8_S2FWB)) +#define ARMV8_GRANUAL_SIZE 16384 + +#include "iommu_template.h" diff --git a/drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c b/drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c new file mode 100644 index 00000000000000..2143104dfe0d4d --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#define PT_FMT armv8 +#define PT_FMT_VARIANT 4k +#define PT_SUPPORTED_FEATURES \ + (BIT(PT_FEAT_DMA_INCOHERENT) | BIT(PT_FEAT_ARMV8_LPA) | \ + BIT(PT_FEAT_ARMV8_S2) | BIT(PT_FEAT_ARMV8_DBM) | \ + BIT(PT_FEAT_ARMV8_S2FWB)) +#define ARMV8_GRANUAL_SIZE 4096 + +#include "iommu_template.h" diff --git a/drivers/iommu/generic_pt/fmt/iommu_armv8_64k.c b/drivers/iommu/generic_pt/fmt/iommu_armv8_64k.c new file mode 100644 index 00000000000000..df008e716b6017 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_armv8_64k.c @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#define PT_FMT armv8 +#define PT_FMT_VARIANT 64k +#define PT_SUPPORTED_FEATURES \ + (BIT(PT_FEAT_DMA_INCOHERENT) | BIT(PT_FEAT_ARMV8_LPA) | \ + BIT(PT_FEAT_ARMV8_S2) | BIT(PT_FEAT_ARMV8_DBM) | \ + BIT(PT_FEAT_ARMV8_S2FWB)) +#define ARMV8_GRANUAL_SIZE 65536 + +#include "iommu_template.h" diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h index 6a865dbf075192..6c8296b1dd1a65 100644 --- a/include/linux/generic_pt/common.h +++ b/include/linux/generic_pt/common.h @@ -100,4 +100,26 @@ enum { PT_FEAT_FMT_START, }; +struct pt_armv8 { + struct pt_common common; +}; + +enum { + /* Use the upper address space instead of lower */ + PT_FEAT_ARMV8_TTBR1 = PT_FEAT_FMT_START, + /* + * Large Physical Address extension allows larger page sizes on 64k. 
+ * Larger physical addresses are always supported
+ */
+	PT_FEAT_ARMV8_LPA,
+	/* Use the Stage 2 format instead of Stage 1 */
+	PT_FEAT_ARMV8_S2,
+	/* Use Dirty Bit Modifier, necessary for IOMMU dirty tracking */
+	PT_FEAT_ARMV8_DBM,
+	/* For S2 uses the Force Write Back coding of the S2MEMATTR */
+	PT_FEAT_ARMV8_S2FWB,
+	/* Set the NS and NSTable bits in all entries */
+	PT_FEAT_ARMV8_NS,
+};
+
 #endif
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index f77f6aef3f5958..64af0043d127bc 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -204,4 +204,77 @@ static inline void pt_iommu_deinit(struct pt_iommu *iommu_table)
 	iommu_table->ops->deinit(iommu_table);
 }
 
+struct pt_iommu_armv8 {
+	struct pt_iommu iommu;
+	struct pt_armv8 armpt;
+};
+
+struct pt_iommu_armv8_cfg {
+	struct device *iommu_device;
+	unsigned int features;
+	/* Input Address Size lg2 */
+	u8 ias_lg2;
+	/* Output Address Size lg2 */
+	u8 oas_lg2;
+};
+
+int pt_iommu_armv8_4k_init(struct pt_iommu_armv8 *table,
+			   struct pt_iommu_armv8_cfg *cfg, gfp_t gfp);
+int pt_iommu_armv8_16k_init(struct pt_iommu_armv8 *table,
+			    struct pt_iommu_armv8_cfg *cfg, gfp_t gfp);
+int pt_iommu_armv8_64k_init(struct pt_iommu_armv8 *table,
+			    struct pt_iommu_armv8_cfg *cfg, gfp_t gfp);
+
+static size_t __pt_iommu_armv8_granuals_to_lg2(size_t granual_sizes)
+{
+	size_t supported_granuals = 0;
+
+	if (IS_ENABLED(CONFIG_IOMMU_PT_ARMV8_4K))
+		supported_granuals |= BIT(12);
+	if (IS_ENABLED(CONFIG_IOMMU_PT_ARMV8_16K))
+		supported_granuals |= BIT(14);
+	if (IS_ENABLED(CONFIG_IOMMU_PT_ARMV8_64K))
+		supported_granuals |= BIT(16);
+
+	granual_sizes &= supported_granuals;
+	if (!granual_sizes)
+		return 0;
+
+	/* Prefer the CPU page size if possible */
+	if (granual_sizes & PAGE_SIZE)
+		return PAGE_SHIFT;
+
+	/*
+	 * Otherwise prefer the largest page size smaller than the CPU page
+	 * size
+	 */
+	if (granual_sizes % PAGE_SIZE)
+		return ilog2(rounddown_pow_of_two(granual_sizes % PAGE_SIZE));
+
+	/* Otherwise use the smallest page size available */
+	return __ffs(granual_sizes);
+}
+
+static inline int pt_iommu_armv8_init(struct pt_iommu_armv8 *table,
+				      struct pt_iommu_armv8_cfg *cfg,
+				      size_t granual_sizes, gfp_t gfp)
+{
+	switch (__pt_iommu_armv8_granuals_to_lg2(granual_sizes)) {
+	case 12:
+		if (!IS_ENABLED(CONFIG_IOMMU_PT_ARMV8_4K))
+			return -EOPNOTSUPP;
+		return pt_iommu_armv8_4k_init(table, cfg, gfp);
+	case 14:
+		if (!IS_ENABLED(CONFIG_IOMMU_PT_ARMV8_16K))
+			return -EOPNOTSUPP;
+		return pt_iommu_armv8_16k_init(table, cfg, gfp);
+	case 16:
+		if (!IS_ENABLED(CONFIG_IOMMU_PT_ARMV8_64K))
+			return -EOPNOTSUPP;
+		return pt_iommu_armv8_64k_init(table, cfg, gfp);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 #endif

From patchwork Thu Aug 15 15:11:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jason Gunthorpe
X-Patchwork-Id: 13764918
From: Jason Gunthorpe
To:
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig, iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org, linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts, Sean Christopherson, Tina Zhang
Subject: [PATCH 12/16] iommupt: Add the AMD IOMMU v1 page table format
Date: Thu, 15 Aug 2024 12:11:28 -0300
Message-ID: <12-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

AMD IOMMU v1 is unique in supporting contiguous pages of variable size, and it can decode the full 64 bit VA space. The general design is quite similar to the x86 PAE format, except with an additional level and a quite different PTE encoding. This format is the only one in the existing code that uses the PT_FEAT_DYNAMIC_TOP feature.
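The variable-size encoding is worth a short sketch (hedged: this follows the spec's Table 14 encoding that the code below implements; the helper names are hypothetical). When the PTE's Next Level field is 7, a 2^n byte page stores a run of one bits in the low address field ending just below bit n-1, and the position of the first zero bit recovers the size:

	/* Encode/decode the AMDv1 Next Level == 7 variable page size */
	static inline u64 amdv1_size_encode(unsigned int pgsz_lg2)
	{
		/* address field is in 4KiB units: (2^(n-1) - 1) >> 12 */
		return ((1ull << (pgsz_lg2 - 1)) - 1) >> 12;
	}

	static inline unsigned int amdv1_size_decode(u64 addr_field)
	{
		/* first zero bit marks the size: n = ffz + 12 + 1 */
		return __builtin_ctzll(~addr_field) + 12 + 1;
	}

For example a 2MiB page (n = 21) encodes the low address-field bits as 0xff, and 0xff decodes back to 21.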
Signed-off-by: Jason Gunthorpe --- drivers/iommu/generic_pt/Kconfig | 6 + drivers/iommu/generic_pt/fmt/Makefile | 2 + drivers/iommu/generic_pt/fmt/amdv1.h | 372 +++++++++++++++++++++ drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 ++ drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 9 + include/linux/generic_pt/common.h | 4 + include/linux/generic_pt/iommu.h | 12 + 7 files changed, 426 insertions(+) create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig index 260fff5daa6e57..e34be10cf8bac2 100644 --- a/drivers/iommu/generic_pt/Kconfig +++ b/drivers/iommu/generic_pt/Kconfig @@ -29,6 +29,11 @@ config IOMMU_PT Generic library for building IOMMU page tables if IOMMU_PT +config IOMMU_PT_AMDV1 + tristate "IOMMU page table for 64 bit AMD IOMMU v1" + depends on !GENERIC_ATOMIC64 # for cmpxchg64 + default n + config IOMMU_PT_ARMV8_4K tristate "IOMMU page table for 64 bit ARMv8 4k page size" depends on !GENERIC_ATOMIC64 # for cmpxchg64 @@ -69,6 +74,7 @@ config IOMMUT_PT_KUNIT_TEST tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS select IOMMU_IO_PGTABLE depends on KUNIT + depends on IOMMU_PT_AMDV1 || !IOMMU_PT_AMDV1 depends on IOMMU_PT_ARMV8_4K || !IOMMU_PT_ARMV8_4K depends on IOMMU_PT_ARMV8_16K || !IOMMU_PT_ARMV8_16K depends on IOMMU_PT_ARMV8_64K || !IOMMU_PT_ARMV8_64K diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile index 9a9173ce85e075..16031fc1270178 100644 --- a/drivers/iommu/generic_pt/fmt/Makefile +++ b/drivers/iommu/generic_pt/fmt/Makefile @@ -1,5 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 +iommu_pt_fmt-$(CONFIG_IOMMU_PT_AMDV1) += amdv1 + iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_4K) += armv8_4k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_16K) += armv8_16k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_64K) += armv8_64k diff --git a/drivers/iommu/generic_pt/fmt/amdv1.h b/drivers/iommu/generic_pt/fmt/amdv1.h new file mode 100644 index 00000000000000..3c1af8f84cca02 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/amdv1.h @@ -0,0 +1,372 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * AMD IOMMU v1 page table + * + * This is described in Section "2.2.3 I/O Page Tables for Host Translations" + * of the "AMD I/O Virtualization Technology (IOMMU) Specification" + * + * Note the level numbering here matches the core code, so level 0 is the same + * as mode 1. + * + * FIXME: + * sme_set + */ +#ifndef __GENERIC_PT_FMT_AMDV1_H +#define __GENERIC_PT_FMT_AMDV1_H + +#include "defs_amdv1.h" +#include "../pt_defs.h" + +#include +#include +#include +#include +#include + +enum { + PT_MAX_OUTPUT_ADDRESS_LG2 = 52, + PT_MAX_VA_ADDRESS_LG2 = 64, + PT_ENTRY_WORD_SIZE = sizeof(u64), + PT_MAX_TOP_LEVEL = 5, + PT_GRANUAL_LG2SZ = 12, + PT_TABLEMEM_LG2SZ = 12, +}; + +/* PTE bits */ +enum { + AMDV1PT_FMT_PR = BIT(0), + AMDV1PT_FMT_NEXT_LEVEL = GENMASK_ULL(11, 9), + AMDV1PT_FMT_OA = GENMASK_ULL(51, 12), + AMDV1PT_FMT_FC = BIT_ULL(60), + AMDV1PT_FMT_IR = BIT_ULL(61), + AMDV1PT_FMT_IW = BIT_ULL(62), +}; + +/* + * gcc 13 has a bug where it thinks the output of FIELD_GET() is an enum, make + * these defines to avoid it. 
+ */ +#define AMDV1PT_FMT_NL_DEFAULT 0 +#define AMDV1PT_FMT_NL_SIZE 7 + +#define common_to_amdv1pt(common_ptr) \ + container_of_const(common_ptr, struct pt_amdv1, common) +#define to_amdv1pt(pts) common_to_amdv1pt((pts)->range->common) + +static inline pt_oaddr_t amdv1pt_table_pa(const struct pt_state *pts) +{ + return log2_mul(FIELD_GET(AMDV1PT_FMT_OA, pts->entry), + PT_GRANUAL_LG2SZ); +} +#define pt_table_pa amdv1pt_table_pa + +/* Returns the oa for the start of the contiguous entry */ +static inline pt_oaddr_t amdv1pt_entry_oa(const struct pt_state *pts) +{ + pt_oaddr_t oa = FIELD_GET(AMDV1PT_FMT_OA, pts->entry); + + if (FIELD_GET(AMDV1PT_FMT_NEXT_LEVEL, pts->entry) == + AMDV1PT_FMT_NL_SIZE) { + unsigned int sz_bits = oalog2_ffz(oa); + + oa = log2_set_mod(oa, 0, sz_bits); + } else if (PT_WARN_ON(FIELD_GET(AMDV1PT_FMT_NEXT_LEVEL, pts->entry) != + AMDV1PT_FMT_NL_DEFAULT)) + return 0; + return log2_mul(oa, PT_GRANUAL_LG2SZ); +} +#define pt_entry_oa amdv1pt_entry_oa + +static inline bool amdv1pt_can_have_leaf(const struct pt_state *pts) +{ + /* + * Table 15: Page Tabel Level Parameters + * The top most level cannot have translation entries + */ + return pts->level < PT_MAX_TOP_LEVEL; +} +#define pt_can_have_leaf amdv1pt_can_have_leaf + +static inline unsigned int amdv1pt_table_item_lg2sz(const struct pt_state *pts) +{ + return PT_GRANUAL_LG2SZ + + (PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64))) * pts->level; +} +#define pt_table_item_lg2sz amdv1pt_table_item_lg2sz + +static inline unsigned int +amdv1pt_entry_num_contig_lg2(const struct pt_state *pts) +{ + u64 code; + + if (FIELD_GET(AMDV1PT_FMT_NEXT_LEVEL, pts->entry) == + AMDV1PT_FMT_NL_DEFAULT) + return ilog2(1); + + if (PT_WARN_ON(FIELD_GET(AMDV1PT_FMT_NEXT_LEVEL, pts->entry) != + AMDV1PT_FMT_NL_SIZE)) + return ilog2(1); + + /* + * Reverse: + * log2_div(log2_to_int(pgsz_lg2 - 1) - 1, PT_GRANUAL_LG2SZ)); + */ + code = FIELD_GET(AMDV1PT_FMT_OA, pts->entry); + return oalog2_ffz(code) + 1 - + (PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64))) * pts->level; +} +#define pt_entry_num_contig_lg2 amdv1pt_entry_num_contig_lg2 + +static inline unsigned int amdv1pt_num_items_lg2(const struct pt_state *pts) +{ + /* Top entry covers bits [63:57] only */ + /* if (pts->level == 5) + return 7; + */ + return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64)); +} +#define pt_num_items_lg2 amdv1pt_num_items_lg2 + +static inline pt_vaddr_t amdv1pt_possible_sizes(const struct pt_state *pts) +{ + unsigned int isz_lg2 = amdv1pt_table_item_lg2sz(pts); + + if (!amdv1pt_can_have_leaf(pts)) + return 0; + + /* + * Table 14: Example Page Size Encodings + * Address bits 51:32 can be used to encode page sizes greater that 4 + * Gbytes. Address bits 63:52 are zero-extended. + * + * 512GB Pages are not supported due to a hardware bug. + * Otherwise every power of two size is supported. 
+ */ + return GENMASK_ULL(min(51, isz_lg2 + amdv1pt_num_items_lg2(pts) - 1), + isz_lg2) & + ~SZ_512G; +} +#define pt_possible_sizes amdv1pt_possible_sizes + +static inline enum pt_entry_type amdv1pt_load_entry_raw(struct pt_state *pts) +{ + const u64 *tablep = pt_cur_table(pts, u64); + unsigned int next_level; + u64 entry; + + pts->entry = entry = READ_ONCE(tablep[pts->index]); + if (!(entry & AMDV1PT_FMT_PR)) + return PT_ENTRY_EMPTY; + + next_level = FIELD_GET(AMDV1PT_FMT_NEXT_LEVEL, pts->entry); + if (pts->level == 0 || next_level == AMDV1PT_FMT_NL_DEFAULT || + next_level == AMDV1PT_FMT_NL_SIZE) + return PT_ENTRY_OA; + return PT_ENTRY_TABLE; +} +#define pt_load_entry_raw amdv1pt_load_entry_raw + +static inline void +amdv1pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa, + unsigned int oasz_lg2, + const struct pt_write_attrs *attrs) +{ + unsigned int isz_lg2 = pt_table_item_lg2sz(pts); + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + entry = AMDV1PT_FMT_PR | + FIELD_PREP(AMDV1PT_FMT_OA, log2_div(oa, PT_GRANUAL_LG2SZ)) | + attrs->descriptor_bits; + + if (oasz_lg2 == isz_lg2) { + entry |= FIELD_PREP(AMDV1PT_FMT_NEXT_LEVEL, + AMDV1PT_FMT_NL_DEFAULT); + WRITE_ONCE(tablep[pts->index], entry); + } else { + unsigned int end_index = + pts->index + log2_to_int(oasz_lg2 - isz_lg2); + unsigned int i; + + entry |= FIELD_PREP(AMDV1PT_FMT_NEXT_LEVEL, + AMDV1PT_FMT_NL_SIZE) | + FIELD_PREP(AMDV1PT_FMT_OA, + log2_div(log2_to_int(oasz_lg2 - 1) - 1, + PT_GRANUAL_LG2SZ)); + for (i = pts->index; i != end_index; i++) + WRITE_ONCE(tablep[i], entry); + } + pts->entry = entry; +} +#define pt_install_leaf_entry amdv1pt_install_leaf_entry + +static inline bool amdv1pt_install_table(struct pt_state *pts, + pt_oaddr_t table_pa, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + /* + * IR and IW are ANDed from the table levels along with the PTE. We + * always control permissions from the PTE, so always set IR and IW for + * tables. 
+ */ + entry = AMDV1PT_FMT_PR | + FIELD_PREP(AMDV1PT_FMT_NEXT_LEVEL, pts->level) | + FIELD_PREP(AMDV1PT_FMT_OA, + log2_div(table_pa, PT_GRANUAL_LG2SZ)) | + AMDV1PT_FMT_IR | AMDV1PT_FMT_IW; + return pt_table_install64(&tablep[pts->index], entry, pts->entry); +} +#define pt_install_table amdv1pt_install_table + +static inline void amdv1pt_attr_from_entry(const struct pt_state *pts, + struct pt_write_attrs *attrs) +{ + attrs->descriptor_bits = + pts->entry & (AMDV1PT_FMT_FC | AMDV1PT_FMT_IR | AMDV1PT_FMT_IW); +} +#define pt_attr_from_entry amdv1pt_attr_from_entry + +/* FIXME share code */ +static inline void amdv1pt_clear_entry(struct pt_state *pts, + unsigned int num_contig_lg2) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 *end; + + PT_WARN_ON(log2_mod(pts->index, num_contig_lg2)); + + tablep += pts->index; + end = tablep + log2_to_int(num_contig_lg2); + for (; tablep != end; tablep++) + WRITE_ONCE(*tablep, 0); +} +#define pt_clear_entry amdv1pt_clear_entry + +/* FIXME pt_entry_write_is_dirty/etc */ + +/* --- iommu */ +#include +#include + +#define pt_iommu_table pt_iommu_amdv1 + +/* The common struct is in the per-format common struct */ +static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table) +{ + return &container_of(iommu_table, struct pt_iommu_table, iommu) + ->amdpt.common; +} + +static inline struct pt_iommu *iommu_from_common(struct pt_common *common) +{ + return &container_of(common, struct pt_iommu_table, amdpt.common)->iommu; +} + +static inline int amdv1pt_iommu_set_prot(struct pt_common *common, + struct pt_write_attrs *attrs, + unsigned int iommu_prot) +{ + u64 pte; + + /* FIXME Intel allows control over the force coherence bit */ + pte = AMDV1PT_FMT_FC; + if (iommu_prot & IOMMU_READ) + pte |= AMDV1PT_FMT_IR; + if (iommu_prot & IOMMU_WRITE) + pte |= AMDV1PT_FMT_IW; + + attrs->descriptor_bits = pte; + return 0; +} +#define pt_iommu_set_prot amdv1pt_iommu_set_prot + +static inline int amdv1pt_iommu_fmt_init(struct pt_iommu_amdv1 *iommu_table, + struct pt_iommu_amdv1_cfg *cfg) +{ + struct pt_amdv1 *table = &iommu_table->amdpt; + + /* FIXME since this isn't configurable right now should we drop it? */ + pt_top_set_level(&table->common, 2); // FIXME + return 0; +} +#define pt_iommu_fmt_init amdv1pt_iommu_fmt_init + +#if defined(GENERIC_PT_KUNIT) +static void amdv1pt_kunit_setup_cfg(struct pt_iommu_amdv1_cfg *cfg) +{ +} +#define pt_kunit_setup_cfg amdv1pt_kunit_setup_cfg +#endif + +#if defined(GENERIC_PT_KUNIT) && IS_ENABLED(CONFIG_AMD_IOMMU) +#include +#include "../../amd/amd_iommu_types.h" + +static struct io_pgtable_ops * +amdv1pt_iommu_alloc_io_pgtable(struct pt_iommu_amdv1_cfg *cfg, + struct device *iommu_dev, + struct io_pgtable_cfg **pgtbl_cfg) +{ + struct amd_io_pgtable *pgtable; + struct io_pgtable_ops *pgtbl_ops; + + /* + * AMD expects that io_pgtable_cfg is allocated to its type by the + * caller. 
+ */ + pgtable = kzalloc(sizeof(*pgtable), GFP_KERNEL); + if (!pgtable) + return NULL; + + pgtable->iop.cfg.iommu_dev = iommu_dev; + pgtable->iop.cfg.amd.nid = NUMA_NO_NODE; + pgtbl_ops = + alloc_io_pgtable_ops(AMD_IOMMU_V1, &pgtable->iop.cfg, NULL); + if (!pgtbl_ops) { + kfree(pgtable); + return NULL; + } + *pgtbl_cfg = &pgtable->iop.cfg; + return pgtbl_ops; +} +#define pt_iommu_alloc_io_pgtable amdv1pt_iommu_alloc_io_pgtable + +static void amdv1pt_iommu_free_pgtbl_cfg(struct io_pgtable_cfg *pgtbl_cfg) +{ + struct amd_io_pgtable *pgtable = + container_of(pgtbl_cfg, struct amd_io_pgtable, iop.cfg); + + kfree(pgtable); +} +#define pt_iommu_free_pgtbl_cfg amdv1pt_iommu_free_pgtbl_cfg + +static void amdv1pt_iommu_setup_ref_table(struct pt_iommu_amdv1 *iommu_table, + struct io_pgtable_ops *pgtbl_ops) +{ + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(pgtbl_ops)->cfg; + struct amd_io_pgtable *pgtable = + container_of(pgtbl_cfg, struct amd_io_pgtable, iop.cfg); + struct pt_common *common = &iommu_table->amdpt.common; + + pt_top_set(common, (struct pt_table_p *)pgtable->root, + pgtable->mode - 1); + WARN_ON(pgtable->mode - 1 > PT_MAX_TOP_LEVEL || pgtable->mode <= 0); +} +#define pt_iommu_setup_ref_table amdv1pt_iommu_setup_ref_table + +static u64 amdv1pt_kunit_cmp_mask_entry(struct pt_state *pts) +{ + if (pts->type == PT_ENTRY_TABLE) + return pts->entry & (~(u64)(AMDV1PT_FMT_OA)); + return pts->entry; +} +#define pt_kunit_cmp_mask_entry amdv1pt_kunit_cmp_mask_entry +#endif + +#endif diff --git a/drivers/iommu/generic_pt/fmt/defs_amdv1.h b/drivers/iommu/generic_pt/fmt/defs_amdv1.h new file mode 100644 index 00000000000000..a9d3b6216e7f30 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/defs_amdv1.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + */ +#ifndef __GENERIC_PT_FMT_DEFS_AMDV1_H +#define __GENERIC_PT_FMT_DEFS_AMDV1_H + +#include +#include + +typedef u64 pt_vaddr_t; +typedef u64 pt_oaddr_t; + +struct amdv1pt_write_attrs { + u64 descriptor_bits; + gfp_t gfp; +}; +#define pt_write_attrs amdv1pt_write_attrs + +#endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_amdv1.c b/drivers/iommu/generic_pt/fmt/iommu_amdv1.c new file mode 100644 index 00000000000000..81999511cc65da --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_amdv1.c @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#define PT_FMT amdv1 +#define PT_SUPPORTED_FEATURES (BIT(PT_FEAT_FULL_VA) | BIT(PT_FEAT_DYNAMIC_TOP)) +#define PT_FORCE_FEATURES BIT(PT_FEAT_DYNAMIC_TOP) + +#include "iommu_template.h" diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h index 6c8296b1dd1a65..e8d489dff756a8 100644 --- a/include/linux/generic_pt/common.h +++ b/include/linux/generic_pt/common.h @@ -100,6 +100,10 @@ enum { PT_FEAT_FMT_START, }; +struct pt_amdv1 { + struct pt_common common; +}; + struct pt_armv8 { struct pt_common common; }; diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h index 64af0043d127bc..bf139c5657fc06 100644 --- a/include/linux/generic_pt/iommu.h +++ b/include/linux/generic_pt/iommu.h @@ -204,6 +204,18 @@ static inline void pt_iommu_deinit(struct pt_iommu *iommu_table) iommu_table->ops->deinit(iommu_table); } +struct pt_iommu_amdv1 { + struct pt_iommu iommu; + struct pt_amdv1 amdpt; +}; + +struct pt_iommu_amdv1_cfg { + struct device *iommu_device; + unsigned int features; +}; +int 
pt_iommu_amdv1_init(struct pt_iommu_amdv1 *table,
+		       struct pt_iommu_amdv1_cfg *cfg, gfp_t gfp);
+
 struct pt_iommu_armv8 {
 	struct pt_iommu iommu;
 	struct pt_armv8 armpt;

From patchwork Thu Aug 15 15:11:29 2024
From: Jason Gunthorpe
Subject: [PATCH 13/16] iommupt: Add the x86 PAE page table format
Date: Thu, 15 Aug 2024 12:11:29 -0300
Message-ID: <13-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
This format is used by x86 CPUs and can be used by both x86 IOMMUs; when an x86 IOMMU is running SVA it uses this page table format. This implementation follows the AMD v2 io-pgtable version. There is nothing remarkable here: the format has a variable top, limited support for different page sizes, and no contiguous page support. In principle this could also support the 32 bit configuration with fewer table levels.

FIXME: Compare the bits against the VT-d version too.
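For reference, a sketch of the geometry this format implies (illustration only, not part of the patch; the helper names are invented): every table is one 4K page of 512 u64 entries, so each level consumes 9 VA bits above the 12 bit page offset, giving 4K leaves at level 0, 2M leaves at level 1 and 1G leaves at level 2, with the PS bit marking the non-zero-level leaves.

	#include <linux/types.h>

	/* index math equivalent to what x86pae_pt_table_item_lg2sz() and
	 * x86pae_pt_num_items_lg2() express via PT_TABLEMEM_LG2SZ */
	static unsigned int x86pae_demo_index(u64 va, unsigned int level)
	{
		return (va >> (12 + 9 * level)) & 511;
	}

	/* leaf size at a given level: 4K, 2M, 1G */
	static u64 x86pae_demo_leaf_size(unsigned int level)
	{
		return 1ULL << (12 + 9 * level);
	}

E.g. va 0x40201000 indexes entry 1 at levels 0, 1 and 2.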
Signed-off-by: Jason Gunthorpe --- drivers/iommu/generic_pt/Kconfig | 6 + drivers/iommu/generic_pt/fmt/Makefile | 2 + drivers/iommu/generic_pt/fmt/defs_x86pae.h | 21 ++ drivers/iommu/generic_pt/fmt/iommu_x86pae.c | 8 + drivers/iommu/generic_pt/fmt/x86pae.h | 283 ++++++++++++++++++++ include/linux/generic_pt/common.h | 4 + include/linux/generic_pt/iommu.h | 12 + 7 files changed, 336 insertions(+) create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86pae.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86pae.c create mode 100644 drivers/iommu/generic_pt/fmt/x86pae.h diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig index e34be10cf8bac2..a7c006234fc218 100644 --- a/drivers/iommu/generic_pt/Kconfig +++ b/drivers/iommu/generic_pt/Kconfig @@ -70,6 +70,11 @@ config IOMMU_PT_ARMV8_64K If unsure, say N here. +config IOMMU_PT_X86PAE + tristate "IOMMU page table for x86 PAE" + depends on !GENERIC_ATOMIC64 # for cmpxchg64 + default n + config IOMMUT_PT_KUNIT_TEST tristate "IOMMU Page Table KUnit Test" if !KUNIT_ALL_TESTS select IOMMU_IO_PGTABLE @@ -78,6 +83,7 @@ config IOMMUT_PT_KUNIT_TEST depends on IOMMU_PT_ARMV8_4K || !IOMMU_PT_ARMV8_4K depends on IOMMU_PT_ARMV8_16K || !IOMMU_PT_ARMV8_16K depends on IOMMU_PT_ARMV8_64K || !IOMMU_PT_ARMV8_64K + depends on IOMMU_PT_X86PAE || !IOMMU_PT_X86PAE default KUNIT_ALL_TESTS endif endif diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile index 16031fc1270178..fe3d7ae3685468 100644 --- a/drivers/iommu/generic_pt/fmt/Makefile +++ b/drivers/iommu/generic_pt/fmt/Makefile @@ -6,6 +6,8 @@ iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_4K) += armv8_4k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_16K) += armv8_16k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_64K) += armv8_64k +iommu_pt_fmt-$(CONFIG_IOMMU_PT_X86PAE) += x86pae + IOMMU_PT_KUNIT_TEST := define create_format obj-$(2) += iommu_$(1).o diff --git a/drivers/iommu/generic_pt/fmt/defs_x86pae.h b/drivers/iommu/generic_pt/fmt/defs_x86pae.h new file mode 100644 index 00000000000000..0d93454264b5da --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/defs_x86pae.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + */ +#ifndef __GENERIC_PT_FMT_DEFS_X86PAE_H +#define __GENERIC_PT_FMT_DEFS_X86PAE_H + +#include +#include + +typedef u64 pt_vaddr_t; +typedef u64 pt_oaddr_t; + +struct x86pae_pt_write_attrs { + u64 descriptor_bits; + gfp_t gfp; +}; +#define pt_write_attrs x86pae_pt_write_attrs + +#endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_x86pae.c b/drivers/iommu/generic_pt/fmt/iommu_x86pae.c new file mode 100644 index 00000000000000..f7ec71c61729e3 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_x86pae.c @@ -0,0 +1,8 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#define PT_FMT x86pae +#define PT_SUPPORTED_FEATURES 0 + +#include "iommu_template.h" diff --git a/drivers/iommu/generic_pt/fmt/x86pae.h b/drivers/iommu/generic_pt/fmt/x86pae.h new file mode 100644 index 00000000000000..9e0ee74275fcb3 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/x86pae.h @@ -0,0 +1,283 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * x86 PAE page table + * + * This is described in + * Section "4.4 PAE Paging" of the Intel Software Developer's Manual Volume 3 + * Section "2.2.6 I/O Page Tables for Guest Translations" of the "AMD I/O + * Virtualization Technology (IOMMU) 
Specification" + * + * It is used by x86 CPUs and The AMD and VT-D IOMMU HW. + * + * The named levels in the spec map to the pts->level as: + * Table/PTE - 0 + * Directory/PDE - 1 + * Directory Ptr/PDPTE - 2 + * PML4/PML4E - 3 + * PML5/PML5E - 4 + * FIXME: __sme_set + */ +#ifndef __GENERIC_PT_FMT_X86PAE_H +#define __GENERIC_PT_FMT_X86PAE_H + +#include "defs_x86pae.h" +#include "../pt_defs.h" + +#include +#include +#include + +enum { + PT_MAX_OUTPUT_ADDRESS_LG2 = 52, + PT_MAX_VA_ADDRESS_LG2 = 57, + PT_ENTRY_WORD_SIZE = sizeof(u64), + PT_MAX_TOP_LEVEL = 4, + PT_GRANUAL_LG2SZ = 12, + PT_TABLEMEM_LG2SZ = 12, +}; + +/* Shared descriptor bits */ +enum { + X86PAE_FMT_P = BIT(0), + X86PAE_FMT_RW = BIT(1), + X86PAE_FMT_U = BIT(2), + X86PAE_FMT_A = BIT(5), + X86PAE_FMT_D = BIT(6), + X86PAE_FMT_OA = GENMASK_ULL(51, 12), + X86PAE_FMT_XD = BIT_ULL(63), +}; + +/* PDPTE/PDE */ +enum { + X86PAE_FMT_PS = BIT(7), +}; + +#define common_to_x86pae_pt(common_ptr) \ + container_of_const(common_ptr, struct pt_x86pae, common) +#define to_x86pae_pt(pts) common_to_x86pae_pt((pts)->range->common) + +static inline pt_oaddr_t x86pae_pt_table_pa(const struct pt_state *pts) +{ + return log2_mul(FIELD_GET(X86PAE_FMT_OA, pts->entry), + PT_TABLEMEM_LG2SZ); +} +#define pt_table_pa x86pae_pt_table_pa + +static inline pt_oaddr_t x86pae_pt_entry_oa(const struct pt_state *pts) +{ + return log2_mul(FIELD_GET(X86PAE_FMT_OA, pts->entry), PT_GRANUAL_LG2SZ); +} +#define pt_entry_oa x86pae_pt_entry_oa + +static inline bool x86pae_pt_can_have_leaf(const struct pt_state *pts) +{ + return pts->level <= 2; +} +#define pt_can_have_leaf x86pae_pt_can_have_leaf + +static inline unsigned int +x86pae_pt_table_item_lg2sz(const struct pt_state *pts) +{ + return PT_GRANUAL_LG2SZ + + (PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64))) * pts->level; +} +#define pt_table_item_lg2sz x86pae_pt_table_item_lg2sz + +static inline unsigned int x86pae_pt_num_items_lg2(const struct pt_state *pts) +{ + return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64)); +} +#define pt_num_items_lg2 x86pae_pt_num_items_lg2 + +static inline enum pt_entry_type x86pae_pt_load_entry_raw(struct pt_state *pts) +{ + const u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + pts->entry = entry = READ_ONCE(tablep[pts->index]); + if (!(entry & X86PAE_FMT_P)) + return PT_ENTRY_EMPTY; + if (pts->level == 0 || + (x86pae_pt_can_have_leaf(pts) && (pts->entry & X86PAE_FMT_PS))) + return PT_ENTRY_OA; + return PT_ENTRY_TABLE; +} +#define pt_load_entry_raw x86pae_pt_load_entry_raw + +static inline void +x86pae_pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa, + unsigned int oasz_lg2, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + entry = X86PAE_FMT_P | + FIELD_PREP(X86PAE_FMT_OA, log2_div(oa, PT_GRANUAL_LG2SZ)) | + attrs->descriptor_bits; + if (pts->level != 0) + entry |= X86PAE_FMT_PS; + + WRITE_ONCE(tablep[pts->index], entry); + pts->entry = entry; +} +#define pt_install_leaf_entry x86pae_pt_install_leaf_entry + +static inline bool x86pae_pt_install_table(struct pt_state *pts, + pt_oaddr_t table_pa, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + /* + * FIXME according to the SDM D is ignored by HW on table pointers? 
+ * io_pgtable_v2 sets it + */ + entry = X86PAE_FMT_P | X86PAE_FMT_RW | X86PAE_FMT_U | X86PAE_FMT_A | + X86PAE_FMT_D | + FIELD_PREP(X86PAE_FMT_OA, log2_div(table_pa, PT_GRANUAL_LG2SZ)); + return pt_table_install64(&tablep[pts->index], entry, pts->entry); +} +#define pt_install_table x86pae_pt_install_table + +static inline void x86pae_pt_attr_from_entry(const struct pt_state *pts, + struct pt_write_attrs *attrs) +{ + attrs->descriptor_bits = pts->entry & + (X86PAE_FMT_RW | X86PAE_FMT_U | X86PAE_FMT_A | + X86PAE_FMT_D | X86PAE_FMT_XD); +} +#define pt_attr_from_entry x86pae_pt_attr_from_entry + +static inline void x86pae_pt_clear_entry(struct pt_state *pts, + unsigned int num_contig_lg2) +{ + u64 *tablep = pt_cur_table(pts, u64); + + WRITE_ONCE(tablep[pts->index], 0); +} +#define pt_clear_entry x86pae_pt_clear_entry + +/* --- iommu */ +#include +#include + +#define pt_iommu_table pt_iommu_x86pae + +/* The common struct is in the per-format common struct */ +static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table) +{ + return &container_of(iommu_table, struct pt_iommu_table, iommu) + ->x86pae_pt.common; +} + +static inline struct pt_iommu *iommu_from_common(struct pt_common *common) +{ + return &container_of(common, struct pt_iommu_table, x86pae_pt.common) + ->iommu; +} + +static inline int x86pae_pt_iommu_set_prot(struct pt_common *common, + struct pt_write_attrs *attrs, + unsigned int iommu_prot) +{ + u64 pte; + + pte = X86PAE_FMT_U | X86PAE_FMT_A | X86PAE_FMT_D; + if (iommu_prot & IOMMU_WRITE) + pte |= X86PAE_FMT_RW; + + attrs->descriptor_bits = pte; + return 0; +} +#define pt_iommu_set_prot x86pae_pt_iommu_set_prot + +static inline int x86pae_pt_iommu_fmt_init(struct pt_iommu_x86pae *iommu_table, + struct pt_iommu_x86pae_cfg *cfg) +{ + struct pt_x86pae *table = &iommu_table->x86pae_pt; + + pt_top_set_level(&table->common, 3); // FIXME settable + return 0; +} +#define pt_iommu_fmt_init x86pae_pt_iommu_fmt_init + +#if defined(GENERIC_PT_KUNIT) +static void x86pae_pt_kunit_setup_cfg(struct pt_iommu_x86pae_cfg *cfg) +{ +} +#define pt_kunit_setup_cfg x86pae_pt_kunit_setup_cfg +#endif + +#if defined(GENERIC_PT_KUNIT) && IS_ENABLED(CONFIG_AMD_IOMMU) +#include +#include "../../amd/amd_iommu_types.h" + +static struct io_pgtable_ops * +x86pae_pt_iommu_alloc_io_pgtable(struct pt_iommu_x86pae_cfg *cfg, + struct device *iommu_dev, + struct io_pgtable_cfg **pgtbl_cfg) +{ + struct amd_io_pgtable *pgtable; + struct io_pgtable_ops *pgtbl_ops; + + /* + * AMD expects that io_pgtable_cfg is allocated to its type by the + * caller. 
+	 */
+	pgtable = kzalloc(sizeof(*pgtable), GFP_KERNEL);
+	if (!pgtable)
+		return NULL;
+
+	pgtable->iop.cfg.iommu_dev = iommu_dev;
+	pgtable->iop.cfg.amd.nid = NUMA_NO_NODE;
+	pgtbl_ops = alloc_io_pgtable_ops(AMD_IOMMU_V2, &pgtable->iop.cfg, NULL);
+	if (!pgtbl_ops) {
+		kfree(pgtable);
+		return NULL;
+	}
+	*pgtbl_cfg = &pgtable->iop.cfg;
+	return pgtbl_ops;
+}
+#define pt_iommu_alloc_io_pgtable x86pae_pt_iommu_alloc_io_pgtable
+
+static void x86pae_pt_iommu_free_pgtbl_cfg(struct io_pgtable_cfg *pgtbl_cfg)
+{
+	struct amd_io_pgtable *pgtable =
+		container_of(pgtbl_cfg, struct amd_io_pgtable, iop.cfg);
+
+	kfree(pgtable);
+}
+#define pt_iommu_free_pgtbl_cfg x86pae_pt_iommu_free_pgtbl_cfg
+
+static void x86pae_pt_iommu_setup_ref_table(struct pt_iommu_x86pae *iommu_table,
+					    struct io_pgtable_ops *pgtbl_ops)
+{
+	struct io_pgtable_cfg *pgtbl_cfg =
+		&io_pgtable_ops_to_pgtable(pgtbl_ops)->cfg;
+	struct amd_io_pgtable *pgtable =
+		container_of(pgtbl_cfg, struct amd_io_pgtable, iop.cfg);
+	struct pt_common *common = &iommu_table->x86pae_pt.common;
+
+	if (pgtbl_cfg->ias == 52 && PT_MAX_TOP_LEVEL >= 3)
+		pt_top_set(common, (struct pt_table_p *)pgtable->pgd, 3);
+	else if (pgtbl_cfg->ias == 57 && PT_MAX_TOP_LEVEL >= 4)
+		pt_top_set(common, (struct pt_table_p *)pgtable->pgd, 4);
+	else
+		WARN_ON(true);
+}
+#define pt_iommu_setup_ref_table x86pae_pt_iommu_setup_ref_table
+
+static u64 x86pae_pt_kunit_cmp_mask_entry(struct pt_state *pts)
+{
+	if (pts->type == PT_ENTRY_TABLE)
+		return pts->entry & (~(u64)(X86PAE_FMT_OA));
+	return pts->entry;
+}
+#define pt_kunit_cmp_mask_entry x86pae_pt_kunit_cmp_mask_entry
+#endif
+
+#endif
diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h
index e8d489dff756a8..e35fb83657f73b 100644
--- a/include/linux/generic_pt/common.h
+++ b/include/linux/generic_pt/common.h
@@ -126,4 +126,8 @@ enum {
 	PT_FEAT_ARMV8_NS,
 };
 
+struct pt_x86pae {
+	struct pt_common common;
+};
+
 #endif
diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
index bf139c5657fc06..ca69bb6192d1a7 100644
--- a/include/linux/generic_pt/iommu.h
+++ b/include/linux/generic_pt/iommu.h
@@ -289,4 +289,16 @@ static inline int pt_iommu_armv8_init(struct pt_iommu_armv8 *table,
 	}
 }
 
+struct pt_iommu_x86pae {
+	struct pt_iommu iommu;
+	struct pt_x86pae x86pae_pt;
+};
+
+struct pt_iommu_x86pae_cfg {
+	struct device *iommu_device;
+	unsigned int features;
+};
+int pt_iommu_x86pae_init(struct pt_iommu_x86pae *table,
+			 struct pt_iommu_x86pae_cfg *cfg, gfp_t gfp);
+
 #endif

From patchwork Thu Aug 15 15:11:30 2024
From: Jason Gunthorpe
Subject: [PATCH 14/16] iommupt: Add the DART v1/v2 page table format
Date: Thu, 15 Aug 2024 12:11:30 -0300
Message-ID: <14-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
This is used by Apple Silicon SoCs. DART is quite simple, with only two actual page table levels and only a single page size. There is no support for a variable top level. DARTv2 has quite a different format from v1; it may deserve its own file.

DART has a unique feature where the topmost level of the page table can be stored in the device registers. This level is limited to 4 entries. The approach here is to add an extra level to the table which the core code will permanently populate with subtables at init time (FIXME: the core part is not done yet). The IOMMU driver can then read back the subtables and load them into the TTBR registers. From an algorithm perspective this is just another table level with 4 items.
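To illustrate the resulting geometry (a sketch under the assumptions above, not part of the patch; the helper name is invented): each real table holds 2^(granule_lg2sz - 3) u64 entries, and the folded TTBR "level" contributes at most 2 more VA bits (4 entries), mirroring the arithmetic in dartpt_iommu_fmt_init() below.

	#include <linux/log2.h>
	#include <linux/types.h>

	/* maximum VA bits: page offset + two table levels + up to 4 TTBRs */
	static unsigned int dart_demo_max_va_bits(unsigned int granule_lg2sz)
	{
		unsigned int bits_per_level = granule_lg2sz - ilog2(sizeof(u64));

		return granule_lg2sz + 2 * bits_per_level + 2;
	}

That gives 32 bits of VA for a 4K granule (12 + 9 + 9 + 2) and 38 bits for a 16K granule (14 + 11 + 11 + 2).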
Signed-off-by: Jason Gunthorpe --- drivers/iommu/generic_pt/Kconfig | 12 + drivers/iommu/generic_pt/fmt/Makefile | 2 + drivers/iommu/generic_pt/fmt/dart.h | 371 ++++++++++++++++++++++ drivers/iommu/generic_pt/fmt/defs_dart.h | 21 ++ drivers/iommu/generic_pt/fmt/iommu_dart.c | 8 + include/linux/generic_pt/common.h | 10 + include/linux/generic_pt/iommu.h | 15 + 7 files changed, 439 insertions(+) create mode 100644 drivers/iommu/generic_pt/fmt/dart.h create mode 100644 drivers/iommu/generic_pt/fmt/defs_dart.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_dart.c diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig index a7c006234fc218..5ff07eb6bd8729 100644 --- a/drivers/iommu/generic_pt/Kconfig +++ b/drivers/iommu/generic_pt/Kconfig @@ -70,6 +70,17 @@ config IOMMU_PT_ARMV8_64K If unsure, say N here. +config IOMMU_PT_DART + tristate "IOMMU page table for Apple Silicon DART" + depends on !GENERIC_ATOMIC64 # for cmpxchg64 + default n + help + Enable support for the Apple DART pagetable formats. These include + the t8020 and t6000/t8110 DART formats used in Apple M1/M2 family + SoCs. + + If unsure, say N here. + config IOMMU_PT_X86PAE tristate "IOMMU page table for x86 PAE" depends on !GENERIC_ATOMIC64 # for cmpxchg64 @@ -83,6 +94,7 @@ config IOMMUT_PT_KUNIT_TEST depends on IOMMU_PT_ARMV8_4K || !IOMMU_PT_ARMV8_4K depends on IOMMU_PT_ARMV8_16K || !IOMMU_PT_ARMV8_16K depends on IOMMU_PT_ARMV8_64K || !IOMMU_PT_ARMV8_64K + depends on IOMMU_PT_DART || !IOMMU_PT_DART depends on IOMMU_PT_X86PAE || !IOMMU_PT_X86PAE default KUNIT_ALL_TESTS endif diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile index fe3d7ae3685468..a41a27561a82d0 100644 --- a/drivers/iommu/generic_pt/fmt/Makefile +++ b/drivers/iommu/generic_pt/fmt/Makefile @@ -6,6 +6,8 @@ iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_4K) += armv8_4k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_16K) += armv8_16k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_64K) += armv8_64k +iommu_pt_fmt-$(CONFIG_IOMMU_PT_DART) += dart + iommu_pt_fmt-$(CONFIG_IOMMU_PT_X86PAE) += x86pae IOMMU_PT_KUNIT_TEST := diff --git a/drivers/iommu/generic_pt/fmt/dart.h b/drivers/iommu/generic_pt/fmt/dart.h new file mode 100644 index 00000000000000..25b1e61908ab36 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/dart.h @@ -0,0 +1,371 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * DART Page Table + * + * This is derived from io-pgtable-dart.c + * + * Use a three level structure: + * L1 - 0 + * L2 - 1 + * PGD/TTBR's - 2 + * + * The latter level is folded into some other datastructure, in the + * io-pgtable-dart implementation this was a naked array, but here we make it a + * full level. + * + * FIXME: is it a mistake to put v1 and v2 into the same file? They seem quite + * different and if v1 is always a 4k granule and v2 always 16k it would make + * sense to split them. + * + * FIXME: core code should prepopulate the level 2 table + */ +#ifndef __GENERIC_PT_FMT_DART_H +#define __GENERIC_PT_FMT_DART_H + +#include "defs_dart.h" +#include "../pt_defs.h" + +#include +#include +#include + +enum { + PT_ENTRY_WORD_SIZE = sizeof(u64), + PT_MAX_TOP_LEVEL = 2, + DART_NUM_TTBRS_LG2 = ilog2(4), + /* + * This depends on dartv1/v2 and the granule size. 
max_vasz_lg2 has the + * right value + */ + PT_MAX_VA_ADDRESS_LG2 = 0, +}; + +enum { + DART_FMT_VALID = BIT(0), + DART_FMT_PTE_SUBPAGE_START = GENMASK_ULL(63, 52), + DART_FMT_PTE_SUBPAGE_END = GENMASK_ULL(51, 40), +}; + +/* DART v1 PTE layout */ +enum { + DART_FMT1_PTE_PROT_SP_DIS = BIT(1), + DART_FMT1_PTE_PROT_NO_WRITE = BIT(7), + DART_FMT1_PTE_PROT_NO_READ = BIT(8), + DART_FMT1_PTE_OA = GENMASK_ULL(35, 12), +}; + +/* DART v2 PTE layout */ +enum { + DART_FMT2_PTE_PROT_NO_CACHE = BIT(1), + DART_FMT2_PTE_PROT_NO_WRITE = BIT(2), + DART_FMT2_PTE_PROT_NO_READ = BIT(3), + DART_FMT2_PTE_OA = GENMASK_ULL(37, 10), +}; + +#define common_to_dartpt(common_ptr) \ + container_of_const(common_ptr, struct pt_dart, common) +#define to_dartpt(pts) common_to_dartpt((pts)->range->common) + +static inline unsigned int dartpt_granule_lg2sz(const struct pt_common *common) +{ + const struct pt_dart *dartpt = common_to_dartpt(common); + + return dartpt->granule_lg2sz; +} + +static inline pt_oaddr_t dartpt_oa(const struct pt_state *pts) +{ + if (pts_feature(pts, PT_FEAT_DART_V2)) + return log2_mul(FIELD_GET(DART_FMT2_PTE_OA, pts->entry), 14); + return log2_mul(FIELD_GET(DART_FMT1_PTE_OA, pts->entry), 12); +} + +static inline u64 dartpt_make_oa(const struct pt_state *pts, pt_oaddr_t oa) +{ + if (pts_feature(pts, PT_FEAT_DART_V2)) + return FIELD_PREP(DART_FMT2_PTE_OA, log2_div(oa, 14)); + return FIELD_PREP(DART_FMT1_PTE_OA, log2_div(oa, 12)); +} + +static inline unsigned int +dartpt_max_output_address_lg2(const struct pt_common *common) +{ + /* Width of the OA field plus the pfn size */ + if (pt_feature(common, PT_FEAT_DART_V2)) + return (37 - 10 + 1) + 14; + return (35 - 12 + 1) + 12; +} +#define pt_max_output_address_lg2 dartpt_max_output_address_lg2 + +static inline pt_oaddr_t dartpt_table_pa(const struct pt_state *pts) +{ + return dartpt_oa(pts); +} +#define pt_table_pa dartpt_table_pa + +static inline pt_oaddr_t dartpt_entry_oa(const struct pt_state *pts) +{ + return dartpt_oa(pts); +} +#define pt_entry_oa dartpt_entry_oa + +static inline bool dartpt_can_have_leaf(const struct pt_state *pts) +{ + return pts->level == 0; +} +#define pt_can_have_leaf dartpt_can_have_leaf + +static inline unsigned int dartpt_table_item_lg2sz(const struct pt_state *pts) +{ + unsigned int granule_lg2sz = dartpt_granule_lg2sz(pts->range->common); + + return granule_lg2sz + + (granule_lg2sz - ilog2(sizeof(u64))) * pts->level; +} +#define pt_table_item_lg2sz dartpt_table_item_lg2sz + +static inline unsigned int dartpt_num_items_lg2(const struct pt_state *pts) +{ + /* Level 3 is the TTBRs + if (pts->level == 3) + return DART_NUM_TTBRS_LG2; + */ + return dartpt_granule_lg2sz(pts->range->common) - ilog2(sizeof(u64)); +} +#define pt_num_items_lg2 dartpt_num_items_lg2 + +static inline enum pt_entry_type dartpt_load_entry_raw(struct pt_state *pts) +{ + const u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + pts->entry = entry = READ_ONCE(tablep[pts->index]); + if (!(entry & DART_FMT_VALID)) + return PT_ENTRY_EMPTY; + if (pts->level == 0) + return PT_ENTRY_OA; + return PT_ENTRY_TABLE; +} +#define pt_load_entry_raw dartpt_load_entry_raw + +static inline void dartpt_install_leaf_entry(struct pt_state *pts, + pt_oaddr_t oa, + unsigned int oasz_lg2, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + entry = DART_FMT_VALID | dartpt_make_oa(pts, oa) | + attrs->descriptor_bits; + /* subpage protection: always allow access to the entire page */ + entry |= FIELD_PREP(DART_FMT_PTE_SUBPAGE_START, 0) | 
+ FIELD_PREP(DART_FMT_PTE_SUBPAGE_END, 0xfff); + + WRITE_ONCE(tablep[pts->index], entry); + pts->entry = entry; +} +#define pt_install_leaf_entry dartpt_install_leaf_entry + +static inline bool dartpt_install_table(struct pt_state *pts, + pt_oaddr_t table_pa, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + entry = DART_FMT_VALID | dartpt_make_oa(pts, table_pa); + return pt_table_install64(&tablep[pts->index], entry, pts->entry); +} +#define pt_install_table dartpt_install_table + +static inline void dartpt_attr_from_entry(const struct pt_state *pts, + struct pt_write_attrs *attrs) +{ + if (pts_feature(pts, PT_FEAT_DART_V2)) + attrs->descriptor_bits = pts->entry & + (DART_FMT2_PTE_PROT_NO_CACHE | + DART_FMT2_PTE_PROT_NO_WRITE | + DART_FMT2_PTE_PROT_NO_READ); + else + attrs->descriptor_bits = pts->entry & + (DART_FMT1_PTE_PROT_SP_DIS | + DART_FMT1_PTE_PROT_NO_WRITE | + DART_FMT1_PTE_PROT_NO_READ); +} +#define pt_attr_from_entry dartpt_attr_from_entry + +static inline void dartpt_clear_entry(struct pt_state *pts, + unsigned int num_contig_lg2) +{ + u64 *tablep = pt_cur_table(pts, u64); + + WRITE_ONCE(tablep[pts->index], 0); +} +#define pt_clear_entry dartpt_clear_entry + +/* --- iommu */ +#include +#include + +#define pt_iommu_table pt_iommu_dart + +/* The common struct is in the per-format common struct */ +static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table) +{ + return &container_of(iommu_table, struct pt_iommu_table, iommu) + ->dartpt.common; +} + +static inline struct pt_iommu *iommu_from_common(struct pt_common *common) +{ + return &container_of(common, struct pt_iommu_table, dartpt.common) + ->iommu; +} + +static inline int dartpt_iommu_set_prot(struct pt_common *common, + struct pt_write_attrs *attrs, + unsigned int iommu_prot) +{ + u64 pte = 0; + + if (pt_feature(common, PT_FEAT_DART_V2)) { + if (!(iommu_prot & IOMMU_WRITE)) + pte |= DART_FMT2_PTE_PROT_NO_WRITE; + if (!(iommu_prot & IOMMU_READ)) + pte |= DART_FMT2_PTE_PROT_NO_READ; + if (!(iommu_prot & IOMMU_CACHE)) + pte |= DART_FMT2_PTE_PROT_NO_CACHE; + + /* + * FIXME is this a bug in io-pgtable-dart? It uncondtionally + * sets DART_FMT1_PTE_PROT_SP_DIS which is called NO_CACHE on + * v2 + */ + pte |= DART_FMT2_PTE_PROT_NO_CACHE; + } else { + if (!(iommu_prot & IOMMU_WRITE)) + pte |= DART_FMT1_PTE_PROT_NO_WRITE; + if (!(iommu_prot & IOMMU_READ)) + pte |= DART_FMT1_PTE_PROT_NO_READ; + pte |= DART_FMT1_PTE_PROT_SP_DIS; + } + + attrs->descriptor_bits = pte; + return 0; +} +#define pt_iommu_set_prot dartpt_iommu_set_prot + +static inline int dartpt_iommu_fmt_init(struct pt_iommu_dart *iommu_table, + struct pt_iommu_dart_cfg *cfg) +{ + struct pt_dart *table = &iommu_table->dartpt; + unsigned int l2_va_lg2sz; + unsigned int l3_num_items_lg2; + + /* The V2 OA requires a 16k page size */ + if (pt_feature(&iommu_table->dartpt.common, PT_FEAT_DART_V2)) + cfg->pgsize_bitmap = + log2_mod(cfg->pgsize_bitmap, ilog2(SZ_16K)); + + if ((cfg->oas_lg2 != 36 && cfg->oas_lg2 != 42) || + cfg->ias_lg2 > cfg->oas_lg2 || + !(cfg->pgsize_bitmap & (SZ_4K | SZ_16K))) + return -EOPNOTSUPP; + + /* + * The page size reflects both the size of the tables and the minimum + * granule size. 
+ */ + table->granule_lg2sz = log2_ffs(cfg->pgsize_bitmap); + + /* Size of VA covered by the first two levels */ + l2_va_lg2sz = table->granule_lg2sz + + (table->granule_lg2sz - ilog2(sizeof(u64))) * 2; + + table->common.max_vasz_lg2 = cfg->ias_lg2; + if (cfg->ias_lg2 <= l2_va_lg2sz) { + /* + * Only a single TTBR, don't use the TTBR table, the table_root + * pointer will be TTBR[0] + */ + l3_num_items_lg2 = ilog2(1); + pt_top_set_level(&table->common, 1); + } else { + l3_num_items_lg2 = cfg->ias_lg2 - l2_va_lg2sz; + if (l3_num_items_lg2 > DART_NUM_TTBRS_LG2) + return -EOPNOTSUPP; + /* + * Otherwise the level=3 table holds the TTBR values encoded as + * page table entries. + */ + pt_top_set_level(&table->common, 2); + } + return 0; +} +#define pt_iommu_fmt_init dartpt_iommu_fmt_init + +#if defined(GENERIC_PT_KUNIT) +static void dart_pt_kunit_setup_cfg(struct pt_iommu_dart_cfg *cfg) +{ + cfg->features &= ~(BIT(PT_FEAT_DART_V2)); + cfg->oas_lg2 = 36; + cfg->ias_lg2 = 30; + cfg->pgsize_bitmap = SZ_4K; +} +#define pt_kunit_setup_cfg dart_pt_kunit_setup_cfg +#endif + +#if defined(GENERIC_PT_KUNIT) && IS_ENABLED(CONFIG_IOMMU_IO_PGTABLE_DART) +#include + +static struct io_pgtable_ops * +dartpt_iommu_alloc_io_pgtable(struct pt_iommu_dart_cfg *cfg, + struct device *iommu_dev, + struct io_pgtable_cfg **unused_pgtbl_cfg) +{ + struct io_pgtable_cfg pgtbl_cfg = {}; + enum io_pgtable_fmt fmt; + + pgtbl_cfg.ias = cfg->ias_lg2; + pgtbl_cfg.oas = cfg->oas_lg2; + pgtbl_cfg.pgsize_bitmap = SZ_4K; + pgtbl_cfg.coherent_walk = true; + + if (cfg->features & BIT(PT_FEAT_DART_V2)) + fmt = APPLE_DART2; + else + fmt = APPLE_DART; + + return alloc_io_pgtable_ops(fmt, &pgtbl_cfg, NULL); +} +#define pt_iommu_alloc_io_pgtable dartpt_iommu_alloc_io_pgtable + +static void dartpt_iommu_setup_ref_table(struct pt_iommu_dart *iommu_table, + struct io_pgtable_ops *pgtbl_ops) +{ + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(pgtbl_ops)->cfg; + struct pt_common *common = &iommu_table->dartpt.common; + + /* FIXME should test multi-ttbr tables */ + WARN_ON(pgtbl_cfg->apple_dart_cfg.n_ttbrs != 1); + pt_top_set(common, __va(pgtbl_cfg->apple_dart_cfg.ttbr[0]), 1); +} +#define pt_iommu_setup_ref_table dartpt_iommu_setup_ref_table + +static u64 dartpt_kunit_cmp_mask_entry(struct pt_state *pts) +{ + if (pts->type == PT_ENTRY_TABLE) { + if (pts_feature(pts, PT_FEAT_DART_V2)) + return pts->entry & (~(u64)DART_FMT2_PTE_OA); + return pts->entry & (~(u64)DART_FMT1_PTE_OA); + } + return pts->entry; +} +#define pt_kunit_cmp_mask_entry dartpt_kunit_cmp_mask_entry +#endif + +#endif diff --git a/drivers/iommu/generic_pt/fmt/defs_dart.h b/drivers/iommu/generic_pt/fmt/defs_dart.h new file mode 100644 index 00000000000000..9b074d22cc6bc0 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/defs_dart.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + */ +#ifndef __GENERIC_PT_FMT_DEFS_DART_H +#define __GENERIC_PT_FMT_DEFS_DART_H + +#include +#include + +typedef u64 pt_vaddr_t; +typedef u64 pt_oaddr_t; + +struct dart_pt_write_attrs { + u64 descriptor_bits; + gfp_t gfp; +}; +#define pt_write_attrs dart_pt_write_attrs + +#endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_dart.c b/drivers/iommu/generic_pt/fmt/iommu_dart.c new file mode 100644 index 00000000000000..67e8198a79e1ef --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_dart.c @@ -0,0 +1,8 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ 
+#define PT_FMT dart +#define PT_SUPPORTED_FEATURES BIT(PT_FEAT_DART_V2) + +#include "iommu_template.h" diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h index e35fb83657f73b..edfbf5f8d047b6 100644 --- a/include/linux/generic_pt/common.h +++ b/include/linux/generic_pt/common.h @@ -126,6 +126,16 @@ enum { PT_FEAT_ARMV8_NS, }; +struct pt_dart { + struct pt_common common; + u8 granule_lg2sz; +}; + +enum { + /* Use the DART 2 rules instead of DART 1 */ + PT_FEAT_DART_V2 = PT_FEAT_FMT_START, +}; + struct pt_x86pae { struct pt_common common; }; diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h index ca69bb6192d1a7..0896e79863062e 100644 --- a/include/linux/generic_pt/iommu.h +++ b/include/linux/generic_pt/iommu.h @@ -289,6 +289,21 @@ static inline int pt_iommu_armv8_init(struct pt_iommu_armv8 *table, } } +struct pt_iommu_dart { + struct pt_iommu iommu; + struct pt_dart dartpt; +}; + +struct pt_iommu_dart_cfg { + struct device *iommu_device; + unsigned int features; + unsigned int ias_lg2; + unsigned int oas_lg2; + u64 pgsize_bitmap; +}; +int pt_iommu_dart_init(struct pt_iommu_dart *table, + struct pt_iommu_dart_cfg *cfg, gfp_t gfp); + struct pt_iommu_x86pae { struct pt_iommu iommu; struct pt_x86pae x86pae_pt;
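To make the TTBR selection arithmetic in dartpt_iommu_fmt_init() above concrete, here is a small standalone sketch. It is not part of the patch; dart_ttbr_table_items_lg2() and the sample values are assumptions of this note that mirror the patch's formula, with ilog2(sizeof(u64)) folded in as 3.

#include <stdio.h>

/*
 * Illustration only: recomputes the patch's l2_va_lg2sz formula and
 * returns how many TTBR-table items (log2) a given IAS would need.
 */
static int dart_ttbr_table_items_lg2(int granule_lg2sz, int ias_lg2)
{
	/* VA covered by the two in-memory levels of u64 entries */
	int l2_va_lg2sz = granule_lg2sz + (granule_lg2sz - 3) * 2;

	if (ias_lg2 <= l2_va_lg2sz)
		return 0; /* single TTBR, table_root is TTBR[0] */
	return ias_lg2 - l2_va_lg2sz;
}

int main(void)
{
	/* 4k granule: l2_va_lg2sz = 12 + (12 - 3) * 2 = 30 */
	printf("ias=30: %d\n", dart_ttbr_table_items_lg2(12, 30)); /* 0 */
	printf("ias=36: %d\n", dart_ttbr_table_items_lg2(12, 36)); /* 6 */
	return 0;
}

So with a 4k granule an IAS of 30 bits or less fits a single TTBR, while a 36 bit IAS would need 2^6 = 64 TTBR entries and is rejected when that exceeds DART_NUM_TTBRS_LG2.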
From patchwork Thu Aug 15 15:11:31 2024
X-Patchwork-Id: 13764924
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig, iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org, linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts, Sean Christopherson, Tina Zhang
Subject: [PATCH 15/16] iommupt: Add the 32 bit ARMv7s page table format
Date: Thu, 15 Aug 2024 12:11:31 -0300
Message-ID: <15-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
ARMv7s is the 32 bit ARM page table format; it has only two levels. The format itself behaves well, but supporting it fully requires several additional pieces that are not ready yet, such as 32 bit DMA and the 1k allocations for the table memory. It is an interesting demonstration of how a format can work when there are many different bit encodings for the PTEs depending on page size and level. It also translates the full 32 bit VA, and pairs a u32 VA with a u64 OA, which exercises some unique corner cases. Signed-off-by: Jason Gunthorpe --- drivers/iommu/generic_pt/Kconfig | 10 + drivers/iommu/generic_pt/fmt/Makefile | 2 + drivers/iommu/generic_pt/fmt/armv7s.h | 529 ++++++++++++++++++++ drivers/iommu/generic_pt/fmt/defs_armv7s.h | 23 + drivers/iommu/generic_pt/fmt/iommu_armv7s.c | 11 + include/linux/generic_pt/common.h | 9 + include/linux/generic_pt/iommu.h | 13 + 7 files changed, 597 insertions(+) create mode 100644 drivers/iommu/generic_pt/fmt/armv7s.h create mode 100644 drivers/iommu/generic_pt/fmt/defs_armv7s.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv7s.c diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig index 5ff07eb6bd8729..2d08b58e953e4d 100644 --- a/drivers/iommu/generic_pt/Kconfig +++ b/drivers/iommu/generic_pt/Kconfig @@ -34,6 +34,15 @@ config IOMMU_PT_AMDV1 depends on !GENERIC_ATOMIC64 # for cmpxchg64 default n +config IOMMU_PT_ARMV7S + tristate "IOMMU page table for the 32 bit ARMv7/v8 Short Descriptor Format" + default n + help + Enable support for the ARM Short-descriptor pagetable format. + This supports 32-bit virtual and physical addresses mapped using + 2-level tables with 4KB pages/1MB sections, and contiguous entries + for 64KB pages/16MB supersections. + config IOMMU_PT_ARMV8_4K tristate "IOMMU page table for 64 bit ARMv8 4k page size" depends on !GENERIC_ATOMIC64 # for cmpxchg64 @@ -91,6 +100,7 @@ config IOMMUT_PT_KUNIT_TEST select IOMMU_IO_PGTABLE depends on KUNIT depends on IOMMU_PT_AMDV1 || !IOMMU_PT_AMDV1 + depends on IOMMU_PT_ARMV7S || !IOMMU_PT_ARMV7S depends on IOMMU_PT_ARMV8_4K || !IOMMU_PT_ARMV8_4K depends on IOMMU_PT_ARMV8_16K || !IOMMU_PT_ARMV8_16K depends on IOMMU_PT_ARMV8_64K || !IOMMU_PT_ARMV8_64K diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile index a41a27561a82d0..1e10be24758fef 100644 --- a/drivers/iommu/generic_pt/fmt/Makefile +++ b/drivers/iommu/generic_pt/fmt/Makefile @@ -2,6 +2,8 @@ iommu_pt_fmt-$(CONFIG_IOMMU_PT_AMDV1) += amdv1 +iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV7S) += armv7s + iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_4K) += armv8_4k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_16K) += armv8_16k iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_64K) += armv8_64k diff --git a/drivers/iommu/generic_pt/fmt/armv7s.h b/drivers/iommu/generic_pt/fmt/armv7s.h new file mode 100644 index 00000000000000..52a0eccf9fd5cf --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/armv7s.h @@ -0,0 +1,529 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * ARMv7 Short-descriptor format. This is described by the ARMv8 VMSAv8-32 + * Short-descriptor chapter in the Architecture Reference Manual.
+ * + * NOTE! The level numbering is consistent with the Generic Page Table API, but + * is backwards from what the ARM documents use. What ARM calls level 2 this + * calls level 0. + * + * This was called io-pgtable-armv7s.c and ARM_V7S + * + * FIXME: + * - mtk encoding + * - GFP_DMA32 + */ +#ifndef __GENERIC_PT_FMT_ARMV7S_H +#define __GENERIC_PT_FMT_ARMV7S_H + +#include "defs_armv7s.h" +#include "../pt_defs.h" + +#include +#include +#include + +enum { + /* + * FIXME: The code supports the large physical extensions and + * can set this to 40, but the io-pgtable-arm-v7s does not, so + * reduce it. + */ + PT_MAX_OUTPUT_ADDRESS_LG2 = 40, + PT_MAX_VA_ADDRESS_LG2 = 32, + PT_ENTRY_WORD_SIZE = sizeof(u32), + PT_MAX_TOP_LEVEL = 1, + PT_GRANUAL_LG2SZ = 12, + PT_TABLEMEM_LG2SZ = 10, +}; + +#define PT_FIXED_TOP_LEVEL PT_MAX_TOP_LEVEL + +enum { + ARMV7S_PT_FMT_TYPE = GENMASK(1, 0), +}; + +/* Top most level (ARM Level 1, pts level 1) */ +enum { + /* Translation Table */ + ARMV7S_PT_FMT1_TTB = GENMASK(31, 10), + + ARMV7S_PT_FMT1_B = BIT(2), + ARMV7S_PT_FMT1_C = BIT(3), + ARMV7S_PT_FMT1_XN = BIT(4), + ARMV7S_PT_FMT1_AP0 = BIT(10), + ARMV7S_PT_FMT1_AP1 = BIT(11), + ARMV7S_PT_FMT1_TEX = GENMASK(14, 12), + ARMV7S_PT_FMT1_AP2 = BIT(15), + ARMV7S_PT_FMT1_S = BIT(16), + ARMV7S_PT_FMT1_NG = BIT(17), + + /* Section */ + ARMV7S_PT_FMT1S_OA = GENMASK(31, 20), + + /* Supersection */ + ARMV7S_PT_FMT1SS_OA_C = GENMASK(8, 5), + ARMV7S_PT_FMT1_SUPER_SECTION = BIT(18), + ARMV7S_PT_FMT1SS_OA_B = GENMASK(23, 20), + ARMV7S_PT_FMT1SS_OA_A = GENMASK(31, 24), + +}; + +enum { + ARMV7S_PT_FMT1_TYPE_TABLE = 1, + /* PXN is not supported */ + ARMV7S_PT_FMT1_TYPE_SECTION = 2, +}; + +/* Lowest level (ARM Level 2, pts level 0) */ +enum { + ARMV7S_PT_FMT2_SMALL_PAGE = BIT(1), + ARMV7S_PT_FMT2_B = BIT(2), + ARMV7S_PT_FMT2_C = BIT(3), + ARMV7S_PT_FMT2_AP0 = BIT(4), + ARMV7S_PT_FMT2_AP1 = BIT(5), + ARMV7S_PT_FMT2_AP2 = BIT(9), + ARMV7S_PT_FMT2_S = BIT(10), + ARMV7S_PT_FMT2_NG = BIT(11), + + /* Small Page */ + ARMV7S_PT_FMT2S_XN = BIT(0), + ARMV7S_PT_FMT2S_TEX = GENMASK(8, 6), + ARMV7S_PT_FMT2S_OA = GENMASK(31, 12), + + /* Large Page */ + ARMV7S_PT_FMT2L_XN = BIT(15), + ARMV7S_PT_FMT2L_TEX = GENMASK(14, 12), + ARMV7S_PT_FMT2L_OA = GENMASK(31, 16), +}; + +enum { + ARMV7S_PT_FMT2_TYPE_LARGE_PAGE = 1, + ARMV7S_PT_FMT2_TYPE_SMALL_PAGE = 2, +}; + +#if 0 +/* Attr bits, relative to the field of FMTx_ATTRS */ +enum { + ARM_V7S_ATTR_NG = BIT(7), + ARM_V7S_ATTR_S = BIT(6), + ARM_V7S_ATTR_B = BIT(2), + + ARM_V7S_ATTR_AP0 = BIT(0), + ARM_V7S_ATTR_AP1 = BIT(1), + ARM_V7S_ATTR_AP2 = BIT(5), + + /* Simplified access permissions */ + ARM_V7S_PTE_AF = ARM_V7S_ATTR_AP0, + ARM_V7S_PTE_AP_UNPRIV = ARM_V7S_ATTR_AP1, + ARM_V7S_PTE_AP_RDONLY = ARM_V7S_ATTR_AP2, +}; +#endif + +#define common_to_armv7s_pt(common_ptr) \ + container_of_const(common_ptr, struct pt_armv7s, common) +#define to_armv7s_pt(pts) common_to_armv7s_pt((pts)->range->common) + +static inline pt_oaddr_t armv7s_pt_table_pa(const struct pt_state *pts) +{ + return oalog2_mul(FIELD_GET(ARMV7S_PT_FMT1_TTB, pts->entry), + PT_TABLEMEM_LG2SZ); +} +#define pt_table_pa armv7s_pt_table_pa + +/* Returns the oa for the start of the contiguous entry */ +static inline pt_oaddr_t armv7s_pt_entry_oa(const struct pt_state *pts) +{ + if (pts->level == 0) { + if (pts->entry & ARMV7S_PT_FMT2_SMALL_PAGE) + return oalog2_mul(FIELD_GET(ARMV7S_PT_FMT2S_OA, + pts->entry), + PT_GRANUAL_LG2SZ); + return oalog2_mul(FIELD_GET(ARMV7S_PT_FMT2L_OA, pts->entry), 16); + } + if (pts->entry & ARMV7S_PT_FMT1_SUPER_SECTION) + 
return oalog2_mul(FIELD_GET(ARMV7S_PT_FMT1SS_OA_A, pts->entry), + 24) | + oalog2_mul(FIELD_GET(ARMV7S_PT_FMT1SS_OA_B, pts->entry), + 32) | + oalog2_mul(FIELD_GET(ARMV7S_PT_FMT1SS_OA_C, pts->entry), + 36); + return oalog2_mul(FIELD_GET(ARMV7S_PT_FMT1S_OA, pts->entry), 20); +} +#define pt_entry_oa armv7s_pt_entry_oa + +static inline bool armv7s_pt_can_have_leaf(const struct pt_state *pts) +{ + return true; +} +#define pt_can_have_leaf armv7s_pt_can_have_leaf + +static inline unsigned int +armv7s_pt_table_item_lg2sz(const struct pt_state *pts) +{ + return PT_GRANUAL_LG2SZ + + (PT_TABLEMEM_LG2SZ - ilog2(sizeof(u32))) * pts->level; +} +#define pt_table_item_lg2sz armv7s_pt_table_item_lg2sz + +static inline unsigned short +armv7s_pt_contig_count_lg2(const struct pt_state *pts) +{ + return ilog2(16); +} +#define pt_contig_count_lg2 armv7s_pt_contig_count_lg2 + +static inline unsigned int +armv7s_pt_entry_num_contig_lg2(const struct pt_state *pts) +{ + if ((pts->level == 0 && !(pts->entry & ARMV7S_PT_FMT2_SMALL_PAGE)) || + (pts->level != 0 && pts->entry & ARMV7S_PT_FMT1_SUPER_SECTION)) + return armv7s_pt_contig_count_lg2(pts); + return ilog2(1); +} +#define pt_entry_num_contig_lg2 armv7s_pt_entry_num_contig_lg2 + +static inline pt_vaddr_t armv7s_pt_full_va_prefix(const struct pt_common *common) +{ + if (pt_feature(common, PT_FEAT_ARMV7S_TTBR1)) + return PT_VADDR_MAX; + return 0; +} +#define pt_full_va_prefix armv7s_pt_full_va_prefix + +/* Number of indexes in the current table level */ +static inline unsigned int armv7s_pt_num_items_lg2(const struct pt_state *pts) +{ + /* if (pts->level == 1) + return 12; + */ + return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u32)); +} +#define pt_num_items_lg2 armv7s_pt_num_items_lg2 + +static inline enum pt_entry_type armv7s_pt_load_entry_raw(struct pt_state *pts) +{ + const u32 *tablep = pt_cur_table(pts, u32); + unsigned int type; + u32 entry; + + pts->entry = entry = READ_ONCE(tablep[pts->index]); + type = FIELD_GET(ARMV7S_PT_FMT_TYPE, entry); + if (type == 0) + return PT_ENTRY_EMPTY; + if (pts->level == 1 && type == ARMV7S_PT_FMT1_TYPE_TABLE) + return PT_ENTRY_TABLE; + return PT_ENTRY_OA; +} +#define pt_load_entry_raw armv7s_pt_load_entry_raw + +static inline void +armv7s_pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa, + unsigned int oasz_lg2, + const struct pt_write_attrs *attrs) +{ + unsigned int isz_lg2 = pt_table_item_lg2sz(pts); + u32 *tablep = pt_cur_table(pts, u32); + u32 entry = 0; + + PT_WARN_ON(oalog2_mod(oa, oasz_lg2)); + tablep += pts->index; + + if (oasz_lg2 == isz_lg2) { + if (pts->level == 0) + entry = FIELD_PREP(ARMV7S_PT_FMT_TYPE, + ARMV7S_PT_FMT2_TYPE_SMALL_PAGE) | + FIELD_PREP(ARMV7S_PT_FMT2S_OA, + oalog2_div(oa, PT_GRANUAL_LG2SZ)) | + attrs->pte2; + else + entry = FIELD_PREP(ARMV7S_PT_FMT_TYPE, + ARMV7S_PT_FMT1_TYPE_SECTION) | + FIELD_PREP(ARMV7S_PT_FMT1S_OA, + oalog2_div(oa, 20)) | + attrs->pte1; + WRITE_ONCE(*tablep, entry); + } else { + u32 *end; + + if (pts->level == 0) + entry = FIELD_PREP(ARMV7S_PT_FMT_TYPE, + ARMV7S_PT_FMT2_TYPE_LARGE_PAGE) | + FIELD_PREP(ARMV7S_PT_FMT2L_OA, + oalog2_div(oa, 16)) | + attrs->pte2l; + else + entry = FIELD_PREP(ARMV7S_PT_FMT_TYPE, + ARMV7S_PT_FMT1_TYPE_SECTION) | + ARMV7S_PT_FMT1_SUPER_SECTION | + FIELD_PREP(ARMV7S_PT_FMT1SS_OA_A, + oalog2_div(oa, 24)) | + FIELD_PREP(ARMV7S_PT_FMT1SS_OA_B, + oalog2_div(oa, 32)) | + FIELD_PREP(ARMV7S_PT_FMT1SS_OA_C, + oalog2_div(oa, 36)) | + attrs->pte1; + + PT_WARN_ON(oasz_lg2 != + isz_lg2 + armv7s_pt_contig_count_lg2(pts)); + PT_WARN_ON( + log2_mod(pts->index, 
armv7s_pt_contig_count_lg2(pts))); + + end = tablep + log2_to_int(armv7s_pt_contig_count_lg2(pts)); + for (; tablep != end; tablep++) + WRITE_ONCE(*tablep, entry); + } + pts->entry = entry; +} +#define pt_install_leaf_entry armv7s_pt_install_leaf_entry + +static inline bool armv7s_pt_install_table(struct pt_state *pts, + pt_oaddr_t table_pa, + const struct pt_write_attrs *attrs) +{ + u32 *tablep = pt_cur_table(pts, u32); + u32 entry; + + entry = FIELD_PREP(ARMV7S_PT_FMT_TYPE, ARMV7S_PT_FMT1_TYPE_TABLE) | + FIELD_PREP(ARMV7S_PT_FMT1_TTB, + oalog2_div(table_pa, PT_TABLEMEM_LG2SZ)); + + return pt_table_install32(&tablep[pts->index], entry, pts->entry); +} +#define pt_install_table armv7s_pt_install_table + +/* + * Trivial translation of the different bit assignments. pt_attr_from_entry() is + * not a performance path to justify something more optimized. + */ +#define _COPY_PTE_MASK(mask, l1, l2, l2l) \ + { \ + attrs->pte1 |= FIELD_PREP(l1, FIELD_GET(mask, entry)); \ + attrs->pte2 |= FIELD_PREP(l2, FIELD_GET(mask, entry)); \ + attrs->pte2l |= FIELD_PREP(l2l, FIELD_GET(mask, entry)); \ + } +#define COPY_PTE_MASK(name, entry, l1, l2, l2l) \ + _COPY_PTE_MASK(ARMV7S_PT_##entry##_##name, ARMV7S_PT_##l1##_##name, \ + ARMV7S_PT_##l2##_##name, ARMV7S_PT_##l2l##_##name) + +static inline void armv7s_pt_attr_from_pte1(u32 entry, + struct pt_write_attrs *attrs) +{ + COPY_PTE_MASK(NG, FMT1, FMT1, FMT2, FMT2); + COPY_PTE_MASK(S, FMT1, FMT1, FMT2, FMT2); + COPY_PTE_MASK(TEX, FMT1, FMT1, FMT2S, FMT2L); + COPY_PTE_MASK(AP0, FMT1, FMT1, FMT2, FMT2); + COPY_PTE_MASK(AP1, FMT1, FMT1, FMT2, FMT2); + COPY_PTE_MASK(AP2, FMT1, FMT1, FMT2, FMT2); + COPY_PTE_MASK(XN, FMT1, FMT1, FMT2S, FMT2L); + COPY_PTE_MASK(B, FMT1, FMT1, FMT2, FMT2); + COPY_PTE_MASK(C, FMT1, FMT1, FMT2, FMT2); +} + +static inline void armv7s_pt_attr_from_pte2(u32 entry, + struct pt_write_attrs *attrs) +{ + COPY_PTE_MASK(NG, FMT2, FMT1, FMT2, FMT2); + COPY_PTE_MASK(S, FMT2, FMT1, FMT2, FMT2); + COPY_PTE_MASK(AP0, FMT2, FMT1, FMT2, FMT2); + COPY_PTE_MASK(AP1, FMT2, FMT1, FMT2, FMT2); + COPY_PTE_MASK(AP2, FMT2, FMT1, FMT2, FMT2); + COPY_PTE_MASK(B, FMT2, FMT1, FMT2, FMT2); + COPY_PTE_MASK(C, FMT2, FMT1, FMT2, FMT2); +} + +static inline void armv7s_pt_attr_from_pte2s(u32 entry, + struct pt_write_attrs *attrs) +{ + COPY_PTE_MASK(TEX, FMT2S, FMT1, FMT2S, FMT2L); + COPY_PTE_MASK(XN, FMT2S, FMT1, FMT2S, FMT2L); +} + +static inline void armv7s_pt_attr_from_pte2l(u32 entry, + struct pt_write_attrs *attrs) +{ + COPY_PTE_MASK(TEX, FMT2L, FMT1, FMT2S, FMT2L); + COPY_PTE_MASK(XN, FMT2L, FMT1, FMT2S, FMT2L); +} +#undef _COPY_PTE_MASK +#undef COPY_PTE_MASK + +static inline void armv7s_pt_attr_from_entry(const struct pt_state *pts, + struct pt_write_attrs *attrs) +{ + attrs->pte1 = 0; + attrs->pte2 = 0; + attrs->pte2l = 0; + if (pts->level == 0) { + armv7s_pt_attr_from_pte2(pts->entry, attrs); + if (pts->entry & ARMV7S_PT_FMT2_SMALL_PAGE) + armv7s_pt_attr_from_pte2s(pts->entry, attrs); + else + armv7s_pt_attr_from_pte2l(pts->entry, attrs); + } else { + armv7s_pt_attr_from_pte1(pts->entry, attrs); + } +} +#define pt_attr_from_entry armv7s_pt_attr_from_entry + +/* The starting index must be aligned to the contig */ +static inline void armv7s_pt_clear_entry(struct pt_state *pts, + unsigned int num_contig_lg2) +{ + u32 *tablep = pt_cur_table(pts, u32); + u32 *end; + + PT_WARN_ON(log2_mod(pts->index, num_contig_lg2)); + + tablep += pts->index; + end = tablep + log2_to_int(num_contig_lg2); + for (; tablep != end; tablep++) + WRITE_ONCE(*tablep, 0); +} +#define 
pt_clear_entry armv7s_pt_clear_entry + +/* --- iommu */ +#include +#include + +#define pt_iommu_table pt_iommu_armv7s + +/* The common struct is in the per-format common struct */ +static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table) +{ + return &container_of(iommu_table, struct pt_iommu_table, iommu) + ->armpt.common; +} + +static inline struct pt_iommu *iommu_from_common(struct pt_common *common) +{ + return &container_of(common, struct pt_iommu_table, armpt.common) + ->iommu; +} + +/* + * There are three encodings of the PTE bits. We compute each of the three and + * store them in the pt_write_attrs; the install side will use the right one. + */ +#define _SET_PTE_MASK(l1, l2, l2l, val) \ + ({ \ + pte1 |= FIELD_PREP(l1, val); \ + pte2 |= FIELD_PREP(l2, val); \ + pte2l |= FIELD_PREP(l2l, val); \ + }) +#define SET_PTE_MASK(name, l1, l2, l2l, val) \ + _SET_PTE_MASK(ARMV7S_PT_##l1##_##name, ARMV7S_PT_##l2##_##name, \ + ARMV7S_PT_##l2l##_##name, val) + +static inline int armv7s_pt_iommu_set_prot(struct pt_common *common, + struct pt_write_attrs *attrs, + unsigned int iommu_prot) +{ + bool ap = true; // FIXME IO_PGTABLE_QUIRK_NO_PERMS + u32 pte1 = 0; + u32 pte2 = 0; + u32 pte2l = 0; + + SET_PTE_MASK(NG, FMT1, FMT2, FMT2, 1); + SET_PTE_MASK(S, FMT1, FMT2, FMT2, 1); + + if (!(iommu_prot & IOMMU_MMIO)) + SET_PTE_MASK(TEX, FMT1, FMT2S, FMT2L, 1); + + /* + * Simplified access permissions: AF = AP0, UNPRIV = AP1, RDONLY = AP2 + */ + if (ap) { + /* AF */ + SET_PTE_MASK(AP0, FMT1, FMT2, FMT2, 1); + if (!(iommu_prot & IOMMU_PRIV)) + SET_PTE_MASK(AP1, FMT1, FMT2, FMT2, 1); + if (!(iommu_prot & IOMMU_WRITE)) + SET_PTE_MASK(AP2, FMT1, FMT2, FMT2, 1); + } + + if ((iommu_prot & IOMMU_NOEXEC) && ap) + SET_PTE_MASK(XN, FMT1, FMT2S, FMT2L, 1); + + if (iommu_prot & IOMMU_MMIO) { + SET_PTE_MASK(B, FMT1, FMT2, FMT2, 1); + } else if (iommu_prot & IOMMU_CACHE) { + SET_PTE_MASK(B, FMT1, FMT2, FMT2, 1); + SET_PTE_MASK(C, FMT1, FMT2, FMT2, 1); + } + + /* FIXME: + if (lvl == 1 && (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)) + pte |= ARM_V7S_ATTR_NS_SECTION; + */ + + attrs->pte1 = pte1; + attrs->pte2 = pte2; + attrs->pte2l = pte2l; + return 0; +} +#define pt_iommu_set_prot armv7s_pt_iommu_set_prot +#undef _SET_PTE_MASK +#undef SET_PTE_MASK + +static inline int armv7s_pt_iommu_fmt_init(struct pt_iommu_armv7s *iommu_table, + struct pt_iommu_armv7s_cfg *cfg) +{ + /* FIXME */ + cfg->features &= ~(BIT(PT_FEAT_ARMV7S_TTBR1)); + iommu_table->armpt.common.max_oasz_lg2 = 32; + return 0; +} +#define pt_iommu_fmt_init armv7s_pt_iommu_fmt_init + +#if defined(GENERIC_PT_KUNIT) +static inline void armv7s_pt_kunit_setup_cfg(struct pt_iommu_armv7s_cfg *cfg) +{ + cfg->features &= ~(BIT(PT_FEAT_ARMV7S_TTBR1)); +} +#define pt_kunit_setup_cfg armv7s_pt_kunit_setup_cfg +#endif + +#if defined(GENERIC_PT_KUNIT) && IS_ENABLED(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) +#include + +static struct io_pgtable_ops * +armv7s_pt_iommu_alloc_io_pgtable(struct pt_iommu_armv7s_cfg *cfg, + struct device *iommu_dev, + struct io_pgtable_cfg **unused_pgtbl_cfg) +{ + struct io_pgtable_cfg pgtbl_cfg = {}; + + pgtbl_cfg.ias = 32; + pgtbl_cfg.oas = 32; + pgtbl_cfg.pgsize_bitmap |= SZ_4K; + pgtbl_cfg.coherent_walk = true; + return alloc_io_pgtable_ops(ARM_V7S, &pgtbl_cfg, NULL); +} +#define pt_iommu_alloc_io_pgtable armv7s_pt_iommu_alloc_io_pgtable + +static void armv7s_pt_iommu_setup_ref_table(struct pt_iommu_armv7s *iommu_table, + struct io_pgtable_ops *pgtbl_ops) +{ + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(pgtbl_ops)->cfg; +
struct pt_common *common = &iommu_table->armpt.common; + + pt_top_set(common, + __va(log2_set_mod_t(u32, pgtbl_cfg->arm_v7s_cfg.ttbr, 0, 7)), + PT_FIXED_TOP_LEVEL); +} +#define pt_iommu_setup_ref_table armv7s_pt_iommu_setup_ref_table + +static u64 armv7s_pt_kunit_cmp_mask_entry(struct pt_state *pts) +{ + if (pts->type == PT_ENTRY_TABLE) + return pts->entry & (~(u32)(ARMV7S_PT_FMT1_TTB)); + return pts->entry; +} +#define pt_kunit_cmp_mask_entry armv7s_pt_kunit_cmp_mask_entry +#endif + +#endif diff --git a/drivers/iommu/generic_pt/fmt/defs_armv7s.h b/drivers/iommu/generic_pt/fmt/defs_armv7s.h new file mode 100644 index 00000000000000..57ba77d5f19c46 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/defs_armv7s.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + */ +#ifndef __GENERIC_PT_FMT_DEFS_ARMV7S_H +#define __GENERIC_PT_FMT_DEFS_ARMV7S_H + +#include +#include + +typedef u32 pt_vaddr_t; +typedef u64 pt_oaddr_t; + +struct armv7s_pt_write_attrs { + u32 pte1; + u32 pte2; + u32 pte2l; + gfp_t gfp; +}; +#define pt_write_attrs armv7s_pt_write_attrs + +#endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_armv7s.c b/drivers/iommu/generic_pt/fmt/iommu_armv7s.c new file mode 100644 index 00000000000000..591a97332bbefe --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_armv7s.c @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ + +#define PT_FMT armv7s +#define PT_SUPPORTED_FEATURES \ + (BIT(PT_FEAT_DMA_INCOHERENT) | BIT(PT_FEAT_OA_SIZE_CHANGE) | \ + BIT(PT_FEAT_OA_TABLE_XCHG) | BIT(PT_FEAT_FULL_VA)) + +#include "iommu_template.h" diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h index edfbf5f8d047b6..558302fe1e0324 100644 --- a/include/linux/generic_pt/common.h +++ b/include/linux/generic_pt/common.h @@ -108,6 +108,15 @@ struct pt_armv8 { struct pt_common common; }; +struct pt_armv7s { + struct pt_common common; +}; + +enum { + /* Use the upper address space instead of lower */ + PT_FEAT_ARMV7S_TTBR1 = PT_FEAT_FMT_START, +}; + enum { /* Use the upper address space instead of lower */ PT_FEAT_ARMV8_TTBR1 = PT_FEAT_FMT_START, diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h index 0896e79863062e..351a69fe62dd1d 100644 --- a/include/linux/generic_pt/iommu.h +++ b/include/linux/generic_pt/iommu.h @@ -216,6 +216,19 @@ struct pt_iommu_amdv1_cfg { int pt_iommu_amdv1_init(struct pt_iommu_amdv1 *table, struct pt_iommu_amdv1_cfg *cfg, gfp_t gfp); +struct pt_iommu_armv7s { + struct pt_iommu iommu; + struct pt_armv7s armpt; +}; + +struct pt_iommu_armv7s_cfg { + struct device *iommu_device; + unsigned int features; +}; + +int pt_iommu_armv7s_init(struct pt_iommu_armv7s *table, + struct pt_iommu_armv7s_cfg *cfg, gfp_t gfp); + struct pt_iommu_armv8 { struct pt_iommu iommu; struct pt_armv8 armpt;
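As a quick illustration of the commit message's point about the many PTE encodings and the u32 VA / u64 OA pairing, here is a standalone sketch of how armv7s_pt_entry_oa() reassembles a 40 bit supersection output address from a 32 bit descriptor. It is not part of the patch; supersection_oa() and the sample PTE value are assumptions of this note, and the type/attribute bits of a real PTE are omitted for clarity.

#include <stdint.h>
#include <stdio.h>

/*
 * Illustration only: mirrors the FMT1SS_OA_A/B/C field extraction,
 * producing a 40 bit OA from a 32 bit supersection entry.
 */
static uint64_t supersection_oa(uint32_t pte)
{
	uint64_t oa_a = (pte >> 24) & 0xff; /* PTE[31:24] -> OA[31:24] */
	uint64_t oa_b = (pte >> 20) & 0xf;  /* PTE[23:20] -> OA[35:32] */
	uint64_t oa_c = (pte >> 5) & 0xf;   /* PTE[8:5]   -> OA[39:36] */

	return (oa_a << 24) | (oa_b << 32) | (oa_c << 36);
}

int main(void)
{
	/* OA[31:24]=0x12, OA[35:32]=0x3, OA[39:36]=0x4 -> 0x4312000000 */
	uint32_t pte = (0x12u << 24) | (0x3u << 20) | (0x4u << 5);

	printf("oa = 0x%llx\n", (unsigned long long)supersection_oa(pte));
	return 0;
}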
From patchwork Thu Aug 15 15:11:32 2024
X-Patchwork-Id: 13764920
From: Jason Gunthorpe
Cc: Alejandro Jimenez, Lu Baolu, David Hildenbrand, Christoph Hellwig, iommu@lists.linux.dev, Joao Martins, Kevin Tian, kvm@vger.kernel.org, linux-mm@kvack.org, Pasha Tatashin, Peter Xu, Ryan Roberts, Sean Christopherson, Tina Zhang
Subject: [PATCH 16/16] iommupt: Add the Intel VT-D second stage page table format
Date: Thu, 15 Aug 2024 12:11:32 -0300
Message-ID: <16-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
In-Reply-To: <0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>
The VT-D second stage format is almost the same as the x86 PAE format, except that the bit encodings in the PTE are different and a few new PTE features, such as force coherency, are present. Among all the formats it is unique in not having a designated present bit. Cc: Tina Zhang Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Jason Gunthorpe --- drivers/iommu/generic_pt/Kconfig | 6 + drivers/iommu/generic_pt/fmt/Makefile | 2 + drivers/iommu/generic_pt/fmt/defs_vtdss.h | 21 ++ drivers/iommu/generic_pt/fmt/iommu_vtdss.c | 8 + drivers/iommu/generic_pt/fmt/vtdss.h | 276 +++++++++++++++++++++ include/linux/generic_pt/common.h | 4 + include/linux/generic_pt/iommu.h | 12 + 7 files changed, 329 insertions(+) create mode 100644 drivers/iommu/generic_pt/fmt/defs_vtdss.h create mode 100644 drivers/iommu/generic_pt/fmt/iommu_vtdss.c create mode 100644 drivers/iommu/generic_pt/fmt/vtdss.h
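Because the second stage format has no designated present bit, presence has to be inferred from the permission bits. The following standalone sketch shows the invariant that vtdss_pt_load_entry_raw() and vtdss_pt_iommu_set_prot() in the patch below rely on; it is not part of the patch, and vtdss_entry_present() is a hypothetical helper of this note.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions copied from the patch below */
#define VTDSS_FMT_R (1ull << 0)
#define VTDSS_FMT_W (1ull << 1)

/*
 * Illustration only: an entry is live when it grants R or W. Since
 * set_prot() rejects mappings with neither bit, an all-zero entry can
 * always be treated as empty.
 */
static bool vtdss_entry_present(uint64_t entry)
{
	return (entry & (VTDSS_FMT_R | VTDSS_FMT_W)) != 0;
}

int main(void)
{
	printf("%d %d\n", vtdss_entry_present(0),
	       vtdss_entry_present(VTDSS_FMT_R)); /* prints: 0 1 */
	return 0;
}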
diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig index 2d08b58e953e4d..c17e09e2d03025 100644 --- a/drivers/iommu/generic_pt/Kconfig +++ b/drivers/iommu/generic_pt/Kconfig @@ -90,6 +90,11 @@ config IOMMU_PT_DART If unsure, say N here. +config IOMMU_PT_VTDSS + tristate "IOMMU page table for Intel VT-D IOMMU Second Stage" + depends on !GENERIC_ATOMIC64 # for cmpxchg64 + default n + config IOMMU_PT_X86PAE tristate "IOMMU page table for x86 PAE" depends on !GENERIC_ATOMIC64 # for cmpxchg64 @@ -105,6 +110,7 @@ config IOMMUT_PT_KUNIT_TEST depends on IOMMU_PT_ARMV8_16K || !IOMMU_PT_ARMV8_16K depends on IOMMU_PT_ARMV8_64K || !IOMMU_PT_ARMV8_64K depends on IOMMU_PT_DART || !IOMMU_PT_DART + depends on IOMMU_PT_VTDSS || !IOMMU_PT_VTDSS depends on IOMMU_PT_X86PAE || !IOMMU_PT_X86PAE default KUNIT_ALL_TESTS endif diff --git a/drivers/iommu/generic_pt/fmt/Makefile b/drivers/iommu/generic_pt/fmt/Makefile index 1e10be24758fef..5a77c64d432534 100644 --- a/drivers/iommu/generic_pt/fmt/Makefile +++ b/drivers/iommu/generic_pt/fmt/Makefile @@ -10,6 +10,8 @@ iommu_pt_fmt-$(CONFIG_IOMMU_PT_ARMV8_64K) += armv8_64k iommu_pt_fmt-$(CONFIG_IOMMU_PT_DART) += dart +iommu_pt_fmt-$(CONFIG_IOMMU_PT_VTDSS) += vtdss + iommu_pt_fmt-$(CONFIG_IOMMU_PT_X86PAE) += x86pae IOMMU_PT_KUNIT_TEST := diff --git a/drivers/iommu/generic_pt/fmt/defs_vtdss.h b/drivers/iommu/generic_pt/fmt/defs_vtdss.h new file mode 100644 index 00000000000000..4a239bcaae2a90 --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/defs_vtdss.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + */ +#ifndef __GENERIC_PT_FMT_DEFS_VTDSS_H +#define __GENERIC_PT_FMT_DEFS_VTDSS_H + +#include +#include + +typedef u64 pt_vaddr_t; +typedef u64 pt_oaddr_t; + +struct vtdss_pt_write_attrs { + u64 descriptor_bits; + gfp_t gfp; +}; +#define pt_write_attrs vtdss_pt_write_attrs + +#endif diff --git a/drivers/iommu/generic_pt/fmt/iommu_vtdss.c b/drivers/iommu/generic_pt/fmt/iommu_vtdss.c new file mode 100644 index 00000000000000..12e7829815047b --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/iommu_vtdss.c @@ -0,0 +1,8 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + */ +#define PT_FMT vtdss +#define PT_SUPPORTED_FEATURES 0 + +#include "iommu_template.h" diff --git a/drivers/iommu/generic_pt/fmt/vtdss.h b/drivers/iommu/generic_pt/fmt/vtdss.h new file mode 100644 index 00000000000000..233731365ac62d --- /dev/null +++ b/drivers/iommu/generic_pt/fmt/vtdss.h @@ -0,0 +1,276 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES + * + * Intel VT-D Second Stage 5/4 level page table + * + * This is described in + * Section "3.7 Second-Stage Translation" + * Section "9.8 Second-Stage Paging Entries" + * + * Of the "Intel Virtualization Technology for Directed I/O Architecture + * Specification".
+ * + * The named levels in the spec map to the pts->level as: + * Table/SS-PTE - 0 + * Directory/SS-PDE - 1 + * Directory Ptr/SS-PDPTE - 2 + * PML4/SS-PML4E - 3 + * PML5/SS-PML5E - 4 + * FIXME: + * force_snooping + * 1g optional + * forbid read-only + * Use of direct clflush instead of DMA API + */ +#ifndef __GENERIC_PT_FMT_VTDSS_H +#define __GENERIC_PT_FMT_VTDSS_H + +#include "defs_vtdss.h" +#include "../pt_defs.h" + +#include +#include +#include + +enum { + PT_MAX_OUTPUT_ADDRESS_LG2 = 52, + PT_MAX_VA_ADDRESS_LG2 = 57, + PT_ENTRY_WORD_SIZE = sizeof(u64), + PT_MAX_TOP_LEVEL = 4, + PT_GRANUAL_LG2SZ = 12, + PT_TABLEMEM_LG2SZ = 12, +}; + +/* Shared descriptor bits */ +enum { + VTDSS_FMT_R = BIT(0), + VTDSS_FMT_W = BIT(1), + VTDSS_FMT_X = BIT(2), + VTDSS_FMT_A = BIT(8), + VTDSS_FMT_D = BIT(9), + VTDSS_FMT_SNP = BIT(11), + VTDSS_FMT_OA = GENMASK_ULL(51, 12), +}; + +/* PDPTE/PDE */ +enum { + VTDSS_FMT_PS = BIT(7), +}; + +#define common_to_vtdss_pt(common_ptr) \ + container_of_const(common_ptr, struct pt_vtdss, common) +#define to_vtdss_pt(pts) common_to_vtdss_pt((pts)->range->common) + +static inline pt_oaddr_t vtdss_pt_table_pa(const struct pt_state *pts) +{ + return log2_mul(FIELD_GET(VTDSS_FMT_OA, pts->entry), PT_TABLEMEM_LG2SZ); +} +#define pt_table_pa vtdss_pt_table_pa + +static inline pt_oaddr_t vtdss_pt_entry_oa(const struct pt_state *pts) +{ + return log2_mul(FIELD_GET(VTDSS_FMT_OA, pts->entry), PT_GRANUAL_LG2SZ); +} +#define pt_entry_oa vtdss_pt_entry_oa + +static inline bool vtdss_pt_can_have_leaf(const struct pt_state *pts) +{ + return pts->level <= 2; +} +#define pt_can_have_leaf vtdss_pt_can_have_leaf + +static inline unsigned int vtdss_pt_table_item_lg2sz(const struct pt_state *pts) +{ + return PT_GRANUAL_LG2SZ + + (PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64))) * pts->level; +} +#define pt_table_item_lg2sz vtdss_pt_table_item_lg2sz + +static inline unsigned int vtdss_pt_num_items_lg2(const struct pt_state *pts) +{ + return PT_TABLEMEM_LG2SZ - ilog2(sizeof(u64)); +} +#define pt_num_items_lg2 vtdss_pt_num_items_lg2 + +static inline enum pt_entry_type vtdss_pt_load_entry_raw(struct pt_state *pts) +{ + const u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + pts->entry = entry = READ_ONCE(tablep[pts->index]); + if (!entry) + return PT_ENTRY_EMPTY; + if (pts->level == 0 || + (vtdss_pt_can_have_leaf(pts) && (pts->entry & VTDSS_FMT_PS))) + return PT_ENTRY_OA; + return PT_ENTRY_TABLE; +} +#define pt_load_entry_raw vtdss_pt_load_entry_raw + +static inline void +vtdss_pt_install_leaf_entry(struct pt_state *pts, pt_oaddr_t oa, + unsigned int oasz_lg2, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + entry = FIELD_PREP(VTDSS_FMT_OA, log2_div(oa, PT_GRANUAL_LG2SZ)) | + attrs->descriptor_bits; + if (pts->level != 0) + entry |= VTDSS_FMT_PS; + + WRITE_ONCE(tablep[pts->index], entry); + pts->entry = entry; +} +#define pt_install_leaf_entry vtdss_pt_install_leaf_entry + +static inline bool vtdss_pt_install_table(struct pt_state *pts, + pt_oaddr_t table_pa, + const struct pt_write_attrs *attrs) +{ + u64 *tablep = pt_cur_table(pts, u64); + u64 entry; + + /* + * FIXME according to the SDM D is ignored by HW on table pointers? 
+ * io_pgtable_v2 sets it + */ + entry = VTDSS_FMT_R | VTDSS_FMT_W | + FIELD_PREP(VTDSS_FMT_OA, log2_div(table_pa, PT_GRANUAL_LG2SZ)); + return pt_table_install64(&tablep[pts->index], entry, pts->entry); +} +#define pt_install_table vtdss_pt_install_table + +static inline void vtdss_pt_attr_from_entry(const struct pt_state *pts, + struct pt_write_attrs *attrs) +{ + attrs->descriptor_bits = pts->entry & (VTDSS_FMT_R | VTDSS_FMT_W | + VTDSS_FMT_X | VTDSS_FMT_SNP); +} +#define pt_attr_from_entry vtdss_pt_attr_from_entry + +static inline void vtdss_pt_clear_entry(struct pt_state *pts, + unsigned int num_contig_lg2) +{ + u64 *tablep = pt_cur_table(pts, u64); + + WRITE_ONCE(tablep[pts->index], 0); +} +#define pt_clear_entry vtdss_pt_clear_entry + +/* --- iommu */ +#include +#include + +#define pt_iommu_table pt_iommu_vtdss + +/* The common struct is in the per-format common struct */ +static inline struct pt_common *common_from_iommu(struct pt_iommu *iommu_table) +{ + return &container_of(iommu_table, struct pt_iommu_table, iommu) + ->vtdss_pt.common; +} + +static inline struct pt_iommu *iommu_from_common(struct pt_common *common) +{ + return &container_of(common, struct pt_iommu_table, vtdss_pt.common) + ->iommu; +} + +static inline int vtdss_pt_iommu_set_prot(struct pt_common *common, + struct pt_write_attrs *attrs, + unsigned int iommu_prot) +{ + u64 pte = 0; + + /* + * VTDSS does not have a present bit, so we tell if any entry is present + * by checking for R or W. + */ + if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE))) + return -EINVAL; + + /* + * FIXME: The VTD driver has a bug setting DMA_FL_PTE_PRESENT on the SS + * table, which forces R on always. + */ + pte |= VTDSS_FMT_R; + + if (iommu_prot & IOMMU_READ) + pte |= VTDSS_FMT_R; + if (iommu_prot & IOMMU_WRITE) + pte |= VTDSS_FMT_W; +/* FIXME if (dmar_domain->set_pte_snp) + pte |= VTDSS_FMT_SNP; */ + + attrs->descriptor_bits = pte; + return 0; +} +#define pt_iommu_set_prot vtdss_pt_iommu_set_prot + +static inline int vtdss_pt_iommu_fmt_init(struct pt_iommu_vtdss *iommu_table, + struct pt_iommu_vtdss_cfg *cfg) +{ + struct pt_vtdss *table = &iommu_table->vtdss_pt; + + /* FIXME configurable */ + pt_top_set_level(&table->common, 3); + return 0; +} +#define pt_iommu_fmt_init vtdss_pt_iommu_fmt_init + +#if defined(GENERIC_PT_KUNIT) +static void vtdss_pt_kunit_setup_cfg(struct pt_iommu_vtdss_cfg *cfg) +{ +} +#define pt_kunit_setup_cfg vtdss_pt_kunit_setup_cfg +#endif + +/* + * Requires Tina's series: + * https://patch.msgid.link/r/20231106071226.9656-3-tina.zhang@intel.com + * See my github for an integrated version + */ +#if defined(GENERIC_PT_KUNIT) && IS_ENABLED(CONFIG_IOMMU_IO_PGTABLE_VTD) +#include + +static struct io_pgtable_ops * +vtdss_pt_iommu_alloc_io_pgtable(struct pt_iommu_vtdss_cfg *cfg, + struct device *iommu_dev, + struct io_pgtable_cfg **unused_pgtbl_cfg) +{ + struct io_pgtable_cfg pgtbl_cfg = {}; + + pgtbl_cfg.ias = 48; + pgtbl_cfg.oas = 52; + pgtbl_cfg.vtd_cfg.cap_reg = 4 << 8; + pgtbl_cfg.vtd_cfg.ecap_reg = BIT(26) | BIT(60) | BIT_ULL(48) | BIT_ULL(56); + pgtbl_cfg.pgsize_bitmap = SZ_4K; + pgtbl_cfg.coherent_walk = true; + return alloc_io_pgtable_ops(INTEL_IOMMU, &pgtbl_cfg, NULL); +} +#define pt_iommu_alloc_io_pgtable vtdss_pt_iommu_alloc_io_pgtable + +static void vtdss_pt_iommu_setup_ref_table(struct pt_iommu_vtdss *iommu_table, + struct io_pgtable_ops *pgtbl_ops) +{ + struct io_pgtable_cfg *pgtbl_cfg = + &io_pgtable_ops_to_pgtable(pgtbl_ops)->cfg; + struct pt_common *common = &iommu_table->vtdss_pt.common; + 
pt_top_set(common, __va(pgtbl_cfg->vtd_cfg.pgd), 3); +} +#define pt_iommu_setup_ref_table vtdss_pt_iommu_setup_ref_table + +static u64 vtdss_pt_kunit_cmp_mask_entry(struct pt_state *pts) +{ + if (pts->type == PT_ENTRY_TABLE) + return pts->entry & (~(u64)(VTDSS_FMT_OA)); + return pts->entry; +} +#define pt_kunit_cmp_mask_entry vtdss_pt_kunit_cmp_mask_entry +#endif + +#endif diff --git a/include/linux/generic_pt/common.h b/include/linux/generic_pt/common.h index 558302fe1e0324..a3469132db7dda 100644 --- a/include/linux/generic_pt/common.h +++ b/include/linux/generic_pt/common.h @@ -145,6 +145,10 @@ enum { PT_FEAT_DART_V2 = PT_FEAT_FMT_START, }; +struct pt_vtdss { + struct pt_common common; +}; + struct pt_x86pae { struct pt_common common; }; diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h index 351a69fe62dd1d..b9ecab07b0223d 100644 --- a/include/linux/generic_pt/iommu.h +++ b/include/linux/generic_pt/iommu.h @@ -317,6 +317,18 @@ struct pt_iommu_dart_cfg { int pt_iommu_dart_init(struct pt_iommu_dart *table, struct pt_iommu_dart_cfg *cfg, gfp_t gfp); +struct pt_iommu_vtdss { + struct pt_iommu iommu; + struct pt_vtdss vtdss_pt; +}; + +struct pt_iommu_vtdss_cfg { + struct device *iommu_device; + unsigned int features; +}; +int pt_iommu_vtdss_init(struct pt_iommu_vtdss *table, + struct pt_iommu_vtdss_cfg *cfg, gfp_t gfp); + struct pt_iommu_x86pae { struct pt_iommu iommu; struct pt_x86pae x86pae_pt;