From patchwork Wed Feb  1 12:53:03 2023
X-Patchwork-Submitter: Jean-Philippe Brucker
X-Patchwork-Id: 13124418
From: Jean-Philippe Brucker
To: maz@kernel.org, catalin.marinas@arm.com, will@kernel.org, joro@8bytes.org
Cc: robin.murphy@arm.com, james.morse@arm.com, suzuki.poulose@arm.com,
 oliver.upton@linux.dev, yuzenghui@huawei.com, smostafa@google.com,
 dbrazdil@google.com, ryan.roberts@arm.com,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 iommu@lists.linux.dev, Jean-Philippe Brucker
Subject: [RFC PATCH 19/45] KVM: arm64: iommu: Add domains
Date: Wed, 1 Feb 2023 12:53:03 +0000
Message-Id: <20230201125328.2186498-20-jean-philippe@linaro.org>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230201125328.2186498-1-jean-philippe@linaro.org>
References: <20230201125328.2186498-1-jean-philippe@linaro.org>

The IOMMU domain abstraction allows the same page tables to be shared by
multiple devices. That may be necessary due to hardware constraints, when
multiple devices cannot be isolated by the IOMMU (on a conventional PCI
bus, for example). It may also help with optimizing resource or TLB use.
For pKVM in particular, it may be useful to reduce the amount of memory
required for page tables. All devices owned by the host kernel could be
attached to the same domain (though that requires host changes).

Each IOMMU device holds an array of domains, and the host allocates
domain IDs that index this array. The alloc() operation initializes the
domain and prepares the page tables. The attach() operation initializes
the device table entry that holds the PGD and its configuration.
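To illustrate the intended flow (not part of this patch), a host-side
caller could set up a domain and attach one endpoint as sketched below.
In the full series the host reaches these operations through hypercalls;
the wrapper here is hypothetical and calls the hyp entry points added by
this patch directly, with minimal error handling:

/*
 * Hypothetical usage sketch: domain_id is chosen by the host, pgd_hva
 * points to host memory donated for the PGD.
 */
static int iommu_domain_setup_example(pkvm_handle_t iommu_id,
				      pkvm_handle_t domain_id,
				      unsigned long pgd_hva, u32 endpoint_id)
{
	int ret;

	/* Initialize the domain and its page tables; refs becomes 1 */
	ret = kvm_iommu_alloc_domain(iommu_id, domain_id, pgd_hva);
	if (ret)
		return ret;

	/* Fill the endpoint's device table entry; takes a reference */
	ret = kvm_iommu_attach_dev(iommu_id, domain_id, endpoint_id);
	if (ret)
		kvm_iommu_free_domain(iommu_id, domain_id);
	return ret;
}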
Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Mostafa Saleh
---
 arch/arm64/kvm/hyp/include/nvhe/iommu.h |  16 +++
 include/kvm/iommu.h                     |  55 ++++++++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 161 ++++++++++++++++++++++++
 3 files changed, 232 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 4959c30977b8..76d3fa6ce331 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -2,8 +2,12 @@
 #ifndef __ARM64_KVM_NVHE_IOMMU_H__
 #define __ARM64_KVM_NVHE_IOMMU_H__
 
+#include
+#include
+
 #if IS_ENABLED(CONFIG_KVM_IOMMU)
 int kvm_iommu_init(void);
+int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu);
 void *kvm_iommu_donate_page(void);
 void kvm_iommu_reclaim_page(void *p);
 
@@ -74,8 +78,20 @@ static inline phys_addr_t kvm_iommu_iova_to_phys(pkvm_handle_t iommu_id,
 }
 #endif /* CONFIG_KVM_IOMMU */
 
+struct kvm_iommu_tlb_cookie {
+	struct kvm_hyp_iommu	*iommu;
+	pkvm_handle_t		domain_id;
+};
+
 struct kvm_iommu_ops {
 	int (*init)(void);
+	struct kvm_hyp_iommu *(*get_iommu_by_id)(pkvm_handle_t smmu_id);
+	int (*alloc_iopt)(struct io_pgtable *iopt, unsigned long pgd_hva);
+	int (*free_iopt)(struct io_pgtable *iopt);
+	int (*attach_dev)(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
+			  struct kvm_hyp_iommu_domain *domain, u32 endpoint_id);
+	int (*detach_dev)(struct kvm_hyp_iommu *iommu, pkvm_handle_t domain_id,
+			  struct kvm_hyp_iommu_domain *domain, u32 endpoint_id);
 };
 
 extern struct kvm_iommu_ops kvm_iommu_ops;

diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
index 12b06a5df889..2bbe5f7bf726 100644
--- a/include/kvm/iommu.h
+++ b/include/kvm/iommu.h
@@ -3,6 +3,23 @@
 #define __KVM_IOMMU_H
 
 #include
+#include
+
+/*
+ * Parameters from the trusted host:
+ * @pgtable_cfg: page table configuration
+ * @domains: root domain table
+ * @nr_domains: max number of domains (exclusive)
+ *
+ * Other members are filled and used at runtime by the IOMMU driver.
+ */
+struct kvm_hyp_iommu {
+	struct io_pgtable_cfg		pgtable_cfg;
+	void				**domains;
+	size_t				nr_domains;
+
+	struct io_pgtable_params	*pgtable;
+};
 
 struct kvm_hyp_iommu_memcache {
 	struct kvm_hyp_memcache	pages;
@@ -12,4 +29,42 @@ struct kvm_hyp_iommu_memcache {
 extern struct kvm_hyp_iommu_memcache *kvm_nvhe_sym(kvm_hyp_iommu_memcaches);
 #define kvm_hyp_iommu_memcaches kvm_nvhe_sym(kvm_hyp_iommu_memcaches)
 
+struct kvm_hyp_iommu_domain {
+	void		*pgd;
+	u32		refs;
+};
+
+/*
+ * At the moment the number of domains is limited by the ASID and VMID size on
+ * Arm. With single-stage translation, that size is 2^8 or 2^16. On a lot of
+ * platforms the number of devices is actually the limiting factor and we'll
+ * only need a handful of domains, but with PASID or SR-IOV support that limit
+ * can be reached.
+ *
+ * In practice we're rarely going to need a lot of domains. To avoid allocating
+ * a large domain table, we use a two-level table, indexed by domain ID. With
+ * 4kB pages and 16-byte domains, the leaf table contains 256 domains, and the
+ * root table 256 pointers. With 64kB pages, the leaf table contains 4096
+ * domains and the root table 16 pointers. In this case, or when using 8-bit
+ * VMIDs, it may be more advantageous to use a single level. But using two
+ * levels makes it easy to extend the maximum number of domains.
+ */
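+/*
+ * Worked example (illustrative, derived from the numbers above): with 4kB
+ * pages, KVM_IOMMU_DOMAINS_PER_PAGE = 4096 / 16 = 256, so
+ * KVM_IOMMU_DOMAIN_ID_SPLIT = ilog2(256) = 8. Domain ID 0x1234 then selects
+ * root entry 0x1234 >> 8 = 0x12 and leaf entry 0x1234 & 0xff = 0x34.
+ */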
+#define KVM_IOMMU_MAX_DOMAINS			(1 << 16)
+
+/* Number of entries in the level-2 domain table */
+#define KVM_IOMMU_DOMAINS_PER_PAGE \
+	(PAGE_SIZE / sizeof(struct kvm_hyp_iommu_domain))
+
+/* Number of entries in the root domain table */
+#define KVM_IOMMU_DOMAINS_ROOT_ENTRIES \
+	(KVM_IOMMU_MAX_DOMAINS / KVM_IOMMU_DOMAINS_PER_PAGE)
+
+#define KVM_IOMMU_DOMAINS_ROOT_SIZE \
+	(KVM_IOMMU_DOMAINS_ROOT_ENTRIES * sizeof(void *))
+
+/* Bits [16:split] index the root table, bits [split-1:0] index the leaf table */
+#define KVM_IOMMU_DOMAIN_ID_SPLIT	ilog2(KVM_IOMMU_DOMAINS_PER_PAGE)
+
+#define KVM_IOMMU_DOMAIN_ID_LEAF_MASK	((1 << KVM_IOMMU_DOMAIN_ID_SPLIT) - 1)
+
 #endif /* __KVM_IOMMU_H */

diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index 1a9184fbbd27..7404ea77ed9f 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -13,6 +13,22 @@
 
 struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
 
+/*
+ * Serialize access to domains and IOMMU driver internal structures (command
+ * queue, device tables)
+ */
+static hyp_spinlock_t iommu_lock;
+
+#define domain_to_iopt(_iommu, _domain, _domain_id)		\
+	(struct io_pgtable) {					\
+		.ops = &(_iommu)->pgtable->ops,			\
+		.pgd = (_domain)->pgd,				\
+		.cookie = &(struct kvm_iommu_tlb_cookie) {	\
+			.iommu = (_iommu),			\
+			.domain_id = (_domain_id),		\
+		},						\
+	}
+
 void *kvm_iommu_donate_page(void)
 {
 	void *p;
@@ -41,10 +57,155 @@ void kvm_iommu_reclaim_page(void *p)
 				 PAGE_SIZE);
 }
 
+static struct kvm_hyp_iommu_domain *
+handle_to_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+		 struct kvm_hyp_iommu **out_iommu)
+{
+	int idx;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domains;
+
+	iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
+	if (!iommu)
+		return NULL;
+
+	if (domain_id >= iommu->nr_domains)
+		return NULL;
+	domain_id = array_index_nospec(domain_id, iommu->nr_domains);
+
+	idx = domain_id >> KVM_IOMMU_DOMAIN_ID_SPLIT;
+	domains = iommu->domains[idx];
+	if (!domains) {
+		domains = kvm_iommu_donate_page();
+		if (!domains)
+			return NULL;
+		iommu->domains[idx] = domains;
+	}
+
+	*out_iommu = iommu;
+	return &domains[domain_id & KVM_IOMMU_DOMAIN_ID_LEAF_MASK];
+}
+
+int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			   unsigned long pgd_hva)
+{
+	int ret = -EINVAL;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain)
+		goto out_unlock;
+
+	if (domain->refs)
+		goto out_unlock;
+
+	iopt = domain_to_iopt(iommu, domain, domain_id);
+	ret = kvm_iommu_ops.alloc_iopt(&iopt, pgd_hva);
+	if (ret)
+		goto out_unlock;
+
+	domain->refs = 1;
+	domain->pgd = iopt.pgd;
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
+{
+	int ret = -EINVAL;
+	struct io_pgtable iopt;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain)
+		goto out_unlock;
+
+	if (domain->refs != 1)
+		goto out_unlock;
+
+	iopt = domain_to_iopt(iommu, domain, domain_id);
+	ret = kvm_iommu_ops.free_iopt(&iopt);
+
+	memset(domain, 0, sizeof(*domain));
+
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
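+/*
+ * Domain refcounting: alloc sets refs to 1 and each attached endpoint takes
+ * an extra reference, so a domain can only be freed once refs drops back to
+ * 1, that is, after every endpoint has been detached.
+ */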
+int kvm_iommu_attach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id)
+{
+	int ret = -EINVAL;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain || !domain->refs || domain->refs == UINT_MAX)
+		goto out_unlock;
+
+	ret = kvm_iommu_ops.attach_dev(iommu, domain_id, domain, endpoint_id);
+	if (ret)
+		goto out_unlock;
+
+	domain->refs++;
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_detach_dev(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
+			 u32 endpoint_id)
+{
+	int ret = -EINVAL;
+	struct kvm_hyp_iommu *iommu;
+	struct kvm_hyp_iommu_domain *domain;
+
+	hyp_spin_lock(&iommu_lock);
+	domain = handle_to_domain(iommu_id, domain_id, &iommu);
+	if (!domain || domain->refs <= 1)
+		goto out_unlock;
+
+	ret = kvm_iommu_ops.detach_dev(iommu, domain_id, domain, endpoint_id);
+	if (ret)
+		goto out_unlock;
+
+	domain->refs--;
+out_unlock:
+	hyp_spin_unlock(&iommu_lock);
+	return ret;
+}
+
+int kvm_iommu_init_device(struct kvm_hyp_iommu *iommu)
+{
+	void *domains;
+
+	domains = iommu->domains;
+	iommu->domains = kern_hyp_va(domains);
+	return pkvm_create_mappings(iommu->domains, iommu->domains +
+				    KVM_IOMMU_DOMAINS_ROOT_ENTRIES, PAGE_HYP);
+}
+
 int kvm_iommu_init(void)
 {
 	enum kvm_pgtable_prot prot;
 
+	hyp_spin_lock_init(&iommu_lock);
+
+	if (WARN_ON(!kvm_iommu_ops.get_iommu_by_id ||
+		    !kvm_iommu_ops.alloc_iopt ||
+		    !kvm_iommu_ops.free_iopt ||
+		    !kvm_iommu_ops.attach_dev ||
+		    !kvm_iommu_ops.detach_dev))
+		return -ENODEV;
+
 	/* The memcache is shared with the host */
 	prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_OWNED);
 	return pkvm_create_mappings(kvm_hyp_iommu_memcaches,