From patchwork Tue Mar 18 03:59:29 2025
From: Xu Lu <luxu.kernel@bytedance.com>
To: akpm@linux-foundation.org, jhubbard@nvidia.com, kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Xu Lu <luxu.kernel@bytedance.com>
Subject: [PATCH RESEND v2 3/4] iommu/riscv: Introduce IOMMU page table lock
Date: Tue, 18 Mar 2025 11:59:29 +0800
Message-Id: <20250318035930.11855-4-luxu.kernel@bytedance.com>
In-Reply-To: <20250318035930.11855-1-luxu.kernel@bytedance.com>
References: <20250318035930.11855-1-luxu.kernel@bytedance.com>
MIME-Version: 1.0
Introduce a page table lock to protect against races when multiple PTEs are modified concurrently, for example when applying Svnapot. Fine-grained per-page-table locks are used to minimize lock contention.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 drivers/iommu/riscv/iommu.c | 123 +++++++++++++++++++++++++++++++-----
 1 file changed, 107 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index 3b0c934decd08..ce4cf6569ffb4 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -808,6 +808,7 @@ struct riscv_iommu_domain {
 	struct iommu_domain domain;
 	struct list_head bonds;
 	spinlock_t lock; /* protect bonds list updates. */
+	spinlock_t page_table_lock; /* protect page table updates. */
 	int pscid;
 	bool amo_enabled;
 	int numa_node;
@@ -1086,8 +1087,80 @@ static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain,
 #define _io_pte_none(pte)	(pte_val(pte) == 0)
 #define _io_pte_entry(pn, prot)	(__pte((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)))
 
+#define RISCV_IOMMU_PMD_LEVEL	1
+
+static bool riscv_iommu_ptlock_init(struct ptdesc *ptdesc, int level)
+{
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		return ptlock_init(ptdesc);
+	return true;
+}
+
+static void riscv_iommu_ptlock_free(struct ptdesc *ptdesc, int level)
+{
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		ptlock_free(ptdesc);
+}
+
+static spinlock_t *riscv_iommu_ptlock(struct riscv_iommu_domain *domain,
+				      pte_t *pte, int level)
+{
+	spinlock_t *ptl; /* page table page lock */
+
+#ifdef CONFIG_SPLIT_PTE_PTLOCKS
+	if (level <= RISCV_IOMMU_PMD_LEVEL)
+		ptl = ptlock_ptr(page_ptdesc(virt_to_page(pte)));
+	else
+#endif
+		ptl = &domain->page_table_lock;
+	spin_lock(ptl);
+
+	return ptl;
+}
+
+static void *riscv_iommu_alloc_pagetable_node(int numa_node, gfp_t gfp, int level)
+{
+	struct ptdesc *ptdesc;
+	void *addr;
+
+	addr = iommu_alloc_page_node(numa_node, gfp);
+	if (!addr)
+		return NULL;
+
+	ptdesc = page_ptdesc(virt_to_page(addr));
+	if (!riscv_iommu_ptlock_init(ptdesc, level)) {
+		iommu_free_page(addr);
+		addr = NULL;
+	}
+
+	return addr;
+}
+
+static void riscv_iommu_free_pagetable(void *addr, int level)
+{
+	struct ptdesc *ptdesc = page_ptdesc(virt_to_page(addr));
+
+	riscv_iommu_ptlock_free(ptdesc, level);
+	iommu_free_page(addr);
+}
+
+static int pgsize_to_level(size_t pgsize)
+{
+	int level = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 -
+		    RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	int shift = PAGE_SHIFT + PT_SHIFT * level;
+
+	while (pgsize < ((size_t)1 << shift)) {
+		shift -= PT_SHIFT;
+		level--;
+	}
+
+	return level;
+}
+
 static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
-				 pte_t pte, struct list_head *freelist)
+				 pte_t pte, int level,
+				 struct list_head *freelist)
 {
 	pte_t *ptr;
 	int i;
@@ -1102,10 +1175,11 @@ static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain,
 		pte = ptr[i];
 		if (!_io_pte_none(pte)) {
 			ptr[i] = __pte(0);
-			riscv_iommu_pte_free(domain, pte, freelist);
+			riscv_iommu_pte_free(domain, pte, level - 1, freelist);
 		}
 	}
 
+	riscv_iommu_ptlock_free(page_ptdesc(virt_to_page(ptr)), level);
 	if (freelist)
 		list_add_tail(&virt_to_page(ptr)->lru, freelist);
 	else
@@ -1117,8 +1191,9 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 				    gfp_t gfp)
 {
 	pte_t *ptr = domain->pgd_root;
-	pte_t pte, old;
+	pte_t pte;
 	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	spinlock_t *ptl; /* page table page lock */
 	void *addr;
 
 	do {
@@ -1146,14 +1221,21 @@ static pte_t *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain,
 		 * page table. This might race with other mappings, retry.
 		 */
 		if (_io_pte_none(pte)) {
-			addr = iommu_alloc_page_node(domain->numa_node, gfp);
+			addr = riscv_iommu_alloc_pagetable_node(domain->numa_node, gfp,
+								level - 1);
 			if (!addr)
 				return NULL;
-			old = ptep_get(ptr);
-			if (!_io_pte_none(old))
+
+			ptl = riscv_iommu_ptlock(domain, ptr, level);
+			pte = ptep_get(ptr);
+			if (!_io_pte_none(pte)) {
+				spin_unlock(ptl);
+				riscv_iommu_free_pagetable(addr, level - 1);
 				goto pte_retry;
+			}
 			pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE);
 			set_pte(ptr, pte);
+			spin_unlock(ptl);
 		}
 		ptr = (pte_t *)pfn_to_virt(pte_pfn(pte));
 	} while (level-- > 0);
@@ -1193,9 +1275,10 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	size_t size = 0;
 	pte_t *ptr;
-	pte_t pte, old;
+	pte_t pte;
 	unsigned long pte_prot;
-	int rc = 0;
+	int rc = 0, level;
+	spinlock_t *ptl; /* page table page lock */
 	LIST_HEAD(freelist);
 
 	if (!(prot & IOMMU_WRITE))
@@ -1212,11 +1295,12 @@ static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain,
 			break;
 		}
 
-		old = ptep_get(ptr);
+		level = pgsize_to_level(pgsize);
+		ptl = riscv_iommu_ptlock(domain, ptr, level);
+		riscv_iommu_pte_free(domain, ptep_get(ptr), level, &freelist);
 		pte = _io_pte_entry(phys_to_pfn(phys), pte_prot);
 		set_pte(ptr, pte);
-
-		riscv_iommu_pte_free(domain, old, &freelist);
+		spin_unlock(ptl);
 
 		size += pgsize;
 		iova += pgsize;
@@ -1251,6 +1335,7 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 	pte_t *ptr;
 	size_t unmapped = 0;
 	size_t pte_size;
+	spinlock_t *ptl; /* page table page lock */
 
 	while (unmapped < size) {
 		ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size);
@@ -1261,7 +1346,9 @@ static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain,
 		if (iova & (pte_size - 1))
 			return unmapped;
 
+		ptl = riscv_iommu_ptlock(domain, ptr, pgsize_to_level(pte_size));
 		set_pte(ptr, __pte(0));
+		spin_unlock(ptl);
 
 		iommu_iotlb_gather_add_page(&domain->domain, gather, iova,
 					    pte_size);
@@ -1291,13 +1378,14 @@ static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain)
 {
 	struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain);
 	const unsigned long pfn = virt_to_pfn(domain->pgd_root);
+	int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
 
 	WARN_ON(!list_empty(&domain->bonds));
 
 	if ((int)domain->pscid > 0)
 		ida_free(&riscv_iommu_pscids, domain->pscid);
 
-	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL);
+	riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), level, NULL);
 
 	kfree(domain);
 }
@@ -1358,7 +1446,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	struct riscv_iommu_device *iommu;
 	unsigned int pgd_mode;
 	dma_addr_t va_mask;
-	int va_bits;
+	int va_bits, level;
 
 	iommu = dev_to_iommu(dev);
 	if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) {
@@ -1381,11 +1469,14 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 
 	INIT_LIST_HEAD_RCU(&domain->bonds);
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->page_table_lock);
 	domain->numa_node = dev_to_node(iommu->dev);
 	domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD);
 	domain->pgd_mode = pgd_mode;
-	domain->pgd_root = iommu_alloc_page_node(domain->numa_node,
-						 GFP_KERNEL_ACCOUNT);
+	level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2;
+	domain->pgd_root = riscv_iommu_alloc_pagetable_node(domain->numa_node,
+							    GFP_KERNEL_ACCOUNT,
+							    level);
 	if (!domain->pgd_root) {
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
@@ -1394,7 +1485,7 @@ static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev)
 	domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1,
 					RISCV_IOMMU_MAX_PSCID, GFP_KERNEL);
 	if (domain->pscid < 0) {
-		iommu_free_page(domain->pgd_root);
+		riscv_iommu_free_pagetable(domain->pgd_root, level);
 		kfree(domain);
 		return ERR_PTR(-ENOMEM);
 	}