From patchwork Tue Mar 18 03:59:27 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xu Lu <luxu.kernel@bytedance.com>
X-Patchwork-Id: 14020270
From: Xu Lu <luxu.kernel@bytedance.com>
To: akpm@linux-foundation.org, jhubbard@nvidia.com,
 kirill.shutemov@linux.intel.com, tjeznach@rivosinc.com, joro@8bytes.org,
 will@kernel.org, robin.murphy@arm.com
Cc: lihangjing@bytedance.com, xieyongji@bytedance.com,
 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Xu Lu <luxu.kernel@bytedance.com>
Subject: [PATCH RESEND v2 1/4] mm/gup: Add huge pte handling logic in
 follow_page_pte()
Date: Tue, 18 Mar 2025 11:59:27 +0800
Message-Id: <20250318035930.11855-2-luxu.kernel@bytedance.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250318035930.11855-1-luxu.kernel@bytedance.com>
References: <20250318035930.11855-1-luxu.kernel@bytedance.com>

A page mapped at pte level can also be a huge page when ARM CONT_PTE or
RISC-V SVNAPOT is applied. The lack of huge pte handling logic in
follow_page_pte() may lead to both performance and correctness issues.

For example, on the RISC-V platform, all ptes within the same 64K huge
page have the same value, so follow_page_pte() resolves every address in
the range to the same page when it uses pte_pfn(). __get_user_pages()
then returns an array of pages that all have the same pfn, and mapping
these pages causes memory confusion. The error can be triggered by the
following code:

	void *addr = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
			  MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | MAP_HUGE_64KB,
			  -1, 0);

	struct vfio_iommu_type1_dma_map dma_map = {
		.argsz = sizeof(dma_map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uint64_t)addr,
		.size = 0x10000,
	};

	ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);

This commit adds huge pte handling logic to follow_page_pte() to avoid
such problems.

Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 arch/riscv/include/asm/pgtable.h |  6 ++++++
 include/linux/pgtable.h          |  8 ++++++++
 mm/gup.c                         | 17 +++++++++++------
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 050fdc49b5ad7..40ae5979dd82c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -800,6 +800,12 @@ static inline bool pud_user_accessible_page(pud_t pud)
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define pte_trans_huge pte_trans_huge
+static inline int pte_trans_huge(pte_t pte)
+{
+	return pte_huge(pte) && pte_napot(pte);
+}
+
 static inline int pmd_trans_huge(pmd_t pmd)
 {
 	return pmd_leaf(pmd);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372e..3f57ee6dcf017 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1584,6 +1584,14 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 
 #ifdef CONFIG_MMU
 
+#if (defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(pte_trans_huge)) || \
+	(!defined(CONFIG_TRANSPARENT_HUGEPAGE))
+static inline int pte_trans_huge(pte_t pte)
+{
+	return 0;
+}
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_trans_huge(pmd_t pmd)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 3883b307780ea..67981ee28df86 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -838,7 +838,7 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 
 static struct page *follow_page_pte(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmd, unsigned int flags,
-		struct dev_pagemap **pgmap)
+		struct follow_page_context *ctx)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct folio *folio;
@@ -879,8 +879,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 * case since they are only valid while holding the pgmap
 		 * reference.
 		 */
-		*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
-		if (*pgmap)
+		ctx->pgmap = get_dev_pagemap(pte_pfn(pte), ctx->pgmap);
+		if (ctx->pgmap)
 			page = pte_page(pte);
 		else
 			goto no_page;
@@ -940,6 +940,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 */
 		folio_mark_accessed(folio);
 	}
+	if (is_vm_hugetlb_page(vma) || pte_trans_huge(pte)) {
+		ctx->page_mask = (1 << folio_order(folio)) - 1;
+		page = folio_page(folio, 0) +
+		       ((address & (folio_size(folio) - 1)) >> PAGE_SHIFT);
+	}
 out:
 	pte_unmap_unlock(ptep, ptl);
 	return page;
@@ -975,7 +980,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		return no_page_table(vma, flags, address);
 	}
 	if (likely(!pmd_leaf(pmdval)))
-		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+		return follow_page_pte(vma, address, pmd, flags, ctx);
 
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
 		return no_page_table(vma, flags, address);
@@ -988,14 +993,14 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	}
 	if (unlikely(!pmd_leaf(pmdval))) {
 		spin_unlock(ptl);
-		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+		return follow_page_pte(vma, address, pmd, flags, ctx);
 	}
 	if (pmd_trans_huge(pmdval) && (flags & FOLL_SPLIT_PMD)) {
 		spin_unlock(ptl);
 		split_huge_pmd(vma, pmd, address);
 		/* If pmd was left empty, stuff a page table in there quickly */
 		return pte_alloc(mm, pmd) ? ERR_PTR(-ENOMEM) :
-			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+			follow_page_pte(vma, address, pmd, flags, ctx);
 	}
 	page = follow_huge_pmd(vma, address, pmd, flags, ctx);
 	spin_unlock(ptl);