From patchwork Tue Mar 25 10:28:32 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tip-bot2 for David Hildenbrand
X-Patchwork-Id: 14028302
Date: Tue, 25 Mar 2025 10:28:32 -0000
From: "tip-bot2 for David Hildenbrand"
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/urgent] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
Cc: xingwei lee, yuxin wang, Marius Fleischer, David Hildenbrand,
 Ingo Molnar, Andy Lutomirski, Peter Zijlstra, Rik van Riel,
 "H. Peter Anvin", Linus Torvalds, Andrew Morton, linux-mm@kvack.org,
 x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20250321112323.153741-1-david@redhat.com>
References: <20250321112323.153741-1-david@redhat.com>
Message-ID: <174289851258.14745.6063394229146983112.tip-bot2@tip-bot2>

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     5bd79e2ed005fdedc7670fb6deaae1552aa8b694
Gitweb:        https://git.kernel.org/tip/5bd79e2ed005fdedc7670fb6deaae1552aa8b694
Author:        David Hildenbrand
AuthorDate:    Fri, 21 Mar 2025 12:23:23 +01:00
Committer:     Ingo Molnar
CommitterDate: Tue, 25 Mar 2025 11:14:15 +01:00

x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()

If track_pfn_copy() fails, we already added the dst VMA to the maple
tree. As fork() fails, we'll clean up the maple tree, and stumble over
the dst VMA for which we neither performed any reservation nor copied
any page tables.

Consequently untrack_pfn() will see VM_PAT and try obtaining the
PAT information from the page table -- which fails because the page
table was not copied.

The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
if track_pfn_copy() fails. However, the whole approach of "simply"
clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy()
and performed a reservation, but copying the page tables fails, we'll
simply clear the VM_PAT flag, not properly undoing the reservation ...
which is also wrong.

So let's fix it properly: set the VM_PAT flag only if the reservation
succeeded (leaving it clear initially), and undo the reservation if
anything goes wrong while copying the page tables: clearing the VM_PAT
flag after undoing the reservation.

Note that any copied page table entries will get zapped when the VMA
gets removed later, after copy_page_range() succeeded; as VM_PAT is not
set then, we won't try cleaning VM_PAT up once more and untrack_pfn()
will be happy. Note that leaving these page tables in place without a
reservation is not a problem, as we are aborting fork(); this process
will never run.
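
The ordering described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not the kernel
code: struct vma_model, nr_reservations, reserve_model(),
track_copy_model(), untrack_copy_model() and fork_copy_model() are all
hypothetical names, the flag values are made up, and a plain counter
stands in for the PAT reservation bookkeeping. It shows the two failure
points (the reservation fails, or copying the page tables fails) and
that in both cases the dst VMA ends up without VM_PAT and without a
leaked reservation.

```c
#include <assert.h>
#include <stdbool.h>

#define VM_PFNMAP 0x1u	/* model flag values, not the kernel's */
#define VM_PAT    0x2u

struct vma_model {
	unsigned int flags;
	unsigned long pfn;	/* first pfn of the mapping */
};

static int nr_reservations;	/* stands in for the PAT reservation state */

/* Models reserve_pfn_range(): fails on demand. */
static int reserve_model(bool fail)
{
	if (fail)
		return -1;
	nr_reservations++;
	return 0;
}

/* Models track_pfn_copy(): VM_PAT is set on dst only after a reservation. */
static int track_copy_model(struct vma_model *dst, const struct vma_model *src,
			    unsigned long *pfn, bool fail_reserve)
{
	if (!(src->flags & VM_PAT))
		return 0;
	if (reserve_model(fail_reserve))
		return -1;
	dst->flags |= VM_PAT;
	*pfn = src->pfn;
	return 0;
}

/* Models untrack_pfn_copy(): drop the reservation, then clear VM_PAT. */
static void untrack_copy_model(struct vma_model *dst, unsigned long pfn)
{
	(void)pfn;		/* the model keeps no per-pfn state */
	if (!(dst->flags & VM_PAT))
		return;		/* nothing was reserved for dst */
	assert(nr_reservations > 0);
	nr_reservations--;
	dst->flags &= ~VM_PAT;
}

/* Models duplicating one VMA and copying its page tables during fork(). */
static int fork_copy_model(struct vma_model *dst, const struct vma_model *src,
			   bool fail_reserve, bool fail_copy)
{
	unsigned long pfn = 0;
	int ret;

	*dst = *src;
	if (dst->flags & VM_PFNMAP)
		dst->flags &= ~VM_PAT;	/* dst starts without VM_PAT */

	if (src->flags & VM_PFNMAP) {
		ret = track_copy_model(dst, src, &pfn, fail_reserve);
		if (ret)
			return ret;
	}

	ret = fail_copy ? -1 : 0;	/* stands in for copying page tables */
	if (ret && (src->flags & VM_PFNMAP))
		untrack_copy_model(dst, pfn);
	return ret;
}
```

Calling fork_copy_model() with fail_reserve set returns an error before
VM_PAT is ever set on dst; calling it with fail_copy set takes and then
releases one reservation. Only the fully successful call leaves dst with
VM_PAT set and one additional reservation held.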

A reproducer can trigger this usually at the first try:

  https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c

  WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
  Modules linked in: ...
  CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:get_pat_info+0xf6/0x110
  ...
  Call Trace:
   ...
   untrack_pfn+0x52/0x110
   unmap_single_vma+0xa6/0xe0
   unmap_vmas+0x105/0x1f0
   exit_mmap+0xf6/0x460
   __mmput+0x4b/0x120
   copy_process+0x1bf6/0x2aa0
   kernel_clone+0xab/0x440
   __do_sys_clone+0x66/0x90
   do_syscall_64+0x95/0x180

Likely this case was missed in:

  d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")

... and instead of undoing the reservation we simply cleared the VM_PAT flag.

Keep the documentation of these functions in include/linux/pgtable.h,
one place is more than sufficient -- we should clean that up for the other
functions like track_pfn_remap/untrack_pfn separately.

Fixes: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
Fixes: 2ab640379a0a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
Reported-by: xingwei lee
Reported-by: yuxin wang
Reported-by: Marius Fleischer
Signed-off-by: David Hildenbrand
Signed-off-by: Ingo Molnar
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: "H. Peter Anvin"
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20250321112323.153741-1-david@redhat.com
Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/
Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/
---
 arch/x86/mm/pat/memtype.c | 43 ++++++++++++++++----------------------
 include/linux/pgtable.h   | 31 +++++++++++++++++++++------
 kernel/fork.c             |  4 ++++-
 mm/memory.c               | 11 +++-------
 4 files changed, 52 insertions(+), 37 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index e40861c..35ceccb 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -984,26 +984,30 @@ static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
 	return -EINVAL;
 }
 
-/*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
- *
- * If the vma has a linear pfn mapping for the entire range, we get the prot
- * from pte and reserve the entire vma range with single reserve_pfn_range call.
- */
-int track_pfn_copy(struct vm_area_struct *vma)
+int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
+	const unsigned long vma_size = src_vma->vm_end - src_vma->vm_start;
 	resource_size_t paddr;
-	unsigned long vma_size = vma->vm_end - vma->vm_start;
 	pgprot_t pgprot;
+	int rc;
 
-	if (vma->vm_flags & VM_PAT) {
-		if (get_pat_info(vma, &paddr, &pgprot))
-			return -EINVAL;
-		/* reserve the whole chunk covered by vma. */
-		return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
-	}
+	if (!(src_vma->vm_flags & VM_PAT))
+		return 0;
 
+	/*
+	 * Duplicate the PAT information for the dst VMA based on the src
+	 * VMA.
+	 */
+	if (get_pat_info(src_vma, &paddr, &pgprot))
+		return -EINVAL;
+	rc = reserve_pfn_range(paddr, vma_size, &pgprot, 1);
+	if (rc)
+		return rc;
+
+	/* Reservation for the destination VMA succeeded. */
+	vm_flags_set(dst_vma, VM_PAT);
+	*pfn = PHYS_PFN(paddr);
 	return 0;
 }
 
@@ -1095,15 +1099,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 	}
 }
 
-/*
- * untrack_pfn_clear is called if the following situation fits:
- *
- * 1) while mremapping a pfnmap for a new region, with the old vma after
- * its pfnmap page table has been removed. The new vma has a new pfnmap
- * to the same pfn & cache type with VM_PAT set.
- * 2) while duplicating vm area, the new vma fails to copy the pgtable from
- * old vma.
- */
 void untrack_pfn_clear(struct vm_area_struct *vma)
 {
 	vm_flags_clear(vma, VM_PAT);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d..df2aff5 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1508,10 +1508,12 @@ static inline void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 }
 
 /*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
+ * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
+ * tables copied during copy_page_range(). On success, stores the pfn to be
+ * passed to untrack_pfn_copy().
  */
-static inline int track_pfn_copy(struct vm_area_struct *vma)
+static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
 	return 0;
 }
@@ -1528,8 +1530,10 @@ static inline void untrack_pfn(struct vm_area_struct *vma,
 }
 
 /*
- * untrack_pfn_clear is called while mremapping a pfnmap for a new region
- * or fails to copy pgtable during duplicate vm area.
+ * untrack_pfn_clear is called in the following cases on a VM_PFNMAP VMA:
+ *
+ * 1) During mremap() on the src VMA after the page tables were moved.
+ * 2) During fork() on the dst VMA, immediately after duplicating the src VMA.
  */
 static inline void untrack_pfn_clear(struct vm_area_struct *vma)
 {
@@ -1540,12 +1544,27 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 			   unsigned long size);
 extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 			     pfn_t pfn);
-extern int track_pfn_copy(struct vm_area_struct *vma);
+extern int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 			unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_clear(struct vm_area_struct *vma);
 #endif
 
+/*
+ * untrack_pfn_copy is called when a VM_PFNMAP VMA failed to copy during
+ * copy_page_range(), but after track_pfn_copy() was already called.
+ */
+static inline void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn)
+{
+	untrack_pfn(dst_vma, pfn, dst_vma->vm_end - dst_vma->vm_start, true);
+	/*
+	 * Reservation was freed, any copied page tables will get cleaned
+	 * up later, but without getting PAT involved again.
+	 */
+}
+
 #ifdef CONFIG_MMU
 #ifdef __HAVE_COLOR_ZERO_PAGE
 static inline int is_zero_pfn(unsigned long pfn)
diff --git a/kernel/fork.c b/kernel/fork.c
index f11ac96..91171e5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -504,6 +504,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
+	/* track_pfn_copy() will later take care of copying internal state. */
+	if (unlikely(new->vm_flags & VM_PFNMAP))
+		untrack_pfn_clear(new);
+
 	return new;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 4f6d976..53f7b0a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1362,12 +1362,12 @@ int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	pgd_t *src_pgd, *dst_pgd;
-	unsigned long next;
 	unsigned long addr = src_vma->vm_start;
 	unsigned long end = src_vma->vm_end;
 	struct mm_struct *dst_mm = dst_vma->vm_mm;
 	struct mm_struct *src_mm = src_vma->vm_mm;
 	struct mmu_notifier_range range;
+	unsigned long next, pfn;
 	bool is_cow;
 	int ret;
 
@@ -1378,11 +1378,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
 
 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
-		/*
-		 * We do not free on error cases below as remove_vma
-		 * gets called on error from higher level routine
-		 */
-		ret = track_pfn_copy(src_vma);
+		ret = track_pfn_copy(dst_vma, src_vma, &pfn);
 		if (ret)
 			return ret;
 	}
@@ -1419,7 +1415,6 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 			continue;
 		if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
 					    addr, next))) {
-			untrack_pfn_clear(dst_vma);
 			ret = -ENOMEM;
 			break;
 		}
@@ -1429,6 +1424,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		raw_write_seqcount_end(&src_mm->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
 	}
+	if (ret && unlikely(src_vma->vm_flags & VM_PFNMAP))
+		untrack_pfn_copy(dst_vma, pfn);
 	return ret;
 }
 