From patchwork Tue Mar 25 19:19:51 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 14029392
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, David Hildenbrand, xingwei lee,
    yuxin wang, Marius Fleischer, Ingo Molnar, Borislav Petkov,
    Dan Carpenter, Andrew Morton, Linus Torvalds, Dave Hansen,
    Andy Lutomirski, Peter Zijlstra, Rik van Riel, "H. Peter Anvin",
    Peter Xu
Subject: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
Date: Tue, 25 Mar 2025 20:19:51 +0100
Message-ID: <20250325191951.471185-1-david@redhat.com>
X-Mailer: git-send-email 2.48.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
If track_pfn_copy() fails, we already added the dst VMA to the maple
tree. As fork() fails, we'll cleanup the maple tree, and stumble over
the dst VMA for which we neither performed any reservation nor copied
any page tables.

Consequently untrack_pfn() will see VM_PAT and try obtaining the PAT
information from the page table -- which fails because the page table
was not copied.

The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
if track_pfn_copy() fails.
However, simply clearing the VM_PAT flag is shaky as well: if we passed
track_pfn_copy() and performed a reservation, but copying the page
tables fails, we'll simply clear the VM_PAT flag, not properly undoing
the reservation ... which is also wrong.

So let's fix it properly: set the VM_PAT flag only if the reservation
succeeded (leaving it clear initially), and undo the reservation if
anything goes wrong while copying the page tables: clearing the VM_PAT
flag after undoing the reservation.

Note that any copied page table entries will get zapped when the VMA
gets removed later, after copy_page_range() succeeded; as VM_PAT is not
set then, we won't try cleaning VM_PAT up once more and untrack_pfn()
will be happy. Note that leaving these page tables in place without a
reservation is not a problem, as we are aborting fork(); this process
will never run.

A reproducer can trigger this usually at the first try:

https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c

[ 45.239440] WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
[ 45.241082] Modules linked in: ...
[ 45.249119] CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
[ 45.250598] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 45.252181] RIP: 0010:get_pat_info+0xf6/0x110
...
[ 45.268513] Call Trace:
[ 45.269003]  <TASK>
[ 45.269425]  ? __warn.cold+0xb7/0x14d
[ 45.270131]  ? get_pat_info+0xf6/0x110
[ 45.270846]  ? report_bug+0xff/0x140
[ 45.271519]  ? handle_bug+0x58/0x90
[ 45.272192]  ? exc_invalid_op+0x17/0x70
[ 45.272935]  ? asm_exc_invalid_op+0x1a/0x20
[ 45.273717]  ? get_pat_info+0xf6/0x110
[ 45.274438]  ? get_pat_info+0x71/0x110
[ 45.275165]  untrack_pfn+0x52/0x110
[ 45.275835]  unmap_single_vma+0xa6/0xe0
[ 45.276549]  unmap_vmas+0x105/0x1f0
[ 45.277256]  exit_mmap+0xf6/0x460
[ 45.277913]  __mmput+0x4b/0x120
[ 45.278512]  copy_process+0x1bf6/0x2aa0
[ 45.279264]  kernel_clone+0xab/0x440
[ 45.279959]  __do_sys_clone+0x66/0x90
[ 45.280650]  do_syscall_64+0x95/0x180

Likely this case was missed in commit d155df53f310 ("x86/mm/pat: clear
VM_PAT if copy_p4d_range failed") ... and instead of undoing the
reservation we simply cleared the VM_PAT flag.

Keep the documentation of these functions in include/linux/pgtable.h;
one place is more than sufficient -- we should clean that up for the
other functions like track_pfn_remap/untrack_pfn separately.

Reported-by: xingwei lee
Reported-by: yuxin wang
Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/
Reported-by: Marius Fleischer
Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/
Fixes: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
Fixes: 2ab640379a0a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dan Carpenter
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: "H. Peter Anvin"
Cc: Peter Xu
Signed-off-by: David Hildenbrand
---

v2 -> v3:
 * Make some !MMU configs happy by just moving the code into memtype.c

v1 -> v2:
 * Avoid a second get_pat_info() [and thereby fix the error checking]
   by passing the pfn from track_pfn_copy() to untrack_pfn_copy()
 * Simplify untrack_pfn_copy() by calling untrack_pfn().
 * Retested

Not sure if we want to CC stable ... it's really hard to trigger in
sane environments.
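As a side note, the invariant this patch establishes -- acquire the
reservation first, set the flag only on success, and on a later failure
undo the reservation before clearing the flag -- can be modeled in a
few lines of userspace C. This is a hypothetical mock-up for
illustration only (mock_* names are invented here, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Models VM_PAT in vma->vm_flags for a single VMA. */
struct mock_vma {
	bool vm_pat;
};

static int mock_reserve_ok;	/* models whether reserve_pfn_range() succeeds */
static int mock_reservations;	/* outstanding reservations */

static int mock_reserve(void)
{
	if (!mock_reserve_ok)
		return -1;
	mock_reservations++;
	return 0;
}

/*
 * Models the fixed track_pfn_copy(): reserve first, and set the flag on
 * the destination only after the reservation succeeded. On failure, the
 * destination flag is left clear, so cleanup never sees a reservation
 * that was never made.
 */
static int mock_track_copy(struct mock_vma *dst, const struct mock_vma *src)
{
	if (!src->vm_pat)
		return 0;
	if (mock_reserve())
		return -1;
	dst->vm_pat = true;
	return 0;
}

/*
 * Models untrack_pfn_copy(): called when copying failed after
 * mock_track_copy() succeeded; undoes the reservation, then clears the
 * flag so later teardown does nothing PAT-related.
 */
static void mock_untrack_copy(struct mock_vma *dst)
{
	if (!dst->vm_pat)
		return;
	mock_reservations--;
	dst->vm_pat = false;
}
```

Either way the fork() abort path ends with the flag clear and no
reservation outstanding, which is exactly the state untrack_pfn()
expects when it later walks the dying VMAs.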
---
 arch/x86/mm/pat/memtype.c | 52 +++++++++++++++++++++------------------
 include/linux/pgtable.h   | 28 ++++++++++++++++-----
 kernel/fork.c             |  4 +++
 mm/memory.c               | 11 +++------
 4 files changed, 58 insertions(+), 37 deletions(-)

base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index feb8cc6a12bf2..d721cc19addbd 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -984,29 +984,42 @@ static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
 	return -EINVAL;
 }
 
-/*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
- *
- * If the vma has a linear pfn mapping for the entire range, we get the prot
- * from pte and reserve the entire vma range with single reserve_pfn_range call.
- */
-int track_pfn_copy(struct vm_area_struct *vma)
+int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
+	const unsigned long vma_size = src_vma->vm_end - src_vma->vm_start;
 	resource_size_t paddr;
-	unsigned long vma_size = vma->vm_end - vma->vm_start;
 	pgprot_t pgprot;
+	int rc;
 
-	if (vma->vm_flags & VM_PAT) {
-		if (get_pat_info(vma, &paddr, &pgprot))
-			return -EINVAL;
-		/* reserve the whole chunk covered by vma. */
-		return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
-	}
+	if (!(src_vma->vm_flags & VM_PAT))
+		return 0;
+
+	/*
+	 * Duplicate the PAT information for the dst VMA based on the src
+	 * VMA.
+	 */
+	if (get_pat_info(src_vma, &paddr, &pgprot))
+		return -EINVAL;
+	rc = reserve_pfn_range(paddr, vma_size, &pgprot, 1);
+	if (rc)
+		return rc;
+	/* Reservation for the destination VMA succeeded. */
+	vm_flags_set(dst_vma, VM_PAT);
+	*pfn = PHYS_PFN(paddr);
 	return 0;
 }
 
+void untrack_pfn_copy(struct vm_area_struct *dst_vma, unsigned long pfn)
+{
+	untrack_pfn(dst_vma, pfn, dst_vma->vm_end - dst_vma->vm_start, true);
+	/*
+	 * Reservation was freed, any copied page tables will get cleaned
+	 * up later, but without getting PAT involved again.
+	 */
+}
+
 /*
  * prot is passed in as a parameter for the new mapping. If the vma has
  * a linear pfn mapping for the entire range, or no vma is provided,
@@ -1095,15 +1108,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 	}
 }
 
-/*
- * untrack_pfn_clear is called if the following situation fits:
- *
- * 1) while mremapping a pfnmap for a new region, with the old vma after
- * its pfnmap page table has been removed. The new vma has a new pfnmap
- * to the same pfn & cache type with VM_PAT set.
- * 2) while duplicating vm area, the new vma fails to copy the pgtable from
- * old vma.
- */
 void untrack_pfn_clear(struct vm_area_struct *vma)
 {
 	vm_flags_clear(vma, VM_PAT);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372e..4c107e17c547e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1508,14 +1508,25 @@ static inline void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 }
 
 /*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
+ * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
+ * tables copied during copy_page_range(). On success, stores the pfn to be
+ * passed to untrack_pfn_copy().
  */
-static inline int track_pfn_copy(struct vm_area_struct *vma)
+static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
 	return 0;
 }
 
+/*
+ * untrack_pfn_copy is called when a VM_PFNMAP VMA failed to copy during
+ * copy_page_range(), but after track_pfn_copy() was already called.
+ */
+static inline void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn)
+{
+}
+
 /*
  * untrack_pfn is called while unmapping a pfnmap for a region.
  * untrack can be called for a specific region indicated by pfn and size or
@@ -1528,8 +1539,10 @@ static inline void untrack_pfn(struct vm_area_struct *vma,
 }
 
 /*
- * untrack_pfn_clear is called while mremapping a pfnmap for a new region
- * or fails to copy pgtable during duplicate vm area.
+ * untrack_pfn_clear is called in the following cases on a VM_PFNMAP VMA:
+ *
+ * 1) During mremap() on the src VMA after the page tables were moved.
+ * 2) During fork() on the dst VMA, immediately after duplicating the src VMA.
  */
 static inline void untrack_pfn_clear(struct vm_area_struct *vma)
 {
@@ -1540,7 +1553,10 @@ extern int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 			   unsigned long size);
 extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 			     pfn_t pfn);
-extern int track_pfn_copy(struct vm_area_struct *vma);
+extern int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn);
+extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 			unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_clear(struct vm_area_struct *vma);
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f32..ca2ca3884f763 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -504,6 +504,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
+	/* track_pfn_copy() will later take care of copying internal state. */
+	if (unlikely(new->vm_flags & VM_PFNMAP))
+		untrack_pfn_clear(new);
+
 	return new;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index fb7b8dc751679..dc8efa1358e94 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1362,12 +1362,12 @@ int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	pgd_t *src_pgd, *dst_pgd;
-	unsigned long next;
 	unsigned long addr = src_vma->vm_start;
 	unsigned long end = src_vma->vm_end;
 	struct mm_struct *dst_mm = dst_vma->vm_mm;
 	struct mm_struct *src_mm = src_vma->vm_mm;
 	struct mmu_notifier_range range;
+	unsigned long next, pfn;
 	bool is_cow;
 	int ret;
 
@@ -1378,11 +1378,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
 
 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
-		/*
-		 * We do not free on error cases below as remove_vma
-		 * gets called on error from higher level routine
-		 */
-		ret = track_pfn_copy(src_vma);
+		ret = track_pfn_copy(dst_vma, src_vma, &pfn);
 		if (ret)
 			return ret;
 	}
@@ -1419,7 +1415,6 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 			continue;
 		if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd,
 					    src_pgd, addr, next))) {
-			untrack_pfn_clear(dst_vma);
 			ret = -ENOMEM;
 			break;
 		}
@@ -1429,6 +1424,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 		raw_write_seqcount_end(&src_mm->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
 	}
+	if (ret && unlikely(src_vma->vm_flags & VM_PFNMAP))
+		untrack_pfn_copy(dst_vma, pfn);
 	return ret;
 }