From patchwork Thu Dec 22 20:55:10 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13080324
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Mike Kravetz,
 Peter Xu, Muchun Song, Miaohe Lin, stable@vger.kernel.org
Subject: [PATCH v1 1/2] mm/hugetlb: fix PTE marker handling in
 hugetlb_change_protection()
Date: Thu, 22 Dec 2022 21:55:10 +0100
Message-Id: <20221222205511.675832-2-david@redhat.com>
In-Reply-To: <20221222205511.675832-1-david@redhat.com>
References: <20221222205511.675832-1-david@redhat.com>

There are two problematic cases when stumbling over a PTE marker in
hugetlb_change_protection():

(1) We protect a uffd-wp PTE marker a second time using uffd-wp: we will
    end up in the "!huge_pte_none(pte)" case and mess up the PTE marker.
(2) We unprotect a uffd-wp PTE marker: we will similarly end up in the
    "!huge_pte_none(pte)" case even though we cleared the PTE, because
    the "pte" variable is stale. We'll mess up the PTE marker.

For example, if we later stumble over such a "wrongly modified" PTE marker,
we'll treat it like a present PTE that maps some garbage page.

This can, for example, be triggered by mapping a memfd backed by huge
pages, registering uffd-wp, uffd-wp'ing an unmapped page and (a)
uffd-wp'ing it a second time; or (b) uffd-unprotecting it; or (c)
unregistering uffd-wp. Then, if we trigger fallocate(FALLOC_FL_PUNCH_HOLE)
on that file range, we will run into a VM_BUG_ON:

[ 195.039560] page:00000000ba1f2987 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
[ 195.039565] flags: 0x7ffffc0001000(reserved|node=0|zone=0|lastcpupid=0x1fffff)
[ 195.039568] raw: 0007ffffc0001000 ffffe742c0000008 ffffe742c0000008 0000000000000000
[ 195.039569] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 195.039569] page dumped because: VM_BUG_ON_PAGE(compound && !PageHead(page))
[ 195.039573] ------------[ cut here ]------------
[ 195.039574] kernel BUG at mm/rmap.c:1346!
[ 195.039579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 195.039581] CPU: 7 PID: 4777 Comm: qemu-system-x86 Not tainted 6.0.12-200.fc36.x86_64 #1
[ 195.039583] Hardware name: LENOVO 20WNS1F81N/20WNS1F81N, BIOS N35ET50W (1.50 ) 09/15/2022
[ 195.039584] RIP: 0010:page_remove_rmap+0x45b/0x550
[ 195.039588] Code: [...]
[ 195.039589] RSP: 0018:ffffbc03c3633ba8 EFLAGS: 00010292
[ 195.039591] RAX: 0000000000000040 RBX: ffffe742c0000000 RCX: 0000000000000000
[ 195.039592] RDX: 0000000000000002 RSI: ffffffff8e7aac1a RDI: 00000000ffffffff
[ 195.039592] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffbc03c3633a08
[ 195.039593] R10: 0000000000000003 R11: ffffffff8f146328 R12: ffff9b04c42754b0
[ 195.039594] R13: ffffffff8fcc6328 R14: ffffbc03c3633c80 R15: ffff9b0484ab9100
[ 195.039595] FS: 00007fc7aaf68640(0000) GS:ffff9b0bbf7c0000(0000) knlGS:0000000000000000
[ 195.039596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 195.039597] CR2: 000055d402c49110 CR3: 0000000159392003 CR4: 0000000000772ee0
[ 195.039598] PKRU: 55555554
[ 195.039599] Call Trace:
[ 195.039600]
[ 195.039602] __unmap_hugepage_range+0x33b/0x7d0
[ 195.039605] unmap_hugepage_range+0x55/0x70
[ 195.039608] hugetlb_vmdelete_list+0x77/0xa0
[ 195.039611] hugetlbfs_fallocate+0x410/0x550
[ 195.039612] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 195.039616] vfs_fallocate+0x12e/0x360
[ 195.039618] __x64_sys_fallocate+0x40/0x70
[ 195.039620] do_syscall_64+0x58/0x80
[ 195.039623] ? syscall_exit_to_user_mode+0x17/0x40
[ 195.039624] ? do_syscall_64+0x67/0x80
[ 195.039626] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 195.039628] RIP: 0033:0x7fc7b590651f
[ 195.039653] Code: [...]
[ 195.039654] RSP: 002b:00007fc7aaf66e70 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[ 195.039655] RAX: ffffffffffffffda RBX: 0000558ef4b7f370 RCX: 00007fc7b590651f
[ 195.039656] RDX: 0000000018000000 RSI: 0000000000000003 RDI: 000000000000000c
[ 195.039657] RBP: 0000000008000000 R08: 0000000000000000 R09: 0000000000000073
[ 195.039658] R10: 0000000008000000 R11: 0000000000000293 R12: 0000000018000000
[ 195.039658] R13: 00007fb8bbe00000 R14: 000000000000000c R15: 0000000000001000
[ 195.039661]

Fix it by not going into the "!huge_pte_none(pte)" case if we stumble over
an exclusive marker. spin_unlock() + continue would get the job done.
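
To illustrate the fall-through described above, here is a minimal
userspace sketch. It is not kernel code: the toy_* names and the enum are
hypothetical stand-ins, and the none-PTE branch is left out.
toy_change_prot_old() mirrors the pre-patch chain of independent "if"
blocks; toy_change_prot_stop() mirrors the spin_unlock() + continue
variant mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for hugetlb PTE states (hypothetical, not kernel types). */
enum toy_pte { TOY_NONE, TOY_MARKER, TOY_PRESENT, TOY_CLOBBERED };

/* Pre-patch shape: independent "if" blocks, so a marker falls through. */
static enum toy_pte toy_change_prot_old(enum toy_pte slot, bool uffd_wp_resolve)
{
	enum toy_pte pte = slot;	/* snapshot, like pte = huge_ptep_get(ptep) */

	if (pte == TOY_MARKER) {
		if (uffd_wp_resolve)
			slot = TOY_NONE;	/* like huge_pte_clear() */
	}
	if (pte != TOY_NONE) {
		/* The snapshot is stale: a marker is rewritten as if present. */
		slot = (pte == TOY_PRESENT) ? TOY_PRESENT : TOY_CLOBBERED;
	}
	return slot;
}

/* Minimal-fix shape: stop processing once the marker has been handled. */
static enum toy_pte toy_change_prot_stop(enum toy_pte slot, bool uffd_wp_resolve)
{
	enum toy_pte pte = slot;

	if (pte == TOY_MARKER) {
		if (uffd_wp_resolve)
			slot = TOY_NONE;
		return slot;	/* stands in for spin_unlock() + continue */
	}
	if (pte != TOY_NONE)
		slot = TOY_PRESENT;	/* a present entry stays present */
	return slot;
}

/* Checks both problematic cases: protect again (false), unprotect (true). */
static void toy_selfcheck(void)
{
	assert(toy_change_prot_old(TOY_MARKER, false) == TOY_CLOBBERED);
	assert(toy_change_prot_old(TOY_MARKER, true) == TOY_CLOBBERED);
	assert(toy_change_prot_stop(TOY_MARKER, false) == TOY_MARKER);
	assert(toy_change_prot_stop(TOY_MARKER, true) == TOY_NONE);
}
```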

However, instead, make it clearer that there are no fall-through
statements: we process each case (hwpoison, migration, marker, !none, none)
and then unlock the page table to continue with the next PTE. Let's avoid
"continue" statements and use a single spin_unlock() at the end.

Fixes: 60dfaad65aa9 ("mm/hugetlb: allow uffd wr-protect none ptes")
Cc:
Signed-off-by: David Hildenbrand
Reviewed-by: Mike Kravetz
---
 mm/hugetlb.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 77f36e3681e3..3a94f519304f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6512,10 +6512,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		}
 		pte = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
-			spin_unlock(ptl);
-			continue;
-		}
-		if (unlikely(is_hugetlb_entry_migration(pte))) {
+			/* Nothing to do. */
+		} else if (unlikely(is_hugetlb_entry_migration(pte))) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
 			struct page *page = pfn_swap_entry_to_page(entry);
 
@@ -6536,18 +6534,13 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				set_huge_pte_at(mm, address, ptep, newpte);
 				pages++;
 			}
-			spin_unlock(ptl);
-			continue;
-		}
-		if (unlikely(pte_marker_uffd_wp(pte))) {
-			/*
-			 * This is changing a non-present pte into a none pte,
-			 * no need for huge_ptep_modify_prot_start/commit().
-			 */
+		} else if (unlikely(is_pte_marker(pte))) {
+			/* No other markers apply for now. */
+			WARN_ON_ONCE(!pte_marker_uffd_wp(pte));
 			if (uffd_wp_resolve)
+				/* Safe to modify directly (non-present->none). */
 				huge_pte_clear(mm, address, ptep, psize);
-		}
-		if (!huge_pte_none(pte)) {
+		} else if (!huge_pte_none(pte)) {
 			pte_t old_pte;
 			unsigned int shift = huge_page_shift(hstate_vma(vma));