From patchwork Mon Oct  7 21:42:04 2024
X-Patchwork-Submitter: Jann Horn <jannh@google.com>
X-Patchwork-Id: 13825306
From: Jann Horn <jannh@google.com>
Date: Mon, 07 Oct 2024 23:42:04 +0200
Subject: [PATCH] mm/mremap: Fix move_normal_pmd/retract_page_tables race
Message-Id: <20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com>
To: akpm@linux-foundation.org, david@redhat.com
Cc: linux-mm@kvack.org, willy@infradead.org, hughd@google.com,
    lorenzo.stoakes@oracle.com, joel@joelfernandes.org,
    linux-kernel@vger.kernel.org, stable@vger.kernel.org,
    Jann Horn <jannh@google.com>
X-Mailer: b4 0.15-dev

In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of page
table entries should be moved. At that point, the mmap_lock is held in write
mode, but no rmap locks are held yet. For PMD entries that point to page
tables and are fully covered by the source address range,
move_pgt_entry(NORMAL_PMD, ...) is called, which first takes rmap locks, then
does move_normal_pmd(). move_normal_pmd() takes the necessary page table
locks at source and destination, then moves an entire page table from the
source to the destination.

The problem is: The rmap locks, which protect against concurrent page table
removal by retract_page_tables() in the THP code, are only taken after the
PMD entry has been read and it has been decided how to move it. So we can
race as follows (with two processes that have mappings of the same tmpfs
file that is stored on a tmpfs mount with huge=advise); note that process A
accesses page tables through the MM while process B does it through the file
rmap:

process A                      process B
=========                      =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                      *** PREEMPT ***
                               madvise(MADV_COLLAPSE)
                                 do_madvise
                                   madvise_walk_vmas
                                     madvise_vma_behavior
                                       madvise_collapse
                                         hpage_collapse_scan_file
                                           collapse_file
                                             retract_page_tables
                                               i_mmap_lock_read(mapping)
                                               pmdp_collapse_flush
                                               i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
          move_normal_pmd
          drop_rmap_locks

When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely exploitable
for user->kernel privilege escalation.
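For concreteness, the userspace side of the diagram above looks roughly like
the sketch below. This is purely illustrative, not a reliable reproducer:
the race window between get_old_pmd() and take_rmap_locks() is narrow, and
the file path, sizes, and alignment handling are assumptions (a tmpfs mount
at /mnt/hugetmp created with huge=advise, a kernel with MADV_COLLAPSE
support, 2 MiB PMDs on x86-64).

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25        /* not yet in all libc headers */
#endif

#define LEN (4UL << 20)         /* two PMD-sized chunks on x86-64 */

int main(void)
{
        /* assumption: tmpfs mounted at /mnt/hugetmp with huge=advise */
        int fd = open("/mnt/hugetmp/f", O_RDWR | O_CREAT, 0600);

        if (fd < 0 || ftruncate(fd, LEN))
                return 1;

        if (fork() == 0) {
                /*
                 * "process B": repeatedly collapse its own mapping of the
                 * file; retract_page_tables() then rips page tables out of
                 * every other mapping of the file via the file rmap.
                 */
                void *b = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

                for (;;)
                        madvise(b, LEN, MADV_COLLAPSE);
        }

        /*
         * "process A": populate page tables, then force a PMD-level move.
         * For move_normal_pmd() to be used, source and destination must
         * cover whole, PMD-aligned ranges; this sketch does not enforce
         * alignment.
         */
        for (;;) {
                char *a = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
                void *dst = mmap(NULL, LEN, PROT_NONE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                memset(a, 1, LEN);
                mremap(a, LEN, LEN, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
                munmap(dst, LEN);
        }
}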
Fix the race by letting process A recheck that the PMD still points to a
page table after the rmap locks have been taken. Otherwise, we bail and let
the caller fall back to the PTE-level copying path, which will then bail
immediately at the pmd_none() check.

Bug reachability: Reaching this bug requires that you can create shmem/file
THP mappings - anonymous THP uses different code that doesn't zap stuff
under rmap locks. File THP is gated on an experimental config flag
(CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need shmem
THP to hit this bug. As far as I know, getting shmem THP normally requires
that you can mount your own tmpfs with the right mount flags, which would
require creating your own user+mount namespace; though I don't know if some
distros maybe enable shmem THP by default or something like that.

Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.

Cc: stable@vger.kernel.org
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Closes: https://project-zero.issues.chromium.org/371047675
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Qi Zheng
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
@David: please confirm we can add your Signed-off-by to this patch after
the Co-developed-by. (Context: David basically wrote the entire patch
except for the commit message.)

@akpm: This replaces the previous "[PATCH] mm/mremap: Prevent racing change
of old pmd type".
---
 mm/mremap.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241007-move_normal_pmd-vs-collapse-fix-2-387e9a68c7d6

diff --git a/mm/mremap.c b/mm/mremap.c
index 24712f8dbb6b..dda09e957a5d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 {
 	spinlock_t *old_ptl, *new_ptl;
 	struct mm_struct *mm = vma->vm_mm;
+	bool res = false;
 	pmd_t pmd;
 
 	if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 
-	/* Clear the pmd */
 	pmd = *old_pmd;
+
+	/* Racing with collapse? */
+	if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+		goto out_unlock;
+	/* Clear the pmd */
 	pmd_clear(old_pmd);
+	res = true;
 
 	VM_BUG_ON(!pmd_none(*new_pmd));
 
 	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
 	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
 
-	return true;
+	return res;
 }
 #else
 static inline bool move_normal_pmd(struct vm_area_struct *vma,
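For readers who want to see the result in one piece: with the patch
applied, the locked section of move_normal_pmd() reads approximately as
follows, reconstructed from the hunks above together with the unchanged
context in mm/mremap.c. The expanded comments are annotations for this
posting, not part of the patch.

        old_ptl = pmd_lock(vma->vm_mm, old_pmd);
        new_ptl = pmd_lockptr(mm, new_pmd);
        if (new_ptl != old_ptl)
                spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

        /* Re-read the entry now that both page table locks are held. */
        pmd = *old_pmd;

        /*
         * Racing with collapse? At this point the rmap locks are held, so
         * retract_page_tables() can no longer run concurrently; but it may
         * already have changed the entry since move_page_tables() looked
         * at it without rmap locks. If the entry is no longer a pointer to
         * a page table (none after retraction, or a huge leaf entry after
         * collapse), bail out instead of moving it.
         */
        if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
                goto out_unlock;

        /* Clear the pmd and transplant the page table to the destination. */
        pmd_clear(old_pmd);
        res = true;

        VM_BUG_ON(!pmd_none(*new_pmd));

        pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
        flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
out_unlock:
        if (new_ptl != old_ptl)
                spin_unlock(new_ptl);
        spin_unlock(old_ptl);

        return res;

The design keeps error handling minimal: rather than retrying under the
rmap locks, move_normal_pmd() just reports failure, and the existing
PTE-level fallback in the caller handles the now-empty entry via its
pmd_none() check.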