From patchwork Mon May 22 05:25:25 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13249789
Date: Sun, 21 May 2023 22:25:25 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
Cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
    Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe,
    Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin,
    Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 28/31] mm/memory: allow pte_offset_map[_lock]() to fail
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID:
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

copy_pte_range(): use pte_offset_map_nolock(), and allow for it to fail;
but with a comment on some further assumptions that are being made there.

zap_pte_range() and zap_pmd_range(): adjust their interaction so that a
pte_offset_map_lock() failure in zap_pte_range() leads to a retry in
zap_pmd_range(); remove call to pmd_none_or_trans_huge_or_clear_bad().

Allow pte_offset_map_lock() to fail in many functions. Update comment on
calling pte_alloc() in do_anonymous_page(). Remove redundant calls to
pmd_trans_unstable(), pmd_devmap_trans_unstable(), pmd_none() and
pmd_bad(); but leave pmd_none_or_clear_bad() calls in free_pmd_range()
and copy_pmd_range(), those do simplify the next level down.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/memory.c | 172 +++++++++++++++++++++++++---------------------------
 1 file changed, 82 insertions(+), 90 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2eb54c0d5d3c..c7b920291a72 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1012,13 +1012,25 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	progress = 0;
 	init_rss_vec(rss);
 
+	/*
+	 * copy_pmd_range()'s prior pmd_none_or_clear_bad(src_pmd), and the
+	 * error handling here, assume that exclusive mmap_lock on dst and src
+	 * protects anon from unexpected THP transitions; with shmem and file
+	 * protected by mmap_lock-less collapse skipping areas with anon_vma
+	 * (whereas vma_needs_copy() skips areas without anon_vma). A rework
+	 * can remove such assumptions later, but this is good enough for now.
+	 */
 	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
 	if (!dst_pte) {
 		ret = -ENOMEM;
 		goto out;
 	}
-	src_pte = pte_offset_map(src_pmd, addr);
-	src_ptl = pte_lockptr(src_mm, src_pmd);
+	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
+	if (!src_pte) {
+		pte_unmap_unlock(dst_pte, dst_ptl);
+		/* ret == 0 */
+		goto out;
+	}
 	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 	orig_src_pte = src_pte;
 	orig_dst_pte = dst_pte;
@@ -1083,8 +1095,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
 	arch_leave_lazy_mmu_mode();
-	spin_unlock(src_ptl);
-	pte_unmap(orig_src_pte);
+	pte_unmap_unlock(orig_src_pte, src_ptl);
 	add_mm_rss_vec(dst_mm, rss);
 	pte_unmap_unlock(orig_dst_pte, dst_ptl);
 	cond_resched();
@@ -1388,10 +1399,11 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	swp_entry_t entry;
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
-again:
 	init_rss_vec(rss);
-	start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	pte = start_pte;
+	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	if (!pte)
+		return addr;
+
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
 	do {
@@ -1507,17 +1519,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	 * If we forced a TLB flush (either due to running out of
 	 * batch buffers or because we needed to flush dirty TLB
 	 * entries before releasing the ptl), free the batched
-	 * memory too. Restart if we didn't do everything.
+	 * memory too. Come back again if we didn't do everything.
 	 */
-	if (force_flush) {
-		force_flush = 0;
+	if (force_flush)
 		tlb_flush_mmu(tlb);
-	}
-
-	if (addr != end) {
-		cond_resched();
-		goto again;
-	}
 
 	return addr;
 }
@@ -1536,8 +1541,10 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
-			else if (zap_huge_pmd(tlb, vma, pmd, addr))
-				goto next;
+			else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
+				addr = next;
+				continue;
+			}
 			/* fall through */
 		} else if (details && details->single_folio &&
 			   folio_test_pmd_mappable(details->single_folio) &&
@@ -1550,20 +1557,14 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 			 */
 			spin_unlock(ptl);
 		}
-
-		/*
-		 * Here there can be other concurrent MADV_DONTNEED or
-		 * trans huge page faults running, and if the pmd is
-		 * none or trans huge it can change under us. This is
-		 * because MADV_DONTNEED holds the mmap_lock in read
-		 * mode.
-		 */
-		if (pmd_none_or_trans_huge_or_clear_bad(pmd))
-			goto next;
-		next = zap_pte_range(tlb, vma, pmd, addr, next, details);
-next:
-		cond_resched();
-	} while (pmd++, addr = next, addr != end);
+		if (pmd_none(*pmd)) {
+			addr = next;
+			continue;
+		}
+		addr = zap_pte_range(tlb, vma, pmd, addr, next, details);
+		if (addr != next)
+			pmd--;
+	} while (pmd++, cond_resched(), addr != end);
 
 	return addr;
 }
@@ -1905,6 +1906,10 @@ static int insert_pages(struct vm_area_struct *vma, unsigned long addr,
 		const int batch_size = min_t(int, pages_to_write_in_pmd, 8);
 
 		start_pte = pte_offset_map_lock(mm, pmd, addr, &pte_lock);
+		if (!start_pte) {
+			ret = -EFAULT;
+			goto out;
+		}
 		for (pte = start_pte; pte_idx < batch_size; ++pte, ++pte_idx) {
 			int err = insert_page_in_batch_locked(vma, pte,
 				addr, pages[curr_page_idx], prot);
@@ -2572,10 +2577,10 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 		mapped_pte = pte = (mm == &init_mm) ?
 			pte_offset_kernel(pmd, addr) :
 			pte_offset_map_lock(mm, pmd, addr, &ptl);
+		if (!pte)
+			return -EINVAL;
 	}
 
-	BUG_ON(pmd_huge(*pmd));
-
 	arch_enter_lazy_mmu_mode();
 
 	if (fn) {
@@ -2804,7 +2809,6 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	int ret;
 	void *kaddr;
 	void __user *uaddr;
-	bool locked = false;
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr = vmf->address;
@@ -2830,12 +2834,12 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
+	vmf->pte = NULL;
 	if (!arch_has_hw_pte_young() && !pte_young(vmf->orig_pte)) {
 		pte_t entry;
 
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
-		locked = true;
-		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+		if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) {
 			/*
 			 * Other thread has already handled the fault
 			 * and update local tlb only
@@ -2857,13 +2861,12 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 		 * zeroes.
 		 */
 		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
-			if (locked)
+			if (vmf->pte)
 				goto warn;
 
 			/* Re-validate under PTL if the page is still mapped */
 			vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
-			locked = true;
-			if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+			if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte))) {
 				/* The PTE changed under us, update local tlb */
 				update_mmu_tlb(vma, addr, vmf->pte);
 				ret = -EAGAIN;
@@ -2888,7 +2891,7 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	ret = 0;
 
 pte_unlock:
-	if (locked)
+	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	kunmap_atomic(kaddr);
 	flush_dcache_page(dst);
@@ -3110,7 +3113,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	 * Re-check the pte - we dropped the lock
 	 */
 	vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl);
-	if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+	if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte))) {
 		if (old_folio) {
 			if (!folio_test_anon(old_folio)) {
 				dec_mm_counter(mm, mm_counter_file(&old_folio->page));
@@ -3178,19 +3181,20 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		/* Free the old page.. */
 		new_folio = old_folio;
 		page_copied = 1;
-	} else {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	} else if (vmf->pte) {
 		update_mmu_tlb(vma, vmf->address, vmf->pte);
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	}
 
-	if (new_folio)
-		folio_put(new_folio);
-
-	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback as
 	 * the above ptep_clear_flush_notify() did already call it.
 	 */
 	mmu_notifier_invalidate_range_only_end(&range);
+
+	if (new_folio)
+		folio_put(new_folio);
 	if (old_folio) {
 		if (page_copied)
 			free_swap_cache(&old_folio->page);
@@ -3230,6 +3234,8 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf)
 	WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED));
 	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address,
 				       &vmf->ptl);
+	if (!vmf->pte)
+		return VM_FAULT_NOPAGE;
 	/*
 	 * We might have raced with another page fault while we released the
 	 * pte_offset_map_lock.
@@ -3591,10 +3597,11 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 				&vmf->ptl);
-	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
+	if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte)))
 		restore_exclusive_pte(vma, vmf->page, vmf->address, vmf->pte);
 
-	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	folio_unlock(folio);
 	folio_put(folio);
 
@@ -3625,6 +3632,8 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
 {
 	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
 				       vmf->address, &vmf->ptl);
+	if (!vmf->pte)
+		return 0;
 	/*
 	 * Be careful so that we will only recover a special uffd-wp pte into a
 	 * none pte. Otherwise it means the pte could have changed, so retry.
@@ -3728,11 +3737,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			vmf->page = pfn_swap_entry_to_page(entry);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
-			if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
-				spin_unlock(vmf->ptl);
-				goto out;
-			}
-
+			if (unlikely(!vmf->pte ||
+				     !pte_same(*vmf->pte, vmf->orig_pte)))
+				goto unlock;
 			/*
 			 * Get a page reference while we know the page can't be
 			 * freed.
@@ -3807,7 +3814,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			 */
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
-			if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
+			if (likely(vmf->pte && pte_same(*vmf->pte, vmf->orig_pte)))
 				ret = VM_FAULT_OOM;
 			goto unlock;
 		}
@@ -3877,7 +3884,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 */
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
-	if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte)))
+	if (unlikely(!vmf->pte || !pte_same(*vmf->pte, vmf->orig_pte)))
 		goto out_nomap;
 
 	if (unlikely(!folio_test_uptodate(folio))) {
@@ -4003,13 +4010,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 unlock:
-	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	if (si)
 		put_swap_device(si);
 	return ret;
 out_nomap:
-	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out_page:
 	folio_unlock(folio);
 out_release:
@@ -4041,22 +4050,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	/*
-	 * Use pte_alloc() instead of pte_alloc_map(). We can't run
-	 * pte_offset_map() on pmds where a huge pmd might be created
-	 * from a different thread.
-	 *
-	 * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when
-	 * parallel threads are excluded by other means.
-	 *
-	 * Here we only have mmap_read_lock(mm).
+	 * Use pte_alloc() instead of pte_alloc_map(), so that OOM can
+	 * be distinguished from a transient failure of pte_offset_map().
 	 */
 	if (pte_alloc(vma->vm_mm, vmf->pmd))
 		return VM_FAULT_OOM;
 
-	/* See comment in handle_pte_fault() */
-	if (unlikely(pmd_trans_unstable(vmf->pmd)))
-		return 0;
-
 	/* Use the zero-page for reads */
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm)) {
@@ -4064,6 +4063,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 						vma->vm_page_prot));
 		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				vmf->address, &vmf->ptl);
+		if (!vmf->pte)
+			goto unlock;
 		if (vmf_pte_changed(vmf)) {
 			update_mmu_tlb(vma, vmf->address, vmf->pte);
 			goto unlock;
@@ -4104,6 +4105,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
+	if (!vmf->pte)
+		goto release;
 	if (vmf_pte_changed(vmf)) {
 		update_mmu_tlb(vma, vmf->address, vmf->pte);
 		goto release;
@@ -4131,7 +4134,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 unlock:
-	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	if (vmf->pte)
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 release:
 	folio_put(folio);
@@ -4380,15 +4384,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
-	/*
-	 * See comment in handle_pte_fault() for how this scenario happens, we
-	 * need to return NOPAGE so that we drop this page.
-	 */
-	if (pmd_devmap_trans_unstable(vmf->pmd))
-		return VM_FAULT_NOPAGE;
-
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				      vmf->address, &vmf->ptl);
+	if (!vmf->pte)
+		return VM_FAULT_NOPAGE;
 
 	/* Re-check under ptl */
 	if (likely(!vmf_pte_changed(vmf))) {
@@ -4630,17 +4629,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
 	 */
 	if (!vma->vm_ops->fault) {
-		/*
-		 * If we find a migration pmd entry or a none pmd entry, which
-		 * should never happen, return SIGBUS
-		 */
-		if (unlikely(!pmd_present(*vmf->pmd)))
+		vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
+					       vmf->address, &vmf->ptl);
+		if (unlikely(!vmf->pte))
 			ret = VM_FAULT_SIGBUS;
 		else {
-			vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm,
-						       vmf->pmd,
-						       vmf->address,
-						       &vmf->ptl);
 			/*
 			 * Make sure this is not a temporary clearing of pte
 			 * by holding ptl and checking again. A R/M/W update
@@ -5429,10 +5422,9 @@ int follow_pte(struct mm_struct *mm, unsigned long address,
 	pmd = pmd_offset(pud, address);
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 
-	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
-		goto out;
-
 	ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
+	if (!ptep)
+		goto out;
 	if (!pte_present(*ptep))
 		goto unlock;
 	*ptepp = ptep;
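For illustration only, not part of the patch itself: a minimal sketch of the
calling convention that the hunks above move page table walkers to. The
helper name example_pte_walk() is hypothetical; pte_offset_map_lock() and
pte_unmap_unlock() are the real APIs, and after this series the former may
return NULL when the expected page table has gone away or become unstable,
so the walker bails out early and lets its caller retry at the pmd level
(as zap_pmd_range() does above).

/*
 * Illustrative sketch only, not taken from mm/memory.c.
 */
static unsigned long example_pte_walk(struct mm_struct *mm, pmd_t *pmd,
				      unsigned long addr, unsigned long end)
{
	spinlock_t *ptl;
	pte_t *start_pte, *pte;

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)		/* pmd changed under us: let the caller retry */
		return addr;
	do {
		/* ... examine or modify *pte while holding ptl ... */
	} while (pte++, addr += PAGE_SIZE, addr != end);
	pte_unmap_unlock(start_pte, ptl);
	return addr;
}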