From patchwork Mon May 22 04:46:25 2023
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13249749
Date: Sun, 21 May 2023 21:46:25 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
    Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe,
    Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin,
    Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 00/31] mm: allow pte_offset_map[_lock]() to fail
Message-ID: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>

Here is a series of patches to mm, based on v6.4-rc2: preparing for
changes to follow (mostly in mm/khugepaged.c) affecting pte_offset_map()
and pte_offset_map_lock().

This follows on from the "arch: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/77a5d8c-406b-7068-4f17-23b7ac53bc83@google.com/
series of 23 posted on 2023-05-09.  These two series are "independent":
neither depends for build or correctness on the other, but both series
have to be in before a third series is added to make the effective changes
- though I anticipate that people will want to see at least an initial
version of that third series soon, to complete the context for them all.

What is it all about?  Some mmap_lock avoidance i.e. latency reduction.
Initially just for the case of collapsing shmem or file pages to THPs;
but likely to be relied upon later in other contexts e.g. freeing of
empty page tables (but that's not work I'm doing).  mmap_write_lock
avoidance when collapsing to anon THPs?  Perhaps, but again that's not
work I've done: a quick and easy attempt looked like it was going to
shift the load from mmap rwsem to pmd spinlock - not an improvement.

I would much prefer not to have to make these small but wide-ranging
changes for such a niche case; but failed to find another way, and
have heard that shmem MADV_COLLAPSE's usefulness is being limited by
that mmap_write_lock it currently requires.

These changes (though of course not these exact patches) have been in
Google's data centre kernel for three years now: we do rely upon them.

What is this preparatory series about?

The current mmap locking will not be enough to guard against that
tricky transition between pmd entry pointing to page table, and empty
pmd entry, and pmd entry pointing to huge page: pte_offset_map() will
have to validate the pmd entry for itself, returning NULL if no page
table is there.  What to do about that varies: sometimes nearby error
handling indicates just to skip it; but in many cases an ACTION_AGAIN or
"goto again" is appropriate (and if that risks an infinite loop, then
there must have been an oops, or pfn 0 mistaken for page table, before).

Given the likely extension to freeing empty page tables, I have not
limited this set of changes to a THP config; and it has been easier,
and sets a better example, if each site is given appropriate handling:
even where deeper study might prove that failure could only happen if
the pmd table were corrupted.

Several of the patches are, or include, cleanup on the way; and by the
end, pmd_trans_unstable() and suchlike are deleted: pte_offset_map() and
pte_offset_map_lock() then handle those original races and more.  Most
uses of pte_lockptr() are deprecated, with pte_offset_map_nolock()
taking its place.

Based on v6.4-rc2, but also good for -rc1, -rc3,
current mm-everything and linux-next.

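[Illustration, not part of any patch in this series.] The two kinds of
failure handling described above, skipping when the map fails and retrying
with "goto again", can be pictured with a small userspace model.  This is a
sketch under stated assumptions: every mock_* name, the three-state pmd
model and the four-entry "page table" are hypothetical stand-ins invented
for the example, and the real kernel functions take different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of the mock_* names are kernel APIs. */
#define MOCK_PTRS_PER_PTE 4

enum mock_pmd_kind { MOCK_PMD_NONE, MOCK_PMD_TABLE, MOCK_PMD_HUGE };

struct mock_pmd {
	enum mock_pmd_kind kind;
	int ptes[MOCK_PTRS_PER_PTE];	/* stand-in for a page table */
	bool locked;			/* stand-in for the pte lock */
};

/*
 * Modelled on the behaviour the cover letter describes: validate the
 * pmd entry, and return NULL if no page table is there.  On success
 * the (mock) lock is held and a pointer to the entry is returned.
 */
static int *mock_pte_offset_map_lock(struct mock_pmd *pmd, size_t index)
{
	assert(index < MOCK_PTRS_PER_PTE);
	if (pmd->kind != MOCK_PMD_TABLE)
		return NULL;
	assert(!pmd->locked);
	pmd->locked = true;
	return &pmd->ptes[index];
}

static void mock_pte_unmap_unlock(struct mock_pmd *pmd)
{
	assert(pmd->locked);
	pmd->locked = false;
}

/* "Just skip it": a scan treats a failed map as nothing to do. */
static int mock_count_present(struct mock_pmd *pmd)
{
	int *pte = mock_pte_offset_map_lock(pmd, 0);
	int present = 0;

	if (!pte)
		return 0;
	for (size_t i = 0; i < MOCK_PTRS_PER_PTE; i++)
		present += (pte[i] != 0);
	mock_pte_unmap_unlock(pmd);
	return present;
}

/* "goto again": re-evaluate the pmd entry whenever the map fails. */
static int mock_set_pte(struct mock_pmd *pmd, size_t index, int value)
{
	int *pte;

again:
	if (pmd->kind == MOCK_PMD_HUGE)
		return -1;	/* caller would take a huge-page path */
	if (pmd->kind == MOCK_PMD_NONE)
		pmd->kind = MOCK_PMD_TABLE;	/* stand-in for allocating a table */
	pte = mock_pte_offset_map_lock(pmd, index);
	if (!pte)
		goto again;	/* pmd changed meanwhile: start over */
	*pte = value;
	mock_pte_unmap_unlock(pmd);
	return 0;
}
```

In this single-threaded model the "goto again" branch is never taken; it
only marks where a caller would restart if the pmd entry changed between
being checked and being mapped.
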
01/31 mm: use pmdp_get_lockless() without surplus barrier()
02/31 mm/migrate: remove cruft from migration_entry_wait()s
03/31 mm/pgtable: kmap_local_page() instead of kmap_atomic()
04/31 mm/pgtable: allow pte_offset_map[_lock]() to fail
05/31 mm/filemap: allow pte_offset_map_lock() to fail
06/31 mm/page_vma_mapped: delete bogosity in page_vma_mapped_walk()
07/31 mm/page_vma_mapped: reformat map_pte() with less indentation
08/31 mm/page_vma_mapped: pte_offset_map_nolock() not pte_lockptr()
09/31 mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails
10/31 mm/pagewalk: walk_pte_range() allow for pte_offset_map()
11/31 mm/vmwgfx: simplify pmd & pud mapping dirty helpers
12/31 mm/vmalloc: vmalloc_to_page() use pte_offset_kernel()
13/31 mm/hmm: retry if pte_offset_map() fails
14/31 fs/userfaultfd: retry if pte_offset_map() fails
15/31 mm/userfaultfd: allow pte_offset_map_lock() to fail
16/31 mm/debug_vm_pgtable,page_table_check: warn pte map fails
17/31 mm/various: give up if pte_offset_map[_lock]() fails
18/31 mm/mprotect: delete pmd_none_or_clear_bad_unless_transhuge()
19/31 mm/mremap: retry if either pte_offset_map_*lock() fails
20/31 mm/madvise: clean up pte_offset_map_lock() scans
21/31 mm/madvise: clean up force_shm_swapin_readahead()
22/31 mm/swapoff: allow pte_offset_map[_lock]() to fail
23/31 mm/mglru: allow pte_offset_map_nolock() to fail
24/31 mm/migrate_device: allow pte_offset_map_lock() to fail
25/31 mm/gup: remove FOLL_SPLIT_PMD use of pmd_trans_unstable()
26/31 mm/huge_memory: split huge pmd under one pte_offset_map()
27/31 mm/khugepaged: allow pte_offset_map[_lock]() to fail
28/31 mm/memory: allow pte_offset_map[_lock]() to fail
29/31 mm/memory: handle_pte_fault() use pte_offset_map_nolock()
30/31 mm/pgtable: delete pmd_trans_unstable() and friends
31/31 perf/core: Allow pte_offset_map() to fail

 Documentation/mm/split_page_table_lock.rst |  17 +-
 fs/proc/task_mmu.c                         |  32 ++--
 fs/userfaultfd.c                           |  21 +--
 include/linux/migrate.h                    |   4 +-
 include/linux/mm.h                         |  27 ++-
 include/linux/pgtable.h                    | 142 +++-----------
 include/linux/swapops.h                    |  17 +-
 kernel/events/core.c                       |   4 +
 mm/damon/vaddr.c                           |  12 +-
 mm/debug_vm_pgtable.c                      |   9 +-
 mm/filemap.c                               |  25 +--
 mm/gup.c                                   |  34 ++--
 mm/hmm.c                                   |   4 +-
 mm/huge_memory.c                           |  33 ++--
 mm/khugepaged.c                            |  83 +++++----
 mm/ksm.c                                   |  10 +-
 mm/madvise.c                               | 146 ++++++++-------
 mm/mapping_dirty_helpers.c                 |  34 +---
 mm/memcontrol.c                            |   8 +-
 mm/memory-failure.c                        |   8 +-
 mm/memory.c                                | 224 ++++++++++-------------
 mm/mempolicy.c                             |   7 +-
 mm/migrate.c                               |  40 ++--
 mm/migrate_device.c                        |  31 +---
 mm/mincore.c                               |   9 +-
 mm/mlock.c                                 |   4 +
 mm/mprotect.c                              |  79 ++------
 mm/mremap.c                                |  28 ++-
 mm/page_table_check.c                      |   2 +
 mm/page_vma_mapped.c                       |  97 +++++-----
 mm/pagewalk.c                              |  33 +++-
 mm/pgtable-generic.c                       |  56 ++++++
 mm/swap_state.c                            |   3 +
 mm/swapfile.c                              |  38 ++--
 mm/userfaultfd.c                           |  10 +-
 mm/vmalloc.c                               |   3 +-
 mm/vmscan.c                                |  16 +-
 37 files changed, 641 insertions(+), 709 deletions(-)

Hugh