From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:35:49 -0700 (PDT)
To: Andrew Morton
Cc: Gerald Shaefer, Vasily Gorbik, Mike Kravetz, Mike Rapoport,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park,
    Lorenzo Stoakes, Huang Ying, Naoya Horiguchi, Christophe Leroy,
    Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual,
    Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig,
    Song Liu, Thomas Hellstrom, Russell King, "David S. Miller",
    Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens,
    Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev,
    Jann Horn, Vishal Moola, Vlastimil Babka,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 00/12] mm: free retracted page table by RCU
Message-ID: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
X-Patchwork-Id: 13285271

Here is v2 of the third series of patches to mm (and a few architectures),
based on v6.4-rc5 with the preceding two series applied: in which khugepaged
takes advantage of pte_offset_map[_lock]() allowing for pmd transitions.
Differences from v1 are noted patch by patch below.

This follows on from the v2 "arch: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/a4963be9-7aa6-350-66d0-2ba843e1af44@google.com/
series of 23 posted on 2023-06-08 (and now in mm-stable - thank you),
and the v2 "mm: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/c1c9a74a-bc5b-15ea-e5d2-8ec34bc921d@google.com/
series of 32 posted on 2023-06-08 (and now in mm-stable - thank you),
and replaces the v1 "mm: free retracted page table by RCU"
https://lore.kernel.org/linux-mm/35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com/
series of 12 posted on 2023-05-28 (which was bad on powerpc and s390).

The first two series were "independent": neither depending for build or
correctness on the other, but both series had to be in before this third
series is added to make the effective changes; and it would probably be
best to hold this series back until the following release, since it might
now reveal missed imbalances which the first series hoped to fix.

What is it all about?
Some mmap_lock avoidance i.e. latency reduction.
Initially just for the case of collapsing shmem or file pages to THPs:
the usefulness of MADV_COLLAPSE on shmem is being limited by that
mmap_write_lock it currently requires.

Likely to be relied upon later in other contexts e.g. freeing of empty
page tables (but that's not work I'm doing). mmap_write_lock avoidance
when collapsing to anon THPs? Perhaps, but again that's not work I've
done: a quick attempt was not as easy as the shmem/file case.

These changes (though of course not these exact patches) have been in
Google's data centre kernel for three years now: we do rely upon them.

Based on the preceding two series over v6.4-rc5, or any v6.4-rc; and
almost good on current mm-everything or current linux-next - just one
patch conflicts, the 10/12: I'll reply to that one with its
mm-everything or linux-next equivalent (ptent replacing *pte).

01/12 mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
      v2: same as v1
02/12 mm/pgtable: add PAE safety to __pte_offset_map()
      v2: rename to pmdp_get_lockless_start/end() per Matthew;
          so use inlines without _irq_save(flags) macro oddity;
          add pmdp_get_lockless_sync() for use later in 09/12.
03/12 arm: adjust_pte() use pte_offset_map_nolock()
      v2: same as v1
04/12 powerpc: assert_pte_locked() use pte_offset_map_nolock()
      v2: same as v1
05/12 powerpc: add pte_free_defer() for pgtables sharing page
      v2: fix rcu_head usage to cope with concurrent deferrals;
          add para to commit message explaining rcu_head issue.
06/12 sparc: add pte_free_defer() for pte_t *pgtable_t
      v2: use page_address() instead of less common page_to_virt();
          add para to commit message explaining simple conversion;
          changed title since sparc64 pgtables do not share page.
07/12 s390: add pte_free_defer() for pgtables sharing page
      v2: complete rewrite, integrated with s390's existing pgtable
          management; temporarily using a global mm_pgtable_list_lock,
          to be restored to per-mm spinlock in a later followup patch.
08/12 mm/pgtable: add pte_free_defer() for pgtable as page
      v2: add comment on rcu_head to "Page table pages", per JannH
09/12 mm/khugepaged: retract_page_tables() without mmap or vma lock
      v2: repeat checks under ptl because UFFD, per PeterX and JannH;
          bring back mmu_notifier calls for PMD, per JannH and Jason;
          pmdp_get_lockless_sync() to issue missing interrupt if PAE.
10/12 mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
      v2: first check VMA, in case page tables torn down, per JannH;
          pmdp_get_lockless_sync() to issue missing interrupt if PAE;
          moved mmu_notifier after step 1, reworked final goto labels.
11/12 mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
      v2: same as v1
12/12 mm: delete mmap_write_trylock() and vma_try_start_write()
      v2: same as v1

 arch/arm/mm/fault-armv.c            |   3 +-
 arch/powerpc/include/asm/pgalloc.h  |   4 +
 arch/powerpc/mm/pgtable-frag.c      |  51 ++++
 arch/powerpc/mm/pgtable.c           |  16 +-
 arch/s390/include/asm/pgalloc.h     |   4 +
 arch/s390/mm/pgalloc.c              | 205 +++++++++----
 arch/sparc/include/asm/pgalloc_64.h |   4 +
 arch/sparc/mm/init_64.c             |  16 +
 include/linux/mm.h                  |  17 --
 include/linux/mm_types.h            |   6 +-
 include/linux/mmap_lock.h           |  10 -
 include/linux/pgtable.h             |  10 +-
 mm/khugepaged.c                     | 481 +++++++++++------------------
 mm/pgtable-generic.c                |  53 +++-
 14 files changed, 467 insertions(+), 413 deletions(-)

Hugh
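
Illustrative note: the general idea named in the subject line, detaching a
page table from its pmd at once but deferring the actual free until no
reader can still be using it, can be sketched with a single-threaded
userspace toy model in C. This is not kernel code and not taken from the
patches: every toy_* name is hypothetical, the deferred list merely stands
in for an rcu_head, and toy_grace_period_end() stands in for the end of an
RCU grace period. It only shows why a lookup must be allowed to fail, and
why a pointer obtained before the retraction stays valid until the
deferred free runs.

```c
/* Userspace toy model, NOT kernel code: single-threaded, no real RCU. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PTRS_PER_PTE 4

struct toy_pgtable {
	struct toy_pgtable *next_deferred;	/* stands in for an rcu_head */
	unsigned long pte[TOY_PTRS_PER_PTE];
};

struct toy_pmd {
	struct toy_pgtable *table;	/* NULL once the table is retracted */
};

static struct toy_pgtable *toy_deferred_list;
static int toy_freed_count;

/* Allocate a zeroed page table and hook it up to the pmd entry. */
static int toy_pte_alloc(struct toy_pmd *pmd)
{
	pmd->table = calloc(1, sizeof(*pmd->table));
	return pmd->table ? 0 : -1;
}

/* A lookup that is allowed to fail: NULL if there is no table to map. */
static unsigned long *toy_pte_offset_map(struct toy_pmd *pmd, size_t index)
{
	struct toy_pgtable *table = pmd->table;

	if (!table || index >= TOY_PTRS_PER_PTE)
		return NULL;
	return &table->pte[index];
}

/* Detach the table from the pmd, but only queue it: do not free it yet. */
static void toy_retract_page_table(struct toy_pmd *pmd)
{
	struct toy_pgtable *table = pmd->table;

	if (!table)
		return;
	pmd->table = NULL;
	table->next_deferred = toy_deferred_list;
	toy_deferred_list = table;
}

/* Stand-in for the end of a grace period: now free the queued tables. */
static void toy_grace_period_end(void)
{
	while (toy_deferred_list) {
		struct toy_pgtable *table = toy_deferred_list;

		toy_deferred_list = table->next_deferred;
		free(table);
		toy_freed_count++;
	}
}
```

In this model a caller that mapped a pte before toy_retract_page_table()
may keep reading through its pointer until toy_grace_period_end() runs,
while any new toy_pte_offset_map() call after the retraction returns NULL
and the caller has to cope with that, for example by retrying at a higher
level.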