From patchwork Mon May 22 04:53:28 2023
X-Patchwork-Submitter: Hugh Dickins <hughd@google.com>
X-Patchwork-Id: 13249754
Date: Sun, 21 May 2023 21:53:28 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton
cc: Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman,
    Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple,
    Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Naoya Horiguchi,
    Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen,
    Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
    Christoph Hellwig, Song Liu, Thomas Hellstrom,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 04/31] mm/pgtable: allow pte_offset_map[_lock]() to fail
In-Reply-To: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
Message-ID: <8218ffdc-8be-54e5-0a8-83f5542af283@google.com>
References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com>
MIME-Version: 1.0

Make pte_offset_map() a wrapper for __pte_offset_map() (optionally
outputs pmdval), and pte_offset_map_lock() a sparse __cond_lock wrapper
for __pte_offset_map_lock(): those __funcs added in mm/pgtable-generic.c.

__pte_offset_map() does pmdval validation (including pmd_clear_bad()
when pmd_bad()), returning NULL if pmdval is not for a page table.
__pte_offset_map_lock() verifies pmdval unchanged after getting the
lock, trying again if it changed.

No #ifdef CONFIG_TRANSPARENT_HUGEPAGE around them: that could be done
to cover the imminent case, but we expect to generalize it later, and
it makes a mess of where to do the pmd_bad() clearing.

Add pte_offset_map_nolock(): outputs ptl like pte_offset_map_lock(),
without actually taking the lock.  This will be preferred to open uses
of pte_lockptr(), because (when split ptlock is in page table's struct
page) it points to the right lock for the returned pte pointer, even if
*pmd gets changed racily afterwards.

Update corresponding Documentation.

Do not add the anticipated rcu_read_lock() and rcu_read_unlock()s yet:
they have to wait until all architectures are balancing
pte_offset_map()s with pte_unmap()s (as in the arch series posted
earlier).  But comment where they will go, so that it's easy to add
them for experiments.  And only when those are in place can transient
racy failure cases be enabled.  Add more safety for the PAE mismatched
pmd_low pmd_high case at that time.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 Documentation/mm/split_page_table_lock.rst | 17 ++++---
 include/linux/mm.h                         | 27 +++++++----
 include/linux/pgtable.h                    | 22 ++++++---
 mm/pgtable-generic.c                       | 56 ++++++++++++++++++++++
 4 files changed, 101 insertions(+), 21 deletions(-)

diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst
index 50ee0dfc95be..a834fad9de12 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -14,15 +14,20 @@ tables. Access to higher level tables protected by mm->page_table_lock.
 There are helpers to lock/unlock a table and other accessor functions:
 
  - pte_offset_map_lock()
-	maps pte and takes PTE table lock, returns pointer to the taken
-	lock;
+	maps PTE and takes PTE table lock, returns pointer to PTE with
+	pointer to its PTE table lock, or returns NULL if no PTE table;
+ - pte_offset_map_nolock()
+	maps PTE, returns pointer to PTE with pointer to its PTE table
+	lock (not taken), or returns NULL if no PTE table;
+ - pte_offset_map()
+	maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
+ - pte_unmap()
+	unmaps PTE table;
  - pte_unmap_unlock()
 	unlocks and unmaps PTE table;
  - pte_alloc_map_lock()
-	allocates PTE table if needed and take the lock, returns pointer
-	to taken lock or NULL if allocation failed;
- - pte_lockptr()
-	returns pointer to PTE table lock;
+	allocates PTE table if needed and takes its lock, returns pointer to
+	PTE with pointer to its lock, or returns NULL if allocation failed;
  - pmd_lock()
 	takes PMD table lock, returns pointer to taken lock;
  - pmd_lockptr()
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..3c2e56980853 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2787,14 +2787,25 @@ static inline void pgtable_pte_page_dtor(struct page *page)
 	dec_lruvec_page_state(page, NR_PAGETABLE);
 }
 
-#define pte_offset_map_lock(mm, pmd, address, ptlp)	\
-({							\
-	spinlock_t *__ptl = pte_lockptr(mm, pmd);	\
-	pte_t *__pte = pte_offset_map(pmd, address);	\
-	*(ptlp) = __ptl;				\
-	spin_lock(__ptl);				\
-	__pte;						\
-})
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
+static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
+{
+	return __pte_offset_map(pmd, addr, NULL);
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
+static inline pte_t *pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp)
+{
+	pte_t *pte;
+
+	__cond_lock(*ptlp, pte = __pte_offset_map_lock(mm, pmd, addr, ptlp));
+	return pte;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, spinlock_t **ptlp);
 
 #define pte_unmap_unlock(pte, ptl)	do {		\
 	spin_unlock(ptl);				\
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94235ff2706e..3fabbb018557 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -94,14 +94,22 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 #define pte_offset_kernel pte_offset_kernel
 #endif
 
-#if defined(CONFIG_HIGHPTE)
-#define pte_offset_map(dir, address)				\
-	((pte_t *)kmap_local_page(pmd_page(*(dir))) +		\
-	 pte_index((address)))
-#define pte_unmap(pte) kunmap_local((pte))
+#ifdef CONFIG_HIGHPTE
+#define __pte_map(pmd, address)				\
+	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
+#define pte_unmap(pte)	do {	\
+	kunmap_local((pte));	\
+	/* rcu_read_unlock() to be added later */	\
+} while (0)
 #else
-#define pte_offset_map(dir, address)	pte_offset_kernel((dir), (address))
-#define pte_unmap(pte) ((void)(pte))	/* NOP */
+static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
+{
+	return pte_offset_kernel(pmd, address);
+}
+static inline void pte_unmap(pte_t *pte)
+{
+	/* rcu_read_unlock() to be added later */
+}
 #endif
 
 /* Find an entry in the second-level page table.. */
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d2fc52bffafc..c7ab18a5fb77 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -10,6 +10,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 
@@ -229,3 +231,57 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
+{
+	pmd_t pmdval;
+
+	/* rcu_read_lock() to be added later */
+	pmdval = pmdp_get_lockless(pmd);
+	if (pmdvalp)
+		*pmdvalp = pmdval;
+	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+		goto nomap;
+	if (unlikely(pmd_bad(pmdval))) {
+		pmd_clear_bad(pmd);
+		goto nomap;
+	}
+	return __pte_map(&pmdval, addr);
+nomap:
+	/* rcu_read_unlock() to be added later */
+	return NULL;
+}
+
+pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	pmd_t pmdval;
+	pte_t *pte;
+
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (likely(pte))
+		*ptlp = pte_lockptr(mm, &pmdval);
+	return pte;
+}
+
+pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
+			     unsigned long addr, spinlock_t **ptlp)
+{
+	spinlock_t *ptl;
+	pmd_t pmdval;
+	pte_t *pte;
+again:
+	pte = __pte_offset_map(pmd, addr, &pmdval);
+	if (unlikely(!pte))
+		return pte;
+	ptl = pte_lockptr(mm, &pmdval);
+	spin_lock(ptl);
+	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		*ptlp = ptl;
+		return pte;
+	}
+	pte_unmap_unlock(pte, ptl);
+	goto again;
+}
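
[Not part of the patch] For readers new to the changed convention, a
minimal caller-side sketch of how a page table walker is expected to cope
with pte_offset_map_lock() now being able to fail; the walk_one_pte()
helper name and its -EAGAIN convention are invented for illustration:

#include <linux/mm.h>

/*
 * Illustrative only: map and lock one PTE, handling the new NULL case.
 * pte_offset_map_lock() returns NULL when there is no PTE table at *pmd
 * (none, huge, migration entry or bad), or when *pmd changed while the
 * lock was being taken; the walker then rechecks *pmd or skips the range
 * instead of dereferencing a stale page table.
 */
static int walk_one_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
{
	spinlock_t *ptl;
	pte_t *pte;

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return -EAGAIN;	/* no PTE table here: caller rechecks or skips */

	/* ... operate on the mapped, locked PTE for addr ... */

	pte_unmap_unlock(pte, ptl);
	return 0;
}

A walker that only occasionally needs the lock could instead use the new
pte_offset_map_nolock() to fetch the matching ptl up front and take it only
when an interesting entry is found: the returned lock pointer stays right
for the returned pte even if *pmd gets changed racily afterwards.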