From patchwork Wed Jun 16 01:23:13 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12323663
Date: Tue, 15 Jun 2021 18:23:13 -0700
From: Andrew Morton
Subject: [patch 01/18] mm,hwpoison: fix race with hugetlb page allocation
Message-ID: <20210616012313.cySUhIHf_%akpm@linux-foundation.org>
In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org>

From: Naoya Horiguchi
Subject: mm,hwpoison: fix race with hugetlb page allocation

When a hugetlb page fault (under an overcommitting situation) races with
memory_failure(), VM_BUG_ON_PAGE() is triggered:

    CPU0:                               CPU1:

                                        gather_surplus_pages()
                                          page = alloc_surplus_huge_page()
    memory_failure_hugetlb()
      get_hwpoison_page(page)
        __get_hwpoison_page(page)
          get_page_unless_zero(page)
                                          zero = put_page_testzero(page)
                                          VM_BUG_ON_PAGE(!zero, page)
                                          enqueue_huge_page(h, page)
      put_page(page)

__get_hwpoison_page() only checks the page refcount before taking an
additional one for memory error handling, which is not enough because
there is a time window where compound pages have a non-zero refcount
during hugetlb page initialization.  So make __get_hwpoison_page() check
page status a bit more for hugetlb pages with get_hwpoison_huge_page().
Checking hugetlb-specific flags under hugetlb_lock makes sure that the
hugetlb page is not in a transient state.  Another new function,
HWPoisonHandlable(), helps prevent a race against other transient page
states (like a generic compound page just before PageHuge becomes true).

Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com
Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi
Reported-by: Muchun Song
Acked-by: Mike Kravetz
Cc: Oscar Salvador
Cc: Michal Hocko
Cc: Tony Luck
Cc: <stable@vger.kernel.org>	[5.12+]
Signed-off-by: Andrew Morton
---

 include/linux/hugetlb.h |    6 ++++++
 mm/hugetlb.c            |   15 +++++++++++++++
 mm/memory-failure.c     |   29 +++++++++++++++++++++++++++--
 3 files changed, 48 insertions(+), 2 deletions(-)

--- a/include/linux/hugetlb.h~mmhwpoison-fix-race-with-hugetlb-page-allocation
+++ a/include/linux/hugetlb.h
@@ -149,6 +149,7 @@ bool hugetlb_reserve_pages(struct inode
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
 bool isolate_huge_page(struct page *page, struct list_head *list);
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
 void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
@@ -339,6 +340,11 @@ static inline bool isolate_huge_page(str
 	return false;
 }

+static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+	return 0;
+}
+
 static inline void putback_active_hugepage(struct page *page)
 {
 }
--- a/mm/hugetlb.c~mmhwpoison-fix-race-with-hugetlb-page-allocation
+++ a/mm/hugetlb.c
@@ -5857,6 +5857,21 @@ unlock:
 	return ret;
 }

+int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+	int ret = 0;
+
+	*hugetlb = false;
+	spin_lock_irq(&hugetlb_lock);
+	if (PageHeadHuge(page)) {
+		*hugetlb = true;
+		if (HPageFreed(page) || HPageMigratable(page))
+			ret = get_page_unless_zero(page);
+	}
+	spin_unlock_irq(&hugetlb_lock);
+	return ret;
+}
+
 void putback_active_hugepage(struct page *page)
 {
 	spin_lock_irq(&hugetlb_lock);
--- a/mm/memory-failure.c~mmhwpoison-fix-race-with-hugetlb-page-allocation
+++ a/mm/memory-failure.c
@@ -949,6 +949,17 @@ static int page_action(struct page_state
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }

+/*
+ * Return true if a page type of a given page is supported by hwpoison
+ * mechanism (while handling could fail), otherwise false.  This function
+ * does not return true for hugetlb or device memory pages, so it's assumed
+ * to be called only in the context where we never have such pages.
+ */
+static inline bool HWPoisonHandlable(struct page *page)
+{
+	return PageLRU(page) || __PageMovable(page);
+}
+
 /**
  * __get_hwpoison_page() - Get refcount for memory error handling:
  * @page:	raw error page (hit by memory error)
@@ -959,8 +970,22 @@ static int page_action(struct page_state
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
+	int ret = 0;
+	bool hugetlb = false;
+
+	ret = get_hwpoison_huge_page(head, &hugetlb);
+	if (hugetlb)
+		return ret;
+
+	/*
+	 * This check prevents from calling get_hwpoison_unless_zero()
+	 * for any unsupported type of page in order to reduce the risk of
+	 * unexpected races caused by taking a page refcount.
+	 */
+	if (!HWPoisonHandlable(head))
+		return 0;

-	if (!PageHuge(head) && PageTransHuge(head)) {
+	if (PageTransHuge(head)) {
 		/*
 		 * Non anonymous thp exists only in allocation/free time. We
 		 * can't handle such a case correctly, so let's give it up.
@@ -1017,7 +1042,7 @@ try_again:
 			ret = -EIO;
 		}
 	} else {
-		if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
+		if (PageHuge(p) || HWPoisonHandlable(p)) {
 			ret = 1;
 		} else {
 			/*
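
[Editor's illustration] The fix above hinges on one pattern: a bare speculative
refcount grab (get_page_unless_zero()) can succeed on a page whose refcount is
only transiently non-zero, while the same grab done after a state check under
hugetlb_lock cannot.  A minimal userspace sketch of that difference follows;
fake_page, fake_hugetlb_lock and on_free_list are illustrative stand-ins, not
kernel types (compile with -pthread):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct fake_page {
            atomic_int refcount;
            bool on_free_list;  /* stand-in for HPageFreed()/HPageMigratable() */
    };

    static pthread_mutex_t fake_hugetlb_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Analogue of get_page_unless_zero(): bare check-and-increment. */
    static bool get_unless_zero(struct fake_page *p)
    {
            int old = atomic_load(&p->refcount);

            while (old != 0)
                    if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
                            return true;
            return false;
    }

    /* Analogue of get_hwpoison_huge_page(): only take a reference once the
     * page is in a recognized stable state, checked under the lock. */
    static bool get_checked(struct fake_page *p)
    {
            bool ret = false;

            pthread_mutex_lock(&fake_hugetlb_lock);
            if (p->on_free_list)
                    ret = get_unless_zero(p);
            pthread_mutex_unlock(&fake_hugetlb_lock);
            return ret;
    }

    int main(void)
    {
            /* Mid-initialization: transiently referenced, not yet enqueued. */
            struct fake_page page = { .refcount = 1, .on_free_list = false };

            printf("bare grab mid-init:    %d  <- the racy grab succeeds\n",
                   get_unless_zero(&page));

            atomic_store(&page.refcount, 1);
            printf("checked grab mid-init: %d  <- rejected under the lock\n",
                   get_checked(&page));

            page.on_free_list = true;  /* now a real free huge page */
            printf("checked grab enqueued: %d\n", get_checked(&page));
            return 0;
    }
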
From patchwork Wed Jun 16 01:23:16 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12323665
Date: Tue, 15 Jun 2021 18:23:16 -0700
From: Andrew Morton
Subject: [patch 02/18] mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare
Message-ID: <20210616012316.0gJpsRaYU%akpm@linux-foundation.org>
In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org>

From: Peter Xu
Subject: mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare

I found by pure code review that pte_same_as_swp() in unuse_vma() does
not take the uffd-wp bit into account when comparing ptes.
pte_same_as_swp() returning a false negative can cause swapoff to fail
to restore swap ptes that were write-protected by userfaultfd.

Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Peter Xu
Acked-by: Hugh Dickins
Cc: Andrea Arcangeli
Cc: <stable@vger.kernel.org>	[5.7+]
Signed-off-by: Andrew Morton
---

 include/linux/swapops.h |   15 +++++++++++----
 mm/swapfile.c           |    2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

--- a/include/linux/swapops.h~mm-swap-fix-pte_same_as_swp-not-removing-uffd-wp-bit-when-compare
+++ a/include/linux/swapops.h
@@ -23,6 +23,16 @@
 #define SWP_TYPE_SHIFT	(BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
 #define SWP_OFFSET_MASK	((1UL << SWP_TYPE_SHIFT) - 1)

+/* Clear all flags but only keep swp_entry_t related information */
+static inline pte_t pte_swp_clear_flags(pte_t pte)
+{
+	if (pte_swp_soft_dirty(pte))
+		pte = pte_swp_clear_soft_dirty(pte);
+	if (pte_swp_uffd_wp(pte))
+		pte = pte_swp_clear_uffd_wp(pte);
+	return pte;
+}
+
 /*
  * Store a type+offset into a swp_entry_t in an arch-independent format
  */
@@ -66,10 +76,7 @@ static inline swp_entry_t pte_to_swp_ent
 {
 	swp_entry_t arch_entry;

-	if (pte_swp_soft_dirty(pte))
-		pte = pte_swp_clear_soft_dirty(pte);
-	if (pte_swp_uffd_wp(pte))
-		pte = pte_swp_clear_uffd_wp(pte);
+	pte = pte_swp_clear_flags(pte);
 	arch_entry = __pte_to_swp_entry(pte);
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
--- a/mm/swapfile.c~mm-swap-fix-pte_same_as_swp-not-removing-uffd-wp-bit-when-compare
+++ a/mm/swapfile.c
@@ -1900,7 +1900,7 @@ unsigned int count_swap_pages(int type,

 static inline int pte_same_as_swp(pte_t pte, pte_t swp_pte)
 {
-	return pte_same(pte_swp_clear_soft_dirty(pte), swp_pte);
+	return pte_same(pte_swp_clear_flags(pte), swp_pte);
 }

 /*
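
[Editor's illustration] To see the false negative concretely, here is a
self-contained model of the comparison.  This is a sketch: the pte layout and
flag bit positions are invented for illustration; the real encodings are
per-architecture:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy swap pte: low bits hold the swp_entry_t payload (type+offset),
     * plus two software flag bits at made-up positions. */
    #define SWP_SOFT_DIRTY  (1ull << 62)
    #define SWP_UFFD_WP     (1ull << 63)
    #define SWP_FLAGS       (SWP_SOFT_DIRTY | SWP_UFFD_WP)

    typedef uint64_t pte_t;

    static pte_t pte_swp_clear_flags(pte_t pte)
    {
            return pte & ~SWP_FLAGS;
    }

    /* Buggy: only soft-dirty is stripped, so a uffd-wp'd pte never matches. */
    static bool pte_same_as_swp_old(pte_t pte, pte_t swp_pte)
    {
            return (pte & ~SWP_SOFT_DIRTY) == swp_pte;
    }

    /* Fixed: strip all software flags before comparing payloads. */
    static bool pte_same_as_swp_new(pte_t pte, pte_t swp_pte)
    {
            return pte_swp_clear_flags(pte) == swp_pte;
    }

    int main(void)
    {
            pte_t entry = 0x1234;                     /* swp_entry payload */
            pte_t in_pagetable = entry | SWP_UFFD_WP; /* wr-protected by uffd */

            printf("old compare: %d  <- false negative, swapoff skips it\n",
                   pte_same_as_swp_old(in_pagetable, entry));
            printf("new compare: %d\n",
                   pte_same_as_swp_new(in_pagetable, entry));
            return 0;
    }
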
From patchwork Wed Jun 16 01:23:19 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12323667
Date: Tue, 15 Jun 2021 18:23:19 -0700
From: Andrew Morton
Subject: [patch 03/18] mm/slub: clarify verification reporting
Message-ID: <20210616012319.QR2BB6iJq%akpm@linux-foundation.org>
In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org>

From: Kees Cook
Subject: mm/slub: clarify verification reporting

Patch series "Actually fix freelist pointer vs redzoning", v4.

This fixes redzoning vs the freelist pointer (both for middle-position
and very small caches).  Both are "theoretical" fixes, in that I see no
evidence of such small-sized caches actually being used in the kernel,
but that's no reason to let the bugs continue to exist, especially since
people doing local development keep tripping over them. :)

This patch (of 3):

Instead of repeating "Redzone" and "Poison", clarify which sides of
those zones got tripped.  Additionally fix column alignment in the
trailer.

Before:

  BUG test (Tainted: G    B            ): Redzone overwritten
  ...
  Redzone (____ptrval____): bb bb bb bb bb bb bb bb  ........
  Object (____ptrval____): f6 f4 a5 40 1d e8         ...@..
  Redzone (____ptrval____): 1a aa                    ..
  Padding (____ptrval____): 00 00 00 00 00 00 00 00  ........

After:

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  ...
  Redzone  (____ptrval____): bb bb bb bb bb bb bb bb  ........
  Object   (____ptrval____): f6 f4 a5 40 1d e8        ...@..
  Redzone  (____ptrval____): 1a aa                    ..
  Padding  (____ptrval____): 00 00 00 00 00 00 00 00  ........

The earlier commits that slowly resulted in the "Before" reporting were:

  d86bd1bece6f ("mm/slub: support left redzone")
  ffc79d288000 ("slub: use print_hex_dump")
  2492268472e7 ("SLUB: change error reporting format to follow lockdep loosely")

Link: https://lkml.kernel.org/r/20210608183955.280836-1-keescook@chromium.org
Link: https://lkml.kernel.org/r/20210608183955.280836-2-keescook@chromium.org
Link: https://lore.kernel.org/lkml/cfdb11d7-fb8e-e578-c939-f7f5fb69a6bd@suse.cz/
Signed-off-by: Kees Cook
Acked-by: Vlastimil Babka
Cc: Marco Elver
Cc: "Lin, Zhenpeng"
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Roman Gushchin
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 Documentation/vm/slub.rst |   10 +++++-----
 mm/slub.c                 |   14 +++++++-------
 2 files changed, 12 insertions(+), 12 deletions(-)

--- a/Documentation/vm/slub.rst~mm-slub-clarify-verification-reporting
+++ a/Documentation/vm/slub.rst
@@ -181,7 +181,7 @@ SLUB Debug output
 Here is a sample of slub debug output::

  ====================================================================
- BUG kmalloc-8: Redzone overwritten
+ BUG kmalloc-8: Right Redzone overwritten
 --------------------------------------------------------------------

 INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
@@ -189,10 +189,10 @@ Here is a sample of slub debug output::
 INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554

- Bytes b4 0xc90f6d10:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
- Object 0xc90f6d20:  31 30 31 39 2e 30 30 35                         1019.005
- Redzone 0xc90f6d28:  00 cc cc cc                                     .
- Padding 0xc90f6d50:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
+ Bytes b4 (0xc90f6d10): 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
+ Object (0xc90f6d20): 31 30 31 39 2e 30 30 35                         1019.005
+ Redzone (0xc90f6d28): 00 cc cc cc                                     .
+ Padding (0xc90f6d50): 5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ

 [] dump_trace+0x63/0x1eb
 [] show_trace_log_lvl+0x1a/0x2f
--- a/mm/slub.c~mm-slub-clarify-verification-reporting
+++ a/mm/slub.c
@@ -712,15 +712,15 @@ static void print_trailer(struct kmem_ca
 	       p, p - addr, get_freepointer(s, p));

 	if (s->flags & SLAB_RED_ZONE)
-		print_section(KERN_ERR, "Redzone ", p - s->red_left_pad,
+		print_section(KERN_ERR, "Redzone  ", p - s->red_left_pad,
 			      s->red_left_pad);
 	else if (p > addr + 16)
 		print_section(KERN_ERR, "Bytes b4 ", p - 16, 16);

-	print_section(KERN_ERR, "Object ", p,
+	print_section(KERN_ERR, "Object   ", p,
 		      min_t(unsigned int, s->object_size, PAGE_SIZE));
 	if (s->flags & SLAB_RED_ZONE)
-		print_section(KERN_ERR, "Redzone ", p + s->object_size,
+		print_section(KERN_ERR, "Redzone  ", p + s->object_size,
 			      s->inuse - s->object_size);

 	off = get_info_end(s);
@@ -732,7 +732,7 @@ static void print_trailer(struct kmem_ca

 	if (off != size_from_object(s))
 		/* Beginning of the filler is the free pointer */
-		print_section(KERN_ERR, "Padding ", p + off,
+		print_section(KERN_ERR, "Padding  ", p + off,
 			      size_from_object(s) - off);

 	dump_stack();
@@ -909,11 +909,11 @@ static int check_object(struct kmem_cach
 	u8 *endobject = object + s->object_size;

 	if (s->flags & SLAB_RED_ZONE) {
-		if (!check_bytes_and_report(s, page, object, "Redzone",
+		if (!check_bytes_and_report(s, page, object, "Left Redzone",
 			object - s->red_left_pad, val, s->red_left_pad))
 			return 0;

-		if (!check_bytes_and_report(s, page, object, "Redzone",
+		if (!check_bytes_and_report(s, page, object, "Right Redzone",
 			endobject, val, s->inuse - s->object_size))
 			return 0;
 	} else {
@@ -928,7 +928,7 @@ static int check_object(struct kmem_cach
 	if (val != SLUB_RED_ACTIVE && (s->flags & __OBJECT_POISON) &&
 		(!check_bytes_and_report(s, page, p, "Poison", p,
 			POISON_FREE, s->object_size - 1) ||
-		 !check_bytes_and_report(s, page, p, "Poison",
+		 !check_bytes_and_report(s, page, p, "End Poison",
 			p + s->object_size - 1, POISON_END, 1)))
 		return 0;
 	/*
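
[Editor's illustration] The regions the new report names sit around each object
slot like this.  A small sketch printing the byte ranges; the field names
loosely follow struct kmem_cache and the numbers are illustrative, not what a
real kmalloc-8 cache uses:

    #include <stdio.h>

    struct cache { unsigned int red_left_pad, object_size, inuse; };

    static void describe(const struct cache *s, unsigned int obj)
    {
            printf("Left Redzone : bytes [%u, %u)\n", obj - s->red_left_pad, obj);
            printf("Object       : bytes [%u, %u)\n", obj, obj + s->object_size);
            printf("Right Redzone: bytes [%u, %u)\n",
                   obj + s->object_size, obj + s->inuse);
    }

    int main(void)
    {
            struct cache c = { .red_left_pad = 16, .object_size = 8, .inuse = 16 };

            describe(&c, 16);   /* the object starts after the left redzone */
            return 0;
    }
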
From patchwork Wed Jun 16 01:23:22 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12323669
Date: Tue, 15 Jun 2021 18:23:22 -0700
From: Andrew Morton
Subject: [patch 04/18] mm/slub: fix redzoning for small allocations
Message-ID: <20210616012322.mtEbkr1Dt%akpm@linux-foundation.org>
In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org>

From: Kees Cook
Subject: mm/slub: fix redzoning for small allocations

The redzone area for SLUB exists between s->object_size and s->inuse
(which is at least the word-aligned object_size).  If a cache were
created with an object_size smaller than sizeof(void *), the in-object
stored freelist pointer would overwrite the redzone (e.g. with boot
param "slub_debug=ZF"):

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  -----------------------------------------------------------------------------

  INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
  INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
  INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620

  Redzone (____ptrval____): bb bb bb bb bb bb bb bb  ........
  Object (____ptrval____): f6 f4 a5 40 1d e8         ...@..
  Redzone (____ptrval____): 1a aa                    ..
  Padding (____ptrval____): 00 00 00 00 00 00 00 00  ........

Store the freelist pointer out of line when object_size is smaller than
sizeof(void *) and redzoning is enabled.

Additionally remove the "smaller than sizeof(void *)" check under
CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
SLAB and SLOB both handle small sizes.

(Note that no caches within this size range are known to exist in the
kernel currently.)

Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
Fixes: 81819f0fc828 ("SLUB core")
Signed-off-by: Kees Cook
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: "Lin, Zhenpeng"
Cc: Marco Elver
Cc: Pekka Enberg
Cc: Roman Gushchin
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/slab_common.c |    3 +--
 mm/slub.c        |    8 +++++---
 2 files changed, 6 insertions(+), 5 deletions(-)

--- a/mm/slab_common.c~mm-slub-fix-redzoning-for-small-allocations
+++ a/mm/slab_common.c
@@ -97,8 +97,7 @@ EXPORT_SYMBOL(kmem_cache_size);
 #ifdef CONFIG_DEBUG_VM
 static int kmem_cache_sanity_check(const char *name, unsigned int size)
 {
-	if (!name || in_interrupt() || size < sizeof(void *) ||
-	    size > KMALLOC_MAX_SIZE) {
+	if (!name || in_interrupt() || size > KMALLOC_MAX_SIZE) {
 		pr_err("kmem_cache_create(%s) integrity check failed\n", name);
 		return -EINVAL;
 	}
--- a/mm/slub.c~mm-slub-fix-redzoning-for-small-allocations
+++ a/mm/slub.c
@@ -3734,15 +3734,17 @@ static int calculate_sizes(struct kmem_c
 	 */
 	s->inuse = size;

-	if (((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
-		s->ctor)) {
+	if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
+	    ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) ||
+	    s->ctor) {
 		/*
 		 * Relocate free pointer after the object if it is not
 		 * permitted to overwrite the first word of the object on
 		 * kmem_cache_free.
 		 *
 		 * This is the case if we do RCU, have a constructor or
-		 * destructor or are poisoning the objects.
+		 * destructor, are poisoning the objects, or are
+		 * redzoning an object smaller than sizeof(void *).
 		 *
 		 * The assumption that s->offset >= s->inuse means free
 		 * pointer is outside of the object is used in the
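
[Editor's illustration] The overlap for a tiny cache is easy to compute by
hand.  A sketch with illustrative 64-bit numbers; this is arithmetic about the
layout, not kernel code:

    #include <stdio.h>

    int main(void)
    {
            unsigned int object_size = 2;   /* smaller than sizeof(void *) */
            unsigned int inuse = 8;         /* object_size, word aligned */
            unsigned int fp_start = 0;      /* in-object freelist pointer */
            unsigned int fp_end = fp_start + sizeof(void *);

            printf("right redzone: bytes [%u, %u)\n", object_size, inuse);
            printf("freelist ptr : bytes [%u, %u)%s\n", fp_start, fp_end,
                   fp_end > object_size ? "  <- overlaps the redzone" : "");
            printf("fix: store the pointer out of line, at offset %u\n", inuse);
            return 0;
    }
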
From patchwork Wed Jun 16 01:23:26 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12323671
Date: Tue, 15 Jun 2021 18:23:26 -0700
From: Andrew Morton
Subject: [patch 05/18] mm/slub: actually fix freelist pointer vs redzoning
Message-ID: <20210616012326.IlNBEvZ__%akpm@linux-foundation.org>
In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org>

From: Kees Cook
Subject: mm/slub: actually fix freelist pointer vs redzoning

It turns out that SLUB redzoning ("slub_debug=Z") checks from
s->object_size rather than from s->inuse (which is normally bumped to
make room for the freelist pointer), so a cache created with an object
size less than 24 would have the freelist pointer written beyond
s->object_size, causing the redzone to be corrupted by the freelist
pointer.  This was very visible with "slub_debug=ZF":

  BUG test (Tainted: G    B            ): Right Redzone overwritten
  -----------------------------------------------------------------------------

  INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
  INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
  INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620

  Redzone (____ptrval____): bb bb bb bb bb bb bb bb  ........
  Object (____ptrval____): 00 00 00 00 00 f6 f4 a5   ........
  Redzone (____ptrval____): 40 1d e8 1a aa           @....
  Padding (____ptrval____): 00 00 00 00 00 00 00 00  ........

Adjust the offset to stay within s->object_size.

(Note that no caches in this size range are known to exist in the
kernel currently.)

Link: https://lkml.kernel.org/r/20210608183955.280836-4-keescook@chromium.org
Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/
Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/
Fixes: 89b83f282d8b ("slub: avoid redzone when choosing freepointer location")
Link: https://lore.kernel.org/lkml/CANpmjNOwZ5VpKQn+SYWovTkFB4VsT-RPwyENBmaK0dLcpqStkA@mail.gmail.com
Signed-off-by: Kees Cook
Reported-by: Marco Elver
Reported-by: "Lin, Zhenpeng"
Tested-by: Marco Elver
Acked-by: Vlastimil Babka
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Pekka Enberg
Cc: Roman Gushchin
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/slub.c |   14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

--- a/mm/slub.c~mm-slub-actually-fix-freelist-pointer-vs-redzoning
+++ a/mm/slub.c
@@ -3689,7 +3689,6 @@ static int calculate_sizes(struct kmem_c
 {
 	slab_flags_t flags = s->flags;
 	unsigned int size = s->object_size;
-	unsigned int freepointer_area;
 	unsigned int order;

 	/*
@@ -3698,13 +3697,6 @@ static int calculate_sizes(struct kmem_c
 	 * the possible location of the free pointer.
 	 */
 	size = ALIGN(size, sizeof(void *));
-	/*
-	 * This is the area of the object where a freepointer can be
-	 * safely written. If redzoning adds more to the inuse size, we
-	 * can't use that portion for writing the freepointer, so
-	 * s->offset must be limited within this for the general case.
-	 */
-	freepointer_area = size;

 #ifdef CONFIG_SLUB_DEBUG
 	/*
@@ -3730,7 +3722,7 @@ static int calculate_sizes(struct kmem_c

 	/*
 	 * With that we have determined the number of bytes in actual use
-	 * by the object. This is the potential offset to the free pointer.
+	 * by the object and redzoning.
 	 */
 	s->inuse = size;

@@ -3753,13 +3745,13 @@ static int calculate_sizes(struct kmem_c
 		 */
 		s->offset = size;
 		size += sizeof(void *);
-	} else if (freepointer_area > sizeof(void *)) {
+	} else {
 		/*
 		 * Store freelist pointer near middle of object to keep
 		 * it away from the edges of the object to avoid small
 		 * sized over/underflows from neighboring allocations.
 		 */
-		s->offset = ALIGN(freepointer_area / 2, sizeof(void *));
+		s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
 	}

 #ifdef CONFIG_SLUB_DEBUG
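
[Editor's illustration] The off-by-alignment is easiest to see with concrete
numbers.  A worked example of the two formulas for a 20-byte redzoned object on
a 64-bit machine (a sketch, not kernel code):

    #include <stdio.h>

    #define ALIGN(x, a)       (((x) + (a) - 1) & ~((a) - 1))
    #define ALIGN_DOWN(x, a)  ((x) & ~((a) - 1))

    int main(void)
    {
            unsigned int object_size = 20;      /* redzoned cache, < 24 */
            unsigned int ptr = sizeof(void *);  /* 8 on 64-bit */
            unsigned int area = ALIGN(object_size, ptr);       /* 24 */

            unsigned int old = ALIGN(area / 2, ptr);           /* 16 */
            unsigned int new = ALIGN_DOWN(object_size / 2, ptr); /* 8 */

            printf("old: fp at [%u, %u) - last %u byte(s) land in the redzone\n",
                   old, old + ptr, old + ptr - object_size);
            printf("new: fp at [%u, %u) - inside the %u-byte object\n",
                   new, new + ptr, object_size);
            return 0;
    }
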
From patchwork Wed Jun 16 01:23:29 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 12323673
Date: Tue, 15 Jun 2021 18:23:29 -0700
From: Andrew Morton
Subject: [patch 06/18] mm/hugetlb: expand restore_reserve_on_error functionality
Message-ID: <20210616012329.TuTIi9oFo%akpm@linux-foundation.org>
In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org>

From: Mike Kravetz
Subject: mm/hugetlb: expand restore_reserve_on_error functionality

The routine restore_reserve_on_error is called to restore reservation
information when an error occurs after page allocation.  The routine
alloc_huge_page modifies the mapping reserve map and potentially the
reserve count during allocation.  If code calling alloc_huge_page
encounters an error after allocation and needs to free the page, the
reservation information needs to be adjusted.

Currently, restore_reserve_on_error only takes action on pages for which
the reserve count was adjusted (HPageRestoreReserve flag).  There is
nothing wrong with these adjustments.  However, alloc_huge_page ALWAYS
modifies the reserve map during allocation even if the reserve count is
not adjusted.  This can cause issues as observed during development of
this patch [1].

One specific series of operations causing an issue is:

- Create a shared hugetlb mapping
  Reservations for all pages created by default
- Fault in a page in the mapping
  Reservation exists so reservation count is decremented
- Punch a hole in the file/mapping at index previously faulted
  Reservation and any associated pages will be removed
- Allocate a page to fill the hole
  No reservation entry, so reserve count unmodified
  Reservation entry added to map by alloc_huge_page
- Error after allocation and before instantiating the page
  Reservation entry remains in map
- Allocate a page to fill the hole
  Reservation entry exists, so decrement reservation count

This will cause a reservation count underflow as the reservation count
was decremented twice for the same index.  A user would observe a very
large number for HugePages_Rsvd in /proc/meminfo.  This would also
likely cause subsequent allocations of hugetlb pages to fail as it
would 'appear' that all pages are reserved.  (A toy model of this
sequence is shown after the patch.)

This sequence of operations is unlikely to happen; however, it was
easily reproduced and observed using hacked up code as described in [1].

Address the issue by having the routine restore_reserve_on_error take
action on pages where HPageRestoreReserve is not set.  In this case, we
need to remove any reserve map entry created by alloc_huge_page.  A new
helper routine vma_del_reservation assists with this operation.

There are three callers of alloc_huge_page which do not currently call
restore_reserve_on_error before freeing a page on error paths.  Add
those missing calls.

[1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/

Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com
Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths")
Signed-off-by: Mike Kravetz
Reviewed-by: Mina Almasry
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: Muchun Song
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 fs/hugetlbfs/inode.c    |    1 
 include/linux/hugetlb.h |    2 
 mm/hugetlb.c            |  122 ++++++++++++++++++++++++++++++--------
 3 files changed, 101 insertions(+), 24 deletions(-)

--- a/fs/hugetlbfs/inode.c~mm-hugetlb-expand-restore_reserve_on_error-functionality
+++ a/fs/hugetlbfs/inode.c
@@ -735,6 +735,7 @@ static long hugetlbfs_fallocate(struct f
 		__SetPageUptodate(page);
 		error = huge_add_to_page_cache(page, mapping, index);
 		if (unlikely(error)) {
+			restore_reserve_on_error(h, &pseudo_vma, addr, page);
 			put_page(page);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out;
--- a/include/linux/hugetlb.h~mm-hugetlb-expand-restore_reserve_on_error-functionality
+++ a/include/linux/hugetlb.h
@@ -610,6 +610,8 @@ struct page *alloc_huge_page_vma(struct
 				unsigned long address);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);
+void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
+				unsigned long address, struct page *page);

 /* arch callback */
 int __init __alloc_bootmem_huge_page(struct hstate *h);
--- a/mm/hugetlb.c~mm-hugetlb-expand-restore_reserve_on_error-functionality
+++ a/mm/hugetlb.c
@@ -2121,12 +2121,18 @@ out:
  * be restored when a newly allocated huge page must be freed.  It is
  * to be called after calling vma_needs_reservation to determine if a
  * reservation exists.
+ *
+ * vma_del_reservation is used in error paths where an entry in the reserve
+ * map was created during huge page allocation and must be removed.  It is to
+ * be called after calling vma_needs_reservation to determine if a reservation
+ * exists.
  */
 enum vma_resv_mode {
 	VMA_NEEDS_RESV,
 	VMA_COMMIT_RESV,
 	VMA_END_RESV,
 	VMA_ADD_RESV,
+	VMA_DEL_RESV,
 };
 static long __vma_reservation_common(struct hstate *h,
 				struct vm_area_struct *vma, unsigned long addr,
@@ -2170,11 +2176,21 @@ static long __vma_reservation_common(str
 			ret = region_del(resv, idx, idx + 1);
 		}
 		break;
+	case VMA_DEL_RESV:
+		if (vma->vm_flags & VM_MAYSHARE) {
+			region_abort(resv, idx, idx + 1, 1);
+			ret = region_del(resv, idx, idx + 1);
+		} else {
+			ret = region_add(resv, idx, idx + 1, 1, NULL, NULL);
+			/* region_add calls of range 1 should never fail. */
+			VM_BUG_ON(ret < 0);
+		}
+		break;
 	default:
 		BUG();
 	}

-	if (vma->vm_flags & VM_MAYSHARE)
+	if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV)
 		return ret;
 	/*
 	 * We know private mapping must have HPAGE_RESV_OWNER set.
@@ -2222,25 +2238,39 @@ static long vma_add_reservation(struct h
 	return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV);
 }

+static long vma_del_reservation(struct hstate *h,
+		struct vm_area_struct *vma, unsigned long addr)
+{
+	return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV);
+}
+
 /*
- * This routine is called to restore a reservation on error paths.  In the
- * specific error paths, a huge page was allocated (via alloc_huge_page)
- * and is about to be freed.  If a reservation for the page existed,
- * alloc_huge_page would have consumed the reservation and set
- * HPageRestoreReserve in the newly allocated page.  When the page is freed
- * via free_huge_page, the global reservation count will be incremented if
- * HPageRestoreReserve is set.  However, free_huge_page can not adjust the
- * reserve map.  Adjust the reserve map here to be consistent with global
- * reserve count adjustments to be made by free_huge_page.
- */
-static void restore_reserve_on_error(struct hstate *h,
-			struct vm_area_struct *vma, unsigned long address,
-			struct page *page)
+ * This routine is called to restore reservation information on error paths.
+ * It should ONLY be called for pages allocated via alloc_huge_page(), and
+ * the hugetlb mutex should remain held when calling this routine.
+ *
+ * It handles two specific cases:
+ * 1) A reservation was in place and the page consumed the reservation.
+ *    HPageRestoreReserve is set in the page.
+ * 2) No reservation was in place for the page, so HPageRestoreReserve is
+ *    not set.  However, alloc_huge_page always updates the reserve map.
+ *
+ * In case 1, free_huge_page later in the error path will increment the
+ * global reserve count.  But, free_huge_page does not have enough context
+ * to adjust the reservation map.  This case deals primarily with private
+ * mappings.  Adjust the reserve map here to be consistent with global
+ * reserve count adjustments to be made by free_huge_page.  Make sure the
+ * reserve map indicates there is a reservation present.
+ *
+ * In case 2, simply undo reserve map modifications done by alloc_huge_page.
+ */
+void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
+			unsigned long address, struct page *page)
 {
-	if (unlikely(HPageRestoreReserve(page))) {
-		long rc = vma_needs_reservation(h, vma, address);
+	long rc = vma_needs_reservation(h, vma, address);

-		if (unlikely(rc < 0)) {
+	if (HPageRestoreReserve(page)) {
+		if (unlikely(rc < 0))
 			/*
 			 * Rare out of memory condition in reserve map
 			 * manipulation.  Clear HPageRestoreReserve so that
@@ -2253,16 +2283,57 @@ static void restore_reserve_on_error(str
 			 * accounting of reserve counts.
 			 */
 			ClearHPageRestoreReserve(page);
-	} else if (rc) {
-		rc = vma_add_reservation(h, vma, address);
-		if (unlikely(rc < 0))
+		else if (rc)
+			(void)vma_add_reservation(h, vma, address);
+		else
+			vma_end_reservation(h, vma, address);
+	} else {
+		if (!rc) {
+			/*
+			 * This indicates there is an entry in the reserve map
+			 * added by alloc_huge_page.  We know it was added
+			 * before the alloc_huge_page call, otherwise
+			 * HPageRestoreReserve would be set on the page.
+			 * Remove the entry so that a subsequent allocation
+			 * does not consume a reservation.
+			 */
+			rc = vma_del_reservation(h, vma, address);
+			if (rc < 0)
 				/*
-				 * See above comment about rare out of
-				 * memory condition.
+				 * VERY rare out of memory condition.  Since
+				 * we can not delete the entry, set
+				 * HPageRestoreReserve so that the reserve
+				 * count will be incremented when the page
+				 * is freed.  This reserve will be consumed
+				 * on a subsequent allocation.
 				 */
-			ClearHPageRestoreReserve(page);
+				SetHPageRestoreReserve(page);
+		} else if (rc < 0) {
+			/*
+			 * Rare out of memory condition from
+			 * vma_needs_reservation call.  Memory allocation is
+			 * only attempted if a new entry is needed.  Therefore,
+			 * this implies there is not an entry in the
+			 * reserve map.
+			 *
+			 * For shared mappings, no entry in the map indicates
+			 * no reservation.  We are done.
+			 */
+			if (!(vma->vm_flags & VM_MAYSHARE))
+				/*
+				 * For private mappings, no entry indicates
+				 * a reservation is present.  Since we can
+				 * not add an entry, set SetHPageRestoreReserve
+				 * on the page so reserve count will be
+				 * incremented when freed.  This reserve will
+				 * be consumed on a subsequent allocation.
+				 */
+				SetHPageRestoreReserve(page);
 		} else
-			vma_end_reservation(h, vma, address);
+			/*
+			 * No reservation present, do nothing
+			 */
+			vma_end_reservation(h, vma, address);
+	}
 }

@@ -4037,6 +4108,8 @@ again:
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 		if (!pte_same(src_pte_old, entry)) {
+			restore_reserve_on_error(h, vma, addr,
+						new);
 			put_page(new);
 			/* dst_entry won't change as in child */
 			goto again;
@@ -5006,6 +5079,7 @@ out_release_unlock:
 	if (vm_shared || is_continue)
 		unlock_page(page);
 out_release_nounlock:
+	restore_reserve_on_error(h, dst_vma, dst_addr, page);
 	put_page(page);
 	goto out;
 }
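
[Editor's illustration] The underflow sequence from the changelog reduces to a
dozen lines.  A toy model, not the kernel's data structures: one map entry per
file index (entry == reservation exists) plus a global reserve counter; set
fixed = false to reproduce the double decrement:

    #include <stdbool.h>
    #include <stdio.h>

    static bool resv_map[8];
    static long resv_count;

    static void alloc_huge_page(int idx)
    {
            if (resv_map[idx])
                    resv_count--;           /* consume the reservation */
            resv_map[idx] = true;           /* map entry added regardless */
    }

    static void punch_hole(int idx) { resv_map[idx] = false; }

    int main(void)
    {
            int idx = 0;
            bool fixed = true;              /* apply the patch's cleanup? */

            resv_map[idx] = true;           /* shared mapping: reserved by default */
            resv_count = 1;

            alloc_huge_page(idx);           /* fault in: count 1 -> 0 */
            punch_hole(idx);                /* drops page and reservation */

            alloc_huge_page(idx);           /* refill attempt adds a map entry */
            /* ...allocation fails after this point; the page is freed... */
            if (fixed)
                    resv_map[idx] = false;  /* vma_del_reservation() analogue */

            alloc_huge_page(idx);           /* retry the refill */
            printf("resv_count = %ld%s\n", resv_count,
                   resv_count < 0 ? "  <- underflow" : "");
            return 0;
    }
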
Since we can + * not add an entry, set SetHPageRestoreReserve + * on the page so reserve count will be + * incremented when freed. This reserve will + * be consumed on a subsequent allocation. + */ + SetHPageRestoreReserve(page); } else - vma_end_reservation(h, vma, address); + /* + * No reservation present, do nothing + */ + vma_end_reservation(h, vma, address); } } @@ -4037,6 +4108,8 @@ again: spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { + restore_reserve_on_error(h, vma, addr, + new); put_page(new); /* dst_entry won't change as in child */ goto again; @@ -5006,6 +5079,7 @@ out_release_unlock: if (vm_shared || is_continue) unlock_page(page); out_release_nounlock: + restore_reserve_on_error(h, dst_vma, dst_addr, page); put_page(page); goto out; } From patchwork Wed Jun 16 01:23:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12323675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2DC56C48BE8 for ; Wed, 16 Jun 2021 01:23:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D6B7B61159 for ; Wed, 16 Jun 2021 01:23:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D6B7B61159 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7ABE06B0075; Tue, 15 Jun 2021 21:23:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7337D6B0078; Tue, 15 Jun 2021 21:23:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 587506B007B; Tue, 15 Jun 2021 21:23:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0162.hostedemail.com [216.40.44.162]) by kanga.kvack.org (Postfix) with ESMTP id 22CC46B0075 for ; Tue, 15 Jun 2021 21:23:35 -0400 (EDT) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id ADE0BE0BD for ; Wed, 16 Jun 2021 01:23:34 +0000 (UTC) X-FDA: 78257839548.06.2F36DA2 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf30.hostedemail.com (Postfix) with ESMTP id DA735E000253 for ; Wed, 16 Jun 2021 01:23:27 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 2F71861356; Wed, 16 Jun 2021 01:23:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1623806613; bh=5vvt6dQdqnMn/3BBdU1MAi2Tx/HWOlAus5gQOe/kOzk=; h=Date:From:To:Subject:In-Reply-To:From; b=evOYMYHXYKi+i48c5dsWTJ3ni9YkSpYksM8CaCSOGOUTFO+CErKsr4FndzJNJtG+q 94JuIs8bj5z+6Kko5Vl8T72wHRLey8yh784TS53dT7CIbilqvN060vSUGFVkzUcSNZ tfVrpI5FXc8PMG10wuH13CdYx0/dYb4e0uzrYwuc= Date: Tue, 15 Jun 2021 18:23:32 -0700 From: Andrew Morton To: akpm@linux-foundation.org, 
jack@suse.cz, linux-mm@kvack.org, mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de, torvalds@linux-foundation.org, tytso@mit.edu, yangerkun@huawei.com, yukuai3@huawei.com Subject: [patch 07/18] mm/memory-failure: make sure wait for page writeback in memory_failure Message-ID: <20210616012332.bPPCzBRm4%akpm@linux-foundation.org> In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=evOYMYHX; dmarc=none; spf=pass (imf30.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam02 X-Stat-Signature: wca9grntb6g55txu6berkdkejjs5jnf6 X-Rspamd-Queue-Id: DA735E000253 X-HE-Tag: 1623806607-951152 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: yangerkun Subject: mm/memory-failure: make sure wait for page writeback in memory_failure Our syzkaller trigger the "BUG_ON(!list_empty(&inode->i_wb_list))" in clear_inode: [ 292.016156] ------------[ cut here ]------------ [ 292.017144] kernel BUG at fs/inode.c:519! [ 292.017860] Internal error: Oops - BUG: 0 [#1] SMP [ 292.018741] Dumping ftrace buffer: [ 292.019577] (ftrace buffer empty) [ 292.020430] Modules linked in: [ 292.021748] Process syz-executor.0 (pid: 249, stack limit = 0x00000000a12409d7) [ 292.023719] CPU: 1 PID: 249 Comm: syz-executor.0 Not tainted 4.19.95 [ 292.025206] Hardware name: linux,dummy-virt (DT) [ 292.026176] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 292.027244] pc : clear_inode+0x280/0x2a8 [ 292.028045] lr : clear_inode+0x280/0x2a8 [ 292.028877] sp : ffff8003366c7950 [ 292.029582] x29: ffff8003366c7950 x28: 0000000000000000 [ 292.030570] x27: ffff80032b5f4708 x26: ffff80032b5f4678 [ 292.031863] x25: ffff80036ae6b300 x24: ffff8003689254d0 [ 292.032902] x23: ffff80036ae69d80 x22: 0000000000033cc8 [ 292.033928] x21: 0000000000000000 x20: ffff80032b5f47a0 [ 292.034941] x19: ffff80032b5f4678 x18: 0000000000000000 [ 292.035958] x17: 0000000000000000 x16: 0000000000000000 [ 292.037102] x15: 0000000000000000 x14: 0000000000000000 [ 292.038103] x13: 0000000000000004 x12: 0000000000000000 [ 292.039137] x11: 1ffff00066cd8f52 x10: 1ffff00066cd8ec8 [ 292.040216] x9 : dfff200000000000 x8 : ffff10006ac1e86a [ 292.041432] x7 : dfff200000000000 x6 : ffff100066cd8f1e [ 292.042516] x5 : dfff200000000000 x4 : ffff80032b5f47a0 [ 292.043525] x3 : ffff200008000000 x2 : ffff200009867000 [ 292.044560] x1 : ffff8003366bb000 x0 : 0000000000000000 [ 292.045569] Call trace: [ 292.046083] clear_inode+0x280/0x2a8 [ 292.046828] ext4_clear_inode+0x38/0xe8 [ 292.047593] ext4_free_inode+0x130/0xc68 [ 292.048383] ext4_evict_inode+0xb20/0xcb8 [ 292.049162] evict+0x1a8/0x3c0 [ 292.049761] iput+0x344/0x460 [ 292.050350] do_unlinkat+0x260/0x410 [ 292.051042] __arm64_sys_unlinkat+0x6c/0xc0 [ 292.051846] el0_svc_common+0xdc/0x3b0 [ 292.052570] el0_svc_handler+0xf8/0x160 [ 292.053303] el0_svc+0x10/0x218 [ 292.053908] Code: 9413f4a9 d503201f f90017b6 97f4d5b1 (d4210000) [ 292.055471] ---[ end trace 01b339dd07795f8d ]--- [ 292.056443] Kernel panic - not syncing: Fatal exception [ 292.057488] SMP: stopping secondary CPUs [ 292.058419] Dumping ftrace buffer: [ 292.059078] (ftrace buffer empty) [ 292.059756] Kernel Offset: disabled [ 292.060443] CPU features: 0x10,a1006000 [ 
[ 292.061195] Memory Limit: none
[ 292.061794] Rebooting in 86400 seconds..

The crash dump shows that something called __munlock_pagevec() to clear the page's LRU flag without holding lock_page:

 #0 [ffff80035f02f4c0] __switch_to at ffff20000808d020
 #1 [ffff80035f02f4f0] __schedule at ffff20000985102c
 #2 [ffff80035f02f5e0] schedule at ffff200009851d1c
 #3 [ffff80035f02f600] io_schedule at ffff2000098525c0
 #4 [ffff80035f02f620] __lock_page at ffff20000842d2d4
 #5 [ffff80035f02f710] __munlock_pagevec at ffff2000084c4600
 #6 [ffff80035f02f870] munlock_vma_pages_range at ffff2000084c5928
 #7 [ffff80035f02fa60] do_munmap at ffff2000084cbdf4
 #8 [ffff80035f02faf0] mmap_region at ffff2000084ce20c
 #9 [ffff80035f02fb90] do_mmap at ffff2000084cf018

So memory_failure() will call identify_page_state() without wait_on_page_writeback(). And once truncate_error_page() has cleared the mapping of this page, end_page_writeback() won't call sb_clear_inode_writeback() to clear inode->i_wb_list. That triggers the BUG_ON in clear_inode()!

Fix it by checking PageWriteback too, to help decide whether we should skip wait_on_page_writeback().

Link: https://lkml.kernel.org/r/20210604084705.3729204-1-yangerkun@huawei.com
Fixes: 0bc1f8b0682c ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU")
Signed-off-by: yangerkun
Acked-by: Naoya Horiguchi
Cc: Jan Kara
Cc: Theodore Ts'o
Cc: Oscar Salvador
Cc: Yu Kuai
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/mm/memory-failure.c~mm-memory-failure-make-sure-wait-for-page-writeback-in-memory_failure
+++ a/mm/memory-failure.c
@@ -1552,7 +1552,12 @@ try_again:
 		return 0;
 	}

-	if (!PageTransTail(p) && !PageLRU(p))
+	/*
+	 * __munlock_pagevec may clear a writeback page's LRU flag without
+	 * page_lock. We need to wait for writeback completion for this page
+	 * or it may trigger a vfs BUG while evicting the inode.
+	 */
+	if (!PageTransTail(p) && !PageLRU(p) && !PageWriteback(p))
 		goto identify_page_state;
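The vfs invariant being protected here is the pairing between writeback start and completion: a page under writeback keeps its inode on the superblock's writeback list, and only the completion path takes it off again. A rough model of that pairing, simplified from mm/page-writeback.c and fs/fs-writeback.c (a sketch, not the actual kernel functions):

/* Simplified model of the invariant the fix preserves. */
static void start_writeback_model(struct page *page)
{
	SetPageWriteback(page);
	/* puts the inode on its superblock's s_inodes_wb list */
	sb_mark_inode_writeback(page->mapping->host);
}

static void end_writeback_model(struct page *page)
{
	ClearPageWriteback(page);
	/*
	 * If memory_failure() already truncated the page, page->mapping
	 * is NULL here and the inode is never taken off the list again:
	 * exactly the state that clear_inode()'s BUG_ON catches.
	 */
	if (page->mapping)
		sb_clear_inode_writeback(page->mapping->host);
}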
From patchwork Wed Jun 16 01:23:36 2021
Date: Tue, 15 Jun 2021 18:23:36 -0700
From: Andrew Morton
Subject: [patch 08/18] crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo

From: Pingfan Liu
Subject: crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo

As mentioned in kernel commit 1d50e5d0c505 ("crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), SECTION_SIZE_BITS is used in the formula:

    #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

Besides SECTIONS_SHIFT, SECTION_SIZE_BITS is also used to calculate PAGES_PER_SECTION in makedumpfile, just like the kernel. Unfortunately, this arch-dependent macro SECTION_SIZE_BITS changes, e.g. recently in kernel commit f0b13ee23241 ("arm64/sparsemem: reduce SECTION_SIZE_BITS"). But user space wants a stable interface to get this info, and it is impossible to deduce it from a crashdump vmcore. Hence append SECTION_SIZE_BITS to vmcoreinfo.

Link: https://lkml.kernel.org/r/20210608103359.84907-1-kernelfans@gmail.com
Link: http://lists.infradead.org/pipermail/kexec/2021-June/022676.html
Signed-off-by: Pingfan Liu
Acked-by: Baoquan He
Cc: Bhupesh Sharma
Cc: Kazuhito Hagio
Cc: Dave Young
Cc: Boris Petkov
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: James Morse
Cc: Mark Rutland
Cc: Will Deacon
Cc: Catalin Marinas
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Dave Anderson
Cc:
Signed-off-by: Andrew Morton
---

 kernel/crash_core.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/crash_core.c~crash_core-vmcoreinfo-append-section_size_bits-to-vmcoreinfo
+++ a/kernel/crash_core.c
@@ -464,6 +464,7 @@ static int __init crash_save_vmcoreinfo_
 	VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
 	VMCOREINFO_STRUCT_SIZE(mem_section);
 	VMCOREINFO_OFFSET(mem_section, section_mem_map);
+	VMCOREINFO_NUMBER(SECTION_SIZE_BITS);
 	VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
 #endif
 	VMCOREINFO_STRUCT_SIZE(page);
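For a vmcore consumer such as makedumpfile, the two exported numbers pin down the sparsemem geometry completely. A small, self-contained illustration of the arithmetic (the sample values are assumptions for the example, not part of the patch):

#include <stdio.h>

int main(void)
{
	/* values as read from a vmcoreinfo note; assumed for illustration */
	unsigned long section_size_bits = 27;	/* arm64 after f0b13ee23241 */
	unsigned long max_physmem_bits  = 48;
	unsigned long page_shift        = 12;	/* 4K pages */

	/* the kernel's sparsemem formulas, recomputed in user space */
	unsigned long sections_shift = max_physmem_bits - section_size_bits;
	unsigned long pages_per_section = 1UL << (section_size_bits - page_shift);

	printf("SECTIONS_SHIFT=%lu PAGES_PER_SECTION=%lu\n",
	       sections_shift, pages_per_section);
	return 0;
}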
From patchwork Wed Jun 16 01:23:39 2021
Date: Tue, 15 Jun 2021 18:23:39 -0700
From: Andrew Morton
Subject: [patch 09/18] mm/slub.c: include swab.h

From: Andrew Morton
Subject: mm/slub.c: include swab.h

Fixes the build with CONFIG_SLAB_FREELIST_HARDENED=y. Hopefully. But it's the right thing to do anyway.
Fixes: 1ad53d9fa3f61 ("slub: improve bit diffusion for freelist ptr obfuscation")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213417
Reported-by:
Acked-by: Kees Cook
Cc:
Signed-off-by: Andrew Morton
---

 mm/slub.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/slub.c~mm-slubc-include-swabh
+++ a/mm/slub.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/bit_spinlock.h>
 #include <linux/interrupt.h>
+#include <linux/swab.h>
 #include <linux/bitops.h>
 #include <linux/slab.h>
 #include "slab.h"
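The dependency comes from the hardened freelist pointer obfuscation that commit 1ad53d9fa3f61 introduced: the stored pointer is mixed with a byte-swapped copy of its own address, and swab() is declared in the header being added. A condensed sketch of that scheme (simplified from mm/slub.c; the real helper also applies kasan_reset_tag() to the address first):

#ifdef CONFIG_SLAB_FREELIST_HARDENED
/*
 * Obfuscate a freelist pointer before storing it.  swab() byte-swaps
 * ptr_addr so that the low, frequently-changing address bits land on
 * the high bits of the XOR mask: the "bit diffusion" in the commit
 * title.
 */
static inline void *freelist_ptr(const struct kmem_cache *s, void *ptr,
				 unsigned long ptr_addr)
{
	return (void *)((unsigned long)ptr ^ s->random ^ swab(ptr_addr));
}
#endif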
From patchwork Wed Jun 16 01:23:42 2021
Date: Tue, 15 Jun 2021 18:23:42 -0700
From: Andrew Morton
Subject: [patch 10/18] mm, thp: use head page in __migration_entry_wait()

From: Xu Yu
Subject: mm, thp: use head page in __migration_entry_wait()

We noticed that a hung task can happen in a corner-case but practical scenario when CONFIG_PREEMPT_NONE is enabled, as follows.

Process 0                      Process 1                  Process 2..Inf
split_huge_page_to_list
    unmap_page
        split_huge_pmd_address
                               __migration_entry_wait(head)
                                                          __migration_entry_wait(tail)
        remap_page (roll back)
            remove_migration_ptes
                rmap_walk_anon
                    cond_resched

Here __migration_entry_wait(tail) occurs in kernel space, e.g. via copy_to_user() in fstat(), which will immediately fault again without rescheduling, and thus occupy the CPU fully. When there are too many processes performing __migration_entry_wait() on the tail page, remap_page() will never get to run after cond_resched().

This patch makes __migration_entry_wait() operate on the compound head page, and thus wait for remap_page() to complete, whether the THP is split successfully or rolled back.

Note that put_and_wait_on_page_locked() helps to drop the page reference acquired with get_page_unless_zero(), as soon as the page is on the wait queue, before actually waiting. So splitting the THP is only prevented for a brief interval.

Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.1623144009.git.xuyu@linux.alibaba.com
Fixes: ba98828088ad ("thp: add option to setup migration entries during PMD split")
Suggested-by: Hugh Dickins
Signed-off-by: Gang Deng
Signed-off-by: Xu Yu
Acked-by: Kirill A. Shutemov
Acked-by: Hugh Dickins
Cc: Matthew Wilcox
Cc:
Signed-off-by: Andrew Morton
---

 mm/migrate.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/migrate.c~mm-thp-use-head-page-in-__migration_entry_wait
+++ a/mm/migrate.c
@@ -295,6 +295,7 @@ void __migration_entry_wait(struct mm_st
 		goto out;

 	page = migration_entry_to_page(entry);
+	page = compound_head(page);
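The one-liner works because PG_locked, like most page flags that matter here, is only maintained on the head page of a compound page; waiting on a tail page's lock bit can never be woken. A sketch of the resulting wait path (abridged from __migration_entry_wait() in mm/migrate.c; locking setup omitted):

	/* Abridged: the wait path after the fix. */
	page = migration_entry_to_page(entry);
	page = compound_head(page);	/* PG_locked lives on the head */

	if (!get_page_unless_zero(page))
		goto out;		/* freed already: just fault again */
	pte_unmap_unlock(ptep, ptl);
	/* drops the reference once queued, so the split isn't pinned */
	put_and_wait_on_page_locked(page);
	return;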
From patchwork Wed Jun 16 01:23:45 2021
Date: Tue, 15 Jun 2021 18:23:45 -0700
From: Andrew Morton
Subject: [patch 11/18] mm/thp: fix __split_huge_pmd_locked() on shmem migration entry

From: Hugh Dickins
Subject: mm/thp: fix __split_huge_pmd_locked() on shmem migration entry

Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.

Here is a v2 batch of long-standing THP bug fixes that I had not got around to sending before, but prompted now by Wang Yugui's report
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and they have done no harm, but have *not* fixed that issue: something more is needed and I have no idea of what.

This patch (of 7):

Stressing huge tmpfs page migration racing hole punch often crashed on the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with a DEBUG_VM=y kernel; or shortly afterwards, on a bad dereference in __split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd migration entries in the non-anonymous case.

Full disclosure: those particular experiments were on a kernel with more relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the vanilla kernel: it is conceivable that stricter locking happens to avoid those cases, or makes them less likely; but __split_huge_pmd_locked() already allowed for pmd migration entries when handling anonymous THPs, so this commit brings the shmem and file THP handling into line.

And while there: use old_pmd rather than _pmd, as in the following blocks; and make it clearer to the eye that the !vma_is_anonymous() block is self-contained, making an early return after accounting for unmapping.

Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins
Cc: Kirill A. Shutemov
Cc: Yang Shi
Cc: Wang Yugui
Cc: "Matthew Wilcox (Oracle)"
Cc: Naoya Horiguchi
Cc: Alistair Popple
Cc: Ralph Campbell
Cc: Zi Yan
Cc: Miaohe Lin
Cc: Minchan Kim
Cc: Jue Wang
Cc: Peter Xu
Cc: Jan Kara
Cc: Shakeel Butt
Cc: Oscar Salvador
Cc:
Signed-off-by: Andrew Morton
---

 mm/huge_memory.c     |   27 ++++++++++++++++++---------
 mm/pgtable-generic.c |    5 ++---
 2 files changed, 20 insertions(+), 12 deletions(-)

--- a/mm/huge_memory.c~mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry
+++ a/mm/huge_memory.c
@@ -2044,7 +2044,7 @@ static void __split_huge_pmd_locked(stru
 	count_vm_event(THP_SPLIT_PMD);

 	if (!vma_is_anonymous(vma)) {
-		_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
+		old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd);
 		/*
 		 * We are going to unmap this huge page.
		 * So just go ahead and zap it
 		 */
 		zap_deposited_table(mm, pmd);
 		if (vma_is_special_huge(vma))
 			return;
-		page = pmd_page(_pmd);
-		if (!PageDirty(page) && pmd_dirty(_pmd))
-			set_page_dirty(page);
-		if (!PageReferenced(page) && pmd_young(_pmd))
-			SetPageReferenced(page);
-		page_remove_rmap(page, true);
-		put_page(page);
+		if (unlikely(is_pmd_migration_entry(old_pmd))) {
+			swp_entry_t entry;
+
+			entry = pmd_to_swp_entry(old_pmd);
+			page = migration_entry_to_page(entry);
+		} else {
+			page = pmd_page(old_pmd);
+			if (!PageDirty(page) && pmd_dirty(old_pmd))
+				set_page_dirty(page);
+			if (!PageReferenced(page) && pmd_young(old_pmd))
+				SetPageReferenced(page);
+			page_remove_rmap(page, true);
+			put_page(page);
+		}
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
-	} else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+	}
+
+	if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside

--- a/mm/pgtable-generic.c~mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry
+++ a/mm/pgtable-generic.c
@@ -135,9 +135,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_ar
 {
 	pmd_t pmd;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(!pmd_present(*pmdp));
-	/* Below assumes pmd_present() is true */
-	VM_BUG_ON(!pmd_trans_huge(*pmdp) && !pmd_devmap(*pmdp));
+	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
+		  !pmd_devmap(*pmdp));
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
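The key subtlety, isolated from the hunk above: a pmd migration entry is not present, so its pfn bits are a swap-style encoding that pmd_page() must never see. The decode idiom uses the helpers from linux/swapops.h (sketch, condensed from the diff):

	/* Sketch: recovering the page behind a non-present pmd. */
	if (unlikely(is_pmd_migration_entry(old_pmd))) {
		swp_entry_t entry = pmd_to_swp_entry(old_pmd);

		/* safe: decodes the swap-style pfn encoding */
		page = migration_entry_to_page(entry);
	} else {
		/* only valid for a present pmd */
		page = pmd_page(old_pmd);
	}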
From patchwork Wed Jun 16 01:23:49 2021
Date: Tue, 15 Jun 2021 18:23:49 -0700
From: Andrew Morton
Subject: [patch 12/18] mm/thp: make is_huge_zero_pmd() safe and quicker

From: Hugh Dickins
Subject: mm/thp: make is_huge_zero_pmd() safe and quicker

Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry.

Make it safe by adding a pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.

__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present.

Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Hugh Dickins
Acked-by: Kirill A. Shutemov
Reviewed-by: Yang Shi
Cc: Alistair Popple
Cc: Jan Kara
Cc: Jue Wang
Cc: "Matthew Wilcox (Oracle)"
Cc: Miaohe Lin
Cc: Minchan Kim
Cc: Naoya Horiguchi
Cc: Oscar Salvador
Cc: Peter Xu
Cc: Ralph Campbell
Cc: Shakeel Butt
Cc: Wang Yugui
Cc: Zi Yan
Cc:
Signed-off-by: Andrew Morton
---

 include/linux/huge_mm.h |    8 +++++++-
 mm/huge_memory.c        |    5 ++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

--- a/include/linux/huge_mm.h~mm-thp-make-is_huge_zero_pmd-safe-and-quicker
+++ a/include/linux/huge_mm.h
@@ -286,6 +286,7 @@ struct page *follow_devmap_pud(struct vm
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);

 extern struct page *huge_zero_page;
+extern unsigned long huge_zero_pfn;

 static inline bool is_huge_zero_page(struct page *page)
 {
@@ -294,7 +295,7 @@ static inline bool is_huge_zero_page(str

 static inline bool is_huge_zero_pmd(pmd_t pmd)
 {
-	return is_huge_zero_page(pmd_page(pmd));
+	return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd);
 }

 static inline bool is_huge_zero_pud(pud_t pud)
@@ -439,6 +440,11 @@ static inline bool is_huge_zero_page(str
 {
 	return false;
 }
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+	return false;
+}

 static inline bool is_huge_zero_pud(pud_t pud)
 {

--- a/mm/huge_memory.c~mm-thp-make-is_huge_zero_pmd-safe-and-quicker
+++ a/mm/huge_memory.c
@@ -62,6 +62,7 @@ static struct shrinker deferred_split_sh

 static atomic_t huge_zero_refcount;
 struct page *huge_zero_page __read_mostly;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;

 bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
@@ -98,6 +99,7 @@ retry:
 		__free_pages(zero_page, compound_order(zero_page));
 		goto retry;
 	}
+	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));

 	/* We take additional reference here. It will be put back by shrinker */
 	atomic_set(&huge_zero_refcount, 2);
@@ -147,6 +149,7 @@ static unsigned long shrink_huge_zero_pa
 	if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) {
 		struct page *zero_page = xchg(&huge_zero_page, NULL);
 		BUG_ON(zero_page == NULL);
+		WRITE_ONCE(huge_zero_pfn, ~0UL);
 		__free_pages(zero_page, compound_order(zero_page));
 		return HPAGE_PMD_NR;
 	}
@@ -2071,7 +2074,7 @@ static void __split_huge_pmd_locked(stru
 		return;
 	}

-	if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
+	if (is_huge_zero_pmd(*pmd)) {
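Two details of the new test are worth spelling out: the pfn comparison runs first against a sentinel (~0UL while no huge zero page exists), so the common miss costs one load and compare; and pmd_present() is still required because a migration entry's scrambled pfn bits could collide with the cached value. A sketch of the publish/retire protocol, with hypothetical helper names (in the patch the WRITE_ONCE calls sit directly in the allocation retry path and the shrinker):

/*
 * Sketch of the huge_zero_pfn lifecycle.  ~0UL is the "no huge zero
 * page" sentinel: it can never equal a real pmd_pfn(), so readers need
 * no lock, just READ_ONCE/WRITE_ONCE pairing with these two writers.
 */
unsigned long huge_zero_pfn __read_mostly = ~0UL;

static void publish_huge_zero(struct page *zero_page)	/* hypothetical */
{
	WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
}

static void retire_huge_zero(void)			/* hypothetical */
{
	WRITE_ONCE(huge_zero_pfn, ~0UL);	/* before __free_pages() */
}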
From patchwork Wed Jun 16 01:23:53 2021
Date: Tue, 15 Jun 2021 18:23:53 -0700
From: Andrew Morton
Subject: [patch 13/18] mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

From: Hugh Dickins
Subject: mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE(!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff, i.e. mapcount 0.

And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount.

Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page().

When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.

Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
Signed-off-by: Hugh Dickins
Cc: Alistair Popple
Cc: Jan Kara
Cc: Jue Wang
Cc: Kirill A. Shutemov
Cc: "Matthew Wilcox (Oracle)"
Cc: Miaohe Lin
Cc: Minchan Kim
Cc: Naoya Horiguchi
Cc: Oscar Salvador
Cc: Peter Xu
Cc: Ralph Campbell
Cc: Shakeel Butt
Cc: Wang Yugui
Cc: Yang Shi
Cc: Zi Yan
Cc:
Signed-off-by: Andrew Morton
---

 include/linux/rmap.h |    1 +
 mm/huge_memory.c     |    2 +-
 mm/page_vma_mapped.c |   11 +++++++++++
 mm/rmap.c            |   17 ++++++++++++++++-
 4 files changed, 29 insertions(+), 2 deletions(-)

--- a/include/linux/rmap.h~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/include/linux/rmap.h
@@ -91,6 +91,7 @@ enum ttu_flags {
 	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
 	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
+	TTU_SYNC		= 0x10,	/* avoid racy checks with PVMW_SYNC */
 	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
 	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
 					 * and caller guarantees they will

--- a/mm/huge_memory.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
 static void unmap_page(struct page *page)
 {
-	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
+	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
 	bool unmap_success;

--- a/mm/page_vma_mapped.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/page_vma_mapped.c
@@ -212,6 +212,17 @@ restart:
 			pvmw->ptl = NULL;
 		}
 	} else if (!pmd_present(pmde)) {
+		/*
+		 * If PVMW_SYNC, take and drop THP pmd lock so that we
+		 * cannot return prematurely, while zap_huge_pmd() has
+		 * cleared *pmd but not decremented compound_mapcount().
+		 */
+		if ((pvmw->flags & PVMW_SYNC) &&
+		    PageTransCompound(pvmw->page)) {
+			spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+			spin_unlock(ptl);
+		}
 		return false;
 	}
 	if (!map_pte(pvmw))

--- a/mm/rmap.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting
+++ a/mm/rmap.c
@@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;

+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	if (flags & TTU_SYNC)
+		pvmw.flags = PVMW_SYNC;
+
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		return true;

@@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enu
 	else
 		rmap_walk(page, &rwc);

-	return !page_mapcount(page) ? true : false;
+	/*
+	 * When racing against e.g. zap_pte_range() on another cpu,
+	 * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+	 * try_to_unmap() may return false when it is about to become true,
+	 * if page table locking is skipped: use TTU_SYNC to wait for that.
+	 */
+	return !page_mapcount(page);
 }
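Taken together: a splitter that must trust the unmap result now opts in explicitly, while every other try_to_unmap() caller keeps the cheap, racy check. A sketch of the opt-in call site (condensed from unmap_page() in mm/huge_memory.c; the anon-specific flag handling is omitted):

	/* Condensed from unmap_page(): the split path wants a result it
	 * can assert on, so it adds TTU_SYNC to the flag set. */
	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC |
				   TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
	bool unmap_success = try_to_unmap(page, ttu_flags);

	/* Safe now: a false result is a real failure, not a racing zap
	 * that has not finished decrementing mapcount yet. */
	VM_BUG_ON_PAGE(!unmap_success, page);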
From patchwork Wed Jun 16 01:23:56 2021
Date: Tue, 15 Jun 2021 18:23:56 -0700
From: Andrew Morton
Subject: [patch 14/18] mm/thp: fix vma_address() if virtual address below file offset

From: Hugh Dickins
Subject: mm/thp: fix vma_address() if virtual address below file offset

Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

When that BUG() was changed to a WARN(), it would later crash on the VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap().

vma_address() is usually correct, but there's a wraparound case when the vm_start address is unusually low, but vm_pgoff not so low: vma_address() chooses max(start, vma->vm_start), but that decides on the wrong address, because start has become almost ULONG_MAX.

Rewrite vma_address() to be more careful about vm_pgoff; move the VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be safely used from page_mapped_in_vma() and page_address_in_vma() too.

Add vma_address_end() to apply similar care to end address calculation, in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one(); though it raises a question of whether callers would do better to supply pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.

An irritation is that their apparent generality breaks down on KSM pages, which cannot be located by the page->index that page_to_pgoff() uses: as 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages") once discovered. I dithered over the best thing to do about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both vma_address() and vma_address_end(); though the only place in danger of using it on them was try_to_unmap_one().

Sidenote: vma_address() and vma_address_end() now use compound_nr() on a head page, instead of thp_size(): to make the right calculation on a hugetlbfs page, whether or not THPs are configured. try_to_unmap() is used on hugetlbfs pages, but perhaps the wrong calculation never mattered.

Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
Signed-off-by: Hugh Dickins
Acked-by: Kirill A. Shutemov
Cc: Alistair Popple
Cc: Jan Kara
Cc: Jue Wang
Cc: "Matthew Wilcox (Oracle)"
Cc: Miaohe Lin
Cc: Minchan Kim
Cc: Naoya Horiguchi
Cc: Oscar Salvador
Cc: Peter Xu
Cc: Ralph Campbell
Cc: Shakeel Butt
Cc: Wang Yugui
Cc: Yang Shi
Cc: Zi Yan
Cc:
Signed-off-by: Andrew Morton
---

 mm/internal.h        |   51 ++++++++++++++++++++++++++++++-----------
 mm/page_vma_mapped.c |   16 ++++--------
 mm/rmap.c            |   16 ++++++------
 3 files changed, 52 insertions(+), 31 deletions(-)

--- a/mm/internal.h~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/internal.h
@@ -384,27 +384,52 @@ static inline void mlock_migrate_page(st
 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);

 /*
- * At what user virtual address is page expected in @vma?
+ * At what user virtual address is page expected in vma?
+ * Returns -EFAULT if all of the page is outside the range of vma.
+ * If page is a compound head, the entire compound page is considered.
 */
 static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page_to_pgoff(page);
-	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	pgoff_t pgoff;
+	unsigned long address;
+
+	VM_BUG_ON_PAGE(PageKsm(page), page);	/* KSM page->index unusable */
+	pgoff = page_to_pgoff(page);
+	if (pgoff >= vma->vm_pgoff) {
+		address = vma->vm_start +
+			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+		/* Check for address beyond vma (or wrapped through 0?) */
+		if (address < vma->vm_start || address >= vma->vm_end)
+			address = -EFAULT;
+	} else if (PageHead(page) &&
+		   pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
+		/* Test above avoids possibility of wrap to 0 on 32-bit */
+		address = vma->vm_start;
+	} else {
+		address = -EFAULT;
+	}
+	return address;
 }

+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
 static inline unsigned long
-vma_address(struct page *page, struct vm_area_struct *vma)
+vma_address_end(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long start, end;
-
-	start = __vma_address(page, vma);
-	end = start + thp_size(page) - PAGE_SIZE;
-
-	/* page should be within @vma mapping range */
-	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
-
-	return max(start, vma->vm_start);
+	pgoff_t pgoff;
+	unsigned long address;
+
+	VM_BUG_ON_PAGE(PageKsm(page), page);	/* KSM page->index unusable */
+	pgoff = page_to_pgoff(page) + compound_nr(page);
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	/* Check for address beyond vma (or wrapped through 0?) */
+	if (address < vma->vm_start || address > vma->vm_end)
+		address = vma->vm_end;
+	return address;
 }

 static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,

--- a/mm/page_vma_mapped.c~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/page_vma_mapped.c
@@ -228,18 +228,18 @@ restart:
 	if (!map_pte(pvmw))
 		goto next_pte;
 	while (1) {
+		unsigned long end;
+
 		if (check_pte(pvmw))
 			return true;
next_pte:
 		/* Seek to next pte only makes sense for THP */
 		if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
 			return not_found(pvmw);
+		end = vma_address_end(pvmw->page, pvmw->vma);
 		do {
 			pvmw->address += PAGE_SIZE;
-			if (pvmw->address >= pvmw->vma->vm_end ||
-			    pvmw->address >=
-					__vma_address(pvmw->page, pvmw->vma) +
-					thp_size(pvmw->page))
+			if (pvmw->address >= end)
 				return not_found(pvmw);
 			/* Did we cross page table boundary? */
 			if (pvmw->address % PMD_SIZE == 0) {
@@ -277,14 +277,10 @@ int page_mapped_in_vma(struct page *page
 		.vma = vma,
 		.flags = PVMW_SYNC,
 	};
-	unsigned long start, end;
-
-	start = __vma_address(page, vma);
-	end = start + thp_size(page) - PAGE_SIZE;

-	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
+	pvmw.address = vma_address(page, vma);
+	if (pvmw.address == -EFAULT)
 		return 0;
-	pvmw.address = max(start, vma->vm_start);
 	if (!page_vma_mapped_walk(&pvmw))
 		return 0;
 	page_vma_mapped_walk_done(&pvmw);

--- a/mm/rmap.c~mm-thp-fix-vma_address-if-virtual-address-below-file-offset
+++ a/mm/rmap.c
@@ -707,7 +707,6 @@ static bool should_defer_flush(struct mm
 */
 unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 {
-	unsigned long address;
 	if (PageAnon(page)) {
 		struct anon_vma *page__anon_vma = page_anon_vma(page);
 		/*
@@ -722,10 +721,8 @@ unsigned long page_address_in_vma(struct
 			return -EFAULT;
 	} else
 		return -EFAULT;
-	address = __vma_address(page, vma);
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-		return -EFAULT;
-	return address;
+
+	return vma_address(page, vma);
 }

 pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)

@@ -919,7 +916,7 @@ static bool page_mkclean_one(struct page
 	 */
 	mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
 				0, vma, vma->vm_mm, address,
-				min(vma->vm_end, address + page_size(page)));
+				vma_address_end(page, vma));

 	mmu_notifier_invalidate_range_start(&range);

 	while (page_vma_mapped_walk(&pvmw)) {

@@ -1435,9 +1432,10 @@ static bool try_to_unmap_one(struct page
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
+	range.end = PageKsm(page) ?
+			address + PAGE_SIZE : vma_address_end(page, vma);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
-				address,
-				min(vma->vm_end, address + page_size(page)));
+				address, range.end);
 	if (PageHuge(page)) {
 		/*
 		 * If sharing is possible, start and end will be adjusted
 		 */

@@ -1889,6 +1887,7 @@ static void rmap_walk_anon(struct page *
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);

+		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();

 		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))

@@ -1943,6 +1942,7 @@ static void rmap_walk_file(struct page *
 			pgoff_start, pgoff_end) {
 		unsigned long address = vma_address(page, vma);

+		VM_BUG_ON_VMA(address == -EFAULT, vma);
 		cond_resched();
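The wraparound itself is easy to reproduce with plain unsigned arithmetic. A self-contained illustration with made-up numbers (a 16-page compound head mapped so that it starts below the vma's file offset):

#include <stdio.h>

int main(void)
{
	unsigned long vm_start = 0x2000;  /* unusually low vm_start */
	unsigned long vm_pgoff = 0x10;    /* but vm_pgoff not so low */
	unsigned long pgoff    = 0x08;    /* head page sits below vm_pgoff */
	unsigned long npages   = 16;      /* compound_nr() of the THP */

	/* old __vma_address(): pgoff - vm_pgoff underflows... */
	unsigned long start = vm_start + ((pgoff - vm_pgoff) << 12);
	printf("old start: %#lx (almost ULONG_MAX, so max() picks it)\n",
	       start);

	/* new vma_address(): notice the overlap and clamp to vm_start */
	if (pgoff < vm_pgoff && pgoff + npages - 1 >= vm_pgoff)
		printf("new start: %#lx\n", vm_start);
	return 0;
}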
hughd@google.com, jack@suse.cz, juew@google.com, kirill.shutemov@linux.intel.com, linmiaohe@huawei.com, linux-mm@kvack.org, minchan@kernel.org, mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de, peterx@redhat.com, rcampbell@nvidia.com, shakeelb@google.com, shy828301@gmail.com, stable@vger.kernel.org, torvalds@linux-foundation.org, wangyugui@e16-tech.com, willy@infradead.org, ziy@nvidia.com Subject: [patch 15/18] mm/thp: fix page_address_in_vma() on file THP tails Message-ID: <20210616012400.bqtmTMCeq%akpm@linux-foundation.org> In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=iiTD6G02; dmarc=none; spf=pass (imf13.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam02 X-Stat-Signature: 5az8x84aqraf68ijohfxfzpwtxjd6ues X-Rspamd-Queue-Id: BC88CE000262 X-HE-Tag: 1623806632-385537 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Jue Wang Subject: mm/thp: fix page_address_in_vma() on file THP tails Anon THP tails were already supported, but memory-failure may need to use page_address_in_vma() on file THP tails, which its page->mapping check did not permit: fix it. hughd adds: no current usage is known to hit the issue, but this does fix a subtle trap in a general helper: best fixed in stable sooner than later. Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Signed-off-by: Jue Wang Signed-off-by: Hugh Dickins Reviewed-by: Matthew Wilcox (Oracle) Reviewed-by: Yang Shi Acked-by: Kirill A. 
Shutemov Cc: Alistair Popple Cc: Jan Kara Cc: Miaohe Lin Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Peter Xu Cc: Ralph Campbell Cc: Shakeel Butt Cc: Wang Yugui Cc: Zi Yan Cc: Signed-off-by: Andrew Morton --- mm/rmap.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) --- a/mm/rmap.c~mm-thp-fix-page_address_in_vma-on-file-thp-tails +++ a/mm/rmap.c @@ -716,11 +716,11 @@ unsigned long page_address_in_vma(struct if (!vma->anon_vma || !page__anon_vma || vma->anon_vma->root != page__anon_vma->root) return -EFAULT; - } else if (page->mapping) { - if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping) - return -EFAULT; - } else + } else if (!vma->vm_file) { + return -EFAULT; + } else if (vma->vm_file->f_mapping != compound_head(page)->mapping) { return -EFAULT; + } return vma_address(page, vma); } From patchwork Wed Jun 16 01:24:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12323693 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78D73C48BE5 for ; Wed, 16 Jun 2021 01:24:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 282BE61356 for ; Wed, 16 Jun 2021 01:24:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 282BE61356 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C19B26B0083; Tue, 15 Jun 2021 21:24:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B9F5E6B0085; Tue, 15 Jun 2021 21:24:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9F1D36B0087; Tue, 15 Jun 2021 21:24:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 6CCA56B0083 for ; Tue, 15 Jun 2021 21:24:06 -0400 (EDT) Received: from smtpin39.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 14C89181AC9C6 for ; Wed, 16 Jun 2021 01:24:06 +0000 (UTC) X-FDA: 78257840892.39.CC21449 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf01.hostedemail.com (Postfix) with ESMTP id AA2195001703 for ; Wed, 16 Jun 2021 01:23:55 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 5A921613B6; Wed, 16 Jun 2021 01:24:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1623806645; bh=DDtM76yNyyywlRQfp+y/i/AKA7CDqozCENCVr4dEVks=; h=Date:From:To:Subject:In-Reply-To:From; b=UOmVE/+RBdcTRsOR7SdGjd/EnSPMc7TE+6lr6l4FQQ0s7MRI0ufO+nZz+rH4QNDT5 opCZIpE/JITT/aZj+Cvba8JL+St/+mxyY05/Ngd4cIrml6WytGrISHZ+qEjmGtWrSB BxUidap5Nx6vxd49XTvO+afs0UT3k2DRLorng9yM= Date: Tue, 15 Jun 2021 18:24:03 -0700 From: Andrew Morton To: akpm@linux-foundation.org, 
apopple@nvidia.com, hughd@google.com, jack@suse.cz, juew@google.com, kirill.shutemov@linux.intel.com, linmiaohe@huawei.com, linux-mm@kvack.org, minchan@kernel.org, mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de, peterx@redhat.com, rcampbell@nvidia.com, shakeelb@google.com, shy828301@gmail.com, stable@vger.kernel.org, torvalds@linux-foundation.org, wangyugui@e16-tech.com, willy@infradead.org, ziy@nvidia.com Subject: [patch 16/18] mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() Message-ID: <20210616012403.XsyWFoLpX%akpm@linux-foundation.org> In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="UOmVE/+R"; spf=pass (imf01.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: k4x8wybegkzo79i15hms3sfhgkejd41s X-Rspamd-Queue-Id: AA2195001703 X-Rspamd-Server: rspam06 X-HE-Tag: 1623806635-733472 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Subject: mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()

There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped.

The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but here it is applied to a single locked page. try_to_unmap() looks more suitable for a single locked page. However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte, page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside.

The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside.

Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance that we had identified the problem correctly.

This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to struct zap_details. Then zap_pmd_range() watches for this case, and does spin_unlock(ptl) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe.
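(A minimal sketch of that zap_pmd_range() wait, distilled from the hunk in the diff below; the surrounding pmd-walk loop and the details->single_page plumbing are elided, so this is an illustration rather than a drop-in:)

	/*
	 * While zapping exactly one locked compound page, finding
	 * pmd_none() may mean that a racing zap_huge_pmd() has cleared
	 * *pmd but not yet done its page_remove_rmap().  Taking and
	 * dropping the pmd lock waits out that window, so the caller
	 * cannot see the page still "mapped" after this returns.
	 */
	if (details && details->single_page &&
	    PageTransCompound(details->single_page) &&
	    next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) {
		spinlock_t *ptl = pmd_lock(tlb->mm, pmd);
		spin_unlock(ptl);
	}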
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped. Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: fc127da085c2 ("truncate: handle file thp") Signed-off-by: Hugh Dickins Acked-by: Kirill A. Shutemov Reviewed-by: Yang Shi Cc: Alistair Popple Cc: Jan Kara Cc: Jue Wang Cc: "Matthew Wilcox (Oracle)" Cc: Miaohe Lin Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Peter Xu Cc: Ralph Campbell Cc: Shakeel Butt Cc: Wang Yugui Cc: Zi Yan Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 3 +++ mm/memory.c | 41 +++++++++++++++++++++++++++++++++++++++++ mm/truncate.c | 43 +++++++++++++++++++------------------------ 3 files changed, 63 insertions(+), 24 deletions(-) --- a/include/linux/mm.h~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page +++ a/include/linux/mm.h @@ -1719,6 +1719,7 @@ struct zap_details { struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ + struct page *single_page; /* Locked page to be unmapped */ }; struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, @@ -1766,6 +1767,7 @@ extern vm_fault_t handle_mm_fault(struct extern int fixup_user_fault(struct mm_struct *mm, unsigned long address, unsigned int fault_flags, bool *unlocked); +void unmap_mapping_page(struct page *page); void unmap_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t nr, bool even_cows); void unmap_mapping_range(struct address_space *mapping, @@ -1786,6 +1788,7 @@ static inline int fixup_user_fault(struc BUG(); return -EFAULT; } +static inline void unmap_mapping_page(struct page *page) { } static inline void unmap_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t nr, bool even_cows) { } static inline void unmap_mapping_range(struct address_space *mapping, --- a/mm/memory.c~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page +++ a/mm/memory.c @@ -1361,7 +1361,18 @@ static inline unsigned long zap_pmd_rang else if (zap_huge_pmd(tlb, vma, pmd, addr)) goto next; /* fall through */ + } else if (details && details->single_page && + PageTransCompound(details->single_page) && + next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) { + spinlock_t *ptl = pmd_lock(tlb->mm, pmd); + /* + * Take and drop THP pmd lock so that we cannot return + * prematurely, while zap_huge_pmd() has cleared *pmd, + * but not yet decremented compound_mapcount(). + */ + spin_unlock(ptl); } + /* * Here there can be other concurrent MADV_DONTNEED or * trans huge page faults running, and if the pmd is @@ -3237,6 +3248,36 @@ static inline void unmap_mapping_range_t } /** + * unmap_mapping_page() - Unmap single page from processes. + * @page: The locked page to be unmapped. + * + * Unmap this page from any userspace process which still has it mmaped. + * Typically, for efficiency, the range of nearby pages has already been + * unmapped by unmap_mapping_pages() or unmap_mapping_range(). 
But once + * truncation or invalidation holds the lock on a page, it may find that + * the page has been remapped again: and then uses unmap_mapping_page() + * to unmap it finally. + */ +void unmap_mapping_page(struct page *page) +{ + struct address_space *mapping = page->mapping; + struct zap_details details = { }; + + VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON(PageTail(page)); + + details.check_mapping = mapping; + details.first_index = page->index; + details.last_index = page->index + thp_nr_pages(page) - 1; + details.single_page = page; + + i_mmap_lock_write(mapping); + if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))) + unmap_mapping_range_tree(&mapping->i_mmap, &details); + i_mmap_unlock_write(mapping); +} + +/** * unmap_mapping_pages() - Unmap pages from processes. * @mapping: The address space containing pages to be unmapped. * @start: Index of first page to be unmapped. --- a/mm/truncate.c~mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page +++ a/mm/truncate.c @@ -167,13 +167,10 @@ void do_invalidatepage(struct page *page * its lock, b) when a concurrent invalidate_mapping_pages got there first and * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space. */ -static void -truncate_cleanup_page(struct address_space *mapping, struct page *page) +static void truncate_cleanup_page(struct page *page) { - if (page_mapped(page)) { - unsigned int nr = thp_nr_pages(page); - unmap_mapping_pages(mapping, page->index, nr, false); - } + if (page_mapped(page)) + unmap_mapping_page(page); if (page_has_private(page)) do_invalidatepage(page, 0, thp_size(page)); @@ -218,7 +215,7 @@ int truncate_inode_page(struct address_s if (page->mapping != mapping) return -EIO; - truncate_cleanup_page(mapping, page); + truncate_cleanup_page(page); delete_from_page_cache(page); return 0; } @@ -325,7 +322,7 @@ void truncate_inode_pages_range(struct a index = indices[pagevec_count(&pvec) - 1] + 1; truncate_exceptional_pvec_entries(mapping, &pvec, indices); for (i = 0; i < pagevec_count(&pvec); i++) - truncate_cleanup_page(mapping, pvec.pages[i]); + truncate_cleanup_page(pvec.pages[i]); delete_from_page_cache_batch(mapping, &pvec); for (i = 0; i < pagevec_count(&pvec); i++) unlock_page(pvec.pages[i]); @@ -639,6 +636,16 @@ int invalidate_inode_pages2_range(struct continue; } + if (!did_range_unmap && page_mapped(page)) { + /* + * If page is mapped, before taking its lock, + * zap the rest of the file in one hit. + */ + unmap_mapping_pages(mapping, index, + (1 + end - index), false); + did_range_unmap = 1; + } + lock_page(page); WARN_ON(page_to_index(page) != index); if (page->mapping != mapping) { @@ -646,23 +653,11 @@ int invalidate_inode_pages2_range(struct continue; } wait_on_page_writeback(page); - if (page_mapped(page)) { - if (!did_range_unmap) { - /* - * Zap the rest of the file in one hit. 
- */ - unmap_mapping_pages(mapping, index, - (1 + end - index), false); - did_range_unmap = 1; - } else { - /* - * Just zap this page - */ - unmap_mapping_pages(mapping, index, - 1, false); - } - } + + if (page_mapped(page)) + unmap_mapping_page(page); BUG_ON(page_mapped(page)); + ret2 = do_launder_page(mapping, page); if (ret2 == 0) { if (!invalidate_complete_page2(mapping, page)) From patchwork Wed Jun 16 01:24:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12323695 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB71FC49361 for ; Wed, 16 Jun 2021 01:24:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A029D6137D for ; Wed, 16 Jun 2021 01:24:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A029D6137D Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 426D46B0085; Tue, 15 Jun 2021 21:24:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3B0A36B0087; Tue, 15 Jun 2021 21:24:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 22A836B0088; Tue, 15 Jun 2021 21:24:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0119.hostedemail.com [216.40.44.119]) by kanga.kvack.org (Postfix) with ESMTP id DFE8D6B0085 for ; Tue, 15 Jun 2021 21:24:09 -0400 (EDT) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 89B3B180AD806 for ; Wed, 16 Jun 2021 01:24:09 +0000 (UTC) X-FDA: 78257841018.15.1BA6A57 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf07.hostedemail.com (Postfix) with ESMTP id 0FE13A000240 for ; Wed, 16 Jun 2021 01:23:58 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id DF8D261356; Wed, 16 Jun 2021 01:24:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1623806648; bh=MLOw5/V1z8iHMvcwpUsf8dUxgEBAZXoa90kpwfudMfY=; h=Date:From:To:Subject:In-Reply-To:From; b=zWRBHnLR7naba5PXwzvOu2xV0Q8FApslj9zizTs3W32uSFq/6nIwFNny4bzcxTMee XPR2IS/XdQcJh1u9u5j9pcgEwEez61NWliHu6jRHeRqLYajvgDQi8mUGKWKKzYxl3O Q4YqpuaHbuYE3Pr1jnRbsivlqOMKGenhBGXhOF8Q= Date: Tue, 15 Jun 2021 18:24:07 -0700 From: Andrew Morton To: akpm@linux-foundation.org, apopple@nvidia.com, hughd@google.com, jack@suse.cz, juew@google.com, kirill.shutemov@linux.intel.com, linmiaohe@huawei.com, linux-mm@kvack.org, minchan@kernel.org, mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de, peterx@redhat.com, rcampbell@nvidia.com, shakeelb@google.com, shy828301@gmail.com, stable@vger.kernel.org, torvalds@linux-foundation.org, wangyugui@e16-tech.com, willy@infradead.org, ziy@nvidia.com Subject: [patch 
17/18] mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split Message-ID: <20210616012407.gaQGkGzcu%akpm@linux-foundation.org> In-Reply-To: <20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=zWRBHnLR; spf=pass (imf07.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: nrsxdz83c5sneqk17sucrsjmbd9dnu7e X-Rspamd-Queue-Id: 0FE13A000240 X-Rspamd-Server: rspam06 X-HE-Tag: 1623806638-451280 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yang Shi Subject: mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

While debugging the bug reported by Wang Yugui [1], we found that try_to_unmap() may fail. The first VM_BUG_ON_PAGE() just checks page_mapcount(), however, so it may miss the failure when the head page is unmapped but some other subpage is still mapped. The second DEBUG_VM BUG(), which checks total mapcount, would then catch it. This can be confusing. And since this is not a fatal issue, consolidate the two DEBUG_VM checks into a single VM_WARN_ON_ONCE_PAGE().

[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com Signed-off-by: Yang Shi Reviewed-by: Zi Yan Acked-by: Kirill A. Shutemov Signed-off-by: Hugh Dickins Cc: Alistair Popple Cc: Jan Kara Cc: Jue Wang Cc: "Matthew Wilcox (Oracle)" Cc: Miaohe Lin Cc: Minchan Kim Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Peter Xu Cc: Ralph Campbell Cc: Shakeel Butt Cc: Wang Yugui Cc: Signed-off-by: Andrew Morton --- mm/huge_memory.c | 24 +++++++----------------- 1 file changed, 7 insertions(+), 17 deletions(-) --- a/mm/huge_memory.c~mm-thp-replace-debug_vm-bug-with-vm_warn-when-unmap-fails-for-split +++ a/mm/huge_memory.c @@ -2352,15 +2352,15 @@ static void unmap_page(struct page *page { enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; - bool unmap_success; VM_BUG_ON_PAGE(!PageHead(page), page); if (PageAnon(page)) ttu_flags |= TTU_SPLIT_FREEZE; - unmap_success = try_to_unmap(page, ttu_flags); - VM_BUG_ON_PAGE(!unmap_success, page); + try_to_unmap(page, ttu_flags); + + VM_WARN_ON_ONCE_PAGE(page_mapped(page), page); } static void remap_page(struct page *page, unsigned int nr) @@ -2671,7 +2671,7 @@ int split_huge_page_to_list(struct page struct deferred_split *ds_queue = get_deferred_split_queue(head); struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; - int count, mapcount, extra_pins, ret; + int extra_pins, ret; pgoff_t end; VM_BUG_ON_PAGE(is_huge_zero_page(head), head); @@ -2730,7 +2730,6 @@ int split_huge_page_to_list(struct page } unmap_page(head); - VM_BUG_ON_PAGE(compound_mapcount(head), head); /* block interrupt reentry in xa_lock and spinlock */ local_irq_disable(); @@ -2748,9 +2747,7 @@ int split_huge_page_to_list(struct page /* Prevent deferred_split_scan() touching ->_refcount */ spin_lock(&ds_queue->split_queue_lock); - count = page_count(head); - mapcount = total_mapcount(head); - if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { + if (page_ref_freeze(head, 1 + extra_pins)) { if (!list_empty(page_deferred_list(head))) { ds_queue->split_queue_len--; list_del(page_deferred_list(head)); @@
-2770,16 +2767,9 @@ int split_huge_page_to_list(struct page __split_huge_page(page, list, end); ret = 0; } else { - if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { - pr_alert("total_mapcount: %u, page_count(): %u\n", - mapcount, count); - if (PageTail(page)) - dump_page(head, NULL); - dump_page(page, "total_mapcount(head) > 0"); - BUG(); - } spin_unlock(&ds_queue->split_queue_lock); -fail: if (mapping) +fail: + if (mapping) xa_unlock(&mapping->i_pages); local_irq_enable(); remap_page(head, thp_nr_pages(head)); From patchwork Wed Jun 16 01:24:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12323697 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F7A4C49EA2 for ; Wed, 16 Jun 2021 01:24:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D8ACD61369 for ; Wed, 16 Jun 2021 01:24:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D8ACD61369 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7B7216B0087; Tue, 15 Jun 2021 21:24:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 732306B0088; Tue, 15 Jun 2021 21:24:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 585056B0089; Tue, 15 Jun 2021 21:24:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0002.hostedemail.com [216.40.44.2]) by kanga.kvack.org (Postfix) with ESMTP id 258D96B0087 for ; Tue, 15 Jun 2021 21:24:13 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id AFEAC181AC9C6 for ; Wed, 16 Jun 2021 01:24:12 +0000 (UTC) X-FDA: 78257841144.25.688EE56 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf16.hostedemail.com (Postfix) with ESMTP id 11BD980192EE for ; Wed, 16 Jun 2021 01:24:02 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 561AE6137D; Wed, 16 Jun 2021 01:24:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1623806651; bh=Cwq6W9yiFn6kEvmMv86He3UjQESiIWQieCGZ4UAvp7k=; h=Date:From:To:Subject:In-Reply-To:From; b=KmHSEw4RecV9geYmRj6G7pJY9py/mvtjeA09XVM/F+PLPFh7xN2FaRd+6gGi2SqRU 8VB7scCyAFi0Ln1ewVtjlFhk/icaDyVG8OrF3ql7tFnsHzTnKMToSb9QzU7yuqWoJs PFZU6lSXhpLZtGWN/Bo6HxIgURWfpaCqJt+2BlWo= Date: Tue, 15 Jun 2021 18:24:10 -0700 From: Andrew Morton To: akpm@linux-foundation.org, bhe@redhat.com, k-hagio-ab@nec.com, linux-mm@kvack.org, miles.chen@mediatek.com, mm-commits@vger.kernel.org, rppt@kernel.org, torvalds@linux-foundation.org Subject: [patch 18/18] mm/sparse: fix check_usemap_section_nr warnings Message-ID: <20210616012410.BpXfvlbxU%akpm@linux-foundation.org> In-Reply-To: 
<20210615182248.9a0ba90e8e66b9f4a53c0d23@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=KmHSEw4R; spf=pass (imf16.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: hs1p7xnu1s8dsyc7o571j6wprupwc1e7 X-Rspamd-Queue-Id: 11BD980192EE X-Rspamd-Server: rspam06 X-HE-Tag: 1623806642-169658 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Miles Chen Subject: mm/sparse: fix check_usemap_section_nr warnings

I see a "virt_to_phys used for non-linear address" warning from check_usemap_section_nr() on arm64 platforms.

In the current implementation of NODE_DATA, if CONFIG_NEED_MULTIPLE_NODES=y, pglist_data is dynamically allocated and assigned to node_data[]. For example, in arch/arm64/include/asm/mmzone.h:

	extern struct pglist_data *node_data[];
	#define NODE_DATA(nid) (node_data[(nid)])

If CONFIG_NEED_MULTIPLE_NODES=n, pglist_data is defined as a global variable named "contig_page_data". For example, in include/linux/mmzone.h:

	extern struct pglist_data contig_page_data;
	#define NODE_DATA(nid) (&contig_page_data)

If CONFIG_DEBUG_VIRTUAL is not enabled, __pa() can handle both dynamically allocated linear addresses and symbol addresses. However, with CONFIG_DEBUG_VIRTUAL=y and CONFIG_NEED_MULTIPLE_NODES=n, we see the "virt_to_phys used for non-linear address" warning because &contig_page_data is not a linear address on arm64. To fix it, create a small function that handles both translations.

Warning message:

[    0.000000] ------------[ cut here ]------------
[    0.000000] virt_to_phys used for non-linear address: (____ptrval____) (contig_page_data+0x0/0x1c00)
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x58/0x68
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.13.0-rc1-00074-g1140ab592e2e #3
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
[    0.000000] pc : __virt_to_phys+0x58/0x68
[    0.000000] lr : __virt_to_phys+0x54/0x68
[    0.000000] sp : ffff800011833e70
[    0.000000] x29: ffff800011833e70 x28: 00000000418a0018 x27: 0000000000000000
[    0.000000] x26: 000000000000000a x25: ffff800011b70000 x24: ffff800011b70000
[    0.000000] x23: fffffc0001c00000 x22: ffff800011b70000 x21: 0000000047ffffb0
[    0.000000] x20: 0000000000000008 x19: ffff800011b082c0 x18: ffffffffffffffff
[    0.000000] x17: 0000000000000000 x16: ffff800011833bf9 x15: 0000000000000004
[    0.000000] x14: 0000000000000fff x13: ffff80001186a548 x12: 0000000000000000
[    0.000000] x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
[    0.000000] x8 : ffff8000115c9000 x7 : 737520737968705f x6 : ffff800011b62ef8
[    0.000000] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[    0.000000] x2 : 0000000000000000 x1 : ffff80001159585e x0 : 0000000000000058
[    0.000000] Call trace:
[    0.000000]  __virt_to_phys+0x58/0x68
[    0.000000]  check_usemap_section_nr+0x50/0xfc
[    0.000000]  sparse_init_nid+0x1ac/0x28c
[    0.000000]  sparse_init+0x1c4/0x1e0
[    0.000000]  bootmem_init+0x60/0x90
[    0.000000]  setup_arch+0x184/0x1f0
[    0.000000]  start_kernel+0x78/0x488
[    0.000000] ---[ end trace f68728a0d3053b60 ]---
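(The helper added by the diff below is the whole fix; here is a commented restatement of it, with the reasoning above folded into the comments - not a separate implementation:)

	static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat)
	{
	#ifndef CONFIG_NEED_MULTIPLE_NODES
		/*
		 * NODE_DATA(nid) is &contig_page_data, a kernel-image symbol
		 * rather than a linear-map address: only __pa_symbol() may
		 * translate it when CONFIG_DEBUG_VIRTUAL is enabled.
		 */
		return __pa_symbol(pgdat);
	#else
		/*
		 * node_data[nid] was dynamically allocated, i.e. it is a
		 * linear-map address, where plain __pa() is correct.
		 */
		return __pa(pgdat);
	#endif
	}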
Link: https://lkml.kernel.org/r/1623058729-27264-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen Cc: Mike Rapoport Cc: Baoquan He Cc: Kazu Signed-off-by: Andrew Morton --- mm/sparse.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) --- a/mm/sparse.c~mm-sparse-fix-check_usemap_section_nr-warnings +++ a/mm/sparse.c @@ -344,6 +344,15 @@ size_t mem_section_usage_size(void) return sizeof(struct mem_section_usage) + usemap_size(); } +static inline phys_addr_t pgdat_to_phys(struct pglist_data *pgdat) +{ +#ifndef CONFIG_NEED_MULTIPLE_NODES + return __pa_symbol(pgdat); +#else + return __pa(pgdat); +#endif +} + #ifdef CONFIG_MEMORY_HOTREMOVE static struct mem_section_usage * __init sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, @@ -362,7 +371,7 @@ sparse_early_usemaps_alloc_pgdat_section * from the same section as the pgdat where possible to avoid * this problem. */ - goal = __pa(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT); + goal = pgdat_to_phys(pgdat) & (PAGE_SECTION_MASK << PAGE_SHIFT); limit = goal + (1UL << PA_SECTION_SHIFT); nid = early_pfn_to_nid(goal >> PAGE_SHIFT); again: @@ -390,7 +399,7 @@ static void __init check_usemap_section_ } usemap_snr = pfn_to_section_nr(__pa(usage) >> PAGE_SHIFT); - pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); + pgdat_snr = pfn_to_section_nr(pgdat_to_phys(pgdat) >> PAGE_SHIFT); if (usemap_snr == pgdat_snr) return;