From patchwork Tue Apr 11 09:27:41 2023
X-Patchwork-Submitter: Liu Shixin
X-Patchwork-Id: 13207212
From: Liu Shixin
To: Andrew Morton, Naoya Horiguchi, Tony Luck
CC: Miaohe Lin, Mike Kravetz, Muchun Song, Liu Shixin
Subject: [PATCH -next] mm: hwpoison: support recovery from HugePage copy-on-write faults
Date: Tue, 11 Apr 2023 17:27:41 +0800
Message-ID: <20230411092741.780679-1-liushixin2@huawei.com>
X-Mailer: git-send-email 2.25.1

Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write
faults") introduced copy_mc_user_highpage() and used it to avoid crashing
the kernel when an uncorrectable memory error is consumed while copying a
normal page as the result of a copy-on-write fault. However, that change
does not cover HugeTLB.

Support HugeTLB by using copy_mc_user_highpage() in copy_subpage() and
copy_user_gigantic_page() as well, and make copy_user_folio() return
-EHWPOISON when it runs into an uncorrectable error. The same copy path is
used by userfaultfd, which will likewise return -EHWPOISON in that case.
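
For reference, the core of the change is the per-subpage copy helper
switching to the machine-check-safe copy routine. A simplified sketch,
drawn from the mm/memory.c hunks below (not extra code to apply):

	static int copy_subpage(unsigned long addr, int idx, void *arg)
	{
		struct copy_subpage_arg *copy_arg = arg;

		/* copy_mc_user_highpage() survives an uncorrectable error */
		if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
					  addr, copy_arg->vma)) {
			/* isolate the poisoned source page ... */
			memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
			/* ... and let callers fail the fault instead of crashing */
			return -EHWPOISON;
		}
		return 0;
	}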
Signed-off-by: Liu Shixin
Acked-by: Mike Kravetz
---
 include/linux/mm.h |  6 ++---
 mm/hugetlb.c       | 19 +++++++++++----
 mm/memory.c        | 59 +++++++++++++++++++++++++++++-----------------
 3 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cb56b52fc5a2..64827c4d3818 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3714,9 +3714,9 @@ extern const struct attribute_group memory_failure_attr_group;
 extern void clear_huge_page(struct page *page,
                             unsigned long addr_hint,
                             unsigned int pages_per_huge_page);
-void copy_user_folio(struct folio *dst, struct folio *src,
-                      unsigned long addr_hint,
-                      struct vm_area_struct *vma);
+int copy_user_folio(struct folio *dst, struct folio *src,
+                     unsigned long addr_hint,
+                     struct vm_area_struct *vma);
 long copy_folio_from_user(struct folio *dst_folio,
                           const void __user *usr_src,
                           bool allow_pagefault);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index efc443a906fa..b92377aae9ae 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5142,9 +5142,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
                                 ret = PTR_ERR(new_folio);
                                 break;
                         }
-                        copy_user_folio(new_folio, page_folio(ptepage),
-                                        addr, dst_vma);
+                        ret = copy_user_folio(new_folio, page_folio(ptepage),
+                                              addr, dst_vma);
                         put_page(ptepage);
+                        if (ret) {
+                                folio_put(new_folio);
+                                break;
+                        }
 
                         /* Install the new hugetlb folio if src pte stable */
                         dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -5661,7 +5665,10 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
                 goto out_release_all;
         }
 
-        copy_user_folio(new_folio, page_folio(old_page), address, vma);
+        if (copy_user_folio(new_folio, page_folio(old_page), address, vma)) {
+                ret = VM_FAULT_HWPOISON_LARGE;
+                goto out_release_all;
+        }
         __folio_mark_uptodate(new_folio);
 
         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, haddr,
@@ -6304,9 +6311,13 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
                         *foliop = NULL;
                         goto out;
                 }
-                copy_user_folio(folio, *foliop, dst_addr, dst_vma);
+                ret = copy_user_folio(folio, *foliop, dst_addr, dst_vma);
                 folio_put(*foliop);
                 *foliop = NULL;
+                if (ret) {
+                        folio_put(folio);
+                        goto out;
+                }
         }
 
         /*
diff --git a/mm/memory.c b/mm/memory.c
index 42dd1ab5e4e6..a9ca2cd80638 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5734,12 +5734,12 @@ EXPORT_SYMBOL(__might_fault);
  * operation. The target subpage will be processed last to keep its
  * cache lines hot.
  */
-static inline void process_huge_page(
+static inline int process_huge_page(
         unsigned long addr_hint, unsigned int pages_per_huge_page,
-        void (*process_subpage)(unsigned long addr, int idx, void *arg),
+        int (*process_subpage)(unsigned long addr, int idx, void *arg),
         void *arg)
 {
-        int i, n, base, l;
+        int i, n, base, l, ret;
         unsigned long addr = addr_hint &
                 ~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
 
@@ -5753,7 +5753,9 @@ static inline void process_huge_page(
                 /* Process subpages at the end of huge page */
                 for (i = pages_per_huge_page - 1; i >= 2 * n; i--) {
                         cond_resched();
-                        process_subpage(addr + i * PAGE_SIZE, i, arg);
+                        ret = process_subpage(addr + i * PAGE_SIZE, i, arg);
+                        if (ret)
+                                return ret;
                 }
         } else {
                 /* If target subpage in second half of huge page */
@@ -5762,7 +5764,9 @@ static inline void process_huge_page(
                 /* Process subpages at the begin of huge page */
                 for (i = 0; i < base; i++) {
                         cond_resched();
-                        process_subpage(addr + i * PAGE_SIZE, i, arg);
+                        ret = process_subpage(addr + i * PAGE_SIZE, i, arg);
+                        if (ret)
+                                return ret;
                 }
         }
         /*
@@ -5774,10 +5778,15 @@ static inline void process_huge_page(
                 int right_idx = base + 2 * l - 1 - i;
 
                 cond_resched();
-                process_subpage(addr + left_idx * PAGE_SIZE, left_idx, arg);
+                ret = process_subpage(addr + left_idx * PAGE_SIZE, left_idx, arg);
+                if (ret)
+                        return ret;
                 cond_resched();
-                process_subpage(addr + right_idx * PAGE_SIZE, right_idx, arg);
+                ret = process_subpage(addr + right_idx * PAGE_SIZE, right_idx, arg);
+                if (ret)
+                        return ret;
         }
+        return 0;
 }
 
 static void clear_gigantic_page(struct page *page,
@@ -5795,11 +5804,12 @@ static void clear_gigantic_page(struct page *page,
         }
 }
 
-static void clear_subpage(unsigned long addr, int idx, void *arg)
+static int clear_subpage(unsigned long addr, int idx, void *arg)
 {
         struct page *page = arg;
 
         clear_user_highpage(page + idx, addr);
+        return 0;
 }
 
 void clear_huge_page(struct page *page,
@@ -5816,7 +5826,7 @@ void clear_huge_page(struct page *page,
         process_huge_page(addr_hint, pages_per_huge_page, clear_subpage, page);
 }
 
-static void copy_user_gigantic_page(struct folio *dst, struct folio *src,
+static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
                                      unsigned long addr,
                                      struct vm_area_struct *vma,
                                      unsigned int pages_per_huge_page)
@@ -5830,8 +5840,13 @@ static void copy_user_gigantic_page(struct folio *dst, struct folio *src,
                 src_page = folio_page(src, i);
 
                 cond_resched();
-                copy_user_highpage(dst_page, src_page, addr + i*PAGE_SIZE, vma);
+                if (copy_mc_user_highpage(dst_page, src_page,
+                                          addr + i*PAGE_SIZE, vma)) {
+                        memory_failure_queue(page_to_pfn(src_page), 0);
+                        return -EHWPOISON;
+                }
         }
+        return 0;
 }
 
 struct copy_subpage_arg {
@@ -5840,16 +5855,20 @@ struct copy_subpage_arg {
         struct vm_area_struct *vma;
 };
 
-static void copy_subpage(unsigned long addr, int idx, void *arg)
+static int copy_subpage(unsigned long addr, int idx, void *arg)
 {
         struct copy_subpage_arg *copy_arg = arg;
 
-        copy_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
-                           addr, copy_arg->vma);
+        if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
+                                  addr, copy_arg->vma)) {
+                memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
+                return -EHWPOISON;
+        }
+        return 0;
 }
 
-void copy_user_folio(struct folio *dst, struct folio *src,
-                      unsigned long addr_hint, struct vm_area_struct *vma)
+int copy_user_folio(struct folio *dst, struct folio *src,
+                     unsigned long addr_hint, struct vm_area_struct *vma)
 {
         unsigned int pages_per_huge_page = folio_nr_pages(dst);
         unsigned long addr = addr_hint &
@@ -5860,13 +5879,11 @@ void copy_user_folio(struct folio *dst, struct folio *src,
                 .vma = vma,
         };
 
-        if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
-                copy_user_gigantic_page(dst, src, addr, vma,
-                                        pages_per_huge_page);
-                return;
-        }
+        if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES))
+                return copy_user_gigantic_page(dst, src, addr, vma,
+                                               pages_per_huge_page);
 
-        process_huge_page(addr_hint, pages_per_huge_page, copy_subpage, &arg);
+        return process_huge_page(addr_hint, pages_per_huge_page, copy_subpage, &arg);
 }
 
 long copy_folio_from_user(struct folio *dst_folio,