From patchwork Fri Oct 21 20:01:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tony Luck
X-Patchwork-Id: 13015350
From: Tony Luck
To: Naoya Horiguchi, Andrew Morton
Cc: Miaohe Lin, Matthew Wilcox, Shuai Xue, Dan Williams,
 Michael Ellerman, Nicholas Piggin, Christophe Leroy,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, Tony Luck
Subject: [PATCH v3 1/2] mm, hwpoison: Try to recover from copy-on write faults
Date: Fri, 21 Oct 2022 13:01:19 -0700
Message-Id: <20221021200120.175753-2-tony.luck@intel.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221021200120.175753-1-tony.luck@intel.com>
References: <20221019170835.155381-1-tony.luck@intel.com>
 <20221021200120.175753-1-tony.luck@intel.com>

If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

   git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

   # ./einj_mem-uc -f copy-on-write

Add a new copy_mc_user_highpage() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to the caller of wp_page_copy(). This propagates up the call stack.
Both x86 and powerpc have code in their fault handler to deal with this
return code by sending a SIGBUS to the application.

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because mmap_lock is held there. Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory; the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be logged
in this way.

Reviewed-by: Dan Williams
Signed-off-by: Tony Luck
Reviewed-by: Naoya Horiguchi
Reviewed-by: Miaohe Lin
Reviewed-by: Alexander Potapenko
---

Changes in V3:
   Dan Williams
	Rename copy_user_highpage_mc() to copy_mc_user_highpage() for
	consistency with Linus' discussion on names of functions that
	check for machine check.
	Write complete functions for the have/have-not copy_mc_to_kernel
	cases (so grep shows there are two versions)
	Change __wp_page_copy_user() to return 0 for success, negative for fail
	[I picked -EAGAIN for both non-EHWPOISON cases]

Changes in V2:
   Naoya Horiguchi:
	1) Use -EHWPOISON error code instead of minus one.
	2) Poison path needs also to deal with old_page
   Tony Luck:
	Rewrote commit message
	Added some powerpc folks to Cc: list
---
 include/linux/highmem.h | 24 ++++++++++++++++++++++++
 mm/memory.c             | 30 ++++++++++++++++++++----------
 2 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index e9912da5441b..a32c64681f03 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -319,6 +319,30 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 
 #endif
 
+#ifdef copy_mc_to_kernel
+static inline int copy_mc_user_highpage(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	unsigned long ret;
+	char *vfrom, *vto;
+
+	vfrom = kmap_local_page(from);
+	vto = kmap_local_page(to);
+	ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
+	kunmap_local(vto);
+	kunmap_local(vfrom);
+
+	return ret;
+}
+#else
+static inline int copy_mc_user_highpage(struct page *to, struct page *from,
+					unsigned long vaddr, struct vm_area_struct *vma)
+{
+	copy_user_highpage(to, from, vaddr, vma);
+	return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_COPY_HIGHPAGE
 
 static inline void copy_highpage(struct page *to, struct page *from)
diff --git a/mm/memory.c b/mm/memory.c
index f88c351aecd4..b6056eef2f72 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,10 +2848,16 @@ static inline int pte_unmap_same(struct vm_fault *vmf)
 	return same;
 }
 
-static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
-				       struct vm_fault *vmf)
+/*
+ * Return:
+ *	0:		copied succeeded
+ *	-EHWPOISON:	copy failed due to hwpoison in source page
+ *	-EAGAIN:	copied failed (some other reason)
+ */
+static inline int __wp_page_copy_user(struct page *dst, struct page *src,
+				      struct vm_fault *vmf)
 {
-	bool ret;
+	int ret;
 	void *kaddr;
 	void __user *uaddr;
 	bool locked = false;
@@ -2860,8 +2866,9 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		copy_user_highpage(dst, src, addr, vma);
-		return true;
+		if (copy_mc_user_highpage(dst, src, addr, vma))
+			return -EHWPOISON;
+		return 0;
 	}
 
 	/*
@@ -2888,7 +2895,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 			 * and update local tlb only
 			 */
 			update_mmu_tlb(vma, addr, vmf->pte);
-			ret = false;
+			ret = -EAGAIN;
 			goto pte_unlock;
 		}
 
@@ -2913,7 +2920,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
 			/* The PTE changed under us, update local tlb */
 			update_mmu_tlb(vma, addr, vmf->pte);
-			ret = false;
+			ret = -EAGAIN;
 			goto pte_unlock;
 		}
 
@@ -2932,7 +2939,7 @@ static inline bool __wp_page_copy_user(struct page *dst, struct page *src,
 		}
 	}
 
-	ret = true;
+	ret = 0;
 
 pte_unlock:
 	if (locked)
@@ -3104,6 +3111,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	pte_t entry;
 	int page_copied = 0;
 	struct mmu_notifier_range range;
+	int ret;
 
 	delayacct_wpcopy_start();
 
@@ -3121,19 +3129,21 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		if (!new_page)
 			goto oom;
 
-		if (!__wp_page_copy_user(new_page, old_page, vmf)) {
+		ret = __wp_page_copy_user(new_page, old_page, vmf);
+		if (ret) {
 			/*
 			 * COW failed, if the fault was solved by other,
 			 * it's fine. If not, userspace would re-fault on
 			 * the same address and we will handle the fault
 			 * from the second attempt.
+			 * The -EHWPOISON case will not be retried.
 			 */
 			put_page(new_page);
 			if (old_page)
 				put_page(old_page);
 
 			delayacct_wpcopy_end();
-			return 0;
+			return ret == -EHWPOISON ? VM_FAULT_HWPOISON : 0;
 		}
 		kmsan_copy_page_meta(new_page, old_page);
 	}

From patchwork Fri Oct 21 20:01:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tony Luck
X-Patchwork-Id: 13015349
From: Tony Luck
To: Naoya Horiguchi, Andrew Morton
Cc: Miaohe Lin, Matthew Wilcox, Shuai Xue, Dan Williams,
 Michael Ellerman, Nicholas Piggin, Christophe Leroy,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, Tony Luck
Subject: [PATCH v3 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline
Date: Fri, 21 Oct 2022 13:01:20 -0700
Message-Id: <20221021200120.175753-3-tony.luck@intel.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221021200120.175753-1-tony.luck@intel.com>
References: <20221019170835.155381-1-tony.luck@intel.com>
 <20221021200120.175753-1-tony.luck@intel.com>

Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.

It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.

Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.
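
The deferral idea above (note the bad page now, act on it later from a
context that is allowed to take locks) can be pictured with a small
userspace sketch. This is an illustration only, not kernel code: the names
sketch_queue, sketch_queue_push(), sketch_queue_drain() and sketch_record()
are invented for this example, and the real memory_failure_queue() may
store and process its requests differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUEUE_LEN 8	/* hypothetical capacity, chosen for the example */

/* Hypothetical fixed-size ring of page frame numbers awaiting handling. */
struct sketch_queue {
	unsigned long pfn[SKETCH_QUEUE_LEN];
	size_t head;	/* index of the oldest pending entry */
	size_t count;	/* number of pending entries */
};

/* "Fault path" side: must not block, so it only records the pfn. */
static bool sketch_queue_push(struct sketch_queue *q, unsigned long pfn)
{
	assert(q != NULL);
	if (q->count == SKETCH_QUEUE_LEN)
		return false;	/* full: this sketch simply rejects the request */
	q->pfn[(q->head + q->count) % SKETCH_QUEUE_LEN] = pfn;
	q->count++;
	return true;
}

/* "Worker" side: runs later and calls handle() once per pending pfn. */
static size_t sketch_queue_drain(struct sketch_queue *q,
				 void (*handle)(unsigned long pfn))
{
	size_t handled = 0;

	assert(q != NULL && handle != NULL);
	while (q->count > 0) {
		handle(q->pfn[q->head]);
		q->head = (q->head + 1) % SKETCH_QUEUE_LEN;
		q->count--;
		handled++;
	}
	return handled;
}

/* Example handler: remembers the most recent pfn it was asked to process. */
static unsigned long sketch_last_pfn;

static void sketch_record(unsigned long pfn)
{
	sketch_last_pfn = pfn;
}
```

In this model the copy path would call sketch_queue_push() and return its
error immediately, while a separate worker calls sketch_queue_drain() once
it is safe to do the heavier work.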

Also provide a stub version for CONFIG_MEMORY_FAILURE=n

Signed-off-by: Tony Luck
Reviewed-by: Miaohe Lin
---
 include/linux/mm.h | 5 ++++-
 mm/memory.c        | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bbcccbc5565..03ced659eb58 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3268,7 +3268,6 @@ enum mf_flags {
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
-extern void memory_failure_queue(unsigned long pfn, int flags);
 extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
@@ -3277,8 +3276,12 @@ extern void shake_page(struct page *p);
 extern atomic_long_t num_poisoned_pages __read_mostly;
 extern int soft_offline_page(unsigned long pfn, int flags);
 #ifdef CONFIG_MEMORY_FAILURE
+extern void memory_failure_queue(unsigned long pfn, int flags);
 extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags);
 #else
+static inline void memory_failure_queue(unsigned long pfn, int flags)
+{
+}
 static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
 {
 	return 0;
diff --git a/mm/memory.c b/mm/memory.c
index b6056eef2f72..eae242351726 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2866,8 +2866,10 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma))
+		if (copy_mc_user_highpage(dst, src, addr, vma)) {
+			memory_failure_queue(page_to_pfn(src), 0);
 			return -EHWPOISON;
+		}
 		return 0;
 	}
 