diff mbox series

[v2] x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy

Message ID 20230526063242.133656-1-wangkefeng.wang@huawei.com (mailing list archive)
State New
Headers show
Series [v2] x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy | expand

Commit Message

Kefeng Wang May 26, 2023, 6:32 a.m. UTC
Both EX_TYPE_FAULT_MCE_SAFE and EX_TYPE_DEFAULT_MCE_SAFE exception
fixup types are used to identify fixups which allow in kernel #MC
recovery, that is the Machine Check Safe Copy.

If an MCE which has happened in kernel space but from which the kernel
can recover, mce.kflags MCE_IN_KERNEL_RECOV will set in error_context(),
and we try to fixup the exception in do_machine_check(). But due to lack
of MCE_IN_KERNEL_COPYIN, although the kernel won't panic, the corrupted
page don't be isolated, new one maybe consume it again, which is not what
we expected.

In order to avoid above issue, some hwpoison recover process[1][2][3],
memory_failure_queue() is called to cope with such unhandled corrupted
pages, and recently coredump hwpoison recovery support[4] is asked to
do the same thing, also there are some other already existed MC-safe
copy scenarios, eg, nvdimm, dm-writecache, dax, which don't isolate
corrupted pages.

The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
then let the core do_machine_check() to isolate corrupted page instead
of doing it one-by-one.

[1] commit d302c2398ba2 ("mm, hwpoison: when copy-on-write hits poison, take page offline")
[2] commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
[3] commit 6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
[4] https://lkml.kernel.org/r/20230417045323.11054-1-wangkefeng.wang@huawei.com

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v2:
- try to describe more clear problem statement, per Dave Hansen
- collect RB

 arch/x86/kernel/cpu/mce/severity.c |  3 +--
 mm/ksm.c                           |  1 -
 mm/memory.c                        | 12 +++---------
 3 files changed, 4 insertions(+), 12 deletions(-)

Comments

Borislav Petkov May 26, 2023, 7:09 a.m. UTC | #1
On Fri, May 26, 2023 at 02:32:42PM +0800, Kefeng Wang wrote:
> The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
> then let the core do_machine_check() to isolate corrupted page instead
> of doing it one-by-one.

No, this whole thing is confused.

 * Indicates an MCE that happened in kernel space while copying data
 * from user.

#define MCE_IN_KERNEL_COPYIN

This is a very specific exception type: EX_TYPE_COPY which got added by

  278b917f8cb9 ("x86/mce: Add _ASM_EXTABLE_CPY for copy user access")

but Linus then removed all such user copy exception points in

  034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")

So now that EX_TYPE_COPY never happens.

And what you're doing is lumping the handling for
EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE together and saying
that the MCE happened while copying data from user.

And XSTATE_OP() is one example where this is not really the case.

So no, this is not correct.
Kefeng Wang May 26, 2023, 12:18 p.m. UTC | #2
On 2023/5/26 15:09, Borislav Petkov wrote:
> On Fri, May 26, 2023 at 02:32:42PM +0800, Kefeng Wang wrote:
>> The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
>> then let the core do_machine_check() to isolate corrupted page instead
>> of doing it one-by-one.
> 
> No, this whole thing is confused.
> 
>   * Indicates an MCE that happened in kernel space while copying data
>   * from user.
> 
> #define MCE_IN_KERNEL_COPYIN
> 
> This is a very specific exception type: EX_TYPE_COPY which got added by
> 
>    278b917f8cb9 ("x86/mce: Add _ASM_EXTABLE_CPY for copy user access")
> 
> but Linus then removed all such user copy exception points in
> 
>    034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")
> 
> So now that EX_TYPE_COPY never happens.

Is this broken the recover when kernel was copying from user space?

+ Youquan  could you help to check it?

> 
> And what you're doing is lumping the handling for
> EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE together and saying
> that the MCE happened while copying data from user.
> 
> And XSTATE_OP() is one example where this is not really the case.
> 

Oh, for XSTATE_OP(), it uses EX_TYPE_DEFAULT_MCE_SAFE, but I'm focus on 
EX_TYPE_DEFAULT_MCE_SAFE, which use copy_mc (arch/x86/lib/copy_mc_64.S),
like I maintained in changelog, CoW/Coredump/nvdimm/dax, they use 
copy_mc_xxx function,  sorry for mixed them up.


> So no, this is not correct.

so only add MCE_IN_KERNEL_COPYIN for EX_TYPE_DEFAULT_MCE_SAFE?

diff --git a/arch/x86/kernel/cpu/mce/severity.c 
b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..6d2587994623 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,11 +293,11 @@ static noinstr int error_context(struct mce *m, 
struct pt_regs *regs)
         case EX_TYPE_COPY:
                 if (!copy_user)
                         return IN_KERNEL;
+               fallthrough;
+       case EX_TYPE_DEFAULT_MCE_SAFE:
                 m->kflags |= MCE_IN_KERNEL_COPYIN;
                 fallthrough;
-
         case EX_TYPE_FAULT_MCE_SAFE:
-       case EX_TYPE_DEFAULT_MCE_SAFE:
                 m->kflags |= MCE_IN_KERNEL_RECOV;
                 return IN_KERNEL_RECOV;

Correct me if I am wrong, thanks for you reviewing.


>
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..63e94484c5d6 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,12 +293,11 @@  static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	case EX_TYPE_COPY:
 		if (!copy_user)
 			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
 		fallthrough;
 
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
-		m->kflags |= MCE_IN_KERNEL_RECOV;
+		m->kflags |= MCE_IN_KERNEL_RECOV | MCE_IN_KERNEL_COPYIN;
 		return IN_KERNEL_RECOV;
 
 	default:
diff --git a/mm/ksm.c b/mm/ksm.c
index 0156bded3a66..7abdf4892387 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2794,7 +2794,6 @@  struct page *ksm_might_need_to_copy(struct page *page,
 	if (new_page) {
 		if (copy_mc_user_highpage(new_page, page, address, vma)) {
 			put_page(new_page);
-			memory_failure_queue(page_to_pfn(page), 0);
 			return ERR_PTR(-EHWPOISON);
 		}
 		SetPageDirty(new_page);
diff --git a/mm/memory.c b/mm/memory.c
index 8358f3b853f2..74873e7126aa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2813,10 +2813,8 @@  static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma)) {
-			memory_failure_queue(page_to_pfn(src), 0);
+		if (copy_mc_user_highpage(dst, src, addr, vma))
 			return -EHWPOISON;
-		}
 		return 0;
 	}
 
@@ -5851,10 +5849,8 @@  static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
 
 		cond_resched();
 		if (copy_mc_user_highpage(dst_page, src_page,
-					  addr + i*PAGE_SIZE, vma)) {
-			memory_failure_queue(page_to_pfn(src_page), 0);
+					  addr + i*PAGE_SIZE, vma))
 			return -EHWPOISON;
-		}
 	}
 	return 0;
 }
@@ -5870,10 +5866,8 @@  static int copy_subpage(unsigned long addr, int idx, void *arg)
 	struct copy_subpage_arg *copy_arg = arg;
 
 	if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
-				  addr, copy_arg->vma)) {
-		memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
+				  addr, copy_arg->vma))
 		return -EHWPOISON;
-	}
 	return 0;
 }