diff mbox series

[v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

Message ID 20210201161749.0e8dc212.yaoaili@kingsoft.com (mailing list archive)
State New, archived
Headers show
Series [v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access. | expand

Commit Message

yaoaili [么爱利] Feb. 1, 2021, 8:17 a.m. UTC
When one page is already hwpoisoned by AO action, process may not be
killed, the process mapping this page may make a syscall include this
page and result to trigger a VM_FAULT_HWPOISON fault, if it's in kernel
mode it may be fixed by fixup_exception. Current code will just return
error code to user process.

This is not sufficient, we should send a SIGBUS to the process and log
the info to console, as we can't trust the process will handle the error
correctly.

Suggested-by: Feng Yang <yangfeng1@kingsoft.com>
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
---
 arch/x86/mm/fault.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

Comments

Andy Lutomirski Feb. 1, 2021, 4:58 p.m. UTC | #1
On Mon, Feb 1, 2021 at 12:17 AM Aili Yao <yaoaili@kingsoft.com> wrote:
>
> When one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, if it's in kernel
> mode it may be fixed by fixup_exception. Current code will just return
> error code to user process.
>
> This is not sufficient, we should send a SIGBUS to the process and log
> the info to console, as we can't trust the process will handle the error
> correctly.

Does this happen when one process gets SIGBUSed due to memory failure
and another process independently hits the poisoned memory?  I'm not
entirely convinced that this is a problem.

In any case, this patch needs rebasing on top of my big fault series
-- as it stands, it's way too difficult to keep track of which paths
even call your new code..  And the various signal paths need to be
consolidated -- we already have three of them, and the last thing we
need is a fourth.
Tony Luck Feb. 1, 2021, 9:10 p.m. UTC | #2
> In any case, this patch needs rebasing on top of my big fault series

Is that series in some GIT tree? Or can you give a lore.kernel.org link?

Thanks

-Tony
Andy Lutomirski Feb. 1, 2021, 10:09 p.m. UTC | #3
On Mon, Feb 1, 2021 at 1:10 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> > In any case, this patch needs rebasing on top of my big fault series
>
> Is that series in some GIT tree? Or can you give a lore.kernel.org link?

https://lore.kernel.org/lkml/cover.1612113550.git.luto@kernel.org/T/#t

>
> Thanks
>
> -Tony
yaoaili [么爱利] Feb. 2, 2021, 6:42 a.m. UTC | #4
On Mon, 1 Feb 2021 08:58:27 -0800
Andy Lutomirski <luto@kernel.org> wrote:

> On Mon, Feb 1, 2021 at 12:17 AM Aili Yao <yaoaili@kingsoft.com> wrote:
> >
> > When one page is already hwpoisoned by AO action, process may not be
> > killed, the process mapping this page may make a syscall include this
> > page and result to trigger a VM_FAULT_HWPOISON fault, if it's in kernel
> > mode it may be fixed by fixup_exception. Current code will just return
> > error code to user process.
> >
> > This is not sufficient, we should send a SIGBUS to the process and log
> > the info to console, as we can't trust the process will handle the error
> > correctly.  
> 
> Does this happen when one process gets SIGBUSed due to memory failure
> and another process independently hits the poisoned memory?  I'm not
> entirely convinced that this is a problem.
> 

OK, I will explain more, hope this will be helpful:
One page get poisoned which can be caused by at least two scenarios:
1. One user process access a address which corrupted, the memory failure() will 
be called, the function will unmap the page which contain the corrupt memory cell, the process
triggering the error will get signaled with SIGBUS. Other process sharing this page
will get its related pte marked with SWP_HWPOISON, and in early-kill case, these other processes
will also be signaled with SIGBUS, In later-kill case, It should be signaled when it touch the page
which has been poisoned. 

2.A patrol scrub UE error will also trigger the same process, page unmapped, pte being marked with 
SWP_HWPOISON. In later-kill case, the process which touch the poisoned page will trigger a page fault
and should be signaled with SIGBUS.

In this later-kill case, normally it will hit the following code:

    965 #ifdef CONFIG_MEMORY_FAILURE
    966         if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
    967                 struct task_struct *tsk = current;
    968                 unsigned lsb = 0;
    969 
    970                 pr_err(
    971         "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
    972                         tsk->comm, tsk->pid, address);
    973                 if (fault & VM_FAULT_HWPOISON_LARGE)
    974                         lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
    975                 if (fault & VM_FAULT_HWPOISON)
    976                         lsb = PAGE_SHIFT;
    977                 force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
    978                 return;
    979         }

Or there is a case that user process make a syscall including the posioned user address(assume to be ADD_A)
and make a request to copy the ADD_A content to kernel space. In such a case, it will trigger a page fault when
copying starts. As it's in kernel mode and the address is in user space, the process will hit:

    944 static void
    945 do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
    946           vm_fault_t fault)
    947 {
    948         /* Kernel mode? Handle exceptions or die: */
    949         if (!(error_code & X86_PF_USER)) {
    950                 no_context(regs, error_code, address, SIGBUS, BUS_ADRERR, fault);
    951                 return;
    952         } 

In no_context(), fixup_exception() will be called, usually the copy function in such a case will provide
one fixup function, which will return true, then no_context() return, finally the syscall will return one ERROR
code(errno=14 Bad address) to user process, which the user process won't know the where the real problem is.
From syslog, we can't guarantee memory error recovey log and the user process error will have a close correlation in
timestamp.

Previous behavior is not only for latest upstream code, but also apply to older kernel versions.

This patch is to correct this behavior by Sending SIGBUS to user process in such a scenario. This behavior
in this patch is consistent with other memory poison case.

Following is the test result:

Current 5.11 rc6:
 ./einj_mem_uc -S -c 1 -f copyin
0: copyin   vaddr = 0x7fed9808e400 paddr = 1b00c00400
./einj_mem_uc: couldn't write temp file (errno=14) 
Expected SIGBUS, didn't get one
page not present
Big surprise ... still running. Thought that would be fatal
Test passed

Current 5.11 rc6 with this patch:
./einj_mem_uc -S -c 1 -f copyin
0: copyin   vaddr = 0x7fef60e3d400 paddr = 14b7f43400
SIGBUS: addr = 0x7fef60e3d400
page not present
Big surprise ... still running. Thought that would be fatal
Test passed

And there is a small modification in the einj_mem_uc.c I missed in previous mail, which will lead the
test result unexpected.

    306 int trigger_copyin(char *addr)
    307 {

    316         if ((ret = write(copyin_fd, addr - memcpy_runup, memcpy_size)) != memcpy_size) {

    324 }


> In any case, this patch needs rebasing on top of my big fault series
> -- as it stands, it's way too difficult to keep track of which paths
> even call your new code..  And the various signal paths need to be
> consolidated -- we already have three of them, and the last thing we
> need is a fourth.

If you recognize the point listed, I will rebase the patch to your latest code.

Thanks
HORIGUCHI NAOYA(堀口 直也) Feb. 4, 2021, 7:25 a.m. UTC | #5
Hi Aili,

On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote:
> When one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, if it's in kernel
> mode it may be fixed by fixup_exception. Current code will just return
> error code to user process.
> 
> This is not sufficient, we should send a SIGBUS to the process and log
> the info to console, as we can't trust the process will handle the error
> correctly.
> 
> Suggested-by: Feng Yang <yangfeng1@kingsoft.com>
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> ---
...

> @@ -662,12 +662,32 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  		 * In this case we need to make sure we're not recursively
>  		 * faulting through the emulate_vsyscall() logic.
>  		 */
> +
> +		if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> +		    fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> +			unsigned int lsb = 0;
> +
> +			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +				current->comm, current->pid, address);
> +
> +			sanitize_error_code(address, &error_code);
> +			set_signal_archinfo(address, error_code);
> +
> +			if (fault & VM_FAULT_HWPOISON_LARGE)
> +				lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> +			if (fault & VM_FAULT_HWPOISON)
> +				lsb = PAGE_SHIFT;
> +
> +			force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);

This part contains some duplicated code with do_sigbus(), so some refactoring (like
adding a common function) would be more helpful.

Thanks,
Naoya Horiguchi
yaoaili [么爱利] Feb. 5, 2021, 5:06 a.m. UTC | #6
On Thu, 4 Feb 2021 07:25:55 +0000
HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> Hi Aili,
> 
> On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote:
> > When one page is already hwpoisoned by AO action, process may not be
> > killed, the process mapping this page may make a syscall include this
> > page and result to trigger a VM_FAULT_HWPOISON fault, if it's in kernel
> > mode it may be fixed by fixup_exception. Current code will just return
> > error code to user process.
> > 
> > This is not sufficient, we should send a SIGBUS to the process and log
> > the info to console, as we can't trust the process will handle the error
> > correctly.
> > 
> > Suggested-by: Feng Yang <yangfeng1@kingsoft.com>
> > Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> > ---  
> ...
> 
> > @@ -662,12 +662,32 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> >  		 * In this case we need to make sure we're not recursively
> >  		 * faulting through the emulate_vsyscall() logic.
> >  		 */
> > +
> > +		if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> > +		    fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> > +			unsigned int lsb = 0;
> > +
> > +			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> > +				current->comm, current->pid, address);
> > +
> > +			sanitize_error_code(address, &error_code);
> > +			set_signal_archinfo(address, error_code);
> > +
> > +			if (fault & VM_FAULT_HWPOISON_LARGE)
> > +				lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> > +			if (fault & VM_FAULT_HWPOISON)
> > +				lsb = PAGE_SHIFT;
> > +
> > +			force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);  
> 
> This part contains some duplicated code with do_sigbus(), so some refactoring (like
> adding a common function) would be more helpful.

Yes, agree, I will modify this and rebase to the big fault series from tip.

Thanks
Aili Yao
diff mbox series

Patch

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..23095b94cf42 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -631,7 +631,7 @@  static void set_signal_archinfo(unsigned long address,
 
 static noinline void
 no_context(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address, int signal, int si_code)
+	   unsigned long address, int signal, int si_code, vm_fault_t fault)
 {
 	struct task_struct *tsk = current;
 	unsigned long flags;
@@ -662,12 +662,32 @@  no_context(struct pt_regs *regs, unsigned long error_code,
 		 * In this case we need to make sure we're not recursively
 		 * faulting through the emulate_vsyscall() logic.
 		 */
+
+		if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
+		    fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
+			unsigned int lsb = 0;
+
+			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+				current->comm, current->pid, address);
+
+			sanitize_error_code(address, &error_code);
+			set_signal_archinfo(address, error_code);
+
+			if (fault & VM_FAULT_HWPOISON_LARGE)
+				lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+			if (fault & VM_FAULT_HWPOISON)
+				lsb = PAGE_SHIFT;
+
+			force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
+
+			return;
+		}
+
 		if (current->thread.sig_on_uaccess_err && signal) {
 			sanitize_error_code(address, &error_code);
 
 			set_signal_archinfo(address, error_code);
 
-			/* XXX: hwpoison faults will set the wrong code. */
 			force_sig_fault(signal, si_code, (void __user *)address);
 		}
 
@@ -836,7 +856,7 @@  __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	if (is_f00f_bug(regs, address))
 		return;
 
-	no_context(regs, error_code, address, SIGSEGV, si_code);
+	no_context(regs, error_code, address, SIGSEGV, si_code, 0);
 }
 
 static noinline void
@@ -927,7 +947,7 @@  do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 {
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & X86_PF_USER)) {
-		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR, fault);
 		return;
 	}
 
@@ -966,7 +986,7 @@  mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, vm_fault_t fault)
 {
 	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
-		no_context(regs, error_code, address, 0, 0);
+		no_context(regs, error_code, address, 0, 0, 0);
 		return;
 	}
 
@@ -974,7 +994,7 @@  mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		/* Kernel mode? Handle exceptions or die: */
 		if (!(error_code & X86_PF_USER)) {
 			no_context(regs, error_code, address,
-				   SIGSEGV, SEGV_MAPERR);
+				   SIGSEGV, SEGV_MAPERR, 0);
 			return;
 		}
 
@@ -1396,7 +1416,7 @@  void do_user_addr_fault(struct pt_regs *regs,
 	if (fault_signal_pending(fault, regs)) {
 		if (!user_mode(regs))
 			no_context(regs, hw_error_code, address, SIGBUS,
-				   BUS_ADRERR);
+				   BUS_ADRERR, 0);
 		return;
 	}