diff mbox series

mm/hwpoison: Fix error page recovered but reported "not recovered"

Message ID 20220107194450.1687264-1-tony.luck@intel.com (mailing list archive)
State New
Headers show
Series mm/hwpoison: Fix error page recovered but reported "not recovered" | expand

Commit Message

Tony Luck Jan. 7, 2022, 7:44 p.m. UTC
From: Youquan Song <youquan.song@intel.com>

When an uncorrected memory error is consumed there is a race between
the CMCI from the memory controller reporting an uncorrected error
with a UCNA signature, and the core reporting and SRAR signature
machine check when the data is about to be consumed.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure() and the machine
check processing code finds the page already poisoned. It calls
kill_accessing_process() to make sure a SIGBUS is sent. But
returns the wrong error code.

Console log looks like this:

[34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400
[34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
[34775.690310] Memory failure: 0x3710b3: already hardware poisoned
[34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
[34775.706072] mce: Memory error not recovered

Fix kill_accessing_process() to return -EHWPOISON to avoid the noise
message "Memory error not recovered" and skip duplicate SIGBUS.

[Tony: Reworded some parts of commit message]

Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

This code is very subtle ... the fix makes the "not recovered" message
go away ... but I'm not more than 75% confident that this is the right
fix. Please check carefully :-)

 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Naoya Horiguchi Jan. 12, 2022, 12:11 p.m. UTC | #1
On Fri, Jan 07, 2022 at 11:44:50AM -0800, Tony Luck wrote:
> From: Youquan Song <youquan.song@intel.com>
> 
> When an uncorrected memory error is consumed there is a race between
> the CMCI from the memory controller reporting an uncorrected error
> with a UCNA signature, and the core reporting and SRAR signature
> machine check when the data is about to be consumed.
> 
> If the CMCI wins that race, the page is marked poisoned when
> uc_decode_notifier() calls memory_failure() and the machine
> check processing code finds the page already poisoned. It calls
> kill_accessing_process() to make sure a SIGBUS is sent. But
> returns the wrong error code.
> 
> Console log looks like this:
> 
> [34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400
> [34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
> [34775.690310] Memory failure: 0x3710b3: already hardware poisoned
> [34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
> [34775.706072] mce: Memory error not recovered
> 
> Fix kill_accessing_process() to return -EHWPOISON to avoid the noise
> message "Memory error not recovered" and skip duplicate SIGBUS.
> 
> [Tony: Reworded some parts of commit message]
> 
> Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
> Signed-off-by: Youquan Song <youquan.song@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> 
> This code is very subtle ... the fix makes the "not recovered" message
> go away ... but I'm not more than 75% confident that this is the right
> fix. Please check carefully :-)
> 
>  mm/memory-failure.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3a274468f193..a67f558b08ea 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -707,7 +707,8 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
>  	if (ret == 1 && priv.tk.addr)
>  		kill_proc(&priv.tk, pfn, flags);
>  	mmap_read_unlock(p->mm);
> -	return ret ? -EFAULT : -EHWPOISON;

Thank you for reporting, the original code was wrong and the trinary operation
returns opposite code.  -EHWPOISON here is to notify kill_me_maybe() that it
does not have to send SIGBUS in its own because kill_accessing_process() already
sent SIGBUS with proper virtual address info.

> +
> +	return (ret < 0) ? -EFAULT : -EHWPOISON;

It seems to me that this returns -EHWPOISON whether any hwpoison entry is
found or not, so this fix can cause another issue.
We want to return -EHWPOISON only when kill_accessing_process() sent SIGBUS,
so I think that the following diff should be what we want.
Could you check this fix works for you?

Thanks,
Naoya Horiguchi
---
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 14ae5c18e776..4c9bd1d37301 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -707,8 +707,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 			      (void *)&priv);
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
+	else
+		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret ? -EFAULT : -EHWPOISON;
+	return ret > 0 ? -EHWPOISON : -EFAULT;
 }
 
 static const char *action_name[] = {
Tony Luck Jan. 13, 2022, 2 a.m. UTC | #2
On Wed, Jan 12, 2022 at 09:11:45PM +0900, Naoya Horiguchi wrote:
> On Fri, Jan 07, 2022 at 11:44:50AM -0800, Tony Luck wrote:
> > From: Youquan Song <youquan.song@intel.com>

> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 14ae5c18e776..4c9bd1d37301 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -707,8 +707,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
>  			      (void *)&priv);
>  	if (ret == 1 && priv.tk.addr)
>  		kill_proc(&priv.tk, pfn, flags);
> +	else
> +		ret = 0;
>  	mmap_read_unlock(p->mm);
> -	return ret ? -EFAULT : -EHWPOISON;
> +	return ret > 0 ? -EHWPOISON : -EFAULT;
>  }
>  
>  static const char *action_name[] = {

Yes. This fixes the problem (and your explanation helped
me understand this code better).

Fell free to take any words you need from the original patch
comment and switch to:

Reported-by: Youquan Song <youquan.song@intel.com>

Thanks for looking (and fixing!)

-Tony
HORIGUCHI NAOYA(堀口 直也) Jan. 13, 2022, 6:52 a.m. UTC | #3
On Wed, Jan 12, 2022 at 06:00:51PM -0800, Luck, Tony wrote:
> On Wed, Jan 12, 2022 at 09:11:45PM +0900, Naoya Horiguchi wrote:
> > On Fri, Jan 07, 2022 at 11:44:50AM -0800, Tony Luck wrote:
> > > From: Youquan Song <youquan.song@intel.com>
> 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 14ae5c18e776..4c9bd1d37301 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -707,8 +707,10 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
> >  			      (void *)&priv);
> >  	if (ret == 1 && priv.tk.addr)
> >  		kill_proc(&priv.tk, pfn, flags);
> > +	else
> > +		ret = 0;
> >  	mmap_read_unlock(p->mm);
> > -	return ret ? -EFAULT : -EHWPOISON;
> > +	return ret > 0 ? -EHWPOISON : -EFAULT;
> >  }
> >  
> >  static const char *action_name[] = {
> 
> Yes. This fixes the problem (and your explanation helped
> me understand this code better).
> 

Thank you for confirming.  I just sent v2.

> Fell free to take any words you need from the original patch
> comment and switch to:
> 
> Reported-by: Youquan Song <youquan.song@intel.com>
> 
> Thanks for looking (and fixing!)

Your welcome :)

- Naoya
diff mbox series

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3a274468f193..a67f558b08ea 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -707,7 +707,8 @@  static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
 	mmap_read_unlock(p->mm);
-	return ret ? -EFAULT : -EHWPOISON;
+
+	return (ret < 0) ? -EFAULT : -EHWPOISON;
 }
 
 static const char *action_name[] = {