diff mbox series

[2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)

Message ID 20240501232458.3919593-3-jane.chu@oracle.com (mailing list archive)
State New
Headers show
Series Enhance soft hwpoison handling and injection | expand

Commit Message

Jane Chu May 1, 2024, 11:24 p.m. UTC
The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
a synchrous way in a sense, the injector is also a process under
test, and should it have the poisoned page mapped in its address
space, it should legitimately get killed as much as in a real UE
situation.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Miaohe Lin May 5, 2024, 7:02 a.m. UTC | #1
On 2024/5/2 7:24, Jane Chu wrote:
> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
> a synchrous way in a sense, the injector is also a process under
> test, and should it have the poisoned page mapped in its address
> space, it should legitimately get killed as much as in a real UE
> situation.

Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison?
Thanks.
.

> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 1a073fcc4c0c..eaeae5252c02 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior,
>  		} else {
>  			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				 pfn, start);
> -			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
> +			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
>  			if (ret == -EOPNOTSUPP)
>  				ret = 0;
>  		}
>
Jane Chu May 6, 2024, 7:54 p.m. UTC | #2
On 5/5/2024 12:02 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
>> a synchrous way in a sense, the injector is also a process under
>> test, and should it have the poisoned page mapped in its address
>> space, it should legitimately get killed as much as in a real UE
>> situation.
> Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison?
> Thanks.

So the first question is: Is there a need to preserve the existing 
behavior of  madvise(MADV_HWPOISON)?

The madvise(2) man page says -

        *MADV_HWPOISON *(since Linux 2.6.32)
               Poison the pages in the range specified by/addr/  and/length/
               and handle subsequent references to those pages like a
               hardware memory corruption.  This operation is available
               only for privileged (*CAP_SYS_ADMIN*) processes.  This
               operation may result in the calling process receiving a
               *SIGBUS *and the page being unmapped.

               This feature is intended for testing of memory error-
               handling code; it is available only if the kernel was
               configured with*CONFIG_MEMORY_FAILURE*.

And the impression from my reading is that: there doesn't seem to be a need.

A couple observations -
- The man page states that the calling process may receive a SIGBUS and the page being unmapped.
But the existing behavior is no SIGBUS unless MCE early kill is elected, so it doesn't quite match
the man page.
- There is 'hwpoison-inject' which behaves similar to the existing madvise(MADV_HWPOISON), that is,
soft inject without MF_ACTION_REQUIRED flag.

thanks,
-jane

> .
>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   mm/madvise.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 1a073fcc4c0c..eaeae5252c02 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior,
>>   		} else {
>>   			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>>   				 pfn, start);
>> -			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
>> +			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
>>   			if (ret == -EOPNOTSUPP)
>>   				ret = 0;
>>   		}
>>
Miaohe Lin May 8, 2024, 7:58 a.m. UTC | #3
On 2024/5/7 3:54, Jane Chu wrote:
> On 5/5/2024 12:02 AM, Miaohe Lin wrote:
> 
>> On 2024/5/2 7:24, Jane Chu wrote:
>>> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
>>> a synchrous way in a sense, the injector is also a process under
>>> test, and should it have the poisoned page mapped in its address
>>> space, it should legitimately get killed as much as in a real UE
>>> situation.
>> Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison?
>> Thanks.
> 
> So the first question is: Is there a need to preserve the existing behavior of  madvise(MADV_HWPOISON)?
> 
> The madvise(2) man page says -
> 
>        *MADV_HWPOISON *(since Linux 2.6.32)
>               Poison the pages in the range specified by/addr/  and/length/
>               and handle subsequent references to those pages like a
>               hardware memory corruption.  This operation is available
>               only for privileged (*CAP_SYS_ADMIN*) processes.  This
>               operation may result in the calling process receiving a
>               *SIGBUS *and the page being unmapped.
> 
>               This feature is intended for testing of memory error-
>               handling code; it is available only if the kernel was
>               configured with*CONFIG_MEMORY_FAILURE*.
> 
> And the impression from my reading is that: there doesn't seem to be a need.
> 
> A couple observations -
> - The man page states that the calling process may receive a SIGBUS and the page being unmapped.
> But the existing behavior is no SIGBUS unless MCE early kill is elected, so it doesn't quite match
> the man page.
> - There is 'hwpoison-inject' which behaves similar to the existing madvise(MADV_HWPOISON), that is,
> soft inject without MF_ACTION_REQUIRED flag.
> 

I tend to agree with you. It might be a good idea to add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON).
Thanks.
.
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 1a073fcc4c0c..eaeae5252c02 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1127,7 +1127,7 @@  static int madvise_inject_error(int behavior,
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				 pfn, start);
-			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
+			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
 			if (ret == -EOPNOTSUPP)
 				ret = 0;
 		}