Message ID | 20240501232458.3919593-3-jane.chu@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Enhance soft hwpoison handling and injection | expand |
On 2024/5/2 7:24, Jane Chu wrote: > The soft hwpoison injector via madvise(MADV_HWPOISON) operates in > a synchrous way in a sense, the injector is also a process under > test, and should it have the poisoned page mapped in its address > space, it should legitimately get killed as much as in a real UE > situation. Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison? Thanks. . > > Signed-off-by: Jane Chu <jane.chu@oracle.com> > --- > mm/madvise.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 1a073fcc4c0c..eaeae5252c02 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior, > } else { > pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", > pfn, start); > - ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED); > + ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED); > if (ret == -EOPNOTSUPP) > ret = 0; > } >
On 5/5/2024 12:02 AM, Miaohe Lin wrote: > On 2024/5/2 7:24, Jane Chu wrote: >> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in >> a synchrous way in a sense, the injector is also a process under >> test, and should it have the poisoned page mapped in its address >> space, it should legitimately get killed as much as in a real UE >> situation. > Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison? > Thanks. So the first question is: Is there a need to preserve the existing behavior of madvise(MADV_HWPOISON)? The madvise(2) man page says - *MADV_HWPOISON *(since Linux 2.6.32) Poison the pages in the range specified by/addr/ and/length/ and handle subsequent references to those pages like a hardware memory corruption. This operation is available only for privileged (*CAP_SYS_ADMIN*) processes. This operation may result in the calling process receiving a *SIGBUS *and the page being unmapped. This feature is intended for testing of memory error- handling code; it is available only if the kernel was configured with*CONFIG_MEMORY_FAILURE*. And the impression from my reading is that: there doesn't seem to be a need. A couple observations - - The man page states that the calling process may receive a SIGBUS and the page being unmapped. But the existing behavior is no SIGBUS unless MCE early kill is elected, so it doesn't quite match the man page. - There is 'hwpoison-inject' which behaves similar to the existing madvise(MADV_HWPOISON), that is, soft inject without MF_ACTION_REQUIRED flag. thanks, -jane > . > >> Signed-off-by: Jane Chu <jane.chu@oracle.com> >> --- >> mm/madvise.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/mm/madvise.c b/mm/madvise.c >> index 1a073fcc4c0c..eaeae5252c02 100644 >> --- a/mm/madvise.c >> +++ b/mm/madvise.c >> @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior, >> } else { >> pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", >> pfn, start); >> - ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED); >> + ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED); >> if (ret == -EOPNOTSUPP) >> ret = 0; >> } >>
On 2024/5/7 3:54, Jane Chu wrote: > On 5/5/2024 12:02 AM, Miaohe Lin wrote: > >> On 2024/5/2 7:24, Jane Chu wrote: >>> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in >>> a synchrous way in a sense, the injector is also a process under >>> test, and should it have the poisoned page mapped in its address >>> space, it should legitimately get killed as much as in a real UE >>> situation. >> Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison? >> Thanks. > > So the first question is: Is there a need to preserve the existing behavior of madvise(MADV_HWPOISON)? > > The madvise(2) man page says - > > *MADV_HWPOISON *(since Linux 2.6.32) > Poison the pages in the range specified by/addr/ and/length/ > and handle subsequent references to those pages like a > hardware memory corruption. This operation is available > only for privileged (*CAP_SYS_ADMIN*) processes. This > operation may result in the calling process receiving a > *SIGBUS *and the page being unmapped. > > This feature is intended for testing of memory error- > handling code; it is available only if the kernel was > configured with*CONFIG_MEMORY_FAILURE*. > > And the impression from my reading is that: there doesn't seem to be a need. > > A couple observations - > - The man page states that the calling process may receive a SIGBUS and the page being unmapped. > But the existing behavior is no SIGBUS unless MCE early kill is elected, so it doesn't quite match > the man page. > - There is 'hwpoison-inject' which behaves similar to the existing madvise(MADV_HWPOISON), that is, > soft inject without MF_ACTION_REQUIRED flag. > I tend to agree with you. It might be a good idea to add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON). Thanks. .
diff --git a/mm/madvise.c b/mm/madvise.c index 1a073fcc4c0c..eaeae5252c02 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior, } else { pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", pfn, start); - ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED); + ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED); if (ret == -EOPNOTSUPP) ret = 0; }
The soft hwpoison injector via madvise(MADV_HWPOISON) operates in a synchrous way in a sense, the injector is also a process under test, and should it have the poisoned page mapped in its address space, it should legitimately get killed as much as in a real UE situation. Signed-off-by: Jane Chu <jane.chu@oracle.com> --- mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)