
[v1,0/4] mm/hwpoison: Fix regressions in memory failure handling

Message ID 20250211060200.33845-1-xueshuai@linux.alibaba.com

Shuai Xue Feb. 11, 2025, 6:01 a.m. UTC
This patch series fixes three regressions in memory failure handling
that were found with ras-tools [1]:

- `./einj_mem_uc copyin -f`
- `./einj_mem_uc futex -f`
- `./einj_mem_uc instr`

The copyin and futex regressions were caused by replacing
`EX_TYPE_UACCESS` with `EX_TYPE_EFAULT_REG` in some copy-from-user
operations. The machine check severity code does not treat
`EX_TYPE_EFAULT_REG` as an in-kernel recovery context, so a memory
error hit during these copies now causes a kernel panic. The instr
regression is caused by the PTE not being marked as hwpoison, which
makes the system send unnecessary SIGBUS signals.
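To illustrate the copyin/futex failure mode, here is a simplified, hypothetical userspace model of how the severity code might classify a machine check by the exception-table fixup type of the faulting instruction. The names `FIXUP_*`, `CTX_*`, and `classify()` are invented for illustration and are not kernel APIs; they only loosely mirror `EX_TYPE_UACCESS`, `EX_TYPE_EFAULT_REG`, and the in-kernel recovery context. The `efault_reg_recoverable` flag contrasts behavior before and after the fix, assuming the fix handles `EX_TYPE_EFAULT_REG` the same way as `EX_TYPE_UACCESS`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for exception-table fixup types. */
enum fixup_type {
	FIXUP_DEFAULT,          /* no machine-check-safe fixup */
	FIXUP_UACCESS,          /* loosely models EX_TYPE_UACCESS */
	FIXUP_EFAULT_REG,       /* loosely models EX_TYPE_EFAULT_REG */
	FIXUP_DEFAULT_MCE_SAFE, /* fixup explicitly marked MCE-safe */
};

/* Hypothetical stand-ins for the resulting error context. */
enum error_ctx {
	CTX_IN_KERNEL,       /* unrecoverable in this model: panic */
	CTX_IN_KERNEL_RECOV, /* recoverable: fixup returns an error instead */
};

/*
 * Classify a machine check taken in kernel mode.
 * copy_user: the faulting access was a copy from user memory.
 * efault_reg_recoverable: false models the regression, true the fix.
 */
static enum error_ctx classify(enum fixup_type type, bool copy_user,
			       bool efault_reg_recoverable)
{
	switch (type) {
	case FIXUP_EFAULT_REG:
		if (!efault_reg_recoverable)
			return CTX_IN_KERNEL; /* regression: falls to default */
		/* fall through: after the fix, treated like FIXUP_UACCESS */
	case FIXUP_UACCESS:
		return copy_user ? CTX_IN_KERNEL_RECOV : CTX_IN_KERNEL;
	case FIXUP_DEFAULT_MCE_SAFE:
		return CTX_IN_KERNEL_RECOV;
	default:
		return CTX_IN_KERNEL;
	}
}
```

In this model, `classify(FIXUP_EFAULT_REG, true, false)` yields `CTX_IN_KERNEL`, corresponding to the panic described above, while `classify(FIXUP_EFAULT_REG, true, true)` yields `CTX_IN_KERNEL_RECOV`, matching the existing `FIXUP_UACCESS` behavior.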

Together, these fixes restore correct handling of memory errors,
preventing both the kernel panics and the unnecessary SIGBUS delivery.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

Shuai Xue (4):
  x86/mce: Collect error message for severities below MCE_PANIC_SEVERITY
  x86/mce: dump error msg from severities
  x86/mce: add EX_TYPE_EFAULT_REG as in-kernel recovery context to fix
    copy-from-user operations regression
  mm/hwpoison: Fix incorrect "not recovered" report for recovered clean
    pages

 arch/x86/kernel/cpu/mce/core.c     | 19 +++++++++++++------
 arch/x86/kernel/cpu/mce/severity.c | 21 ++++++++++++++++-----
 mm/memory-failure.c                |  5 ++---
 3 files changed, 31 insertions(+), 14 deletions(-)