Message ID | 20221123195408.135161-1-mike.kravetz@oracle.com (mailing list archive) |
---|---|
Series | hwpoison, shmem, hugetlb: fix data loss issue 5.10.y |
On 2022/11/24 AM3:54, Mike Kravetz wrote:
> This is a request for adding the following patches to stable 5.10.y.
>
> Poisoned shmem and hugetlb pages are removed from the pagecache.
> Subsequent access to the offset in the file results in a NEW zero
> filled page. Application code does not get notified of the data
> loss, and the only 'clue' is a message in the system log. Data
> loss has been experienced by real users.
>
> This was addressed upstream. Most commits were marked for backports,
> but some were not. This was discussed here [1] and here [2].
>
> Patches apply cleanly to v5.4.224 and pass tests checking for this
> specific data loss issue. LTP mm tests show no regressions.
>
> All patches except 4 "mm: hwpoison: handle non-anonymous THP correctly"
> required a small bit of change to apply correctly: mostly for context.
>
> linux-mm Cc'ed as it would be great to get at least an ACK from others
> familiar with this issue.
>
> [1] https://lore.kernel.org/linux-mm/Y2UTUNBHVY5U9si2@monkey/
> [2] https://lore.kernel.org/stable/20221114131403.GA3807058@u2004/
>
> James Houghton (1):
>   hugetlbfs: don't delete error page from pagecache
>
> Yang Shi (5):
>   mm: hwpoison: remove the unnecessary THP check
>   mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>   mm: hwpoison: refactor refcount check handling
>   mm: hwpoison: handle non-anonymous THP correctly
>   mm: shmem: don't truncate page if memory failure happens
>
>  fs/hugetlbfs/inode.c       |  13 ++--
>  include/linux/page-flags.h |  23 ++++++
>  mm/huge_memory.c           |   2 +
>  mm/hugetlb.c               |   4 +
>  mm/memory-failure.c        | 153 ++++++++++++++++++++++++-------------
>  mm/memory.c                |   9 +++
>  mm/page_alloc.c            |   4 +-
>  mm/shmem.c                 |  51 +++++++++++--
>  8 files changed, 191 insertions(+), 68 deletions(-)
>

Hi, folks

Thank you for your effort. Data loss will break the data consistency of end users and it is critical to notify users.
I applied this patch set to the 5.10.168 stable release [1] and ran the mm_regression [2] test cases, following the steps [3] provided by Naoya. All four cases passed.

#./run.sh project summary -p
Project Name: debug
PASS mm/hwpoison/shmem_link/link-hard.auto3
PASS mm/hwpoison/shmem_link/link-sym.auto3
PASS mm/hwpoison/shmem_rw/thp-always.auto3
PASS mm/hwpoison/shmem_rw/thp-never.auto3
Progress: 4 / 4 (100%)

Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>

Cheers,
Shuai

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tag/?h=v5.10.168
[2] https://github.com/nhoriguchi/mm_regression
[3] https://lore.kernel.org/stable/20221116235842.GA62826@u2004/