Message ID | e223b0e6ba2f4924984b1917cc717bd5@honor.com (mailing list archive) |
---|---|
State | New |
Series | [v4] mm: Fix possible NULL pointer dereference in __swap_duplicate |
On Wed, Feb 19, 2025 at 2:56 PM gaoxu <gaoxu2@honor.com> wrote:
>
> Add a NULL check on the return value of swp_swap_info in __swap_duplicate
> to prevent crashes caused by NULL pointer dereference.
>
> The reason why swp_swap_info() returns NULL is unclear; it may be due to
> CPU cache issues or DDR bit flips. The probability of this issue is very
> small, and the stack info we encountered is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000058
> [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> Mem abort info:
>   ESR = 0x0000000096000005
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x05: level 1 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> [0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
> pud=0000000000000000
> Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> Skip md ftrace buffer dump for: 0x1609e0
> ...
> pc : swap_duplicate+0x44/0x164
> lr : copy_page_range+0x508/0x1e78
> sp : ffffffc0f2a699e0
> x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> Call trace:
>  swap_duplicate+0x44/0x164
>  copy_page_range+0x508/0x1e78
>  copy_process+0x1278/0x21cc
>  kernel_clone+0x90/0x438
>  __arm64_sys_clone+0x5c/0x8c
>  invoke_syscall+0x58/0x110
>  do_el0_svc+0x8c/0xe0
>  el0_svc+0x38/0x9c
>  el0t_64_sync_handler+0x44/0xec
>  el0t_64_sync+0x1a8/0x1ac
> Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
>
> The patch seems to only provide a workaround, but there are no more
> effective software solutions to handle the bit flips problem. This patch
> will change the issue from a system crash to a process exception, thereby
> reducing the impact on the entire machine.
>
> Signed-off-by: gao xu <gaoxu2@honor.com>

Regardless of whether the above statement is 100% accurate or whether
a bit-flip actually exists, providing this check still seems useful,
at least for defensive programming.

Reviewed-by: Barry Song <baohua@kernel.org>

> ---
> v1 -> v2:
> - Add WARN_ON_ONCE.
> - update the commit info.
> v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> v3 -> v4: Add swap entry logging per Barry Song's suggestion.
> ---
>  mm/swapfile.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 7448a3876..403df1817 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3521,6 +3521,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>  	int err, i;
>
>  	si = swp_swap_info(entry);
> +	if (WARN_ON_ONCE(!si)) {
> +		pr_err("%s%08lx\n", Bad_file, entry.val);
> +		return -EINVAL;
> +	}
>
>  	offset = swp_offset(entry);
>  	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> --
> 2.17.1
On Wed, 19 Feb 2025 15:28:26 +1300 Barry Song <21cnbao@gmail.com> wrote:

> > The patch seems to only provide a workaround, but there are no more
> > effective software solutions to handle the bit flips problem. This patch
> > will change the issue from a system crash to a process exception, thereby
> > reducing the impact on the entire machine.
> >
> > Signed-off-by: gao xu <gaoxu2@honor.com>
>
> Regardless of whether the above statement is 100% accurate or whether
> a bit-flip actually exists, providing this check still seems useful,
> at least for defensive programming.

I'm doubtful as well.

How often has this crash been observed?
> On Wed, 19 Feb 2025 15:28:26 +1300 Barry Song <21cnbao@gmail.com>
> wrote:
>
> > > The patch seems to only provide a workaround, but there are no more
> > > effective software solutions to handle the bit flips problem. This
> > > patch will change the issue from a system crash to a process
> > > exception, thereby reducing the impact on the entire machine.
> > >
> > > Signed-off-by: gao xu <gaoxu2@honor.com>
> >
> > Regardless of whether the above statement is 100% accurate or whether
> > a bit-flip actually exists, providing this check still seems useful,
> > at least for defensive programming.
>
> I'm doubtful as well.
>
> How often has this crash been observed?

The probability of this issue occurring is approximately 1 in 500,000 per week.
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..403df1817 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 	int err, i;

 	si = swp_swap_info(entry);
+	if (WARN_ON_ONCE(!si)) {
+		pr_err("%s%08lx\n", Bad_file, entry.val);
+		return -EINVAL;
+	}

 	offset = swp_offset(entry);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
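For readers outside mm/, the shape of the check can be sketched in userspace C. This is only an illustration of the pattern (look up a per-type descriptor, bail out with -EINVAL instead of dereferencing NULL), not the kernel code: `MAX_TYPES`, `struct info`, `table`, `lookup_info()` and `duplicate_entry()` are hypothetical names, and `fprintf()` stands in for the kernel's `WARN_ON_ONCE()`/`pr_err()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TYPES 4	/* hypothetical table size, not the kernel's MAX_SWAPFILES */

/* Hypothetical descriptor standing in for struct swap_info_struct. */
struct info {
	unsigned long max;	/* number of valid offsets for this type */
};

static struct info *table[MAX_TYPES];

/* Hypothetical stand-in for swp_swap_info(): may legitimately return NULL. */
static struct info *lookup_info(unsigned int type)
{
	if (type >= MAX_TYPES)
		return NULL;
	return table[type];
}

/* Refuse the operation instead of dereferencing a NULL descriptor. */
static int duplicate_entry(unsigned int type, unsigned long offset)
{
	struct info *si = lookup_info(type);

	if (!si) {
		/* Log the offending entry so the report is actionable. */
		fprintf(stderr, "bad entry: type %u offset %08lx\n", type, offset);
		return -EINVAL;
	}

	if (offset >= si->max)
		return -EINVAL;

	return 0;
}
```

With one slot populated (say `table[0]` pointing at a descriptor with `max = 16`), `duplicate_entry(0, 3)` succeeds while `duplicate_entry(2, 3)` returns -EINVAL, so the caller sees an error rather than the whole process image crashing.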
Add a NULL check on the return value of swp_swap_info in __swap_duplicate
to prevent crashes caused by NULL pointer dereference.

The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
 swap_duplicate+0x44/0x164
 copy_page_range+0x508/0x1e78
 copy_process+0x1278/0x21cc
 kernel_clone+0x90/0x438
 __arm64_sys_clone+0x5c/0x8c
 invoke_syscall+0x58/0x110
 do_el0_svc+0x8c/0xe0
 el0_svc+0x38/0x9c
 el0t_64_sync_handler+0x44/0xec
 el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs

The patch seems to only provide a workaround, but there are no more
effective software solutions to handle the bit flips problem. This patch
will change the issue from a system crash to a process exception, thereby
reducing the impact on the entire machine.

Signed-off-by: gao xu <gaoxu2@honor.com>
---
v1 -> v2:
- Add WARN_ON_ONCE.
- update the commit info.
v2 -> v3: Delete the review tags (This is my issue, and I apologize).
v3 -> v4: Add swap entry logging per Barry Song's suggestion.
---
 mm/swapfile.c | 4 ++++
 1 file changed, 4 insertions(+)
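As a cross-check of the oops, the faulting word in the `Code:` line, `(f9402ea8)`, can be decoded by hand. The sketch below assumes it belongs to the A64 "LDR (immediate, unsigned offset)" 64-bit encoding class and extracts the register numbers and byte offset; `struct ldr_fields` and `decode_ldr_x_uimm()` are hypothetical helper names, not kernel or toolchain APIs.

```c
#include <stdint.h>

/* Fields of an A64 "LDR Xt, [Xn, #imm]" (unsigned offset) instruction. */
struct ldr_fields {
	int valid;			/* 1 if insn matched the encoding */
	unsigned int rt;		/* destination register number */
	unsigned int rn;		/* base register number */
	unsigned long byte_offset;	/* imm12 scaled by 8 for 64-bit loads */
};

static struct ldr_fields decode_ldr_x_uimm(uint32_t insn)
{
	struct ldr_fields f = { 0, 0, 0, 0 };

	/* Bits 31..22 must be 1111100101 for the 64-bit unsigned-offset LDR. */
	if ((insn & 0xffc00000u) != 0xf9400000u)
		return f;

	f.valid = 1;
	f.rt = insn & 0x1f;		/* bits 4..0 */
	f.rn = (insn >> 5) & 0x1f;	/* bits 9..5 */
	f.byte_offset = (unsigned long)((insn >> 10) & 0xfff) * 8; /* bits 21..10 */
	return f;
}
```

For `0xf9402ea8` this yields rt = 8, rn = 21 and a byte offset of 0x58, i.e. `ldr x8, [x21, #0x58]`. The register dump shows x21 = 0, so the load targets address 0x58, which matches the reported fault address and is consistent with a field read through a NULL base pointer.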