Message ID | 20240826055503.1522320-1-lizhijian@fujitsu.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name | expand |
On Mon 26-08-24 13:55:03, Li Zhijian wrote: > It's observed that a crash occurs during hot-remove a memory device, > in which user is accessing the hugetlb. See calltrace as following: > > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 > Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s > mirror dm_region_hash dm_log dm_mod > CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > RIP: 0010:do_user_addr_fault+0x2a0/0x790 > Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 > RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 > RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 > RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 > R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 > FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > ? __warn+0x8d/0x190 > ? do_user_addr_fault+0x2a0/0x790 > ? report_bug+0x1c3/0x1d0 > ? handle_bug+0x3c/0x70 > ? exc_invalid_op+0x14/0x70 > ? asm_exc_invalid_op+0x16/0x20 > ? do_user_addr_fault+0x2a0/0x790 > ? exc_page_fault+0x31/0x200 > exc_page_fault+0x68/0x200 > <...snip...> > BUG: unable to handle page fault for address: 0000000000001000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 > Oops: Oops: 0000 [#1] PREEMPT SMP PTI > ---[ end trace 0000000000000000 ]--- > BUG: unable to handle page fault for address: 0000000000001000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 > Oops: Oops: 0000 [#1] PREEMPT SMP PTI > CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ #492 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > RIP: 0010:dentry_name+0x1f4/0x440 > <...snip...> > ? dentry_name+0x2fa/0x440 > vsnprintf+0x1f3/0x4f0 > vprintk_store+0x23a/0x540 > vprintk_emit+0x6d/0x330 > _printk+0x58/0x80 > dump_mapping+0x10b/0x1a0 > ? __pfx_free_object_rcu+0x10/0x10 > __dump_page+0x26b/0x3e0 > ? vprintk_emit+0xe0/0x330 > ? _printk+0x58/0x80 > ? dump_page+0x17/0x50 > dump_page+0x17/0x50 > do_migrate_range+0x2f7/0x7f0 > ? do_migrate_range+0x42/0x7f0 > ? offline_pages+0x2f4/0x8c0 > offline_pages+0x60a/0x8c0 > memory_subsys_offline+0x9f/0x1c0 > ? lockdep_hardirqs_on+0x77/0x100 > ? _raw_spin_unlock_irqrestore+0x38/0x60 > device_offline+0xe3/0x110 > state_store+0x6e/0xc0 > kernfs_fop_write_iter+0x143/0x200 > vfs_write+0x39f/0x560 > ksys_write+0x65/0xf0 > do_syscall_64+0x62/0x130 > > Previously, some sanity check have been done in dump_mapping() before > the print facility parsing '%pd' though, it's still possible to run into > an invalid dentry.d_name.name. > > Since dump_mapping() only needs to dump the filename only, retrieve it > by itself in a safer way to prevent an unnecessary crash. > > Note that either retrieving the filename with '%pd' or > strncpy_from_kernel_nofault(), the filename could be unreliable. > > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> I guess this is a reliability improvement :). Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > fs/inode.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/fs/inode.c b/fs/inode.c > index f356fe2ec2b6..d3f9d73d59d0 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -562,6 +562,7 @@ void dump_mapping(const struct address_space *mapping) > struct hlist_node *dentry_first; > struct dentry *dentry_ptr; > struct dentry dentry; > + char fname[64] = {}; > unsigned long ino; > > /* > @@ -598,11 +599,14 @@ void dump_mapping(const struct address_space *mapping) > return; > } > > + if (strncpy_from_kernel_nofault(fname, dentry.d_name.name, 63) < 0) > + strscpy(fname, "<invalid>"); > /* > - * if dentry is corrupted, the %pd handler may still crash, > - * but it's unlikely that we reach here with a corrupt mapping > + * Even if strncpy_from_kernel_nofault() succeeded, > + * the fname could be unreliable > */ > - pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n", a_ops, ino, &dentry); > + pr_warn("aops:%ps ino:%lx dentry name(?):\"%s\"\n", > + a_ops, ino, fname); > } > > void clear_inode(struct inode *inode) > -- > 2.29.2 >
On Mon, 26 Aug 2024 13:55:03 +0800, Li Zhijian wrote: > It's observed that a crash occurs during hot-remove a memory device, > in which user is accessing the hugetlb. See calltrace as following: > > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 > Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s > mirror dm_region_hash dm_log dm_mod > CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > RIP: 0010:do_user_addr_fault+0x2a0/0x790 > Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 > RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 > RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 > RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 > R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 > FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > ? __warn+0x8d/0x190 > ? do_user_addr_fault+0x2a0/0x790 > ? report_bug+0x1c3/0x1d0 > ? handle_bug+0x3c/0x70 > ? exc_invalid_op+0x14/0x70 > ? asm_exc_invalid_op+0x16/0x20 > ? do_user_addr_fault+0x2a0/0x790 > ? exc_page_fault+0x31/0x200 > exc_page_fault+0x68/0x200 > <...snip...> > BUG: unable to handle page fault for address: 0000000000001000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 > Oops: Oops: 0000 [#1] PREEMPT SMP PTI > ---[ end trace 0000000000000000 ]--- > BUG: unable to handle page fault for address: 0000000000001000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 > Oops: Oops: 0000 [#1] PREEMPT SMP PTI > CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ #492 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > RIP: 0010:dentry_name+0x1f4/0x440 > <...snip...> > ? dentry_name+0x2fa/0x440 > vsnprintf+0x1f3/0x4f0 > vprintk_store+0x23a/0x540 > vprintk_emit+0x6d/0x330 > _printk+0x58/0x80 > dump_mapping+0x10b/0x1a0 > ? __pfx_free_object_rcu+0x10/0x10 > __dump_page+0x26b/0x3e0 > ? vprintk_emit+0xe0/0x330 > ? _printk+0x58/0x80 > ? dump_page+0x17/0x50 > dump_page+0x17/0x50 > do_migrate_range+0x2f7/0x7f0 > ? do_migrate_range+0x42/0x7f0 > ? offline_pages+0x2f4/0x8c0 > offline_pages+0x60a/0x8c0 > memory_subsys_offline+0x9f/0x1c0 > ? lockdep_hardirqs_on+0x77/0x100 > ? _raw_spin_unlock_irqrestore+0x38/0x60 > device_offline+0xe3/0x110 > state_store+0x6e/0xc0 > kernfs_fop_write_iter+0x143/0x200 > vfs_write+0x39f/0x560 > ksys_write+0x65/0xf0 > do_syscall_64+0x62/0x130 > > [...] Applied to the vfs.misc branch of the vfs/vfs.git tree. Patches in the vfs.misc branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.misc [1/1] fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name https://git.kernel.org/vfs/vfs/c/e57de7d9e7fc
diff --git a/fs/inode.c b/fs/inode.c index f356fe2ec2b6..d3f9d73d59d0 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -562,6 +562,7 @@ void dump_mapping(const struct address_space *mapping) struct hlist_node *dentry_first; struct dentry *dentry_ptr; struct dentry dentry; + char fname[64] = {}; unsigned long ino; /* @@ -598,11 +599,14 @@ void dump_mapping(const struct address_space *mapping) return; } + if (strncpy_from_kernel_nofault(fname, dentry.d_name.name, 63) < 0) + strscpy(fname, "<invalid>"); /* - * if dentry is corrupted, the %pd handler may still crash, - * but it's unlikely that we reach here with a corrupt mapping + * Even if strncpy_from_kernel_nofault() succeeded, + * the fname could be unreliable */ - pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n", a_ops, ino, &dentry); + pr_warn("aops:%ps ino:%lx dentry name(?):\"%s\"\n", + a_ops, ino, fname); } void clear_inode(struct inode *inode)
It's observed that a crash occurs during hot-remove a memory device, in which user is accessing the hugetlb. See calltrace as following: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:do_user_addr_fault+0x2a0/0x790 Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x8d/0x190 ? do_user_addr_fault+0x2a0/0x790 ? report_bug+0x1c3/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? do_user_addr_fault+0x2a0/0x790 ? exc_page_fault+0x31/0x200 exc_page_fault+0x68/0x200 <...snip...> BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI ---[ end trace 0000000000000000 ]--- BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ #492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dentry_name+0x1f4/0x440 <...snip...> ? dentry_name+0x2fa/0x440 vsnprintf+0x1f3/0x4f0 vprintk_store+0x23a/0x540 vprintk_emit+0x6d/0x330 _printk+0x58/0x80 dump_mapping+0x10b/0x1a0 ? __pfx_free_object_rcu+0x10/0x10 __dump_page+0x26b/0x3e0 ? vprintk_emit+0xe0/0x330 ? _printk+0x58/0x80 ? dump_page+0x17/0x50 dump_page+0x17/0x50 do_migrate_range+0x2f7/0x7f0 ? do_migrate_range+0x42/0x7f0 ? offline_pages+0x2f4/0x8c0 offline_pages+0x60a/0x8c0 memory_subsys_offline+0x9f/0x1c0 ? lockdep_hardirqs_on+0x77/0x100 ? _raw_spin_unlock_irqrestore+0x38/0x60 device_offline+0xe3/0x110 state_store+0x6e/0xc0 kernfs_fop_write_iter+0x143/0x200 vfs_write+0x39f/0x560 ksys_write+0x65/0xf0 do_syscall_64+0x62/0x130 Previously, some sanity check have been done in dump_mapping() before the print facility parsing '%pd' though, it's still possible to run into an invalid dentry.d_name.name. Since dump_mapping() only needs to dump the filename only, retrieve it by itself in a safer way to prevent an unnecessary crash. Note that either retrieving the filename with '%pd' or strncpy_from_kernel_nofault(), the filename could be unreliable. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> --- fs/inode.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)