diff mbox series

fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name

Message ID 20240826055503.1522320-1-lizhijian@fujitsu.com (mailing list archive)
State New
Headers show
Series fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name | expand

Commit Message

Zhijian Li (Fujitsu) Aug. 26, 2024, 5:55 a.m. UTC
It's observed that a crash occurs during hot-remove a memory device,
in which user is accessing the hugetlb. See calltrace as following:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790
Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s
mirror dm_region_hash dm_log dm_mod
CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:do_user_addr_fault+0x2a0/0x790
Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41
RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046
RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658
R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000
FS:  00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __warn+0x8d/0x190
 ? do_user_addr_fault+0x2a0/0x790
 ? report_bug+0x1c3/0x1d0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x14/0x70
 ? asm_exc_invalid_op+0x16/0x20
 ? do_user_addr_fault+0x2a0/0x790
 ? exc_page_fault+0x31/0x200
 exc_page_fault+0x68/0x200
<...snip...>
BUG: unable to handle page fault for address: 0000000000001000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
 ---[ end trace 0000000000000000 ]---
 BUG: unable to handle page fault for address: 0000000000001000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G        W          6.10.0-rc2-lizhijian+ #492
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
 RIP: 0010:dentry_name+0x1f4/0x440
<...snip...>
? dentry_name+0x2fa/0x440
vsnprintf+0x1f3/0x4f0
vprintk_store+0x23a/0x540
vprintk_emit+0x6d/0x330
_printk+0x58/0x80
dump_mapping+0x10b/0x1a0
? __pfx_free_object_rcu+0x10/0x10
__dump_page+0x26b/0x3e0
? vprintk_emit+0xe0/0x330
? _printk+0x58/0x80
? dump_page+0x17/0x50
dump_page+0x17/0x50
do_migrate_range+0x2f7/0x7f0
? do_migrate_range+0x42/0x7f0
? offline_pages+0x2f4/0x8c0
offline_pages+0x60a/0x8c0
memory_subsys_offline+0x9f/0x1c0
? lockdep_hardirqs_on+0x77/0x100
? _raw_spin_unlock_irqrestore+0x38/0x60
device_offline+0xe3/0x110
state_store+0x6e/0xc0
kernfs_fop_write_iter+0x143/0x200
vfs_write+0x39f/0x560
ksys_write+0x65/0xf0
do_syscall_64+0x62/0x130

Previously, some sanity check have been done in dump_mapping() before
the print facility parsing '%pd' though, it's still possible to run into
an invalid dentry.d_name.name.

Since dump_mapping() only needs to dump the filename only, retrieve it
by itself in a safer way to prevent an unnecessary crash.

Note that either retrieving the filename with '%pd' or
strncpy_from_kernel_nofault(), the filename could be unreliable.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 fs/inode.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Jan Kara Aug. 28, 2024, 1:32 p.m. UTC | #1
On Mon 26-08-24 13:55:03, Li Zhijian wrote:
> It's observed that a crash occurs during hot-remove a memory device,
> in which user is accessing the hugetlb. See calltrace as following:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790
> Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s
> mirror dm_region_hash dm_log dm_mod
> CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> RIP: 0010:do_user_addr_fault+0x2a0/0x790
> Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41
> RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046
> RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658
> R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000
> FS:  00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  ? __warn+0x8d/0x190
>  ? do_user_addr_fault+0x2a0/0x790
>  ? report_bug+0x1c3/0x1d0
>  ? handle_bug+0x3c/0x70
>  ? exc_invalid_op+0x14/0x70
>  ? asm_exc_invalid_op+0x16/0x20
>  ? do_user_addr_fault+0x2a0/0x790
>  ? exc_page_fault+0x31/0x200
>  exc_page_fault+0x68/0x200
> <...snip...>
> BUG: unable to handle page fault for address: 0000000000001000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>  ---[ end trace 0000000000000000 ]---
>  BUG: unable to handle page fault for address: 0000000000001000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G        W          6.10.0-rc2-lizhijian+ #492
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:dentry_name+0x1f4/0x440
> <...snip...>
> ? dentry_name+0x2fa/0x440
> vsnprintf+0x1f3/0x4f0
> vprintk_store+0x23a/0x540
> vprintk_emit+0x6d/0x330
> _printk+0x58/0x80
> dump_mapping+0x10b/0x1a0
> ? __pfx_free_object_rcu+0x10/0x10
> __dump_page+0x26b/0x3e0
> ? vprintk_emit+0xe0/0x330
> ? _printk+0x58/0x80
> ? dump_page+0x17/0x50
> dump_page+0x17/0x50
> do_migrate_range+0x2f7/0x7f0
> ? do_migrate_range+0x42/0x7f0
> ? offline_pages+0x2f4/0x8c0
> offline_pages+0x60a/0x8c0
> memory_subsys_offline+0x9f/0x1c0
> ? lockdep_hardirqs_on+0x77/0x100
> ? _raw_spin_unlock_irqrestore+0x38/0x60
> device_offline+0xe3/0x110
> state_store+0x6e/0xc0
> kernfs_fop_write_iter+0x143/0x200
> vfs_write+0x39f/0x560
> ksys_write+0x65/0xf0
> do_syscall_64+0x62/0x130
> 
> Previously, some sanity check have been done in dump_mapping() before
> the print facility parsing '%pd' though, it's still possible to run into
> an invalid dentry.d_name.name.
> 
> Since dump_mapping() only needs to dump the filename only, retrieve it
> by itself in a safer way to prevent an unnecessary crash.
> 
> Note that either retrieving the filename with '%pd' or
> strncpy_from_kernel_nofault(), the filename could be unreliable.
> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

I guess this is a reliability improvement :). Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/inode.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index f356fe2ec2b6..d3f9d73d59d0 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -562,6 +562,7 @@ void dump_mapping(const struct address_space *mapping)
>  	struct hlist_node *dentry_first;
>  	struct dentry *dentry_ptr;
>  	struct dentry dentry;
> +	char fname[64] = {};
>  	unsigned long ino;
>  
>  	/*
> @@ -598,11 +599,14 @@ void dump_mapping(const struct address_space *mapping)
>  		return;
>  	}
>  
> +	if (strncpy_from_kernel_nofault(fname, dentry.d_name.name, 63) < 0)
> +		strscpy(fname, "<invalid>");
>  	/*
> -	 * if dentry is corrupted, the %pd handler may still crash,
> -	 * but it's unlikely that we reach here with a corrupt mapping
> +	 * Even if strncpy_from_kernel_nofault() succeeded,
> +	 * the fname could be unreliable
>  	 */
> -	pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n", a_ops, ino, &dentry);
> +	pr_warn("aops:%ps ino:%lx dentry name(?):\"%s\"\n",
> +		a_ops, ino, fname);
>  }
>  
>  void clear_inode(struct inode *inode)
> -- 
> 2.29.2
>
Christian Brauner Aug. 28, 2024, 2:48 p.m. UTC | #2
On Mon, 26 Aug 2024 13:55:03 +0800, Li Zhijian wrote:
> It's observed that a crash occurs during hot-remove a memory device,
> in which user is accessing the hugetlb. See calltrace as following:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790
> Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s
> mirror dm_region_hash dm_log dm_mod
> CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ #492
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> RIP: 0010:do_user_addr_fault+0x2a0/0x790
> Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41
> RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046
> RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658
> R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000
> FS:  00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  ? __warn+0x8d/0x190
>  ? do_user_addr_fault+0x2a0/0x790
>  ? report_bug+0x1c3/0x1d0
>  ? handle_bug+0x3c/0x70
>  ? exc_invalid_op+0x14/0x70
>  ? asm_exc_invalid_op+0x16/0x20
>  ? do_user_addr_fault+0x2a0/0x790
>  ? exc_page_fault+0x31/0x200
>  exc_page_fault+0x68/0x200
> <...snip...>
> BUG: unable to handle page fault for address: 0000000000001000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>  ---[ end trace 0000000000000000 ]---
>  BUG: unable to handle page fault for address: 0000000000001000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G        W          6.10.0-rc2-lizhijian+ #492
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:dentry_name+0x1f4/0x440
> <...snip...>
> ? dentry_name+0x2fa/0x440
> vsnprintf+0x1f3/0x4f0
> vprintk_store+0x23a/0x540
> vprintk_emit+0x6d/0x330
> _printk+0x58/0x80
> dump_mapping+0x10b/0x1a0
> ? __pfx_free_object_rcu+0x10/0x10
> __dump_page+0x26b/0x3e0
> ? vprintk_emit+0xe0/0x330
> ? _printk+0x58/0x80
> ? dump_page+0x17/0x50
> dump_page+0x17/0x50
> do_migrate_range+0x2f7/0x7f0
> ? do_migrate_range+0x42/0x7f0
> ? offline_pages+0x2f4/0x8c0
> offline_pages+0x60a/0x8c0
> memory_subsys_offline+0x9f/0x1c0
> ? lockdep_hardirqs_on+0x77/0x100
> ? _raw_spin_unlock_irqrestore+0x38/0x60
> device_offline+0xe3/0x110
> state_store+0x6e/0xc0
> kernfs_fop_write_iter+0x143/0x200
> vfs_write+0x39f/0x560
> ksys_write+0x65/0xf0
> do_syscall_64+0x62/0x130
> 
> [...]

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name
      https://git.kernel.org/vfs/vfs/c/e57de7d9e7fc
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index f356fe2ec2b6..d3f9d73d59d0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -562,6 +562,7 @@  void dump_mapping(const struct address_space *mapping)
 	struct hlist_node *dentry_first;
 	struct dentry *dentry_ptr;
 	struct dentry dentry;
+	char fname[64] = {};
 	unsigned long ino;
 
 	/*
@@ -598,11 +599,14 @@  void dump_mapping(const struct address_space *mapping)
 		return;
 	}
 
+	if (strncpy_from_kernel_nofault(fname, dentry.d_name.name, 63) < 0)
+		strscpy(fname, "<invalid>");
 	/*
-	 * if dentry is corrupted, the %pd handler may still crash,
-	 * but it's unlikely that we reach here with a corrupt mapping
+	 * Even if strncpy_from_kernel_nofault() succeeded,
+	 * the fname could be unreliable
 	 */
-	pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n", a_ops, ino, &dentry);
+	pr_warn("aops:%ps ino:%lx dentry name(?):\"%s\"\n",
+		a_ops, ino, fname);
 }
 
 void clear_inode(struct inode *inode)