Message ID | tencent_9E9EB81B474B0E1B23256EBA05BB79332408@qq.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ext4: fix deadlock in ext4_xattr_inode_iget | expand |
On Thu, Apr 04, 2024 at 09:54:02AM +0800, Edward Adam Davis wrote: > According to mark inode dirty context, it does not need to be protected by lock > i_data_sem, and if it is protected by i_data_sem, a deadlock will occur. The i_data_sem lock is not to protect mark_inode_dirty_context, but to avoid races with the writeback code, which you can see right before you added the down_write() line. More detail about why it is necessary can be found in commit 90e775b71ac4 ("ext4: fix lost truncate due to race with writeback"): The following race can lead to a loss of i_disksize update from truncate thus resulting in a wrong inode size if the inode size isn't updated again before inode is reclaimed: ext4_setattr() mpage_map_and_submit_extent() EXT4_I(inode)->i_disksize = attr->ia_size; ... ... disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT /* False because i_size isn't * updated yet */ if (disksize > i_size_read(inode)) /* True, because i_disksize is * already truncated */ if (disksize > EXT4_I(inode)->i_disksize) /* Overwrite i_disksize * update from truncate */ ext4_update_i_disksize() i_size_write(inode, attr->ia_size); For other places updating i_disksize such race cannot happen because i_mutex prevents these races. Writeback is the only place where we do not hold i_mutex and we cannot grab it there because of lock ordering. We fix the race by doing both i_disksize and i_size update in truncate Atomically under i_data_sem and in mpage_map_and_submit_extent() we move the check against i_size under i_data_sem as well. So your proposed fix would introduce a regression by re-enabling the bug which is fixed by commit 90e775b71ac4. In any case, as Andreas has pointed out, this is a false positive; the supposed deadlock involves an ea_inode in stack trace #0, whereas the stack trace #1 involves a write to a data inode. Andreas has suggested fixing this by annotating the lock appropriately. This case is not going to happen in real production systems today, since triggering it requires using the debugging mount option debug_want_extra_isize. So while it would be good to avoid the false positive lockdep warning, fixing this is a lower priority bug --- it certainly isn't security issue that syzbot developers like to point at when talking about the "Linux security disaster". It isn't even a real production level bug! Cheers, - Ted
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 537803250ca9..d2cbe3dddfab 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5417,6 +5417,7 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry, down_write(&EXT4_I(inode)->i_data_sem); old_disksize = EXT4_I(inode)->i_disksize; EXT4_I(inode)->i_disksize = attr->ia_size; + up_write(&EXT4_I(inode)->i_data_sem); rc = ext4_mark_inode_dirty(handle, inode); if (!error) error = rc; @@ -5425,6 +5426,7 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry, * with i_disksize to avoid races with writeback code * running ext4_wb_update_i_disksize(). */ + down_write(&EXT4_I(inode)->i_data_sem); if (!error) i_size_write(inode, attr->ia_size); else
[Syzbot reported] WARNING: possible circular locking dependency detected 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted ------------------------------------------------------ syz-executor545/5275 is trying to acquire lock: ffff888077730400 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:793 [inline] ffff888077730400 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: ext4_xattr_inode_iget+0x173/0x440 fs/ext4/xattr.c:461 but task is already holding lock: ffff888077730c88 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x1ba0/0x29d0 fs/ext4/inode.c:5417 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem/3){++++}-{3:3}: down_write+0x3a/0x50 kernel/locking/rwsem.c:1579 ext4_update_i_disksize fs/ext4/ext4.h:3383 [inline] ext4_xattr_inode_write fs/ext4/xattr.c:1446 [inline] ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1594 [inline] ext4_xattr_set_entry+0x3a14/0x3cf0 fs/ext4/xattr.c:1719 ext4_xattr_ibody_set+0x126/0x380 fs/ext4/xattr.c:2287 ext4_xattr_set_handle+0x98d/0x1480 fs/ext4/xattr.c:2444 ext4_xattr_set+0x149/0x380 fs/ext4/xattr.c:2558 __vfs_setxattr+0x176/0x1e0 fs/xattr.c:200 __vfs_setxattr_noperm+0x127/0x5e0 fs/xattr.c:234 __vfs_setxattr_locked+0x182/0x260 fs/xattr.c:295 vfs_setxattr+0x146/0x350 fs/xattr.c:321 do_setxattr+0x146/0x170 fs/xattr.c:629 setxattr+0x15d/0x180 fs/xattr.c:652 path_setxattr+0x179/0x1e0 fs/xattr.c:671 __do_sys_lsetxattr fs/xattr.c:694 [inline] __se_sys_lsetxattr fs/xattr.c:690 [inline] __x64_sys_lsetxattr+0xc1/0x160 fs/xattr.c:690 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd5/0x260 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 -> #0 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x540 kernel/locking/lockdep.c:5719 down_write+0x3a/0x50 kernel/locking/rwsem.c:1579 inode_lock include/linux/fs.h:793 [inline] ext4_xattr_inode_iget+0x173/0x440 fs/ext4/xattr.c:461 ext4_xattr_inode_get+0x16c/0x870 fs/ext4/xattr.c:535 ext4_xattr_move_to_block fs/ext4/xattr.c:2640 [inline] ext4_xattr_make_inode_space fs/ext4/xattr.c:2742 [inline] ext4_expand_extra_isize_ea+0x1367/0x1ae0 fs/ext4/xattr.c:2834 __ext4_expand_extra_isize+0x346/0x480 fs/ext4/inode.c:5789 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5832 [inline] __ext4_mark_inode_dirty+0x55a/0x860 fs/ext4/inode.c:5910 ext4_setattr+0x1c14/0x29d0 fs/ext4/inode.c:5420 notify_change+0x745/0x11c0 fs/attr.c:497 do_truncate+0x15c/0x220 fs/open.c:65 handle_truncate fs/namei.c:3300 [inline] do_open fs/namei.c:3646 [inline] path_openat+0x24b9/0x2990 fs/namei.c:3799 do_filp_open+0x1dc/0x430 fs/namei.c:3826 do_sys_openat2+0x17a/0x1e0 fs/open.c:1406 do_sys_open fs/open.c:1421 [inline] __do_sys_openat fs/open.c:1437 [inline] __se_sys_openat fs/open.c:1432 [inline] __x64_sys_openat+0x175/0x210 fs/open.c:1432 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd5/0x260 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem/3); lock(&ea_inode->i_rwsem#8/1); lock(&ei->i_data_sem/3); lock(&ea_inode->i_rwsem#8/1); *** DEADLOCK *** [Fix] According to mark inode dirty context, it does not need to be protected by lock i_data_sem, and if it is protected by i_data_sem, a deadlock will occur. Reported-by: syzbot+ee72b9a7aad1e5a77c5c@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> --- fs/ext4/inode.c | 2 ++ 1 file changed, 2 insertions(+)