Message ID | 20200902103347.32255-1-johannes.thumshirn@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: don't call btrfs_sync_file from iomap context | expand |
On 9/2/20 6:33 AM, Johannes Thumshirn wrote: > Fstests generic/113 exposes a deadlock introduced by the switch to iomap > for direct I/O. > > ============================================ > WARNING: possible recursive locking detected > 5.9.0-rc2+ #746 Not tainted > -------------------------------------------- > aio-stress/922 is trying to acquire lock: > ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs] > > but task is already holding lock: > ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&sb->s_type->i_mutex_key#11); > lock(&sb->s_type->i_mutex_key#11); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by aio-stress/922: > #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] > #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs] > > stack backtrace: > CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 > Call Trace: > dump_stack+0x78/0xa0 > __lock_acquire.cold+0x121/0x29f > ? btrfs_dio_iomap_end+0x65/0x130 [btrfs] > lock_acquire+0x93/0x3b0 > ? btrfs_sync_file+0xf7/0x560 [btrfs] > down_write+0x33/0x70 > ? btrfs_sync_file+0xf7/0x560 [btrfs] > btrfs_sync_file+0xf7/0x560 [btrfs] > iomap_dio_complete+0x10d/0x120 > iomap_dio_rw+0x3c8/0x520 > btrfs_direct_IO+0xd3/0x160 [btrfs] > btrfs_file_write_iter+0x1fe/0x630 [btrfs] > ? find_held_lock+0x2b/0x80 > aio_write+0xcd/0x180 > ? __might_fault+0x31/0x80 > ? find_held_lock+0x2b/0x80 > ? __might_fault+0x31/0x80 > io_submit_one+0x4e1/0xb30 > ? find_held_lock+0x2b/0x80 > __x64_sys_io_submit+0x71/0x220 > do_syscall_64+0x33/0x40 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7f5940881f79 > Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48 > RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1 > RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79 > RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000 > RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030 > R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008 > R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070 > > This happens because iomap_dio_complete() calls into generic_write_sync() > if we have the data-sync flag set. But as we're still under the > inode_lock() from btrfs_file_write_iter() we will deadlock once > btrfs_sync_file() tries to acquire the inode_lock(). > > Calling into generic_write_sync() is not needed as __btrfs_direct_write() > already takes care of persisting the data on disk. We can temporarily drop > the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the > iomap code won't try to call into the sync routines as well. > > References: https://github.com/btrfs/fstests/issues/12 > Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > --- > Changes to RFC: > - Re-apply IOCB_DSYNC before calling into generic_write_sync (Goldwyn) This breaks DSYNC AIO, because we need it to run once we've done all the IO. You hit the deadlock because the io all completed before we exited iomap_dio_rw(), but generally that won't happen, and so we need the DSYNC set for the last reference on the dio to actually run the ->fsync. Thanks, Josef
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index b62679382799..a963f80cadf2 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2023,6 +2023,12 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, atomic_inc(&BTRFS_I(inode)->sync_writers); if (iocb->ki_flags & IOCB_DIRECT) { + /* + * Temporarily clear to IOCB_DSYNC flag for DIO, as it will + * deadlock in btrfs_sync_file() otherwise. Drop again once we + * have revisited locking around btrfs_sync_file(). + */ + iocb->ki_flags &= ~IOCB_DSYNC; num_written = __btrfs_direct_write(iocb, from); } else { num_written = btrfs_buffered_write(iocb, from); @@ -2043,6 +2049,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, spin_lock(&BTRFS_I(inode)->lock); BTRFS_I(inode)->last_sub_trans = root->log_transid; spin_unlock(&BTRFS_I(inode)->lock); + + if (sync) + iocb->ki_flags |= IOCB_DSYNC; + if (num_written > 0) num_written = generic_write_sync(iocb, num_written);
Fstests generic/113 exposes a deadlock introduced by the switch to iomap for direct I/O. ============================================ WARNING: possible recursive locking detected 5.9.0-rc2+ #746 Not tainted -------------------------------------------- aio-stress/922 is trying to acquire lock: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs] but task is already holding lock: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sb->s_type->i_mutex_key#11); lock(&sb->s_type->i_mutex_key#11); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by aio-stress/922: #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs] stack backtrace: CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x78/0xa0 __lock_acquire.cold+0x121/0x29f ? btrfs_dio_iomap_end+0x65/0x130 [btrfs] lock_acquire+0x93/0x3b0 ? btrfs_sync_file+0xf7/0x560 [btrfs] down_write+0x33/0x70 ? btrfs_sync_file+0xf7/0x560 [btrfs] btrfs_sync_file+0xf7/0x560 [btrfs] iomap_dio_complete+0x10d/0x120 iomap_dio_rw+0x3c8/0x520 btrfs_direct_IO+0xd3/0x160 [btrfs] btrfs_file_write_iter+0x1fe/0x630 [btrfs] ? find_held_lock+0x2b/0x80 aio_write+0xcd/0x180 ? __might_fault+0x31/0x80 ? find_held_lock+0x2b/0x80 ? __might_fault+0x31/0x80 io_submit_one+0x4e1/0xb30 ? find_held_lock+0x2b/0x80 __x64_sys_io_submit+0x71/0x220 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f5940881f79 Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1 RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79 RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000 RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030 R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008 R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070 This happens because iomap_dio_complete() calls into generic_write_sync() if we have the data-sync flag set. But as we're still under the inode_lock() from btrfs_file_write_iter() we will deadlock once btrfs_sync_file() tries to acquire the inode_lock(). Calling into generic_write_sync() is not needed as __btrfs_direct_write() already takes care of persisting the data on disk. We can temporarily drop the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the iomap code won't try to call into the sync routines as well. References: https://github.com/btrfs/fstests/issues/12 Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> --- Changes to RFC: - Re-apply IOCB_DSYNC before calling into generic_write_sync (Goldwyn) --- fs/btrfs/file.c | 10 ++++++++++ 1 file changed, 10 insertions(+)