diff mbox series

[v2] btrfs: don't call btrfs_sync_file from iomap context

Message ID 20200902103347.32255-1-johannes.thumshirn@wdc.com (mailing list archive)
State New, archived
Headers show
Series [v2] btrfs: don't call btrfs_sync_file from iomap context | expand

Commit Message

Johannes Thumshirn Sept. 2, 2020, 10:33 a.m. UTC
Fstests generic/113 exposes a deadlock introduced by the switch to iomap
for direct I/O.

 ============================================
 WARNING: possible recursive locking detected
 5.9.0-rc2+ #746 Not tainted
 --------------------------------------------
 aio-stress/922 is trying to acquire lock:
 ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs]

 but task is already holding lock:
 ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs]

 other info that might help us debug this:
 Possible unsafe locking scenario:

 CPU0
 ----
 lock(&sb->s_type->i_mutex_key#11);
 lock(&sb->s_type->i_mutex_key#11);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

 2 locks held by aio-stress/922:
 #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs]
 #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs]

 stack backtrace:
 CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
 Call Trace:
 dump_stack+0x78/0xa0
 __lock_acquire.cold+0x121/0x29f
 ? btrfs_dio_iomap_end+0x65/0x130 [btrfs]
 lock_acquire+0x93/0x3b0
 ? btrfs_sync_file+0xf7/0x560 [btrfs]
 down_write+0x33/0x70
 ? btrfs_sync_file+0xf7/0x560 [btrfs]
 btrfs_sync_file+0xf7/0x560 [btrfs]
 iomap_dio_complete+0x10d/0x120
 iomap_dio_rw+0x3c8/0x520
 btrfs_direct_IO+0xd3/0x160 [btrfs]
 btrfs_file_write_iter+0x1fe/0x630 [btrfs]
 ? find_held_lock+0x2b/0x80
 aio_write+0xcd/0x180
 ? __might_fault+0x31/0x80
 ? find_held_lock+0x2b/0x80
 ? __might_fault+0x31/0x80
 io_submit_one+0x4e1/0xb30
 ? find_held_lock+0x2b/0x80
 __x64_sys_io_submit+0x71/0x220
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f5940881f79
 Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48
 RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
 RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79
 RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000
 RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030
 R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008
 R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070

This happens because iomap_dio_complete() calls into generic_write_sync()
if we have the data-sync flag set. But as we're still under the
inode_lock() from btrfs_file_write_iter() we will deadlock once
btrfs_sync_file() tries to acquire the inode_lock().

Calling into generic_write_sync() is not needed as __btrfs_direct_write()
already takes care of persisting the data on disk. We can temporarily drop
the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the
iomap code won't try to call into the sync routines as well.

References: https://github.com/btrfs/fstests/issues/12
Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

---
Changes to RFC:
- Re-apply IOCB_DSYNC before calling into generic_write_sync (Goldwyn)
---
 fs/btrfs/file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Josef Bacik Sept. 2, 2020, 11:13 a.m. UTC | #1
On 9/2/20 6:33 AM, Johannes Thumshirn wrote:
> Fstests generic/113 exposes a deadlock introduced by the switch to iomap
> for direct I/O.
> 
>   ============================================
>   WARNING: possible recursive locking detected
>   5.9.0-rc2+ #746 Not tainted
>   --------------------------------------------
>   aio-stress/922 is trying to acquire lock:
>   ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs]
> 
>   but task is already holding lock:
>   ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs]
> 
>   other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>   CPU0
>   ----
>   lock(&sb->s_type->i_mutex_key#11);
>   lock(&sb->s_type->i_mutex_key#11);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>   2 locks held by aio-stress/922:
>   #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs]
>   #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs]
> 
>   stack backtrace:
>   CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>   Call Trace:
>   dump_stack+0x78/0xa0
>   __lock_acquire.cold+0x121/0x29f
>   ? btrfs_dio_iomap_end+0x65/0x130 [btrfs]
>   lock_acquire+0x93/0x3b0
>   ? btrfs_sync_file+0xf7/0x560 [btrfs]
>   down_write+0x33/0x70
>   ? btrfs_sync_file+0xf7/0x560 [btrfs]
>   btrfs_sync_file+0xf7/0x560 [btrfs]
>   iomap_dio_complete+0x10d/0x120
>   iomap_dio_rw+0x3c8/0x520
>   btrfs_direct_IO+0xd3/0x160 [btrfs]
>   btrfs_file_write_iter+0x1fe/0x630 [btrfs]
>   ? find_held_lock+0x2b/0x80
>   aio_write+0xcd/0x180
>   ? __might_fault+0x31/0x80
>   ? find_held_lock+0x2b/0x80
>   ? __might_fault+0x31/0x80
>   io_submit_one+0x4e1/0xb30
>   ? find_held_lock+0x2b/0x80
>   __x64_sys_io_submit+0x71/0x220
>   do_syscall_64+0x33/0x40
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   RIP: 0033:0x7f5940881f79
>   Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48
>   RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
>   RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79
>   RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000
>   RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030
>   R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008
>   R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070
> 
> This happens because iomap_dio_complete() calls into generic_write_sync()
> if we have the data-sync flag set. But as we're still under the
> inode_lock() from btrfs_file_write_iter() we will deadlock once
> btrfs_sync_file() tries to acquire the inode_lock().
> 
> Calling into generic_write_sync() is not needed as __btrfs_direct_write()
> already takes care of persisting the data on disk. We can temporarily drop
> the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the
> iomap code won't try to call into the sync routines as well.
> 
> References: https://github.com/btrfs/fstests/issues/12
> Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO")
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> ---
> Changes to RFC:
> - Re-apply IOCB_DSYNC before calling into generic_write_sync (Goldwyn)

This breaks DSYNC AIO, because we need it to run once we've done all the 
IO.  You hit the deadlock because the io all completed before we exited 
iomap_dio_rw(), but generally that won't happen, and so we need the 
DSYNC set for the last reference on the dio to actually run the ->fsync. 
  Thanks,

Josef
diff mbox series

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b62679382799..a963f80cadf2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2023,6 +2023,12 @@  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		atomic_inc(&BTRFS_I(inode)->sync_writers);
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
+		/*
+		 * Temporarily clear to IOCB_DSYNC flag for DIO, as it will
+		 * deadlock in btrfs_sync_file() otherwise. Drop again once we
+		 * have revisited locking around btrfs_sync_file().
+		 */
+		iocb->ki_flags &= ~IOCB_DSYNC;
 		num_written = __btrfs_direct_write(iocb, from);
 	} else {
 		num_written = btrfs_buffered_write(iocb, from);
@@ -2043,6 +2049,10 @@  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	spin_lock(&BTRFS_I(inode)->lock);
 	BTRFS_I(inode)->last_sub_trans = root->log_transid;
 	spin_unlock(&BTRFS_I(inode)->lock);
+
+	if (sync)
+		iocb->ki_flags |= IOCB_DSYNC;
+
 	if (num_written > 0)
 		num_written = generic_write_sync(iocb, num_written);