diff mbox

[RFC] Btrfs: fix deadlock between sys_sync and freeze

Message ID 1344920474-10014-1-git-send-email-liub.liubo@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

bo liu Aug. 14, 2012, 5:01 a.m. UTC
From: Liu Bo <bo.li.liu@oracle.com>

I found this while testing xfstests 068, the story is

    t1                                            t2
  sys_sync                                    thaw_super
    iterate_supers
      down_read(sb->s_umount)                   down_write(sb->s_umount) --->wait for t1
      sync_fs (with wait mode)
        start_transaction
          sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN

In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
start_transaction in sync_fs() with wait mode so that we won't hit the deadlock.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/super.c   |   15 +++++++++++++++
 include/linux/fs.h |    5 +++++
 2 files changed, 20 insertions(+), 0 deletions(-)

Comments

Marco Aug. 14, 2012, 12:59 p.m. UTC | #1
Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto:
> From: Liu Bo <bo.li.liu@oracle.com>
>
> I found this while testing xfstests 068, the story is
>
>      t1                                            t2
>    sys_sync                                    thaw_super
>      iterate_supers
>        down_read(sb->s_umount)                   down_write(sb->s_umount) --->wait for t1
>        sync_fs (with wait mode)
>          start_transaction
>            sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN
>
> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock.
>

IMHO, we should avoid to call the sync operation on a frozen fs. The 
freeze operation, indeed, already include a sync operation. According to 
man page, no other operation should modify the fs after the freeze. So 
for me the modification is inside sync_filesystem (and sync_one_sb).

Marco
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Aug. 14, 2012, 1:53 p.m. UTC | #2
On 08/14/2012 08:59 PM, Marco Stornelli wrote:
> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto:
>> From: Liu Bo <bo.li.liu@oracle.com>
>>
>> I found this while testing xfstests 068, the story is
>>
>>      t1                                            t2
>>    sys_sync                                    thaw_super
>>      iterate_supers
>>        down_read(sb->s_umount)                   down_write(sb->s_umount) --->wait for t1
>>        sync_fs (with wait mode)
>>          start_transaction
>>            sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN
>>
>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock.
>>
> 
> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation.
> According to man page, no other operation should modify the fs after the freeze. 
> So for me the modification is inside sync_filesystem (and sync_one_sb).

Do you mean that we should add the trylock check in sync_filesystem?

But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb().

thanks,
liubo

> 
> Marco
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marco Aug. 14, 2012, 2:12 p.m. UTC | #3
Il 14/08/2012 15:53, Liu Bo ha scritto:
> On 08/14/2012 08:59 PM, Marco Stornelli wrote:
>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto:
>>> From: Liu Bo <bo.li.liu@oracle.com>
>>>
>>> I found this while testing xfstests 068, the story is
>>>
>>>       t1                                            t2
>>>     sys_sync                                    thaw_super
>>>       iterate_supers
>>>         down_read(sb->s_umount)                   down_write(sb->s_umount) --->wait for t1
>>>         sync_fs (with wait mode)
>>>           start_transaction
>>>             sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN
>>>
>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
>>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock.
>>>
>>
>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation.
>> According to man page, no other operation should modify the fs after the freeze.
>> So for me the modification is inside sync_filesystem (and sync_one_sb).
>
> Do you mean that we should add the trylock check in sync_filesystem?
>
> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb().
>
> thanks,
> liubo
>

I meant that we should check if there are in a "complete" freeze state 
(according to the "states" of a freeze transaction) and simply skip the 
sync operation.

Marco
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Aug. 15, 2012, 12:51 p.m. UTC | #4
(CCed Jan, the author of freeze code)

On 08/14/2012 10:12 PM, Marco Stornelli wrote:
> Il 14/08/2012 15:53, Liu Bo ha scritto:
>> On 08/14/2012 08:59 PM, Marco Stornelli wrote:
>>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto:
>>>> From: Liu Bo <bo.li.liu@oracle.com>
>>>>
>>>> I found this while testing xfstests 068, the story is
>>>>
>>>>       t1                                            t2
>>>>     sys_sync                                    thaw_super
>>>>       iterate_supers
>>>>         down_read(sb->s_umount)                   down_write(sb->s_umount) --->wait for t1
>>>>         sync_fs (with wait mode)
>>>>           start_transaction
>>>>             sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN
>>>>
>>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
>>>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock.
>>>>
>>>
>>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation.
>>> According to man page, no other operation should modify the fs after the freeze.
>>> So for me the modification is inside sync_filesystem (and sync_one_sb).
>>
>> Do you mean that we should add the trylock check in sync_filesystem?
>>
>> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb().
>>
>> thanks,
>> liubo
>>
> 
> I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation.
> 

I'm ok with it.

What do you think about it, Jan?  Any comments?

thanks,
liubo

> Marco

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Aug. 15, 2012, 1:12 p.m. UTC | #5
On Wed 15-08-12 20:51:17, Liu Bo wrote:
> (CCed Jan, the author of freeze code)
> On 08/14/2012 10:12 PM, Marco Stornelli wrote:
> > Il 14/08/2012 15:53, Liu Bo ha scritto:
> >> On 08/14/2012 08:59 PM, Marco Stornelli wrote:
> >>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto:
> >>>> From: Liu Bo <bo.li.liu@oracle.com>
> >>>>
> >>>> I found this while testing xfstests 068, the story is
> >>>>
> >>>>       t1                                            t2
> >>>>     sys_sync                                    thaw_super
> >>>>       iterate_supers
> >>>>         down_read(sb->s_umount)                   down_write(sb->s_umount) --->wait for t1
> >>>>         sync_fs (with wait mode)
> >>>>           start_transaction
> >>>>             sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN
> >>>>
> >>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we
> >>>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock.
> >>>>
> >>>
> >>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation.
> >>> According to man page, no other operation should modify the fs after the freeze.
> >>> So for me the modification is inside sync_filesystem (and sync_one_sb).
> >>
> >> Do you mean that we should add the trylock check in sync_filesystem?
> >>
> >> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb().
> >>
> >> thanks,
> >> liubo
> >>
> > 
> > I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation.
> > 
> 
> I'm ok with it.
> 
> What do you think about it, Jan?  Any comments?
  Hum, so what I don't exactly understand is why does btrfs start a
transaction in ->sync_fs(). The idea of freeze code is that when filesystem
is frozen, there is no data / metadata to write and thus we avoid deadlocks
arising from trying to write anything with s_umount held.

So checking whether filesystem is frozen and avoiding transaction start in
that case in btrfs_sync_fs() will work but looks hacky. Rather the check
should be for the thing which is the reason for transaction start-end pair
in btrfs_sync_fs(). My naive guess would be we should check whether there
is any transaction running...

								Honza
diff mbox

Patch

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f2eb24c..1e04b41 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -847,6 +847,21 @@  int btrfs_sync_fs(struct super_block *sb, int wait)
 		return 0;
 	}
 
+	/*
+	 * sys_sync can cause an ABBA deadlock with freeze/thaw
+	 * o freeze_super()   grabs s_umount lock and set sb to SB_FREEZE_FS.
+	 * o thaw_super()     grabs s_umount lock and set sb to SB_UNFROZEN.
+	 * o iterate_supers() grabs s_umount lock, and sync fs, during which
+	 *                    we need to do sb_start_intwrite() in starting a
+	 *                    new transaction.
+	 * so iterate_supers() will wait for thaw_super() to reset sb's frozen
+	 * state, while thaw_super() will wait for iterate_supers() to drop the
+	 * s_umount lock.  This is an ABBA deadlock.
+	 */
+	if (!sb_start_intwrite_trylock(sb))
+		return 0;
+	sb_end_intwrite(sb);
+
 	btrfs_wait_ordered_extents(root, 0, 0);
 
 	trans = btrfs_start_transaction(root, 0);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa11047..8a3efd0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1700,6 +1700,11 @@  static inline void sb_start_intwrite(struct super_block *sb)
 	__sb_start_write(sb, SB_FREEZE_FS, true);
 }
 
+static inline int sb_start_intwrite_trylock(struct super_block *sb)
+{
+	return __sb_start_write(sb, SB_FREEZE_FS, false);
+}
+
 
 extern bool inode_owner_or_capable(const struct inode *inode);