diff mbox

btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

Message ID 1421653361-18630-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Superseded
Headers show

Commit Message

Qu Wenruo Jan. 19, 2015, 7:42 a.m. UTC
Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
changes) will call btrfs_start_transaction() in sync_fs(), to handle
some operations needed to be done in next transaction.

However this can cause deadlock if the filesystem is frozen, with the
following sys_r+w output:
[  143.255932] Call Trace:
[  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
[  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
[btrfs]
[  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
[  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
[  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
[  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
[  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
[  143.256111] Call Trace:
[  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
[  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
[  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
[  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
[  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
[  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17

The reason is like the following:
(Holding s_umount)
VFS sync_fs staff:
|- btrfs_sync_fs()
   |- btrfs_start_transaction()
      |- sb_start_intwrite()
      (Waiting thaw_fs to unfreeze)
					VFS thaw_fs staff:
					thaw_fs()
					(Waiting sync_fs to release
					 s_umount)

So deadlock happens.
This can be easily triggered by fstest/generic/068 with inode_cache
mount option.

The fix is to check if the fs is frozen, if the fs is frozen, just
return and waiting for the next transaction.

Cc: David Sterba <dsterba@suse.cz>
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

David Sterba Jan. 19, 2015, 2:06 p.m. UTC | #1
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
> The fix is to check if the fs is frozen, if the fs is frozen, just
> return and waiting for the next transaction.
> 
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  			 */
>  			if (fs_info->pending_changes == 0)
>  				return 0;
> +			/*
> +			 * Test if the fs is frozen, or start_trasaction
> +			 * will deadlock on itself.
> +			 */
> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
> +				__sb_end_write(sb, SB_FREEZE_FS);
> +			else
> +				return 0;

I'm not sure this is the right fix. We should use either
mnt_want_write_file or sb_start_write around the start/commit functions.
The fs may be frozen already, but we also have to catch transition to
that state, or RO remount.

Also, returning 0 is not right, the ioctl actually skipped the expected
work.

>  			trans = btrfs_start_transaction(root, 0);
>  		} else {
>  			return PTR_ERR(trans);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 20, 2015, 12:19 a.m. UTC | #2
On Mon, 19 Jan 2015 15:42:41 +0800, Qu Wenruo wrote:
> Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
> changes) will call btrfs_start_transaction() in sync_fs(), to handle
> some operations needed to be done in next transaction.
> 
> However this can cause deadlock if the filesystem is frozen, with the
> following sys_r+w output:
> [  143.255932] Call Trace:
> [  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
> [  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
> [  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
> [btrfs]
> [  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
> [btrfs]
> [  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
> [  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
> [  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
> [  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
> [  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
> [  143.256111] Call Trace:
> [  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
> [  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
> [  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
> [  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
> [  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
> [  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
> [  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
> 
> The reason is like the following:
> (Holding s_umount)
> VFS sync_fs staff:
> |- btrfs_sync_fs()
>    |- btrfs_start_transaction()
>       |- sb_start_intwrite()
>       (Waiting thaw_fs to unfreeze)
> 					VFS thaw_fs staff:
> 					thaw_fs()
> 					(Waiting sync_fs to release
> 					 s_umount)
> 
> So deadlock happens.
> This can be easily triggered by fstest/generic/068 with inode_cache
> mount option.
> 
> The fix is to check if the fs is frozen, if the fs is frozen, just
> return and waiting for the next transaction.
> 
> Cc: David Sterba <dsterba@suse.cz>
> Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/super.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 60f7cbe..1d9f1e6 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  			 */
>  			if (fs_info->pending_changes == 0)
>  				return 0;


I think the problem is here -- why ->pending_changes is not 0 when the
filesystem is frozen? so I think the reason of this problem is btrfs_freeze
forget to deal with the pending changes, and the correct fix is to correct
the behavior of btrfs_freeze().

Thanks
Miao

> +			/*
> +			 * Test if the fs is frozen, or start_trasaction
> +			 * will deadlock on itself.
> +			 */
> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
> +				__sb_end_write(sb, SB_FREEZE_FS);
> +			else
> +				return 0;
>  			trans = btrfs_start_transaction(root, 0);
>  		} else {
>  			return PTR_ERR(trans);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 20, 2015, 12:26 a.m. UTC | #3
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, <linux-btrfs@vger.kernel.org>
Date: 2015?01?20? 08:19
> On Mon, 19 Jan 2015 15:42:41 +0800, Qu Wenruo wrote:
>> Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
>> changes) will call btrfs_start_transaction() in sync_fs(), to handle
>> some operations needed to be done in next transaction.
>>
>> However this can cause deadlock if the filesystem is frozen, with the
>> following sys_r+w output:
>> [  143.255932] Call Trace:
>> [  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
>> [  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
>> [  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
>> [btrfs]
>> [  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
>> [btrfs]
>> [  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
>> [  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
>> [  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
>> [  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
>> [  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
>> [  143.256111] Call Trace:
>> [  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
>> [  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
>> [  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
>> [  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
>> [  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
>> [  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
>> [  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
>>
>> The reason is like the following:
>> (Holding s_umount)
>> VFS sync_fs staff:
>> |- btrfs_sync_fs()
>>     |- btrfs_start_transaction()
>>        |- sb_start_intwrite()
>>        (Waiting thaw_fs to unfreeze)
>> 					VFS thaw_fs staff:
>> 					thaw_fs()
>> 					(Waiting sync_fs to release
>> 					 s_umount)
>>
>> So deadlock happens.
>> This can be easily triggered by fstest/generic/068 with inode_cache
>> mount option.
>>
>> The fix is to check if the fs is frozen, if the fs is frozen, just
>> return and waiting for the next transaction.
>>
>> Cc: David Sterba <dsterba@suse.cz>
>> Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/super.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 60f7cbe..1d9f1e6 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>   			 */
>>   			if (fs_info->pending_changes == 0)
>>   				return 0;
>
> I think the problem is here -- why ->pending_changes is not 0 when the
> filesystem is frozen?
This happens when already no transaction is running but some one set 
inode_cache or things needs pending.
And the freeze follows.
> so I think the reason of this problem is btrfs_freeze
> forget to deal with the pending changes, and the correct fix is to correct
> the behavior of btrfs_freeze().
Great! Thanks for pointing this!
Starting a transaction in btrfs_freeze() seems to be the silver bullet 
for such case.

Thanks
Qu
>
> Thanks
> Miao
>
>> +			/*
>> +			 * Test if the fs is frozen, or start_trasaction
>> +			 * will deadlock on itself.
>> +			 */
>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>> +				__sb_end_write(sb, SB_FREEZE_FS);
>> +			else
>> +				return 0;
>>   			trans = btrfs_start_transaction(root, 0);
>>   		} else {
>>   			return PTR_ERR(trans);
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 20, 2015, 2:51 a.m. UTC | #4
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015?01?19? 22:06
> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>> The fix is to check if the fs is frozen, if the fs is frozen, just
>> return and waiting for the next transaction.
>>
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>   			 */
>>   			if (fs_info->pending_changes == 0)
>>   				return 0;
>> +			/*
>> +			 * Test if the fs is frozen, or start_trasaction
>> +			 * will deadlock on itself.
>> +			 */
>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>> +				__sb_end_write(sb, SB_FREEZE_FS);
>> +			else
>> +				return 0;
> I'm not sure this is the right fix. We should use either
> mnt_want_write_file or sb_start_write around the start/commit functions.
> The fs may be frozen already, but we also have to catch transition to
> that state, or RO remount.
But the deadlock between s_umount and frozen level is a larger problem...

Even Miao mentioned that we can start a transaction in btrfs_freeze(), 
but there is still possibility that
we try to change the feature of the frozen btrfs and do sync, again the 
deadlock will happen.
Although handling in btrfs_freeze() is also needed, but can't resolve 
all the problem.

IMHO the fix is still needed, or at least as a workaround until we find 
a real root solution for it
(If nobody want to revert the patchset)

BTW, what about put the pending changes to a workqueue? If we don't 
start transaction under
s_umount context like sync_fs()

Thanks,
Qu
>
> Also, returning 0 is not right, the ioctl actually skipped the expected
> work.
>
>>   			trans = btrfs_start_transaction(root, 0);
>>   		} else {
>>   			return PTR_ERR(trans);

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 20, 2015, 2:53 a.m. UTC | #5
Add CC to Miao Xie <miaoxie@huawei.com>

-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, Miao Xie 
<miaox@cn.fujitsu.com>
Date: 2015?01?20? 10:51
>
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
> frozen fs to avoid deadlock.
> From: David Sterba <dsterba@suse.cz>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015?01?19? 22:06
>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>> The fix is to check if the fs is frozen, if the fs is frozen, just
>>> return and waiting for the next transaction.
>>>
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int 
>>> wait)
>>>                */
>>>               if (fs_info->pending_changes == 0)
>>>                   return 0;
>>> +            /*
>>> +             * Test if the fs is frozen, or start_trasaction
>>> +             * will deadlock on itself.
>>> +             */
>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>> +            else
>>> +                return 0;
>> I'm not sure this is the right fix. We should use either
>> mnt_want_write_file or sb_start_write around the start/commit functions.
>> The fs may be frozen already, but we also have to catch transition to
>> that state, or RO remount.
> But the deadlock between s_umount and frozen level is a larger problem...
>
> Even Miao mentioned that we can start a transaction in btrfs_freeze(), 
> but there is still possibility that
> we try to change the feature of the frozen btrfs and do sync, again 
> the deadlock will happen.
> Although handling in btrfs_freeze() is also needed, but can't resolve 
> all the problem.
>
> IMHO the fix is still needed, or at least as a workaround until we 
> find a real root solution for it
> (If nobody want to revert the patchset)
>
> BTW, what about put the pending changes to a workqueue? If we don't 
> start transaction under
> s_umount context like sync_fs()
>
> Thanks,
> Qu
>>
>> Also, returning 0 is not right, the ioctl actually skipped the expected
>> work.
>>
>>>               trans = btrfs_start_transaction(root, 0);
>>>           } else {
>>>               return PTR_ERR(trans);
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 20, 2015, 3:06 a.m. UTC | #6
On Tue, 20 Jan 2015 10:53:05 +0800, Qu Wenruo wrote:
> Add CC to Miao Xie <miaoxie@huawei.com>
> 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to
> avoid deadlock.
> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
> To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, Miao Xie <miaox@cn.fujitsu.com>
> Date: 2015?01?20? 10:51
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
>> to avoid deadlock.
>> From: David Sterba <dsterba@suse.cz>
>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2015?01?19? 22:06
>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>> The fix is to check if the fs is frozen, if the fs is frozen, just
>>>> return and waiting for the next transaction.
>>>>
>>>> --- a/fs/btrfs/super.c
>>>> +++ b/fs/btrfs/super.c
>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>                */
>>>>               if (fs_info->pending_changes == 0)
>>>>                   return 0;
>>>> +            /*
>>>> +             * Test if the fs is frozen, or start_trasaction
>>>> +             * will deadlock on itself.
>>>> +             */
>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>> +            else
>>>> +                return 0;
>>> I'm not sure this is the right fix. We should use either
>>> mnt_want_write_file or sb_start_write around the start/commit functions.
>>> The fs may be frozen already, but we also have to catch transition to
>>> that state, or RO remount.
>> But the deadlock between s_umount and frozen level is a larger problem...
>>
>> Even Miao mentioned that we can start a transaction in btrfs_freeze(), but
>> there is still possibility that
>> we try to change the feature of the frozen btrfs and do sync, again the
>> deadlock will happen.
>> Although handling in btrfs_freeze() is also needed, but can't resolve all the
>> problem.
>>
>> IMHO the fix is still needed, or at least as a workaround until we find a real
>> root solution for it
>> (If nobody want to revert the patchset)
>>
>> BTW, what about put the pending changes to a workqueue? If we don't start
>> transaction under
>> s_umount context like sync_fs()

I don't like this fix.
I think we should deal with those pending changes when we freeze a filesystem.
or we break the rule of fs freeze.

Thanks
Miao

>>
>> Thanks,
>> Qu
>>>
>>> Also, returning 0 is not right, the ioctl actually skipped the expected
>>> work.
>>>
>>>>               trans = btrfs_start_transaction(root, 0);
>>>>           } else {
>>>>               return PTR_ERR(trans);
>>
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 20, 2015, 3:17 a.m. UTC | #7
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, <dsterba@suse.cz>, 
<linux-btrfs@vger.kernel.org>
Date: 2015?01?20? 11:06
> On Tue, 20 Jan 2015 10:53:05 +0800, Qu Wenruo wrote:
>> Add CC to Miao Xie <miaoxie@huawei.com>
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to
>> avoid deadlock.
>> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, Miao Xie <miaox@cn.fujitsu.com>
>> Date: 2015?01?20? 10:51
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
>>> to avoid deadlock.
>>> From: David Sterba <dsterba@suse.cz>
>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> Date: 2015?01?19? 22:06
>>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>>> The fix is to check if the fs is frozen, if the fs is frozen, just
>>>>> return and waiting for the next transaction.
>>>>>
>>>>> --- a/fs/btrfs/super.c
>>>>> +++ b/fs/btrfs/super.c
>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>>                 */
>>>>>                if (fs_info->pending_changes == 0)
>>>>>                    return 0;
>>>>> +            /*
>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>> +             * will deadlock on itself.
>>>>> +             */
>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>> +            else
>>>>> +                return 0;
>>>> I'm not sure this is the right fix. We should use either
>>>> mnt_want_write_file or sb_start_write around the start/commit functions.
>>>> The fs may be frozen already, but we also have to catch transition to
>>>> that state, or RO remount.
>>> But the deadlock between s_umount and frozen level is a larger problem...
>>>
>>> Even Miao mentioned that we can start a transaction in btrfs_freeze(), but
>>> there is still possibility that
>>> we try to change the feature of the frozen btrfs and do sync, again the
>>> deadlock will happen.
>>> Although handling in btrfs_freeze() is also needed, but can't resolve all the
>>> problem.
>>>
>>> IMHO the fix is still needed, or at least as a workaround until we find a real
>>> root solution for it
>>> (If nobody want to revert the patchset)
>>>
>>> BTW, what about put the pending changes to a workqueue? If we don't start
>>> transaction under
>>> s_umount context like sync_fs()
> I don't like this fix.
> I think we should deal with those pending changes when we freeze a filesystem.
> or we break the rule of fs freeze.
I am afraid handling it in btrfs_freeze() won't help.
Case like freeze() -> change_feature -> sync() -> unfreeze() will still 
deadlock in sync().

Even cleared the pending changes in freeze(), it can still be set 
through sysfs interface even the fs is frozen.

And in fact, if we put the things like attach/create a transaction into 
a workqueue, we will not break
the freeze rule.
Since if the fs is frozen, there is no running transaction and we need 
to create a new one,
that will call sb_start_intwrite(), which will sleep until the fs is 
unfreeze.

Thanks,
Qu
>
> Thanks
> Miao
>
>>> Thanks,
>>> Qu
>>>> Also, returning 0 is not right, the ioctl actually skipped the expected
>>>> work.
>>>>
>>>>>                trans = btrfs_start_transaction(root, 0);
>>>>>            } else {
>>>>>                return PTR_ERR(trans);
>> .
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 20, 2015, 8:16 a.m. UTC | #8
On Tue, 20 Jan 2015 11:17:07 +0800, Qu Wenruo wrote:
>>>>>> --- a/fs/btrfs/super.c
>>>>>> +++ b/fs/btrfs/super.c
>>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>>>                 */
>>>>>>                if (fs_info->pending_changes == 0)
>>>>>>                    return 0;
>>>>>> +            /*
>>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>>> +             * will deadlock on itself.
>>>>>> +             */
>>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>>> +            else
>>>>>> +                return 0;
>>>>> I'm not sure this is the right fix. We should use either
>>>>> mnt_want_write_file or sb_start_write around the start/commit functions.
>>>>> The fs may be frozen already, but we also have to catch transition to
>>>>> that state, or RO remount.
>>>> But the deadlock between s_umount and frozen level is a larger problem...
>>>>
>>>> Even Miao mentioned that we can start a transaction in btrfs_freeze(), but
>>>> there is still possibility that
>>>> we try to change the feature of the frozen btrfs and do sync, again the
>>>> deadlock will happen.
>>>> Although handling in btrfs_freeze() is also needed, but can't resolve all the
>>>> problem.
>>>>
>>>> IMHO the fix is still needed, or at least as a workaround until we find a real
>>>> root solution for it
>>>> (If nobody want to revert the patchset)
>>>>
>>>> BTW, what about put the pending changes to a workqueue? If we don't start
>>>> transaction under
>>>> s_umount context like sync_fs()
>> I don't like this fix.
>> I think we should deal with those pending changes when we freeze a filesystem.
>> or we break the rule of fs freeze.
> I am afraid handling it in btrfs_freeze() won't help.
> Case like freeze() -> change_feature -> sync() -> unfreeze() will still deadlock
> in sync().

We should not change feature after the fs is freezed.

> Even cleared the pending changes in freeze(), it can still be set through sysfs
> interface even the fs is frozen.
> 
> And in fact, if we put the things like attach/create a transaction into a
> workqueue, we will not break
> the freeze rule.
> Since if the fs is frozen, there is no running transaction and we need to create
> a new one,
> that will call sb_start_intwrite(), which will sleep until the fs is unfreeze.

I read the pending change code just now, and I found the pending change is just
used for changing the mount option now, so I think as a work-around fix we
needn't start a new transaction to handle the pending flags which are set after
the current transaction is committed, because the data on the disk is
integrated.

Thanks
Miao


> 
> Thanks,
> Qu
>>
>> Thanks
>> Miao
>>
>>>> Thanks,
>>>> Qu
>>>>> Also, returning 0 is not right, the ioctl actually skipped the expected
>>>>> work.
>>>>>
>>>>>>                trans = btrfs_start_transaction(root, 0);
>>>>>>            } else {
>>>>>>                return PTR_ERR(trans);
>>> .
>>>
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Jan. 20, 2015, 5:13 p.m. UTC | #9
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  			 */
>  			if (fs_info->pending_changes == 0)
>  				return 0;
> +			/*
> +			 * Test if the fs is frozen, or start_trasaction
> +			 * will deadlock on itself.
> +			 */
> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
> +				__sb_end_write(sb, SB_FREEZE_FS);
> +			else
> +				return 0;

The more I look into that the more I think that the first fix is the
right one.

Has been pointed out in this thread, it is ok to skip processing the
pending changes if the filesystem is frozen.

The pending changes have to flushed from sync (by design), we cannot use
mnt_want_write or the sb_start* protections that.

The btrfs_freeze callback can safely do the last commit, that's under
s_umount held by vfs::freeze_super. Then any other new transaction would
block. Any other call to btrfs_sync_fs will not find any active
transaction and with this patch will not start one. Sounds safe to me.

I think the right level to check is SB_FREEZE_WRITE though, to stop any
potential writes as soon as possible and when the s_umount lock is still
held in vfs::freeze_super.

I'll collect the relevant patches and will send it for review.


>  			trans = btrfs_start_transaction(root, 0);
>  		} else {
>  			return PTR_ERR(trans);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 21, 2015, 12:58 a.m. UTC | #10
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015?01?21? 01:13
> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>   			 */
>>   			if (fs_info->pending_changes == 0)
>>   				return 0;
>> +			/*
>> +			 * Test if the fs is frozen, or start_trasaction
>> +			 * will deadlock on itself.
>> +			 */
>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>> +				__sb_end_write(sb, SB_FREEZE_FS);
>> +			else
>> +				return 0;
> The more I look into that the more I think that the first fix is the
> right one.
>
> Has been pointed out in this thread, it is ok to skip processing the
> pending changes if the filesystem is frozen.
That's good, for me, either this patch or the patch 2~5 in the patchset 
will solve the sync_fs() problem
on frozen fs. Just different timing to start the new transaction.

But the patchset one has the problem, which needs to deal with the sysfs 
interface changes, or sync_fs()
will still cause deadlock.
So I tried to revert the sysfs related patches, but it seems overkilled, 
needing extra btrfs_start_transaction*
things.

As you already picked this one, I'm completely OK with this.
>
> The pending changes have to flushed from sync (by design), we cannot use
> mnt_want_write or the sb_start* protections that.
>
> The btrfs_freeze callback can safely do the last commit, that's under
> s_umount held by vfs::freeze_super. Then any other new transaction would
> block. Any other call to btrfs_sync_fs will not find any active
> transaction and with this patch will not start one. Sounds safe to me.
>
> I think the right level to check is SB_FREEZE_WRITE though, to stop any
> potential writes as soon as possible and when the s_umount lock is still
> held in vfs::freeze_super.
SB_FREEZE_WRITE seems good for me.
But I didn't catch the difference between 
SB_FREEZE_FS(WRITE/PAGEFAULT/COMPLETE),
since freeze() conflicts with sync_fs(), when we comes to btrfs_sync_fs(),
the fs is either totally frozen or unfrozen and frozen level won't 
change during the protection of s_umount.

Although SB_FREEZE_WRITE seems better in its meaning and makes it more 
readable.
>
> I'll collect the relevant patches and will send it for review.
Thanks for collecting them and sending them out.

Thanks,
Qu
>
>
>>   			trans = btrfs_start_transaction(root, 0);
>>   		} else {
>>   			return PTR_ERR(trans);

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason Jan. 21, 2015, 1:05 a.m. UTC | #11
On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> 
wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
> frozen fs to avoid deadlock.
> From: David Sterba <dsterba@suse.cz>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015?01?21? 01:13
>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, 
>>> int wait)
>>>   			 */
>>>   			if (fs_info->pending_changes == 0)
>>>   				return 0;
>>> +			/*
>>> +			 * Test if the fs is frozen, or start_trasaction
>>> +			 * will deadlock on itself.
>>> +			 */
>>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>> +				__sb_end_write(sb, SB_FREEZE_FS);
>>> +			else
>>> +				return 0;

But what if someone freezes the FS after __sb_end_write() and before 
btrfs_start_transaction()?   I don't see what keeps new freezers from 
coming in.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 21, 2015, 1:09 a.m. UTC | #12
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Chris Mason <clm@fb.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015?01?21? 09:05
>
>
> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> 
> wrote:
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
>> frozen fs to avoid deadlock.
>> From: David Sterba <dsterba@suse.cz>
>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2015?01?21? 01:13
>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>> --- a/fs/btrfs/super.c
>>>> +++ b/fs/btrfs/super.c
>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, 
>>>> int wait)
>>>>                */
>>>>               if (fs_info->pending_changes == 0)
>>>>                   return 0;
>>>> +            /*
>>>> +             * Test if the fs is frozen, or start_trasaction
>>>> +             * will deadlock on itself.
>>>> +             */
>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>> +            else
>>>> +                return 0;
>
> But what if someone freezes the FS after __sb_end_write() and before 
> btrfs_start_transaction()?   I don't see what keeps new freezers from 
> coming in.
>
> -chris
Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount 
mutex, so freeze will not happen
during sync.

Thanks,
Qu
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason Jan. 21, 2015, 1:10 a.m. UTC | #13
On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> 
wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
> frozen fs to avoid deadlock.
> From: Chris Mason <clm@fb.com>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015?01?21? 09:05
>> 
>> 
>> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> 
>> wrote:
>>> 
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
>>> frozen fs to avoid deadlock.
>>> From: David Sterba <dsterba@suse.cz>
>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> Date: 2015?01?21? 01:13
>>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>>> --- a/fs/btrfs/super.c
>>>>> +++ b/fs/btrfs/super.c
>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, 
>>>>> int wait)
>>>>>                */
>>>>>               if (fs_info->pending_changes == 0)
>>>>>                   return 0;
>>>>> +            /*
>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>> +             * will deadlock on itself.
>>>>> +             */
>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>> +            else
>>>>> +                return 0;
>> 
>> But what if someone freezes the FS after __sb_end_write() and before 
>> btrfs_start_transaction()?   I don't see what keeps new freezers 
>> from coming in.
>> 
>> -chris
> Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount 
> mutex, so freeze will not happen
> during sync.

You're right.  I was worried about the sync ioctl, but the mutex won't 
be held there to deadlock against.  We'll be fine.

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 21, 2015, 3:10 a.m. UTC | #14
On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
> On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> -------- Original Message --------
>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
>> to avoid deadlock.
>> From: Chris Mason <clm@fb.com>
>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2015?01?21? 09:05
>>>
>>>
>>> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen
>>>> fs to avoid deadlock.
>>>> From: David Sterba <dsterba@suse.cz>
>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> Date: 2015?01?21? 01:13
>>>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>>>> --- a/fs/btrfs/super.c
>>>>>> +++ b/fs/btrfs/super.c
>>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>>>                */
>>>>>>               if (fs_info->pending_changes == 0)
>>>>>>                   return 0;
>>>>>> +            /*
>>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>>> +             * will deadlock on itself.
>>>>>> +             */
>>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>>> +            else
>>>>>> +                return 0;
>>>
>>> But what if someone freezes the FS after __sb_end_write() and before
>>> btrfs_start_transaction()?   I don't see what keeps new freezers from coming in.
>>>
>>> -chris
>> Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
>> freeze will not happen
>> during sync.
> 
> You're right.  I was worried about the sync ioctl, but the mutex won't be held
> there to deadlock against.  We'll be fine.

There is another problem which is introduced by pending change. That is we will
start and commmit a transaction by changing pending mount option after we set
the fs to be R/O.

I think it is better that we don't start a new transaction for pending changes
which are set after the transaction is committed, just make them be handled by
the next transaction, the reason is:
- Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
- Data on the disk is right and integrated


Thanks
Miao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 21, 2015, 3:15 a.m. UTC | #15
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Chris Mason <clm@fb.com>, Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015?01?21? 11:10
> On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
>> On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>> -------- Original Message --------
>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
>>> to avoid deadlock.
>>> From: Chris Mason <clm@fb.com>
>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> Date: 2015?01?21? 09:05
>>>>
>>>> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>> -------- Original Message --------
>>>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen
>>>>> fs to avoid deadlock.
>>>>> From: David Sterba <dsterba@suse.cz>
>>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>> Date: 2015?01?21? 01:13
>>>>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>>>>> --- a/fs/btrfs/super.c
>>>>>>> +++ b/fs/btrfs/super.c
>>>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>>>>                 */
>>>>>>>                if (fs_info->pending_changes == 0)
>>>>>>>                    return 0;
>>>>>>> +            /*
>>>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>>>> +             * will deadlock on itself.
>>>>>>> +             */
>>>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>>>> +            else
>>>>>>> +                return 0;
>>>> But what if someone freezes the FS after __sb_end_write() and before
>>>> btrfs_start_transaction()?   I don't see what keeps new freezers from coming in.
>>>>
>>>> -chris
>>> Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
>>> freeze will not happen
>>> during sync.
>> You're right.  I was worried about the sync ioctl, but the mutex won't be held
>> there to deadlock against.  We'll be fine.
> There is another problem which is introduced by pending change. That is we will
> start and commmit a transaction by changing pending mount option after we set
> the fs to be R/O.
Oh, I missed this problem.
>
> I think it is better that we don't start a new transaction for pending changes
> which are set after the transaction is committed, just make them be handled by
> the next transaction,
This will cause another problem, nobody can ensure there will be next 
transaction and the change may
never to written into disk.

For example, if we change the features/label through sysfs, and then 
umount the fs,
since there is no write, there is no running transaction and if we don't 
start a new transaction,
it won't be flushed to disk.

Thanks,
Qu
> the reason is:
> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
> - Data on the disk is right and integrated
>
>
> Thanks
> Miao

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 21, 2015, 3:26 a.m. UTC | #16
On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to
> avoid deadlock.
> From: Miao Xie <miaoxie@huawei.com>
> To: Chris Mason <clm@fb.com>, Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015?01?21? 11:10
>> On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
>>> On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>> -------- Original Message --------
>>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
>>>> to avoid deadlock.
>>>> From: Chris Mason <clm@fb.com>
>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> Date: 2015?01?21? 09:05
>>>>>
>>>>> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen
>>>>>> fs to avoid deadlock.
>>>>>> From: David Sterba <dsterba@suse.cz>
>>>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>>> Date: 2015?01?21? 01:13
>>>>>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>>>>>> --- a/fs/btrfs/super.c
>>>>>>>> +++ b/fs/btrfs/super.c
>>>>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>>>>>                 */
>>>>>>>>                if (fs_info->pending_changes == 0)
>>>>>>>>                    return 0;
>>>>>>>> +            /*
>>>>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>>>>> +             * will deadlock on itself.
>>>>>>>> +             */
>>>>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>>>>> +            else
>>>>>>>> +                return 0;
>>>>> But what if someone freezes the FS after __sb_end_write() and before
>>>>> btrfs_start_transaction()?   I don't see what keeps new freezers from
>>>>> coming in.
>>>>>
>>>>> -chris
>>>> Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
>>>> freeze will not happen
>>>> during sync.
>>> You're right.  I was worried about the sync ioctl, but the mutex won't be held
>>> there to deadlock against.  We'll be fine.
>> There is another problem which is introduced by pending change. That is we will
>> start and commmit a transaction by changing pending mount option after we set
>> the fs to be R/O.
> Oh, I missed this problem.
>>
>> I think it is better that we don't start a new transaction for pending changes
>> which are set after the transaction is committed, just make them be handled by
>> the next transaction,
> This will cause another problem, nobody can ensure there will be next
> transaction and the change may
> never to written into disk.

First, the pending changes is mount option, that is in-memory data.
Second, the same problem would happen after you freeze fs.

> 
> For example, if we change the features/label through sysfs, and then umount the fs,

It is different from pending change.
If you want to change features/label,  you should get write permission and make
sure the fs is not be freezed because those are on-disk data. So the problem
doesn't exist, or there is a bug.

Thanks
Miao

> since there is no write, there is no running transaction and if we don't start a
> new transaction,
> it won't be flushed to disk.
> 
> Thanks,
> Qu
>> the reason is:
>> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
>> - Data on the disk is right and integrated
>>
>>
>> Thanks
>> Miao
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 21, 2015, 3:53 a.m. UTC | #17
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, Chris Mason <clm@fb.com>
Date: 2015?01?21? 11:26
> On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote:
>> -------- Original Message --------
>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to
>> avoid deadlock.
>> From: Miao Xie <miaoxie@huawei.com>
>> To: Chris Mason <clm@fb.com>, Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Date: 2015?01?21? 11:10
>>> On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
>>>> On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>> -------- Original Message --------
>>>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
>>>>> to avoid deadlock.
>>>>> From: Chris Mason <clm@fb.com>
>>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>> Date: 2015?01?21? 09:05
>>>>>> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>>>> -------- Original Message --------
>>>>>>> Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen
>>>>>>> fs to avoid deadlock.
>>>>>>> From: David Sterba <dsterba@suse.cz>
>>>>>>> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>>>> Date: 2015?01?21? 01:13
>>>>>>>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>>>>>>>> --- a/fs/btrfs/super.c
>>>>>>>>> +++ b/fs/btrfs/super.c
>>>>>>>>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>>>>>>>>                  */
>>>>>>>>>                 if (fs_info->pending_changes == 0)
>>>>>>>>>                     return 0;
>>>>>>>>> +            /*
>>>>>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>>>>>> +             * will deadlock on itself.
>>>>>>>>> +             */
>>>>>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>>>>>> +            else
>>>>>>>>> +                return 0;
>>>>>> But what if someone freezes the FS after __sb_end_write() and before
>>>>>> btrfs_start_transaction()?   I don't see what keeps new freezers from
>>>>>> coming in.
>>>>>>
>>>>>> -chris
>>>>> Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
>>>>> freeze will not happen
>>>>> during sync.
>>>> You're right.  I was worried about the sync ioctl, but the mutex won't be held
>>>> there to deadlock against.  We'll be fine.
>>> There is another problem which is introduced by pending change. That is we will
>>> start and commmit a transaction by changing pending mount option after we set
>>> the fs to be R/O.
>> Oh, I missed this problem.
>>> I think it is better that we don't start a new transaction for pending changes
>>> which are set after the transaction is committed, just make them be handled by
>>> the next transaction,
>> This will cause another problem, nobody can ensure there will be next
>> transaction and the change may
>> never to written into disk.
> First, the pending changes is mount option, that is in-memory data.
> Second, the same problem would happen after you freeze fs.
Pending changes are *not* only mount options. Feature change and label 
change are also pending changes if using sysfs.
Normal ioctl label changing is not affected.

For freeze, it's not the same problem since the fs will be unfreeze 
sooner or later and transaction will be initiated.
>
>> For example, if we change the features/label through sysfs, and then umount the fs,
> It is different from pending change.
No, now features/label changing using sysfs both use pending changes to 
do the commit.
See BTRFS_PENDING_COMMIT bit.
So freeze -> change features/label -> sync will still cause the deadlock 
in the same way,
and you can try it yourself.

Thanks,
Qu

> If you want to change features/label,  you should get write permission and make
> sure the fs is not be freezed because those are on-disk data. So the problem
> doesn't exist, or there is a bug.
>
> Thanks
> Miao
>
>> since there is no write, there is no running transaction and if we don't start a
>> new transaction,
>> it won't be flushed to disk.
>>
>> Thanks,
>> Qu
>>> the reason is:
>>> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
>>> - Data on the disk is right and integrated
>>>
>>>
>>> Thanks
>>> Miao
>> .
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 21, 2015, 7:04 a.m. UTC | #18
On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>>>>>>>>> +            /*
>>>>>>>>>> +             * Test if the fs is frozen, or start_trasaction
>>>>>>>>>> +             * will deadlock on itself.
>>>>>>>>>> +             */
>>>>>>>>>> +            if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>>>>>>>>> +                __sb_end_write(sb, SB_FREEZE_FS);
>>>>>>>>>> +            else
>>>>>>>>>> +                return 0;
>>>>>>> But what if someone freezes the FS after __sb_end_write() and before
>>>>>>> btrfs_start_transaction()?   I don't see what keeps new freezers from
>>>>>>> coming in.
>>>>>>>
>>>>>>> -chris
>>>>>> Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
>>>>>> freeze will not happen
>>>>>> during sync.
>>>>> You're right.  I was worried about the sync ioctl, but the mutex won't be held
>>>>> there to deadlock against.  We'll be fine.
>>>> There is another problem which is introduced by pending change. That is we will
>>>> start and commmit a transaction by changing pending mount option after we set
>>>> the fs to be R/O.
>>> Oh, I missed this problem.
>>>> I think it is better that we don't start a new transaction for pending changes
>>>> which are set after the transaction is committed, just make them be handled by
>>>> the next transaction,
>>> This will cause another problem, nobody can ensure there will be next
>>> transaction and the change may
>>> never to written into disk.
>> First, the pending changes is mount option, that is in-memory data.
>> Second, the same problem would happen after you freeze fs.
> Pending changes are *not* only mount options. Feature change and label change
> are also pending changes if using sysfs.

My miss, I don't notice feature and label change by sysfs.

But the implementation of feature and label change by sysfs is wrong, we can
not change them without write permission.

> Normal ioctl label changing is not affected.
> 
> For freeze, it's not the same problem since the fs will be unfreeze sooner or
> later and transaction will be initiated.

You can not assume the operations of the users, they might freeze the fs and
then shutdown the machine.

>>
>>> For example, if we change the features/label through sysfs, and then umount
>>> the fs,
>> It is different from pending change.
> No, now features/label changing using sysfs both use pending changes to do the
> commit.
> See BTRFS_PENDING_COMMIT bit.
> So freeze -> change features/label -> sync will still cause the deadlock in the
> same way,
> and you can try it yourself.

As I said above, the implementation of sysfs feature and label change is wrong,
it is better to separate them from the pending mount option change, make the
sysfs feature and label change be done in the context of transaction after
getting the write permission. If so, we needn't do anything special when sync
the fs.

In short, changing the sysfs feature and label change implementation and
removing the unnecessary btrfs_start_transaction in sync_fs can fix the
deadlock.

Thanks
Miao

> 
> Thanks,
> Qu
> 
>> If you want to change features/label,  you should get write permission and make
>> sure the fs is not be freezed because those are on-disk data. So the problem
>> doesn't exist, or there is a bug.
>>
>> Thanks
>> Miao
>>
>>> since there is no write, there is no running transaction and if we don't start a
>>> new transaction,
>>> it won't be flushed to disk.
>>>
>>> Thanks,
>>> Qu
>>>> the reason is:
>>>> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
>>>> - Data on the disk is right and integrated
>>>>
>>>>
>>>> Thanks
>>>> Miao
>>> .
>>>
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 21, 2015, 7:47 a.m. UTC | #19
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, Chris Mason <clm@fb.com>
Date: 2015?01?21? 15:04
> On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>>> [snipped]
>>>> This will cause another problem, nobody can ensure there will be next
>>>> transaction and the change may
>>>> never to written into disk.
>>> First, the pending changes is mount option, that is in-memory data.
>>> Second, the same problem would happen after you freeze fs.
>> Pending changes are *not* only mount options. Feature change and label change
>> are also pending changes if using sysfs.
> My miss, I don't notice feature and label change by sysfs.
>
> But the implementation of feature and label change by sysfs is wrong, we can
> not change them without write permission.
>
>> Normal ioctl label changing is not affected.
>>
>> For freeze, it's not the same problem since the fs will be unfreeze sooner or
>> later and transaction will be initiated.
> You can not assume the operations of the users, they might freeze the fs and
> then shutdown the machine.
>
>>>> For example, if we change the features/label through sysfs, and then umount
>>>> the fs,
>>> It is different from pending change.
>> No, now features/label changing using sysfs both use pending changes to do the
>> commit.
>> See BTRFS_PENDING_COMMIT bit.
>> So freeze -> change features/label -> sync will still cause the deadlock in the
>> same way,
>> and you can try it yourself.
> As I said above, the implementation of sysfs feature and label change is wrong,
> it is better to separate them from the pending mount option change, make the
> sysfs feature and label change be done in the context of transaction after
> getting the write permission. If so, we needn't do anything special when sync
> the fs.
>
> In short, changing the sysfs feature and label change implementation and
> removing the unnecessary btrfs_start_transaction in sync_fs can fix the
> deadlock.
Your method will only fix the deadlock, but will introduce the risk like 
pending inode_cache will never
be written to disk as I already explained. (If still using the 
fs_info->pending_changes mechanism)
To ensure pending changes written to disk sync_fs() should start a 
transaction if needed,
or there will be chance that no transaction can handle it.

But I don't see the necessity to pending current work(inode_cache, 
feature/label changes) to next transaction.

To David:
I'm a little curious about why inode_cache needs to be delayed to next 
transaction.
In btrfs_remount() we have s_umount mutex, and we synced the whole 
filesystem already,
so there should be no running transaction and we can just set any mount 
option into fs_info.

Or even in worst case, there is a racing window, we can still start a 
transaction and do the commit,
a little overhead in such minor case won't impact the overall performance.

For sysfs change, I prefer attach or start transaction method, and for 
mount option change, and
such sysfs tuning is also minor case for a filesystem.

What do you think about reverting the whole patchset and rework the 
sysfs interface?

Thanks,
Qu
>
> Thanks
> Miao
>
>> Thanks,
>> Qu
>>
>>> If you want to change features/label,  you should get write permission and make
>>> sure the fs is not be freezed because those are on-disk data. So the problem
>>> doesn't exist, or there is a bug.
>>>
>>> Thanks
>>> Miao
>>>
>>>> since there is no write, there is no running transaction and if we don't start a
>>>> new transaction,
>>>> it won't be flushed to disk.
>>>>
>>>> Thanks,
>>>> Qu
>>>>> the reason is:
>>>>> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
>>>>> - Data on the disk is right and integrated
>>>>>
>>>>>
>>>>> Thanks
>>>>> Miao
>>>> .
>>>>
>> .
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 21, 2015, 8:46 a.m. UTC | #20
On Wed, 21 Jan 2015 15:47:54 +0800, Qu Wenruo wrote:
>> On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>>>> [snipped]
>>>>> This will cause another problem, nobody can ensure there will be next
>>>>> transaction and the change may
>>>>> never to written into disk.
>>>> First, the pending changes is mount option, that is in-memory data.
>>>> Second, the same problem would happen after you freeze fs.
>>> Pending changes are *not* only mount options. Feature change and label change
>>> are also pending changes if using sysfs.
>> My miss, I don't notice feature and label change by sysfs.
>>
>> But the implementation of feature and label change by sysfs is wrong, we can
>> not change them without write permission.
>>
>>> Normal ioctl label changing is not affected.
>>>
>>> For freeze, it's not the same problem since the fs will be unfreeze sooner or
>>> later and transaction will be initiated.
>> You can not assume the operations of the users, they might freeze the fs and
>> then shutdown the machine.
>>
>>>>> For example, if we change the features/label through sysfs, and then umount
>>>>> the fs,
>>>> It is different from pending change.
>>> No, now features/label changing using sysfs both use pending changes to do the
>>> commit.
>>> See BTRFS_PENDING_COMMIT bit.
>>> So freeze -> change features/label -> sync will still cause the deadlock in the
>>> same way,
>>> and you can try it yourself.
>> As I said above, the implementation of sysfs feature and label change is wrong,
>> it is better to separate them from the pending mount option change, make the
>> sysfs feature and label change be done in the context of transaction after
>> getting the write permission. If so, we needn't do anything special when sync
>> the fs.
>>
>> In short, changing the sysfs feature and label change implementation and
>> removing the unnecessary btrfs_start_transaction in sync_fs can fix the
>> deadlock.
> Your method will only fix the deadlock, but will introduce the risk like pending
> inode_cache will never
> be written to disk as I already explained. (If still using the
> fs_info->pending_changes mechanism)
> To ensure pending changes written to disk sync_fs() should start a transaction
> if needed,
> or there will be chance that no transaction can handle it.

We are sure that writting down the inode cache need transaction. But INODE_CACHE
is not a forcible flag. Sometimes though you set it, it is very likely that the
inode cache files are not created and the data is not written down because the
fs might still be reading inode usage information, and this operation might span
several transactions. So I think what you worried is not a problem.

Thanks
Miao

> 
> But I don't see the necessity to pending current work(inode_cache, feature/label
> changes) to next transaction.
> 
> To David:
> I'm a little curious about why inode_cache needs to be delayed to next transaction.
> In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem
> already,
> so there should be no running transaction and we can just set any mount option
> into fs_info.
> 
> Or even in worst case, there is a racing window, we can still start a
> transaction and do the commit,
> a little overhead in such minor case won't impact the overall performance.
> 
> For sysfs change, I prefer attach or start transaction method, and for mount
> option change, and
> such sysfs tuning is also minor case for a filesystem.
> 
> What do you think about reverting the whole patchset and rework the sysfs
> interface?
> 
> Thanks,
> Qu
>>
>> Thanks
>> Miao
>>
>>> Thanks,
>>> Qu
>>>
>>>> If you want to change features/label,  you should get write permission and make
>>>> sure the fs is not be freezed because those are on-disk data. So the problem
>>>> doesn't exist, or there is a bug.
>>>>
>>>> Thanks
>>>> Miao
>>>>
>>>>> since there is no write, there is no running transaction and if we don't
>>>>> start a
>>>>> new transaction,
>>>>> it won't be flushed to disk.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>> the reason is:
>>>>>> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
>>>>>> - Data on the disk is right and integrated
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Miao
>>>>> .
>>>>>
>>> .
>>>
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Jan. 23, 2015, 4:59 p.m. UTC | #21
On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote:
> > Pending changes are *not* only mount options. Feature change and label change
> > are also pending changes if using sysfs.
> 
> My miss, I don't notice feature and label change by sysfs.
> 
> But the implementation of feature and label change by sysfs is wrong, we can
> not change them without write permission.

Label change does not happen if the fs is readonly. If the filesystem is
RW and label is changed through sysfs, then remount to RO will sync the
filesystem and the new label will be saved.

The sysfs features write handler is missing that protection though, I'll
send a patch.

> > For freeze, it's not the same problem since the fs will be unfreeze sooner or
> > later and transaction will be initiated.
> 
> You can not assume the operations of the users, they might freeze the fs and
> then shutdown the machine.

The semantics of freezing should make the on-device image consistent,
but still keep some changes in memory.

> >>> For example, if we change the features/label through sysfs, and then umount
> >>> the fs,
> >> It is different from pending change.
> > No, now features/label changing using sysfs both use pending changes to do the
> > commit.
> > See BTRFS_PENDING_COMMIT bit.
> > So freeze -> change features/label -> sync will still cause the deadlock in the
> > same way,
> > and you can try it yourself.
> 
> As I said above, the implementation of sysfs feature and label change is wrong,
> it is better to separate them from the pending mount option change, make the
> sysfs feature and label change be done in the context of transaction after
> getting the write permission. If so, we needn't do anything special when sync
> the fs.

That would mean to drop the write support of sysfs files that change
global filesystem state (label and features right now). This would leave
only the ioctl way to do that. I'd like to keep the sysfs write support
though for ease of use from scripts and languages not ioctl-friendly.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Jan. 23, 2015, 5:39 p.m. UTC | #22
On Wed, Jan 21, 2015 at 03:47:54PM +0800, Qu Wenruo wrote:
> To David:
> I'm a little curious about why inode_cache needs to be delayed to next 
> transaction.
> In btrfs_remount() we have s_umount mutex, and we synced the whole 
> filesystem already,
> so there should be no running transaction and we can just set any mount 
> option into fs_info.

See our discussion under the noinode_cache option:

http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg30075.html
http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg30414.html

> What do you think about reverting the whole patchset and rework the 
> sysfs interface?

IMO reverting should be the last option, we have a minimal fix to the
sync deadlock and you've proposed the per-trasaction mount options to
replace the pending inode_change.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason Jan. 23, 2015, 6:21 p.m. UTC | #23
On Fri, Jan 23, 2015 at 12:39 PM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Jan 21, 2015 at 03:47:54PM +0800, Qu Wenruo wrote:
>>  To David:
>>  I'm a little curious about why inode_cache needs to be delayed to 
>> next
>>  transaction.
>>  In btrfs_remount() we have s_umount mutex, and we synced the whole
>>  filesystem already,
>>  so there should be no running transaction and we can just set any 
>> mount
>>  option into fs_info.
> 
> See our discussion under the noinode_cache option:
> 
> https://urldefense.proofpoint.com/v1/url?u=http://www.mail-archive.com/linux-btrfs%2540vger.kernel.org/msg30075.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=sv%2BL93W9i7vNsbS3ozpylY3o%2F3wpA4TZTQTtFh3mUXg%3D%0A&s=2d678af317413a7452f047aa9ed07bc7e5424d4bae831ac15fae5f23a2acd080
> https://urldefense.proofpoint.com/v1/url?u=http://www.mail-archive.com/linux-btrfs%2540vger.kernel.org/msg30414.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=sv%2BL93W9i7vNsbS3ozpylY3o%2F3wpA4TZTQTtFh3mUXg%3D%0A&s=2fab711b3d70ab27c008694249bc62596f37e41af84dfc21077629930b4fe854
> 
>>  What do you think about reverting the whole patchset and rework the
>>  sysfs interface?
> 
> IMO reverting should be the last option, we have a minimal fix to the
> sync deadlock and you've proposed the per-trasaction mount options to
> replace the pending inode_change.

I agree, I'd rather build on top of what we have than use reverts at 
this point.

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
miaoxie (A) Jan. 26, 2015, 12:31 a.m. UTC | #24
On Fri, 23 Jan 2015 17:59:49 +0100, David Sterba wrote:
> On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote:
>>> Pending changes are *not* only mount options. Feature change and label change
>>> are also pending changes if using sysfs.
>>
>> My miss, I don't notice feature and label change by sysfs.
>>
>> But the implementation of feature and label change by sysfs is wrong, we can
>> not change them without write permission.
> 
> Label change does not happen if the fs is readonly. If the filesystem is
> RW and label is changed through sysfs, then remount to RO will sync the
> filesystem and the new label will be saved.
> 
> The sysfs features write handler is missing that protection though, I'll
> send a patch.

First, the R/O protection is so cheap, there is a race between R/O remount and
label/feature change, please consider the following case:
	Remount R/O task		Label/Attr Change Task
					Check R/O
	remount ro R/O
					change Label/feature

Second, it forgets to handle the freezing event.

> 
>>> For freeze, it's not the same problem since the fs will be unfreeze sooner or
>>> later and transaction will be initiated.
>>
>> You can not assume the operations of the users, they might freeze the fs and
>> then shutdown the machine.
> 
> The semantics of freezing should make the on-device image consistent,
> but still keep some changes in memory.
> 
>>>>> For example, if we change the features/label through sysfs, and then umount
>>>>> the fs,
>>>> It is different from pending change.
>>> No, now features/label changing using sysfs both use pending changes to do the
>>> commit.
>>> See BTRFS_PENDING_COMMIT bit.
>>> So freeze -> change features/label -> sync will still cause the deadlock in the
>>> same way,
>>> and you can try it yourself.
>>
>> As I said above, the implementation of sysfs feature and label change is wrong,
>> it is better to separate them from the pending mount option change, make the
>> sysfs feature and label change be done in the context of transaction after
>> getting the write permission. If so, we needn't do anything special when sync
>> the fs.
> 
> That would mean to drop the write support of sysfs files that change
> global filesystem state (label and features right now). This would leave
> only the ioctl way to do that. I'd like to keep the sysfs write support
> though for ease of use from scripts and languages not ioctl-friendly.
> .

not drop the write support of sysfs, just fix the bug and make it change the
label and features under the writable context.

Thanks
Miao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60f7cbe..1d9f1e6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@  int btrfs_sync_fs(struct super_block *sb, int wait)
 			 */
 			if (fs_info->pending_changes == 0)
 				return 0;
+			/*
+			 * Test if the fs is frozen, or start_trasaction
+			 * will deadlock on itself.
+			 */
+			if (__sb_start_write(sb, SB_FREEZE_FS, false))
+				__sb_end_write(sb, SB_FREEZE_FS);
+			else
+				return 0;
 			trans = btrfs_start_transaction(root, 0);
 		} else {
 			return PTR_ERR(trans);