Message ID | 1421653361-18630-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive) |
---|---|
State | Superseded |
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
> The fix is to check if the fs is frozen and, if so, just
> return and wait for the next transaction.
>
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  			 */
>  			if (fs_info->pending_changes == 0)
>  				return 0;
> +			/*
> +			 * Test if the fs is frozen, or start_transaction
> +			 * will deadlock on itself.
> +			 */
> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
> +				__sb_end_write(sb, SB_FREEZE_FS);
> +			else
> +				return 0;

I'm not sure this is the right fix. We should use either
mnt_want_write_file or sb_start_write around the start/commit functions.
The fs may be frozen already, but we also have to catch the transition to
that state, or an RO remount.

Also, returning 0 is not right; the ioctl actually skipped the expected
work.

>  			trans = btrfs_start_transaction(root, 0);
>  		} else {
>  			return PTR_ERR(trans);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
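For illustration, a user-space sketch of what the proposed check does. The level constants follow the ordering of the kernel's `SB_FREEZE_*` enum, but `struct model_sb` and every `model_*` function are hypothetical stand-ins rather than VFS or btrfs APIs, and starting plus committing a transaction is reduced to clearing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the kernel's freeze levels; illustrative only. */
enum model_freeze_level {
	MODEL_UNFROZEN = 0,
	MODEL_FREEZE_WRITE = 1,
	MODEL_FREEZE_PAGEFAULT = 2,
	MODEL_FREEZE_FS = 3,
	MODEL_FREEZE_COMPLETE = 4,
};

/* Hypothetical stand-in for the few super_block/fs_info fields involved. */
struct model_sb {
	int frozen;                          /* current freeze level */
	int writers[MODEL_FREEZE_COMPLETE];  /* writers per level (1..3 used) */
	unsigned long pending_changes;
	int commits;                         /* transactions "committed" */
};

/* Non-blocking analogue of __sb_start_write(sb, level, false):
 * refuse instead of sleeping when frozen at or above @level. */
static bool model_start_write_nowait(struct model_sb *sb, int level)
{
	if (sb->frozen >= level)
		return false;
	sb->writers[level]++;
	return true;
}

static void model_end_write(struct model_sb *sb, int level)
{
	sb->writers[level]--;
}

/* Shape of the patched btrfs_sync_fs() path when no transaction runs. */
static int model_sync_fs(struct model_sb *sb)
{
	if (sb->pending_changes == 0)
		return 0;
	/* Probe the freeze state and drop the reference right away. */
	if (model_start_write_nowait(sb, MODEL_FREEZE_FS))
		model_end_write(sb, MODEL_FREEZE_FS);
	else
		return 0;	/* frozen: skip rather than sleep under s_umount */
	/* Stands in for btrfs_start_transaction() plus the commit. */
	sb->pending_changes = 0;
	sb->commits++;
	return 0;
}
```

In this model the frozen path reports success while `pending_changes` is left untouched, which is the behaviour questioned in the reply above.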
On Mon, 19 Jan 2015 15:42:41 +0800, Qu Wenruo wrote:
> Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
> changes) calls btrfs_start_transaction() in sync_fs() to handle
> some operations that need to be done in the next transaction.
>
> However, this can cause a deadlock if the filesystem is frozen, as the
> following sysrq+w output shows:
> [  143.255932] Call Trace:
> [  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
> [  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
> [  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0 [btrfs]
> [  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20 [btrfs]
> [  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
> [  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
> [  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
> [  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
> [  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
> [  143.256111] Call Trace:
> [  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
> [  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
> [  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
> [  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
> [  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
> [  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
> [  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
>
> The reason is as follows:
> (Holding s_umount)
> VFS sync_fs stuff:
> |- btrfs_sync_fs()
>    |- btrfs_start_transaction()
>       |- sb_start_intwrite()
>          (Waiting for thaw_fs to unfreeze)
>                                       VFS thaw_fs stuff:
>                                       thaw_fs()
>                                       (Waiting for sync_fs to release
>                                        s_umount)
>
> So a deadlock happens.
> This can easily be triggered by fstests generic/068 with the inode_cache
> mount option.
>
> The fix is to check if the fs is frozen and, if so, just
> return and wait for the next transaction.
>
> Cc: David Sterba <dsterba@suse.cz>
> Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/super.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 60f7cbe..1d9f1e6 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  			 */
>  			if (fs_info->pending_changes == 0)
>  				return 0;

I think the problem is here: why is ->pending_changes not 0 when the
filesystem is frozen? I think the cause of this problem is that
btrfs_freeze() forgets to deal with the pending changes, and the correct
fix is to correct the behavior of btrfs_freeze().

Thanks
Miao

> +			/*
> +			 * Test if the fs is frozen, or start_transaction
> +			 * will deadlock on itself.
> +			 */
> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
> +				__sb_end_write(sb, SB_FREEZE_FS);
> +			else
> +				return 0;
>  			trans = btrfs_start_transaction(root, 0);
>  		} else {
>  			return PTR_ERR(trans);
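The circular wait in the commit message can be checked mechanically with a wait-for graph, where an edge from A to B means task A cannot proceed until B does. The two task names come from the traces above; the graph and its cycle search are a generic sketch, not code from btrfs or the VFS.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 4

/* waits_for[a][b] == true: task a is blocked until task b makes progress. */
struct wait_graph {
	int ntasks;
	bool waits_for[MAX_TASKS][MAX_TASKS];
};

/* DFS cycle search; state: 0 = unvisited, 1 = on the stack, 2 = finished. */
static bool dfs_cycle(const struct wait_graph *g, int t, int state[])
{
	state[t] = 1;
	for (int u = 0; u < g->ntasks; u++) {
		if (!g->waits_for[t][u])
			continue;
		if (state[u] == 1)
			return true;	/* back edge: circular wait */
		if (state[u] == 0 && dfs_cycle(g, u, state))
			return true;
	}
	state[t] = 2;
	return false;
}

static bool has_deadlock(const struct wait_graph *g)
{
	int state[MAX_TASKS] = { 0 };

	for (int t = 0; t < g->ntasks; t++)
		if (state[t] == 0 && dfs_cycle(g, t, state))
			return true;
	return false;
}

enum { TASK_SYNC = 0, TASK_THAW = 1 };

/* @sync_blocks_on_freeze: true before the patch (sb_start_intwrite()
 * sleeps), false after it (sync_fs() returns early instead). */
static struct wait_graph sync_vs_thaw(bool sync_blocks_on_freeze)
{
	struct wait_graph g = { .ntasks = 2 };

	/* sync holds s_umount for reading; thaw_super() wants it for writing. */
	g.waits_for[TASK_THAW][TASK_SYNC] = true;
	/* sync waits for the unfreeze, which only the thaw task can do. */
	g.waits_for[TASK_SYNC][TASK_THAW] = sync_blocks_on_freeze;
	return g;
}
```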
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, <linux-btrfs@vger.kernel.org>
Date: 2015-01-20 08:19

> On Mon, 19 Jan 2015 15:42:41 +0800, Qu Wenruo wrote:
>> Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
>> changes) calls btrfs_start_transaction() in sync_fs() to handle
>> some operations that need to be done in the next transaction.
>>
>> However, this can cause a deadlock if the filesystem is frozen
>> [...]
>> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>>  			 */
>>  			if (fs_info->pending_changes == 0)
>>  				return 0;
>
> I think the problem is here: why is ->pending_changes not 0 when the
> filesystem is frozen?

This happens when no transaction is running but someone sets inode_cache
(or something else that needs a pending change), and then the freeze
follows.

> So I think the cause of this problem is that btrfs_freeze() forgets to
> deal with the pending changes, and the correct fix is to correct the
> behavior of btrfs_freeze().

Great, thanks for pointing this out!
Starting a transaction in btrfs_freeze() seems to be the silver bullet for
such a case.

Thanks,
Qu
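A minimal model of the approach Miao suggests and Qu accepts here, assuming the freeze path commits whatever is pending before the fs counts as frozen. `model_freeze()` is a hypothetical stand-in, not the body of btrfs_freeze().

```c
#include <assert.h>
#include <stdbool.h>

struct model_fs {
	bool frozen;
	unsigned long pending_changes;
	int commits;
	int blocked_syncs;	/* syncs that would sleep on the frozen fs */
};

/* Stand-in for starting and committing a transaction. */
static void model_commit(struct model_fs *fs)
{
	fs->pending_changes = 0;
	fs->commits++;
}

/* Freeze with a final commit: nothing is left pending once frozen. */
static void model_freeze(struct model_fs *fs)
{
	if (fs->pending_changes)
		model_commit(fs);
	fs->frozen = true;
}

/* Unpatched sync path: pending changes on a frozen fs would block. */
static void model_sync_fs(struct model_fs *fs)
{
	if (fs->pending_changes == 0)
		return;
	if (fs->frozen) {
		fs->blocked_syncs++;	/* the deadlock from the report */
		return;
	}
	model_commit(fs);
}
```

The model only covers changes recorded before the freeze; pending bits set afterwards are not handled by it.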
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015-01-19 22:06

> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>> The fix is to check if the fs is frozen and, if so, just
>> return and wait for the next transaction.
>>
>> [...]
>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>> +				__sb_end_write(sb, SB_FREEZE_FS);
>> +			else
>> +				return 0;
> I'm not sure this is the right fix. We should use either
> mnt_want_write_file or sb_start_write around the start/commit functions.
> The fs may be frozen already, but we also have to catch the transition to
> that state, or an RO remount.

But the deadlock between s_umount and the freeze level is a larger problem...

Even though Miao mentioned that we can start a transaction in
btrfs_freeze(), it is still possible that we change a feature of the
frozen btrfs and then sync, and the deadlock will happen again.
Handling it in btrfs_freeze() is also needed, but it can't resolve the
whole problem.

IMHO the fix is still needed, at least as a workaround until we find a
real root-cause solution (if nobody wants to revert the patchset).

BTW, what about putting the pending changes into a workqueue, so that we
don't start a transaction under the s_umount context of sync_fs()?

Thanks,
Qu

>
> Also, returning 0 is not right; the ioctl actually skipped the expected
> work.
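A single-threaded sketch of the workqueue idea: sync_fs() only queues the work and returns, so the task holding s_umount never sleeps on the freeze. `model_worker_run()` stands in for a worker that, as Qu describes, would sleep in sb_start_intwrite() until the fs is thawed; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_fs {
	bool frozen;
	unsigned long pending_changes;
	bool work_queued;	/* deferred "start + commit transaction" */
	int commits;
};

/* sync_fs(): never starts a transaction itself, only queues the work. */
static int model_sync_fs(struct model_fs *fs)
{
	if (fs->pending_changes)
		fs->work_queued = true;
	return 0;	/* returns promptly, so s_umount gets released */
}

/* One step of the worker. Returns false while it would still be asleep
 * (fs frozen); it runs outside s_umount, so thaw can proceed meanwhile. */
static bool model_worker_run(struct model_fs *fs)
{
	if (!fs->work_queued)
		return true;
	if (fs->frozen)
		return false;
	fs->pending_changes = 0;
	fs->commits++;
	fs->work_queued = false;
	return true;
}

/* Thaw can take s_umount because sync_fs() has already returned. */
static void model_thaw(struct model_fs *fs)
{
	fs->frozen = false;
}
```

In this model sync_fs() returns before the deferred commit has happened, so the pending changes only reach disk once the worker runs after the thaw.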
Add CC to Miao Xie <miaoxie@huawei.com>

-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, Miao Xie <miaox@cn.fujitsu.com>
Date: 2015-01-20 10:51
> [...]
> But the deadlock between s_umount and the freeze level is a larger problem...
>
> Even though Miao mentioned that we can start a transaction in
> btrfs_freeze(), it is still possible that we change a feature of the
> frozen btrfs and then sync, and the deadlock will happen again.
> Handling it in btrfs_freeze() is also needed, but it can't resolve the
> whole problem.
>
> IMHO the fix is still needed, at least as a workaround until we find a
> real root-cause solution (if nobody wants to revert the patchset).
>
> BTW, what about putting the pending changes into a workqueue, so that we
> don't start a transaction under the s_umount context of sync_fs()?
>
> Thanks,
> Qu
On Tue, 20 Jan 2015 10:53:05 +0800, Qu Wenruo wrote:
> Add CC to Miao Xie <miaoxie@huawei.com>
>
> [...]
>> IMHO the fix is still needed, at least as a workaround until we find a
>> real root-cause solution (if nobody wants to revert the patchset).
>>
>> BTW, what about putting the pending changes into a workqueue, so that we
>> don't start a transaction under the s_umount context of sync_fs()?

I don't like this fix.
I think we should deal with those pending changes when we freeze a
filesystem; otherwise we break the rule of fs freeze.

Thanks
Miao
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, <dsterba@suse.cz>, <linux-btrfs@vger.kernel.org>
Date: 2015-01-20 11:06

> On Tue, 20 Jan 2015 10:53:05 +0800, Qu Wenruo wrote:
> [...]
>>> BTW, what about putting the pending changes into a workqueue, so that we
>>> don't start a transaction under the s_umount context of sync_fs()?
> I don't like this fix.
> I think we should deal with those pending changes when we freeze a
> filesystem; otherwise we break the rule of fs freeze.

I am afraid handling it in btrfs_freeze() won't help.
A sequence like freeze() -> change_feature -> sync() -> unfreeze() will
still deadlock in sync().
Even if we clear the pending changes in freeze(), they can still be set
through the sysfs interface while the fs is frozen.

And in fact, if we put things like attaching/creating a transaction into a
workqueue, we will not break the freeze rule: if the fs is frozen, there is
no running transaction, so we need to create a new one, which calls
sb_start_intwrite() and sleeps until the fs is unfrozen.

Thanks,
Qu
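The sequence Qu describes can be replayed against a freeze that commits pending changes first. In this sketch (hypothetical names throughout) the sysfs write after the freeze re-arms `pending_changes`, and the unpatched sync path would then have to sleep on the frozen fs.

```c
#include <assert.h>
#include <stdbool.h>

struct model_fs {
	bool frozen;
	unsigned long pending_changes;
	int commits;
	bool sync_would_block;
};

static void model_commit(struct model_fs *fs)
{
	fs->pending_changes = 0;
	fs->commits++;
}

/* Freeze that flushes pending changes first (the proposal above). */
static void model_freeze(struct model_fs *fs)
{
	if (fs->pending_changes)
		model_commit(fs);
	fs->frozen = true;
}

/* A sysfs feature/option write: it only records a pending change and,
 * as Qu points out, is not stopped by the freeze. */
static void model_sysfs_change_feature(struct model_fs *fs, unsigned long bit)
{
	fs->pending_changes |= bit;
}

/* Unpatched sync: pending changes force a transaction start. */
static void model_sync_fs(struct model_fs *fs)
{
	if (fs->pending_changes == 0)
		return;
	if (fs->frozen) {
		fs->sync_would_block = true;	/* sleeps holding s_umount */
		return;
	}
	model_commit(fs);
}
```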
On Tue, 20 Jan 2015 11:17:07 +0800, Qu Wenruo wrote:
> [...]
>> I don't like this fix.
>> I think we should deal with those pending changes when we freeze a
>> filesystem; otherwise we break the rule of fs freeze.
> I am afraid handling it in btrfs_freeze() won't help.
> A sequence like freeze() -> change_feature -> sync() -> unfreeze() will
> still deadlock in sync().

We should not change features after the fs is frozen.

> Even if we clear the pending changes in freeze(), they can still be set
> through the sysfs interface while the fs is frozen.
>
> And in fact, if we put things like attaching/creating a transaction into a
> workqueue, we will not break the freeze rule: if the fs is frozen, there is
> no running transaction, so we need to create a new one, which calls
> sb_start_intwrite() and sleeps until the fs is unfrozen.

I read the pending-change code just now and found that pending changes are
currently only used for changing mount options. So, as a workaround, I
think we needn't start a new transaction to handle pending flags that are
set after the current transaction is committed, because the data on disk
is already consistent.

Thanks
Miao
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>  			 */
>  			if (fs_info->pending_changes == 0)
>  				return 0;
> +			/*
> +			 * Test if the fs is frozen, or start_transaction
> +			 * will deadlock on itself.
> +			 */
> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
> +				__sb_end_write(sb, SB_FREEZE_FS);
> +			else
> +				return 0;

The more I look into it, the more I think that the first fix is the
right one.

As has been pointed out in this thread, it is OK to skip processing the
pending changes if the filesystem is frozen.

The pending changes have to be flushed from sync (by design), so we cannot
use mnt_want_write or the sb_start* protections for that.

The btrfs_freeze callback can safely do the last commit; that's under
s_umount held by vfs::freeze_super. Then any other new transaction would
block. Any other call to btrfs_sync_fs will not find an active transaction
and, with this patch, will not start one. Sounds safe to me.

I think the right level to check is SB_FREEZE_WRITE though, to stop any
potential writes as soon as possible, and while the s_umount lock is still
held in vfs::freeze_super.

I'll collect the relevant patches and send them for review.
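A sketch of the combined plan described above: the freeze callback does the last commit, and sync_fs() probes at the SB_FREEZE_WRITE level and skips the pending work while frozen. Only the level ordering is taken from the kernel; the freeze is collapsed into a single step rather than the VFS's staged levels, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the kernel's freeze-level ordering; illustrative only. */
enum { LVL_UNFROZEN = 0, LVL_FREEZE_WRITE = 1, LVL_FREEZE_COMPLETE = 4 };

struct model_fs {
	int frozen;			/* freeze level */
	unsigned long pending_changes;
	int commits;
};

static void model_commit(struct model_fs *fs)
{
	fs->pending_changes = 0;
	fs->commits++;
}

/* btrfs_freeze() stand-in: last commit, then the fs counts as frozen. */
static void model_freeze(struct model_fs *fs)
{
	model_commit(fs);
	fs->frozen = LVL_FREEZE_COMPLETE;
}

static void model_thaw(struct model_fs *fs)
{
	fs->frozen = LVL_UNFROZEN;
}

/* sync_fs(): skip pending work when frozen at or above SB_FREEZE_WRITE. */
static int model_sync_fs(struct model_fs *fs)
{
	if (fs->pending_changes == 0)
		return 0;
	if (fs->frozen >= LVL_FREEZE_WRITE)
		return 0;	/* left for the first sync after the thaw */
	model_commit(fs);
	return 0;
}
```

A change recorded while frozen (for example through sysfs) simply waits in `pending_changes` until a sync after the thaw commits it.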
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015-01-21 01:13

> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>> [...]
>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>> +				__sb_end_write(sb, SB_FREEZE_FS);
>> +			else
>> +				return 0;
> The more I look into it, the more I think that the first fix is the
> right one.
>
> As has been pointed out in this thread, it is OK to skip processing the
> pending changes if the filesystem is frozen.

That's good. For me, either this patch or patches 2-5 of the patchset will
solve the sync_fs() problem on a frozen fs; they just start the new
transaction at different times.

But the patchset approach has a problem: it needs to deal with the sysfs
interface changes, or sync_fs() will still deadlock.
I tried reverting the sysfs-related patches, but that seems like overkill,
as it needs extra btrfs_start_transaction* calls.

As you already picked this one, I'm completely OK with it.

> [...]
> I think the right level to check is SB_FREEZE_WRITE though, to stop any
> potential writes as soon as possible, and while the s_umount lock is still
> held in vfs::freeze_super.

SB_FREEZE_WRITE seems good to me.
But I don't see the difference between SB_FREEZE_FS and the other levels
(WRITE/PAGEFAULT/COMPLETE) here: since freeze() conflicts with sync_fs(),
when we get to btrfs_sync_fs() the fs is either totally frozen or
unfrozen, and the freeze level won't change while s_umount is held.
Still, SB_FREEZE_WRITE is better in meaning and makes the code more
readable.

> I'll collect the relevant patches and send them for review.

Thanks for collecting them and sending them out.

Thanks,
Qu
On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> [...]
>> On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
>>> [...]
>>> +			if (__sb_start_write(sb, SB_FREEZE_FS, false))
>>> +				__sb_end_write(sb, SB_FREEZE_FS);
>>> +			else
>>> +				return 0;

But what if someone freezes the FS after __sb_end_write() and before
btrfs_start_transaction()? I don't see what keeps new freezers from coming
in.

-chris
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Chris Mason <clm@fb.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015-01-21 09:05

> On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo <quwenruo@cn.fujitsu.com>
> wrote:
> [...]
> But what if someone freezes the FS after __sb_end_write() and before
> btrfs_start_transaction()? I don't see what keeps new freezers from
> coming in.
>
> -chris

Both VFS::freeze_super() and VFS::syncfs() hold the s_umount mutex, so a
freeze cannot happen during a sync.

Thanks,
Qu
On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> [...]
>> But what if someone freezes the FS after __sb_end_write() and before
>> btrfs_start_transaction()? I don't see what keeps new freezers
>> from coming in.
>>
>> -chris
> Both VFS::freeze_super() and VFS::syncfs() hold the s_umount mutex, so a
> freeze cannot happen during a sync.

You're right. I was worried about the sync ioctl, but the mutex won't be
held there to deadlock against. We'll be fine.

-chris
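Qu's answer rests on s_umount serialising the two paths: sync(2) and syncfs(2) call sync_fs() with s_umount held for reading, while freeze_super() must hold it for writing to change the freeze state, so the state cannot change between the check and btrfs_start_transaction(). A single-threaded sketch of that argument using try-lock semantics; the lock type and functions are hypothetical, not the kernel rwsem API.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal reader/writer lock state standing in for s_umount. */
struct model_rwsem {
	int readers;
	bool writer;
};

static bool model_down_read_trylock(struct model_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void model_up_read(struct model_rwsem *s)
{
	s->readers--;
}

static bool model_down_write_trylock(struct model_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

static void model_up_write(struct model_rwsem *s)
{
	s->writer = false;
}

struct model_sb {
	struct model_rwsem s_umount;
	bool frozen;
};

/* freeze_super() stand-in: changes the freeze state only under the
 * exclusive lock. Returns false where the real code would wait. */
static bool model_try_freeze(struct model_sb *sb)
{
	if (!model_down_write_trylock(&sb->s_umount))
		return false;
	sb->frozen = true;
	model_up_write(&sb->s_umount);
	return true;
}
```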
On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
> On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> [...]
>> Both VFS::freeze_super() and VFS::syncfs() hold the s_umount mutex, so a
>> freeze cannot happen during a sync.
>
> You're right. I was worried about the sync ioctl, but the mutex won't be held
> there to deadlock against. We'll be fine.

There is another problem introduced by pending changes: we will start and
commit a transaction for a pending mount-option change after the fs has
been set read-only.

I think it is better not to start a new transaction for pending changes
that are set after the transaction is committed, and to let the next
transaction handle them instead, because:
- it keeps the behavior of the fs consistent (for both frozen and unfrozen fs)
- the data on disk is already correct and consistent

Thanks
Miao
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Chris Mason <clm@fb.com>, Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015-01-21 11:10
> On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
>> [snipped]
> There is another problem which is introduced by pending change. That is we will
> start and commit a transaction by changing pending mount option after we set
> the fs to be R/O.
Oh, I missed this problem.
>
> I think it is better that we don't start a new transaction for pending changes
> which are set after the transaction is committed, just make them be handled by
> the next transaction,
This will cause another problem: nobody can guarantee that there will be a
next transaction, so the change may never be written to disk.

For example, if we change the features/label through sysfs and then umount
the fs, there is no running transaction because nothing was written, and if
we don't start a new transaction, the change won't be flushed to disk.

Thanks,
Qu
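The disagreement between Miao and Qu is a tradeoff between two policies for a pending change when sync_fs() finds no running transaction. A minimal model of Qu's counterexample follows, assuming, as his argument does, that nothing after the sync starts a transaction of its own; the `pd_*` names and both policies are hypothetical labels for the two positions, not btrfs code.

```c
#include <assert.h>
#include <stdbool.h>

/* Two hypothetical policies for a pending change when sync_fs() finds no
 * running transaction. */
enum pd_policy { PD_DEFER_TO_NEXT_TRANS, PD_COMMIT_IN_SYNC };

struct pd_fs {
	bool pending;   /* e.g. a label set through sysfs, not yet committed */
	bool on_disk;   /* the change has been committed */
};

static void pd_commit(struct pd_fs *fs)
{
	if (fs->pending) {
		fs->on_disk = true;
		fs->pending = false;
	}
}

static void pd_sync(struct pd_fs *fs, enum pd_policy policy, bool running_trans)
{
	/* A running transaction always picks the change up; otherwise only
	 * the commit-in-sync policy starts one. */
	if (running_trans || policy == PD_COMMIT_IN_SYNC)
		pd_commit(fs);
}

/* Qu's scenario: change through sysfs, no other writes, sync, umount.
 * Assumption: nothing after this point starts a transaction. */
static bool pd_change_survives(enum pd_policy policy)
{
	struct pd_fs fs = { .pending = true, .on_disk = false };

	pd_sync(&fs, policy, false);
	return fs.on_disk;
}
```

Under that assumption only the commit-in-sync policy persists the change; whether the assumption holds for a real umount is exactly what the thread goes on to debate.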
On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote:
> [snipped]
>> I think it is better that we don't start a new transaction for pending changes
>> which are set after the transaction is committed, just make them be handled by
>> the next transaction,
> This will cause another problem, nobody can ensure there will be next
> transaction and the change may
> never to written into disk.

First, the pending changes are mount options, which are in-memory data.
Second, the same problem would happen after you freeze the fs.

>
> For example, if we change the features/label through sysfs, and then umount the fs,

That is different from a pending change. If you want to change the
features/label, you must get write permission and make sure the fs is not
frozen, because those are on-disk data. So the problem doesn't exist; if it
does, that is a bug.

Thanks
Miao
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, Chris Mason <clm@fb.com>
Date: 2015-01-21 11:26
> On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote:
>> [snipped]
>> This will cause another problem, nobody can ensure there will be next
>> transaction and the change may
>> never to written into disk.
> First, the pending changes is mount option, that is in-memory data.
> Second, the same problem would happen after you freeze fs.
Pending changes are *not* only mount options. Feature and label changes made
through sysfs are also pending changes.
Normal ioctl label changing is not affected.

For freeze, it's not the same problem, since the fs will be unfrozen sooner
or later and a transaction will then be started.
>
>> For example, if we change the features/label through sysfs, and then umount the fs,
> It is different from pending change.
No, feature and label changes made through sysfs now both use pending changes
to do the commit. See the BTRFS_PENDING_COMMIT bit.
So freeze -> change features/label -> sync will still deadlock in the same
way, and you can try it yourself.

Thanks,
Qu
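Qu's reproduction path (freeze, then a sysfs change that sets a pending-commit bit, then sync) can be modeled with a bitmask, loosely following the fs_info->pending_changes and BTRFS_PENDING_COMMIT mechanism named in the thread. This is a sketch under assumptions: the bit number, the `pc_*` names and the three-way result are hypothetical, and "would block" stands in for start_transaction() sleeping until the fs is thawed.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit number is illustrative, loosely modeled on BTRFS_PENDING_COMMIT. */
enum { PC_PENDING_COMMIT = 2 };

enum pc_result { PC_DONE, PC_SKIPPED, PC_WOULD_BLOCK };

struct pc_fs_info {
	unsigned long pending_changes;
	bool frozen;
	bool running_trans;
};

/* What a sysfs label/feature store would do: only record that a commit is needed. */
static void pc_set_pending(struct pc_fs_info *fs, int bit)
{
	fs->pending_changes |= 1UL << bit;
}

/* The no-running-transaction branch of sync_fs, with or without the frozen check. */
static enum pc_result pc_sync_fs(struct pc_fs_info *fs, bool with_frozen_check)
{
	if (!fs->running_trans) {
		if (fs->pending_changes == 0)
			return PC_SKIPPED;
		if (fs->frozen) {
			if (with_frozen_check)
				return PC_SKIPPED;     /* patched: bits stay set */
			return PC_WOULD_BLOCK;     /* start_transaction waits for thaw */
		}
	}
	fs->pending_changes = 0;               /* the commit applies the changes */
	return PC_DONE;
}
```

The sysfs store never blocks, because it only sets a bit; the blocking is deferred to the next sync, which is why the deadlock shows up at sync time rather than at write time.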
On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>> [snipped]
>>> This will cause another problem, nobody can ensure there will be next
>>> transaction and the change may
>>> never to written into disk.
>> First, the pending changes is mount option, that is in-memory data.
>> Second, the same problem would happen after you freeze fs.
> Pending changes are *not* only mount options. Feature change and label change
> are also pending changes if using sysfs.

My mistake, I didn't notice the feature and label changes through sysfs.

But the sysfs implementation of feature and label changes is wrong: we
cannot change them without write permission.

> Normal ioctl label changing is not affected.
>
> For freeze, it's not the same problem since the fs will be unfreeze sooner or
> later and transaction will be initiated.

You cannot make assumptions about what users will do; they might freeze the
fs and then shut down the machine.

>>
>>> For example, if we change the features/label through sysfs, and then umount
>>> the fs,
>> It is different from pending change.
> No, now features/label changing using sysfs both use pending changes to do the
> commit.
> See BTRFS_PENDING_COMMIT bit.
> So freeze -> change features/label -> sync will still cause the deadlock in the
> same way,
> and you can try it yourself.

As I said above, the sysfs implementation of feature and label changes is
wrong. It is better to separate them from the pending mount option changes
and to do the sysfs feature and label changes in the context of a
transaction, after getting write permission. Then we needn't do anything
special when syncing the fs.

In short, changing the sysfs feature and label implementation and removing
the unnecessary btrfs_start_transaction() in sync_fs can fix the deadlock.

Thanks
Miao
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, Chris Mason <clm@fb.com>
Date: 2015-01-21 15:04
> On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>>> [snipped]
> As I said above, the implementation of sysfs feature and label change is wrong,
> it is better to separate them from the pending mount option change, make the
> sysfs feature and label change be done in the context of transaction after
> getting the write permission. If so, we needn't do anything special when sync
> the fs.
>
> In short, changing the sysfs feature and label change implementation and
> removing the unnecessary btrfs_start_transaction in sync_fs can fix the
> deadlock.
Your method only fixes the deadlock, but it introduces the risk that, for
example, a pending inode_cache change is never written to disk, as I already
explained (if we still use the fs_info->pending_changes mechanism).
To ensure pending changes are written to disk, sync_fs() should start a
transaction if needed; otherwise there is a chance that no transaction ever
handles them.

But I don't see the need to defer the current work (inode_cache,
feature/label changes) to the next transaction.

To David:
I'm a little curious about why inode_cache needs to be delayed to the next
transaction.
In btrfs_remount() we hold the s_umount mutex and have already synced the
whole filesystem, so there should be no running transaction and we can just
set any mount option in fs_info.

Even in the worst case, if there is a race window, we can still start a
transaction and do the commit; a little overhead in such a rare case won't
affect overall performance.

For sysfs changes, as for mount option changes, I prefer the attach-or-start
transaction method; such sysfs tuning is also a rare case for a filesystem.

What do you think about reverting the whole patchset and reworking the sysfs
interface?

Thanks,
Qu
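Qu's "attach or start transaction" preference is the same shape as the context lines of the patch: attach to a running transaction if there is one, and start a new one only when the attach reports that none exists. The sketch below models that control flow for a sysfs store; `att_*` are hypothetical stand-ins, and the use of -ENOENT for "no running transaction" follows the `PTR_ERR(trans)` handling visible around the patch, which we assume rather than reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct att_fs {
	bool running_trans;
	int transactions_started;
	int commits;
};

/* Join the running transaction if there is one; -ENOENT otherwise. */
static int att_attach(struct att_fs *fs)
{
	return fs->running_trans ? 0 : -ENOENT;
}

static int att_start(struct att_fs *fs)
{
	fs->running_trans = true;
	fs->transactions_started++;
	return 0;
}

static int att_commit(struct att_fs *fs)
{
	fs->running_trans = false;
	fs->commits++;
	return 0;
}

/* Hypothetical sysfs label store: attach, or start a transaction if none
 * is running, then commit so the change reaches disk immediately. */
static int att_store_label(struct att_fs *fs)
{
	int ret = att_attach(fs);

	if (ret == -ENOENT)
		ret = att_start(fs);
	if (ret)
		return ret;
	/* ...update the in-memory superblock label here... */
	return att_commit(fs);
}
```

The design point Qu raises is that the store itself pays for the commit, so sync_fs() no longer needs to start transactions on behalf of pending changes.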
On Wed, 21 Jan 2015 15:47:54 +0800, Qu Wenruo wrote:
>> On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>>>> [snipped]
>> In short, changing the sysfs feature and label change implementation and
>> removing the unnecessary btrfs_start_transaction in sync_fs can fix the
>> deadlock.
> Your method will only fix the deadlock, but will introduce the risk like pending
> inode_cache will never
> be written to disk as I already explained. (If still using the
> fs_info->pending_changes mechanism)
> To ensure pending changes written to disk sync_fs() should start a transaction
> if needed,
> or there will be chance that no transaction can handle it.

Writing out the inode cache certainly needs a transaction. But INODE_CACHE
is not a flag that takes effect immediately: even after you set it, it is
very likely that the inode cache files are not yet created and the data is
not written out, because the fs might still be reading the inode usage
information, and this operation might span several transactions. So I think
what you are worried about is not a problem.

Thanks
Miao
On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote:
> > Pending changes are *not* only mount options. Feature change and label change
> > are also pending changes if using sysfs.
>
> My miss, I don't notice feature and label change by sysfs.
>
> But the implementation of feature and label change by sysfs is wrong, we can
> not change them without write permission.

Label change does not happen if the fs is read-only. If the filesystem is
RW and the label is changed through sysfs, then a remount to RO will sync
the filesystem and the new label will be saved.

The sysfs features write handler is missing that protection, though; I'll
send a patch.

> > For freeze, it's not the same problem since the fs will be unfreeze sooner or
> > later and transaction will be initiated.
>
> You can not assume the operations of the users, they might freeze the fs and
> then shutdown the machine.

The semantics of freezing should make the on-device image consistent, but
still keep some changes in memory.

> >>> For example, if we change the features/label through sysfs, and then umount
> >>> the fs,
> >> It is different from pending change.
> > No, now features/label changing using sysfs both use pending changes to do the
> > commit.
> > See BTRFS_PENDING_COMMIT bit.
> > So freeze -> change features/label -> sync will still cause the deadlock in the
> > same way,
> > and you can try it yourself.
>
> As I said above, the implementation of sysfs feature and label change is wrong,
> it is better to separate them from the pending mount option change, make the
> sysfs feature and label change be done in the context of transaction after
> getting the write permission. If so, we needn't do anything special when sync
> the fs.

That would mean dropping write support for the sysfs files that change
global filesystem state (label and features right now). This would leave
only the ioctl way to do that. I'd like to keep the sysfs write support,
though, for ease of use from scripts and from languages that are not
ioctl-friendly.
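David's point is that the label store refuses to act on a read-only filesystem, while the features store lacks the same guard. A minimal sketch of such a guard follows; `ro_label_store()` and `struct ro_fs` are hypothetical, the buffer size is illustrative, and the real btrfs handler is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define RO_LABEL_SIZE 256   /* illustrative size for this sketch */

struct ro_fs {
	bool read_only;
	char label[RO_LABEL_SIZE];
	bool commit_pending;
};

/* Hypothetical store handler: refuse on a read-only fs, otherwise record
 * the new label and mark that a commit is needed. */
static int ro_label_store(struct ro_fs *fs, const char *buf)
{
	if (fs->read_only)
		return -EROFS;
	if (strlen(buf) >= sizeof(fs->label))
		return -EINVAL;
	strcpy(fs->label, buf);
	fs->commit_pending = true;
	return 0;
}
```

A one-time check like this only covers the state at the moment of the check; the follow-up reply below argues that a concurrent remount can still slip in after it.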
On Wed, Jan 21, 2015 at 03:47:54PM +0800, Qu Wenruo wrote:
> To David:
> I'm a little curious about why inode_cache needs to be delayed to next
> transaction.
> In btrfs_remount() we have s_umount mutex, and we synced the whole
> filesystem already,
> so there should be no running transaction and we can just set any mount
> option into fs_info.

See our discussion under the noinode_cache option:

http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg30075.html
http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg30414.html

> What do you think about reverting the whole patchset and rework the
> sysfs interface?

IMO reverting should be the last option: we have a minimal fix for the sync
deadlock, and you've proposed per-transaction mount options to replace the
pending inode_change.
On Fri, Jan 23, 2015 at 12:39 PM, David Sterba <dsterba@suse.cz> wrote:
> On Wed, Jan 21, 2015 at 03:47:54PM +0800, Qu Wenruo wrote:
> [snipped]
>> What do you think about reverting the whole patchset and rework the
>> sysfs interface?
>
> IMO reverting should be the last option, we have a minimal fix to the
> sync deadlock and you've proposed the per-transaction mount options to
> replace the pending inode_change.

I agree, I'd rather build on top of what we have than use reverts at this
point.

-chris
On Fri, 23 Jan 2015 17:59:49 +0100, David Sterba wrote:
> On Wed, Jan 21, 2015 at 03:04:02PM +0800, Miao Xie wrote:
>>> Pending changes are *not* only mount options. Feature change and label change
>>> are also pending changes if using sysfs.
>>
>> My miss, I don't notice feature and label change by sysfs.
>>
>> But the implementation of feature and label change by sysfs is wrong, we can
>> not change them without write permission.
>
> Label change does not happen if the fs is readonly. If the filesystem is
> RW and label is changed through sysfs, then remount to RO will sync the
> filesystem and the new label will be saved.
>
> The sysfs features write handler is missing that protection though, I'll
> send a patch.

First, the R/O protection is too weak: there is a race between the R/O
remount and the label/feature change. Consider the following case:

    Remount R/O task            Label/Attr Change Task
                                Check R/O
    remount ro
    R/O
                                change Label/feature

Second, it does not handle the freeze event.

> [snipped]
> That would mean to drop the write support of sysfs files that change
> global filesystem state (label and features right now). This would leave
> only the ioctl way to do that. I'd like to keep the sysfs write support
> though for ease of use from scripts and languages not ioctl-friendly.

I don't mean dropping the sysfs write support; just fix the bug so that the
label and features are changed in a writable context.

Thanks
Miao
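Miao's timeline is a check-then-act race: the R/O check and the label write are separate steps, and a remount can land between them. The deterministic replay below runs that interleaving with and without a write reference held across both steps. It is a sketch under an assumption: that while such a reference is held (the kind of protection David's earlier mnt_want_write_file()/sb_start_write suggestion would give), the remount to R/O cannot complete. All `race_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct race_fs {
	bool read_only;
	bool write_ref_held;   /* assumed to keep a remount to R/O from completing */
	bool wrote_while_ro;
};

/* Remount task: cannot complete while a write reference is held. */
static bool race_try_remount_ro(struct race_fs *fs)
{
	if (fs->write_ref_held)
		return false;
	fs->read_only = true;
	return true;
}

/* Label task's "change Label/feature" step: flag a write to an R/O fs. */
static void race_write_label(struct race_fs *fs)
{
	if (fs->read_only)
		fs->wrote_while_ro = true;
}

/* Replays Miao's interleaving: check, remount in the window, then change.
 * Returns true if the label was written after the fs became R/O. */
static bool race_run(bool hold_write_ref)
{
	struct race_fs fs = { false, false, false };
	bool remounted;

	fs.write_ref_held = hold_write_ref;
	if (fs.read_only)                       /* "Check R/O" */
		return false;
	remounted = race_try_remount_ro(&fs);   /* remount runs in the window */
	race_write_label(&fs);                  /* "change Label/feature" */
	if (hold_write_ref) {
		fs.write_ref_held = false;          /* release after the change */
		if (!remounted)
			remounted = race_try_remount_ro(&fs);
	}
	assert(remounted);
	return fs.wrote_while_ro;
}
```

Holding the reference across check and change is one way to close the window, in line with Miao's "writable context" request; the thread itself does not settle on a specific mechanism.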
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60f7cbe..1d9f1e6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 			 */
 			if (fs_info->pending_changes == 0)
 				return 0;
+			/*
+			 * Test if the fs is frozen, or start_transaction
+			 * will deadlock on itself.
+			 */
+			if (__sb_start_write(sb, SB_FREEZE_FS, false))
+				__sb_end_write(sb, SB_FREEZE_FS);
+			else
+				return 0;
 			trans = btrfs_start_transaction(root, 0);
 		} else {
 			return PTR_ERR(trans);
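The hunk uses a non-blocking start-write at SB_FREEZE_FS purely as a "frozen?" probe, releasing it immediately. The userspace model below mirrors that control flow, assuming that a non-waiting __sb_start_write() fails exactly when the current freeze level is at or beyond the requested level; the `frz_*` enum and struct are hypothetical stand-ins for the VFS freeze levels, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze levels in increasing order, modeled on the VFS ordering. */
enum frz_level { FRZ_UNFROZEN, FRZ_WRITE, FRZ_PAGEFAULT, FRZ_FS, FRZ_COMPLETE };

struct frz_sb {
	enum frz_level frozen;
	int writers[FRZ_COMPLETE];    /* per-level writer counts */
	unsigned long pending_changes;
	int commits;
};

/* Non-blocking start-write: fail if frozen at 'level' or beyond. */
static bool frz_try_start_write(struct frz_sb *sb, enum frz_level level)
{
	if (sb->frozen >= level)
		return false;
	sb->writers[level]++;
	return true;
}

static void frz_end_write(struct frz_sb *sb, enum frz_level level)
{
	sb->writers[level]--;
}

/* The no-running-transaction branch of sync_fs, with the patch's probe. */
static int frz_sync_fs(struct frz_sb *sb)
{
	if (sb->pending_changes == 0)
		return 0;
	if (frz_try_start_write(sb, FRZ_FS))
		frz_end_write(sb, FRZ_FS);   /* probe only: release at once */
	else
		return 0;                    /* frozen: leave the changes pending */
	sb->commits++;                   /* stands in for start + commit */
	sb->pending_changes = 0;
	return 0;
}
```

As in the patch, a frozen filesystem returns 0 with the changes still pending, which is the behavior David objects to in his reply ("returning 0 is not right").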
Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
changes) calls btrfs_start_transaction() in sync_fs() to handle
operations that need to be done in the next transaction.

However, this can cause a deadlock if the filesystem is frozen, as shown
by the following SysRq+w output:

[  143.255932] Call Trace:
[  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
[  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0 [btrfs]
[  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
[  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
[  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
[  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
[  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
[  143.256111] Call Trace:
[  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
[  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
[  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
[  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
[  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
[  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
[  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17

The reason is as follows:

(Holding s_umount)
VFS sync_fs stuff:
|- btrfs_sync_fs()
   |- btrfs_start_transaction()
      |- sb_start_intwrite()
         (Waiting thaw_fs to unfreeze)
                                        VFS thaw_fs stuff:
                                        thaw_fs()
                                        (Waiting sync_fs to release
                                         s_umount)

So a deadlock happens.
This can be easily triggered by fstest/generic/068 with the inode_cache
mount option.

The fix is to check whether the fs is frozen; if it is, just return and
wait for the next transaction.
Cc: David Sterba <dsterba@suse.cz>
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)