Message ID | 1344920474-10014-1-git-send-email-liub.liubo@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: > From: Liu Bo <bo.li.liu@oracle.com> > > I found this while testing xfstests 068, the story is > > t1 t2 > sys_sync thaw_super > iterate_supers > down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 > sync_fs (with wait mode) > start_transaction > sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN > > In this patch, I add an helper sb_start_intwrite_trylock() and use it before we > start_transaction in sync_fs() with wait mode so that we won't hit the deadlock. > IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. According to man page, no other operation should modify the fs after the freeze. So for me the modification is inside sync_filesystem (and sync_one_sb). Marco -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/14/2012 08:59 PM, Marco Stornelli wrote: > Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: >> From: Liu Bo <bo.li.liu@oracle.com> >> >> I found this while testing xfstests 068, the story is >> >> t1 t2 >> sys_sync thaw_super >> iterate_supers >> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 >> sync_fs (with wait mode) >> start_transaction >> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN >> >> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we >> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock. >> > > IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. > According to man page, no other operation should modify the fs after the freeze. > So for me the modification is inside sync_filesystem (and sync_one_sb). Do you mean that we should add the trylock check in sync_filesystem? But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). thanks, liubo > > Marco > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 14/08/2012 15:53, Liu Bo ha scritto: > On 08/14/2012 08:59 PM, Marco Stornelli wrote: >> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: >>> From: Liu Bo <bo.li.liu@oracle.com> >>> >>> I found this while testing xfstests 068, the story is >>> >>> t1 t2 >>> sys_sync thaw_super >>> iterate_supers >>> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 >>> sync_fs (with wait mode) >>> start_transaction >>> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN >>> >>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we >>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock. >>> >> >> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. >> According to man page, no other operation should modify the fs after the freeze. >> So for me the modification is inside sync_filesystem (and sync_one_sb). > > Do you mean that we should add the trylock check in sync_filesystem? > > But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). > > thanks, > liubo > I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation. Marco -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(CCed Jan, the author of freeze code) On 08/14/2012 10:12 PM, Marco Stornelli wrote: > Il 14/08/2012 15:53, Liu Bo ha scritto: >> On 08/14/2012 08:59 PM, Marco Stornelli wrote: >>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: >>>> From: Liu Bo <bo.li.liu@oracle.com> >>>> >>>> I found this while testing xfstests 068, the story is >>>> >>>> t1 t2 >>>> sys_sync thaw_super >>>> iterate_supers >>>> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 >>>> sync_fs (with wait mode) >>>> start_transaction >>>> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN >>>> >>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we >>>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock. >>>> >>> >>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. >>> According to man page, no other operation should modify the fs after the freeze. >>> So for me the modification is inside sync_filesystem (and sync_one_sb). >> >> Do you mean that we should add the trylock check in sync_filesystem? >> >> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). >> >> thanks, >> liubo >> > > I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation. > I'm ok with it. What do you think about it, Jan? Any comments? thanks, liubo > Marco -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 15-08-12 20:51:17, Liu Bo wrote: > (CCed Jan, the author of freeze code) > On 08/14/2012 10:12 PM, Marco Stornelli wrote: > > Il 14/08/2012 15:53, Liu Bo ha scritto: > >> On 08/14/2012 08:59 PM, Marco Stornelli wrote: > >>> Il 14/08/2012 07:01, liub.liubo@gmail.com ha scritto: > >>>> From: Liu Bo <bo.li.liu@oracle.com> > >>>> > >>>> I found this while testing xfstests 068, the story is > >>>> > >>>> t1 t2 > >>>> sys_sync thaw_super > >>>> iterate_supers > >>>> down_read(sb->s_umount) down_write(sb->s_umount) --->wait for t1 > >>>> sync_fs (with wait mode) > >>>> start_transaction > >>>> sb_start_intwrite --------------------> wait for t2 to set s_writers.frozen to SB_UNFROZEN > >>>> > >>>> In this patch, I add an helper sb_start_intwrite_trylock() and use it before we > >>>> start_transaction in sync_fs() with wait mode so that we won't hit the deadlock. > >>>> > >>> > >>> IMHO, we should avoid to call the sync operation on a frozen fs. The freeze operation, indeed, already include a sync operation. > >>> According to man page, no other operation should modify the fs after the freeze. > >>> So for me the modification is inside sync_filesystem (and sync_one_sb). > >> > >> Do you mean that we should add the trylock check in sync_filesystem? > >> > >> But it seems to be useless because we already run into down_read(sb->s_umount) before starting sync_one_sb(). > >> > >> thanks, > >> liubo > >> > > > > I meant that we should check if there are in a "complete" freeze state (according to the "states" of a freeze transaction) and simply skip the sync operation. > > > > I'm ok with it. > > What do you think about it, Jan? Any comments? Hum, so what I don't exactly understand is why does btrfs start a transaction in ->sync_fs(). The idea of freeze code is that when filesystem is frozen, there is no data / metadata to write and thus we avoid deadlocks arising from trying to write anything with s_umount held. So checking whether filesystem is frozen and avoiding transaction start in that case in btrfs_sync_fs() will work but looks hacky. Rather the check should be for the thing which is the reason for transaction start-end pair in btrfs_sync_fs(). My naive guess would be we should check whether there is any transaction running... Honza
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index f2eb24c..1e04b41 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -847,6 +847,21 @@ int btrfs_sync_fs(struct super_block *sb, int wait) return 0; } + /* + * sys_sync can cause an ABBA deadlock with freeze/thaw + * o freeze_super() grabs s_umount lock and set sb to SB_FREEZE_FS. + * o thaw_super() grabs s_umount lock and set sb to SB_UNFROZEN. + * o iterate_supers() grabs s_umount lock, and sync fs, during which + * we need to do sb_start_intwrite() in starting a + * new transaction. + * so iterate_supers() will wait for thaw_super() to reset sb's frozen + * state, while thaw_super() will wait for iterate_supers() to drop the + * s_umount lock. This is an ABBA deadlock. + */ + if (!sb_start_intwrite_trylock(sb)) + return 0; + sb_end_intwrite(sb); + btrfs_wait_ordered_extents(root, 0, 0); trans = btrfs_start_transaction(root, 0); diff --git a/include/linux/fs.h b/include/linux/fs.h index aa11047..8a3efd0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1700,6 +1700,11 @@ static inline void sb_start_intwrite(struct super_block *sb) __sb_start_write(sb, SB_FREEZE_FS, true); } +static inline int sb_start_intwrite_trylock(struct super_block *sb) +{ + return __sb_start_write(sb, SB_FREEZE_FS, false); +} + extern bool inode_owner_or_capable(const struct inode *inode);