Message ID | 1458839211-11600-1-git-send-email-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 24, 2016 at 5:06 PM, <fdmanana@kernel.org> wrote: > From: Filipe Manana <fdmanana@suse.com> > > If we delete a snapshot, delete its parent directory, create a new > directory with the same name as that parent and then fsync either that > new directory or some file inside it, we end up with a log tree that > is not possible to replay because the log replay procedure interprets > the snapshot's directory item as a regular entry and not as a root > item, resulting in the following failure and trace when mounting the > filesystem: > > [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257 > [52174.512570] ------------[ cut here ]------------ > [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]() > [52174.514681] BTRFS: Transaction aborted (error -2) > [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc > [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G W 4.5.0-rc6-btrfs-next-27+ #1 > [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 > [52174.524053] 0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758 > [52174.524053] 0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd > [52174.524053] 00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88 > [52174.524053] Call Trace: > [52174.524053] [<ffffffff81264e93>] dump_stack+0x67/0x90 > [52174.524053] [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2 > [52174.524053] [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs] > [52174.524053] [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50 > [52174.524053] [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs] > [52174.524053] [<ffffffff8118f5e9>] ? iput+0xb0/0x284 > [52174.524053] [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs] > [52174.524053] [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs] > [52174.524053] [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs] > [52174.524053] [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs] > [52174.524053] [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs] > [52174.524053] [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs] > [52174.524053] [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs] > [52174.524053] [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs] > [52174.524053] [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs] > [52174.524053] [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf > [52174.524053] [<ffffffff8117bafa>] mount_fs+0x67/0x131 > [52174.524053] [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde > [52174.524053] [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs] > [52174.524053] [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf > [52174.524053] [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3 > [52174.524053] [<ffffffff8117bafa>] mount_fs+0x67/0x131 > [52174.524053] [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde > [52174.524053] [<ffffffff8119590f>] do_mount+0x8a6/0x9e8 > [52174.524053] [<ffffffff811358dd>] ? strndup_user+0x3f/0x59 > [52174.524053] [<ffffffff81195c65>] SyS_mount+0x77/0x9f > [52174.524053] [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b > [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]--- > > So when we delete a directory we need to propagate its last_unlink_trans > value (updated on snapshot deletion) to its parent and then check at > fsync time for it and fallback for a transaction commit. > > A test case for fstests follows. > > seq=`basename $0` > seqres=$RESULT_DIR/$seq > echo "QA output created by $seq" > tmp=/tmp/$$ > status=1 # failure is the default! > trap "_cleanup; exit \$status" 0 1 2 3 15 > > _cleanup() > { > _cleanup_flakey > cd / > rm -f $tmp.* > } > > # get standard environment, filters and checks > . ./common/rc > . ./common/filter > . ./common/dmflakey > > # real QA test starts here > _supported_fs btrfs > _supported_os Linux > _require_scratch > _require_dm_target flakey > _require_metadata_journaling $SCRATCH_DEV > > rm -f $seqres.full > > _populate_fs() > { > _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \ > $SCRATCH_MNT/testdir/snap > _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap > rmdir $SCRATCH_MNT/testdir > mkdir $SCRATCH_MNT/testdir > } > > _scratch_mkfs >>$seqres.full 2>&1 > _init_flakey > _mount_flakey > > mkdir $SCRATCH_MNT/testdir > _populate_fs > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir > _flakey_drop_and_remount > > echo "Filesystem contents after the first log replay:" > ls -R $SCRATCH_MNT | _filter_scratch > > # Now do the same as before but instead of doing an fsync against the directory, > # do an fsync against a file inside the directory. > > _populate_fs > touch $SCRATCH_MNT/testdir/foobar > $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foobar > _flakey_drop_and_remount > > echo "Filesystem contents after the second log replay:" > ls -R $SCRATCH_MNT | _filter_scratch > > _unmount_flakey > status=0 > exit > > Signed-off-by: Filipe Manana <fdmanana@suse.com> This patch is no longer needed. It is replaced by the following new patch: https://patchwork.kernel.org/patch/8694271/ Which besides fixing this problem, fixes some other much more serious. Thanks. > --- > fs/btrfs/inode.c | 22 +++++++++++++++++++++- > fs/btrfs/tree-log.c | 6 +++++- > 2 files changed, 26 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 41a5688..b5c23b5 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4173,6 +4173,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) > int err = 0; > struct btrfs_root *root = BTRFS_I(dir)->root; > struct btrfs_trans_handle *trans; > + u64 last_unlink_trans; > > if (inode->i_size > BTRFS_EMPTY_DIR_SIZE) > return -ENOTEMPTY; > @@ -4195,11 +4196,30 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) > if (err) > goto out; > > + last_unlink_trans = BTRFS_I(inode)->last_unlink_trans; > + > /* now the directory is empty */ > err = btrfs_unlink_inode(trans, root, dir, d_inode(dentry), > dentry->d_name.name, dentry->d_name.len); > - if (!err) > + if (!err) { > btrfs_i_size_write(inode, 0); > + /* > + * Propagate the last_unlink_trans value of the deleted dir to > + * its parent directory. This is to prevent an unrecoverable > + * log tree in the case we do something like this: > + * 1) create dir foo > + * 2) create snapshot under dir foo > + * 3) delete the snapshot > + * 4) rmdir foo > + * 5) mkdir foo > + * 6) fsync foo or some file inside foo > + */ > + if (last_unlink_trans >= trans->transid) { > + mutex_lock(&BTRFS_I(dir)->log_mutex); > + BTRFS_I(dir)->last_unlink_trans = last_unlink_trans; > + mutex_unlock(&BTRFS_I(dir)->log_mutex); > + } > + } > out: > btrfs_end_transaction(trans, root); > btrfs_btree_balance_dirty(root); > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 24d03c7..d3f38dd 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -4875,8 +4875,12 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, > if (!parent || d_really_is_negative(parent) || sb != d_inode(parent)->i_sb) > break; > > - if (IS_ROOT(parent)) > + if (IS_ROOT(parent)) { > + inode = d_inode(parent); > + if (btrfs_must_commit_transaction(trans, inode)) > + ret = 1; > break; > + } > > parent = dget_parent(parent); > dput(old_parent); > -- > 2.7.0.rc3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 41a5688..b5c23b5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4173,6 +4173,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) int err = 0; struct btrfs_root *root = BTRFS_I(dir)->root; struct btrfs_trans_handle *trans; + u64 last_unlink_trans; if (inode->i_size > BTRFS_EMPTY_DIR_SIZE) return -ENOTEMPTY; @@ -4195,11 +4196,30 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) if (err) goto out; + last_unlink_trans = BTRFS_I(inode)->last_unlink_trans; + /* now the directory is empty */ err = btrfs_unlink_inode(trans, root, dir, d_inode(dentry), dentry->d_name.name, dentry->d_name.len); - if (!err) + if (!err) { btrfs_i_size_write(inode, 0); + /* + * Propagate the last_unlink_trans value of the deleted dir to + * its parent directory. This is to prevent an unrecoverable + * log tree in the case we do something like this: + * 1) create dir foo + * 2) create snapshot under dir foo + * 3) delete the snapshot + * 4) rmdir foo + * 5) mkdir foo + * 6) fsync foo or some file inside foo + */ + if (last_unlink_trans >= trans->transid) { + mutex_lock(&BTRFS_I(dir)->log_mutex); + BTRFS_I(dir)->last_unlink_trans = last_unlink_trans; + mutex_unlock(&BTRFS_I(dir)->log_mutex); + } + } out: btrfs_end_transaction(trans, root); btrfs_btree_balance_dirty(root); diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 24d03c7..d3f38dd 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4875,8 +4875,12 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, if (!parent || d_really_is_negative(parent) || sb != d_inode(parent)->i_sb) break; - if (IS_ROOT(parent)) + if (IS_ROOT(parent)) { + inode = d_inode(parent); + if (btrfs_must_commit_transaction(trans, inode)) + ret = 1; break; + } parent = dget_parent(parent); dput(old_parent);