From patchwork Thu Mar 24 17:06:51 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 8663061 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id AD3049F36E for ; Thu, 24 Mar 2016 17:07:05 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9D32A20376 for ; Thu, 24 Mar 2016 17:07:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 662D320351 for ; Thu, 24 Mar 2016 17:07:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750888AbcCXRHA (ORCPT ); Thu, 24 Mar 2016 13:07:00 -0400 Received: from mail.kernel.org ([198.145.29.136]:58528 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750717AbcCXRG6 (ORCPT ); Thu, 24 Mar 2016 13:06:58 -0400 Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id DBC5D20376 for ; Thu, 24 Mar 2016 17:06:56 +0000 (UTC) Received: from debian3.lan (unknown [176.78.112.39]) (using TLSv1.2 with cipher AES128-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 3FEC120351 for ; Thu, 24 Mar 2016 17:06:55 +0000 (UTC) From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH] Btrfs: fix unreplayable log after snapshot deletion and parent re-creation Date: Thu, 24 Mar 2016 17:06:51 +0000 Message-Id: <1458839211-11600-1-git-send-email-fdmanana@kernel.org> X-Mailer: git-send-email 2.7.0.rc3 X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Filipe Manana If we delete a snapshot, delete its parent directory, create a new directory with the same name as that parent and then fsync either that new directory or some file inside it, we end up with a log tree that is not possible to replay because the log replay procedure interprets the snapshot's directory item as a regular entry and not as a root item, resulting in the following failure and trace when mounting the filesystem: [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257 [52174.512570] ------------[ cut here ]------------ [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]() [52174.514681] BTRFS: Transaction aborted (error -2) [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G W 4.5.0-rc6-btrfs-next-27+ #1 [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [52174.524053] 0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758 [52174.524053] 0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd [52174.524053] 00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88 [52174.524053] Call Trace: [52174.524053] [] dump_stack+0x67/0x90 [52174.524053] [] warn_slowpath_common+0x99/0xb2 [52174.524053] [] ? __btrfs_unlink_inode+0x178/0x351 [btrfs] [52174.524053] [] warn_slowpath_fmt+0x48/0x50 [52174.524053] [] __btrfs_unlink_inode+0x178/0x351 [btrfs] [52174.524053] [] ? iput+0xb0/0x284 [52174.524053] [] btrfs_unlink_inode+0x1c/0x3d [btrfs] [52174.524053] [] check_item_in_log+0x1fe/0x29b [btrfs] [52174.524053] [] replay_dir_deletes+0x167/0x1cf [btrfs] [52174.524053] [] fixup_inode_link_count+0x289/0x2aa [btrfs] [52174.524053] [] fixup_inode_link_counts+0xcb/0x105 [btrfs] [52174.524053] [] btrfs_recover_log_trees+0x258/0x32c [btrfs] [52174.524053] [] ? replay_one_extent+0x511/0x511 [btrfs] [52174.524053] [] open_ctree+0x1dd4/0x21b9 [btrfs] [52174.524053] [] btrfs_mount+0x97e/0xaed [btrfs] [52174.524053] [] ? trace_hardirqs_on+0xd/0xf [52174.524053] [] mount_fs+0x67/0x131 [52174.524053] [] vfs_kern_mount+0x6c/0xde [52174.524053] [] btrfs_mount+0x1ac/0xaed [btrfs] [52174.524053] [] ? trace_hardirqs_on+0xd/0xf [52174.524053] [] ? lockdep_init_map+0xb9/0x1b3 [52174.524053] [] mount_fs+0x67/0x131 [52174.524053] [] vfs_kern_mount+0x6c/0xde [52174.524053] [] do_mount+0x8a6/0x9e8 [52174.524053] [] ? strndup_user+0x3f/0x59 [52174.524053] [] SyS_mount+0x77/0x9f [52174.524053] [] entry_SYSCALL_64_fastpath+0x12/0x6b [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]--- So when we delete a directory we need to propagate its last_unlink_trans value (updated on snapshot deletion) to its parent and then check at fsync time for it and fallback for a transaction commit. A test case for fstests follows. seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _require_dm_target flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _populate_fs() { _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \ $SCRATCH_MNT/testdir/snap _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap rmdir $SCRATCH_MNT/testdir mkdir $SCRATCH_MNT/testdir } _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey mkdir $SCRATCH_MNT/testdir _populate_fs $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir _flakey_drop_and_remount echo "Filesystem contents after the first log replay:" ls -R $SCRATCH_MNT | _filter_scratch # Now do the same as before but instead of doing an fsync against the directory, # do an fsync against a file inside the directory. _populate_fs touch $SCRATCH_MNT/testdir/foobar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foobar _flakey_drop_and_remount echo "Filesystem contents after the second log replay:" ls -R $SCRATCH_MNT | _filter_scratch _unmount_flakey status=0 exit Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 22 +++++++++++++++++++++- fs/btrfs/tree-log.c | 6 +++++- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 41a5688..b5c23b5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4173,6 +4173,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) int err = 0; struct btrfs_root *root = BTRFS_I(dir)->root; struct btrfs_trans_handle *trans; + u64 last_unlink_trans; if (inode->i_size > BTRFS_EMPTY_DIR_SIZE) return -ENOTEMPTY; @@ -4195,11 +4196,30 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) if (err) goto out; + last_unlink_trans = BTRFS_I(inode)->last_unlink_trans; + /* now the directory is empty */ err = btrfs_unlink_inode(trans, root, dir, d_inode(dentry), dentry->d_name.name, dentry->d_name.len); - if (!err) + if (!err) { btrfs_i_size_write(inode, 0); + /* + * Propagate the last_unlink_trans value of the deleted dir to + * its parent directory. This is to prevent an unrecoverable + * log tree in the case we do something like this: + * 1) create dir foo + * 2) create snapshot under dir foo + * 3) delete the snapshot + * 4) rmdir foo + * 5) mkdir foo + * 6) fsync foo or some file inside foo + */ + if (last_unlink_trans >= trans->transid) { + mutex_lock(&BTRFS_I(dir)->log_mutex); + BTRFS_I(dir)->last_unlink_trans = last_unlink_trans; + mutex_unlock(&BTRFS_I(dir)->log_mutex); + } + } out: btrfs_end_transaction(trans, root); btrfs_btree_balance_dirty(root); diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 24d03c7..d3f38dd 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4875,8 +4875,12 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans, if (!parent || d_really_is_negative(parent) || sb != d_inode(parent)->i_sb) break; - if (IS_ROOT(parent)) + if (IS_ROOT(parent)) { + inode = d_inode(parent); + if (btrfs_must_commit_transaction(trans, inode)) + ret = 1; break; + } parent = dget_parent(parent); dput(old_parent);