From patchwork Fri Jun 17 21:12:28 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Mason X-Patchwork-Id: 892532 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p5HLCl7r008820 for ; Fri, 17 Jun 2011 21:12:50 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757951Ab1FQVMo (ORCPT ); Fri, 17 Jun 2011 17:12:44 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:46237 "EHLO rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757273Ab1FQVMo (ORCPT ); Fri, 17 Jun 2011 17:12:44 -0400 Received: from acsinet22.oracle.com (acsinet22.oracle.com [141.146.126.238]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p5HLCaow026509 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 17 Jun 2011 21:12:37 GMT Received: from acsmt356.oracle.com (acsmt356.oracle.com [141.146.40.156]) by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id p5HLCZ9w004744 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 17 Jun 2011 21:12:35 GMT Received: from abhmt104.oracle.com (abhmt104.oracle.com [141.146.116.56]) by acsmt356.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p5HLCUeS013193; Fri, 17 Jun 2011 16:12:30 -0500 Received: from localhost (/67.247.78.12) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 17 Jun 2011 14:12:29 -0700 From: Chris Mason To: "linux-btrfs" , "Miao Xie" , "Tsutomu Itoh" Subject: please review snapshot corruption path with delayed metadata insertion Date: Fri, 17 Jun 2011 17:12:28 -0400 Message-Id: <1308345034-sup-1969@shiny> User-Agent: Sup/git X-Source-IP: acsinet22.oracle.com [141.146.126.238] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090208.4DFBC346.004A:SCFMA922111,ss=1,fgs=0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Fri, 17 Jun 2011 21:14:15 +0000 (UTC) Hi everyone, I think I tracked down the oops we were seeing Tsutomu Itoh's balance test. The delayed metadata insertion code was allowing delayed updates to queue up and be process after the snapshot was created. I've fixed this up by moving the delayed metadata run down into the snapshot creation code, please take a look. If nobody objects I'll have this in the pull I send to Linus this weekend. commit e999376f094162aa425ae749aa1df95ab928d010 Author: Chris Mason Date: Fri Jun 17 16:14:09 2011 -0400 Btrfs: avoid delayed metadata items during commits Snapshot creation has two phases. One is the initial snapshot setup, and the second is done during commit, while nobody is allowed to modify the root we are snapshotting. The delayed metadata insertion code can break that rule, it does a delayed inode update on the inode of the parent of the snapshot, and delayed directory item insertion. This makes sure to run the pending delayed operations before we record the snapshot root, which avoids corruptions. Signed-off-by: Chris Mason --- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index fc515b7..f1cbd02 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1237,6 +1237,13 @@ again: return 0; } +void btrfs_assert_delayed_root_empty(struct btrfs_root *root) +{ + struct btrfs_delayed_root *delayed_root; + delayed_root = btrfs_get_delayed_root(root); + WARN_ON(btrfs_first_delayed_node(delayed_root)); +} + void btrfs_balance_delayed_items(struct btrfs_root *root) { struct btrfs_delayed_root *delayed_root; diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h index cb79b67..d1a6a29 100644 --- a/fs/btrfs/delayed-inode.h +++ b/fs/btrfs/delayed-inode.h @@ -137,4 +137,8 @@ int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent, /* for init */ int __init btrfs_delayed_inode_init(void); void btrfs_delayed_inode_exit(void); + +/* for debugging */ +void btrfs_assert_delayed_root_empty(struct btrfs_root *root); + #endif diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index c073d85..51dcec8 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -957,6 +957,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, ret = btrfs_update_inode(trans, parent_root, parent_inode); BUG_ON(ret); + /* + * pull in the delayed directory update + * and the delayed inode item + * otherwise we corrupt the FS during + * snapshot + */ + ret = btrfs_run_delayed_items(trans, root); + BUG_ON(ret); + record_root_in_trans(trans, root); btrfs_set_root_last_snapshot(&root->root_item, trans->transid); memcpy(new_root_item, &root->root_item, sizeof(*new_root_item)); @@ -1018,14 +1027,6 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans, int ret; list_for_each_entry(pending, head, list) { - /* - * We must deal with the delayed items before creating - * snapshots, or we will create a snapthot with inconsistent - * information. - */ - ret = btrfs_run_delayed_items(trans, fs_info->fs_root); - BUG_ON(ret); - ret = create_pending_snapshot(trans, fs_info, pending); BUG_ON(ret); } @@ -1319,15 +1320,21 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, */ mutex_lock(&root->fs_info->reloc_mutex); - ret = create_pending_snapshots(trans, root->fs_info); + ret = btrfs_run_delayed_items(trans, root); BUG_ON(ret); - ret = btrfs_run_delayed_items(trans, root); + ret = create_pending_snapshots(trans, root->fs_info); BUG_ON(ret); ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1); BUG_ON(ret); + /* + * make sure none of the code above managed to slip in a + * delayed item + */ + btrfs_assert_delayed_root_empty(root); + WARN_ON(cur_trans != trans->transaction); btrfs_scrub_pause(root);