From patchwork Mon Apr 13 19:22:48 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Mason X-Patchwork-Id: 6211061 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 290F6BF4A7 for ; Mon, 13 Apr 2015 19:23:16 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D26C5202E6 for ; Mon, 13 Apr 2015 19:23:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6591420340 for ; Mon, 13 Apr 2015 19:23:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932400AbbDMTXG (ORCPT ); Mon, 13 Apr 2015 15:23:06 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:8478 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754547AbbDMTXB (ORCPT ); Mon, 13 Apr 2015 15:23:01 -0400 Received: from pps.filterd (m0004060 [127.0.0.1]) by mx0b-00082601.pphosted.com (8.14.5/8.14.5) with SMTP id t3DJKxrE010937 for ; Mon, 13 Apr 2015 12:23:00 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fb.com; h=from : to : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=dlWAKZUM5GbG/r0o730Syv5zxsgviEOyjwXWGUWaG3Y=; b=UltAJMxK9PCZroO2LnTlSNxYKPNvPphSfe84ldDX0cGd788YSmWwex9+LJ1VZdyEyehB RLPRUL/zQj8AZHcZ5m2XRauJc+Mftn1zFPjyFBJ2QjjvE5Vi4rEuvoy7zI7FKsiD0Enp 5lHQtgf8Jpggu4a/43sBCck8H5e5mcren88= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0b-00082601.pphosted.com with ESMTP id 1trj5hrfn7-1 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT) for ; Mon, 13 Apr 2015 12:23:00 -0700 Received: from mx-out.facebook.com (192.168.52.13) by PRN-CHUB07.TheFacebook.com (192.168.16.17) with Microsoft SMTP Server (TLS) id 14.3.195.1; Mon, 13 Apr 2015 12:22:59 -0700 Received: from facebook.com (2401:db00:20:702c:face:0:2f:0) by mx-out.facebook.com (10.212.232.59) with ESMTP id 7de533cae21211e499fe0002c991e86a-37bac2c0 for ; Mon, 13 Apr 2015 12:22:57 -0700 Received: by devbig264.prn2.facebook.com (Postfix, from userid 8731) id 658DC7C12CC; Mon, 13 Apr 2015 12:22:57 -0700 (PDT) From: Chris Mason To: Subject: [PATCH 1/5] Btrfs: account for crcs in delayed ref processing Date: Mon, 13 Apr 2015 12:22:48 -0700 Message-ID: <1428952972-1266934-2-git-send-email-clm@fb.com> X-Mailer: git-send-email 1.8.1 In-Reply-To: <1428952972-1266934-1-git-send-email-clm@fb.com> References: <1428952972-1266934-1-git-send-email-clm@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.13.68, 1.0.33, 0.0.0000 definitions=2015-04-13_04:2015-04-10, 2015-04-13, 1970-01-01 signatures=0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Josef Bacik As we delete large extents, we end up doing huge amounts of COW in order to delete the corresponding crcs. This adds accounting so that we keep track of that space and flushing of delayed refs so that we don't build up too much delayed crc work. This helps limit the delayed work that must be done at commit time and tries to avoid ENOSPC aborts because the crcs eat all the global reserves. Signed-off-by: Chris Mason --- fs/btrfs/delayed-ref.c | 22 ++++++++++++++++++++-- fs/btrfs/delayed-ref.h | 10 ++++++++++ fs/btrfs/extent-tree.c | 46 +++++++++++++++++++++++++++++++--------------- fs/btrfs/inode.c | 25 ++++++++++++++++++------- fs/btrfs/transaction.c | 4 ++++ 5 files changed, 83 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 6d16bea..8f8ed7d 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -489,11 +489,13 @@ update_existing_ref(struct btrfs_trans_handle *trans, * existing and update must have the same bytenr */ static noinline void -update_existing_head_ref(struct btrfs_delayed_ref_node *existing, +update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs, + struct btrfs_delayed_ref_node *existing, struct btrfs_delayed_ref_node *update) { struct btrfs_delayed_ref_head *existing_ref; struct btrfs_delayed_ref_head *ref; + int old_ref_mod; existing_ref = btrfs_delayed_node_to_head(existing); ref = btrfs_delayed_node_to_head(update); @@ -541,7 +543,20 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing, * only need the lock for this case cause we could be processing it * currently, for refs we just added we know we're a-ok. */ + old_ref_mod = existing_ref->total_ref_mod; existing->ref_mod += update->ref_mod; + existing_ref->total_ref_mod += update->ref_mod; + + /* + * If we are going to from a positive ref mod to a negative or vice + * versa we need to make sure to adjust pending_csums accordingly. + */ + if (existing_ref->is_data) { + if (existing_ref->total_ref_mod >= 0 && old_ref_mod < 0) + delayed_refs->pending_csums -= existing->num_bytes; + if (existing_ref->total_ref_mod < 0 && old_ref_mod >= 0) + delayed_refs->pending_csums += existing->num_bytes; + } spin_unlock(&existing_ref->lock); } @@ -605,6 +620,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, head_ref->is_data = is_data; head_ref->ref_root = RB_ROOT; head_ref->processing = 0; + head_ref->total_ref_mod = count_mod; spin_lock_init(&head_ref->lock); mutex_init(&head_ref->mutex); @@ -614,7 +630,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node); if (existing) { - update_existing_head_ref(&existing->node, ref); + update_existing_head_ref(delayed_refs, &existing->node, ref); /* * we've updated the existing ref, free the newly * allocated ref @@ -622,6 +638,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref); head_ref = existing; } else { + if (is_data && count_mod < 0) + delayed_refs->pending_csums += num_bytes; delayed_refs->num_heads++; delayed_refs->num_heads_ready++; atomic_inc(&delayed_refs->num_entries); diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index a764e23..5eb0892 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -88,6 +88,14 @@ struct btrfs_delayed_ref_head { struct rb_node href_node; struct btrfs_delayed_extent_op *extent_op; + + /* + * This is used to track the final ref_mod from all the refs associated + * with this head ref, this is not adjusted as delayed refs are run, + * this is meant to track if we need to do the csum accounting or not. + */ + int total_ref_mod; + /* * when a new extent is allocated, it is just reserved in memory * The actual extent isn't inserted into the extent allocation tree @@ -138,6 +146,8 @@ struct btrfs_delayed_ref_root { /* total number of head nodes ready for processing */ unsigned long num_heads_ready; + u64 pending_csums; + /* * set when the tree is flushing before a transaction commit, * used by the throttling code to decide if new updates need diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 41e5812..a6f88eb 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2538,6 +2538,12 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, * list before we release it. */ if (btrfs_delayed_ref_is_head(ref)) { + if (locked_ref->is_data && + locked_ref->total_ref_mod < 0) { + spin_lock(&delayed_refs->lock); + delayed_refs->pending_csums -= ref->num_bytes; + spin_unlock(&delayed_refs->lock); + } btrfs_delayed_ref_unlock(locked_ref); locked_ref = NULL; } @@ -2626,11 +2632,31 @@ static inline u64 heads_to_leaves(struct btrfs_root *root, u64 heads) return div_u64(num_bytes, BTRFS_LEAF_DATA_SIZE(root)); } +/* + * Takes the number of bytes to be csumm'ed and figures out how many leaves it + * would require to store the csums for that many bytes. + */ +static u64 csum_bytes_to_leaves(struct btrfs_root *root, u64 csum_bytes) +{ + u64 csum_size; + u64 num_csums_per_leaf; + u64 num_csums; + + csum_size = BTRFS_LEAF_DATA_SIZE(root) - sizeof(struct btrfs_item); + num_csums_per_leaf = div64_u64(csum_size, + (u64)btrfs_super_csum_size(root->fs_info->super_copy)); + num_csums = div64_u64(csum_bytes, root->sectorsize); + num_csums += num_csums_per_leaf - 1; + num_csums = div64_u64(num_csums, num_csums_per_leaf); + return num_csums; +} + int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_block_rsv *global_rsv; u64 num_heads = trans->transaction->delayed_refs.num_heads_ready; + u64 csum_bytes = trans->transaction->delayed_refs.pending_csums; u64 num_bytes; int ret = 0; @@ -2639,6 +2665,7 @@ int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans, if (num_heads > 1) num_bytes += (num_heads - 1) * root->nodesize; num_bytes <<= 1; + num_bytes += csum_bytes_to_leaves(root, csum_bytes) * root->nodesize; global_rsv = &root->fs_info->global_block_rsv; /* @@ -5065,30 +5092,19 @@ static u64 calc_csum_metadata_size(struct inode *inode, u64 num_bytes, int reserve) { struct btrfs_root *root = BTRFS_I(inode)->root; - u64 csum_size; - int num_csums_per_leaf; - int num_csums; - int old_csums; + u64 old_csums, num_csums; if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM && BTRFS_I(inode)->csum_bytes == 0) return 0; - old_csums = (int)div_u64(BTRFS_I(inode)->csum_bytes, root->sectorsize); + old_csums = csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes); + if (reserve) BTRFS_I(inode)->csum_bytes += num_bytes; else BTRFS_I(inode)->csum_bytes -= num_bytes; - csum_size = BTRFS_LEAF_DATA_SIZE(root) - sizeof(struct btrfs_item); - num_csums_per_leaf = (int)div_u64(csum_size, - sizeof(struct btrfs_csum_item) + - sizeof(struct btrfs_disk_key)); - num_csums = (int)div_u64(BTRFS_I(inode)->csum_bytes, root->sectorsize); - num_csums = num_csums + num_csums_per_leaf - 1; - num_csums = num_csums / num_csums_per_leaf; - - old_csums = old_csums + num_csums_per_leaf - 1; - old_csums = old_csums / num_csums_per_leaf; + num_csums = csum_bytes_to_leaves(root, BTRFS_I(inode)->csum_bytes); /* No change, no need to reserve more */ if (old_csums == num_csums) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e3fe137f..cec23cf 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4197,9 +4197,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, int extent_type = -1; int ret; int err = 0; - int be_nice = 0; u64 ino = btrfs_ino(inode); u64 bytes_deleted = 0; + bool be_nice = 0; + bool should_throttle = 0; BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY); @@ -4405,19 +4406,20 @@ delete: btrfs_header_owner(leaf), ino, extent_offset, 0); BUG_ON(ret); - if (be_nice && pending_del_nr && - (pending_del_nr % 16 == 0) && - bytes_deleted > 1024 * 1024) { + if (btrfs_should_throttle_delayed_refs(trans, root)) btrfs_async_run_delayed_refs(root, trans->delayed_ref_updates * 2, 0); - } } if (found_type == BTRFS_INODE_ITEM_KEY) break; + should_throttle = + btrfs_should_throttle_delayed_refs(trans, root); + if (path->slots[0] == 0 || - path->slots[0] != pending_del_slot) { + path->slots[0] != pending_del_slot || + (be_nice && should_throttle)) { if (pending_del_nr) { ret = btrfs_del_items(trans, root, path, pending_del_slot, @@ -4430,6 +4432,15 @@ delete: pending_del_nr = 0; } btrfs_release_path(path); + if (be_nice && should_throttle) { + unsigned long updates = trans->delayed_ref_updates; + if (updates) { + trans->delayed_ref_updates = 0; + ret = btrfs_run_delayed_refs(trans, root, updates * 2); + if (ret && !err) + err = ret; + } + } goto search_again; } else { path->slots[0]--; @@ -4449,7 +4460,7 @@ error: btrfs_free_path(path); - if (be_nice && bytes_deleted > 32 * 1024 * 1024) { + if (be_nice && btrfs_should_throttle_delayed_refs(trans, root)) { unsigned long updates = trans->delayed_ref_updates; if (updates) { trans->delayed_ref_updates = 0; diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index ba831ee..8b9eea8 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -64,6 +64,9 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction) if (atomic_dec_and_test(&transaction->use_count)) { BUG_ON(!list_empty(&transaction->list)); WARN_ON(!RB_EMPTY_ROOT(&transaction->delayed_refs.href_root)); + if (transaction->delayed_refs.pending_csums) + printk(KERN_ERR "pending csums is %llu\n", + transaction->delayed_refs.pending_csums); while (!list_empty(&transaction->pending_chunks)) { struct extent_map *em; @@ -223,6 +226,7 @@ loop: cur_trans->delayed_refs.href_root = RB_ROOT; atomic_set(&cur_trans->delayed_refs.num_entries, 0); cur_trans->delayed_refs.num_heads_ready = 0; + cur_trans->delayed_refs.pending_csums = 0; cur_trans->delayed_refs.num_heads = 0; cur_trans->delayed_refs.flushing = 0; cur_trans->delayed_refs.run_delayed_start = 0;