From patchwork Tue Jun 6 23:45:31 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 9770205
From: Omar Sandoval
To: linux-btrfs@vger.kernel.org
Cc: Josef Bacik, Liu Bo, kernel-team@fb.com
Subject: [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting
Date: Tue, 6 Jun 2017 16:45:31 -0700
X-Mailer: git-send-email 2.13.0
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Omar Sandoval

The total_bytes_pinned counter is completely broken when accounting
delayed refs:

- If two drops for the same extent are merged, we will decrement
  total_bytes_pinned twice but only increment it once.
- If an add is merged into a drop or vice versa, we will decrement the
  total_bytes_pinned counter but never increment it.
- If multiple references to an extent are dropped, we will account it
  multiple times, potentially vastly over-estimating the number of bytes
  that will be freed by a commit and doing unnecessary work when we're
  close to ENOSPC.

The last issue is relatively minor, but the first two make the
total_bytes_pinned counter leak or underflow very often. These
accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
to keep track of possibly pinned bytes"), but they were papered over by
zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
race of using total_bytes_pinned").

We need to make sure that an extent is accounted as pinned exactly once
if and only if we will drop references to it when the transaction is
committed. Ideally we would only add to total_bytes_pinned when the
*last* reference is dropped, but this information isn't readily
available for data extents. Again, this over-estimation can lead to
extra commits when we're close to ENOSPC, but it's not as bad as before.

The fix implemented here is to increment total_bytes_pinned when the
total refmod count for an extent goes negative and decrement it if the
refmod count goes back to non-negative or after we've run all of the
delayed refs for that extent.

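The accounting rule described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not the
kernel code: struct model_head, model_queue_ref(), model_run_head() and
the plain global total_bytes_pinned are hypothetical stand-ins for the
delayed ref head, the btrfs_add_delayed_*_ref() callers and the per-cpu
counter in the space_info. Merging is modeled simply as adding +1/-1 to
one per-extent refmod total.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a delayed ref head: one per extent. */
struct model_head {
	uint64_t num_bytes;	/* size of the extent */
	int total_ref_mod;	/* net refcount change queued so far */
};

/* Hypothetical stand-in for space_info->total_bytes_pinned. */
static int64_t total_bytes_pinned;

/*
 * Queue one ref change (+1 for an add, -1 for a drop). The counter is
 * only touched when the net refmod crosses between non-negative and
 * negative, so merged drops or adds cannot account the extent twice.
 */
static void model_queue_ref(struct model_head *head, int delta)
{
	int old_ref_mod = head->total_ref_mod;
	int new_ref_mod = old_ref_mod + delta;

	head->total_ref_mod = new_ref_mod;

	if (old_ref_mod >= 0 && new_ref_mod < 0)
		total_bytes_pinned += (int64_t)head->num_bytes;
	else if (old_ref_mod < 0 && new_ref_mod >= 0)
		total_bytes_pinned -= (int64_t)head->num_bytes;
}

/*
 * Run the head: the estimate for this extent is withdrawn if it is
 * still outstanding. In this model the bytes that really get pinned
 * would be accounted separately, at the time the extent is pinned.
 */
static void model_run_head(struct model_head *head)
{
	if (head->total_ref_mod < 0)
		total_bytes_pinned -= (int64_t)head->num_bytes;
	head->total_ref_mod = 0;
}
```

With this model, two drops merged into one head bump the counter once,
an add that cancels the last outstanding drop takes the estimate back
out, and running the head leaves the counter where it started, so it
neither leaks nor underflows in the cases listed above.
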
Signed-off-by: Omar Sandoval
Reviewed-by: Liu Bo
---
 fs/btrfs/extent-tree.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6dce7abafe84..75ad24f8d253 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2112,6 +2112,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner, u64 offset)
 {
+	int old_ref_mod, new_ref_mod;
 	int ret;
 
 	BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
@@ -2122,14 +2123,18 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 						 num_bytes, parent,
 						 root_objectid, (int)owner,
 						 BTRFS_ADD_DELAYED_REF, NULL,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 	} else {
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
 						 num_bytes, parent,
 						 root_objectid, owner, offset,
-						 0, BTRFS_ADD_DELAYED_REF, NULL,
-						 NULL);
+						 0, BTRFS_ADD_DELAYED_REF,
+						 &old_ref_mod, &new_ref_mod);
 	}
+
+	if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0)
+		add_pinned_bytes(fs_info, -num_bytes, owner, root_objectid);
+
 	return ret;
 }
 
@@ -2433,6 +2438,16 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 		head = btrfs_delayed_node_to_head(node);
 		trace_run_delayed_ref_head(fs_info, node, head, node->action);
 
+		if (head->total_ref_mod < 0) {
+			struct btrfs_block_group_cache *cache;
+
+			cache = btrfs_lookup_block_group(fs_info, node->bytenr);
+			ASSERT(cache);
+			percpu_counter_add(&cache->space_info->total_bytes_pinned,
+					   -node->num_bytes);
+			btrfs_put_block_group(cache);
+		}
+
 		if (insert_reserved) {
 			btrfs_pin_extent(fs_info, node->bytenr,
 					 node->num_bytes, 1);
@@ -6269,6 +6284,8 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 			trace_btrfs_space_reservation(info, "pinned",
 						      cache->space_info->flags,
 						      num_bytes, 1);
+			percpu_counter_add(&cache->space_info->total_bytes_pinned,
+					   num_bytes);
 			set_extent_dirty(info->pinned_extents,
 					 bytenr, bytenr + num_bytes - 1,
 					 GFP_NOFS | __GFP_NOFAIL);
@@ -7038,8 +7055,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				goto out;
 			}
 		}
-		add_pinned_bytes(info, -num_bytes, owner_objectid,
-				 root_objectid);
 	} else {
 		if (found_extent) {
 			BUG_ON(is_data && refs_to_drop !=
@@ -7171,13 +7186,16 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	int ret;
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
+		int old_ref_mod, new_ref_mod;
+
 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, buf->start,
 						 buf->len, parent,
 						 root->root_key.objectid,
 						 btrfs_header_level(buf),
 						 BTRFS_DROP_DELAYED_REF, NULL,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 		BUG_ON(ret); /* -ENOMEM */
+		pin = old_ref_mod >= 0 && new_ref_mod < 0;
 	}
 
 	if (last_ref && btrfs_header_generation(buf) == trans->transid) {
@@ -7226,12 +7244,12 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		      u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
 		      u64 owner, u64 offset)
 {
+	int old_ref_mod, new_ref_mod;
 	int ret;
 
 	if (btrfs_is_testing(fs_info))
 		return 0;
 
-	add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
 
 	/*
 	 * tree log blocks never actually go into the extent allocation
@@ -7241,20 +7259,25 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID);
 		/* unlocks the pinned mutex */
 		btrfs_pin_extent(fs_info, bytenr, num_bytes, 1);
+		old_ref_mod = new_ref_mod = 0;
 		ret = 0;
 	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
 						 num_bytes, parent,
 						 root_objectid, (int)owner,
 						 BTRFS_DROP_DELAYED_REF, NULL,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 	} else {
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
 						 num_bytes, parent,
 						 root_objectid, owner, offset,
 						 0, BTRFS_DROP_DELAYED_REF,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 	}
+
+	if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0)
+		add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
+
 	return ret;
 }