From patchwork Tue Jul 5 09:32:06 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 9213913 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 37B4C60752 for ; Tue, 5 Jul 2016 09:33:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 285B028997 for ; Tue, 5 Jul 2016 09:33:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1D1D128999; Tue, 5 Jul 2016 09:33:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A8A0E28997 for ; Tue, 5 Jul 2016 09:33:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754314AbcGEJdD (ORCPT ); Tue, 5 Jul 2016 05:33:03 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:51306 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1751070AbcGEJcY (ORCPT ); Tue, 5 Jul 2016 05:32:24 -0400 X-IronPort-AV: E=Sophos;i="5.20,367,1444665600"; d="scan'208";a="656058" Received: from unknown (HELO cn.fujitsu.com) ([10.167.250.3]) by song.cn.fujitsu.com with ESMTP; 05 Jul 2016 17:32:12 +0800 Received: from localhost.localdomain (unknown [10.167.226.34]) by cn.fujitsu.com (Postfix) with ESMTP id 320674056406 for ; Tue, 5 Jul 2016 17:32:10 +0800 (CST) From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH v2 2/3] btrfs: relocation: Fix leaking qgroups numbers on data extents Date: Tue, 5 Jul 2016 17:32:06 +0800 Message-Id: <20160705093207.14983-3-quwenruo@cn.fujitsu.com> X-Mailer: git-send-email 2.9.0 In-Reply-To: <20160705093207.14983-1-quwenruo@cn.fujitsu.com> References: <20160705093207.14983-1-quwenruo@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 320674056406.A7289 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: quwenruo@cn.fujitsu.com Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When balancing data extents, qgroup will leak all its numbers for balanced data extents. The root cause is that balancing is doing non-standard and almost insane tree block swap hack. The problem happens in the following steps: (Use 4M as original data extent size, and 257 as src root objectid) 1) Balance create new data extents and increase its refs Balance will alloc new data extent and create new EXTENT_DATA in data reloc tree, while its refernece is increased with 2 different referencer: 257 and data reloc tree. While at that time, file tree is still referring to old extents. So, the backref walker will only find data reloc tree as referencer for the new data extent, which doesn't count as file tree. Extent bytenr | Real referencer | backrefs | nr_old_roots | ----------------------------------------------------------------------- New | Data reloc | Data reloc + 257 | 0 | Old | 257 | 257 | 1 | Qgroup number: 4M + metadata 2) Commit trans before merging reloc tree Then we goes to prepare_to_merge(), which will commit transacation. In the qgroup update codes inside commit_transaction, although backref walk codes find the new data extents has 2 backref, but file tree backref can't find referencer(file tree is still referring to old extents), and data reloc doesn't count as file tree. So for nr_old_roots and nr_new_roots of that 2 extents: Extent bytenr | nr_old_roots | nr_new_roots | qgroup change | ------------------------------------------------------------------------| New | 0(doesn't exist) | 0(data reloc tree) | 0 | Old | 1(root 257) | 1(257) | 0 | Qgroup number: 4M + metadata +-0 = 4M + metadata 3) Swap tree blocks and free old tree blocks Then we goes to merge_reloc_roots(), which swaps the tree blocks directly, and free the old tree blocks. Freeing tree blocks will also free its data extents, this goes through normal routine, and qgroup handles it well, decreasing the numbers. And since new data extent is not updated here (updated in step 1), so qgroup won't scan new data extent. Extent bytenr | nr_old_roots | nr_new_roots | qgroup change | ------------------------------------------------------------------| New |---No modification, doesn't go through qgroup----|Error Old | 1(root 257) | 0(doesn't exist) | -4M | In fact, New extent's owner changed from data reloc to root 257, which must go through qgroup accounting. But the hideous hack of replacing path doesn't do it. Qgroup number: 4M + metadata -4M = metadata This patch will fix it by re-dirtying new extent at step 3), so backref walk and qgroup can get correct result. After this patch, step 3) will be: Extent bytenr | nr_old_roots | nr_new_roots | qgroup change | ----------------------------------------------------------------------| New | 0(data reloc tree) | 1(root 257) | +4M | Old | 1(root 257) | 0(doesn't exist) | -4M | And Qgroup number will remain correct: 4M + metadata +4M -4M = 4M + metadata So we only need to ensure we don't miss some extents. Reported-by: Mark Fasheh Reported-by: Filipe Manana Signed-off-by: Qu Wenruo --- v2: Use refactored interface. Update comment and commit message --- fs/btrfs/relocation.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 0477dca..0a7ddf1 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -31,6 +31,7 @@ #include "async-thread.h" #include "free-space-cache.h" #include "inode-map.h" +#include "qgroup.h" /* * backref_node, mapping_node and tree_block start with this @@ -1750,6 +1751,68 @@ int memcmp_node_keys(struct extent_buffer *eb, int slot, } /* + * Helper function to fixup screwed qgroups caused by increased extent ref, + * which doesn't follow normal extent ref update behavior. + * (Correct behavior is, increase extent ref and modify source root in + * one trans) + * No better solution as long as we're doing swapping trick to do balance. + */ +static int qgroup_redirty_data_extents(struct btrfs_trans_handle *trans, + struct btrfs_root *root, u64 bytenr, + u64 gen) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct extent_buffer *leaf; + struct btrfs_delayed_ref_root *delayed_refs; + int slot; + int ret = 0; + + if (!fs_info->quota_enabled || !is_fstree(root->objectid)) + return 0; + if (WARN_ON(!trans)) + return -EINVAL; + + delayed_refs = &trans->transaction->delayed_refs; + + leaf = read_tree_block(root, bytenr, gen); + if (IS_ERR(leaf)) { + return PTR_ERR(leaf); + } else if (!extent_buffer_uptodate(leaf)) { + ret = -EIO; + goto out; + } + + /* We only care leaf, which may contains EXTENT_DATA */ + if (btrfs_header_level(leaf) != 0) + goto out; + + for (slot = 0; slot < btrfs_header_nritems(leaf); slot++) { + struct btrfs_key key; + struct btrfs_file_extent_item *fi; + + btrfs_item_key_to_cpu(leaf, &key, slot); + if (key.type != BTRFS_EXTENT_DATA_KEY) + continue; + fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); + if (btrfs_file_extent_type(leaf, fi) == + BTRFS_FILE_EXTENT_INLINE || + btrfs_file_extent_disk_bytenr(leaf, fi) == 0) + continue; + + ret = btrfs_qgroup_record_dirty_extent(trans, fs_info, + btrfs_file_extent_disk_bytenr(leaf, fi), + btrfs_file_extent_disk_num_bytes(leaf, fi), + GFP_NOFS); + if (ret < 0) + break; + } +out: + free_extent_buffer(leaf); + return ret; + +} + +/* * try to replace tree blocks in fs tree with the new blocks * in reloc tree. tree blocks haven't been modified since the * reloc tree was create can be replaced. @@ -1919,7 +1982,25 @@ again: 0); BUG_ON(ret); + /* + * The relocated EXTENT_DATA inside new tree block is allocated + * from data reloc tree, which doesn't affect qgroup. + * + * While after path replacement, EXTENT_DATA in the tree block + * is accounted as fs/file tree, which does affect qgroup. + * + * So here we need to info qgroup to re-account all these + * possible owner-changed EXTENT_DATA, to make qgroup + * correct again. + * + * XXX: Why balance is always using such hacks? Isn't there any + * more sane method? + */ + ret = qgroup_redirty_data_extents(trans, dest, new_bytenr, + new_ptr_gen); btrfs_unlock_up_safe(path, 0); + if (ret < 0) + break; ret = level; break;