From patchwork Thu Oct 4 21:24:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans van Kranenburg X-Patchwork-Id: 10626891 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA8F317E0 for ; Thu, 4 Oct 2018 21:31:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BBC482963F for ; Thu, 4 Oct 2018 21:31:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B088629644; Thu, 4 Oct 2018 21:31:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5D9112963F for ; Thu, 4 Oct 2018 21:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727354AbeJEE0J (ORCPT ); Fri, 5 Oct 2018 00:26:09 -0400 Received: from smtp.dpl.mendix.net ([83.96.177.10]:45760 "EHLO smtp.dpl.mendix.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726826AbeJEE0J (ORCPT ); Fri, 5 Oct 2018 00:26:09 -0400 Received: from mekker.bofh.hq.mendix.net (mekker.bofh.hq.mendix.net [IPv6:2001:828:13c8:10b::21]) by smtp.dpl.mendix.net (Postfix) with ESMTP id 024C1202A9 for ; Thu, 4 Oct 2018 23:24:44 +0200 (CEST) From: Hans van Kranenburg To: linux-btrfs@vger.kernel.org Subject: [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes Date: Thu, 4 Oct 2018 23:24:38 +0200 Message-Id: <20181004212443.26519-2-hans.van.kranenburg@mendix.com> X-Mailer: git-send-email 2.19.0.329.g76f2f5c1e3 In-Reply-To: 
<20181004212443.26519-1-hans.van.kranenburg@mendix.com> References: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP num_bytes is used to store the chunk length of the chunk that we're allocating. Do not reuse it for something really different in the same function. Signed-off-by: Hans van Kranenburg Reviewed-by: Nikolay Borisov --- fs/btrfs/volumes.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f4405e430da6..9c9bb127eeee 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4837,8 +4837,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, goto error_del_extent; for (i = 0; i < map->num_stripes; i++) { - num_bytes = map->stripes[i].dev->bytes_used + stripe_size; - btrfs_device_set_bytes_used(map->stripes[i].dev, num_bytes); + btrfs_device_set_bytes_used(map->stripes[i].dev, + map->stripes[i].dev->bytes_used + + stripe_size); } atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space); From patchwork Thu Oct 4 21:24:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans van Kranenburg X-Patchwork-Id: 10626893 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3FBFB184E for ; Thu, 4 Oct 2018 21:31:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 311722963F for ; Thu, 4 Oct 2018 21:31:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 25A9729640; Thu, 4 Oct 2018 21:31:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org 
X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C642A29645 for ; Thu, 4 Oct 2018 21:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727787AbeJEE0O (ORCPT ); Fri, 5 Oct 2018 00:26:14 -0400 Received: from smtp.dpl.mendix.net ([83.96.177.10]:45761 "EHLO smtp.dpl.mendix.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726113AbeJEE0K (ORCPT ); Fri, 5 Oct 2018 00:26:10 -0400 Received: from mekker.bofh.hq.mendix.net (mekker.bofh.hq.mendix.net [IPv6:2001:828:13c8:10b::21]) by smtp.dpl.mendix.net (Postfix) with ESMTP id 0E182202EB for ; Thu, 4 Oct 2018 23:24:44 +0200 (CEST) From: Hans van Kranenburg To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name Date: Thu, 4 Oct 2018 23:24:39 +0200 Message-Id: <20181004212443.26519-3-hans.van.kranenburg@mendix.com> X-Mailer: git-send-email 2.19.0.329.g76f2f5c1e3 In-Reply-To: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> References: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP num_bytes is really a way too generic name for a variable in this function. There are a dozen other variables that hold a number of bytes as value. Give it a name that actually describes what it does, which is holding the size of the chunk that we're allocating. 
Signed-off-by: Hans van Kranenburg Reviewed-by: Nikolay Borisov --- fs/btrfs/volumes.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9c9bb127eeee..40fa85e68b1f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -123,6 +123,8 @@ const char *get_raid_name(enum btrfs_raid_types type) return btrfs_raid_array[type].raid_name; } + + static int init_first_rw_device(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info); @@ -4599,7 +4601,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 max_stripe_size; u64 max_chunk_size; u64 stripe_size; - u64 num_bytes; + u64 chunk_size; int ndevs; int i; int j; @@ -4801,9 +4803,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, map->type = type; map->sub_stripes = sub_stripes; - num_bytes = stripe_size * data_stripes; + chunk_size = stripe_size * data_stripes; - trace_btrfs_chunk_alloc(info, map, start, num_bytes); + trace_btrfs_chunk_alloc(info, map, start, chunk_size); em = alloc_extent_map(); if (!em) { @@ -4814,7 +4816,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags); em->map_lookup = map; em->start = start; - em->len = num_bytes; + em->len = chunk_size; em->block_start = 0; em->block_len = em->len; em->orig_block_len = stripe_size; @@ -4832,7 +4834,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, refcount_inc(&em->refs); write_unlock(&em_tree->lock); - ret = btrfs_make_block_group(trans, 0, type, start, num_bytes); + ret = btrfs_make_block_group(trans, 0, type, start, chunk_size); if (ret) goto error_del_extent; From patchwork Thu Oct 4 21:24:40 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans van Kranenburg X-Patchwork-Id: 10626887 Return-Path: Received: from 
mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 775F6184E for ; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 68BA42963F for ; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5A50129640; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BBA0A29640 for ; Thu, 4 Oct 2018 21:30:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727727AbeJEE0K (ORCPT ); Fri, 5 Oct 2018 00:26:10 -0400 Received: from smtp.dpl.mendix.net ([83.96.177.10]:45762 "EHLO smtp.dpl.mendix.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727089AbeJEE0K (ORCPT ); Fri, 5 Oct 2018 00:26:10 -0400 Received: from mekker.bofh.hq.mendix.net (mekker.bofh.hq.mendix.net [IPv6:2001:828:13c8:10b::21]) by smtp.dpl.mendix.net (Postfix) with ESMTP id 1A9B82032C for ; Thu, 4 Oct 2018 23:24:44 +0200 (CEST) From: Hans van Kranenburg To: linux-btrfs@vger.kernel.org Subject: [PATCH 3/6] btrfs: alloc_chunk: fix more DUP stripe size handling Date: Thu, 4 Oct 2018 23:24:40 +0200 Message-Id: <20181004212443.26519-4-hans.van.kranenburg@mendix.com> X-Mailer: git-send-email 2.19.0.329.g76f2f5c1e3 In-Reply-To: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> References: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: 
ClamAV using ClamSMTP

Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling" fixed calculating the stripe_size for a new DUP chunk.

However, the same calculation reappears a bit later, and that one was not changed yet.

The resulting bug that is exposed is that the newly allocated device extents ('stripes') can have a few MiB overlap with the next thing stored after them, which is another device extent or the end of the disk.

The scenario in which this can happen is:
* The block device for the filesystem is less than 10GiB in size.
* The amount of contiguous free unallocated disk space chosen to use for chunk allocation is 20% of the total device size, or a few MiB more or less.

An example:
- The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB)
- There's 1578MiB unallocated raw disk space left in one contiguous piece.

In this case stripe_size is first calculated as 789MiB, (half of 1578MiB).

Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we enter the if block. Now stripe_size value is immediately overwritten while calculating an adjusted value based on max_chunk_size, which ends up as 788MiB.

Next, the value is rounded up to a 16MiB boundary, 800MiB, which is actually more than the value we had before. However, the last comparison fails to detect this, because it's comparing the value with the total amount of free space, which is about twice the size of stripe_size.

In the example above, this means that the resulting raw disk space being allocated is 1600MiB, while only a gap of 1578MiB has been found. The second device extent object for this DUP chunk will overlap for 22MiB with whatever comes next.

The underlying problem here is that the stripe_size is reused all the time for different things. So, when entering the code in the if block, stripe_size is immediately overwritten with something else. If later we decide we want to have the previous value back, then the logic to compute it was copy pasted in again.
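The 7880MiB example can be checked with a small stand-alone model. This is only a sketch in plain userspace C, not the kernel code: the function names, the SKETCH_MIB macro and the hard-coded DUP parameters (two stripes on one device, one data stripe) are assumptions made for illustration. The final round_down to BTRFS_STRIPE_LEN is left out because it does not change these numbers.

```c
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL)

/* Round x up to the next multiple of align (align must be non-zero). */
uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * Model of the logic this patch removes, for a DUP chunk on one device.
 * hole is the contiguous free space that was found (max_avail).
 */
uint64_t sketch_old_dup_stripe_size(uint64_t hole, uint64_t max_chunk_size)
{
	const uint64_t dev_stripes = 2;		/* DUP: two stripes per device */
	const uint64_t data_stripes = 1;	/* one of them counts as data */
	uint64_t stripe_size = hole / dev_stripes;

	if (stripe_size * data_stripes > max_chunk_size) {
		/* The previous value is thrown away here. */
		stripe_size = max_chunk_size / data_stripes;
		stripe_size = sketch_round_up(stripe_size, 16 * SKETCH_MIB);
		/* Compared against the whole hole, not against half of it. */
		if (stripe_size > hole)
			stripe_size = hole;
	}
	return stripe_size;
}

/* Model of the logic this patch adds: reduce, but never grow. */
uint64_t sketch_new_dup_stripe_size(uint64_t hole, uint64_t max_chunk_size)
{
	const uint64_t dev_stripes = 2;
	const uint64_t data_stripes = 1;
	uint64_t stripe_size = hole / dev_stripes;

	if (stripe_size * data_stripes > max_chunk_size) {
		uint64_t reduced = sketch_round_up(max_chunk_size / data_stripes,
						   16 * SKETCH_MIB);
		if (reduced < stripe_size)
			stripe_size = reduced;
	}
	return stripe_size;
}

/* Bytes by which the two DUP stripes overshoot the hole, or 0. */
uint64_t sketch_dup_overlap(uint64_t stripe_size, uint64_t hole)
{
	uint64_t used = 2 * stripe_size;

	return used > hole ? used - hole : 0;
}
```

With hole = 1578MiB and max_chunk_size = 788MiB, the old model returns an 800MiB stripe and a 22MiB overlap, while the new model keeps the 789MiB stripe, which fits the hole exactly.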
With this change, the value in stripe_size is not unnecessarily destroyed, so the duplicated calculation is not needed any more. Signed-off-by: Hans van Kranenburg --- fs/btrfs/volumes.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 40fa85e68b1f..7045814fc98d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4763,19 +4763,16 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, /* * Use the number of data stripes to figure out how big this chunk * is really going to be in terms of logical address space, - * and compare that answer with the max chunk size + * and compare that answer with the max chunk size. If it's higher, + * we try to reduce stripe_size. */ if (stripe_size * data_stripes > max_chunk_size) { - stripe_size = div_u64(max_chunk_size, data_stripes); - - /* bump the answer up to a 16MB boundary */ - stripe_size = round_up(stripe_size, SZ_16M); - - /* - * But don't go higher than the limits we found while searching - * for free extents + /* Reduce stripe_size, round it up to a 16MB boundary + * again and then use it, unless it ends up being even + * bigger than the previous value we had already. 
*/ - stripe_size = min(devices_info[ndevs - 1].max_avail, + stripe_size = min(round_up(div_u64(max_chunk_size, + data_stripes), SZ_16M), stripe_size); } From patchwork Thu Oct 4 21:24:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans van Kranenburg X-Patchwork-Id: 10626885 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B05517E0 for ; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4B11729645 for ; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3E9982964B; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9AEE82963F for ; Thu, 4 Oct 2018 21:30:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727532AbeJEE0K (ORCPT ); Fri, 5 Oct 2018 00:26:10 -0400 Received: from smtp.dpl.mendix.net ([83.96.177.10]:45758 "EHLO smtp.dpl.mendix.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726753AbeJEE0J (ORCPT ); Fri, 5 Oct 2018 00:26:09 -0400 Received: from mekker.bofh.hq.mendix.net (mekker.bofh.hq.mendix.net [IPv6:2001:828:13c8:10b::21]) by smtp.dpl.mendix.net (Postfix) with ESMTP id 2718B20395 for ; Thu, 4 Oct 2018 23:24:44 +0200 (CEST) From: Hans van Kranenburg To: linux-btrfs@vger.kernel.org Subject: [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56 Date: Thu, 4 Oct 2018 23:24:41 +0200 Message-Id: 
<20181004212443.26519-5-hans.van.kranenburg@mendix.com> X-Mailer: git-send-email 2.19.0.329.g76f2f5c1e3 In-Reply-To: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> References: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP RAID5 and RAID6 profile store one copy of the data, not 2 or 3. These values are not used anywhere by the way. Signed-off-by: Hans van Kranenburg Reviewed-by: Nikolay Borisov --- fs/btrfs/volumes.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7045814fc98d..d82b3d735ebe 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -96,7 +96,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_min = 2, .tolerated_failures = 1, .devs_increment = 1, - .ncopies = 2, + .ncopies = 1, .raid_name = "raid5", .bg_flag = BTRFS_BLOCK_GROUP_RAID5, .mindev_error = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET, @@ -108,7 +108,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_min = 3, .tolerated_failures = 2, .devs_increment = 1, - .ncopies = 3, + .ncopies = 1, .raid_name = "raid6", .bg_flag = BTRFS_BLOCK_GROUP_RAID6, .mindev_error = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET, From patchwork Thu Oct 4 21:24:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans van Kranenburg X-Patchwork-Id: 10626895 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5E4B14BD for ; Thu, 4 Oct 2018 21:31:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 985C12963F for ; Thu, 4 Oct 2018 21:31:03 +0000 (UTC) 
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8D32529645; Thu, 4 Oct 2018 21:31:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 20DB02963F for ; Thu, 4 Oct 2018 21:31:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727012AbeJEE0J (ORCPT ); Fri, 5 Oct 2018 00:26:09 -0400 Received: from smtp.dpl.mendix.net ([83.96.177.10]:45757 "EHLO smtp.dpl.mendix.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726113AbeJEE0I (ORCPT ); Fri, 5 Oct 2018 00:26:08 -0400 X-Greylist: delayed 372 seconds by postgrey-1.27 at vger.kernel.org; Fri, 05 Oct 2018 00:26:08 EDT Received: from mekker.bofh.hq.mendix.net (mekker.bofh.hq.mendix.net [IPv6:2001:828:13c8:10b::21]) by smtp.dpl.mendix.net (Postfix) with ESMTP id 32DFA2051E for ; Thu, 4 Oct 2018 23:24:44 +0200 (CEST) From: Hans van Kranenburg To: linux-btrfs@vger.kernel.org Subject: [PATCH 5/6] btrfs: introduce nparity raid_attr Date: Thu, 4 Oct 2018 23:24:42 +0200 Message-Id: <20181004212443.26519-6-hans.van.kranenburg@mendix.com> X-Mailer: git-send-email 2.19.0.329.g76f2f5c1e3 In-Reply-To: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> References: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> MIME-Version: 1.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Instead of hardcoding exceptions for RAID5 and RAID6 in the code, use an nparity field in raid_attr. 
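As a rough illustration of the idea, the single expression that replaces the RAID5/RAID6 special cases can be modelled outside the kernel. The struct below is a hypothetical, cut-down stand-in for struct btrfs_raid_attr, and all names ending in _sketch are made up for this example. The nparity values follow the special cases being removed (num_stripes - 1 for RAID5, num_stripes - 2 for RAID6).

```c
/* Hypothetical, reduced version of struct btrfs_raid_attr. */
struct raid_attr_sketch {
	const char *raid_name;
	int ncopies;	/* how many copies the data has */
	int nparity;	/* stripes worth of bytes used for parity */
};

const struct raid_attr_sketch raid10_sketch = { "raid10", 2, 0 };
const struct raid_attr_sketch dup_sketch    = { "dup",    2, 0 };
const struct raid_attr_sketch raid0_sketch  = { "raid0",  1, 0 };
const struct raid_attr_sketch raid5_sketch  = { "raid5",  1, 1 };
const struct raid_attr_sketch raid6_sketch  = { "raid6",  1, 2 };

/* One formula for every profile, no per-profile branches. */
int data_stripes_sketch(const struct raid_attr_sketch *attr, int num_stripes)
{
	return (num_stripes - attr->nparity) / attr->ncopies;
}
```

For four stripes this gives 3 data stripes for RAID5 and 2 for RAID6, the same results as the removed if statements, and 2 for RAID10 as before.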
Signed-off-by: Hans van Kranenburg --- fs/btrfs/volumes.c | 18 +++++++++++------- fs/btrfs/volumes.h | 2 ++ 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index d82b3d735ebe..453046497ac8 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -37,6 +37,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies = 2, + .nparity = 0, .raid_name = "raid10", .bg_flag = BTRFS_BLOCK_GROUP_RAID10, .mindev_error = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET, @@ -49,6 +50,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies = 2, + .nparity = 0, .raid_name = "raid1", .bg_flag = BTRFS_BLOCK_GROUP_RAID1, .mindev_error = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET, @@ -61,6 +63,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies = 2, + .nparity = 0, .raid_name = "dup", .bg_flag = BTRFS_BLOCK_GROUP_DUP, .mindev_error = 0, @@ -73,6 +76,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies = 1, + .nparity = 0, .raid_name = "raid0", .bg_flag = BTRFS_BLOCK_GROUP_RAID0, .mindev_error = 0, @@ -85,6 +89,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies = 1, + .nparity = 0, .raid_name = "single", .bg_flag = 0, .mindev_error = 0, @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 1, .ncopies = 1, + .nparity = 1, .raid_name = "raid5", .bg_flag = BTRFS_BLOCK_GROUP_RAID5, .mindev_error = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET, @@ -109,6 +115,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 2, .devs_increment = 1, .ncopies = 1, + .nparity = 2, 
.raid_name = "raid6", .bg_flag = BTRFS_BLOCK_GROUP_RAID6, .mindev_error = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET, @@ -4597,6 +4604,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, int devs_min; /* min devs needed */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies; /* how many copies to data has */ + int nparity; /* number of stripes worth of bytes to + store parity information */ int ret; u64 max_stripe_size; u64 max_chunk_size; @@ -4623,6 +4632,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, devs_min = btrfs_raid_array[index].devs_min; devs_increment = btrfs_raid_array[index].devs_increment; ncopies = btrfs_raid_array[index].ncopies; + nparity = btrfs_raid_array[index].nparity; if (type & BTRFS_BLOCK_GROUP_DATA) { max_stripe_size = SZ_1G; @@ -4752,13 +4762,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, * this will have to be fixed for RAID1 and RAID10 over * more drives */ - data_stripes = num_stripes / ncopies; - - if (type & BTRFS_BLOCK_GROUP_RAID5) - data_stripes = num_stripes - 1; - - if (type & BTRFS_BLOCK_GROUP_RAID6) - data_stripes = num_stripes - 2; + data_stripes = (num_stripes - nparity) / ncopies; /* * Use the number of data stripes to figure out how big this chunk diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 23e9285d88de..0fe005b4295a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -331,6 +331,8 @@ struct btrfs_raid_attr { int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies; /* how many copies to data has */ + int nparity; /* number of stripes worth of bytes to + store parity information */ int mindev_error; /* error code if min devs requisite is unmet */ const char raid_name[8]; /* name of the raid */ u64 bg_flag; /* block group flag of the raid */ From patchwork Thu Oct 4 21:24:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hans van Kranenburg X-Patchwork-Id: 10626889 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD3B614BD for ; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB24D2963F for ; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9FC5A29640; Thu, 4 Oct 2018 21:30:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCBF329644 for ; Thu, 4 Oct 2018 21:30:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727739AbeJEE0L (ORCPT ); Fri, 5 Oct 2018 00:26:11 -0400 Received: from smtp.dpl.mendix.net ([83.96.177.10]:45756 "EHLO smtp.dpl.mendix.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726570AbeJEE0K (ORCPT ); Fri, 5 Oct 2018 00:26:10 -0400 Received: from mekker.bofh.hq.mendix.net (mekker.bofh.hq.mendix.net [IPv6:2001:828:13c8:10b::21]) by smtp.dpl.mendix.net (Postfix) with ESMTP id 3FFE9209D4 for ; Thu, 4 Oct 2018 23:24:44 +0200 (CEST) From: Hans van Kranenburg To: linux-btrfs@vger.kernel.org Subject: [PATCH 6/6] btrfs: alloc_chunk: rework chunk/stripe calculations Date: Thu, 4 Oct 2018 23:24:43 +0200 Message-Id: <20181004212443.26519-7-hans.van.kranenburg@mendix.com> X-Mailer: git-send-email 2.19.0.329.g76f2f5c1e3 In-Reply-To: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> References: <20181004212443.26519-1-hans.van.kranenburg@mendix.com> MIME-Version: 1.0 
Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Previously, the stripe_size variable was modified too many times in the __btrfs_alloc_chunk function. The most problematic place was the if block dealing with a chunk bigger than max_chunk_size, which would throw away (overwrite) the value of stripe_size, maybe realizing a few lines later that the previous value was actually better and executing a copy of the former logic to try to get it back to the previous state. Instead of calculating the target chunk size on the fly, adjust the max_stripe_size variable based on the max_chunk_size that was set before, and then simply compare it to stripe_size. This removes the whole problematic if block. Signed-off-by: Hans van Kranenburg --- fs/btrfs/volumes.c | 46 +++++++++++++++++++++------------------------- fs/btrfs/volumes.h | 2 +- 2 files changed, 22 insertions(+), 26 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 453046497ac8..862ee17ee0e5 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4596,14 +4596,15 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, struct btrfs_device_info *devices_info = NULL; u64 total_avail; int num_stripes; /* total number of stripes to allocate */ - int data_stripes; /* number of stripes that count for - block group size */ + int num_data_stripes; /* number of stripes worth of bytes to + store data including copies */ int sub_stripes; /* sub_stripes info for map */ int dev_stripes; /* stripes per dev */ int devs_max; /* max devs to use */ int devs_min; /* min devs needed */ int devs_increment; /* ndevs has to be a multiple of this */ - int ncopies; /* how many copies to data has */ + int ncopies; /* how many times actual data is duplicated + inside num_data_stripes */ int nparity; /* number of stripes worth of bytes to store parity information */ int ret; @@ -4747,6 +4748,8 @@ 
static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, } ndevs = min(ndevs, devs_max); + num_stripes = ndevs * dev_stripes; + num_data_stripes = num_stripes - nparity; /* * The primary goal is to maximize the number of stripes, so use as @@ -4756,31 +4759,24 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, * max_avail is the total size so we have to adjust. */ stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes); - num_stripes = ndevs * dev_stripes; - - /* - * this will have to be fixed for RAID1 and RAID10 over - * more drives - */ - data_stripes = (num_stripes - nparity) / ncopies; /* - * Use the number of data stripes to figure out how big this chunk - * is really going to be in terms of logical address space, - * and compare that answer with the max chunk size. If it's higher, - * we try to reduce stripe_size. + * Now that we know how many stripes we're going to use, we can adjust + * down max_stripe_size if needed, paying attention to max_chunk_size. + * + * By multiplying chunk size with ncopies, we get the total amount of + * bytes that need to fit into all the non-parity stripes. + * + * A chunk is allowed to end up being a bit bigger than max_chunk_size + * when rounding up the stripe_size to a 16MiB boundary makes it so. + * Unless... it ends up being bigger than the amount of physical free + * space we can use for it. */ - if (stripe_size * data_stripes > max_chunk_size) { - /* Reduce stripe_size, round it up to a 16MB boundary - * again and then use it, unless it ends up being even - * bigger than the previous value we had already. 
- */ - stripe_size = min(round_up(div_u64(max_chunk_size, - data_stripes), SZ_16M), - stripe_size); - } + max_stripe_size = min(round_up((max_chunk_size * ncopies) / + num_data_stripes, SZ_16M), + max_stripe_size); - /* align to BTRFS_STRIPE_LEN */ + stripe_size = min(max_stripe_size, stripe_size); stripe_size = round_down(stripe_size, BTRFS_STRIPE_LEN); map = kmalloc(map_lookup_size(num_stripes), GFP_NOFS); @@ -4804,7 +4800,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, map->type = type; map->sub_stripes = sub_stripes; - chunk_size = stripe_size * data_stripes; + chunk_size = div_u64(stripe_size * num_data_stripes, ncopies); trace_btrfs_chunk_alloc(info, map, start, chunk_size); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0fe005b4295a..ee2ec77b1291 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -330,7 +330,7 @@ struct btrfs_raid_attr { int devs_min; /* min devs needed */ int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ - int ncopies; /* how many copies to data has */ + int ncopies; /* how many copies the data has */ int nparity; /* number of stripes worth of bytes to store parity information */ int mindev_error; /* error code if min devs requisite is unmet */
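To make the arithmetic of patch 6/6 easier to follow, here is a stand-alone userspace sketch of the reworked stripe and chunk size calculation. It is an illustration under stated assumptions, not the kernel function: the struct, the function name and the SKETCH_ macros are invented for this example, device selection and the earlier clamping of max_avail are not modelled, and BTRFS_STRIPE_LEN is taken to be 64KiB.

```c
#include <stdint.h>

#define SKETCH_MIB		(1024ULL * 1024ULL)
#define SKETCH_SZ_16M		(16ULL * SKETCH_MIB)
#define SKETCH_STRIPE_LEN	(64ULL * 1024ULL)	/* assumed BTRFS_STRIPE_LEN */

/* Hypothetical result type, not part of the patch. */
struct chunk_sizes_sketch {
	uint64_t stripe_size;	/* bytes allocated per device extent */
	uint64_t chunk_size;	/* logical size of the new chunk */
};

/*
 * max_avail is the usable hole on the smallest selected device
 * (devices_info[ndevs - 1].max_avail in the patch).
 */
struct chunk_sizes_sketch
sketch_chunk_sizes(uint64_t max_avail, int dev_stripes, int ndevs,
		   int ncopies, int nparity,
		   uint64_t max_stripe_size, uint64_t max_chunk_size)
{
	struct chunk_sizes_sketch out;
	uint64_t num_stripes = (uint64_t)ndevs * (uint64_t)dev_stripes;
	uint64_t num_data_stripes = num_stripes - (uint64_t)nparity;
	uint64_t stripe_size = max_avail / (uint64_t)dev_stripes;
	uint64_t limit;

	/*
	 * Bytes that must fit into all non-parity stripes, spread over
	 * those stripes and rounded up to a 16MiB boundary.
	 */
	limit = (max_chunk_size * (uint64_t)ncopies) / num_data_stripes;
	limit = (limit + SKETCH_SZ_16M - 1) / SKETCH_SZ_16M * SKETCH_SZ_16M;
	if (limit < max_stripe_size)
		max_stripe_size = limit;

	/* stripe_size is only ever reduced, never increased. */
	if (max_stripe_size < stripe_size)
		stripe_size = max_stripe_size;

	/* Round down to the stripe length. */
	stripe_size = stripe_size / SKETCH_STRIPE_LEN * SKETCH_STRIPE_LEN;

	out.stripe_size = stripe_size;
	out.chunk_size = stripe_size * num_data_stripes / (uint64_t)ncopies;
	return out;
}
```

For the DUP example from patch 3/6 (max_avail = 1578MiB, dev_stripes = 2, one device, ncopies = 2, nparity = 0, max_chunk_size = 788MiB, and an assumed max_stripe_size of 1GiB) the sketch yields a 789MiB stripe and a 789MiB chunk, so the two device extents together use exactly the 1578MiB hole.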